
Library Hi Tech

ISSN: 0737-8831

Online from: 1983

Subject Area: Library and Information Studies


Enhancing document modeling by means of open topic models: Crossing the frontier of classification schemes in digital libraries by example of the DDC


Document Information:
Title: Enhancing document modeling by means of open topic models: Crossing the frontier of classification schemes in digital libraries by example of the DDC
Author(s): Alexander Mehler (Faculty of Technology, Bielefeld University, Bielefeld, Germany), Ulli Waltinger (Faculty of Technology, Bielefeld University, Bielefeld, Germany)
Citation: Alexander Mehler, Ulli Waltinger (2009), "Enhancing document modeling by means of open topic models: Crossing the frontier of classification schemes in digital libraries by example of the DDC", Library Hi Tech, Vol. 27 Iss: 4, pp. 520-539
Keywords: Digital libraries, Document management, Modelling
Article type: Research paper
DOI: 10.1108/07378830911007646
Publisher: Emerald Group Publishing Limited
Acknowledgements: The authors gratefully acknowledge financial support from the German Research Foundation (DFG) through the EC 277 Cognitive Interaction Technology, the Research Group 437 Text Technological Information Modeling and the DFG-LIS-Project P2P-Agents for Thematic Structuring and Search Optimization in Digital Libraries at Bielefeld University. They also thank Bielefeld University Library, which kindly provided the test data used in this article.
Abstract:

Purpose – The purpose of this paper is to present a topic classification model that uses the Dewey Decimal Classification (DDC) as the target scheme. To reduce the effort of document processing in digital libraries, the model explores metadata as provided by the Open Archives Initiative (OAI) to derive document snippets as minimal document representations. Further, the paper seeks to perform feature selection and extension by means of social ontologies and related web-based lexical resources, in order to provide reliable topic-related classifications while circumventing the problem of data sparseness. Finally, the paper aims to evaluate the model by means of two language-specific corpora. The paper bridges digital libraries, on the one hand, and computational linguistics, on the other. The aim is to make computational linguistic methods accessible for providing thematic classifications in digital libraries based on closed topic models such as the DDC.
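
As a rough illustration of what "deriving a document snippet from OAI metadata" could look like, the following sketch concatenates selected Dublin Core fields of an `oai_dc` record into one short text. It is not the authors' implementation: the sample record, the function name `build_snippet`, and the choice of fields (title, subject, description) are assumptions made for this example. Only the two namespace URIs are taken from the OAI-PMH and Dublin Core standards.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs used by OAI-PMH Dublin Core (oai_dc) records.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record, invented for illustration only.
RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Graph algorithms for network analysis</dc:title>
  <dc:subject>Computer science</dc:subject>
  <dc:subject>Graph theory</dc:subject>
  <dc:description>An introduction to shortest paths.</dc:description>
  <dc:language>en</dc:language>
</oai_dc:dc>"""


def build_snippet(record_xml, fields=("title", "subject", "description")):
    """Join the text of the selected Dublin Core fields into one snippet.

    The field selection is an assumption; the abstract only states that
    certain selections of OAI metadata work best, not which ones.
    """
    root = ET.fromstring(record_xml)
    parts = []
    for field in fields:
        # Repeated elements (e.g. several dc:subject) are all included.
        for element in root.findall(f"dc:{field}", NS):
            text = (element.text or "").strip()
            if text:
                parts.append(text)
    return " ".join(parts)


print(build_snippet(RECORD))
print(build_snippet(RECORD, fields=("title",)))
```

A snippet of this kind would then be passed to feature extraction and a classifier in place of the full text, which is where the reduction in processing effort comes from.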

Design/methodology/approach – The approach draws on text classification, text technology, computational linguistics, computational semantics, and social semantics.

Findings – SVM-based classifiers are shown to perform best when they explore certain selections of OAI document metadata.

Research limitations/implications – The findings show that SVM-based DDC classifiers need to be developed further using larger training sets, possibly for more than two languages, in order to obtain better F-measure values.
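
For readers unfamiliar with the F-measure mentioned above, the following minimal sketch computes it for a single DDC class from true and predicted labels. It assumes the balanced variant (F1, the harmonic mean of precision and recall); the abstract does not say which variant the paper reports. The label lists are invented for illustration and are not results from the paper.

```python
def f_measure(true_labels, predicted_labels, target):
    """Balanced F-measure (F1) for one target class.

    F1 = 2 * P * R / (P + R), with precision P = TP / (TP + FP)
    and recall R = TP / (TP + FN).
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example using three DDC main classes as labels:
# 000 (computer science, information and general works),
# 500 (science), 600 (technology).
true_labels = ["000", "500", "500", "600", "500"]
predicted_labels = ["000", "500", "600", "600", "500"]

for ddc_class in ("000", "500", "600"):
    score = f_measure(true_labels, predicted_labels, ddc_class)
    print(ddc_class, round(score, 3))
```

In this toy example one document of class 500 is misassigned to class 600, which lowers recall for 500 and precision for 600. Sparse classes are especially sensitive to such single errors, which fits the call for larger training sets.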

Originality/value – Algorithmic and formal-mathematical information is provided on how to build DDC-classifiers for digital libraries.


