Online from: 1945
Subject Area: Library and Information Studies
|Title:||A study on automatic creation of a comparable document collection in cross-language information retrieval|
|Author(s):||Tuomas Talvensaari (Department of Computer Sciences, University of Tampere, Finland), Jorma Laurikkala (Department of Computer Sciences, University of Tampere, Finland), Kalervo Järvelin (Department of Information Studies, University of Tampere, Finland), Martti Juhola (Department of Computer Sciences, University of Tampere, Finland)|
|Citation:||Tuomas Talvensaari, Jorma Laurikkala, Kalervo Järvelin, Martti Juhola (2006) "A study on automatic creation of a comparable document collection in cross-language information retrieval", Journal of Documentation, Vol. 62 Iss. 3, pp. 372-387|
|Keywords:||Document management, Information retrieval, Language and literature|
|Article type:||Research paper|
|DOI:||10.1108/00220410610666510|
|Publisher:||Emerald Group Publishing Limited|
Purpose – To present a method for creating a comparable document collection from two document collections in different languages.
Design/methodology/approach – The best query keys were extracted from a Finnish source collection (articles of the newspaper
Findings – The combined alignment scheme was found to be the best when the relatedness of the document pairs was assessed on a five-degree relevance scale. Of the 400 document pairs, roughly 40 percent were highly or fairly related, and 75 percent showed at least lexical similarity.
Research limitations/implications – The number of alignment pairs was small because the two collections share only a short common time period and are geographically (and thus topically) remote. In the future, our aim is to build larger comparable corpora in various languages and use them as a source of translation knowledge for cross-language information retrieval (CLIR).
Practical implications – Readily available parallel corpora are scarce. With this method, two unrelated document collections can relatively easily be aligned to create a CLIR resource.
Originality/value – The method can be applied to weakly linked collections and morphologically complex languages, such as Finnish.