Multilinguality in digital libraries

The Electronic Library

ISSN: 0264-0473

Article publication date: 6 April 2012


Citation

He, D. (2012), "Multilinguality in digital libraries", The Electronic Library, Vol. 30 No. 2. https://doi.org/10.1108/el.2012.26330baa.001

Publisher: Emerald Group Publishing Limited

Copyright © 2012, Emerald Group Publishing Limited



Article Type: Guest editorial From: The Electronic Library, Volume 30, Issue 2

This is a fascinating period in the history of library services. For the first time, it is possible to provide significant, diverse, and universally accessible library services using collections of digital information, and to deliver those services over an information infrastructure on a global scale. The digital library field brings together researchers and experts from many different disciplines and backgrounds, and enables changes with profound social, organizational and legal implications. Over the past decades, digital libraries have drawn on heterogeneous resources, served diverse user populations, and carried out increasingly complex tasks. There is growing demand for multimedia, multicultural, and multilingual digital libraries (Borgman, 1997).

A body of research has grown around multilingual communication, which enables the dissemination of information beyond the boundaries of languages (Oard and Diekema, 1998). Nearly every sector of our increasingly global economy and culturally diverse society needs to master multilingual communication. Digital information has been created in more than one language, and at the same time, world-wide open access has created a large user population of very diverse language and cultural backgrounds. Studying multilingual technologies and resources, therefore, helps digital library users to search, browse and synthesize information from sets of multilingual multimedia information objects.

The study of multilingual technology has existed for at least 15 years, and many new technologies (such as multilingual information access systems, machine translation systems, and multilingual thesauri) have been developed (He and Wang, 2009). However, technology development has not completely solved the technology-related problems, let alone the communication and society-related issues. For example, people still mainly search for information in their own language unless they are searching for academic information. To date, there is no effective ontology or metadata schema, even though both are very important resources in digital libraries, whether in one language or in multiple languages.

It is under these conditions that we organized this special issue on multilinguality in digital libraries. Through an open call for submissions and a double-blind review process, we selected nine papers that examine recent achievements in services, users, collection development, and supporting technology for multilinguality in digital libraries.

One of the articles in this special issue, “A review of multilinguality in digital libraries”, reviews a core set of research papers on multilingual digital libraries based on a thorough literature search in four different databases. The author found that there was only a limited number of existing multilingual digital libraries, and only a small number of available user studies on these digital libraries. This work also demonstrates that developing a multilingual digital library is difficult because it is a complex collaborative effort among different organizations, which involves challenges in data management (e.g., localization and language processing), representation (dealing with different fonts and character codes), development (creating international software, cross-cultural collaboration), and interoperability (system architecture and data sharing).

This special issue contains two articles discussing users in multilingual digital libraries. The article “Multilingual needs and expectations in digital libraries: a survey of academic users with different languages” surveys academic users’ needs and expectations for multilingual information processing when interacting with digital libraries. Using questions covering multilingual needs, behavior, resources, and desired functions in digital libraries, the study shows that academic users exhibit many multilingual needs during their academic activities; that they use online translation resources and tools but are not satisfied with the translation quality; and that the multilingual capabilities they want mostly focus on domain-specific and language-specific translation functions and on sophisticated multilingual search interfaces. Academic users from different countries, or who speak different languages, show significant differences in their multilingual needs and expectations for digital libraries.

The second paper about users, “Analysing user’s queries for cross-language image retrieval from digital library collections”, describes a study of the queries generated in a user experiment on cross-language information retrieval (CLIR) from a historic image archive. The results highlight the diversity in requests for similar visual content and the weaknesses of using machine translation for query translation. The article shows the individual characteristics of users’ searches, the overlap between query terms and structured image captions, and the connections between query terms and the foreground of an image.

The article “Collaboration and crowdsourcing: the cases of multilingual digital libraries” aims to understand key features of existing multilingual digital libraries and to suggest strategies for building and/or sustaining multilingual information access for digital libraries. Four multilingual digital libraries (Project Gutenberg, Meeting of Frontiers, the International Children’s Digital Library, and the Latin American Open Archives Portal) were studied using a framework derived from digital library evaluation practice, with their collaboration and crowdsourcing characteristics highlighted and discussed. The authors found that all four digital libraries benefited substantially from collaboration between different countries or groups with different language expertise. However, machine translation and cross-language information retrieval technologies were not fully utilized to provide multilingual information access in these digital libraries.

The remaining five articles cover many aspects of supporting technology for multilinguality in digital libraries. One article, “Exploration and study of multilingual thesauri automation construction for digital libraries in China”, explores the automatic construction of multilingual thesauri based on freely available digital library resources. The methods presented in this paper consist of automatically extracting and filtering terms, identifying and building relationships among terms, building a multilingual parallel corpus, and extracting term pairs between languages by calculating their associated probability. The results of the experiments show that the proposed method performs better than some traditional methods.
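The last step in that list, pairing terms across languages from a parallel corpus, can be illustrated with a minimal sketch. It is not the authors' method: the function name `extract_term_pairs`, the toy corpus, and the use of the Dice coefficient as the association score are all assumptions made for illustration, standing in for whatever probability estimate the paper actually uses.

```python
from collections import Counter

def extract_term_pairs(aligned_terms, min_score=0.6):
    """Pair each source term with its most strongly associated target term.

    aligned_terms: list of (source_terms, target_terms) tuples, one per
    aligned sentence pair. Terms are assumed to be extracted already.
    The Dice coefficient is used here as an illustrative association score.
    """
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src_terms, tgt_terms in aligned_terms:
        src_set, tgt_set = set(src_terms), set(tgt_terms)
        src_counts.update(src_set)
        tgt_counts.update(tgt_set)
        # Count every source/target term pair seen in the same aligned unit.
        pair_counts.update((s, t) for s in src_set for t in tgt_set)

    def dice(s, t):
        return 2 * pair_counts[(s, t)] / (src_counts[s] + tgt_counts[t])

    pairs = {}
    for s in src_counts:
        best = max(tgt_counts, key=lambda t: dice(s, t))
        score = dice(s, best)
        if score >= min_score:
            pairs[s] = (best, score)
    return pairs

# Hypothetical English-Chinese aligned term lists.
corpus = [
    (["digital library", "metadata"], ["数字图书馆", "元数据"]),
    (["digital library", "thesaurus"], ["数字图书馆", "叙词表"]),
    (["metadata", "thesaurus"], ["元数据", "叙词表"]),
]
pairs = extract_term_pairs(corpus)
```

In this toy corpus each correct pair always co-occurs, so it scores higher than any incorrect pairing. A real corpus would need term extraction, filtering, and a more careful probability model than this sketch provides.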

The article “Effective medical resources searching using an ontology-driven medical information retrieval system: H1N1 case study” describes the design of an ontology-driven medical information retrieval (OMIR) system by building a medical ontology based on the Centers for Disease Control and Prevention’s (CDC) medical records. The OMIR system contains a medical ontology transformed from a traditional cataloging scheme. Experiments show that the medical ontology can be used to filter out unsuitable resources based on semantic relationships, and that the OMIR system provides better relevance and shorter search times than alternative systems.

In “A preliminary evaluation of metadata records machine translation,” the effect of online machine translation services in translating metadata records is evaluated. Randomly selected metadata records were translated from English into Chinese using the Google, Bing, and SYSTRAN machine translation systems. These translations were then evaluated on a five-point scale for both fluency and adequacy. The results showed that although there was no significant difference among the three systems, 70% of the Google and Bing translations reached a reasonable level of fluency and adequacy.
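As a rough illustration of how ratings of this kind can be summarized, the sketch below computes mean fluency, mean adequacy, and the share of translations reaching a chosen level on both. The ratings, the system names, the function `summarize`, and the threshold of 3 are all hypothetical; the paper's own data and its definition of a "reasonable level" are not reproduced here.

```python
from statistics import mean

# Hypothetical ratings: system -> list of (fluency, adequacy), each on a 1-5 scale.
ratings = {
    "system_a": [(4, 4), (3, 5), (2, 3), (5, 4)],
    "system_b": [(3, 3), (4, 2), (4, 4), (1, 2)],
}

def summarize(scores, threshold=3):
    """Summarize five-point fluency/adequacy ratings for one system.

    The threshold for an acceptable translation is an assumption, not a
    value taken from the paper.
    """
    for fluency, adequacy in scores:
        if not (1 <= fluency <= 5 and 1 <= adequacy <= 5):
            raise ValueError("ratings must be on a 1-5 scale")
    acceptable = sum(1 for f, a in scores if f >= threshold and a >= threshold)
    return {
        "mean_fluency": mean(f for f, _ in scores),
        "mean_adequacy": mean(a for _, a in scores),
        "share_acceptable": acceptable / len(scores),
    }

summary = {name: summarize(scores) for name, scores in ratings.items()}
```

Testing whether systems differ significantly, as the paper does, would need a statistical test on top of these descriptive figures.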

“Mapping multilingual lexical semantics for knowledge organization systems” aims to investigate, through mapping analysis, the operation of knowledge organization systems in different languages (English and Chinese), the types of term equivalence, and the degree of similarity between different conceptual structures. The authors selected terms from the Art & Architecture Thesaurus developed by the Getty Research Institute in the US (source language) and terms from the Taiwan e-Learning and Digital Archives Program (target language). Using both mapping analysis and content analysis, they found that “exact equivalence” appeared most frequently, and that there were four degrees of similarity between different conceptual structures.

“Bilingual terminology extraction using multi-level termhood” observes that terminology and general words are usually distributed differently in a corpus. Termhood can therefore be used to constrain and improve term alignment when aligning bilingual terms in a parallel corpus. The paper then proposes a bilingual term alignment method based on termhood constraints. Experimental results showed that multi-level termhood could provide better performance in terminology extraction.

The articles in this special issue provide an up-to-date understanding of the multiple factors involved in the design, development and use of multilingual capabilities in digital libraries. The volume is intended to be useful to researchers, students and faculty who are interested in strategies for the effective use and efficient creation of multilingual capabilities in digital libraries.

Daqing He, School of Information Sciences, University of Pittsburgh, Pittsburgh, USA

Dan Wu, School of Information Management, Wuhan University, Wuhan, China

About the authors

Dr Daqing He is an Associate Professor at the School of Information Sciences, the University of Pittsburgh, and an Affiliated Professor in the university's Intelligent Systems Program. He earned his PhD degree in Artificial Intelligence from the University of Edinburgh, Scotland. Prior to joining the University of Pittsburgh in 2004, he served on the research faculties of the Robert Gordon University, Scotland and the University of Maryland at College Park, USA. His work centered on adaptive and interactive monolingual/multilingual information retrieval. Currently, Dr He’s main research interests cover information retrieval (monolingual and multilingual), adaptive web systems and user modeling, interactive retrieval interface design, and web log mining and analysis. Dr He has been the Principal Investigator (PI) or Co-PI for more than ten research projects, funded by the National Science Foundation (NSF), United States Defense Advanced Research Projects Agency (DARPA), University of Pittsburgh, and other agencies. He has published more than 60 articles in internationally recognized journals and conferences in these areas. Dr He has served as a member of the program committees for more than ten major international conferences in the area of information retrieval and web technologies, and has been called upon to be a reviewer for many top-ranked international journals in the same areas. Daqing He is the corresponding author and can be contacted at: dah44@pitt.edu

Dr Dan Wu is an Associate Professor at the School of Information Management, Wuhan University, China. She earned her PhD degree in Information Science from Peking University, China in 2008, her MS degree in Computer Science in 2004, and dual BS degrees in Information Science and Computer Science in 2001. While working toward her PhD degree, she received a scholarship to be a visiting researcher at the University of Pittsburgh, USA, from 2006 to 2007. Her research interests include information retrieval, information organization, multilingual information processing, metadata, and digital libraries. She has published about 50 papers in academic journals and conferences. Dr Wu is the Principal Investigator (PI) of ten research projects that are supported by Social Science Foundation of China, Ministry of Education of China, China Post Doctor Foundation, Hubei Provincial Government, Wuhan Municipal Government and Wuhan University. Dr Wu has served as a member of the program committees for more than ten major international conferences and as a reviewer for several top international journals.

References

Borgman, C.L. (1997), “Multi-media, multi-cultural, and multi-lingual digital libraries, or how do we exchange data in 400 languages?”, D-Lib Magazine, June, available at: www.dlib.org/dlib/june97/06borgman.html

He, D. and Wang, J. (2009), “Cross-language information retrieval”, in Goker, A. and Davies, J. (Eds), Information Retrieval: Searching in the 21st Century, John Wiley, Chichester, pp. 233–53

Oard, D.W. and Diekema, A.R. (1998), “Cross-language information retrieval”, in Cronin, B. (Ed.), Annual Review of Information Science and Technology, American Society for Information Science, Silver Spring, MD, pp. 223–56
