
Mining user queries with information extraction methods and linked data

Anne Chardonnens (State Archives of Belgium/CegeSoma, Brussels, Belgium) (Information and Communication Science Department, Université libre de Bruxelles (ULB), Brussels, Belgium)
Ettore Rizza (Information and Communication Science Department, Université libre de Bruxelles (ULB), Brussels, Belgium)
Mathias Coeckelbergs (Information and Communication Science Department, Université libre de Bruxelles (ULB), Brussels, Belgium)
Seth van Hooland (Information and Communication Science Department, Université libre de Bruxelles (ULB), Brussels, Belgium)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 14 May 2018

Issue publication date: 3 August 2018


Abstract

Purpose

Advanced use of web analytics tools makes it possible to capture the content of user queries. Despite their relevance, manual analysis of large volumes of user queries is problematic. The purpose of this paper is to address the problem of named entity recognition in digital library user queries.

Design/methodology/approach

The paper presents a large-scale case study conducted at the Royal Library of Belgium on its online historical newspapers platform, BelgicaPress. The study is based on a data set of 83,854 queries resulting from 29,812 visits over a 12-month period. Making use of information extraction methods, knowledge bases (KBs) and various authority files, the paper sets out the possibilities and limits of identifying what percentage of end users are looking for person and place names.
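
As an illustration of what tallying person and place names in queries involves, the following is a minimal sketch assuming exact, case-insensitive lookup of whole queries against two small hypothetical authority lists (`PERSONS`, `PLACES`). It is not the pipeline used in the study, which relied on information extraction methods, KBs and authority files; it only shows how matched queries could be turned into percentages, and how a query matching more than one entity type would be set aside as ambiguous.

```python
import re
from collections import Counter

# Hypothetical toy authority lists (illustration only); a real setup would
# draw on knowledge bases and authority files instead.
PERSONS = {"adolphe max", "victoria"}
PLACES = {"brussels", "ghent", "victoria"}


def normalise(text: str) -> str:
    """Lowercase a query and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text.strip().lower())


def classify(query: str) -> str:
    """Label a whole query as 'person', 'place', 'ambiguous' or 'other'."""
    q = normalise(query)
    is_person = q in PERSONS
    is_place = q in PLACES
    if is_person and is_place:
        return "ambiguous"  # matches both lists: leave for manual review
    if is_person:
        return "person"
    if is_place:
        return "place"
    return "other"


def share_by_type(queries):
    """Return the percentage of queries falling into each label."""
    if not queries:
        return {}
    counts = Counter(classify(q) for q in queries)
    return {label: 100 * n / len(queries) for label, n in counts.items()}
```

For example, `share_by_type(["Adolphe Max", "Brussels", "Victoria", "tram accident 1920"])` gives 25.0 for each of the four labels. Exact whole-query lookup is the simplest possible choice; it misses names embedded in longer queries, which is where information extraction methods become necessary.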

Findings

Based on a quantitative assessment, the method successfully identifies the majority of person and place names in user queries. Owing to the specific character of user queries and the nature of the KBs used, a limited number of queries remained too ambiguous to be processed automatically.

Originality/value

This paper demonstrates empirically how user queries can be extracted from a web analytics tool and how the named entities they contain can then be mapped to KBs and authority files, in order to facilitate automated analysis of their content. The methods and tools used are generalisable and can be reused by other collection holders.
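
The extraction step can be pictured with a minimal sketch assuming a hypothetical export from a web analytics tool as `(visit_id, url)` rows, where the search term travels in a URL parameter here called `query`. Both the export format and the parameter name are assumptions for illustration, not details of BelgicaPress or of any particular analytics product; only Python's standard `urllib.parse` module is used.

```python
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical name of the URL parameter that carries the search term.
SEARCH_PARAM = "query"


def extract_query(url: str, param: str = SEARCH_PARAM) -> Optional[str]:
    """Return the decoded search term from a page URL, or None if absent or empty."""
    # parse_qs decodes '+' and percent-escapes and drops blank values by default.
    values = parse_qs(urlsplit(url).query).get(param)
    if not values:
        return None
    term = values[0].strip()
    return term or None


def queries_by_visit(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group the search terms found in (visit_id, url) rows by visit."""
    result: Dict[str, List[str]] = {}
    for visit_id, url in rows:
        term = extract_query(url)
        if term is not None:
            result.setdefault(visit_id, []).append(term)
    return result
```

Grouping by visit keeps the link between queries and sessions, which matters when queries and visits are counted separately, as in the 83,854 queries from 29,812 visits described above.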


Acknowledgements

The authors would like to extend their gratitude to the Royal Library of Belgium. The authors are particularly grateful for the assistance given by Erwin Van Wesemael. The support of the promoters of the ADOCHS project has also been invaluable to the success of the research and the conception of this paper. The authors would therefore like to thank Ann Dooms, Florence Gillet and Frederic Lemmers. The research underlying the results presented in this paper was funded by the Belgian Science Policy Office in the context of Contract No. BR/154/A6/ADOCHS.

Citation

Chardonnens, A., Rizza, E., Coeckelbergs, M. and van Hooland, S. (2018), "Mining user queries with information extraction methods and linked data", Journal of Documentation, Vol. 74 No. 5, pp. 936-950. https://doi.org/10.1108/JD-09-2017-0133

Publisher

Emerald Publishing Limited

Copyright © 2018, Emerald Publishing Limited
