Okapi‐based XML indexing

Wei Lu (School of Information Management, Wuhan University, Wuhan City, China)
Andrew MacFarlane (Department of Information Science, City University London, London, UK)
Fabio Venuti (Department of Information Science, City University London, London, UK)

Aslib Proceedings

ISSN: 0001-253X

Article publication date: 18 September 2009

Abstract

Purpose

As an important standard for data exchange and information storage, XML has generated a great deal of interest, and particular attention has been paid to XML indexing. Clear use cases for structured search in XML have been established. However, most research in the area is based on either relational database systems or specialized semi‐structured data management systems. This paper proposes a method for XML indexing based on the information retrieval (IR) system Okapi.

Design/methodology/approach

First, the paper reviews the structure of inverted files and explains why this indexing mechanism cannot properly support XML retrieval, using the underlying data structures of Okapi as an example. The paper then explores a revised method, implemented on Okapi, that uses path indexing structures. These index structures are evaluated on three metrics (indexing run time, path search run time and space costs) using the INEX and Reuters RCV1 collections.
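The general idea of extending an inverted file with path information can be sketched as follows. This is an illustrative sketch only, not the paper's implementation and not Okapi's actual data structures: the function names `build_path_index` and `search`, the posting layout `(doc_id, path_id)`, and the whitespace tokenizer are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(docs):
    """Build a toy inverted file whose postings carry an element-path id.

    docs: dict mapping a document id to an XML string (hypothetical input format).
    Returns (postings, path_ids), where postings maps term -> {(doc_id, path_id)}
    and path_ids maps a path string such as "/article/title" to an integer id.
    """
    path_ids = {}                # path string -> small integer id
    postings = defaultdict(set)  # term -> set of (doc_id, path_id)

    def visit(elem, prefix, doc_id):
        path = prefix + "/" + elem.tag
        pid = path_ids.setdefault(path, len(path_ids))
        # Text that appears directly inside this element, before any child.
        for term in (elem.text or "").lower().split():
            postings[term].add((doc_id, pid))
        for child in elem:
            visit(child, path, doc_id)
            # Text after a child's closing tag still belongs to this element.
            for term in (child.tail or "").lower().split():
                postings[term].add((doc_id, pid))

    for doc_id, xml_text in docs.items():
        visit(ET.fromstring(xml_text), "", doc_id)
    return postings, path_ids


def search(postings, path_ids, term, path):
    """Return sorted ids of documents containing `term` under exactly `path`."""
    pid = path_ids.get(path)
    if pid is None:
        return []
    return sorted(d for d, p in postings.get(term.lower(), ()) if p == pid)
```

A plain inverted file would store only the document id in each posting, so it could not distinguish a term in a title from the same term in body text. Storing a path id alongside each posting is one simple way to support that distinction, at the price of larger postings.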

Findings

Initial results on the INEX collections show that the method incurs a substantial overhead in space costs, but that this increase does not adversely affect run time. Indexing results on Reuters RCV1 sub‐collections of differing sizes show that space costs increase significantly as collection size grows, whereas run time increases linearly. Path search results show sub‐millisecond run times, demonstrating minimal overhead for XML search.

Practical implications

Overall, the results show that the method implemented to support XML search in a traditional IR system such as Okapi is viable.

Originality/value

The paper provides useful information on a method for XML indexing based on the IR system Okapi.

Citation

Lu, W., MacFarlane, A. and Venuti, F. (2009), "Okapi‐based XML indexing", Aslib Proceedings, Vol. 61 No. 5, pp. 483-499. https://doi.org/10.1108/00012530910989634

Publisher

Emerald Group Publishing Limited

Copyright © 2009, Emerald Group Publishing Limited