Information Retrieval: Implementing and Evaluating Search Engines

Alastair Smith (Victoria University of Wellington, New Zealand)

The Electronic Library

ISSN: 0264-0473

Article publication date: 15 November 2011

Citation

Smith, A. (2011), "Information Retrieval: Implementing and Evaluating Search Engines", The Electronic Library, Vol. 29 No. 6, pp. 853-854. https://doi.org/10.1108/02640471111188088

Publisher: Emerald Group Publishing Limited

Copyright © 2011, Emerald Group Publishing Limited


This text is aimed at graduate students in computer science and engineering, and situates information retrieval theory firmly in the world of modern search engines.

The book starts with an overview of the basic techniques of information retrieval: inverted indexes, stemming, relevance ranking, etc. There is also an overview of evaluation: precision, recall and details of the influential TREC test collections. This initial section could be useful for readers who want a summary of the topic without working through the subsequent sections, which examine in detail the algorithms behind indexing, retrieval and ranking, categorisation and filtering, and evaluation.

After dealing with generic information retrieval, the book turns to searching on the World Wide Web, where the inherent structure of links and markup is a factor to consider in designing the search algorithms. This section gives insights into the complex compromises that have to be made to provide the instant gratification of a search result from that clean, simple-looking search box: for example, how the search results might have been cached or anticipated.

Most queries to search engines are short – averaging between two and three terms. Librarians of course are aware of the importance of including alternative terms for concepts, and tend to advise longer queries with more terms. So it is an interesting insight that, in general, search engines are optimised for the prevalent short query, and longer queries may in fact perform less well. It is also useful to be made aware of the role of user feedback – your search is likely being monitored, along with millions of others, in order to tweak the results for subsequent users.

As is common in computer science writing, chapters are clearly structured, with an overview, summary, and further reading, and exercises appropriate for computer science courses. Concepts are explained using examples from Shakespeare and Monty Python, as well as the TREC collections.

To get full benefit from the book, the reader has to be familiar with mathematical concepts such as matrices and vectors. The approach is very much at the system level, whereas TEL readers may be more interested in the human factors of how these systems are evaluated and used, and in the evaluation of content. The book mentions the concept of “information need” but does not draw on information seeking behaviour research – “information need” is translated into “search query” without much consideration of the stages in between.

When the Book Reviews Editor passed this book to me, he asked if it might be useful for a library studies class in information retrieval. It would not be. However, there is a need for a library-studies-oriented book about information retrieval for the Web: something between the gee-whiz of, for example, Ran Hock's Extreme Searcher's Internet Handbook, and the detailed engineering approach of computer science texts such as this.
