Effective Information Retrieval from the Internet: An Advanced User's Guide

Fletcher Cole (University of New South Wales)

Online Information Review

ISSN: 1468-4527

Article publication date: 1 August 2005

Citation

Cole, F. (2005), "Effective Information Retrieval from the Internet: An Advanced User's Guide", Online Information Review, Vol. 29 No. 4, pp. 434-435. https://doi.org/10.1108/14684520510617974

Publisher

Emerald Group Publishing Limited

Copyright © 2005, Emerald Group Publishing Limited


This book provides advice to web searchers about “how to use search engines appropriately and effectively” for information retrieval from textual sources, and about the design of training programmes to achieve this.

Chapter 1 deals with fundamentals. The authors choose as their framework a search typology which includes: “general topic searches” where the “question is not clearly defined” and “the scope of the answer is not known” in advance; “direct information searches” where the question is “well defined” and the answer is likely to be “contained within a single page”; and well‐defined “derived” searches, needing reference to multiple sources for the answer. “Resource searches” for non‐textual resources (pictures, sound files, etc.) are also briefly mentioned.

Readers are advised “to move away from general topic searches”, which are best accommodated by classified tools such as directories, “towards direct and derived information searches as quickly as possible”. The reasons given are that “directories are not exhaustive and are subjective”. Hardly mentioned are the quality assurance functions assumed by some directories and specialist subject gateways.

The ideal search query is seen as one that is “accurate”, “sufficiently narrow for the desired information to be returned in the first few hits by the search engine” and, at the same time, “exhaustive … the query should not be so specific that relevant information is neglected”. The authors negotiate the recall/precision trade‐off, although they do not use these terms, by recommending a series of narrow (high precision) searches in order to search exhaustively (i.e. to compensate for the low recall of each individual search).
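The trade‐off can be made concrete with a small worked example. The following is a minimal sketch, not taken from the book: the document identifiers, the relevant set and the three result sets are all hypothetical, chosen only to show how pooling several narrow searches can raise recall while precision stays high.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical ground truth: six relevant documents exist on the topic.
relevant = {"d1", "d2", "d3", "d4", "d5", "d6"}

# Hypothetical results of three narrow searches ("x1" is an irrelevant hit).
narrow_results = [
    {"d1", "d2"},
    {"d3", "d4", "x1"},
    {"d2", "d5"},
]

# Pool the results of all narrow searches into one set.
combined = set().union(*narrow_results)

for i, hits in enumerate(narrow_results, start=1):
    print(f"search {i}: precision={precision(hits, relevant):.2f} "
          f"recall={recall(hits, relevant):.2f}")
print(f"combined: precision={precision(combined, relevant):.2f} "
      f"recall={recall(combined, relevant):.2f}")
```

In this invented data each narrow search finds only a third of the relevant documents, whereas the pooled set finds five of the six and contains a single irrelevant hit.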

Chapter 2 advises on the construction of search statements, suggesting the value of taking into account the likely information provider, the natural language involved and the structure and organization of the web. Ways of measuring query quality are presented, including a method involving two‐term search comparisons and an extension of this for multi‐term searches.

Chapter 3 discusses the systematic reiteration of searches and introduces automated techniques for this. Several detailed examples are given of the use of (Python) scripts for repeated web page loading and scanning, exhaustive searches with synonyms, spidering, and assessing query quality. (The authors define spidering as “... the repeated and automated following of links” in order to create/maintain a search engine index.)
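To give a flavour of what an exhaustive search with synonyms might involve, here is a minimal sketch in Python. It is not one of the book's scripts: the function name `expand_queries` and the example terms are hypothetical, and the step of submitting each query to a search engine is deliberately left out.

```python
from itertools import product


def expand_queries(concepts):
    """Return one query string for every combination of synonyms.

    `concepts` is a list of lists; each inner list holds the
    interchangeable terms for one concept in the query.
    """
    return [" ".join(choice) for choice in product(*concepts)]


# Hypothetical two-concept query with synonyms for each concept.
concepts = [
    ["car", "automobile"],
    ["repair", "maintenance", "servicing"],
]

queries = expand_queries(concepts)
for query in queries:
    print(query)
```

Each generated string is a narrow query in its own right; running all of them and pooling the hits is one way to approach exhaustiveness without broadening any single query.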

Chapter 4 addresses the question of source quality with an extended discussion of “accuracy” and “bias” as web site assessment criteria, suggesting a number of linguistic and statistical techniques for making the assessment. This is useful but has its limitations; only some of the considerable number of other possible criteria are touched upon (e.g. “completeness”), so the reader needs to go elsewhere for a more comprehensive survey.

Chapter 5 outlines a modular training programme for “teaching internet searching skills to novice users”. Curriculum planning dependency diagrams usefully show the relation of topics in the various teaching modules. Appendices describe the workings of search engines and the characteristics of directories, and recommend programming guidelines. There are no illustrations, but a number of diagrams summarize points made in the text. An index and list of references are provided.
