Advances in Digital Document Processing and Retrieval

Madely du Preez (University of South Africa)

Online Information Review

ISSN: 1468-4527

Article publication date: 8 June 2015


Citation

Madely du Preez (2015), "Advances in Digital Document Processing and Retrieval", Online Information Review, Vol. 39 No. 3, pp. 436-437. https://doi.org/10.1108/OIR-04-2015-0120

Publisher: Emerald Group Publishing Limited

Copyright © 2015, Emerald Group Publishing Limited


Various IT developments have made it possible to preserve all kinds of documents digitally. This includes both digitally created documents and non-digitally created documents such as manuscripts, cheques and handwritten letters. Although metadata creation techniques support the organisation of digital documents, access remains restricted to the descriptive metadata (and its quality) provided by cataloguers, classifiers and indexers. New developments are changing this, as computer science is making considerable progress in online and offline handwriting analysis. These developments allow information retrieval systems, such as document mining systems, to interrogate non-digitally created documents. Retrieval of these documents therefore no longer relies solely on descriptive metadata, because multiple retrieval techniques are now possible.

Advances in Digital Document Processing and Retrieval is a compilation of papers, some of them invited, presented at the International Conference on Computing: Theory and Applications, held in connection with the platinum jubilee celebration of the Indian Statistical Institute (ISI). The editors point out that progress in handwriting analysis has special significance in countries like India, where multiple languages and scripts are used.

The first chapter introduces readers to the Markovian models that have been used successfully to process and recognise speech and script image signals. Chapter 9 complements this chapter, as it deals with statistical models that could also be applied to this task. Other topics include document synthesis technology using digital pens. This technology supports the maintenance of both paper and electronic copies of the same document, which in turn allows for prompt and accurate information retrieval. Chapter 3 explores how document image analysis can ensure secure and accurate voting with paper-based ballot systems.

Three chapters focus on the problem of information retrieval from document image databases. One proposed method searches document images on the basis of word shape coding. Chapter 5 deals with indexing of, and retrieval from, handwritten documents, whereas Chapter 6 addresses new aspects of the long-standing problem of reading bank cheques.

The last three chapters deal with statistical deformation models for handwritten character recognition and with a survey of the statistical modelling of document appearance. Because e-documents are increasingly being created, it is important that both the physical and the logical structure of documents, especially PDF documents, be recovered. The final chapter deals with handwriting recognition in an Indian script, an aspect that has so far received little attention in the literature.

As the editors intended, this book gives readers glimpses of the frontiers of digital document processing. It also introduces students and researchers to digital document processing and retrieval techniques. Each chapter includes a useful bibliography. The book reads easily, and the discussions are well illustrated with drawings and diagrams. An index would, however, have added considerably to the book's value.
