Library Hi Tech

ISSN: 0737-8831

Online from: 1983

Subject Area: Library and Information Studies

Heuristics for identification of bibliographic elements from verso of title pages

Document Information:
Title: Heuristics for identification of bibliographic elements from verso of title pages
Author(s): A.R.D. Prasad (Associate Professor in the Documentation Research and Training Centre, Indian Statistical Institute, Bangalore, Karnataka, India), Durga Sankar Rath (Lecturer in the Department of Library and Information Science, Ravindra Bharati University, Kolkata, India)
Citation: A.R.D. Prasad, Durga Sankar Rath (2004), "Heuristics for identification of bibliographic elements from verso of title pages", Library Hi Tech, Vol. 22 Iss. 4, pp. 397-403
Keywords: Bibliographic systems, Cataloguing, Classification schemes, Data handling, Information operations
Article type: Research paper
DOI: 10.1108/07378830410570502
Publisher: Emerald Group Publishing Limited
Abstract: This paper presents a methodology for capturing bibliographic data from the verso of the title pages of documents. A survey was undertaken to identify the syntactic and semantic features of bibliographic elements on the verso of title pages. These features include font size, line numbers and the appearance of certain strings of characters. Emphasis is given to the study of “cataloguing-in-publication” data. The results of the survey are used to develop heuristics that can support a program for automatically identifying the various bibliographic data elements. The versos of the title pages are scanned and stored as HTML pages using optical character recognition (OCR) software. The heuristics are then applied to the HTML pages. A few samples of input and the generated output are presented. Finally, the problems related to OCR and the heuristics are enumerated.
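
The abstract does not reproduce the heuristics themselves, so the following is only a minimal Python sketch of the general idea: text lines are pulled from an OCR-generated HTML page, and each line is checked for characteristic strings. The class `LineExtractor`, the function `identify_elements` and the three patterns in `PATTERNS` are hypothetical illustrations, not the authors' rules, and font-size cues are left out entirely.

```python
import re
from html.parser import HTMLParser


class LineExtractor(HTMLParser):
    """Collect the visible text of a page, one entry per non-empty text chunk."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_data(self, data):
        # Collapse runs of whitespace left over from OCR layout.
        text = " ".join(data.split())
        if text:
            self.lines.append(text)


# Hypothetical string cues; the paper's actual heuristics are not given
# in the abstract, so these patterns are assumptions for illustration.
PATTERNS = {
    "isbn": re.compile(r"\bISBN[:\s]*([0-9][0-9Xx\- ]{8,16}[0-9Xx])"),
    "copyright_year": re.compile(
        r"(?:©|\(c\)|Copyright)\D{0,20}((?:1[5-9]|20)\d{2})", re.IGNORECASE
    ),
    "cip_marker": re.compile(r"Cataloguing[- ]in[- ]Publication", re.IGNORECASE),
}


def identify_elements(html):
    """Return (line_number, label, value) for every cue found in the page."""
    parser = LineExtractor()
    parser.feed(html)
    parser.close()
    found = []
    for line_no, line in enumerate(parser.lines, start=1):
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                # Use the captured value when the pattern defines a group,
                # otherwise report the matched cue string itself.
                value = match.group(1) if pattern.groups else match.group(0)
                found.append((line_no, label, value))
    return found
```

Calling `identify_elements` on the HTML of a scanned verso returns the line number, a label and the matched value for each cue. A fuller implementation would also need the font-size and line-position features mentioned in the abstract, as well as some tolerance for OCR misreadings.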
