Online Information Review

ISSN: 1468-4527

Online from: 1977

Subject Area: Library and Information Studies

Effective techniques for automatic extraction of Web publications

Document Information:
Title: Effective techniques for automatic extraction of Web publications
Author(s): A.C.M. Fong (Institute of Information and Mathematical Sciences, Massey University, Auckland, New Zealand); S.C. Hui (Associate Professor, School of Computer Engineering, Nanyang Technological University, Singapore); H.L. Vu (Research Student, School of Computer Engineering, Nanyang Technological University, Singapore)
Citation: A.C.M. Fong, S.C. Hui, H.L. Vu (2002), "Effective techniques for automatic extraction of Web publications", Online Information Review, Vol. 26 Iss: 1, pp. 4-18
Keywords: Content analysis, Electronic publishing, Internet, Research
Article type: General review
DOI: 10.1108/14684520210418347
Publisher: MCB UP Ltd
Abstract: Research organisations and individual researchers increasingly share their research findings by providing lists of their published works on the World Wide Web. To facilitate the exchange of ideas, the lists often include links to published papers in portable document format (PDF) or PostScript (PS) format. Generally, these publication Web sites are updated regularly to include new works. Manual monitoring of relevant Web sites is tedious, and commercial search engines and information monitoring systems are ineffective in finding and tracking scholarly publications. This paper analyses the characteristics of publication index pages and describes effective automatic extraction techniques that the authors have developed. The techniques combine lexical and syntactic analyses with heuristics. They have been implemented and tested on more than 14,000 Web pages, achieving consistently high success rates of around 90 percent.
