A strategy for extracting information from semi‐structured web pages

Mahmoud Shaker (Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Malaysia)
Hamidah Ibrahim (Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Malaysia)
Aida Mustapha (Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Malaysia)
Lili Nurliyana Abdullah (Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Malaysia)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 23 November 2010

Abstract

Purpose

The aim of this paper is to propose a strategy for extracting information from web tables.

Design/methodology/approach

The paper presents a strategy for extracting information from web tables in semi‐structured web pages (WPs). The strategy handles the synonym problem, which arises because these WPs are designed and created without reference to any standards or guidelines.
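
As a rough illustration of the synonym problem described above, the sketch below reads label/value rows from an HTML table and maps each label to a canonical attribute name. It is not the authors' method: the `SYNONYMS` map, the `TableRowParser` class, and the `extract_attributes` function are hypothetical names introduced here, and the sketch assumes a simple two-column table with no sub‐attributes.

```python
from html.parser import HTMLParser

# Hypothetical synonym map (an assumption for illustration):
# variant attribute label -> canonical attribute name.
SYNONYMS = {
    "weight": "weight",
    "mass": "weight",
    "dimensions": "size",
    "size": "size",
}


class TableRowParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_attributes(html):
    """Return {canonical attribute: value} for two-column label/value rows."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    record = {}
    for row in parser.rows:
        if len(row) < 2:
            continue  # skip rows that are not label/value pairs
        label = row[0].lower().rstrip(":").strip()
        # Unknown labels are kept as-is rather than discarded.
        canonical = SYNONYMS.get(label, label)
        record[canonical] = row[1]
    return record
```

With this sketch, two pages that label the same attribute as "Mass" and "Weight" would both yield a `weight` entry, which is the kind of reconciliation a synonym-handling step has to perform.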

Findings

The paper finds that the strategy extracts information with high precision. In addition to the attributes, it extracts the sub‐attributes that describe them and the values of those sub‐attributes.

Practical implications

An experiment conducted on the Nokia products domain demonstrated that the proposed strategy extracts information from web tables with a precision of 98.98 percent.
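
The abstract does not state how precision was computed. Assuming the standard information-extraction definition, it would be the fraction of extracted items that are correct:

```latex
\[
\text{Precision} = \frac{\text{number of correctly extracted items}}{\text{total number of extracted items}}
\]
```

Under that assumption, a precision of 98.98 percent means that roughly 1 in every 100 extracted items was incorrect.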

Originality/value

This paper contributes to the research on extracting information.

Citation

Shaker, M., Ibrahim, H., Mustapha, A. and Nurliyana Abdullah, L. (2010), "A strategy for extracting information from semi‐structured web pages", International Journal of Web Information Systems, Vol. 6 No. 4, pp. 304-318. https://doi.org/10.1108/17440081011090239

Publisher

Emerald Group Publishing Limited

Copyright © 2010, Emerald Group Publishing Limited
