Intelligent Databases: Technologies and Applications

Johnson D. Paul (Deputy Director of Publishing and Research Services, National Library Board of Singapore, Singapore)

Program: electronic library and information systems

ISSN: 0033-0337

Article publication date: 15 February 2008


Citation

Paul, J.D. (2008), "Intelligent Databases: Technologies and Applications", Program: electronic library and information systems, Vol. 42 No. 1, pp. 89-91. https://doi.org/10.1108/00330330810851690

Publisher: Emerald Group Publishing Limited

Copyright © 2008, Emerald Group Publishing Limited


Databases have been the focus of contemporary ICT research. Intelligent Databases: Technologies and Applications compiles current research and practical applications. The authors are proponents of change and have painstakingly documented processes and algorithms that translate their new ideas into applications. The edited title is a difficult read but a must for technologists and database administrators.

The authors' contributions fall into several distinct areas, illustrated through experiments. The first deals with association rule mining, which has traditionally studied only positive correlations. Kouris, Makris and Tsakalidis argue the need to examine “non‐selections”, or negative correlations, and demonstrate their importance with experimental evidence. With a step‐by‐step guide to technical deployment, the authors painstakingly set out the formulas and code for effective implementation. They add an experimental evaluation while conceding its limitations: the questionable authenticity of the data and the absence of cross‐examination against alternative relevance feedback methods on the market. Their unique contribution is in suggesting a spatiotemporal prism through which transactional data can be studied in relation to the needs of users and of different applications.
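The distinction between positive correlations and “non‐selections” can be illustrated with a minimal sketch (the transactions and item names below are invented for illustration, not the authors' data or formulas): a positive rule counts baskets containing both items, while a negative one counts baskets containing one item but explicitly not the other.

```python
# Toy transaction database (invented for illustration).
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, exclude=frozenset()):
    """Fraction of transactions containing every item in `itemset`
    and none of the items in `exclude` (the "non-selections")."""
    hits = sum(
        1
        for t in transactions
        if set(itemset) <= t and not (set(exclude) & t)
    )
    return hits / len(transactions)

# Positive correlation: bread and butter bought together.
positive = support({"bread", "butter"})            # 2 of 5 baskets
# Negative correlation: bread bought, butter explicitly not.
negative = support({"bread"}, exclude={"butter"})  # 2 of 5 baskets
```

Mining both directions over the same counting machinery is what makes the negative rules cheap to add once the positive infrastructure exists.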

The second significant contribution is an experiment on a new index structure for outlier detection. Outlier detection is significant for fraud detection, network robustness analysis and intrusion detection. Most such applications are high‐dimensional domains in which the data can contain hundreds of dimensions. Many recent algorithms use concepts of proximity and depth in cell‐based disk algorithms to find outliers based on their relationship to the rest of the data. In high‐dimensional space, however, the data is sparse and the notion of proximity loses its meaningfulness. Faxin Zhao, Yubin Bao, Huanliang Sun and Ge Yu propose a CD‐Tree structure in which a clustering technique is deployed. Their experimental results outperform cell‐based approaches.
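The loss of meaningfulness of proximity can be demonstrated with a short, self‐contained experiment (sample sizes and dimensions chosen arbitrarily, not taken from the chapter): the relative contrast between a point's nearest and farthest neighbours collapses as dimensionality grows.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def relative_contrast(dim, n_points=200):
    """(farthest - nearest) / nearest distance from one random query
    point to n_points uniform random points in the unit hypercube."""
    query = [random.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [random.random() for _ in range(dim)]
        dists.append(sum((q - p) ** 2 for q, p in zip(query, point)) ** 0.5)
    return (max(dists) - min(dists)) / min(dists)

# In low dimensions neighbours are sharply differentiated; in high
# dimensions nearly all points are almost equidistant, so proximity
# no longer separates outliers from inliers.
low_dim = relative_contrast(2)
high_dim = relative_contrast(500)
```

This concentration effect is precisely why cell‐ and distance‐based methods degrade and why a clustering‐backed index such as the CD‐Tree is attractive in high dimensions.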

The third experiment is the development of an efficient mining algorithm that can maintain discovered information in an ever‐growing database that must balance data insertion with data deletion. Tzung‐Pei Hong and Ching‐Yao Wang explore the concept of pre‐large itemsets and an algorithm that maintains discovered association rules as records are deleted. The results demonstrate that the algorithm significantly reduces the number of rescans of the original database until the cumulative number of deleted records exceeds a safety bound.

The fourth and most impressive experiment is the development of inductive databases and the proposal of XML for data mining (XDM) to facilitate knowledge discovery processes. Meo and Psaila go into great detail on how their XDM model effectively integrates source data with mined patterns for representation, retrieval and manipulation. The experimentation surfaced problems of wasted space in large data sets and the need to evolve XDM into a distributed, grid‐like system, enabling distributed data sources and computational resources to connect to each other and build a single XDM database.
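The core XDM idea, holding source data and mined patterns in one XML representation so that both can be retrieved and manipulated uniformly, might be sketched as follows (the element names here are invented for illustration and are not the actual XDM schema):

```python
import xml.etree.ElementTree as ET

# One document holds both the raw transactions and a mined pattern, so
# a query engine can retrieve and manipulate either uniformly.
root = ET.Element("XDM")

data = ET.SubElement(root, "data", source="sales")
for basket in (["bread", "butter"], ["bread", "milk"]):
    t = ET.SubElement(data, "transaction")
    for item in basket:
        ET.SubElement(t, "item").text = item

# A pattern mined from the data above, stored alongside it and tagged
# with the source it was derived from.
patterns = ET.SubElement(root, "patterns", derivedFrom="sales")
rule = ET.SubElement(patterns, "rule", support="0.5", confidence="0.5")
ET.SubElement(rule, "antecedent").text = "bread"
ET.SubElement(rule, "consequent").text = "butter"

document = ET.tostring(root, encoding="unicode")
```

Because patterns carry provenance (`derivedFrom`), a knowledge discovery process can chain mining steps, treating earlier outputs as inputs, which is the inductive‐database premise.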

The fifth experiment explores the possibility of integrating data warehousing and geographical information system technologies to create a spatial data warehouse, or SDW. Sampaio, Baptista, Gomes de Sousa and Ferreira do Nascimento describe how spatial dimensions and measures can help decision support systems through their common warehouse model (CWM). Admitting the infancy of their research effort, they have nevertheless specified a prototype, with experimentation on interoperable standards. Coined the MapWarehouse, the effort was intended to enhance decision support systems with spatial capabilities, with the hope of extending it to web services.

From a case study perspective, S.A. Oke attempts to apply the decision tree data‐mining tool to analyse manufacturing data. Whilst the decision tree is a promising new technology, it lacks predictive capabilities and hence needs the integration of statistical and artificial intelligence techniques to help management with early detection. Gian Piero Zarri explores a “narrative knowledge representation language” to create an inferencing environment for intelligent exploitation of narrative knowledge, where traditional ontologies are complemented with an ontology of events. The NKRL software is a dynamic tool that attempts to extract economically relevant information embedded in a large corpus of multimedia corporate documents. Z.M. Ma reviews fuzzy database modelling techniques, examines issues associated with conceptual data modelling, and outlines three different approaches with recommendations on how a fuzzy database can be developed.
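A decision tree of the kind Oke applies chooses its splits by information gain, the reduction in label entropy achieved by testing one attribute. A minimal sketch with invented manufacturing records (not Oke's data):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )

def information_gain(rows, labels, attribute):
    """Entropy reduction achieved by splitting the rows on one attribute."""
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [l for row, l in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

# Invented manufacturing records: does machine load predict defects?
rows = [{"load": "high"}, {"load": "high"}, {"load": "low"}, {"load": "low"}]
labels = ["defect", "defect", "ok", "ok"]
gain = information_gain(rows, labels, "load")  # 1.0 bit: a perfect split
```

The tree grows by repeatedly taking the highest‐gain attribute; the lack of predictive power Oke notes is that such splits describe past records rather than forecast future ones, hence the call to pair the tree with statistical techniques.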

The second section is an emporium of ideas on how to create intelligent systems. J. Gerard Wolff introduces a unified model that integrates database management systems with artificial intelligence systems. Known as the SP theory of computing and cognition, it focuses on information processing and the removal of redundancy. The technology adds compressed information to a repository of old information and, with the help of pattern recognition, creates possibilities for new search and retrieval, deduction and unsupervised learning. The SP system claims to support natural language search, best match and semantic information retrieval. Data integrity is another dimension important to intelligent systems. Davide Martinenghi, Henning Christiansen and Hendrik Decker propose a system that automatically checks and maintains the semantic correctness of data. Integrity checking and maintenance have matured, but their application to XML databases and distributed databases is still relatively new. The authors specify languages and paradigms for integrity checking and integrity constraints in an XML environment. Hassina Bounif completes the investigation with a predictive approach to database evolution. As databases undergo changes during their lifecycles, those changes affect the applications, the schema and the corresponding data. The chapter is dedicated to schema evolution, combining the disciplines of databases, artificial intelligence, ontologies, networking and statistics, and re‐examines user and design requirements and how they are integrated into the schema.
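Integrity checking of the kind Martinenghi, Christiansen and Decker describe can be illustrated with a minimal sketch (the constraint and documents here are hypothetical, not the authors' languages or paradigms): verifying that a key attribute remains unique in an XML database after an update.

```python
import xml.etree.ElementTree as ET

def violates_key_constraint(xml_text, element, key_attr):
    """Return True if two `element` nodes share the same `key_attr`
    value (a simple semantic integrity constraint on an XML database)."""
    seen = set()
    for node in ET.fromstring(xml_text).iter(element):
        key = node.get(key_attr)
        if key in seen:
            return True
        seen.add(key)
    return False

# Hypothetical documents: the second one breaks the key constraint.
valid = violates_key_constraint(
    "<db><user id='1'/><user id='2'/></db>", "user", "id"
)
broken = violates_key_constraint(
    "<db><user id='1'/><user id='1'/></db>", "user", "id"
)
```

A maintenance system would run such checks incrementally on each update rather than revalidating the whole database, which is what makes the XML and distributed settings harder than the relational one.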

Each chapter has an introduction, problem statement, related work and literature review, theoretical basis, algorithm, experiment review and conclusion. The authors document their computational algorithms in great detail and have embraced an open source approach to sharing knowledge in this new domain. Contributors come from different regions, ranging from China and Africa to Brazil and Europe, and this intercontinental contribution adds to the diversity of knowledge presented in the book. Intelligent Databases is a one‐stop guide to issues pertaining to databases and their construction and maintenance. It is a courageous attempt to redefine and address problems associated with existing databases and to present alternatives. Contributions from North American counterparts would have added greater value for readers, as the largest online databases are with Yahoo!, Google and MSN.

The title maintains a good measure of consistency, but because of the varying points of origin there is substantial overlap and repetition of ideas or connected ideas; the book deserves a more integrated approach. As a compiled edition, it also lacks a conclusion. Nevertheless, the title is a necessary addition to infocommunication technology libraries.
