Data Mining and Knowledge Discovery Technologies

Fernando Bação (Universidade Nova de Lisboa)

Online Information Review

ISSN: 1468-4527

Article publication date: 21 November 2008

Citation

Bação, F. (2008), "Data Mining and Knowledge Discovery Technologies", Online Information Review, Vol. 32 No. 6, pp. 866-867. https://doi.org/10.1108/14684520810923980

Publisher: Emerald Group Publishing Limited

Copyright © 2008, Emerald Group Publishing Limited


Data mining and knowledge discovery have a growing audience in many different research and business fields. In recent years the “art and science of finding interesting and useful patterns in data” has moved from the statistics and computer science research laboratories into the business world and various research areas. The ever‐increasing size and complexity of the online information world warrant new and improved tools that are able to automatically find, organise and present relevant information.

This edited collection contains 14 articles divided into four sections. Section 1, Association Rules, contains four chapters that cover association rules from different perspectives and contexts. The first chapter proposes enriching Online Analytical Processing (OLAP) through the integration of association rules with data cubes. Chapter 2 presents an overview of interestingness measures for association rules and explores relations among the different measures. Chapter 3 deals with the application of association rules in the specific context of XML data, proposing a framework and implementing the Apriori and FP‐growth algorithms. Chapter 4 concludes this section and presents two algorithms for extracting association rules from web server access logs in order to discover user navigational patterns.

In Section 2, Clustering and Classification, four chapters explore clustering, classification of sequence data and privacy issues. Chapter 5 proposes the use of genetic algorithms for clustering, which has already been done several times in the last decade. In particular, the authors use genetic algorithms to optimise fuzzy c‐means and entropy‐based fuzzy clustering and visualise the results using a self‐organising map. Chapter 6 adds radial basis functions to the standard k‐means algorithm, which introduces the requirement to estimate an additional parameter, the kernel width; the explanation of how to do so constitutes the remainder of the chapter. Chapter 7 provides an interesting overview of the classification of sequence data, presents a similarity metric for sequential data and applies it using a k‐nearest neighbour classifier. Chapter 8 addresses the privacy‐preserving problem; more specifically, it uses homomorphic encryption, digital envelope and k‐nearest neighbour techniques to enable collaborative data mining among multiple parties without breaching data privacy.

Section 3, Domain Driven and Model Free, contains two chapters. The first, Chapter 9, delves into methodological issues related to the combination of domain intelligence and traditional data‐driven philosophy, so as to improve the actionability of the data mining results. This is at the cutting edge in business applications, where it is frequently difficult to translate mining results into profitable actions. Chapter 10 deals with another crucial problem in data mining: input selection and dimensionality reduction. The authors briefly review previous work on input selection and propose a new method, independent of the model chosen for data mining, based on consistency analysis.

The last section, Issues and Applications, consists of four chapters on quite different subjects. Chapter 11 addresses different problems that undermine the effectiveness of data mining in a business context. The authors subdivide the text into four major categories (statistical, data, technical and organisational issues) and from there explore the pitfalls of data mining. Chapter 12 is dedicated to the specific topic of protein‐protein interaction (PPI), part of the wide and increasingly important field of bio‐informatics. The authors provide an overview of computational methods for PPI and present their own research using inductive logic programming. Chapter 13 focuses on the potential for data mining in finding solutions to social problems and presents an iterative attribute‐elimination process to identify useful attributes. The last chapter, Chapter 14, deals with the role that data mining methods might have in improving the learning experience of educationally motivated web users. The authors use support vector machines, AdaBoost, Naïve Bayes and neural networks to address the one‐stop learning problem, which consists of filtering and returning appropriate web pages for a technical subject supplied by the user.

The editor's role is far from easy. It is challenging to persuade authors to make cuts and rearrangements to their texts in order to improve the readability and the harmony of the book, so that the reading is pleasant and fairly fluid. Edited books always struggle with these problems but they should be considered as the price to pay for having world‐renowned experts writing about their fields and research ideas. Nevertheless, even edited books should have some identity, some type of underlying theme that provides unity.

One of the problems of this book is that the underlying theme (data mining and knowledge discovery) has become too broad to be useful. Unfortunately, the graphic quality of the book is not always great, which sometimes makes it difficult to follow the arguments that rely on figures and also adds to the feeling of unevenness of the book. There are some typos, but more annoying and detrimental are the notation inconsistencies, sometimes within the same chapter.

The book is uneven in terms of quality, scientific relevance and pedagogic content. Some articles are of very good quality, while others are less so; some articles provide useful state‐of‐the‐art reviews, others do not; some articles present cutting‐edge research, others do not. For instance, if you are interested in association rules and want to improve your knowledge, you should probably consider buying the book. But if you are interested in clustering, then this book will be of little use.
