Data Mining: A Heuristic Approach

T.T. Goh (Victoria University of Wellington, tiong.goh@vuw.ac.nz)

Online Information Review

ISSN: 1468-4527

Article publication date: 1 October 2003

Citation

Goh, T.T. (2003), "Data Mining: A Heuristic Approach", Online Information Review, Vol. 27 No. 5, pp. 364-365. https://doi.org/10.1108/14684520310502315

Publisher

Emerald Group Publishing Limited

Copyright © 2003, MCB UP Limited


Data mining is a relatively new aspect of information management that offers great promise for making sense of large data sets. This book has 13 chapters and is divided into five parts. Each chapter is devoted to the technical details of data mining using heuristic approaches. The book covers the modern heuristic approaches required to solve complex real‐life problems, including simulated annealing, tabu search, genetic algorithms, immune systems, and ant colony optimisation. Other approaches, such as evolutionary algorithms and parallel data mining, are presented in an exhilarating manner.

Chapter 1, which briefly covers the fundamentals and provides a technical outline of modern heuristic techniques, is tailored for beginners. However, important approaches such as evolutionary strategies and genetic programming are not addressed. Readers will not find discussions of actual technological applications in this chapter.

Chapter 2 demonstrates the use of approximation of objective criteria and proximity information in the development of scalable and robust methods in data clustering. Readers will find an overview of clustering methods in this chapter.

Chapter 3 begins with the topic of evolutionary algorithms, focusing on feature extraction, feature selection and classification, and surveys some state‐of‐the‐art applications of evolutionary algorithms. Some of the interesting applications include synthetic aperture radar imaging, image processing, and neural networks.

The interesting concept of nugget discovery, together with the concepts of interest and fitness, is introduced in Chapter 4. Specific heuristic algorithms, namely genetic algorithms, simulated annealing and tabu search, are introduced in the data‐mining section. A comparison study using four databases from the UCI repository demonstrates these concepts.

Data mining often runs into the problem of high dimensionality. Chapter 5 discusses this problem and offers feature subset selection as a solution, using four probabilistic models (PBIL, BSC, MIMIC and TREE) that enable factorisation to be performed.

Continuing with the problem of large sample size, two new approaches are proposed in Chapter 6. Evolutionary algorithms and Bayesian learning have been very successful in data mining applications. This chapter discusses a new approach that fuses a simple local Bayesian engine with a learning classifier system rule‐based architecture.

A more practical and applied focus is adopted in Chapter 7. This short chapter presents the real‐world problem of detecting and classifying unexploded ordnance.

Chapter 8 begins Part 3 on genetic programming. What happens when we use a GP search? What happens when the system becomes stagnant? This chapter offers guidelines to overcome the problem of exponential code growth leading to stagnation. The methods suggested are physical means, parsimony pressure, alternative selection and alternative crossover.

A new building‐block approach to rule extraction using genetic programming (BGP) is presented in Chapter 9. A comparative study against the traditional machine‐learning algorithms C4.5 and CN2, using data from the UCI repository, has been performed, and the experimental results show that BGP is more accurate.

Chapter 10 begins Part 4 with the topics “ant colony optimisation” and “immune systems”. It presents Ant‐Miner, an algorithm for rule discovery drawn from recent research.

Researchers in the field of immunological computation have recently adopted the novel approach of using immune system characteristics, notably the highly distributed, adaptive, self‐organising memory registry and continuous learning ability, to solve computational problems in data mining. Chapter 11 presents this interesting topic well. However, the issue of scalability has yet to be substantially addressed.

Chapter 12 is devoted to a novel artificial immune network model with the aim of clustering and filtering high dimensional data samples.

The book concludes with a final chapter on parallel data mining, which adapts concepts from parallel computing and databases to enhance the efficiency of data mining computation. This is a key new text on the topic. The mathematics required will defeat many, but for specialists in the field it is a recommended purchase.
