Premium
Improved methods for extracting frequent itemsets from interim‐support trees
Author(s) -
Coenen F.,
Leng P.,
Pagourtzis A.,
Rytter W.,
Souliou D.
Publication year - 2008
Publication title -
software: practice and experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.437
H-Index - 70
eISSN - 1097-024X
pISSN - 0038-0644
DOI - 10.1002/spe.902
Subject(s) - computer science , trie , tree (set theory) , data mining , pruning , node (physics) , speedup , tree structure , search tree , theoretical computer science , task (project management) , interim , data structure , algorithm , mathematics , search algorithm , binary tree , parallel computing , mathematical analysis , structural engineering , agronomy , biology , programming language , engineering , history , management , archaeology , economics
Mining association rules in relational databases is a significant computational task with lots of applications. A fundamental ingredient of this task is the discovery of sets of attributes ( itemsets ) whose frequency in the data exceeds some threshold value. In this paper we describe two algorithms for completing the calculation of frequent sets using a tree structure for storing partial supports, called interim‐support (IS) tree. The first of our algorithms ( T ‐Tree‐First (TTF)) uses a novel tree pruning technique, based on the notion of ( fixed‐prefix ) potential inclusion , which is specially designed for trees that are implemented using only two pointers per node. This allows to implement the IS tree in a space‐efficient manner. The second algorithm ( P ‐Tree‐First (PTF)) explores the idea of storing the frequent itemsets in a second tree structure, called the total support tree ( T ‐tree); the main innovation lies in the use of multiple pointers per node, which provides rapid access to the nodes of the T ‐tree and makes it possible to design a new, usually faster, method for updating them. Experimental comparison shows that these techniques result in considerable speedup for both algorithms compared with earlier approaches that also use IS trees ( Principles of Data Mining and Knowledge Discovery, Proceedings of the 5th European Conference, PKDD, 2001 , Freiburg, September 2001 ( Lecture Notes in Artificial Intelligence , vol. 2168). Springer: Berlin, Heidelberg, 54–66; Journal of Knowledge‐Based Syst . 2000; 13 :141–149). Further comparison between the two new algorithms, shows that the PTF is generally faster on instances with a large number of frequent itemsets, provided that they are relatively short, whereas TTF is more appropriate whenever there exist few or quite long frequent itemsets; in addition, TTF behaves well on instances in which the densities of the items of the database have a high variance. Copyright © 2008 John Wiley & Sons, Ltd.