Improved methods for extracting frequent itemsets from interim‐support trees | Zendy

Coenen F. | Zendy; Leng P. | Zendy; Pagourtzis A. | Zendy; Rytter W. | Zendy; Souliou D. | Zendy

AI Assistant Blog Pricing

Home ZAIA Blog

Premium

Improved methods for extracting frequent itemsets from interim‐support trees

Author(s) -

Coenen F.,

Leng P.,

Pagourtzis A.,

Rytter W.,

Souliou D.

Publication year - 2008

Publication title -

software: practice and experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.437

H-Index - 70

eISSN - 1097-024X

pISSN - 0038-0644

DOI - 10.1002/spe.902

Subject(s) - computer science , trie , tree (set theory) , data mining , pruning , node (physics) , speedup , tree structure , search tree , theoretical computer science , task (project management) , interim , data structure , algorithm , mathematics , search algorithm , binary tree , parallel computing , mathematical analysis , structural engineering , agronomy , biology , programming language , engineering , history , management , archaeology , economics

Mining association rules in relational databases is a significant computational task with lots of applications. A fundamental ingredient of this task is the discovery of sets of attributes ( itemsets ) whose frequency in the data exceeds some threshold value. In this paper we describe two algorithms for completing the calculation of frequent sets using a tree structure for storing partial supports, called interim‐support (IS) tree. The first of our algorithms ( T ‐Tree‐First (TTF)) uses a novel tree pruning technique, based on the notion of ( fixed‐prefix ) potential inclusion , which is specially designed for trees that are implemented using only two pointers per node. This allows to implement the IS tree in a space‐efficient manner. The second algorithm ( P ‐Tree‐First (PTF)) explores the idea of storing the frequent itemsets in a second tree structure, called the total support tree ( T ‐tree); the main innovation lies in the use of multiple pointers per node, which provides rapid access to the nodes of the T ‐tree and makes it possible to design a new, usually faster, method for updating them. Experimental comparison shows that these techniques result in considerable speedup for both algorithms compared with earlier approaches that also use IS trees ( Principles of Data Mining and Knowledge Discovery, Proceedings of the 5th European Conference, PKDD, 2001 , Freiburg, September 2001 ( Lecture Notes in Artificial Intelligence , vol. 2168). Springer: Berlin, Heidelberg, 54–66; Journal of Knowledge‐Based Syst . 2000; 13 :141–149). Further comparison between the two new algorithms, shows that the PTF is generally faster on instances with a large number of frequent itemsets, provided that they are relatively short, whereas TTF is more appropriate whenever there exist few or quite long frequent itemsets; in addition, TTF behaves well on instances in which the densities of the items of the database have a high variance. Copyright © 2008 John Wiley & Sons, Ltd.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here

Empowering knowledge with every search

About

About Careers Publisher Partners Contact Us

Learn

FAQs Blog Terms of Use Privacy Policy

About

Learn

Discover

Explore