Direct local pattern sampling by efficient two-step random procedures
Author(s) -
Mario Boley,
Claudio Lucchese,
Daniel Paurat,
Thomas Gärtner
Publication year - 2011
Publication title -
fraunhofer-publica (fraunhofer-gesellschaft)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1145/2020408.2020500
Subject(s) - computer science , scalability , estimator , sampling (signal processing) , markov chain monte carlo , controllability , algorithm , set (abstract data type) , data mining , markov process , pattern recognition (psychology) , artificial intelligence , mathematics , statistics , filter (signal processing) , bayesian probability , database , computer vision , programming language
We present several exact and highly scalable local pattern sampling algorithms. They can be used as an alternative to exhaustive local pattern discovery methods (e.g, frequent set mining or optimistic-estimator-based subgroup discovery) and can substantially improve eficiency as well as controllability of pattern discovery processes. While previous sampling approaches mainly rely on the Markov chain Monte Carlo method, our procedures are direct, i.e., non processsimulating, sampling algorithms. The advantages of these direct methods are an almost optimal time complexity per pattern as well as an exactly controlled distribution of the produced patterns. Namely, the proposed algorithms can sample (item-)sets according to frequency, area, squared frequency, and a class discriminativity measure. Experiments demonstrate that these procedures can improve the accuracy of pattern-based models similar to frequent sets and often also lead to substantial gains in terms of scalability
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom