Discovering arbitrary event types in time series
Author(s) - Preston Dan, Protopapas Pavlos, Brodley Carla
Publication year - 2009
Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.10060
Subject(s) - pruning , event (particle physics) , computer science , sliding window protocol , series (stratigraphy) , generalization , gravitational microlensing , window (computing) , data mining , time series , algorithm , machine learning , mathematics , quantum mechanics , agronomy , computer vision , paleontology , mathematical analysis , stars , physics , biology , operating system
The discovery of events in time series can have important implications, such as identifying microlensing events in astronomical surveys, or changes in a patient's electrocardiogram. Current methods for identifying events require a sliding window of a fixed size, which is not ideal for all applications and could overlook important events. In this work, we develop probability models for calculating the significance of an arbitrary-sized sliding window and use these probabilities to find areas of significance. Because a brute force search of all sliding windows and all window sizes would be computationally intractable, we introduce a method for quickly approximating the results. We apply our method to over 100 000 astronomical time series from the MACHO survey, in which 56 different sections of the sky are considered, each with one or more known events. Our method was able to recover 100% of these events in the top 1% of the results, essentially pruning 99% of the data. Interestingly, our method was able to identify events that do not pass traditional event discovery procedures. In this extended work, we present a generalization of our algorithm to discover different event types characterized by distinct patterns.

Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 396-411, 2009
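
To make the brute-force baseline mentioned in the abstract concrete, here is a minimal Python sketch that scores every window of every width in a single series. It is not the authors' probability model or their fast approximation. The score is an assumed stand-in (the absolute sum of standardized values divided by the square root of the window width), and the names `window_scores` and `most_significant_window` are hypothetical.

```python
import math
from itertools import accumulate
from statistics import mean, pstdev


def window_scores(series, min_width=2, max_width=None):
    """Yield (score, start, width) for every window of every allowed width.

    Assumed stand-in score: |sum of z-scores in the window| / sqrt(width).
    This is an illustration only, not the significance model of the paper.
    """
    n = len(series)
    max_width = n if max_width is None else min(max_width, n)
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return  # a constant series has no window that stands out
    z = [(x - mu) / sigma for x in series]
    prefix = [0.0] + list(accumulate(z))  # prefix sums give O(1) window sums
    for width in range(min_width, max_width + 1):
        for start in range(n - width + 1):
            total = prefix[start + width] - prefix[start]
            yield abs(total) / math.sqrt(width), start, width


def most_significant_window(series, **kwargs):
    """Return the highest-scoring (score, start, width), or None if no window exists."""
    return max(window_scores(series, **kwargs), default=None)


# Toy series: flat baseline with a 10-sample bump starting at index 50.
toy = [0.0] * 50 + [5.0] * 10 + [0.0] * 40
best = most_significant_window(toy)
print(best)
```

On this toy series the highest-scoring window should be the bump itself (start 50, width 10). The double loop visits on the order of n² windows per series, which illustrates why an exhaustive search becomes impractical across a survey of more than 100 000 series.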