Interactive Rare-Category-of-Interest Mining from Large Datasets
Author(s) - Zhenguang Liu, Sihao Hu, Yifang Yin, Jianhai Chen, Kevin Chiew, Luming Zhang, Zetian Wu
Publication year - 2020
Publication title - Proceedings of the AAAI Conference on Artificial Intelligence
Language(s) - English
Resource type - Journals
eISSN - 2374-3468
pISSN - 2159-5399
DOI - 10.1609/aaai.v34i04.5935
Subject(s) - computer science , benchmark (surveying) , rare events , data mining , process (computing) , task (project management) , information retrieval , construct (python library) , similarity (geometry) , encode , artificial intelligence , mathematics , biochemistry , statistics , chemistry , management , geodesy , economics , image (mathematics) , gene , programming language , geography , operating system
In the era of big data, rare category data examples are often of key importance despite their scarcity; for example, rare bird audio is usually more valuable than common bird audio. However, existing work on rare category mining considers only the statistical characteristics of rare category data examples and ignores their ‘true’ interestingness to the user. Moreover, current approaches cannot support real-time user interaction because answering even a single user query is computationally prohibitive.

In this paper, we contribute a new model named IRim, which interactively mines rare category data examples of interest over large datasets. The mining process is carried out in two steps: rare category detection (RCD) followed by rare category exploration (RCE). In RCD, by introducing an offline phase and high-level knowledge abstractions, IRim reduces the time complexity of answering a user query from quadratic to logarithmic. In RCE, by proposing a collaborative-reconstruction based approach, we are able to explicitly encode both user preference and rare category characteristics. Extensive experiments on five diverse real-world datasets show that our method responds to user interactions within seconds and significantly outperforms state-of-the-art competitors in both accuracy and number of queries. As a side contribution, we construct and release two benchmark datasets which, to our knowledge, are the first public datasets tailored to the rare category mining task.
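
The abstract does not describe how the offline phase of RCD is built, so the following is only a generic illustration of the offline/online split it mentions, not IRim's algorithm. It assumes a hypothetical per-point score (the distance to the k-th nearest neighbour) that is computed once with quadratic brute force and then sorted, so that each later query is a single binary search. The names `build_offline_index` and `count_within` are invented for this sketch.

```python
import bisect
import math


def build_offline_index(points, k):
    """Offline phase (quadratic brute force, run once).

    Returns the sorted list of k-th nearest-neighbour distances,
    one per point. The score itself is an assumption of this sketch.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    scores.sort()
    return scores


def count_within(sorted_scores, radius):
    """Online phase: binary search, O(log n) per query.

    Counts the points whose k-th nearest neighbour lies within `radius`.
    """
    return bisect.bisect_right(sorted_scores, radius)


# Toy data: a loose group near the origin and a tight group near (10, 10).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 10.1), (10.1, 10)]
index = build_offline_index(points, k=2)

print(count_within(index, 0.5))  # only the tight group qualifies
print(count_within(index, 2.0))  # every point qualifies
```

The expensive pairwise work happens before any user interaction, so repeated queries with different radii cost only a binary search each. Whether IRim's "high-level knowledge abstractions" resemble this kind of sorted summary is not stated in the abstract.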