MiTexCube: MicroTextCluster Cube for online analysis of text cells and its applications
Author(s) - Zhang Duo, Zhai ChengXiang, Han Jiawei
Publication year - 2013
Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.11159
Subject(s) - computer science, overhead (engineering), information retrieval, materialized view, space (punctuation), data mining, data cube, representation (politics), quality (philosophy), view, database design, philosophy, epistemology, politics, political science, law, operating system
Abstract A fundamental problem in multidimensional text database analysis is providing efficient and effective support for various kinds of online applications, such as summarizing the content of a text cell or comparing the contents of multiple text cells. In this paper, we propose a new infrastructure called MicroTextCluster Cube (or MiTexCube) to support efficient online text analysis on multidimensional text databases by introducing micro‐clusters of text documents as a compact representation of text content. Experimental results on real multidimensional text databases show that (i) MiTexCube can be materialized efficiently with reasonable overhead in space, and (ii) applications based on the materialized MiTexCube are more efficient than the baseline method of direct analysis based on document units in each cell, without sacrificing much quality of analysis. MiTexCube also naturally accommodates a flexible trade‐off between efficiency and quality of analysis. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 6: 243–259, 2013