Exploiting temporal characteristics of features for effectively discovering event episodes from news corpora
Author(s) -
Wei Chih-Ping,
Lee Yen-Hsien,
Chiang Yu-Sheng,
Chen Chun-Ta,
Yang Christopher C.C.
Publication year - 2014
Publication title -
Journal of the Association for Information Science and Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.903
H-Index - 145
eISSN - 2330-1643
pISSN - 2330-1635
DOI - 10.1002/asi.22995
Subject(s) - computer science , benchmark (surveying) , tf–idf , event (particle physics) , cluster analysis , information retrieval , data mining , hierarchical clustering , selection (genetic algorithm) , artificial intelligence , term (time) , geography , physics , geodesy , quantum mechanics
An organization performing environmental scanning generally monitors or tracks various events concerning its external environment. One of the major resources for environmental scanning is online news documents, which are readily accessible on news websites or infomediaries. However, the proliferation of the World Wide Web, which increases information sources and improves information circulation, has vastly expanded the amount of information to be scanned. Thus, it is essential to develop an effective event episode discovery mechanism to organize news documents pertaining to an event of interest. In this study, we propose two new metrics, Term Frequency × Inverse Document Frequency Tempo (TF×IDF Tempo) and TF×Enhanced-IDF Tempo, and develop a temporal-based event episode discovery (TEED) technique that uses the proposed metrics for feature selection and document representation. Using a traditional TF×IDF-based hierarchical agglomerative clustering technique as a performance benchmark, our empirical evaluation reveals that the proposed TEED technique outperforms its benchmark, as measured by cluster recall and cluster precision. In addition, the use of TF×Enhanced-IDF Tempo significantly improves the effectiveness of event episode discovery when compared with the use of TF×IDF Tempo.
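
For readers unfamiliar with the benchmark named in the abstract, the following is a minimal sketch of a generic TF×IDF-based hierarchical agglomerative clustering of news documents. It is not the authors' implementation and does not include the proposed Tempo metrics, which the abstract does not define. The function names (`tfidf_vectors`, `cosine`, `hac`), the average-link merging rule, the similarity threshold, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF x IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hac(vectors, threshold):
    """Average-link agglomerative clustering (an assumed linkage choice).

    Repeatedly merges the most similar pair of clusters and stops once the
    best pairwise similarity falls below `threshold` (a hypothetical parameter).
    Returns a list of clusters, each a list of document indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def sim(a, b):
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best, (x, y) = max(
            (sim(a, b), (x, y))
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy corpus (invented for illustration): two stories about an earthquake,
# two about a corporate merger.
docs = [
    ["earthquake", "rescue", "survivors"],
    ["earthquake", "rescue", "aid"],
    ["merger", "shares", "bid"],
    ["merger", "shares", "regulator"],
]
clusters = hac(tfidf_vectors(docs), threshold=0.2)
print(sorted(sorted(c) for c in clusters))
```

In this sketch each resulting cluster plays the role of one candidate event episode. A temporal approach such as the one the abstract describes would change how features are selected and weighted before clustering, which this example does not attempt.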