Exploiting associations between word clusters and document classes for cross‐domain text categorization †

Author(s) - Zhuang Fuzhen, Luo Ping, Xiong Hui, He Qing, Xiong Yuhong, Shi Zhongzhi

Publication year - 2011

Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.381

H-Index - 33

eISSN - 1932-1872

pISSN - 1932-1864

DOI - 10.1002/sam.10099

Subject(s) - computer science, domain (mathematical analysis), categorization, word (group theory), artificial intelligence, natural language processing, matrix decomposition, exploit, text categorization, non negative matrix factorization, data mining, information retrieval, pattern recognition (psychology), mathematics, mathematical analysis, eigenvalues and eigenvectors, physics, geometry, computer security, quantum mechanics

Cross‐domain text categorization aims to adapt the knowledge learned from a labeled source domain to an unlabeled target domain, where the documents in the two domains are drawn from different distributions. Although the distributions of raw‐word features differ, the associations between word clusters (conceptual features) and document classes may remain stable across domains. In this paper, we exploit these stable associations as a bridge for transferring knowledge from the source domain to the target domain via non‐negative matrix tri‐factorization. Specifically, we formulate a joint optimization framework of two matrix tri‐factorizations, one for the source‐domain data and one for the target‐domain data, in which the associations between word clusters and document classes are shared. We then give an iterative algorithm for this optimization and theoretically show its convergence. Comprehensive experiments show the effectiveness of this method. In particular, we show that the proposed method can handle some difficult scenarios where baseline methods usually do not perform well. © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 4: 100–114, 2011
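To make the idea of two tri-factorizations with a shared association matrix concrete, here is a minimal NumPy sketch. It is not the paper's algorithm, whose update rules and constraints the abstract does not spell out. It assumes a plain squared-error objective, ||X_s - F_s S G_s^T||^2 + ||X_t - F_t S G_t^T||^2, standard multiplicative updates, and source document memberships G_s fixed to the one-hot source labels. The function name `joint_nmtf` and all parameter names are hypothetical.

```python
import numpy as np

def joint_nmtf(X_s, y_s, X_t, n_word_clusters, n_classes,
               n_iter=100, eps=1e-9, seed=0):
    """Illustrative joint non-negative matrix tri-factorization.

    X_s, X_t : non-negative word-by-document matrices (same vocabulary).
    y_s      : integer class labels for the source documents.
    Returns word-cluster factors F_s and F_t, the shared association
    matrix S, target document memberships G_t, and the loss history.
    """
    rng = np.random.default_rng(seed)
    n_words = X_s.shape[0]

    # Assumption: source memberships are fixed to the known labels.
    G_s = np.eye(n_classes)[np.asarray(y_s)]

    # Random non-negative initialization of the remaining factors.
    F_s = rng.random((n_words, n_word_clusters))
    F_t = rng.random((n_words, n_word_clusters))
    S = rng.random((n_word_clusters, n_classes))
    G_t = rng.random((X_t.shape[1], n_classes))

    losses = []
    for _ in range(n_iter):
        # Word-cluster factors, one per domain.
        F_s *= (X_s @ G_s @ S.T) / (F_s @ S @ G_s.T @ G_s @ S.T + eps)
        F_t *= (X_t @ G_t @ S.T) / (F_t @ S @ G_t.T @ G_t @ S.T + eps)

        # Target document memberships (unlabeled domain).
        G_t *= (X_t.T @ F_t @ S) / (G_t @ S.T @ F_t.T @ F_t @ S + eps)

        # Shared associations: both domains contribute to one update.
        num = F_s.T @ X_s @ G_s + F_t.T @ X_t @ G_t
        den = (F_s.T @ F_s @ S @ G_s.T @ G_s
               + F_t.T @ F_t @ S @ G_t.T @ G_t + eps)
        S *= num / den

        loss = (np.linalg.norm(X_s - F_s @ S @ G_s.T) ** 2
                + np.linalg.norm(X_t - F_t @ S @ G_t.T) ** 2)
        losses.append(loss)

    return F_s, F_t, S, G_t, losses
```

Under these assumptions, a target document's predicted class would be the largest entry in its row of `G_t`, for example `G_t.argmax(axis=1)`. Because `S` appears in both reconstruction terms, it is the only path by which the source labels influence the target memberships, which mirrors the bridging role the abstract assigns to the shared associations.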
