Short Text Document Clustering using Distributed Word Representation and Document Distance

Supavit Kongwudhikunakorn; Kitsana Waiyamai


Open Access


Author(s) - Supavit Kongwudhikunakorn, Kitsana Waiyamai

Publication year - 2018

Publication title - Walailak Journal of Science and Technology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.146

H-Index - 15

eISSN - 2228-835X

pISSN - 1686-3933

DOI - 10.48048/wjst.2019.4133

Subject(s) - document clustering, cluster analysis, computer science, Rand index, information retrieval, natural language processing, artificial intelligence, n-gram, tf–idf, data mining, language model

This paper presents a method for clustering short text documents such as instant messages, SMS messages, and news headlines. The vocabulary of each text is expanded using external knowledge sources, and words are represented with a distributed word representation (word embeddings). Documents are clustered with the K-means algorithm, using Word Mover's Distance (WMD) as the distance metric. Experiments compared the clustering quality of this method against several leading methods on large datasets drawn from BBC headlines, SearchSnippets, StackExchange, and Twitter. On all datasets, the proposed algorithm produced document clusters with higher accuracy, precision, F1-score, and Adjusted Rand Index. We also observe that a description of each cluster can be inferred from the keywords it contains.
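Word Mover's Distance, the document distance named in the abstract, treats each text as a normalized bag of words and asks for the cheapest way to move all of one text's word weight onto the other's, where moving word i to word j costs the Euclidean distance between their embeddings. The sketch below is only an illustration of that definition, not the authors' code: it solves the underlying transportation problem exactly with a small pure-Python min-cost-flow routine. The function names, the toy two-dimensional vectors, and the choice to silently drop out-of-vocabulary tokens are hypothetical assumptions; a real setup would load pretrained word vectors and apply its own preprocessing.

```python
import math
from collections import Counter


def _euclid(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def _nbow(tokens, emb):
    """Normalized bag-of-words weights (assumption: OOV tokens are dropped)."""
    counts = Counter(t for t in tokens if t in emb)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("document has no in-vocabulary tokens")
    return {w: c / total for w, c in counts.items()}


def _transport_cost(supply, demand, cost, eps=1e-12):
    """Min-cost flow (successive shortest paths, Bellman-Ford) for a transport LP."""
    n, m = len(supply), len(demand)
    size = n + m + 2
    source, sink = n + m, n + m + 1
    graph = [[] for _ in range(size)]  # each edge: [to, capacity, cost, reverse_index]

    def add_edge(u, v, cap, c):
        graph[u].append([v, cap, c, len(graph[v])])
        graph[v].append([u, 0.0, -c, len(graph[u]) - 1])

    for i, w in enumerate(supply):
        add_edge(source, i, w, 0.0)
    for j, w in enumerate(demand):
        add_edge(n + j, sink, w, 0.0)
    for i in range(n):
        for j in range(m):
            add_edge(i, n + j, math.inf, cost[i][j])

    total_cost = 0.0
    while True:
        # Shortest path from source to sink in the residual graph.
        dist = [math.inf] * size
        prev = [None] * size
        dist[source] = 0.0
        for _ in range(size - 1):
            updated = False
            for u in range(size):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, c, _rev) in enumerate(graph[u]):
                    if cap > eps and dist[u] + c < dist[v] - eps:
                        dist[v] = dist[u] + c
                        prev[v] = (u, k)
                        updated = True
            if not updated:
                break
        if dist[sink] == math.inf:
            break  # all word weight has been moved
        # Push as much flow as the tightest edge on the path allows.
        flow = math.inf
        v = sink
        while v != source:
            u, k = prev[v]
            flow = min(flow, graph[u][k][1])
            v = u
        v = sink
        while v != source:
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= flow
            graph[v][edge[3]][1] += flow
            v = u
        total_cost += flow * dist[sink]
    return total_cost


def word_movers_distance(doc_a, doc_b, emb):
    """Exact WMD between two token lists, given a word -> vector mapping."""
    da, db = _nbow(doc_a, emb), _nbow(doc_b, emb)
    words_a, words_b = list(da), list(db)
    cost = [[_euclid(emb[u], emb[v]) for v in words_b] for u in words_a]
    return _transport_cost([da[w] for w in words_a], [db[w] for w in words_b], cost)
```

Bellman-Ford keeps the sketch short but makes it slow for anything beyond short texts; it is meant only to show what the distance computes, not how a production clustering run would compute it.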
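The Adjusted Rand Index, one of the reported quality measures, compares a predicted clustering with the reference labels by counting pairs of documents that are grouped together, then correcting for the agreement expected by chance. The paper's evaluation code is not given here, so the following is a minimal sketch of the standard Hubert–Arabie formula under that assumption; the function name is hypothetical, and scikit-learn's `sklearn.metrics.adjusted_rand_score` computes the same measure.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index (Hubert & Arabie) between two flat clusterings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    pairs_total = comb(n, 2)
    if pairs_total == 0:
        return 1.0  # fewer than two items: nothing to disagree on (convention)
    # Pairs placed together in both clusterings (from the contingency table).
    contingency = Counter(zip(labels_true, labels_pred))
    index = sum(comb(c, 2) for c in contingency.values())
    # Pairs placed together in each clustering separately.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / pairs_total
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both put everything in one cluster
    return (index - expected) / (max_index - expected)
```

Because only co-membership of pairs is counted, the score ignores how clusters are numbered, which matters for K-means, whose cluster ids are arbitrary. For example, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0 even though every label differs.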
