Open Access
Assessment of Twitter Data Clusters with Cosine-Based Validation Metrics Using Hybrid Topic Models
Author(s) -
Noorullah Renigunta Mohammed,
Moulana Mohammed
Publication year - 2020
Publication title -
Ingénierie des Systèmes d'Information
Language(s) - English
Resource type - Journals
eISSN - 2116-7125
pISSN - 1633-1311
DOI - 10.18280/isi.250606
Subject(s) - cluster analysis, computer science, cosine similarity, set (abstract data type), data mining, metric (unit), trigonometric functions, data set, euclidean distance, cluster (spacecraft), artificial intelligence, information retrieval, pattern recognition (psychology), mathematics, engineering, programming language, operations management, geometry
Received: 6 August 2020; Accepted: 17 October 2020
Text data clustering organizes a set of text documents into a desired number of coherent and meaningful sub-clusters. Modeling the documents in terms of derived topics is a vital task in text clustering. Each tweet is treated as a text document, and various topic models are used to model the tweets. In existing topic models, the clustering tendency of tweets is initially assessed using Euclidean dissimilarity features. The cosine metric is better suited to text clustering and gives a more informative assessment. This paper therefore develops a novel cosine-based internal and external validity assessment of cluster tendency to improve the computational efficiency of tweet data clustering. In the experiments, the tweet clustering results are evaluated using cluster validity indices. The results indicate that the cosine-based internal and external validity metrics outperform the others on benchmark and Twitter-based datasets.
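
The contrast the abstract draws between Euclidean and cosine dissimilarity can be illustrated with a minimal sketch. This is not the authors' method. The four-word vocabulary, the term counts, and the function names are hypothetical, chosen only to show why cosine dissimilarity is less sensitive to document length than Euclidean distance.

```python
import math

def cosine_dissimilarity(u, v):
    """Return 1 - cos(u, v) for two equal-length term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # convention assumed here for empty documents
    return 1.0 - dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Return the straight-line distance between two term-count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical term counts over the vocabulary [goal, match, vote, election].
short_sports = [1, 1, 0, 0]
long_sports = [4, 4, 0, 0]     # same topic mix, four times as many words
short_politics = [0, 0, 1, 1]  # different topic, same length as short_sports

# Euclidean distance places the short sports tweet nearer to the politics
# tweet than to the longer sports tweet, because it reacts to length.
print(euclidean_distance(short_sports, long_sports))
print(euclidean_distance(short_sports, short_politics))

# Cosine dissimilarity depends only on direction, so the two sports tweets
# come out as near-identical and the politics tweet as maximally dissimilar.
print(cosine_dissimilarity(short_sports, long_sports))
print(cosine_dissimilarity(short_sports, short_politics))
```

In this toy case the two measures rank the neighbours of the short sports tweet in opposite order, which is the kind of difference that can change a cluster-tendency assessment of short texts.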