
An Optimized K-means Algorithm for Text Clustering
Author(s) - Junsan Zhao
Publication year - 2021
Publication title - converter
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.104
H-Index - 1
ISSN - 0010-8189
DOI - 10.17762/converter.85
Subject(s) - cluster analysis , computer science , centrality , data mining , objectivity (philosophy) , cluster (spacecraft) , algorithm , document clustering , artificial intelligence , mathematics , combinatorics , philosophy , epistemology , programming language
In data mining, K-means clustering faces two major problems: choosing the initial cluster centers and choosing the value of k. The traditional K-means algorithm handles both subjectively, which directly affects the clustering result. This paper proposes an analysis method that combines a relational matrix with degree centrality to determine both the initial center points and the k value for the K-means algorithm. The improved K-means algorithm is applied to a collection of Chinese entrepreneurial policy texts, and the resulting cluster topics are visualized as word clouds. The empirical analysis verifies the effectiveness and objectivity of the improved algorithm on large collections of long text documents whose number of categories and category topics are not known in advance, and it also provides an approach for the objective classification of Chinese entrepreneurial policy texts.
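The abstract does not say how the relational matrix is built or how degree centrality is turned into centers, so the following Python sketch is only one possible reading, not the paper's procedure. It assumes the relational matrix is a thresholded cosine-similarity graph over document vectors, and that seeds are picked greedily in order of degree centrality while skipping documents already linked to a chosen seed. The number of seeds then serves as k. The function names `choose_seeds` and `kmeans` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def choose_seeds(X, threshold=0.5):
    """Return indices of seed documents; len(result) is the implied k.

    Assumption: the "relational matrix" is a cosine-similarity graph
    in which two documents are linked when similarity >= threshold.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1.0, norms)   # avoid division by zero
    adj = (unit @ unit.T) >= threshold            # boolean relational matrix
    np.fill_diagonal(adj, False)                  # no self-links

    n = len(X)
    degree = adj.sum(axis=1) / max(n - 1, 1)      # degree centrality per document

    covered = np.zeros(n, dtype=bool)
    seeds = []
    # Visit documents from most to least central; a document becomes a seed
    # only if it is not already linked to an earlier seed.
    for i in np.argsort(-degree, kind="stable"):
        if not covered[i]:
            seeds.append(int(i))
            covered[i] = True
            covered |= adj[i]
    return seeds

def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy usage: six 2-D "document vectors" forming two obvious groups.
X = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
              [0.0, 1.0], [0.1, 0.9], [0.1, 1.0]])
seeds = choose_seeds(X, threshold=0.5)
labels, centers = kmeans(X, X[seeds])
```

In this sketch both k and the starting centers come from the data rather than from a user's guess, which is the kind of objectivity the abstract describes. The similarity threshold remains a free parameter here, and real policy texts would first need to be vectorized (for example with TF-IDF), a step the abstract does not detail.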