Open Access

Massive Data Mining Algorithm for Web Text Based on Clustering Algorithm

Author(s) - Nan-Chao Luo

Publication year - 2019

Publication title - Journal of Advanced Computational Intelligence and Intelligent Informatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.172

H-Index - 20

eISSN - 1883-8014

pISSN - 1343-0130

DOI - 10.20965/jaciii.2019.p0362

Subject(s) - computer science, cluster analysis, data mining, canopy clustering algorithm, feature vector, centroid, CURE data clustering algorithm, set (abstract data type), data set, hierarchical clustering, web mining, data stream clustering, correlation clustering, algorithm, pattern recognition (psychology), artificial intelligence, web page, World Wide Web, programming language

Massive Web text data is high-dimensional and sparsely distributed, so current data mining algorithms suffer from low mining precision and long running times when applied to it. To address these problems, a massive data mining algorithm for Web text based on a clustering algorithm is proposed. A chi-square test is used to extract feature words from the data and obtain a set of feature words. The feature set is hierarchically clustered, the TF-IDF value of each word in the clustering set is calculated, and a vector space model is constructed. Introducing a fair operation and a clone operation into the bee colony algorithm improves the diversity of the vector space models. K-means is then applied to the resulting cluster centers to extract local centroids and improve the quality of data mining. Experimental results show that the proposed algorithm effectively improves data mining accuracy and reduces time consumption.
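Three of the steps named in the abstract (chi-square feature selection, TF-IDF vectorization, and K-means centroid refinement) can be illustrated with a minimal sketch. This is not the paper's implementation: the hierarchical clustering step and the modified bee colony search are omitted, the documents are assumed to carry class labels for the chi-square test, and every name here (`chi_square`, `select_features`, `tfidf_vectors`, `kmeans`, the toy corpus) is hypothetical.

```python
import math

def chi_square(docs, labels, term, cls):
    """Chi-square statistic of the 2x2 term/class contingency table."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_cls = term in tokens, label == cls
        if has_term and in_cls:
            a += 1          # term present, document in class
        elif has_term:
            b += 1          # term present, document in another class
        elif in_cls:
            c += 1          # term absent, document in class
        else:
            d += 1          # term absent, document in another class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score over all classes."""
    terms = {t for tokens in docs for t in tokens}
    classes = set(labels)
    score = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in terms}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(terms, key=lambda t: (-score[t], t))[:k]

def tfidf_vectors(docs, vocab):
    """Build one TF-IDF vector per document over the selected vocabulary."""
    n = len(docs)
    df = {t: sum(1 for tokens in docs if t in tokens) for t in vocab}
    idf = {t: math.log(n / df[t]) if df[t] else 0.0 for t in vocab}
    return [[tokens.count(t) / len(tokens) * idf[t] for t in vocab] for tokens in docs]

def kmeans(vectors, centroids, iters=10):
    """Plain Lloyd iterations starting from caller-supplied centroids."""
    centroids = [list(c) for c in centroids]  # do not mutate the caller's lists
    assign = []
    for _ in range(iters):
        # Assign each vector to its nearest centroid (Euclidean distance).
        assign = [min(range(len(centroids)), key=lambda j: math.dist(v, centroids[j]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

# Toy corpus (assumed data, already tokenized and labeled).
docs = [
    ["cheap", "flights", "hotel"],
    ["hotel", "booking", "cheap"],
    ["python", "code", "bug"],
    ["code", "review", "python"],
]
labels = ["travel", "travel", "tech", "tech"]

vocab = select_features(docs, labels, k=4)
vectors = tfidf_vectors(docs, vocab)
assign, centroids = kmeans(vectors, [vectors[0], vectors[2]])
print(vocab)
print(assign)
```

In the abstract's pipeline, the starting centroids passed to `kmeans` would come from the bee colony search rather than from hand-picked documents as in this sketch.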
