Open Access
A Latent Dirichlet Allocation and Fuzzy Clustering Based Machine Learning Model for Text Thesaurus
Author(s) -
Jia Luo,
Dongwen Yu,
Zong Dai
Publication year - 2020
Publication title -
international journal of computers communications and control
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.422
H-Index - 33
eISSN - 1841-9844
pISSN - 1841-9836
DOI - 10.15837/ijccc.2020.2.3811
Subject(s) - latent dirichlet allocation, computer science, word2vec, artificial intelligence, cluster analysis, topic model, text processing, machine learning, latent semantic analysis, precision and recall, fuzzy logic, process (computing), word (group theory), natural language processing, data mining, mathematics, embedding, geometry, operating system
Manual methods cannot feasibly process today's huge volumes of structured and semi-structured data. This study aims to solve that processing problem with machine learning algorithms. We collected text data on company public opinion with web crawlers, used the Latent Dirichlet Allocation (LDA) algorithm to extract keywords from the text, and applied fuzzy clustering to group the keywords into distinct topics. The topic keywords then serve as a seed dictionary for new word discovery. To verify the efficiency of machine learning in new word discovery, algorithms based on association rules, N-gram, PMI, and Word2vec were compared on the same task. The experimental results show that the machine-learning-based Word2vec algorithm achieves the highest accuracy, recall, and F-value.
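The abstract does not include code, but one of the baselines it compares, PMI-based new word discovery, is simple enough to sketch. The following pure-Python example scores adjacent token pairs by pointwise mutual information and keeps frequent, high-PMI pairs as candidate multi-word terms; the toy corpus, count threshold, and PMI cutoff are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def pmi_candidates(tokens, min_count=2, threshold=1.0):
    """Score adjacent token pairs by pointwise mutual information.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ).  Pairs that co-occur
    often and score above `threshold` are candidate "new words"
    (multi-word terms) in the PMI-based discovery baseline.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total_bigrams = n - 1
    candidates = {}
    for (x, y), count in bigrams.items():
        if count < min_count:          # ignore rare pairs: PMI is noisy for them
            continue
        p_xy = count / total_bigrams
        p_x = unigrams[x] / n
        p_y = unigrams[y] / n
        score = math.log2(p_xy / (p_x * p_y))
        if score >= threshold:
            candidates[(x, y)] = score
    return candidates

# Toy corpus standing in for crawled public-opinion text (an assumption).
corpus = ("machine learning model for text thesaurus uses "
          "machine learning and topic model keywords ; "
          "machine learning helps topic model clustering").split()
candidates = pmi_candidates(corpus, min_count=2, threshold=1.0)
# Recurrent collocations such as ("machine", "learning") and
# ("topic", "model") surface as candidates; one-off pairs are filtered out.
```

In the paper's pipeline this baseline is contrasted with association rules, N-gram statistics, and Word2vec; the reported result is that the Word2vec-based approach scores best on all three metrics.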
