
Indonesian Online News Topics Classification using Word2Vec and K-Nearest Neighbor
Author(s) - Nur Ghaniaviyanto Ramadhan
Publication year - 2021
Publication title - Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Language(s) - English
Resource type - Journals
ISSN - 2580-0760
DOI - 10.29207/resti.v5i6.3547
Subject(s) - word2vec, computer science, random forest, k nearest neighbors algorithm, the internet, support vector machine, tf–idf, artificial intelligence, word embedding, sentiment analysis, statistical classification, data mining, information retrieval, natural language processing, world wide web, embedding, term (time), physics, quantum mechanics
News is information disseminated through newspapers, radio, television, the internet, and other media. According to survey results, a large number of news titles covering many topics circulate on the internet, which makes it difficult for readers to find news on the topic they want to read. This problem can be addressed by automatically grouping news by topic, that is, by classification. This study classifies Indonesian-language news by topic using a K-nearest neighbor (KNN) classifier, with word2vec, a word-embedding model, converting words into vectors to facilitate classification. The study also determines the optimal value of K for KNN. The combination of word2vec and KNN achieves an accuracy of 89.2% at K=7 and outperforms the support vector machine, logistic regression, and random forest classification models.
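
The pipeline described in the abstract (word vectors, then KNN, then a search for a good K) can be illustrated with a minimal Python sketch. This is not the author's implementation. The embeddings in `TOY_VECTORS` are hand-made, hypothetical stand-ins for vectors a trained word2vec model would supply, and the titles, labels, averaging of word vectors into a title vector, Euclidean distance, and leave-one-out selection of K are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical 2-D vectors standing in for trained word2vec embeddings.
TOY_VECTORS = {
    "gol": [0.9, 0.1], "liga": [0.8, 0.2], "pemain": [0.85, 0.15],
    "saham": [0.1, 0.9], "rupiah": [0.2, 0.8], "bank": [0.15, 0.85],
}

def title_vector(title, vectors):
    """Represent a title as the average of the vectors of its known words."""
    words = [w for w in title.lower().split() if w in vectors]
    if not words:
        raise ValueError("title contains no known words")
    dim = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]

def knn_predict(query, train_vecs, train_labels, k):
    """Majority vote among the k training vectors closest to the query."""
    nearest = sorted(
        (math.dist(query, v), label) for v, label in zip(train_vecs, train_labels)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loo_accuracy(vecs, labels, k):
    """Leave-one-out accuracy: classify each item using all the others."""
    hits = 0
    for i, (v, y) in enumerate(zip(vecs, labels)):
        rest_v = vecs[:i] + vecs[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        hits += knn_predict(v, rest_v, rest_y, k) == y
    return hits / len(vecs)

def best_k(vecs, labels, candidates):
    """Return the candidate K with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(vecs, labels, k))

# Illustrative training titles (made up, not taken from the study's data).
TRAIN = [
    ("gol liga", "olahraga"), ("pemain liga", "olahraga"),
    ("gol pemain", "olahraga"), ("saham bank", "ekonomi"),
    ("rupiah bank", "ekonomi"), ("saham rupiah", "ekonomi"),
]
TRAIN_VECS = [title_vector(t, TOY_VECTORS) for t, _ in TRAIN]
TRAIN_LABELS = [label for _, label in TRAIN]

if __name__ == "__main__":
    k = best_k(TRAIN_VECS, TRAIN_LABELS, [1, 3, 5])
    query = title_vector("pemain gol liga", TOY_VECTORS)
    print(k, knn_predict(query, TRAIN_VECS, TRAIN_LABELS, k))
```

In a realistic setting the toy dictionary would be replaced by embeddings trained on an Indonesian news corpus, and K would be chosen on a held-out validation set rather than on six examples.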