
Classification of English Speaker Dialects Using Random Forest (Klasifikasi Dialek Pengujar Bahasa Inggris Menggunakan Random Forest)
Author(s) -
Muhamad Azhar,
Hilman F. Pardede
Publication year - 2021
Publication title -
Jurnal Media Informatika Budidarma
Language(s) - English
Resource type - Journals
eISSN - 2614-5278
pISSN - 2548-8368
DOI - 10.30865/mib.v5i2.2754
Subject(s) - random forest , computer science , feature (linguistics) , speech recognition , stress (linguistics) , mel frequency cepstrum , artificial intelligence , variety (cybernetics) , oversampling , process (computing) , feature extraction , pattern recognition (psychology) , natural language processing , machine learning , linguistics , operating system , computer network , philosophy , bandwidth (computing)
Speech recognition is an important research field that is now widely used in a variety of applications. However, its performance is affected by the speaker's dialect, so dialect recognition is often used as an additional feature in speech recognition systems. Recognizing dialects is not easy. Machine learning is now widely applied to the task, but one of the challenges of machine learning-based dialect recognition is class imbalance and class overlap, which affect a wide variety of classification techniques. This study applies Random Forest combined with an oversampling technique to dialect recognition. The hyperparameters of the Random Forest algorithm are optimized with the Grid Search method. Experiments on Speech Accent Archive data using MFCC features yielded an accuracy of 0.91 and an AUC of 0.95.
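The abstract does not say which oversampling method was used. As an illustration only, the following sketch assumes plain random oversampling, in which minority-class samples are duplicated at random until every class matches the size of the largest one. The function name `random_oversample`, the two-value feature vectors standing in for MFCC features, and the dialect labels are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def random_oversample(features, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class (assumed method, not
    necessarily the one used in the paper)."""
    rng = random.Random(seed)

    # Group feature vectors by their class label.
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(rows) for rows in by_class.values())

    out_x, out_y = [], []
    for y, rows in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy stand-ins for MFCC feature vectors (hypothetical values and labels).
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.9, 0.8], [0.8, 0.9], [0.5, 0.4]]
y = ["english", "english", "english", "english", "arabic", "mandarin"]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # class counts after balancing
```

In a full pipeline, oversampling of this kind would normally be applied to the training split only, so that duplicated samples do not leak into the data used to measure accuracy and AUC.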