
Implementation of hybrid sampling technique for predicting active compound and protein interaction in unbalanced dataset
Author(s) -
Wisnu Ananta Kusuma,
Ade Rahmi,
Rudi Heryanto
Publication year - 2019
Publication title -
IOP Conference Series: Earth and Environmental Science
Language(s) - English
Resource type - Journals
eISSN - 1755-1315
pISSN - 1755-1307
DOI - 10.1088/1755-1315/335/1/012005
Subject(s) - oversampling, sampling (signal processing), computer science, support vector machine, data mining, machine learning, fuzzy logic, artificial intelligence, virtual screening, biology, bioinformatics, drug discovery, computer network, filter (signal processing), bandwidth (computing), computer vision
The Indonesia Jamu Herbs (Ijah) web server aims to predict Jamu efficacy based on interactions between active compounds and disease proteins. However, the compound-protein interaction data are imbalanced, since many interactions between active compounds and protein targets are unknown. As a result, the prediction results are still not optimal. In this research, a hybrid sampling technique combining complementary fuzzy support vector machine (CMTFSVM) and synthetic minority oversampling technique (SMOTE) was used to handle the imbalanced compound-protein interaction data for Ijah, a web server that predicts candidate Jamu formulas for a given disease. Performance was measured using geometric mean (Gmean), area under the curve (AUC), and accuracy. The evaluation results showed that the hybrid sampling technique increased the number of minority-class instances threefold. Moreover, the prediction model obtained values of 0.8346, 0.6812, and 0.5319 for accuracy, Gmean, and AUC, respectively.
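As a rough illustration of two ideas named in the abstract, the sketch below shows SMOTE-style oversampling (interpolating between a minority sample and one of its nearest minority neighbours) and the geometric mean of sensitivity and specificity. It is a minimal toy under stated assumptions, not the authors' implementation: the function names, the 2-D toy points, `k=3`, and the confusion-matrix counts are all hypothetical, and the CMTFSVM step is not covered.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic


def accuracy(tp, fn, tn, fp):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy minority class: 4 points, oversampled to 12 in total.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=8)
oversampled = minority + new_points

# Hypothetical imbalanced test set: 10 positives, 90 negatives.
# A model that finds only 1 positive still scores 0.91 accuracy,
# while the G-mean drops to about 0.316.
acc = accuracy(tp=1, fn=9, tn=90, fp=0)
gm = g_mean(tp=1, fn=9, tn=90, fp=0)
```

The toy confusion matrix suggests why accuracy alone can look favourable on imbalanced data and why a class-sensitive measure such as Gmean is usually reported alongside it.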