The Effect of Resampling on Data‐imbalanced Conditions for Prediction towards Nuclear Receptor Profiling Using Deep Learning
Author(s) - Lee Yong Oh, Kim Young Jun
Publication year - 2020
Publication title - Molecular Informatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.481
H-Index - 68
eISSN - 1868-1751
pISSN - 1868-1743
DOI - 10.1002/minf.201900131
Subject(s) - resampling, chemical toxicity, artificial intelligence, prioritization, machine learning, computer science, toxicogenomics, profiling (computer programming), toxicity, in silico, data mining, chemistry, biochemistry, gene expression, organic chemistry, management science, economics, gene, operating system
In toxicity evaluation based on the nuclear receptor signalling pathway, in silico prediction tools are used to detect the early stages of long‐term toxicities, to prioritize newly synthesized chemicals, and to achieve adequate selectivity and sensitivity. Computational prediction models are promising tools for toxicity screening of chemical‐protein interactions, as deep learning has improved prediction accuracy. The challenge is that data‐imbalanced conditions, where the toxic chemical compound dataset is much smaller than the nontoxic dataset, lead to low prediction accuracy for the toxic compounds, which are the ones that provide valid information about toxicity hazard. In this paper, we examine the effect of data imbalance in the toxicity assessment data for the nuclear receptors AR (LBD), ER (LBD), AhR, and PPAR, and identify a severe imbalance between prediction performance on the toxic and nontoxic datasets. Because balanced selectivity and sensitivity are required for the assessment of toxicity hazards, we investigate data resampling methods to reduce the bias problem in binary classification for toxicity hazard profiling of nuclear receptors. With simple resampling methods, the experiments achieved a sensitivity of 0.714 and a specificity of 0.787, with an overall accuracy of 0.829 and a ROC‐AUC of 0.822.
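The abstract does not say which resampling methods were used, so the following is only a minimal sketch of one common simple option, random oversampling of the minority class, together with the sensitivity and specificity calculation for a binary toxic/nontoxic classifier. The function names, the toy compound identifiers, and the label convention (1 = toxic, 0 = nontoxic) are assumptions made for illustration and are not taken from the paper.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until both classes match in size.

    Assumes binary labels: 1 = toxic, 0 = nontoxic (a convention chosen for this sketch).
    """
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    if len(pos) < len(neg):
        minority, minority_label, majority, majority_label = pos, 1, neg, 0
    else:
        minority, minority_label, majority, majority_label = neg, 0, pos, 1
    if not minority:
        raise ValueError("cannot oversample: one class has no samples")
    # Draw with replacement from the minority class to close the size gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = [(s, majority_label) for s in majority]
    balanced += [(s, minority_label) for s in minority + extra]
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [y for _, y in balanced]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 is the toxic class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy imbalanced set: 2 toxic and 8 nontoxic compounds (hypothetical identifiers).
compounds = [f"cmpd_{i}" for i in range(10)]
toxicity = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

balanced_x, balanced_y = random_oversample(compounds, toxicity, seed=42)
print(len(balanced_y), sum(balanced_y))

sens, spec = sensitivity_specificity([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
print(sens, spec)
```

In practice, resampling of this kind is usually applied to the training split only, so that the evaluation set keeps the original class ratio and the reported sensitivity and specificity are not inflated by duplicated samples.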
