Performance of smoothly clipped absolute deviation as a variable selection method in the artificial neural network‐based QSAR studies
Author(s) -
Mozafari Zeinab,
Arab Chamjangali Mansour,
Arashi Mohammad,
Goudarzi Nasser
Publication year - 2021
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.3338
Subject(s) - quantitative structure–activity relationship, artificial neural network, test set, loo, mean squared error, applicability domain, mathematics, feature selection, scad, correlation coefficient, artificial intelligence, computer science, statistics, machine learning, psychology, psychiatry, myocardial infarction
A hybrid of smoothly clipped absolute deviation (SCAD) and the Levenberg‐Marquardt artificial neural network (LM‐ANN), denoted SCAD‐LM‐ANN, was used as an efficient approach in ANN‐based quantitative structure–activity relationship (QSAR) studies. The technique exploits the shrinkage property of SCAD to reduce high‐dimensional data before modeling with the robust LM‐ANN method. Its performance was examined by establishing a QSAR model between about 3224 Dragon‐derived descriptors and the biological activities (pEC50) of a set of thioacetamide/acetanilide derivatives acting as HIV inhibitors. SCAD with tenfold random splitting of the data was applied to the data set with the compounds of the external test set excluded. Eleven descriptors were selected at the λ value with the lowest cross‐validation error ($\lambda_{\min}$). The selected descriptors were used as inputs to the LM‐ANN modeling method. All parameters affecting model performance were optimized, and the LM‐ANN model with the 5‐5‐1 architecture was selected as the optimal QSAR model. Several statistical parameters, such as the determination coefficient ($R^2$) and the mean square error (MSE), were calculated for the predicted pEC50 values of the external test set and of the whole data set through the leave‐one‐out (LOO) technique. The results ($R^2_{\text{test}} = 0.92$, $\mathrm{MSE}_{\text{test}} = 0.12$, $R^2_{\text{LOO}} = 0.81$, and $\mathrm{MSE}_{\text{LOO}} = 0.12$) demonstrate the generalizability and predictive ability of the proposed SCAD‐LM‐ANN model. Based on the relationship established in the recommended QSAR model, new derivatives were designed and suggested as new active HIV inhibitors for further study. The accuracy of the suggested compounds was studied and confirmed by analyzing the ligand–receptor (LR) interactions derived from molecular docking studies.
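The abstract does not state the SCAD formula, so the following is only an illustrative sketch of why SCAD acts as a variable selector, not the authors' implementation. It assumes the standard SCAD penalty of Fan and Li with the conventional shape parameter a = 3.7, and the closed-form thresholding rule that holds for a single coefficient under an orthonormal design. The function names (`scad_penalty`, `scad_threshold`, `select_descriptors`) and the toy coefficient values are hypothetical.

```python
import math


def scad_penalty(beta, lam, a=3.7):
    """Standard SCAD penalty for one coefficient (assumes lam > 0 and a > 2)."""
    b = abs(beta)
    if b <= lam:
        # Small coefficients: same linear penalty as the lasso.
        return lam * b
    if b <= a * lam:
        # Intermediate coefficients: quadratic transition region.
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no additional shrinkage.
    return lam * lam * (a + 1) / 2


def scad_threshold(z, lam, a=3.7):
    """SCAD estimate for an unpenalized estimate z (orthonormal-design assumption)."""
    b = abs(z)
    sign = math.copysign(1.0, z)
    if b <= 2 * lam:
        # Soft thresholding: estimates with |z| <= lam are set exactly to zero.
        return sign * max(b - lam, 0.0)
    if b <= a * lam:
        # Shrinkage is progressively relaxed in this region.
        return ((a - 1) * z - sign * a * lam) / (a - 2)
    # Large estimates are left unshrunk.
    return z


def select_descriptors(z_values, lam, a=3.7):
    """Indices of descriptors whose SCAD estimate is nonzero."""
    return [j for j, z in enumerate(z_values) if scad_threshold(z, lam, a) != 0.0]


if __name__ == "__main__":
    toy_estimates = [0.2, -1.8, 4.0, 0.9]  # hypothetical values, not data from the paper
    print(select_descriptors(toy_estimates, lam=1.0))
```

In this sketch a larger λ zeroes more coefficients, which is why λ is chosen by cross‐validation in the study; in practice the SCAD fit over thousands of correlated descriptors is obtained iteratively with a dedicated solver rather than with this one‐coefficient rule.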