Correlation weighted successive projections algorithm as a novel method for variable selection in QSAR studies: investigation of anti‐HIV activity of HEPT derivatives
Author(s) -
Kompany‐Zareh Mohsen,
Akhlaghi Yousef
Publication year - 2007
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.1073
Subject(s) - quantitative structure–activity relationship, mathematics, feature selection, molecular descriptor, correlation coefficient, test set, linear regression, selection (genetic algorithm), cross validation, algorithm, statistics, artificial intelligence, computer science, chemistry, stereochemistry
The correlation weighted successive projections algorithm (CWSPA), a modified version of the successive projections algorithm (SPA), is proposed for descriptor selection in a non‐linear quantitative structure‐activity relationship (QSAR) study of a series of 1‐[(2‐hydroxyethoxy)methyl]‐6‐(phenylthio)thymine (HEPT) derivatives acting as non‐nucleoside reverse transcriptase inhibitors (NNRTIs). In the proposed procedure, the correlation coefficient of each descriptor with the activities (r_g) is an additional criterion for descriptor selection. The extent to which r_g contributes to variable selection, m, was also optimized, and r_g^4‐CWSPA (m = 4) was the selected condition. Three‐layer radial basis function networks (RBFNs) and molecular descriptors derived solely from molecular structure were used to construct the non‐linear QSAR models. Using r_g^4‐CWSPA, a limited number of uncorrelated and informative descriptors were selected. With 20 selected descriptors, the relative standard error percent in anti‐HIV activity predictions was 9.77% for the training set under cross‐validation (RSECV%) and 8.61% for the prediction set (RSEP%). The obtained model outperforms those given in the literature in both the fitting and prediction stages. RBFN analysis yielded predicted activities in acceptable agreement with the experimental values (cross‐validation r = 0.924, prediction r = 0.939). Compared with SPA, r_g^m‐CWSPA gave lower RSECV% and RSEP% values using fewer selected variables. The results show that taking into account the correlation of the variables with the dependent variable (the activity) improves the selection and, as a result, the quality of the selected variable set. Finally, a simple procedure for variable selection using r_g^4‐CWSPA is proposed that does not require testing all possible initial descriptors. The results from the simple procedure were comparable to those of the procedure in which all possible initial descriptors were tested. The proposed method was successfully validated with five different training and test sets. Copyright © 2007 John Wiley & Sons, Ltd.
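
The abstract does not give the exact selection rule, so the following is only a minimal sketch of how a correlation-weighted SPA step might look. It assumes that, at each step, the norm of every candidate descriptor after projection onto the orthogonal complement of the last selected descriptor is multiplied by |r_g|^m, and that the candidate with the largest product is chosen. The function names `cwspa_select` and `rse_percent`, the toy data, and the relative-standard-error formula (a common chemometrics definition) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def cwspa_select(X, y, n_select, start, m=4):
    """Return n_select column indices of X, beginning with column `start`.

    Assumed scoring rule (not taken from the paper): projected norm * |r|**m.
    With m=0 the weight is 1 everywhere and the routine reduces to plain SPA.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # mean-centre descriptors
    yc = y - y.mean()                # mean-centre activities

    # Absolute Pearson correlation of each descriptor with the activity.
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    r = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
    weight = r ** m

    P = Xc.copy()                    # working copy that gets deflated
    selected = [start]
    for _ in range(n_select - 1):
        v = P[:, selected[-1]]
        vv = v @ v
        if vv > 0:
            # Project every column onto the orthogonal complement of v.
            P = P - np.outer(v, (v @ P) / vv)
        score = np.linalg.norm(P, axis=0) * weight
        score[selected] = -np.inf    # never pick a column twice
        selected.append(int(np.argmax(score)))
    return selected

def rse_percent(y_true, y_pred):
    """Relative standard error in percent (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.sqrt(np.sum((y_pred - y_true) ** 2) / np.sum(y_true ** 2))

# Toy data: column 1 is collinear with column 0, column 3 is large but
# uncorrelated with y, column 2 is informative and not fully redundant.
y_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X_demo = np.column_stack([
    y_demo,
    2.0 * y_demo,
    [1.0, 2.0, 3.0, 4.0, 5.0, 7.0],
    [10.0, -10.0, 0.0, 0.0, -10.0, 10.0],
])
weighted = cwspa_select(X_demo, y_demo, n_select=2, start=0, m=4)
plain = cwspa_select(X_demo, y_demo, n_select=2, start=0, m=0)
print(weighted, plain)
```

In this toy case the unweighted run is drawn to the high-variance but uninformative column, while the weighted run prefers the descriptor that still correlates with the activity. That is the behaviour the abstract attributes to the correlation criterion. In the full procedure described above, the selected columns would then feed a non-linear model such as an RBFN, and `start` would either be scanned over all descriptors or fixed by the simplified procedure.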