Optimal QSAR analysis of the carcinogenic activity of drugs by correlation ranking and genetic algorithm‐based PCR
Author(s) -
Hemmateenejad Bahram
Publication year - 2004
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.891
Subject(s) - ranking (information retrieval) , quantitative structure–activity relationship , principal component analysis , correlation , set (abstract data type) , eigenvalues and eigenvectors , mathematics , cross validation , computer science , statistics , data mining , machine learning , physics , geometry , quantum mechanics , programming language
The major problem with principal component regression (PCR), especially in QSAR studies, is that the model extracts the eigenvectors solely from the matrix of predictor variables, so they are not necessarily well correlated with the predicted variable. This paper describes the application of PCR to model the structure–carcinogenic activity relationship of drugs. To obtain the optimal model, correlation ranking and a genetic algorithm were employed to select the best set of principal components (PCs). A large data set containing 735 carcinogenic activities and 1355 descriptors was used. Two cross‐validation procedures (leave‐many‐out and ν‐fold cross‐validation) and the hold‐out‐a‐test‐sample (HOTS) method were used to validate the models. Introducing PCs by the conventional eigenvalue ranking procedure did not produce an optimal model. Instead, factor selection by correlation ranking and by the genetic algorithm produced good models of similar quality, which explained more than 80% of the variance in carcinogenic activity. Copyright © 2005 John Wiley & Sons, Ltd.
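The contrast between eigenvalue ranking and correlation ranking of PCs can be illustrated with a small sketch. This is not the author's implementation: the function names (`fit_predict`, `q2_vfold`), the synthetic data, the choice of k = 3 PCs, and the 5-fold split are all hypothetical assumptions. The genetic-algorithm search and the HOTS validation are not reproduced. The synthetic response depends only on a low-variance direction of X, which is exactly the situation in which eigenvalue ranking discards the useful PC.

```python
import numpy as np


def fit_predict(X_train, y_train, X_test, k, method):
    """PCR on k PCs chosen by 'eigenvalue' or 'correlation' ranking (hypothetical helper)."""
    mu = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - mu
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    keep = s > 1e-10 * s[0]            # drop numerically null components
    U, s, Vt = U[:, keep], s[keep], Vt[keep]
    T = U * s                          # PC scores of the training set (columns are centered)
    if method == "eigenvalue":
        order = np.arange(len(s))      # SVD already sorts by decreasing singular value
    elif method == "correlation":
        yc = y_train - y_mean
        # |corr(t_j, y)| for each PC score column
        r = np.abs(T.T @ yc) / (np.linalg.norm(T, axis=0) * np.linalg.norm(yc))
        order = np.argsort(-r)
    else:
        raise ValueError(f"unknown ranking method: {method}")
    idx = order[:k]
    b, *_ = np.linalg.lstsq(T[:, idx], y_train - y_mean, rcond=None)
    T_test = (X_test - mu) @ Vt[idx].T  # project test data onto the selected loadings
    return y_mean + T_test @ b


def q2_vfold(X, y, k, method, v=5, seed=0):
    """Cross-validated Q^2 = 1 - PRESS / SS_total using v random folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), v)
    press = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        pred = fit_predict(X[train], y[train], X[test], k, method)
        press += np.sum((y[test] - pred) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


# Synthetic (hypothetical) data: y depends only on the lowest-variance column.
rng = np.random.default_rng(1)
n, p = 300, 15
scales = np.concatenate([np.linspace(10.0, 3.0, p - 1), [0.5]])
X = rng.normal(size=(n, p)) * scales
y = 3.0 * X[:, -1] + 0.1 * rng.normal(size=n)

k = 3
q2_eig = q2_vfold(X, y, k, "eigenvalue")
q2_corr = q2_vfold(X, y, k, "correlation")
print(f"Q2, eigenvalue ranking:  {q2_eig:.3f}")
print(f"Q2, correlation ranking: {q2_corr:.3f}")
```

On data like this, the three largest-eigenvalue PCs carry almost no information about y, so eigenvalue ranking gives a Q² near zero, while correlation ranking picks up the small-variance PC and predicts well. The PCs and their ranking are recomputed inside each training fold so that the test fold never influences component selection.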
