The applications of PCA in QSAR studies: A case study on CCR5 antagonists
Author(s) - Yoo ChangKyoo, Shahlaei Mohsen
Publication year - 2018
Publication title - Chemical Biology and Drug Design
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.59
H-Index - 77
eISSN - 1747-0285
pISSN - 1747-0277
DOI - 10.1111/cbdd.13064
Subject(s) - quantitative structure–activity relationship, principal component analysis, outlier, molecular descriptor, artificial intelligence, computer science, cheminformatics, pattern recognition (psychology), data mining, machine learning, chemistry, computational chemistry
Principal component analysis (PCA), a well-known multivariate data analysis and data reduction technique, is an important and useful algebraic tool in drug design and discovery. In a typical quantitative structure–activity relationship (QSAR) study, PCA analyzes an original data matrix in which molecules are described by several intercorrelated quantitative dependent variables (molecular descriptors). Although PCA is extensively applied, the literature shows disparity in how it is applied in QSAR studies. This study investigates the different applications of PCA in QSAR studies using a dataset of CCR5 inhibitors. Different types of preprocessing are used to compare PCA performance. The use of PC plots in the exploratory investigation of the descriptor matrix is described. This work also shows PCA to be a powerful technique for exploring complex QSAR datasets and identifying outliers. The study shows that PCA can easily be applied to the pool of calculated structural descriptors, and that the extracted information can help in deciding on an appropriate harder model for further analysis.
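
The workflow the abstract describes (preprocess a descriptor matrix, project it onto principal components, and inspect the scores for outliers) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' procedure: the data are synthetic rather than the CCR5 set, the function names `pca_scores` and `hotelling_t2` are hypothetical, and Hotelling's T² is used here as one common way to rank candidate outliers in score space.

```python
import numpy as np

def pca_scores(X, n_components=2, scale=True):
    """PCA via SVD on a (molecules x descriptors) matrix.

    scale=False applies mean-centering only; scale=True also divides each
    descriptor by its standard deviation (autoscaling).
    Returns scores, loadings, and the explained-variance fractions.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # mean-centering
    if scale:
        std = Xc.std(axis=0, ddof=1)
        std[std == 0] = 1.0                  # guard against constant descriptors
        Xc = Xc / std                        # autoscaling
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

def hotelling_t2(scores):
    """Hotelling's T^2 per molecule from the retained PC scores."""
    var = scores.var(axis=0, ddof=1)
    return np.sum(scores ** 2 / var, axis=1)

# Synthetic stand-in for a descriptor matrix: 30 "molecules", 6 correlated
# "descriptors" driven by 2 latent factors (NOT real CCR5 data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(30, 6))
X[0] += 20.0                                 # plant one artificial outlier

scores, loadings, explained = pca_scores(X, n_components=2, scale=True)
t2 = hotelling_t2(scores)
suspect = int(np.argmax(t2))                 # molecule with the largest T^2
```

Calling `pca_scores` with `scale=False` and comparing the explained-variance fractions is one simple way to see how the choice of preprocessing changes the result; plotting the two columns of `scores` against each other gives the kind of PC plot used for exploratory inspection.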