Is your QSAR/QSPR descriptor real or trash?
Author(s) - Kiralj Rudolf, Ferreira Márcia M. C.
Publication year - 2010
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.1331
Subject(s) - quantitative structure–activity relationship, univariate, sign (mathematics), multivariate statistics, regression, linear regression, regression analysis, mathematics, identification (biology), feature selection, statistics, artificial intelligence, computer science, econometrics, machine learning, mathematical analysis, botany, biology
The sign change problem in quantitative structure–activity relationship (QSAR), quantitative structure–property relationship (QSPR) and related studies is the controversy over the signs of a descriptor's correlation coefficient and regression coefficient in univariate and multivariate regressions, before and after the data split. Among 50 regression models with 227 descriptors taken from the literature, the sign change problem was found to occur very frequently according to four new criteria proposed in this work for its assessment. The sign change problem can be substantially reduced, and even eliminated, for a given dataset by statistically based variable selection and by checking for sign changes before model validation and interpretation. Knowing the statistical fundamentals behind the sign change problem, and how to identify and understand it, helps in finding effective remedies for regression models with this deficiency. Copyright © 2010 John Wiley & Sons, Ltd.
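To make the univariate versus multivariate comparison concrete, here is a minimal Python sketch under stated assumptions. It flags a descriptor when the sign of its univariate Pearson correlation coefficient r with the response differs from the sign of its coefficient b in a multiple linear regression fitted by ordinary least squares. This is an assumed, simplified check: it is not one of the four criteria proposed in the paper, and it does not cover the before/after data split comparison. The function names (`pearson_r`, `ols_coefficients`, `sign_changes`) and the toy data are hypothetical.

```python
"""Minimal sketch of a sign-change check for regression descriptors.

Assumption: a descriptor shows a sign change when the sign of its Pearson
correlation with the response (univariate) differs from the sign of its
coefficient in a multiple linear regression (multivariate). This is a
simplified illustration, not the criteria proposed by Kiralj and Ferreira.
"""
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_coefficients(X, y):
    """OLS with an intercept via the normal equations; returns the slopes only."""
    rows = [[1.0] + list(row) for row in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)[1:]  # drop the intercept


def sign_changes(X, y, names):
    """Names of descriptors whose univariate r and multivariate b have opposite signs."""
    b = ols_coefficients(X, y)
    flagged = []
    for j, name in enumerate(names):
        r = pearson_r([row[j] for row in X], y)
        if r * b[j] < 0:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical toy data: x2 is nearly collinear with x1, and y = 2*x1 - x2 exactly.
    X = [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.8], [5, 5.1], [6, 6.0]]
    y = [2 * a - b for a, b in X]
    print(sign_changes(X, y, ["x1", "x2"]))  # → ['x2']
```

In the toy data, x2 correlates positively with y on its own, yet it receives a coefficient of about -1 in the joint fit because it is nearly collinear with x1. This is a simple case of the kind of sign inconsistency the abstract describes; a sign check of this sort can be run before model validation and interpretation.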