Feature selection based on graph Laplacian by using compounds with known and unknown activities
Author(s) - Sheikhpour Razieh, Sarram Mehdi Agha, Gharaghani Sajjad, Chahooki Mohammad Ali Zare
Publication year - 2017
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2899
Subject(s) - quantitative structure–activity relationship, feature selection, artificial intelligence, molecular descriptor, graph, pattern recognition (psychology), laplacian matrix, computer science, kernel (algebra), mathematics, machine learning, data mining, theoretical computer science, combinatorics
A semisupervised feature selection method based on the graph Laplacian (S²FSGL) was proposed for quantitative structure-activity relationship (QSAR) models. It uses an ℓ2,1-norm and compounds with both known and unknown activities. The S²FSGL method constructs two graphs, G_unsup and G_sup, and selects the most important descriptors by using the label information of compounds with known activities together with the local structure of compounds with known and unknown activities. The weight matrix of graph G_unsup models the local structure of the compounds with known and unknown activities. S²FSGL uses the ℓ2,1-norm to take the correlation between different descriptors into account during descriptor selection. The performance of S²FSGL coupled with a kernel smoother model was evaluated on two QSAR data sets and compared with that of other feature selection methods. To evaluate the QSAR models and the selected descriptors, several different training and test sets were generated for each data set. Comparing the statistical parameters of QSAR models built with the semisupervised feature selection method against those obtained with other feature selection methods showed that S²FSGL was superior in selecting the most relevant descriptors. The results indicate that using compounds with unknown activities alongside compounds with known activities can help select the relevant descriptors of QSAR models.
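The abstract does not state the S²FSGL objective, so the following is only a rough illustration of the general idea, not the authors' method. It assumes a common Laplacian-regularized, ℓ2,1-penalized least-squares formulation, min_W ||X_l W - Y||_F^2 + alpha * tr(W^T X^T L X W) + beta * ||W||_{2,1}, where X holds all compounds (known and unknown activities), X_l holds only the compounds with known activities, and L is the Laplacian of a single kNN heat-kernel graph built over all compounds. It uses one graph rather than the paper's G_unsup and G_sup, and every name and hyperparameter here (`knn_heat_graph`, `semisup_l21_select`, `alpha`, `beta`, `k`, `sigma`) is hypothetical. The problem is solved by standard iterative reweighting, and descriptors are ranked by the Euclidean norms of the rows of W.

```python
import numpy as np


def knn_heat_graph(X, k=5, sigma=1.0):
    """Symmetric kNN affinity matrix with heat-kernel weights (assumed construction)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at 0 to absorb rounding error.
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    S = np.zeros((n, n))
    for i in range(n):
        d = dist2[i].copy()
        d[i] = np.inf  # exclude the compound itself from its own neighbors
        nbrs = np.argsort(d)[:k]
        S[i, nbrs] = np.exp(-dist2[i, nbrs] / (2.0 * sigma ** 2))
    # Keep an edge if either endpoint lists the other among its k nearest neighbors.
    return np.maximum(S, S.T)


def laplacian(S):
    """Unnormalized graph Laplacian L = D - S, with D the diagonal degree matrix."""
    return np.diag(S.sum(axis=1)) - S


def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (one row per descriptor)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def semisup_l21_select(X, labeled_idx, y, alpha=0.1, beta=0.1,
                       k=5, sigma=1.0, n_iter=50, eps=1e-8):
    """Rank descriptors by minimizing the hypothetical objective
        ||X_l W - Y||_F^2 + alpha * tr(W^T X^T L X W) + beta * ||W||_{2,1}
    via iterative reweighting. X holds all compounds; y holds known activities only.
    """
    X_l = X[labeled_idx]
    Y = np.asarray(y, dtype=float).reshape(len(labeled_idx), -1)
    # The graph is built on ALL compounds, so unlabeled ones shape L.
    L = laplacian(knn_heat_graph(X, k, sigma))
    A = X_l.T @ X_l + alpha * (X.T @ L @ X)
    b = X_l.T @ Y
    D = np.eye(X.shape[1])
    W = np.linalg.solve(A + beta * D, b)
    for _ in range(n_iter):
        # Reweighting: D_ii = 1 / (2 * ||row i of W||), with eps guarding rows near 0.
        D = np.diag(1.0 / (2.0 * np.maximum(np.linalg.norm(W, axis=1), eps)))
        W = np.linalg.solve(A + beta * D, b)
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(-scores), scores


# Toy demo on synthetic data: the activity depends on descriptors 0 and 1 only,
# and only the first 20 of 60 compounds have a known activity.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.05 * rng.normal(size=60)
labeled = np.arange(20)
ranking, scores = semisup_l21_select(X, labeled, y[labeled])
print("descriptor ranking:", ranking)
```

With a single activity column, each row of W has one entry, so the ℓ2,1 penalty reduces to an ℓ1 (lasso-type) penalty; the row-wise grouping only matters when Y has several columns. The kernel smoother model that the abstract pairs with S²FSGL is not sketched here.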