Deriving the phylogenetic information from some physicochemical properties of protein sequences computed
Author(s) -
Chiu ShihHau,
Chen ChienChi,
Yuan GwoFang,
Lin ThyHou
Publication year - 2010
Publication title -
Journal of Computational Chemistry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.907
H-Index - 188
eISSN - 1096-987X
pISSN - 0192-8651
DOI - 10.1002/jcc.21599
Subject(s) - phylogenetic tree, support vector machine, phylogenetic nomenclature, phylogenetics, classifier (uml), computational biology, genome, artificial intelligence, feature selection, pattern recognition (psychology), biology, gene, computer science, evolutionary biology, genetics, clade
The evolutionary relationships among organisms are traditionally delineated by alignment‐based methods applied to DNA or protein sequences. In the post‐genome era, phylogeny can be inferred from many sources, such as genomic features, and not just from the comparison of one or a few genes. To investigate whether the physicochemical properties of protein sequences reflect their phylogenetic properties, an alignment‐free method using a support vector machine (SVM) classifier is implemented to establish the phylogenetic relationships between protein sequences. Two types of datasets are used to train the SVM classifiers: “Enzymatic” (proteins assigned an EC accession) and “Proteins”. Using the F‐score for feature selection, we find that the classification accuracies of the trained SVM classifiers are significantly enhanced, to 84% for the “Enzymatic” dataset and 80% for the “Proteins” dataset, when the protein sequences are represented by the top 255 selected features. These results show that the selected physicochemical features of amino acid sequences are sufficient for inferring the phylogenetic properties of the protein sequences. Moreover, the selected physicochemical features appear to correlate with the physiological characteristics of the taxonomic classes classified. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010
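The abstract does not state which F‐score formula was used, so the following is only a minimal sketch of F‐score feature ranking. It assumes the common two‐class definition: for each feature, the squared distances of the two class means from the overall mean, divided by the sum of the two within‐class sample variances. The function names `f_scores` and `top_k_features` and the toy data are hypothetical. A multi‐class taxonomic problem would need an extension such as one‐vs‐rest scoring, which is not shown here.

```python
from statistics import mean, variance


def f_scores(X, y):
    """Return one F-score per feature column for a two-class dataset.

    X is a list of equal-length numeric feature vectors and y the matching
    labels. Assumption: exactly two distinct labels, each with at least two
    samples (the sample variance needs two or more points).
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")
    neg = [row for row, lab in zip(X, y) if lab == labels[0]]
    pos = [row for row, lab in zip(X, y) if lab == labels[1]]

    scores = []
    for j in range(len(X[0])):
        all_j = [row[j] for row in X]
        pos_j = [row[j] for row in pos]
        neg_j = [row[j] for row in neg]
        # Numerator: how far each class mean sits from the overall mean.
        between = (mean(pos_j) - mean(all_j)) ** 2 + (mean(neg_j) - mean(all_j)) ** 2
        # Denominator: within-class sample variances (n - 1 in the divisor).
        within = variance(pos_j) + variance(neg_j)
        scores.append(between / within if within > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Return the indices of the k features with the highest F-score."""
    scores = f_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 1.0], [9.0, 4.0], [10.0, 2.0]]
y = [0, 0, 1, 1]
selected = top_k_features(X, y, k=1)
```

In a workflow like the one the abstract describes, the retained column indices (255 of them in the paper) would be used to reduce each sequence's feature vector before training the SVM classifier.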