Feature selection for genetic sequence classification.
Author(s) - Nadia Chuzhanova, Alan J. Jones, Steve Margetts
Publication year - 1998
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/14.2.139
Subject(s) - pattern recognition (psychology), feature (linguistics), sequence (biology), feature selection, feature vector, k nearest neighbors algorithm, artificial intelligence, selection (genetic algorithm), phylogenetic tree, computer science, a priori and a posteriori, mathematics, data mining, biology, genetics, philosophy, linguistics, epistemology, gene
Most existing methods for genetic sequence classification are based on a computer search for homologies in nucleotide or amino acid sequences. Standard sequence alignment programs scale very poorly as the number of sequences increases or when the degree of sequence identity falls below 30%. Some new, computationally inexpensive methods based on nucleotide or amino acid compositional analysis have been proposed, but their prediction results are still unsatisfactory and depend on the features chosen to represent the sequences.
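As a rough illustration of what a compositional representation can look like, the sketch below turns a nucleotide sequence into a fixed-length vector of k-mer relative frequencies. This is only an assumed example of one common compositional feature; the function name `kmer_composition`, the choice of k = 2, and the handling of ambiguous symbols are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq, k=2, alphabet="ACGT"):
    """Return relative frequencies of all k-mers over `alphabet`, in a fixed order.

    Hypothetical helper for illustration only; not the feature set used by the authors.
    """
    seq = seq.upper()
    # Enumerate every possible k-mer so that all sequences map to the same vector layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count overlapping windows of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing symbols outside the alphabet (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


# Dinucleotide composition of a short example sequence (16 features).
vec = kmer_composition("ACGTACGT", k=2)
```

Vectors of this kind can be fed to a distance-based classifier such as k nearest neighbours; which k-mers are kept as features is exactly the sort of choice the abstract says prediction quality depends on.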