Fast Protein Superfamily Classification Using Principal Component Null Space Analysis
Author(s) -
Leon French,
Alioune Ngom,
Luis Rueda
Publication year - 2005
Publication title -
lecture notes in computer science
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.249
H-Index - 400
eISSN - 1611-3349
pISSN - 0302-9743
ISBN - 3-540-25864-7
DOI - 10.1007/11424918_17
Subject(s) - computer science , principal component analysis , artificial intelligence , curse of dimensionality , classifier (uml) , artificial neural network , machine learning , pattern recognition (psychology) , support vector machine , data mining
The protein family classification problem, which consists of determining the family memberships of given unknown protein sequences, is very important for a biologist for many practical reasons, such as drug discovery, prediction of molecular functions and medical diagnosis Neural networks and bayesian methods have performed well on the protein classification problem, achieving accuracy ranging from 90% to 98% while running relatively slowly in the learning stage In this paper, we present a principal component null space analysis (PCNSA) linear classifier to the problem and report excellent results compared to those of neural networks and support vector machines The two main parameters of PCNSA are linked to the high dimensionality of the dataset used, and were optimized in an exhaustive manner to maximize accuracy.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom