Singular value decomposition analysis of protein sequence alignment score data
Author(s) - Fogolari F., Tessari S., Molinari H.
Publication year - 2001
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.10032
Subject(s) - singular value decomposition, cluster analysis, computer science, computational biology, data mining, pairwise comparison, pattern recognition (psychology), sequence analysis, dimensionality reduction, bioinformatics, algorithm, artificial intelligence, biology, genetics, gene
One of the standard tools for the analysis of data arranged in matrix form is singular value decomposition (SVD). Few applications to genomic data have been reported to date, mainly for the analysis of gene expression microarray data. We review SVD properties, examine the mathematical terms and assumptions implicit in the SVD formalism, and show that SVD can be applied to the analysis of matrices representing pairwise alignment scores between large sets of protein sequences. In particular, we illustrate SVD capabilities for data dimension reduction and for clustering protein sequences. SVD-generated clusters of proteins are compared with the annotations reported in the SWISS-PROT database for a set of protein sequences forming the calycin superfamily, comprising all entries that match the lipocalin, cytosolic fatty acid-binding protein, and avidin–streptavidin Prosite patterns. Proteins 2002;46:161–170. © 2001 Wiley-Liss, Inc.
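The abstract describes applying SVD to a matrix of pairwise alignment scores for dimension reduction and clustering. The following is a minimal sketch under stated assumptions, not the authors' implementation. The score matrix is an invented 6 x 6 toy example, not data from the paper. The SVD is computed with one-sided (Hestenes) Jacobi rotations, a standard algorithm chosen here only to avoid external dependencies, since the abstract does not name a numerical routine. The clustering rule `cluster_by_component`, which assigns each sequence to the leading singular vector on which it has the largest absolute loading, is a hypothetical illustration and not the paper's clustering criterion.

```python
import math


def svd_jacobi(a, max_sweeps=100, eps=1e-12):
    """One-sided (Hestenes) Jacobi SVD of an m x n matrix given as a list of rows.

    Returns (U, S, V): U is m x n, S lists the n singular values in descending
    order, V is n x n, and a ~= U diag(S) V^T.
    """
    m, n = len(a), len(a[0])
    w = [[float(x) for x in row] for row in a]  # working copy; its columns get orthogonalised
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(w[k][p] ** 2 for k in range(m))
                beta = sum(w[k][q] ** 2 for k in range(m))
                gamma = sum(w[k][p] * w[k][q] for k in range(m))
                if abs(gamma) <= eps * math.sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal
                rotated = True
                # Rotation angle that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = (1.0 if zeta >= 0 else -1.0) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for mat in (w, v):  # apply the same plane rotation to W and V
                    for row in mat:
                        rp, rq = row[p], row[q]
                        row[p] = c * rp - s * rq
                        row[q] = s * rp + c * rq
        if not rotated:
            break
    # After convergence W = U diag(S): column norms are the singular values.
    sigma = [math.sqrt(sum(w[k][j] ** 2 for k in range(m))) for j in range(n)]
    order = sorted(range(n), key=lambda j: -sigma[j])
    s_sorted = [sigma[j] for j in order]
    u = [[w[k][j] / sigma[j] if sigma[j] > 0 else 0.0 for j in order] for k in range(m)]
    v_sorted = [[v[i][j] for j in order] for i in range(n)]
    return u, s_sorted, v_sorted


def rank_k_approx(u, s, v, k):
    """Dimension reduction: keep the k largest triplets, A_k = sum_j s_j u_j v_j^T."""
    m, n = len(u), len(v)
    return [[sum(s[j] * u[i][j] * v[l][j] for j in range(k)) for l in range(n)]
            for i in range(m)]


def cluster_by_component(u, k):
    """Hypothetical rule: put sequence i in the component j < k with the largest |U[i][j]|."""
    return [max(range(k), key=lambda j: abs(row[j])) for row in u]


# Invented toy "alignment score" matrix: sequences 0-2 and 3-5 form two families
# with high within-family scores and low cross-family scores.
scores = [
    [10, 8, 8, 1, 1, 1],
    [8, 10, 8, 1, 1, 1],
    [8, 8, 10, 1, 1, 1],
    [1, 1, 1, 9, 6, 6],
    [1, 1, 1, 6, 9, 6],
    [1, 1, 1, 6, 6, 9],
]

if __name__ == "__main__":
    U, S, V = svd_jacobi(scores)
    print("singular values:", [round(x, 3) for x in S])
    print("clusters:", cluster_by_component(U, 2))
```

On this toy matrix the two largest singular values are well separated from the rest, so a rank-2 approximation retains the family structure, and its Frobenius-norm error equals the square root of the sum of the squared discarded singular values (Eckart-Young). For realistic matrices of many sequences, a library routine such as `numpy.linalg.svd` would be the practical choice in place of `svd_jacobi`.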