Toward more meaningful hierarchical classification of protein three‐dimensional structures
Author(s) - May, Alex C.W.
Publication year - 1999
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/(sici)1097-0134(19991001)37:1<20::aid-prot3>3.0.co;2-v
Subject(s) - mathematics, protein family, similarity measure, hierarchical clustering, jackknife resampling, tree structure, cluster analysis, similarity (geometry), pattern recognition (psychology), robustness (evolution), artificial intelligence, computer science, combinatorics, statistics, biology, biochemistry, estimator, gene, binary tree, image (mathematics)
Recently, several hierarchical classifications of protein three‐dimensional (3‐D) structures have been published. However, none of them provides any assessment of the validity of a hierarchical representation or tests the individual clusters contained within. In fact, testing here of published trees reveals that they vary in meaning. Protein structure similarity measures are then assessed in terms of the robustness of the resulting trees for 24 protein families. A meaningful tree is defined as one in which all the clusters are found to be reliable according to a jackknife test. With the use of this criterion, a previously published similarity measure described as a “better RMS” is shown in fact to be usually less suited to protein fold classification than normal RMS after superposition. Here the “best” protein structure similarity measure for hierarchical classification, defined as the one that after clustering produces the highest number of meaningful trees (20 for the 24 families), is found to be a new one. This measure includes information on the relationship of a distance at a given aligned position in a pair to the rest of the unique distances at that position in a protein family. There are only 2 families of the 24 tested, the globins (3 trees) and Kazal‐type serine proteinase inhibitors (21 trees), in which the topology (branching order) of the meaningful 3‐D structure‐based trees is constant. Thus, a new view of protein family sequence‐structure relationships is afforded by comparing meaningful trees for each family. More generally, there is a need for care in interpretation of the results of those molecular biology algorithms that force a tree structure on data without assessing its applicability. Proteins 1999;37:20–29. © 1999 Wiley‐Liss, Inc.
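The idea of a "meaningful" tree can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm or which exact jackknife procedure was used, so everything below is an assumption made for illustration: average-linkage clustering, a leave-one-out jackknife, and the rule that a cluster counts as reliable only if it reappears (minus the omitted member) in every reduced tree. The function names `upgma_clusters`, `jackknife_report`, and `is_meaningful`, and the toy distances, are hypothetical and do not come from the paper.

```python
from itertools import combinations


def upgma_clusters(dist, labels):
    """Average-linkage clustering on a dict-of-dicts distance matrix.

    Returns the set of merged clusters (frozensets of labels), root included.
    """
    active = [frozenset([x]) for x in labels]
    found = set()

    def mean_distance(a, b):
        # Mean of all pairwise distances between members of a and b.
        return sum(dist[x][y] for x in a for y in b) / (len(a) * len(b))

    while len(active) > 1:
        a, b = min(combinations(active, 2), key=lambda p: mean_distance(*p))
        merged = a | b
        active = [c for c in active if c not in (a, b)] + [merged]
        found.add(merged)
    return found


def jackknife_report(dist, labels):
    """Map each non-root cluster of the full tree to True/False (reliable or not).

    Assumed criterion: after removing any single member of the family and
    re-clustering, the cluster (without that member) must still be present.
    """
    full = upgma_clusters(dist, labels)
    reduced_trees = {
        x: upgma_clusters(dist, [y for y in labels if y != x]) for x in labels
    }
    report = {}
    for cluster in full:
        if len(cluster) == len(labels):
            continue  # the root contains everything and is not tested
        report[cluster] = all(
            len(cluster - {x}) < 2 or (cluster - {x}) in reduced_trees[x]
            for x in labels
        )
    return report


def is_meaningful(dist, labels):
    """A tree is 'meaningful' here if every tested cluster is reliable."""
    return all(jackknife_report(dist, labels).values())


def distances_from_positions(pos):
    """Toy helper: build a distance matrix from 1-D coordinates."""
    return {x: {y: abs(pos[x] - pos[y]) for y in pos} for x in pos}


labels = ["A", "B", "C", "D"]

# Two well-separated pairs: both clusters survive every deletion.
two_groups = distances_from_positions({"A": 0, "B": 1, "C": 50, "D": 51})

# Nearly evenly spaced members: the {C, D} cluster depends on A being present.
chain = distances_from_positions({"A": 0, "B": 10, "C": 21, "D": 33})

print(is_meaningful(two_groups, labels))
print(is_meaningful(chain, labels))
print(jackknife_report(chain, labels))
```

In the second toy family the full tree groups C with D, but removing A causes B and C to pair instead, so that cluster fails the test and the tree is not counted as meaningful under these assumptions. A real analysis would replace the toy coordinates with pairwise structural distances (for example, RMS after superposition) and might use a different reliability threshold.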