Using substitution probabilities to improve position-specific scoring matrices
Author(s) - Jorja G. Henikoff, Steven Henikoff
Publication year - 1996
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/12.2.135
Each column of amino acids in a multiple alignment of protein sequences can be represented as a vector of 20 amino acid counts. For alignment and searching applications, the count vector is an imperfect representation of a position, because the observed sequences are an incomplete sample of the full set of related sequences. One general solution to this problem is to model unobserved sequences by adding artificial 'pseudo-counts' to the observed counts. We introduce a simple method for computing pseudo-counts that combines the diversity observed in each alignment position with amino acid substitution probabilities. In extensive empirical tests, this position-based method outperformed other pseudo-count methods and was a substantial improvement over the traditional average-score method used for constructing profiles.
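The abstract does not state the formula, so the following is only a minimal Python sketch of position-based pseudo-counts under stated assumptions. It assumes the pseudo-count for residue i in a column is b_i = B * sum_j (n_j / N) * q_ij / q_j, where n_j are the observed counts, N their total, q_ij a symmetric table of joint substitution probabilities (for example BLOSUM target frequencies) and q_j its marginal. It also assumes the total pseudo-count B = m * R grows with the column's diversity R (the number of distinct residues observed), with m = 5 as a tunable constant. The function names, the three-letter alphabet and its joint table are hypothetical, chosen so the arithmetic is easy to check. A real profile would use the 20 amino acids and published substitution frequencies.

```python
import math
from collections import Counter

def count_vector(column, alphabet):
    """Count each residue of `alphabet` in one alignment column; gaps and other symbols are ignored."""
    counts = Counter(column)
    return [counts.get(a, 0) for a in alphabet]

def position_based_probs(counts, joint, m=5.0):
    """Estimate residue probabilities for one column using substitution-based pseudo-counts.

    Assumptions (not taken from the abstract): joint[i][j] is a symmetric table of
    joint substitution probabilities q_ij summing to 1, and the total pseudo-count
    is B = m * R, where R is the number of distinct residues observed in the column.
    """
    n = len(counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("column contains no residues from the alphabet")
    marginal = [sum(joint[i][j] for i in range(n)) for j in range(n)]  # q_j
    distinct = sum(1 for c in counts if c > 0)                          # R
    big_b = m * distinct                                                # B
    probs = []
    for i in range(n):
        # b_i = B * sum_j (n_j / N) * P(i | j), with P(i | j) = q_ij / q_j
        b_i = big_b * sum(counts[j] / total * joint[i][j] / marginal[j] for j in range(n))
        probs.append((counts[i] + b_i) / (total + big_b))
    return probs

def log_odds_scores(probs, background):
    """Convert column probabilities into log2-odds scores against background frequencies."""
    return [math.log2(p / q) for p, q in zip(probs, background)]

# Hypothetical toy example: a 3-letter alphabet instead of the 20 amino acids.
alphabet = "ACD"
joint = [
    [0.20, 0.05, 0.05],
    [0.05, 0.20, 0.05],
    [0.05, 0.05, 0.30],
]
background = [sum(row) for row in joint]  # marginal frequencies q_i: 0.3, 0.3, 0.4

counts = count_vector("AA-AC", alphabet)  # [3, 1, 0]; the gap is ignored
probs = position_based_probs(counts, joint)
scores = log_odds_scores(probs, background)
print([round(p, 3) for p in probs])  # → [0.601, 0.28, 0.119]
```

Because the pseudo-counts sum to B, the probabilities still sum to 1. A residue never seen in the column (D here) nevertheless receives a nonzero probability, in proportion to how readily the observed residues substitute for it, and a more diverse column receives more pseudo-counts overall under the B = m * R assumption.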