Information‐theoretical entropy as a measure of sequence variability
Author(s) -
Shenkin Peter S.,
Erman Batu,
Mastrandrea Lucy D.
Publication year - 1991
Publication title -
Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.340110408
Subject(s) - entropy (arrow of time) , measure (data warehouse) , mathematics , rank (graph theory) , sequence (biology) , position (finance) , information theory , statistical physics , combinatorics , statistics , computer science , biology , genetics , physics , data mining , finance , quantum mechanics , economics
We propose the use of the information‐theoretical entropy, $S = -\sum_i p_i \log_2 p_i$, as a measure of variability at a given position in a set of aligned sequences. Here $p_i$ is the fraction of times the $i$-th type appears at a position. For protein sequences, the sum has up to 20 terms; for nucleotide sequences, up to 4 terms; and for codon sequences, up to 61 terms. We compare $S$ and $V_S$, a related measure, in detail with $V_K$, the traditional measure of immunoglobulin sequence variability, both in the abstract and as applied to the immunoglobulins. We conclude that $S$ has desirable mathematical properties that $V_K$ lacks and has intuitive and statistical meanings that accord well with the notion of variability. We find that $V_K$ and the $S$‐based measures are highly correlated for the immunoglobulins. We show by analysis of sequence data and by means of a mathematical model that this correlation is due to a strong tendency for the frequency of occurrence of amino acid types at a given position to be log‐linear. It is not known whether the immunoglobulins are typical or atypical of protein families in this regard, nor is the origin of the observed rank‐frequency distribution obvious, although we discuss several possible etiologies.
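As an illustration of the entropy measure defined in the abstract, the following is a minimal Python sketch that computes S for each column of a small alignment. The function names, the toy alignment, and the decision to count every character (including any gap symbol) as a type are assumptions made for this example and do not come from the paper. The abstract also does not give a formula for V_K; the sketch assumes it is the Wu-Kabat variability, taken here as the number of distinct types multiplied by the number of sequences and divided by the count of the most common type. V_S is not defined in the abstract and is left out.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Entropy S = -sum_i p_i * log2(p_i) of one alignment column, in bits.

    p_i is the fraction of sequences carrying the i-th type at this position.
    Written as sum(p * log2(1/p)), which is the same quantity and returns
    0.0 rather than -0.0 for a fully conserved column.
    """
    counts = Counter(column)
    total = sum(counts.values())
    return sum((n / total) * log2(total / n) for n in counts.values())


def wu_kabat_variability(column):
    """ASSUMED definition of V_K (Wu-Kabat), not stated in the abstract:
    (number of distinct types) * (number of sequences) / (count of most common type).
    """
    counts = Counter(column)
    total = sum(counts.values())
    return len(counts) * total / max(counts.values())


# Hypothetical toy alignment: four aligned sequences of length four.
alignment = ["ACDA", "ACEA", "ACFG", "ACGG"]

# Transpose the rows into columns, one tuple of characters per position.
columns = list(zip(*alignment))

entropies = [column_entropy(col) for col in columns]
variabilities = [wu_kabat_variability(col) for col in columns]

for position, (s, vk) in enumerate(zip(entropies, variabilities), start=1):
    print(f"position {position}: S = {s:.3f} bits, V_K = {vk:.1f}")
```

A fully conserved column gives S = 0, and a column in which all k observed types are equally frequent gives S = log2(k). The upper bounds are therefore log2(20) ≈ 4.32 bits for protein sequences, 2 bits for nucleotide sequences, and log2(61) ≈ 5.93 bits for codon sequences.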
