
Hidden Markov models of biological primary sequence information.
Author(s) -
Pierre Baldi,
Yves Chauvin,
Tim Hunkapiller,
Marcella A. McClure
Publication year - 1994
Publication title -
Proceedings of the National Academy of Sciences of the United States of America
Language(s) - English
Resource type - Journals
eISSN - 1091-6490
pISSN - 0027-8424
DOI - 10.1073/pnas.91.3.1059
Subject(s) - hidden Markov model, Markov chain, sequence (biology), Markov model, computer science, pattern recognition (psychology), sequence motif, computational biology, algorithm, artificial intelligence, mathematics, biology, genetics, machine learning, gene
Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN^2) operations, linear in the number of sequences.
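
To illustrate how a trained HMM can assign each residue of a sequence to a model state, which is the step that underlies HMM-based alignment, the following is a minimal sketch of Viterbi decoding for a generic discrete HMM. It is not the authors' implementation: the abstract does not describe the model architecture or the training algorithm, so the two-state model, the two-letter alphabet, and all names (`viterbi`, `log_start`, `log_trans`, `log_emit`) are assumptions chosen only for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for a discrete HMM, in log space."""
    # score[s] = best log-probability of any state path that ends in s
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict per later position: state -> best predecessor
    for symbol in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit[s][symbol]
            pointers[s] = best
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


# Hypothetical toy model: state "A" prefers symbol "a", state "B" prefers "b".
states = ["A", "B"]
log_start = {"A": math.log(0.5), "B": math.log(0.5)}
log_trans = {
    "A": {"A": math.log(0.8), "B": math.log(0.2)},
    "B": {"A": math.log(0.2), "B": math.log(0.8)},
}
log_emit = {
    "A": {"a": math.log(0.9), "b": math.log(0.1)},
    "B": {"a": math.log(0.1), "b": math.log(0.9)},
}

best_log_prob, best_path = viterbi("aabb", states, log_start, log_trans, log_emit)
print(best_path, math.exp(best_log_prob))
```

This generic version costs O(T S^2) for a sequence of length T and S fully connected states. Under the further assumption of a model whose number of states grows with N but whose states each have only a bounded number of incoming transitions, decoding one sequence of length N costs O(N^2), so aligning K sequences independently to the same model costs O(KN^2), consistent with the bound quoted in the abstract.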