Using programmatic motifs and genetic programming to classify protein sequences as to cellular location
Author(s) - John R. Koza, Forrest H. Bennett, David André
Publication year - 1998
Publication title - Lecture Notes in Computer Science
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.249
H-Index - 400
eISSN - 1611-3349
pISSN - 0302-9743
ISBN - 3-540-64891-7
DOI - 10.1007/bfb0040796
Subject(s) - genetic programming, subroutine, computer science, protein motif, artificial intelligence, computational biology, theoretical computer science, programming language, biology
As newly sequenced proteins are deposited into the world's ever-growing archives, they are typically tested immediately by various algorithms for clues to their biological structure and function. One question about a new protein is its cellular location, that is, where the protein resides in a living organism (e.g., extracellular, membrane, nuclear). A human-created five-way algorithm for cellular location, based on statistical techniques, was recently reported with 76% accuracy. This paper describes two-way classification programs, evolved using genetic programming, that achieve 83% accuracy in determining whether a protein is extracellular, 84% for nuclear proteins, 89% for membrane proteins, and 83% for anchored membrane proteins. Unlike the statistical calculation, the genetically evolved programs employ a large and varied arsenal of computational capabilities, including arithmetic functions, conditional operations, subroutines, iterations, named memory, indexed memory, set-creating operations, and look-ahead. A genetically evolved classification program can be viewed as an extension of the conventional notion of a protein motif, which the authors call a programmatic motif.
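To make the contrast between a conventional protein motif and a "programmatic motif" concrete, here is a minimal Python sketch under stated assumptions. The conventional motif is a fixed pattern match (the PROSITE N-glycosylation pattern N-{P}-[ST]-{P}, used only as a familiar example). The programmatic motif is a small hand-written program that uses several of the capabilities the abstract lists: iteration over residues, a residue set, look-ahead, named memory, conditionals, and arithmetic. It is *not* a program evolved in the paper; the `HYDROPHOBIC` set, the memory names `M0`/`M1`, the scoring rule, and the 0.5 threshold are all hypothetical. The `accuracy` helper assumes accuracy means the simple fraction of correctly classified proteins.

```python
import re
from typing import Sequence

# Conventional motif: a fixed pattern. This is the PROSITE N-glycosylation
# pattern N-{P}-[ST]-{P}, written as a regular expression.
GLYCOSYLATION = re.compile(r"N[^P][ST][^P]")


def conventional_motif(seq: str) -> bool:
    """True if the fixed pattern occurs anywhere in the sequence."""
    return GLYCOSYLATION.search(seq) is not None


# Hypothetical residue set; in the paper such sets are created by evolved
# set-creating operations rather than chosen by hand.
HYDROPHOBIC = frozenset("AILMFVW")


def programmatic_motif(seq: str) -> bool:
    """Illustrative two-way classifier (True = 'in class').

    The residue set, scoring rule and 0.5 threshold are invented for
    illustration; this is not a program evolved in the paper.
    """
    if not seq:
        return False
    memory = {"M0": 0.0, "M1": 0.0}  # named memory cells
    for i, residue in enumerate(seq):  # iteration over residues
        in_set = residue in HYDROPHOBIC  # set membership test
        # look-ahead: inspect the following residue, if there is one
        next_in_set = i + 1 < len(seq) and seq[i + 1] in HYDROPHOBIC
        if in_set and next_in_set:  # conditional operation
            memory["M0"] += 1.0
        if in_set:
            memory["M1"] += 1.0
    # arithmetic on the memory cells yields a score, then a binary decision
    score = (memory["M0"] + memory["M1"]) / len(seq)
    return score > 0.5


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of proteins whose predicted class matches the true class."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty sequences")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

The design point is that `conventional_motif` can only answer whether a fixed pattern is present, whereas `programmatic_motif` computes over the whole sequence, carrying state between residues before making a yes/no decision. A two-way classifier of this shape would be scored with `accuracy` against known labels, one classifier per location class.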