Why are there four bases in DNA?
Author(s) - Seybold, Paul G.
Publication year - 2009
Publication title - International Journal of Quantum Chemistry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.484
H-Index - 105
eISSN - 1097-461X
pISSN - 0020-7608
DOI - 10.1002/qua.560100708
Subject(s) - symbol (formal), genetic code, code (set theory), sequence (biology), systematic code, coding (social sciences), maximization, computer science, constraint (computer aided design), variable length code, theoretical computer science, mathematics, arithmetic, algorithm, dna, biology, decoding methods, genetics, code rate, set (abstract data type), statistics, programming language, mathematical optimization, geometry
The universal genetic code of living organisms consists of a linear sequence of four different nucleotides. The evolutionary choice of a four‐symbol code is examined from the perspective of the code's biological roles of information storage and transmission. Maximization of the information content of a linear sequence of symbols under the constraint of a constant number of units in a symbol pool leads to an optimum number of symbol types near three. Complementary (Watson‐Crick) replication requires that the number of symbol types be even. Reasons are presented for the choice of four rather than two symbols in the code. It is suggested that free energy considerations may have played a role in the natural selection of the coding form.
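The abstract does not spell out the optimization, so the following is only a minimal sketch of one common way to pose it, not necessarily the author's formulation. It assumes a pool of M units shared among n symbol types, so that a sequence has length L = M / n and can encode n^L distinct messages. The information content is then (M / n) ln n, which for continuous n peaks at n = e ≈ 2.718, consistent with "an optimum number of symbol types near three". The function names and the cost model n * L = M are assumptions made for illustration.

```python
import math


def info_per_unit(n: int) -> float:
    """Information (in nats) per pool unit for n symbol types.

    Assumed model (not taken from the abstract): a sequence of length L
    drawn from n symbol types costs n * L = M pool units and carries
    L * ln(n) nats, so the information per unit is ln(n) / n.
    """
    return math.log(n) / n


def best_symbol_count(candidates):
    """Return the candidate n with the highest information per unit."""
    return max(candidates, key=info_per_unit)


# Tabulate the score for small alphabets under the assumed model.
for n in range(2, 7):
    print(n, round(info_per_unit(n), 4))

# Integer optimum over all candidates, and over even candidates only
# (complementary replication requires an even number of symbol types).
print("best overall:", best_symbol_count(range(2, 10)))
print("2 and 4 tie:", math.isclose(info_per_unit(2), info_per_unit(4)))
```

Under this assumed model the best integer alphabet size is 3, while 2 and 4 score identically because ln 4 / 4 = ln 2 / 2. The measure alone therefore cannot separate the two smallest even alphabets, which fits the abstract's statement that further reasons are given for the choice of four over two.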