Availability of short amino acid sequences in proteins

Author(s) - Otaki Joji M., Ienaka Shunsuke, Gotoh Tomonori, Yamamoto Haruhiko

Publication year - 2005

Publication title - Protein Science

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.353

H-Index - 175

eISSN - 1469-896X

pISSN - 0961-8368

DOI - 10.1110/ps.041092605

Subject(s) - proteome, context (archaeology), computational biology, structural classification of proteins database, biology, amino acid, similarity (geometry), protein structure database, population, fusion protein, escherichia coli, database, protein structure, biochemistry, computer science, sequence database, gene, artificial intelligence, paleontology, demography, sociology, image (mathematics), recombinant dna

Much attention is being paid to protein databases as an important information source for proteome research. Although used extensively for similarity searches, protein databases themselves have not fully been characterized. In a systematic attempt to reveal protein‐database characters that could contribute to revealing how protein chains are constructed, frequency distributions of all possible combinatorial sets of three, four, and five amino acids (“triplets,” “quartets,” and “pentats”; collectively called constituent sequences) have been examined in the nonredundant (nr) protein database, demonstrating the existence of nonrandom bias in their “availability” at the population level. Nonexistent short sequences of pentats were found that showed low availability in biological proteins against their expected probabilities of occurrence. Among them, six representative ones were successfully synthesized as peptides with reasonably high yields in a conventional Fmoc method, excluding the possibility that a putative physicochemical energy barrier in forming them could be a direct cause for the low availability. They were also expressed as soluble fusion proteins in a conventional Escherichia coli BL21Star(DE3) system with reasonably high yield, again excluding a possible difficulty in their biological synthesis. Together, these results suggest that information on three‐dimensional structures and functions of proteins exists in the context of connections of short constituent sequences, and that proteins are composed of evolutionarily selected constituent sequences, which are reflected in their availability differences in the database. These results may have biological implications for protein structural studies.
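The comparison described in the abstract, observed frequencies of short constituent sequences against their expected probabilities of occurrence, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes, for illustration only, that the expected count of a constituent sequence is the product of its single-residue frequencies multiplied by the number of windows scanned. The function names (`count_kmers`, `residue_frequencies`, `availability`) and the toy sequences are hypothetical; a real analysis would read the nr database instead.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_SET = set(AMINO_ACIDS)


def count_kmers(sequences, k):
    """Count every overlapping window of length k made only of standard residues."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if all(c in AA_SET for c in kmer):
                counts[kmer] += 1
    return counts


def residue_frequencies(sequences):
    """Relative frequency of each standard amino acid across all sequences."""
    counts = Counter(c for seq in sequences for c in seq if c in AA_SET)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def availability(sequences, k):
    """Map each possible k-mer to (observed count, expected count).

    Assumption (not from the source): residues are treated as independent,
    so the expected probability is the product of residue frequencies.
    """
    observed = count_kmers(sequences, k)
    freqs = residue_frequencies(sequences)
    n_windows = sum(observed.values())
    table = {}
    for letters in product(AMINO_ACIDS, repeat=k):
        kmer = "".join(letters)
        prob = 1.0
        for c in letters:
            prob *= freqs[c]
        table[kmer] = (observed.get(kmer, 0), prob * n_windows)
    return table


# Toy input standing in for a protein database (hypothetical sequences).
toy_sequences = ["MKVLAAGIVGLLLA", "MKKLLAAGG"]
triplet_table = availability(toy_sequences, 3)

# Triplets that never occur, ranked by how strongly they were expected.
absent = sorted(
    (kmer for kmer, (obs, _) in triplet_table.items() if obs == 0),
    key=lambda kmer: triplet_table[kmer][1],
    reverse=True,
)
```

With k = 5 the table holds 20^5 = 3,200,000 entries, so a database-scale run would likely need a more compact representation than a Python dictionary. Nonexistent sequences with a high expected count are the kind of low-availability candidates the abstract refers to, under the independence assumption stated above.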
