The Protein‐Coding Human Genome: Annotating High‐Hanging Fruits
Author(s) -
Hatje Klas,
Mühlhausen Stefanie,
Simm Dominic,
Kollmar Martin
Publication year - 2019
Publication title -
BioEssays
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.175
H-Index - 184
eISSN - 1521-1878
pISSN - 0265-9247
DOI - 10.1002/bies.201900066
Subject(s) - pseudogene, human genome, biology, genome, computational biology, splice, gene, exon, genetics, alternative splicing, annotation, gene prediction, rna splicing, genome project, gene annotation, rna
Abstract - The major transcript variants of human protein‐coding genes are annotated to a certain degree of accuracy by combining manual curation, transcript data, and proteomics evidence. However, there is considerable disagreement on the annotation of about 2000 genes (they can be protein‐coding, noncoding, or pseudogenes) and on the annotation of most of the predicted alternative transcripts. Pure transcriptome mapping approaches seem to be limited in discriminating functional expression from noise. These limitations have partially been overcome by dedicated algorithms to detect alternatively spliced micro‐exons and wobble splice variants. Recently, knowledge about splice mechanisms and protein structure has been incorporated into an algorithm to predict neighboring homologous exons, which are often spliced in a mutually exclusive manner. Predicted exons are evaluated by transcript data, structural compatibility, and evolutionary conservation, revealing hundreds of novel coding exons and splice mechanism re‐assignments. The emerging human pan‐genome necessitates distinctive annotations that incorporate differences between individuals and between populations.