The path of no return—Truncated protein N‐termini and current ignorance of their genesis
Author(s) - Fortelny Nikolaus, Pavlidis Paul, Overall Christopher M.
Publication year - 2015
Publication title - Proteomics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.26
H-Index - 167
eISSN - 1615-9861
pISSN - 1615-9853
DOI - 10.1002/pmic.201500043
Subject(s) - annotation, proteome, biology, computational biology, human proteome project, protein function, rna splicing, alternative splicing, cleavage (geology), proteomics, genetics, bioinformatics, gene, messenger rna, rna, paleontology, fracture (geology)
Almost all regulatory processes in biology ultimately lead to or originate from modifications of protein function. However, it is unclear to what extent each regulatory mechanism actually affects proteins and thus phenotypes. We assessed the extent of N‐terminal protein truncation in a global analysis of N‐terminomics data and found that most proteins have N‐terminally truncated proteoforms. Because N‐terminomics analyses do not identify the process that generated each identified N‐terminus, we compared the identified termini with the three N‐terminus‐generating events (protein cleavage, alternative translation, and alternative splicing) to identify the most likely cause of N‐terminal protein truncations in the human proteome. We found that protease cleavage and alternative translation are the likely causes of most shortened proteoforms. However, the vast majority (about 90%) of N‐termini remain unexplained by any of the processes identified to date, revealing large gaps in our knowledge of protein termini and their genesis. Further analysis and annotation of terminomics data are required; to this end we have created the TopFIND database, a major systematic annotation effort for protein termini. We outline the new features in version 3.0 of the database and the new bioinformatics tools available, and we encourage submission of generated data to fill current knowledge gaps.
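
The comparison described in the abstract, matching each observed N‐terminus against the three candidate processes and counting what is left unexplained, can be illustrated with a minimal sketch. This is not the authors' pipeline and does not use the TopFIND interface: the protein identifiers, positions, lookup tables, and function names below are all hypothetical and chosen only to show the bookkeeping.

```python
from collections import Counter

# Hypothetical annotations (made-up identifiers and positions):
# protein id -> residue positions where each process could create an N-terminus.
CLEAVAGE_SITES = {"PROT_A": {25, 110}}
ALT_TRANSLATION_STARTS = {"PROT_A": {40}}
ALT_SPLICING_STARTS = {"PROT_B": {73}}

SOURCES = [
    ("protease cleavage", CLEAVAGE_SITES),
    ("alternative translation", ALT_TRANSLATION_STARTS),
    ("alternative splicing", ALT_SPLICING_STARTS),
]


def explain_terminus(protein_id, position):
    """Return the names of all processes whose annotations match this terminus."""
    return [
        name
        for name, table in SOURCES
        if position in table.get(protein_id, set())
    ]


def summarize(observed):
    """Count observed termini per explaining process, plus the unexplained ones."""
    counts = Counter()
    for protein_id, position in observed:
        matches = explain_terminus(protein_id, position)
        if not matches:
            counts["unexplained"] += 1
        for name in matches:
            counts[name] += 1
    return counts


# Hypothetical observed N-termini as (protein id, start position) pairs.
observed_termini = [
    ("PROT_A", 25),
    ("PROT_A", 40),
    ("PROT_A", 57),
    ("PROT_B", 73),
    ("PROT_B", 12),
]

summary = summarize(observed_termini)
for category, n in summary.items():
    print(f"{category}: {n}")
```

A terminus that matches more than one table is counted once per matching process, so the per-process counts can sum to more than the number of observed termini; whether that is appropriate depends on how ambiguous assignments should be treated, which this sketch does not address.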