O‐Glycosylation Prediction Electronic Tool (OGPET): a new algorithm for prediction of O‐glycosylation sites
Author(s) - Torres Rafael, Almeida Igor C
Publication year - 2006
Publication title - The FASEB Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.709
H-Index - 277
eISSN - 1530-6860
pISSN - 0892-6638
DOI - 10.1096/fasebj.20.5.a1362-d
O‐Glycosylation (OG) is a key post‐translational modification of proteins and is altered in certain pathological conditions (e.g., cancer). Because of its diagnostic and therapeutic relevance, a few algorithms for OG site prediction have been developed. The most widely used, NetOGlyc, is based on neural network analysis and showed an overall accuracy of 83% (Hansen et al., Glycoconj. J. 15:115, 1998). Here, we developed an algorithm (OGPET) that analyzes amino acid (aa) composition at key positions around a potential Thr or Ser acceptor residue to improve OG site prediction. The initial training set consisted of 242 glycoproteins from O‐GlycBase v6.00, with 2400 experimentally mapped mucin‐type O‐glycosylation sites. The presence of Pro, Ala and Val, among other aa, at positions −1, +1 and +3 relative to the acceptor (position 0) determines whether the site is available for O‐glycosylation. Each Thr or Ser residue is scored according to how often the aa combination formed at these positions occurs within the training sequences: the more often a specific combination appears, the higher the score. To test the performance of the algorithm, we randomly selected 25 glycoproteins from O‐GlycBase and compared the OG site predictions of OGPET and NetOGlyc 3.1. NetOGlyc 3.1 yielded a mean of 39.2% true‐positive, 33.4% false‐positive and 27.4% false‐negative hits. In contrast, OGPET showed much higher specificity, with 84.3% true‐positive hits and only 14.7% false‐positive and 1.1% false‐negative hits. The OGPET software is still under development and will be made available on the World Wide Web. Supported by BBRC/Biology/UTEP (NIH # 5G12RR008124).
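The frequency-based scoring rule described in the abstract can be illustrated with a short sketch. This is not the OGPET implementation, which the abstract does not publish. It assumes that the score of a Thr/Ser is simply the raw count of its (−1, +1, +3) residue combination among known O‐glycosylated sites in a training set, and that the reported hit percentages are computed over the total of true‐positive, false‐positive and false‐negative hits (the reported figures sum to about 100%). All function names and the toy sequences are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Assumed context positions relative to the candidate Thr/Ser (position 0),
# taken from the positions named in the abstract.
OFFSETS = (-1, 1, 3)


def context_key(seq: str, i: int) -> tuple[str, ...]:
    """Residues at positions -1, +1 and +3 around index i ('-' = beyond the sequence ends)."""
    return tuple(seq[i + k] if 0 <= i + k < len(seq) else "-" for k in OFFSETS)


def train(sites: Iterable[tuple[str, int]]) -> Counter:
    """Count how often each context combination occurs at known O-glycosylated sites."""
    counts: Counter = Counter()
    for seq, i in sites:
        counts[context_key(seq, i)] += 1
    return counts


def score(seq: str, counts: Counter) -> dict[int, int]:
    """Score every Thr/Ser (0-based index) by the training frequency of its context combination."""
    return {i: counts[context_key(seq, i)] for i, aa in enumerate(seq) if aa in "ST"}


def hit_percentages(predicted: set[int], observed: set[int]) -> tuple[float, ...]:
    """Percent true-positive, false-positive and false-negative hits (assumed denominator: TP+FP+FN)."""
    tp = len(predicted & observed)
    fp = len(predicted - observed)
    fn = len(observed - predicted)
    total = tp + fp + fn
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(100 * x / total for x in (tp, fp, fn))


if __name__ == "__main__":
    # Toy training data: (sequence, 0-based index of a mapped O-glycosylated Thr/Ser).
    training = [("GAPTPAVP", 3), ("KAPSPAVL", 3)]
    counts = train(training)
    scores = score("MPTPQVSK", counts)
    print(scores)  # {2: 2, 6: 0}
    predicted = {i for i, s in scores.items() if s > 0}
    print(hit_percentages(predicted, observed={2, 6}))  # (50.0, 0.0, 50.0)
```

In this toy run, the Thr at index 2 of the query shares the Pro/Pro/Val context seen at both training sites and receives a score of 2, while the Ser at index 6 has an unseen context and scores 0. A real predictor would also need a score threshold and, likely, normalization for sequence composition; both are left out here because the abstract does not specify them.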