
iCar-PseCp: identify carbonylation sites in proteins by Monte Carlo sampling and incorporating sequence coupled effects into general PseAAC
Author(s) -
Jianhua Jia,
Zi Li,
Xuan Xiao,
Bingxiang Liu,
Kuo-Chen Chou
Publication year - 2016
Publication title -
Oncotarget
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.373
H-Index - 127
ISSN - 1949-2553
DOI - 10.18632/oncotarget.9148
Subject(s) - carbonylation, pseudo amino acid composition, protein sequencing, computational biology, protein carbonylation, proteomics, proteome, drug discovery, lysine, bioinformatics, computer science, peptide sequence, chemistry, combinatorial chemistry, biology, biochemistry, amino acid, glutathione, enzyme, carbon monoxide, catalysis, dipeptide, gene
Carbonylation is a posttranslational modification (PTM or PTLM) in which a carbonyl group is added to a lysine (K), proline (P), arginine (R), or threonine (T) residue of a protein molecule. Carbonylation plays an important role in orchestrating various biological processes, but it is also associated with many diseases, such as diabetes, chronic lung disease, Parkinson's disease, Alzheimer's disease, chronic renal failure, and sepsis. Therefore, from the angles of both basic research and drug development, we face a challenging problem: for an uncharacterized protein sequence containing many K, P, R, or T residues, which ones can be carbonylated, and which ones cannot? To address this problem, we have developed a predictor called iCar-PseCp by incorporating sequence-coupled information into the general pseudo amino acid composition (PseAAC) and by balancing the skewed training dataset through Monte Carlo sampling to expand the positive subset. Rigorous target cross-validations on the same set of carbonylation-known proteins indicated that the new predictor remarkably outperformed its existing counterparts. For the convenience of most experimental scientists, a user-friendly web server for iCar-PseCp has been established at http://www.jci-bioinfo.cn/iCar-PseCp, by which users can easily obtain their desired results without needing to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can also be used to analyze many other problems in computational proteomics.
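The two preprocessing ideas named in the abstract, scanning a sequence for candidate K, P, R, or T sites and expanding the positive subset by Monte Carlo sampling, can be illustrated with a minimal Python sketch. The abstract does not give the procedure, so everything below is an assumption made for illustration: the function names `candidate_windows` and `monte_carlo_expand`, the flank length, the `X` padding character, and in particular the rule of synthesizing a new positive sample as a random interpolation between two randomly drawn real positives. It is not the authors' implementation.

```python
import random

TARGETS = "KPRT"  # residue types named in the abstract


def candidate_windows(seq, flank=7, pad="X"):
    """Yield (1-based position, residue, window) for each K/P/R/T in seq.

    The flank length and the padding character are illustrative
    assumptions, not values taken from the paper.
    """
    padded = pad * flank + seq + pad * flank
    for i, aa in enumerate(seq):
        if aa in TARGETS:
            # The window is centered on the candidate residue.
            yield i + 1, aa, padded[i:i + 2 * flank + 1]


def monte_carlo_expand(positives, target_size, rng=None):
    """Grow a list of positive feature vectors to target_size.

    Assumed scheme: each synthetic vector is a random point on the
    segment between two distinct, randomly drawn real positives.
    Requires at least two real positive vectors.
    """
    rng = rng or random.Random(0)
    expanded = [list(v) for v in positives]
    while len(expanded) < target_size:
        a, b = rng.sample(positives, 2)   # two distinct real samples
        t = rng.random()                  # random mixing weight in [0, 1)
        expanded.append([x + t * (y - x) for x, y in zip(a, b)])
    return expanded


if __name__ == "__main__":
    for pos, aa, window in candidate_windows("MKAP", flank=2):
        print(pos, aa, window)
    balanced = monte_carlo_expand([[0.0, 0.0], [1.0, 1.0]], target_size=6)
    print(len(balanced))
```

In this sketch the real positives are always kept unchanged at the front of the returned list, so synthetic samples only ever add to the positive subset and never replace measured data.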