HIV‐1 protease cleavage site prediction based on amino acid property
Author(s) - Niu Bing, Lu Lin, Liu Liang, Gu Tian Hong, Feng KaiYan, Lu WenCong, Cai YuDong
Publication year - 2008
Publication title - Journal of Computational Chemistry
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.907
H-Index - 188
eISSN - 1096-987X
pISSN - 0192-8651
DOI - 10.1002/jcc.21024
Subject(s) - protease, feature selection, jackknife resampling, cleavage (geology), artificial intelligence, computer science, hiv 1 protease, human immunodeficiency virus (hiv), computational biology, mathematics, algorithm, machine learning, chemistry, biology, biochemistry, virology, enzyme, statistics, paleontology, estimator, fracture (geology)
Abstract Knowledge of the polyprotein cleavage sites recognized by HIV protease will refine our understanding of its specificity, and the information thus acquired is useful for designing specific and efficient HIV protease inhibitors. Recently, several works have approached the HIV‐1 protease specificity problem by applying a number of classifier creation and combination methods. The search for suitable HIV protease inhibitors would be greatly expedited by an accurate, robust, and rapid method for predicting the sites at which HIV protease cleaves proteins. In this article, we selected HIV‐1 protease as the subject of the study. 299 oligopeptides were chosen as the training set, and the remaining 63 oligopeptides were used as a test set. The peptides are represented by features constructed from AAIndex (Kawashima et al., Nucleic Acids Res 1999, 27, 368; Kawashima and Kanehisa, Nucleic Acids Res 2000, 28, 374). The mRMR method (Maximum Relevance, Minimum Redundancy; Ding and Peng, Proc Second IEEE Comput Syst Bioinformatics Conf 2003, 523; Peng et al., IEEE Trans Pattern Anal Mach Intell 2005, 27, 1226), combined with incremental feature selection (IFS) and feature forward search (FFS), is applied to find the two important cleavage sites and to select 364 important biochemical features by jackknife test. Using KNN (K‐nearest neighbors) to combine the selected features, the prediction model achieves an accuracy of 91.3% in the jackknife cross‐validation test and 87.3% in the independent‐set test. We expect that our feature selection scheme can serve as a useful auxiliary technique for researchers seeking effective inhibitors of HIV protease. © 2008 Wiley Periodicals, Inc. J Comput Chem 2009
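
The abstract names the pipeline but not its details, so the following is only a minimal sketch of how incremental feature selection with a jackknife (leave-one-out) KNN evaluation might look. It is not the authors' implementation. The toy data, the feature ranking (which in the paper would come from mRMR), the choice of k = 1, and the Euclidean distance are all assumptions made for illustration.

```python
import math

def knn_predict(train_X, train_y, query, k=1):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

def jackknife_accuracy(X, y, k=1):
    """Leave-one-out test: each sample is predicted from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)

def incremental_feature_selection(X, y, ranked, k=1):
    """Score the nested sets ranked[:1], ranked[:2], ... and keep the best one."""
    best_n, best_acc = 0, -1.0
    for n in range(1, len(ranked) + 1):
        cols = ranked[:n]
        sub = [[row[c] for c in cols] for row in X]
        acc = jackknife_accuracy(sub, y, k)
        if acc > best_acc:  # strict '>' keeps the smallest best-scoring set
            best_n, best_acc = n, acc
    return ranked[:best_n], best_acc

# Hypothetical toy data: 6 "peptides", 3 numeric features each.
# Feature 0 separates the classes; features 1 and 2 are noise.
X = [
    [0.10, 5.0, 0.9],
    [0.20, 1.0, 0.1],
    [0.15, 9.0, 0.5],
    [0.90, 4.8, 0.8],
    [0.80, 1.2, 0.2],
    [0.85, 9.1, 0.4],
]
y = [0, 0, 0, 1, 1, 1]   # 1 = cleaved, 0 = not cleaved (assumed labels)
ranked = [0, 1, 2]       # assumed ranking, standing in for an mRMR output

selected, best_acc = incremental_feature_selection(X, y, ranked)
print(selected, best_acc)
```

In this sketch the ranking step is taken as given. An actual mRMR ranking would order features by mutual information with the class label, penalized by redundancy with features already chosen, before the incremental loop runs.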