Prediction of β‐Lactamase Proteins using Random Forest
Author(s) -
White Clarence,
Dukka KC
Publication year - 2017
Publication title -
The FASEB Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.709
H-Index - 277
eISSN - 1530-6860
pISSN - 0892-6638
DOI - 10.1096/fasebj.31.1_supplement.927.7
Subject(s) - random forest , artificial intelligence , computer science , class (philosophy) , feature (linguistics) , machine learning , sequence (biology) , algorithm , pattern recognition (psychology) , biology , genetics , philosophy , linguistics
β‐Lactamases (BL) are enzymes produced by some bacteria that confer resistance to β‐lactam antibiotics. Resistance to β‐lactam antibiotics is an especially severe threat because these antibiotics are effective against a broad spectrum of pathogens and have very low toxicity to humans. In this work, we developed an algorithm, termed RF‐PredBL, that uses Random Forest (RF) to predict whether a given protein sequence is a β‐lactamase enzyme and, if so, its corresponding β‐lactamase class. To validate our method, we compared our results with those of a well‐known predictor program called PredLactamase (PL) using various performance metrics. To mitigate the problem of inconsistent prediction accuracy, we took three steps: we increased the amount of data used to train our predictor, applied extreme feature engineering, and modified the Feature Extraction from Protein Sequence (FEPS) code. We then trained our predictor with Random Forest, a powerful ensemble algorithm. In testing, RF‐PredBL performed as well as or better than the existing algorithms for BL classification. In 10‐fold cross‐validation, we increased the accuracy of the classic BL prediction from 90.63% to 98.79%. We also increased the accuracy of Class A (61.82% to 98.07%), Class B (89.09% to 96.79%), Class C (70.91% to 95.94%) and Class D (70.91% to 99.84%) predictions. To further improve our predictor, we plan to add physicochemical properties to our feature set and reevaluate feature importance.
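The abstract does not list the features or the exact cross‐validation setup, so the following is only a minimal sketch of the general approach under stated assumptions. It computes two common sequence‐derived feature types (amino acid composition and dipeptide composition, which are assumed here for illustration and are not confirmed to be the RF‐PredBL feature set) and generates index splits for 10‐fold cross‐validation. The function names are hypothetical; the resulting feature vectors could be passed to any Random Forest implementation, which is left out to keep the sketch dependency‐free.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values).

    Assumed feature type for illustration; not taken from the abstract.
    """
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the fraction of each ordered residue pair in seq (400 values).

    Pairs that include a non-standard character are skipped.
    """
    seq = seq.upper()
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in AMINO_ACIDS and seq[i + 1] in AMINO_ACIDS
    )
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("sequence needs at least one standard residue pair")
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds.

    Shuffle the samples beforehand if they are ordered by class.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    start = 0
    for fold in range(k):
        # The first (n_samples % k) folds receive one extra sample.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    example = "MSIQHFRVALIPFFAAFCLPVFA"  # arbitrary illustrative sequence
    features = amino_acid_composition(example) + dipeptide_composition(example)
    print(len(features))  # 20 + 400 features per sequence
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

In a full pipeline, each fold's training rows would fit a Random Forest classifier and the held‐out rows would be scored, with per‐class accuracy averaged over the ten folds.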