A generalized machine‐learning aided method for targeted identification of industrial enzymes from metagenome: A xylanase temperature dependence case study
Author(s) - Foroozandeh Shahraki Mehdi, Farhadyar Kiana, Kavousi Kaveh, Azarabad Mohammad H., Boroomand Amin, Ariaeenejad Shohreh, Hosseini Salekdeh Ghasem
Publication year - 2021
Publication title - Biotechnology and Bioengineering
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.136
H-Index - 189
eISSN - 1097-0290
pISSN - 0006-3592
DOI - 10.1002/bit.27608
Subject(s) - metagenomics , xylanase , thermophile , in silico , identification (biology) , biochemical engineering , computational biology , xylose , computer science , microbiology and biotechnology , biology , machine learning , enzyme , artificial intelligence , biochemistry , engineering , fermentation , ecology , gene
Growing industrial use of enzymes and the increasing availability of metagenomic data highlight the need for effective methods for the targeted identification and verification of novel enzymes from diverse environmental microbiota. Xylanases are a class of enzymes with numerous industrial applications; they degrade xylan, a major hemicellulose component of lignocellulose, releasing xylooligosaccharides and xylose. The optimum temperature of an enzyme is an essential factor when choosing an appropriate biocatalyst for a particular purpose, so in silico prediction of this attribute is a cost‐ and time‐effective step in characterizing novel enzymes. The objective of this study was to develop a computational method to predict the thermal dependence of xylanases. This tool was then applied to the targeted screening of putative xylanases with specific thermal dependencies in metagenomic data, which led to the identification of three novel xylanases from sheep and cow rumen microbiota. Here we present thermal activity prediction for xylanase, a new sequence‐based machine learning method trained on a selected combination of protein features. This random forest classifier discriminates among non‐thermophilic, thermophilic, and hyper‐thermophilic xylanases. The model's performance was evaluated through repeated sixfold cross‐validation as well as holdout tests, and the model is freely accessible as a web service at arimees.com.
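The evaluation protocol described above (a three‐class classifier assessed by repeated sixfold cross‐validation plus a separate holdout test) can be sketched as follows. This is a minimal, dependency‐free illustration under stated assumptions, not the authors' implementation: the abstract does not list the actual protein features or the number of repetitions, so amino acid composition stands in as one example feature, and the names `aa_composition`, `stratified_folds`, `repeated_cv_splits`, and `holdout_split` are hypothetical.

```python
import random
from collections import Counter

# The 20 standard amino acids. ASSUMPTION: composition is used here only as an
# example sequence feature; the study's actual feature set is not given.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical labels mirroring the three classes named in the abstract.
CLASSES = ("non-thermophilic", "thermophilic", "hyper-thermophilic")


def aa_composition(seq):
    """Return the fraction of each standard amino acid in a protein sequence.

    Non-standard residues (e.g. X, B, Z) are ignored when counting.
    """
    counts = Counter(r for r in seq.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def _indices_by_class(labels):
    """Group sample indices by their class label."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    return by_class


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    offset = 0
    for y, idx in sorted(_indices_by_class(labels).items()):
        rng.shuffle(idx)
        # Deal indices round-robin; the offset keeps fold sizes balanced.
        for j, i in enumerate(idx):
            folds[(j + offset) % k].append(i)
        offset += len(idx)
    return folds


def repeated_cv_splits(labels, k=6, repeats=10, seed=0):
    """Yield (train, test) index lists for `repeats` rounds of stratified k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = stratified_folds(labels, k, rng)
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


def holdout_split(labels, test_fraction=0.2, seed=0):
    """Stratified holdout split: a fixed test set kept out of cross-validation."""
    rng = random.Random(seed)
    train, test = [], []
    for y, idx in sorted(_indices_by_class(labels).items()):
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Toy labels only (12 per class), not data from the study.
    labels = [c for c in CLASSES for _ in range(12)]
    dev, held_out = holdout_split(labels, test_fraction=0.25)
    dev_labels = [labels[i] for i in dev]
    n_splits = 0
    # Indices yielded here refer to positions within `dev`, not `labels`.
    for train, test in repeated_cv_splits(dev_labels, k=6, repeats=3):
        # A random forest would be fit on `train` and scored on `test` here.
        n_splits += 1
    print(n_splits, len(dev), len(held_out))  # prints: 18 27 9
```

In practice the placeholder comment in the loop would be replaced by fitting a random forest, for example scikit‐learn's `RandomForestClassifier`, on the feature vectors of each training fold; scikit‐learn's `RepeatedStratifiedKFold(n_splits=6, n_repeats=...)` provides the same splitting scheme if a library dependency is acceptable. Keeping the holdout set out of every cross‐validation round is what makes its score an independent check on the tuned model.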