Open Access
Machine learning prediction of novel pectinolytic enzymes in Aspergillus niger through integrating heterogeneous (post-) genomics data
Author(s) - Mao Peng, Ronald P. de Vries
Publication year - 2021
Publication title - Microbial Genomics
Language(s) - English
Resource type - Journals
ISSN - 2057-5858
DOI - 10.1099/mgen.0.000674
Subject(s) - Aspergillus niger, pectinase, enzyme, pectin, biology, computational biology, gene, genomics, microbiology and biotechnology, genome, biochemistry
Pectinolytic enzymes are a diverse group of enzymes that break down pectin, a complex and abundant plant cell-wall polysaccharide. In nature, pectinolytic enzymes play an essential role in allowing bacteria and fungi to depolymerize and utilize pectin. In addition, pectinases have been widely applied in various industries, such as the food, wine, textile, paper and pulp industries. Due to their important biological function and increasing industrial potential, the discovery of novel pectinolytic enzymes has received global interest. However, traditional enzyme characterization relies heavily on biochemical experiments, which are time-consuming, laborious and expensive. To accelerate the identification of novel pectinolytic enzymes, an automated approach is needed. We developed a machine learning (ML) approach for predicting pectinases in the industrial workhorse fungus Aspergillus niger. The prediction integrated a diverse range of features, including evolutionary profile, gene expression, transcriptional regulation and biochemical characteristics. Results on both the training and the independent testing datasets showed that our method achieved over 90 % accuracy and recalled over 60 % of pectinolytic genes. Application of the ML model to the A. niger genome led to the identification of 83 pectinases, covering both previously described pectinases and novel pectinases that do not belong to any known pectinolytic enzyme family. Our study demonstrates the tremendous potential of ML for the discovery of new industrial enzymes through integrating heterogeneous (post-) genomics data.
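The abstract reports accuracy and recall separately, and the two can diverge widely when the positive class (here, pectinolytic genes) is rare. The following minimal sketch shows how both metrics are derived from a confusion matrix. The function name and the counts are hypothetical, chosen only for illustration; they are not the study's data or its evaluation code.

```python
def accuracy_and_recall(tp, fp, tn, fn):
    """Return (accuracy, recall) from confusion-matrix counts.

    tp/fn: positive genes predicted as positive/negative.
    tn/fp: negative genes predicted as negative/positive.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Accuracy: fraction of all genes classified correctly.
    accuracy = (tp + tn) / total
    # Recall: fraction of true positives that were recovered.
    positives = tp + fn
    recall = tp / positives if positives else 0.0
    return accuracy, recall


# Hypothetical counts (NOT from the paper): 10 positives among 100 genes.
acc, rec = accuracy_and_recall(tp=6, fp=2, tn=88, fn=4)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

With a rare positive class, the many correctly rejected negatives dominate accuracy, so a high accuracy can coexist with a moderate recall, which is why both figures are worth reporting.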