Open Access
A directed learning strategy integrating multiple omic data improves genomic prediction
Author(s) - Hu Xuehai, Xie Weibo, Wu Chengchao, Xu Shizhong
Publication year - 2019
Publication title - Plant Biotechnology Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.525
H-Index - 115
eISSN - 1467-7652
pISSN - 1467-7644
DOI - 10.1111/pbi.13117
Subject(s) - biology, bottleneck, computational biology, genome, trait, gene, phenotype, transcriptome, quantitative trait locus, computer science, machine learning, genetics, gene expression, programming language, embedded system
Summary Genomic prediction (GP) aims to build a statistical model that predicts phenotypes from genome‐wide markers and is a promising strategy for accelerating molecular plant breeding. However, phenotype prediction from genomic data alone has reached a bottleneck, and previous studies on transcriptomic and metabolomic prediction ignored genomic information. Here, we designed a novel GP strategy called multilayered least absolute shrinkage and selection operator (MLLASSO), which integrates multiple omic data into a single model that iteratively learns three layers of genetic features (GFs) supervised by the observed transcriptome and metabolome. Notably, MLLASSO learns higher‐order information about gene interactions, which enables a significant improvement in the predictability of yield in rice from 0.1588 (GP alone) to 0.2451 (MLLASSO). In the prediction of the first two layers, some genes were found to be genetically predictable genes (GPGs), as their expression levels were accurately predicted from genetic markers. We made three notable discoveries about the GPGs: (i) GPGs are good predictors of highly complex traits such as yield; (ii) GPGs are mostly eQTL genes (cis or trans); and (iii) trait‐related transcription factor families are enriched in GPGs. These findings support the notion that the learned GFs not only are good predictors of traits but also have specific biological implications for the regulation of gene expression. To differentiate the new method from conventional GP models, we call MLLASSO a directed learning strategy supervised by intermediate omic data. This new prediction model appears to be more reliable and more robust than conventional GP models.
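
The layered idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not spell out how the three layers are wired, so the chain below (markers to transcript-supervised features, then to metabolite-supervised features, then to the trait) is one plausible reading, stated as an assumption. The function names (`lasso_fit`, `learn_layer`, `fit_multilayer`, `simulate`), the penalty value `lam`, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the LASSO proximal step)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=100):
    """Coordinate-descent LASSO for (1/2n)*||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb; equals y while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def predict(X, beta):
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def learn_layer(features, targets, lam):
    """Fit one LASSO per target column (e.g. one per transcript)."""
    return [lasso_fit(features, [row[k] for row in targets], lam)
            for k in range(len(targets[0]))]

def apply_layer(features, betas):
    """Predicted targets act as learned genetic features for the next layer."""
    cols = [predict(features, b) for b in betas]
    return [list(row) for row in zip(*cols)]

def fit_multilayer(markers, transcripts, metabolites, trait, lam=0.05):
    # Assumed wiring: each layer is supervised by the next omic level.
    b1 = learn_layer(markers, transcripts, lam)   # supervised by transcriptome
    gf1 = apply_layer(markers, b1)
    b2 = learn_layer(gf1, metabolites, lam)       # supervised by metabolome
    gf2 = apply_layer(gf1, b2)
    b3 = lasso_fit(gf2, trait, lam)               # final layer predicts the trait
    return b1, b2, b3

def predict_trait(markers, model):
    """Prediction for new lines needs genetic markers only."""
    b1, b2, b3 = model
    gf2 = apply_layer(apply_layer(markers, b1), b2)
    return predict(gf2, b3)

def simulate(n=40, p=8, seed=0):
    """Toy data: markers -> 2 transcripts -> 1 metabolite -> trait (hypothetical)."""
    rng = random.Random(seed)
    markers = [[rng.choice((-1.0, 1.0)) for _ in range(p)] for _ in range(n)]
    transcripts = [[m[0] + 0.5 * m[1] + rng.gauss(0, 0.1), m[2] + rng.gauss(0, 0.1)]
                   for m in markers]
    metabolites = [[t[0] - t[1] + rng.gauss(0, 0.1)] for t in transcripts]
    trait = [mb[0] + rng.gauss(0, 0.1) for mb in metabolites]
    return markers, transcripts, metabolites, trait

markers, transcripts, metabolites, trait = simulate()
model = fit_multilayer(markers, transcripts, metabolites, trait)
predicted = predict_trait(markers, model)
```

In this sketch the intermediate omic data are needed only for training; once the layers are fitted, `predict_trait` takes markers alone, which matches the summary's framing of MLLASSO as a genomic prediction method supervised by intermediate omic data. A real analysis would evaluate predictability by cross-validation and tune the penalty per layer, neither of which is shown here.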