
Machine learning approaches to identify core and dispensable genes in pangenomes
Author(s) - Yocca Alan E., Edger Patrick P.
Publication year - 2022
Publication title - The Plant Genome
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.403
H-Index - 41
ISSN - 1940-3372
DOI - 10.1002/tpg2.20135
Subject(s) - biology, genome, gene, Brachypodium distachyon, computational biology, genetics, Oryza sativa, Brachypodium, leverage (statistics), artificial intelligence, computer science
A gene in a given taxonomic group is either present in every individual (core) or absent in at least one individual (dispensable). Previous pangenomic studies have identified certain functional differences between core and dispensable genes. However, determining whether a gene belongs to the core or dispensable portion of the genome requires constructing a pangenome, which involves sequencing the genomes of many individuals. Here we aim to leverage the previously characterized core and dispensable gene content of two grass species [Brachypodium distachyon (L.) P. Beauv. and Oryza sativa L.] to construct a machine learning model capable of accurately classifying genes as core or dispensable using only a single annotated reference genome. Such a model may mitigate the need for pangenome construction, an expensive hurdle especially in orphan crops, which often lack adequate genomic resources.
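
The workflow the abstract outlines can be illustrated with a minimal sketch: label genes as core or dispensable from pangenome presence/absence calls, then train a classifier on features available from a single annotated reference. Everything specific here is an assumption made for illustration. The abstract does not name the model or the features, so a simple nearest-centroid classifier stands in for the authors' method, the two features (exon count and GC fraction) are hypothetical, and all gene IDs and values are invented.

```python
from statistics import mean


def label_genes(presence):
    """Label each gene 'core' if present in every individual, else 'dispensable'.

    `presence` maps a gene ID to a list of booleans, one per sequenced individual.
    """
    return {
        gene: "core" if all(calls) else "dispensable"
        for gene, calls in presence.items()
    }


def fit_centroids(features, labels):
    """Nearest-centroid training: average the feature vectors of each class."""
    centroids = {}
    for cls in set(labels.values()):
        rows = [features[g] for g in features if labels[g] == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, vector):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(centroids, key=lambda cls: dist(centroids[cls]))


# Invented presence/absence calls across three individuals of a training species.
presence = {
    "geneA": [True, True, True],
    "geneB": [True, False, True],
    "geneC": [True, True, True],
    "geneD": [False, False, True],
}
labels = label_genes(presence)

# Hypothetical per-gene features derivable from one annotated reference genome:
# [exon count, GC fraction]. Values are invented.
features = {
    "geneA": [8.0, 0.45],
    "geneB": [2.0, 0.60],
    "geneC": [6.0, 0.47],
    "geneD": [1.0, 0.62],
}
centroids = fit_centroids(features, labels)

# Classify a gene from a species that has only a single reference genome.
new_gene_prediction = predict(centroids, [7.0, 0.46])
```

In this sketch the labels come only from species with an existing pangenome, while prediction needs nothing beyond reference-derived features, which is the property the abstract emphasizes. A real application would presumably scale the features and use far more genes and a more expressive model.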