Open Access
Reconstruction of evolving gene variants and fitness from short sequencing reads
Author(s) - Max Shen, Kevin Tianmeng Zhao, David R. Liu
Publication year - 2021
Publication title - Nature Chemical Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.412
H-Index - 216
eISSN - 1552-4469
pISSN - 1552-4450
DOI - 10.1038/s41589-021-00876-6
Subject(s) - computational biology, gene, DNA sequencing, biology, genetics, evolutionary biology
Directed evolution can generate proteins with tailor-made activities. However, full-length genotypes, their frequencies and fitnesses are difficult to measure for evolving gene-length biomolecules using most high-throughput DNA sequencing methods, as short read lengths can lose mutation linkages in haplotypes. Here we present Evoracle, a machine learning method that accurately reconstructs full-length genotypes (R² = 0.94) and fitness using short-read data from directed evolution experiments, with substantial improvements over related methods. We validate Evoracle on phage-assisted continuous evolution (PACE) and phage-assisted non-continuous evolution (PANCE) of adenine base editors and OrthoRep evolution of drug-resistant enzymes. Evoracle retains strong performance (R² = 0.86) on data with complete linkage loss between neighboring nucleotides and large measurement noise, such as pooled Sanger sequencing data (~US$10 per timepoint), and broadens the accessibility of training machine learning models on gene variant fitnesses. Evoracle can also identify high-fitness variants, including low-frequency 'rising stars', well before they are identifiable from consensus mutations.
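The core problem is that short reads report how often each mutation appears at its own position but not which mutations co-occur in the same gene. The toy example below illustrates this, and also shows why a time series interpreted through a fitness model can narrow down the possible linkages. This is a minimal sketch under stated assumptions, not Evoracle's implementation. The multiplicative (replicator-style) selection model, the hypothetical helper names (`allele_frequencies`, `evolve_step`, `trajectory`, `loss`), the two-position genotypes and the fitness values are all illustrative choices.

```python
from itertools import product
from typing import Dict, List, Tuple

Genotype = Tuple[int, ...]  # 1 = mutation present at that position, 0 = absent


def allele_frequencies(freqs: Dict[Genotype, float], n_pos: int) -> List[float]:
    """Per-position mutation frequencies: the linkage-free signal short reads provide."""
    return [sum(f for g, f in freqs.items() if g[i] == 1) for i in range(n_pos)]


def evolve_step(freqs: Dict[Genotype, float],
                fitness: Dict[Genotype, float]) -> Dict[Genotype, float]:
    """One round of selection under an ASSUMED multiplicative fitness model."""
    weighted = {g: f * fitness[g] for g, f in freqs.items()}
    total = sum(weighted.values())
    return {g: w / total for g, w in weighted.items()}


def trajectory(freqs: Dict[Genotype, float], fitness: Dict[Genotype, float],
               n_pos: int, steps: int) -> List[List[float]]:
    """Allele frequencies at timepoints 0..steps predicted from genotypes + fitness."""
    out = [allele_frequencies(freqs, n_pos)]
    for _ in range(steps):
        freqs = evolve_step(freqs, fitness)
        out.append(allele_frequencies(freqs, n_pos))
    return out


def loss(predicted: List[List[float]], observed: List[List[float]]) -> float:
    """Squared error between predicted and observed allele-frequency trajectories."""
    return sum((p - o) ** 2
               for pt, ot in zip(predicted, observed)
               for p, o in zip(pt, ot))


# Two hypothetical populations that look identical at a single timepoint:
# each mutation is at 50% frequency, but the linkage differs.
linked = {(1, 1): 0.5, (0, 0): 0.5}
unlinked = {(1, 0): 0.5, (0, 1): 0.5}
assert allele_frequencies(linked, 2) == allele_frequencies(unlinked, 2)

# Simulated "observations" (illustrative values): the double mutant is twice as fit.
true_fitness = {(1, 1): 2.0, (0, 0): 1.0, (1, 0): 1.0, (0, 1): 1.0}
observed = trajectory(linked, true_fitness, n_pos=2, steps=3)

# Whatever fitness values we try (a small grid here), the unlinked population cannot
# reproduce the time series: its two allele frequencies always sum to 1, while the
# observed ones sum to more than 1 after t = 0.
grid = [0.25, 0.5, 1.0, 2.0, 4.0]
best_unlinked = min(
    loss(trajectory(unlinked, {(1, 0): a, (0, 1): b}, 2, 3), observed)
    for a, b in product(grid, grid)
)
print(loss(trajectory(linked, true_fitness, 2, 3), observed))  # 0.0
print(best_unlinked > 0)  # True
```

Evoracle is described as a machine learning method that infers genotypes and fitness from real, noisy data. The brute-force grid here stands in for that inference step only to make the point concrete: in this toy, the time series rules out a set of genotypes that a single timepoint could not distinguish.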
