
On the sparsity of fitness functions and implications for learning
Author(s) - David H. Brookes, Amirali Aghazadeh, Jennifer Listgarten
Publication year - 2021
Publication title - Proceedings of the National Academy of Sciences of the United States of America
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 5.011
H-Index - 771
eISSN - 1091-6490
pISSN - 0027-8424
DOI - 10.1073/pnas.2109649118
Subject(s) - fitness approximation, fitness function, epistasis, sequence (biology), generalization, fitness landscape, mathematics, genetic fitness, function (biology), computer science, artificial intelligence, mathematical optimization, selection (genetic algorithm), genetic algorithm, biology, population, mathematical analysis, biochemistry, genetics, demography, evolutionary biology, sociology, gene
Significance - The properties of proteins and other biological molecules are encoded in large part in the sequence of amino acids or nucleotides that defines them. Increasingly, researchers estimate functions that map sequences to a particular property using machine learning and related statistical approaches. However, an important question remains unanswered: How many experimental measurements are needed in order to accurately learn these “fitness” functions? We leverage perspectives from the fields of biophysics, evolutionary biology, and signal processing to develop a theoretical framework that enables us to make progress on answering this question. We demonstrate that this framework can be used to make useful calculations on real-world data and suggest how these calculations may be used to guide experiments.