The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis

Nelson Gil; András Fiser

Open Access


Author(s) - Nelson Gil, András Fiser

Publication year - 2018

Publication title - Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.599

H-Index - 390

eISSN - 1367-4811

pISSN - 1367-4803

DOI - 10.1093/bioinformatics/bty523

Subject(s) - false positive paradox, sequence (biology), set (abstract data type), computer science, protein superfamily, computational biology, matching (statistics), multiple sequence alignment, conserved sequence, sequence alignment, data mining, mathematics, biology, artificial intelligence, genetics, peptide sequence, statistics, gene, programming language

The analysis of sequence conservation patterns has been used for over half a century to identify functionally important (catalytic and ligand-binding) protein residues. Despite decades of development, state-of-the-art non-template-based functional residue prediction methods must, on average, predict ∼25% of a protein's total residues to correctly identify half of that protein's functional site residues. The resulting overwhelming proportion of false positives yields reported 'F-Scores' of ∼0.3. We investigated the limits of current approaches, focusing on the so-far neglected impact of the specific choice of homologs included in multiple sequence alignments (MSAs).
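To make the relationship between the quoted figures concrete, the following is a minimal sketch of how predicting ∼25% of residues at 50% recall can produce an F-score near 0.3. It assumes the reported 'F-Score' is the standard F1 (harmonic mean of precision and recall), and it assumes a hypothetical functional-residue fraction of 10%; neither assumption is stated in the abstract, and the function names are illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_from_coverage(predicted_fraction: float,
                            recall: float,
                            functional_fraction: float) -> float:
    """Precision implied by flagging `predicted_fraction` of all residues
    while recovering `recall` of the truly functional ones."""
    # True positives, expressed as a fraction of all residues.
    true_positive_fraction = recall * functional_fraction
    return true_positive_fraction / predicted_fraction


# Figures quoted in the abstract.
predicted_fraction = 0.25   # ~25% of residues are predicted as functional
recall = 0.5                # half of the functional residues are recovered

# ASSUMPTION (hypothetical, not from the abstract): 10% of residues are functional.
functional_fraction = 0.10

precision = precision_from_coverage(predicted_fraction, recall, functional_fraction)
f1 = f_score(precision, recall)
print(f"precision = {precision:.2f}, F1 = {f1:.2f}")
```

Under these assumptions only about one in five predicted residues is truly functional, which is why the F-score stays low even though recall is 50%. A different assumed functional fraction would shift the result.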
