Fundamentals of massive automatic pairwise alignments of protein sequences: theoretical significance of Z-value statistics

Olivier Bastien; Jean-Christophe Aude; Sylvaine Roy; Éric Maréchal

Open Access

Publication year - 2004

Publication title - Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.599

H-Index - 390

eISSN - 1367-4811

pISSN - 1367-4803

DOI - 10.1093/bioinformatics/btg440

Subject(s) - pairwise comparison, value (mathematics), statistics, computer science, mathematics

Different automatic methods of sequence alignment are routinely used as a starting point for homology searches and function inference. Confidence in the probability of an alignment is one of the major fundamentals of massive automatic genome-scale pairwise comparisons, whether for clustering putative orthologs and paralogs, annotating sequenced genomes or constructing multi-genome trees. Extreme value distributions based on the Karlin-Altschul model, usually advised for large-scale comparisons, are not always valid, particularly when non-biased genomes are compared with nucleotide-biased genomes (such as that of Plasmodium falciparum). Z-value estimates based on Monte Carlo techniques can be calculated experimentally for any alignment output, whatever the method used. Empirically, a Z-value higher than approximately 8 is considered sufficient to assess that an alignment score is significant, but this arbitrary figure has never been theoretically justified.
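The Monte Carlo Z-value referred to in the abstract is commonly computed by realigning one sequence against many shuffled copies of the other and standardizing the real score against that empirical null distribution, Z = (S - μ) / σ, where μ and σ are the mean and standard deviation of the shuffled scores. The sketch below assumes that common definition. It also assumes a toy Smith-Waterman scorer (match +2, mismatch -1, linear gap -2) standing in for a real substitution matrix such as BLOSUM62, and it uses hypothetical helper names (`smith_waterman_score`, `z_value`). It illustrates the general technique and is not the authors' implementation.

```python
import random
import statistics


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score using a toy linear-gap scheme.

    Assumption: simple match/mismatch scores instead of a real
    substitution matrix such as BLOSUM62.
    """
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def z_value(a: str, b: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Monte Carlo Z-value: (observed - mean(shuffled)) / stdev(shuffled).

    Sequence b is shuffled, which preserves its residue composition.
    """
    if n_shuffles < 2:
        raise ValueError("need at least two shuffles to estimate a stdev")
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = smith_waterman_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # a fresh random permutation of b
        shuffled_scores.append(smith_waterman_score(a, "".join(letters)))
    mu = statistics.mean(shuffled_scores)
    sigma = statistics.stdev(shuffled_scores)
    if sigma == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mu) / sigma


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQ"
    z = z_value(seq, seq, n_shuffles=100, seed=1)
    print(f"Z = {z:.1f}; above the empirical cutoff of ~8: {z > 8}")
```

Because shuffling keeps the residue composition of the shuffled sequence fixed, the null distribution reflects that sequence's compositional bias. This is one reason a shuffling-based Z-value can be computed for any alignment method. The approximate cutoff of 8 discussed in the abstract would be applied to the returned value.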
