
Combinatorial methods for gene recognition
Author(s) - Pavel A. Pevzner
Publication year - 1997
Language(s) - English
Resource type - Reports
DOI - 10.2172/764709
Subject(s) - exon, unix, software, gene, computer science, intron, algorithm, genetics, biology, computational biology, artificial intelligence, programming language
The major result of the project is the development of a new approach to gene recognition called the spliced alignment algorithm. They have developed an algorithm and implemented a software tool (for both IBM PC and UNIX platforms) which explores all possible exon assemblies in polynomial time and finds the multi-exon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully performs exon assembly even in the case of short exons or exons with unusual codon usage; they also report correct assemblies for genes with more than 10 exons, provided a homologous protein is already known. On a test sample of human genes with known mammalian relatives, the average overlap between the predicted and the actual genes was 99%, which is remarkably good compared to other existing methods. In addition, the algorithm reconstructed 87% of the genes with complete accuracy. The rare discrepancies between the predicted and real exon-intron structures were restricted either to extremely short initial or terminal exons or proved to be results of alternative splicing. Moreover, the algorithm performs reasonably well with non-vertebrate and even prokaryotic targets. The spliced alignment software PROCRUSTES has been in extensive use by the academic community since its announcement in August 1996 via the WWW server (www-hto.usc.edu/software/procrustes) and by biotech companies via the in-house UNIX version.
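
The claim that all possible exon assemblies can be explored in polynomial time rests on dynamic programming over candidate exons. The Python sketch below is not the spliced alignment algorithm or the PROCRUSTES implementation. It illustrates a deliberately simplified relative, exon chaining, in which each candidate block is assumed to carry a precomputed similarity score to the target protein and the task is to pick the best-scoring chain of non-overlapping blocks. The function name `best_exon_chain`, the `(start, end, score)` tuple layout, and the example coordinates and scores are all hypothetical.

```python
from bisect import bisect_left


def best_exon_chain(blocks):
    """Pick the highest-scoring chain of non-overlapping candidate exons.

    blocks: list of (start, end, score) tuples with inclusive coordinates
    and start <= end. The score is an assumed, precomputed similarity of
    the block to the target protein (hypothetical input).
    Returns (best_total_score, chain), with the chain in genomic order.
    """
    ordered = sorted(blocks, key=lambda b: b[1])  # sort by end coordinate
    ends = [b[1] for b in ordered]
    n = len(ordered)

    # best[k] = best total score using only the first k blocks (by end).
    best = [0] * (n + 1)
    take = [False] * (n + 1)   # whether block k is used in best[k]
    prev = [0] * (n + 1)       # prefix length to continue from if taken

    for k, (start, _end, score) in enumerate(ordered, start=1):
        # Number of blocks that end strictly before this block starts.
        p = bisect_left(ends, start)
        with_block = best[p] + score
        if with_block > best[k - 1]:
            best[k], take[k], prev[k] = with_block, True, p
        else:
            best[k] = best[k - 1]

    # Trace back the chosen blocks.
    chain = []
    k = n
    while k > 0:
        if take[k]:
            chain.append(ordered[k - 1])
            k = prev[k]
        else:
            k -= 1
    chain.reverse()
    return best[n], chain


# Hypothetical candidate blocks: (start, end, score).
candidates = [(1, 5, 5), (2, 3, 3), (4, 8, 6), (9, 10, 1), (6, 12, 7)]
score, chain = best_exon_chain(candidates)
print(score, chain)
```

With n candidate blocks there can be exponentially many chains, yet the sketch runs in O(n log n) time because each block is considered once against the best result for everything ending before it. The method described in the abstract goes further than this simplification: it scores the assembled exons jointly against the related protein rather than summing independent per-block scores.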