On-line tools for sequence retrieval and multivariate statistics in molecular biology
Author(s) - Guy Perrière, Jean Thioulouse
Publication year - 1996
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/12.1.63
Subject(s) - GenBank, computer science, sequence analysis, multivariate statistics, sequence (biology), sequence database, multivariate analysis, biology, information retrieval, computational biology, gene, genetics, machine learning
We have developed a World-Wide Web server for browsing sequence collections structured under the ACNUC format and for performing multivariate analyses on sequences. General collections (such as GenBank or EMBL) as well as specialized data banks (such as Hovergen and NRSub) can be accessed. The system allows complex queries to be constructed, and the result of each query, a list of sequences, is stored on the server. This list can then be reused as input for multivariate analyses of the sequences. Two example applications are shown. The first is a study of codon usage, using correspondence analysis, on all the protein-coding genes of Haemophilus influenzae Rd. This study allows the highly expressed genes and the integral membrane proteins of this organism to be identified. The second is an ordering of 70 aligned growth hormone protein sequences using principal coordinate analysis. With this method, we recover the patterns of relationships between the sequences previously determined with tree-building programs.
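The principal coordinate analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the server's implementation. It assumes a simple p-distance (proportion of differing sites) between aligned sequences, and it uses power iteration with deflation in place of a full eigensolver so that it runs on the standard library alone. The function names and the four toy sequences are hypothetical and do not come from the paper.

```python
from math import sqrt


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def gower_center(dist):
    """Double-center -1/2 * D^2 so that rows and columns sum to zero."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in a]  # a is symmetric: row means equal column means
    grand = sum(row) / n
    return [[a[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def top_eigenpair(m, iters=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    n = len(m)
    v = [float(i + 1) for i in range(n)]  # fixed, non-constant starting vector
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector
    value = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
    return value, v


def pcoa(dist, n_axes=2):
    """Return one coordinate list per object, keeping at most n_axes positive axes."""
    b = gower_center(dist)
    n = len(b)
    axes = []
    for _ in range(n_axes):
        value, vec = top_eigenpair(b)
        if value <= 1e-12:  # non-positive axes carry no Euclidean information
            break
        axes.append([sqrt(value) * x for x in vec])
        # Deflate: remove the axis just found before searching for the next one
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return [[axis[i] for axis in axes] for i in range(n)]


# Hypothetical toy alignment: two pairs of closely related sequences
toy_seqs = ["MKLVAGTSLL", "MKLVAGTSLI", "MRIPEGQNFF", "MRIPEGQNFY"]
toy_dist = [[p_distance(s, t) for t in toy_seqs] for s in toy_seqs]
for name, point in zip(toy_seqs, pcoa(toy_dist)):
    print(name, [round(c, 3) for c in point])
```

In this toy case the first axis places the two pairs on opposite sides of the origin, which is the kind of grouping that would otherwise be read from a tree. The sign of each axis is arbitrary, and a real analysis would use a proper evolutionary distance and a library eigensolver.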