Open Access
An analysis pipeline for the processing, annotation, and dissemination of Expressed Sequence Tags.
Author(s) - Joseph Morris
Publication year - 2009
Language(s) - English
Resource type - Dissertations/theses
DOI - 10.18297/etd/1009
Subject(s) - expressed sequence tag, pipeline (software), annotation, sequence (biology), contig, sequence analysis, unigene, gene annotation, sequence database, computer science, set (abstract data type), biology, data mining, computational biology, genome, information retrieval, bioinformatics, gene, genetics, programming language
Because interactions at the genomic level are complex and an organism contains a large number of proteins, understanding the functions of the genes that are expressed is essential. An analysis pipeline for Expressed Sequence Tags (ESTs) is one way to accomplish this: it allows a researcher to quickly take a set of sequences, perform all necessary analysis operations, and publish the data in a database with a graphical user interface (GUI). The pipeline consists of several steps. First, the data must be preprocessed to remove any extraneous sequence data, low-complexity regions, and regions that repeat throughout the genome. Next, the many ESTs are assembled into longer sequences that better describe the underlying mRNA. Once these contiguous sequences (contigs) have been constructed, putative functions can be assigned to each sequence, whether it is part of a contig or a singleton. An application of this pipeline to 3906 ESTs generated from trichome tissue of Pelargonium × hortorum (commonly, the geranium plant) produced 425 contigs using the CAP3 program. These contigs, along with the 2208 sequences that are not part of a contig, were then searched with BLAST against the non-redundant protein database to assign putative functions to each sequence. Finally, Blast2GO was run on the BLAST results to assign Gene Ontology (GO) terms to each sequence. These annotations were then added to the database for later investigation by researchers. To aid researchers in further analysis of the annotated sequences, a MySQL database was used for data storage and a GUI was developed using Java and JavaServer Pages. In addition, an applet for viewing the Sanger trace files for each sequence is included to help the researcher judge the validity of the data.
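The assembly figures reported in the abstract can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the thesis pipeline: the class `AssemblySummary` and its property names are hypothetical, and it assumes that every EST that is not a singleton was placed in a contig (that is, none of the 3906 ESTs was discarded during preprocessing).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssemblySummary:
    """Hypothetical container for the counts quoted in the abstract."""

    total_ests: int   # ESTs submitted to assembly
    contigs: int      # contigs reported by the assembler
    singletons: int   # ESTs that did not join any contig

    @property
    def unigenes(self) -> int:
        # Sequences that go on to annotation: contigs plus singletons.
        return self.contigs + self.singletons

    @property
    def ests_in_contigs(self) -> int:
        # Assumption: every non-singleton EST belongs to some contig.
        return self.total_ests - self.singletons

    @property
    def mean_ests_per_contig(self) -> float:
        return self.ests_in_contigs / self.contigs

    @property
    def redundancy(self) -> float:
        # Fraction of the EST set collapsed away by assembly.
        return 1 - self.unigenes / self.total_ests


# Counts taken from the abstract.
summary = AssemblySummary(total_ests=3906, contigs=425, singletons=2208)

print(summary.unigenes)                        # 2633
print(summary.ests_in_contigs)                 # 1698
print(round(summary.mean_ests_per_contig, 2))  # 4.0
```

Under that assumption, 2633 sequences (425 contigs plus 2208 singletons) would have been passed to the BLAST and Blast2GO steps, with about four ESTs per contig on average.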
ACKNOWLEDGMENTS
First, I would like to thank my parents, James and Mary Morris, as well as the rest of my family, for supporting me through the time it took for me to complete this project. It was only with their encouragement and support that I was able to accomplish everything that I have accomplished. … for their time, advice, and guidance. I would especially like to thank Dr. Eric Rouchka for his patience and invaluable advice that helped to make this a reality. … and everyone else that has given me advice along the way. I would also like to thank Steven Yelton …
