Open Access

An analysis pipeline for the processing, annotation, and dissemination of Expressed Sequence Tags.

Author(s) - Joseph Morris

Publication year - 2009

Language(s) - English

Resource type - Dissertations/theses

DOI - 10.18297/etd/1009

Subject(s) - expressed sequence tag, pipeline (software), annotation, sequence (biology), contig, sequence analysis, unigene, gene annotation, sequence database, computer science, set (abstract data type), biology, data mining, computational biology, genome, information retrieval, bioinformatics, gene, genetics, programming language

Because interactions at the genomic level are complex and an organism contains a large number of proteins, understanding the functions of the genes that are expressed is essential. An analysis pipeline for Expressed Sequence Tags (ESTs) is one way to accomplish this: it allows a researcher to take a set of sequences, perform all necessary analysis operations, and publish the data in a database with a graphical user interface (GUI). The pipeline consists of several steps. First, the data must be preprocessed to remove extraneous sequence data, low-complexity regions, and regions that repeat throughout the genome. Next, the large number of ESTs must be combined into longer sequences that better describe the underlying mRNA. Once these contiguous sequences (contigs) have been constructed, putative functions can be assigned to each sequence, whether it is a contig or a singleton. An application of this pipeline to 3906 ESTs generated from trichome tissue of Pelargonium × hortorum (commonly, the geranium plant) produced 425 contigs using the CAP3 program. These contigs, along with the 2208 sequences that are not part of any contig, were then BLASTed against the non-redundant protein database to assign a putative function to each sequence. Finally, Blast2GO was run on the BLAST results to assign Gene Ontology (GO) terms to each sequence. These annotations were then added to the database for later investigation by researchers. To aid researchers in further analysis of the annotated sequences, a MySQL database was used for data storage and a GUI was developed using Java and JavaServer Pages. In addition, an applet for viewing the Sanger trace files for each sequence is included to help the researcher judge the validity of the data.
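The counts in the abstract and the order of the assembly and search stages can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline built in the thesis: the class and function names, the merged unigene file name, and the choice of the BLAST+ `blastx` program are all assumptions (the abstract says only that sequences were BLASTed against the non-redundant protein database). The figure for ESTs inside contigs is an upper bound that assumes no EST was discarded during preprocessing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssemblySummary:
    """Counts reported for an EST assembly (values supplied by the caller)."""
    total_ests: int   # ESTs entering the pipeline
    contigs: int      # contiguous sequences produced by the assembler
    singletons: int   # ESTs left outside every contig

    @property
    def unigene_count(self) -> int:
        # Contigs plus singletons form the set sent on for annotation.
        return self.contigs + self.singletons

    @property
    def max_ests_in_contigs(self) -> int:
        # Upper bound: assumes no EST was discarded during preprocessing.
        return self.total_ests - self.singletons


def pipeline_commands(est_fasta: str, blast_db: str = "nr") -> list[list[str]]:
    """Return hypothetical command lines for the assembly and search stages."""
    # Hypothetical name for a file holding contigs and singletons together.
    unigenes = est_fasta + ".unigenes.fasta"
    return [
        # CAP3 writes <input>.cap.contigs and <input>.cap.singlets.
        ["cap3", est_fasta],
        # Assumption: blastx (translated nucleotide query vs. protein database).
        ["blastx", "-query", unigenes, "-db", blast_db,
         "-outfmt", "5", "-out", unigenes + ".blast.xml"],
    ]


summary = AssemblySummary(total_ests=3906, contigs=425, singletons=2208)
print(summary.unigene_count)        # 2633 sequences to annotate
print(summary.max_ests_in_contigs)  # 1698 ESTs at most inside contigs
```

Under these assumptions, 2633 sequences (425 contigs plus 2208 singletons) would go on to the BLAST and Blast2GO steps. The command lists are built but never executed, so the sketch runs without CAP3 or BLAST installed.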
ACKNOWLEDGMENTS

First, I would like to thank my parents, James and Mary Morris, as well as the rest of my family, for supporting me through the time it took to complete this project. It was only with their encouragement and support that I was able to accomplish everything that I have accomplished. … for their time, advice, and guidance. I would especially like to thank Dr. Eric Rouchka for his patience and invaluable advice, which helped to make this a reality. … and everyone else who has given me advice along the way. I would also like to thank Steven Yelton …
