Interrogating the human genome using uninterpreted mass spectrometry data
Author(s) - Choudhary Jyoti S., Blackstock Walter P., Creasy David M., Cottrell John S.
Publication year - 2001
Publication title - Proteomics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.26
H-Index - 167
eISSN - 1615-9861
pISSN - 1615-9853
DOI - 10.1002/1615-9861(200104)1:5<651::aid-prot651>3.0.co;2-n
Subject(s) - unigene, human genome, genome, computational biology, genome project, computer science, coding (social sciences), biology, data mining, genetics, gene, expressed sequence tag, mathematics, statistics
The public availability of a draft assembly of the human genome has enabled us to demonstrate, for the first time, the feasibility of searching a complete, unmasked eukaryotic genome using uninterpreted mass spectrometry data. A complex LC‐MS/MS data set, containing peptides from at least 22 human proteins, was searched against a comprehensive, nonidentical protein database, an expressed sequence tag (EST) database, and the International Human Genome Project draft assembly of the human genome. The results from the three searches are compared in detail, and the merits of the different databases for this application are discussed. In the case of the EST database, the UniGene index provided a method of simplifying and summarising the search results. In the case of the genomic DNA, the presence of introns prevented matching of roughly one quarter of the spectra, but the technique can provide primary experimental verification of predicted coding sequences, and has the potential to identify novel coding sequences.
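The effect of introns on a genomic search can be illustrated with a minimal Python sketch. It is not the search engine or scoring method used in the paper: a real uninterpreted MS/MS search scores observed fragment ion masses against candidate peptides, whereas this sketch assumes the peptide sequence is already known and reduces matching to a substring test. The sequences, the helper names (`six_frame`, `frames_containing`), and the toy intron are all invented for illustration. The sketch translates a DNA sequence in all six reading frames and shows that a peptide spanning an exon junction is found in the spliced transcript but not in the genomic sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame(dna):
    """Translate both strands in all three offsets, keyed '+1'..'+3', '-1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def frames_containing(peptide, dna):
    """List the reading frames whose translation contains the peptide."""
    return [name for name, protein in six_frame(dna).items() if peptide in protein]


# Hypothetical toy gene: two exons separated by a 16-base intron (GT...AG).
exon1 = "ATGGCTGAAAAA"      # M A E K
exon2 = "CTGGGTCGTTGG"      # L G R W
intron = "GTAAGTTTTTTTTTAG"

mrna = exon1 + exon2             # spliced transcript, as an EST might capture it
genomic = exon1 + intron + exon2  # unspliced genomic DNA

print(frames_containing("EKLG", mrna))     # junction-spanning peptide, transcript
print(frames_containing("EKLG", genomic))  # same peptide, genomic DNA
print(frames_containing("MAEK", genomic))  # peptide lying within exon 1
print(frames_containing("LGRW", genomic))  # peptide lying within exon 2
```

In this toy case the junction-spanning peptide matches only the transcript, while peptides contained within a single exon still match the genomic sequence. Because the intron length is not a multiple of three, the second exon is read in a different frame from the first, which is one reason all six frames have to be searched.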