Getting Started in Computational Mass Spectrometry–Based Proteomics
Author(s) - Olga Vitek
Publication year - 2009
Publication title - PLoS Computational Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.628
H-Index - 182
eISSN - 1553-7358
pISSN - 1553-734X
DOI - 10.1371/journal.pcbi.1000366
Subject(s) - proteomics, computational biology, phosphorylation, protein function, function (biology), biology, protein phosphorylation, quantitative proteomics, identification (biology), posttranslational modification, mass spectrometry, protein–protein interaction, signal transduction, chemistry, biochemistry, microbiology and biotechnology, gene, protein kinase a, chromatography, enzyme, botany
Proteomics aims at a large-scale characterization of localization, abundance, post-translational modifications, and biomolecular interactions of the proteins in an organism, with the goal of understanding their function. An extensive insight can be obtained by identifying and quantifying the components of biological mixtures. For example:

a) In studies of biomolecular networks, partners interacting with a protein can help determine its function. It is possible to experimentally isolate protein complexes, e.g., using tag affinity purification. Identification of the components of this mixture helps determine potential interactors [1].

b) Post-translational modifications such as phosphorylation play an important role in regulating biological processes, e.g., cellular growth and signaling. Identification and quantification of phosphorylated proteins and their substrates helps elucidate complex signaling pathway phosphorylation events [2].

c) Molecular biomarkers, i.e., proteins for which changes in abundance are indicative of an early onset of a disease or a therapy response, are of interest in clinical research. Identifying and quantifying components of a biofluid such as serum helps detect proteins with such discriminative ability [3].

d) A goal of genome annotation is the discovery and validation of protein-coding regions. Identifying peptides and proteins in a cell helps confirm and improve the annotations at the translational level, e.g., by confirming the presence of intron boundaries or alternative splicings [4].

Mass spectrometry is a method of choice for protein identification and quantification due to its sensitivity and to the versatility of the instrumentation [5],[6]. A typical “bottom-up” workflow experimentally digests the proteins into a mixture of peptides with an enzyme such as trypsin. This is necessary, in part, because the sensitivity of the mass spectrometer is much higher for peptides than for proteins.
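The digestion step of the bottom-up workflow is often mimicked computationally. The Python sketch below is illustrative only and is not taken from the article: it assumes the commonly used simplified rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), it ignores missed cleavages, and the function name `tryptic_digest` and the example sequence are hypothetical.

```python
def tryptic_digest(protein: str) -> list[str]:
    """Split a protein sequence into fully cleaved tryptic peptides.

    Assumption: simplified rule, cleave C-terminal to K or R
    unless the following residue is P; no missed cleavages.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        has_next = i + 1 < len(protein)
        if residue in "KR" and has_next and protein[i + 1] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # Whatever remains is the C-terminal peptide.
        peptides.append(protein[start:])
    return peptides


if __name__ == "__main__":
    # Arbitrary sequence used only for illustration.
    for peptide in tryptic_digest("MKWVTFISLLLLFSSAYSRGVFRR"):
        print(peptide)
```

In practice, digestion is incomplete, so analysis software typically also considers peptides with one or two missed cleavage sites; the sketch omits this for brevity.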
The peptides are then injected onto a liquid chromatography (LC) column from which they elute sequentially. The eluted peptides are ionized and separated by the mass spectrometer according to their ratio of mass to charge (m/z) in a mass spectrum (MS). The collection of mass spectra obtained at different elution times forms an LC-MS run shown in Figure 1A. Peaks in the run correspond to peptide ions; however, the sequence of amino acids underlying each peak is unknown. For identification, the mass spectrometer isolates the biological material from a peak (called precursor ion in this context), and subjects it to a high-collision energy. The energy breaks the peptide at different amide bonds, and the resulting fragments are separated according to their m/z in a secondary spectrum (called MS2, MS/MS, or tandem MS), shown in Figure 1B. Distances between peaks in the MS/MS spectrum are used to infer the peptide sequence of the parent LC-MS peak.

Figure 1. Example of spectral data.

Peak intensity is related to the abundances of peptides, and can be used for relative quantification. With the label-free approach, a separate LC-MS run is obtained for each biological sample, and peaks are quantified and compared across runs. In stable isotopic labeling workflow, samples from different groups are labeled metabolically (e.g., in SILAC, where stable isotopes are included in the growth medium of an organism), or chemically (e.g., in ICAT or iTRAQ, where reacting chemical labels are applied after tryptic digestion). Several samples (e.g., one from each group) are then mixed, and their peaks are identified and quantified within the same run. Finally, a targeted workflow based, for example, on selected reaction monitoring (SRM) [7], increases sensitivity and specificity by monitoring signals from a list of predefined peptides.
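To make the role of m/z and of peak distances concrete, the following Python sketch computes the m/z of a protonated peptide, generates the singly charged b-ion series, and reads residues back from the gaps between consecutive fragment peaks. It is a simplified illustration, not the method of any particular tool: it assumes unmodified peptides, standard monoisotopic residue masses, a noise-free and complete b-ion ladder, and a hypothetical tolerance of 0.01 Da. All function names are invented for this example.

```python
# Standard monoisotopic residue masses in daltons, rounded to 5 decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added to the residue sum
PROTON = 1.007276   # mass of the charge carrier


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of an unmodified peptide carrying `charge` protons."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def b_ions(peptide: str) -> list[float]:
    """Singly charged b-ion m/z values b1..b(n-1), N-terminal fragments."""
    ions, running = [], 0.0
    for aa in peptide[:-1]:
        running += RESIDUE_MASS[aa]
        ions.append(running + PROTON)
    return ions


def read_ladder(mz_values: list[float], tol: float = 0.01) -> str:
    """Map each gap between consecutive peaks to the closest residue.

    Gaps with no residue within `tol` are reported as '?'. Leucine and
    isoleucine have identical masses, so both are reported as 'L'.
    """
    residues = []
    for lower, upper in zip(mz_values, mz_values[1:]):
        gap = upper - lower
        best = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - gap))
        residues.append(best if abs(RESIDUE_MASS[best] - gap) <= tol else "?")
    return "".join(residues)


if __name__ == "__main__":
    peptide = "GASPVTK"  # arbitrary example peptide
    print(precursor_mz(peptide, 2))
    print(read_ladder(b_ions(peptide)))
```

Real MS/MS spectra contain several ion series, missing fragments, and noise peaks, which is why identification in practice usually relies on scoring spectra against candidate peptides rather than on reading a single clean ladder.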
The design of proteomic experiments, and subsequent analysis of the spectra, involves extensive computation and requires expertise at the intersection of computer science, engineering, and statistics. It presents exciting opportunities for both methodological and applied computational research.