Open Access
Database Searches with Multiple Oligopeptides Containing Ambiguous Residues
Author(s) - Michael H. Vodkin, Robert J. Novak, Gerald L. McLaughlin
Publication year - 1996
Publication title - BioTechniques
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.617
H-Index - 131
eISSN - 1940-9818
pISSN - 0736-6205
DOI - 10.2144/96216bc02
Subject(s) - library science, natural history, oligopeptide, computer science, biology, ecology, biochemistry, peptide
Several techniques in molecular biology frequently yield partial and ambiguous data on genes and gene products. For instance, N-terminal sequence analysis of oligopeptide cleavage products generates this type of sequence data. Typically, data generated from blotted or HPLC-resolved peptides consist of disconnected and unordered oligopeptides derived from N-terminal analysis of fragments resulting from complete or partial trypsin, chymotrypsin or CNBr digestion; such sequences are also “linked” if they were derived from the same isolated polypeptide, e.g., a band identified after sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and blotting. To empirically identify the protein represented by such data, labor-intensive sequencing, with or without cloning, is frequently required. We became interested in defining a strategy to identify the source protein more reliably from existing sequence databases, without investing in additional laboratory experiments.

Several algorithms are readily available to rapidly search the databases for proteins or nucleic acids that are identical or related to a specified query sequence. One popular program is the basic local alignment search tool (BLAST) (Reference 1; see Availability). However, BLAST can search databases (e.g., SWISS-PROT) with only moderate sensitivity. At the National Institutes of Health (NIH) address (see Availability), BLAST can search the updated, nonredundant protein or nucleic acid databases. A disadvantage of BLAST is that very limited ambiguity is allowed at each position. For amino acids, “X” designates an unknown, “B” designates aspartate or asparagine, “Z” designates glutamate or glutamine and “-” designates a gap of indeterminate length.

Table 1 shows an actual example of such data. When the individual oligopeptides listed in Table 1 were used to search the SWISS-PROT database, multiple related and unrelated sequences with similar or identical scores were retrieved.
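The limited ambiguity codes listed above can be illustrated with a minimal Python sketch. It is not BLAST's implementation; the names `AMBIGUITY`, `residues_allowed` and `matches` are hypothetical, the gap symbol “-” is not handled, and the comparison is a simple ungapped, position-by-position compatibility check.

```python
# Standard 20 amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Ambiguity codes named in the text; helper names here are illustrative only.
AMBIGUITY = {
    "X": set(AMINO_ACIDS),  # unknown residue
    "B": {"D", "N"},        # aspartate or asparagine
    "Z": {"E", "Q"},        # glutamate or glutamine
}


def residues_allowed(code):
    """Return the set of residues that a one-letter code can stand for."""
    code = code.upper()
    if code in AMBIGUITY:
        return AMBIGUITY[code]
    if code in AMINO_ACIDS:
        return {code}
    raise ValueError(f"unrecognized residue code: {code!r}")


def matches(query, target):
    """True if an ungapped query with ambiguity codes is compatible with target."""
    if len(query) != len(target):
        return False
    return all(t in residues_allowed(q) for q, t in zip(query, target.upper()))
```

For example, under these assumptions `matches("BXZ", "NAE")` holds because B admits N, X admits any residue and Z admits E, whereas `matches("BXZ", "KAE")` does not.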
Even by comparing the individual lists for common, multiple hits, it was not possible to determine a unique candidate protein that was related to all or most of the oligopeptides. Another approach tested was to search with BLAST using pairwise or N-wise combinations of the oligopeptides, either as a continuous string of residues or as a broken string with hyphens designating a discontinuity. (BLAST at the NIH supports the latter syntax; however, BLAST at some other addresses does not.) The first method created strings of characters that were not originally juxtaposed and thus did not allow the correct identification. The second method, when used in various pairwise combinations, still did not detect homologous proteins in the database.

We therefore utilized an alternative search algorithm and show its utility for identifying a protein in the database when query sequences include several linked oligopeptide fragments with some ambiguous amino acid residues. FindPatterns, or Find (a subset of the GCG package; see Availability), was used for the peptide data set in Table 1. Find has more versatility for managing ambiguous residues and multiple, discontinuous oligopeptides. Each individual residue of the query oligopeptide can be specified as either unknown (X) or up to a 20-fold ambiguity (every amino acid candidate at a position is encoded, as in Table 1). The gap size between the unordered fragments can also be specified with a minimum
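The general idea of searching with several linked fragments, per-position ambiguity and a bounded gap can be sketched with Python's standard `re` module. This is an illustration under stated assumptions, not the FindPatterns program or its query syntax: the function names, the list-of-candidates fragment notation, the default gap bounds and the toy sequences are all hypothetical. Because the fragments are unordered, the sketch simply tries every ordering.

```python
import re
from itertools import permutations


def fragment_to_regex(fragment):
    """Convert a fragment (one string of candidate residues per position) to a regex.

    "X" means unknown; a multi-letter string such as "IL" means "I or L".
    """
    parts = []
    for candidates in fragment:
        if candidates == "X":
            parts.append(".")
        elif len(candidates) == 1:
            parts.append(re.escape(candidates))
        else:
            parts.append("[" + "".join(sorted(set(candidates))) + "]")
    return "".join(parts)


def find_linked_fragments(fragments, proteins, min_gap=0, max_gap=500):
    """Return names of proteins containing all fragments, in any order,
    with min_gap..max_gap residues between consecutive fragments."""
    gap = ".{%d,%d}" % (min_gap, max_gap)
    hits = []
    for order in permutations(fragments):
        pattern = re.compile(gap.join(fragment_to_regex(f) for f in order))
        for name, seq in proteins.items():
            if name not in hits and pattern.search(seq):
                hits.append(name)
    return hits


# Invented toy data, for illustration only (not from Table 1).
toy_proteins = {
    "toy_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "toy_B": "MKTAYLAKQR",
}
toy_fragments = [
    ["S", "H", "F"],
    ["T", "A", "Y", "IL", "A"],  # fourth position is ambiguous: I or L
]
print(find_linked_fragments(toy_fragments, toy_proteins))
```

Requiring all linked fragments to occur in the same entry is what narrows the candidates: in the toy data both sequences match the second fragment alone, but only `toy_A` also contains the first. Trying every ordering grows factorially with the number of fragments, which is acceptable for the handful of peptides typical of such data but would need a different strategy for large sets.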
