PseqIP: A nonredundant and exhaustive protein sequence data bank generated from 4 major existing collections
Author(s) - Claverie Jean Michel, Bricault Laurence
Publication year - 1986
Publication title - Proteins: Structure, Function, and Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.699
H-Index - 191
eISSN - 1097-0134
pISSN - 0887-3585
DOI - 10.1002/prot.340010110
Subject(s) - ascii, data bank, protein data bank, computer science, sequence (biology), protein sequencing, data file, sequence database, data mining, computation, sequence alignment, algorithm, sequence analysis, peptide sequence, protein structure, biology, database, genetics, programming language, telecommunications, biochemistry, gene
Four major protein sequence data collections (NBRF‐PIR, PSD‐Kyoto, PGtrans, and NEWAT) have been merged into a single nonredundant data bank called PseqIP. The data bank entries were automatically matched by a heuristic computer program relying on the fast computation of the number of tetrapeptides shared by two sequences. PseqIP 1.0 includes 6,068 different protein sequences for a total of 1,357,067 residues, representing most of the sequence information available to date. During the course of this work, we found about 600 occurrences of a protein sequence recorded with a one‐amino‐acid variation in at least two different data banks. A flat file (ASCII computer‐readable format) version of PseqIP 1.0, well‐suited for exhaustive homology searches and statistical sequence analysis, is available from our laboratory.
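The matching heuristic described above can be illustrated with a minimal Python sketch. It is not the authors' program: the abstract does not say how shared tetrapeptides are counted or what cutoff is applied, so the multiset counting, the function names, and the `min_fraction` threshold below are all assumptions made for illustration.

```python
from collections import Counter


def tetrapeptides(seq, k=4):
    """Count every overlapping k-residue word in seq (k=4 gives tetrapeptides)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_tetrapeptides(a, b, k=4):
    """Number of k-residue words common to both sequences.

    Assumption: repeated words are counted with multiplicity
    (multiset intersection); the abstract does not specify this.
    """
    return sum((tetrapeptides(a, k) & tetrapeptides(b, k)).values())


def likely_same_protein(a, b, min_fraction=0.7, k=4):
    """Flag two entries as candidate duplicates.

    Hypothetical rule: the shared-word count must reach min_fraction
    of the word count of the shorter sequence. The threshold is an
    illustrative value, not one taken from the paper.
    """
    words_in_shorter = min(len(a), len(b)) - k + 1
    if words_in_shorter <= 0:
        return False
    return shared_tetrapeptides(a, b, k) / words_in_shorter >= min_fraction


if __name__ == "__main__":
    original = "MKTAYIAKQRQISFVKSHFSRQ"
    variant = "MKTAYIAKQREISFVKSHFSRQ"  # one residue changed (Q -> E)
    print(shared_tetrapeptides(original, original))
    print(shared_tetrapeptides(original, variant))
    print(likely_same_protein(original, variant))
```

A single substitution alters at most four overlapping tetrapeptides, so two entries differing by one amino acid still share nearly all of their words. This is why a count of this kind can pick up the one‐amino‐acid variants mentioned in the abstract without a full alignment.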