Open Access

Multiple sequence alignment -- the gateway to further analysis

Author(s) - Lisa Mullan

Publication year - 2002

Publication title - Briefings in Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.204

H-Index - 113

eISSN - 1477-4054

pISSN - 1467-5463

DOI - 10.1093/bib/3.3.303

Subject(s) - gateway (web page) , sequence (biology) , computer science , sequence analysis , world wide web , biology , genetics , gene

Whether the ultimate aim is a phylogenetic analysis of several orthologues, the identification of a pattern for a particular feature or motif, or the basis for structural modelling, multiple sequence alignments allow the researcher to gather more biological information than a single sequence can offer. Possibly the most popular method for comparing three or more sequences is the clustering algorithm used in applications such as the Clustal series of programs (ClustalW and ClustalX). It is by no means the only alignment method, but it will be used to illustrate this text. Initial clustering of sequence pairs reduces the computing time required to align multiple sequences, and this can be achieved using one of two methods.

Slow clustering is the more rigorous of the two options, but it becomes noticeably slower with roughly 20 or more sequences, or with fewer but longer sequences. It uses the Needleman–Wunsch dynamic programming method to align every pair of sequences according to a weight matrix and gap penalties. The program's aim is to achieve the highest possible score within the constraints imposed by these parameters.

Weight matrices have been derived from homologous sequences. They assign a score to each pair of residues or nucleotide bases, reflecting how likely one is to replace the other as a mutation. For protein sequences this has been done, using several different methods, for all 20 amino acid residues together with the three ambiguity codes (B = Asp or Asn, Z = Glu or Gln, X = any residue). Nucleotide matrices have also been developed; in general they give a positive score to an identical match and a zero or negative score to a mismatch.
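
The slow pairwise step can be illustrated with a minimal Python sketch of Needleman–Wunsch global alignment. It is a toy under stated assumptions, not Clustal's implementation: `score` is a hypothetical nucleotide weight matrix (+1 for an identical match, -1 for a mismatch), and a single linear gap penalty stands in for the separate gap-opening and gap-extension penalties that Clustal uses. A protein version would look each residue pair up in a substitution matrix instead.

```python
def score(x: str, y: str, match: int = 1, mismatch: int = -1) -> int:
    """Hypothetical nucleotide weight matrix: reward identity, penalise mismatch."""
    return match if x == y else mismatch


def needleman_wunsch(s: str, t: str, gap: int = -2) -> tuple[int, str, str]:
    """Global alignment of s and t with a linear gap penalty; returns (score, row1, row2)."""
    n, m = len(s), len(t)
    # f[i][j] holds the best score for aligning the prefixes s[:i] and t[:j]
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap  # s[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        f[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(
                f[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # pair the two bases
                f[i - 1][j] + gap,  # gap in t
                f[i][j - 1] + gap,  # gap in s
            )
    # Trace back from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + score(s[i - 1], t[j - 1]):
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and f[i][j] == f[i - 1][j] + gap:
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
    return f[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Aligning `ACGT` with `AGT` gives a score of 1 (three matches and one gap), with the gap placed opposite `C`. The dynamic programming table guarantees the highest score for the chosen matrix and gap penalty, which is why this route is rigorous but slow.
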
Because DNA has only four common bases, protein sequences provide more information for an alignment, so it often makes sense to translate coding regions of DNA into protein sequence before aligning them. Once every sequence pair has been aligned and scored, the sequences are clustered according to their relative scores, using the neighbour-joining method to link the closest pairs together and to place less similar sequences more remotely. This information is stored in a dendrogram file as a series of numerical distances arranged in nested brackets. The dendrogram is in no way representative of evolutionary distances and should not be presented as such. It merely represents the proximity of each sequence within a cluster, and of each cluster to the others, and is used to build the final alignment. The dendrogram file may be kept and reused to align other sets of sequences.

Larger numbers of sequences may be compared using a faster method to reduce computing time. It is based on the algorithm of Wilbur and Lipman and is quicker, but less accurate, than the dynamic programming used in the slow comparison. It involves definition of
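
The clustering step described earlier can be sketched with the textbook neighbour-joining algorithm of Saitou and Nei, writing the result as nested brackets like a dendrogram file. This is a minimal illustration under stated assumptions: the input is already a symmetric distance matrix (how Clustal converts pairwise scores into distances is not shown), ties are broken by taking the first pair found, the last three nodes are joined at one unrooted central node, and the function name `neighbour_joining` is hypothetical. As the text stresses, such a guide tree only describes clustering for the alignment and should not be read as an evolutionary tree.

```python
def neighbour_joining(names: list[str], dist: list[list[float]]) -> str:
    """Build an unrooted neighbour-joining tree and return it as nested brackets."""
    if len(names) < 3 or len(dist) != len(names):
        raise ValueError("need a square distance matrix for at least 3 sequences")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]  # work on a float copy
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node to all others
        # Choose the pair (i, j) minimising the Q-criterion (n-2)*d[i][j] - r[i] - r[j]
        best_q, bi, bj = None, -1, -1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best_q is None or q < best_q:
                    best_q, bi, bj = q, i, j
        # Branch lengths from the new internal node to the two joined nodes
        li = 0.5 * d[bi][bj] + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        joined = f"({nodes[bi]}:{li:.3f},{nodes[bj]}:{lj:.3f})"
        keep = [k for k in range(n) if k not in (bi, bj)]
        # Distance from the new node to each remaining node
        new_row = [0.5 * (d[bi][k] + d[bj][k] - d[bi][bj]) for k in keep]
        d = [[d[p][s] for s in keep] + [new_row[x]] for x, p in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Join the last three nodes at a single (unrooted) central node
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"


if __name__ == "__main__":
    # Small illustrative distance matrix (not real sequence data)
    names = ["a", "b", "c", "d", "e"]
    dist = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(neighbour_joining(names, dist))
    # (d:2.000,e:1.000,(c:4.000,(a:2.000,b:3.000):3.000):2.000);
```

With this toy matrix, `a` and `b` are the closest pairing and are joined first; `c` is then attached to that cluster, and `d` and `e` end up nearest the centre, mirroring how the closest pairings are linked together and less similar sequences more remotely.
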
