MultAlin help page


Table of contents:

Introduction

How to use MultAlin

Input formats

Output format

Other parameters


Introduction

Welcome to MultAlin!
This software aligns several biological sequences simultaneously.

What is a multiple sequence alignment? It is an arrangement of several protein or nucleic acid sequences, with postulated gaps, such that similar residues are juxtaposed. A positive score is attached to identities and to conservative or non-conservative substitutions (the size of the score measuring the similarity), and a penalty is attached to gaps. An ideal program would maximize the total score over all possible alignments, allowing gaps of any length at any position.
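To make the idea of a total score concrete, here is a minimal Python sketch that scores a given multiple alignment with the sum-of-pairs objective: every pair of sequences is compared column by column, residue pairs are scored with a comparison function, and each residue facing a gap costs a fixed penalty. Sum-of-pairs and the flat per-position gap cost are assumptions for illustration; this page does not say which objective MultAlin itself maximizes, and `sum_of_pairs` and `identity` are hypothetical names.

```python
from itertools import combinations

GAP = "-"


def sum_of_pairs(rows, score, gap_cost):
    """Sum-of-pairs score of a multiple alignment given as equal-length gapped rows.

    Assumption: each residue facing a gap costs gap_cost (no separate
    gap-opening cost); a gap facing a gap is ignored.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for a, b in combinations(rows, 2):      # every pair of sequences
        for x, y in zip(a, b):              # every column
            if x == GAP and y == GAP:
                continue                    # gap facing gap: not scored
            if x == GAP or y == GAP:
                total -= gap_cost           # residue facing a gap
            else:
                total += score(x, y)        # residue facing residue
    return total


def identity(x, y):
    """Hypothetical comparison function: 1 for identical letters, else 0."""
    return 1 if x.upper() == y.upper() else 0


print(sum_of_pairs(["AC-", "ACG", "A-G"], identity, gap_cost=1))  # → 1
```

Changing the alignment (where the gaps are placed) changes this total; an ideal program would search all placements for the maximum.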

Unfortunately, the computing requirements, in both time and memory, grow as the nth power of the sequence length, where n is the number of sequences, so this ideal alignment can be found only for two sequences or for three short sequences. In the general case, practicable programs must restrict the conditions of the optimization. Nevertheless, an automatic system for multiple sequence alignment is undeniably useful as a starting point for further manual analysis.
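The growth can be checked with a little arithmetic. An exact dynamic-programming alignment of n sequences fills one cell per combination of prefix lengths, and each cell considers 2^n - 1 predecessor cells (every non-empty subset of the sequences can advance by one residue). The Python sketch below (the function name `dp_cost` is hypothetical) prints these counts for sequences of 300 residues: three sequences already need about 2.7 × 10^7 cells, and ten need about 6 × 10^24.

```python
from math import prod


def dp_cost(lengths):
    """Size of an exact n-dimensional dynamic-programming alignment.

    Returns (cells, steps): one cell per combination of prefix lengths,
    and 2**n - 1 predecessor cells examined for each of them.
    """
    n = len(lengths)
    cells = prod(length + 1 for length in lengths)
    return cells, cells * (2 ** n - 1)


for n in (2, 3, 5, 10):
    cells, steps = dp_cost([300] * n)
    print(f"{n:2d} sequences of 300 residues: {cells:.2e} cells, {steps:.2e} steps")
```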

MultAlin creates a multiple sequence alignment from a group of related sequences using progressive pairwise alignments. The method is described in "Multiple sequence alignment with hierarchical clustering", F. Corpet, 1988, Nucl. Acids Res. 16, 10881-10890.
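Progressive methods first decide in which order sequences and clusters are merged. The Python sketch below shows one generic way to derive such an order: average-linkage agglomerative clustering on pairwise similarity scores, merging the two most similar clusters at each step. It illustrates hierarchical clustering under that assumption and is not a transcription of the exact criterion in Corpet's paper; `merge_order` and the similarity values are hypothetical.

```python
from itertools import combinations


def merge_order(names, similarity):
    """Average-linkage agglomerative clustering on pairwise similarities.

    similarity maps frozenset({a, b}) -> score (higher = more similar).
    Returns the merges in order; in a progressive scheme each merge is the
    point where two clusters are aligned with each other.
    """
    clusters = [(name,) for name in names]
    merges = []

    def linkage(c1, c2):
        # average similarity over all cross-cluster pairs of sequences
        total = sum(similarity[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # pick the most similar pair of clusters (first one on ties)
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical pairwise similarity scores for four sequences
pairs = [(("A", "B"), 90), (("C", "D"), 80), (("A", "C"), 10),
         (("A", "D"), 12), (("B", "C"), 11), (("B", "D"), 9)]
sim = {frozenset(p): s for p, s in pairs}
for left, right in merge_order(["A", "B", "C", "D"], sim):
    print(left, "+", right)
```

With these values, A and B are joined first, then C and D, and finally the two clusters.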


How to use MultAlin

Warning: no computer skills are required to use MultAlin, only biological knowledge!

On the MultAlin home page you will see a large text box. This is where you paste your sequences (try a sample sequence the first time).

The next step is to set the parameters. These require only biological knowledge, and help for each one is available by clicking the associated question mark. Simply use the pop-up menus or type in text or numbers where required. When you are ready, click the "submit data" button (either the one at the top or the one at the bottom of the page).

Now wait for our server to compute the alignment (this can take up to a few hours for very large sequences). The result is sent back to your web browser as a GIF image. You can then change the colours, font size, line size, etc., and even the consensus levels.

The procedure is the same as for the MultAlin setup: use the pop-up menus and type in text or numbers where required. When ready, click the "Apply Changes" button. The new image appears shortly afterwards (only the image is changed; no realignment is done).


Input formats

MultAlin-Fasta - GenBank - EMBL - SwissProt

MultAlin-Fasta

The MultAlin format is similar to Fasta.

	> SeqName the sequence name is the
        > first word of the first comment line 
        > max: 8 letters 
        > comment lines begin with >
        AAAACCGTTAAA...
        > SeqNam2 the 2nd sequence beginning  
        > shows the end of the first one 
        AAACCTGGAC...
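A minimal Python parser for the MultAlin-Fasta rules above, written as a sketch: the first word of the first comment line is taken as the name, further consecutive comment lines are skipped, and a comment line following sequence lines starts a new entry. The function name is hypothetical, and the parser does not enforce the 8-letter name limit.

```python
def parse_multalin_fasta(text):
    """Return a list of (name, sequence) pairs from MultAlin-Fasta text."""
    records = []
    name, parts = None, []
    in_comments = False              # True while reading a block of '>' lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if not in_comments:
                # first comment line after sequence data: a new entry starts
                if name is not None:
                    records.append((name, "".join(parts)))
                words = line[1:].split()
                name = words[0] if words else ""
                parts = []           # sequence lines before the first '>' are dropped
                in_comments = True
            # further consecutive '>' lines are comments and are ignored
        else:
            in_comments = False
            parts.append("".join(line.split()))   # drop internal spaces
    if name is not None:
        records.append((name, "".join(parts)))
    return records


example = """> SeqName the sequence name is the
> first word of the first comment line
AAAACCGT
TAAA
> SeqNam2 the 2nd sequence
AAACCTGGAC
"""
records = parse_multalin_fasta(example)
too_long = [name for name, _ in records if len(name) > 8]   # names over the limit
print(records)
```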

GenBank


	LOCUS      SeqName  
        any lines  
        ORIGIN     anything              
        1 aggtcccttt tgtgttgttt
The sequence name is the first word after the LOCUS keyword. The sequence begins on the line following the ORIGIN keyword. The next sequence entry begins with the next LOCUS keyword.
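The same rules for GenBank can be sketched in Python as follows: letters on the lines after ORIGIN are kept, and position numbers and spaces are dropped. Real GenBank entries also end with a `//` line, which this sketch treats as the end of the sequence data; that handling and the function name are assumptions, not a description of MultAlin's own reader.

```python
def parse_genbank(text):
    """Return (name, sequence) pairs from one or more GenBank entries."""
    records = []
    name, parts, in_sequence = None, [], False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key = fields[0]
        if key == "LOCUS":
            # a new entry: store the previous one, name = first word after LOCUS
            if name is not None:
                records.append((name, "".join(parts)))
            name = fields[1] if len(fields) > 1 else ""
            parts, in_sequence = [], False
        elif key == "ORIGIN":
            in_sequence = True           # sequence starts on the next line
        elif key == "//":
            in_sequence = False          # GenBank end-of-entry marker (assumption)
        elif in_sequence:
            # keep residue letters, drop position numbers and spaces
            parts.append("".join(ch for ch in line if ch.isalpha()))
    if name is not None:
        records.append((name, "".join(parts)))
    return records


example = """LOCUS       SEQ1   14 bp
DEFINITION  test.
ORIGIN
        1 aggtcccttt tgtg
//
LOCUS       SEQ2
ORIGIN
        1 acgt
//
"""
print(parse_genbank(example))
```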

EMBL - SwissProt


	ID   SeqName  
        any lines 
        SQ   anything  
        aauccagug gagaucaaag          
        any sequence lines  
        //
The sequence name is the first word after the ID keyword. The sequence begins on the line following the SQ keyword. The next sequence entry begins on the line following //.
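A Python sketch of the EMBL/SwissProt rules: the name is the first word after ID (a trailing `;`, as found in EMBL ID lines, is stripped here by assumption), sequence lines follow SQ, and `//` closes the entry. Digits and spaces on sequence lines, such as EMBL's trailing residue counts, are dropped. `parse_embl` is a hypothetical name.

```python
def parse_embl(text):
    """Return (name, sequence) pairs from EMBL or SwissProt flat-file text."""
    records = []
    name, parts, in_sequence = None, [], False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "//":
            # end of entry: the next entry starts on the following line
            if name is not None:
                records.append((name, "".join(parts)))
            name, parts, in_sequence = None, [], False
        elif in_sequence:
            # sequence lines: keep letters, drop spaces and residue counts
            parts.append("".join(ch for ch in line if ch.isalpha()))
        elif fields[0] == "ID":
            # name = first word after ID, without a trailing ';' (assumption)
            name = fields[1].rstrip(";") if len(fields) > 1 else ""
            parts = []
        elif fields[0] == "SQ":
            in_sequence = True           # sequence starts on the next line
    if name is not None:                 # last entry without a closing //
        records.append((name, "".join(parts)))
    return records


example = """ID   SEQ1  standard; RNA
DE   test entry
SQ   Sequence 19 BP;
     aauccagug gagaucaaag        19
//
ID   SEQ2;
SQ   Sequence 3 AA;
     MKV
//
"""
print(parse_embl(example))
```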


Output format

Option: Graphical results Yes (default)

The GIF image is configurable: you can change the comment text colour, the font size, the background colour, the high and low consensus colours and the neutral colour. You can also adjust the consensus levels.
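The consensus levels can be illustrated with a short Python sketch. It assumes a common convention: a column's most frequent residue is shown in upper case when its frequency reaches the high level, in lower case when it reaches only the low level, and as `.` otherwise. The 90% and 50% defaults, counting frequencies over all rows (gaps included in the total), and the function name are assumptions for illustration, not documented MultAlin settings.

```python
from collections import Counter


def consensus(rows, high=0.9, low=0.5):
    """Column-wise consensus line for equal-length aligned rows.

    Upper case: most frequent residue reaches the high level (assumption).
    Lower case: it reaches only the low level. '.': neither.
    """
    line = []
    for column in zip(*rows):
        # count residues only; gaps still count in the column total
        counts = Counter(ch.upper() for ch in column if ch not in "-.")
        if not counts:
            line.append(".")
            continue
        residue, n = counts.most_common(1)[0]
        freq = n / len(column)
        if freq >= high:
            line.append(residue.upper())
        elif freq >= low:
            line.append(residue.lower())
        else:
            line.append(".")
    return "".join(line)


print(consensus(["ACGT", "ACGA", "ACTA", "AGCA"]))  # → Acga
```

Raising the low level turns more lower-case positions into `.`, which is the effect of adjusting the consensus levels on the image.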

Below the image you will find the aligned sequence file, the input sequence file, the cluster file, the results in text format and the results in text format with colour indications. Any of these files can be saved to your local disk with your web browser. To convert the colour indications into actual colours, you can use Microsoft Word and the MultAlin macro (download multalin.dot by FTP and save it to disk) as follows:

Open your .doc file with Microsoft Word (File/Open).
Change the template (File/Models..., Link..., search the disk to select multalin.dot, Open).
Run the MultAlin macro (Tools/Macro..., select MultAlin, Run).

You can also add the MultAlin macro to your current template (Normal.dot):
Tools/Macro..., Organizer, Close File then Open File (on the same button), search the disk to select multalin.dot, Open, select MultAlin, Copy >> into Normal.dot, Close.

Option: Graphical results No

If you choose "Graphical results: no", you will not get the GIF image but only the text output file.


Other parameters

Symbol comparison table - Gap penalties - Gap penalty at extremities - One iteration only

Symbol comparison table

Blosum62 symbol comparison table

S. Henikoff and J.G. Henikoff, Amino acid substitution matrices from protein blocks, 1992, P.N.A.S. USA 89, 10915-10919.

This table is the original Blosum62 with a value of 4 added to each entry for it to be non-negative.
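Adding a constant to a table is simple, but it has a side effect worth knowing: an alignment with k aligned residue pairs gains k times the offset, so gaps become relatively more costly. The Python sketch below applies the offset to a small excerpt of BLOSUM62 (four entries only, not the full matrix); `shift_table` is a hypothetical name, and the same function applies to PAM250 with an offset of 8.

```python
def shift_table(table, offset):
    """Add a constant to every entry of a substitution table.

    Raises ValueError if some entry is still negative after the shift.
    """
    shifted = {pair: value + offset for pair, value in table.items()}
    if min(shifted.values()) < 0:
        raise ValueError("offset too small: table still has negative entries")
    return shifted


# A few BLOSUM62 entries (excerpt only, not the full matrix)
blosum62_excerpt = {("A", "A"): 4, ("A", "R"): -1, ("R", "R"): 5, ("W", "W"): 11}
print(shift_table(blosum62_excerpt, 4))
# → {('A', 'A'): 8, ('A', 'R'): 3, ('R', 'R'): 9, ('W', 'W'): 15}
```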


Dayhoff symbol comparison table

M.O. Dayhoff, R.M. Schwartz and B.C. Orcutt, Atlas of Protein Sequence and Structure, Ed. M.O. Dayhoff, National Biomedical Research Foundation (Washington D.C. 1979).

This table is Dayhoff's PAM250 with a value of 8 added to each entry for it to be non-negative.


Genetiq symbol comparison table

Each value is the maximum number of common bases between the codons of the two corresponding amino acids.
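One reading of this definition, sketched in Python: for two amino acids, compare every codon of the first with every codon of the second position by position, and keep the largest number of identical bases. The position-by-position comparison is an assumption about what "common bases" means; the codon table is the standard genetic code, and `genetiq` is a hypothetical name.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TCAG order; '*' = stop
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# amino acid -> list of its codons
CODONS = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)


def genetiq(aa1, aa2):
    """Max number of position-wise identical bases between codons of aa1 and aa2."""
    return max(sum(x == y for x, y in zip(c1, c2))
               for c1 in CODONS[aa1.upper()] for c2 in CODONS[aa2.upper()])


print(genetiq("M", "I"), genetiq("W", "P"))  # → 2 1
```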


Risler symbol comparison table

J.L. Risler, M.O. Delorme, H. Delacroix, A. Henaut, Journal of Molecular Biology, 204, 1019, 1988.


DNA symbol comparison table

This table scores a match for any overlap between IUB (International Union of Biochemistry) nucleic acid ambiguity symbols, except X/N, using the following symbols:
A or C = M; A or G = R; A or T = W; C or G = S; C or T = Y; G or T = K; A or C or G = V; A or C or T = H; A or G or T = D; C or G or T = B; A or C or G or T = X or N.
These codes are compatible with the codes used by the EMBL, GenBank and PIR data libraries and by the GCG package.
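A Python sketch of the overlap rule: each IUB symbol is mapped to the set of bases it stands for, and two symbols match when the sets intersect. Treating X and N as never matching is one reading of "except X/N", and mapping U to T is an added assumption for RNA input. The numeric match and mismatch values are not stated here, so the hypothetical `dna_match` function just returns True or False.

```python
# IUB nucleotide codes -> bases they stand for (U -> T is an assumption)
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "X": "ACGT", "N": "ACGT",
}


def dna_match(x, y):
    """True if two IUB symbols can denote a common base; X and N never match."""
    x, y = x.upper(), y.upper()
    if x in "XN" or y in "XN":
        return False
    return bool(set(IUB[x]) & set(IUB[y]))


print(dna_match("A", "R"), dna_match("C", "R"), dna_match("Y", "K"))
```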


Alternate DNA symbol comparison table
This table scores:

8 for a match
6 for a match with a two-base ambiguity symbol
4 for a match with a three-base ambiguity symbol
3 for a match with a four-base ambiguity symbol

where the ambiguity symbols are:

A or C = M; A or G = R; A or T = W; C or G = S; C or T = Y; G or T = K; A or C or G = V; A or C or T = H; A or G or T = D; C or G or T = B; A or C or G or T = X or N.
These codes are compatible with the codes used by the EMBL, GenBank and PIR data libraries and by the GCG package.
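The alternate table can be sketched in Python the same way. Two assumptions are labelled in the code: when the two symbols have different sizes, the larger ambiguity decides the score, and symbols with no base in common score 0, since no mismatch value is given. `alt_dna_score` is a hypothetical name.

```python
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "X": "ACGT", "N": "ACGT",
}
# number of bases a symbol stands for -> score for a match
SCORE_BY_SIZE = {1: 8, 2: 6, 3: 4, 4: 3}


def alt_dna_score(x, y, mismatch=0):
    """Score two IUB symbols with the 8/6/4/3 scheme."""
    bx, by = set(IUB[x.upper()]), set(IUB[y.upper()])
    if not bx & by:
        return mismatch  # assumption: mismatch value is not given
    # assumption: the larger ambiguity of the two symbols sets the score
    return SCORE_BY_SIZE[max(len(bx), len(by))]


print([alt_dna_score("A", s) for s in "ARVN"])  # → [8, 6, 4, 3]
```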


Identity symbol comparison table

This table scores 1 for a match and 0 for a mismatch between any two letters.


Gap penalties

This penalty is subtracted from the alignment score of two clusters each time a new gap is inserted in one cluster. The penalty is length dependent: it is the sum of the "penalty at gap opening" and the "penalty at gap extension" times the gap length. Both values must be non-negative; their maximum value is 255.

The similarity score is the sum of the values of the aligned residue pairs (each pair scored with the symbol comparison table) minus the gap penalties. The gap penalty is charged for every internal gap. By default, no penalty is charged for terminal gaps.
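The following Python sketch computes this similarity score for a pairwise alignment given as two gapped rows of equal length: table values are summed over columns where both rows have a residue, and each internal gap costs opening + extension × length. The function name and the example penalties (opening 3, extension 1) are hypothetical; MultAlin applies the same idea when aligning clusters rather than two single sequences.

```python
import re


def alignment_score(row_a, row_b, score, opening, extension, charge_terminal=False):
    """Similarity score of a pairwise alignment given as two gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have equal length")
    if not (0 <= opening <= 255 and 0 <= extension <= 255):
        raise ValueError("gap penalties must be between 0 and 255")
    # sum of table values over columns where both rows hold a residue
    total = sum(score(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-")
    # subtract opening + extension * length for each run of gaps
    for row in (row_a, row_b):
        for gap in re.finditer(r"-+", row):
            terminal = gap.start() == 0 or gap.end() == len(row)
            if terminal and not charge_terminal:
                continue                 # default: terminal gaps are free
            total -= opening + extension * (gap.end() - gap.start())
    return total


def identity(x, y):
    return 1 if x == y else 0


print(alignment_score("AAACGT", "A--CGT", identity, 3, 1))  # → -1 (internal gap)
print(alignment_score("AAACGT", "--ACGT", identity, 3, 1))  # → 4 (terminal gap free)
```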

An optimal alignment is one with the maximum possible score. It is sensitive to the symbol comparison values and to the gap penalties.

Gap penalty at extremities

By default no penalty is charged for terminal gaps. You can change this for particular alignments where terminal gaps must be treated like internal ones: choose "beginning" to charge gaps at the sequence beginning, "end" to charge gaps at the end, and "both" to charge all terminal gaps.
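Finding the alignment with the maximum score, including the terminal-gap options, can be sketched with Gotoh's three-matrix dynamic programming for penalties of the form opening + extension × length. This Python sketch handles two plain sequences only (not clusters) and returns the optimal score without the alignment itself; it is a standard technique chosen for illustration, not MultAlin's code, and `best_score` is a hypothetical name.

```python
NEG = float("-inf")


def best_score(a, b, score, opening, extension, charge_ends=None):
    """Optimal global score with gap penalty opening + extension * length.

    charge_ends: None (terminal gaps free), "beginning", "end" or "both".
    M/X/Y: best score of a prefix alignment ending with a residue pair,
    a gap in b, or a gap in a.
    """
    charge_begin = charge_ends in ("beginning", "both")
    charge_end = charge_ends in ("end", "both")
    m, n = len(a), len(b)
    M = [[NEG] * (n + 1) for _ in range(m + 1)]
    X = [[NEG] * (n + 1) for _ in range(m + 1)]
    Y = [[NEG] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0
    for i in range(1, m + 1):            # leading gap in b
        X[i][0] = -(opening + extension * i) if charge_begin else 0
    for j in range(1, n + 1):            # leading gap in a
        Y[0][j] = -(opening + extension * j) if charge_begin else 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = score(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - opening - extension,   # open gap in b
                          Y[i - 1][j] - opening - extension,
                          X[i - 1][j] - extension)             # extend gap in b
            Y[i][j] = max(M[i][j - 1] - opening - extension,   # open gap in a
                          X[i][j - 1] - opening - extension,
                          Y[i][j - 1] - extension)             # extend gap in a

    def best(i, j):
        return max(M[i][j], X[i][j], Y[i][j])

    if charge_end:
        return best(m, n)
    # free trailing gaps: the alignment may stop early in either sequence
    return max([best(i, n) for i in range(m + 1)] + [best(m, j) for j in range(n + 1)])


def identity(x, y):
    return 1 if x == y else 0


print(best_score("AAACGT", "ACGT", identity, 3, 1))          # → 4
print(best_score("AAACGT", "ACGT", identity, 3, 1, "both"))  # → -1
```

With free terminal gaps the extra leading "AA" costs nothing; with "both", the cheapest option is one gap of length 2, costing 3 + 2 = 5.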

One iteration only

With this option, the final alignment is obtained more quickly, but it may not be the best possible alignment.


Florence Corpet, MultAlin's author.
(Comments and suggestions very welcome)

Tim Downs was the original WWW interface programmer.