MView: Frequently Asked Questions

What is MView?
MView is a tool for converting the results of a sequence database search into the form of a coloured multiple alignment of hits stacked against the query. Alternatively, an existing multiple alignment can be processed. In either case, the output is simply HTML, so the result is platform independent and does not require a separate application or applet to be loaded. MView is NOT a multiple alignment program, nor is it a general purpose alignment editor.

If you use MView in your work, please cite:

Brown, N.P., Leroy C., Sander C. (1998). MView: A Web compatible database search or multiple alignment viewer. Bioinformatics. 14(4):380-381.

Examples
Some examples illustrate use of various command line options.

The EBI operates a web service for BLAST2 and FASTA3, which uses MView, as do GeneQuiz and the GeneQuiz sequence submission services.

What input formats are recognised?
The code has been tested for the following formats and versions for protein and nucleotide sequences:

BLAST (NCBI series 2.0)
format         tested versions             status        MView option
blastp         2.0.4, 2.0.5                ok            -in blast
blastn         2.0.4, 2.0.5                ok            -in blast
blastx         2.0.5                       ok            -in blast
tblastn        2.0.5                       ok            -in blast
tblastx        2.0.5                       ok            -in blast
psi-blast      2.0.2, 2.0.4, 2.0.5, 2.0.6  ok            -in blast

BLAST (NCBI series 1.4)
format         tested versions             status        MView option
blastp         1.4.7, 1.4.9                ok            -in blast
blastn         1.4.9                       ok            -in blast
blastx         1.4.9                       ok            -in blast
tblastn        1.4.9                       ok            -in blast
tblastx        1.4.9                       ok            -in blast

BLAST (WashU series 2.0)
format         tested versions             status        MView option
blastp         2.0a13, 2.0a19              ok            -in blast
blastn         2.0a19                      ok            -in blast
blastx         2.0a19                      ok            -in blast
tblastn        2.0a19                      ok            -in blast
tblastx        2.0a19                      ok            -in blast

FASTA (series 3)
format         tested versions             status        MView option
fasta3         3.0t76                      ok            -in fasta
tfastx3        3.0t82, 3.1t07              ok            -in fasta

FASTA (series 2)
format         tested versions             status        MView option
fasta          2.0u                        ok            -in fasta
tfastx         2.0u63                      ok            -in fasta

FASTA (series 1)
format         tested versions             status        MView option
fasta          1.6c24                      ok            -in fasta

Multiple alignment formats
format         versions                    status        MView option
plain          -                           ok            -in plain
Pearson/FASTA  -                           ok            -in pearson
PIR            -                           ok            -in pir
MSF            -                           ok            -in msf
CLUSTAL W      1.60, 1.70                  ok            -in clustal
MaxHom/HSSP    1.0 1991                    ok            -in hssp
MULTAS/MULTAL  -                           experimental  -in multas

The "plain" multiple alignment format is a trivial format comprising a column of identifiers and an adjacent column of aligned sequences. If you can convert some strange alignment to this you can always read it into MView. More formats can be expected to follow.
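As a rough illustration of how little is needed, the following Python sketch writes aligned sequences in this two-column layout. The function name to_plain and the sample rows are made up for the example, and the use of '-' as the gap character and of space padding between the columns are assumptions, not requirements stated here.

```python
def to_plain(alignment):
    """Format (identifier, aligned sequence) pairs as two columns of text."""
    # Pad identifiers to a common width so the sequence column lines up.
    width = max(len(name) for name, _ in alignment)
    lines = [f"{name.ljust(width)}  {seq}" for name, seq in alignment]
    return "\n".join(lines) + "\n"


# Hypothetical three-row alignment; '-' is assumed to mark a gap.
rows = [
    ("query", "MKV-LAAGIT"),
    ("hit1", "MKVQLAAG-T"),
    ("hit2_long", "MRV-LSAGIT"),
]
print(to_plain(rows), end="")
```

Saving the printed text to a file gives something that could then be tried with the -in plain option.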

Note that as of version 1.37 MView automatically selects an appropriate parser for the particular BLAST or FASTA program/version once it knows it is dealing with input from either program suite.

What output formats are there?
The default behaviour is to (re)produce a multiple alignment from the input data as plain ASCII. HTML markup will be added if any of the HTML-specific or colouring options are set. Other useful output formats are dumps of the input in Pearson/FASTA format (-out pearson), PIR format (-out pir), or MSF format (-out msf) for processing by another program, or as an RDB table for storage/manipulation in relational database form (-out rdb).

Historically, the default mode can also be explicitly selected (-out new) in contrast to an obsolete mode (-out old) which interdigitates extra rows into the alignment containing just the sequence identities as in a sequence comparison.

Can MView process data from a Web page?
Basically, no, unless you are lucky or prepared to edit the Web page. The MView parsers are all built to recognise the raw text output produced by the respective programs (BLAST, FASTA, etc.) or to recognise particular flat-file formats (MSF, PIR, etc.). When a site adds HTML markup to this to make a Web page, arbitrary parts are changed/deleted/added, polluting the text, so that even dumping the page in text-only format still leaves traces. If you feel that MView is sufficiently useful, you could ask the Web site maintainer to add an MView output option to their service as has been done for the EBI BLAST2 and FASTA3 services. After all, the program is stable, free, portable, and intended to be used as widely as possible.

Command line options
Probably the first place to start is to invoke MView with:
    mview -help

There are a lot of options, but the commonest ones are detailed here. The basic action of the program is to generate a plain text dump of the input data with percent sequence identities computed with respect to the first sequence in the output.

There are more command line options that I haven't documented below - some were added for locally used features. Expect changes and new options as the software evolves.

Basic usage
Given an existing alignment in a file "data" in "plain" format, the minimal use might be:
    mview -in plain data > data.out
Or you might attach MView on the end of a pipeline:
    some_process | mview -in plain > data.out

To change the input format to scan a FASTA run, also in "data", use:

    mview -in fasta data > data.out

Basic usage - adding HTML
To add some HTML markup a few extra options are needed, for example:
    mview -in fasta -html body data > data.html

produces a page of HTML wrapped inside <BODY> </BODY> tags with a coloured background, and you can load this into your Web browser with a URL like "file://your_path/data.html".

If you want a complete Web page, you can use -html full (gives MIME-type, <HTML>, <BODY> tags) or -html head (gives <HTML>, <BODY> tags).

To get just the alignment block without these tags use -html data.

Adding some colour is simple. To colour all the residues:

    mview -in fasta -html head -coloring any data > data.html

and this looks better in my Netscape if the residues are emboldened, so

    mview -in fasta -html head -coloring any -bold data > data.html

Now try colouring by identity to the first sequence:

    mview -in fasta -html head -coloring identity -bold data > data.html

and then make the non-identical residues and gaps grey, instead of black:

    mview -in fasta -html head -coloring identity -bold -symcolor gray \
               -gapcolor gray data > data.html

Now try using an internal style sheet to get blocked colouring. The -bold option is no longer needed:

    mview -in fasta -html head -css on -coloring identity -symcolor gray \
               -gapcolor gray data > data.html

The -in option isn't always necessary. If the filename extension, or the filename itself minus any directory path begins with or contains the first few letters of the valid -in options (eg., mydata.msf or mydata.fasta or tfastx_run1.dat), MView tries to choose a sensible input format, allowing multiple files in mixed formats to be supplied on the command line. The -in option will always override this mechanism but requires that all input files be of the same format.

Rulers
Add a ruler along the top, with -ruler on. Only one kind of ruler is currently provided, numbering the columns of the final alignment from M to N (incrementing) or N to M (decrementing) based on the input sequence numbering, if any. This defaults to 1 to the length of the alignment for multiple alignments. TBLASTX rulers differ slightly in that the native query numbering is given in nucleotide units, but MView reports amino acid units instead (using modulo 3 arithmetic).

Alignment colouring modes
There are several ways to colour the alignment:

-coloring any, will colour every residue according to the currently selected palette.

-coloring identity, will colour only those residues that are identical to some reference sequence (usually the query or first row).

-coloring consensus, will colour only those residues that belong to a specified physicochemical class that is conserved in at least a specified percentage of all rows for a given column. This defaults to 70% and may be set to another threshold, eg., -coloring consensus -threshold 80 would specify 80%. Note that the physicochemical classes in question can be confined to individual residues.

-coloring group, is like -coloring consensus, but colours residues by the colour of the class to which they belong.

By default, the consensus computation counts gap characters, so that sections of the alignment may be uncoloured where the presence of gaps prevents the non-gap count from reaching the threshold. Setting -con_gaps off prevents this, allowing sequence-only based consensus thresholding.

The default palette assumes the input alignment is of protein sequences and sets their colours according to amino acid physicochemical properties: another palette should be selected for DNA or RNA alignments.

Consensus colouring is complicated and some understanding of palettes and consensus patterns is required first before trying to explain alignment consensus colouring.

Colour palettes
Palettes have (arbitrary) names. MView assumes a protein alignment by default and uses the palette P1; the default palette for nucleotide alignments is D1, which is selected when the default molecule type is changed with the -dna option. Other palettes are explicitly selected using the -colormap option. For example, to select one of the built-in palettes for viewing nucleotide sequences, use -colormap D1.

The built-in palettes can be listed from the command line with -listcolors, and new colour schemes can be loaded from a file using -colorfile in exactly the same format as produced by -listcolors. Palette names are case-insensitive, while symbols to be coloured are case-sensitive. Lines can contain comments beginning with a hash '#' character. Colours are specified as hexadecimal RGB codes prefixed with hash '#', exactly as used in HTML markup (named colours may not be supported equally by all browsers). Here are the default palettes:


#symbol ->/=> colour (RGB hex or colorname)     #comment

#amino acids
[P1]
Gg   =>   #33cc00  #bright green      
Aa   =>   #33cc00  #bright green      
Ii   =>   #33cc00  #bright green      
Vv   =>   #33cc00  #bright green      
Ll   =>   #33cc00  #bright green      
Mm   =>   #33cc00  #bright green      
Ff   =>   #009900  #dark green        
Yy   =>   #009900  #dark green        
Ww   =>   #009900  #dark green        
Hh   =>   #009900  #dark green        
Cc   =>   #ffff00  #yellow            
Pp   =>   #33cc00  #bright green      
Kk   =>   #cc0000  #bright red        
Rr   =>   #cc0000  #bright red        
Dd   =>   #0033ff  #bright blue       
Ee   =>   #0033ff  #bright blue       
Qq   =>   #6600cc  #purple            
Nn   =>   #6600cc  #purple            
Ss   =>   #0099ff  #dull blue         
Tt   =>   #0099ff  #dull blue         
Bb   =>   #666666  #dark grey (D or N)
Zz   =>   #666666  #dark grey (E or Q)
Xx   =>   #666666  #dark grey         
?    =>   #999999  #light grey        
*    =>   #666666  #dark grey

#DNA/RNA
#symbol ->/=> colour (RGB hex or colorname)     #comment
[D1]
Aa   =>   #0033ff  #bright blue
Gg   =>   #0033ff  #bright blue
Tt   =>   #0099ff  #dull blue
Cc   =>   #0099ff  #dull blue
Uu   =>   #0099ff  #dull blue
Mm   =>   #666666  #dark grey (A or C)  
Rr   =>   #666666  #dark grey (A or G)
Ww   =>   #666666  #dark grey (A or T)
Ss   =>   #666666  #dark grey (C or G)
Yy   =>   #666666  #dark grey (C or T)
Kk   =>   #666666  #dark grey (G or T)
Vv   =>   #666666  #dark grey (A or C or G; not T)
Hh   =>   #666666  #dark grey (A or C or T; not G)
Dd   =>   #666666  #dark grey (A or G or T; not C)
Bb   =>   #666666  #dark grey (C or G or T; not A)
Nn   =>   #666666  #dark grey (A or C or G or T)
Xx   =>   #666666  #dark grey
?    =>   #999999  #light grey
*    =>   #666666  #dark grey

In these examples, both lower and uppercase versions of each residue are given with their associated colour to ensure that either case is coloured the same.

The arrow separating the symbols from the colour codes can be double => or single ->. When style sheets have been selected -css on, a double arrow means that the colour should be applied to the background of the symbol while a single arrow means that only the letter should be coloured. When Style Sheets are off, only letters can be coloured anyway and the arrows are equivalent.
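To make the listing syntax concrete, here is a small Python sketch that reads text in the format shown above. It is not MView's own parser: the function name parse_palettes, the regular expressions and the 'background'/'foreground' labels are invented for illustration, and the handling of whole-line comments and named colours is an assumption based only on the description here.

```python
import re

# Hypothetical reader for the palette listing format; not MView's own code.
SECTION = re.compile(r"^\[(\w+)\]$")
ENTRY = re.compile(
    r"^(\S+)\s*(=>|->)\s*(#[0-9A-Fa-f]{6}|[A-Za-z]+)\s*(?:#.*)?$"
)


def parse_palettes(text):
    """Return {PALETTE: {symbol: (colour, 'background' or 'foreground')}}."""
    palettes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or whole-line comment
        section = SECTION.match(line)
        if section:
            # Palette names are case-insensitive, so store them upper-cased.
            current = palettes.setdefault(section.group(1).upper(), {})
            continue
        entry = ENTRY.match(line)
        if entry is None or current is None:
            raise ValueError(f"cannot parse line: {raw!r}")
        symbols, arrow, colour = entry.groups()
        # '=>' colours the background when style sheets are on, '->' the letter.
        target = "background" if arrow == "=>" else "foreground"
        for symbol in symbols:  # symbols are case-sensitive
            current[symbol] = (colour.lower(), target)
    return palettes


sample = """
#symbol ->/=> colour     #comment
[d1]
Aa   =>   #0033ff  #bright blue
Tt   ->   #0099FF  #dull blue
?    =>   #999999  #light grey
"""
palette = parse_palettes(sample)["D1"]
print(palette["a"], palette["T"], palette["?"])
```

Each symbol on the left of the arrow becomes its own entry, which is why both cases of a residue have to be listed if both are to be coloured.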

Consensus patterns
A block of consensus lines can be added beneath the alignment using -consensus on. By default, this adds 4 extra lines giving consensus patterns computed at thresholds of 100,90,80,70%.

Consensus patterns are based on residue equivalence classes, that is, sets of residues that share some physicochemical property. There are two default consensus group definitions for protein P1 and nucleotide D1 alignments, the latter being selected with the -dna option.

At a given percentage threshold, the most discriminating equivalence class is chosen to represent the residues in a given column and an associated symbol is displayed. For example, the default protein and nucleotide consensus groups define the following symbols and equivalence class mappings:


#description     =>  symbol  members
[P1]
*                =>  .     
A                =>  A       { A }
C                =>  C       { C }
D                =>  D       { D }
E                =>  E       { E }
F                =>  F       { F }
G                =>  G       { G }
H                =>  H       { H }
I                =>  I       { I }
K                =>  K       { K }
L                =>  L       { L }
M                =>  M       { M }
N                =>  N       { N }
P                =>  P       { P }
Q                =>  Q       { Q }
R                =>  R       { R }
S                =>  S       { S }
T                =>  T       { T }
V                =>  V       { V }
W                =>  W       { W }
Y                =>  Y       { Y }
alcohol          =>  o       { S, T }
aliphatic        =>  l       { I, L, V }
aromatic         =>  a       { F, H, W, Y }
charged          =>  c       { D, E, H, K, R }
hydrophobic      =>  h       { A, C, F, G, H, I, K, L, M, R, T, V, W, Y }
negative         =>  -       { D, E }
polar            =>  p       { C, D, E, H, K, N, Q, R, S, T }
positive         =>  +       { H, K, R }
small            =>  s       { A, C, D, G, N, P, S, T, V }
tiny             =>  u       { A, G, S }
turnlike         =>  t       { A, C, D, E, G, H, K, N, Q, R, S, T }

#description     =>  symbol  members
[D1]
A
G
C
T
U
purine      =>   r   { A, G }
pyrimidine  =>   y   { C, T, U }

Alternative equivalence classes can be selected using -con_groupmap, the available list of built-ins can be seen with -listgroups, and new groups can be defined in the same format and read in from a file using -groupfile.

Alternative thresholds to be displayed can be specified as a comma-separated list using the -con_threshold option.

Tip: A useful capability is to control whether only consensus properties (-con_ignore singleton) or just the conserved residues themselves (-con_ignore class) are displayed in consensus lines. The default is to show both using whichever equivalence class is most specific.

By default, the consensus computation counts gap characters, so that sections of the alignment may have gaps as the consensus. Setting -con_gaps off prevents this, producing consensus patterns based only on sequence.
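The idea of picking one symbol per column at each threshold can be sketched in Python as follows. This is only an illustration under stated assumptions, not MView's algorithm: "most discriminating" is taken to mean the class with the fewest members, ties are broken by symbol order, only a subset of the P1 classes is included, gaps are simply counted in (or left out of) the denominator, and no attempt is made to reproduce how a gap-dominated column is displayed. The names column_consensus and consensus_lines are made up.

```python
# Subset of the default P1 classes listed above: symbol -> members.
CLASSES = {
    "o": set("ST"),         # alcohol
    "l": set("ILV"),        # aliphatic
    "a": set("FHWY"),       # aromatic
    "-": set("DE"),         # negative
    "+": set("HKR"),        # positive
    "c": set("DEHKR"),      # charged
    "u": set("AGS"),        # tiny
    "s": set("ACDGNPSTV"),  # small
}


def column_consensus(column, threshold, count_gaps=True, gap="-"):
    """Return the symbol of the smallest class covering >= threshold % of rows."""
    residues = [r.upper() for r in column if r != gap]
    total = len(column) if count_gaps else len(residues)
    if total == 0:
        return "."
    # Single residues are treated as classes of size one.
    candidates = [(1, r, {r}) for r in set(residues)]
    candidates += [(len(m), s, m) for s, m in CLASSES.items()]
    # Assumption: "most discriminating" means fewest members.
    for _, symbol, members in sorted(candidates, key=lambda c: (c[0], c[1])):
        hits = sum(1 for r in residues if r in members)
        if 100.0 * hits / total >= threshold:
            return symbol
    return "."  # nothing conserved: the catch-all symbol in the listing


def consensus_lines(rows, thresholds=(100, 90, 80, 70)):
    """Build one consensus string per threshold from equal-length rows."""
    columns = list(zip(*rows))
    return {
        t: "".join(column_consensus(c, t) for c in columns) for t in thresholds
    }


rows = ["IDA", "IDW", "IE-", "LKG"]
for t, line in consensus_lines(rows).items():
    print(f"{t:3d}% {line}")
```

In this toy alignment the first column is reported as aliphatic (l) at the stricter thresholds and as the residue I at 70%, which shows how lowering the threshold lets a more specific class win.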

You can specify a colour scheme for the consensus lines using -con_coloring and -con_colormap to change the default palette (PC1 for protein or DC1 for nucleotide). These options are analogous to those for controlling the alignment colouring and follow the same naming scheme.

Alignment consensus colouring
This section assumes an understanding of palettes and consensus patterns.

Colouring of an alignment by consensus determines which residues to colour and the colours to use based on (1) the consensus threshold chosen for the colouring operation (covered in the section on alignment colouring modes), (2) a consideration of the common physicochemical properties of the residues in that column, and (3) the chosen colour scheme:

Given the most specific equivalence class describing the column using the prevailing consensus equivalence classes, any residues in the column belonging to that class will be coloured using the prevailing palette.

In practice, for the default situation of a protein alignment with no special selection of palettes or consensus groups on the command line, the P1 equivalence classes and the P1 colour palette will be used (or the D1 classes and D1 palette if the -dna option is given).

Tip: If you want to see only the conserved residues above the threshold (ie., only one type of conserved residue per column), add the option -ignore class.

Alternative consensus classes and palettes can be specified using -groupmap and -colormap. Note that these are distinct from any settings used to control displayed consensus lines, although the option naming is similar.

Sequence numbering or ranking
One can colour and compute identities with respect to a sequence other than the first/query sequence using the -reference option. This takes either the sequence identifier or an integer argument corresponding to the ranking or ordering of a sequence. For multiple alignment input formats, sequences are numbered from 1, while for searches the hits are numbered from 1, but the query itself is 0, so beware.

Labelling and pagination
The labelling information can be too broad, so you can switch some off. Labels at the left of the alignment are in blocks numbered from zero: (0) rank, (1) identifier, (2) description, (3) score block, (4) percent identities. Each of these can be switched off with an option like -label2 to remove descriptions.

The default layout is a single unbroken horizontal band of alignment - fine if scrolling inside Netscape. However, you may prefer to break the alignment into vertically stacked panes. For panes that are, for example, 80 columns wide, set -width 80. Widths refer to the alignment, not to the descriptor information at left.

It is possible to narrow (or expand!) the displayed sequence range, for example, -range 10:78 would select only that column range of the alignment using the numbering scheme reported when -ruler on is set (see Rulers). The order of the numbers is unimportant making it simpler to state interest in a region of the alignment that might actually be reversed in the output (eg., a BLASTN search hit matching the reverse complement of the query strand).

Filtering alignments
Usually, specifying a limited number of hits to view from a long search alignment speeds things up a lot as there's less parsing and less formatting to be generated, so to get the best 10 hits, use the option -top 10.

You can also squeeze more out of a deep alignment and get a less biased view if a threshold on the pairwise sequence identity is set using -maxident N, where N is some value between 0 and 100.

Other filters specific to BLASTP, FASTA, etc., input formats allow cutoffs on scores or p-values, etc. In particular, it is possible to apply some control over the selection of HSPs used in building the MView alignment using the -hsp filtering option.

Of interest to anyone using PSI-BLAST, you can display alignments for any/all iterations of a psi-blast run using, say:

    mview -in blast -cycle 1,5,10,20 mydata  
to get just those iterations. The default is to display only the last iteration. If you want all output, use -cycle '*'.

If you want to apply filtering, yet wish to force some sequence or sequences to remain, you can do this with the -keep option, which requires a comma separated list of identifiers or numbers or number ranges (see above for an explanation of sequence rank numbers) as its argument.

Similarly, if having examined the output, you wish to discard selected rows, use the -disc option with a comma separated list of row numbers and number ranges (eg., -disc 6,7,30-35,42).

The -disc option overrides -keep whenever a row id or number is common to both. The reference row is always kept, so if you explicitly attempt to discard it, you must select an alternative row to act as the reference sequence.
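The interplay of filtering, -keep and -disc can be pictured with a short Python sketch. It handles row numbers and number ranges only (not identifiers), the function names parse_rows and select_rows are hypothetical, and the set arithmetic is an interpretation of the rules described above, not a copy of MView's code.

```python
def parse_rows(spec):
    """Expand a list such as '6,7,30-35,42' into a set of row numbers."""
    rows = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            first, last = (int(n) for n in item.split("-", 1))
            low, high = sorted((first, last))
            rows.update(range(low, high + 1))
        else:
            rows.add(int(item))
    return rows


def select_rows(passed_filter, keep="", disc="", reference=0):
    """Rows shown: (filtered rows + kept rows) - discarded rows + reference."""
    selected = (set(passed_filter) | parse_rows(keep)) - parse_rows(disc)
    selected.add(reference)  # the reference row is always kept
    return sorted(selected)


# Row 6 is in both lists, so the discard wins; row 40 is forced to stay.
print(select_rows([1, 2, 3, 6, 7], keep="40,6", disc="6,7,30-35,42"))
```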

Another control option can be used to prevent MView from using rows for colouring or for calculation of percent identities although these rows will still be displayed. Use -nop to specify a list (comma separated as usual) of id's or row numbers to flag for 'no processing'. This is useful for displaying non-alignment data (eg., secondary structure predictions) alongside the alignment.

SRS (Sequence Retrieval System) links
If HTML markup is produced it is possible to embed SRS links in sequence identifiers if they conform to the following patterns:
    database|accession|identifier 
    database:identifier
as produced by some BLAST and FASTA servers. Such links will be to the EBI and EMBL SRS services and will only be constructed if the database names are listed in the SRS.pm library with this software. This library can be modified for your site if you know some Perl and a little SRS syntax.
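The two identifier patterns can be recognised with a couple of regular expressions, sketched here in Python. The database names, accession and identifiers in the example are made up, the known_databases set merely stands in for whatever SRS.pm lists, and the case-insensitive matching of database names is an assumption; the sketch deliberately stops short of building any link.

```python
import re

# database|accession|identifier
PIPE = re.compile(
    r"^(?P<database>[^|:\s]+)\|(?P<accession>[^|\s]*)\|(?P<identifier>[^|\s]+)$"
)
# database:identifier
COLON = re.compile(r"^(?P<database>[^|:\s]+):(?P<identifier>[^|:\s]+)$")


def split_identifier(label, known_databases):
    """Return the parts of a recognised identifier, or None if no link applies."""
    for pattern in (PIPE, COLON):
        match = pattern.match(label)
        # Only databases on the (hypothetical) known list qualify for a link.
        if match and match.group("database").lower() in known_databases:
            return match.groupdict()
    return None


known = {"swiss", "embl"}  # hypothetical database names
print(split_identifier("swiss|P00001|EXAMPLE_ID", known))
print(split_identifier("embl:EX000001", known))
print(split_identifier("unknown:EX000001", known))
```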

Using Cascading Style Sheets

Release 1.40 added cascading style sheets allowing more specific control of HTML elements. In particular, this enables selective colouring of text fore/backgrounds allowing alignments to use coloured blocks instead of just coloured lettering.

This is enabled with the -css on option in combination with the -html option to switch HTML processing on generally. It is disabled with -css off. You can refer to an external style file with -css URL where the URL gives a valid path for the Web server to find the file (ie., file:/some/path or http://server/path).

Having loaded your own colour schemes into MView with the -colorfile option, you can dump these as a style file with -html css which just dumps the style sheet to standard output for redirection to a file.

Controlling coloured fore/backgrounds for alignment lettering is handled in the colour scheme definition mechanism.

How can I print?
With difficulty from Netscape. First, my UNIX Netscape (4.03) won't produce colour postscript, but a Mac version does. On the Mac one can zoom and preview the image. To produce something that fits on A4 one must set -width 60 or similar and turn off some of the leading text, eg., -label2 -label3.

What do the percent identities mean?
Percent identities reported in each alignment row are calculated with respect to the reference sequence (usually the query or first row), as follows:
            number of identical residues
           ------------------------------  x 100
            length of ungapped reference
            sequence over aligned region

For MView output generated from BLAST results, minor deviations from the percentages reported by BLAST are due to (1) different rounding, and (2) the way MView assembles a single pseudo-sequence for a hit composed of multiple HSPs, giving an averaged percent identity.
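The formula can be worked through with a few lines of Python. This is a sketch under assumptions, not MView's implementation: the "aligned region" is taken to run from the first to the last non-gap column of the row being scored, '-' is assumed to be the gap character, comparison ignores case, and the name percent_identity is invented.

```python
def percent_identity(reference, row, gap="-"):
    """Identical residues / ungapped reference length over the aligned region."""
    aligned = [i for i, symbol in enumerate(row) if symbol != gap]
    if not aligned:
        return 0.0
    # Assumed aligned region: first to last non-gap column of the row.
    start, end = aligned[0], aligned[-1] + 1
    ref_region = reference[start:end]
    row_region = row[start:end]
    ref_length = sum(1 for symbol in ref_region if symbol != gap)
    if ref_length == 0:
        return 0.0
    identical = sum(
        1
        for a, b in zip(ref_region, row_region)
        if a != gap and a.upper() == b.upper()
    )
    return 100.0 * identical / ref_length


# Aligned region covers 5 reference residues (DEFGH); 4 of them match.
print(percent_identity("ACDEFGHIKL", "--DEYGH---"))
```

Here the row matches D, E, G and H out of the five reference residues it spans, giving 4/5 x 100 = 80%.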

Can I switch off HTML markup?
Yes - the program defaults to plain ASCII output unless the -html option is set. Also, selecting the relational RDB output format will switch off HTML.

Some sequences are incomplete or look corrupted?
There are three kinds of input to MView: (i) a preformatted multiple alignment, (ii) an ungapped search (eg., BLAST 1.4.x), or (iii) a gapped search (eg., FASTA, BLAST2). Multiple alignments require minimal parsing and are subjected only to formatting stages. Searches are processed according to whether they are ungapped or gapped, and this can lead to some apparent inconsistencies in the output (discussed below).

Why is the BLAST query sequence incomplete?
The query sequence is recovered from the input search results. If, as in BLAST, portions of the query were unmatched, they will not appear in the output. Nevertheless, MView will pad the missing sequence with 'X' characters based on the numeric match ranges - the worst that can happen then is that the trailing end (ie., the C-terminus) of the alignment is missing. Occasionally, you may see a '?' character - this means that a non-standard residue was seen on input.

How are overlapping BLAST HSPs processed?
Ungapped input is processed to produce a stack of hit sequence strings aligned against a contiguous query sequence. The query sequence acts as a template for each hit sequence onto which hit fragments are overlaid in the query positions.

In outline the default method of processing of HSPs is as follows:

For BLAST (series 1), as of MView version 1.37, only the HSPs contributing to the ranked hit contribute to this overlay process. A sorting scheme ensures that the best of these fragments are overlaid last and are not obscured by weaker ones, for example, BLAST hits are sorted by score and length. Differences of ordering of fragments along query and hit naturally result in a patchwork that may not correspond exactly to the real hit sequences. Nevertheless, the resulting alignment stack is very informative, and the user can always run and view a gapped search if that is preferred.

For BLAST (series 2), only the single gapped hit reported in the ranking is used and the patchwork problem does not arise.
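The overlay idea for ungapped fragments can be sketched in Python as below. It is a toy model only: the function name tile_fragments, the (score, query start, text) fragment layout, the 1-based query positions and the '-' fill character are assumptions made for the example, and the real rules for HSP selection are not reproduced.

```python
def tile_fragments(query_length, fragments, fill="-"):
    """Overlay (score, query_start, text) fragments onto a query-length row."""
    row = [fill] * query_length
    # Weakest first, so stronger (then longer) fragments are laid down last
    # and overwrite anything they overlap.
    ordered = sorted(fragments, key=lambda f: (f[0], len(f[2])))
    for _score, start, text in ordered:
        for offset, symbol in enumerate(text):
            position = start - 1 + offset  # query_start is taken as 1-based
            if 0 <= position < query_length:
                row[position] = symbol
    return "".join(row)


# Hypothetical fragments of one hit; the score-90 fragment wins the overlap.
fragments = [(50, 1, "ACDEF"), (90, 4, "XYZ"), (30, 7, "QQQ")]
print(tile_fragments(10, fragments))
```

The overlap between the first two fragments is resolved in favour of the higher score, which is exactly how a patchwork that matches no single real stretch of the hit can arise.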

More detailed descriptions of the rules for HSP selection and tiling are available. Some control over the choice of HSPs is available through the -hsp mode option described therein, which allows (a) only ranked HSPs (the default) to be tiled, (b) all HSPs to be tiled, or (c) all HSPs to be extracted separately.

What happens to sequence gaps?
Gapped input (eg., FASTA) is stacked in a slightly different way. The query sequence again acts as a template, but gapped regions introduced into the query are excised from both query and hit to ensure a contiguous query string. In the affected hit sequence, the position of the excision is marked by lowercasing the boundary symbols. Again, the stacked alignments produced are very informative since only unmatched regions of hits are lost from the display.
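A minimal Python sketch of this excision step follows. It assumes '-' marks a gap and that the "boundary symbols" are the hit symbols immediately either side of each cut; the function name excise_query_gaps is invented, and this is an illustration of the description above, not MView's code.

```python
def excise_query_gaps(query, hit, gap="-"):
    """Drop columns where the query is gapped; lowercase the hit at each cut."""
    kept_query = []
    kept_hit = []
    pending_cut = False  # True while we are inside an excised region
    for q, h in zip(query, hit):
        if q == gap:
            # First excised column: mark the hit symbol just before the cut.
            if not pending_cut and kept_hit:
                kept_hit[-1] = kept_hit[-1].lower()
            pending_cut = True
            continue
        kept_query.append(q)
        # First kept column after a cut: mark the hit symbol just after it.
        kept_hit.append(h.lower() if pending_cut else h)
        pending_cut = False
    return "".join(kept_query), "".join(kept_hit)


# The hit has two extra residues (WW) relative to the query.
print(excise_query_gaps("ACD--EFG", "ACDWWEFG"))
```

The query comes back contiguous, and the lowercase d and e in the hit show where the two inserted residues were removed.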

System requirements
MView and its underlying class libraries are implemented in Perl, version 5, for UNIX, and should be easily portable to other systems.

As of MView release 1.40, the code requires a minimum of perl version 5.004 and has been tested with 5.004_03/04 and 5.005_02. However, if you only have perl 5.003, you can run older versions of MView, also available from the ftp site.

Formatting and colouring of HTML alignments requires a fixed-width font (eg., Courier) and support for the <FONT> tag, so a recent version of a browser such as Netscape is recommended. In particular, use of style sheets as of MView release 1.40 requires that your browser supports HTML 4.0.

Obtaining the software
The latest version of the software is available free as a UNIX gzipped tar archive. Be aware of the Copyright restrictions.

Installation
Installation on UNIX is easy, but assumes you already have Perl5.

Save the archive to your software area, eg., /usr/local.

  1. Gunzip it and extract it through tar, eg.,
        gunzip < mview-1.24.tar.gz | tar xvof -
    This would create the subdirectory mview-1.24.
  2. Change to this directory and load bin/mview into an editor.
  3. Set a perl interpreter path valid for your machine after the '#!' magic number.
  4. Change the "use lib 'some stuff';" line to, in our example,
        use lib '/usr/local/mview-1.24/lib';
  5. Finally, copy mview to somewhere on your PATH and rehash or login again.
Ask your system manager or a Perl guru for help if this looks weird.

Contacts and acknowledgments
Nigel P. Brown
National Institute for Medical Research,
The Ridgeway, Mill Hill,
London NW7 1AA, U.K.
Tel: +44 (0)181 959 3666
Fax: +44 (0)181 913 8545
Email: nbrown@nimr.mrc.ac.uk

People who have contributed include C. Leroy (prototype FASTA parser, BLAST2 (WashU) modifications to BLAST 1.4 parser, prototype PSI-BLAST parser). Useful suggestions came from R. Lopez, and my former colleagues in the defunct Sander group.

This project is unrelated to the Bioperl project, but probably should be...

Copyright and Licensing information
MView and associated libraries are free software, subject to restrictions concerning commercial use. All users must adhere to the licensing terms, acceptance of which is implicit when the software is downloaded.

Maintained by Nigel Brown. Last update Mar 30 1999.