| MUMMER(1) | General Commands Manual | MUMMER(1) | 
NAME
mummer - package for sequence alignment of multiple genomes
SYNOPSIS
  mummer-annotate <gapfile> <datafile>

  combineMUMs <RefSequence> <MatchSequences> <GapsFile>

  dnadiff [options] <reference> <query>
  dnadiff [options] -d <deltafile>

  exact-tandems <file> <min-match-len>

  gaps

  mapview [options] <coordsfile> [UTRcoords] [CDScoords]

  mgaps [-d <DiagDiff>] [-f <DiagFactor>] [-l <MatchLen>] [-s <MaxSeparation>]

  mummer [options] <reference-file> <query-files>

  mummerplot [options] <matchfile>

  nucmer [options] <Reference> <Query>

  nucmer2xfig

  promer [options] <Reference> <Query>

  repeat-match [options] <genome-file>

  run-mummer1 <fastareference> <fastaquery> <prefix> [-r]

  run-mummer3 <fastareference> <multi-fastaquery> <prefix>

  show-aligns [options] <deltafile> <refID> <qryID>
Input is the .delta output of either the "nucmer" or the "promer" program passed on the command line.
Output is to stdout, and consists of all the alignments between the query and reference sequences identified on the command line.
NOTE: No sorting is done by default, so the alignments are
    ordered as found in the <deltafile> input.
  
  show-coords [options] <deltafile>

  show-snps [options] <deltafile>

  show-tiling [options] <deltafile>
DESCRIPTION
OPTIONS
All tools (except for gaps) accept the -h, --help, -V and
    --version options as one would expect. The built-in help is thorough and
    is generally more complete than this man page.
  
  combineMUMs Combines MUMs in <GapsFile> by extending matches off
    ends and between MUMs. <RefSequence> is a fasta file of the reference
    sequence. <MatchSequences> is a multi-fasta file of the sequences
    matched against the reference.

   -D       Only output the difference positions and characters to stdout
   -n       Allow matches only between nucleotides, i.e., ACGTs
   -N num   Break matches at <num> or more consecutive non-ACGTs
   -q tag   Used to label query match
   -r tag   Used to label reference match
   -S       Output all differences in strings
   -t       Label query matches with query fasta header
   -v num   Set verbose level for extra output
   -W file  Set the output filename (default witherrors.gaps)
   -x       Don't output .cover files
   -e       Set error-rate cutoff to e (e.g. 0.02 is two percent)
  
  dnadiff Run comparative analysis of two sequence sets using nucmer and
    its associated utilities with recommended parameters. See MUMmer
    documentation for a more detailed description of the output. Produces the
    following output files:

   .report   - Summary of alignments, differences and SNPs
   .delta    - Standard nucmer alignment output
   .1delta   - 1-to-1 alignment from delta-filter -1
   .mdelta   - M-to-M alignment from delta-filter -m
   .1coords  - 1-to-1 coordinates from show-coords -THrcl .1delta
   .mcoords  - M-to-M coordinates from show-coords -THrcl .mdelta
   .snps     - SNPs from show-snps -rlTHC .1delta
   .rdiff    - Classified ref breakpoints from show-diff -rH .mdelta
   .qdiff    - Classified qry breakpoints from show-diff -qH .mdelta
   .unref    - Unaligned reference IDs and lengths (if applicable)
   .unqry    - Unaligned query IDs and lengths (if applicable)
MANDATORY:
   reference      Set the input reference multi-FASTA filename
   query          Set the input query multi-FASTA filename
   or
   delta file     Unfiltered .delta alignment file from nucmer
OPTIONS:
   -d|delta       Provide precomputed delta file for analysis
   -h, --help     Display help information and exit
   -p|prefix      Set the prefix of the output files (default "out")
   -V, --version  Display the version information and exit
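
The output files listed for dnadiff correspond to a short chain of post-processing commands. The following Python sketch only assembles those command lines from the file list; it does not run anything. The helper name `dnadiff_steps` is hypothetical, the plain `nucmer -p` call is an assumption (this page does not spell out the "recommended parameters"), and redirecting the stdout of delta-filter and show-diff into the named files is likewise assumed. The .report, .unref and .unqry files are not covered.

```python
def dnadiff_steps(reference, query, prefix="out"):
    """Return (argv, stdout_target) pairs mirroring the dnadiff file list.

    stdout_target is None when the tool writes its own output file.
    """
    delta = prefix + ".delta"
    one_to_one = prefix + ".1delta"
    many_to_many = prefix + ".mdelta"
    return [
        # nucmer writes <prefix>.delta itself (assumed invocation)
        (["nucmer", "-p", prefix, reference, query], None),
        (["delta-filter", "-1", delta], one_to_one),
        (["delta-filter", "-m", delta], many_to_many),
        (["show-coords", "-THrcl", one_to_one], prefix + ".1coords"),
        (["show-coords", "-THrcl", many_to_many], prefix + ".mcoords"),
        (["show-snps", "-rlTHC", one_to_one], prefix + ".snps"),
        (["show-diff", "-rH", many_to_many], prefix + ".rdiff"),
        (["show-diff", "-qH", many_to_many], prefix + ".qdiff"),
    ]


# Print the chain as shell-like lines for inspection.
for argv, target in dnadiff_steps("ref.fasta", "qry.fasta"):
    line = " ".join(argv)
    print(line if target is None else line + " > " + target)
```

In practice dnadiff itself should be preferred; the sketch is only meant to show which tool and flags stand behind each file suffix.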
mapview
   -h, --help     Display help information and exit
   -m|mag         Set the magnification at which the figure is rendered,
                  this is an option for fig2dev which is used to generate
                  the PDF and PS files (default 1.0)
   -n|num         Set the number of output files used to partition the
                  output, this is to avoid generating files that are too
                  large to display (default 10)
   -p|prefix      Set the output file prefix
                  (default "PROMER_graph" or "NUCMER_graph")
   -v, --verbose  Verbose logging of the processed files
   -V, --version  Display the version information and exit
   -x1 coord      Set the lower coordinate bound of the display
   -x2 coord      Set the upper coordinate bound of the display
   -g|ref         If the input file is provided by 'mgaps', set the
                  reference sequence ID (as it appears in the first column
                  of the UTR/CDS coords file)
   -I             Display the name of query sequences
   -Ir            Display the name of reference genes
  
  mummer Find and output (to stdout) the positions and length of all
    sufficiently long maximal matches of a substring in <query-file> and
    <reference-file>.

   -mum           compute maximal matches that are unique in both sequences
   -mumcand       same as -mumreference
   -mumreference  compute maximal matches that are unique in the
                  reference-sequence but not necessarily in the
                  query-sequence (default)
   -maxmatch      compute all maximal matches regardless of their uniqueness
   -n             match only the characters a, c, g, or t;
                  they can be in upper or in lower case
   -l             set the minimum length of a match;
                  if not set, the default value is 20
   -b             compute forward and reverse complement matches
   -r             only compute reverse complement matches
   -s             show the matching substrings
   -c             report the query-position of a reverse complement match
                  relative to the original query sequence
   -F             force 4 column output format regardless of the number of
                  reference sequence inputs
   -L             show the length of the query sequences on the header line
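
To make the notion of a "sufficiently long maximal match" concrete, here is a brute-force Python sketch. It is not mummer's algorithm and is far too slow for genomes; it roughly mirrors what -maxmatch together with -l selects, with no uniqueness filtering. The function name `maximal_matches`, the 1-based positions in the result and the toy sequences are assumptions made for illustration.

```python
def maximal_matches(ref, qry, min_len=20):
    """Return (ref_pos, qry_pos, length) for every exact match that cannot
    be extended to the left or right and is at least min_len long.
    Positions are 1-based (an assumption for readability)."""
    matches = []
    for i in range(len(ref)):
        for j in range(len(qry)):
            # A match that can be extended to the left is not maximal here;
            # it is reported from its true start instead.
            if i > 0 and j > 0 and ref[i - 1] == qry[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            k = 0
            while (i + k < len(ref) and j + k < len(qry)
                   and ref[i + k] == qry[j + k]):
                k += 1
            if k >= min_len:
                matches.append((i + 1, j + 1, k))
    return matches


# Toy sequences (made up), with a short minimum length for demonstration.
print(maximal_matches("ACGTACGT", "TTACGTAA", min_len=4))
# -> [(1, 3, 5), (4, 2, 5)]
```

Both reported matches have length 5 ("ACGTA" and "TACGT"); the shorter shared substring "ACGT" is not listed separately because it lies inside a longer match on the same diagonal.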
  
  nucmer

   nucmer generates nucleotide alignments between two multi-FASTA input
   files. Two output files are generated. The .cluster output file lists
   clusters of matches between each sequence. The .delta file lists the
   distance between insertions and deletions that produce maximal scoring
   alignments between each sequence.
MANDATORY:
   Reference       Set the input reference multi-FASTA filename
   Query           Set the input query multi-FASTA filename
OPTIONS:
   --mum           Use anchor matches that are unique in both the reference
                   and query
   --mumcand       Same as --mumreference
   --mumreference  Use anchor matches that are unique in the reference
                   but not necessarily unique in the query (default behavior)
   --maxmatch      Use all anchor matches regardless of their uniqueness
   -b|breaklen     Set the distance an alignment extension will attempt to
                   extend poor scoring regions before giving up (default 200)
   -c|mincluster   Set the minimum length of a cluster of matches (default 65)
   --[no]delta     Toggle the creation of the delta file (default --delta)
   --depend        Print the dependency information and exit
   -d|diagfactor   Set the clustering diagonal difference separation factor
                   (default 0.12)
   --[no]extend    Toggle the cluster extension step (default --extend)
   -f, --forward   Use only the forward strand of the Query sequences
   -g|maxgap       Set the maximum gap between two adjacent matches in a
                   cluster (default 90)
   -h, --help      Display help information and exit
   -l|minmatch     Set the minimum length of a single match (default 20)
   -o, --coords    Automatically generate the original NUCmer1.1 coords
                   output file using the 'show-coords' program
   --[no]optimize  Toggle alignment score optimization, i.e. if an alignment
                   extension reaches the end of a sequence, it will backtrack
                   to optimize the alignment score instead of terminating the
                   alignment at the end of the sequence (default --optimize)
   -p|prefix       Set the prefix of the output files (default "out")
   -r, --reverse   Use only the reverse complement of the Query sequences
   --[no]simplify  Simplify alignments by removing shadowed clusters. Turn
                   this option off if aligning a sequence to itself to look
                   for repeats (default --simplify)
promer

   promer generates amino acid alignments between two multi-FASTA DNA input
   files. Two output files are generated. The .cluster output file lists
   clusters of matches between each sequence. The .delta file lists the
   distance between insertions and deletions that produce maximal scoring
   alignments between each sequence. The DNA input is translated into all 6
   reading frames in order to generate the output, but the output coordinates
   reference the original DNA input.
MANDATORY:
   Reference       Set the input reference multi-FASTA DNA file
   Query           Set the input query multi-FASTA DNA file
OPTIONS:
   --mum           Use anchor matches that are unique in both the reference
                   and query
   --mumcand       Same as --mumreference
   --mumreference  Use anchor matches that are unique in the reference
                   but not necessarily unique in the query (default behavior)
   --maxmatch      Use all anchor matches regardless of their uniqueness
   -b|breaklen     Set the distance an alignment extension will attempt to
                   extend poor scoring regions before giving up, measured in
                   amino acids (default 60)
   -c|mincluster   Set the minimum length of a cluster of matches, measured in
                   amino acids (default 20)
   --[no]delta     Toggle the creation of the delta file (default --delta)
   --depend        Print the dependency information and exit
   -d|diagfactor   Set the clustering diagonal difference separation factor
                   (default 0.11)
   --[no]extend    Toggle the cluster extension step (default --extend)
   -g|maxgap       Set the maximum gap between two adjacent matches in a
                   cluster, measured in amino acids (default 30)
   -l|minmatch     Set the minimum length of a single match, measured in amino
                   acids (default 6)
   -m|masklen      Set the maximum bookend masking length, measured in amino
                   acids (default 8)
   -o, --coords    Automatically generate the original PROmer1.1 ".coords"
                   output file using the "show-coords" program
   --[no]optimize  Toggle alignment score optimization, i.e. if an alignment
                   extension reaches the end of a sequence, it will backtrack
                   to optimize the alignment score instead of terminating the
                   alignment at the end of the sequence (default --optimize)
   -p|prefix       Set the prefix of the output files (default "out")
   -x|matrix       Set the alignment matrix number to 1 [BLOSUM 45],
                   2 [BLOSUM 62] or 3 [BLOSUM 80] (default 2)
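
The promer description mentions translating the DNA input into all 6 reading frames. As an illustration of that idea only, the Python sketch below translates three forward frames and three frames of the reverse complement. It assumes the standard genetic code, writes stop codons as `*` and codons with non-ACGT characters as `X`; the function names are hypothetical and promer's own translation may differ in such details.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order (assumed; '*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons -> 'X'."""
    return "".join(CODON.get(dna[p:p + 3], "X")
                   for p in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return translations for frames +1, +2, +3, -1, -2, -3 (in that order)."""
    fwd = dna.upper()
    rev = fwd.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    return [translate(strand[offset:])
            for strand in (fwd, rev)
            for offset in range(3)]


print(six_frames("ATGGCCTAA"))
```

A match found in any of these protein frames still has to be mapped back to DNA coordinates, which is why the description notes that output coordinates reference the original DNA input.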
  
  repeat-match Find all maximal exact matches in <genome-file>

   -E      Use exhaustive (slow) search to find matches
   -f      Forward strand only, don't use reverse complement
   -n #    Set minimum exact match length to #
   -t      Only output tandem repeats
   -V #    Set level of verbose (debugging) printing to #

  show-aligns

   -h      Display help information
   -q      Sort alignments by the query start coordinate
   -r      Sort alignments by the reference start coordinate
   -w int  Set the screen width - default is 60
   -x int  Set the matrix type - default is 2 (BLOSUM 62),
           other options include 1 (BLOSUM 45) and 3 (BLOSUM 80)
           note: only has effect on amino acid alignments
  
  show-coords

   -b        Merges overlapping alignments regardless of match dir
             or frame and does not display any identity information.
   -B        Switch output to btab format
   -c        Include percent coverage information in the output
   -d        Display the alignment direction in the additional
             FRM columns (default for promer)
   -g        Deprecated option. Please use 'delta-filter' instead
   -h        Display help information
   -H        Do not print the output header
   -I float  Set minimum percent identity to display
   -k        Knockout (do not display) alignments that overlap
             another alignment in a different frame by more than 50%
             of their length, AND have a smaller percent similarity
             or are less than 75% of the size of the other alignment
             (promer only)
   -l        Include the sequence length information in the output
   -L long   Set minimum alignment length to display
   -o        Annotate maximal alignments between two sequences, i.e.
             overlaps between reference and query sequences
   -q        Sort output lines by query IDs and coordinates
   -r        Sort output lines by reference IDs and coordinates
   -T        Switch output to tab-delimited format

   Input is the .delta output of either the "nucmer" or the
    "promer" program passed on the command line.

   Output is to stdout, and consists of a list of coordinates, percent identity,
    and other useful information regarding the alignment data contained in the
    .delta file used as input.

   NOTE: No sorting is done by default, so the alignments are ordered
    as found in the <deltafile> input.
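
The -k knockout rule of show-coords combines three conditions, which can be easier to follow as a predicate. The Python sketch below is one reading of the option text, not the program's source: the `Aln` record, inclusive coordinates on a single sequence, and measuring "size" as alignment length are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Aln:
    start: int         # inclusive start coordinate (assumed)
    end: int           # inclusive end coordinate (assumed)
    frame: int         # reading frame of the alignment
    similarity: float  # percent similarity

    @property
    def length(self):
        return self.end - self.start + 1


def is_knocked_out(a, b):
    """True if alignment a would be hidden because of alignment b."""
    if a.frame == b.frame:
        return False  # rule only applies across different frames
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0.5 * a.length:
        return False  # must overlap by more than 50% of a's length
    # ... AND be less similar OR less than 75% of the other's size.
    return a.similarity < b.similarity or a.length < 0.75 * b.length


short = Aln(1, 100, frame=1, similarity=60.0)
long_ = Aln(20, 200, frame=2, similarity=80.0)
print(is_knocked_out(short, long_), is_knocked_out(long_, short))
# -> True False
```

The rule is asymmetric: here the shorter, less similar alignment is knocked out, while the longer one is kept because the overlap covers less than half of its length.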
  
  show-snps

   -C      Do not report SNPs from alignments with an ambiguous
           mapping, i.e. only report SNPs where the [R] and [Q]
           columns equal 0 and do not output these columns
   -h      Display help information
   -H      Do not print the output header
   -I      Do not report indels
   -l      Include sequence length information in the output
   -q      Sort output lines by query IDs and SNP positions
   -r      Sort output lines by reference IDs and SNP positions
   -S      Specify which alignments to report by passing
           'show-coords' lines to stdin
   -T      Switch to tab-delimited format
   -x int  Include x characters of surrounding SNP context in the
           output, default 0

   Input is the .delta output of either the nucmer or promer program passed on
    the command line.

   Output is to stdout, and consists of a list of SNPs (or amino acid
    substitutions for promer) with positions and other useful info. Output will
    be sorted with -r by default and the [BUFF] column will always refer to the
    sequence whose positions have been sorted. This value specifies the distance
    from this SNP to the nearest mismatch (end of alignment, indel, SNP, etc) in
    the same alignment, while the [DIST] column specifies the distance from this
    SNP to the nearest sequence end. SNPs for which the [R] and [Q] columns are
    greater than 0 should be evaluated with caution, as these columns specify
    the number of other alignments which overlap this position. Use -C to ensure
    SNPs are only reported from unique alignment regions.
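
The effect of -C can be pictured as a filter over already parsed SNP records. The Python sketch below assumes each record is a dict with integer "R" and "Q" entries holding the overlap counts; that record layout, the other keys and the sample values are made up for illustration and do not describe the actual show-snps column order.

```python
def unambiguous_only(snps):
    """Keep records whose R and Q overlap counts are both 0 and drop
    those two fields, mimicking the described effect of -C."""
    kept = []
    for rec in snps:
        if rec["R"] == 0 and rec["Q"] == 0:
            kept.append({k: v for k, v in rec.items() if k not in ("R", "Q")})
    return kept


# Hypothetical records, not real show-snps output.
records = [
    {"ref_pos": 10, "ref": "A", "qry": "G", "R": 0, "Q": 0},
    {"ref_pos": 42, "ref": "C", "qry": "T", "R": 1, "Q": 0},
]
print(unambiguous_only(records))
```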
show-tiling

   -a        Describe the tiling path by printing the tab-delimited
             alignment region coordinates to stdout
   -c        Assume the reference sequences are circular, and allow
             tiled contigs to span the origin
   -g int    Set maximum gap between clustered alignments [-1, INT_MAX]
             A value of -1 will represent infinity
             (nucmer default = 1000)
             (promer default = -1)
   -i float  Set minimum percent identity to tile [0.0, 100.0]
             (nucmer default = 90.0)
             (promer default = 55.0)
   -l int    Set minimum length contig to report [-1, INT_MAX]
             A value of -1 will represent infinity
             (common default = 1)
   -p file   Output a pseudo molecule of the query contigs to 'file'
   -R        Deal with repetitive contigs by randomly placing them
             in one of their copy locations (implies -V 0)
   -t file   Output a TIGR style contig list of each query sequence
             that sufficiently matches the reference (non-circular)
   -u file   Output the tab-delimited alignment region coordinates
             of the unusable contigs to 'file'
   -v float  Set minimum contig coverage to tile [0.0, 100.0]
             (nucmer default = 95.0) sum of individual alignments
             (promer default = 50.0) extent of syntenic region
   -V float  Set minimum contig coverage difference [0.0, 100.0]
             i.e. the difference needed to determine one alignment
             is 'better' than another alignment
             (nucmer default = 10.0) sum of individual alignments
             (promer default = 30.0) extent of syntenic region
   -x        Describe the tiling path by printing the XML contig
             linking information to stdout

   Input is the .delta output of the nucmer program, run on very similar
    sequence data, or the .delta output of the promer program, run on divergent
    sequence data.

   Output is to stdout, and consists of the predicted location of each aligning
    query contig as mapped to the reference sequences. These coordinates
    reference the extent of the entire query contig, even when only a certain
    percentage of the contig was actually aligned (unless the -a option is
    used). Columns are, start in ref, end in ref, distance to next contig,
    length of this contig, alignment coverage, identity, orientation, and ID
    respectively.
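
The column order given for show-tiling output lends itself to a small parser. The Python sketch below assumes whitespace-delimited data lines in exactly that order and skips lines starting with `>` as headers; the `TilingRow` type, its field names, the header handling and the sample line are illustrative assumptions, not taken from real output.

```python
from typing import NamedTuple


class TilingRow(NamedTuple):
    ref_start: int      # start in ref
    ref_end: int        # end in ref
    gap_to_next: int    # distance to next contig
    contig_len: int     # length of this contig
    coverage: float     # alignment coverage
    identity: float     # identity
    orientation: str    # orientation
    contig_id: str      # ID


def parse_tiling(lines):
    """Parse data lines into TilingRow records, skipping blanks and '>' headers."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        start, end, gap, length, cov, idy, ori, cid = line.split()
        rows.append(TilingRow(int(start), int(end), int(gap), int(length),
                              float(cov), float(idy), ori, cid))
    return rows


# Made-up sample line for demonstration.
sample = ["1\t5000\t120\t5000\t100.00\t99.50\t+\tcontig_1"]
print(parse_tiling(sample)[0])
```

Because the reported coordinates cover the whole contig unless -a is used, `ref_end - ref_start + 1` need not equal the aligned length.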
SEE ALSO
http://mummer.sourceforge.net/
  
Open source MUMmer 3.0 is described in
  
  Versatile and open software for comparing large genomes. S. Kurtz, A.
    Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L.
    Salzberg, Genome Biology (2004), 5:R12.
AUTHOR
mummer was written by S. Kurtz, A. Phillippy, A.L. Delcher, M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg.
| May 21, 2005 |