MIRACONVERT(1) | User Commands | MIRACONVERT(1) |
NAME
miraconvert - convert assembly and sequencing file types
SYNOPSIS
miraconvert [-f <fromtype>] [-t <totype> [-t <totype> ...]] [-aAbCdhimMsuZ] [-cflnNoPqrtvxXyYz {...}] {infile} {outfile} [<totype> <totype> ...]
OPTIONS
- -f <fromtype>
- load this type of project files, where fromtype is:
- caf a complete assembly or single sequences from CAF
- maf a complete assembly or single sequences from MAF
- fasta sequences from a FASTA file
- fastq sequences from a FASTQ file
- gb[f|k|ff] sequences from a GenBank file
- phd sequences from a PHD file
- fofnexp sequences in EXP files from file of filenames
- -t <totype>
- write the sequences/assembly to this type (multiple mentions of -t are allowed):
- ace sequences or complete assembly to ACE
- caf sequences or complete assembly to CAF
- maf sequences or complete assembly to MAF
- sam complete assembly to SAM
- samnbb like sam, but leaving out the reference (backbone) sequences in mapping assemblies
- gb[f|k|ff] sequences or consensus to GenBank
- gff3 consensus to GFF3
- wig assembly coverage info to wiggle file
- gcwig assembly gc content info to wiggle file
- fasta sequences or consensus to FASTA file (qualities to .qual)
- fastq sequences or consensus to FASTQ file
- exp sequences or complete assembly to EXP files in directories. Complete assemblies are suited for import into gap4 as a directed assembly. Note: using caf2gap to import into gap4 is recommended, though.
- text complete assembly to text alignment (only when -f is caf, maf or gbf)
- html complete assembly to HTML (only when -f is caf, maf or gbf)
- tcs complete assembly to tcs
- hsnp surroundings of SNP tags (SROc, SAOc, SIOc) to HTML (only when -f is caf, maf or gbf)
- asnp analysis of SNP tags (only when -f is caf, maf or gbf)
- cstats contig statistics file as written by MIRA (only when the source contains contigs)
- crlist contig read list file as written by MIRA (only when the source contains contigs)
- maskedfasta reads with the sequencing vector masked out (with X) to a FASTA file (qualities to .qual)
- scaf sequences or complete assembly to a CAF file containing single sequences only
- -A
- Do not adjust sequence case
- When reading formats which define clipping points and saving to formats which do not have clipping information, miraconvert normally adjusts the case of read sequences: lower case for clipped parts, upper case for unclipped parts of reads. Use -A if you do not want this. See also -C.
- Applies only to files/formats which do not contain contigs.
- -b
- Blind data
- Replaces all bases in reads/contigs with a 'c'
- -C
- Perform a hard clip on reads
- When reading formats which define clipping points, save only the unclipped part into the result file.
- Applies only to files/formats which do not contain contigs.
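The following Python sketch contrasts the default case adjustment with the hard clip performed by -C. It assumes clip points are 0-based, half-open positions [left, right). The function names are hypothetical and are not part of MIRA.

```python
def adjust_case(seq: str, left: int, right: int) -> str:
    """Default behaviour (no -A, no -C): keep the whole read, but write the
    clipped ends in lower case and the unclipped part in upper case."""
    return seq[:left].lower() + seq[left:right].upper() + seq[right:].lower()


def hard_clip(seq: str, left: int, right: int) -> str:
    """Behaviour with -C: keep only the unclipped part of the read."""
    return seq[left:right]


if __name__ == "__main__":
    read = "ACGTACGTACGTACGT"
    # Hypothetical clip points: the first and last 4 bases are clipped.
    print(adjust_case(read, 4, 12))  # acgtACGTACGTacgt
    print(hard_clip(read, 4, 12))    # ACGTACGT
```

With -A, the read would be written unchanged, with neither the case adjustment nor the clip applied.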
- -d
- Delete gap-only columns
- When output is contigs: delete columns that are entirely gaps (like after having deleted reads during editing in gap4 or similar)
- When output is reads: delete gaps in reads
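The two behaviours of -d can be sketched in Python as follows. This assumes '*' as the pad (gap) character, as in CAF/MAF alignments, and ' ' for positions a read does not cover. The alignment is modelled as equal-width strings, which is a simplification and not miraconvert's internal representation.

```python
from __future__ import annotations

GAP = "*"    # assumption: pad character as used in CAF/MAF alignments
BLANK = " "  # assumption: marks positions not covered by a read


def delete_gap_only_columns(rows: list[str], gap: str = GAP,
                            blank: str = BLANK) -> list[str]:
    """Contig output: drop every column in which no row contains a base.

    All rows must have the same width; a column made up only of gaps and
    uncovered positions counts as gap-only.
    """
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all alignment rows must have the same width")
    keep = [col for col in range(width)
            if any(row[col] not in (gap, blank) for row in rows)]
    return ["".join(row[col] for col in keep) for row in rows]


def delete_read_gaps(read: str, gap: str = GAP) -> str:
    """Read output: remove all gaps from a single padded read."""
    return read.replace(gap, "")


if __name__ == "__main__":
    alignment = ["AC*GT", "AC*G ", " C*GT"]
    print(delete_gap_only_columns(alignment))  # ['ACGT', 'ACG ', ' CGT']
    print(delete_read_gaps("AC**GT"))          # ACGT
```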
- -F
- Filter read groups into different files
- Works only for input files with read groups (CAF/MAF). Three (or four) files are generated: one (or two) for paired reads, one for unpaired reads and one for debris reads.
- Reads in the paired file are interlaced by default; use -F twice to create separate files.
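The difference between interlaced and separate paired output can be pictured with this Python sketch. It assumes that in an interlaced file each read is directly followed by its mate. Records are plain strings here, and the function names are hypothetical.

```python
from __future__ import annotations


def deinterlace(records: list[str]) -> tuple[list[str], list[str]]:
    """Split an interlaced list (mate 1, mate 2, mate 1, ...) into one list
    per mate, i.e. the layout of the separate files written with -F -F."""
    if len(records) % 2:
        raise ValueError("interlaced paired data needs an even number of records")
    return records[0::2], records[1::2]


def interlace(first: list[str], second: list[str]) -> list[str]:
    """Inverse operation: merge two mate lists into one interlaced list."""
    if len(first) != len(second):
        raise ValueError("mate lists must have the same length")
    merged: list[str] = []
    for a, b in zip(first, second):
        merged.extend((a, b))
    return merged


if __name__ == "__main__":
    pairs = ["r1/1", "r1/2", "r2/1", "r2/2"]
    first, second = deinterlace(pairs)
    print(first)   # ['r1/1', 'r2/1']
    print(second)  # ['r1/2', 'r2/2']
```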
- -m
- Make contigs (only for -t = caf or maf)
- Encase single reads as contig singlets into the CAF/MAF file.
- -n <filename>
- When given, selects only the reads or contigs whose names are listed in that file.
- -N <filename>
- like -n, but sorts output according to order given in file.
- -i
- when -n is used, inverts the selection
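The combined effect of -n, -N and -i on a list of names is sketched below in Python. Only names are handled, not full reads or contigs, and the function name is hypothetical. How miraconvert orders an inverted selection is not documented here; keeping the input order is an assumption.

```python
from __future__ import annotations


def select_by_name(items: list[str], names: list[str],
                   sort_by_file: bool = False, invert: bool = False) -> list[str]:
    """Sketch of -n / -N / -i on a list of read or contig names.

    -n : keep items whose name is listed in `names`, in input order
    -N : like -n, but in the order given by `names` (sort_by_file=True)
    -i : keep items whose name is NOT listed in `names` (invert=True)
    """
    wanted = set(names)
    if invert:
        # Assumption: an inverted selection keeps the input order.
        return [item for item in items if item not in wanted]
    if not sort_by_file:
        return [item for item in items if item in wanted]
    present = set(items)
    result: list[str] = []
    seen: set[str] = set()
    for name in names:  # follow the order of the name file
        if name in present and name not in seen:
            result.append(name)
            seen.add(name)
    return result


if __name__ == "__main__":
    contigs = ["c1", "c2", "c3", "c4"]
    print(select_by_name(contigs, ["c3", "c1"]))                     # ['c1', 'c3']
    print(select_by_name(contigs, ["c3", "c1"], sort_by_file=True))  # ['c3', 'c1']
    print(select_by_name(contigs, ["c3", "c1"], invert=True))        # ['c2', 'c4']
```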
- -o <offset>
- FASTQ quality offset (only for -f fastq)
- Offset of quality values in the FASTQ file. The default of 33 loads Sanger/Phred-style files; a value of 0 makes miraconvert try to recognise the offset automatically.
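The following Python sketch shows what the quality offset means when decoding a FASTQ quality line. The guess_offset function is a rough heuristic added for illustration only. It is not the detection method miraconvert uses with -o 0, which is not described here.

```python
from __future__ import annotations


def decode_qualities(qual_line: str, offset: int = 33) -> list[int]:
    """Turn a FASTQ quality line into numeric quality values."""
    return [ord(ch) - offset for ch in qual_line]


def guess_offset(qual_lines: list[str]) -> int:
    """Rough heuristic for an unknown offset (NOT MIRA's documented method).

    Characters below ';' (ASCII 59) cannot occur with offset 64, so they
    indicate offset 33; characters above 'J' (ASCII 74) are uncommon in
    classic Sanger/Illumina 1.8+ data and suggest offset 64. Otherwise,
    fall back to 33.
    """
    codes = [ord(ch) for line in qual_lines for ch in line]
    if not codes:
        return 33
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return 33


if __name__ == "__main__":
    print(decode_qualities("II#"))  # [40, 40, 2]
    print(guess_offset(["hhh@"]))   # 64
```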
- -P <string>
- String with MIRA parameters to be parsed
- Useful when setting parameters affecting consensus calling like -CO:mrpg etc.
- E.g.: -P "454_SETTINGS -CO:mrpg=3"
- -q <quality>
- Set the default quality for bases in file types without quality values. Furthermore, do not stop if expected quality files are missing (e.g. the .qual file accompanying a FASTA file).
- -R <name>
- Rename contigs/singlets/reads with given name string to which a counter is appended.
- Known bug: duplicate names are created if the input contains contigs/singlets as well as free reads, i.e. reads that are neither in contigs nor in singlets.
- -S <name>
- Naming scheme for renaming reads, important for paired-end data. Currently only 'solexa' is supported.
- -T
- When converting single reads, trim/clip away stretches of N and X at the ends of reads. Note: remember to also use -C to perform a hard clip (e.g. with FASTA as output).
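The end trimming done by -T can be sketched in Python as computing new clip points. It assumes 0-based, half-open clip points and treats upper- and lower-case N/X alike; both are assumptions. Slicing with the returned points corresponds to the hard clip that -C performs.

```python
from __future__ import annotations


def trim_nx_ends(seq: str, junk: str = "NnXx") -> tuple[int, int]:
    """Return clip points (left, right) that exclude runs of N/X at both ends.

    Internal N or X bases are left untouched; seq[left:right] is the part
    that remains after a hard clip.
    """
    left, right = 0, len(seq)
    while left < right and seq[left] in junk:
        left += 1
    while right > left and seq[right - 1] in junk:
        right -= 1
    return left, right


if __name__ == "__main__":
    read = "NNACNGTXN"
    left, right = trim_nx_ends(read)
    print(left, right, read[left:right])  # 2 7 ACNGT
```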
- -v
- Print version number and exit
- -Y <integer>
- Yield: maximum number of (clipped/padded) bases to convert.
- When used on reads, the output contains the first reads of the file whose clipped lengths total at least the -Y value. When used on contigs, the output contains the first contigs of the file whose padded lengths total at least the -Y value.
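For reads, the yield cut-off can be sketched in Python as a running total over the file order. The sketch assumes the reads are already clipped, so len(read) is the clipped length. Treating values <= 0 as "no limit" is also an assumption.

```python
from __future__ import annotations


def first_reads_for_yield(reads: list[str], yield_bases: int) -> list[str]:
    """Take reads in file order until their summed length reaches yield_bases."""
    if yield_bases <= 0:
        return list(reads)
    selected: list[str] = []
    total = 0
    for read in reads:
        if total >= yield_bases:
            break  # enough bases collected
        selected.append(read)
        total += len(read)
    return selected


if __name__ == "__main__":
    reads = ["ACGT", "ACGTAC", "AC", "ACGTACGT"]
    print(first_reads_for_yield(reads, 8))   # ['ACGT', 'ACGTAC']
    print(first_reads_for_yield(reads, 11))  # ['ACGT', 'ACGTAC', 'AC']
```

The same loop applies to contigs if len() is replaced by the padded contig length.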
The following switches work only when the input (CAF or MAF) contains contigs. Beware: CAF and MAF can also contain just reads.
- -M
- Do not extract contigs (or their consensus), but the sequence of the reads they are composed of.
- -r [cCqf]
- Recalculate consensus and/or consensus quality values and/or SNP feature tags.
- 'c' recalculate consensus & consensus qualities (with IUPAC)
- 'C' recalculate consensus & consensus qualities (forcing non-IUPAC)
- 'q' recalculate consensus qualities only
- 'f' recalculate SNP features
- Note: only the last of c, C and q is relevant; f works as a switch and can be combined with c, C or q (e.g. "-r C -r f").
- Note: if the CAF/MAF contains multiple strains, recalculation of consensus & consensus qualities is forced; you can only influence whether IUPAC codes are used or not.
- Fill holes (N or @) in the genome of one strain with sequence from a consensus of the other strains.
- Takes effect only with -r and -t gbf or fasta/fastq. In FASTA/FASTQ, filled-up bases are in lower case; in GBF, filled-up bases are in upper case.
- -Q <integer>
- Defines the minimum quality a consensus base of a strain must have; consensus bases below this are set to 'N'. Default: 0
- Only used with -r, when -f is caf/maf and -t is fasta or gbf.
- -V <integer>
- Defines the minimum coverage a consensus base of a strain must have; bases with coverage below this are set to 'N'. Default: 0
- Only used with -r, when -t is fasta or gbf.
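The masking rule shared by -Q and -V can be sketched in Python as a per-position check on the strain consensus. Leaving '*' pads untouched is an assumption, and the function name is hypothetical. With the documented defaults of 0, nothing is masked.

```python
from __future__ import annotations


def mask_consensus(cons: str, quals: list[int], covs: list[int],
                   min_qual: int = 0, min_cov: int = 0) -> str:
    """Replace consensus bases below min_qual (-Q) or min_cov (-V) by 'N'.

    Assumption: '*' pads in the consensus are not bases and are kept as-is.
    """
    if not (len(cons) == len(quals) == len(covs)):
        raise ValueError("consensus, qualities and coverages must have equal length")
    return "".join(
        base if base == "*" or (qual >= min_qual and cov >= min_cov) else "N"
        for base, qual, cov in zip(cons, quals, covs)
    )


if __name__ == "__main__":
    print(mask_consensus("ACGT", [30, 5, 40, 40], [10, 10, 2, 10],
                         min_qual=10, min_cov=3))  # ANNT
```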
- -x <integer>
- Minimum contig or unclipped read length
- When loading, discard all contigs / reads with a length less than this value. Default: 0 (=switched off)
- Note: not applied to reads in contigs!
- -X <integer>
- Similar to -x but applies only to reads and then to the clipped length.
- -y <integer>
- Minimum average contig coverage
- When loading, discard all contigs with an average coverage less than this value. Default: 1
- -z <integer>
- Minimum number of reads in contig
- When loading, discard all contigs with fewer reads than this value. Default: 0 (=switched off)
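The contig filters -x, -y and -z can be sketched in Python as a single predicate. The Contig record is a hypothetical minimal stand-in for what a CAF/MAF contig carries. The defaults mirror the documented ones (-x 0, -y 1, -z 0).

```python
from dataclasses import dataclass


@dataclass
class Contig:
    # Hypothetical minimal record; real CAF/MAF contigs carry much more.
    name: str
    length: int
    avg_coverage: float
    num_reads: int


def keep_contig(contig: Contig, min_length: int = 0,
                min_avg_coverage: float = 1, min_reads: int = 0) -> bool:
    """A contig is discarded as soon as it falls below any of the limits."""
    return (contig.length >= min_length
            and contig.avg_coverage >= min_avg_coverage
            and contig.num_reads >= min_reads)


if __name__ == "__main__":
    contigs = [Contig("c1", 2500, 12.0, 40),
               Contig("c2", 1500, 30.0, 80),
               Contig("c3", 3000, 8.5, 25)]
    # Roughly what "miraconvert -x 2000 -y 10 source.caf dest.caf" asks for.
    print([c.name for c in contigs
           if keep_contig(c, min_length=2000, min_avg_coverage=10)])  # ['c1']
```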
- -l <integer>
- When the output is text or HTML: number of bases shown in one alignment line. Default: 60.
- -c <character>
- When the output is text or HTML: character used to pad endgaps. Default: ' ' (blank)
EXAMPLES
- miraconvert source.maf dest.sam
- miraconvert source.caf dest.fasta wig ace
- miraconvert -x 2000 -y 10 source.caf dest.caf
- miraconvert -x 40 -C -F -F source.maf .fastq
SEE ALSO
More extensive documentation is provided in the MIRA manual, available online at
On Debian, this can be installed with the mira-doc package and can then be found at /usr/share/doc/mira-assembler/DefinitiveGuideToMIRA.html. On other systems, you may want to check in /usr/local/share/mira/doc or run "locate DefinitiveGuideToMIRA" to find it locally.
You can also subscribe to one of the MIRA mailing lists at
After subscribing, mail general questions to the MIRA talk mailing list:
- mira_talk@freelists.org
BUGS
To report bugs or ask for features, please use the ticketing system at:
AUTHOR
Bastien Chevreux <bach@chevreux.org>
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
May 2016 | miraconvert 5.0.x |