MIRABAIT(1) | User Commands | MIRABAIT(1) |
NAME¶
mirabait - a 'grep' like tool to select reads with kmers up to 256 bp
SYNOPSIS¶
mirabait [options] {-b baitfile [-b ...] | -B file | -j joblibrary} {-p file_1 file_2 | -P file3}* [file4 ...]
DESCRIPTION¶
mirabait selects reads from a read collection which are partly similar or equal to sequences defined as target baits. Similarity is defined by finding a user-adjustable number of common k-mers (sequences of k consecutive bases) shared between the bait sequences and the screened sequences, either in forward direction only or in both forward and reverse complement direction. A DUST-like filter for microrepeats of up to 4 bases can optionally be applied.
When used on paired files, selects sequences where at least one mate matches.
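The selection principle described above can be sketched in a few lines of Python. This is only an illustration of k-mer baiting, not mirabait's actual implementation; all function names are hypothetical, and the sketch assumes plain A/C/G/T sequences held in memory.

```python
# Illustrative sketch of k-mer baiting; NOT mirabait's real code.
# All names below are hypothetical helpers invented for this example.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_bait_index(baits, k, check_reverse=True):
    """Collect bait k-mers; optionally add reverse complement k-mers too."""
    index = set()
    for bait in baits:
        index |= kmers(bait, k)
        if check_reverse:
            index |= kmers(reverse_complement(bait), k)
    return index


def count_hits(read, index, k):
    """Count the k-mer positions of a read that are present in the index."""
    read = read.upper()
    return sum(1 for i in range(len(read) - k + 1) if read[i:i + k] in index)


def read_matches(read, index, k, min_hits=1):
    """A read is selected if it shares at least min_hits k-mers with the baits."""
    return count_hits(read, index, k) >= min_hits


def pair_matches(mate1, mate2, index, k, min_hits=1):
    """A pair is selected if at least one of its mates matches."""
    return (read_matches(mate1, index, k, min_hits)
            or read_matches(mate2, index, k, min_hits))
```

In this sketch, building the index with check_reverse=False corresponds to the idea behind -r, and min_hits plays the role of a positive -n value.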
OPTIONS¶
Main options:¶
- -b file
- Load bait sequences from file (multiple -b allowed)
- -B file
- Load baits from kmer statistics file, not from sequence files. Only one -B allowed, cannot be combined with -b. (see -K for creating such a file)
- -j job
- Set options for a predefined job from the supplied MIRA library. Currently available jobs:
- rrna Bait rRNA sequences
- -p file1 file2
- Load paired sequences to search from file1 and file2. Files must contain the same number of sequences, and sequence names must be in the same order. Multiple -p allowed, but must come before non-paired files.
- -P file
- Load paired sequences from file. File must be interleaved: pairs must follow each other, non-pairs are not allowed. Multiple -P allowed, but must come before non-paired files.
- -k int
- kmer length of bait in bases (<=256, default=31)
- -n int
- If >0: minimum number of k-mer baits needed (default=1). If <=0: allowed number of missed kmers over sequence length.
- -d
- Do not use kmers with microrepeats (DUST-like, see also -D)
- -D int
- Set length of microrepeats in kmers to discard from bait.
- int > 0 microrepeat len in percentage of kmer length. E.g.: -k 17 -D 67 --> 11.39 bases --> 12 bases.
- int < 0 microrepeat len in bases.
- int != 0 implies -d, int=0 turns DUST filter off.
- -i
- Selects sequences that do not hit bait
- -I
- Selects sequences that hit and do not hit bait (to different files)
- -r
- No checking of reverse complement direction
- -t
- Number of threads to use (default=0 -> up to 4 CPU cores)
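The numeric conventions of -D and -n can be made concrete with a small Python sketch. Both helper functions are hypothetical and encode one reading of the option descriptions above: the -D percentage is assumed to round up (as the "11.39 bases --> 12 bases" example suggests), and a value of -n <= 0 is assumed to mean that all k-mers of a read must hit except for the given number of misses.

```python
# Hypothetical helpers illustrating the -D and -n conventions;
# the rounding and the "missed kmers" reading are assumptions.


def microrepeat_length(k, d):
    """Translate a -D value into a microrepeat length in bases (0 = filter off)."""
    if d == 0:
        return 0                 # DUST filter turned off
    if d < 0:
        return -d                # negative value: length given directly in bases
    # positive value: percentage of the k-mer length, rounded up (assumed)
    return -(-k * d // 100)      # integer ceiling of k * d / 100


def required_hits(read_length, k, n):
    """Number of k-mer hits a read of read_length needs under a given -n value."""
    total_kmers = max(read_length - k + 1, 0)
    if n > 0:
        return n                 # plain minimum number of hits
    # n <= 0: every k-mer must hit except |n| allowed misses (assumed reading)
    return max(total_kmers + n, 0)
```

For example, microrepeat_length(17, 67) reproduces the 12 bases given for -k 17 -D 67.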
Options for output definition:¶
Normally mirabait writes separate result files (named 'bait_match_*' and 'bait_miss_*') for each input to the current directory. To change this behaviour and other output-related settings, use these options:
- -c
- No case change of sequence to denote bait hits
- -l int
- length of a line (FASTA only, default 0=unlimited)
- -K file
- Save kmer statistics to 'file' (see also -B)
- -N name
- Change the prefix 'bait' to <name>. Has no effect if -o/-O is used and targets are not directories.
- -o <path>
- Save sequences matching bait to path. If path is a directory, write separate files into this directory. If not, combine all matching sequences from the input file(s) into a single file specified by the path.
- -O <path>
- Like -o, but for sequences not matching
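How -o and -N interact can be sketched as follows. The helper is hypothetical, and the exact file name that follows the 'bait_match_' prefix is an assumption made for illustration (here: the base name of the input file); only the directory-versus-file distinction is taken from the option descriptions above.

```python
import os


def match_output_path(out_path, input_file, prefix="bait"):
    """Hypothetical sketch: where matching reads of input_file would be written.

    out_path   value given to -o, or None if -o was not used
    prefix     value given to -N (default 'bait')
    """
    # Assumed naming scheme: <prefix>_match_<basename of input file>
    per_input_name = f"{prefix}_match_{os.path.basename(input_file)}"
    if out_path is None:
        return per_input_name                        # current directory
    if os.path.isdir(out_path):
        return os.path.join(out_path, per_input_name)  # separate file per input
    return out_path                                  # single combined file; prefix unused
```

The last branch shows why -N has no effect when the -o target is not a directory: the path is then used verbatim as the combined output file.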
Other options:¶
- -T dir
- Use 'dir' as directory for temporary files instead of current working directory.
- -m integer
- Memory to use for computing kmer statistics
0..100 = use percentage of free system memory
>100 = amount of MiB to use (e.g. 16384 for 16 GiB)
Default 75 (75% of free system memory).
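The two interpretations of the -m value can be illustrated with a short Python sketch. The function name and the way free memory is passed in are hypothetical; only the 0..100 versus >100 rule is taken from the description above.

```python
MIB = 1024 * 1024  # bytes per MiB


def memory_budget_bytes(m, free_bytes):
    """Hypothetical sketch: turn a -m value into a memory budget in bytes.

    m           value given to -m (default 75)
    free_bytes  free system memory in bytes, supplied by the caller
    """
    if m < 0:
        raise ValueError("-m must not be negative")
    if m <= 100:
        return free_bytes * m // 100   # percentage of free system memory
    return m * MIB                     # absolute amount in MiB
```

With 8 GiB of free memory, the default of 75 yields a budget of 6 GiB in this sketch, while -m 16384 yields 16 GiB regardless of free memory.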
Defining files types to load/save:¶
Normally mirabait recognises the file types according to the file extension (even when packed). In case you need to force a certain file type because the file extension is non-standard, use the EMBOSS notation to force a type: <filetype>::<name_of_file>. E.g., to tell that "somefile.dat" is FASTQ, use: fastq::somefile.dat. Recognised types are: caf, fasta, fastq, gbf, gbk, gbff, maf and phd.
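The <filetype>::<name_of_file> notation can be parsed as in the following Python sketch. The function is hypothetical and not part of mirabait; in particular, the set of packing suffixes and the assumption that the file extension equals the type name are guesses made for illustration.

```python
import os

RECOGNISED_TYPES = {"caf", "fasta", "fastq", "gbf", "gbk", "gbff", "maf", "phd"}
PACK_SUFFIXES = {".gz", ".bz2", ".xz"}  # assumed set of packing suffixes


def resolve_file_type(spec):
    """Hypothetical sketch: return (file type, file name) for a file argument."""
    if "::" in spec:
        # Explicit EMBOSS-style prefix, e.g. fastq::somefile.dat
        ftype, name = spec.split("::", 1)
        ftype = ftype.lower()
        if ftype not in RECOGNISED_TYPES:
            raise ValueError(f"unknown file type: {ftype}")
        return ftype, name
    # No prefix: infer the type from the extension, skipping a packing suffix
    root, ext = os.path.splitext(spec)
    if ext.lower() in PACK_SUFFIXES:
        root, ext = os.path.splitext(root)
    ftype = ext.lstrip(".").lower()
    if ftype not in RECOGNISED_TYPES:
        raise ValueError(f"cannot infer file type of: {spec}")
    return ftype, spec
```

For instance, resolve_file_type("fastq::somefile.dat") returns ("fastq", "somefile.dat"), whereas a bare "somefile.dat" is rejected in this sketch because the extension gives no usable hint.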
mirabait writes output files in the same file type as the corresponding input files.
Examples:
- mirabait -b b.fasta file.fastq
- mirabait -I -j rrna -p file_1.fastq file_2.fastq
- mirabait -b b1.fasta -b b2.gbk file.fastq
- mirabait -b fasta::baits.dat -p fastq::file_1.dat fastq::file_2.dat
- mirabait -b b.fasta -p file_1.fastq file_2.fastq -P file3.fasta file4.caf
- mirabait -I -b b.fasta -p file_1.fastq file_2.fastq -P file3.fasta file4.caf
- mirabait -k 27 -n 10 -b b.fasta file.fastq
- mirabait -b fasta::b.dat fastq::file.dat
- mirabait -o /dev/shm/ -b b.fasta -p file_1.fastq file_2.fastq
- mirabait -o /dev/shm/match -b b.fasta -p file_1.fastq file_2.fastq
- mirabait -b human_genome.fasta -K HG_kmerstats.mhs.gz -p file1.fastq file2.fastq
- mirabait -B HG_kmerstats.mhs.gz -p file1.fastq file2.fastq
- mirabait -d -B HG_kmerstats.mhs.gz -p file1.fastq file2.fastq
SEE ALSO¶
A more extensive documentation is provided in the MIRA manual available online at
On Debian, this can be installed with the mira-doc package and can then be found at /usr/share/doc/mira-assembler/DefinitiveGuideToMIRA.html. On other systems, you may want to check in /usr/local/share/mira/doc or run "locate DefinitiveGuideToMIRA" to find it locally.
You can also subscribe to one of the MIRA mailing lists at
After subscribing, mail general questions to the MIRA talk mailing list:
- mira_talk@freelists.org
BUGS¶
To report bugs or ask for features, please use the ticketing system at:
AUTHOR¶
Bastien Chevreux <bach@chevreux.org>
This manual page was written by Bastien Chevreux <bach@chevreux.org> but can be freely used for any documentation purpose.
May 2016 | mirabait 5.0.x |