.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.3.
.TH ART_ILLUMINA "1" "February 2016" "art_illumina 3.19.15" "User Commands"
.SH NAME
art_illumina \- Simulation of Illumina sequencers
.SH DESCRIPTION
ART is a set of simulation tools to generate synthetic next\-generation sequencing reads.
ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data.
.P
art_illumina can be used to simulate Illumina sequencers.
.SH USAGE
.B art_illumina [options] \fB\-sam\fR \fB\-i\fR \fB\-l\fR \fB\-f\fR \fB\-ss\fR \fB\-o\fR
.P
.B art_illumina [options] \fB\-sam\fR \fB\-i\fR \fB\-l\fR \fB\-f\fR \fB\-o\fR
.P
.B art_illumina [options] \fB\-sam\fR \fB\-i\fR \fB\-l\fR \fB\-c\fR \fB\-o\fR
.P
.B art_illumina [options] \fB\-sam\fR \fB\-i\fR \fB\-l\fR \fB\-f\fR \fB\-m\fR \fB\-s\fR \fB\-o\fR
.P
.B art_illumina [options] \fB\-sam\fR \fB\-i\fR \fB\-l\fR \fB\-c\fR \fB\-m\fR \fB\-s\fR \fB\-o\fR
.SH OPTIONS
.TP
\fB\-1\fR \fB\-\-qprof1\fR
the first\-read quality profile
.TP
\fB\-2\fR \fB\-\-qprof2\fR
the second\-read quality profile
.HP
\fB\-amp\fR \fB\-\-amplicon\fR amplicon sequencing simulation
.TP
\fB\-c\fR \fB\-\-rcount\fR
total number of reads/read pairs to be generated [per amplicon if for amplicon simulation] (not to be used together with \fB\-f\fR/\fB\-\-fcov\fR)
.TP
\fB\-d\fR \fB\-\-id\fR
the prefix identification tag for read ID
.TP
\fB\-ef\fR \fB\-\-errfree\fR
also generate a SAM file with zero sequencing errors in addition to the regular one
.IP
NOTE: the reads in the zero\-error SAM file have the same alignment positions as those in the regular SAM file, but have no sequencing errors
.TP
\fB\-f\fR \fB\-\-fcov\fR
the fold of read coverage to be simulated or number of reads/read pairs generated for each amplicon
.TP
\fB\-h\fR \fB\-\-help\fR
print out usage information
.TP
\fB\-i\fR \fB\-\-in\fR
the filename of input DNA/RNA reference
.TP
\fB\-ir\fR \fB\-\-insRate\fR
the
first\-read insertion rate (default: 0.00009)
.HP
\fB\-ir2\fR \fB\-\-insRate2\fR the second\-read insertion rate (default: 0.00015)
.TP
\fB\-dr\fR \fB\-\-delRate\fR
the first\-read deletion rate (default: 0.00011)
.HP
\fB\-dr2\fR \fB\-\-delRate2\fR the second\-read deletion rate (default: 0.00023)
.TP
\fB\-l\fR \fB\-\-len\fR
the length of reads to be simulated
.TP
\fB\-m\fR \fB\-\-mflen\fR
the mean size of DNA/RNA fragments for paired\-end simulations
.HP
\fB\-mp\fR \fB\-\-matepair\fR indicate a mate\-pair read simulation
.TP
\fB\-nf\fR \fB\-\-maskN\fR
the cutoff frequency of 'N' in a window the size of the read length for masking genomic regions
.IP
NOTE: default: '\-nf 1' to mask all regions with 'N'. Use '\-nf 0' to turn off masking
.TP
\fB\-na\fR \fB\-\-noALN\fR
do not output ALN alignment file
.TP
\fB\-o\fR \fB\-\-out\fR
the prefix of output filename
.TP
\fB\-p\fR \fB\-\-paired\fR
indicate a paired\-end read simulation or generate reads from both ends of amplicons
.IP
NOTE: art will automatically switch to a mate\-pair simulation if the given mean fragment size >= 2000
.TP
\fB\-q\fR \fB\-\-quiet\fR
turn off the end\-of\-run summary
.TP
\fB\-qs\fR \fB\-\-qShift\fR
the amount to shift every first\-read quality score by
.TP
\fB\-qs2\fR \fB\-\-qShift2\fR
the amount to shift every second\-read quality score by
.IP
NOTE: For the \fB\-qs\fR/\fB\-qs2\fR options, a positive number shifts quality scores up (the max is 93), which reduces substitution sequencing errors, and a negative number shifts quality scores down, which increases sequencing errors.
If scores are shifted by x, the error rate will be 1/(10^(x/10)) of that of the default profile.
.TP
\fB\-rs\fR \fB\-\-rndSeed\fR
the seed for the random number generator (default: system time in seconds)
.IP
NOTE: use a fixed seed to generate identical datasets from different runs
.TP
\fB\-s\fR \fB\-\-sdev\fR
the standard deviation of DNA/RNA fragment size for paired\-end simulations.
.TP
\fB\-sam\fR \fB\-\-samout\fR
indicate to generate a SAM alignment file
.TP
\fB\-sp\fR \fB\-\-sepProf\fR
indicate to use separate quality profiles for different bases (ATGC)
.TP
\fB\-ss\fR \fB\-\-seqSys\fR
the name of the Illumina sequencing system of the built\-in profile used for simulation
.IP
NOTE: sequencing system ID names are:
.IP
GA1 \- Genome Analyzer I, GA2 \- Genome Analyzer II
.IP
HS10 \- HiSeq 1000, HS20 \- HiSeq 2000, HS25 \- HiSeq 2500, MS \- MiSeq
.TP
\fB\-M\fR \fB\-\-cigarM\fR
indicate to use CIGAR 'M' instead of '=/X' for alignment match/mismatch
.SH NOTES
.PP
* ART by default selects a built\-in quality score profile according to the read length specified for the run.
.PP
* For single\-end simulation, ART requires an input sequence file, an output file prefix, a read length, and a read count or fold coverage.
.PP
* For paired\-end simulation (except for amplicon sequencing), ART also requires the mean and standard deviation of DNA/RNA fragment lengths.
.SH EXAMPLES
.TP
1) single\-end read simulation
.IP
art_illumina \fB\-sam\fR \fB\-i\fR reference.fa \fB\-l\fR 150 \fB\-ss\fR HS25 \fB\-f\fR 10 \fB\-o\fR single_dat
.TP
2) paired\-end read simulation
.IP
art_illumina \fB\-sam\fR \fB\-i\fR reference.fa \fB\-p\fR \fB\-l\fR 150 \fB\-ss\fR HS25 \fB\-f\fR 20 \fB\-m\fR 200 \fB\-s\fR 10 \fB\-o\fR paired_dat
.TP
3) mate\-pair read simulation
.IP
art_illumina \fB\-sam\fR \fB\-i\fR reference.fa \fB\-mp\fR \fB\-l\fR 50 \fB\-f\fR 20 \fB\-m\fR 2500 \fB\-s\fR 50 \fB\-o\fR matepair_dat
.TP
4) amplicon sequencing simulation with 5' end single\-end reads
.IP
art_illumina \fB\-amp\fR \fB\-sam\fR \fB\-na\fR \fB\-i\fR amp_reference.fa \fB\-l\fR 50 \fB\-f\fR 10 \fB\-o\fR amplicon_5end_dat
.TP
5) amplicon sequencing simulation with paired\-end reads
.IP
art_illumina \fB\-amp\fR \fB\-p\fR \fB\-sam\fR \fB\-na\fR \fB\-i\fR amp_reference.fa \fB\-l\fR 50 \fB\-f\fR 10 \fB\-o\fR amplicon_pair_dat
.TP
6) amplicon sequencing simulation with mate\-pair reads
.IP
art_illumina \fB\-amp\fR \fB\-mp\fR \fB\-sam\fR \fB\-na\fR \fB\-i\fR amp_reference.fa \fB\-l\fR 50 \fB\-f\fR 10 \fB\-o\fR amplicon_mate_dat
.TP
7) generate an extra SAM file with zero sequencing errors for a paired\-end read simulation
.IP
art_illumina \fB\-ef\fR \fB\-i\fR reference.fa \fB\-p\fR \fB\-l\fR 50 \fB\-f\fR 20 \fB\-m\fR 200 \fB\-s\fR 10 \fB\-o\fR paired_twosam_dat
.TP
8) reduce the substitution error rate to one tenth of that of the default profile
.IP
art_illumina \fB\-i\fR reference.fa \fB\-qs\fR 10 \fB\-qs2\fR 10 \fB\-l\fR 50 \fB\-f\fR 10 \fB\-p\fR \fB\-m\fR 500 \fB\-s\fR 10 \fB\-sam\fR \fB\-o\fR reduce_error
.TP
9) turn off the masking of genomic regions with unknown nucleotides 'N'
.IP
art_illumina \fB\-nf\fR 0 \fB\-sam\fR \fB\-i\fR reference.fa \fB\-p\fR \fB\-l\fR 50 \fB\-f\fR 20 \fB\-m\fR 200 \fB\-s\fR 10 \fB\-o\fR paired_nomask
.TP
10) mask genomic regions with >=5 'N's within the read length of 50
.IP
art_illumina \fB\-nf\fR 5 \fB\-sam\fR \fB\-i\fR reference.fa \fB\-p\fR \fB\-l\fR 50 \fB\-f\fR 20 \fB\-m\fR 200 \fB\-s\fR 10 \fB\-o\fR paired_maskN5
.SH AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
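.SH QUALITY SHIFT ARITHMETIC
.PP
The NOTE for \fB\-qs\fR/\fB\-qs2\fR in OPTIONS states that shifting quality scores by x scales the error rate by 1/(10^(x/10)).
The following is a minimal Python sketch of that arithmetic only, not part of ART itself.
The function names and the base_rate parameter are hypothetical, and the sketch ignores the clipping of shifted scores at the maximum of 93.
.PP
.nf
```python
def error_rate_factor(shift):
    # Factor 1/(10^(x/10)) from the NOTE for the qs/qs2 options.
    return 1 / (10 ** (shift / 10))


def shifted_error_rate(base_rate, shift):
    # base_rate is a hypothetical substitution error rate per base,
    # chosen only to illustrate the scaling.
    return base_rate * error_rate_factor(shift)


# Print the scaling factor for a few example shifts.
for shift in (3, 10, 20):
    print(shift, error_rate_factor(shift))
```
.fi
.PP
Under these assumptions a shift of 10 gives a factor of about one tenth, which agrees with example 8 in EXAMPLES.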