ART_ILLUMINA(1) User Commands ART_ILLUMINA(1)

NAME

art_illumina - Simulation of Illumina sequencers

DESCRIPTION

ART is a set of simulation tools that generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large sets of recalibrated sequencing data.

art_illumina simulates reads produced by Illumina sequencers.

USAGE

art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -f <fold_coverage> -ss <sequencing_system> -o <outfile_prefix>

art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -f <fold_coverage> -o <outfile_prefix>

art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -c <total_num_reads> -o <outfile_prefix>

art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -f <fold_coverage> -m <mean_fragsize> -s <std_fragsize> -o <outfile_prefix>

art_illumina [options] -sam -i <seq_ref_file> -l <read_length> -c <total_num_reads> -m <mean_fragsize> -s <std_fragsize> -o <outfile_prefix>

OPTIONS

-1 --qprof1 the first-read quality profile

-2 --qprof2 the second-read quality profile

-amp --amplicon amplicon sequencing simulation

-c --rcount total number of reads/read pairs to be generated [per amplicon if for amplicon simulation] (not to be used together with -f/--fcov)
-d --id the prefix identification tag for read ID
-ef --errfree indicate to generate the zero-sequencing-error SAM file as well as the regular one
NOTE: the reads in the zero-error SAM file have the same alignment positions as those in the regular SAM file, but have no sequencing errors
-f --fcov the fold of read coverage to be simulated, or the number of reads/read pairs generated for each amplicon
-h --help print out usage information
-i --in the filename of input DNA/RNA reference
-ir --insRate the first-read insertion rate (default: 0.00009)

-ir2 --insRate2 the second-read insertion rate (default: 0.00015)

-dr --delRate the first-read deletion rate (default: 0.00011)

-dr2 --delRate2 the second-read deletion rate (default: 0.00023)

-l --len the length of reads to be simulated
-m --mflen the mean size of DNA/RNA fragments for paired-end simulations

-mp --matepair indicate a mate-pair read simulation

-nf --maskN the cutoff frequency of 'N' in a window size of the read length for masking genomic regions
NOTE: default: '-nf 1' to mask all regions with 'N'. Use '-nf 0' to turn off masking
-na --noALN do not output ALN alignment file
-o --out the prefix of output filename
-p --paired indicate a paired-end read simulation or to generate reads from both ends of amplicons
NOTE: art will automatically switch to a mate-pair simulation if the given mean fragment size >= 2000
-q --quiet turn off end of run summary
-qs --qShift the amount to shift every first-read quality score by
-qs2 --qShift2 the amount to shift every second-read quality score by
NOTE: For the -qs/-qs2 options, a positive number shifts quality scores up (the maximum is 93), which reduces substitution sequencing errors, and a negative number shifts quality scores down, which increases sequencing errors. If scores are shifted by x, the error rate will be 1/(10^(x/10)) of the default profile.
-rs --rndSeed the seed for random number generator (default: system time in seconds)
NOTE: use a fixed seed to generate two identical datasets from different runs
-s --sdev the standard deviation of DNA/RNA fragment size for paired-end simulations
-sam --samout indicate to generate SAM alignment file
-sp --sepProf indicate to use separate quality profiles for different bases (ATGC)
-ss --seqSys the name of the Illumina sequencing system of the built-in profile used for simulation
NOTE: sequencing system id names are:
-M --cigarM indicate to use CIGAR 'M' instead of '=/X' for alignment match/mismatch
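
As a rough illustration of the quality-shift note in this section, the sketch below evaluates the stated factor 1/(10^(x/10)) for a few shift values x. The function name qs_error_factor is hypothetical and is not part of ART; it only evaluates the formula quoted in the note, using awk.

```shell
#!/bin/sh
# Hypothetical helper (not part of ART): evaluate 1/(10^(x/10)),
# the error-rate factor the note gives for a quality shift of x.
qs_error_factor() {
    awk -v x="$1" 'BEGIN { printf "%.4f\n", 1 / (10 ^ (x / 10)) }'
}

# Print shift value and resulting factor, tab-separated.
for shift in -10 -3 0 3 10; do
    printf '%s\t%s\n' "$shift" "$(qs_error_factor "$shift")"
done
```

A shift of 10 gives a factor of 0.1, which corresponds to the "one tenth of the default profile" case in EXAMPLES; a shift of -10 gives a factor of 10.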

NOTES

* ART by default selects a built-in quality score profile according to the read length specified for the run.

* For single-end simulation, ART requires an input sequence file, an output file prefix, a read length, and either a read count or a fold coverage.

* For paired-end simulation (except for amplicon sequencing), ART also requires the mean and standard deviation of DNA/RNA fragment lengths.
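
Because a run takes either a read count or a fold coverage, it can help to estimate how the two relate. The sketch below assumes the usual definition, fold coverage = (number of reads x read length) / reference length, which this page does not state explicitly; ART's own rounding may differ. The function name approx_read_count is hypothetical and not part of ART.

```shell
#!/bin/sh
# Hypothetical helper (not part of ART).
# Assumption: fold coverage = (reads * read_length) / reference_length,
# so reads ~= fold * reference_length / read_length (truncated).
# Usage: approx_read_count <fold_coverage> <reference_length> <read_length>
approx_read_count() {
    awk -v f="$1" -v g="$2" -v l="$3" 'BEGIN { printf "%d\n", (f * g) / l }'
}

# Approximate single-end read count for 10x coverage of a
# 1,000,000-base reference with 150-base reads.
approx_read_count 10 1000000 150
```

This is only an estimate for planning a run; the value actually simulated is decided by art_illumina.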

EXAMPLES

1) single-end read simulation
art_illumina -sam -i reference.fa -l 150 -ss HS25 -f 10 -o single_dat
2) paired-end read simulation
art_illumina -sam -i reference.fa -p -l 150 -ss HS25 -f 20 -m 200 -s 10 -o paired_dat
3) mate-pair read simulation
art_illumina -sam -i reference.fa -mp -l 50 -f 20 -m 2500 -s 50 -o matepair_dat
4) amplicon sequencing simulation with 5' end single-end reads
art_illumina -amp -sam -na -i amp_reference.fa -l 50 -f 10 -o amplicon_5end_dat
5) amplicon sequencing simulation with paired-end reads
art_illumina -amp -p -sam -na -i amp_reference.fa -l 50 -f 10 -o amplicon_pair_dat
6) amplicon sequencing simulation with mate-pair reads
art_illumina -amp -mp -sam -na -i amp_reference.fa -l 50 -f 10 -o amplicon_mate_dat
7) generate an extra SAM file with zero-sequencing errors for a paired-end read simulation
art_illumina -ef -i reference.fa -p -l 50 -f 20 -m 200 -s 10 -o paired_twosam_dat
8) reduce the substitution error rate to one tenth of the default profile
art_illumina -i reference.fa -qs 10 -qs2 10 -l 50 -f 10 -p -m 500 -s 10 -sam -o reduce_error
9) turn off the masking of genomic regions with unknown nucleotides 'N'
art_illumina -nf 0 -sam -i reference.fa -p -l 50 -f 20 -m 200 -s 10 -o paired_nomask
10) masking genomic regions with >=5 'N's within the read length 50
art_illumina -nf 5 -sam -i reference.fa -p -l 50 -f 20 -m 200 -s 10 -o paired_maskN5
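
The examples above can be combined with a fixed random seed to make runs repeatable. The following is a minimal dry-run sketch: it only prints a paired-end command and does not invoke art_illumina. The function name art_paired_cmd and its argument order are hypothetical, and -rs is assumed to be the short flag for the random-seed option described in OPTIONS. The warning mirrors the note that ART switches to a mate-pair simulation when the mean fragment size is >= 2000.

```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of ART): prints, but does not run,
# a paired-end art_illumina command with a fixed random seed.
# Usage: art_paired_cmd <reference> <outfile_prefix> <seed> <mean_fragsize>
art_paired_cmd() {
    ref=$1
    prefix=$2
    seed=$3
    mean=$4
    # Per the OPTIONS note, a mean fragment size >= 2000 makes ART
    # switch to a mate-pair simulation, so warn about that on stderr.
    if [ "$mean" -ge 2000 ]; then
        echo "warning: mean fragment size $mean >= 2000, ART will simulate mate pairs" >&2
    fi
    echo "art_illumina -sam -rs $seed -i $ref -p -l 150 -ss HS25 -f 20 -m $mean -s 10 -o $prefix"
}

# Print the command for seed 42 and a 200-base mean fragment size.
art_paired_cmd reference.fa paired_dat 42 200
```

Running the printed command twice with the same seed should, according to the seed note in OPTIONS, produce two identical datasets.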

AUTHOR

This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.

February 2016 art_illumina 3.19.15