OBISAMPLE(1) | OBITools | OBISAMPLE(1) |
NAME
obisample - description of obisample
obisample randomly resamples sequence records with or without replacement.
OBISAMPLE SPECIFIC OPTIONS
- -s ###, --sample-size ###
- without the -a option, sample size is expressed as the exact number of sequence records to be sampled (default: number of sequence records in the input file).
- with the -a option, sample size is expressed as a fraction of the number of sequence records in the input file (a number between 0 and 1).
Example:
> obisample -s 1000 seq1.fasta > seq2.fasta
Randomly samples 1000 sequence records from the seq1.fasta file, with replacement, and saves them in the seq2.fasta file.
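The exact-size behaviour described above can be illustrated with a minimal Python sketch. It is not the OBITools implementation: the function name `sample_with_replacement` and its `seed` parameter are hypothetical, records are plain list items, and any weighting by the count attribute is left out.

```python
import random

def sample_with_replacement(records, size=None, seed=None):
    """Draw exactly `size` records; a record may be drawn more than once."""
    rng = random.Random(seed)
    if size is None:
        # Mirrors the documented default: as many draws as input records.
        size = len(records)
    return rng.choices(records, k=size)

sampled = sample_with_replacement(["seq_a", "seq_b", "seq_c"], size=5, seed=42)
print(len(sampled))  # always exactly 5 draws
```

Because every draw is independent, the requested size may exceed the number of input records.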
- -a, --approx-sampling
By default, obisample selects exactly the number of sequence records specified with the -s option. When the -a option is set, each sequence record is instead selected with a probability that depends on its count attribute and on the fraction given with -s, so the size of the resulting sample is only approximate.
Example:
> obisample -s 0.5 -a seq1.fastq > seq2.fastq
Randomly samples approximately half of the sequence records of the seq1.fastq file, without replacement, and saves them in the seq2.fastq file.
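One way to picture approximate sampling is sketched below in Python. It assumes, purely for illustration, that each of the `count` copies of a record is kept independently with probability equal to the -s fraction; the actual formula used by obisample is not given here, and the name `approx_sample` and the `(name, count)` tuple layout are hypothetical.

```python
import random

def approx_sample(records, fraction, seed=None):
    """Keep each copy of each (name, count) record with probability `fraction`."""
    rng = random.Random(seed)
    kept = []
    for name, count in records:
        # Assumption: one independent Bernoulli trial per counted copy.
        hits = sum(1 for _ in range(count) if rng.random() < fraction)
        if hits:
            kept.append((name, hits))
    return kept

records = [("seq_a", 10), ("seq_b", 1), ("seq_c", 4)]
print(approx_sample(records, 0.5, seed=7))
```

Under this assumption the output size varies from run to run around the requested fraction, and no record can end up with a higher count than it had in the input.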
- -w, --without-replacement
Samples sequence records without replacement.
Example:
> obisample -s 1000 -w seq1.fasta > seq2.fasta
Randomly samples 1000 sequence records from the seq1.fasta file, without replacement (the input file must contain at least 1000 sequences), and saves them in the seq2.fasta file.
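The size constraint noted above follows directly from sampling without replacement, as this minimal Python sketch shows. It is not the OBITools code: the function name `sample_without_replacement` and its `seed` parameter are hypothetical, and records are plain list items.

```python
import random

def sample_without_replacement(records, size, seed=None):
    """Draw `size` distinct records; each input record is used at most once."""
    if size > len(records):
        raise ValueError("sample size exceeds the number of input records")
    return random.Random(seed).sample(records, size)

picked = sample_without_replacement(["seq_a", "seq_b", "seq_c", "seq_d"], 2, seed=3)
print(len(picked))  # 2 distinct records
```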
OPTIONS TO SPECIFY INPUT FORMAT
Restrict the analysis to a sub-part of the input file
- --skip <N>
- The first N sequence records of the file are discarded from the analysis and are not written to the output file.
- --only <N>
- Only the next N sequence records of the file are analyzed. The sequences that follow them in the file are neither analyzed nor written to the output file. This option can be combined with the --skip option.
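The combined effect of --skip and --only amounts to taking a window over the input, which can be sketched in Python as follows. This is an illustration only: the function name `restrict` and its parameters are hypothetical, and any iterable stands in for the stream of sequence records.

```python
from itertools import islice

def restrict(records, skip=0, only=None):
    """Drop the first `skip` records, then keep at most `only` of the rest."""
    stop = None if only is None else skip + only
    return islice(records, skip, stop)

names = ["seq1", "seq2", "seq3", "seq4", "seq5", "seq6"]
print(list(restrict(names, skip=2, only=3)))  # ['seq3', 'seq4', 'seq5']
```

Applying the skip first means that --only counts records starting from the first one that was not discarded.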
Sequence annotated format
- --genbank
- Input file is in genbank format.
- --embl
- Input file is in embl format.
fasta related format
- --fasta
- Input file is in fasta format (including OBITools fasta extensions).
fastq related format
- --sanger
- Input file is in Sanger fastq format (standard fastq used by HiSeq/MiSeq sequencers).
- --solexa
- Input file is in fastq format produced by Solexa (Ga IIx) sequencers.
ecoPCR related format
- --ecopcr
- Input file is in ecoPCR format.
- --ecopcrdb
- Input is an ecoPCR database.
Specifying the sequence type
- --nuc
- Input file contains nucleic sequences.
- --prot
- Input file contains protein sequences.
COMMON OPTIONS
- -h, --help
- Shows this help message and exits.
- --DEBUG
- Sets logging in debug mode.
OBISAMPLE USED SEQUENCE ATTRIBUTE
- count
AUTHOR
The OBITools Development Team - LECA
COPYRIGHT
2019 - 2015, OBITool Development Team
January 28, 2019 | 1.02 12 |