'\" t
.\"     Title: gt-hop
.\"    Author: [FIXME: author] [see http://www.docbook.org/tdg5/en/html/author]
.\" Generator: DocBook XSL Stylesheets vsnapshot
.\"      Date: 11/28/2022
.\"    Manual: GenomeTools Manual
.\"    Source: GenomeTools 1.6.2
.\"  Language: English
.\"
.TH "GT\-HOP" "1" "11/28/2022" "GenomeTools 1\&.6\&.2" "GenomeTools Manual"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\" -----------------------------------------------------------------
.\" * set default formatting
.\" -----------------------------------------------------------------
.\" disable hyphenation
.nh
.\" disable justification (adjust text to left margin only)
.ad l
.\" -----------------------------------------------------------------
.\" * MAIN CONTENT STARTS HERE *
.\" -----------------------------------------------------------------
.SH "NAME"
gt-hop \- Cognate sequence\-based homopolymer error correction\&.
.SH "SYNOPSIS"
.sp
\fBgt hop\fR \- \-c \-map \-reads [options\&...]
.SH "DESCRIPTION"
.PP
\fB\-c\fR [\fIstring\fR]
.RS 4
cognate sequence (encoded using gt encseq encode)
.RE
.PP
\fB\-map\fR [\fIstring\fR]
.RS 4
mapping of reads to the cognate sequence; it must be in SAM/BAM format and sorted by coordinate (it can be prepared e\&.g\&. using samtools sort)
.RE
.PP
\fB\-sam\fR [\fIyes|no\fR]
.RS 4
the mapping file is in SAM format (default: BAM)
.RE
.PP
\fB\-aggressive\fR [\fIyes|no\fR]
.RS 4
correct as much as possible
.RE
.PP
\fB\-moderate\fR [\fIyes|no\fR]
.RS 4
balance sensitivity and precision
.RE
.PP
\fB\-conservative\fR [\fIyes|no\fR]
.RS 4
correct only the most likely errors
.RE
.PP
\fB\-expert\fR [\fIyes|no\fR]
.RS 4
manually select the correction criteria
.RE
.PP
\fB\-reads\fR
.RS 4
uncorrected read file(s) in FastQ format; the corrected reads are written to the current working directory, into files named like the input files, each with a prefix prepended (see the \-outprefix option)\&. \-reads outputs the reads in the same order as in the input and is mandatory if the SAM file contains more than a single primary alignment for each read (e\&.g\&. output of bwasw); see also the \-o option as an alternative
.RE
.PP
\fB\-outprefix\fR [\fIstring\fR]
.RS 4
prefix for output filenames (corrected reads); when \-reads is specified, the prefix is prepended to each input filename (default: hop_)
.RE
.PP
\fB\-o\fR [\fIstring\fR]
.RS 4
output file for corrected reads (see also \-reads/\-outprefix); if \-o is used, the reads are written to a single file in the order in which they are found in the SAM file (which usually differs from the original order); this works only if the reads were aligned with software that reports only 1 alignment for each read (e\&.g\&. bwa) (default: undefined)
.RE
.PP
\fB\-hmin\fR [\fIvalue\fR]
.RS 4
minimal homopolymer length in the cognate sequence (default: 3)
.RE
.PP
\fB\-read\-hmin\fR [\fIvalue\fR]
.RS 4
minimal homopolymer length in the reads (default: 2)
.RE
.PP
\fB\-qmax\fR [\fIvalue\fR]
.RS 4
maximal average quality of the homopolymer in a read (default: 120)
.RE
.PP
\fB\-altmax\fR [\fIvalue\fR]
.RS 4
maximal support of an alternative homopolymer length; e\&.g\&. 0\&.8 means: do not correct any read if the homopolymer length has the same value, different from the cognate, in more than 80% of the reads; if altmax is set to 1\&.0, reads are always corrected (default: 0\&.8)
.RE
.PP
\fB\-cogmin\fR [\fIvalue\fR]
.RS 4
minimal support of the cognate sequence homopolymer length; e\&.g\&. 0\&.1 means: do not correct any read if the cognate homopolymer length is not present in at least 10% of the reads; if cogmin is set to 0\&.0, reads are always corrected
.RE
.PP
\fB\-mapqmin\fR [\fIvalue\fR]
.RS 4
minimal mapping quality (default: 21)
.RE
.PP
\fB\-covmin\fR [\fIvalue\fR]
.RS 4
minimal coverage; e\&.g\&. 5 means: do not correct any read if the coverage (number of reads mapped over the whole homopolymer) is less than 5; if covmin is set to 1, reads are always corrected (default: 1)
.RE
.PP
\fB\-allow\-muliple\fR [\fIyes|no\fR]
.RS 4
allow multiple corrections in a read (default: no)
.RE
.PP
\fB\-clenmax\fR [\fIvalue\fR]
.RS 4
maximal correction length (default: unlimited)
.RE
.PP
\fB\-ann\fR [\fIstring\fR]
.RS 4
annotation of the cognate sequence; it must be sorted by coordinates on the cognate sequence (this can be done e\&.g\&. using gt gff3 \-sort); if \-ann is used, corrections are limited to homopolymers starting or ending inside the feature type indicated by the \-ft option; format: sorted GFF3 (default: undefined)
.RE
.PP
\fB\-ft\fR [\fIstring\fR]
.RS 4
feature type to use when the \-ann option is specified (default: CDS)
.RE
.PP
\fB\-v\fR [\fIyes|no\fR]
.RS 4
be verbose (default: no)
.RE
.PP
\fB\-help\fR
.RS 4
display help for basic options and exit
.RE
.PP
\fB\-help+\fR
.RS 4
display help for all options and exit
.RE
.PP
\fB\-version\fR
.RS 4
display version information and exit
.RE
.sp
Correction mode:
.sp
One of the options \fI\-aggressive\fR, \fI\-moderate\fR, \fI\-conservative\fR or \fI\-expert\fR must be selected\&.
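.sp
The per\-homopolymer thresholds described by the options above can be sketched as a single decision function\&. This is a hypothetical Python illustration, not the gt hop source code: the function name should_correct and its parameters are assumptions, the default values follow the option defaults listed above (with cogmin set to 0\&.1 as in the conservative preset), and \-allow\-muliple and \-ann are not modeled\&. It checks each criterion in turn and reports whether one read would be corrected at one homopolymer\&.
.sp
.if n \{\
.RS 4
.\}
.nf
```python
from collections import Counter


# Hypothetical sketch of the per-read decision; names mirror the gt hop
# options, but this is NOT the actual gt hop implementation.
def should_correct(cog_len, read_len, read_lens, mapq, avg_qual,
                   hmin=3, read_hmin=2, qmax=120, altmax=0.8,
                   cogmin=0.1, mapqmin=21, covmin=1, clenmax=None):
    """Decide whether one read's homopolymer length would be corrected.

    cog_len:   homopolymer length in the cognate sequence
    read_len:  homopolymer length in the read under consideration
    read_lens: homopolymer lengths in all reads covering the homopolymer
    """
    if read_len == cog_len:
        return False  # no discrepancy, nothing to correct
    if cog_len < hmin or read_len < read_hmin:  # -hmin, -read-hmin
        return False
    if avg_qual > qmax or mapq < mapqmin:  # -qmax, -mapqmin
        return False
    coverage = len(read_lens)
    if coverage < max(covmin, 1):  # -covmin (at least one read needed)
        return False
    if clenmax is not None and abs(cog_len - read_len) > clenmax:
        return False  # -clenmax: the correction would be too long
    counts = Counter(read_lens)
    # -cogmin: the cognate length must be observed in enough reads
    if counts[cog_len] / coverage < cogmin:
        return False
    # -altmax: no single alternative length may be too well supported
    alt_support = [n for length, n in counts.items() if length != cog_len]
    if alt_support and max(alt_support) / coverage > altmax:
        return False
    return True
```
.fi
.if n \{\
.RE
.\}
.sp
For example, with a cognate homopolymer of length 4 covered by 10 reads, 6 of length 4 and 4 of length 3, a read of length 3 (mapping quality 30, average quality 20) passes every check; if instead 9 of the 10 reads had length 3, the \-altmax criterion (0\&.9 > 0\&.8) would block the correction\&.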
.sp
The \fI\-aggressive\fR, \fI\-moderate\fR and \fI\-conservative\fR modes are presets of the criteria used to decide whether an observed discrepancy in homopolymer length between the cognate sequence and a read shall be corrected\&. A description of the individual criteria is provided by the \fI\-help+\fR option\&. The presets are equivalent to the following settings:
.sp
.if n \{\
.RS 4
.\}
.nf
                 \-aggressive   \-moderate     \-conservative
\-hmin            3             3             3
\-read\-hmin       1             1             2
\-altmax          1\&.00          0\&.99          0\&.80
\-cogmin          0\&.00          0\&.00          0\&.10
\-mapqmin         0             10            21
\-covmin          1             1             1
\-clenmax         unlimited     unlimited     unlimited
\-allow\-multiple  yes           yes           no
.fi
.if n \{\
.RE
.\}
.sp
The aggressive mode tries to maximize the sensitivity, the conservative mode to minimize the number of false positives\&. An even more conservative set of corrections can be achieved using the \fI\-ann\fR option (see \fI\-help+\fR)\&.
.sp
The \fI\-expert\fR mode allows one to set each parameter manually; the default values are the same as in the \fI\-conservative\fR mode\&.
.sp
(Finally, for evaluation purposes only, the \fI\-state\-of\-truth\fR mode can be used: this mode assumes that the sequenced genome has been specified as the cognate sequence and outputs an ideal list of corrections\&.)
.SH "REPORTING BUGS"
.sp
Report bugs to https://github\&.com/genometools/genometools/issues\&.