NAME

run_tipp.py - an identification and phylogenetic profiling tool

DESCRIPTION

usage: run_tipp.py [-h] [-v] [-A N] [-P N] [-F N] [--distance DISTANCE] [-M DIAMETER] [-S DECOMP] [-p DIR] [-rt] [-o OUTPUT] [-d OUTPUT_DIR] [-c CONFIG] [-t TREE] [-r RAXML] [-a ALIGN] [-f FRAG] [-m MOLECULE] [--ignore-overlap] [-x N] [-cp CHCK_FILE] [-cpi N] [-seed N] [-R N] [-at N] [-D] [-pt N] [-PD N] [-tx TAXONOMY] [-txm MAPPING] [-adt TREE] [-C N]

This script runs the SEPP algorithm on an input tree, alignment, fragment file, and RAxML info file. It uses a reference dataset that must be downloaded from https://obj.umiacs.umd.edu/tipp/tipp2-refpkg.tar.gz

If the local administrator has not set the path to this reference dataset in /etc/tipp/tipp.config, copy that file to ~/.tipp/ and put the path to the dataset in the reference section of your copy; see tipp.config(5).
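As a sketch of that setup step, the Python below copies the system-wide file into a per-user directory and records the dataset location. It assumes tipp.config is an INI-style file that Python's configparser can read and that the reference section stores the location under a key named "path"; both are assumptions, so check tipp.config(5) for the actual layout. The function name set_reference_path is hypothetical.

```python
import configparser
import os
import shutil


def set_reference_path(system_config, user_dir, refpkg_path):
    """Copy system_config into user_dir (unless a copy already exists) and
    point its [reference] section at refpkg_path.

    Assumption: the file is INI-style and the key is named "path".
    """
    os.makedirs(user_dir, exist_ok=True)
    user_config = os.path.join(user_dir, "tipp.config")
    if not os.path.exists(user_config):
        shutil.copyfile(system_config, user_config)

    # No interpolation, so "%" in paths is taken literally; keep key case as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str
    parser.read(user_config)
    if not parser.has_section("reference"):
        parser.add_section("reference")
    parser.set("reference", "path", refpkg_path)  # hypothetical key name
    with open(user_config, "w") as handle:
        parser.write(handle)
    return user_config
```

For example, set_reference_path("/etc/tipp/tipp.config", os.path.expanduser("~/.tipp"), "/path/to/tipp2-refpkg") would create ~/.tipp/tipp.config on first use and update only the reference entry on later calls.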

optional arguments:

-h, --help: show this help message and exit
-v, --version: show program's version number and exit

DECOMPOSITION OPTIONS:

These options determine the alignment decomposition size and the taxon insertion size. If neither is given, the default is to align and place using subsets of 10% of the total number of taxa. The alignment decomposition size must be less than the taxon insertion size.

-A N, --alignmentSize N: max alignment subset size of N [default: 10% of the total number of taxa or the placement subset size if given]
-P N, --placementSize N: max placement subset size of N [default: 10% of the total number of taxa or the alignment length (whichever is larger)]
-F N, --fragmentChunkSize N: maximum fragment chunk size of N. Helps control memory usage. [default: 20000]
--distance DISTANCE: minimum p-distance before stopping the decomposition [default: 1]
-M DIAMETER, --diameter DIAMETER: maximum tree diameter before stopping the decomposition [default: None]
-S DECOMP, --decomp_strategy DECOMP: decomposition strategy [default: using tree branch length]
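The interplay of the -A and -P defaults can be sketched as below. This is an illustration of the rules stated in the option descriptions, not TIPP's own code; it assumes that "10%" is truncated to an integer and reads "alignment length" in the -P default as the alignment subset size. The function name default_subset_sizes is hypothetical.

```python
def default_subset_sizes(total_taxa, alignment_size=None, placement_size=None):
    """Resolve -A/--alignmentSize and -P/--placementSize defaults.

    Assumptions: 10% is computed with integer division, and the -P default
    compares against the (possibly defaulted) alignment subset size.
    """
    ten_percent = total_taxa // 10
    if alignment_size is None:
        # -A: the placement size if -P was given, else 10% of the taxa.
        alignment_size = placement_size if placement_size is not None else ten_percent
    if placement_size is None:
        # -P: whichever is larger, 10% of the taxa or the alignment size.
        placement_size = max(ten_percent, alignment_size)
    if alignment_size > placement_size:
        raise ValueError(
            "alignment subset size (%d) exceeds placement subset size (%d)"
            % (alignment_size, placement_size))
    return alignment_size, placement_size
```

With no options both sizes default to the same value (100 and 100 for 1000 taxa), so this sketch only rejects an alignment size that is larger than the placement size; the "must be less than" wording above is read here as "must not exceed".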

OUTPUT OPTIONS:

These options control output.

-p DIR, --tempdir DIR: Temporary files will be written to DIR. A full path is required. [default: /tmp/sepp]
-rt, --remtemp: Remove the temporary directory. [default: disabled]
-o OUTPUT, --output OUTPUT: output files with prefix OUTPUT. [default: output]
-d OUTPUT_DIR, --outdir OUTPUT_DIR: output to the OUTPUT_DIR directory. A full path is required. [default: .]
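Because -p and -d require full paths, a wrapper script can resolve relative directories before calling run_tipp.py. The sketch below builds only the output-related arguments from the flags listed above; the function name output_arguments is hypothetical, and the resulting list would still need the input and TIPP options added before running.

```python
import os


def output_arguments(prefix="output", outdir=".", tempdir="/tmp/sepp",
                     remove_temp=False):
    """Return the output-related run_tipp.py arguments as a list.

    Both directories are converted to absolute paths, since -d and -p
    are documented as requiring a full path.
    """
    args = [
        "-o", prefix,                      # prefix for output file names
        "-d", os.path.abspath(outdir),     # output directory, full path
        "-p", os.path.abspath(tempdir),    # temporary directory, full path
    ]
    if remove_temp:
        args.append("-rt")                 # remove the temporary directory
    return args
```

A caller might then run subprocess.run(["run_tipp.py", *output_arguments("run1", "results"), ...]) with the remaining options filled in.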

INPUT OPTIONS:

These options control input. To run SEPP, the following inputs are required: a backbone tree (in Newick format); a RAxML_info file (the file RAxML generates while estimating the backbone tree, which pplacer uses to set model parameters); a backbone alignment file (in FASTA format); and a FASTA file containing the fragments. The input sequences are assumed to be DNA unless specified otherwise.

-c CONFIG, --config CONFIG: A config file containing options used to run SEPP. Options provided as command-line arguments override the config file values for those options. [default: None]
-t TREE, --tree TREE: Input tree file (Newick format) [default: None]
-r RAXML, --raxml RAXML: RAxML_info file including model parameters, generated by RAxML. [default: None]
-a ALIGN, --alignment ALIGN: Aligned FASTA file [default: None]
-f FRAG, --fragment FRAG: Fragment file [default: None]
-m MOLECULE, --molecule MOLECULE: Molecule type of the sequences. Can be amino, dna, or rna. [default: dna]
--ignore-overlap: When a query sequence has the same name as a backbone sequence, ignore the query sequence and keep the backbone sequence. [default: False]
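The --ignore-overlap rule can be illustrated with a small FASTA filter. This is a sketch of the documented behaviour, not TIPP's implementation; it assumes a sequence's name is the first whitespace-separated token of its header line. The helper names read_fasta and drop_overlapping_queries are hypothetical.

```python
def read_fasta(lines):
    """Minimal FASTA reader yielding (name, sequence) pairs.

    Assumption: the name is the first whitespace-separated token after '>'.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def drop_overlapping_queries(backbone, queries):
    """Discard queries whose name also appears in the backbone alignment,
    keeping the backbone copy; return (kept_queries, dropped_names)."""
    backbone_names = {name for name, _ in backbone}
    kept = [(name, seq) for name, seq in queries if name not in backbone_names]
    dropped = [name for name, _ in queries if name in backbone_names]
    return kept, dropped
```

Here a fragment named "b" would be dropped if the backbone alignment also contains "b", while every backbone sequence is left untouched.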

OTHER OPTIONS:

These options control how SEPP is run.

-x N, --cpu N: Use N CPUs [default: number of CPUs available on the machine]
-cp CHCK_FILE, --checkpoint CHCK_FILE: checkpoint file [default: no checkpointing]
-cpi N, --interval N: Interval (in seconds) between checkpoint writes. Has an effect only when -cp is provided. [default: 3600]
-seed N, --randomseed N: random seed number. [default: 297834]
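This page does not say how TIPP schedules checkpoint writes internally. The class below only restates the documented -cp/-cpi rule as code: with no checkpoint file nothing is ever due, and otherwise a write becomes due once the interval has elapsed since the previous one. The class name CheckpointSchedule and its methods are hypothetical.

```python
import time


class CheckpointSchedule:
    """Illustrates -cp/-cpi: -cpi has no effect without -cp."""

    def __init__(self, checkpoint_file=None, interval=3600, start=None):
        self.checkpoint_file = checkpoint_file
        self.interval = interval  # seconds between writes
        self.last_write = time.monotonic() if start is None else start

    def due(self, now=None):
        """Return True when a checkpoint write should happen."""
        if self.checkpoint_file is None:
            return False  # no checkpointing requested
        now = time.monotonic() if now is None else now
        return now - self.last_write >= self.interval

    def mark_written(self, now=None):
        """Record that a checkpoint was just written."""
        self.last_write = time.monotonic() if now is None else now
```

The optional now and start arguments exist only so the timing logic can be exercised without waiting; a real loop would rely on the monotonic clock.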

TIPP OPTIONS:

These options control settings specific to TIPP.

-R N, --reference_pkg N: Use a pre-computed reference package [default: None]
-at N, --alignmentThreshold N: Enough alignment subsets are selected to reach a cumulative probability of N. This should be a number between 0 and 1. [default: 0.95]
-D, --dist: Treat fragments as a distribution
-pt N, --placementThreshold N: Enough placements are selected to reach a cumulative probability of N. This should be a number between 0 and 1. [default: 0.95]
-PD N, --push_down N: Whether to classify based on the children below or above the insertion point. [default: True]
-tx TAXONOMY, --taxonomy TAXONOMY: A file describing the taxonomy. This is a comma-separated text file with the following fields: taxon_id,parent_id,taxon_name,rank. Any additional columns are ignored. The first line is also ignored.
-txm MAPPING, --taxonomyNameMapping MAPPING: A comma-separated text file mapping alignment sequence names to taxonomic IDs. Format (one entry per line): sequence_name,taxon_id. Any additional columns are ignored. The first line is also ignored.
-adt TREE, --alignmentDecompositionTree TREE: A newick tree file used for decomposing taxa into alignment subsets. [default: the backbone tree]
-C N, --cutoff N: Placement probability requirement to count toward the distribution. This should be a number between 0 and 1 [default: 0.0]
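The cumulative-probability rule behind -at and -pt, and the -C cutoff, can be sketched as follows. It assumes candidates are taken greedily from most to least probable until their summed probability reaches the threshold, and that a placement counts when its probability is at least the cutoff; the page does not state either detail, and the function names are hypothetical.

```python
def select_by_cumulative_probability(items, threshold=0.95):
    """Greedy selection for -at/-pt: take (label, probability) pairs in
    decreasing probability until the running sum reaches threshold."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    selected, total = [], 0.0
    for label, prob in sorted(items, key=lambda item: item[1], reverse=True):
        selected.append((label, prob))
        total += prob
        if total >= threshold:
            break  # enough probability mass has been covered
    return selected


def apply_cutoff(placements, cutoff=0.0):
    """-C: keep only placements whose probability meets the cutoff
    (assumption: the comparison is inclusive)."""
    return [(label, prob) for label, prob in placements if prob >= cutoff]
```

For placements with probabilities 0.5, 0.25, 0.125, 0.0625, and 0.0625, a threshold of 0.8 keeps the first three, because 0.5 + 0.25 falls short and adding 0.125 reaches 0.875.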

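The -tx and -txm file formats described above can be read with Python's csv module, and a sequence's lineage then follows the parent_id links. This sketch follows the stated rules (first line skipped, extra columns ignored) and assumes the root is a taxon that is its own parent or whose parent is absent from the file; the function names are hypothetical.

```python
import csv


def read_taxonomy(handle):
    """Parse a -tx file into {taxon_id: (parent_id, taxon_name, rank)}."""
    rows = csv.reader(handle)
    next(rows, None)  # the first line is ignored
    taxonomy = {}
    for row in rows:
        if len(row) < 4:
            continue  # skip blank or short lines (an assumption)
        taxon_id, parent_id, name, rank = (field.strip() for field in row[:4])
        taxonomy[taxon_id] = (parent_id, name, rank)
    return taxonomy


def read_name_mapping(handle):
    """Parse a -txm file into {sequence_name: taxon_id}."""
    rows = csv.reader(handle)
    next(rows, None)  # the first line is ignored
    return {row[0].strip(): row[1].strip() for row in rows if len(row) >= 2}


def lineage(taxonomy, taxon_id):
    """Follow parent links upward, returning (rank, name) pairs; stops at a
    self-parented root, an unknown parent, or any cycle."""
    path, seen = [], set()
    while taxon_id in taxonomy and taxon_id not in seen:
        seen.add(taxon_id)
        parent_id, name, rank = taxonomy[taxon_id]
        path.append((rank, name))
        taxon_id = parent_id
    return path
```

With an invented toy taxonomy in which 1224 (Proteobacteria, phylum) has parent 2 (Bacteria, superkingdom) and 2 has parent 1 (root), a sequence mapped to 1224 gets the lineage phylum, superkingdom, root.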
Source file:	run_tipp.py.1.en.gz (from tipp)
Source last updated:	2021-09-30T12:45:31Z
Converted to HTML:	2024-10-21T18:02:00Z