Infernal Manual

NAME

cmstat - display summary statistics for a CM

SYNOPSIS

cmstat [options] cmfile

DESCRIPTION

cmstat calculates and displays various types of statistics describing the covariance models (CMs) in cmfile.

CMs are profiles of RNA consensus sequence and secondary structure. A CM file is produced by the cmbuild program from a given RNA sequence alignment of known consensus structure. CM files can be calibrated with the cmcalibrate program. Searches with calibrated CM files include E-values and use appropriate filter thresholds to speed up the search. It is strongly recommended that you calibrate your CM files before using cmsearch. CM calibration is described in more detail below and in chapters 5 and 6 of the User's Guide. cmstat is useful for determining statistics on calibrated or non-calibrated CM files.

By default, cmstat prints general statistics of the model and the alignment it was built from. If the model(s) in cmfile have been calibrated with cmcalibrate, the --le and --ge options can be used to print statistics on the exponential tails used for calculating E-values for the various possible search modes for locally (--le) and globally (--ge) configured models in cmsearch. If cmfile is calibrated, HMM filter threshold statistics can be printed for local Inside CM search with --lfi, for glocal Inside CM search with --gfi, for local CYK CM search with --lfc, and for glocal CYK CM search with --gfc.

The --search option causes cmstat to perform a timing experiment for homology search. Statistics will be printed on how many kilobases can be scanned per second by the different possible algorithms in cmsearch.

OPTIONS

-h: Print brief help; includes version number and summary of all options, including expert options.

-g: Turn on the 'glocal' alignment algorithm, which is local with respect to the target database and global with respect to the model. By default, the model is configured for local alignment, which is local with respect to both the target sequence and the model.

-m: Print general statistics on the models in cmfile and the alignments they were built from.

-Z <x>: Calculate E-values as if the target database size was <x> megabases (Mb). Ignore the actual size of the database. This option is only valid if the CM file has been calibrated.

--all: Print all available statistics.

--le: Print local E-value statistics. This option only works if cmfile has been calibrated with cmcalibrate.

--ge: Print glocal E-value statistics. This option only works if cmfile has been calibrated with cmcalibrate.

--beta <x>: With the --search option, set the beta parameter for the query-dependent banding algorithm stages to <x>. Beta is the probability mass considered negligible during band calculation. The default is 1E-7.

--qdbfile <f>: Save the query-dependent bands (QDBs) for each state to file <f>.

EXPERT OPTIONS

--lfi: Print the HMM filter thresholds for the range of relevant CM bit score cutoffs for searches with locally configured models using the Inside algorithm.

--gfi: Print the HMM filter thresholds for the range of relevant CM bit score cutoffs for searches with globally configured models using the Inside algorithm.

--lfc: Print the HMM filter thresholds for the range of relevant CM bit score cutoffs for searches with locally configured models using the CYK algorithm.

--gfc: Print the HMM filter thresholds for the range of relevant CM bit score cutoffs for searches with globally configured models using the CYK algorithm.

-E <x>: Print filter threshold statistics for an HMM filter if a final CM E-value cutoff of <x> were to be used for a run of cmsearch on 1 MB of sequence. (Remember that cmsearch considers a 500,000 nucleotide sequence file as 1 MB of sequence because, by default, both strands of the sequence are searched.) The 1 MB size can be changed to the size of a given database in file <f> using the --seqfile <f> option.

-T <x>: Print filter threshold statistics for an HMM filter if a final CM bit score cutoff of <x> were to be used for a run of cmsearch.

--nc: Print filter threshold statistics for an HMM filter if a CM bit score cutoff equal to the Rfam NC cutoff were to be used for a run of cmsearch. The NC cutoff is defined as <x> bits in the original Stockholm alignment the model was built from, with a line: #=GF NC <x> positioned before the sequence alignment. If such a line existed in the alignment provided to cmbuild, then the --nc option will be available in cmstat. If no such line existed when cmbuild was run, then using the --nc option to cmstat will cause the program to print an error message and exit.

--ga: Print filter threshold statistics for an HMM filter if a CM bit score cutoff equal to the Rfam GA cutoff value were to be used for a run of cmsearch. The GA cutoff is defined in the Stockholm file used to build the model in the same way as the NC cutoff (see above), but with a line: #=GF GA <x>

--tc: Print filter threshold statistics for an HMM filter if a CM bit score cutoff equal to the Rfam TC cutoff value were to be used for a run of cmsearch. The TC cutoff is defined in the Stockholm file used to build the model in the same way as the NC cutoff (see above), but with a line: #=GF TC <x>
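As a rough illustration of where the --nc, --ga and --tc cutoffs come from, the following Python sketch scans the header of a Stockholm alignment for #=GF NC, GA and TC lines that appear before the first sequence line. The function name read_rfam_cutoffs and the toy alignment are hypothetical and are not part of Infernal; the sketch reads only the first number after the tag and makes no claim about how cmbuild itself parses the file.

```python
import re

# Matches Stockholm per-file annotation lines such as "#=GF NC 24.9".
# Only the first number after the tag is read; any trailing text is ignored.
_CUTOFF_RE = re.compile(r"^#=GF\s+(NC|GA|TC)\s+(-?\d+(?:\.\d+)?)")


def read_rfam_cutoffs(stockholm_text):
    """Return a dict such as {"GA": 25.0} for cutoff lines found before the alignment.

    Hypothetical helper for illustration only; not an Infernal function.
    """
    cutoffs = {}
    for line in stockholm_text.splitlines():
        line = line.strip()
        if line == "//":
            break  # end of the alignment record
        if line and not line.startswith("#"):
            break  # first sequence line: cutoff lines must come before it
        match = _CUTOFF_RE.match(line)
        if match:
            cutoffs[match.group(1)] = float(match.group(2))
    return cutoffs


# Toy alignment invented for this example.
example = """# STOCKHOLM 1.0
#=GF ID  toy_family
#=GF GA  25.0
#=GF TC  27.3
#=GF NC  24.9
seq1  GGGAAACCC
#=GC SS_cons  <<<...>>>
//
"""

cutoffs = read_rfam_cutoffs(example)
# Options one would expect to be usable for a model built from this alignment.
usable_options = sorted("--" + tag.lower() for tag in cutoffs)
print(cutoffs, usable_options)
```

If a tag is missing from the returned dict, the corresponding cmstat option would be expected to fail with an error, as described for --nc above.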

--seqfile <x>: With the -E option, use the database size of the database in <x> instead of the default database size of 1 MB.

--toponly: In combination with the --seqfile <x> option, only consider the top strand of the database in <x> instead of both strands.
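The database-size arithmetic behind -E, --seqfile and --toponly can be sketched in a few lines of Python. This is only the calculation implied by the text above (nucleotides times strands searched, expressed in millions of nucleotides); the function name search_space_mb is hypothetical and the sketch is not taken from the cmstat source.

```python
def search_space_mb(total_nucleotides, top_strand_only=False):
    """Effective search space in millions of nucleotides (the "MB" used above).

    Hypothetical helper: both strands are counted unless top_strand_only is set,
    mirroring the default behaviour and the --toponly option described above.
    """
    strands = 1 if top_strand_only else 2
    return total_nucleotides * strands / 1_000_000


# A 500,000 nucleotide file counts as 1 MB when both strands are searched...
print(search_space_mb(500_000))
# ...and as 0.5 MB when only the top strand is considered.
print(search_space_mb(500_000, top_strand_only=True))
```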

--search: Perform an experiment to determine how fast the CM(s) can search with different search algorithms.

--cmL <n>: With the --search option, set the length of sequence to search with CM algorithms to <n> residues. By default, <n> is 1000.

--hmmL <n>: With the --search option, set the length of sequence to search with HMM algorithms to <n> residues. By default, <n> is 100,000.
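To make the units of the --search timing experiment concrete, here is a minimal Python sketch that converts a sequence length and an elapsed time into kilobases per second. The function name and the timings are invented for illustration; only the default lengths (1000 residues for CM algorithms, 100,000 for HMM algorithms) come from the option descriptions above.

```python
def kilobases_per_second(residues_searched, elapsed_seconds):
    """Search throughput in kb/s. Hypothetical helper, not part of Infernal."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return residues_searched / 1000 / elapsed_seconds


# Default sequence lengths from --cmL and --hmmL; the timings are made up.
cm_rate = kilobases_per_second(1000, 2.0)        # CM algorithm on 1000 residues
hmm_rate = kilobases_per_second(100_000, 0.5)    # HMM algorithm on 100,000 residues
print(cm_rate, hmm_rate)
```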

--efile <f>: Save a plot of cmsearch HMM filter E-value cutoffs versus CM E-value cutoffs in xmgrace format to file <f>. This option must be used in combination with --lfi, --gfi, --lfc or --gfc.

--bfile <f>: Save a plot of cmsearch HMM bit score cutoffs versus CM bit score cutoffs in xmgrace format to file <f>. This option must be used in combination with --lfi, --gfi, --lfc or --gfc.

--sfile <f>: Save a plot of the cmsearch predicted survival fraction from the HMM filter versus CM E-value cutoff in xmgrace format to file <f>. This option must be used in combination with --lfi, --gfi, --lfc or --gfc.

--xfile <f>: Save a plot of 'xhmm' versus CM E-value cutoff in xmgrace format to file <f>. 'xhmm' is the ratio of the number of dynamic programming calculations predicted to be required for the HMM filter and the CM search of the filter survivors versus the number of dynamic programming calculations for the filter alone. So, an 'xhmm' value of 2.0 means the filter stage of a search requires the same number of calculations as the CM search of the filter survivors does. This option must be used in combination with --lfi, --gfi, --lfc or --gfc.
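The 'xhmm' ratio can be written out directly from the definition above. The Python sketch below is a worked example only: the function name and the calculation counts are hypothetical, and it does not reproduce how cmstat predicts those counts.

```python
def xhmm(hmm_filter_calcs, cm_survivor_calcs):
    """Ratio of (HMM filter + CM search of survivors) to the HMM filter alone.

    Hypothetical helper following the definition in the --xfile description.
    """
    if hmm_filter_calcs <= 0:
        raise ValueError("hmm_filter_calcs must be positive")
    return (hmm_filter_calcs + cm_survivor_calcs) / hmm_filter_calcs


# Filter and survivor search cost the same: xhmm is 2.0.
print(xhmm(1_000_000, 1_000_000))
# No survivors reach the CM stage: xhmm is 1.0, its lowest possible value.
print(xhmm(1_000_000, 0))
```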

--afile <f>: Save a plot of the predicted acceleration for an HMM filtered search versus CM E-value cutoff in xmgrace format to file <f>. This option must be used in combination with --lfi, --gfi, --lfc or --gfc.

--bits: With --efile, --sfile, --xfile, and --afile, use CM bit score cutoffs instead of CM E-value cutoffs for the x-axis values of the plot.
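The combination rules for the plot options can be checked before cmstat is run. The Python sketch below is a hypothetical pre-flight check, not part of Infernal: it encodes only the rules stated above (each plot option needs one of --lfi, --gfi, --lfc or --gfc, and --bits applies to --efile, --sfile, --xfile and --afile). The file names in the usage lines are invented, and the check assumes no file name begins with "--".

```python
FILTER_MODES = {"--lfi", "--gfi", "--lfc", "--gfc"}
PLOT_OPTIONS = {"--efile", "--bfile", "--sfile", "--xfile", "--afile"}
BITS_PLOT_OPTIONS = {"--efile", "--sfile", "--xfile", "--afile"}


def check_plot_options(argv):
    """Return a list of problems found in a cmstat option list (empty if none).

    Hypothetical helper: it inspects the option names only and never runs cmstat.
    """
    flags = {arg for arg in argv if arg.startswith("--")}
    problems = []
    plots = sorted(flags & PLOT_OPTIONS)
    if plots and not (flags & FILTER_MODES):
        problems.append(
            ", ".join(plots) + " requires one of --lfi, --gfi, --lfc or --gfc"
        )
    if "--bits" in flags and not (flags & BITS_PLOT_OPTIONS):
        problems.append("--bits only affects --efile, --sfile, --xfile and --afile")
    return problems


# Invented file names, for illustration only.
print(check_plot_options(["--lfi", "--efile", "e.xmgr", "my.cm"]))  # no problems
print(check_plot_options(["--efile", "e.xmgr", "my.cm"]))           # missing filter mode
```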

COPYRIGHT

Copyright (C) 2009 HHMI Janelia Farm Research Campus.
Freely distributed under the GNU General Public License (GPLv3).

See the file COPYING that came with the source for details on redistribution conditions.

AUTHOR

Eric Nawrocki, Diana Kolbe, and Sean Eddy
HHMI Janelia Farm Research Campus
19700 Helix Drive
Ashburn VA 20147
http://selab.janelia.org/

October 2009

Infernal 1.0.2

Source file:	cmstat.1.en.gz (from infernal 1.0.2-2)
Source last updated:	2011-09-27T15:41:55Z
Converted to HTML:	2017-06-07T16:48:47Z