NAME¶
- cmstat - display summary statistics for a CM
SYNOPSIS¶
cmstat [options] cmfile
DESCRIPTION¶
cmstat calculates and displays various types of statistics describing the
covariance models (CMs) in
cmfile.
CMs are profiles of RNA consensus sequence and secondary structure. A CM file is
produced by the
cmbuild program, from a given RNA sequence alignment of
known consensus structure. CM files can be calibrated with the
cmcalibrate program. Searches with calibrated CM files will include
E-values and will use appropriate filter thresholds for faster speed. It is
strongly recommended to calibrate your CM files before using
cmsearch.
CM calibration is described in more detail below and in chapters 5 and 6 of
the User's Guide.
cmstat is useful for determining statistics on
calibrated or non-calibrated CM files.
By default, cmstat prints general statistics of the model and the
alignment it was built from. If the model(s) in cmfile have been calibrated
with cmcalibrate, the --le and --ge options can be used to print statistics
on the exponential tails used for calculating E-values for the various
possible search modes for locally (--le) and globally (--ge) configured
models in cmsearch. If cmfile is calibrated, HMM filter threshold statistics
can be printed for local Inside CM search with --lfi, for glocal Inside CM
search with --gfi, for local CYK CM search with --lfc, and for glocal CYK CM
search with --gfc.
The --search option causes cmstat to perform a timing experiment
for homology search. Statistics will be printed on how many kilobases can be
scanned per second for the different possible algorithms in cmsearch.
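The report modes described above map onto a small set of flags. The following is a minimal sketch, not part of Infernal: the helper cmstat_command, the report names, and the file name my.cm are hypothetical, and only flags documented on this page are used. It builds the argument list for each mode without running anything.

```python
import shlex

# Report names are invented for this sketch; the flags come from this page.
REPORT_FLAGS = {
    "general": [],                      # default: model and alignment statistics
    "local_evalues": ["--le"],          # requires a calibrated CM file
    "glocal_evalues": ["--ge"],         # requires a calibrated CM file
    "local_inside_filter": ["--lfi"],   # requires a calibrated CM file
    "glocal_inside_filter": ["--gfi"],  # requires a calibrated CM file
    "local_cyk_filter": ["--lfc"],      # requires a calibrated CM file
    "glocal_cyk_filter": ["--gfc"],     # requires a calibrated CM file
    "timing": ["--search"],             # timing experiment
}


def cmstat_command(cmfile, report="general"):
    """Return the argument list for one cmstat report (hypothetical helper)."""
    return ["cmstat", *REPORT_FLAGS[report], cmfile]


if __name__ == "__main__":
    # "my.cm" is a placeholder file name.
    for report in REPORT_FLAGS:
        print(shlex.join(cmstat_command("my.cm", report)))
```

The returned list can be passed to subprocess.run if cmstat is installed; that step is left out here so the sketch runs without it.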
OPTIONS¶
- -h
- Print brief help; includes version number and summary of
all options, including expert options.
- -g
- Turn on the 'glocal' alignment algorithm, which is local with
respect to the target database and global with respect to the model. By
default, the model is configured for local alignment, which is local with
respect to both the target sequence and the model.
- -m
- Print general statistics on the models in cmfile and
the alignments they were built from.
- -Z <x>
- Calculate E-values as if the target database size was
<x> megabases (Mb). Ignore the actual size of the database.
This option is only valid if the CM file has been calibrated.
- --all
- Print all available statistics.
- --le
- Print local E-value statistics. This option only works if
cmfile has been calibrated with cmcalibrate.
- --ge
- Print glocal E-value statistics. This option only works if
cmfile has been calibrated with cmcalibrate.
- --beta <x>
- With the --search option, set the beta parameter for
the query-dependent banding algorithm stages to <x>. Beta is
the probability mass considered negligible during band calculation. The
default is 1E-7.
- --qdbfile <f>
- Save the query-dependent bands (QDBs) for each state to
file <f>.
EXPERT OPTIONS¶
- --lfi
- Print the HMM filter thresholds for the range of relevant
CM bit score cutoffs for searches with locally configured models using the
Inside algorithm.
- --gfi
- Print the HMM filter thresholds for the range of relevant
CM bit score cutoffs for searches with globally configured models using
the Inside algorithm.
- --lfc
- Print the HMM filter thresholds for the range of relevant
CM bit score cutoffs for searches with locally configured models using the
CYK algorithm.
- --gfc
- Print the HMM filter thresholds for the range of relevant
CM bit score cutoffs for searches with globally configured models using
the CYK algorithm.
- -E <x>
- Print filter threshold statistics for an HMM filter if a
final CM E-value cutoff of <x> were to be used for a run of
cmsearch on 1 MB of sequence. (Remember that cmsearch considers a
500,000 nucleotide sequence file as 1 MB of sequence, because by default
both strands of the sequence are searched.) The default size of 1 MB can
be changed to the size of a given database in file <f> using
the --seqfile <f> option.
- -T <x>
- Print filter threshold statistics for an HMM filter if a
final CM bit score cutoff of <x> were to be used for a run of
cmsearch.
- --nc
- Print filter threshold statistics for an HMM filter if a CM
bit score cutoff equal to the Rfam NC cutoff were to be used for a run of
cmsearch. The NC cutoff is defined as <x> bits in the
original Stockholm alignment the model was built from, with a line: #=GF
NC <x> positioned before the sequence alignment. If such a line
existed in the alignment provided to cmbuild, then the --nc
option will be available in cmstat. If no such line existed when
cmbuild was run, then using the --nc option to cmstat
will cause the program to print an error message and exit.
- --ga
- Print filter threshold statistics for an HMM filter if a CM
bit score cutoff equal to the Rfam GA cutoff value were to be used for a
run of cmsearch. The GA cutoff is defined in a Stockholm file used to
build the model in the same way as the NC cutoff (see above), but with a
line: #=GF GA <x>.
- --tc
- Print filter threshold statistics for an HMM filter if a CM
bit score cutoff equal to the Rfam TC cutoff value were to be used for a
run of cmsearch. The TC cutoff is defined in a Stockholm file used
to build the model in the same way as the NC cutoff (see above), but with
a line: #=GF TC <x>.
- --seqfile <x>
- With the -E option, use the database size of the
database in <x> instead of the default database size of 1 MB.
- --toponly
- In combination with the --seqfile <x>
option, only consider the top strand of the database in <x>
instead of both strands.
- --search
- Perform an experiment to determine how fast the CM(s) can
search with different search algorithms.
- --cmL <n>
- With the --search option, set the length of sequence
to search with CM algorithms to <n> residues. By default,
<n> is 1000.
- --hmmL <n>
- With the --search option, set the length of sequence
to search with HMM algorithms to <n> residues. By default,
<n> is 100,000.
- --efile <f>
- Save a plot of cmsearch HMM filter E-value cutoffs
versus CM E-value cutoffs in xmgrace format to file <f>. This
option must be used in combination with --lfi, --gfi, --lfc or
--gfc.
- --bfile <f>
- Save a plot of cmsearch HMM bit score cutoffs versus
CM bit score cutoffs in xmgrace format to file <f>. This
option must be used in combination with --lfi, --gfi, --lfc or
--gfc.
- --sfile <f>
- Save a plot of cmsearch predicted survival fraction
from the HMM filter versus CM E-value cutoff in xmgrace format to file
<f>. This option must be used in combination with --lfi,
--gfi, --lfc or --gfc.
- --xfile <f>
- Save a plot of 'xhmm' versus CM E-value cutoff in xmgrace
format to file <f>.
'xhmm' is the ratio of the number of dynamic programming calculations
predicted to be required for the HMM filter and the CM search of the
filter survivors versus the number of dynamic programming calculations for
the filter alone. So, an 'xhmm' value of 2.0 means the filter stage of a
search requires the same number of calculations as the CM search of the
filter survivors does. This option must be used in combination with
--lfi, --gfi, --lfc or --gfc.
- --afile <f>
- Save a plot of the predicted acceleration for an HMM
filtered search versus CM E-value cutoff in xmgrace format to file
<f>. This option must be used in combination with --lfi,
--gfi, --lfc or --gfc.
- --bits
- With --efile, --sfile, --xfile, and --afile,
use CM bit score cutoffs instead of CM E-value cutoffs for the x-axis
values of the plot.
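Two pieces of arithmetic from the expert options above can be checked by hand: the sequence size assumed by -E (both strands by default, top strand only with --toponly) and the 'xhmm' ratio plotted by --xfile. The sketch below simply restates those definitions; the function names are hypothetical and are not part of Infernal.

```python
def search_space_mb(num_nucleotides, top_only=False):
    """Size of the searched sequence in MB (millions of residues).

    Both strands are counted unless top_only is set, mirroring the
    default behaviour and the --toponly option described above.
    """
    strands = 1 if top_only else 2
    return num_nucleotides * strands / 1_000_000


def xhmm(filter_calcs, survivor_calcs):
    """Ratio of (HMM filter + CM search of survivors) to the filter alone."""
    if filter_calcs <= 0:
        raise ValueError("filter_calcs must be positive")
    return (filter_calcs + survivor_calcs) / filter_calcs


# A 500,000 nucleotide file counts as 1 MB when both strands are searched.
print(search_space_mb(500_000))                 # 1.0
print(search_space_mb(500_000, top_only=True))  # 0.5

# Equal work in the filter and in the CM search of survivors gives 2.0.
print(xhmm(1_000, 1_000))                       # 2.0
```

The calculation counts passed to xhmm are arbitrary illustrative numbers; cmstat derives its own predictions from the calibrated model.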
SEE ALSO¶
For complete documentation, see the User's Guide (Userguide.pdf) that came with
the distribution; or see the Infernal web page,
http://infernal.janelia.org/.
COPYRIGHT¶
Copyright (C) 2009 HHMI Janelia Farm Research Campus.
Freely distributed under the GNU General Public License (GPLv3).
See the file COPYING that came with the source for details on redistribution
conditions.
AUTHOR¶
Eric Nawrocki, Diana Kolbe, and Sean Eddy
HHMI Janelia Farm Research Campus
19700 Helix Drive
Ashburn VA 20147
http://selab.janelia.org/