NAME¶
rate4site - detector of conserved amino-acid sites
SYNOPSIS¶
rate4site [OPTIONS] -s <MSA FILE>
DESCRIPTION¶
The rate of evolution is not constant among amino acid sites: some positions
evolve slowly and are commonly referred to as "conserved", while
others evolve rapidly and are referred to as "variable". The rate
variations correspond to different levels of purifying selection acting on
these sites. The purifying selection can be the result of geometrical
constraints on the folding of the protein into its 3D structure, constraints
at amino acid sites involved in enzymatic activity or in ligand binding or,
alternatively, at amino acid sites that take part in protein-protein
interactions. Rate4Site calculates the relative evolutionary rate at each site
using a probabilistic evolutionary model. This takes into account the
stochastic process underlying sequence evolution within protein families
and the phylogenetic tree of the proteins in the family. The
conservation score at a site corresponds to the site's evolutionary rate.
METHODOLOGY¶
The sole obligatory input to Rate4Site is an MSA file. The program then computes
a phylogenetic tree that is consistent with the available MSA (the user can
also input a pre-calculated tree). It then calculates the relative
conservation score for each site in the MSA. This is carried out using either
an empirical Bayesian method or a maximum likelihood method (Pupko et al.,
2002). The differences between the two methods are explained in detail in
Mayrose et al. (2004).
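As a minimal sketch of the workflow described above, the following Python helper assembles a rate4site argument list from the documented -s, -t and -o options. The helper name build_rate4site_command and the file names are hypothetical, and passing each option and its value as separate arguments is an assumption based on the SYNOPSIS line.

```python
from typing import List, Optional


def build_rate4site_command(
    msa_file: str,
    tree_file: Optional[str] = None,
    output_file: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for rate4site (hypothetical helper)."""
    # -s is the only required option: the input MSA file.
    cmd = ["rate4site", "-s", msa_file]
    if tree_file is not None:
        # Optional pre-calculated tree in Newick format.
        cmd += ["-t", tree_file]
    if output_file is not None:
        # Optional results output file.
        cmd += ["-o", output_file]
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it; the file names are placeholders.
    print(" ".join(build_rate4site_command("family.aln", output_file="scores.txt")))
```

The returned list can be handed to subprocess.run once rate4site is installed. When no tree file is given, the program builds the tree itself, as described above.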
REFERENCES¶
- Mayrose, I., Graur, D., Ben-Tal, N., and Pupko, T. 2004. Comparison of
site-specific rate-inference methods: Bayesian methods are superior. Mol
Biol Evol 21: 1781-1791.
OPTIONS¶
- -s MSA_FILE
- The input sequence file name. The following formats are supported: Mase,
Molphy, Phylip, Clustal, Fasta
- -t
- The input tree file name (in Newick format)
- -o OUTPUT_FILE
- The results output file
- -a
- Reference sequence name in the MSA. The conservation scores are printed
based on the amino acids in this sequence.
- -k
- The number of discrete Gamma categories
- -m
- Evolutionary model. The following amino acid models are supported:
DAY (-md), JTT (-mj), REV (-mr), aaJC (-ma), LG (-Ml), WAG (-Mw).
For nucleotides, the following models are supported: JC (-mn), HKY (-Mh), Tamura92 (-Mt), GTR (-Mg).
- -b
- Branch-length optimization flag:
-bn = no branch-length optimization
-bh = optimization using a homogeneous model (no among-site rate variation)
-bg = optimization using a Gamma model
- -i
- Rate inference method flag:
-Im = rates are inferred using the maximum likelihood method
-Ib = rates are inferred using the empirical Bayes method
- -z
- Tree construction method:
-zj = Neighbor-joining tree with Jukes-Cantor distances
-zn = Neighbor-joining tree with maximum likelihood distances
- -h
- Short help message
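To illustrate the Jukes-Cantor distances mentioned under the -z option, here is a small Python sketch of the textbook Jukes-Cantor correction for a pair of aligned sequences. It is not taken from the Rate4Site source: the function name, the gap handling and the default of 20 states (amino acids) are assumptions made for this example.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str, n_states: int = 20) -> float:
    """Textbook Jukes-Cantor distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap in either sequence are ignored.
    columns = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    # Observed proportion of differing sites.
    p = sum(x != y for x, y in columns) / len(columns)
    # Largest p the model can explain: (n - 1) / n.
    limit = (n_states - 1) / n_states
    if p >= limit:
        return math.inf  # saturated: the correction is undefined
    # d = -((n - 1) / n) * ln(1 - p * n / (n - 1))
    return -limit * math.log(1.0 - p / limit)
```

With n_states=4 the same function gives the familiar nucleotide form of the correction. The corrected distance is never smaller than the observed proportion of differences, because it accounts for multiple substitutions at the same site.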
AUTHOR¶
Nir Ben-Tal <NirB@tauex.tau.ac.il>
SEE ALSO¶
- Main website
- <http://www.tau.ac.il/~itaymay/cp/rate4site.html>