PFSEARCHV2(1)    General Commands Manual    PFSEARCHV2(1)
NAME
pfsearchV2 - search a protein or DNA sequence library for sequence segments matching a profile
SYNOPSIS
pfsearchV2 [ -abflLrsuxyz ] [ profile-file | - ]
[ seq-library-file | - ] [C=#] [W=#]
DESCRIPTION
pfsearchV2 compares a query profile against a DNA or protein sequence library. The result is an unsorted list of profile-sequence matches written to the standard output. A variety of output formats containing different information can be specified via the options -a, -l, -L, -r, -u, -s, -x, -y, -z, and the command-line parameter C=#. profile-file contains a profile in PROSITE format. seq-library-file contains a sequence library in EMBL/SWISS-PROT format (assumed by default) or in Pearson/Fasta format (indicated by option -f). pfsearchV2 can be used as a filter if - is used instead of one of the input filenames.
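As an illustration of the filter usage described above, the following sketch feeds the sequence library through standard input by passing - in place of seq-library-file. The function name search_from_stdin and the PFSEARCH variable are hypothetical helpers, not part of pftools; PFSEARCH exists only so that the command can be swapped out for a dry run.

```shell
# Sketch only: search_from_stdin and PFSEARCH are illustrative names,
# not part of pftools. PFSEARCH defaults to the real program name.
PFSEARCH="${PFSEARCH:-pfsearchV2}"

search_from_stdin() {
    # $1 = profile file in PROSITE format
    # $2 = cut-off value passed as C=#
    # "-" takes the place of seq-library-file, so the Pearson/Fasta
    # library (option -f) is read from standard input.
    "$PFSEARCH" -f "$1" - "C=$2"
}

# Usage, assuming the files from the EXAMPLES section are present:
#   cat sh3.seq | search_from_stdin sh3.prf 6.0
```

The same pattern should apply to the profile, by passing - in place of profile-file and naming the library file explicitly.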
OPTIONS
- -a
- Report optimal alignment scores for all sequences regardless of the cut-off value. This option simultaneously forces DISJOINT=UNIQUE.
- -b
- Search the complementary strands of DNA sequences as well.
- -f
- Input sequence-library is in Pearson/Fasta format.
- -l
- Indicate by number the highest cut-off level exceeded by the match score in the output list.
- -L
- Indicate by character string the highest cut-off level exceeded by the match score in the output list. Note that the generalized profile format includes a text string field to specify a name for a cut-off level. The -L option causes the program to display the first two characters of this text string (usually something like "!", "?", "??", etc.) at the beginning of each match description.
- -r
- Use raw scores rather than normalized scores for match selection. Normalized scores will not be listed in the output.
- -s
- List the sequences of the matched regions as well. The output will be a Pearson/Fasta-formatted sequence library.
- -u
- Force DISJOINT=UNIQUE.
- -x
- List profile-sequence alignments in pftools PSA format.
- -y
- Display alignments between the profile and the matched sequence regions in a human-friendly format.
- -z
- Indicate the starting and ending positions of the matched profile range. The ending position is given as a negative offset from the end of the profile. Thus the range [1, -1] means the entire profile.
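The negative end offset reported with -z can be converted to an absolute profile position once the profile length is known. The sketch below assumes that -1 denotes the last profile position (consistent with the entire-profile example in the -z description) and that offsets count back from there one position at a time. The function name abs_profile_end is hypothetical, and the profile length must be supplied by the caller.

```shell
# Sketch only: abs_profile_end is an illustrative helper, not part of pftools.
# Assumption: offset -1 is the last profile position, -2 the one before it, etc.
abs_profile_end() {
    # $1 = profile length (number of match positions), supplied by the caller
    # $2 = negative end offset as printed with option -z
    echo $(( $1 + $2 + 1 ))
}

# Usage: for a profile of length 62, an end offset of -3 maps to position 60.
#   abs_profile_end 62 -3
```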
PARAMETERS
- C=#
- Cut-off value. Overrides the level-zero cut-off value specified in the profile. An integer argument is interpreted as a raw score value, a decimal argument as a normalized score value. An integer value forces option -r.
- W=#
- Output width. Output lines will be truncated after W characters. Default: W=132.
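Because the form of the C=# argument decides whether it is read as a raw or a normalized score, a wrapper script may want to check which interpretation a given value will receive. The sketch below assumes that "decimal" means the value contains a decimal point; the function name cutoff_kind is hypothetical, and the program's actual parsing may differ in edge cases.

```shell
# Sketch only: cutoff_kind is an illustrative helper, not part of pftools.
# Assumption: a value with a decimal point is treated as a normalized score,
# anything else as a raw (integer) score, which also forces option -r.
cutoff_kind() {
    case "$1" in
        *.*) echo normalized ;;
        *)   echo raw ;;
    esac
}

# Usage:
#   cutoff_kind 6.0    # prints "normalized"
#   cutoff_kind 500    # prints "raw"
```

Under this reading, writing C=6 instead of C=6.0 would switch the search to raw scores, so the decimal point should be kept whenever a normalized cut-off is intended.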
EXAMPLES
- (1)
- pfsearchV2 -f sh3.prf sh3.seq C=6.0
Searches the Pearson/Fasta-formatted protein sequence library sh3.seq for SH3 domains with a cut-off value of 6.0 normalized score units. sh3.seq contains 20 SH3 domain-containing protein sequences from SWISS-PROT release 32. sh3.prf contains the PROSITE entry SH3/PS50002.
- (2)
- pfsearchV2 -bx ecp.prf CVPBR322 | psa2msa -du |
readseq -p -fMSF > ecp.msf
Generates a multiple sequence alignment of potential E. coli promoters on both strands of plasmid pBR322. ecp.prf contains a profile for E. coli promoters. CVPBR322 contains EMBL entry J01749|CVPBR322. The result file ecp.msf can further be processed by GCG programs accepting MSF files as input.
See also the psa2msa manual page.
AUTHOR
Philipp Bucher
Philipp.Bucher@isrec.unil.ch
February 1998    pftools 2.1