NAME
alf - Alignment-free sequence comparison
SYNOPSIS
alf [OPTIONS] -i IN.FASTA [-o OUT.TXT]
DESCRIPTION
Compute the pairwise similarity of the sequences in IN.FASTA using
  alignment-free methods and write a tab-delimited matrix of pairwise
  scores to OUT.TXT.
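As an illustration, a typical invocation can be assembled from a script. The sketch below only uses flags documented under OPTIONS (-i, -o, -m, -k); the helper names build_alf_command and run_alf and the file names enhancers.fasta and scores.alf are hypothetical placeholders, not part of alf.

```python
import subprocess


def build_alf_command(fasta_path, out_path=None, method="N2", k=4):
    """Assemble an argv list for alf (hypothetical helper, not part of alf)."""
    cmd = ["alf", "-i", fasta_path, "-m", method, "-k", str(k)]
    if out_path is not None:
        # Without -o, alf writes the score matrix to stdout.
        cmd += ["-o", out_path]
    return cmd


def run_alf(cmd):
    """Run the assembled command; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


# Placeholder file names; nothing is executed here, the command is only printed.
cmd = build_alf_command("enhancers.fasta", "scores.alf", method="D2", k=5)
print(" ".join(cmd))
```

Calling run_alf(cmd) requires alf to be installed and on the PATH.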
OPTIONS
-h, --help
    Display the help message.
--version
    Display version information.
-v, --verbose
    Print details about the progress to the screen.
-i, --input-file INPUT_FILE
    Name of the multi-FASTA input file. Valid filetypes are: .sam[.*],
    .raw[.*], .gbk[.*], .frn[.*], .fq[.*], .fna[.*], .ffn[.*], .fastq[.*],
    .fasta[.*], .faa[.*], .fa[.*], .embl[.*], and .bam, where * is any of
    the following extensions: gz, bz2, and bgzf for transparent
    (de)compression.
-o, --output-file OUTPUT_FILE
    Name of the file to which the tab-delimited matrix with pairwise scores
    will be written. Default is to write to stdout. Valid filetype is:
    .alf[.*], where * is any of the following extensions: tsv for
    transparent (de)compression.
General Algorithm Parameters:
-m, --method STRING
    Select method to use. One of N2, D2, D2Star, and D2z. Default: N2.
-k, --k-mer-size INTEGER
    Size of the k-mers. Default: 4.
-mo, --bg-model-order INTEGER
    Order of background Markov Model. Default: 1.
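To make the role of the k-mer size concrete, here is a minimal sketch of the plain D2 statistic in its textbook form: the inner product of the k-mer count vectors of two sequences. The function names are hypothetical, and alf's own implementation may differ in details such as normalisation. The D2Star and D2z variants additionally involve a background model, which is not sketched here.

```python
from collections import Counter


def kmer_counts(seq, k=4):
    """Count all overlapping k-mers in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k=4):
    """Textbook D2: sum over all words w of count_a(w) * count_b(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for words that are absent from seq_b.
    return sum(n * counts_b[w] for w, n in counts_a.items())


print(d2("ACGTACGT", "ACGTACGT", k=4))
print(d2("AAAA", "CCCC", k=4))
```

Sequences that share many k-mers score high; sequences with no k-mer in common score 0.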
N2 Algorithm Parameters:
-rc, --reverse-complement STRING
    Which strand to score. Use both_strands to score both strands
    simultaneously. One of input, both_strands, mean, min, and max.
    Default: input.
-mm, --mismatches INTEGER
    Number of mismatches, one of 0 and 1. When 1 is used, N2 uses the
    k-mer neighbourhood with one mismatch. Default: 0.
-mmw, --mismatch-weight DOUBLE
    Real-valued weight of counts for words with mismatches. Default: 0.1.
-kwf, --k-mer-weights-file OUTPUT_FILE
    Print k-mer weights for every sequence to this file if given. Valid
    filetype is: .txt.
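The mismatch options can be pictured with a small sketch. It is not the N2 computation itself (see the publication referenced below for that); it only illustrates, under the assumption of a DNA alphabet, what a one-mismatch neighbourhood of a k-mer is and how a weight such as 0.1 could down-weight counts contributed by mismatching words. All function names are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA input


def one_mismatch_neighbours(word):
    """All words that differ from word at exactly one position."""
    neighbours = []
    for i, original in enumerate(word):
        for letter in ALPHABET:
            if letter != original:
                neighbours.append(word[:i] + letter + word[i + 1:])
    return neighbours


def neighbourhood_counts(seq, k=4, mismatch_weight=0.1):
    """Exact k-mer counts plus weighted counts for one-mismatch neighbours."""
    exact = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter()
    for word, n in exact.items():
        counts[word] += n  # exact occurrences count fully
        for neighbour in one_mismatch_neighbours(word):
            counts[neighbour] += mismatch_weight * n  # down-weighted
    return counts


counts = neighbourhood_counts("ACGT", k=4, mismatch_weight=0.1)
print(len(one_mismatch_neighbours("ACGT")), counts["ACGT"], counts["CCGT"])
```

A word of length k over a four-letter alphabet has 3k one-mismatch neighbours, so the neighbourhood grows linearly with the k-mer size.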
For questions or comments, contact:
    Jonathan Goeke <goeke@molgen.mpg.de>
Please reference the following publication if you used ALF or the N2
method for your analysis:
    Jonathan Goeke, Marcel H. Schulz, Julia Lasserre, and Martin Vingron.
    Estimation of Pairwise Sequence Similarity of Mammalian Enhancers with
    Word Neighbourhood Counts. Bioinformatics (2012).
Project Homepage:
    http://www.seqan.de/projects/alf