NAME

alignment-thin - Remove sequences or columns from an alignment.

SYNOPSIS

alignment-thin alignment-file [OPTIONS]

Remove sequences or columns from an alignment.

-p arg, --protect arg: Sequences that cannot be removed (comma-separated).
-k arg, --keep arg: Remove sequences not in comma-separated list arg.
-r arg, --remove arg: Remove sequences in comma-separated list arg.
-l arg, --longer-than arg: Remove sequences not longer than arg.
-s arg, --shorter-than arg: Remove sequences not shorter than arg.
-c arg, --cutoff arg: Remove similar sequences with #mismatches < cutoff.
-d arg, --down-to arg: Remove similar sequences down to arg sequences.
--remove-crazy arg: Remove arg outlier sequences, defined as sequences that are missing too many conserved sites.
--conserved arg (=0.75): Fraction of sequences that must contain a letter for it to be considered conserved.

-K arg, --keep-columns arg: Keep columns from this sequence.
-m arg, --min-letters arg: Remove columns with fewer than arg letters.
-u arg, --remove-unique arg: Remove insertions in a single sequence if longer than arg letters.
-e, --erase-empty-columns: Remove columns with no characters (all gaps).
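
The column filters above can be pictured with a small Python sketch. It is an illustration only, not the tool's implementation: the function names, the dict-of-strings alignment layout, and the choice of "-" and "?" as non-letter characters are all assumptions made for this example.

```python
# Illustrative sketch only; not how alignment-thin is implemented.
GAP_CHARS = {"-", "?"}  # assumption: characters that do not count as letters


def letters_per_column(alignment):
    """Count non-gap characters in each column of an equal-length alignment."""
    return [
        sum(ch not in GAP_CHARS for ch in column)
        for column in zip(*alignment.values())
    ]


def thin_columns(alignment, min_letters=1):
    """Drop columns with fewer than min_letters letters.

    With min_letters=1 only all-gap columns are dropped, which mirrors
    the idea behind erasing empty columns.
    """
    counts = letters_per_column(alignment)
    keep = [i for i, n in enumerate(counts) if n >= min_letters]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


example = {"seq1": "AC-GT", "seq2": "A--GT", "seq3": "T--G-"}
print(thin_columns(example, min_letters=1))
print(thin_columns(example, min_letters=3))
```

In this toy alignment the third column is all gaps, so it is the only one dropped at min_letters=1; at min_letters=3 only the two columns where every sequence has a letter survive.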

-S, --sort: Sort partially ordered columns to group similar gaps.
-L, --show-lengths: Just print out sequence lengths.
-N, --show-names: Just print out sequence names.
-F arg, --find-dups arg: For each sequence, find the closest other sequence.

Remove columns without a minimum number of letters:

% alignment-thin --min-letters=5 file.fasta > file-thinned.fasta

Remove sequences by name:

% alignment-thin --remove=seq1,seq2 file.fasta > file2.fasta

% alignment-thin --keep=seq1,seq2   file.fasta > file2.fasta

Remove short sequences:

% alignment-thin --longer-than=250 file.fasta > file-long.fasta
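
As a rough picture of the length filter, the Python sketch below keeps only sequences that are strictly longer than a threshold. It assumes that sequence length means the number of non-gap characters; that definition, the gap characters, and the function names are assumptions of this example, not documented behaviour.

```python
# Illustrative sketch only; the length definition is an assumption.
def ungapped_length(seq, gap_chars="-?"):
    """Number of characters in seq that are not gap characters."""
    return sum(ch not in gap_chars for ch in seq)


def keep_longer_than(alignment, threshold):
    """Remove sequences that are not longer than threshold."""
    return {
        name: seq
        for name, seq in alignment.items()
        if ungapped_length(seq) > threshold
    }


example = {"a": "ACGT--", "b": "AC----", "c": "ACGTAC"}
print(keep_longer_than(example, 4))
```

Sequence "a" has exactly 4 letters, so it is "not longer than" 4 and is removed along with "b".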

Remove similar sequences with <= 5 differences from the closest other sequence:

% alignment-thin --cutoff=5 file.fasta > more-than-5-differences.fasta
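
The quantity that the cutoff is compared against is the distance from each sequence to its closest other sequence. The Python sketch below shows one way to compute it. Ignoring columns where either sequence has a gap is an assumption of this example, as are the function names; the tool may count mismatches differently.

```python
# Illustrative sketch only; gap handling is an assumption.
def mismatches(a, b, gap_chars="-?"):
    """Count columns where both sequences have a letter and the letters differ."""
    return sum(
        x != y
        for x, y in zip(a, b)
        if x not in gap_chars and y not in gap_chars
    )


def closest_other(alignment):
    """Map each name to (distance, name) of its closest other sequence.

    Ties are broken alphabetically by name.
    """
    result = {}
    for name, seq in alignment.items():
        others = [
            (mismatches(seq, other_seq), other)
            for other, other_seq in alignment.items()
            if other != name
        ]
        result[name] = min(others)
    return result


example = {"seq1": "ACGTAC", "seq2": "ACGTAA", "seq3": "TTGT-C"}
for name, (dist, other) in closest_other(example).items():
    print(name, "is closest to", other, "with", dist, "mismatches")
```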

Remove similar sequences until we have the right number of sequences:

% alignment-thin --down-to=30 file.fasta > file-30taxa.fasta

Remove dissimilar sequences that are missing conserved columns:

% alignment-thin --remove-crazy=10 file.fasta > file2.fasta
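
One possible reading of "missing too many conserved sites" is sketched below in Python. It treats a column as conserved when at least the given fraction of sequences have a letter there, scores each sequence by how many conserved columns it lacks, and drops the worst n. The scoring rule, tie-breaking, and all names are assumptions made for illustration; the actual outlier criterion may differ.

```python
# Illustrative sketch only; the outlier scoring rule is an assumption.
def conserved_columns(alignment, fraction=0.75, gap_chars="-?"):
    """Indices of columns where at least `fraction` of sequences have a letter."""
    rows = list(alignment.values())
    return [
        i
        for i, column in enumerate(zip(*rows))
        if sum(ch not in gap_chars for ch in column) >= fraction * len(rows)
    ]


def missing_conserved(alignment, fraction=0.75, gap_chars="-?"):
    """For each sequence, count conserved columns where it has a gap."""
    cols = conserved_columns(alignment, fraction, gap_chars)
    return {
        name: sum(seq[i] in gap_chars for i in cols)
        for name, seq in alignment.items()
    }


def remove_outliers(alignment, n, fraction=0.75):
    """Drop the n sequences missing the most conserved columns."""
    scores = missing_conserved(alignment, fraction)
    worst = sorted(scores, key=lambda name: (-scores[name], name))[:n]
    return {name: seq for name, seq in alignment.items() if name not in worst}


example = {"s1": "ACGTA", "s2": "ACGTA", "s3": "ACG-A", "s4": "A---T"}
print(missing_conserved(example))
print(remove_outliers(example, 1))
```

Here the fourth column has letters in only half of the sequences, so it is not conserved and the gap in "s3" does not count against it.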

Protect some sequences from being removed:

% alignment-thin --down-to=30 file.fasta --protect=seq1,seq2 > file2.fasta

% alignment-thin --down-to=30 file.fasta --protect=@filename > file2.fasta
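
To show how thinning to a target size and protection could interact, here is a greedy Python sketch: it repeatedly finds the most similar pair of remaining sequences and removes one member of the pair, never a protected one. The greedy strategy, the gap handling, and all names are assumptions for this example rather than a description of the tool's algorithm.

```python
# Illustrative sketch only; the greedy strategy is an assumption.
from itertools import combinations


def mismatches(a, b, gap_chars="-?"):
    """Count columns where both sequences have a letter and the letters differ."""
    return sum(
        x != y
        for x, y in zip(a, b)
        if x not in gap_chars and y not in gap_chars
    )


def thin_down_to(alignment, target, protect=()):
    """Remove near-duplicates until `target` sequences remain.

    Protected names are never removed, so the result can be larger
    than `target` if too many sequences are protected.
    """
    kept = dict(alignment)
    protected = set(protect)
    while len(kept) > target:
        candidates = [
            (mismatches(kept[a], kept[b]), a, b)
            for a, b in combinations(sorted(kept), 2)
            if not (a in protected and b in protected)
        ]
        if not candidates:
            break  # only protected sequences are left
        _, a, b = min(candidates)
        victim = b if b not in protected else a
        del kept[victim]
    return kept


example = {"s1": "ACGT", "s2": "ACGA", "s3": "TTTT", "s4": "ACGT"}
print(thin_down_to(example, 2))
print(thin_down_to(example, 2, protect=["s4"]))
```

Without protection the identical pair loses "s4"; with "s4" protected, "s1" is removed instead and "s4" survives to the end.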

Benjamin Redelings.

Feb 2018

Source file:	alignment-thin.1.en.gz (from bali-phy 4.0~beta16+dfsg-1)
Source last updated:	2024-11-25T19:44:19Z