
alignment-thin(1) alignment-thin(1)

NAME

alignment-thin - Remove sequences or columns from an alignment.

SYNOPSIS

alignment-thin alignment-file [OPTIONS]

DESCRIPTION

Remove sequences or columns from an alignment.

GENERAL OPTIONS:

Print usage information.
Output more log messages on stderr.

SEQUENCE FILTERING OPTIONS:

--protect=arg: Sequences that cannot be removed (comma-separated).
--keep=arg: Remove sequences not in comma-separated list arg.
--remove=arg: Remove sequences in comma-separated list arg.
--longer-than=arg: Remove sequences not longer than arg.
Remove sequences not shorter than arg.
--cutoff=arg: Remove similar sequences with #mismatches < cutoff.
--down-to=arg: Remove similar sequences down to arg sequences.
--remove-crazy=arg: Remove arg outlier sequences, defined as sequences that are missing too many conserved sites.
Fraction of sequences that must contain a letter at a site for that site to be considered conserved.

COLUMN FILTERING OPTIONS:

Keep columns from this sequence.
--min-letters=arg: Remove columns with fewer than arg letters.
Remove insertions in a single sequence if longer than arg letters.
Remove columns with no characters (all gaps).

OUTPUT OPTIONS:

Sort partially ordered columns to group similar gaps.
Just print out sequence lengths.
For each sequence, find the closest other sequence.

EXAMPLES:

Remove columns with fewer than 5 letters:

% alignment-thin --min-letters=5 file.fasta > file-thinned.fasta
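
Before choosing a value for --min-letters, it can help to see how many letters each column holds. The sketch below is not part of alignment-thin. It is a hypothetical helper (column_letter_counts) that assumes FASTA input, equal-length aligned sequences, and '-' as the only gap character. Whether alignment-thin counts letters in exactly this way is an assumption.

```shell
# Hypothetical helper: print "column<TAB>letters" for each alignment column.
# Assumes FASTA input and that "-" is the only gap character.
column_letter_counts() {
    awk '
        /^>/ { n++; next }                    # header line starts a new sequence
        n > 0 {
            gsub(/[ \t\r]/, "")               # drop stray whitespace
            seq[n] = seq[n] $0                # join wrapped sequence lines
        }
        END {
            width = length(seq[1])
            for (col = 1; col <= width; col++) {
                count = 0
                for (i = 1; i <= n; i++) {
                    c = substr(seq[i], col, 1)
                    if (c != "" && c != "-") count++
                }
                print col "\t" count
            }
        }
    ' "$1"
}
```

For example, `column_letter_counts file.fasta | awk '$2 < 5'` lists the columns holding fewer than 5 letters, which by the option description are the ones --min-letters=5 would be expected to remove.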
    

Remove or keep sequences by name:

% alignment-thin --remove=seq1,seq2 file.fasta > file2.fasta
    
% alignment-thin --keep=seq1,seq2   file.fasta > file2.fasta
    

Remove short sequences:

% alignment-thin --longer-than=250 file.fasta > file-long.fasta
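
To pick a threshold for --longer-than, it can help to inspect the sequence lengths first. The sketch below is not part of alignment-thin. It is a hypothetical helper (fasta_lengths) that assumes FASTA input, '-' as the gap character, and that length means the number of non-gap characters, which may differ from how alignment-thin measures length.

```shell
# Hypothetical helper: print "name<TAB>length" for every sequence in a
# FASTA file, counting only non-gap characters ("-" is assumed to be the gap).
fasta_lengths() {
    awk '
        /^>/ {
            if (seen) print name "\t" len     # flush the previous sequence
            name = substr($1, 2)              # first word after the ">" marker
            len = 0
            seen = 1
            next
        }
        {
            gsub(/[- \t\r]/, "")              # drop gaps and stray whitespace
            len += length($0)
        }
        END { if (seen) print name "\t" len }
    ' "$1"
}
```

For example, `fasta_lengths file.fasta | sort -k2,2n | head` shows the shortest sequences first.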
    

Remove similar sequences with <= 5 differences from the closest other sequence:

% alignment-thin --cutoff=5 file.fasta > more-than-5-differences.fasta
    

Remove similar sequences until only 30 sequences remain:

% alignment-thin --down-to=30 file.fasta > file-30taxa.fasta
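
After thinning, the number of sequences left in the output can be checked by counting FASTA header lines. This is a minimal sketch, not part of alignment-thin. The helper name count_fasta_seqs is hypothetical, and it assumes the output is FASTA with one '>' header line per sequence.

```shell
# Hypothetical helper: count sequences in a FASTA file by counting
# the header lines, which start with ">".
count_fasta_seqs() {
    grep -c '^>' "$1"
}
```

For example, `count_fasta_seqs file-30taxa.fasta` prints the number of sequences in the thinned file, which can be compared with the value given to --down-to.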
    

Remove dissimilar sequences that are missing conserved columns:

% alignment-thin --remove-crazy=10 file.fasta > file2.fasta
    

Protect some sequences from being removed:

% alignment-thin --down-to=30 file.fasta --protect=seq1,seq2 > file2.fasta
    
% alignment-thin --down-to=30 file.fasta --protect=@filename > file2.fasta
    

REPORTING BUGS:

BAli-Phy online help: <http://www.bali-phy.org/docs.php>.

Please send bug reports to <bali-phy-users@googlegroups.com>.

AUTHORS

Benjamin Redelings.

Feb 2018