
UNIKMER(1) User Commands UNIKMER(1)

NAME

unikmer - Toolkit for nucleic acid k-mer analysis

DESCRIPTION

unikmer - Toolkit for k-mers with taxonomic information

unikmer is a toolkit for nucleic acid k-mer analysis, providing functions including set operations on k-mers, optionally with TaxIds but without count information.

K-mers are either encoded (k<=32) or hashed (arbitrary k) into 'uint64', and serialized in binary files with the extension '.unik'.
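
A k-mer with k<=32 fits in a 'uint64' because each of the four bases needs only two bits. The following is a minimal sketch of such a packing. The bit assignment (A=0, C=1, G=2, T=3) and the function names are assumptions made for illustration and are not taken from the unikmer source.

```python
# Assumed 2-bit code per base; unikmer's real bit layout may differ.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer (k <= 32) into an integer that fits in 64 bits."""
    if not 0 < len(kmer) <= 32:
        raise ValueError("k must be in the range [1, 32]")
    code = 0
    for base in kmer.upper():
        # Shift the bases seen so far and append the new 2-bit code.
        code = (code << 2) | BASE_TO_BITS[base]  # KeyError on non-ACGT bases
    return code


def decode_kmer(code: int, k: int) -> str:
    """Unpack an integer produced by encode_kmer back into k bases."""
    bases = []
    for _ in range(k):
        bases.append(BITS_TO_BASE[code & 3])  # lowest two bits = last base
        code >>= 2
    return "".join(reversed(bases))
```

In this sketch k has to be supplied when decoding, because leading 'A' bases are stored as zero bits and cannot be recovered from the integer alone.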

TaxIds can be assigned when counting k-mers from genome sequences, and the LCA (Lowest Common Ancestor) is computed during set operations, including computing union, intersection, set difference, unique and repeated k-mers.
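
As an illustration of the LCA step, the sketch below finds the lowest common ancestor of two TaxIds in a child-to-parent map. The toy taxonomy and the function names are hypothetical, and the root is assumed to be its own parent. This is not how unikmer itself implements the computation.

```python
def lineage(taxid: int, parent: dict[int, int]) -> list[int]:
    """Return the path from taxid up to the root (the node that is its own parent)."""
    path = [taxid]
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lca(a: int, b: int, parent: dict[int, int]) -> int:
    """Lowest common ancestor of two TaxIds in a child -> parent map."""
    ancestors_of_a = set(lineage(a, parent))
    # Walk upwards from b; the first node also above a is the LCA.
    for node in lineage(b, parent):
        if node in ancestors_of_a:
            return node
    raise ValueError("TaxIds do not share a root")


# Hypothetical toy taxonomy: 1 is the root, 2 and 3 are its children,
# 4 and 5 are children of 2.
TOY_PARENT = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}
```

With this toy map, a k-mer seen with TaxId 4 in one input and TaxId 5 in another would be assigned lca(4, 5, TOY_PARENT), that is, their shared parent 2.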

Version: v0.19.0

Author: Wei Shen <shenwei356@gmail.com>

Documents  : https://bioinf.shenwei.me/unikmer
Source code: https://github.com/shenwei356/unikmer

Dataset (optional):

Manipulating k-mers with TaxIds needs taxonomy files from, e.g., the NCBI Taxonomy database. Please extract "nodes.dmp", "names.dmp", "delnodes.dmp" and "merged.dmp" from the link below into ~/.unikmer/ or some other directory, which you can later refer to using the flag --data-dir or the environment variable UNIKMER_DB:
ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
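
The extraction step can be sketched as follows, assuming taxdump.tar.gz has already been downloaded to a local path. The function name and its arguments are hypothetical; only the four file names come from the text above.

```python
import shutil
import tarfile
from pathlib import Path

# The four files named in the text above.
WANTED = ("nodes.dmp", "names.dmp", "delnodes.dmp", "merged.dmp")


def extract_taxdump(tarball: str, dest_dir: str) -> list[str]:
    """Copy the wanted .dmp files from a local taxdump.tar.gz into dest_dir."""
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(tarball, "r:gz") as tar:
        for member in tar.getmembers():
            name = Path(member.name).name  # drop any directory prefix
            if member.isfile() and name in WANTED:
                # Stream the member to disk instead of loading it into memory.
                with tar.extractfile(member) as source, open(dest / name, "wb") as target:
                    shutil.copyfileobj(source, target)
                written.append(name)
    return sorted(written)
```

For example, extract_taxdump("taxdump.tar.gz", "~/.unikmer") would place the four files in the default directory mentioned above.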
For GTDB, use 'taxonkit create-taxdump' to create NCBI-style taxonomy dump files, or download from:
https://github.com/shenwei356/gtdb-taxonomy
Note that TaxIds are represented using uint32 and stored in 4 or fewer bytes; all TaxIds should be in the range [1, 4294967295].
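
To see why a smaller maximum TaxId can save space, the sketch below computes the smallest whole number of bytes that can hold every TaxId up to a given maximum. The function name is hypothetical, and the exact rule unikmer applies when choosing a width is an assumption here.

```python
def taxid_byte_width(max_taxid: int) -> int:
    """Smallest number of bytes (1 to 4) able to hold every TaxId up to max_taxid."""
    if not 1 <= max_taxid <= 0xFFFFFFFF:
        raise ValueError("max_taxid must be in the range [1, 4294967295]")
    # Round the number of significant bits up to a whole number of bytes.
    return (max_taxid.bit_length() + 7) // 8
```

Under this assumption, a maximum TaxId of 65535 needs two bytes per TaxId, while the full uint32 range needs four.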

Usage:

unikmer [command]

Available Commands:

autocompletion  Generate shell autocompletion script (bash|zsh|fish|powershell)
common          Find k-mers shared by most of multiple binary files
concat          Concatenate multiple binary files without removing duplicates
count           Generate k-mers (sketch) from FASTA/Q sequences
decode          Decode encoded integer to k-mer text
diff            Set difference of multiple binary files
dump            Convert plain k-mer text to binary format
encode          Encode plain k-mer text to integer
filter          Filter out low-complexity k-mers (experimental)
grep            Search k-mers from binary files
head            Extract the first N k-mers
info            Information of binary files
inter           Intersection of multiple binary files
locate          Locate k-mers in genome
merge           Merge k-mers from sorted chunk files
num             Quickly inspect number of k-mers in binary files
rfilter         Filter k-mers by taxonomic rank
sample          Sample k-mers from binary files
sort            Sort k-mers in binary files to reduce file size
split           Split k-mers into sorted chunk files
tsplit          Split k-mers according to taxid
union           Union of multiple binary files
uniqs           Mapping k-mers back to genome and find unique subsequences
version         Print version information and check for update
view            Read and output binary format to plain text

Flags:

write compact binary file with little loss of speed
compression level (default -1)
directory containing NCBI Taxonomy files, including nodes.dmp, names.dmp, merged.dmp and delnodes.dmp (default "/home/nilesh/.unikmer")
help for unikmer
ignore taxonomy information
file of input files list (one file per line), if given, they are appended to files from cli arguments
for smaller TaxIds, we can use less space to store TaxIds. default value is 1<<32-1, that's enough for NCBI Taxonomy TaxIds (default 4294967295)
do not compress binary file (not recommended)
do not check binary file, when using process substitution or named pipe
number of CPUs to use (default 4)
print verbose information

Use "unikmer [command] --help" for more information about a command.

August 2022 unikmer 0.19.0