NAME

unikmer - Toolkit for nucleic acid k-mer analysis

DESCRIPTION

unikmer - Toolkit for k-mers with taxonomic information

unikmer is a toolkit for nucleic acid k-mer analysis, providing functions including set operations on k-mers, optionally with TaxIds but without count information.

K-mers are either encoded (k <= 32) or hashed (arbitrary k) into 'uint64', and serialized in binary files with the extension '.unik'.
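A k-mer with k <= 32 fits in a uint64 because each base needs only 2 bits (32 * 2 = 64). The Go sketch below packs an uppercase k-mer two bits per base and unpacks it again. The A=0, C=1, G=2, T=3 mapping is an assumption for illustration; the sketch does not handle canonical (reverse-complement) k-mers or the hashing path used for larger k, and it is not taken from unikmer's source.

```go
package main

import (
	"errors"
	"fmt"
)

// Assumed 2-bit code (illustrative only): A=0, C=1, G=2, T=3.
var code = map[byte]uint64{'A': 0, 'C': 1, 'G': 2, 'T': 3}

const bases = "ACGT"

// encode packs a k-mer (1 <= k <= 32) into a uint64, two bits per base,
// with the first base in the highest occupied bits.
func encode(kmer string) (uint64, error) {
	if len(kmer) == 0 || len(kmer) > 32 {
		return 0, errors.New("k must be in [1, 32]")
	}
	var v uint64
	for i := 0; i < len(kmer); i++ {
		c, ok := code[kmer[i]]
		if !ok {
			return 0, fmt.Errorf("invalid base %q", kmer[i])
		}
		v = v<<2 | c
	}
	return v, nil
}

// decode reverses encode. k must be supplied because leading A's are zero bits.
func decode(v uint64, k int) string {
	buf := make([]byte, k)
	for i := k - 1; i >= 0; i-- {
		buf[i] = bases[v&3]
		v >>= 2
	}
	return string(buf)
}

func main() {
	v, _ := encode("ACGT")
	fmt.Println(v, decode(v, 4)) // 27 ACGT
}
```

Because k is not stored in the integer itself, a binary file of encoded k-mers has to record k separately so that values can be decoded back to text.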

TaxIds can be assigned when counting k-mers from genome sequences, and the LCA (Lowest Common Ancestor) is computed during set operations, including computing the union, intersection, set difference, and unique and repeated k-mers.
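To make "the LCA is computed during set operations" concrete, here is a minimal Go sketch of an intersection in which each shared k-mer is labeled with the LCA of its two TaxIds. The taxonomy is a hypothetical toy child-to-parent map in which the root points to itself (as taxid 1 does in NCBI nodes.dmp); the function names are made up for illustration and this is not unikmer's implementation.

```go
package main

import "fmt"

// lca returns the lowest common ancestor of a and b, given a child->parent map
// whose root maps to itself. It returns 0 if no common ancestor is found.
func lca(parent map[uint32]uint32, a, b uint32) uint32 {
	// Record every ancestor of a, including a itself.
	seen := make(map[uint32]bool)
	for t := a; ; t = parent[t] {
		seen[t] = true
		if p, ok := parent[t]; !ok || p == t {
			break // reached the root or an unknown TaxId
		}
	}
	// Walk up from b; the first ancestor already seen is the LCA.
	for t := b; ; t = parent[t] {
		if seen[t] {
			return t
		}
		if p, ok := parent[t]; !ok || p == t {
			break
		}
	}
	return 0
}

// intersect keeps k-mers present in both sets and assigns each the LCA
// of its two TaxIds.
func intersect(parent map[uint32]uint32, x, y map[uint64]uint32) map[uint64]uint32 {
	out := make(map[uint64]uint32)
	for kmer, ta := range x {
		if tb, ok := y[kmer]; ok {
			out[kmer] = lca(parent, ta, tb)
		}
	}
	return out
}

func main() {
	// Hypothetical toy taxonomy: 1 is the root; 2 and 5 are children of 1; 3 and 4 are children of 2.
	parent := map[uint32]uint32{1: 1, 2: 1, 3: 2, 4: 2, 5: 1}
	x := map[uint64]uint32{10: 3, 11: 3, 12: 5}
	y := map[uint64]uint32{10: 4, 12: 3, 13: 4}
	fmt.Println(intersect(parent, x, y)) // map[10:2 12:1]
}
```

A k-mer found in two sibling taxa (3 and 4) moves up to their parent (2), which is why k-mers shared across distant taxa end up assigned to high-level nodes.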

Version: v0.19.0

Author: Wei Shen <shenwei356@gmail.com>

Documents: https://bioinf.shenwei.me/unikmer
Source code: https://github.com/shenwei356/unikmer

Dataset (optional):

: Manipulating k-mers with TaxIds requires taxonomy files, e.g., from the NCBI Taxonomy database. Please extract "nodes.dmp", "names.dmp", "delnodes.dmp" and "merged.dmp" from ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz into ~/.unikmer/ or another directory, which you can later specify with the flag --data-dir or the environment variable UNIKMER_DB.
: For GTDB, use 'taxonkit create-taxdump' to create NCBI-style taxonomy dump files, or download them from:
: https://github.com/shenwei356/gtdb-taxonomy
: Note that TaxIds are represented as uint32 and stored in 4 or fewer bytes, so all TaxIds must be in the range [1, 4294967295].

Usage:

: unikmer [command]

Available Commands:

autocompletion: Generate shell autocompletion script (bash|zsh|fish|powershell)
common: Find k-mers shared by most of multiple binary files
concat: Concatenate multiple binary files without removing duplicates
count: Generate k-mers (sketch) from FASTA/Q sequences
decode: Decode encoded integer to k-mer text
diff: Set difference of multiple binary files
dump: Convert plain k-mer text to binary format
encode: Encode plain k-mer text to integer
filter: Filter out low-complexity k-mers (experimental)
grep: Search for k-mers in binary files
head: Extract the first N k-mers
info: Information about binary files
inter: Intersection of multiple binary files
locate: Locate k-mers in a genome
merge: Merge k-mers from sorted chunk files
num: Quickly inspect the number of k-mers in binary files
rfilter: Filter k-mers by taxonomic rank
sample: Sample k-mers from binary files
sort: Sort k-mers in binary files to reduce file size
split: Split k-mers into sorted chunk files
tsplit: Split k-mers according to taxid
union: Union of multiple binary files
uniqs: Map k-mers back to the genome and find unique subsequences
version: Print version information and check for updates
view: Read and output binary format to plain text
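One generic reason why sorting integer-encoded k-mers can shrink a file is that neighbouring values in a sorted list differ by small gaps, and small gaps fit in fewer bytes than full 8-byte integers. The Go sketch below illustrates this with delta encoding plus unsigned varints from the standard library. It is only an illustration of the idea; the on-disk layout of '.unik' files is not assumed here, and deltaVarint and undelta are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
)

// deltaVarint sorts a copy of vals and writes each gap from the previous
// value as an unsigned varint. Duplicates become zero gaps.
func deltaVarint(vals []uint64) []byte {
	s := append([]uint64(nil), vals...)
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	out := make([]byte, 0, len(s))
	buf := make([]byte, binary.MaxVarintLen64)
	var prev uint64
	for _, v := range s {
		n := binary.PutUvarint(buf, v-prev)
		out = append(out, buf[:n]...)
		prev = v
	}
	return out
}

// undelta reverses deltaVarint, returning the values in sorted order.
func undelta(b []byte) []uint64 {
	var out []uint64
	var prev uint64
	for len(b) > 0 {
		d, n := binary.Uvarint(b)
		if n <= 0 {
			break // truncated or invalid input
		}
		prev += d
		out = append(out, prev)
		b = b[n:]
	}
	return out
}

func main() {
	vals := []uint64{1000027, 1000003, 1000100, 1000050}
	enc := deltaVarint(vals)
	// Fixed-width size (8 bytes each) vs delta-varint size.
	fmt.Println(len(vals)*8, len(enc)) // 32 6
}
```

Only the first value pays for its full magnitude; every later value costs one byte while its gap stays below 128.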

Flags:

-c, --compact: write compact binary file with little loss of speed
--compression-level int: compression level (default -1)
--data-dir string: directory containing NCBI Taxonomy files, including nodes.dmp, names.dmp, merged.dmp and delnodes.dmp (default "$HOME/.unikmer")
-h, --help: help for unikmer
-I, --ignore-taxid: ignore taxonomy information
-i, --infile-list string: file listing input files (one file per line); if given, they are appended to the files from CLI arguments
--max-taxid uint32: for smaller TaxIds, less space can be used to store them. The default value is 1<<32-1, which is enough for NCBI Taxonomy TaxIds (default 4294967295)
-C, --no-compress: do not compress binary file (not recommended)
--nocheck-file: do not check binary files (use when reading from process substitution or named pipes)
-j, --threads int: number of CPUs to use (default 4)
--verbose: print verbose information
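The --max-taxid flag rests on a simple observation: if every TaxId is at most some maximum, fewer whole bytes are needed per TaxId. The Go sketch below computes that minimal byte width. It illustrates the idea only; taxidBytes is a hypothetical helper, and the exact way unikmer lays out TaxIds on disk is not assumed.

```go
package main

import (
	"fmt"
	"math/bits"
)

// taxidBytes returns the minimal number of whole bytes that can hold any
// TaxId in [1, maxTaxid].
func taxidBytes(maxTaxid uint32) int {
	n := (bits.Len32(maxTaxid) + 7) / 8 // round significant bits up to bytes
	if n == 0 {
		return 1 // maxTaxid == 0: still reserve one byte
	}
	return n
}

func main() {
	for _, m := range []uint32{255, 65535, 16777215, 4294967295} {
		fmt.Println(m, taxidBytes(m))
	}
	// 255 1
	// 65535 2
	// 16777215 3
	// 4294967295 4
}
```

With the default of 4294967295 (1<<32-1) the width is the full 4 bytes; a smaller maximum such as 65535 would allow 2 bytes per TaxId.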

Use "unikmer [command] --help" for more information about a command.

August 2022

unikmer 0.19.0

Source file:	unikmer.1.en.gz (from unikmer 0.20.0-1+b9)
Source last updated:	2025-06-22T20:03:39Z
Converted to HTML:	2025-12-30T11:16:54Z