
LLAMA-TOKENIZE(1) User Commands LLAMA-TOKENIZE(1)

NAME

llama-tokenize - tokenize a prompt with a given model and print the resulting tokens

DESCRIPTION

usage: llama-tokenize [options]

The tokenize program tokenizes a prompt using a given model and prints the resulting tokens to standard output.

It requires a model file and a prompt; additional flags optionally control the behavior of the tokenizer.

The possible options are:

-h, --help
       print this help and exit.

-m MODEL_PATH, --model MODEL_PATH
       path to the model file.

--ids
       if given, only print numerical token IDs, and not token strings. The output format looks like [1, 2, 3], i.e. parseable by Python.

-f PROMPT_FNAME, --file PROMPT_FNAME
       read the prompt from a file.

-p PROMPT, --prompt PROMPT
       read the prompt from the argument.

--stdin
       read the prompt from standard input.

--no-bos
       do not ever add a BOS token to the prompt, even if the model normally uses one.

--no-escape
       do not escape the input (such as \n, \t, etc.).

--no-parse-special
       do not parse control tokens.

--log-disable
       disable logs. Makes stderr quiet when loading the model.

--show-count
       print the total number of tokens.
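Because the --ids output is a bracketed list such as [1, 2, 3], it can be consumed directly from a script. A minimal sketch: the parsing uses plain JSON, and the invocation assumes llama-tokenize is on PATH with a hypothetical model file named model.gguf (not shipped with this manual).

```python
import json
import shutil
import subprocess

def parse_ids(output: str) -> list[int]:
    """Parse llama-tokenize --ids output, e.g. "[1, 2, 3]"."""
    # The bracketed integer list is valid JSON as well as Python.
    return json.loads(output)

# Run the tool only if it is installed; "model.gguf" is a
# hypothetical model path used for illustration.
if shutil.which("llama-tokenize"):
    out = subprocess.run(
        ["llama-tokenize", "--model", "model.gguf",
         "--prompt", "Hello world", "--ids", "--log-disable"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(parse_ids(out))
```

Combining --ids with --log-disable keeps stderr quiet, so stdout contains only the token list.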
August 2025 debian