NAME

llama-bench - performance benchmarking tool for llama.cpp

DESCRIPTION

usage: llama-bench [options]

options:

-h, --help

--numa <distribute|isolate|numactl>: numa mode (default: disabled)
-r, --repetitions <n>: number of times to repeat each test (default: 5)
--prio <-1|0|1|2|3>: process/thread priority (default: 0)
--delay <0...N> (seconds): delay between each test (default: 0)
-o, --output <csv|json|jsonl|md|sql>: output format printed to stdout (default: md)
-oe, --output-err <csv|json|jsonl|md|sql>: output format printed to stderr (default: none)
-v, --verbose: verbose output
--progress: print test progress indicators
--no-warmup: skip warmup runs before benchmarking
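
The general options above can be combined into a scripted invocation. The following is a minimal Python sketch, not part of llama-bench itself: the helper name build_command and its keyword arguments are hypothetical, and it only assembles an argument list from flags documented on this page (-m, -r, -o, --no-warmup) without running the benchmark.

```python
import shlex

# Output formats accepted by -o/--output, as listed in the options above.
OUTPUT_FORMATS = {"csv", "json", "jsonl", "md", "sql"}


def build_command(model, repetitions=5, output="md", warmup=True,
                  binary="llama-bench"):
    """Assemble an argv list for llama-bench (hypothetical helper).

    The defaults mirror the documented ones: 5 repetitions, md output,
    warmup enabled.
    """
    if output not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output}")
    argv = [binary, "-m", model, "-r", str(repetitions), "-o", output]
    if not warmup:
        argv.append("--no-warmup")
    return argv


cmd = build_command("models/7B/ggml-model-q4_0.gguf",
                    repetitions=3, output="json", warmup=False)
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once a model file is available; choosing json or csv for -o is usually more convenient than md when the results are consumed by another program.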

test parameters:

-m, --model <filename>: (default: models/7B/ggml-model-q4_0.gguf)
-p, --n-prompt <n>: (default: 512)
-n, --n-gen <n>: (default: 128)
-pg <pp,tg>: (default: )
-d, --n-depth <n>: (default: 0)
-b, --batch-size <n>: (default: 2048)
-ub, --ubatch-size <n>: (default: 512)
-ctk, --cache-type-k <t>: (default: f16)
-ctv, --cache-type-v <t>: (default: f16)
-dt, --defrag-thold <f>: (default: -1)
-t, --threads <n>: (default: 6)
-C, --cpu-mask <hex,hex>: (default: 0x0)
--cpu-strict <0|1>: (default: 0)
--poll <0...100>: (default: 50)
-ngl, --n-gpu-layers <n>: (default: 99)
-sm, --split-mode <none|layer|row>: (default: layer)
-mg, --main-gpu <i>: (default: 0)
-nkvo, --no-kv-offload <0|1>: (default: 0)
-fa, --flash-attn <0|1>: (default: 0)
-mmp, --mmap <0|1>: (default: 1)
-embd, --embeddings <0|1>: (default: 0)
-ts, --tensor-split <ts0/ts1/..>: (default: 0)
-ot, --override-tensors <tensor name pattern>=<buffer type>;...: (default: disabled)
-nopo, --no-op-offload <0|1>: (default: 0)

Multiple values can be given for each parameter by separating them with ',' or by specifying the parameter multiple times. Ranges can be given as 'first-last' or 'first-last+step' or 'first-last*mult'.
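
To illustrate how the list and range syntax described above could expand into concrete values, here is a small Python sketch. It is an interpretation, not the tool's own parser: it assumes that both endpoints are inclusive, that a bare 'first-last' advances by 1, and that '+step' adds while '*mult' multiplies. The function name expand is hypothetical.

```python
import re

# first[-last[(+|*)step]] for a single comma-separated item
_RANGE = re.compile(r"^(\d+)(?:-(\d+)(?:([+*])(\d+))?)?$")


def expand(spec):
    """Expand a spec such as '512,1024' or '128-512+128' into a list of ints.

    Assumptions (not stated on this page): endpoints are inclusive and the
    default step is +1.
    """
    values = []
    for part in spec.split(","):
        match = _RANGE.match(part.strip())
        if match is None:
            raise ValueError(f"invalid range: {part!r}")
        first = int(match.group(1))
        last = int(match.group(2)) if match.group(2) else first
        op = match.group(3) or "+"
        step = int(match.group(4)) if match.group(4) else 1
        value = first
        while value <= last:
            values.append(value)
            nxt = value + step if op == "+" else value * step
            if nxt <= value:
                # e.g. '+0', '*1', or multiplying from 0 would never finish
                raise ValueError(f"range does not advance: {part!r}")
            value = nxt
    return values


print(expand("128-512+128"))  # additive step
print(expand("1-16*2"))       # multiplicative step
print(expand("512,1024"))     # plain comma-separated list
```

Under these assumptions, an argument such as -p 128-512+128 would request four prompt sizes, and each combination of expanded parameter values would be benchmarked as a separate test.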

August 2025

debian

Source file:	llama-bench.1.en.gz (from llama.cpp-tools 5882+dfsg-3)
Source last updated:	2025-08-27T05:01:15Z
Converted to HTML:	2025-10-06T08:49:27Z