table of contents
| LLAMA-BENCH(1) | User Commands | LLAMA-BENCH(1) | 
NAME¶
llama-bench - llama-bench
DESCRIPTION¶
load_backend: loaded CPU backend from /usr/lib/x86_64-linux-gnu/ggml/backends0/libggml-cpu-icelake.so usage: obj-x86_64-linux-gnu/bin/llama-bench [options]
options:¶
-h, --help
- --numa <distribute|isolate|numactl>
- numa mode (default: disabled)
- -r, --repetitions <n>
- number of times to repeat each test (default: 5)
- --prio <-1|0|1|2|3>
- process/thread priority (default: 0)
- --delay <0...N> (seconds)
- delay between each test (default: 0)
- -o, --output <csv|json|jsonl|md|sql>
- output format printed to stdout (default: md)
-oe, --output-err <csv|json|jsonl|md|sql> output format printed to stderr (default: none)
- -v, --verbose
- verbose output
- --progress
- print test progress indicators
- --no-warmup
- skip warmup runs before benchmarking
test parameters:¶
- -m, --model <filename>
- (default: models/7B/ggml-model-q4_0.gguf)
- -p, --n-prompt <n>
- (default: 512)
- -n, --n-gen <n>
- (default: 128)
- -pg <pp,tg>
- (default: )
- -d, --n-depth <n>
- (default: 0)
- -b, --batch-size <n>
- (default: 2048)
- -ub, --ubatch-size <n>
- (default: 512)
- -ctk, --cache-type-k <t>
- (default: f16)
- -ctv, --cache-type-v <t>
- (default: f16)
- -dt, --defrag-thold <f>
- (default: -1)
- -t, --threads <n>
- (default: 6)
- -C, --cpu-mask <hex,hex>
- (default: 0x0)
- --cpu-strict <0|1>
- (default: 0)
- --poll <0...100>
- (default: 50)
- -ngl, --n-gpu-layers <n>
- (default: 99)
- -sm, --split-mode <none|layer|row>
- (default: layer)
- -mg, --main-gpu <i>
- (default: 0)
- -nkvo, --no-kv-offload <0|1>
- (default: 0)
- -fa, --flash-attn <0|1>
- (default: 0)
- -mmp, --mmap <0|1>
- (default: 1)
- -embd, --embeddings <0|1>
- (default: 0)
- -ts, --tensor-split <ts0/ts1/..>
- (default: 0)
- -ot --override-tensors <tensor name pattern>=<buffer type>;...
- (default: disabled)
- -nopo, --no-op-offload <0|1>
- (default: 0)
Multiple values can be given for each parameter by separating them with ',' or by specifying the parameter multiple times. Ranges can be given as 'first-last' or 'first-last+step' or 'first-last*mult'.
| August 2025 | debian |