
LLAMA-RUN(1) User Commands LLAMA-RUN(1)

NAME

llama-run - run a large language model

DESCRIPTION


Runs a large language model (LLM).

Usage:

llama-run [options] model [prompt]

OPTIONS

-c, --context-size <value>

Context size (default: 2048)

--chat-template-file <path>

Path to the file containing the chat template to use with the model. Only Jinja templates are supported, and this option implicitly sets the --jinja flag (a combined example follows the option list).

--jinja

Use Jinja templating for the model's chat template

-n, -ngl, --ngl <value>

Number of GPU layers (default: 0)

--temp <value>

Temperature (default: 0.8)

-t, --threads <value>

Number of threads to use during generation (default: 4)

-v, --verbose, --log-verbose

Set verbosity level to infinity (i.e., log all messages; useful for debugging)

-h, --help

Show help message
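The options above can be combined in a single invocation. A minimal sketch follows; the model file, template file, and parameter values here are illustrative, not recommendations:

llama-run -c 4096 -t 8 --temp 0.2 --ngl 999 file://my-model.gguf "Write a haiku"
llama-run --chat-template-file ./chatml.jinja file://my-model.gguf "Hello"

The second command implicitly enables --jinja, since --chat-template-file only supports Jinja templates.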

Commands:

model
Model is a string with an optional prefix of huggingface:// (hf://), modelscope:// (ms://), ollama://, https:// or file://. If no protocol is specified and a file exists at the given path, file:// is assumed; otherwise, ollama:// is assumed. Models being pulled are downloaded to a file with a .partial extension, which is renamed to the final filename once the download completes.
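To illustrate the resolution rule (some-model.gguf is a hypothetical name): if a file named some-model.gguf exists, the first command below is equivalent to the second and reads the local file; if it does not exist, the first command instead attempts an ollama:// pull, so the file:// form is the unambiguous way to name a local file.

llama-run some-model.gguf
llama-run file://some-model.gguf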

EXAMPLES

llama-run llama3
llama-run ollama://granite-code
llama-run ollama://smollm:135m
llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
llama-run ms://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
llama-run modelscope://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
llama-run https://example.com/some-file1.gguf
llama-run some-file2.gguf
llama-run file://some-file3.gguf
llama-run --ngl 999 some-file4.gguf
llama-run --ngl 999 some-file5.gguf Hello World
August 2025 debian