LLAMA-RUN(1)                     User Commands                    LLAMA-RUN(1)
NAME
llama-run - run a large language model
DESCRIPTION
- Runs a large language model (LLM)
Usage:
- llama-run [options] model [prompt]
OPTIONS
-c, --context-size <value>
- Context size (default: 2048)
--chat-template-file <path>
- Path to the file containing the chat template to use with the model. Only jinja templates are supported, and this option implicitly sets the --jinja flag (see the example after this list)
--jinja
- Use jinja templating for the chat template of the model
-n, -ngl, --ngl <value>
- Number of GPU layers (default: 0)
--temp <value>
- Temperature (default: 0.8)
-t, --threads <value>
- Number of threads to use during generation (default: 4)
-v, --verbose, --log-verbose
- Set the verbosity level to its maximum, i.e. log all messages (useful for debugging)
-h, --help
- Show help message
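For example, several of these options can be combined in a single invocation; the model file and template path below are placeholders:

    # Offload up to 999 layers to the GPU, use a 4096-token context,
    # 8 generation threads, and a lower sampling temperature
    llama-run --ngl 999 -c 4096 -t 8 --temp 0.2 some-model.gguf

    # Use a custom jinja chat template (implicitly sets --jinja)
    llama-run --chat-template-file ./chat-template.jinja some-model.gguf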
Commands:
- model
- Model is a string with an optional prefix of huggingface:// (hf://), modelscope:// (ms://), ollama://, https:// or file://. If no protocol is specified, file:// is assumed when a file exists at the given path; otherwise ollama:// is assumed. Models being pulled are downloaded to a file with a .partial extension, which is renamed to the final file name once the download completes.
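To illustrate these resolution rules (the model names are placeholders taken from the examples below): assuming a local file some-file2.gguf exists, each pair of invocations is equivalent:

    # An existing local path resolves to file://
    llama-run some-file2.gguf
    llama-run file://some-file2.gguf

    # A name that does not exist as a local path resolves to ollama://
    llama-run smollm:135m
    llama-run ollama://smollm:135m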
EXAMPLES
llama-run llama3
llama-run ollama://granite-code
llama-run ollama://smollm:135m
llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
llama-run ms://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
llama-run modelscope://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
llama-run https://example.com/some-file1.gguf
llama-run some-file2.gguf
llama-run file://some-file3.gguf
llama-run --ngl 999 some-file4.gguf
llama-run --ngl 999 some-file5.gguf Hello World
debian                            August 2025                     LLAMA-RUN(1)