Loads a GGUF-format language model from a local path or URL, with intelligent caching and download management. Supports various model sources, including Hugging Face, Ollama repositories, and direct HTTPS URLs. Models are cached automatically to avoid repeated downloads.
Usage
model_load(
model_path,
cache_dir = NULL,
n_gpu_layers = 0L,
use_mmap = TRUE,
use_mlock = FALSE,
show_progress = TRUE,
force_redownload = FALSE,
verify_integrity = TRUE,
check_memory = TRUE,
hf_token = NULL,
verbosity = 1L
)
Arguments
- model_path
Path to a local GGUF model file, a URL, or a cached model name. Supported URL formats:
https:// - Direct download from web servers
If you previously downloaded a model through this package, you can supply the cached file name (or a distinctive fragment of it) instead of the full path or URL. The loader will search the local cache and offer any matches (see the cached-name example under Examples).
- cache_dir
Custom directory for downloaded models (default: NULL, which uses the system cache directory)
- n_gpu_layers
Number of transformer layers to offload to GPU (default: 0 for CPU-only). Set to -1 to offload all layers, or a positive integer for partial offloading
- use_mmap
Enable memory mapping for efficient model loading (default: TRUE). Disable only if experiencing memory issues
- use_mlock
Lock model in physical memory to prevent swapping (default: FALSE). Enabling this can improve performance but requires sufficient RAM
- show_progress
Display download progress bar for remote models (default: TRUE)
- force_redownload
Force re-download even if cached version exists (default: FALSE). Useful for updating to newer model versions
- verify_integrity
Verify file integrity using checksums when available (default: TRUE)
- check_memory
Check if sufficient system memory is available before loading (default: TRUE)
- hf_token
Optional Hugging Face access token to set during model resolution. Defaults to the existing `HF_TOKEN` environment variable.
- verbosity
Control backend logging during model loading (default: 1L). Larger numbers print more detail:
0 shows only errors, 1 adds warnings, 2 prints informational messages, and 3 enables the most verbose debug output.
Value
A model object (external pointer) that can be used with context_create,
tokenize, and other model functions
Examples
if (FALSE) { # \dontrun{
# Load local GGUF model
model <- model_load("/path/to/my_model.gguf")
# Download from Hugging Face and cache locally
hf_path <- "https://huggingface.co/Qwen/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf"
model <- model_load(hf_path)
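
# Reload later by supplying the cached file name, or a distinctive
# fragment of it; the loader searches the local cache for matches.
# The fragment below is illustrative and assumes the download above
# produced a cache entry named after the .gguf file.
model <- model_load("Qwen3-0.6B-Q8_0")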
# Load with GPU acceleration (offload 10 layers)
model <- model_load("/path/to/model.gguf", n_gpu_layers = 10)
# Download to custom cache directory
model <- model_load(hf_path,
cache_dir = file.path(tempdir(), "my_models"))
# Force fresh download (ignore cache)
model <- model_load(hf_path,
force_redownload = TRUE)
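
# Access a gated Hugging Face repository with an access token
# (sketch: passing the token explicitly is optional, since any
# existing HF_TOKEN environment variable is used by default)
model <- model_load(hf_path,
                    hf_token = Sys.getenv("HF_TOKEN"))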
# High-performance settings for large models
model <- model_load("/path/to/large_model.gguf",
n_gpu_layers = -1, # All layers on GPU
use_mlock = TRUE) # Lock in memory
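
# Skip the pre-load memory check and checksum verification,
# e.g. on a constrained machine where the file is already trusted
model <- model_load("/path/to/model.gguf",
                    check_memory = FALSE,
                    verify_integrity = FALSE)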
# Load with minimal verbosity (quiet mode)
model <- model_load("/path/to/model.gguf", verbosity = 2L)
} # }