Installation Issues
“Backend library is not loaded” error
Problem: You see the error “Backend library is not loaded. Please run install_localLLM() first.”
Solution: Run the installation function after loading the package:
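# Load the package, then run the one-time backend installation
library(localLLM)
install_localLLM()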
This downloads the platform-specific backend library. You only need to do this once.
Installation fails on my platform
Problem: install_localLLM() fails to download or install.
Solution: Check that your platform is supported:
- Windows (x86-64)
- macOS (ARM64 / Apple Silicon)
- Linux (x86-64)
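A quick way to confirm what your machine reports is base R's Sys.info(), which returns the OS name and CPU architecture:
# Both values come from base R; no extra packages are needed
Sys.info()[["sysname"]]   # "Windows", "Darwin" (macOS), or "Linux"
Sys.info()[["machine"]]   # e.g. "x86-64", "x86_64", or "arm64"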
If you’re on an unsupported platform, you may need to compile llama.cpp manually.
“Library already installed” but functions don’t work
Problem: install_localLLM() says the library is installed, but generation fails.
Solution: Try reinstalling:
# Force reinstall
install_localLLM(force = TRUE)
# Verify installation
lib_is_installed()
Model Download Issues
“Download lock” or “Another download in progress” error
Problem: A previous download was interrupted and left a lock file.
Solution: Clear the cache directory:
cache_root <- tools::R_user_dir("localLLM", which = "cache")
models_dir <- file.path(cache_root, "models")
unlink(models_dir, recursive = TRUE, force = TRUE)
Then try downloading again.
Download times out or fails
Problem: Large model downloads fail partway through.
Solution:
1. Check your internet connection
2. Try a smaller model first
3. Download manually and load from a local path:
# Download with browser or wget, then:
model <- model_load("/path/to/downloaded/model.gguf")“Model not found” when using cached model
Problem: You’re trying to load a model by name but it’s not found.
Solution: Check what’s actually cached:
cached <- list_cached_models()
print(cached)
Use the exact filename or a unique substring that matches only one model.
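For example, if list_cached_models() shows a file named Llama-3.2-1B-Instruct-Q4_K_M.gguf (a hypothetical filename), either the full filename or a unique substring should resolve to the same cached file:
# Hypothetical filename: substitute one from your own cache listing
model <- model_load("Llama-3.2-1B-Instruct-Q4_K_M.gguf")
model <- model_load("Llama-3.2-1B")   # unique substring, matching only one cached model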
Private Hugging Face model fails
Problem: Downloading a gated/private model fails with authentication error.
Solution: Set your Hugging Face token:
# Get token from https://huggingface.co/settings/tokens
set_hf_token("hf_your_token_here")
# Now download should work
model <- model_load("https://huggingface.co/private/model.gguf")Memory Issues
R crashes when loading a model
Problem: R crashes or freezes when calling model_load().
Solution: The model is most likely too large for your available RAM. Try the following:
- Use a smaller quantized model (Q4 instead of Q8)
- Free up memory by closing other applications
- Check model requirements:
hw <- hardware_profile()
cat("Available RAM:", hw$ram_gb, "GB\n")“Memory check failed” warning
Problem: localLLM warns about insufficient memory.
Solution: The safety check detected potential issues. Options:
- Use a smaller model
- Reduce context size:
ctx <- context_create(model, n_ctx = 512)  # Smaller context
- If you’re sure you have enough memory, proceed when prompted
Context creation fails with large n_ctx
Problem: Creating a context with a large n_ctx fails.
Solution: Reduce the context size or use a smaller model:
# Instead of n_ctx = 32768, try:
ctx <- context_create(model, n_ctx = 4096)
GPU Issues
GPU not being used
Problem: Generation is slow even with n_gpu_layers = 999.
Solution: Check if GPU is detected:
hw <- hardware_profile()
print(hw$gpu)
If no GPU is listed, the backend may not support your GPU. Currently supported:
- NVIDIA GPUs (via CUDA)
- Apple Silicon (Metal)
“CUDA out of memory” error
Problem: GPU runs out of memory during generation.
Solution: Reduce GPU layer count:
# Offload fewer layers to GPU
model <- model_load("model.gguf", n_gpu_layers = 20)Generation Issues
Output is garbled or nonsensical
Problem: The model produces meaningless text.
Solution:
1. Ensure you’re using a chat template:
messages <- list(
  list(role = "user", content = "Your question")
)
prompt <- apply_chat_template(model, messages)
result <- generate(ctx, prompt)
2. The model file may be corrupted - redownload it
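A minimal way to force a redownload, assuming the cache layout shown in the download section above, is to delete the cached file and load the model again (the filename below is a placeholder):
# Delete the cached copy, then the next model_load() will download it again
cache_dir <- file.path(tools::R_user_dir("localLLM", "cache"), "models")
file.remove(file.path(cache_dir, "model.gguf"))  # use the filename from list_cached_models()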
Output contains strange tokens like <|eot_id|>
Problem: Output includes control tokens.
Solution: Use the clean = TRUE parameter:
result <- generate(ctx, prompt, clean = TRUE)
# or
result <- quick_llama("prompt", clean = TRUE)Generation stops too early
Problem: Output is cut off before completion.
Solution: Increase max_tokens:
result <- quick_llama("prompt", max_tokens = 500)Same prompt gives different results
Problem: Running the same prompt twice gives different outputs.
Solution: Set a seed for reproducibility:
result <- quick_llama("prompt", seed = 42)With temperature = 0 (default), outputs should be
deterministic.
Performance Issues
Generation is very slow
Problem: Text generation takes much longer than expected.
Solutions:
- Use GPU acceleration:
model <- model_load("model.gguf", n_gpu_layers = 999)
- Use a smaller model: Q4 quantization is faster than Q8
- Reduce context size:
ctx <- context_create(model, n_ctx = 512)
- Use parallel processing for multiple prompts:
results <- quick_llama(c("prompt1", "prompt2", "prompt3"))
Parallel processing isn’t faster
Problem: generate_parallel() is no faster than sequential generation.
Solution: Ensure n_seq_max is set appropriately:
ctx <- context_create(
  model,
  n_ctx = 2048,
  n_seq_max = 10  # Allow 10 parallel sequences
)
Compatibility Issues
“GGUF format required” error
Problem: Trying to load a non-GGUF model.
Solution: localLLM only supports GGUF format. Convert your model or find a GGUF version on Hugging Face (search for “model-name gguf”).
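Once you find a GGUF build, it can be loaded like any other model by URL or from a downloaded file; the URL below is only a placeholder:
# Placeholder URL: substitute the actual GGUF file you found on Hugging Face
model <- model_load("https://huggingface.co/some-org/some-model-Q4_K_M.gguf")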
Model works in Ollama but not localLLM
Problem: An Ollama model doesn’t work when loaded directly.
Solution: Use the Ollama integration:
# List available Ollama models
list_ollama_models()
# Load via Ollama reference
model <- model_load("ollama:model-name")Common Error Messages
| Error | Cause | Solution |
|---|---|---|
| “Backend library is not loaded” | Backend not installed | Run install_localLLM() |
| “Invalid model handle” | Model was freed/invalid | Reload the model |
| “Invalid context handle” | Context was freed/invalid | Recreate the context |
| “Failed to open library” | Backend installation issue | Reinstall with install_localLLM(force = TRUE) |
| “Download timeout” | Network issue or lock file | Clear cache and retry |
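For the two handle errors in the table, recreating the freed object is usually enough; the model path below is a placeholder:
# After "Invalid model handle": reload the model
model <- model_load("model.gguf")
# After "Invalid context handle": recreate the context from a valid model
ctx <- context_create(model, n_ctx = 2048)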
Getting Help
If you encounter issues not covered here:
- Check the documentation: ?function_name
- Report bugs: Email xu2009@purdue.edu with:
  - Your code
  - The error message
  - Output of sessionInfo()
  - Output of hardware_profile()
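To collect the requested diagnostics in one place, you can capture both outputs into a text file (the filename is arbitrary):
# Save session and hardware details to attach to a bug report
writeLines(
  c(capture.output(sessionInfo()), capture.output(hardware_profile())),
  "localLLM-diagnostics.txt"
)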
Quick Reference
# Check installation status
lib_is_installed()
# Check hardware
hardware_profile()
# List cached models
list_cached_models()
# List Ollama models
list_ollama_models()
# Clear model cache
cache_dir <- file.path(tools::R_user_dir("localLLM", "cache"), "models")
unlink(cache_dir, recursive = TRUE)
# Force reinstall backend
install_localLLM(force = TRUE)