Provides R bindings to the llama.cpp library for running large language models locally. The package uses a lightweight architecture in which the C++ backend library is downloaded at runtime rather than bundled with the package, enabling zero-configuration LLM inference in R backed by a high-performance native C++ engine.
## Details
The localLLM package brings state-of-the-art language models to R users through a carefully designed four-layer architecture that combines ease of use with high performance.
## Quick Start
1. Install the R package: `install.packages("localLLM")`
2. Download the backend library: `install_localLLM()`
3. Start generating text: `quick_llama("Hello, how are you?")` (see the sketch below)
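Put together, the three steps look like this in an R session (the default model used by `quick_llama()` is downloaded and cached automatically on first use):

```r
# One-time setup: install the package and the native backend library
install.packages("localLLM")
library(localLLM)
install_localLLM()

# Generate text; the default model is fetched automatically on first use
response <- quick_llama("Hello, how are you?")
cat(response)
```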
## Key Features
- **Zero Configuration**: One-line setup with automatic model downloading
- **High Performance**: Native C++ inference engine with GPU support
- **Cross Platform**: Pre-compiled binaries for Windows, macOS, and Linux
- **Memory Efficient**: Smart caching and memory management
- **Production Ready**: Robust error handling and comprehensive documentation
## Architecture Overview

The package uses a layered design:
- **High-Level API**: `quick_llama` for simple text generation
- **Mid-Level API**: `model_load`, `generate` for detailed control
- **Low-Level API**: Direct access to tokenization and context management
- **C++ Backend**: llama.cpp engine with dynamic loading
## Main Functions
- `install_localLLM` - Download and install the backend library
- `quick_llama` - High-level text generation (recommended for beginners)
- `model_load` - Load GGUF models with smart caching
- `context_create` - Create inference contexts
- `generate` - Generate text with full parameter control
- `tokenize` / `detokenize` - Convert between text and tokens
- `apply_chat_template` - Format conversations for chat models
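For finer control, the lower-level helpers can be chained by hand. The sketch below is illustrative only: it assumes `tokenize()`/`detokenize()` take the model and the text or tokens as their first two arguments, and that `apply_chat_template()` accepts a list of role/content messages; check each function's help page for the exact signatures.

```r
library(localLLM)

model <- model_load("path/to/your/model.gguf")

# Round-trip a string through the model's tokenizer
tokens <- tokenize(model, "The future of AI is")
text   <- detokenize(model, tokens)

# Format a conversation for a chat model
# (the message structure below is an assumption; see ?apply_chat_template)
messages <- list(
  list(role = "system", content = "You are a helpful assistant."),
  list(role = "user",   content = "Explain machine learning in one sentence.")
)
prompt <- apply_chat_template(model, messages)

# Run the formatted prompt through an inference context
ctx    <- context_create(model, n_ctx = 4096)
answer <- generate(ctx, prompt, max_tokens = 100)
```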
## Example Workflows
### Basic Text Generation
```r
# Simple one-liner
response <- quick_llama("Explain quantum computing")

# With custom parameters
creative_text <- quick_llama("Write a poem about AI",
                             temperature = 0.9,
                             max_tokens = 150)
```

### Advanced Usage with Custom Models
```r
# Load your own model
model <- model_load("path/to/your/model.gguf")
ctx <- context_create(model, n_ctx = 4096)

# Direct text generation with auto-tokenization
output <- generate(ctx, "The future of AI is", max_tokens = 100)
```

### Batch Processing
```r
# Process multiple prompts efficiently
prompts <- c("Summarize AI trends", "Explain machine learning", "What is deep learning?")
responses <- quick_llama(prompts)
```

## Supported Model Formats

The package works with GGUF format models from various sources:
- Hugging Face Hub (automatic download)
- Local .gguf files
- Custom quantized models
- Ollama-compatible models
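How a model is referenced depends on where it comes from. A minimal sketch, assuming `model_load()` accepts either a local file path or a direct URL to a GGUF file on the Hugging Face Hub (the URL below is a placeholder, not a real repository):

```r
library(localLLM)

# Local .gguf file
local_model <- model_load("path/to/your/model.gguf")

# Hugging Face Hub: assumed here that a direct .gguf URL is downloaded
# and cached automatically; replace the placeholder with a real URL
hub_model <- model_load(
  "https://huggingface.co/<user>/<repo>/resolve/main/<model>.gguf"
)
```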
## Performance Tips
- Use `n_gpu_layers = -1` to fully utilize GPU acceleration
- Set `n_threads` to match your CPU cores for optimal performance
- Use larger `n_ctx` values for longer conversations
- Enable `use_mlock` for frequently used models to prevent swapping
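These settings can be combined when loading a model and creating its context. The split of arguments below (`n_gpu_layers`/`use_mlock` on `model_load()`, `n_ctx`/`n_threads` on `context_create()`) is an assumption based on common llama.cpp bindings; consult each function's help page for the actual placement.

```r
library(localLLM)

# Offload all layers to the GPU and lock the weights in RAM
model <- model_load(
  "path/to/your/model.gguf",
  n_gpu_layers = -1,    # -1 = offload every layer to the GPU
  use_mlock    = TRUE   # keep the model resident to prevent swapping
)

# Larger context window; one thread per physical CPU core
ctx <- context_create(
  model,
  n_ctx     = 8192,
  n_threads = parallel::detectCores(logical = FALSE)
)

output <- generate(ctx, "Summarize the benefits of local inference.",
                   max_tokens = 200)
```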