Provides R bindings to the llama.cpp library for running large language models locally. The package uses a lightweight architecture in which the C++ backend library is downloaded at runtime rather than bundled with the package, enabling zero-configuration LLM inference in R backed by a high-performance native C++ engine.
## Details
The localLLM package brings state-of-the-art language models to R users through a carefully designed four-layer architecture that combines ease of use with high performance.
## Quick Start
1. Install the R package: `install.packages("localLLM")`
2. Download the backend library: `install_localLLM()`
3. Start generating text: `quick_llama("Hello, how are you?")` (see the sketch below)
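Put together, the three steps look like this in an R session (the default model used by `quick_llama()` is downloaded and cached automatically on first use):

```r
# One-time setup: install the package and the native backend library
install.packages("localLLM")
library(localLLM)
install_localLLM()

# Generate text; the default model is fetched automatically on first use
response <- quick_llama("Hello, how are you?")
cat(response)
```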
## Key Features
- **Zero Configuration**: One-line setup with automatic model downloading
- **High Performance**: Native C++ inference engine with GPU support
- **Cross Platform**: Pre-compiled binaries for Windows, macOS, and Linux
- **Memory Efficient**: Smart caching and memory management
- **Production Ready**: Robust error handling and comprehensive documentation
## Architecture Overview

The package uses a layered design:
- **High-Level API**: `quick_llama` for simple text generation
- **Mid-Level API**: `model_load`, `generate` for detailed control
- **Low-Level API**: Direct access to tokenization and context management
- **C++ Backend**: llama.cpp engine with dynamic loading
## Main Functions
- `install_localLLM` - Download and install the backend library
- `quick_llama` - High-level text generation (recommended for beginners)
- `model_load` - Load GGUF models with smart caching
- `context_create` - Create inference contexts
- `generate` - Generate text with full parameter control
- `tokenize` / `detokenize` - Convert between text and tokens
- `apply_chat_template` - Format conversations for chat models
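For finer control, the lower-level helpers can be chained by hand. The sketch below is illustrative only: it assumes `tokenize()`/`detokenize()` take the model and the text or tokens as their first two arguments, and that `apply_chat_template()` accepts a list of role/content messages; check each function's help page for the exact signatures.

```r
library(localLLM)

model <- model_load("path/to/your/model.gguf")

# Round-trip a string through the model's tokenizer
tokens <- tokenize(model, "The future of AI is")
text   <- detokenize(model, tokens)

# Format a conversation for a chat model
# (the message structure below is an assumption; see ?apply_chat_template)
messages <- list(
  list(role = "system", content = "You are a helpful assistant."),
  list(role = "user",   content = "Explain machine learning in one sentence.")
)
prompt <- apply_chat_template(model, messages)

# Run the formatted prompt through an inference context
ctx    <- context_create(model, n_ctx = 4096)
answer <- generate(ctx, prompt, max_tokens = 100)
```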
## Example Workflows
### Basic Text Generation
```r
# Simple one-liner
response <- quick_llama("Explain quantum computing")

# With custom parameters
creative_text <- quick_llama("Write a poem about AI",
                             temperature = 0.9,
                             max_tokens = 150)
```

### Advanced Usage with Custom Models
```r
# Load your own model
model <- model_load("path/to/your/model.gguf")
ctx <- context_create(model, n_ctx = 4096)

# Direct text generation with auto-tokenization
output <- generate(ctx, "The future of AI is", max_tokens = 100)
```

### Batch Processing
```r
# Process multiple prompts efficiently
prompts <- c("Summarize AI trends", "Explain machine learning", "What is deep learning?")
responses <- quick_llama(prompts)
```

## Supported Model Formats

The package works with GGUF format models from various sources:
- Hugging Face Hub (automatic download)
- Local .gguf files
- Custom quantized models
- Ollama-compatible models
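How a model is referenced depends on where it comes from. A minimal sketch, assuming `model_load()` accepts either a local file path or a direct URL to a GGUF file on the Hugging Face Hub (the URL below is a placeholder, not a real repository):

```r
library(localLLM)

# Local .gguf file
local_model <- model_load("path/to/your/model.gguf")

# Hugging Face Hub: assumed here that a direct .gguf URL is downloaded
# and cached automatically; replace the placeholder with a real URL
hub_model <- model_load(
  "https://huggingface.co/<user>/<repo>/resolve/main/<model>.gguf"
)
```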
## Performance Tips
- Use `n_gpu_layers = -1` to fully utilize GPU acceleration
- Set `n_threads` to match your CPU cores for optimal performance
- Use larger `n_ctx` values for longer conversations
- Enable `use_mlock` for frequently used models to prevent swapping
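These settings can be combined when loading a model and creating its context. The split of arguments below (`n_gpu_layers`/`use_mlock` on `model_load()`, `n_ctx`/`n_threads` on `context_create()`) is an assumption based on common llama.cpp bindings; consult each function's help page for the actual placement.

```r
library(localLLM)

# Offload all layers to the GPU and lock the weights in RAM
model <- model_load(
  "path/to/your/model.gguf",
  n_gpu_layers = -1,    # -1 = offload every layer to the GPU
  use_mlock    = TRUE   # keep the model resident to prevent swapping
)

# Larger context window; one thread per physical CPU core
ctx <- context_create(
  model,
  n_ctx     = 8192,
  n_threads = parallel::detectCores(logical = FALSE)
)

output <- generate(ctx, "Summarize the benefits of local inference.",
                   max_tokens = 200)
```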