
Provides R bindings to the llama.cpp library for running large language models locally. The package uses a lightweight architecture in which the C++ backend library is downloaded at runtime rather than bundled with the package, enabling zero-configuration inference in R with native C++ performance.

Details

The localLLM package brings state-of-the-art language models to R users through a carefully designed four-layer architecture that combines ease of use with high performance.

## Quick Start

  1. Install the R package: install.packages("localLLM")

  2. Download the backend library: install_localLLM()

  3. Start generating text: quick_llama("Hello, how are you?")
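Run end to end, the three steps look like this (library() attaches the package, as in any R session):


# One-time setup
install.packages("localLLM")
library(localLLM)
install_localLLM()

# First generation
quick_llama("Hello, how are you?")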

## Key Features

  • Zero Configuration: One-line setup with automatic model downloading

  • High Performance: Native C++ inference engine with GPU support

  • Cross Platform: Pre-compiled binaries for Windows, macOS, and Linux

  • Memory Efficient: Smart caching and memory management

  • Production Ready: Robust error handling and comprehensive documentation

## Architecture Overview

The package uses a layered design (contrasted in the sketch after this list):

  • High-Level API: quick_llama for simple text generation

  • Mid-Level API: model_load, generate for detailed control

  • Low-Level API: Direct access to tokenization and context management

  • C++ Backend: llama.cpp engine with dynamic loading
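As a rough sketch of how the layers relate, the same completion can be produced with a single high-level call or an explicit mid-level sequence; the mid-level form assumes you supply your own GGUF file, as in the Advanced Usage example below:


# High-level: one call covers model handling and generation
quick_llama("The future of AI is")

# Mid-level: explicit model, context, and generation steps
model <- model_load("path/to/your/model.gguf")
ctx <- context_create(model, n_ctx = 4096)
generate(ctx, "The future of AI is", max_tokens = 100)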

## Main Functions

  • install_localLLM(): download the pre-compiled C++ backend library

  • quick_llama(): one-line text generation with automatic model handling

  • model_load(): load a GGUF model for inference

  • context_create(): create an inference context for a loaded model

  • generate(): generate text from a prompt using a model context

## Example Workflows

### Basic Text Generation


# Simple one-liner
response <- quick_llama("Explain quantum computing")

# With custom parameters
creative_text <- quick_llama("Write a poem about AI",
                             temperature = 0.9,
                             max_tokens = 150)

### Advanced Usage with Custom Models


# Load your own model
model <- model_load("path/to/your/model.gguf")
ctx <- context_create(model, n_ctx = 4096)

# Direct text generation with auto-tokenization
output <- generate(ctx, "The future of AI is", max_tokens = 100)

### Batch Processing


# Process multiple prompts efficiently
prompts <- c("Summarize AI trends", "Explain machine learning", "What is deep learning?")
responses <- quick_llama(prompts)

## Supported Model Formats

The package works with models in GGUF format from a variety of sources (see the loading sketch after this list):

  • Hugging Face Hub (automatic download)

  • Local .gguf files

  • Custom quantized models

  • Ollama-compatible models
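A minimal sketch of the two most common cases. The local-file path mirrors the Advanced Usage example above; the exact argument form for automatic Hugging Face Hub downloads (for example, passing a direct URL to a hosted .gguf file) is an assumption here, so check the quick_llama() and model_load() documentation for the supported syntax.


# Local .gguf file (as in the Advanced Usage example)
model <- model_load("path/to/your/model.gguf")

# Hugging Face Hub download (assumption: a direct URL to a hosted .gguf file
# is accepted; see the package documentation for the exact argument form)
model <- model_load("https://huggingface.co/<repo>/resolve/main/<file>.gguf")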

## Performance Tips

  • Use n_gpu_layers = -1 to fully utilize GPU acceleration

  • Set n_threads to match your CPU cores for optimal performance

  • Use larger n_ctx values for longer conversations

  • Enable use_mlock for frequently used models to prevent swapping (the sketch below combines these settings)
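A sketch combining the tips above. The parameter names come from the tips themselves; where each one is passed is an assumption (here n_gpu_layers and use_mlock go to model_load(), while n_ctx and n_threads go to context_create()), so verify against the function documentation.


# Load with full GPU offload and memory locking (assumed model_load() arguments)
model <- model_load("path/to/your/model.gguf",
                    n_gpu_layers = -1,
                    use_mlock = TRUE)

# Larger context and a thread count matching your CPU cores
# (assumed context_create() arguments)
ctx <- context_create(model,
                      n_ctx = 8192,
                      n_threads = parallel::detectCores())

output <- generate(ctx, "Summarize AI trends", max_tokens = 100)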

Author

Eddie Yang and Yaosheng Xu <xu2009@purdue.edu>