llama.cpp

LLM inference in C/C++ with minimal dependencies

C++
Score: 63/100
Type
Execution: aot
Interface: cli

About

llama.cpp is the de facto standard inference engine for running LLMs locally. Written in C/C++, it is designed for minimal dependencies and maximum portability. It supports the GGUF model format, extensive quantization options, and multiple backends, including CUDA, Metal, ROCm, Vulkan, and CPU with AVX/AVX2/AVX-512.
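
A minimal invocation of the CLI looks like the sketch below; the model path is a placeholder and assumes a quantized GGUF file has already been downloaded locally:

    llama-cli -m ./models/model-Q4_K_M.gguf -p "Explain GGUF in one sentence." -n 128

Here -m points at the GGUF model file, -p supplies the prompt, and -n caps the number of tokens to generate.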

Performance

Cold Start: 100ms
Base Memory: 50MB
Startup Overhead: 10ms

Last Verified

Date: Jan 18, 2026
Version: b4604
Method: version check

Verified by installing via brew install and running llama-cli --version.
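
The check amounts to two commands, sketched below; the Homebrew formula installs the llama-cli binary:

    brew install llama.cpp
    llama-cli --version

The second command reports the build identifier, which is the version recorded above.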

Languages

C, C++

Details

Isolation: process
Maturity: production
License: MIT

Links