llama.cpp

LLM inference in C/C++ with minimal dependencies

C++
Score: 63/100
Type
Execution: aot
Interface: cli

About

llama.cpp is the de facto standard inference engine for running LLMs locally. Written in C/C++, it is designed for minimal dependencies and maximum portability. It supports the GGUF model format, extensive quantization options, and multiple backends, including CUDA, Metal, ROCm, Vulkan, and CPU with AVX/AVX2/AVX-512.
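
A minimal invocation of the CLI looks like the sketch below; the model path is a placeholder and assumes a quantized GGUF file has already been downloaded locally:

    llama-cli -m ./models/model-Q4_K_M.gguf -p "Explain GGUF in one sentence." -n 128

Here -m points at the GGUF model file, -p supplies the prompt, and -n caps the number of tokens to generate.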

Performance

Cold Start: 100ms
Base Memory: 50MB
Startup Overhead: 10ms

Last Verified

Date: Jan 18, 2026
Version: b4604
Method: version check

Verified by installing via brew install and running llama-cli --version.
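
The check amounts to two commands, sketched below; the Homebrew formula installs the llama-cli binary:

    brew install llama.cpp
    llama-cli --version

The second command reports the build identifier, which is the version recorded above.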

Languages

C, C++

Details

Isolation: process
Maturity: production
License: MIT

Links