Run LLMs on your own hardware. Find the right launcher, engine, and configuration for your setup.
NVIDIA-first. Mac-strong. Pick your GPU, get your stack.
| Name | Role | Backends | Formats | Score | Platforms |
|---|---|---|---|---|---|
| **Text Generation WebUI**: Gradio web UI for running Large Language Models | Launcher | cuda, metal, rocm... | gguf, gptq... | 97 (A+) | 🍎🐧🪟 |
| **llama.cpp**: LLM inference in C/C++ with minimal dependencies | Engine | cuda, metal, rocm... | gguf, ggml | 93 (A+) | 🍎🐧🪟 |
| **KoboldCpp**: Easy-to-use AI text generation with a llama.cpp backend | Launcher | cuda, metal, rocm... | gguf, ggml | 92 (A+) | 🍎🐧🪟 |
| **Ollama**: Get up and running with large language models locally | Launcher | cuda, metal, rocm... | gguf | 90 (A+) | 🍎🐧🪟 |
| **Jan**: Open-source ChatGPT alternative that runs offline | Launcher | cuda, metal, cpu... | gguf | 87 (A) | 🍎🐧🪟 |
| **LM Studio**: Discover, download, and run local LLMs with a polished GUI | Launcher | cuda, metal, cpu... | gguf | 87 (A) | 🍎🐧🪟 |
| **Text Generation Inference**: Hugging Face's production-ready LLM serving solution | Engine | cuda, rocm | safetensors, gptq... | 78 (B+) | 🐧 |
| **vLLM**: High-throughput LLM serving with PagedAttention | Engine | cuda, rocm | safetensors, pytorch... | 78 (B+) | 🐧 |
| **LocalAI**: Free, open-source OpenAI alternative with local inference | Launcher | cuda, metal, rocm... | gguf, safetensors | 77 (B+) | 🍎🐧 |
| **llamafile**: Distribute and run LLMs with a single file | Engine | cuda, metal, cpu | gguf | 75 (B+) | 🍎🐧🪟 |
| **GPT4All**: Free-to-use, locally running, privacy-aware chatbot | Launcher | cuda, metal, cpu | gguf | 72 (B) | 🍎🐧🪟 |
| **Candle**: Minimalist ML framework for Rust with GPU support | Engine | cuda, metal, cpu | safetensors, gguf | 70 (B) | 🍎🐧 |
| **CTransformers**: Python bindings for GGML models with GPU acceleration | Engine | cuda, metal, cpu | gguf, ggml | 70 (B) | 🍎🐧 |
| **MLC LLM**: Machine Learning Compilation for LLMs | Engine | cuda, metal, rocm... | safetensors | 70 (B) | 🍎🐧 |
| **ONNX Runtime**: Cross-platform, high-performance ML inference and training accelerator | Engine | cuda, cpu, metal | onnx | 68 (B-) | 🍎🐧🪟 |
| **ExLlamaV2**: Fast inference library for running LLMs locally on NVIDIA GPUs | Engine | cuda | exl2, safetensors | 65 (B-) | 🐧 |
| **Open WebUI**: User-friendly web UI for LLMs with Ollama/OpenAI support | Launcher | cuda, metal, rocm... | gguf | 62 (C+) | 🍎🐧 |
| **MLX**: Apple's array framework for machine learning on Apple Silicon | Engine | metal | mlx, safetensors | 60 (C+) | 🍎 |
| **LLM (Python CLI)**: Access large language models from the command line | Tool | cuda, metal, cpu | gguf | 52 (C-) | 🍎🐧 |
| **GGUF**: GPT-Generated Unified Format for efficient LLM storage | Format | cuda, metal, rocm... | - | — | |
| **safetensors**: Safe and fast tensor serialization format by Hugging Face | Format | cuda, metal, rocm... | - | — | |
| **CUDA Runtime**: NVIDIA's parallel computing platform for GPU acceleration | Backend | cuda | - | — | 🐧🪟 |
| **ROCm**: AMD's open-source GPU computing platform | Backend | rocm | - | — | 🐧 |
| **Metal**: Apple's GPU framework for Apple Silicon acceleration | Backend | metal | - | — | 🍎 |
| **Vulkan**: Cross-platform GPU API for compute and graphics | Backend | vulkan | - | — | |
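Several launchers in the table (Ollama, LM Studio, LocalAI, vLLM, Text Generation Inference) serve an OpenAI-compatible HTTP API, so one client works across all of them. A minimal sketch, assuming a server on Ollama's default port and a model name of `llama3` — both are placeholders to swap for your own setup:

```python
import json
import urllib.request

def build_chat_request(prompt: str, model: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str, base_url: str = "http://localhost:11434/v1",
         model: str = "llama3") -> str:
    """POST the payload to a local OpenAI-compatible endpoint.

    The base URL shown is Ollama's default; LM Studio, LocalAI, and
    vLLM expose the same /chat/completions shape on their own ports.
    """
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the request shape is shared, switching engines usually means changing only `base_url` and `model`.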
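GGUF is the format nearly every launcher above accepts, and it is easy to sanity-check a download: per llama.cpp's GGUF spec, files start with the 4-byte magic `GGUF` followed by a little-endian uint32 version (3 at the time of writing). A small sketch that validates the header; the demo writes a synthetic file, but a real model file reads the same way:

```python
import struct

def read_gguf_header(path: str) -> int:
    """Return the GGUF version, or raise if the magic bytes are wrong."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        (version,) = struct.unpack("<I", f.read(4))
        return version

# Demo: write a minimal synthetic header (magic, version 3,
# then the spec's uint64 tensor count and metadata-kv count, both 0).
with open("demo.gguf", "wb") as f:
    f.write(b"GGUF" + struct.pack("<I", 3) + struct.pack("<QQ", 0, 0))

print(read_gguf_header("demo.gguf"))  # → 3
```

A truncated or mislabeled download fails the magic check immediately, before an engine tries to load it.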