Run LLMs on your own hardware. Find the right launcher, engine, and configuration for your setup.
NVIDIA-first. Mac-strong. Pick your GPU, get your stack.
| Name | Role | Backends | Formats | Score | Platforms |
|---|---|---|---|---|---|
| **Text Generation WebUI**: Gradio web UI for running Large Language Models | Launcher | cuda, metal, rocm... | gguf, gptq... | 97 (A+) | 🍎🐧🪟 |
| **llama.cpp**: LLM inference in C/C++ with minimal dependencies | Engine | cuda, metal, rocm... | gguf, ggml | 93 (A+) | 🍎🐧🪟 |
| **KoboldCpp**: Easy-to-use AI text generation with a llama.cpp backend | Launcher | cuda, metal, rocm... | gguf, ggml | 92 (A+) | 🍎🐧🪟 |
| **Ollama**: Get up and running with large language models locally | Launcher | cuda, metal, rocm... | gguf | 90 (A+) | 🍎🐧🪟 |
| **Jan**: Open-source ChatGPT alternative that runs offline | Launcher | cuda, metal, cpu... | gguf | 87 (A) | 🍎🐧🪟 |
| **LM Studio**: Discover, download, and run local LLMs with a polished GUI | Launcher | cuda, metal, cpu... | gguf | 87 (A) | 🍎🐧🪟 |
| **Text Generation Inference**: Hugging Face's production-ready LLM serving solution | Engine | cuda, rocm | safetensors, gptq... | 78 (B+) | 🐧 |
| **vLLM**: High-throughput LLM serving with PagedAttention | Engine | cuda, rocm | safetensors, pytorch... | 78 (B+) | 🐧 |
| **LocalAI**: Free, open-source OpenAI alternative with local inference | Launcher | cuda, metal, rocm... | gguf, safetensors | 77 (B+) | 🍎🐧 |
| **llamafile**: Distribute and run LLMs with a single file | Engine | cuda, metal, cpu | gguf | 75 (B+) | 🍎🐧🪟 |
| **GPT4All**: Free-to-use, locally running, privacy-aware chatbot | Launcher | cuda, metal, cpu | gguf | 72 (B) | 🍎🐧🪟 |
| **Candle**: Minimalist ML framework for Rust with GPU support | Engine | cuda, metal, cpu | safetensors, gguf | 70 (B) | 🍎🐧 |
| **CTransformers**: Python bindings for GGML models with GPU acceleration | Engine | cuda, metal, cpu | gguf, ggml | 70 (B) | 🍎🐧 |
| **MLC LLM**: Machine Learning Compilation for LLMs | Engine | cuda, metal, rocm... | safetensors | 70 (B) | 🍎🐧 |
| **ONNX Runtime**: Cross-platform, high-performance ML inference and training accelerator | Engine | cuda, cpu, metal | onnx | 68 (B-) | 🍎🐧🪟 |
| **ExLlamaV2**: Fast inference library for running LLMs locally on NVIDIA GPUs | Engine | cuda | exl2, safetensors | 65 (B-) | 🐧 |
| **Open WebUI**: User-friendly web UI for LLMs with Ollama/OpenAI support | Launcher | cuda, metal, rocm... | gguf | 62 (C+) | 🍎🐧 |
| **MLX**: Apple's array framework for machine learning on Apple Silicon | Engine | metal | mlx, safetensors | 60 (C+) | 🍎 |
| **LLM (Python CLI)**: Access large language models from the command line | Tool | cuda, metal, cpu | gguf | 52 (C-) | 🍎🐧 |
| **GGUF**: GPT-Generated Unified Format for efficient LLM storage | Format | cuda, metal, rocm... | - | — | |
| **safetensors**: Safe and fast tensor serialization format by Hugging Face | Format | cuda, metal, rocm... | - | — | |
| **CUDA Runtime**: NVIDIA's parallel computing platform for GPU acceleration | Backend | cuda | - | — | 🐧🪟 |
| **ROCm**: AMD's open-source GPU computing platform | Backend | rocm | - | — | 🐧 |
| **Metal**: Apple's GPU framework for Apple Silicon acceleration | Backend | metal | - | — | 🍎 |
| **Vulkan**: Cross-platform GPU API for compute and graphics | Backend | vulkan | - | — | |
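Several launchers in the table (Ollama, LM Studio, LocalAI, vLLM, Text Generation Inference) serve an OpenAI-compatible HTTP API, so one client works across all of them. A minimal sketch, assuming a server on Ollama's default port and a model name of `llama3` — both are placeholders to swap for your own setup:

```python
import json
import urllib.request

def build_chat_request(prompt: str, model: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str, base_url: str = "http://localhost:11434/v1",
         model: str = "llama3") -> str:
    """POST the payload to a local OpenAI-compatible endpoint.

    The base URL shown is Ollama's default; LM Studio, LocalAI, and
    vLLM expose the same /chat/completions shape on their own ports.
    """
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the request shape is shared, switching engines usually means changing only `base_url` and `model`.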
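GGUF is the format nearly every launcher above accepts, and it is easy to sanity-check a download: per llama.cpp's GGUF spec, files start with the 4-byte magic `GGUF` followed by a little-endian uint32 version (3 at the time of writing). A small sketch that validates the header; the demo writes a synthetic file, but a real model file reads the same way:

```python
import struct

def read_gguf_header(path: str) -> int:
    """Return the GGUF version, or raise if the magic bytes are wrong."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        (version,) = struct.unpack("<I", f.read(4))
        return version

# Demo: write a minimal synthetic header (magic, version 3,
# then the spec's uint64 tensor count and metadata-kv count, both 0).
with open("demo.gguf", "wb") as f:
    f.write(b"GGUF" + struct.pack("<I", 3) + struct.pack("<QQ", 0, 0))

print(read_gguf_header("demo.gguf"))  # → 3
```

A truncated or mislabeled download fails the magic check immediately, before an engine tries to load it.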