ExLlamaV2
Fast inference library for running LLMs locally on NVIDIA GPUs
Grade: D
Score: 47/100
Type
- Execution: aot
- Interface: sdk
About
ExLlamaV2 is a fast inference library for running LLMs locally on modern NVIDIA GPUs. Its EXL2 quantization format keeps VRAM usage low, and the library supports dynamic batching and delivers high tokens-per-second throughput.
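A minimal usage sketch, modeled on the library's published example API (class names such as ExLlamaV2DynamicGenerator may differ between versions, and the model path is hypothetical):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Point the config at a directory holding an EXL2-quantized model (hypothetical path)
config = ExLlamaV2Config("/models/llama3-8b-exl2")
model = ExLlamaV2(config)

# Lazy cache lets load_autosplit allocate the KV cache while splitting
# layers across the available GPU(s)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)
tokenizer = ExLlamaV2Tokenizer(config)

# The dynamic generator handles batching of concurrent requests internally
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Hello, my name is", max_new_tokens=64))
```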
Performance
- Cold Start: 1000ms
- Base Memory: 300MB
- Startup Overhead: 200ms
✓ Last Verified
- Date: Jan 18, 2026
- Method: manual test
Languages
Python, C++, CUDA
Details
- Isolation: process
- Maturity: stable
- License: MIT