Llama.cpp (LLaMA C++) Download

Llama.cpp (LLaMA C++) is a lightweight, high-performance implementation designed to run large language models locally on your own machine. It enables fast inference with minimal setup, making it ideal for developers, scientists, researchers, and enthusiasts who want full control over their AI workflows without relying on cloud services.

Windows

Available for CPU, CUDA, Vulkan and SYCL.

Linux

Available for CPU and Vulkan.

macOS

Available for Apple Silicon and Intel chips. You can also install it via Homebrew with the command below:

brew install llama.cpp
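
Once installed, you can try a quick local inference run with the `llama-cli` tool that ships with llama.cpp. This is a minimal sketch: the model path below is a placeholder, and you will need to substitute a GGUF model file you have already downloaded.

```shell
# Run a one-off prompt against a local GGUF model.
# ./models/your-model.gguf is a placeholder path; point -m at any GGUF file you have.
# -p sets the prompt, -n limits the number of tokens to generate.
llama-cli -m ./models/your-model.gguf -p "Explain llama.cpp in one sentence." -n 128
```

Everything runs on your own hardware; no network access or API key is required once the model file is on disk.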