Getting Started with LLaMA.cpp (Complete Installation Guide)
Llama.cpp is a high-performance C/C++ implementation to run Large Language Models locally. It focuses on efficient inference on any consumer hardware enabling you to run models on CPUs and GPUs without requiring large cloud infrastructure.
Unlike machine learning frameworks, Llama.cpp is designed to:
1. Run on consumer-grade hardware
2. Work without Python
3. Provide high-performance inference
4. Support quantized models for low memory usage
Who is this guide for?
Llama.cpp is not complex to Download and Install. The below guide walks you through everything you need to know to Download, Install and setup Llama.cpp on your Mac, Linux and Windows PC. You don’t need a lot of knowledge to be able to setup Llama.cpp, the below guide is suitable for all technical levels, however some familiarity with command-line tools will be helpful. This guide is designed for anyone who wants to run large language models locally for various reasons including experimentation, development or production use.

Developers and Engineers
If you are a software developer or an engineer looking to integrate AI into applications without relying on cloud services, this guide will help you to build llama.cpp from the original source across different platforms so you can run models locally for development and testing. You can also use the built-in server to create API-driven applications and also integrate with tools such as llama.cpp Python. This is great if you are building AI-powered web apps, local-first tools or want privacy with your prompts.
AI Enthusiasts and Hobbyists
If you want to explore local AI for personal use, this guide will walk you through setting up your first mode, running prompts and do interactive chats. You will also experiment with different models and quantizations along with understanding the performance trade-offs on your hardware.
Researches and Experimenters
If you are exploring local AI for personal use and want to setup your first model, run prompts and have an interactive chat this is the guide for you! You can absolutely run different modems and even experiment with quantizations and understand the different performance trade-offs on your hardware. You don’t need machine learning knowledge or skills, just a willingness to work the terminal, that is it! This is also why so many people love it!
Privacy-focused Users
Running models locally also means that you are in control of your data privacy. As you are running the models locally, no data is sent to external APIs, you can work fully offline with your favorite LLM and the sensitive data stays on your machine. Llama.cpp is also open-source, free and supports multiple models. This is only a few of the vast variety of features it comes with.
Contents
Prerequisites
Before Installing Llama.cpp and following this guide, ensure you can meeting the requirements:
Basic Requirements
- Git, so you can clone the repository
- CMake 3.18 or higher is better
- A C/C++ compiler
- At least 4 GB of RAM so you can run small models
Optional requirements for Hardware Acceleration
- Metal on Apple Silicon
- CUDA on Nvidia GPUs
- OpenBLAS or BLAS on CPUs
- Vulkan
- CLBlast
To know the list of requirements, you can see those under the Requirements section on the homepage.
How To Install Llama.cpp on your Mac
Installing Llama.cpp on macOS is straightforward and works on both Intel Macs and Apple Silicon. This section walks you through all the steps needed to get going on your Macbook. You will find all the required information including prerequisites, running your first build and optional steps for GPU acceleration using Apple’s Metal framework.
Step 1: Install Required Tools
macOS does not include all the required build tools by default so you will need to install them first.
Install Xcode Command Line Tools
This provides essential compilers like clang:
xcode-select --install
Verify your installation with:
clang --version
Install Homebrew (If you don’t already have it)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Then install all the required dependencies:
brew install git cmake
You can also install it with one command using Homebrew:
brew install llama.cpp
Verify your installation with:
git --version
cmake --version
Step 2: Clone the Llama.cpp Repository
Clone the latest version llama.cpp or alternatively get it from the download page and go into the “llama.cpp” directory:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
Step 3: Now build llama.cpp
mkdir build
cd build
cmake ..
cmake --build . --config Release
After the build is complete, the binaries will be located in the following directory:
./build/bin/
Step 4: Enable Apple Metal Acceleration
It is likely you will be using an Apple Silicon chip mac (M1 or newer). If you have one of these Macbooks you can increase your performance by quite a bit using Metal GPU acceleration.
Build with Metal enabled:
cmake -B build -DGGML_METAL=ON
cmake --build build --config Release
This allows llama.cpp to use the integrated Apple GPU so you can now have faster inference.
Step 5: Verify Installation and run your first LLM Model
Run the below help command to insure llama.cpp is responding correctly:
./build/bin/llama-cli --help
If installed correctly, you’ll see a list of available options and commands.
Now to run your first mode, place a GGUF model inside a “models” directory then run:
./build/bin/llama-cli -m models/model.gguf -p "Hello, does a computer work?"
If everything is set up correctly, the model will begin generating output in your terminal.
That’s all! You have now successfully installed llama.cpp on macOS.
How To Install Llama.cpp on Linux
Llama.cpp works on multiple Linux distributions and supports both CPU-only inference and GPU acceleration for Nvidia, AMD and Vulkan backends.
Llama.cpp works on Ubuntu, Debian, Fedora and Arch Linux. Just the package manager commands differ but the overall process remains the same.
Step 1: Install Required Dependencies
What you will need is a compiler, build tools and CMake.
For Ubuntu/Debian systems:
sudo apt update
sudo apt install -y git build-essential cmake
For Fedora:
sudo dnf install git gcc gcc-c++ make cmake
For Arch Linux:
sudo pacman -S git base-devel cmake
Now verify your above installations:
gcc --version
cmake --version
git --version
Step 2: Clone the Llama.cpp Repository
Create a build directory and compile in that directory:
mkdir build
cd build
cmake ..
cmake --build . --config Release
After building binaries will be available in:
./build/bin/
Step 4: Verify your Installation:
./build/bin/llama-cli --help
After running the above command you will now see all the available runtime options.
Linux Performance Enhancements
Linux provides the widest range of optimization options for llama.cpp:
Enable OpenBLAS CPU Acceleration
BLAS libraries improve matrix multiplication performance on CPUs
Install OpenBLAS:
sudo apt install libopenblas-dev # Ubuntu/Debian
Build with BLAS:
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release
Nvidia GPU Support with CUDA:
If you have an Nvidia GPU, you can significantly accelerate inference with the help of the CUDA Toolkit.
Build with CUDA:
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
Verify GPU Usage:
nvidia-smi
Vulkan Support (Cross-Platform GPU)
For broader GPU compatibility you can use Vulkan.
Install Vulkan SDK:
sudo apt install vulkan-tools libvulkan-dev
Build with Vulkan:
cmake -B build -DGGML_VULKAN=ON
cmake --build build
AMD GPU Support with ROCm
On supported AMD hardware, ROCm can be used for acceleration.
Build with ROCm:
cmake -B build -DGGML_HIPBLAS=ON
cmake --build build
How To Install Llama.cpp on Windows
Llama.cpp also works on Windows PCs and there are several options you can use to Install it. This section covers:
1: Native build using Visual Studio
2: Linux-compatible setup using WSL
3: Alternative builds using MSYS2
Option 1: Install Llama.cpp using Visual Studio
Step 1: Install the required software
1: Download and Install Microsoft Visual Studio
2: Make sure you have “Desktop development with C++” workload selected
The above will make sure you have CMake integration, Windows SDK and the MSVC compiler.
Step 2: Install Git and CMake
1: Download Git
2: Download CMake
Step 3: Clone the repository and build llama.cpp
1: Run the below command to clone the official repository or alternatively get it from the download page.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
2: Now build llama.cpp:
mkdir build
cd build
cmake ..
cmake --build . --config Release
Once the build has completed, you will find your executable files usually in:
.\build\bin\Release\
Option 2: Install Llama.cpp via WSL
Step 1: Install WSL and a Linux distribution
To Install WSL, run the following command in powershell:
wsl --install
Now install any Linux distribution from the Microsoft Store, Ubuntu is perfect.
Step 2: Run an update and install git and cmake
sudo apt update
sudo apt install git build-essential cmake
Step 3: Now clone and build llama.cpp
You can either clone llama.cpp or also get it from the download page.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build
Note: WSL is ideal if you have preference for Linux tooling and already have some Linux workflows in place.
Option 3: Install with MSYS2
This option is mainly for advanced users. MSYS2 provides a Unix-like environment for you on Windows.
Step 1: Download and Install MSYS2, follow the instructions on the installer.
Once you have installed MSYS2, update the packages and install the build tools.
pacman -Syu
pacman -S mingw-w64-x86_64-gcc cmake git
Step 2: Build llama.cpp
1: Run the below command to clone the official repository or alternatively get it from our download page.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build
That’s it, you have now install llama.cpp!
Windows Tips for Llama.cpp
Use Windows-style file paths:
models\model.gguf
Or quote the file paths with spaces:
-m "C:\Users\YourUserName\models\model.gguf"
Antivirus Issues:
Some antivirus software may slow down your execution. If you are seeing any performance issues, try adding exclusions for the build directory and ensure the binaries are not sandboxed.
Summary
Llama.cpp runs on all major platforms including Windows, Linux and macOS. Download and Installing llama.cpp on any of these platforms is not hard and pretty much a straightforward process. We have shown you all the build tools required to do it, cloning the repository, providing the installation steps and verifying it at the end. If you have any more questions, please do check our FAQ section.