Getting Started with LLaMA.cpp (Complete Installation Guide)

Llama.cpp is a high-performance C/C++ implementation for running large language models (LLMs) locally. It focuses on efficient inference on consumer hardware, so you can run models on CPUs and GPUs without large cloud infrastructure.

Who is this guide for?

Llama.cpp is not difficult to download and install. This guide walks you through everything you need to download, install and set up llama.cpp on macOS, Linux and Windows. It is suitable for all technical levels, although some familiarity with command-line tools will help. It is designed for anyone who wants to run large language models locally, whether for experimentation, development or production use.


Developers and Engineers

If you are a software developer or engineer who wants to integrate AI into applications without relying on cloud services, this guide will help you build llama.cpp from source on different platforms so you can run models locally for development and testing. You can also use the built-in server to create API-driven applications and integrate with tools such as llama-cpp-python. This is a good fit if you are building AI-powered web apps or local-first tools, or if you want to keep your prompts private.

AI Enthusiasts and Hobbyists

If you want to explore local AI for personal use, this guide will walk you through setting up your first model, running prompts and holding interactive chats. You will also experiment with different models and quantizations and learn the performance trade-offs on your hardware.

Researchers and Experimenters

If you are exploring local AI and want to set up your first model, run prompts and have an interactive chat, this is the guide for you. You can run different models, experiment with quantizations and see the performance trade-offs on your hardware. You don’t need machine learning knowledge, just a willingness to work in the terminal. That low barrier to entry is a big part of its appeal.

Privacy-focused Users

Running models locally means you stay in control of your data. No data is sent to external APIs, you can work fully offline with your favorite LLM, and sensitive data stays on your machine. Llama.cpp is also free, open source and supports many models. These are only a few of its features.

Prerequisites

Before installing Llama.cpp and following this guide, make sure you meet the following requirements:

Basic Requirements

  1. Git, so you can clone the repository
  2. CMake 3.18 or higher
  3. A C/C++ compiler
  4. At least 4 GB of RAM so you can run small models

Optional requirements for Hardware Acceleration

  1. Metal on Apple Silicon
  2. CUDA on Nvidia GPUs
  3. OpenBLAS or BLAS on CPUs
  4. Vulkan
  5. CLBlast

For the full list of requirements, see the Requirements section on the homepage.
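The basic requirements above can be checked from a terminal before you start. The following is a minimal sketch, not something shipped with llama.cpp: the helper name `check_tool` is hypothetical, and it only tests whether a command is on your PATH, not which version it is.

```shell
#!/bin/sh
# Hypothetical helper (not part of llama.cpp): succeeds if a command is on PATH.
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the build tools this guide relies on.
for tool in git cmake; do
    if check_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done

# Any one of these C/C++ compiler drivers is enough.
if check_tool cc || check_tool gcc || check_tool clang; then
    echo "found:   C/C++ compiler"
else
    echo "missing: C/C++ compiler"
fi
```

Anything reported as missing is installed in the platform-specific steps that follow.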

How To Install Llama.cpp on your Mac

Installing Llama.cpp on macOS is straightforward and works on both Intel Macs and Apple Silicon. This section walks you through all the steps needed to get going on your Mac, including the prerequisites, your first build and optional GPU acceleration using Apple’s Metal framework.

Step 1: Install Required Tools

macOS does not include all the required build tools by default so you will need to install them first.

Install Xcode Command Line Tools

This provides essential compilers like clang:

xcode-select --install

Verify your installation with:

clang --version

Install Homebrew (If you don’t already have it)

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then install all the required dependencies:

brew install git cmake

Alternatively, if you would rather not build from source, Homebrew can install llama.cpp itself with one command:

brew install llama.cpp

Verify that Git and CMake are installed with:

git --version
cmake --version
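If you want to compare the installed CMake version against the 3.18 mentioned in the prerequisites, a small script can do it. This is a sketch under assumptions: `version_ge` is a hypothetical helper, and it relies on your `sort` supporting the `-V` (version sort) option, which GNU coreutils and recent macOS releases provide.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

if command -v cmake >/dev/null 2>&1; then
    # The first line of output looks like "cmake version 3.28.1".
    installed=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$installed" 3.18; then
        echo "CMake $installed is new enough"
    else
        echo "CMake $installed is older than 3.18"
    fi
else
    echo "cmake not found on PATH"
fi
```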

Step 2: Clone the Llama.cpp Repository

Clone the latest version of llama.cpp, or alternatively get it from the download page, and go into the “llama.cpp” directory:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

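Step 3: Build llama.cpp

Run the following from the llama.cpp directory so that the later commands, which use paths relative to that directory, work as written:

cmake -B build
cmake --build build --config Release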

After the build is complete, the binaries will be located in the following directory:

./build/bin/
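The build can take a while on a single core. CMake's `--build` mode accepts `-j` to compile in parallel, and the sketch below picks a job count from the number of CPU cores. The helper names `cpu_count` and `build_llama` are hypothetical, and the script assumes it is run from the llama.cpp repository root.

```shell
#!/bin/sh
# Hypothetical helper: print the number of logical CPU cores.
cpu_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc                                      # Linux (GNU coreutils)
    elif command -v sysctl >/dev/null 2>&1; then
        sysctl -n hw.ncpu 2>/dev/null || echo 1    # macOS
    else
        echo 1                                     # conservative fallback
    fi
}

# Hypothetical helper: configure and build with one job per core.
# Call it from the llama.cpp repository root.
build_llama() {
    cmake -B build &&
    cmake --build build --config Release -j "$(cpu_count)"
}

echo "A parallel build would use $(cpu_count) jobs"
```

Calling `build_llama` from the repository root then runs the same two CMake commands as above, only with parallel compilation.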

Step 4: Enable Apple Metal Acceleration

If your Mac has an Apple Silicon chip (M1 or newer), Metal GPU acceleration can noticeably improve performance. Recent llama.cpp versions enable Metal by default on macOS, so the explicit flag below mainly matters if your build has it turned off.

Build with Metal enabled:

cmake -B build -DGGML_METAL=ON
cmake --build build --config Release

This allows llama.cpp to use the integrated Apple GPU for faster inference.

Step 5: Verify Installation and Run Your First Model

Run the help command below to ensure llama.cpp is responding correctly:

./build/bin/llama-cli --help

If installed correctly, you’ll see a list of available options and commands.

Now, to run your first model, place a GGUF model inside a “models” directory and then run:

./build/bin/llama-cli -m models/model.gguf -p "Hello, does a computer work?"

If everything is set up correctly, the model will begin generating output in your terminal.
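If the model fails to load, one quick thing to rule out is a wrong or incomplete download. GGUF files begin with the four ASCII bytes `GGUF`, so a rough sanity check is possible from the shell. This is only a sketch: `is_gguf` is a hypothetical helper, it checks the first four bytes and nothing else, and `models/model.gguf` is the same placeholder path used above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file starts with the 4-byte magic "GGUF".
# It does not validate the rest of the file.
is_gguf() {
    [ -f "$1" ] && [ "$(head -c 4 "$1" | tr -d '\0')" = "GGUF" ]
}

model="models/model.gguf"   # placeholder path from the command above
if is_gguf "$model"; then
    echo "$model looks like a GGUF file"
else
    echo "$model is missing or is not a GGUF file"
fi
```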

That’s all! You have now successfully installed llama.cpp on macOS.

How To Install Llama.cpp on Linux

Llama.cpp works on multiple Linux distributions and supports both CPU-only inference and GPU acceleration through the CUDA (Nvidia), ROCm (AMD) and Vulkan backends.

It runs on Ubuntu, Debian, Fedora and Arch Linux. Only the package manager commands differ; the overall process is the same.

Step 1: Install Required Dependencies

You will need Git, a compiler, build tools and CMake.

For Ubuntu/Debian systems:

sudo apt update
sudo apt install -y git build-essential cmake

For Fedora:

sudo dnf install git gcc gcc-c++ make cmake

For Arch Linux:

sudo pacman -S git base-devel cmake

Now verify the installations:

gcc --version
cmake --version
git --version
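If you are not sure which of the commands above applies to your machine, the distribution ID in `/etc/os-release` can be used to pick one. The sketch below only prints the matching command from this guide and does not install anything. `deps_install_cmd` is a hypothetical helper, and derivative distributions with other IDs are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print the dependency install command from this guide
# for a given distribution ID (the ID field of /etc/os-release).
deps_install_cmd() {
    case "$1" in
        ubuntu|debian) echo "sudo apt install -y git build-essential cmake" ;;
        fedora)        echo "sudo dnf install git gcc gcc-c++ make cmake" ;;
        arch)          echo "sudo pacman -S git base-devel cmake" ;;
        *)             return 1 ;;
    esac
}

# Read the distribution ID in a subshell so no variables leak into this script.
if [ -r /etc/os-release ]; then
    distro_id=$(. /etc/os-release && echo "$ID")
    if cmd=$(deps_install_cmd "$distro_id"); then
        echo "Detected $distro_id. Run: $cmd"
    else
        echo "No command listed in this guide for '$distro_id'"
    fi
fi
```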

Step 2: Clone the Llama.cpp Repository

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

Step 3: Build llama.cpp

Configure and compile from the llama.cpp directory:

cmake -B build
cmake --build build --config Release

After building, the binaries will be available in:

./build/bin/

Step 4: Verify your Installation

./build/bin/llama-cli --help

After running the above command you will now see all the available runtime options.
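A handful of those options cover most first experiments: `-m` (model file), `-p` (prompt), `-n` (maximum tokens to generate), `-c` (context size) and `-t` (CPU threads). The wrapper below is a sketch, not an official script. `run_llama` and the `LLAMA_BIN` variable are hypothetical names, and the default values 128, 2048 and 4 are arbitrary illustrations rather than recommendations.

```shell
#!/bin/sh
# Path to the binary built earlier; override it if your build lives elsewhere.
LLAMA_BIN="${LLAMA_BIN:-./build/bin/llama-cli}"

# Hypothetical wrapper around llama-cli with a few commonly used options:
#   -m  model file                 -p  prompt text
#   -n  max tokens to generate     -c  context size in tokens
#   -t  number of CPU threads
# Arguments: model, prompt, then optional -n, -c and -t values.
run_llama() {
    "$LLAMA_BIN" -m "$1" -p "$2" -n "${3:-128}" -c "${4:-2048}" -t "${5:-4}"
}

# Example call, guarded so the script does nothing until the build and model exist.
if [ -x "$LLAMA_BIN" ] && [ -f models/model.gguf ]; then
    run_llama models/model.gguf "Explain what a compiler does."
fi
```

Raising `-c` lets the model see more text at once but uses more memory, so it is worth starting small on machines close to the 4 GB minimum.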

Linux Performance Enhancements

Linux provides the widest range of optimization options for llama.cpp:

Enable OpenBLAS CPU Acceleration

BLAS libraries improve matrix multiplication performance on CPUs.

Install OpenBLAS:

sudo apt install libopenblas-dev   # Ubuntu/Debian

Build with BLAS:

cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release

Nvidia GPU Support with CUDA

If you have an Nvidia GPU, you can significantly accelerate inference with the help of the CUDA Toolkit.

Build with CUDA:

cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release

Verify GPU Usage:

nvidia-smi

Vulkan Support (Cross-Platform GPU)

For broader GPU compatibility you can use Vulkan.

Install the Vulkan tools and development packages (Ubuntu/Debian):

sudo apt install vulkan-tools libvulkan-dev

Build with Vulkan:

cmake -B build -DGGML_VULKAN=ON
cmake --build build

AMD GPU Support with ROCm

On supported AMD hardware, ROCm can be used for acceleration.

Build with ROCm (recent llama.cpp versions name this option GGML_HIP; older versions used GGML_HIPBLAS):

cmake -B build -DGGML_HIP=ON
cmake --build build
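To tie the backends in this section together, the sketch below maps a backend name to the CMake option shown above and makes a rough guess based on which tools are installed. Both helper names are hypothetical, the detection is deliberately crude (the presence of `nvidia-smi` or `vulkaninfo` does not prove the backend will build), and ROCm is left out because its setup varies more between systems.

```shell
#!/bin/sh
# Hypothetical helper: map a backend name to the CMake option used in this guide.
backend_flag() {
    case "$1" in
        cuda)   echo "-DGGML_CUDA=ON" ;;
        vulkan) echo "-DGGML_VULKAN=ON" ;;
        blas)   echo "-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS" ;;
        cpu)    echo "" ;;
        *)      return 1 ;;
    esac
}

# Hypothetical helper: rough guess at a backend from the tools on this machine.
detect_backend() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo cuda
    elif command -v vulkaninfo >/dev/null 2>&1; then
        echo vulkan
    else
        echo cpu
    fi
}

backend=$(detect_backend)
echo "Suggested configure command: cmake -B build $(backend_flag "$backend")"
```

The script only prints a suggestion, so you can review it before running the configure and build steps yourself.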

How To Install Llama.cpp on Windows

Llama.cpp also works on Windows PCs, and there are several ways to install it. This section covers:

1: Native build using Visual Studio

2: Linux-compatible setup using WSL

3: Alternative builds using MSYS2

Option 1: Install Llama.cpp using Visual Studio

Step 1: Install the required software

1: Download and Install Microsoft Visual Studio

2: Make sure you have “Desktop development with C++” workload selected

The above will make sure you have CMake integration, Windows SDK and the MSVC compiler.

Step 2: Install Git and CMake

1: Download Git

2: Download CMake

Step 3: Clone the repository and build llama.cpp

1: Run the command below to clone the official repository, or alternatively get it from the download page.

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

2: Now build llama.cpp from the llama.cpp directory:

cmake -B build
cmake --build build --config Release

Once the build has completed, you will find your executable files usually in:

.\build\bin\Release\

Option 2: Install Llama.cpp via WSL

Step 1: Install WSL and a Linux distribution

To install WSL, run the following command in PowerShell:

wsl --install

Then install a Linux distribution from the Microsoft Store if one was not set up for you. Ubuntu is a good choice.

Step 2: Update the package list and install Git and CMake

sudo apt update
sudo apt install git build-essential cmake

Step 3: Now clone and build llama.cpp

You can either clone llama.cpp or get it from the download page.

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

cmake -B build
cmake --build build

Note: WSL is ideal if you prefer Linux tooling or already have Linux workflows in place.
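Because a WSL terminal looks like any other Linux terminal, it can be handy to confirm where a script is running, for example before choosing between Windows and Linux paths. WSL kernels report a version string containing "microsoft", which the sketch below looks for. `is_wsl_kernel` is a hypothetical helper and the check is a common heuristic, not an official API.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a kernel version string looks like WSL.
is_wsl_kernel() {
    printf '%s\n' "$1" | grep -qi microsoft
}

# /proc/version holds the kernel version string on Linux.
if [ -r /proc/version ] && is_wsl_kernel "$(cat /proc/version)"; then
    echo "Running inside WSL"
else
    echo "Not running inside WSL"
fi
```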

Option 3: Install with MSYS2

This option is mainly for advanced users. MSYS2 provides a Unix-like environment on Windows.

Step 1: Download and install MSYS2, following the instructions in the installer.

Once MSYS2 is installed, update the packages and install the build tools:

pacman -Syu
pacman -S mingw-w64-x86_64-gcc cmake git

Step 2: Build llama.cpp

Run the commands below to clone the official repository (or alternatively get it from the download page) and build it:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

cmake -B build
cmake --build build

That’s it, you have now installed llama.cpp!

Windows Tips for Llama.cpp

Use Windows-style file paths:

models\model.gguf

Quote any file paths that contain spaces:

-m "C:\Users\YourUserName\models\model.gguf"
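The same quoting rule applies in the WSL and MSYS2 shells, where an unquoted path with spaces is split into several arguments. The sketch below makes the effect visible. `count_args` is a hypothetical helper and the path is made up; under WSL, Windows drives are normally mounted under `/mnt`.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments it received.
count_args() {
    echo "$#"
}

# Made-up model path containing spaces.
model="/mnt/c/Users/Your User Name/models/model.gguf"

count_args -m $model      # unquoted: the path is split at each space, prints 4
count_args -m "$model"    # quoted: the path stays one argument, prints 2
```

With the unquoted form, llama-cli would receive only the first fragment as the model path and treat the remaining fragments as separate arguments.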

Antivirus Issues:

Some antivirus software may slow down builds and model execution. If you see performance issues, try adding exclusions for the build directory and make sure the binaries are not sandboxed.

Summary

Llama.cpp runs on all major platforms, including Windows, Linux and macOS. Downloading and installing it on any of them is a straightforward process. This guide covered the required build tools, cloning the repository, building llama.cpp and verifying the installation. If you have more questions, please check our FAQ section.