Automated builds of llama.cpp's llama-server for Aegis-AI.
Produces pre-built binaries for all platforms, published as GitHub Releases tagged to match the upstream llama.cpp version (e.g. b8502). New releases are auto-detected weekly (Monday 04:00 UTC).
| Artifact | GPU | SM Targets |
|---|---|---|
| llama-server-{ver}-linux-x64-cpu.tar.gz | — | — |
| llama-server-{ver}-linux-x64-cuda-12.tar.gz | CUDA 12.8 | 75–120 |
| llama-server-{ver}-linux-x64-cuda-13.tar.gz | CUDA 13.1 | 75–120 |
| llama-server-{ver}-linux-x64-vulkan.tar.gz | Vulkan | — |
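As a sketch of how one of these artifacts would be fetched, the snippet below composes a release-asset URL. The repo path and asset naming are assumptions based on the standard GitHub Releases layout and the version-tag scheme described above.

```shell
# Compose the download URL for one artifact (repo path and naming
# are assumptions inferred from this README's release scheme).
VER=b8502
VARIANT=linux-x64-cuda-12
URL="https://github.com/SharpAI/llama-server-build/releases/download/${VER}/llama-server-${VER}-${VARIANT}.tar.gz"
echo "${URL}"
# Download and unpack:
#   curl -fL "${URL}" | tar -xz
```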
| Artifact | GPU |
|---|---|
| llama-server-{ver}-linux-arm64-cpu.tar.gz | — |
| llama-server-{ver}-linux-arm64-cuda-12.tar.gz | CUDA 12 |
| llama-server-{ver}-linux-arm64-cuda-13.tar.gz | CUDA 13 |
⚠️ Note on generic ARM64 builds: these are compiled with -march=armv9-a, which enables SVE instructions not available on Cortex-A72/A76/A78AE CPUs (Raspberry Pi 4/5, Jetson Orin, RK3588). If you see signal=SIGILL (exit code 132), use the embedded device builds below.
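To check in advance whether a board can run the generic builds, you can look for the sve flag in /proc/cpuinfo. A minimal sketch, assuming a Linux target; the helper name is made up:

```shell
# has_sve: report whether the running CPU advertises SVE support.
# On CPUs without SVE (RPi 4/5, Jetson Orin, RK3588), prefer the
# embedded device builds instead of the generic armv9-a ones.
has_sve() {
  grep -q -w sve /proc/cpuinfo 2>/dev/null
}

if has_sve; then
  echo "SVE available: generic arm64 builds should work"
else
  echo "No SVE: use an embedded device build"
fi
```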
Built with device-appropriate -march flags. CPU backend uses -march=armv8-a on all GPU-accelerated variants since inference runs on CUDA/Vulkan.
| Artifact | Device | Accel | CPU flags |
|---|---|---|---|
| llama-server-{ver}-arm64-jetson-orin-cuda-12.tar.gz | Jetson Orin Nano/NX/AGX | CUDA 12 | -march=armv8-a |
| llama-server-{ver}-arm64-jetson-xavier-cuda-11.tar.gz | Jetson Xavier NX/AGX | CUDA 11 | -march=armv8-a |
| llama-server-{ver}-arm64-rpi5-vulkan.tar.gz | Raspberry Pi 5 | Vulkan | -march=armv8-a |
| llama-server-{ver}-arm64-rk3588-vulkan.tar.gz | Rockchip RK3588 | Vulkan | -march=armv8-a |
| llama-server-{ver}-arm64-a76-vulkan.tar.gz | Orange Pi 5, Rock 5B | Vulkan | -march=armv8-a |
| llama-server-{ver}-arm64-rpi5-cpu.tar.gz | Raspberry Pi 5 | CPU | -mcpu=cortex-a76 |
| llama-server-{ver}-arm64-rpi4-cpu.tar.gz | Raspberry Pi 4 | CPU | -mcpu=cortex-a72 |
| llama-server-{ver}-arm64-modern-cpu.tar.gz | RPi5, RK3588, Jetson | CPU | -march=armv8.2-a+dotprod |
| llama-server-{ver}-arm64-safe-cpu.tar.gz | All ARM64 boards | CPU | -march=armv8-a |
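A hypothetical helper for picking a CPU-only profile from the board's device-tree model string. The function name and the /proc/device-tree/model probe are illustrative, not part of the build scripts:

```shell
# pick_cpu_profile: map a device-tree model string to one of the
# CPU-only profiles from the table above.
pick_cpu_profile() {
  case "$1" in
    *"Raspberry Pi 5"*)                           echo rpi5-cpu ;;
    *"Raspberry Pi 4"*)                           echo rpi4-cpu ;;
    *Orin*|*RK3588*|*"Rock 5"*|*"Orange Pi 5"*)   echo modern-cpu ;;
    *)                                            echo safe-cpu ;;  # armv8-a fallback
  esac
}

model=$( (cat /proc/device-tree/model 2>/dev/null || echo unknown) | tr -d '\0' )
echo "Suggested profile: $(pick_cpu_profile "$model")"
```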
| Artifact | GPU |
|---|---|
| llama-server-{ver}-windows-x64-cuda-12.zip | CUDA 12.4 |
| llama-server-{ver}-windows-x64-cuda-13.zip | CUDA 13.1 |
| llama-server-{ver}-windows-x64-vulkan.zip | Vulkan |
| llama-server-{ver}-windows-x64-cpu.zip | — |
| llama-server-{ver}-windows-arm64-cpu.zip | — |
| llama-server-{ver}-macos-arm64-metal.tar.gz | Metal |
| llama-server-{ver}-macos-x64-cpu.tar.gz | — |
The scripts/build-embedded.sh script lets you build any variant locally on your device.
If you already have a binary installed but it crashes with signal=SIGILL, use patch-cpu-lib to rebuild only libggml-cpu.so with safe flags and swap it in:

```shell
git clone https://github.com/SharpAI/llama-server-build.git
cd llama-server-build
# Replace the SVE-crashing libggml-cpu.so in your existing install:
./scripts/build-embedded.sh b8502 patch-cpu-lib \
  ~/.aegis-ai/llama_binaries/b8502/linux-arm64-cuda-12
# Takes ~5 minutes. Verifies automatically on completion.
```

```shell
# Single binary for Jetson Orin/Xavier, RPi 5, RK3588 (modern boards, no RPi4):
./scripts/build-embedded.sh b8502 modern-cpu
# Universal binary (all boards, including RPi 4):
./scripts/build-embedded.sh b8502 safe-cpu
# Jetson Orin with CUDA (run natively on the device):
./scripts/build-embedded.sh b8502 jetson-orin-cuda
# Raspberry Pi 5 with Vulkan:
./scripts/build-embedded.sh b8502 rpi5-vulkan
# Rockchip RK3588 with Vulkan (Mali-G610):
./scripts/build-embedded.sh b8502 rk3588-vulkan
# Raspberry Pi 4, CPU only:
./scripts/build-embedded.sh b8502 rpi4-cpu
```

All builds produce a tarball in ./dist/. Install into Aegis-AI:
```shell
VERSION=b8502
PROFILE=jetson-orin-cuda-12   # or rpi5-vulkan, rk3588-vulkan, etc.
mkdir -p ~/.aegis-ai/llama_binaries/${VERSION}/${PROFILE}/
tar -xzf dist/llama-server-${VERSION}-arm64-${PROFILE}.tar.gz \
  --strip-components=1 \
  -C ~/.aegis-ai/llama_binaries/${VERSION}/${PROFILE}/
```

On an x86_64 Linux host, install the aarch64 cross-toolchain and set CROSS_TRIPLE:

```shell
sudo apt install gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
CROSS_TRIPLE=aarch64-linux-gnu \
  ./scripts/build-embedded.sh b8502 safe-cpu
```

CUDA profiles require building natively on the target device.
- Weekly (Monday 04:00 UTC), the workflow checks the latest llama.cpp release
- If the repo doesn't have a matching release, it automatically builds all variants
- Binaries are published as a GitHub Release with the same version tag
- You can also manually trigger a build from the Actions tab
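The schedule above maps to a GitHub Actions trigger along these lines (a sketch; the actual workflow file in this repo may be laid out differently):

```yaml
on:
  schedule:
    - cron: "0 4 * * 1"    # Mondays 04:00 UTC
  workflow_dispatch: {}    # allows manual triggering from the Actions tab
```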
Aegis-AI's config/llama-binary-manifest.json contains url_template entries pointing to this repo's releases. The runtime binary manager downloads the appropriate variant when a user installs or upgrades the AI engine.
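A manifest entry might look roughly like the fragment below. Only the url_template key is named in this README; the surrounding schema, the variant key, and the placeholder names are assumptions for illustration.

```json
{
  "linux-arm64-cuda-12": {
    "url_template": "https://github.com/SharpAI/llama-server-build/releases/download/{version}/llama-server-{version}-linux-arm64-cuda-12.tar.gz"
  }
}
```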
The built binaries are subject to the llama.cpp license (MIT).