
Split Inference

This project implements Split Inference for YOLOv11 to enable real-time object detection on low-power edge devices (Jetson Nano) by dividing the neural network across multiple machines.

Instead of transmitting full video frames, the edge device executes the first part of the model (head) and sends only intermediate feature maps to another device that runs the remaining layers (tail).




Overview

In traditional edge AI pipelines, raw video frames are transmitted to a centralized server for processing. This creates high network bandwidth usage and latency.

Split inference solves this by dividing the neural network into two parts:

  1. Head (Edge Device) – processes the early layers of the model.
  2. Tail (Server / Cloud) – processes the remaining layers.

Only intermediate feature maps are transmitted instead of full images, reducing bandwidth and improving scalability.
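The head/tail idea can be sketched with a toy model: a stack of layer functions is split at a cut index, and composing the head and tail reproduces the full forward pass. The layers below are hypothetical stand-ins, not real YOLOv11 layers.

```python
import functools
import numpy as np

# Toy "network": an ordered list of layer functions (hypothetical
# stand-ins for the real YOLOv11 layers).
LAYERS = [
    lambda x: np.maximum(x, 0.0),  # ReLU-like activation
    lambda x: x * 2.0,             # scaling
    lambda x: x + 1.0,             # bias
]

def run(layers, x):
    """Apply layers in order."""
    return functools.reduce(lambda v, f: f(v), layers, x)

def split(layers, cut):
    """Head runs layers[:cut] on the edge; tail runs layers[cut:] on the server."""
    return layers[:cut], layers[cut:]

x = np.array([-1.0, 0.5, 2.0])
head, tail = split(LAYERS, cut=1)
feature_map = run(head, x)   # this is what crosses the network
out = run(tail, feature_map)
assert np.allclose(out, run(LAYERS, x))
```

Whatever the cut index, head followed by tail is equivalent to the unsplit model; only the size of the transmitted feature map changes.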


Architecture

The system consists of the following components.

Stage 1 – Edge Device (Head)

Devices located at the edge, such as traffic cameras or embedded boards (e.g., Jetson Nano).

Responsibilities:

  • Capture video frames
  • Run the first layers of YOLOv11
  • Compress intermediate feature maps using quantization
  • Send feature maps to the network
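A rough, illustrative calculation of the bandwidth saving (the actual feature-map shape depends on the model and the chosen cut layer; 80×80×64 here is a hypothetical example):

```python
# Raw 640x640 RGB frame, one byte per channel.
frame_bytes = 640 * 640 * 3     # 1_228_800 bytes

# Hypothetical intermediate feature map, quantized to 8 bits per value.
fmap_bytes = 80 * 80 * 64       # 409_600 bytes

ratio = frame_bytes / fmap_bytes  # 3.0x less data on the wire
print(frame_bytes, fmap_bytes, ratio)
```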

Stage 2 – Server / Cloud Device (Tail)

Devices located in the cloud or high-performance servers.

Responsibilities:

  • Receive feature maps from edge devices
  • Run the remaining layers of the neural network
  • Produce final detection results

Server – Controller

Central coordination service responsible for:

  • Registering clients
  • Selecting model cut-layers
  • Managing inference workflow
  • Coordinating communication using RabbitMQ


Pipeline

Pipeline steps:

  1. Clients register with the server.
  2. Server collects device information.
  3. The model is split and inference begins.
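The registration step can be pictured as a small JSON payload sent to the server. This is a hypothetical schema for illustration; the project's actual wire format lives in src/.

```python
import json
import platform

def build_register_message(client_id, layer_id):
    # Hypothetical registration payload published to the server's queue.
    return json.dumps({
        "action": "REGISTER",
        "client_id": client_id,
        "layer_id": layer_id,          # 1 = head (edge), 2 = tail
        "device": platform.machine(),  # e.g. "aarch64" on Jetson Nano
    })

msg = build_register_message(client_id=0, layer_id=1)
print(msg)
```

The server collects these messages from all expected clients before choosing the cut layer and starting inference.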

Project Structure

split_inference/
│
├── client.py          # Edge or tail inference node
├── server.py          # Central controller
├── config.yaml        # System configuration
├── requirements.txt   # Python dependencies
│
├── imgs/              # Images used in README
│   ├── overview.png
│   └── SI-Inference.jpg
│
├── src/               # Core framework modules
└── output.csv         # Performance results

How to Run

1. Clone the repository

git clone https://github.com/filrg/split_inference
cd split_inference

2. Install dependencies

Python 3.8 or higher is required.

pip install -r requirements.txt

3. Start RabbitMQ

RabbitMQ is used for communication between distributed components.
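One common way to bring up a local broker, assuming Docker is installed (the management image also serves the admin UI on port 15672):

```shell
docker run -d --name rabbitmq \
  -p 5672:5672 -p 15672:15672 \
  rabbitmq:3-management
```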

RabbitMQ admin interface:

http://localhost:15672

Default credentials:

username: guest
password: guest

Configuration

Edit config.yaml before running the system.

Example configuration:

name: YOLO
server:
  cut-layer: a # or b, c, d
  clients:
    - 1
    - 1
  model: yolo26n
  batch-size: 5
rabbit:
  address: 127.0.0.1
  username: guest
  password: guest
  virtual-host: /

debug-mode: False
data: videos/video.mp4
log-path: .
control-count: 1
compress:
  enable: True
  num_bit: 8
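A client or server would typically load this file with a YAML parser such as PyYAML (a sketch; the project's own loader in src/ may differ). Note that YAML parses `True`/`False` into real booleans:

```python
import yaml

# Inline fragment of the config above, for illustration.
raw = """
server:
  cut-layer: a
  batch-size: 5
compress:
  enable: True
  num_bit: 8
"""

cfg = yaml.safe_load(raw)
assert cfg["server"]["batch-size"] == 5
assert cfg["compress"]["enable"] is True  # parsed as a boolean, not a string
```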

Feature map compression:

compress:
  enable: True
  num_bit: 8
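The `num_bit: 8` setting points at uniform min-max quantization. A minimal sketch of what such a scheme could look like (the project's actual codec is in src/ and may differ):

```python
import numpy as np

def quantize(fmap, num_bit=8):
    """Uniform min-max quantization of a float feature map to num_bit integers."""
    lo, hi = float(fmap.min()), float(fmap.max())
    scale = (hi - lo) / (2 ** num_bit - 1)
    if scale == 0.0:
        scale = 1.0  # constant map: avoid division by zero
    q = np.round((fmap - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize(q, lo, scale):
    """Reconstruct an approximate float feature map on the tail side."""
    return q.astype(np.float32) * scale + lo

fmap = np.random.randn(4, 4).astype(np.float32)
q, lo, scale = quantize(fmap)
rec = dequantize(q, lo, scale)
# Rounding error of uniform quantization is bounded by half a step.
assert np.abs(rec - fmap).max() <= scale / 2 + 1e-6
```

The edge sends `q` (one byte per value) plus the two scalars `lo` and `scale`; the tail dequantizes before running the remaining layers.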

Running the System

Step 1 – Start Server

python server.py

Step 2 – Start Clients

Edge device:

python client.py --layer_id 1

Optional CPU mode:

python client.py --layer_id 1 --device cpu

Tail device:

python client.py --layer_id 2

Tested Hardware

Device           | Role
---------------- | ----------------------
Jetson Nano      | Edge Client (Head)
Jetson Nano      | Tail Client
Laptop / Desktop | Tracker
LAN Network      | RabbitMQ communication

Application Scenarios

  • Smart traffic monitoring
  • Edge surveillance AI
  • Distributed deep learning research
  • Bandwidth reduction experiments

License

See LICENSE
