This project implements Split Inference for YOLOv11 to enable real-time object detection on low-power edge devices (Jetson Nano) by dividing the neural network across multiple machines.
Instead of transmitting full video frames, the edge device executes the first part of the model (head) and sends only intermediate feature maps to another device that runs the remaining layers (tail).
In traditional edge AI pipelines, raw video frames are transmitted to a centralized server for processing. This creates high network bandwidth usage and latency.
Split inference solves this by dividing the neural network into two parts:
- Head (Edge Device) – processes the early layers of the model.
- Tail (Server / Cloud) – processes the remaining layers.
Only intermediate feature maps are transmitted instead of full images, reducing bandwidth and improving scalability.
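As a minimal sketch of the idea (assuming a PyTorch-style sequential model; the project's actual YOLOv11 split logic is not shown here), dividing a network into head and tail at a cut index looks like this:

```python
# Sketch only: splitting a toy sequential model at a chosen cut index.
# The head runs on the edge device, the tail on the server.
import torch
import torch.nn as nn

layers = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
)

cut = 2                # hypothetical cut-layer index
head = layers[:cut]    # executed on the edge device
tail = layers[cut:]    # executed on the server

x = torch.randn(1, 3, 64, 64)  # stand-in for a captured frame
feature_map = head(x)          # only this tensor crosses the network
out = tail(feature_map)        # final result, computed server-side

# splitting does not change the result of the forward pass
assert torch.allclose(out, layers(x))
```

Because only `feature_map` is serialized and sent, the wire payload depends on the cut point rather than on the input resolution alone.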
The system consists of the following main components.

**Edge Clients (Head)** — devices located at the edge, such as traffic cameras or embedded devices (Jetson Nano).
Responsibilities:
- Capture video frames
- Run the first layers of YOLOv11
- Compress intermediate feature maps using quantization
- Send feature maps to the network
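A possible shape for the quantization step above (a uniform affine scheme is sketched here; the project's exact codec may differ):

```python
# Sketch (assumed scheme): uniform 8-bit quantization of a float32
# feature map before transmission. Assumes num_bit <= 8 (uint8 storage).
import numpy as np

def quantize(fmap: np.ndarray, num_bit: int = 8):
    lo, hi = float(fmap.min()), float(fmap.max())
    scale = (hi - lo) / (2 ** num_bit - 1) or 1.0  # avoid /0 on flat maps
    q = np.round((fmap - lo) / scale).astype(np.uint8)
    return q, lo, scale  # metadata needed for dequantization

def dequantize(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale + lo

fmap = np.random.randn(32, 80, 80).astype(np.float32)
q, lo, scale = quantize(fmap)
restored = dequantize(q, lo, scale)

assert q.nbytes == fmap.nbytes // 4          # 4x smaller than float32
assert np.abs(restored - fmap).max() <= scale / 2 + 1e-6  # bounded error
```

The 4x reduction comes purely from storing one byte per value instead of four; the rounding error is bounded by half a quantization step.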
**Tail Clients (Server / Cloud)** — devices located in the cloud or on high-performance servers.
Responsibilities:
- Receive feature maps from edge devices
- Run the remaining layers of the neural network
- Produce final detection results
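On the receiving side, the tail must undo the edge's quantization before resuming the forward pass. A sketch of that decode step, assuming the edge also transmits the tensor shape and the (`lo`, `scale`) metadata alongside the payload:

```python
# Sketch (assumed wire format): rebuild the float32 feature map from
# 8-bit payload bytes plus (shape, lo, scale) metadata.
import numpy as np

def decode(payload: bytes, shape, lo: float, scale: float) -> np.ndarray:
    q = np.frombuffer(payload, dtype=np.uint8).reshape(shape)
    return q.astype(np.float32) * scale + lo

# round-trip check against a matching encoder on the edge side
fmap = np.random.rand(32, 20, 20).astype(np.float32)
lo, hi = float(fmap.min()), float(fmap.max())
scale = (hi - lo) / 255
payload = np.round((fmap - lo) / scale).astype(np.uint8).tobytes()

restored = decode(payload, fmap.shape, lo, scale)
assert np.abs(restored - fmap).max() <= scale / 2 + 1e-6
```

The restored tensor is then fed into the remaining layers to produce the final detections.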
**Central Controller** — coordination service responsible for:
- Registering clients
- Selecting model cut-layers
- Managing inference workflow
- Coordinating communication using RabbitMQ
Pipeline steps:
1. Clients register with the server.
2. The server collects device information.
3. The model is split and inference begins.
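The registration flow above can be sketched as follows (hypothetical logic and message shapes for illustration, not the project's actual controller code):

```python
# Sketch: the controller waits until the expected number of clients has
# registered, then maps the configured cut-layer name to a split point
# and broadcasts a start message. Names and indices are assumptions.
CUT_LAYERS = {"a": 4, "b": 9, "c": 16, "d": 22}  # hypothetical mapping

class Controller:
    def __init__(self, expected_clients: int, cut_layer: str):
        self.expected = expected_clients
        self.cut = CUT_LAYERS[cut_layer]
        self.clients = []

    def register(self, client_id: str, role: str):
        """Record a client; start inference once everyone has joined."""
        self.clients.append({"id": client_id, "role": role})
        if len(self.clients) == self.expected:
            return {"action": "START", "cut_layer": self.cut}
        return None  # still waiting for more registrations

ctrl = Controller(expected_clients=2, cut_layer="a")
assert ctrl.register("edge-0", "head") is None
assert ctrl.register("tail-0", "tail") == {"action": "START", "cut_layer": 4}
```

In the real system these messages would travel over RabbitMQ queues rather than direct method calls.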
```
split_inference/
│
├── client.py          # Edge or tail inference node
├── server.py          # Central controller
├── config.yaml        # System configuration
├── requirements.txt   # Python dependencies
│
├── imgs/              # Images used in README
│   ├── overview.png
│   └── SI-Inference.jpg
│
├── src/               # Core framework modules
└── output.csv         # Performance results
```
Clone the repository:

```
git clone https://github.com/filrg/split_inference
cd split_inference
```

Python 3.8 or higher is required. Install the Python dependencies:

```
pip install -r requirements.txt
```

RabbitMQ is used for communication between distributed components.
RabbitMQ admin interface: http://localhost:15672

Default credentials:

```
username: guest
password: guest
```
Edit config.yaml before running the system.

Example configuration:

```yaml
name: YOLO
server:
  cut-layer: a  # or b, c, d
  clients:
    - 1
    - 1
  model: yolo26n
  batch-size: 5
rabbit:
  address: 127.0.0.1
  username: guest
  password: guest
  virtual-host: /
debug-mode: False
data: videos/video.mp4
log-path: .
control-count: 1
compress:
  enable: True
  num_bit: 8
```

Feature map compression:
```yaml
compress:
  enable: True
  num_bit: 8
```

Start the central controller:

```
python server.py
```

Edge device:
```
python client.py --layer_id 1
```

Optional CPU mode:
```
python client.py --layer_id 1 --device cpu
```

Tail device:
```
python client.py --layer_id 2
```

| Device | Role |
|---|---|
| Jetson Nano | Edge Client (Head) |
| Jetson Nano | Tail Client |
| Laptop / Desktop | Tracker |
| LAN Network | RabbitMQ communication |
- Smart traffic monitoring
- Edge surveillance AI
- Distributed deep learning research
- Bandwidth reduction experiments
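For bandwidth experiments, note that the cut-layer choice dominates the payload size: a shallow cut can produce a feature map larger than the raw frame it replaces. Illustrative arithmetic with hypothetical tensor shapes (not measured results):

```python
# Back-of-envelope payload sizes per frame (illustrative shapes only).
raw = 640 * 480 * 3        # 8-bit RGB frame: 921,600 bytes
shallow = 64 * 160 * 120   # early-cut feature map: 1,228,800 values
deep = 256 * 40 * 30       # late-cut feature map: 307,200 values

# float32 features cost 4 bytes/value; 8-bit quantization costs 1
assert shallow * 4 > raw   # unquantized shallow cut is far larger than raw
assert shallow > raw       # even at 8 bits, a shallow cut can exceed raw
assert deep < raw          # deeper cuts actually save bandwidth
```

This is why the system exposes the cut-layer as a configuration choice rather than fixing it.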
See the LICENSE file.

