
Split Inference

This project implements Split Inference for YOLOv11 to enable real-time object detection on low-power edge devices (Jetson Nano) by dividing the neural network across multiple machines.

Instead of transmitting full video frames, the edge device executes the first part of the model (head) and sends only intermediate feature maps to another device that runs the remaining layers (tail).




Overview

In traditional edge AI pipelines, raw video frames are transmitted to a centralized server for processing. This creates high network bandwidth usage and latency.

Split inference solves this by dividing the neural network into two parts:

  1. Head (Edge Device) – processes the early layers of the model.
  2. Tail (Server / Cloud) – processes the remaining layers.

Only intermediate feature maps are transmitted instead of full images, reducing bandwidth and improving scalability.
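The head/tail idea can be sketched with a toy model: a stack of layer functions is split at a cut index, and composing the head and tail reproduces the full forward pass. The layers below are hypothetical stand-ins, not real YOLOv11 layers.

```python
import functools
import numpy as np

# Toy "network": an ordered list of layer functions (hypothetical
# stand-ins for the real YOLOv11 layers).
LAYERS = [
    lambda x: np.maximum(x, 0.0),  # ReLU-like activation
    lambda x: x * 2.0,             # scaling
    lambda x: x + 1.0,             # bias
]

def run(layers, x):
    """Apply layers in order."""
    return functools.reduce(lambda v, f: f(v), layers, x)

def split(layers, cut):
    """Head runs layers[:cut] on the edge; tail runs layers[cut:] on the server."""
    return layers[:cut], layers[cut:]

x = np.array([-1.0, 0.5, 2.0])
head, tail = split(LAYERS, cut=1)
feature_map = run(head, x)   # this is what crosses the network
out = run(tail, feature_map)
assert np.allclose(out, run(LAYERS, x))
```

Whatever the cut index, head followed by tail is equivalent to the unsplit model; only the size of the transmitted feature map changes.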


Architecture

The system consists of the following components.

Stage 1 – Edge Device (Head)

Devices located at the edge, such as traffic cameras or embedded boards (e.g., Jetson Nano).

Responsibilities:

  • Capture video frames
  • Run the first layers of YOLOv11
  • Compress intermediate feature maps using quantization
  • Send feature maps to the network
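A rough, illustrative calculation of the bandwidth saving (the actual feature-map shape depends on the model and the chosen cut layer; 80×80×64 here is a hypothetical example):

```python
# Raw 640x640 RGB frame, one byte per channel.
frame_bytes = 640 * 640 * 3     # 1_228_800 bytes

# Hypothetical intermediate feature map, quantized to 8 bits per value.
fmap_bytes = 80 * 80 * 64       # 409_600 bytes

ratio = frame_bytes / fmap_bytes  # 3.0x less data on the wire
print(frame_bytes, fmap_bytes, ratio)
```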

Stage 2 – Server / Cloud Device (Tail)

Devices located in the cloud or high-performance servers.

Responsibilities:

  • Receive feature maps from edge devices
  • Run the remaining layers of the neural network
  • Produce final detection results

Server – Controller

Central coordination service responsible for:

  • Registering clients
  • Selecting model cut-layers
  • Managing inference workflow
  • Coordinating communication using RabbitMQ


Pipeline

Pipeline steps:

  1. Clients register with the server.
  2. Server collects device information.
  3. The model is split and inference begins.
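The registration step can be pictured as a small JSON payload sent to the server. This is a hypothetical schema for illustration; the project's actual wire format lives in src/.

```python
import json
import platform

def build_register_message(client_id, layer_id):
    # Hypothetical registration payload published to the server's queue.
    return json.dumps({
        "action": "REGISTER",
        "client_id": client_id,
        "layer_id": layer_id,          # 1 = head (edge), 2 = tail
        "device": platform.machine(),  # e.g. "aarch64" on Jetson Nano
    })

msg = build_register_message(client_id=0, layer_id=1)
print(msg)
```

The server collects these messages from all expected clients before choosing the cut layer and starting inference.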

Project Structure

split_inference/
│
├── client.py          # Edge or tail inference node
├── server.py          # Central controller
├── config.yaml        # System configuration
├── requirements.txt   # Python dependencies
│
├── imgs/              # Images used in README
│   ├── overview.png
│   └── SI-Inference.jpg
│
├── src/               # Core framework modules
└── output.csv         # Performance results

How to Run

1. Clone the repository

git clone https://github.com/filrg/split_inference
cd split_inference

2. Install dependencies

Python 3.8 or higher is required.

pip install -r requirements.txt

3. Start RabbitMQ

RabbitMQ is used for communication between distributed components.
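One common way to bring up a local broker, assuming Docker is installed (the management image also serves the admin UI on port 15672):

```shell
docker run -d --name rabbitmq \
  -p 5672:5672 -p 15672:15672 \
  rabbitmq:3-management
```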

RabbitMQ admin interface:

http://localhost:15672

Default credentials:

username: guest
password: guest

Configuration

Edit config.yaml before running the system.

Example configuration:

name: YOLO
server:
  cut-layer: a # or b, c, d
  clients:
    - 1
    - 1
  model: yolo26n
  batch-size: 5
rabbit:
  address: 127.0.0.1
  username: guest
  password: guest
  virtual-host: /

debug-mode: False
data: videos/video.mp4
log-path: .
control-count: 1
compress:
  enable: True
  num_bit: 8
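A client or server would typically load this file with a YAML parser such as PyYAML (a sketch; the project's own loader in src/ may differ). Note that YAML parses `True`/`False` into real booleans:

```python
import yaml

# Inline fragment of the config above, for illustration.
raw = """
server:
  cut-layer: a
  batch-size: 5
compress:
  enable: True
  num_bit: 8
"""

cfg = yaml.safe_load(raw)
assert cfg["server"]["batch-size"] == 5
assert cfg["compress"]["enable"] is True  # parsed as a boolean, not a string
```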

Feature map compression:

compress:
  enable: True
  num_bit: 8
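The `num_bit: 8` setting points at uniform min-max quantization. A minimal sketch of what such a scheme could look like (the project's actual codec is in src/ and may differ):

```python
import numpy as np

def quantize(fmap, num_bit=8):
    """Uniform min-max quantization of a float feature map to num_bit integers."""
    lo, hi = float(fmap.min()), float(fmap.max())
    scale = (hi - lo) / (2 ** num_bit - 1)
    if scale == 0.0:
        scale = 1.0  # constant map: avoid division by zero
    q = np.round((fmap - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize(q, lo, scale):
    """Reconstruct an approximate float feature map on the tail side."""
    return q.astype(np.float32) * scale + lo

fmap = np.random.randn(4, 4).astype(np.float32)
q, lo, scale = quantize(fmap)
rec = dequantize(q, lo, scale)
# Rounding error of uniform quantization is bounded by half a step.
assert np.abs(rec - fmap).max() <= scale / 2 + 1e-6
```

The edge sends `q` (one byte per value) plus the two scalars `lo` and `scale`; the tail dequantizes before running the remaining layers.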

Running the System

Step 1 – Start Server

python server.py

Step 2 – Start Clients

Edge device:

python client.py --layer_id 1

Optional CPU mode:

python client.py --layer_id 1 --device cpu

Tail device:

python client.py --layer_id 2

Tested Hardware

Device           | Role
---------------- | ----------------------
Jetson Nano      | Edge Client (Head)
Jetson Nano      | Tail Client
Laptop / Desktop | Tracker
LAN Network      | RabbitMQ communication

Application Scenarios

  • Smart traffic monitoring
  • Edge surveillance AI
  • Distributed deep learning research
  • Bandwidth reduction experiments

License

See LICENSE
