A Rust port of PySceneDetect: scene/shot cut detection built around a Sans-I/O streaming API, designed to slot into any frame source.
scenesdetect is a from-scratch Rust port of PySceneDetect. It is deliberately Sans-I/O: the crate never opens a file, decodes a packet, or spawns a thread. Callers hand frames in one by one, and each detector returns an Option<Timestamp> identifying the cut point — or nothing. Composing those point cuts into scene ranges is the caller's responsibility, which keeps this crate independent of any particular decoding pipeline.
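Because the crate reports only point cuts, the caller needs a little glue to turn them into scene ranges. The sketch below shows one way to do that. `Pts`, `Scene`, and `scenes_from_cuts` are hypothetical names chosen for illustration, not part of the scenesdetect API, and timestamps are simplified to raw integer pts in a single timebase.

```rust
/// Hypothetical stand-in for a presentation timestamp in one fixed timebase.
type Pts = i64;

/// A half-open scene range `[start, end)`, expressed in pts units.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Scene {
    start: Pts,
    end: Pts,
}

/// Turns a sorted list of cut points into contiguous scene ranges.
///
/// `first` is the pts of the first frame and `end` is the pts just past the
/// last frame. Cuts that do not fall strictly inside the remaining range
/// (for example a cut reported on the very first frame) are skipped.
fn scenes_from_cuts(first: Pts, end: Pts, cuts: &[Pts]) -> Vec<Scene> {
    let mut scenes = Vec::with_capacity(cuts.len() + 1);
    let mut start = first;
    for &cut in cuts {
        if cut > start && cut < end {
            scenes.push(Scene { start, end: cut });
            start = cut;
        }
    }
    // Close the final scene, unless the stream was empty.
    if start < end {
        scenes.push(Scene { start, end });
    }
    scenes
}
```

In a real pipeline the `cuts` slice would be filled by collecting every `Some(timestamp)` a detector returns while frames are fed in.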
Timestamps are represented as raw integer pts + Timebase (matching FFmpeg's AVRational) rather than floating-point seconds, so all arithmetic is exact and cross-stream comparisons are unambiguous.
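To illustrate why integer pts plus a rational timebase keeps comparisons exact, here is a minimal sketch of cross-timebase ordering by cross-multiplication. The struct layouts and the `compare` function are assumptions made for this example; the crate's actual `Timebase` and `Timestamp` types may be shaped differently.

```rust
use std::cmp::Ordering;

/// Hypothetical rational timebase: one tick lasts `num / den` seconds.
/// `den` is assumed to be non-zero.
#[derive(Debug, Clone, Copy)]
struct Timebase {
    num: u32,
    den: u32,
}

/// Hypothetical timestamp: a raw pts value plus the timebase it is counted in.
#[derive(Debug, Clone, Copy)]
struct Timestamp {
    pts: i64,
    timebase: Timebase,
}

/// Orders two timestamps by the instant they denote, without floating point.
///
/// a.pts * a.num / a.den  <=>  b.pts * b.num / b.den
/// is equivalent (for positive denominators) to
/// a.pts * a.num * b.den  <=>  b.pts * b.num * a.den
/// and an i64 times two u32 values always fits in an i128.
fn compare(a: Timestamp, b: Timestamp) -> Ordering {
    let lhs = a.pts as i128 * a.timebase.num as i128 * b.timebase.den as i128;
    let rhs = b.pts as i128 * b.timebase.num as i128 * a.timebase.den as i128;
    lhs.cmp(&rhs)
}
```

With this scheme, pts 90000 in a 1/90000 timebase and pts 1000 in a 1/1000 timebase compare as equal, since both denote exactly one second.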
| Module | Algorithm | Good for |
|---|---|---|
| `histogram` | YUV-luma histogram correlation | Generic cuts, robust to camera shake |
| `phash` | DCT-based perceptual hash (pHash) | Similarity-tolerant dedup / cut detection |
| `threshold` | Mean-brightness state machine | Fade-to-black / fade-in transitions |
| `content` | HSV-space delta + optional Canny edge delta | Motion/composition changes (the default PySceneDetect algorithm) |
| `adaptive` | Rolling-average wrapper over `content` | Suppresses false positives on sustained fast motion |
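As a rough illustration of the idea behind the `histogram` row, the sketch below computes the Pearson correlation between the luma histograms of two frames; a low score suggests a cut. The function names, the 256-bin layout, and the handling of flat histograms are assumptions made for this example and may not match the crate's implementation.

```rust
/// Builds a 256-bin histogram from 8-bit luma samples.
fn luma_histogram(luma: &[u8]) -> [u32; 256] {
    let mut bins = [0u32; 256];
    for &y in luma {
        bins[y as usize] += 1;
    }
    bins
}

/// Pearson correlation between two histograms, in the range [-1, 1].
///
/// If either histogram has zero variance the correlation is undefined;
/// this sketch reports 1.0 ("no change") in that case.
fn histogram_correlation(a: &[u32; 256], b: &[u32; 256]) -> f64 {
    let mean = |h: &[u32; 256]| h.iter().map(|&v| v as f64).sum::<f64>() / 256.0;
    let (mean_a, mean_b) = (mean(a), mean(b));

    let (mut cov, mut var_a, mut var_b) = (0.0, 0.0, 0.0);
    for (&x, &y) in a.iter().zip(b.iter()) {
        let (dx, dy) = (x as f64 - mean_a, y as f64 - mean_b);
        cov += dx * dy;
        var_a += dx * dx;
        var_b += dy * dy;
    }

    let denom = (var_a * var_b).sqrt();
    if denom == 0.0 {
        1.0
    } else {
        cov / denom
    }
}
```

A detector built on this would compare each frame's histogram with the previous one and report a cut when the correlation drops below a chosen threshold.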
- Sans-I/O streaming API: hand in `LumaFrame`/`RgbFrame`/`HsvFrame` (zero-copy slices), get `Option<Timestamp>` back per frame. No allocation on the hot path once the detector is primed.
- Hand-written SIMD backends: aarch64 NEON, x86 SSSE3 + AVX2 (runtime-dispatched via `is_x86_feature_detected!`), and wasm `simd128`. All have scalar fallbacks, toggleable per detector via `Options::with_simd(false)`.
- Exact rational timestamps: `Timebase` mirrors FFmpeg's `AVRational`; `Timestamp` compares semantically across timebases via i128 cross-multiply.
- `no_std` + `alloc`: the crate builds without `std`; enable the default `std` feature for runtime x86 feature detection.
- Optional `serde`: all `Options` types derive `Serialize`/`Deserialize` under the `serde` feature.
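The runtime dispatch pattern mentioned in the list can be sketched as follows. This example is deliberately simplified: rather than hand-written intrinsics, it compiles the same loop with AVX2 enabled via `#[target_feature]` and lets the compiler vectorize it. The function names and the `use_simd` flag are hypothetical; only `is_x86_feature_detected!` comes from the standard library.

```rust
/// Scalar reference: sum of all 8-bit samples.
fn sum_scalar(data: &[u8]) -> u64 {
    data.iter().map(|&v| v as u64).sum()
}

/// Same loop, compiled with AVX2 enabled so the compiler may use it.
/// Callers must ensure the CPU supports AVX2 before calling.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[u8]) -> u64 {
    data.iter().map(|&v| v as u64).sum()
}

/// Picks the widest available backend at runtime, falling back to scalar.
/// `use_simd` plays the role of a per-detector SIMD toggle.
fn sum_dispatch(data: &[u8], use_simd: bool) -> u64 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if use_simd && is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { sum_avx2(data) };
        }
    }
    let _ = use_simd; // otherwise unused on non-x86 targets
    sum_scalar(data)
}
```

Both paths must produce identical results, which is what makes a scalar toggle useful for testing and for benchmarking the SIMD uplift.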
```toml
[dependencies]
scenesdetect = "0.1"
```

| Feature | Default | Purpose |
|---|---|---|
| `std` | ✓ | Runtime x86 SIMD dispatch, standard library types |
| `alloc` | | `no_std` build using `alloc` only |
| `serde` | | `Serialize` / `Deserialize` for all `Options` types |
Numbers below are per-frame runtimes from the benchmark.yml CI workflow on GitHub-hosted runners, compiled with the default release profile (opt-level = 3, thin LTO). Each row is a single process_* call — that is, the full pipeline for one frame including the per-channel delta reduction. Lower is better; fps is 1 s / per-frame time. Full data lives in the Benchmarks workflow artifacts.
Best SIMD-on path, single-threaded:
| Detector | macOS aarch64 NEON | Linux x86_64 AVX2 | Windows x86_64 AVX2 |
|---|---|---|---|
| `histogram` | 0.93 ms (≈1 080 fps) | 1.24 ms (≈810 fps) | 1.26 ms (≈790 fps) |
| `phash` | 1.65 ms (≈610 fps) | 2.03 ms (≈490 fps) | 2.22 ms (≈450 fps) |
| `threshold` (luma) | 0.12 ms (≈8 000 fps) | 0.33 ms (≈3 080 fps) | 0.34 ms (≈2 940 fps) |
| `threshold` (RGB) | 0.38 ms (≈2 650 fps) | 0.98 ms (≈1 030 fps) | 0.99 ms (≈1 020 fps) |
| `content` (luma-only) | 0.48 ms (≈2 080 fps) | 0.34 ms (≈2 940 fps) | 0.40 ms (≈2 510 fps) |
| `content` (BGR, no edges) | 3.38 ms (≈300 fps) | 2.78 ms (≈360 fps) | 2.84 ms (≈350 fps) |
| `content` (BGR with Canny edges) | 58.0 ms (≈17 fps) | 71.0 ms (≈14 fps) | 75.8 ms (≈13 fps) |
| `adaptive` (luma-only) | 0.49 ms (≈2 040 fps) | 0.30 ms (≈3 300 fps) | 0.40 ms (≈2 500 fps) |
| `adaptive` (BGR, no edges) | 3.18 ms (≈315 fps) | 2.78 ms (≈360 fps) | 3.06 ms (≈325 fps) |
The BGR path is the hot spot — packed-BGR → planar HSV conversion is where the hand-written SIMD backends earn their keep. Scalar numbers come from the same benches with Options::with_simd(false).
| Tier | SIMD | Scalar | Uplift |
|---|---|---|---|
| `macos-aarch64-neon` | 3.38 ms | 4.61 ms | 1.36× |
| `ubuntu-x86_64-default` (runtime AVX2) | 2.78 ms | 24.99 ms | 9.0× |
| `ubuntu-x86_64-native` (`-C target-cpu=native`) | 2.72 ms | 9.00 ms | 3.3× |
| `ubuntu-x86_64-ssse3-only` (AVX/AVX2/FMA disabled) | 2.09 ms | 21.34 ms | 10.2× |
| `windows-x86_64-default` | 2.84 ms | 57.55 ms | 20.3× |
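For orientation, the first step of the BGR path measured above, splitting packed pixels into planes, looks like this in plain scalar Rust. This is an illustrative sketch with a hypothetical function name, not the crate's code; the SIMD backends perform the equivalent operation many pixels at a time.

```rust
/// Splits packed BGR bytes (B0 G0 R0 B1 G1 R1 ...) into three planes.
/// Trailing bytes that do not form a complete pixel are ignored.
fn deinterleave_bgr(packed: &[u8]) -> (Vec<u8>, Vec<u8>, Vec<u8>) {
    let pixels = packed.len() / 3;
    let mut b = Vec::with_capacity(pixels);
    let mut g = Vec::with_capacity(pixels);
    let mut r = Vec::with_capacity(pixels);
    for px in packed.chunks_exact(3) {
        b.push(px[0]);
        g.push(px[1]);
        r.push(px[2]);
    }
    (b, g, r)
}
```

The stride-3 access pattern is the part compilers tend to struggle with, which is consistent with the large gap between the scalar and SIMD columns on x86.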
A few things fall out of this:
- x86 SIMD is very much worth it. Intel/AMD runners without the hand-written `std::arch` dispatch (i.e. scalar) run the BGR pipeline 9–20× slower than the SSSE3/AVX2 backend. The biggest x86 win is the 3-plane deinterleave via `PSHUFB`, which the compiler doesn't emit on its own.
- NEON uplift is modest because aarch64's auto-vectorizer handles the scalar fallback well; the hand-written NEON path still wins on the deinterleave (`vld3q_u8`) but the scalar baseline is already strong.
- `-C target-cpu=native` closes most of the scalar gap on x86 (9 ms vs 25 ms default scalar) by unlocking AVX2 for LLVM's auto-vectorizer, but it still loses to the hand-written dispatch by ~3×.
- Canny edges are expensive. Turning on `delta_edges` dominates the frame time at ~60–75 ms per 1080p frame. Only enable it when color deltas aren't enough.
- Adaptive overhead is ≈O(1) per frame. Varying `window_width` from 1 to 16 moves the 1080p luma-only timing by <5%; the rolling-sum fix made the per-frame cost flat.
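The flat per-frame cost in the last point is what a rolling sum gives you: each new sample adds one value and evicts one, regardless of window size. Below is a minimal sketch of that technique; `RollingMean` is a hypothetical type for illustration and is not taken from the crate.

```rust
use std::collections::VecDeque;

/// Fixed-size rolling mean whose per-sample cost is independent of the window size.
struct RollingMean {
    window: VecDeque<f64>,
    capacity: usize,
    sum: f64,
}

impl RollingMean {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window must hold at least one sample");
        Self {
            window: VecDeque::with_capacity(capacity),
            capacity,
            sum: 0.0,
        }
    }

    /// Pushes a sample and returns the mean of the samples currently in the window.
    fn push(&mut self, value: f64) -> f64 {
        if self.window.len() == self.capacity {
            // Evict the oldest sample and remove it from the running sum.
            if let Some(oldest) = self.window.pop_front() {
                self.sum -= oldest;
            }
        }
        self.window.push_back(value);
        self.sum += value;
        self.sum / self.window.len() as f64
    }
}
```

A running floating-point sum can accumulate rounding error over very long streams; periodically recomputing the sum from the window contents is one way to bound that drift.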
```sh
cargo bench --bench content
cargo bench --bench adaptive
# ...or all of them:
cargo bench
```

The `benchmark.yml` workflow runs five matrix rows on every push to `main` and every PR touching `src/**`, `benches/**`, or the workflow file: `macos-aarch64-neon`, `ubuntu-x86_64-default`, `ubuntu-x86_64-native`, `ubuntu-x86_64-ssse3-only`, `windows-x86_64-default`. The per-run artifact contains both a bencher-format summary and the Criterion HTML detail tree.
scenesdetect is a Rust port of PySceneDetect by Brandon Castellano; PySceneDetect itself is released under the BSD 3-Clause license. The detector algorithms (histogram correlation, DCT-based pHash, brightness-threshold fades, HSV + Canny content deltas, and the rolling-average adaptive layer) are re-implementations of the algorithms described in PySceneDetect's source and documentation. Default parameters mirror PySceneDetect's where practical; any deliberate deviations are called out in the relevant module docs.
See THIRD-PARTY.md for the full upstream license text and additional third-party notices.
scenesdetect is under the terms of both the MIT license and the
Apache License (Version 2.0).
See LICENSE-APACHE, LICENSE-MIT for details.
Copyright (c) 2026 FinDIT studio authors.