Conversation
Benchmark ResultsBenchmark Results SummaryDate: 2026-04-19 02:35:24 UTC View detailed resultsDetailed Criterion results have been uploaded as artifacts. Download them from the workflow run to view charts and detailed statistics. |
There was a problem hiding this comment.
Pull request overview
This PR adds first-class NV12 (semi-planar 4:2:0) support across the frame model, row primitives (scalar + SIMD backends), and the sinker pipeline, while also making the sink API fallible to allow per-frame preflight validation and early error propagation.
Changes:
- Add NV12 source format:
Nv12Frame,Nv12Row, andnv12_torow-walker kernel. - Add NV12→RGB row primitive (
nv12_to_rgb_row) with scalar + SIMD implementations (NEON, SSE4.1, AVX2, AVX-512, wasm simd128). - Make the sink/walker pipeline fallible (
PixelSink::{begin_frame, process}returningResult) and updateMixedSinkerto validate buffer sizes up front and reject per-frame dimension mismatches before writing.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/yuv/yuv420p.rs | Makes the YUV420p walker fallible and adds per-frame begin_frame preflight. |
| src/yuv/nv12.rs | New NV12 walker, row type, and sink trait. |
| src/yuv/mod.rs | Wires NV12 module and public re-exports. |
| src/sinker/mixed.rs | Adds NV12 support, fallible buffer-attachment API, per-frame dimension validation, and new error types/tests. |
| src/row/scalar.rs | Adds scalar NV12→RGB reference implementation. |
| src/row/mod.rs | Adds NV12 dispatcher + shared overflow-checked RGB row sizing helper. |
| src/row/arch/x86_sse41.rs | Adds SSE4.1 NV12→RGB backend + equivalence tests. |
| src/row/arch/x86_avx2.rs | Adds AVX2 NV12→RGB backend + equivalence tests. |
| src/row/arch/x86_avx512.rs | Adds AVX-512 NV12→RGB backend + equivalence tests. |
| src/row/arch/wasm_simd128.rs | Adds wasm simd128 NV12→RGB backend + equivalence tests. |
| src/row/arch/neon.rs | Adds NEON NV12→RGB backend + equivalence tests. |
| src/lib.rs | Updates PixelSink to support fallible processing + per-frame preflight and expands docs accordingly. |
| src/frame.rs | Allows odd heights for 4:2:0, adds Nv12Frame + errors, and adds 32-bit geometry overflow checks. |
| benches/nv12_to_rgb.rs | Adds Criterion benchmark for NV12 row conversion throughput. |
| Cargo.toml | Registers the new NV12 benchmark target. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let one_plane_start = idx * w; | ||
| let one_plane_end = one_plane_start + w; | ||
| let rgb_plane_start = one_plane_start * 3; | ||
| let rgb_plane_end = one_plane_end * 3; | ||
| let rgb_row_bytes = w * 3; |
There was a problem hiding this comment.
idx is used to compute row slice ranges (e.g., idx * w) without validating idx < self.height. If process is called directly (or a custom walker passes a bad index), this will panic on slice indexing, which contradicts the stated goal of returning recoverable errors instead of panicking. Add an explicit bounds check for idx (and return an error) before computing/slicing into the output buffers.
| let one_plane_start = idx * w; | ||
| let one_plane_end = one_plane_start + w; | ||
| let rgb_plane_start = one_plane_start * 3; | ||
| let rgb_plane_end = one_plane_end * 3; | ||
| let rgb_row_bytes = w * 3; |
There was a problem hiding this comment.
idx is assumed to be in-range (“idx < h holds ...”), but there is no explicit check before using it to slice output buffers. A direct process call (or buggy walker) with idx >= self.height will panic here. Add an idx < self.height validation and return a recoverable error before computing row offsets.
Benchmark ResultsBenchmark Results SummaryDate: 2026-04-19 02:47:02 UTC Benchmark Results for macos-aarch64-neonSystem Information
allBenchmark Results for macos-aarch64-scalarSystem Information
allBenchmark Results for ubuntu-x86_64-avx2-maxSystem Information
allBenchmark Results for ubuntu-x86_64-defaultSystem Information
allBenchmark Results for ubuntu-x86_64-nativeSystem Information
allBenchmark Results for ubuntu-x86_64-scalarSystem Information
allBenchmark Results for ubuntu-x86_64-sse41-maxSystem Information
allBenchmark Results for windows-x86_64-defaultSystem Information
allView detailed resultsDetailed Criterion results have been uploaded as artifacts. Download them from the workflow run to view charts and detailed statistics. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Benchmark ResultsBenchmark Results SummaryDate: 2026-04-19 02:56:49 UTC Benchmark Results for macos-aarch64-neonSystem Information
allBenchmark Results for macos-aarch64-scalarSystem Information
allBenchmark Results for ubuntu-x86_64-avx2-maxSystem Information
allBenchmark Results for ubuntu-x86_64-defaultSystem Information
allBenchmark Results for ubuntu-x86_64-nativeSystem Information
allBenchmark Results for ubuntu-x86_64-scalarSystem Information
allBenchmark Results for ubuntu-x86_64-sse41-maxSystem Information
allBenchmark Results for windows-x86_64-defaultSystem Information
allView detailed resultsDetailed Criterion results have been uploaded as artifacts. Download them from the workflow run to view charts and detailed statistics. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| fn process(&mut self, row: Nv12Row<'_>) -> Result<(), Self::Error> { | ||
| let w = self.width; | ||
| let idx = row.row(); | ||
| let use_simd = self.simd; | ||
|
|
||
| // Defense-in-depth shape check (see Yuv420p impl above). An NV12 | ||
| // UV row is `width` bytes of interleaved U / V payload — same | ||
| // length as Y — so both slices must equal `self.width`. | ||
| if row.y().len() != w { |
There was a problem hiding this comment.
MixedSinker<Nv12>::process can panic if self.width is odd when called directly (bypassing the walker). The function checks slice lengths and row index, but then calls nv12_to_rgb_row, which asserts width & 1 == 0 and will panic for odd widths. To keep process non-panicking under direct calls, add an explicit even-width guard here (or prevent constructing MixedSinker<Nv12> with odd widths) and return a MixedSinkerError variant instead.
| /// Returns [`Nv12FrameError`] if any of: | ||
| /// - `width` or `height` is zero or odd, | ||
| /// - `y_stride < width`, | ||
| /// - `uv_stride < width` (the UV row holds `width / 2` interleaved | ||
| /// pairs = `width` bytes of payload), | ||
| /// - either plane is too short to cover its declared rows. |
There was a problem hiding this comment.
The Nv12Frame::try_new docs say it returns an error when width or height is "zero or odd", but the implementation only rejects odd width (odd heights are explicitly supported via height.div_ceil(2)). Please update this bullet list to match the actual validation behavior so callers don’t incorrectly assume odd heights are rejected.
| fn process(&mut self, row: Yuv420pRow<'_>) -> Result<(), Self::Error> { | ||
| let w = self.width; | ||
| let idx = row.row(); | ||
| let use_simd = self.simd; | ||
|
|
There was a problem hiding this comment.
MixedSinker<Yuv420p>::process can still panic if self.width is odd (e.g., a direct process caller bypassing the walker). The method validates slice lengths and row index, but it later calls yuv_420_to_rgb_row, which asserts width & 1 == 0 and will panic for odd widths. Consider adding an explicit even-width check here (or rejecting odd widths at construction for YUV formats) and returning a MixedSinkerError variant instead to preserve the no-panic contract for direct process calls.
Benchmark ResultsBenchmark Results SummaryDate: 2026-04-19 03:01:05 UTC Benchmark Results for macos-aarch64-neonSystem Information
allBenchmark Results for macos-aarch64-scalarSystem Information
allBenchmark Results for ubuntu-x86_64-avx2-maxSystem Information
allBenchmark Results for ubuntu-x86_64-defaultSystem Information
allBenchmark Results for ubuntu-x86_64-nativeSystem Information
allBenchmark Results for ubuntu-x86_64-scalarSystem Information
allBenchmark Results for ubuntu-x86_64-sse41-maxSystem Information
allBenchmark Results for windows-x86_64-defaultSystem Information
allView detailed resultsDetailed Criterion results have been uploaded as artifacts. Download them from the workflow run to view charts and detailed statistics. |
Benchmark ResultsBenchmark Results SummaryDate: 2026-04-19 03:17:11 UTC Benchmark Results for macos-aarch64-neonSystem Information
allBenchmark Results for macos-aarch64-scalarSystem Information
allBenchmark Results for ubuntu-x86_64-avx2-maxSystem Information
allBenchmark Results for ubuntu-x86_64-defaultSystem Information
allBenchmark Results for ubuntu-x86_64-nativeSystem Information
allBenchmark Results for ubuntu-x86_64-scalarSystem Information
allBenchmark Results for ubuntu-x86_64-sse41-maxSystem Information
allBenchmark Results for windows-x86_64-defaultSystem Information
allView detailed resultsDetailed Criterion results have been uploaded as artifacts. Download them from the workflow run to view charts and detailed statistics. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let one_plane_start = idx * w; | ||
| let one_plane_end = one_plane_start + w; | ||
| let rgb_plane_start = one_plane_start * 3; | ||
| let rgb_plane_end = one_plane_end * 3; | ||
| let rgb_row_bytes = w * 3; | ||
|
|
There was a problem hiding this comment.
Same overflow issue as the Yuv420p impl: let rgb_row_bytes = w * 3; is unchecked and can wrap on 32-bit even when HSV buffers were successfully attached (which only validates width * height). Use checked_mul(3) here as well and return a fallible error before resizing rgb_scratch/invoking nv12_to_rgb_row.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub trait PixelSink { | ||
| /// The shape of one input unit chosen by the per-format subtrait — | ||
| /// e.g. [`yuv::Yuv420pRow`] for YUV 4:2:0, one row at a time. | ||
| type Input<'a>; | ||
|
|
||
| /// The error type surfaced by this sink. Use | ||
| /// [`core::convert::Infallible`] for sinks that can't fail — the | ||
| /// compiler eliminates the `Result` branching at the call sites. | ||
| type Error; |
There was a problem hiding this comment.
The PR title indicates NV12 support, but this change also makes PixelSink fallible (adds associated Error, changes process signature, and adds begin_frame) and updates existing kernels like yuv420p_to to return Result. That’s a significant API/breaking change and should be explicitly called out in the PR title/description and (if this crate is versioned for external use) the changelog/versioning strategy, so downstream users aren’t surprised.
Benchmark ResultsBenchmark Results SummaryDate: 2026-04-19 03:40:32 UTC Benchmark Results for macos-aarch64-neonSystem Information
allBenchmark Results for macos-aarch64-scalarSystem Information
allBenchmark Results for ubuntu-x86_64-avx2-maxSystem Information
allBenchmark Results for ubuntu-x86_64-defaultSystem Information
allBenchmark Results for ubuntu-x86_64-nativeSystem Information
allBenchmark Results for ubuntu-x86_64-scalarSystem Information
allBenchmark Results for ubuntu-x86_64-sse41-maxSystem Information
allBenchmark Results for windows-x86_64-defaultSystem Information
allView detailed resultsDetailed Criterion results have been uploaded as artifacts. Download them from the workflow run to view charts and detailed statistics. |
PixelSink contract
Summary
Adds NV12 support and reshapes the
PixelSinkcontract to be fullyfallible — no panics under normal contract violations, safe for
no_std+panic = "abort"targets.New: NV12 (semi-planar 4:2:0)
frame::Nv12Frame— validated frame type (Y plane + interleaved UVplane, stride-aware, odd heights allowed per 4:2:0 math), plus the
matching
Nv12FrameErrorenum (ZeroDimension,OddWidth,YStrideTooSmall,UvStrideTooSmall,YPlaneTooShort,UvPlaneTooShort,GeometryOverflow).yuv::{Nv12, Nv12Row, Nv12Sink}+yuv::nv12_towalker — mirrorsthe
Yuv420pshape, with UV as interleaved half-width payload.row::nv12_to_rgb_rowdispatcher withuse_simdtoggle.math, no f32):
scalar::nv12_to_rgb_row(reference)vld2_u8fuses UV deinterleave into one op_mm_loadu_si128+ two_mm_shuffle_epi8deinterleavepermute4x64_epi64::<0xD8>permutexvar_epi64i8x16_shufflecallsnv12_matches_yuv420p_*) guardagainst deinterleave regressions.
benches/nv12_to_rgb.rs— Criterion bench mirroringyuv_420_to_rgb.Breaking:
PixelSinkis falliblePreviously:
Now:
Walkers (
yuv420p_to,nv12_to) now returnResult<(), S::Error>andshort-circuit on the first sink error.
type Error = core::convert::Infallible— LLVM strips the
Resultwrapping entirely, so there's zerohot-path cost versus
().begin_frameis a new per-frame preflight hook. DefaultOk(());buffer-backed sinks override to validate frame-vs-config dimensions
before any row is handed to them. Catches the stale-state failure
modes that per-row
idx < heightcan't: shorter frames leavingbottom rows unwritten, taller frames partial-writing before erroring.
Breaking:
MixedSinkerbuilder/process are fallibleMixedSinker::new(width, height)— now takes height;with_rgb/with_luma/with_hsv/set_*all returnResult<Self, MixedSinkerError>. Buffer-size validation happens atattachment, not part-way through a frame.
New
MixedSinkerErrorvariants (#[non_exhaustive]):DimensionMismatch { configured_w, configured_h, frame_w, frame_h }RgbBufferTooShort { expected, actual }LumaBufferTooShort { expected, actual }HsvPlaneTooShort { which: HsvPlane, expected, actual }GeometryOverflow { width, height, channels }— 32-bitusizewrapRowShapeMismatch { which: RowSlice, row, expected, actual }—direct
processcallers that bypass the walkerRowIndexOutOfRange { row, configured_height }OddWidth { width }— configured width must be even for 4:2:0Supporting types:
HsvPlaneandRowSlice(both#[non_exhaustive], withDisplayimpls for readable error messages).Other fixes
Yuv420pFrame/Nv12Frame— onlyodd width is rejected (4:2:0 subsamples 2:1 in width, height uses
div_ceil(2)). 640×481 HW-decoded frames are representable._mm512_extracti32x8_epi32::<1>is an AVX-512DQ intrinsic, but ourkernel declared only
avx512f,avx512bw. Replaced with_mm512_extracti64x4_epi64::<1>(AVX-512F only, same__m256iresult) so the contract is genuine.
(
width × 3), frame constructor (stride × rows), andMixedSinker's per-row RGB arithmetic. Real bug class on 32-bit
(wasm32, i686); regression tests gated on
target_pointer_width = "32".-C target-cpu=nativeunderRUSTFLAGSglobally was compiling
thiserror_impland other proc macros withhost-CPU-specific instructions, then SIGILL'ing when rustc loaded
them. Fixed by scoping to
CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_RUSTFLAGSso only target-build crates get the flag.
Docs
src/lib.rstrait-levelPixelSinkdocs explain why the trait isfallible, with
Infallible/io::Errorexamples.docs/color-conversion-functions.mdmarks NV12 (#8) as shippedv0.1.
Test plan
cargo test --all-features— 76 passing (18 new NV12 + 9 newfallible/preflight/overflow regressions + cross-format equivalence).
cargo hack check --feature-powerset(7 feature combos incl.--no-default-features).cargo hack clippy --each-feature -- -D warnings.i686-unknown-linux-gnu, x86_64-pc-windows-msvc.
cargo fmt --check.for YUV/NV12→RGB, 1.6–1.8× for RGB→HSV. See
feat(NV12): NV12(semi-planar 4:2:0) + fallible
PixelSinkcontract #2 (comment).Next
infrastructure is in place.