Conversation
There was a problem hiding this comment.
Pull request overview
Implements NV21 (4:2:0 semi-planar with VU-ordered chroma) end-to-end support across frame validation, row conversion (scalar + SIMD), and the MixedSinker pipeline.
Changes:
- Add
Nv21Frame(+ validation errors) and a YUV walkernv21_towithNv21Row/Nv21Sink. - Add
nv21_to_rgb_row(dispatcher) and NV21 support across scalar + SIMD backends via a shared NV12/NV21 implementation. - Extend
MixedSinkerto consume NV21 and add benches/tests for equivalence and output correctness.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/yuv/nv21.rs |
New NV21 row walker + row/sink types mirroring NV12 but with VU chroma. |
src/yuv/mod.rs |
Registers/exposes the new NV21 module and public API exports. |
src/sinker/mixed.rs |
Adds PixelSink impl for MixedSinker<Nv21> and NV21-focused tests. |
src/row/scalar.rs |
Adds scalar NV21 path via shared NV12/NV21 impl with compile-time UV swap. |
src/row/mod.rs |
Public dispatcher nv21_to_rgb_row with SIMD selection matching NV12 pattern. |
src/row/arch/x86_sse41.rs |
Adds NV21 SIMD wrapper + shared NV12/NV21 impl + equivalence tests. |
src/row/arch/x86_avx2.rs |
Adds NV21 SIMD wrapper + shared NV12/NV21 impl + equivalence tests. |
src/row/arch/x86_avx512.rs |
Adds NV21 SIMD wrapper + shared NV12/NV21 impl + equivalence tests. |
src/row/arch/wasm_simd128.rs |
Adds NV21 SIMD wrapper + shared NV12/NV21 impl + equivalence tests. |
src/row/arch/neon.rs |
Adds NV21 SIMD wrapper + shared NV12/NV21 impl + equivalence tests. |
src/frame.rs |
Adds Nv21Frame and Nv21FrameError plus validation tests. |
benches/nv21_to_rgb.rs |
New Criterion bench for per-row NV21→RGB conversion. |
Cargo.toml |
Registers the new nv21_to_rgb benchmark target. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| fn begin_frame(&mut self, width: u32, height: u32) -> Result<(), Self::Error> { | ||
| if self.width & 1 != 0 { | ||
| return Err(MixedSinkerError::OddWidth { width: self.width }); | ||
| } |
There was a problem hiding this comment.
MixedSinkerError::OddWidth is now returned from the NV21 sink as well, but the error’s Display text and doc comment still mention only “YUV420p / NV12 require even width” (see MixedSinkerError::OddWidth). This makes the error misleading for NV21 callers; consider updating the message/docs to include NV21 or make it format-agnostic (e.g., “4:2:0 formats require even width”).
| if row.vu_half().len() != w { | ||
| // Nv21 reuses the `UvHalf` variant because the RowSlice enum | ||
| // describes "half-width interleaved chroma" structurally — the | ||
| // byte order isn't part of the shape contract. If a caller | ||
| // needs the format-specific distinction they can match on the | ||
| // sink type instead. | ||
| return Err(MixedSinkerError::RowShapeMismatch { | ||
| which: RowSlice::UvHalf, | ||
| row: idx, | ||
| expected: w, | ||
| actual: row.vu_half().len(), | ||
| }); |
There was a problem hiding this comment.
The NV21 path reports chroma shape mismatches as RowSlice::UvHalf (and RowSlice currently displays this as “UV Half”), which can be confusing when debugging NV21 (VU-ordered) inputs. Since NV21 support is being added, consider adding a RowSlice::VuHalf variant (or a more generic interleaved-chroma variant) and updating the existing RowSlice doc comment that references NV21 as a future addition.
| //! - [`Nv21`](crate::yuv::Nv21) — 4:2:0 semi‑planar with **VU**- | ||
| //! ordered chroma (Android MediaCodec default). |
There was a problem hiding this comment.
The bullet for NV21 splits “VU-ordered” across a line break as **VU**- / ordered, which renders as “VU- ordered” in rustdoc. Consider keeping “VU-ordered” on one line (or dropping the end-of-line hyphen) to avoid unintended hyphenation/spacing in the generated docs.
| //! - [`Nv21`](crate::yuv::Nv21) — 4:2:0 semi‑planar with **VU**- | |
| //! ordered chroma (Android MediaCodec default). | |
| //! - [`Nv21`](crate::yuv::Nv21) — 4:2:0 semi‑planar with **VU**-ordered | |
| //! chroma (Android MediaCodec default). |
| // ---- Nv21 impl ---------------------------------------------------------- | ||
| // | ||
| // Structurally identical to the Nv12 impl — the row primitives hide | ||
| // the U/V byte-order difference. Only the trait `Input<'r>` and the | ||
| // primitive name change. |
There was a problem hiding this comment.
The file-level docs near the top still say “v0.1 ships the Yuv420p and Nv12 impls”, but this PR adds an NV21 PixelSink impl as well. Please update the MixedSinker module documentation to reflect NV21 support so public docs remain accurate.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Benchmark ResultsBenchmark Results SummaryDate: 2026-04-19 04:15:28 UTC Benchmark Results for macos-aarch64-neonSystem Information
allBenchmark Results for macos-aarch64-scalarSystem Information
allBenchmark Results for ubuntu-x86_64-avx2-maxSystem Information
allBenchmark Results for ubuntu-x86_64-defaultSystem Information
allBenchmark Results for ubuntu-x86_64-nativeSystem Information
allBenchmark Results for ubuntu-x86_64-scalarSystem Information
allBenchmark Results for ubuntu-x86_64-sse41-maxSystem Information
allBenchmark Results for windows-x86_64-defaultSystem Information
allView detailed resultsDetailed Criterion results have been uploaded as artifacts. Download them from the workflow run to view charts and detailed statistics. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Benchmark ResultsBenchmark Results SummaryDate: 2026-04-19 04:33:10 UTC Benchmark Results for macos-aarch64-neonSystem Information
allBenchmark Results for macos-aarch64-scalarSystem Information
allBenchmark Results for ubuntu-x86_64-avx2-maxSystem Information
allBenchmark Results for ubuntu-x86_64-defaultSystem Information
allBenchmark Results for ubuntu-x86_64-nativeSystem Information
allBenchmark Results for ubuntu-x86_64-scalarSystem Information
allBenchmark Results for ubuntu-x86_64-sse41-maxSystem Information
allBenchmark Results for windows-x86_64-defaultSystem Information
allView detailed resultsDetailed Criterion results have been uploaded as artifacts. Download them from the workflow run to view charts and detailed statistics. |
Summary
Adds NV21 support (Android MediaCodec's default 8-bit output format).
NV21 is structurally identical to NV12 — one full-size Y plane plus
one interleaved chroma plane at half width and half height — but the
chroma bytes are VU-ordered (
V0, U0, V1, U1, …) instead ofNV12's UV-ordering.
The implementation shares every kernel with NV12 via
const SWAP_UV: boolmonomorphization — zero runtime cost, ~5 linesof diff per SIMD backend instead of ~150 lines of duplicated kernel
body. The swap is resolved at compile time; LLVM generates two
monomorphic copies (one per boolean value), each with the
if SWAP_UVbranch eliminated. Benchmarks confirm NV21 runs withinmeasurement noise of NV12 on every backend — see
#3 (comment).
New API surface
frame::Nv21Frame+Nv21FrameError— validated frame type,stride-aware, odd heights accepted, mirrors
Nv12Frame'svalidation and error-variant shape.
yuv::{Nv21, Nv21Row, Nv21Sink, nv21_to}— marker type, rowstruct exposing
y()/vu_half(), sink subtrait, walker.row::nv21_to_rgb_rowdispatcher withuse_simdtoggle.impl PixelSink for MixedSinker<'_, Nv21>— full fallible contract(
begin_framepreflight,OddWidth/DimensionMismatch/RowShapeMismatch { which: VuHalf, … }/RowIndexOutOfRange/GeometryOverflow).Refactor: const-generic kernel share
Every backend now has one
nv12_or_nv21_to_rgb_row_impl<const SWAP_UV: bool>helper with two thin public wrappers:Inside the shared impl, the U/V output of the deinterleave step is
selected by
SWAP_UV:Applied consistently across scalar, NEON (
vld2_u8result swap),SSE4.1 (
_mm_shuffle_epi8mask pair swap), AVX2 (cast_si128/extracti128swap after thepermute4x64fixup), AVX-512(
castsi512_si256/extracti64x4swap after thepermutexvargather), and wasm simd128 (
i8x16_shuffleresult swap).This is the pattern the project roadmap (
docs/color-conversion-functions.md§ 8) calls out for future format pairs — NV24/NV42, and an eventual
P010-with-VU-swapped twin will follow the same shape.
Tests
90 lib tests pass (7 new):
byte-identical output across all 6 color matrices and
full/limited range.
check_nv21_matches_nv12_swappedper arch: builds a chromastream with bytes swapped and verifies NV21 kernel on the swapped
data produces the same RGB as NV12 on the original. Catches any
stray
SWAP_UVbug.MixedSinker<Nv21>integration: luma-only, rgb-only, mixedthree-output, and cross-format NV21-matches-NV12 tests.
Nv21Framevalidation: 9 tests (zero/odd-width/odd-height-accepted/stride/plane-short/overflow/panic).
Error-type additions
RowSlice::VuHalf— new#[non_exhaustive]variant so NV21 usersdon't see misleading
"UV Half"inRowShapeMismatcherrors.Displays as
"VU Half"viaderive_more::Display.MixedSinkerError::OddWidthmessage is now format-agnostic ("4:2:0 formats require even width") rather than listing only YUV420p / NV12.Docs
src/lib.rs→PixelSinkdocs unchanged (already cover thefallible contract from PR feat(NV12): NV12(semi-planar 4:2:0) + fallible
PixelSinkcontract #2).src/yuv/mod.rstop-level doc adds the NV21 bullet.docs/color-conversion-functions.md:and a TODO plan (Ships 1–6):
yuvj420p→yuv420p10→P010→ higher bit-depths → NV16/24/42 → alpha + niche u16variants. Value-first (HDR) ordering with the keystone
dependency chain called out.
out of FFmpeg's ~180
AVPixelFormatenum: hardware handles,endian twins, legacy palette formats, libavfilter-internal floats
aren't dedicated-kernel territory.
Benchmarks
Run across all 8 tiers (macOS NEON, macOS scalar-forced, Ubuntu
default/avx2-max/sse41-max/scalar/native, Windows default):
#3 (comment)
Headline: NV21 is within noise of NV12 on every tier, confirming
the
const SWAP_UVmonomorphization has zero runtime cost. At1920 px:
Test plan
cargo test --all-features— 90 passingcargo hack check --feature-powerset(7 feature combos incl.--no-default-features)cargo hack clippy --each-feature -- -D warningsaarch64-linux, aarch64-darwin
cargo fmt --checkcargo doc --all-features --no-deps— no unresolved linksNext
Per the updated
docs/color-conversion-functions.md§ 8, the nextshipment is either
yuvj420p(a doc-only clarification) oryuv420p10(the u16 kernel template that unblocks HDR).yuv420p10is the load-bearing next step.