feat(yuv420p10): 10-bit YUV 4:2:0 planar → u8 + native u16 RGB #4
Pull request overview
Adds YUV420p10 (10‑bit 4:2:0 planar, AV_PIX_FMT_YUV420P10LE) support end-to-end, including frame validation, row conversion (scalar + SIMD backends), and integration into MixedSinker for RGB/Luma/HSV outputs.
Changes:
- Introduces `Yuv420pFrame16` / `Yuv420p10Frame` and a YUV420p10 row-walker (`yuv420p10_to`) plus `Yuv420p10` marker types.
- Adds 10‑bit YUV420p→RGB row primitives for both u8 RGB output and native-depth u16 RGB output (scalar + NEON/SSE4.1/AVX2/AVX‑512/wasm simd128).
- Extends `MixedSinker` with optional `rgb_u16` output and adds benchmarks + tests for the new format and SIMD equivalence.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/yuv/yuv420p10.rs | New YUV420p10 format marker, row type, and frame row-walker. |
| src/yuv/mod.rs | Wires the new YUV420p10 module into the public yuv API. |
| src/sinker/mixed.rs | Adds rgb_u16 output support and a PixelSink impl for Yuv420p10; adds tests. |
| src/row/scalar.rs | Adds scalar high-bit-depth 4:2:0 row converters and shared range/bias helpers; adds tests. |
| src/row/mod.rs | Exposes public YUV420p10 row conversion APIs with SIMD dispatch + u16 output path. |
| src/row/arch/x86_sse41.rs | Adds SSE4.1 YUV420p10→RGB kernels (u8 + u16) and scalar-equivalence tests. |
| src/row/arch/x86_common.rs | Adds shared write_rgb_u16_8 helper for x86 u16 RGB interleaving. |
| src/row/arch/x86_avx2.rs | Adds AVX2 YUV420p10→RGB kernels (u8 + u16) and scalar-equivalence tests. |
| src/row/arch/x86_avx512.rs | Adds AVX‑512 YUV420p10→RGB kernels (u8 + u16) and scalar-equivalence tests. |
| src/row/arch/wasm_simd128.rs | Adds wasm simd128 YUV420p10→RGB kernels (u8 + u16) and u16 write helper. |
| src/row/arch/neon.rs | Adds NEON YUV420p10→RGB kernels (u8 + u16) and scalar-equivalence tests. |
| src/frame.rs | Adds Yuv420pFrame16 + Yuv420p10Frame and validation error types + tests. |
| benches/yuv_420p10_to_rgb.rs | Adds Criterion benchmarks for u8 and u16 YUV420p10 row conversion throughput. |
| Cargo.toml | Registers the new benchmark target. |
Benchmark Results
Benchmark Results Summary
Date: 2026-04-19 06:15:35 UTC
Benchmark configurations:
- macos-aarch64-neon
- macos-aarch64-scalar
- ubuntu-x86_64-avx2-max
- ubuntu-x86_64-default
- ubuntu-x86_64-native
- ubuntu-x86_64-scalar
- ubuntu-x86_64-sse41-max
- windows-x86_64-default
Detailed Criterion results have been uploaded as artifacts. Download them from the workflow run to view charts and detailed statistics.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
    if row.y().len() != w {
      return Err(MixedSinkerError::RowShapeMismatch {
        which: RowSlice::Y10,
        row: idx,
        expected: w,
        actual: row.y().len(),
      });
RowShapeMismatch is reused here for u16 row slices (Y10/UHalf10/VHalf10), but its error message says the slice has N "bytes". In this 10‑bit path the lengths are counts of u16 elements, so the diagnostics will be misleading. Consider making the message unit-agnostic (e.g., “elements/len”) or adding a separate mismatch variant for u16 rows so callers don’t misinterpret the reported sizes.
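One way to act on this comment is to word the message in terms of elements rather than bytes. The sketch below is a minimal, self-contained illustration: the variant and field names mirror the snippet above, but the enum definitions and the exact message text are assumptions, not the crate's actual code.

```rust
use std::fmt;

/// Which row slice failed validation (illustrative subset of variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RowSlice {
    Y10,
    UHalf10,
    VHalf10,
}

/// Hypothetical stand-in for the sinker error type.
#[derive(Debug, PartialEq, Eq)]
enum MixedSinkerError {
    RowShapeMismatch {
        which: RowSlice,
        row: usize,
        expected: usize,
        actual: usize,
    },
}

impl fmt::Display for MixedSinkerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // "elements" is correct for both u8 and u16 slices, unlike "bytes".
            MixedSinkerError::RowShapeMismatch { which, row, expected, actual } => write!(
                f,
                "row {row}: {which:?} slice has {actual} elements, expected {expected}"
            ),
        }
    }
}
```

With this wording the same variant can serve 8-bit and 10-bit rows without a separate u16 mismatch variant.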
    fn p10_planes() -> (std::vec::Vec<u16>, std::vec::Vec<u16>, std::vec::Vec<u16>) {
      // 16×8 frame, chroma 8×4. Neutral 10-bit mid-gray (Y=512, UV=512).
      (
        std::vec![0u16; 16 * 8],
The test helper comment says this builds a neutral 10‑bit mid‑gray frame (Y=512, UV=512), but the Y plane is initialized with 0s. Either update the comment (it’s currently describing a different frame) or initialize the Y plane with 512 to match the stated intent.
Suggested change:
    - std::vec![0u16; 16 * 8],
    + std::vec![512u16; 16 * 8],
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
    /// Returns `true` iff the sinker will write `u16` RGB at the
    /// source's native bit depth. Only high‑bit‑depth source impls
    /// (currently [`Yuv420p10`](crate::yuv::Yuv420p10)) honor this
    /// buffer — attaching it on an 8‑bit source format is legal but
    /// no writes occur.
    #[cfg_attr(not(tarpaulin), inline(always))]
    pub const fn produces_rgb_u16(&self) -> bool {
      self.rgb_u16.is_some()
    }
The doc comment says attaching a u16 RGB buffer on an 8-bit format is “legal but no writes occur”, but the PR specifically type-gates with_rgb_u16/set_rgb_u16 to MixedSinker<Yuv420p10>, so external callers can’t attach it for 8-bit formats at all. Update this doc to reflect the compile-time restriction (or alternatively move produces_rgb_u16 onto the MixedSinker<Yuv420p10> impl so the API surface matches the gating).
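The compile-time restriction this comment refers to can be sketched with a marker-typed sinker. This is only an illustration under stated assumptions: the struct layout, the `new` constructor, and the 8-bit marker `Yuv420p` are hypothetical, and only the names `MixedSinker`, `Yuv420p10`, `with_rgb_u16`, and `produces_rgb_u16` come from the PR.

```rust
use std::marker::PhantomData;

/// Hypothetical 8-bit format marker (assumed for contrast).
struct Yuv420p;
/// 10-bit format marker, named as in the PR.
struct Yuv420p10;

/// Simplified stand-in for the sinker: one optional u16 RGB buffer.
struct MixedSinker<'a, F> {
    rgb_u16: Option<&'a mut [u16]>,
    _format: PhantomData<F>,
}

// Available for every source format.
impl<'a, F> MixedSinker<'a, F> {
    fn new() -> Self {
        Self { rgb_u16: None, _format: PhantomData }
    }

    fn produces_rgb_u16(&self) -> bool {
        self.rgb_u16.is_some()
    }
}

// Only the 10-bit sinker exposes the u16 attachment method.
impl<'a> MixedSinker<'a, Yuv420p10> {
    fn with_rgb_u16(mut self, buf: &'a mut [u16]) -> Self {
        self.rgb_u16 = Some(buf);
        self
    }
}
```

In this sketch, `MixedSinker::<Yuv420p>::new().with_rgb_u16(..)` fails to compile because the method does not exist on that instantiation, so `produces_rgb_u16` can only return `true` for the 10-bit sinker.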
    /// Returns [`Yuv420pFrame16Error`] if any of:
    /// - `BITS` is not 10, 12, or 14 (Ship 2 additionally rejects 12/14
    ///   at the type alias layer — see [`Yuv420p10Frame`]),
The PR description says the public “Ship 2” jargon was replaced with “colconv v0.2”, but this doc comment still uses “Ship 2”. Please update to the v0.2 wording (or remove the ship reference) to keep public docs consistent with the PR’s stated cleanup.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary
Ships `AV_PIX_FMT_YUV420P10LE` end-to-end: the keystone high-bit-depth format that Ships 3/4/6 (P010, yuv420p12/14, P012/P016) build on. Two output paths: u8 RGB (fast, downshifts 10→8 in a single Q15 shift) and native-depth u16 RGB (lossless, `yuv420p10le`-style low-bit-packed, for HDR / 10-bit scene analysis).
API additions
- `frame::Yuv420pFrame16<'a, const BITS: u32>` + `Yuv420p10Frame` alias: generic u16-backed frame type. Validates `BITS ∈ {10, 12, 14}` so 12/14 unlock by relaxing one check. Opt-in `try_new_checked` validates that every sample is in `[0, (1<<BITS)-1]` for untrusted input.
- `yuv::{Yuv420p10, Yuv420p10Row, Yuv420p10Sink, yuv420p10_to}`: marker type, row struct, Sink subtrait, row walker.
- `row::yuv420p10_to_rgb_row` (u8 out) + `row::yuv420p10_to_rgb_u16_row` (u16 native-depth out) dispatchers with SIMD/scalar toggle.
- `MixedSinker::with_rgb_u16` / `set_rgb_u16`: gated at the type level to `MixedSinker<Yuv420p10>`. Attaching u16 RGB to an 8-bit sink is now a compile error, not a silent stale-buffer bug.
- `Yuv420pFrame16Error::{UnsupportedBits, SampleOutOfRange, ...}`, `MixedSinkerError::RgbU16BufferTooShort`, and `RowSlice::{Y10, UHalf10, VHalf10}`.
Kernel design
`range_params_n<BITS, OUT_BITS>` derives `y_off` / `y_scale_q15` / `c_scale_q15` from the input/output bit depths. Q15 coefficients and i32 intermediates work unchanged across 10/12/14: the 2-term chroma sum stays < 10⁹, well inside i32 (16-bit input would overflow; deferred to Ship 4 with i64 intermediates or a lower-Q coefficient family).
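To make the Q15 idea concrete, here is a minimal luma-only sketch for limited-range input. It is not the crate's `range_params_n`: the helper names `luma_params` and `scale_luma`, the limited-range constants (offset 16 and span 219 at 8 bits, shifted up for deeper input), and the rounding choice are all assumptions made for illustration.

```rust
/// Hypothetical helper: derive the luma offset and Q15 scale for
/// limited-range input at `BITS` bits mapped to full-range `OUT_BITS` output.
fn luma_params<const BITS: u32, const OUT_BITS: u32>() -> (i32, i32) {
    let y_off = 16i32 << (BITS - 8); // black level, e.g. 64 at 10 bits
    let y_range = 219i32 << (BITS - 8); // luma span, e.g. 876 at 10 bits
    let out_max = (1i32 << OUT_BITS) - 1;
    // The bit-depth change is folded into the scale, so one >> 15 suffices.
    let y_scale_q15 = (out_max as f64 / y_range as f64 * 32768.0).round() as i32;
    (y_off, y_scale_q15)
}

/// Scale one luma sample with a single Q15 multiply and shift.
fn scale_luma<const BITS: u32, const OUT_BITS: u32>(raw: u16) -> u16 {
    let (y_off, y_scale_q15) = luma_params::<BITS, OUT_BITS>();
    let out_max = (1i32 << OUT_BITS) - 1;
    // Keep only the low BITS bits so the i32 product cannot overflow.
    let y = (raw & (((1u32 << BITS) - 1) as u16)) as i32;
    let v = ((y - y_off) * y_scale_q15 + (1 << 14)) >> 15; // round to nearest
    v.clamp(0, out_max) as u16
}
```

Under these assumptions `scale_luma::<10, 8>` maps 64 to 0 and 940 to 255, while `scale_luma::<10, 10>` maps 940 to 1023; the same code path covers both output depths because only the derived scale differs.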
Every kernel AND-masks `u16` loads to the low `BITS` bits so out-of-range samples (e.g. p010-packed buffers) produce deterministic, backend-identical output instead of backend-dependent corruption. The mask is a no-op on valid input.
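The masking step can be illustrated in isolation. `mask_sample` below is a hypothetical helper, not a function from the PR; it only shows what keeping the low `BITS` bits does to valid and to high-bit-packed samples.

```rust
/// Hypothetical helper: keep only the low `BITS` bits of a loaded sample.
fn mask_sample<const BITS: u32>(raw: u16) -> u16 {
    raw & (((1u32 << BITS) - 1) as u16)
}
```

A valid 10-bit sample such as 512 passes through unchanged, whereas the same value stored high-bit-packed (`512 << 6`) masks to 0. That result is wrong as a pixel value but identical on every backend, which is the property the paragraph above describes.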
SIMD backends
All 5 shipped with scalar-equivalence tests across every matrix
(BT.601/709/2020-NCL/SMPTE240M/FCC/YCgCo) × both range modes × tail
widths (18/30/34/1922/1920) × adversarial out-of-range input.
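The shape of such a scalar-equivalence check can be sketched as follows. The "vector" path here is plain chunked Rust standing in for a real SIMD kernel, and the conversion itself (mask to 10 bits, then drop 2 bits) is a placeholder rather than the PR's YUV→RGB math; all function names are hypothetical.

```rust
/// Reference path: one sample at a time.
fn convert_scalar(src: &[u16], dst: &mut [u8]) {
    for (d, &s) in dst.iter_mut().zip(src) {
        *d = ((s & 0x3FF) >> 2) as u8;
    }
}

/// Stand-in for a SIMD kernel: 16 samples per step, scalar tail.
fn convert_chunked(src: &[u16], dst: &mut [u8]) {
    const LANES: usize = 16;
    let main = src.len() / LANES * LANES;
    let chunks = src[..main]
        .chunks_exact(LANES)
        .zip(dst[..main].chunks_exact_mut(LANES));
    for (s, d) in chunks {
        for i in 0..LANES {
            d[i] = ((s[i] & 0x3FF) >> 2) as u8;
        }
    }
    // Widths that are not a multiple of LANES exercise this tail.
    convert_scalar(&src[main..], &mut dst[main..]);
}

/// Compare both paths on deterministic input that includes
/// out-of-range (above 10-bit) samples.
fn equivalent_at(width: usize) -> bool {
    let src: Vec<u16> = (0..width)
        .map(|i| (i as u32 * 40_503 % 65_536) as u16)
        .collect();
    let mut a = vec![0u8; width];
    let mut b = vec![0u8; width];
    convert_scalar(&src, &mut a);
    convert_chunked(&src, &mut b);
    a == b
}
```

Calling `equivalent_at` for each of the widths 18, 30, 34, 1920, and 1922 covers exact multiples of the lane count as well as short and long tails.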
CI benchmark (1920px row, scalar → SIMD, ns/iter)
Results from the bench runner across all configured tiers (full
report):
Notes:
- `colconv_force_scalar` rows validate the dispatch gate: when forced scalar, the `simd=true` and `simd=false` paths match.
- … M-series' wider issue + tighter cache on row-granular workloads).
- `default` and `avx2-max` numbers are near-identical; AVX-512 correctness is covered by the SDE-emulated `test-sde-avx512` job.
Review findings addressed
Adversarial review iterations flagged, and this branch now resolves:
- … `with_rgb_u16` compile-time gated to `Yuv420p10` sinks.
- … wording, with shift-left-by-6 instructions for p010 consumers.
- … low `BITS` bits, so scalar and all 5 SIMD backends produce bit-identical output on malformed input.
- … `try_new_checked` opt-in constructor rejects out-of-range samples up front with a plane-specific diagnostic.
- … v0.2".
- … `test-wasm-simd128` CI job runs every equivalence test under wasmtime with simd128 enabled.
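The up-front sample check described in the `try_new_checked` item above might look roughly like this for a single plane. This is a sketch under assumptions: `check_plane` and the `SampleOutOfRange` struct are hypothetical names, and the real constructor's signature and error type are not shown in this PR text.

```rust
/// Hypothetical diagnostic naming the plane and the first offending sample.
#[derive(Debug, PartialEq, Eq)]
struct SampleOutOfRange {
    plane: &'static str,
    index: usize,
    value: u16,
    max: u16,
}

/// Reject any sample above `(1 << BITS) - 1`, reporting the first one found.
fn check_plane<const BITS: u32>(
    plane: &'static str,
    samples: &[u16],
) -> Result<(), SampleOutOfRange> {
    let max = ((1u32 << BITS) - 1) as u16;
    match samples.iter().position(|&s| s > max) {
        Some(index) => Err(SampleOutOfRange {
            plane,
            index,
            value: samples[index],
            max,
        }),
        None => Ok(()),
    }
}
```

A checked constructor would run this once per plane (Y, U, V) and return the first error, which costs a full pass over the data and is presumably why the check is opt-in.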
Test plan
- `cargo test --lib`: 127 tests pass (NEON native)
- `cargo build --lib --target x86_64-pc-windows-msvc`: clean
- `cargo build --lib --target wasm32-unknown-unknown` (simd128): clean
- … Intel SDE `-icx` in the `test-sde-avx512` job)
- … `test-wasm-simd128` job; 118 tests pass including the p10 out-of-range adversarial regression)
- … `yuv_420p10_to_rgb`: 4.5–6.2× SIMD speedup across aarch64 + x86_64 + Windows tiers (see table above)
🤖 Generated with Claude Code