[pull] master from libretro:master#948
Merged
pull[bot] merged 10 commits intoAlexandre1er:masterfrom Apr 21, 2026
Merged
Conversation
Key Finding from This Session The GDB range measurements at pbi+0x383c are unreliable because non-BD code writes to the same memory location during per-MB processing (like dequantizer setup). The actual BD range (kept in registers during tree walks) matches between our decoder and libvpx. The remaining bmi[7] divergence likely stems from a subtle difference in how the BD fills or normalizes that I haven't been able to isolate.
…l binary) fixed the bmi[7] divergence — we now get 9/16 sub-block modes correct for MB0 (up from 7/16). The kf_bmode_prob table was also permuted in both dimensions to match the new enum. The remaining divergence at bmi[9] suggests there may be another subtle difference — possibly in the kf_bmode_prob table entries for the specific context [1][0] that bmi[9] uses. The overall diff remains 100.2 because the cascading error from bmi[9] still propagates through all 805 MBs.
… debugging sessions. The remaining 100.2 mean diff comes from the bmi[9] divergence where our code and libvpx make different decisions at a branching tree node despite apparently identical BD states and probabilities.
…ders
Audit of the TGA (403 lines) and BMP (577 lines) image
decoders. Both are stb_image-derived and both accept
attacker-controlled bytes from files on disk, over HTTP
(thumbnails, cores, assets bundles), or from user uploads.
Nine fixes this round; one is a directly reachable stack
buffer overflow.
Nothing touches rjpeg.c (4808 lines) or rpng.c (1859 lines);
those are their own rounds.
=== rtga.c ===
rtga: reject bogus palette_bits at the header (CRITICAL)
A single attacker-controlled byte in the TGA header
(tga_palette_bits, byte 7) drove a stack buffer overflow.
The indexed-pixel read loop did:
for (j = 0; j * 8 < tga_palette_bits; ++j)
raw_data[j] = tga_palette[pal_idx + j];
where raw_data is declared as "unsigned char raw_data[4]"
on the stack. With tga_palette_bits = 255 the loop writes
raw_data[0..31] -- up to 28 bytes of stack corruption per
pixel, driven directly by the header. Reachable on any
ingest path that accepts TGA (core-provided asset bundles,
RetroAchievement badges, user thumbnails, network mirrors).
Fix: validate tga_palette_bits at the header-check point.
TGA palettes legitimately hold only 15/16/24/32-bit entries;
anything else is rejected. Also reject tga_palette_len < 1
(empty palette) which pre-patch malloc(0)'d and then read
tga_palette[0] OOB.
rtga: rtga_skip() clamp
rtga_skip advanced img_buffer by n without checking against
img_buffer_end. Pointer arithmetic past the end of the
input array is undefined behaviour in C even without a
deref. Clamp to the remaining input; subsequent
rtga_get8 calls check "buffer < buffer_end" and parse as
EOF.
rtga: size_t output allocation + ceiling
malloc((size_t)tga_width * tga_height * sizeof(uint32_t))
was OK on 64-bit but could wrap on 32-bit for the full
65535 x 65535 header-expressible range. Also, even when
the arithmetic doesn't wrap, a legitimate-looking
65535 x 65535 ARGB decode would request ~17 GiB of memory
from a single malicious 18-byte header.
Fix: reject dimensions above 0x4000 x 0x4000 (16384 pixels
each axis = 1 GiB decoded RGBA, larger than any realistic
libretro asset). Do the malloc math in size_t.
rtga: reject size > INT_MAX in rtga_process_image
rtga_process_image cast "size" (size_t) to int before
passing it to rtga_load_from_memory. A file >= 2 GiB
truncated, propagated a negative len into
img_buffer_end = buffer + len, and gave pointer arithmetic
outside the source object. Reject early.
rtga: size_t indexing in output writes
output[dst_row * tga_width + cur_col] used signed int for
the index. 65534 * 65535 overflows int32, UB. Use size_t.
=== rbmp.c ===
rbmp: reject img_y = 0x80000000 (abs(INT_MIN) UB)
Pre-patch the code did
flip_vertically = ((int) s->img_y) > 0;
s->img_y = abs((int) s->img_y);
When img_y == 0x80000000u, the cast to int is INT_MIN and
abs(INT_MIN) is explicitly UB in C99. On GCC with 2's-
complement it returns INT_MIN and the uint32_t reassignment
leaves img_y == 0x80000000 -- a "negative" height that then
drives the output allocation to a very large value.
Fix: detect this specific value before the abs() call and
bail cleanly.
rbmp: size_t output allocation + ceiling
malloc(s->img_x * s->img_y * sizeof(uint32_t)) did the
multiplication in uint32_t (both operands are uint32_t),
so e.g. 0x10001 x 0x10000 silently wrapped to 0x10000,
producing a 256 KiB buffer for 16 GiB of claimed image
data. The subsequent per-row pixel writes ran off the
end for gigabytes -- heap corruption driven by a 14-byte
BMP header.
Fix: same ceiling as rtga (0x4000 x 0x4000) and size_t
math for the malloc.
rbmp: rbmp_skip() clamp
Same as rtga_skip.
rbmp: size_t indexing in output writes
output + dst_row * s->img_x used signed int * uint32
(promoted to uint32), which wraps. Use size_t.
=== Regression tests (two new) ===
libretro-common/samples/formats/tga/rtga_test.c
1. test_malicious_palette_bits (palette_bits=255) --
header check must reject.
2. test_empty_indexed_palette -- TRUE DISCRIMINATOR
under ASan. Pre-patch the unpatched decoder fires:
AddressSanitizer: heap-buffer-overflow
READ of size 1 at rtga.c:286 in rtga_tga_load
(1-byte region allocated at rtga.c:234)
Post-patch the decoder returns IMAGE_PROCESS_ERROR at
the header check.
3. test_happy_path_rgb -- 2x2 uncompressed BGR smoke.
4. test_oversized_size -- size > INT_MAX rejection.
libretro-common/samples/formats/bmp/rbmp_test.c
1. test_img_y_int_min -- 0x80000000 height, post-patch
rejection. Pre-patch would invoke UB via abs(INT_MIN)
and the downstream allocation decisions are
unpredictable.
2. test_dimension_overflow -- 0x10001 x 0x10000, post-
patch rejection. The pre-patch 32-bit wrap is not
reproducible on a 64-bit CI host but the rejection IS
reproducible everywhere and prevents the 17 GiB
allocation attempt that would otherwise OOM the test.
3. test_happy_1x1_24bpp -- happy-path smoke.
Builds clean under -Wall -Werror -std=gnu99. ASan -O0
clean on patched; the header-check rejections fire before
any memory is touched, so ASan never sees anything.
CI: both new tests added to RUN_TARGETS. Full samples
workflow dry-run:
Built: 17 Ran: 17 Failed: 0
VC6 compatibility: both files are compiled on all
platforms. The fix uses only size_t / uint32_t / int and
the existing stdint shim; <limits.h> (C89) is explicitly
included where needed. No new dependencies.
Reachability summary:
* Stack BOF via TGA palette_bits: HIGH severity, single
header byte, any TGA ingest path.
* Heap BOF via BMP dimension wrap: MEDIUM severity on
32-bit, reaches malloc-failure on 64-bit.
* Other fixes: UB hardening + defense-in-depth.
The final fix was discovering (by dumping vp8_bmode_tree directly from the libvpx binary) that the tree leaf assignments for LD and VR were swapped between the left and right subtrees: Correct tree (from libvpx binary): Left subtree (node3=0): HE(3), LD(5), RD(6) Right subtree (node3=1): VR(4), VL(7), HD(8), HU(9) Our previous (wrong) tree: Left subtree: HE(3), RD(6), VR(4) ← LD/VR swapped Right subtree: LD(5), VL(7), HD(8), HU(9) ← LD/VR swapped Combined with the exact kf_bmode_prob table extracted from the libvpx binary (which resolved 405 table mismatches from our incorrect permutation), all 16 sub-block modes for MB0 now produce a perfect match with libvpx. The mean diff dropped from 100.2 to 97.4. The remaining error comes from subsequent MBs where context-dependent reads accumulate small differences, but the foundation is now correct.
…issing piece! The file had silently reverted to the 16-bit BD during earlier test iterations, and the 16-bit BD produces different results at boundary-case splits after ~1920 reads. All 12 fixes are now cleanly applied:
Audit of rpng.c (1859 lines; the PNG decoder used for
thumbnails, cores' asset bundles, RetroAchievement badges,
and every screenshot displayed by the menu). Three integer-
overflow fixes plus a regression test.
rpng: replace pointer-arithmetic chunk bound check
The chunk-size guard in rpng_iterate_image was:
if (buf + 8 + chunk_size > rpng->buff_end)
return false;
where chunk_size is a 32-bit attacker-controlled value from
the PNG stream. On 32-bit builds a value near UINT32_MAX
wraps the pointer address, the compare is defeated, and the
subsequent IDAT handler's
memcpy(rpng->idat_buf.data + rpng->idat_buf.size,
buf, chunk_size);
can read up to ~4 GiB past the end of the input buffer. On
64-bit the pointer has enough headroom that the compare
still happens to give the right answer in practice, but the
arithmetic is still UB per C99 (pointer values outside the
bounds of the object, and not one-past).
Fix: compute bytes-remaining as a size_t and compare sizes
rather than pointers. The new check also requires all 12
bytes of chunk framing (length + type + CRC) -- pre-patch
the pointer compare happened to only enforce 8 bytes'
presence, letting a chunk whose declared length consumed
all but 1..3 bytes of the input pass (not exploitable on
its own because the advance at the end of the function
then tripped the next-iteration check, but a real
correctness bug that the size compare fixes for free).
rpng: cap IDAT accumulation at 256 MiB
rpng_realloc_idat grew an internal buffer on each IDAT
chunk via:
size_t required = buf->size + chunk_size;
if (required > buf->capacity) {
while (new_cap < required) new_cap *= 2;
realloc(...);
}
On 32-bit size_t the addition could wrap after many chunks
accumulated, making the compare falsely false and the
memcpy overrun the existing allocation. The doubling loop
could similarly overflow new_cap on its own for a single
oversized chunk. On 64-bit the arithmetic holds, but there
was nothing stopping a hostile PNG from streaming in
arbitrarily many IDAT chunks to drive the allocation up.
Fix: introduce RPNG_IDAT_MAX = 256 MiB (dwarfs any
realistic libretro asset) and reject both the per-chunk
addition and the doubling loop if it would exceed that
cap. Accumulated total cannot grow past RPNG_IDAT_MAX.
rpng: widen buff_data advance to size_t
At the end of rpng_iterate_image:
rpng->buff_data += chunk_size + 12;
chunk_size + 12 is computed in uint32_t (uint32_t + int
promotes to uint32_t), so chunk_size near UINT32_MAX wraps.
The per-chunk-size overflow guard above now rejects any
value large enough to trigger this, but keep the
arithmetic explicit in size_t so that a future relaxation
of the guard upstream doesn't silently reintroduce the
wrap.
=== Regression test ===
libretro-common/samples/formats/png/rpng_chunk_overflow_test.c
Three subtests exercising the chunk-header guard:
1. chunk_size = 0xFFFFFFF8 must be rejected
2. chunk_size larger than remaining input rejected
3. Valid minimal IHDR chunk iterates cleanly (smoke)
Honest note on discriminator status: the headline bug is a
pointer-arithmetic overflow only genuinely reachable on
32-bit builds -- on a 64-bit host the pointer has enough
headroom that the compare still happens to give the right
answer (the arithmetic itself is UB, but no sanitizer
flags it in practice). These tests therefore exercise the
NEW size_t-based guard and serve as regression protection
against anyone reintroducing unguarded pointer arithmetic;
they are not pre/post discriminators on a 64-bit CI host.
Compiled with -m32 or run on a 32-bit host, subtest #1
becomes a true discriminator.
The second fix (RPNG_IDAT_MAX cap) is not directly
testable in a CI sample -- exercising the accumulator
overflow needs either a truly 32-bit build environment or
a >256 MiB crafted PNG, neither of which belong in a unit
test. The fix is localised, the cap is explicit, and the
code path leading to it is already exercised by the
existing rpng encode+decode smoke test.
CI: libretro-common samples workflow updated.
rpng_chunk_overflow_test added to RUN_TARGETS. Full local
dry-run under the GHA shell contract:
Built: 17 Ran: 18 Failed: 0
(Ran>Built because the PNG directory's Makefile now builds
both the existing `rpng` interactive sample and the new
`rpng_chunk_overflow_test` in one invocation; the build
count is per-directory.)
VC6 compatibility: rpng.c is compiled on all platforms. The
fix uses only size_t / uint32_t and standard comparisons;
no new headers.
…0,0) is 2.6 mean diff and MB(0,1) dropped from 39.6 to 7.4. The skip probability fix was critical — using prob_skip=249 (from the bitstream) instead of prob=128 (hardcoded) changes every per-MB skip read, which shifts all subsequent reads for every macroblock.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )