[pull] master from libretro:master by pull[bot] · Pull Request #948 · Alexandre1er/RetroArch

pull · 2026-04-21T16:03:03Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

Key Finding from This Session The GDB range measurements at pbi+0x383c are unreliable because non-BD code writes to the same memory location during per-MB processing (like dequantizer setup). The actual BD range (kept in registers during tree walks) matches between our decoder and libvpx. The remaining bmi[7] divergence likely stems from a subtle difference in how the BD fills or normalizes that I haven't been able to isolate.

…l binary) fixed the bmi[7] divergence — we now get 9/16 sub-block modes correct for MB0 (up from 7/16). The kf_bmode_prob table was also permuted in both dimensions to match the new enum. The remaining divergence at bmi[9] suggests there may be another subtle difference — possibly in the kf_bmode_prob table entries for the specific context [1][0] that bmi[9] uses. The overall diff remains 100.2 because the cascading error from bmi[9] still propagates through all 805 MBs.

… debugging sessions. The remaining 100.2 mean diff comes from the bmi[9] divergence where our code and libvpx make different decisions at a branching tree node despite apparently identical BD states and probabilities.

…ders Audit of the TGA (403 lines) and BMP (577 lines) image decoders. Both are stb_image-derived and both accept attacker-controlled bytes from files on disk, over HTTP (thumbnails, cores, assets bundles), or from user uploads. Nine fixes this round; one is a directly reachable stack buffer overflow. Nothing touches rjpeg.c (4808 lines) or rpng.c (1859 lines); those are their own rounds. === rtga.c === rtga: reject bogus palette_bits at the header (CRITICAL) A single attacker-controlled byte in the TGA header (tga_palette_bits, byte 7) drove a stack buffer overflow. The indexed-pixel read loop did: for (j = 0; j * 8 < tga_palette_bits; ++j) raw_data[j] = tga_palette[pal_idx + j]; where raw_data is declared as "unsigned char raw_data[4]" on the stack. With tga_palette_bits = 255 the loop writes raw_data[0..31] -- up to 28 bytes of stack corruption per pixel, driven directly by the header. Reachable on any ingest path that accepts TGA (core-provided asset bundles, RetroAchievement badges, user thumbnails, network mirrors). Fix: validate tga_palette_bits at the header-check point. TGA palettes legitimately hold only 15/16/24/32-bit entries; anything else is rejected. Also reject tga_palette_len < 1 (empty palette) which pre-patch malloc(0)'d and then read tga_palette[0] OOB. rtga: rtga_skip() clamp rtga_skip advanced img_buffer by n without checking against img_buffer_end. Pointer arithmetic past the end of the input array is undefined behaviour in C even without a deref. Clamp to the remaining input; subsequent rtga_get8 calls check "buffer < buffer_end" and parse as EOF. rtga: size_t output allocation + ceiling malloc((size_t)tga_width * tga_height * sizeof(uint32_t)) was OK on 64-bit but could wrap on 32-bit for the full 65535 x 65535 header-expressible range. Also, even when the arithmetic doesn't wrap, a legitimate-looking 65535 x 65535 ARGB decode would request ~17 GiB of memory from a single malicious 18-byte header. Fix: reject dimensions above 0x4000 x 0x4000 (16384 pixels each axis = 1 GiB decoded RGBA, larger than any realistic libretro asset). Do the malloc math in size_t. rtga: reject size > INT_MAX in rtga_process_image rtga_process_image cast "size" (size_t) to int before passing it to rtga_load_from_memory. A file >= 2 GiB truncated, propagated a negative len into img_buffer_end = buffer + len, and gave pointer arithmetic outside the source object. Reject early. rtga: size_t indexing in output writes output[dst_row * tga_width + cur_col] used signed int for the index. 65534 * 65535 overflows int32, UB. Use size_t. === rbmp.c === rbmp: reject img_y = 0x80000000 (abs(INT_MIN) UB) Pre-patch the code did flip_vertically = ((int) s->img_y) > 0; s->img_y = abs((int) s->img_y); When img_y == 0x80000000u, the cast to int is INT_MIN and abs(INT_MIN) is explicitly UB in C99. On GCC with 2's- complement it returns INT_MIN and the uint32_t reassignment leaves img_y == 0x80000000 -- a "negative" height that then drives the output allocation to a very large value. Fix: detect this specific value before the abs() call and bail cleanly. rbmp: size_t output allocation + ceiling malloc(s->img_x * s->img_y * sizeof(uint32_t)) did the multiplication in uint32_t (both operands are uint32_t), so e.g. 0x10001 x 0x10000 silently wrapped to 0x10000, producing a 256 KiB buffer for 16 GiB of claimed image data. The subsequent per-row pixel writes ran off the end for gigabytes -- heap corruption driven by a 14-byte BMP header. Fix: same ceiling as rtga (0x4000 x 0x4000) and size_t math for the malloc. rbmp: rbmp_skip() clamp Same as rtga_skip. rbmp: size_t indexing in output writes output + dst_row * s->img_x used signed int * uint32 (promoted to uint32), which wraps. Use size_t. === Regression tests (two new) === libretro-common/samples/formats/tga/rtga_test.c 1. test_malicious_palette_bits (palette_bits=255) -- header check must reject. 2. test_empty_indexed_palette -- TRUE DISCRIMINATOR under ASan. Pre-patch the unpatched decoder fires: AddressSanitizer: heap-buffer-overflow READ of size 1 at rtga.c:286 in rtga_tga_load (1-byte region allocated at rtga.c:234) Post-patch the decoder returns IMAGE_PROCESS_ERROR at the header check. 3. test_happy_path_rgb -- 2x2 uncompressed BGR smoke. 4. test_oversized_size -- size > INT_MAX rejection. libretro-common/samples/formats/bmp/rbmp_test.c 1. test_img_y_int_min -- 0x80000000 height, post-patch rejection. Pre-patch would invoke UB via abs(INT_MIN) and the downstream allocation decisions are unpredictable. 2. test_dimension_overflow -- 0x10001 x 0x10000, post- patch rejection. The pre-patch 32-bit wrap is not reproducible on a 64-bit CI host but the rejection IS reproducible everywhere and prevents the 17 GiB allocation attempt that would otherwise OOM the test. 3. test_happy_1x1_24bpp -- happy-path smoke. Builds clean under -Wall -Werror -std=gnu99. ASan -O0 clean on patched; the header-check rejections fire before any memory is touched, so ASan never sees anything. CI: both new tests added to RUN_TARGETS. Full samples workflow dry-run: Built: 17 Ran: 17 Failed: 0 VC6 compatibility: both files are compiled on all platforms. The fix uses only size_t / uint32_t / int and the existing stdint shim; <limits.h> (C89) is explicitly included where needed. No new dependencies. Reachability summary: * Stack BOF via TGA palette_bits: HIGH severity, single header byte, any TGA ingest path. * Heap BOF via BMP dimension wrap: MEDIUM severity on 32-bit, reaches malloc-failure on 64-bit. * Other fixes: UB hardening + defense-in-depth.

The final fix was discovering (by dumping vp8_bmode_tree directly from the libvpx binary) that the tree leaf assignments for LD and VR were swapped between the left and right subtrees: Correct tree (from libvpx binary): Left subtree (node3=0): HE(3), LD(5), RD(6) Right subtree (node3=1): VR(4), VL(7), HD(8), HU(9) Our previous (wrong) tree: Left subtree: HE(3), RD(6), VR(4) ← LD/VR swapped Right subtree: LD(5), VL(7), HD(8), HU(9) ← LD/VR swapped Combined with the exact kf_bmode_prob table extracted from the libvpx binary (which resolved 405 table mismatches from our incorrect permutation), all 16 sub-block modes for MB0 now produce a perfect match with libvpx. The mean diff dropped from 100.2 to 97.4. The remaining error comes from subsequent MBs where context-dependent reads accumulate small differences, but the foundation is now correct.

…issing piece! The file had silently reverted to the 16-bit BD during earlier test iterations, and the 16-bit BD produces different results at boundary-case splits after ~1920 reads. All 12 fixes are now cleanly applied:

Audit of rpng.c (1859 lines; the PNG decoder used for thumbnails, cores' asset bundles, RetroAchievement badges, and every screenshot displayed by the menu). Three integer- overflow fixes plus a regression test. rpng: replace pointer-arithmetic chunk bound check The chunk-size guard in rpng_iterate_image was: if (buf + 8 + chunk_size > rpng->buff_end) return false; where chunk_size is a 32-bit attacker-controlled value from the PNG stream. On 32-bit builds a value near UINT32_MAX wraps the pointer address, the compare is defeated, and the subsequent IDAT handler's memcpy(rpng->idat_buf.data + rpng->idat_buf.size, buf, chunk_size); can read up to ~4 GiB past the end of the input buffer. On 64-bit the pointer has enough headroom that the compare still happens to give the right answer in practice, but the arithmetic is still UB per C99 (pointer values outside the bounds of the object, and not one-past). Fix: compute bytes-remaining as a size_t and compare sizes rather than pointers. The new check also requires all 12 bytes of chunk framing (length + type + CRC) -- pre-patch the pointer compare happened to only enforce 8 bytes' presence, letting a chunk whose declared length consumed all but 1..3 bytes of the input pass (not exploitable on its own because the advance at the end of the function then tripped the next-iteration check, but a real correctness bug that the size compare fixes for free). rpng: cap IDAT accumulation at 256 MiB rpng_realloc_idat grew an internal buffer on each IDAT chunk via: size_t required = buf->size + chunk_size; if (required > buf->capacity) { while (new_cap < required) new_cap *= 2; realloc(...); } On 32-bit size_t the addition could wrap after many chunks accumulated, making the compare falsely false and the memcpy overrun the existing allocation. The doubling loop could similarly overflow new_cap on its own for a single oversized chunk. On 64-bit the arithmetic holds, but there was nothing stopping a hostile PNG from streaming in arbitrarily many IDAT chunks to drive the allocation up. Fix: introduce RPNG_IDAT_MAX = 256 MiB (dwarfs any realistic libretro asset) and reject both the per-chunk addition and the doubling loop if it would exceed that cap. Accumulated total cannot grow past RPNG_IDAT_MAX. rpng: widen buff_data advance to size_t At the end of rpng_iterate_image: rpng->buff_data += chunk_size + 12; chunk_size + 12 is computed in uint32_t (uint32_t + int promotes to uint32_t), so chunk_size near UINT32_MAX wraps. The per-chunk-size overflow guard above now rejects any value large enough to trigger this, but keep the arithmetic explicit in size_t so that a future relaxation of the guard upstream doesn't silently reintroduce the wrap. === Regression test === libretro-common/samples/formats/png/rpng_chunk_overflow_test.c Three subtests exercising the chunk-header guard: 1. chunk_size = 0xFFFFFFF8 must be rejected 2. chunk_size larger than remaining input rejected 3. Valid minimal IHDR chunk iterates cleanly (smoke) Honest note on discriminator status: the headline bug is a pointer-arithmetic overflow only genuinely reachable on 32-bit builds -- on a 64-bit host the pointer has enough headroom that the compare still happens to give the right answer (the arithmetic itself is UB, but no sanitizer flags it in practice). These tests therefore exercise the NEW size_t-based guard and serve as regression protection against anyone reintroducing unguarded pointer arithmetic; they are not pre/post discriminators on a 64-bit CI host. Compiled with -m32 or run on a 32-bit host, subtest #1 becomes a true discriminator. The second fix (RPNG_IDAT_MAX cap) is not directly testable in a CI sample -- exercising the accumulator overflow needs either a truly 32-bit build environment or a >256 MiB crafted PNG, neither of which belong in a unit test. The fix is localised, the cap is explicit, and the code path leading to it is already exercised by the existing rpng encode+decode smoke test. CI: libretro-common samples workflow updated. rpng_chunk_overflow_test added to RUN_TARGETS. Full local dry-run under the GHA shell contract: Built: 17 Ran: 18 Failed: 0 (Ran>Built because the PNG directory's Makefile now builds both the existing `rpng` interactive sample and the new `rpng_chunk_overflow_test` in one invocation; the build count is per-directory.) VC6 compatibility: rpng.c is compiled on all platforms. The fix uses only size_t / uint32_t and standard comparisons; no new headers.

…0,0) is 2.6 mean diff and MB(0,1) dropped from 39.6 to 7.4. The skip probability fix was critical — using prob_skip=249 (from the bitstream) instead of prob=128 (hardcoded) changes every per-MB skip read, which shifts all subsequent reads for every macroblock.

warmenhoven and others added 10 commits April 21, 2026 09:39

Fix HDR on overlays

5f5a0be

Add missing sample

85e060d

pull Bot locked and limited conversation to collaborators Apr 21, 2026

pull Bot added the ⤵️ pull label Apr 21, 2026

pull Bot merged commit 85e060d into Alexandre1er:master Apr 21, 2026
6 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pull] master from libretro:master#948

[pull] master from libretro:master#948
pull[bot] merged 10 commits intoAlexandre1er:masterfrom
libretro:master

pull Bot commented Apr 21, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pull Bot commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pull Bot commented Apr 21, 2026 •

edited

Loading