[pull] master from libretro:master by pull[bot] · Pull Request #935 · Alexandre1er/RetroArch

pull · 2026-04-17T16:03:03Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

win32_common.c mixed window/gfx infrastructure with pure UI concerns (menu bar, file browser, content loading, "Pick Core" dialog). Move 936 lines of UI code to ui_win32.c where it belongs.

Functions moved:
- pick_core_proc, win32_resources_pick_core_dialog, align_dword, append_wstr (dialog construction)
- win32_load_content_from_gui, win32_drag_query_file (content loading)
- win32_browser, g_win32_browser_mode (file browser wrapper)
- win32_menu_loop (WM_COMMAND dispatch for menu bar)
- win32_resources_create_menu (programmatic menu bar)
- menu_id_to_label_enum, menu_id_to_meta_key, win32_meta_key_to_name, win32_localize_menu (menu localisation)

The UI resource ID enum moves to ui_win32.h so both translation units can reference it. Functions called cross-file (win32_drag_query_file, win32_menu_loop, win32_load_content_from_gui, win32_localize_menu) become non-static and are declared in ui_win32.h.

Existing #ifdef guards (HAVE_MENU, HAVE_THREADS, __WINRT__, LEGACY_WIN32) are preserved. No functional change.

win32_common.c: 3403 -> 2413 lines
ui_win32.c: 490 -> 1458 lines

…ast-path threshold

Two independent performance improvements to the async image loader path used by menu thumbnails, wallpapers, and icons.

1) task_image.c: upscale_image() now uses malloc instead of calloc.

The nearest-neighbour scale loop writes every destination pixel before returning: the x_src expansion loop fills the top row of each scale_factor-high block, then memcpy duplicates it into the remaining rows. No pixel is ever read before being written, so the zero-fill that calloc performs is pure waste -- the kernel zeros every cache line and the scale loop then overwrites every cache line, doubling the write traffic through memory.

Measured on x86_64 at -O2 across typical thumbnail sizes (64x64..512x512 sources, 2x..8x scale factors): 37-55% reduction in upscale_image() wall time, consistent across runs. Larger destinations see the biggest win because zero-fill cost scales with output size.

Correctness verified by running the modified loop over a deliberately poisoned (memset 0xCD) destination buffer and confirming byte-identical output to the calloc variant across 11 cases including edge cases (1x1, scale_factor=1, odd dimensions, non-square).

2) task_file_transfer.c: NBIO_SMALL_FILE_THRESHOLD raised from 256 KiB to 1 MiB.

Files under this threshold finish their iterative transfer in a single tick rather than spreading work across several frames. The previous 256 KiB limit was tuned for small config files and low-res thumbnails and left modern box-art PNGs (typically 400-600 KiB at 512x720) in the multi-frame iterative path, which is visibly laggy when scrolling a playlist. A blocking 1 MiB read completes in well under a frame on every supported platform, so the larger threshold does not threaten frame pacing. The comment in the header is updated to record the rationale.

No behavioural change beyond the above; both files compile cleanly with -fsyntax-only against the existing RetroArch headers.
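The poisoned-buffer check described in item 1 can be illustrated with a small sketch. This is not the code from task_image.c: upscale_nn and upscale_ignores_initial_contents are hypothetical names, and the loop only follows the structure the commit message describes (expand one row, then memcpy it into the remaining rows of the block), assuming 32-bit pixels.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the scale loop described above (not the
 * RetroArch source): nearest-neighbour upscale by an integer factor.
 * Every destination pixel is written, so dst may start uninitialised. */
static void upscale_nn(const uint32_t *src, unsigned w, unsigned h,
      unsigned scale, uint32_t *dst)
{
   size_t dst_w = (size_t)w * scale;
   unsigned x, y, s;

   assert(src && dst && scale > 0);

   for (y = 0; y < h; y++)
   {
      uint32_t *top = dst + (size_t)y * scale * dst_w;

      /* Expand one source row into the top row of its block. */
      for (x = 0; x < w; x++)
         for (s = 0; s < scale; s++)
            top[(size_t)x * scale + s] = src[(size_t)y * w + x];

      /* Duplicate that row into the remaining scale-1 rows. */
      for (s = 1; s < scale; s++)
         memcpy(top + (size_t)s * dst_w, top, dst_w * sizeof(*dst));
   }
}

/* Returns 1 if a zeroed and a 0xCD-poisoned destination end up
 * byte-identical, 0 if they differ, -1 on allocation failure. */
static int upscale_ignores_initial_contents(unsigned w, unsigned h,
      unsigned scale)
{
   size_t n_src  = (size_t)w * h;
   size_t n_dst  = n_src * scale * scale;
   uint32_t *src = (uint32_t*)malloc(n_src * sizeof(*src));
   uint32_t *a   = (uint32_t*)calloc(n_dst, sizeof(*a));
   uint32_t *b   = (uint32_t*)malloc(n_dst * sizeof(*b));
   int ok        = -1;
   size_t i;

   if (src && a && b)
   {
      /* Arbitrary but deterministic pixel values. */
      for (i = 0; i < n_src; i++)
         src[i] = (uint32_t)(i * 2654435761u);

      memset(b, 0xCD, n_dst * sizeof(*b)); /* poison */
      upscale_nn(src, w, h, scale, a);
      upscale_nn(src, w, h, scale, b);
      ok = (memcmp(a, b, n_dst * sizeof(*a)) == 0);
   }

   free(src);
   free(a);
   free(b);
   return ok;
}
```

If the two buffers ever differed, some destination byte would have kept its initial contents, which is exactly the case where dropping the calloc zero-fill would be unsafe.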

* video: add display query slots to video_display_server_t and init early

Add get_refresh_rate, get_video_output_size, get_video_output_prev, get_video_output_next, and get_metrics to the display server vtable. These operations are platform concerns (they query the display hardware) rather than driver concerns, but were previously only accessible through per-driver poke/ctx interfaces.

Wire all 5 slots in dispserv_win32.c, delegating to the existing win32_get_refresh_rate, win32_get_video_output_size, win32_get_video_output_prev/next, and win32_get_metrics functions. Other display servers (x11, kms, android, apple, null) get NULL for now.

Add get_display_type to frontend_ctx_driver_t so the platform can report its display type without needing a window. Implement for Win32 (compile-time constant) and Unix (runtime detection via WAYLAND_DISPLAY / DISPLAY environment variables). All other frontends (darwin, uwp, ctr, dos, gx, orbis, ps2, ps3, psp, qnx, switch, wiiu, xdk, xenon, emscripten, null) pass NULL which falls back to RARCH_DISPLAY_NONE.

Move video_display_server_init() to run early - right after frontend_driver_init_first() in rarch_main() - so the display server is available before video_driver_init_internal() computes window dimensions. The existing late init inside video_driver_init_internal() remains as a safety net and will reinit if the display type changed.

This is the infrastructure commit. Follow-up commits can:
- Route video_driver.c dispatch through the display server instead of poke/ctx for these 5 operations
- Remove the identical boilerplate wrappers from d3d10/11/12/gdi
- Use display server queries for max window size instead of DEFAULT_WINDOW_AUTO_WIDTH_MAX

No functional change - existing poke/ctx call paths are untouched.

* Fix for frontend_driver.h - include gfx/video_defines.h
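A rough sketch of the get_display_type idea: an optional vtable slot, an environment-based Unix implementation, and a NULL slot falling back to "none". All names here (sketch_display_type, sketch_frontend_t, classify_display, frontend_display_type) are hypothetical and do not come from the RetroArch headers. Checking WAYLAND_DISPLAY before DISPLAY is an assumption based on the order in which the commit message lists the variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical enum; the real code uses RARCH_DISPLAY_* values. */
enum sketch_display_type
{
   SKETCH_DISPLAY_NONE = 0,
   SKETCH_DISPLAY_X11,
   SKETCH_DISPLAY_WAYLAND
};

/* Hypothetical frontend vtable with one optional slot. */
typedef struct sketch_frontend
{
   enum sketch_display_type (*get_display_type)(void); /* may be NULL */
} sketch_frontend_t;

/* Pure helper so the decision can be exercised without touching the
 * process environment. Empty strings count as "not set". */
static enum sketch_display_type classify_display(const char *wayland,
      const char *x11)
{
   if (wayland && *wayland)
      return SKETCH_DISPLAY_WAYLAND;
   if (x11 && *x11)
      return SKETCH_DISPLAY_X11;
   return SKETCH_DISPLAY_NONE;
}

/* Runtime detection; no window or display connection is needed. */
static enum sketch_display_type unix_get_display_type(void)
{
   return classify_display(getenv("WAYLAND_DISPLAY"), getenv("DISPLAY"));
}

/* Dispatcher: a frontend that leaves the slot NULL reports "none". */
static enum sketch_display_type frontend_display_type(
      const sketch_frontend_t *fe)
{
   enum sketch_display_type t;

   if (!fe || !fe->get_display_type)
      return SKETCH_DISPLAY_NONE;

   t = fe->get_display_type();
   assert(t <= SKETCH_DISPLAY_WAYLAND);
   return t;
}
```

Keeping the NULL check in the dispatcher means frontends that cannot answer the question need no stub function, which matches the "pass NULL" approach described for the other platforms.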

PNG decode time for RGBA images is dominated by the per-scanline reverse filter, which walks each byte of the row with a serial recurrence decoded[i] = raw[i] + f(decoded[i-bpp], prev[i], prev[i-bpp]). The scalar loop stalls the pipeline on that dependency and, for PAETH, runs two unpredictable branches per byte.

For RGBA the recurrence distance is exactly one pixel (4 bytes), so we can process a pixel's 4 channels in parallel inside one SIMD register while still respecting the pixel-to-pixel chain. This loses the per-byte branch and dependency chain completely.

Adds three helpers under the existing RPNG_SIMD_SSE2 / RPNG_SIMD_NEON gates:

  rpng_filter_sub_rgba    - SUB, bpp==4
  rpng_filter_avg_rgba    - AVERAGE, bpp==4
  rpng_filter_paeth_rgba  - PAETH, bpp==4, branch-free predictor

PAETH uses the standard libpng-style branch-free selection via max(x, -x) for 16-bit abs and cmpgt/and/andnot/or blend for the three-way pick. All arithmetic is in 16-bit lanes to keep the wrap-around semantics of PNG's mod-256 filter.

rpng_reverse_filter_copy_line dispatches to these when pngp->bpp is 4 and SSE2/NEON is available; for other bpp or non-SIMD builds the scalar paths are unchanged.

Correctness: 1805 randomised tests passed against the scalar reference (20 widths from 1 to 1920 pixels × 30 seeds × 3 filters + all-zero / all-0xFF edge cases + three deliberately misaligned input offsets exercising the memcpy load path). Output is byte-identical.

Measured on x86-64, -O2, per-scanline wall time:

             SUB     AVERAGE   PAETH
    64 px    1.87x   2.61x     1.71x
   128 px    3.33x   1.89x     1.76x
   256 px    3.60x   2.20x     1.75x
   512 px    3.47x   1.87x     1.72x
  1024 px    4.14x   2.00x     1.68x
  1920 px    3.71x   2.03x     2.14x

SUB benefits most because the scalar version is pure sequential adds with no ILP; the SIMD version is just an add-and-chain. AVERAGE and PAETH have more per-iteration work so the fraction gained is smaller, but both still nearly double.

Loads and stores use memcpy into an aligned temporary rather than casting through (int32_t*), because the scanline buffer is not guaranteed to be 4-byte aligned at the start of every filter step. The memcpy compiles to a single movd at -O2.

Build-gating follows the existing rpng_filter_up pattern. No new public symbols.

NEON path compiles but has not been tested on ARM hardware in this change; structural analog of the SSE2 path.

No behavioural change for bpp != 4 or for non-SSE2/NEON builds.
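The branch-free PAETH selection can be modelled in portable scalar C. This is only a sketch of the idea, not the SSE2/NEON code from the commit: paeth_ref follows the predictor as defined in the PNG specification, while paeth_branchless, paeth_unfilter_rgba and the two check helpers are hypothetical names. Masks built from comparisons stand in for the cmpgt/and/andnot/or blend, and the per-pixel loop shows that the four channels are independent while pixels stay chained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference predictor, as defined in the PNG specification. */
static uint8_t paeth_ref(uint8_t a, uint8_t b, uint8_t c)
{
   int p  = (int)a + b - c;
   int pa = p > a ? p - a : a - p;
   int pb = p > b ? p - b : b - p;
   int pc = p > c ? p - c : c - p;
   if (pa <= pb && pa <= pc) return a;
   if (pb <= pc)             return b;
   return c;
}

/* abs via max(x, -x), the scalar analogue of the 16-bit SIMD idiom. */
static int abs_max(int x) { int n = -x; return x > n ? x : n; }

/* Comparisons become all-ones / all-zero masks that are combined
 * with and / and-not / or, like a SIMD blend. */
static uint8_t paeth_branchless(uint8_t a, uint8_t b, uint8_t c)
{
   int pa         = abs_max((int)b - c);         /* |p - a| */
   int pb         = abs_max((int)a - c);         /* |p - b| */
   int pc         = abs_max((int)a + b - 2 * c); /* |p - c| */
   unsigned use_a = 0u - (unsigned)((pa <= pb) & (pa <= pc));
   unsigned use_b = 0u - (unsigned)(pb <= pc);
   unsigned bc    = (b & use_b) | (c & ~use_b);
   return (uint8_t)((a & use_a) | (bc & ~use_a));
}

typedef uint8_t (*paeth_fn)(uint8_t, uint8_t, uint8_t);

/* Reverse PAETH for one RGBA scanline: the 4 channels of a pixel do
 * not depend on each other, only on the previous pixel. */
static void paeth_unfilter_rgba(uint8_t *dec, const uint8_t *raw,
      const uint8_t *prev, size_t width_px, paeth_fn predict)
{
   uint8_t a[4] = {0, 0, 0, 0}; /* decoded pixel to the left   */
   uint8_t c[4] = {0, 0, 0, 0}; /* prior-row pixel to the left */
   size_t x;
   int ch;

   for (x = 0; x < width_px; x++)
   {
      const uint8_t *b = prev + 4 * x;
      for (ch = 0; ch < 4; ch++)
      {
         /* uint8_t cast gives the mod-256 wrap-around. */
         a[ch] = (uint8_t)(raw[4 * x + ch] + predict(a[ch], b[ch], c[ch]));
         dec[4 * x + ch] = a[ch];
      }
      memcpy(c, b, 4);
   }
}

/* Returns 1 if both predictors agree on every (a, b, c) triple. */
static int paeth_predictors_agree(void)
{
   unsigned a, b, c;
   for (a = 0; a < 256; a++)
      for (b = 0; b < 256; b++)
         for (c = 0; c < 256; c++)
            if (paeth_ref((uint8_t)a, (uint8_t)b, (uint8_t)c)
                  != paeth_branchless((uint8_t)a, (uint8_t)b, (uint8_t)c))
               return 0;
   return 1;
}

/* Returns 1 if both variants decode a pseudo-random line identically. */
static int paeth_lines_agree(size_t width_px, uint32_t seed)
{
   uint8_t raw[4 * 64], prev[4 * 64], d0[4 * 64], d1[4 * 64];
   size_t i, n = 4 * width_px;

   assert(width_px <= 64);
   for (i = 0; i < n; i++)
   {
      seed    = seed * 1664525u + 1013904223u; /* simple LCG */
      raw[i]  = (uint8_t)(seed >> 24);
      seed    = seed * 1664525u + 1013904223u;
      prev[i] = (uint8_t)(seed >> 24);
   }
   paeth_unfilter_rgba(d0, raw, prev, width_px, paeth_ref);
   paeth_unfilter_rgba(d1, raw, prev, width_px, paeth_branchless);
   return memcmp(d0, d1, n) == 0;
}
```

Because the predictor only has 2^24 possible inputs, the exhaustive comparison is cheap, and it gives a stronger guarantee for the scalar model than randomised scanlines alone.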

LibretroAdmin and others added 7 commits April 17, 2026 12:48

(Apple/MoltenVK) Fixes regression on Apple systems. Note: this is a workaround/hack since we don't have a proper solution

36eda89

Fix HAVE_MENU and also add a github actions regression test now for --disable-menu

600d279

Change description

60a64d0

pull bot locked and limited conversation to collaborators Apr 17, 2026

pull bot added the ⤵️ pull label Apr 17, 2026

pull bot merged commit 3464ffe into Alexandre1er:master Apr 17, 2026
17 of 36 checks passed

pull[bot] merged 7 commits into Alexandre1er:master from libretro:master
