Skip to content

[pull] master from libretro:master#932

Merged
pull[bot] merged 29 commits intoAlexandre1er:masterfrom
libretro:master
Apr 16, 2026
Merged

[pull] master from libretro:master#932
pull[bot] merged 29 commits intoAlexandre1er:masterfrom
libretro:master

Conversation

@pull
Copy link
Copy Markdown

@pull pull bot commented Apr 16, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

LibretroAdmin and others added 29 commits April 16, 2026 13:24
Primary fix: idle barrier before driver deinit
==============================================

When the video driver is torn down (driver reinit from fullscreen
toggle, driver switch, or core launch, or full shutdown), the
deinit sequence in driver_uninit() runs on the main thread:

  1) gfx_widgets_deinit()        -- frees widget GPU resources
  2) menu_driver_ctl(DEINIT)     -- calls xmb_context_destroy /
                                    ozone_context_destroy /
                                    materialui_context_destroy,
                                    which free textures and fonts
  3) video_driver_free_internal()-- finally stops the video thread
                                    via sthread_join

Between steps 1 and 3, the video thread is still alive and may
be processing a frame that was queued before the deinit began.
That in-flight frame can reference textures and fonts that step 1
or 2 has just freed, producing use-after-free crashes -- most
visibly on D3D12 and Vulkan where the GPU driver validates
texture handles.

A comment at step 1 already acknowledges the issue ("This
absolutely has to be done before video_driver_free_internal()")
but relies on ordering rather than synchronisation.  Prior
commits in this series added context_generation guards to each
menu driver as defensive mitigation (2ac0afa), but those only
cover the menu's own draw calls, not widgets or the video
thread's internal state.

Fix: add a true barrier in the threaded video wrapper that waits
for the video thread to finish any pending frame and become idle
(blocked on cond_thread waiting for the next command).  The
barrier uses the existing thr->lock mutex and thr->cond_cmd
signal -- the frame processing loop already signals cond_cmd
after clearing thr->frame.updated (line 509 of the frame loop),
so the barrier just waits for the frame.updated flag to clear.
Secondary fix: harden wrapper API against reinit window misuse
==============================================================

A comment in video_driver_texture_load() acknowledges a latent
misuse window: "During reinit, video_st->threaded may already
reflect the new setting while video_st->data still points to
the real driver, not thread_video_t."  That site was previously
fixed by requiring both VIDEO_DRIVER_IS_THREADED_INTERNAL and
VIDEO_FLAG_THREAD_WRAPPER_ACTIVE before taking the threaded
path.  But the underlying wrapper functions themselves still
unconditionally cast video_st->data to thread_video_t*, making
them unsafe if called from any other path that only checks the
configuration flag.

The three public wrapper functions that read video_st->data
are now self-safe: each checks VIDEO_FLAG_THREAD_WRAPPER_ACTIVE
as the first step before casting.

  - video_thread_wait_idle():    returns (no-op, matches intent)
  - video_thread_font_init():    returns false (no-op)
  - video_thread_texture_handle(): falls back to func(data)
    on the caller's thread -- matches the existing "already
    on video thread" branch semantics, preserving texture
    loads during any reinit-window call.

For video_thread_font_init specifically, this closes a
pre-existing latent bug: its sole caller in font_driver.c
guards on `is_threaded` (the configuration flag) only, not on
VIDEO_FLAG_THREAD_WRAPPER_ACTIVE.  During a reinit transition
where those two disagree, the function would previously cast
a raw driver handle to thread_video_t* and access thr->lock,
producing undefined behaviour.

Changes
=======

- gfx/video_thread_wrapper.c:
  * Add video_thread_wait_idle() -- the barrier.  Guarded on
    VIDEO_FLAG_THREAD_WRAPPER_ACTIVE, then checks thr, then
    checks for self-deadlock (called from video thread), then
   waits on cond_cmd while frame.updated is true.
  * Harden video_thread_font_init with the flag check.
  * Harden video_thread_texture_handle with the flag check,
    falling back to direct func invocation.

- gfx/video_thread_wrapper.h: export video_thread_wait_idle().

- retroarch.c: call video_thread_wait_idle() at the top of
  driver_uninit(), before gfx_widgets_deinit and menu_driver_ctl
  (DEINIT).  Guarded by HAVE_THREADS, DRIVERS_VIDEO_INPUT flag,
  VIDEO_DRIVER_IS_THREADED_INTERNAL, and
  VIDEO_FLAG_THREAD_WRAPPER_ACTIVE to be a no-op when not
  applicable.

Cost
====

The barrier adds one lock acquire + one potential cond_wait per
driver reinit.  Common case (no pending frame) returns
immediately.  Worst case blocks for up to one frame duration
(~16ms) during a driver reinit which already pauses rendering.

The hardening adds one flag read per call to the three public
wrapper functions.  No measurable overhead.

Compatibility
=============

Non-threaded video: all new code paths are short-circuited by
the flag checks.  Functionally invisible.

Builds without HAVE_THREADS: the entire barrier call site in
retroarch.c is #ifdef-guarded, and the wrapper file is not
compiled.  Zero footprint.

Defense in depth: the context_generation guards from 2ac0afa
remain.  With this barrier in place, they should never fire
in practice during normal driver reinit.

No behavioural change during normal operation.  Eliminates the
TOCTOU window between menu GPU resource freeing and video
thread shutdown, and closes a pre-existing reinit-window bug
in video_thread_font_init.
…ideo active

The D3D12 driver was partially thread-unsafe when used with the
threaded video wrapper.  Unlike gl2/gl3/d3d9 which route texture
operations through video_thread_texture_handle so that all GPU
work happens on the video thread, D3D12 relied on D3D12's
inherent free-threaded device API and executed load/unload on
the calling (main) thread.  This created several races when
threaded video was active:

1. texture->dirty flag
   Written on main thread (after Map/Unmap in d3d12_update_texture).
   Read and cleared on video thread (during d3d12_gfx_frame and
   d3d12_upload_texture).  No synchronisation.  On weak-memory-
   order CPUs the video thread could observe a stale `false` flag
   and skip the upload, producing missing textures.

2. Upload buffer data visibility
   Pixel data written via Map/memcpy on main thread.  Consumed
   via CopyTextureRegion on video thread during command list
   recording.  No CPU-side memory barrier ensures the main
   thread's writes are visible to the video thread's reads.

3. Release timing vs command list recording
   d3d12_gfx_unload_texture has a GPU fence wait that ensures
   submitted work has completed, but does not serialise against
   the video thread CURRENTLY building a command list that
   references the texture being released.  In D3D12, command
   lists do not hold references to resources they record, so
   releasing a texture while its use is being recorded can
   leave the recorded command list with a dangling pointer.

4. fenceValue counter race
   d3d12->queue.fenceValue is incremented via non-atomic ++
   from both the main thread (unload path at line 6273) and
   the video thread (frame submission, font free, shader
   reset paths).  Concurrent increments can skip or duplicate
   fence values, breaking synchronisation.

Fix: adopt the gl2/d3d9 pattern.  When threaded video is active,
dispatch both load_texture and unload_texture to the video
thread via video_thread_texture_handle, so all D3D12 command
queue operations are serialised on the queue-owning thread.
This eliminates all four races in one change: shared state is
no longer shared across threads.

Implementation:

- Split d3d12_gfx_load_texture into an internal function
  (d3d12_gfx_load_texture_internal) that does the actual GPU
  work, and an outer function that dispatches.  Same for unload.

- New d3d12_texture_cmd_t packages the d3d12 pointer, image,
  filter_type, and handle fields.  Used for both load (image
  is input, handle is output) and unload (handle is input).

- Return the uintptr_t handle via cmd->handle rather than the
  int return channel of custom_command_method_t, avoiding
  truncation on 64-bit Windows where uintptr_t is 64 bits but
  int is 32 bits.

- The outer d3d12_gfx_load_texture / d3d12_gfx_unload_texture
  check the `threaded` parameter.  By the time it reaches the
driver, `threaded` has already been validated against both
  VIDEO_DRIVER_IS_THREADED_INTERNAL and
  VIDEO_FLAG_THREAD_WRAPPER_ACTIVE by video_driver_texture_load
  and video_driver_texture_unload.  If threaded, dispatch via
  video_thread_texture_handle.  Otherwise, call the internal
  function directly on the current thread (single-threaded
  path unchanged).

- video_thread_texture_handle already handles the self-call
  case (invoked from video thread) by calling the wrap function
  directly, avoiding self-deadlock.  This means d3d12's internal
  paths that run on the video thread (shader reset, font init)
  can call d3d12_gfx_load_texture with threaded=true safely.

Cost: texture load now blocks for up to one frame duration
(~16ms) waiting for the video thread to process the dispatched
command.  Texture loads happen during menu navigation and
playlist icon population, not in frame-critical paths, so the
latency is imperceptible.

Depends on: the video_thread_wait_idle barrier commit
(8d55232), which also hardens video_thread_texture_handle to
fall back to direct invocation when the wrapper is inactive.
That defense-in-depth means this patch is safe even against
future callers that don't follow the full guard pattern.

No behavioural change on non-threaded video (`threaded == false`
path unchanged).  No behavioural change on HW-context cores
(threaded video is force-disabled for those via
VIDEO_DRIVER_IS_THREADED_INTERNAL).

Not addressed by this patch: d3d12_font_free still runs on the
main thread during mid-session font scale changes.  XMB's
pending_context_reset counter already mitigates this by
deferring the context reset until the GPU has drained, so it's
not a new issue.  Addressing the font free path would require
a separate refactor outside this patch's scope.
… active

The Vulkan driver is already well-synchronised for most
operations -- queue_lock protects all vkQueueSubmit /
vkQueuePresent / vkQueueWaitIdle calls, and vulkan_load_texture
uploads textures synchronously under that lock so the texture
is fully resident on the GPU by the time the function returns.

But vulkan_unload_texture has a residual race that the existing
TODO comment at line 7307 already acknowledged:

  /* TODO: We really want to defer this deletion instead,
   * but this will do for now. */

The current sequence is:

  1. Lock queue_lock
  2. vkQueueWaitIdle (waits for SUBMITTED work)
  3. Unlock queue_lock
  4. vulkan_destroy_texture (NOT under the lock)

Step 2 only drains commands that have already been submitted
to the queue.  It does not account for the video thread
currently RECORDING a command buffer that references the
texture -- that command buffer has not yet been submitted, so
vkQueueWaitIdle doesn't see it.

The sequence that triggers the race:

  T0: Video thread is recording commands in vulkan_frame that
      reference texture X (uses X->view in a descriptor, or
      X->image in a barrier).  Not yet submitted.
  T1: Main thread enters vulkan_unload_texture for X.
  T2: Main thread locks queue_lock, calls vkQueueWaitIdle
      (returns immediately -- nothing to wait for).
  T3: Main thread unlocks queue_lock.
  T4: Main thread calls vulkan_destroy_texture -- X's image,
      view, and memory are freed.
  T5: Video thread finishes recording, acquires queue_lock,
      calls vkQueueSubmit.  The command buffer references
      freed Vulkan handles -- undefined behaviour, typically
      a validation-layer error or GPU hang.

Fix: follow the same pattern just applied to D3D12 (and long
established in gl2/gl3/d3d9cg).  When threaded video is active,
dispatch the unload via video_thread_texture_handle so the
destruction runs on the video thread, serialised with command
buffer recording.  The video thread can only run one
vulkan_frame or one custom_command at a time, so by the time
the unload wrap function executes, any in-progress recording
has completed and the subsequent vkQueueWaitIdle drains any
submitted work before destruction.

Implementation:

- Split vulkan_unload_texture into an internal function
  (vulkan_unload_texture_internal) that does the actual work,
  and an outer function that dispatches.

- New vulkan_texture_cmd_t packages the vk_t pointer and
  texture handle.  (Smaller than d3d12_texture_cmd_t because
  Vulkan unload has no filter_type and returns no handle.)

- vulkan_load_texture is NOT changed: unlike D3D12 which
  uses deferred upload (dirty flag + per-frame CopyTextureRegion),
  Vulkan uploads synchronously inside vulkan_create_texture
  under queue_lock, so the texture is fully live before the
  function returns.  No cross-thread races on the load path.

Cost: texture unload now blocks briefly while the video thread
processes the dispatched command.  In practice this is much
less than the existing vkQueueWaitIdle stall -- the video
thread usually completes its current operation in microseconds,
whereas vkQueueWaitIdle can stall for milliseconds waiting for
GPU queue drain.  So this patch is approximately a performance
neutral or positive change.

Depends on: 8d55232 (VIDEO/THREAD: barrier + wrapper hardening),
which hardens video_thread_texture_handle to be safe when the
wrapper is inactive.

The TODO comment is retained in the internal function, as the
deferred-deletion pattern it describes would still be a valid
future optimisation -- it would avoid the per-unload
vkQueueWaitIdle stall by piggybacking on the natural frame
fence.  This patch eliminates the crash, not the stall.

No behavioural change on non-threaded video (the `threaded ==
false` path calls the internal function directly, same as the
pre-patch code).  No behavioural change on HW-context cores
(threaded video is disabled for those by
VIDEO_DRIVER_IS_THREADED_INTERNAL).
…ideo active

Follow-up to the D3D11 thread-safety patch.  D3D10 has the same
texture load/unload pattern as D3D11 (immediate CopySubresourceRegion
during load, direct Release during unload) and therefore the
same race classes when used with the threaded video wrapper.

D3D10 is slightly less severe than D3D11 in one respect: the
ID3D10Device is thread-safe by default (we do not set
D3D10_CREATE_DEVICE_SINGLETHREADED at creation), so concurrent
Device method calls from the main thread and video thread are
serialised by an internal D3D10 mutex.  This prevents outright
undefined behaviour on the context methods themselves.

However, the second race remains and is not mitigated by the
internal mutex:

Resource Release vs pending GPU work
------------------------------------
d3d10_update_texture during load_texture queues an asynchronous
CopySubresourceRegion on the Device.  d3d10_gfx_unload_texture
calls Release with no fence or wait -- if the GPU hasn't
finished processing pending commands that reference the
texture, the resource can be freed while still in use.

Fix: mirror the D3D11 patch.  When threaded video is active,
dispatch both load and unload to the video thread via
video_thread_texture_handle so all texture operations are
serialised with the video thread's draws.  This also
consolidates Device context access to a single thread even
though the internal mutex already makes it correct -- reducing
contention on that mutex.

Implementation is structurally identical to the D3D11 patch:

- d3d10_gfx_load_texture_internal / d3d10_gfx_unload_texture_internal
  contain the original logic.
- Outer functions dispatch via video_thread_texture_handle when
  threaded is true, otherwise call the internal function directly.
- d3d10_texture_cmd_t packages parameters and carries the
  handle field for both input (unload) and output (load).

Depends on: 8d55232 (VIDEO/THREAD: barrier + wrapper hardening)
and the D3D11 thread-safety patch (applied immediately before
for structural consistency, though technically independent).

No behavioural change on non-threaded video.
…ideo active

The D3D11 driver had two threading issues when used with the
threaded video wrapper.  Neither was specific to D3D11 as an API
-- they mirror the issues already fixed in D3D12 and Vulkan --
but D3D11 is actually the most severe case because its
ImmediateContext is explicitly documented as not thread-safe.

Race 1: ImmediateContext concurrent access
------------------------------------------
ID3D11DeviceContext (the immediate context used for rendering)
is NOT thread-safe -- Microsoft's documentation explicitly says
concurrent method calls produce undefined behaviour.  RetroArch
holds a single d3d11->context pointer used for all rendering.

When threaded video is active:
  - The video thread calls d3d11_gfx_frame which issues draw
    commands via d3d11->context (OMSet*, VSSet*, PSSet*,
    Draw, DrawIndexed, etc.).
  - The main thread concurrently calls d3d11_gfx_load_texture,
    which reaches d3d11_update_texture and calls context
    methods: Map, Unmap, CopySubresourceRegion, GenerateMips.

These two code paths race on the same ImmediateContext.  On
current D3D11 runtimes this typically does not crash outright
because the runtime has some internal locking for validation,
but correct behaviour is not guaranteed and we're relying on
an implementation detail.

Race 2: Release while GPU has pending references
------------------------------------------------
d3d11_gfx_unload_texture calls Release on the COM objects.
COM refcounting is thread-safe but does not block on GPU
completion.  If the ImmediateContext has queued a draw that
references the texture (and the GPU hasn't processed it yet),
the Release can free the resource before the GPU is done.

Fix: follow the pattern established in D3D12 (a2d7b48) and
Vulkan (17332fe) -- dispatch load and unload to the video
thread via video_thread_texture_handle when threaded video is
active.  This serialises all ImmediateContext access on the
video thread, and ensures Release runs only after any pending
draw that might reference the texture has been queued (the
video thread cannot be mid-recording when it handles the
dispatched command).

Implementation:

- Split d3d11_gfx_load_texture into internal / outer functions.
  Same for unload.

- New d3d11_texture_cmd_t packages the parameters, with handle
  used for both input (unload) and output (load).

- Return handle via cmd->handle to avoid truncation of 64-bit
  uintptr_t through custom_command_method_t's int return.

- Non-threaded path is unchanged -- calls the internal function
  directly.

Depends on: 8d55232 (VIDEO/THREAD: barrier + wrapper hardening).

No behavioural change on non-threaded video.  No behavioural
change on HW-context cores (threaded video is disabled for
those by VIDEO_DRIVER_IS_THREADED_INTERNAL).
(features_cpu) fix __cpuidex guard — requires VS2010 (_MSC_VER >= 1600), not 1500
The bundled DirectX headers in gfx/include/dxsdk/ are modern Windows
8.x/10 SDK headers and use SAL annotations throughout their function
prototypes. These annotations come in two generations:

  * SAL 1 (double-underscore): __in, __out, __inout, __in_ecount,
    __out_bcount, __deref_out, etc.
  * SAL 2 (leading underscore + trailing underscore): _In_, _Out_,
    _Inout_, _In_reads_(s), _Out_writes_bytes_(s), _COM_Outptr_,
    _Always_(x), _Outptr_opt_result_maybenull_, etc.

Older Microsoft toolchains ship incomplete or nonexistent sal.h:

  * MSVC 2003 (_MSC_VER 1310): no sal.h at all. Both SAL 1 and SAL 2
    macros are undefined.
  * MSVC 2005 (_MSC_VER 1400) and MSVC 2008 RTM (_MSC_VER 1500): sal.h
    has SAL 1 only.
  * MSVC 2008 SP1 and MSVC 2010 (_MSC_VER 1600): sal.h ships a partial
    SAL 2 set (_In_, _Out_, _Inout_, etc.) but is missing the
    buffer-size, COM-outptr, and wrapper families.
  * MSVC 2012+ (_MSC_VER >= 1700): full SAL 2 in sal.h.

Without definitions for these macros, the preprocessor treats them as
unknown identifiers and every function declaration that uses one fails
to parse, producing cascades like:

  error C2143: syntax error : missing ')' before 'type'
  error C2081: '_COM_Outptr_' : name in formal parameter list illegal

Add gfx/include/dxsdk/dxsdk_sal_compat.h which stubs out the missing
SAL 1 and SAL 2 macros to empty (or to a pass-through for wrapper forms
like _Always_(x) / _When_(c,x)) so declarations using them parse as
plain C. The shim is structured in four tiers:

  * Skip the whole thing on MSVC 2012+ via _MSC_VER < 1700 gate.
  * Skip #include <sal.h> on MSVC 2003, where the header does not
    exist, via a _MSC_VER >= 1400 gate.
  * Stub the SAL 1 family, guarded per-macro with #ifndef so MSVC 2005
    through 2010 (which already have them) leave them untouched and
    MSVC 2003 gets working definitions.
  * Stub the SAL 2 family the same way, filling in whatever the
    toolchain is missing without redefinition warnings.

Every macro is wrapped in #ifndef to avoid C4005 redefinition warnings
on toolchains that ship a partial SAL set.

Include the shim from the three bundled headers that tripped MSVC 2008
in griffin builds:

  * gfx/include/dxsdk/d3dcommon.h
  * gfx/include/dxsdk/d3d11shader.h
  * gfx/include/dxsdk/d3dcompiler.h

The shim's outer include guard makes repeat inclusion free, so if
additional DX headers surface the same errors in future build
configurations they can be added with a one-line #include.

This is a build-only fix. The stubs discard static-analysis
information, which older MSVC versions could not consume anyway.
No behavior change on any supported toolchain.
* Some unused variable warning fixes
The struct's address is stored in the caller's draw struct and
read after gfx_display_draw_bg returns, so the storage must
outlive the call. Making it a plain stack local produced a
dangling pointer, breaking wallpaper and pipeline background
rendering in xmb_frame.

The original thread-safety concern from 0abe6dc stands, but the
correct fix is for callers to own the video_coords struct and
pass it in — a larger refactor. Restoring static as an interim
fix.
…deo active

The D3D9 Cg driver already routes texture LOAD through the video
thread via video_thread_texture_handle -- see d3d9_cg_load_texture
and its d3d9_cg_video_texture_load_wrap_d3d helper.  But
d3d9_cg_unload_texture called IDirect3DTexture9_Release directly
on the caller's thread.

This creates an asymmetric race: a texture that was created on
the video thread (so the driver knew to allocate it from the
thread's execution context) gets released on the main thread
while the video thread may still be recording draw calls that
reference it.  IDirect3DTexture9_Release is refcount-safe for
the COM object itself, but if the D3D9 runtime or driver has
queued draws that reference the texture's underlying GPU
resource, freeing the refcount early can produce GPU-side
use-after-free.

This is the same class of race already fixed in D3D12 (a2d7b48),
Vulkan (17332fe), D3D10 (a88f107), and D3D11 (61b0a1a).

Fix: add a parallel d3d9_cg_video_texture_unload_wrap_d3d
function and dispatch through video_thread_texture_handle when
threaded video is active, matching the load path's pattern.
The handle itself is passed as the cmd data (cast to void*)
since no additional context is needed for Release.

Non-threaded path unchanged.  No behavioural change on
HW-context cores (threaded video is disabled for those by
VIDEO_DRIVER_IS_THREADED_INTERNAL).
…video active

Mirrors the fix for the D3D9 Cg driver.  d3d9_hlsl_load_texture
already routes through video_thread_texture_handle via
d3d9_hlsl_video_texture_load_wrap_d3d, but
d3d9_hlsl_unload_texture called IDirect3DTexture9_Release
directly on the caller's thread.

Under threaded video, the video thread runs d3d9_hlsl_gfx_frame
and may have draw calls pending that reference the texture.
Releasing the texture's refcount from the main thread while the
video thread is using it can cause the D3D9 runtime to free
the underlying GPU resource before the pending draws complete.

Fix: add d3d9_hlsl_video_texture_unload_wrap_d3d and dispatch
through video_thread_texture_handle when threaded video is
active, matching the existing load pattern.

Same class of race already addressed in D3D12, Vulkan, D3D10,
D3D11, and the D3D9 Cg driver.

No behavioural change on non-threaded video.
…ctive

Mirrors the fix for the D3D9 drivers.  d3d8_load_texture already
routes through video_thread_texture_handle via
d3d8_video_texture_load_wrap_d3d, but d3d8_unload_texture called
IDirect3DTexture8_Release directly on the caller's thread.

Under threaded video, the video thread runs d3d8_frame and may
have draw calls pending that reference the texture.  Releasing
the texture's refcount from the main thread while the video
thread is using it can cause the D3D8 runtime to free the
underlying resource before the pending draws complete.

Fix: add d3d8_video_texture_unload_wrap_d3d and dispatch through
video_thread_texture_handle when threaded video is active,
matching the existing load pattern.

Same class of race already addressed in D3D12, Vulkan, D3D10,
D3D11, D3D9 Cg, and D3D9 HLSL.  With this commit, every
D3D-family driver in the codebase serialises texture load and
unload on the video thread under threaded video.

No behavioural change on non-threaded video.
The video_coords struct filled by gfx_display_draw_bg is stored in
the caller's draw struct and read after the function returns, so it
must outlive the call. Previously this was a function-scope static,
which worked but was shared mutable state — racy under threaded
video when multiple draws overlap.

Make coords a parameter, and have each caller declare it as a stack
local alongside its existing gfx_display_ctx_draw_t draw. Per-frame,
per-caller ownership: no dangling pointer, no shared state.

Resolves the thread-safety concern from 0abe6dc without the
use-after-return bug that 904c384 reverted.
The Windows 7.x Platform SDK (shipped with MSVC 2010 and earlier) does
not include dxguid.lib. Requiring it forces developers to install the
legacy June 2010 DirectX SDK just to link, which is painful on modern
systems because of the S1023 installer bug (VC++ 2010 runtime
conflict).

Instead, emit the required COM GUID storage ourselves via
<initguid.h>. Each GUID must be defined in exactly one translation
unit across the program:

 * griffin.c picks it up for the griffin unity build (every
   RetroArch-msvc*.vcxproj project) via #include <initguid.h>
   inserted after the license header.
 * gfx/common/dx_guids.c is a new dedicated TU for the non-griffin
   build (Makefile.common: MinGW, command-line MSVC, cross-compile).

Both paths are guarded against UWP/WinRT (WINAPI_FAMILY !=
DESKTOP_APP) and Xbox (_XBOX) targets, where either dxguid.lib ships
with the modern Windows SDK or GUIDs come from a console SDK.

Changes:

 * gfx/common/dx_guids.c: new file, emits GUID storage for the
   non-griffin build path. Guarded against Xbox, UWP, and griffin
   so it compiles to an empty TU on those targets. Includes the
   DX headers in newest-to-oldest order (D3D12, D3D11, D3D10, D3D9,
   D3D8) so that d3d11.h's transitive include of d3d10_1.h lands
   before d3d10.h is seen, satisfying the SAL-annotation ordering
   check inside d3d10_1.h.

 * griffin/griffin.c: add gated #include <initguid.h> after the
   license block so griffin unity builds emit GUIDs before the
   first DirectX header is pulled in (d3d*.c includes start around
   line 432, dinput at 701, xaudio at 889).

 * Makefile.common: add gfx/common/dx_guids.o to the Win32 OBJ
   list; remove the now-dead HAVE_DX_COMMON bookkeeping (four
   assignments from the HAVE_DSOUND, HAVE_DINPUT, HAVE_D3D9, and
   HAVE_D3D8 blocks, plus the conditional that appended -ldxguid
   to LIBS).

 * audio/drivers/dsound.c: remove #pragma comment(lib, "dxguid").
   This MSVC-specific directive embeds a library dependency in
   the .obj file and would silently re-introduce dxguid.lib at
   link time even after it was stripped from the project files.

 * pkg/msvc/**/*.vcxproj, pkg/msvc/**/*.vcproj: remove dxguid.lib
   from AdditionalDependencies in every desktop MSVC project from
   2003 through 2022. The Xbox 360 and Xbox One project files are
   left untouched — their console SDKs handle GUIDs separately.
Commit e8191ce ("Add this") routes griffin.c's d3dcompiler_common.c
inclusion through an HAVE_D3D9+HAVE_HLSL arm and redirects
dxgi_common.h to the bundled <dxsdk/dxgi1_6.h>. On legacy MSVC
toolchains (2003-2010) this surfaces a new SAL-annotation parse
cascade: dxgi1_2.h through dxgi1_6.h reference SAL 2 macros that
those toolchains do not know
…family

The existing dxsdk_sal_compat.h was gated on `_MSC_VER < 1700`, which
made the entire shim body invisible to GCC/MinGW toolchains. This
worked because mingw-w64's <sal.h> historically provided the SAL
macros the bundled DX headers reference -- but not all mingw-w64
releases are complete. MSYS2's current mingw-w64 sal.h omits the
legacy SAL 1 form `_Inout_opt_bytecount_`, which the bundled d3d11.h
references, so the MXE/MinGW cross-build of gfx/common/dx_guids.c
fails to parse d3d11.h.

Fix the gate and extend macro coverage:

 * Replace the `_MSC_VER < 1700` gate with a portable form that pulls
   in <sal.h> via `__has_include` on non-MSVC toolchains. The shim
   body then runs on every toolchain, and the per-macro `#ifndef`
   guards prevent redefinition wherever the toolchain already
   provides the macro. Net effect on MSVC 2012+ and on complete
   mingw-w64 releases: no-op. On MSVC 2003-2010 and incomplete
   mingw-w64 releases: missing macros are stubbed.

 * Add the SAL 1 underscore-prefix bytecount/ecount family
   (`_In_bytecount_[_opt_]`, `_Out_bytecount_[_opt_]`,
   `_Inout_bytecount_[_opt_]`, `_Inout_opt_bytecount_`, and the
   four `_ecount_[_opt_]` equivalents). `_Inout_opt_bytecount_`
   is the one actually observed in the MXE/MSYS2 build log; the
   siblings are added together because they commonly surface in
   the same parse cascade once any one of them fails.

 * Update the file comment to document MinGW-w64 coverage and the
   no-op behavior on MSVC 2012+.

Also propagate `#include "dxsdk_sal_compat.h"` to the bundled DX
headers that didn't already have it:

 * gfx/include/dxsdk/d3d10.h, d3d10_1.h, d3d11.h, d3d11shader.h,
   d3d12.h, d3d12shader.h, d3dcompiler.h: add the shim include at
   line 1, ahead of the MIDL-generated preamble.

The dxgi1_*.h, d3dcommon.h, and dxgi.h headers already got this
in an earlier commit; d3d8.h and d3d9.h are not bundled.

Downstream effect:
 * MXE/MSYS2 mingw-w64 cross-build: resolves the
   `unknown type name '_Inout_opt_bytecount_'` error in
   gfx/common/dx_guids.c's transitive include of d3d11.h.
 * MSVC 2003-2010: unchanged (every new stub is #ifndef-guarded,
   and the SAL 2 additions from the prior dxgi patch remain intact).
 * MSVC 2012+: unchanged (<sal.h> provides everything, stubs no-op).
…tive

The Metal driver had a narrow threading issue when used with the
threaded video wrapper.  Unlike the other graphics APIs where
both load and unload have races, Metal's case is specifically
about the shared blit command buffer used during mipmapped
texture creation.

The race
========

When metal_load_texture is called with a TEXTURE_FILTER_MIPMAP_*
filter, the execution path reaches:

  - [self newTexture:image mipmapped:YES] in metal_renderer.m
  - which creates an MTLTexture (thread-safe on MTLDevice)
  - calls replaceRegion: (thread-safe)
  - accesses self.blitCommandBuffer (a lazily-created
    MTLCommandBuffer stored on the shared Context instance)
  - encodes mipmap generation via [cb blitCommandEncoder]

MTLCommandBuffer and MTLCommandEncoder are documented as
single-thread-only.  The blit command buffer is committed and
nulled at frame end, on the video thread:

  - (void)end { ... [_blitCommandBuffer commit];
                    _blitCommandBuffer = nil; ... }

If the main thread calls metal_load_texture mid-frame and
encodes mipmap commands on the same blit command buffer that
the video thread is about to commit, this is undefined
behaviour per Apple's Metal threading rules.

Other Metal paths are already safe
==================================

Most of the Metal driver operations are fine:

  - [_device newTextureWithDescriptor:] is thread-safe
  - [texture replaceRegion:] is thread-safe and synchronous
  - metal_unload_texture releases an ARC-retained handle; Metal
    command buffers retain their referenced resources, so the
    underlying MTLTexture stays alive until the GPU is done.
    No cross-thread race on unload (unlike D3D12/Vulkan where
    command buffers don't refcount resources).

The only remaining race is the blit command buffer sharing
during mipmapped load.

The fix
=======

Route metal_load_texture through video_thread_texture_handle
when threaded video is active, matching the pattern established
in the D3D10/11/12 and Vulkan patches.  This serialises all
Context.blitCommandBuffer access on the video thread.

Implementation:

- Split metal_load_texture into an internal function that does
  the actual texture creation, and an outer function that
  dispatches.

- metal_texture_cmd_t carries video_data (as void* to avoid
  ARC issues with Obj-C refs in C structs), image, filter_type,
  and handle (output).

- Handle is passed back via cmd->handle rather than the int
  return channel of custom_command_method_t to avoid truncation
  on 64-bit Apple platforms.

- The dispatch happens unconditionally for any threaded load,
  not just mipmapped ones -- consistent with other drivers and
  avoids branching on filter type.

- metal_unload_texture is left unchanged (ARC refcounting is
  sufficient).

Depends on: 8d55232 (VIDEO/THREAD: barrier + wrapper hardening),
which hardens video_thread_texture_handle to be safe when the
wrapper is inactive.

Cost: texture load now blocks briefly while the video thread
processes the dispatched command.  Texture loads happen during
menu navigation, not frame-critical paths, so latency is
imperceptible.

No behavioural change on non-threaded video.
gfx/drivers/d3d12.c carried a #ifdef __MINGW32__ / #if __GNUC__ < 12
block that manually emitted D3D12 COM GUIDs via DECLSPEC_SELECTANY to
work around MinGW not shipping dxguid.lib. With the introduction of
gfx/common/dx_guids.c (which emits all DirectX GUIDs centrally via
<initguid.h>), this block became redundant and caused multiple-
definition link errors on MinGW/MXE builds:

  dx_guids.o: multiple definition of `IID_ID3D12Device'
  d3d12.o: first defined here

On MSVC the linker silently deduplicated DECLSPEC_SELECTANY symbols
against initguid.h storage, so the collision only manifested on
GCC/MinGW where both are strong definitions.

Remove the entire block (19 standard IID_* GUIDs plus 7 debug-only
GUIDs that were already dead code on MinGW due to their inner
!defined(__MINGW32__) guard). dx_guids.c is now the sole emitter of
standard D3D12 GUIDs on all platforms.

The libretro_IID_IDXGIFactory5 in d3d11.c and libretro_IID_IDXGIOutput6
in dxgi_common.c are unaffected — they use custom prefixed names that
do not collide with dx_guids.c.
…ideo active

font_driver_free calls renderer->free (e.g. d3d12_font_free,
gl2_raster_font_free, d3d11_font_free) directly on the calling
thread.  Menu drivers call this from context_destroy or
context_reset_internal on the main thread, while the video
thread is still rendering frames.

This produces driver-specific races:

  GL1/GL2/GL3: The GL context can only be current on one
  thread.  gl2_raster_font_free calls make_current(true) to
  steal the context from the video thread, then calls
  glDeleteTextures.  If the video thread is mid-frame, its
  subsequent GL calls are invalid until context is restored.

  D3D11: The ImmediateContext is explicitly not thread-safe
  per Microsoft's docs.  d3d11_font_free calls Release on
  the font texture's COM objects while the video thread draws
  with the same context — undefined behaviour.

  D3D10: Same as D3D11, though slightly less severe because
  D3D10's device has an internal mutex.  Release still races
  with pending GPU draws.

  D3D12: d3d12_font_free uses a GPU fence wait with
  ++fenceValue.  The non-atomic increment from the main
  thread races with the video thread's own fence signalling,
  potentially skipping or duplicating fence values.

  Vulkan: vulkan_destroy_texture (called from font free) uses
  vkQueueWaitIdle under queue_lock, which drains submitted
  work but not command buffers currently being recorded on the
  video thread.

  Metal: Blit command buffer access can race (same as texture
  load).  However, font glyph textures in the Metal driver use
  replaceRegion: (thread-safe) rather than the blit command
  encoder, so this is lower risk.

Fix: in font_driver_free, when threaded video is active,
dispatch the renderer->free call to the video thread via
video_thread_texture_handle.  This serialises the font
resource destruction with the video thread's frame rendering,
the same pattern used for texture load/unload.

This is a single-point fix in font_driver.c that covers ALL
video driver backends at once, rather than patching each
driver's font_free individually.  The font_driver_free
function is the natural place for this because it already
checks is_threaded and is the single entry point for all font
destruction.

Implementation:

- font_free_cmd_t packages the renderer pointer, renderer_data,
  and is_threaded flag.

- font_driver_free_wrap is the CMD_CUSTOM_COMMAND callback.
  Calls renderer->free on the video thread.

- When is_threaded is true and renderer->free exists, dispatch
  via video_thread_texture_handle.  The font_data_t struct
  itself (renderer/renderer_data pointers + malloc'd memory)
  is cleaned up on the main thread after the dispatch returns
  -- only the GPU-touching renderer->free needs serialisation.

- video_thread_texture_handle is self-safe (falls back to
  direct call when the wrapper is not active, and handles
  same-thread calls without deadlock).

- Non-threaded path is unchanged.

This fixes the mid-session font scale change crash on D3D12
that was previously only mitigated by XMB's
pending_context_reset counter.  It also eliminates the GL
context-stealing race in gl2_raster_font_free.

Depends on: 8d55232 (VIDEO/THREAD: barrier + wrapper
hardening).

No behavioural change on non-threaded video.
Fixes #18761

All display-related Vulkan pipelines (font, alpha_blend, menu
shaders including ribbon/snow/bokeh) were being built against
vk->sdr_render_pass, which always uses VK_FORMAT_B8G8R8A8_UNORM.

On the non-HDR rendering path (the common case on Linux KMS,
Wayland, X11, and most platforms), the active render pass at
draw time is vk->render_pass, which uses the actual swapchain
format -- vk->context->swapchain_format.  When the swapchain
format happens to be B8G8R8A8_UNORM, the pipeline/render-pass
formats match and everything works.

But on some platforms (particularly KMS on certain GPUs), the
swapchain format is VK_FORMAT_R8G8B8A8_UNORM instead.  In that
case:

  - Pipeline built for: B8G8R8A8_UNORM (via sdr_render_pass)
  - Active render pass:  R8G8B8A8_UNORM (via render_pass)

This is a render pass compatibility violation per Vulkan spec
(VUID-vkCmdDraw-renderPass-02684).  The result is driver-
dependent: RADV on AMD typically segfaults, NVIDIA may silently
produce wrong output, Intel may report a validation error.

The crash manifests as a segfault when any menu shader pipeline
is active (ribbon, simple ribbon, snow, bokeh, snowflake) and
the user navigates the XMB Quick Menu -- any draw that uses the
menu pipeline triggers the incompatible render pass.

Fix: build all display pipelines against vk->render_pass
(the swapchain render pass) instead of vk->sdr_render_pass.
This ensures the pipelines match the active render pass on
the non-HDR path regardless of swapchain format.

Two changes:

1. Line 3372: Initial pipe.renderPass assignment changed from
   vk->sdr_render_pass to vk->render_pass.  This affects the
   font pipeline, alpha_blend pipeline, and display pipelines
   0-3.

2. After the #ifdef VULKAN_HDR_SWAPCHAIN block (which
   temporarily changes pipe.renderPass to readback_render_pass
   and sdr_render_pass for HDR-specific pipelines), restore
   pipe.renderPass to vk->render_pass before building the menu
   shader pipelines (indices 6+).  Without this, HDR builds
   would leave pipe.renderPass pointing at sdr_render_pass,
   defeating the fix.

Impact on HDR: the HDR compositing path renders menu content
into an offscreen B8G8R8A8 buffer using sdr_render_pass
(line 6523).  When the swapchain format differs from B8G8R8A8,
the menu pipelines built against render_pass would be
incompatible with sdr_render_pass in this path.  However:

  - HDR display pipelines (indices 4-5) are built separately
    against render_pass at line 3470, which is correct.
  - The HDR menu compositing path uses the standard alpha_blend
    pipeline for the menu texture overlay (line 6563), which
    now matches render_pass.
  - The HDR offscreen compositing step at line 6523 is a
    separate render pass instance that doesn't use the menu
    shader pipelines (ribbon/snow/bokeh are drawn earlier in
    the frame).

No behavioural change when swapchain format is B8G8R8A8_UNORM
(the common case on X11/Wayland with NVIDIA and Intel), because
both render passes have the same format and are therefore
compatible.
Follow-up to d250194 (build display pipelines against swapchain
render pass).  That commit fixed the KMS crash (#18761) by
building all display/menu pipelines against vk->render_pass
(the swapchain format) instead of vk->sdr_render_pass
(hardcoded B8G8R8A8_UNORM).  This was correct for the non-HDR
path but left a gap in the HDR offscreen compositing path.

When HDR is active, the menu is rendered into a B8G8R8A8
offscreen buffer under sdr_render_pass, then composited onto
the HDR swapchain.  If the swapchain format differs from
B8G8R8A8 (which is the common HDR case -- the swapchain uses
a wide-gamut format like R16G16B16A16_SFLOAT or
A2B10G10R10_UNORM), the main pipeline set (built against
render_pass) is incompatible with sdr_render_pass.  Using
those pipelines in the HDR offscreen render pass would be the
same class of render-pass compatibility violation that d250194
fixed for the non-HDR path.

Fix: when VK_CTX_FLAG_HDR_SUPPORT is set, build a parallel set
of display/menu pipelines against sdr_render_pass.  At runtime,
switch to the SDR pipeline variants while rendering into the
HDR offscreen buffer, and switch back when the render pass ends.

New storage:
  - vk->pipelines.font_sdr
  - vk->pipelines.alpha_blend_sdr
  - vk->display.pipelines_sdr[9 * 2]

These are created alongside the existing pipeline set during
vulkan_init_pipelines, using identical shader modules and
pipeline state -- only pipe.renderPass differs.  Destroyed
in vulkan_deinit_pipelines.

Runtime switching via VK_FLAG_SDR_PIPELINE (new flag, bit 16):
  - Set after vkCmdBeginRenderPass with sdr_render_pass on the
    HDR offscreen compositing path
  - Cleared before vkCmdEndRenderPass
  - Checked in four draw paths:
    * gfx_display_vk_draw menu shader case (ribbon/snow/bokeh)
    * gfx_display_vk_draw default case (display quads)
    * vulkan_raster_font_render_msg (font pipeline)
    * vulkan_frame menu texture overlay (alpha_blend pipeline)

Non-HDR builds: all SDR pipeline code is #ifdef
VULKAN_HDR_SWAPCHAIN guarded.  Zero impact on builds without
HDR support.

Non-HDR runtime: VK_FLAG_SDR_PIPELINE is never set.  All draw
paths use the main pipeline set unconditionally.  Zero overhead.

HDR with B8G8R8A8 swapchain (unusual but possible):
render_pass and sdr_render_pass have the same format, so both
pipeline sets are identical.  No visual difference, small extra
memory for the duplicate pipelines.

Cost: 18 additional VkPipeline objects + 2 named pipelines when
HDR is supported.  One-time creation cost at driver init.
Runtime cost is one flag check per draw call on HDR paths only.
Fixes startup crash on Windows Vulkan introduced by 8fa8418.

Two bugs in the SDR pipeline creation path when HDR is supported:

1. SDR display pipelines 0-3 were created after the alpha_blend
   fragment shader module had already been destroyed.  The
   alpha_blend_sdr pipeline was created, then the fragment
   shader module was immediately destroyed (line 3692 in the
   original patch).  The subsequent display pipeline loop
   (pipelines_sdr[0..3]) still referenced that destroyed module
   through the pipe struct, producing an invalid VkShaderModule
   handle passed to vkCreateGraphicsPipelines.

   Fix: keep both the vertex and fragment shader modules alive
   through the display 0-3 loop, then destroy both afterward.

2. After the SDR menu shader loop (pipelines_sdr[6+]), an extra
   vkDestroyShaderModule call destroyed shader_stages[0].module
   which had already been destroyed in the loop's last iteration
   at line 3782.  Double-free on a VkShaderModule handle.

   Fix: remove the spurious destroy after the loop.

Both bugs only manifest when VK_CTX_FLAG_HDR_SUPPORT is set
(Windows with HDR-capable displays, or any platform where the
Vulkan driver reports HDR swapchain support).  Non-HDR paths
are unaffected because the entire SDR pipeline block is guarded
by the HDR support check.
The SDR font pipeline introduced in 8fa8418 was created with
VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP instead of TRIANGLE_LIST.

The font and alpha_blend pipelines require TRIANGLE_LIST
topology for indexed quad rendering (indices [0,1,2,2,1,3] per
quad).  The initial topology is set to TRIANGLE_LIST at line
3321, and the original font pipeline at line 3451 inherits it
correctly.

However, the SDR pipeline creation block runs after the menu
shader loop (lines 3514-3646), which sets input_assembly.topology
to TRIANGLE_STRIP in its last iteration (i=11, topology = i&1 =
STRIP).  The SDR font pipeline at line 3678 inherited this stale
STRIP topology.

With TRIANGLE_STRIP, the indexed [0,1,2,2,1,3] pattern produces
degenerate triangles and incorrect winding, causing subtle glyph
positioning artifacts visible as ~1px character shifts in the
Ozone menu under Vulkan HDR.  The effect was most noticeable on
characters with vertical strokes ('l', 'n', 'd') and punctuation.

Fix: reset input_assembly.topology to VK_PRIMITIVE_TOPOLOGY_
TRIANGLE_LIST before creating the SDR font and alpha_blend
pipelines.

Only affects HDR builds with VK_CTX_FLAG_HDR_SUPPORT set.
Non-HDR path is unchanged.
@pull pull bot locked and limited conversation to collaborators Apr 16, 2026
@pull pull bot added the ⤵️ pull label Apr 16, 2026
@pull pull bot merged commit b9630a5 into Alexandre1er:master Apr 16, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants