[pull] master from libretro:master#932
Merged
pull[bot] merged 29 commits intoAlexandre1er:masterfrom Apr 16, 2026
Merged
Conversation
Primary fix: idle barrier before driver deinit
==============================================
When the video driver is torn down (driver reinit from fullscreen
toggle, driver switch, or core launch, or full shutdown), the
deinit sequence in driver_uninit() runs on the main thread:
1) gfx_widgets_deinit() -- frees widget GPU resources
2) menu_driver_ctl(DEINIT) -- calls xmb_context_destroy /
ozone_context_destroy /
materialui_context_destroy,
which free textures and fonts
3) video_driver_free_internal()-- finally stops the video thread
via sthread_join
Between steps 1 and 3, the video thread is still alive and may
be processing a frame that was queued before the deinit began.
That in-flight frame can reference textures and fonts that step 1
or 2 has just freed, producing use-after-free crashes -- most
visibly on D3D12 and Vulkan where the GPU driver validates
texture handles.
A comment at step 1 already acknowledges the issue ("This
absolutely has to be done before video_driver_free_internal()")
but relies on ordering rather than synchronisation. Prior
commits in this series added context_generation guards to each
menu driver as defensive mitigation (2ac0afa), but those only
cover the menu's own draw calls, not widgets or the video
thread's internal state.
Fix: add a true barrier in the threaded video wrapper that waits
for the video thread to finish any pending frame and become idle
(blocked on cond_thread waiting for the next command). The
barrier uses the existing thr->lock mutex and thr->cond_cmd
signal -- the frame processing loop already signals cond_cmd
after clearing thr->frame.updated (line 509 of the frame loop),
so the barrier just waits for the frame.updated flag to clear.
Secondary fix: harden wrapper API against reinit window misuse
==============================================================
A comment in video_driver_texture_load() acknowledges a latent
misuse window: "During reinit, video_st->threaded may already
reflect the new setting while video_st->data still points to
the real driver, not thread_video_t." That site was previously
fixed by requiring both VIDEO_DRIVER_IS_THREADED_INTERNAL and
VIDEO_FLAG_THREAD_WRAPPER_ACTIVE before taking the threaded
path. But the underlying wrapper functions themselves still
unconditionally cast video_st->data to thread_video_t*, making
them unsafe if called from any other path that only checks the
configuration flag.
The three public wrapper functions that read video_st->data
are now self-safe: each checks VIDEO_FLAG_THREAD_WRAPPER_ACTIVE
as the first step before casting.
- video_thread_wait_idle(): returns (no-op, matches intent)
- video_thread_font_init(): returns false (no-op)
- video_thread_texture_handle(): falls back to func(data)
on the caller's thread -- matches the existing "already
on video thread" branch semantics, preserving texture
loads during any reinit-window call.
For video_thread_font_init specifically, this closes a
pre-existing latent bug: its sole caller in font_driver.c
guards on `is_threaded` (the configuration flag) only, not on
VIDEO_FLAG_THREAD_WRAPPER_ACTIVE. During a reinit transition
where those two disagree, the function would previously cast
a raw driver handle to thread_video_t* and access thr->lock,
producing undefined behaviour.
Changes
=======
- gfx/video_thread_wrapper.c:
* Add video_thread_wait_idle() -- the barrier. Guarded on
VIDEO_FLAG_THREAD_WRAPPER_ACTIVE, then checks thr, then
checks for self-deadlock (called from video thread), then
waits on cond_cmd while frame.updated is true.
* Harden video_thread_font_init with the flag check.
* Harden video_thread_texture_handle with the flag check,
falling back to direct func invocation.
- gfx/video_thread_wrapper.h: export video_thread_wait_idle().
- retroarch.c: call video_thread_wait_idle() at the top of
driver_uninit(), before gfx_widgets_deinit and menu_driver_ctl
(DEINIT). Guarded by HAVE_THREADS, DRIVERS_VIDEO_INPUT flag,
VIDEO_DRIVER_IS_THREADED_INTERNAL, and
VIDEO_FLAG_THREAD_WRAPPER_ACTIVE to be a no-op when not
applicable.
Cost
====
The barrier adds one lock acquire + one potential cond_wait per
driver reinit. Common case (no pending frame) returns
immediately. Worst case blocks for up to one frame duration
(~16ms) during a driver reinit which already pauses rendering.
The hardening adds one flag read per call to the three public
wrapper functions. No measurable overhead.
Compatibility
=============
Non-threaded video: all new code paths are short-circuited by
the flag checks. Functionally invisible.
Builds without HAVE_THREADS: the entire barrier call site in
retroarch.c is #ifdef-guarded, and the wrapper file is not
compiled. Zero footprint.
Defense in depth: the context_generation guards from 2ac0afa
remain. With this barrier in place, they should never fire
in practice during normal driver reinit.
No behavioural change during normal operation. Eliminates the
TOCTOU window between menu GPU resource freeing and video
thread shutdown, and closes a pre-existing reinit-window bug
in video_thread_font_init.
…ideo active The D3D12 driver was partially thread-unsafe when used with the threaded video wrapper. Unlike gl2/gl3/d3d9 which route texture operations through video_thread_texture_handle so that all GPU work happens on the video thread, D3D12 relied on D3D12's inherent free-threaded device API and executed load/unload on the calling (main) thread. This created several races when threaded video was active: 1. texture->dirty flag Written on main thread (after Map/Unmap in d3d12_update_texture). Read and cleared on video thread (during d3d12_gfx_frame and d3d12_upload_texture). No synchronisation. On weak-memory- order CPUs the video thread could observe a stale `false` flag and skip the upload, producing missing textures. 2. Upload buffer data visibility Pixel data written via Map/memcpy on main thread. Consumed via CopyTextureRegion on video thread during command list recording. No CPU-side memory barrier ensures the main thread's writes are visible to the video thread's reads. 3. Release timing vs command list recording d3d12_gfx_unload_texture has a GPU fence wait that ensures submitted work has completed, but does not serialise against the video thread CURRENTLY building a command list that references the texture being released. In D3D12, command lists do not hold references to resources they record, so releasing a texture while its use is being recorded can leave the recorded command list with a dangling pointer. 4. fenceValue counter race d3d12->queue.fenceValue is incremented via non-atomic ++ from both the main thread (unload path at line 6273) and the video thread (frame submission, font free, shader reset paths). Concurrent increments can skip or duplicate fence values, breaking synchronisation. Fix: adopt the gl2/d3d9 pattern. When threaded video is active, dispatch both load_texture and unload_texture to the video thread via video_thread_texture_handle, so all D3D12 command queue operations are serialised on the queue-owning thread. This eliminates all four races in one change: shared state is no longer shared across threads. Implementation: - Split d3d12_gfx_load_texture into an internal function (d3d12_gfx_load_texture_internal) that does the actual GPU work, and an outer function that dispatches. Same for unload. - New d3d12_texture_cmd_t packages the d3d12 pointer, image, filter_type, and handle fields. Used for both load (image is input, handle is output) and unload (handle is input). - Return the uintptr_t handle via cmd->handle rather than the int return channel of custom_command_method_t, avoiding truncation on 64-bit Windows where uintptr_t is 64 bits but int is 32 bits. - The outer d3d12_gfx_load_texture / d3d12_gfx_unload_texture check the `threaded` parameter. By the time it reaches the driver, `threaded` has already been validated against both VIDEO_DRIVER_IS_THREADED_INTERNAL and VIDEO_FLAG_THREAD_WRAPPER_ACTIVE by video_driver_texture_load and video_driver_texture_unload. If threaded, dispatch via video_thread_texture_handle. Otherwise, call the internal function directly on the current thread (single-threaded path unchanged). - video_thread_texture_handle already handles the self-call case (invoked from video thread) by calling the wrap function directly, avoiding self-deadlock. This means d3d12's internal paths that run on the video thread (shader reset, font init) can call d3d12_gfx_load_texture with threaded=true safely. Cost: texture load now blocks for up to one frame duration (~16ms) waiting for the video thread to process the dispatched command. Texture loads happen during menu navigation and playlist icon population, not in frame-critical paths, so the latency is imperceptible. Depends on: the video_thread_wait_idle barrier commit (8d55232), which also hardens video_thread_texture_handle to fall back to direct invocation when the wrapper is inactive. That defense-in-depth means this patch is safe even against future callers that don't follow the full guard pattern. No behavioural change on non-threaded video (`threaded == false` path unchanged). No behavioural change on HW-context cores (threaded video is force-disabled for those via VIDEO_DRIVER_IS_THREADED_INTERNAL). Not addressed by this patch: d3d12_font_free still runs on the main thread during mid-session font scale changes. XMB's pending_context_reset counter already mitigates this by deferring the context reset until the GPU has drained, so it's not a new issue. Addressing the font free path would require a separate refactor outside this patch's scope.
… active
The Vulkan driver is already well-synchronised for most
operations -- queue_lock protects all vkQueueSubmit /
vkQueuePresent / vkQueueWaitIdle calls, and vulkan_load_texture
uploads textures synchronously under that lock so the texture
is fully resident on the GPU by the time the function returns.
But vulkan_unload_texture has a residual race that the existing
TODO comment at line 7307 already acknowledged:
/* TODO: We really want to defer this deletion instead,
* but this will do for now. */
The current sequence is:
1. Lock queue_lock
2. vkQueueWaitIdle (waits for SUBMITTED work)
3. Unlock queue_lock
4. vulkan_destroy_texture (NOT under the lock)
Step 2 only drains commands that have already been submitted
to the queue. It does not account for the video thread
currently RECORDING a command buffer that references the
texture -- that command buffer has not yet been submitted, so
vkQueueWaitIdle doesn't see it.
The sequence that triggers the race:
T0: Video thread is recording commands in vulkan_frame that
reference texture X (uses X->view in a descriptor, or
X->image in a barrier). Not yet submitted.
T1: Main thread enters vulkan_unload_texture for X.
T2: Main thread locks queue_lock, calls vkQueueWaitIdle
(returns immediately -- nothing to wait for).
T3: Main thread unlocks queue_lock.
T4: Main thread calls vulkan_destroy_texture -- X's image,
view, and memory are freed.
T5: Video thread finishes recording, acquires queue_lock,
calls vkQueueSubmit. The command buffer references
freed Vulkan handles -- undefined behaviour, typically
a validation-layer error or GPU hang.
Fix: follow the same pattern just applied to D3D12 (and long
established in gl2/gl3/d3d9cg). When threaded video is active,
dispatch the unload via video_thread_texture_handle so the
destruction runs on the video thread, serialised with command
buffer recording. The video thread can only run one
vulkan_frame or one custom_command at a time, so by the time
the unload wrap function executes, any in-progress recording
has completed and the subsequent vkQueueWaitIdle drains any
submitted work before destruction.
Implementation:
- Split vulkan_unload_texture into an internal function
(vulkan_unload_texture_internal) that does the actual work,
and an outer function that dispatches.
- New vulkan_texture_cmd_t packages the vk_t pointer and
texture handle. (Smaller than d3d12_texture_cmd_t because
Vulkan unload has no filter_type and returns no handle.)
- vulkan_load_texture is NOT changed: unlike D3D12 which
uses deferred upload (dirty flag + per-frame CopyTextureRegion),
Vulkan uploads synchronously inside vulkan_create_texture
under queue_lock, so the texture is fully live before the
function returns. No cross-thread races on the load path.
Cost: texture unload now blocks briefly while the video thread
processes the dispatched command. In practice this is much
less than the existing vkQueueWaitIdle stall -- the video
thread usually completes its current operation in microseconds,
whereas vkQueueWaitIdle can stall for milliseconds waiting for
GPU queue drain. So this patch is approximately a performance
neutral or positive change.
Depends on: 8d55232 (VIDEO/THREAD: barrier + wrapper hardening),
which hardens video_thread_texture_handle to be safe when the
wrapper is inactive.
The TODO comment is retained in the internal function, as the
deferred-deletion pattern it describes would still be a valid
future optimisation -- it would avoid the per-unload
vkQueueWaitIdle stall by piggybacking on the natural frame
fence. This patch eliminates the crash, not the stall.
No behavioural change on non-threaded video (the `threaded ==
false` path calls the internal function directly, same as the
pre-patch code). No behavioural change on HW-context cores
(threaded video is disabled for those by
VIDEO_DRIVER_IS_THREADED_INTERNAL).
…ideo active Follow-up to the D3D11 thread-safety patch. D3D10 has the same texture load/unload pattern as D3D11 (immediate CopySubresourceRegion during load, direct Release during unload) and therefore the same race classes when used with the threaded video wrapper. D3D10 is slightly less severe than D3D11 in one respect: the ID3D10Device is thread-safe by default (we do not set D3D10_CREATE_DEVICE_SINGLETHREADED at creation), so concurrent Device method calls from the main thread and video thread are serialised by an internal D3D10 mutex. This prevents outright undefined behaviour on the context methods themselves. However, the second race remains and is not mitigated by the internal mutex: Resource Release vs pending GPU work ------------------------------------ d3d10_update_texture during load_texture queues an asynchronous CopySubresourceRegion on the Device. d3d10_gfx_unload_texture calls Release with no fence or wait -- if the GPU hasn't finished processing pending commands that reference the texture, the resource can be freed while still in use. Fix: mirror the D3D11 patch. When threaded video is active, dispatch both load and unload to the video thread via video_thread_texture_handle so all texture operations are serialised with the video thread's draws. This also consolidates Device context access to a single thread even though the internal mutex already makes it correct -- reducing contention on that mutex. Implementation is structurally identical to the D3D11 patch: - d3d10_gfx_load_texture_internal / d3d10_gfx_unload_texture_internal contain the original logic. - Outer functions dispatch via video_thread_texture_handle when threaded is true, otherwise call the internal function directly. - d3d10_texture_cmd_t packages parameters and carries the handle field for both input (unload) and output (load). Depends on: 8d55232 (VIDEO/THREAD: barrier + wrapper hardening) and the D3D11 thread-safety patch (applied immediately before for structural consistency, though technically independent). No behavioural change on non-threaded video.
…ideo active
The D3D11 driver had two threading issues when used with the
threaded video wrapper. Neither was specific to D3D11 as an API
-- they mirror the issues already fixed in D3D12 and Vulkan --
but D3D11 is actually the most severe case because its
ImmediateContext is explicitly documented as not thread-safe.
Race 1: ImmediateContext concurrent access
------------------------------------------
ID3D11DeviceContext (the immediate context used for rendering)
is NOT thread-safe -- Microsoft's documentation explicitly says
concurrent method calls produce undefined behaviour. RetroArch
holds a single d3d11->context pointer used for all rendering.
When threaded video is active:
- The video thread calls d3d11_gfx_frame which issues draw
commands via d3d11->context (OMSet*, VSSet*, PSSet*,
Draw, DrawIndexed, etc.).
- The main thread concurrently calls d3d11_gfx_load_texture,
which reaches d3d11_update_texture and calls context
methods: Map, Unmap, CopySubresourceRegion, GenerateMips.
These two code paths race on the same ImmediateContext. On
current D3D11 runtimes this typically does not crash outright
because the runtime has some internal locking for validation,
but correct behaviour is not guaranteed and we're relying on
an implementation detail.
Race 2: Release while GPU has pending references
------------------------------------------------
d3d11_gfx_unload_texture calls Release on the COM objects.
COM refcounting is thread-safe but does not block on GPU
completion. If the ImmediateContext has queued a draw that
references the texture (and the GPU hasn't processed it yet),
the Release can free the resource before the GPU is done.
Fix: follow the pattern established in D3D12 (a2d7b48) and
Vulkan (17332fe) -- dispatch load and unload to the video
thread via video_thread_texture_handle when threaded video is
active. This serialises all ImmediateContext access on the
video thread, and ensures Release runs only after any pending
draw that might reference the texture has been queued (the
video thread cannot be mid-recording when it handles the
dispatched command).
Implementation:
- Split d3d11_gfx_load_texture into internal / outer functions.
Same for unload.
- New d3d11_texture_cmd_t packages the parameters, with handle
used for both input (unload) and output (load).
- Return handle via cmd->handle to avoid truncation of 64-bit
uintptr_t through custom_command_method_t's int return.
- Non-threaded path is unchanged -- calls the internal function
directly.
Depends on: 8d55232 (VIDEO/THREAD: barrier + wrapper hardening).
No behavioural change on non-threaded video. No behavioural
change on HW-context cores (threaded video is disabled for
those by VIDEO_DRIVER_IS_THREADED_INTERNAL).
(features_cpu) fix __cpuidex guard — requires VS2010 (_MSC_VER >= 1600), not 1500
The bundled DirectX headers in gfx/include/dxsdk/ are modern Windows
8.x/10 SDK headers and use SAL annotations throughout their function
prototypes. These annotations come in two generations:
* SAL 1 (double-underscore): __in, __out, __inout, __in_ecount,
__out_bcount, __deref_out, etc.
* SAL 2 (leading underscore + trailing underscore): _In_, _Out_,
_Inout_, _In_reads_(s), _Out_writes_bytes_(s), _COM_Outptr_,
_Always_(x), _Outptr_opt_result_maybenull_, etc.
Older Microsoft toolchains ship incomplete or nonexistent sal.h:
* MSVC 2003 (_MSC_VER 1310): no sal.h at all. Both SAL 1 and SAL 2
macros are undefined.
* MSVC 2005 (_MSC_VER 1400) and MSVC 2008 RTM (_MSC_VER 1500): sal.h
has SAL 1 only.
* MSVC 2008 SP1 and MSVC 2010 (_MSC_VER 1600): sal.h ships a partial
SAL 2 set (_In_, _Out_, _Inout_, etc.) but is missing the
buffer-size, COM-outptr, and wrapper families.
* MSVC 2012+ (_MSC_VER >= 1700): full SAL 2 in sal.h.
Without definitions for these macros, the preprocessor treats them as
unknown identifiers and every function declaration that uses one fails
to parse, producing cascades like:
error C2143: syntax error : missing ')' before 'type'
error C2081: '_COM_Outptr_' : name in formal parameter list illegal
Add gfx/include/dxsdk/dxsdk_sal_compat.h which stubs out the missing
SAL 1 and SAL 2 macros to empty (or to a pass-through for wrapper forms
like _Always_(x) / _When_(c,x)) so declarations using them parse as
plain C. The shim is structured in four tiers:
* Skip the whole thing on MSVC 2012+ via _MSC_VER < 1700 gate.
* Skip #include <sal.h> on MSVC 2003, where the header does not
exist, via a _MSC_VER >= 1400 gate.
* Stub the SAL 1 family, guarded per-macro with #ifndef so MSVC 2005
through 2010 (which already have them) leave them untouched and
MSVC 2003 gets working definitions.
* Stub the SAL 2 family the same way, filling in whatever the
toolchain is missing without redefinition warnings.
Every macro is wrapped in #ifndef to avoid C4005 redefinition warnings
on toolchains that ship a partial SAL set.
Include the shim from the three bundled headers that tripped MSVC 2008
in griffin builds:
* gfx/include/dxsdk/d3dcommon.h
* gfx/include/dxsdk/d3d11shader.h
* gfx/include/dxsdk/d3dcompiler.h
The shim's outer include guard makes repeat inclusion free, so if
additional DX headers surface the same errors in future build
configurations they can be added with a one-line #include.
This is a build-only fix. The stubs discard static-analysis
information, which older MSVC versions could not consume anyway.
No behavior change on any supported toolchain.
* Some unused variable warning fixes
The struct's address is stored in the caller's draw struct and read after gfx_display_draw_bg returns, so the storage must outlive the call. Making it a plain stack local produced a dangling pointer, breaking wallpaper and pipeline background rendering in xmb_frame. The original thread-safety concern from 0abe6dc stands, but the correct fix is for callers to own the video_coords struct and pass it in — a larger refactor. Restoring static as an interim fix.
…deo active The D3D9 Cg driver already routes texture LOAD through the video thread via video_thread_texture_handle -- see d3d9_cg_load_texture and its d3d9_cg_video_texture_load_wrap_d3d helper. But d3d9_cg_unload_texture called IDirect3DTexture9_Release directly on the caller's thread. This creates an asymmetric race: a texture that was created on the video thread (so the driver knew to allocate it from the thread's execution context) gets released on the main thread while the video thread may still be recording draw calls that reference it. IDirect3DTexture9_Release is refcount-safe for the COM object itself, but if the D3D9 runtime or driver has queued draws that reference the texture's underlying GPU resource, freeing the refcount early can produce GPU-side use-after-free. This is the same class of race already fixed in D3D12 (a2d7b48), Vulkan (17332fe), D3D10 (a88f107), and D3D11 (61b0a1a). Fix: add a parallel d3d9_cg_video_texture_unload_wrap_d3d function and dispatch through video_thread_texture_handle when threaded video is active, matching the load path's pattern. The handle itself is passed as the cmd data (cast to void*) since no additional context is needed for Release. Non-threaded path unchanged. No behavioural change on HW-context cores (threaded video is disabled for those by VIDEO_DRIVER_IS_THREADED_INTERNAL).
…video active Mirrors the fix for the D3D9 Cg driver. d3d9_hlsl_load_texture already routes through video_thread_texture_handle via d3d9_hlsl_video_texture_load_wrap_d3d, but d3d9_hlsl_unload_texture called IDirect3DTexture9_Release directly on the caller's thread. Under threaded video, the video thread runs d3d9_hlsl_gfx_frame and may have draw calls pending that reference the texture. Releasing the texture's refcount from the main thread while the video thread is using it can cause the D3D9 runtime to free the underlying GPU resource before the pending draws complete. Fix: add d3d9_hlsl_video_texture_unload_wrap_d3d and dispatch through video_thread_texture_handle when threaded video is active, matching the existing load pattern. Same class of race already addressed in D3D12, Vulkan, D3D10, D3D11, and the D3D9 Cg driver. No behavioural change on non-threaded video.
…ctive Mirrors the fix for the D3D9 drivers. d3d8_load_texture already routes through video_thread_texture_handle via d3d8_video_texture_load_wrap_d3d, but d3d8_unload_texture called IDirect3DTexture8_Release directly on the caller's thread. Under threaded video, the video thread runs d3d8_frame and may have draw calls pending that reference the texture. Releasing the texture's refcount from the main thread while the video thread is using it can cause the D3D8 runtime to free the underlying resource before the pending draws complete. Fix: add d3d8_video_texture_unload_wrap_d3d and dispatch through video_thread_texture_handle when threaded video is active, matching the existing load pattern. Same class of race already addressed in D3D12, Vulkan, D3D10, D3D11, D3D9 Cg, and D3D9 HLSL. With this commit, every D3D-family driver in the codebase serialises texture load and unload on the video thread under threaded video. No behavioural change on non-threaded video.
The video_coords struct filled by gfx_display_draw_bg is stored in the caller's draw struct and read after the function returns, so it must outlive the call. Previously this was a function-scope static, which worked but was shared mutable state — racy under threaded video when multiple draws overlap. Make coords a parameter, and have each caller declare it as a stack local alongside its existing gfx_display_ctx_draw_t draw. Per-frame, per-caller ownership: no dangling pointer, no shared state. Resolves the thread-safety concern from 0abe6dc without the use-after-return bug that 904c384 reverted.
The Windows 7.x Platform SDK (shipped with MSVC 2010 and earlier) does not include dxguid.lib. Requiring it forces developers to install the legacy June 2010 DirectX SDK just to link, which is painful on modern systems because of the S1023 installer bug (VC++ 2010 runtime conflict). Instead, emit the required COM GUID storage ourselves via <initguid.h>. Each GUID must be defined in exactly one translation unit across the program: * griffin.c picks it up for the griffin unity build (every RetroArch-msvc*.vcxproj project) via #include <initguid.h> inserted after the license header. * gfx/common/dx_guids.c is a new dedicated TU for the non-griffin build (Makefile.common: MinGW, command-line MSVC, cross-compile). Both paths are guarded against UWP/WinRT (WINAPI_FAMILY != DESKTOP_APP) and Xbox (_XBOX) targets, where either dxguid.lib ships with the modern Windows SDK or GUIDs come from a console SDK. Changes: * gfx/common/dx_guids.c: new file, emits GUID storage for the non-griffin build path. Guarded against Xbox, UWP, and griffin so it compiles to an empty TU on those targets. Includes the DX headers in newest-to-oldest order (D3D12, D3D11, D3D10, D3D9, D3D8) so that d3d11.h's transitive include of d3d10_1.h lands before d3d10.h is seen, satisfying the SAL-annotation ordering check inside d3d10_1.h. * griffin/griffin.c: add gated #include <initguid.h> after the license block so griffin unity builds emit GUIDs before the first DirectX header is pulled in (d3d*.c includes start around line 432, dinput at 701, xaudio at 889). * Makefile.common: add gfx/common/dx_guids.o to the Win32 OBJ list; remove the now-dead HAVE_DX_COMMON bookkeeping (four assignments from the HAVE_DSOUND, HAVE_DINPUT, HAVE_D3D9, and HAVE_D3D8 blocks, plus the conditional that appended -ldxguid to LIBS). * audio/drivers/dsound.c: remove #pragma comment(lib, "dxguid"). This MSVC-specific directive embeds a library dependency in the .obj file and would silently re-introduce dxguid.lib at link time even after it was stripped from the project files. * pkg/msvc/**/*.vcxproj, pkg/msvc/**/*.vcproj: remove dxguid.lib from AdditionalDependencies in every desktop MSVC project from 2003 through 2022. The Xbox 360 and Xbox One project files are left untouched — their console SDKs handle GUIDs separately.
Commit e8191ce ("Add this") routes griffin.c's d3dcompiler_common.c inclusion through an HAVE_D3D9+HAVE_HLSL arm and redirects dxgi_common.h to the bundled <dxsdk/dxgi1_6.h>. On legacy MSVC toolchains (2003-2010) this surfaces a new SAL-annotation parse cascade: dxgi1_2.h through dxgi1_6.h reference SAL 2 macros that those toolchains do not know
a robust d3d9_hlsl driver
…family The existing dxsdk_sal_compat.h was gated on `_MSC_VER < 1700`, which made the entire shim body invisible to GCC/MinGW toolchains. This worked because mingw-w64's <sal.h> historically provided the SAL macros the bundled DX headers reference -- but not all mingw-w64 releases are complete. MSYS2's current mingw-w64 sal.h omits the legacy SAL 1 form `_Inout_opt_bytecount_`, which the bundled d3d11.h references, so the MXE/MinGW cross-build of gfx/common/dx_guids.c fails to parse d3d11.h. Fix the gate and extend macro coverage: * Replace the `_MSC_VER < 1700` gate with a portable form that pulls in <sal.h> via `__has_include` on non-MSVC toolchains. The shim body then runs on every toolchain, and the per-macro `#ifndef` guards prevent redefinition wherever the toolchain already provides the macro. Net effect on MSVC 2012+ and on complete mingw-w64 releases: no-op. On MSVC 2003-2010 and incomplete mingw-w64 releases: missing macros are stubbed. * Add the SAL 1 underscore-prefix bytecount/ecount family (`_In_bytecount_[_opt_]`, `_Out_bytecount_[_opt_]`, `_Inout_bytecount_[_opt_]`, `_Inout_opt_bytecount_`, and the four `_ecount_[_opt_]` equivalents). `_Inout_opt_bytecount_` is the one actually observed in the MXE/MSYS2 build log; the siblings are added together because they commonly surface in the same parse cascade once any one of them fails. * Update the file comment to document MinGW-w64 coverage and the no-op behavior on MSVC 2012+. Also propagate `#include "dxsdk_sal_compat.h"` to the bundled DX headers that didn't already have it: * gfx/include/dxsdk/d3d10.h, d3d10_1.h, d3d11.h, d3d11shader.h, d3d12.h, d3d12shader.h, d3dcompiler.h: add the shim include at line 1, ahead of the MIDL-generated preamble. The dxgi1_*.h, d3dcommon.h, and dxgi.h headers already got this in an earlier commit; d3d8.h and d3d9.h are not bundled. Downstream effect: * MXE/MSYS2 mingw-w64 cross-build: resolves the `unknown type name '_Inout_opt_bytecount_'` error in gfx/common/dx_guids.c's transitive include of d3d11.h. * MSVC 2003-2010: unchanged (every new stub is #ifndef-guarded, and the SAL 2 additions from the prior dxgi patch remain intact). * MSVC 2012+: unchanged (<sal.h> provides everything, stubs no-op).
…tive
The Metal driver had a narrow threading issue when used with the
threaded video wrapper. Unlike the other graphics APIs where
both load and unload have races, Metal's case is specifically
about the shared blit command buffer used during mipmapped
texture creation.
The race
========
When metal_load_texture is called with a TEXTURE_FILTER_MIPMAP_*
filter, the execution path reaches:
- [self newTexture:image mipmapped:YES] in metal_renderer.m
- which creates an MTLTexture (thread-safe on MTLDevice)
- calls replaceRegion: (thread-safe)
- accesses self.blitCommandBuffer (a lazily-created
MTLCommandBuffer stored on the shared Context instance)
- encodes mipmap generation via [cb blitCommandEncoder]
MTLCommandBuffer and MTLCommandEncoder are documented as
single-thread-only. The blit command buffer is committed and
nulled at frame end, on the video thread:
- (void)end { ... [_blitCommandBuffer commit];
_blitCommandBuffer = nil; ... }
If the main thread calls metal_load_texture mid-frame and
encodes mipmap commands on the same blit command buffer that
the video thread is about to commit, this is undefined
behaviour per Apple's Metal threading rules.
Other Metal paths are already safe
==================================
Most of the Metal driver operations are fine:
- [_device newTextureWithDescriptor:] is thread-safe
- [texture replaceRegion:] is thread-safe and synchronous
- metal_unload_texture releases an ARC-retained handle; Metal
command buffers retain their referenced resources, so the
underlying MTLTexture stays alive until the GPU is done.
No cross-thread race on unload (unlike D3D12/Vulkan where
command buffers don't refcount resources).
The only remaining race is the blit command buffer sharing
during mipmapped load.
The fix
=======
Route metal_load_texture through video_thread_texture_handle
when threaded video is active, matching the pattern established
in the D3D10/11/12 and Vulkan patches. This serialises all
Context.blitCommandBuffer access on the video thread.
Implementation:
- Split metal_load_texture into an internal function that does
the actual texture creation, and an outer function that
dispatches.
- metal_texture_cmd_t carries video_data (as void* to avoid
ARC issues with Obj-C refs in C structs), image, filter_type,
and handle (output).
- Handle is passed back via cmd->handle rather than the int
return channel of custom_command_method_t to avoid truncation
on 64-bit Apple platforms.
- The dispatch happens unconditionally for any threaded load,
not just mipmapped ones -- consistent with other drivers and
avoids branching on filter type.
- metal_unload_texture is left unchanged (ARC refcounting is
sufficient).
Depends on: 8d55232 (VIDEO/THREAD: barrier + wrapper hardening),
which hardens video_thread_texture_handle to be safe when the
wrapper is inactive.
Cost: texture load now blocks briefly while the video thread
processes the dispatched command. Texture loads happen during
menu navigation, not frame-critical paths, so latency is
imperceptible.
No behavioural change on non-threaded video.
gfx/drivers/d3d12.c carried a #ifdef __MINGW32__ / #if __GNUC__ < 12 block that manually emitted D3D12 COM GUIDs via DECLSPEC_SELECTANY to work around MinGW not shipping dxguid.lib. With the introduction of gfx/common/dx_guids.c (which emits all DirectX GUIDs centrally via <initguid.h>), this block became redundant and caused multiple- definition link errors on MinGW/MXE builds: dx_guids.o: multiple definition of `IID_ID3D12Device' d3d12.o: first defined here On MSVC the linker silently deduplicated DECLSPEC_SELECTANY symbols against initguid.h storage, so the collision only manifested on GCC/MinGW where both are strong definitions. Remove the entire block (19 standard IID_* GUIDs plus 7 debug-only GUIDs that were already dead code on MinGW due to their inner !defined(__MINGW32__) guard). dx_guids.c is now the sole emitter of standard D3D12 GUIDs on all platforms. The libretro_IID_IDXGIFactory5 in d3d11.c and libretro_IID_IDXGIOutput6 in dxgi_common.c are unaffected — they use custom prefixed names that do not collide with dx_guids.c.
…ideo active font_driver_free calls renderer->free (e.g. d3d12_font_free, gl2_raster_font_free, d3d11_font_free) directly on the calling thread. Menu drivers call this from context_destroy or context_reset_internal on the main thread, while the video thread is still rendering frames. This produces driver-specific races: GL1/GL2/GL3: The GL context can only be current on one thread. gl2_raster_font_free calls make_current(true) to steal the context from the video thread, then calls glDeleteTextures. If the video thread is mid-frame, its subsequent GL calls are invalid until context is restored. D3D11: The ImmediateContext is explicitly not thread-safe per Microsoft's docs. d3d11_font_free calls Release on the font texture's COM objects while the video thread draws with the same context — undefined behaviour. D3D10: Same as D3D11, though slightly less severe because D3D10's device has an internal mutex. Release still races with pending GPU draws. D3D12: d3d12_font_free uses a GPU fence wait with ++fenceValue. The non-atomic increment from the main thread races with the video thread's own fence signalling, potentially skipping or duplicating fence values. Vulkan: vulkan_destroy_texture (called from font free) uses vkQueueWaitIdle under queue_lock, which drains submitted work but not command buffers currently being recorded on the video thread. Metal: Blit command buffer access can race (same as texture load). However, font glyph textures in the Metal driver use replaceRegion: (thread-safe) rather than the blit command encoder, so this is lower risk. Fix: in font_driver_free, when threaded video is active, dispatch the renderer->free call to the video thread via video_thread_texture_handle. This serialises the font resource destruction with the video thread's frame rendering, the same pattern used for texture load/unload. This is a single-point fix in font_driver.c that covers ALL video driver backends at once, rather than patching each driver's font_free individually. The font_driver_free function is the natural place for this because it already checks is_threaded and is the single entry point for all font destruction. Implementation: - font_free_cmd_t packages the renderer pointer, renderer_data, and is_threaded flag. - font_driver_free_wrap is the CMD_CUSTOM_COMMAND callback. Calls renderer->free on the video thread. - When is_threaded is true and renderer->free exists, dispatch via video_thread_texture_handle. The font_data_t struct itself (renderer/renderer_data pointers + malloc'd memory) is cleaned up on the main thread after the dispatch returns -- only the GPU-touching renderer->free needs serialisation. - video_thread_texture_handle is self-safe (falls back to direct call when the wrapper is not active, and handles same-thread calls without deadlock). - Non-threaded path is unchanged. This fixes the mid-session font scale change crash on D3D12 that was previously only mitigated by XMB's pending_context_reset counter. It also eliminates the GL context-stealing race in gl2_raster_font_free. Depends on: 8d55232 (VIDEO/THREAD: barrier + wrapper hardening). No behavioural change on non-threaded video.
Fixes #18761 All display-related Vulkan pipelines (font, alpha_blend, menu shaders including ribbon/snow/bokeh) were being built against vk->sdr_render_pass, which always uses VK_FORMAT_B8G8R8A8_UNORM. On the non-HDR rendering path (the common case on Linux KMS, Wayland, X11, and most platforms), the active render pass at draw time is vk->render_pass, which uses the actual swapchain format -- vk->context->swapchain_format. When the swapchain format happens to be B8G8R8A8_UNORM, the pipeline/render-pass formats match and everything works. But on some platforms (particularly KMS on certain GPUs), the swapchain format is VK_FORMAT_R8G8B8A8_UNORM instead. In that case: - Pipeline built for: B8G8R8A8_UNORM (via sdr_render_pass) - Active render pass: R8G8B8A8_UNORM (via render_pass) This is a render pass compatibility violation per Vulkan spec (VUID-vkCmdDraw-renderPass-02684). The result is driver- dependent: RADV on AMD typically segfaults, NVIDIA may silently produce wrong output, Intel may report a validation error. The crash manifests as a segfault when any menu shader pipeline is active (ribbon, simple ribbon, snow, bokeh, snowflake) and the user navigates the XMB Quick Menu -- any draw that uses the menu pipeline triggers the incompatible render pass. Fix: build all display pipelines against vk->render_pass (the swapchain render pass) instead of vk->sdr_render_pass. This ensures the pipelines match the active render pass on the non-HDR path regardless of swapchain format. Two changes: 1. Line 3372: Initial pipe.renderPass assignment changed from vk->sdr_render_pass to vk->render_pass. This affects the font pipeline, alpha_blend pipeline, and display pipelines 0-3. 2. After the #ifdef VULKAN_HDR_SWAPCHAIN block (which temporarily changes pipe.renderPass to readback_render_pass and sdr_render_pass for HDR-specific pipelines), restore pipe.renderPass to vk->render_pass before building the menu shader pipelines (indices 6+). Without this, HDR builds would leave pipe.renderPass pointing at sdr_render_pass, defeating the fix. Impact on HDR: the HDR compositing path renders menu content into an offscreen B8G8R8A8 buffer using sdr_render_pass (line 6523). When the swapchain format differs from B8G8R8A8, the menu pipelines built against render_pass would be incompatible with sdr_render_pass in this path. However: - HDR display pipelines (indices 4-5) are built separately against render_pass at line 3470, which is correct. - The HDR menu compositing path uses the standard alpha_blend pipeline for the menu texture overlay (line 6563), which now matches render_pass. - The HDR offscreen compositing step at line 6523 is a separate render pass instance that doesn't use the menu shader pipelines (ribbon/snow/bokeh are drawn earlier in the frame). No behavioural change when swapchain format is B8G8R8A8_UNORM (the common case on X11/Wayland with NVIDIA and Intel), because both render passes have the same format and are therefore compatible.
Follow-up to d250194 (build display pipelines against swapchain render pass). That commit fixed the KMS crash (#18761) by building all display/menu pipelines against vk->render_pass (the swapchain format) instead of vk->sdr_render_pass (hardcoded B8G8R8A8_UNORM). This was correct for the non-HDR path but left a gap in the HDR offscreen compositing path. When HDR is active, the menu is rendered into a B8G8R8A8 offscreen buffer under sdr_render_pass, then composited onto the HDR swapchain. If the swapchain format differs from B8G8R8A8 (which is the common HDR case -- the swapchain uses a wide-gamut format like R16G16B16A16_SFLOAT or A2B10G10R10_UNORM), the main pipeline set (built against render_pass) is incompatible with sdr_render_pass. Using those pipelines in the HDR offscreen render pass would be the same class of render-pass compatibility violation that d250194 fixed for the non-HDR path. Fix: when VK_CTX_FLAG_HDR_SUPPORT is set, build a parallel set of display/menu pipelines against sdr_render_pass. At runtime, switch to the SDR pipeline variants while rendering into the HDR offscreen buffer, and switch back when the render pass ends. New storage: - vk->pipelines.font_sdr - vk->pipelines.alpha_blend_sdr - vk->display.pipelines_sdr[9 * 2] These are created alongside the existing pipeline set during vulkan_init_pipelines, using identical shader modules and pipeline state -- only pipe.renderPass differs. Destroyed in vulkan_deinit_pipelines. Runtime switching via VK_FLAG_SDR_PIPELINE (new flag, bit 16): - Set after vkCmdBeginRenderPass with sdr_render_pass on the HDR offscreen compositing path - Cleared before vkCmdEndRenderPass - Checked in four draw paths: * gfx_display_vk_draw menu shader case (ribbon/snow/bokeh) * gfx_display_vk_draw default case (display quads) * vulkan_raster_font_render_msg (font pipeline) * vulkan_frame menu texture overlay (alpha_blend pipeline) Non-HDR builds: all SDR pipeline code is #ifdef VULKAN_HDR_SWAPCHAIN guarded. Zero impact on builds without HDR support. Non-HDR runtime: VK_FLAG_SDR_PIPELINE is never set. All draw paths use the main pipeline set unconditionally. Zero overhead. HDR with B8G8R8A8 swapchain (unusual but possible): render_pass and sdr_render_pass have the same format, so both pipeline sets are identical. No visual difference, small extra memory for the duplicate pipelines. Cost: 18 additional VkPipeline objects + 2 named pipelines when HDR is supported. One-time creation cost at driver init. Runtime cost is one flag check per draw call on HDR paths only.
Fixes startup crash on Windows Vulkan introduced by 8fa8418. Two bugs in the SDR pipeline creation path when HDR is supported: 1. SDR display pipelines 0-3 were created after the alpha_blend fragment shader module had already been destroyed. The alpha_blend_sdr pipeline was created, then the fragment shader module was immediately destroyed (line 3692 in the original patch). The subsequent display pipeline loop (pipelines_sdr[0..3]) still referenced that destroyed module through the pipe struct, producing an invalid VkShaderModule handle passed to vkCreateGraphicsPipelines. Fix: keep both the vertex and fragment shader modules alive through the display 0-3 loop, then destroy both afterward. 2. After the SDR menu shader loop (pipelines_sdr[6+]), an extra vkDestroyShaderModule call destroyed shader_stages[0].module which had already been destroyed in the loop's last iteration at line 3782. Double-free on a VkShaderModule handle. Fix: remove the spurious destroy after the loop. Both bugs only manifest when VK_CTX_FLAG_HDR_SUPPORT is set (Windows with HDR-capable displays, or any platform where the Vulkan driver reports HDR swapchain support). Non-HDR paths are unaffected because the entire SDR pipeline block is guarded by the HDR support check.
The SDR font pipeline introduced in 8fa8418 was created with VK_PRIMITIVE_TOPOLOGY_TRIANGLE_STRIP instead of TRIANGLE_LIST. The font and alpha_blend pipelines require TRIANGLE_LIST topology for indexed quad rendering (indices [0,1,2,2,1,3] per quad). The initial topology is set to TRIANGLE_LIST at line 3321, and the original font pipeline at line 3451 inherits it correctly. However, the SDR pipeline creation block runs after the menu shader loop (lines 3514-3646), which sets input_assembly.topology to TRIANGLE_STRIP in its last iteration (i=11, topology = i&1 = STRIP). The SDR font pipeline at line 3678 inherited this stale STRIP topology. With TRIANGLE_STRIP, the indexed [0,1,2,2,1,3] pattern produces degenerate triangles and incorrect winding, causing subtle glyph positioning artifacts visible as ~1px character shifts in the Ozone menu under Vulkan HDR. The effect was most noticeable on characters with vertical strokes ('l', 'n', 'd') and punctuation. Fix: reset input_assembly.topology to VK_PRIMITIVE_TOPOLOGY_ TRIANGLE_LIST before creating the SDR font and alpha_blend pipelines. Only affects HDR builds with VK_CTX_FLAG_HDR_SUPPORT set. Non-HDR path is unchanged.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )