[WIP] Guard host managed-memory access on concurrentManagedAccess=0 #1769
rwgk wants to merge 2 commits into NVIDIA:main
Guard host-side memset/memcmp in test helpers on CMA=0 by syncing the device before touching managed allocations. Made-with: Cursor
There are no flakes in 100 trials with this PR at commit b611a87: Additional sanity check:
Surprise: There are also no flakes with Additional sanity check:
Note: I did not rebuild between running the tests reported under
I.e. everything was exactly identical, except for the presence/absence of commit b611a87. This is reflected in all log files, e.g.:
I don't know what changed, but I cannot reproduce the flakes anymore. All details are in the log files under
Closing this PR and #1576 for now. If we see the flakes again later, we can come back here.
@rluo8 I reopened this PR after seeing your question regarding nvbug 5815123. I hope we can use this for testing on your machine(s). (I'm leaving this in Draft mode for now.)
@rluo8 I had Cursor GPT-5.4 Extra High Fast systematically look at the logs you sent me offline (for my own reference: cuda-python-logs_2026-04-20+212854.zip). I'm copy-pasting the Cursor findings below.
I think the conclusion is sufficiently strong: this PR does not help. I'll close it again for now.
Caveat: I didn't comb through the logs myself. From recent experience I have sufficient confidence that the GPT-5.4 Extra High Fast results are reliable.
Thanks for trying it out. The results mean we have to look for other solutions.
Analysis of Rui
xref: #1576 (comment)
This PR is:
Add a small helper (in helpers/buffers.py) that calls Device.sync() (or otherwise ensures no work is in flight) before any host memset/memcmp of managed memory when concurrentManagedAccess == 0. This is targeted and keeps behavior unchanged on CMA=1 systems.
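
For reference, a minimal sketch of what such a guard could look like. This is not the code in this PR: the function names `sync_before_host_access` and `host_fill_managed` are hypothetical, the device is assumed to be any object exposing a `sync()` method (such as the `Device.sync()` mentioned above), and the CMA flag is passed in by the caller, since how it is queried is outside the scope of this sketch.

```python
import ctypes


def sync_before_host_access(device, concurrent_managed_access):
    """Hypothetical guard: sync the device only when CMA == 0.

    `device` is assumed to expose a `sync()` method; the value of
    `concurrent_managed_access` is supplied by the caller.
    Returns True if a sync was issued, False otherwise.
    """
    if not concurrent_managed_access:
        # On CMA=0 systems the host must not touch managed memory while
        # device work may still be in flight, so drain the device first.
        device.sync()
        return True
    # CMA=1: behavior is unchanged, no extra synchronization.
    return False


def host_fill_managed(device, concurrent_managed_access, ptr, value, nbytes):
    """Hypothetical host-side memset of a managed allocation, guarded as above."""
    sync_before_host_access(device, concurrent_managed_access)
    ctypes.memset(ptr, value, nbytes)


def host_compare_managed(device, concurrent_managed_access, ptr, expected):
    """Hypothetical host-side memcmp equivalent: True if the bytes match."""
    sync_before_host_access(device, concurrent_managed_access)
    return ctypes.string_at(ptr, len(expected)) == expected
```

In a test helper the two host-side functions would replace direct memset/memcmp calls on managed pointers, so the CMA check lives in one place and CMA=1 runs pay no extra synchronization cost.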