BUG/TST: Multi-stream tests missing correct stream-ordering operations #288
carterbox wants to merge 1 commit into CVCUDA:main
Conversation
Fixes data races in the multi-stream tests caused by multiple streams sharing tensors without inter-stream synchronization.
```python
stream3 = cvcuda.Stream()  # create a new stream
assert stream1 is not stream2
assert stream1 is not stream3
assert stream2 is not stream3
```
`a != b` and `a != c` does not imply that `b != c`.
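The point can be shown with plain object identity (a minimal sketch; the `Stream` class here is a hypothetical stand-in for `cvcuda.Stream`):

```python
class Stream:  # hypothetical stand-in for cvcuda.Stream
    pass

stream1 = Stream()
stream2 = Stream()
stream3 = stream2  # accidentally the same object as stream2

# Both of the original assertions pass...
assert stream1 is not stream2
assert stream1 is not stream3
# ...yet stream2 and stream3 are the same stream; only the explicit
# pairwise check catches this aliasing bug:
assert stream2 is stream3
```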
```diff
-with t.raises(Exception):
+with t.raises(_CatchThisException):
     with stream1:
         assert cvcuda.Stream.current is stream1
         with stream2:
             assert cvcuda.Stream.current is stream2
-            raise Exception()
+            raise _CatchThisException()
     assert cvcuda.Stream.current is stream1
 assert cvcuda.Stream.current is cvcuda.Stream.default
```
I believe that the purpose of this test is to check that the exception we raised is the one that is caught. Since `Exception` is the base class for all exceptions, we need a dedicated exception class; otherwise, catching any `Exception` will cause the test to pass.
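A minimal illustration of why the dedicated class matters (pure Python, with `try`/`except` standing in for `t.raises`):

```python
class _CatchThisException(Exception):
    """Dedicated class so the test cannot pass on an unrelated error."""

def buggy_body():
    # Simulates the with-block raising something we did NOT intend.
    raise ValueError("unrelated failure")

# The broad pattern silently masks the unrelated error:
broad_masked = False
try:
    buggy_body()
except Exception:
    broad_masked = True  # test would wrongly pass here

# The dedicated class lets the unrelated error surface:
narrow_masked = False
try:
    buggy_body()
except _CatchThisException:
    narrow_masked = True
except ValueError:
    pass  # the real failure stays visible to the test runner

assert broad_masked is True
assert narrow_masked is False
```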
```python
stream1.sync()
stream2.sync()
stream3.sync()
```
You want to wait for all the queued work to complete before the test returns.
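As an analogy (a sketch using a thread pool in place of CUDA streams, since submitting work only enqueues it):

```python
from concurrent.futures import ThreadPoolExecutor

results = []

def enqueued_work(i):
    results.append(i)

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(enqueued_work, i) for i in range(3)]
    # Without this wait, the test could return while work is still
    # queued -- the analogue of calling stream.sync() before returning.
    for f in futures:
        f.result()

assert sorted(results) == [0, 1, 2]
```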
```diff
+prev_torch_stream = None
 for _ in range(Loop):
-    for stream in streams:
+    for stream, torch_stream in zip(streams, torch_streams, strict=True):
+        if prev_torch_stream is not None:
+            torch_stream.wait_stream(prev_torch_stream)
         cvcuda.flip_into(outTensor, inTensor, -1, stream=stream)  # output x flipped
         cvcuda.flip_into(inTensor, outTensor, -1, stream=stream)  # output y flipped
+        prev_torch_stream = torch_stream
```
The same buffer is shared by multiple streams. If you don't synchronize the streams using events or waits, the behavior is undefined because the streams may modify the data simultaneously.
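The ordering fix can be sketched with `threading.Event` as an analogue of a CUDA event or `wait_stream` (assumption: two host threads standing in for two streams sharing one buffer):

```python
import threading

buffer = {"value": 0}
first_done = threading.Event()

def stream1_op():
    buffer["value"] += 1   # first flip writes the shared buffer
    first_done.set()       # like recording an event on stream 1

def stream2_op():
    first_done.wait()      # like torch_stream.wait_stream(prev_torch_stream)
    buffer["value"] *= 10  # second flip reads/writes the same buffer

t2 = threading.Thread(target=stream2_op)
t1 = threading.Thread(target=stream1_op)
t2.start()
t1.start()
t1.join()
t2.join()

# With the wait the result is deterministic; without it, the two
# modifications could interleave either way.
assert buffer["value"] == 10
```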
```python
    outTensor, inTensorTmp, -1, stream=stream2
)  # output y/y flipped

torch_stream2.synchronize()
```
I don't think that torch has an implicit wait when converting from a CUDA tensor to a CPU tensor, so we need the host to wait for the stream to complete before getting the result.
I can also open this MR against an internal repo if that is preferred.