BUG/TST: Multi-stream tests missing correct stream-ordering operations #288
carterbox wants to merge 1 commit into CVCUDA:main
Conversation
Fixes data races in the multi-stream tests caused by multiple streams sharing tensors without inter-stream synchronization.
```python
stream3 = cvcuda.Stream()  # create a new stream
assert stream1 is not stream2
assert stream1 is not stream3
assert stream2 is not stream3
```
`a != b` and `a != c` does not imply that `b != c`.
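The point can be shown with plain object identity (a minimal sketch; the `Stream` class here is a hypothetical stand-in for `cvcuda.Stream`):

```python
class Stream:  # hypothetical stand-in for cvcuda.Stream
    pass

stream1 = Stream()
stream2 = Stream()
stream3 = stream2  # accidentally the same object as stream2

# Both of the original assertions pass...
assert stream1 is not stream2
assert stream1 is not stream3
# ...yet stream2 and stream3 are the same stream; only the explicit
# pairwise check catches this aliasing bug:
assert stream2 is stream3
```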
```diff
-with t.raises(Exception):
+with t.raises(_CatchThisException):
     with stream1:
         assert cvcuda.Stream.current is stream1
         with stream2:
             assert cvcuda.Stream.current is stream2
-            raise Exception()
+            raise _CatchThisException()
     assert cvcuda.Stream.current is stream1
 assert cvcuda.Stream.current is cvcuda.Stream.default
```
I believe that the purpose of this test is to check that the exception we raised is the one that is caught. Since `Exception` is the base class for all exceptions, we need a dedicated exception class; otherwise, catching any `Exception` will cause the test to pass.
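A minimal illustration of why the dedicated class matters (pure Python, with `try`/`except` standing in for `t.raises`):

```python
class _CatchThisException(Exception):
    """Dedicated class so the test cannot pass on an unrelated error."""

def buggy_body():
    # Simulates the with-block raising something we did NOT intend.
    raise ValueError("unrelated failure")

# The broad pattern silently masks the unrelated error:
broad_masked = False
try:
    buggy_body()
except Exception:
    broad_masked = True  # test would wrongly pass here

# The dedicated class lets the unrelated error surface:
narrow_masked = False
try:
    buggy_body()
except _CatchThisException:
    narrow_masked = True
except ValueError:
    pass  # the real failure stays visible to the test runner

assert broad_masked is True
assert narrow_masked is False
```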
```python
stream1.sync()
stream2.sync()
stream3.sync()
```
You want to wait for all the queued work to complete before the test returns.
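As an analogy (a sketch using a thread pool in place of CUDA streams, since submitting work only enqueues it):

```python
from concurrent.futures import ThreadPoolExecutor

results = []

def enqueued_work(i):
    results.append(i)

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(enqueued_work, i) for i in range(3)]
    # Without this wait, the test could return while work is still
    # queued -- the analogue of calling stream.sync() before returning.
    for f in futures:
        f.result()

assert sorted(results) == [0, 1, 2]
```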
```diff
+prev_torch_stream = None
 for _ in range(Loop):
-    for stream in streams:
+    for stream, torch_stream in zip(streams, torch_streams, strict=True):
+        if prev_torch_stream is not None:
+            torch_stream.wait_stream(prev_torch_stream)
         cvcuda.flip_into(outTensor, inTensor, -1, stream=stream)  # output x flipped
         cvcuda.flip_into(inTensor, outTensor, -1, stream=stream)  # output y flipped
+        prev_torch_stream = torch_stream
```
The same buffer is shared by multiple streams. If you don't synchronize the streams using events or waits, the behavior is undefined because the streams may modify the data simultaneously.
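The ordering fix can be sketched with `threading.Event` as an analogue of a CUDA event or `wait_stream` (assumption: two host threads standing in for two streams sharing one buffer):

```python
import threading

buffer = {"value": 0}
first_done = threading.Event()

def stream1_op():
    buffer["value"] += 1   # first flip writes the shared buffer
    first_done.set()       # like recording an event on stream 1

def stream2_op():
    first_done.wait()      # like torch_stream.wait_stream(prev_torch_stream)
    buffer["value"] *= 10  # second flip reads/writes the same buffer

t2 = threading.Thread(target=stream2_op)
t1 = threading.Thread(target=stream1_op)
t2.start()
t1.start()
t1.join()
t2.join()

# With the wait the result is deterministic; without it, the two
# modifications could interleave either way.
assert buffer["value"] == 10
```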
```python
    outTensor, inTensorTmp, -1, stream=stream2
)  # output y/y flipped

torch_stream2.synchronize()
```
I don't think that torch has an implicit wait when converting from a CUDA tensor to a CPU tensor, so we need the host to wait for the stream to complete before getting the result.
I can also open this MR against an internal repo if that is preferred.