
BUG/TST: Multi-stream tests missing correct stream-ordering operations#288

Open
carterbox wants to merge 1 commit intoCVCUDA:mainfrom
carterbox:multistream-wait

Conversation

@carterbox

Fixes data races in multi-stream tests which are caused by multiple streams sharing tensors without inter-stream synchronization.

I can also open this MR against an internal repo if that is preferred.

stream3 = cvcuda.Stream() # create a new stream
assert stream1 is not stream2
assert stream1 is not stream3
assert stream2 is not stream3

a != b and a != c does not imply that b != c.

Comment on lines -61 to 71
-with t.raises(Exception):
+with t.raises(_CatchThisException):
     with stream1:
         assert cvcuda.Stream.current is stream1
         with stream2:
             assert cvcuda.Stream.current is stream2
-            raise Exception()
+            raise _CatchThisException()
-assert cvcuda.Stream.current is stream1
+assert cvcuda.Stream.current is cvcuda.Stream.default

I believe that the purpose of this test is to check that the exception we raised is the one that is caught. Since Exception is the base class for all exceptions, I believe we need a dedicated exception class. Otherwise, catching any Exception would cause the test to pass.

Comment on lines +107 to +109
stream1.sync()
stream2.sync()
stream3.sync()

You want to wait for all the queued work to complete before the test returns.

Comment on lines +130 to +137
+prev_torch_stream = None
 for _ in range(Loop):
-    for stream in streams:
+    for stream, torch_stream in zip(streams, torch_streams, strict=True):
+        if prev_torch_stream is not None:
+            torch_stream.wait_stream(prev_torch_stream)
         cvcuda.flip_into(outTensor, inTensor, -1, stream=stream)  # output x flipped
         cvcuda.flip_into(inTensor, outTensor, -1, stream=stream)  # output y flipped
+        prev_torch_stream = torch_stream

The same buffer is shared by multiple streams. If you don't order the streams using events or waits, the behavior is undefined because the streams may modify the same data concurrently.
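The fix uses PyTorch's stream-ordering API. A minimal standalone sketch (tensor and stream names are placeholders, not the ones from the test, and the body is a no-op on machines without a CUDA device):

```python
import torch

if torch.cuda.is_available():
    s1 = torch.cuda.Stream()
    s2 = torch.cuda.Stream()
    buf = torch.zeros(1 << 20, device="cuda")  # buffer shared by both streams

    with torch.cuda.stream(s1):
        buf += 1  # work queued on s1 touches buf

    # Order s2 after everything currently queued on s1, so the two
    # streams do not race on the shared buffer.
    s2.wait_stream(s1)
    with torch.cuda.stream(s2):
        buf *= 2  # queued on s2, guaranteed to run after s1's work

    torch.cuda.synchronize()  # host waits before inspecting results
```

`wait_stream` only orders work that is already queued at the time of the call, which is why the loop in the diff calls it once per iteration, right before queueing the next pair of flips.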

outTensor, inTensorTmp, -1, stream=stream2
) # output y/y flipped

torch_stream2.synchronize()

I don't think that torch inserts an implicit wait when converting a CUDA tensor to a CPU tensor, so we need the host to wait for the stream to complete before reading the result.
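A sketch of that host-side wait with PyTorch (placeholder names; a no-op without a CUDA device):

```python
import torch

if torch.cuda.is_available():
    s = torch.cuda.Stream()
    x = torch.ones(4, device="cuda")

    with torch.cuda.stream(s):
        y = x * 3  # queued on s; may still be running when the block exits

    # Block the host until s has drained before reading the result back;
    # otherwise the copy to host could observe a partially written tensor.
    s.synchronize()
    result = y.cpu()
```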
