test: Client-side input shape/element validation#7427
Draft
Conversation
- Fix gRPC test failure and refactor
- Add gRPC AsyncIO cancellation tests
- Better check if a request is cancelled
- Use f-string
- Fixing torch version for vllm
- Switch Jetson model TensorRT models generation to container
- Adding missed file
- Fix typos and remove extra spaces
- Ensure notify_state_ gets properly destructed
- Fix inflight state tracking to properly erase states
- Prevent the notify_state from being erased
- Wrap notify_state_ object within unique_ptr
- TRTLLM backend post release
- Update submodule url for permission issue
- Not using postbuild function to work around submodule url permission issue

  Co-authored-by: Neelay Shah <neelays@nvidia.com>

- Minor fix for L0_model_config
- Test with different sizes of CUDA memory pool
- Check the server log for error message
- Improve debugging and fix syntax

  Co-authored-by: dyastremsky <58150256+dyastremsky@users.noreply.github.com>
  Co-authored-by: Ryan McCormick <mccormick.codes@gmail.com>

- Update README and versions for 23.10 branch (#6399)
- Cherry-picking vLLM backend changes (#6404)
  - Update build.py to build vLLM backend (#6394)
  - Add Python backend when vLLM backend built (#6397)
- Add documentation on request cancellation (#6403) (#6407)
  - Include python backend
  - Update docs/user_guide/request_cancellation.md and docs/README.md
  - Remove inflight term from the main documentation
  - Address review comments
- Fixes in request cancellation doc (#6409) (#6410)
- TRT-LLM backend build changes (#6406) (#6430)
  - Update url, fix build for TRT-LLM backend
  - Remove TRTLLM TRT and CUDA versions, remove previous TRT version
  - Fix up unused var and dir name, fix cmake patch
  - Install required packages for example models, remove packages only needed for testing
- Fixing vllm build (#6433) (#6437): fixing torch version for vllm (Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>)
- Update TRT-LLM backend url (#6455) (#6460)
  - TRTLLM backend post release
  - Update submodule url for permission issue
  - Not using postbuild function to work around submodule url permission issue
  - Revert "remove redundant lines" (this reverts commit 86be7ad), restore missed lines
  - Update build.py

  Co-authored-by: Tanmay Verma <tanmay2592@gmail.com>
  Co-authored-by: dyastremsky <58150256+dyastremsky@users.noreply.github.com>
  Co-authored-by: Iman Tabrizian <iman.tabrizian@gmail.com>
  Co-authored-by: Neelay Shah <neelays@nvidia.com>
  Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
  Co-authored-by: Jacky <18255193+kthui@users.noreply.github.com>
  Co-authored-by: Kris Hung <krish@nvidia.com>
  Co-authored-by: Katherine Yang <80359429+jbkyang-nvi@users.noreply.github.com>
  Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>

- …ycle) (#6490): test torch allocator GPU memory usage directly rather than global GPU memory for more consistency
- Add testing backend and test
  - Add test to build / CI; minor fix on L0_http
  - Format; update backend documentation
  - Address comment, add negative testing
- …n from test script. (#6499)
- Use postbuild function; remove updating submodule url
- Added testing for python_backend autocomplete: optional input and model_transaction_policy

  Co-authored-by: Francesco Petrini <francescogpetrini@gmail.com>

- Fixing L0_io
- Bumped vllm version
  - Add python-based backends testing and CI
  - Fix errors and pre-commit issues
  - Add vllm backend; modify test.sh
  - Remove vllm_opt qa model and vLLM backend tests
  - Resolve review comments; update qa/L0_backend_python/python_based_backends/python_based_backends_test.py
  - Remove collect_artifacts_from_subdir function call

  Co-authored-by: oandreeva-nv <oandreeva@nvidia.com>
  Co-authored-by: Tanmay Verma <tanmay2592@gmail.com>

- … pairs (similar to gRPC)
- Add boost-filesystem
rmccorm4 reviewed Jul 25, 2024
rmccorm4 reviewed Jul 25, 2024
rmccorm4 reviewed Jul 25, 2024
rmccorm4 (Contributor) left a comment: Left a couple questions
chore: PA Migration From Client
Force-push: f432d41 to 3863c39 (Compare)
Force-push: 17053e9 to 482409e (Compare)
rmccorm4 reviewed Aug 5, 2024
GuanLuo reviewed Aug 6, 2024
```shell
}
EOL

cp -r $DATADIR/qa_model_repository/graphdef_object_int32_int32 models/.
```
Author (Contributor): Use model "simple_identity" instead to test string inputs.
Comment on lines +206 to +222
```python
if client_type == "http":
    triton_client = tritonhttpclient.InferenceServerClient("localhost:8000")
else:
    triton_client = tritongrpcclient.InferenceServerClient("localhost:8001")

# Example using BYTES input tensor with utf-8 encoded string that
# has an embedded null character.
null_chars_array = np.array(
    ["he\x00llo".encode("utf-8") for i in range(16)], dtype=np.object_
)
null_char_data = null_chars_array.reshape([1, 16])
identity_inference(triton_client, null_char_data, True)  # Using binary data
identity_inference(triton_client, null_char_data, False)  # Using JSON data

# Example using BYTES input tensor with 16 elements, where each
# element is a 4-byte binary blob with value 0x00010203. Can use
# dtype=np.bytes_ in this case.
bytes_data = [b"\x00\x01\x02\x03" for i in range(16)]
np_bytes_data = np.array(bytes_data, dtype=np.bytes_)
np_bytes_data = np_bytes_data.reshape([1, 16])
identity_inference(triton_client, np_bytes_data, True)  # Using binary data
identity_inference(triton_client, np_bytes_data, False)  # Using JSON data
```
Author (Contributor): Copied from client/src/python/examples/simple_http_string_infer_client.py. It looks like the example demonstrated two ways of preparing string input data; I'll remove one of them.
rmccorm4 reviewed Aug 6, 2024
```python
inputs[0].set_shape([2, 8])
inputs[1].set_shape([2, 8])

with self.assertRaises(InferenceServerException) as e:
```
Contributor suggested change:

```diff
+# If number of elements (volume) is correct but shape is wrong, the core will return an error.
 with self.assertRaises(InferenceServerException) as e:
```
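The point behind that suggestion can be shown with a quick illustrative snippet (not from the PR itself): two different shapes can hold the same number of elements, so a pure client-side byte-size check passes for both, and only the server's shape validation can tell them apart.

```python
from functools import reduce
import operator

def volume(shape):
    """Total element count of a dense tensor shape."""
    return reduce(operator.mul, shape, 1)

# [2, 8] and [4, 4] both hold 16 elements, so a byte-size check alone
# cannot distinguish them; the core still has to reject the bad shape.
assert volume([2, 8]) == volume([4, 4]) == 16
print("equal volume, different shapes")
```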
Force-push: dff25f4 to 48c9b25 (Compare)
What does the PR do?
Adds a client-side input size check to make sure the input shape byte size matches the input data byte size.
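As a rough sketch of this kind of check (illustrative only; the names `DTYPE_SIZES`, `expected_byte_size`, and `validate_input` are hypothetical, and the real implementation lives in triton-inference-server/client#742), the client can compute the byte size implied by the shape and datatype and compare it against the raw buffer before sending:

```python
# Per-element sizes for fixed-size Triton datatypes (illustrative subset).
DTYPE_SIZES = {"FP32": 4, "FP16": 2, "INT32": 4, "INT64": 8, "UINT8": 1}

def expected_byte_size(shape, datatype):
    """Byte size a dense tensor of this shape and datatype should occupy."""
    nbytes = DTYPE_SIZES[datatype]
    for dim in shape:
        nbytes *= dim
    return nbytes

def validate_input(shape, datatype, raw_data):
    """Reject a malformed request on the client side, before it is sent."""
    expected = expected_byte_size(shape, datatype)
    if expected != len(raw_data):
        raise ValueError(
            f"input expected {expected} bytes but got {len(raw_data)} bytes"
        )

validate_input([2, 8], "FP32", b"\x00" * 64)  # 2*8*4 = 64 bytes: accepted
try:
    validate_input([2, 8], "FP32", b"\x00" * 60)  # 4 bytes short: rejected
except ValueError as err:
    print(err)
```

Note this only catches size mismatches; a wrong shape with the correct total byte size still has to be rejected by the server.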
Checklist
Commit Type: check the conventional commit type box (`<commit_type>: <Title>`) and add the label to the GitHub PR.
Related PRs:
triton-inference-server/client#742
Where should the reviewer start?
Should look at triton-inference-server/client#742 first.
Test plan:
n/a
17202351
Caveats:
Shared memory byte size checks for string inputs are not implemented.
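For context on this caveat: a BYTES (string) tensor's serialized size depends on the element contents, not just the shape, so it cannot be predicted from shape and datatype alone. A minimal sketch, assuming the 4-byte little-endian length prefix per element that the Triton client libraries use for BYTES tensors (the helper name `serialize_bytes_tensor` is illustrative):

```python
import struct

def serialize_bytes_tensor(elements):
    """Length-prefix each element with a 4-byte little-endian uint32,
    approximating how BYTES tensors are laid out on the wire."""
    out = bytearray()
    for elem in elements:
        out += struct.pack("<I", len(elem))
        out += elem
    return bytes(out)

# 16 strings of 6 bytes each -> 16 * (4 + 6) = 160 bytes on the wire;
# the [1, 16] shape alone cannot predict that number.
data = [b"he\x00llo"] * 16
print(len(serialize_bytes_tensor(data)))  # 160
```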
Background
Stop malformed input requests on the client side before they are sent to the server.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Relates to #7171