test: Client-side input shape/element validation#7427
Draft
Conversation
- Fix gRPC test failure and refactor
- Add gRPC AsyncIO cancellation tests
- Better check if a request is cancelled
- Use f-string
- Fixing torch version for vllm
- Switch Jetson model TensorRT models generation to container
- Adding missed file
- Fix typos and remove extra spaces
- Ensure notify_state_ gets properly destructed
- Fix inflight state tracking to properly erase states
- Prevent the notify_state from being erased
- Wrap notify_state_ object within unique_ptr
- TRTLLM backend post release
- Update submodule url for permission issue
- Not using postbuild function to work around submodule url permission issue

  Co-authored-by: Neelay Shah <neelays@nvidia.com>

- Minor fix for L0_model_config
- Test with different sizes of CUDA memory pool
- Check the server log for error message
- Improve debugging and fix syntax

  Co-authored-by: dyastremsky <58150256+dyastremsky@users.noreply.github.com>
  Co-authored-by: Ryan McCormick <mccormick.codes@gmail.com>

- Update README and versions for 23.10 branch (#6399)
- Cherry-picking vLLM backend changes (#6404)
  - Update build.py to build vLLM backend (#6394)
  - Add Python backend when vLLM backend built (#6397)
- Add documentation on request cancellation (#6403) (#6407)
  - Include python backend
  - Update docs/user_guide/request_cancellation.md and docs/README.md
  - Remove inflight term from the main documentation
  - Address review comments
- Fixes in request cancellation doc (#6409) (#6410)
- TRT-LLM backend build changes (#6406) (#6430)
  - Update url, fix build for TRT-LLM backend
  - Remove TRTLLM TRT and CUDA versions, remove previous TRT version
  - Fix up unused var and dir name, fix cmake patch
  - Install required packages for example models, remove packages only needed for testing
- Fixing vllm build (#6433) (#6437): fixing torch version for vllm (Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>)
- Update TRT-LLM backend url (#6455) (#6460)
  - TRTLLM backend post release
  - Update submodule url for permission issue
  - Not using postbuild function to work around submodule url permission issue
  - Revert "remove redundant lines" (this reverts commit 86be7ad), restore missed lines
  - Update build.py

  Co-authored-by: Tanmay Verma <tanmay2592@gmail.com>
  Co-authored-by: dyastremsky <58150256+dyastremsky@users.noreply.github.com>
  Co-authored-by: Iman Tabrizian <iman.tabrizian@gmail.com>
  Co-authored-by: Neelay Shah <neelays@nvidia.com>
  Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
  Co-authored-by: Jacky <18255193+kthui@users.noreply.github.com>
  Co-authored-by: Kris Hung <krish@nvidia.com>
  Co-authored-by: Katherine Yang <80359429+jbkyang-nvi@users.noreply.github.com>
  Co-authored-by: Olga Andreeva <124622579+oandreeva-nv@users.noreply.github.com>

- …ycle) (#6490): test torch allocator GPU memory usage directly rather than global GPU memory for more consistency
- Add testing backend and test
  - Add test to build / CI; minor fix on L0_http
  - Format; update backend documentation
  - Address comment, add negative testing
- …n from test script. (#6499)
- Use postbuild function; remove updating submodule url
- Added testing for python_backend autocomplete: optional input and model_transaction_policy

  Co-authored-by: Francesco Petrini <francescogpetrini@gmail.com>

- Fixing L0_io
- Bumped vllm version
  - Add python-based backends testing and CI
  - Fix errors and pre-commit issues
  - Add vllm backend; modify test.sh
  - Remove vllm_opt qa model and vLLM backend tests
  - Resolve review comments; update qa/L0_backend_python/python_based_backends/python_based_backends_test.py
  - Remove collect_artifacts_from_subdir function call

  Co-authored-by: oandreeva-nv <oandreeva@nvidia.com>
  Co-authored-by: Tanmay Verma <tanmay2592@gmail.com>

- … pairs (similar to gRPC)
- Add boost-filesystem
rmccorm4 reviewed Jul 25, 2024
rmccorm4 reviewed Jul 25, 2024
rmccorm4 reviewed Jul 25, 2024
rmccorm4 (Contributor) left a comment: Left a couple questions
chore: PA Migration From Client
Force-push: f432d41 to 3863c39 (Compare)
Force-push: 17053e9 to 482409e (Compare)
rmccorm4 reviewed Aug 5, 2024
GuanLuo reviewed Aug 6, 2024
```shell
}
EOL

cp -r $DATADIR/qa_model_repository/graphdef_object_int32_int32 models/.
```
Author (Contributor): Use model "simple_identity" instead to test string inputs.
Comment on lines +206 to +222
```python
if client_type == "http":
    triton_client = tritonhttpclient.InferenceServerClient("localhost:8000")
else:
    triton_client = tritongrpcclient.InferenceServerClient("localhost:8001")

# Example using BYTES input tensor with utf-8 encoded string that
# has an embedded null character.
null_chars_array = np.array(
    ["he\x00llo".encode("utf-8") for i in range(16)], dtype=np.object_
)
null_char_data = null_chars_array.reshape([1, 16])
identity_inference(triton_client, null_char_data, True)  # Using binary data
identity_inference(triton_client, null_char_data, False)  # Using JSON data

# Example using BYTES input tensor with 16 elements, where each
# element is a 4-byte binary blob with value 0x00010203. Can use
# dtype=np.bytes_ in this case.
bytes_data = [b"\x00\x01\x02\x03" for i in range(16)]
np_bytes_data = np.array(bytes_data, dtype=np.bytes_)
np_bytes_data = np_bytes_data.reshape([1, 16])
identity_inference(triton_client, np_bytes_data, True)  # Using binary data
identity_inference(triton_client, np_bytes_data, False)  # Using JSON data
```
Author (Contributor): Copied from client/src/python/examples/simple_http_string_infer_client.py. It looks like the example demonstrated two ways of preparing string input data; I'll remove one of them.
rmccorm4 reviewed Aug 6, 2024
```python
inputs[0].set_shape([2, 8])
inputs[1].set_shape([2, 8])

with self.assertRaises(InferenceServerException) as e:
```
Contributor suggested change:

```diff
+# If number of elements (volume) is correct but shape is wrong, the core will return an error.
 with self.assertRaises(InferenceServerException) as e:
```
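The point behind that suggestion can be shown with a quick illustrative snippet (not from the PR itself): two different shapes can hold the same number of elements, so a pure client-side byte-size check passes for both, and only the server's shape validation can tell them apart.

```python
from functools import reduce
import operator

def volume(shape):
    """Total element count of a dense tensor shape."""
    return reduce(operator.mul, shape, 1)

# [2, 8] and [4, 4] both hold 16 elements, so a byte-size check alone
# cannot distinguish them; the core still has to reject the bad shape.
assert volume([2, 8]) == volume([4, 4]) == 16
print("equal volume, different shapes")
```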
Force-push: dff25f4 to 48c9b25 (Compare)
What does the PR do?
Adds a client-side input size check to make sure the input shape byte size matches the input data byte size.
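As a rough sketch of this kind of check (illustrative only; the names `DTYPE_SIZES`, `expected_byte_size`, and `validate_input` are hypothetical, and the real implementation lives in triton-inference-server/client#742), the client can compute the byte size implied by the shape and datatype and compare it against the raw buffer before sending:

```python
# Per-element sizes for fixed-size Triton datatypes (illustrative subset).
DTYPE_SIZES = {"FP32": 4, "FP16": 2, "INT32": 4, "INT64": 8, "UINT8": 1}

def expected_byte_size(shape, datatype):
    """Byte size a dense tensor of this shape and datatype should occupy."""
    nbytes = DTYPE_SIZES[datatype]
    for dim in shape:
        nbytes *= dim
    return nbytes

def validate_input(shape, datatype, raw_data):
    """Reject a malformed request on the client side, before it is sent."""
    expected = expected_byte_size(shape, datatype)
    if expected != len(raw_data):
        raise ValueError(
            f"input expected {expected} bytes but got {len(raw_data)} bytes"
        )

validate_input([2, 8], "FP32", b"\x00" * 64)  # 2*8*4 = 64 bytes: accepted
try:
    validate_input([2, 8], "FP32", b"\x00" * 60)  # 4 bytes short: rejected
except ValueError as err:
    print(err)
```

Note this only catches size mismatches; a wrong shape with the correct total byte size still has to be rejected by the server.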
Checklist
Commit Type: check the conventional commit type box (`<commit_type>: <Title>`) and add the label to the GitHub PR.
Related PRs:
triton-inference-server/client#742
Where should the reviewer start?
Should look at triton-inference-server/client#742 first.
Test plan:
n/a
17202351
Caveats:
Shared memory byte size checks for string inputs are not implemented.
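For context on this caveat: a BYTES (string) tensor's serialized size depends on the element contents, not just the shape, so it cannot be predicted from shape and datatype alone. A minimal sketch, assuming the 4-byte little-endian length prefix per element that the Triton client libraries use for BYTES tensors (the helper name `serialize_bytes_tensor` is illustrative):

```python
import struct

def serialize_bytes_tensor(elements):
    """Length-prefix each element with a 4-byte little-endian uint32,
    approximating how BYTES tensors are laid out on the wire."""
    out = bytearray()
    for elem in elements:
        out += struct.pack("<I", len(elem))
        out += elem
    return bytes(out)

# 16 strings of 6 bytes each -> 16 * (4 + 6) = 160 bytes on the wire;
# the [1, 16] shape alone cannot predict that number.
data = [b"he\x00llo"] * 16
print(len(serialize_bytes_tensor(data)))  # 160
```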
Background
Stop malformed input requests on the client side before they are sent to the server.
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Relates to #7171