
[Windows] Support CPU shared memory (Client/Frontend)#7048

Open
fpetrini15 wants to merge 3302 commits into main from fpetrini-win-cpu-shm

Conversation


fpetrini15 (Contributor) commented Mar 27, 2024

Goal: Support CPU shared memory between the server and client for Windows

Sub-goals: Modify L0_shared_memory to run on bare-metal Windows using only Python.

Client changes: triton-inference-server/client#551

Some things to note:

  • When I can verify that the Linux tests pass using only the Python script, I will remove test.sh
  • L0_shared_memory uses a graphdef model by default. I swapped it for a Python model so that the test is supported on both Windows and Linux. I still need to go back and investigate how the model ends up in L0_shared_memory (it is not generated by a script) and remove it.
  • Some of the default paths need to be modified to reflect the testing environment and will be modified pre-merge.
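As background for reviewers, the named-region model that CPU shared memory relies on is available cross-platform in the Python standard library. This is a minimal sketch using `multiprocessing.shared_memory` (not Triton's client API), purely to illustrate the mechanism this PR enables on Windows:

```python
from multiprocessing import shared_memory

# "Server" side: create a named CPU shared-memory region and write to it.
# The same named-region model works on both Windows and Linux.
region = shared_memory.SharedMemory(name="demo_region", create=True, size=16)
region.buf[:5] = b"hello"

# "Client" side: attach to the same region by name and read the data back.
peer = shared_memory.SharedMemory(name="demo_region")
data = bytes(peer.buf[:5])
print(data)

peer.close()
region.close()
region.unlink()  # frees the region (a no-op on Windows, where the OS reclaims it)
```

In the real test, the client registers such a region with the server via the tritonclient shared-memory utilities instead of reading it back directly.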

kthui and others added 30 commits July 24, 2023 14:03
* Update README and versions for 2.36.0 / 23.07

* Update Dockerfile.win10.min

* Fix formatting issue

* Fix formatting issue

* Fix whitespaces

* Fix whitespaces

* Fix whitespaces
* Reduce instance count to 1 for python bls model loading test

* Add comment when calling unload
* Fix queue test to expect exact number of failures

* Increase the execution time to more accurately capture requests
…yment (fix #6047) (#6100)

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
…6063)

* Adding tests for bls

* Added fixme, cleaned previous commit

* Removed unused imports

* Fixing commit tree:
Refactor code so that the OTel tracer provider is initialized only once
Added resource cmd option, testing
Added docs

* Clean up

* Update docs/user_guide/trace.md

Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

* Revision

* Update doc

* Clean up

* Added ostream exporter to OpenTelemetry for testing purposes; refactored trace tests

* Added opentelemetry trace collector set up to tests; refactored otel exporter tests to use OTel collector instead of netcat

* Revising according to comments

* Added comment regarding 'parent_span_id'

* Added permalink

* Adjusted test

---------

Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Add tests for python 3.8-3.11 for L0_python_backends
* Improve L0_backend_python debugging

* Use utils function for artifacts collection
Update docs with NVAIE messaging
…#6140)

* Remove test checking for --shape option

* Remove the entire test
…same time (#6150)

* Add test when unload/load requests for same model received at the same time

* Add test_same_model_overlapping_load_unload

* Use a load/unload stress test instead

* Pre-merge test name update

* Address pre-commit error

* Revert "Address pre-commit error"

This reverts commit 781cab1.

* Record number of occurrences of each exception
* Add end-to-end CI test for decoupled model support

* Address feedback
* added debugging guide

* Run pre-commit

---------

Co-authored-by: David Yastremsky <dyastremsky@nvidia.com>
* Add utility functions for outlier removal

* Fix functions

* Add newline to end of file
* Testing: add gc collect to make sure gpu tensor is deallocated

* Address comment
* Initial commit

* Cleanup using new standard formatting

* QA test restructuring

* Add newline to the end of test.sh

* HTTP/GRPC protocols changed to pivot on ready status & error status. Log file name changed in QA test.

* Fixing unhandled error memory leak

* Handle index function memory leak fix
fpetrini15 force-pushed the fpetrini-win-cpu-shm branch from 15f94bb to a5b6b7e on April 11, 2024 17:18
rmccorm4 and others added 3 commits April 11, 2024 10:50
* Add async execute decoupled test

* Add decoupled bls async exec test

* Enhance test with different durations for concurrent executes
Add trace_mode and trace_config to the getTraceSettings API

---------

Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
triton_client = httpclient.InferenceServerClient(_url, verbose=True)
# Custom setup method to allow passing of parameters
def _setUp(self, protocol, log_file_path):
self._tritonserver_ipaddr = os.environ.get("TRITONSERVER_IPADDR", "localhost")
Contributor:
Does this need to be configurable in practice? Do we expect to use shared memory for anything other than a co-located server on localhost?

Contributor Author:

TBD: Currently on the Windows testing side of things, it's passed in as a variable and is different from "localhost". Still trying to get a CI pipeline up to see the new behavior for this test in particular. Will remove if no issue.

self._build_model_repo()
self._build_server_args()
self._shared_memory_test_server_log = open(log_file_path, "w")
self._server_process = util.run_server(
Contributor:

How does util.run_server interact with test.sh also starting server? Is there conflict or issue there?

Contributor Author:

I don't believe they should overlap. For this test my ultimate goal is to remove test.sh entirely.

Contributor:

Isn't this also getting run in the Linux case that runs test.sh? Or are there changes on the GitLab side to not run test.sh at all?

Contributor Author (fpetrini15, Apr 11, 2024):

Ah, I see your point. There are changes on the GitLab side such that test.sh will not run at all for Windows. I will attempt to change the Linux test case so that it also does not run test.sh.

Comment on lines +95 to +97
backend_dir = "C:\\opt\\tritonserver\\backends"
model_dir = "C:\\opt\\tritonserver\\qa\\L0_shared_memory\\models"
self._server_executable = "C:\\opt\\tritonserver\\bin\\tritonserver.exe"
Contributor:

Probably more of a random note or follow-up, but I was under the impression that something like pathlib.Path("/opt/tritonserver/backends") would translate to "C:\\opt\\tritonserver\\backends" for free when run on Windows. If so, you could probably condense the cases to work for both.

Did you see otherwise?

Contributor Author:

No, I believe you are right. ATM they are set to my local path and were hardcoded for convenience. They need to be modified and will once I determine the CI environment.
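For what it's worth, the translation is only partial: `pathlib` renders separators per-platform, but a POSIX-style absolute path gains no drive letter. `PureWindowsPath` shows the Windows rendering from any host (a quick check, not a claim about the CI environment):

```python
from pathlib import PurePosixPath, PureWindowsPath

p = "/opt/tritonserver/backends"
# Same path string, rendered per-flavor. Note the Windows form has
# backslash separators but NO "C:" drive prefix.
print(str(PurePosixPath(p)))    # /opt/tritonserver/backends
print(str(PureWindowsPath(p)))  # \opt\tritonserver\backends
```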

# Constant members
shared_memory_test_client_log = Path(os.getcwd()) / "client.log"
model_dir_path = Path(os.getcwd()) / "models"
model_source_path = Path(os.getcwd()).parents[0] / "python_models/add_sub/model.py"
Contributor (rmccorm4, Apr 11, 2024):

Future follow-up as we expand python utilities for CI testing, but might be nice to have some kind of utils.relative_path([path, to, thing]).

e.g., maybe something like this:

model_dir_path = utils.relative_path("models")
model_source_path = utils.relative_path("..", "python_models", "add_sub", "model.py")
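A minimal version of the suggested helper might look like the following (`relative_path` is hypothetical, named only to match the comment above; it joins components against the current working directory and resolves any ".." segments):

```python
import os
from pathlib import Path


def relative_path(*parts: str) -> Path:
    # Hypothetical CI-test utility: build a path relative to the CWD,
    # resolving ".." components so callers can reach sibling directories.
    return Path(os.getcwd()).joinpath(*parts).resolve()


# Usage mirroring the suggestion above:
model_dir_path = relative_path("models")
model_source_path = relative_path("..", "python_models", "add_sub", "model.py")
```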

nv-kmcgill53 previously approved these changes Apr 11, 2024
Contributor nv-kmcgill53 left a comment:

LGTM. Great work on this!

Comment thread src/shared_memory_manager.h
Comment thread src/shared_memory_manager.cc Outdated
Contributor GuanLuo left a comment:

Left some comments, can be addressed in the future PR that adds clean up logic

Comment thread src/shared_memory_manager.cc Outdated
Comment thread src/shared_memory_manager.cc Outdated
@Octoslav:
@fpetrini15 Hello! Any updates?

@Octoslav:
@rmccorm4 @GuanLuo @nv-kmcgill53 Hello! Sorry for tagging everyone, but this PR seems to be stalled. Do you consider it ready to merge? If not, maybe my team could help complete it.
