Add cuda_buffer_backend and torch_buffer_backend for rosidl::Buffer #1
Open: yuanknv wants to merge 19 commits into `ros2:main` from `yuanknv:native_buffer_backends`.
Changes from all 19 commits:
- `0b2e189` initial implementation of cuda_buffer_backend and torch_buffer_backend (yuanknv)
- `ab2d00f` clean up (yuanknv)
- `d8b838f` bug fix (yuanknv)
- `06d403e` Apply suggestions from code review (yuanknv)
- `7715637` Apply suggestions from code review (yuanknv)
- `06c267b` Update torch_buffer_backend/torch_buffer/include/torch_buffer/torch_b… (yuanknv)
- `98e286a` address comments (yuanknv)
- `e0d3a7e` fix lints (yuanknv)
- `cb2e100` add libtorch_vendor package, update to_buffer function (yuanknv)
- `3b4dc6b` update API and readme (yuanknv)
- `9def199` move the libtorch_vendor to the root folder (yuanknv)
- `f8cd8c9` update from_buffer function and libtorch version (yuanknv)
- `6f2593a` update readme (yuanknv)
- `701188b` add torch zero-copy from_buffer param (yuanknv)
- `34f035f` libtorch_vendor auto detect CUDA version (yuanknv)
- `f3808ae` add contributing file (yuanknv)
- `ad7aa24` remove the typesupport fastrtps dep (yuanknv)
- `4119125` remove fastrtps dep from msgs as well (yuanknv)
- `775cd1a` add rodil_typesupport_cpp to torch backend (yuanknv)
@@ -0,0 +1,18 @@

Any contribution that you make to this repository will
be under the Apache 2 License, as dictated by that
[license](http://www.apache.org/licenses/LICENSE-2.0.html):

~~~
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
~~~

Contributors must sign off each commit by adding a `Signed-off-by: ...`
line to commit messages to certify that they have the right to submit
the code they are contributing to the project according to the
[Developer Certificate of Origin (DCO)](https://developercertificate.org/).
@@ -1,2 +1,93 @@
# rosidl_buffer_backends

Backend implementations for ROSIDL buffer types.

CUDA and PyTorch buffer backend implementations for `rosidl::Buffer`,
enabling zero-copy GPU memory sharing between ROS 2 publishers and
subscribers.

## Packages

- **cuda_buffer** -- Core CUDA buffer library (VMM-backed IPC memory pool,
  host endpoint manager, ReadHandle/WriteHandle with CUDA event sync).
- **cuda_buffer_backend** -- BufferBackend plugin for CUDA IPC transport.
- **cuda_buffer_backend_msgs** -- ROS 2 message definitions for CUDA buffer
  descriptors.
- **libtorch_vendor** -- Vendor package that downloads and installs the
  pre-built LibTorch C++ distribution.
- **torch_buffer** -- Device-agnostic PyTorch buffer library wrapping device
  backends with tensor metadata (shape, strides, dtype).
- **torch_buffer_backend** -- BufferBackend plugin for PyTorch tensors.
- **torch_buffer_backend_msgs** -- ROS 2 message definitions for Torch buffer
  descriptors.

## Prerequisites

- A ROS 2 Rolling development environment. See the upstream
  [Building ROS 2 on Ubuntu](https://docs.ros.org/en/rolling/Installation/Alternatives/Ubuntu-Development-Setup.html)
  guide for the canonical source-build flow, or use the pixi workflow
  shipped by the [`ros2/ros2`](https://github.com/ros2/ros2) meta-repo.
- CUDA Toolkit (>= 11.8) on the host.
- LibTorch: provided automatically by `libtorch_vendor` at build time if a
  system LibTorch isn't already visible.

Per-package build, test, and run details live in each backend's README:

- [`cuda_buffer_backend/README.md`](cuda_buffer_backend/README.md)
- [`torch_buffer_backend/README.md`](torch_buffer_backend/README.md)
- Demo: [`../rosidl_buffer_backends_tutorials/README.md`](../rosidl_buffer_backends_tutorials/README.md)

## API overview

### CUDA buffer backend (`cuda_buffer_backend`)

```cpp
#include "cuda_buffer/cuda_buffer_api.hpp"

// Publisher: allocate + write directly via kernel.
auto msg = cuda_buffer_backend::allocate_msg<sensor_msgs::msg::Image>(byte_count);
{
  auto wh = cuda_buffer_backend::from_buffer(msg.data, stream);
  my_kernel<<<...>>>(wh.get_ptr(), ...);
}  // wh destructor records the write event on `stream`

// Publisher: copy from an existing host/device pointer into a pre-allocated buffer.
{
  auto wh = cuda_buffer_backend::from_buffer(msg.data, stream);
  cuda_buffer_backend::to_buffer(host_ptr, byte_count, wh, stream,
                                 cudaMemcpyHostToDevice);
}

// Subscriber: read handle (waits on publisher's write event).
auto rh = cuda_buffer_backend::from_buffer(msg->data, stream);
use_data<<<...>>>(rh.get_ptr(), ...);

// Auto-promotion: passing a non-CUDA buffer allocates a fresh CUDA buffer
// and (for reads) copies H2D; the handle owns the new buffer via
// get_promoted_buffer().
auto rh_any = cuda_buffer_backend::from_buffer(cpu_or_other_buf, stream);
std::shared_ptr<rosidl::Buffer<uint8_t>> promoted = rh_any.get_promoted_buffer();
```

### Torch buffer backend (`torch_buffer_backend`)

```cpp
#include "torch_buffer/torch_buffer_api.hpp"

// Publisher: allocate + copy a tensor into the message.
auto msg = torch_buffer_backend::allocate_msg<sensor_msgs::msg::Image>(
  {H, W, C}, torch::kByte);
torch_buffer_backend::to_buffer(msg.data, tensor);

// Subscriber: safe default returns an independent clone.
at::Tensor t = torch_buffer_backend::from_buffer(msg->data);

// Subscriber: zero-copy view when the caller is certain it will not mutate
// the tensor in place. Caller must treat the returned tensor as read-only.
at::Tensor view = torch_buffer_backend::from_buffer(msg->data, /*clone=*/false);
```

The torch backend does not cross-device-promote: the returned tensor stays
on the same device as the underlying torch buffer (CUDA or CPU).

## License

Apache-2.0
@@ -0,0 +1,159 @@
# cuda_buffer_backend

CUDA buffer backend plugin for the ROS 2 Buffer system. Enables zero-copy GPU memory sharing between publishers and subscribers on the same host using CUDA VMM (Virtual Memory Management).

## Build

Requires a ROS 2 Rolling source workspace; see
[Building ROS 2 on Ubuntu](https://docs.ros.org/en/rolling/Installation/Alternatives/Ubuntu-Development-Setup.html)
for the canonical setup. After cloning this repo into your workspace's
`src/` directory:

```bash
# Install system dependencies (CUDA toolkit, etc.).
rosdep install --from-paths src --ignore-src -y \
  --skip-keys "fastcdr rti-connext-dds-7.7.0 urdfdom_headers qt6-svg-dev"

# Build the CUDA backend.
colcon build --symlink-install --packages-up-to cuda_buffer_backend
source install/setup.sh
```

## Test

```bash
colcon test --packages-select cuda_buffer cuda_buffer_backend
colcon test-result --verbose
```

## Packages

| Package | Description |
|---|---|
| `cuda_buffer` | Core CUDA buffer implementation: memory pool, IPC manager, host endpoint manager, and user-facing `allocate_msg`/`from_buffer`/`to_buffer` APIs |
| `cuda_buffer_backend` | Plugin registration via `pluginlib`, endpoint discovery, and descriptor serialization |
| `cuda_buffer_backend_msgs` | ROS 2 message definition for `CudaBufferDescriptor` |

## Usage

### Publisher (direct write, zero-copy)

```cpp
#include "cuda_buffer/cuda_buffer_api.hpp"
#include "sensor_msgs/msg/image.hpp"

const size_t data_size = 640 * 480 * 3;

sensor_msgs::msg::Image msg =
  cuda_buffer_backend::allocate_msg<sensor_msgs::msg::Image>(data_size);
msg.height = 480;
msg.width = 640;
msg.encoding = "rgb8";
msg.step = 640 * 3;

cuda_buffer_backend::WriteHandle wh =
  cuda_buffer_backend::from_buffer(msg.data, stream);
my_kernel<<<...>>>(wh.get_ptr(), ...);

publisher->publish(msg);
// wh destructor records write_event on stream when it goes out of scope
```

### Publisher (copy from existing pointer)

Use `to_buffer` to copy bytes from an existing pointer (host or device) into
a buffer that was already allocated (e.g. via `allocate_msg`). `to_buffer`
is a plain memcpy-through-a-WriteHandle and does **not** allocate.

```cpp
sensor_msgs::msg::Image msg =
  cuda_buffer_backend::allocate_msg<sensor_msgs::msg::Image>(data_size);
msg.height = 480;
msg.width = 640;
msg.encoding = "rgb8";
msg.step = 640 * 3;

{
  cuda_buffer_backend::WriteHandle wh =
    cuda_buffer_backend::from_buffer(msg.data, stream);

  // From a device pointer (D2D copy, default kind)
  cuda_buffer_backend::to_buffer(gpu_ptr, data_size, wh, stream);

  // Or from a host pointer (H2D copy)
  // cuda_buffer_backend::to_buffer(
  //   host_ptr, data_size, wh, stream, cudaMemcpyHostToDevice);
}  // wh destructor records the write event on `stream`

publisher->publish(msg);
```

### Subscriber (read from buffer, zero-copy)

```cpp
#include "cuda_buffer/cuda_buffer_api.hpp"

void callback(const sensor_msgs::msg::Image::SharedPtr msg) {
  const rosidl::Buffer<uint8_t> & data = msg->data;
  cuda_buffer_backend::ReadHandle rh =
    cuda_buffer_backend::from_buffer(data, stream);
  // ReadHandle constructor waits on publisher's write_event

  my_kernel<<<...>>>(rh.get_ptr(), ...);
}  // ReadHandle destructor signals publisher that GPU work is complete
```

### Auto-promoting non-CUDA buffers

`from_buffer` accepts any `rosidl::Buffer<T>`, not just
CUDA-backed ones. If the source is a non-CUDA buffer (e.g. the CPU fallback
path), `from_buffer` allocates a new CUDA-backed `rosidl::Buffer<uint8_t>`
and returns a handle for it.

```cpp
#include "cuda_buffer/cuda_buffer_api.hpp"

void callback(const sensor_msgs::msg::Image::SharedPtr msg) {
  const rosidl::Buffer<uint8_t> & data = msg->data;
  cuda_buffer_backend::ReadHandle rh =
    cuda_buffer_backend::from_buffer(data, stream);

  my_kernel<<<...>>>(rh.get_ptr(), ...);

  // If `data` was non-CUDA, the handle owns the promoted CUDA buffer:
  // auto promoted = rh.get_promoted_buffer();
}
```

### `from_buffer` handle rules

`from_buffer` returns a **WriteHandle** when called with a non-const buffer, or a
**ReadHandle** when called with a const buffer. The overload is selected at compile
time based on the const-ness of the reference:

```cpp
// Write path (publisher):
cuda_buffer_backend::WriteHandle wh = cuda_buffer_backend::from_buffer(msg.data, stream);

// Read path (subscriber):
const rosidl::Buffer<uint8_t> & data = msg->data;
cuda_buffer_backend::ReadHandle rh = cuda_buffer_backend::from_buffer(data, stream);
```

- A **WriteHandle** can only be acquired once per buffer. Attempting to acquire
  a second WriteHandle (or acquiring one after finalization) throws `CudaError`.
- To read a received buffer, always pass a **const reference**.
- If the source buffer is non-CUDA, the handle owns the promoted CUDA buffer;
  call `handle.get_promoted_buffer()` to retrieve it.

## IPC Behavior

The RMW layer calls `on_discovering_endpoint()` for each subscriber to decide between zero-copy IPC and CPU fallback:

| Condition | Path |
|---|---|
| Same host, same GPU, same user | Zero-copy via CUDA VMM IPC |
| Different GPU, different user, different host, or VMM unavailable | CPU fallback via `to_cpu()` |

The publisher's pool checks a shared-memory refcount before recycling a block, ensuring all IPC subscribers have released their handles.

## License

Apache-2.0
@@ -0,0 +1,84 @@

```cmake
cmake_minimum_required(VERSION 3.20)
project(cuda_buffer)

if(NOT CMAKE_CXX_STANDARD)
  set(CMAKE_CXX_STANDARD 17)
endif()

if(CMAKE_COMPILER_IS_GNUCXX OR CMAKE_CXX_COMPILER_ID MATCHES "Clang")
  add_compile_options(-Wall -Wextra -Wpedantic)
endif()

find_package(ament_cmake REQUIRED)
find_package(rosidl_buffer REQUIRED)
find_package(cuda_buffer_backend_msgs REQUIRED)
find_package(rmw REQUIRED)
find_package(rcutils REQUIRED)
find_package(CUDAToolkit REQUIRED)

add_library(${PROJECT_NAME} SHARED
  src/cuda_buffer.cpp
  src/cuda_buffer_ipc_manager.cpp
  src/cuda_memory_pool.cpp
  src/host_endpoint_manager.cpp
)

target_include_directories(${PROJECT_NAME} PUBLIC
  $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>
  $<INSTALL_INTERFACE:include/${PROJECT_NAME}>
  ${CUDAToolkit_INCLUDE_DIRS}
)

target_link_libraries(${PROJECT_NAME}
  PUBLIC
    rosidl_buffer::rosidl_buffer
    rmw::rmw
    rcutils::rcutils
    ${cuda_buffer_backend_msgs_TARGETS}
  PRIVATE
    CUDA::cudart
    CUDA::cuda_driver
    rt
)

install(
  DIRECTORY include/
  DESTINATION include/${PROJECT_NAME}
)

install(
  TARGETS ${PROJECT_NAME}
  EXPORT ${PROJECT_NAME}
  LIBRARY DESTINATION lib
  ARCHIVE DESTINATION lib
  RUNTIME DESTINATION bin
  INCLUDES DESTINATION include/${PROJECT_NAME}
)

ament_export_targets(${PROJECT_NAME} HAS_LIBRARY_TARGET)
ament_export_dependencies(rosidl_buffer cuda_buffer_backend_msgs rmw rcutils)
ament_export_libraries(${PROJECT_NAME})
ament_export_include_directories(include/${PROJECT_NAME})

if(BUILD_TESTING)
  find_package(ament_lint_auto REQUIRED)
  ament_lint_auto_find_test_dependencies()

  find_package(ament_cmake_gtest REQUIRED)

  ament_add_gtest(test_cuda_buffer
    test/test_cuda_buffer.cpp
  )
  if(TARGET test_cuda_buffer)
    target_link_libraries(test_cuda_buffer
      ${PROJECT_NAME}
      rosidl_buffer::rosidl_buffer
      CUDA::cudart
      CUDA::cuda_driver
      rt
    )
  endif()
endif()

ament_package()
```
We need to include this dependency in the `package.xml`; the requirement is probably `nvcc`. Is this key enough: https://github.com/ros/rosdistro/blob/master/rosdep/base.yaml#L8367C1-L8367C12 ?
Thanks for the pointer -- added `nvidia-cuda` to the dependency list, which should provide everything `find_package(CUDAToolkit)` needs.