NV upstream release 2.12 merge #538
Open
Micky774 wants to merge 21 commits into release_v2.12_rocm
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* update FE to 1.17 Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add determinism flag Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add determinism to test Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add determinism to qa/ Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * move bias/dbias/versioning/dropout logic to C API Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update qa/L0_pytorch_unittest/test.sh make .xml file specific to deterministic tests in qa/ Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add determinism to Jax extension Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add determinism to Jax tests Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update tests/jax/test_fused_attn.py fix typo Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Update transformer_engine/common/fused_attn/fused_attn.cpp fix indentation Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix the AI fixes Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Jax extension call Signed-off-by: Charlene Yang 
<8636796+cyanguwa@users.noreply.github.com> * minor fixes based on comments Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix selection logic and fwd arg Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix version check in Jax test Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix pytorch CI failures Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix Jax CI failures Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix non-/determinism logic and CI Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix formatting Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/common/fused_attn/fused_attn.cpp fix and/or logic Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update to 9.18.1 for requirement Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * reduce Jax CI tests for determinism Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Implemented persistent nvfp4 kernel Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix FP4 guard in ptx Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> * Fix Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> * Fix in ptx. reduxf32 guard Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> * Fixes per PR review Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes per PR review. Added parameter to turn off the persistency Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Modified reference CPU implementation in C++ unit tests to match GPU (numerical truncation). 
Tightened the numerical tolerance Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> * Disabled persistency by default, as non-persistent kernel is more performant when inputs are large Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use the tuned kernel also for the rowwise only quantization Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> * Fixed typo Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> * Addressed comments from the PR review Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> * Resolved conflicts Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Macros renaming Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> --------- Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
* PoC of the changes Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Early exit from the Free function for the empty tensor Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Use the proper function for nvtx range Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Only do mark_not_offload when the cpu_offloading is enabled Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * First pass on making the setattr issue not come back Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Actually add pytest.ini Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Changes to __init__ Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * A different way Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * WAR the fact that it is not possible to set __setattr__ dynamically Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Simpler solution and fixes Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Fix for the inference mode DPA Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Start of debugging debug tools Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * More fixes in debug Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Speculative moving the validate_name to the constructor Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Making the debug tools names saner Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Change the setattr usage in the tensor parallel group setting Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Adding try/finally - it does not seem to impact the time in observable way Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Fixing lint issues and the thunder test Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Fix 1 of the debug tests Signed-off-by: Przemek 
Tredak <ptredak@nvidia.com> * Removed the warning and enforcement in the CI Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * try-finally in the context manager Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Fixing the debug tests Signed-off-by: Przemek Tredak <ptredak@nvidia.com> Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by: Przemek Tredak <ptredak@nvidia.com> Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix cb.CUDAOptions usage for Triton 3.6.0 Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update utils.py Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> * Update utils.py Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> * Update utils.py Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> --------- Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Use correct block size for workspace in row id map creation, also shard workspace correctly based on 2nd dim of routing_map/row_id map Signed-off-by: DoubleCheeseCheetos <hanhdp99@gmail.com> * reduce size of largest test case on single_GPU scenario to fit on L40 and A100 in CI line up Signed-off-by: tdophung <hanhdp99@gmail.com> --------- Signed-off-by: DoubleCheeseCheetos <hanhdp99@gmail.com> Signed-off-by: tdophung <hanhdp99@gmail.com> Co-authored-by: DoubleCheeseCheetos <hanhdp99@gmail.com>
* Disabled the tuned NVFP4 kernels Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> * Disabled fast math in cpp tests Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com> --------- Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
* Update THD sink attention logic for newer cudnn versions THD Sink attention is supported in 9.18.0 Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update thd sink attention logic for cp>1 Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add unit test for thd + sink attention Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address comments Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * do not skip thd cp sink attention test Signed-off-by: Chen Cui <chcui@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * disable deterministic mode for sink attention Signed-off-by: Chen Cui <chcui@nvidia.com> --------- Signed-off-by: Chen Cui <chcui@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* SWA (left, right) with FusedAttention changes cherry-picked from NVIDIA/TransformerEngine#1369 Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix test_kv_cache failures Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> * remove unnecessary comments Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> * fix some more filter issues, address feedback Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> * fix for local test case failures - `bottom_right_diagonal` should be calculated in `fused_attn_fwd` call as well Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> * make conditions more accurate Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> * add cp tests to test swa (left, right) Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dead code and make conditions better Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feedback form Charlene Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> * small er Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> * plumb `bottom_right_diagonal` through jax Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * plumb `bottom_right_diagonal` through jax Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing fields Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> * use proper mask type in CP Signed-off-by: 
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…x 404 error (#2625) * Use "nyu-mll/glue" instead of "glue" for encoder datasets to fix 404 error Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com> * rename mnist dataset path Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com> * add dataset manifest Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
* jjit bug fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * fix' Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * lint fixes Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* code drop Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add FP8 scale support and fix alignment for grouped GEMM - Add FP8 scale_inv pointer handling in nvte_grouped_gemm for proper FP8 GEMM - Fix random padding in tests to ensure 16-byte alignment for all dtypes - Reorder GroupedGemmSetupWorkspace members for natural alignment - Remove debug prints Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Grouped GEMM: code cleanup and NULL C support - Remove unused alignment parameter from GroupedGemmSetupWorkspace::from_buffers - Simplify select_grouped_operand by removing dead code branches - Add GroupedOperandSelection.tensor field to avoid passing tensor separately - Extract set_fp8_scale_pointers and init_matrix_layouts helpers - Add safety check for FP8 on Hopper column-wise fallback - Support NULL C tensor when beta=0 (uses D as placeholder) - Remove unused get_scale_inv() from test - Add use_null_c test parameter and test case - Fix documentation: alpha/beta are single element tensors only Signed-off-by: Piotr Gadzinski <pgadzinski@nvidia.com> Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Grouped GEMM: per-matrix alpha/beta support - Change alpha/beta from single values to per-matrix arrays - Validate alpha/beta have exactly num_tensors elements - Update kernel to index alpha_ptr[idx] and beta_ptr[idx] - Move alpha/beta validation to validate_grouped_gemm_inputs - Update tests to use per-matrix alpha/beta arrays - Update documentation Signed-off-by: Piotr Gadzinski 
<pgadzinski@nvidia.com> Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix alpha/beta numel - use SimpleTensor::numel() Signed-off-by: Piotr Gadzinski <pgadzinski@nvidia.com> Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * Refactor: move grouped GEMM to separate file and cleanup API Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * Require Blackwell (SM100) and cuBLAS 13.1+ for grouped GEMM Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/common/gemm/config.h Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * changed Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * suggestions Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * refactored hopper tensor selection Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by: Piotr Gadzinski <pgadzinski@nvidia.com> Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
…02)" This reverts commit 9bb9d22.
ipanfilo
approved these changes
Apr 24, 2026
Collaborator
ipanfilo
left a comment
It is good to go with some nits
| #ifdef __HIP_PLATFORM_AMD__ | ||
| const double atol = 0.05; | ||
| const double rtol = 0.1; | ||
| #else |
Collaborator
This ifdef is not actually needed, because the comparison code checks for ±0.5 (the FP4 step).
| pytest.skip( | ||
| "For sm100+, bprop kernel support for dropout + determinism (bias) is not supported" | ||
| ) | ||
| if get_device_compute_capability(0) >= 100 and self.is_training and not is_hip_extension(): |
Collaborator
nit: calling is_hip_extension() first is preferable. It is a cached, pure-Python method, unlike get_device_compute_capability().
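The ordering point can be illustrated with a small sketch. The functions below are stand-ins written for this example, not the real Transformer Engine helpers: `is_hip_extension()` is assumed to be a cached, pure-Python check, and `get_device_compute_capability()` is assumed to query the device. A counter records how often the device query runs.

```python
from functools import lru_cache

device_queries = 0  # counts calls to the (stand-in) device query


@lru_cache(maxsize=None)
def is_hip_extension() -> bool:
    """Stand-in for the cached check; hard-coded to simulate a ROCm build."""
    return True


def get_device_compute_capability(device_id: int) -> int:
    """Stand-in for the device query; returns a hypothetical sm100 value."""
    global device_queries
    device_queries += 1
    return 100


def should_skip(is_training: bool) -> bool:
    # Cheap cached check first: on ROCm, `and` short-circuits here and the
    # device query below is never evaluated.
    return (
        not is_hip_extension()
        and is_training
        and get_device_compute_capability(0) >= 100
    )


skip = should_skip(is_training=True)
```

With the cached check placed first, the simulated ROCm run never reaches the device query. With the original ordering, the query would run before the result is discarded.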
| # TODO(KshitijLakhani): Add a check for cuDNN version when determinism does get supported on | ||
| # sm100+ | ||
| compute_capabilities = get_all_device_compute_capability() |
Collaborator
nit: better to use get_all_device_compute_capability() if not is_hip_extension() else [] so the enumeration is not called on ROCm. The following condition then does not need an extra guard.
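A minimal sketch of that suggestion, using stand-in functions rather than the real helpers. It assumes the condition that follows is an `any(...)`-style check over the capability list, which is not shown in the snippet above; `collect_capabilities` and `any_sm100_or_newer` are hypothetical names introduced only for illustration.

```python
from typing import List

enumerations = 0  # counts calls to the (stand-in) device enumeration


def get_all_device_compute_capability() -> List[int]:
    """Stand-in for the enumeration; returns hypothetical capabilities."""
    global enumerations
    enumerations += 1
    return [90, 100]


def collect_capabilities(on_hip: bool) -> List[int]:
    # On ROCm, skip the enumeration entirely and use an empty list.
    return get_all_device_compute_capability() if not on_hip else []


def any_sm100_or_newer(capabilities: List[int]) -> bool:
    # any() over an empty list is False, so no separate ROCm guard is needed.
    return any(cc >= 100 for cc in capabilities)


rocm_caps = collect_capabilities(on_hip=True)
cuda_caps = collect_capabilities(on_hip=False)
```

Because an empty list makes the check false, the ROCm case is covered without a second `is_hip_extension()` test in the condition.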
Description
This PR merges the changes from NV's upstream 2.12 release into our 2.12 release branch.
Fixes # (issue)
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: