rust_test: shard by stable name hash #14
Merged
dzbarsky merged 6 commits into hermeticbuild:main on Apr 16, 2026
bolinfest added a commit to openai/codex that referenced this pull request on Apr 15, 2026
Generate separate Bazel test labels for selected large Rust test targets so BuildBuddy can report timing and flakiness per shard. Keep the original aggregate target names as test_suites over the generated shard targets. Patch the pinned rules_rust archive with the stable name-hash sharding and explicit RULES_RUST_TEST_* env support from hermeticbuild/rules_rust#14 until Codex can bump to a merged rules_rust commit that contains it. Co-authored-by: Codex <noreply@openai.com>
bolinfest added a commit to openai/codex that referenced this pull request on Apr 15, 2026
Generate separate Bazel test labels for selected large Rust test targets so BuildBuddy can report timing and flakiness per shard. Keep the original aggregate target names as test_suites over the generated shard targets. For integration tests, compile one manual *-all-test-bin rust_test and make each shard label a lightweight wrapper around that binary. This preserves distinct BuildBuddy labels without compiling the same test crate once per shard. Patch the pinned rules_rust archive with the stable name-hash sharding, explicit RULES_RUST_TEST_* env support, and Windows manifest fallback from hermeticbuild/rules_rust#14 until Codex can bump to a merged rules_rust commit that contains it. Co-authored-by: Codex <noreply@openai.com>
bolinfest added a commit to openai/codex that referenced this pull request on Apr 15, 2026
Generate separate Bazel test labels for selected large Rust test targets so BuildBuddy can report timing and flakiness per shard. Keep the original aggregate target names as test_suites over the generated shard targets. For integration tests, compile one manual *-all-test-bin rust_test and make each shard label a lightweight wrapper around that binary. This preserves distinct BuildBuddy labels without compiling the same test crate once per shard. Patch the pinned rules_rust archive with the stable name-hash sharding, explicit RULES_RUST_TEST_* env support, Windows manifest fallback, and Windows-safe PowerShell UInt32 masking from hermeticbuild/rules_rust#14 until Codex can bump to a merged rules_rust commit that contains it. Co-authored-by: Codex <noreply@openai.com>
bolinfest added a commit to openai/codex that referenced this pull request on Apr 15, 2026
Generate separate Bazel test labels for selected large Rust test targets so BuildBuddy can report timing and flakiness per shard. Keep the original aggregate target names as test_suites over the generated shard targets. For integration tests, compile one manual *-all-test-bin rust_test and make each shard label a lightweight wrapper around that binary. This preserves distinct BuildBuddy labels without compiling the same test crate once per shard. Patch the pinned rules_rust archive with the stable name-hash sharding, explicit RULES_RUST_TEST_* env support, Windows manifest fallback, Windows-safe PowerShell UInt32 masking, and isolated Windows shard temp files from hermeticbuild/rules_rust#14 until Codex can bump to a merged rules_rust commit that contains it. Co-authored-by: Codex <noreply@openai.com>
bolinfest added a commit to openai/codex that referenced this pull request on Apr 16, 2026
Generate separate Bazel test labels for selected large Rust test targets so BuildBuddy can report timing and flakiness per shard. Keep the original aggregate target names as test_suites over the generated shard targets. For integration tests, compile one manual *-all-test-bin rust_test and make each shard label a lightweight wrapper around that binary. This preserves distinct BuildBuddy labels without compiling the same test crate once per shard. Patch the pinned rules_rust archive with the stable name-hash sharding, explicit RULES_RUST_TEST_* env support, Windows manifest fallback, Windows-safe PowerShell UInt32 masking, and isolated Windows shard temp files from hermeticbuild/rules_rust#14 until Codex can bump to a merged rules_rust commit that contains it. Co-authored-by: Codex <noreply@openai.com>
Force-pushed from 6879072 to de22a98
Sort libtest names before execution and assign shards by a stable FNV-1a hash of each test name. This keeps existing tests in the same shard when unrelated tests are added or libtest list order changes. Co-authored-by: Codex <noreply@openai.com>
Document that the sharding wrapper uses FNV-1a and identify the offset basis and prime constants in both Unix and Windows wrappers. Co-authored-by: Codex <noreply@openai.com>
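For reference, the 32-bit FNV-1a hash named in these commits can be sketched in portable shell as below. This is a minimal illustration, not the wrapper's actual code: the function name `fnv1a32` is borrowed from the PR description, the constants are the standard FNV-1a 32-bit offset basis and prime, and the loop assumes ASCII test names.

```sh
#!/bin/sh
# Sketch of 32-bit FNV-1a (assumption: ASCII input, 64-bit shell arithmetic).
# Offset basis: 2166136261 (0x811c9dc5). Prime: 16777619 (0x01000193).
fnv1a32() {
  s=$1
  hash=2166136261
  while [ -n "$s" ]; do
    rest=${s#?}                      # everything after the first character
    c=${s%"$rest"}                   # the first character itself
    byte=$(printf '%d' "'$c")        # numeric value of that character
    # XOR first, then multiply, then narrow back to 32 bits.
    hash=$(( ((hash ^ byte) * 16777619) & 0xffffffff ))
    s=$rest
  done
  printf '%s\n' "$hash"
}

printf '%08x\n' "$(fnv1a32 a)"   # prints e40c292c
```

The mask after each multiply is what keeps the running value inside the 32-bit range; the empty string hashes to the offset basis itself.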
Allow generated shard targets to drive the sharding wrapper with RULES_RUST_TEST_TOTAL_SHARDS and RULES_RUST_TEST_SHARD_INDEX without conflicting with Bazel reserved TEST_* variables. Co-authored-by: Codex <noreply@openai.com>
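One way a wrapper could read the two env modes is sketched below. The helper name `resolve_shard_env`, the output variables, and especially the precedence (explicit `RULES_RUST_TEST_*` values win over Bazel's `TEST_*` values) are assumptions made for illustration; the commit message does not state how the real wrapper orders them.

```sh
#!/bin/sh
# Hypothetical helper: pick the shard configuration from the environment.
# Assumption: explicit RULES_RUST_TEST_* values take precedence over TEST_*.
resolve_shard_env() {
  if [ -n "${RULES_RUST_TEST_TOTAL_SHARDS:-}" ]; then
    total_shards=$RULES_RUST_TEST_TOTAL_SHARDS
    shard_index=${RULES_RUST_TEST_SHARD_INDEX:-0}
  else
    # Fall back to Bazel's native sharding variables, or to a single shard.
    total_shards=${TEST_TOTAL_SHARDS:-1}
    shard_index=${TEST_SHARD_INDEX:-0}
  fi
}

resolve_shard_env
echo "shard $shard_index of $total_shards"
```

Keeping the explicit variables in their own namespace means a generated shard target can set them through its `env` without touching the `TEST_*` names that Bazel reserves.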
When a downstream test rule wraps a rust_test sharding wrapper on Windows, the wrapper may execute from another test's runfiles tree. Add a manifest lookup fallback so the real test binary can still be resolved through the active Bazel runfiles manifest. Co-authored-by: Codex <noreply@openai.com>
Windows PowerShell can interpret 0xffffffff as -1, which means the FNV multiply result was not narrowed before casting back to UInt32. Use explicit UInt64 decimal constants for the FNV prime and UInt32 mask so the sharding wrapper stays within the expected 32-bit range. Co-authored-by: Codex <noreply@openai.com>
Force-pushed from eb0d722 to a73a336
Use Bazel's per-test TEST_TMPDIR when available and create a unique temporary directory for each Windows sharding wrapper invocation. This avoids shared %TEMP% filename collisions when many test shards run concurrently and one shard deletes another shard's libtest list file. Co-authored-by: Codex <noreply@openai.com>
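The same idea can be sketched for a Unix shell, purely as an analogue of the Windows change described here. The helper name `make_shard_tmpdir`, the directory template, and the list file name are hypothetical; only `TEST_TMPDIR` comes from the commit message.

```sh
#!/bin/sh
# Hypothetical helper: create a private directory for one wrapper invocation.
# Prefers Bazel's per-test TEST_TMPDIR, then TMPDIR, then /tmp.
make_shard_tmpdir() {
  base=${TEST_TMPDIR:-${TMPDIR:-/tmp}}
  mktemp -d "$base/rust_test_shard.XXXXXX"
}

list_dir=$(make_shard_tmpdir)
trap 'rm -rf "$list_dir"' EXIT           # remove only this invocation's files
list_file=$list_dir/rust_test_list.txt   # no other shard shares this path
```

Because every invocation owns its directory, cleanup in one shard cannot delete the list file that another concurrently running shard is still reading.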
Force-pushed from a73a336 to da81c4f
bolinfest added a commit to openai/codex that referenced this pull request on Apr 17, 2026
## Why

The large Rust test suites are slow and include some of our flakiest tests, so we want to run them with Bazel native sharding while keeping shard membership stable between runs.

This is the simpler follow-up to the explicit-label experiment in #17998. Since #18397 upgraded Codex to `rules_rs` `0.0.58`, which includes the stable test-name hashing support from hermeticbuild/rules_rust#14, this PR only needs to wire Codex's Bazel macros into that support.

Using native sharding preserves BuildBuddy's sharded-test UI and Bazel's per-shard test action caching. Using stable name hashing avoids reshuffling every test when one test is added or removed.

## What Changed

`codex_rust_crate` now accepts `test_shard_counts` and applies the right Bazel/rules_rust attributes to generated unit and integration test rules. Matched tests are also marked `flaky = True`, giving them Bazel's default three attempts.

This PR shards these labels 8 ways:

```text
//codex-rs/core:core-all-test
//codex-rs/core:core-unit-tests
//codex-rs/app-server:app-server-all-test
//codex-rs/app-server:app-server-unit-tests
//codex-rs/tui:tui-unit-tests
```

## Verification

`bazel query --output=build` over the selected public labels and their inner unit-test binaries confirmed the expected `shard_count = 8`, `flaky = True`, and `experimental_enable_sharding = True` attributes.

Also verified that we see the shards as expected in BuildBuddy so they can be analyzed independently.

Co-authored-by: Codex <noreply@openai.com>
Why
The Rust test sharding wrapper was assigning tests to shards by numeric position in the `libtest --list --format terse` output. That makes shard assignment depend on list order: changing the order, or inserting a test before existing tests, can move unrelated tests to different shards.

For CI workflows that shard expensive Rust test targets, an individual test name should map to a stable shard bucket. Some downstream users also need separate Bazel test rule labels per shard so systems like BuildBuddy can report timing and flakiness by shard label rather than only by the aggregate test target.
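To make the list-order dependence concrete, here is a small sketch of position-based assignment. It assumes the old behavior amounted to "list index modulo shard count"; the function name `assign_by_position` and the test names are made up for illustration.

```sh
#!/bin/sh
# Hypothetical model of the old behavior: the Nth listed test goes to
# shard N % total. Reads test names on stdin, prints "name shard".
assign_by_position() {
  total=$1
  index=0
  while IFS= read -r name; do
    printf '%s %d\n' "$name" $(( index % total ))
    index=$(( index + 1 ))
  done
}

printf '%s\n' tests::beta tests::gamma | assign_by_position 2
# tests::beta 0
# tests::gamma 1

printf '%s\n' tests::alpha tests::beta tests::gamma | assign_by_position 2
# tests::alpha 0
# tests::beta 1
# tests::gamma 0
```

Adding `tests::alpha` at the front moves both existing tests to the other shard, even though neither of them changed.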
What
- Keep native Bazel sharding through the `TEST_TOTAL_SHARDS`/`TEST_SHARD_INDEX` env.
- Add `RULES_RUST_TEST_TOTAL_SHARDS`/`RULES_RUST_TEST_SHARD_INDEX` env support for downstream macros that generate separate shard targets.
- Use explicit `UInt64` constants in the Windows PowerShell FNV hash expression so the 32-bit mask cannot be interpreted as `-1`.
- Use `TEST_TMPDIR` plus a per-wrapper temp directory in the Windows sharding wrapper so parallel shards do not collide on shared `%TEMP%\rust_test_list_*.txt` files.
- Update `experimental_enable_sharding` docs to describe name-hash sharding and both env modes.

Examples
With native Bazel sharding, Bazel still exposes a single test target, `//test/unit/test_sharding:sharded_integration_test`; it does not generate separate rule names with shard numbers appended. At execution time, Bazel runs that target as three shard invocations, setting `TEST_TOTAL_SHARDS=3` and a distinct `TEST_SHARD_INDEX` (0, 1, or 2) for each.

For separate shard rule labels, a downstream macro can generate one compiled `rust_test` binary and wrap it with lightweight test rules that set the explicit shard env, `RULES_RUST_TEST_TOTAL_SHARDS` and `RULES_RUST_TEST_SHARD_INDEX`.

In this shape, `test_binary_test` is a downstream wrapper rule, not a new rules_rust API. It gives BuildBuddy and GitHub reruns concrete labels like `//codex-rs/core:core-all-test-shard-1-of-8` while still compiling the Rust test crate once as `//codex-rs/core:core-all-test-bin`.

In both modes, the wrapper lists libtest names, computes `fnv1a32(test_name) % total_shards`, and runs only the tests whose bucket matches the shard index. Sorting the listed names only makes the order within each shard deterministic; the shard bucket itself depends on the test name hash, not on list position.

Verification

- `pre-commit run --files rust/private/rust.bzl rust/private/test_sharding_wrapper.bat rust/private/test_sharding_wrapper.sh test/unit/test_sharding/fake_libtest_binary.sh test/unit/test_sharding/test_sharding.bzl test/unit/test_sharding/test_sharding_wrapper_hashes_sorted_names.sh`
- `bazel test //test/unit/test_sharding:test_sharding_test_suite`
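As a rough illustration of the bucket selection described under Examples, the sketch below sorts a list of test names and keeps only those whose `fnv1a32(test_name) % total_shards` equals the requested shard index. It is not the wrapper's code: `select_shard` is a hypothetical name, the input is assumed to be plain ASCII test names (one per line, with any listing suffix already stripped), and the hash is a straightforward 32-bit FNV-1a.

```sh
#!/bin/sh
# 32-bit FNV-1a over an ASCII string (offset basis 2166136261, prime 16777619).
fnv1a32() {
  s=$1
  hash=2166136261
  while [ -n "$s" ]; do
    rest=${s#?}
    c=${s%"$rest"}
    byte=$(printf '%d' "'$c")
    hash=$(( ((hash ^ byte) * 16777619) & 0xffffffff ))
    s=$rest
  done
  printf '%s\n' "$hash"
}

# Hypothetical filter: read names on stdin, print the ones that belong to
# shard $2 out of $1 shards, in sorted order.
select_shard() {
  total=$1
  index=$2
  LC_ALL=C sort | while IFS= read -r name; do
    [ -n "$name" ] || continue
    bucket=$(( $(fnv1a32 "$name") % total ))
    if [ "$bucket" -eq "$index" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Print the tests that shard 0 of 2 would run.
printf '%s\n' tests::gamma tests::alpha tests::beta | select_shard 2 0
```

Each name's bucket is computed from the name alone, so adding or removing another test never changes where an existing test runs; the sort only fixes the order in which a shard's own tests are printed.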