rust_test: shard by stable name hash #14

Merged
dzbarsky merged 6 commits into hermeticbuild:main from bolinfest:codex/stable-test-sharding on Apr 16, 2026

bolinfest commented Apr 15, 2026

Why

The Rust test sharding wrapper was assigning tests to shards by numeric position in the libtest --list --format terse output. That makes shard assignment depend on list order: changing the order, or inserting a test before existing tests, can move unrelated tests to different shards.
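
To make the instability concrete, here is a minimal Python sketch. It assumes that "numeric position" means list index modulo shard count; the wrapper's previous logic may have differed in detail, and the test names and the `shard_by_position` helper are made up for illustration.

```python
def shard_by_position(listed_names, total_shards):
    """Assumed model of the old behavior: bucket = list index % total_shards."""
    return {name: i % total_shards for i, name in enumerate(listed_names)}


# Hypothetical libtest names, in the order libtest listed them.
before = ["tests::alpha", "tests::beta", "tests::gamma"]
# A new test that happens to be listed first.
after = ["tests::aardvark"] + before

old = shard_by_position(before, 3)
new = shard_by_position(after, 3)

# Every pre-existing test shifts by one position, so all three change shard.
moved = [name for name in before if old[name] != new[name]]
print(moved)
```

Under this model a single insertion at the front of the list reassigns every existing test, which is the behavior the PR sets out to remove.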

For CI workflows that shard expensive Rust test targets, an individual test name should map to a stable shard bucket. Some downstream users also need separate Bazel test rule labels per shard so systems like BuildBuddy can report timing and flakiness by shard label rather than only by the aggregate test target.

What

  • Assigns tests to shards by a stable FNV-1a hash of the test name modulo the total shard count.
  • Sorts the discovered libtest test names before executing a shard so in-shard execution order is deterministic.
  • Keeps support for Bazel's native TEST_TOTAL_SHARDS / TEST_SHARD_INDEX env.
  • Adds explicit RULES_RUST_TEST_TOTAL_SHARDS / RULES_RUST_TEST_SHARD_INDEX env support for downstream macros that generate separate shard targets.
  • Adds a Windows manifest lookup fallback so nested launcher rules can still find the real test binary through the active Bazel runfiles manifest.
  • Uses explicit decimal UInt64 constants in the Windows PowerShell FNV hash expression so the 32-bit mask cannot be interpreted as -1.
  • Uses TEST_TMPDIR plus a per-wrapper temp directory in the Windows sharding wrapper so parallel shards do not collide on shared %TEMP%\rust_test_list_*.txt files.
  • Updates both the Unix and Windows sharding wrappers.
  • Updates the experimental_enable_sharding docs to describe name-hash sharding and both env modes.
  • Adds a focused wrapper test that verifies shard assignments are unchanged when libtest list order changes and when a new test is inserted.

Examples

With native Bazel sharding:

rust_test(
    name = "sharded_integration_test",
    srcs = ["sharded_test.rs"],
    experimental_enable_sharding = True,
    shard_count = 3,
)

Bazel still exposes a single test target, //test/unit/test_sharding:sharded_integration_test; it does not generate separate rule names with shard numbers appended. At execution time, Bazel runs that target as three shard invocations by setting:

TEST_TOTAL_SHARDS=3 TEST_SHARD_INDEX=0
TEST_TOTAL_SHARDS=3 TEST_SHARD_INDEX=1
TEST_TOTAL_SHARDS=3 TEST_SHARD_INDEX=2

For separate shard rule labels, a downstream macro can generate one compiled rust_test binary and wrap it with lightweight test rules that set the explicit shard env:

rust_test(
    name = "core-all-test-bin",
    srcs = ["tests/all.rs"],
    experimental_enable_sharding = True,
    tags = ["manual"],
)

test_binary_test(
    name = "core-all-test-shard-1-of-8",
    test_bin = ":core-all-test-bin",
    env = {
        "RULES_RUST_TEST_TOTAL_SHARDS": "8",
        "RULES_RUST_TEST_SHARD_INDEX": "0",
    },
)

test_binary_test is a downstream wrapper rule in this example, not a new rules_rust API. This shape gives BuildBuddy and GitHub reruns concrete labels like //codex-rs/core:core-all-test-shard-1-of-8 while still compiling the Rust test crate once as //codex-rs/core:core-all-test-bin.
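
The naming and env arithmetic that such a macro would need can be sketched in Python. A real macro would be written in Starlark; `shard_targets` is a hypothetical helper, and the only details taken from the example above are the `-shard-N-of-M` label shape (1-based) and the two `RULES_RUST_TEST_*` variables (0-based index).

```python
def shard_targets(base_name, total_shards):
    """Return one {name, env} description per shard (hypothetical helper)."""
    targets = []
    for index in range(total_shards):
        targets.append({
            # Labels are 1-based for readability: shard-1-of-8 ... shard-8-of-8.
            "name": f"{base_name}-shard-{index + 1}-of-{total_shards}",
            "env": {
                "RULES_RUST_TEST_TOTAL_SHARDS": str(total_shards),
                # The shard index passed to the wrapper is 0-based.
                "RULES_RUST_TEST_SHARD_INDEX": str(index),
            },
        })
    return targets


for target in shard_targets("core-all-test", 8):
    print(target["name"], target["env"]["RULES_RUST_TEST_SHARD_INDEX"])
```

The off-by-one between the label and the index is deliberate in this sketch: it matches the example, where `core-all-test-shard-1-of-8` sets `RULES_RUST_TEST_SHARD_INDEX` to `"0"`.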

In both modes, the wrapper lists libtest names, computes fnv1a32(test_name) % total_shards, and runs only the tests whose bucket matches the shard index. Sorting the listed names only makes the order within each shard deterministic; the shard bucket itself depends on the test name hash, not on list position.

Verification

  • pre-commit run --files rust/private/rust.bzl rust/private/test_sharding_wrapper.bat rust/private/test_sharding_wrapper.sh test/unit/test_sharding/fake_libtest_binary.sh test/unit/test_sharding/test_sharding.bzl test/unit/test_sharding/test_sharding_wrapper_hashes_sorted_names.sh
  • bazel test //test/unit/test_sharding:test_sharding_test_suite

bolinfest added a commit to openai/codex that referenced this pull request Apr 15, 2026
Generate separate Bazel test labels for selected large Rust test targets so BuildBuddy can report timing and flakiness per shard. Keep the original aggregate target names as test_suites over the generated shard targets.

Patch the pinned rules_rust archive with the stable name-hash sharding and explicit RULES_RUST_TEST_* env support from hermeticbuild/rules_rust#14 until Codex can bump to a merged rules_rust commit that contains it.

Co-authored-by: Codex <noreply@openai.com>
bolinfest changed the title from "rust_test: shard by stable test name hash" to "rust_test: shard by stable name hash" on Apr 15, 2026
bolinfest added a commit to openai/codex that referenced this pull request Apr 15, 2026
Generate separate Bazel test labels for selected large Rust test targets so BuildBuddy can report timing and flakiness per shard. Keep the original aggregate target names as test_suites over the generated shard targets.

For integration tests, compile one manual *-all-test-bin rust_test and make each shard label a lightweight wrapper around that binary. This preserves distinct BuildBuddy labels without compiling the same test crate once per shard.

Patch the pinned rules_rust archive with the stable name-hash sharding, explicit RULES_RUST_TEST_* env support, and Windows manifest fallback from hermeticbuild/rules_rust#14 until Codex can bump to a merged rules_rust commit that contains it.

Co-authored-by: Codex <noreply@openai.com>
bolinfest added a commit to openai/codex that referenced this pull request Apr 15, 2026
Generate separate Bazel test labels for selected large Rust test targets so BuildBuddy can report timing and flakiness per shard. Keep the original aggregate target names as test_suites over the generated shard targets.

For integration tests, compile one manual *-all-test-bin rust_test and make each shard label a lightweight wrapper around that binary. This preserves distinct BuildBuddy labels without compiling the same test crate once per shard.

Patch the pinned rules_rust archive with the stable name-hash sharding, explicit RULES_RUST_TEST_* env support, Windows manifest fallback, and Windows-safe PowerShell UInt32 masking from hermeticbuild/rules_rust#14 until Codex can bump to a merged rules_rust commit that contains it.

Co-authored-by: Codex <noreply@openai.com>
bolinfest added a commit to openai/codex that referenced this pull request Apr 15, 2026
Generate separate Bazel test labels for selected large Rust test targets so BuildBuddy can report timing and flakiness per shard. Keep the original aggregate target names as test_suites over the generated shard targets.

For integration tests, compile one manual *-all-test-bin rust_test and make each shard label a lightweight wrapper around that binary. This preserves distinct BuildBuddy labels without compiling the same test crate once per shard.

Patch the pinned rules_rust archive with the stable name-hash sharding, explicit RULES_RUST_TEST_* env support, Windows manifest fallback, Windows-safe PowerShell UInt32 masking, and isolated Windows shard temp files from hermeticbuild/rules_rust#14 until Codex can bump to a merged rules_rust commit that contains it.

Co-authored-by: Codex <noreply@openai.com>
bolinfest marked this pull request as ready for review April 15, 2026 23:53
bolinfest added a commit to openai/codex that referenced this pull request Apr 16, 2026
Generate separate Bazel test labels for selected large Rust test targets so BuildBuddy can report timing and flakiness per shard. Keep the original aggregate target names as test_suites over the generated shard targets.

For integration tests, compile one manual *-all-test-bin rust_test and make each shard label a lightweight wrapper around that binary. This preserves distinct BuildBuddy labels without compiling the same test crate once per shard.

Patch the pinned rules_rust archive with the stable name-hash sharding, explicit RULES_RUST_TEST_* env support, Windows manifest fallback, Windows-safe PowerShell UInt32 masking, and isolated Windows shard temp files from hermeticbuild/rules_rust#14 until Codex can bump to a merged rules_rust commit that contains it.

Co-authored-by: Codex <noreply@openai.com>
dzbarsky force-pushed the main branch 2 times, most recently from 6879072 to de22a98 on April 16, 2026 14:49
bolinfest and others added 5 commits April 16, 2026 08:25
Sort libtest names before execution and assign shards by a stable FNV-1a hash of each test name. This keeps existing tests in the same shard when unrelated tests are added or libtest list order changes.

Co-authored-by: Codex <noreply@openai.com>
Document that the sharding wrapper uses FNV-1a and identify the offset basis and prime constants in both Unix and Windows wrappers.

Co-authored-by: Codex <noreply@openai.com>
Allow generated shard targets to drive the sharding wrapper with RULES_RUST_TEST_TOTAL_SHARDS and RULES_RUST_TEST_SHARD_INDEX without conflicting with Bazel reserved TEST_* variables.

Co-authored-by: Codex <noreply@openai.com>
When a downstream test rule wraps a rust_test sharding wrapper on Windows, the wrapper may execute from another test's runfiles tree. Add a manifest lookup fallback so the real test binary can still be resolved through the active Bazel runfiles manifest.

Co-authored-by: Codex <noreply@openai.com>
Windows PowerShell can interpret 0xffffffff as -1, which means the FNV multiply result was not narrowed before casting back to UInt32. Use explicit UInt64 decimal constants for the FNV prime and UInt32 mask so the sharding wrapper stays within the expected 32-bit range.

Co-authored-by: Codex <noreply@openai.com>
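
The effect of a mask that evaluates to -1 can be modeled in Python, whose integers are arbitrary-precision, so `x & -1` returns `x` unchanged. This is only an analogy for the PowerShell behavior described in the commit message above, not a reproduction of it; `fnv_step` is a hypothetical helper covering a single FNV-1a round.

```python
FNV_OFFSET_BASIS_32 = 2166136261
FNV_PRIME_32 = 16777619
UINT32_MASK = 4294967295  # decimal spelling of 0xffffffff


def fnv_step(h, byte, mask):
    """One FNV-1a round: XOR the byte in, multiply by the prime, apply the mask."""
    return ((h ^ byte) * FNV_PRIME_32) & mask


# Correct mask: the product is narrowed back to 32 bits.
good = fnv_step(FNV_OFFSET_BASIS_32, ord("a"), UINT32_MASK)

# Mask misread as -1 (all bits set): the AND is a no-op, so the product stays wide.
bad = fnv_step(FNV_OFFSET_BASIS_32, ord("a"), -1)

print(hex(good))          # fits in 32 bits
print(bad > 0xFFFFFFFF)   # True: the unmasked product overflows 32 bits
```

Spelling the mask as an unambiguous unsigned decimal constant, as the commit does for PowerShell, removes the dependence on how a hexadecimal literal is typed.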
bolinfest force-pushed the codex/stable-test-sharding branch from eb0d722 to a73a336 on April 16, 2026 15:28
Use Bazel's per-test TEST_TMPDIR when available and create a unique temporary directory for each Windows sharding wrapper invocation. This avoids shared %TEMP% filename collisions when many test shards run concurrently and one shard deletes another shard's libtest list file.

Co-authored-by: Codex <noreply@openai.com>
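
The same isolation idea, expressed as a small Python sketch rather than the batch script the commit changes: prefer `TEST_TMPDIR` when Bazel sets it, fall back to the system temp directory otherwise, and create a fresh directory per invocation. The helper name, the fallback choice, and the `rust_test_shard_` prefix are assumptions for illustration.

```python
import os
import tempfile


def make_shard_tmpdir():
    """Create a unique scratch directory for one wrapper invocation."""
    # Bazel sets TEST_TMPDIR per test action; fall back to the system temp dir.
    base = os.environ.get("TEST_TMPDIR") or tempfile.gettempdir()
    # mkdtemp picks a unique name, so concurrent shards never share a directory.
    return tempfile.mkdtemp(prefix="rust_test_shard_", dir=base)


scratch = make_shard_tmpdir()
list_file = os.path.join(scratch, "libtest_list.txt")  # hypothetical file name
print(os.path.isdir(scratch))
```

With one directory per invocation, a shard that cleans up its own list file cannot delete a file another shard is still reading.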
bolinfest force-pushed the codex/stable-test-sharding branch from a73a336 to da81c4f on April 16, 2026 15:31
dzbarsky merged commit 793b5ef into hermeticbuild:main on Apr 16, 2026
1 check failed
bolinfest added a commit to openai/codex that referenced this pull request Apr 17, 2026
## Why

The large Rust test suites are slow and include some of our flakiest
tests, so we want to run them with Bazel native sharding while keeping
shard membership stable between runs.

This is the simpler follow-up to the explicit-label experiment in
#17998. Since #18397 upgraded Codex to `rules_rs` `0.0.58`, which
includes the stable test-name hashing support from
hermeticbuild/rules_rust#14, this PR only needs to wire Codex's Bazel
macros into that support.

Using native sharding preserves BuildBuddy's sharded-test UI and Bazel's
per-shard test action caching. Using stable name hashing avoids
reshuffling every test when one test is added or removed.

## What Changed

`codex_rust_crate` now accepts `test_shard_counts` and applies the right
Bazel/rules_rust attributes to generated unit and integration test
rules. Matched tests are also marked `flaky = True`, giving them Bazel's
default three attempts.

This PR shards these labels 8 ways:

```text
//codex-rs/core:core-all-test
//codex-rs/core:core-unit-tests
//codex-rs/app-server:app-server-all-test
//codex-rs/app-server:app-server-unit-tests
//codex-rs/tui:tui-unit-tests
```

## Verification

`bazel query --output=build` over the selected public labels and their
inner unit-test binaries confirmed the expected `shard_count = 8`,
`flaky = True`, and `experimental_enable_sharding = True` attributes.

Also verified that we see the shards as expected in BuildBuddy so they
can be analyzed independently.

Co-authored-by: Codex <noreply@openai.com>

2 participants