feat(server): add integration test for message deduplication#3099

Open
seokjin0414 wants to merge 4 commits into apache:master from seokjin0414:2872-add-integration-test-for-message-deduplication
@seokjin0414
Contributor

Summary

Closes #2872

  • Add integration test for the message deduplication pipeline (7-step scenario)
  • Fix server panic when all messages in a batch are duplicates
  • Fix partition offset calculation after dedup removes mid-batch messages
  • Fix deduplicator not being created for lazily-initialized partitions

Bug fixes

Empty batch panic (messages.rs): When prepare_for_persistence() removes every message in a batch as a duplicate, the subsequent .unwrap() calls on first_timestamp(), last_timestamp(), and last_offset() panic because the batch is empty. Added an empty-batch guard.
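
A minimal sketch of the guard idea, not the actual server code. The `Batch` and `BatchSummary` types and the `summarize` function are hypothetical stand-ins; only the accessor names come from the description above, and their real signatures are assumed here to be `Option`-returning.

```rust
/// Hypothetical stand-in for a batch after duplicates were removed.
/// Each entry is an (offset, timestamp) pair.
#[derive(Debug, Default)]
pub struct Batch {
    pub entries: Vec<(u64, u64)>,
}

impl Batch {
    pub fn first_timestamp(&self) -> Option<u64> {
        self.entries.first().map(|&(_, ts)| ts)
    }

    pub fn last_timestamp(&self) -> Option<u64> {
        self.entries.last().map(|&(_, ts)| ts)
    }

    pub fn last_offset(&self) -> Option<u64> {
        self.entries.last().map(|&(offset, _)| offset)
    }
}

/// Hypothetical summary of what a persistence step would need.
#[derive(Debug, PartialEq, Eq)]
pub struct BatchSummary {
    pub first_timestamp: u64,
    pub last_timestamp: u64,
    pub last_offset: u64,
}

/// Returns `None` for an empty batch instead of unwrapping and panicking.
pub fn summarize(batch: &Batch) -> Option<BatchSummary> {
    // Explicit guard: an all-duplicate batch has nothing to persist.
    if batch.entries.is_empty() {
        return None;
    }
    Some(BatchSummary {
        first_timestamp: batch.first_timestamp()?,
        last_timestamp: batch.last_timestamp()?,
        last_offset: batch.last_offset()?,
    })
}
```

The caller can then return early on `None` rather than reaching any `.unwrap()`.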

Offset calculation (messages.rs): last_offset was computed as current_offset + count - 1, which does not account for the offset gaps that dedup removal creates. Changed to use segment.end_offset (the actual last offset from the batch).
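
To illustrate why the arithmetic breaks, here is a small sketch under the assumption that `count` is the number of messages left after dedup and that offsets were assigned before removal. Both function names are hypothetical.

```rust
/// Old approach: derive the last offset from the start and the message count.
/// Assumes `count > 0`; this is only correct when offsets are contiguous.
pub fn last_offset_by_count(current_offset: u64, count: u64) -> u64 {
    current_offset + count - 1
}

/// New approach: read the last offset that is actually present in the batch.
pub fn last_offset_from_batch(surviving_offsets: &[u64]) -> Option<u64> {
    surviving_offsets.last().copied()
}
```

For example, if offsets 10 through 14 were assigned and dedup removed 11 and 12, the surviving offsets are 10, 13, 14. The count-based formula yields 12, while the batch itself ends at 14.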

Deduplicator not created (partitions.rs): init_partition_inner() hardcoded None for the message_deduplicator parameter. Added create_message_deduplicator() call matching the bootstrap path.
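
The shape of this fix can be sketched as follows. Every type and function here is hypothetical and simplified (the real `create_message_deduplicator()` signature is not shown in this PR description); the point is only that both construction paths go through the same factory instead of one of them passing `None`.

```rust
/// Hypothetical dedup settings.
#[derive(Debug, Clone, Copy)]
pub struct DedupConfig {
    pub enabled: bool,
    pub max_entries: usize,
}

/// Hypothetical deduplicator handle.
#[derive(Debug, PartialEq, Eq)]
pub struct Deduplicator {
    pub max_entries: usize,
}

/// Single factory shared by every code path that builds a partition.
pub fn create_deduplicator(config: &DedupConfig) -> Option<Deduplicator> {
    config.enabled.then(|| Deduplicator {
        max_entries: config.max_entries,
    })
}

#[derive(Debug)]
pub struct Partition {
    pub id: u32,
    pub deduplicator: Option<Deduplicator>,
}

/// Path used when partitions are loaded at startup.
pub fn bootstrap_partition(id: u32, config: &DedupConfig) -> Partition {
    Partition { id, deduplicator: create_deduplicator(config) }
}

/// Path used when a partition is created on first use.
/// Before the fix, the equivalent of this path passed `None` unconditionally.
pub fn init_partition_lazily(id: u32, config: &DedupConfig) -> Partition {
    Partition { id, deduplicator: create_deduplicator(config) }
}
```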

Integration test scenario

| Step | Description | Validates |
|------|-------------|-----------|
| 1 | Send 10 messages with id=0 (auto UUID) | All pass through with unique IDs |
| 2 | Send 10 messages with explicit IDs 1-10 | Normal dedup registration |
| 3 | Re-send IDs 1-10 with different payload | Duplicates rejected, original payload preserved |
| 4 | Send all-duplicate batch | No server crash, count unchanged |
| 5 | Send mixed batch (IDs 6-15) | Only new IDs 11-15 accepted |
| 6 | Verify offsets | Monotonically increasing after dedup |
| 7 | Wait for TTL expiry, re-send IDs 1-10 | Previously seen IDs accepted again |
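
The dedup behavior that steps 2 through 5 and step 7 exercise can be modeled with a small in-memory sketch. `TtlDeduplicator` is a hypothetical type, not the server's deduplicator, and the clock is passed in explicitly so that TTL expiry can be simulated without sleeping. Message IDs are assumed to be `u128`.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical TTL-based deduplicator keyed by message ID.
pub struct TtlDeduplicator {
    ttl: Duration,
    seen: HashMap<u128, Instant>,
}

impl TtlDeduplicator {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, seen: HashMap::new() }
    }

    /// Returns true and records the ID if it was not seen within the TTL.
    pub fn try_insert(&mut self, id: u128, now: Instant) -> bool {
        match self.seen.get(&id) {
            // Seen recently: treat as a duplicate and keep the original entry.
            Some(&seen_at) if now.duration_since(seen_at) < self.ttl => false,
            // Never seen, or the previous entry has expired.
            _ => {
                self.seen.insert(id, now);
                true
            }
        }
    }

    /// Keeps only the IDs from `ids` that are accepted, preserving order.
    pub fn retain_new(&mut self, ids: &[u128], now: Instant) -> Vec<u128> {
        ids.iter()
            .copied()
            .filter(|&id| self.try_insert(id, now))
            .collect()
    }
}
```

With a 2-second TTL, sending IDs 1-10 twice at the same instant accepts them once and rejects the second batch entirely, a mixed batch of IDs 6-15 keeps only 11-15, and re-sending IDs 1-10 three seconds later accepts all of them again.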

Test plan

  • cargo fmt --all -- --check
  • cargo clippy -p server -p integration --all-targets -- -D warnings
  • cargo test -p integration --test mod -- message_deduplication (CI)

… calculation

- Add empty batch guard after prepare_for_persistence() to prevent
  server panic when all messages in a batch are duplicates
- Fix partition offset calculation to use actual last offset from batch
  instead of arithmetic that ignores gaps created by dedup removal
- Create message deduplicator for lazily-initialized partitions in
  init_partition_inner() instead of hardcoding None

Signed-off-by: shin <sars21@hanmail.net>
Add 7-step scenario testing the full deduplication pipeline:
- Auto-generated IDs (id=0) all pass through with unique UUIDs
- Explicit IDs are accepted on first send
- Duplicate IDs are rejected, original payload preserved
- All-duplicate batch does not crash server (regression for empty batch)
- Mixed batch with partial duplicates only accepts new IDs
- Offsets are monotonically increasing after dedup removal
- TTL expiry allows previously seen IDs to be accepted again

Signed-off-by: shin <sars21@hanmail.net>
Signed-off-by: shin <sars21@hanmail.net>
@codecov
codecov bot commented Apr 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.74%. Comparing base (27f0f11) to head (46db1ea).

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3099      +/-   ##
============================================
+ Coverage     72.71%   72.74%   +0.02%     
  Complexity      943      943              
============================================
  Files          1117     1117              
  Lines         96285    96288       +3     
  Branches      73485    73506      +21     
============================================
+ Hits          70014    70045      +31     
+ Misses        23725    23669      -56     
- Partials       2546     2574      +28     
| Components | Coverage Δ |
|------------|------------|
| Rust Core | 73.53% <100.00%> (+0.07%) ⬆️ |
| Java SDK | 62.30% <ø> (ø) |
| C# SDK | 69.11% <ø> (-0.29%) ⬇️ |
| Python SDK | 81.43% <ø> (ø) |
| Node SDK | 91.40% <ø> (-0.13%) ⬇️ |
| Go SDK | 38.97% <ø> (ø) |

| Files with missing lines | Coverage Δ |
|--------------------------|------------|
| core/server/src/shard/system/messages.rs | 88.00% <100.00%> (+0.02%) ⬆️ |
| core/server/src/shard/system/partitions.rs | 78.51% <100.00%> (+0.07%) ⬆️ |
| core/server/src/streaming/partitions/journal.rs | 85.54% <100.00%> (+0.17%) ⬆️ |

... and 27 files with indirect coverage changes

@seokjin0414 seokjin0414 force-pushed the 2872-add-integration-test-for-message-deduplication branch 3 times, most recently from 1750fe5 to 6fdbeb7 Compare April 11, 2026 08:31
…urnal offset tracking

- When all messages in a batch are duplicates, advance partition offset
  past the assigned (but removed) offset range to prevent offset reuse
  in subsequent batches
- Fix journal current_offset to use actual last offset from batch
  instead of arithmetic that ignores gaps created by dedup removal

Signed-off-by: shin <sars21@hanmail.net>
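
The all-duplicate case described in this commit can be sketched as below. `PartitionOffsets` and its `append` method are hypothetical; the sketch assumes offsets are reserved for the whole incoming batch before dedup decides which messages survive.

```rust
/// Hypothetical offset tracker for a single partition.
#[derive(Debug, Default)]
pub struct PartitionOffsets {
    pub next_offset: u64,
}

impl PartitionOffsets {
    /// Assigns one offset per incoming message, then drops the duplicates.
    /// `duplicate_flags[i]` is true when the i-th message is a duplicate.
    /// Returns the offsets of the messages that survive dedup.
    pub fn append(&mut self, duplicate_flags: &[bool]) -> Vec<u64> {
        let start = self.next_offset;
        // Always advance past the whole assigned range, even when every
        // message is removed, so a later batch never reuses these offsets.
        self.next_offset += duplicate_flags.len() as u64;

        let mut kept = Vec::new();
        for (i, &is_duplicate) in duplicate_flags.iter().enumerate() {
            if !is_duplicate {
                kept.push(start + i as u64);
            }
        }
        kept
    }
}
```

After a batch of five new messages and a batch of five duplicates, the next accepted message lands at offset 10 rather than 5, which keeps stored offsets strictly increasing.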
@seokjin0414 seokjin0414 force-pushed the 2872-add-integration-test-for-message-deduplication branch from 6fdbeb7 to 46db1ea Compare April 11, 2026 09:18