[Lake/Iceberg] Fix testLogTableCompaction flakiness by adding compaction timeout and retry logic #2872

Open
hemanthsavasere wants to merge 1 commit into apache:main from hemanthsavasere:2867-fix-testLogTableCompaction

hemanthsavasere commented Mar 14, 2026

Linked issue: close #2867

The testLogTableCompaction test in IcebergRewriteITCase was flaky due to two root causes:

  1. Indefinite blocking: IcebergLakeWriter.complete() called compactionFuture.get() without a timeout, which could block indefinitely if the async compaction was slow or stuck.
  2. Tight assertions: After triggering a write+compaction, the test immediately called checkFileStatusInIcebergTable() without any retry, causing it to fail if the compaction commit had not yet been applied.
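The first root cause can be illustrated with a minimal, standalone sketch of a bounded wait on a future. This is not the code from the PR: the class name `BoundedCompactionWait` and the helper `awaitOrCancel` are hypothetical, and only the constant name and its value of 300 seconds come from the change log. The sketch assumes the compaction result is exposed as a `CompletableFuture`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCompactionWait {
    // Same name and value as the constant described in the change log.
    static final long COMPACTION_TIMEOUT_SECONDS = 300;

    /**
     * Hypothetical helper: waits at most the given time for the future.
     * On timeout it logs a warning, cancels the future, and returns null
     * instead of blocking forever as an unbounded get() would.
     */
    static <T> T awaitOrCancel(CompletableFuture<T> future, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException {
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            System.err.println("Compaction did not finish in time; cancelling");
            future.cancel(true);
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // A future that is never completed stands in for a stuck compaction.
        CompletableFuture<String> stuck = new CompletableFuture<>();
        // A short timeout keeps the demo fast; production code would pass
        // COMPACTION_TIMEOUT_SECONDS with TimeUnit.SECONDS.
        String result = awaitOrCancel(stuck, 100, TimeUnit.MILLISECONDS);
        System.out.println(result + " cancelled=" + stuck.isCancelled());
        // prints "null cancelled=true"
    }
}
```

Whether a timed-out compaction should be skipped, retried, or surfaced as an error is a design choice; the sketch simply returns null so the caller can decide.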

Brief change log

  • IcebergLakeWriter: Added a 5-minute timeout (COMPACTION_TIMEOUT_SECONDS = 300) to compactionFuture.get(). On timeout, logs a warning and cancels the future rather than hanging indefinitely.
  • IcebergRewriteITCase: Updated testLogTableCompaction to pass a 2-minute timeout to assertReplicaStatus (via a new overload) when waiting for the write+compaction round to complete, and switched to waitForFileStatusInIcebergTable(), which retries the file-count assertion.
  • FlinkIcebergTieringTestBase: Added two helper methods:
    • assertReplicaStatus(TableBucket, long, Duration) — timeout-aware overload of the existing method.
    • waitForFileStatusInIcebergTable(TablePath, int, boolean) — wraps checkFileStatusInIcebergTable in a 2-minute retry loop.
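The retry-based assertion pattern behind waitForFileStatusInIcebergTable can be sketched roughly as follows. This is an illustration under stated assumptions, not the helper added in the PR: `RetryAssert` and `retryUntil` are hypothetical names, the polling interval is an assumption, and the real method wraps checkFileStatusInIcebergTable rather than an arbitrary `Runnable`.

```java
import java.time.Duration;

public class RetryAssert {
    /**
     * Hypothetical helper: re-runs the check until it passes. Once the
     * deadline has passed, the most recent AssertionError is rethrown
     * so the test still fails with a meaningful message.
     */
    static void retryUntil(Runnable check, Duration timeout, Duration interval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            try {
                check.run();
                return; // assertion passed
            } catch (AssertionError e) {
                if (System.nanoTime() >= deadline) {
                    throw e; // out of time: report the last failure
                }
                Thread.sleep(interval.toMillis());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // The check fails twice, then succeeds, standing in for a
        // compaction commit that becomes visible after a short delay.
        retryUntil(() -> {
            if (++calls[0] < 3) {
                throw new AssertionError("file count not yet updated");
            }
        }, Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("passed after " + calls[0] + " attempts");
        // prints "passed after 3 attempts"
    }
}
```

In the test described above, the timeout would be 2 minutes and the check would compare the expected file count against the Iceberg table.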

Tests

  • IcebergRewriteITCase#testLogTableCompaction — the flaky test itself; now uses retry-based assertions and a longer timeout to tolerate compaction latency.

API and Format

No API or storage format changes.

Documentation

No new features introduced; this is a stability fix. The Helm docs change is cosmetic table reformatting only.

Development

Successfully merging this pull request may close these issues.

[test] Unstable test IcebergRewriteITCase.testLogTableCompaction
