[Fix] Sync Tablet::_cumulative_point with TabletMeta::_cumulative_layer_point #60950
Open
LemonCL wants to merge 1 commit into apache:master from
…er_point

## Problem Description

There is a critical synchronization issue between the in-memory cumulative point and its persistent storage:

- Tablet::_cumulative_point (in-memory, runtime value)
- TabletMeta::_cumulative_layer_point (persistent, stored in RocksDB)

The root cause is that these two values were never properly synchronized:

1. After compaction updates _cumulative_point, it was never written to _cumulative_layer_point
2. On BE restart, Tablet constructor hardcoded _cumulative_point to -1, ignoring the value loaded from RocksDB
3. This caused cumulative compaction to restart from scratch after every BE restart

### Real Production Evidence

Production environment shows this exact issue (tablet_id=4676420):

**Runtime value (correct)**:

```bash
$ curl http://10.92.104.1:8040/api/compaction/show?tablet_id=4676420
{
  "cumulative point": 57,  # ← Runtime value is correct
  "last cumulative success time": "2026-02-27 07:11:20.758",
  "rowsets": [
    "[0-56] 1 DATA NONOVERLAPPING ... 56.33 KB"
  ]
}
```

**Persistent value (wrong)**:

```bash
$ curl http://10.92.104.1:8040/api/meta/header/4676420
{
  "tablet_id": 4676420,
  "cumulative_layer_point": -1,  # ← Persistent value is -1 (wrong!)
  "tablet_state": "PB_RUNNING"
}
```

**Impact**: After BE restart, cumulative point will reset from 57 to -1, losing all compaction progress and requiring re-compaction of 57 rowsets.

## Data Flow Analysis

### BE Startup/Restart Flow:

```
DataDir::load()
  → TabletMetaManager::traverse_headers() [iterate RocksDB]
  → TabletManager::load_tablet_from_meta(meta_binary from RocksDB)
  → TabletMeta::deserialize(meta_binary)
  → TabletMeta::init_from_pb()
  → _cumulative_layer_point = tablet_meta_pb.cumulative_layer_point()
  → std::make_shared<Tablet>(_engine, tablet_meta, data_dir)
  → Tablet::Tablet() constructor
```

**Before this fix**: Constructor hardcoded _cumulative_point = -1, losing RocksDB value

**After this fix**: Constructor loads _cumulative_point from TabletMeta

### Compaction Update Flow:

```
CumulativeCompaction::execute_compact()
  → update_cumulative_point()
  → Tablet::set_cumulative_layer_point(new_point) [updates _cumulative_point]
  → Tablet::save_meta()
```

**Before this fix**: save_meta() only saved TabletMeta without syncing _cumulative_point

**After this fix**: save_meta() syncs _cumulative_point to TabletMeta before persisting

## Solution

This commit adds bidirectional synchronization:

1. **Load path** (tablet.cpp:260):

   ```cpp
   _cumulative_point(_tablet_meta->cumulative_layer_point())
   ```

   Initialize from TabletMeta on construction (BE restart/clone/restore)

2. **Save path** (tablet.cpp:341):

   ```cpp
   _tablet_meta->set_cumulative_layer_point(_cumulative_point);
   ```

   Sync to TabletMeta before persisting to RocksDB

## Impact

### Fixed Scenarios:

- ✅ BE restart: Cumulative point persists across restarts
- ✅ Clone: Target replica inherits correct cumulative point
- ✅ Restore: Restored tablet keeps original cumulative point

### New Tablet Creation:

- Still correctly initializes to -1 (TabletMeta constructor sets it to -1)

## Verification

After this fix, for tablet_id=4676420:

- Runtime value: cumulative_point = 57
- Persistent value: cumulative_layer_point = 57 (synced!)
- After BE restart: cumulative_point = 57 (restored from RocksDB)

## Related Issue

This also resolves the existing TODO comment:

```cpp
// TODO(ygl): lost some information here, such as cumulative layer point
// engine_storage_migration_task.cpp:348
```
96c4efa to 3c75e1a
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
Proposed changes
This PR fixes a synchronization issue between Tablet::_cumulative_point (in-memory) and TabletMeta::_cumulative_layer_point (persisted in RocksDB). An auto-recovery mechanism (calculate_cumulative_point()) exists, but it leaves a window after every BE restart during which the compaction system is completely disabled, until the first compaction is triggered.
Problem description
Real Production Evidence
Production environment shows this exact issue (tablet_id=4676420):
Runtime value (correct):
Persistent value (wrong):
$ curl http://10.92.104.1:8040/api/meta/header/4676420
{
  "tablet_id": 4676420,
  "cumulative_layer_point": -1,  # ← Persistent value is -1 (wrong!)
  "tablet_state": "PB_RUNNING"
}
Root Cause
There are two separate variables tracking the cumulative point:
- Tablet::_cumulative_point - Runtime value, updated by compaction
- TabletMeta::_cumulative_layer_point - Persistent value, stored in RocksDB
The core issue: save_meta() never syncs _cumulative_point to _cumulative_layer_point, so the persistent value always remains at -1.
Data flow:
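As a rough illustration of this flow, the sketch below models the pre-fix behavior with simplified stand-ins. `FakeMetaStore`, `SimpleTabletMeta`, `UnsyncedTablet`, and `persisted_point_without_sync` are hypothetical names invented for this example; they are not the Doris classes, and the map only imitates the RocksDB-backed meta store.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-in for the RocksDB meta store: tablet_id -> persisted point.
using FakeMetaStore = std::map<int64_t, int64_t>;

// Simplified persistent metadata; -1 is the default for a new tablet.
struct SimpleTabletMeta {
    int64_t tablet_id = 0;
    int64_t cumulative_layer_point = -1;
};

// Models the pre-fix tablet: the runtime value and the meta value never meet.
class UnsyncedTablet {
public:
    explicit UnsyncedTablet(std::shared_ptr<SimpleTabletMeta> meta)
            : _meta(std::move(meta)), _cumulative_point(-1) {}  // hardcoded, ignores meta

    // Compaction updates only the in-memory value.
    void set_cumulative_layer_point(int64_t point) { _cumulative_point = point; }
    int64_t cumulative_layer_point() const { return _cumulative_point; }

    // Persists the meta as-is; the runtime value is never copied into it.
    void save_meta(FakeMetaStore& store) const {
        store[_meta->tablet_id] = _meta->cumulative_layer_point;
    }

private:
    std::shared_ptr<SimpleTabletMeta> _meta;
    int64_t _cumulative_point;
};

// Returns what ends up in the store after a compaction advanced the runtime point.
int64_t persisted_point_without_sync(int64_t compacted_point) {
    assert(compacted_point >= 0);
    FakeMetaStore store;
    auto meta = std::make_shared<SimpleTabletMeta>();
    meta->tablet_id = 4676420;
    UnsyncedTablet tablet(meta);
    tablet.set_cumulative_layer_point(compacted_point);
    tablet.save_meta(store);
    return store.at(meta->tablet_id);
}
```

In this model, `persisted_point_without_sync(57)` yields -1 even though the runtime value is 57, which matches the mismatch between the two HTTP endpoints shown above.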
What changes were proposed in this pull request?
This PR adds bidirectional synchronization to eliminate the problem window:
1. Save Path - Sync to TabletMeta before persistence
File: be/src/olap/tablet.cpp:341
Effect: Ensures cumulative_layer_point is always synchronized to RocksDB.
2. Load Path - Load from TabletMeta on construction
File: be/src/olap/tablet.cpp:260
Effect: Correctly restores cumulative_point from RocksDB on BE restart.
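The two paths can be sketched together in a small self-contained model. This is an illustration under assumptions, not the actual patch: `SketchMetaStore`, `SketchTabletMeta`, `SyncedTablet`, and `point_after_restart` are hypothetical names, and a `std::map` stands in for RocksDB. Only the two synchronization points mirror what the PR describes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-in for the RocksDB meta store: tablet_id -> persisted point.
using SketchMetaStore = std::map<int64_t, int64_t>;

// Simplified persistent metadata; a new tablet starts at -1.
struct SketchTabletMeta {
    int64_t tablet_id = 0;
    int64_t cumulative_layer_point = -1;
};

class SyncedTablet {
public:
    // Load path: initialize the runtime value from the meta instead of -1.
    // _meta is declared before _cumulative_point, so it is initialized first.
    explicit SyncedTablet(std::shared_ptr<SketchTabletMeta> meta)
            : _meta(std::move(meta)),
              _cumulative_point(_meta->cumulative_layer_point) {}

    void set_cumulative_layer_point(int64_t point) { _cumulative_point = point; }
    int64_t cumulative_layer_point() const { return _cumulative_point; }

    // Save path: copy the runtime value into the meta before persisting it.
    void save_meta(SketchMetaStore& store) {
        _meta->cumulative_layer_point = _cumulative_point;
        store[_meta->tablet_id] = _meta->cumulative_layer_point;
    }

private:
    std::shared_ptr<SketchTabletMeta> _meta;
    int64_t _cumulative_point;
};

// Simulates compaction, save, and a restart that rebuilds the tablet from the store.
int64_t point_after_restart(int64_t compacted_point) {
    assert(compacted_point >= -1);
    const int64_t tablet_id = 4676420;
    SketchMetaStore store;
    {
        auto meta = std::make_shared<SketchTabletMeta>();
        meta->tablet_id = tablet_id;
        SyncedTablet tablet(meta);
        tablet.set_cumulative_layer_point(compacted_point);
        tablet.save_meta(store);
    }  // the "BE process" goes away here
    auto reloaded = std::make_shared<SketchTabletMeta>();
    reloaded->tablet_id = tablet_id;
    reloaded->cumulative_layer_point = store.at(tablet_id);
    return SyncedTablet(reloaded).cumulative_layer_point();
}
```

In this model a point of 57 survives the simulated restart, while a tablet that never compacted still comes back as -1 because the meta default is preserved.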
Before vs After
Before This Fix
After This Fix
Verification
After this fix, for tablet_id=4676420:
- Runtime value: cumulative_point = 57
- Persistent value: cumulative_layer_point = 57 ✅ (synced!)
- After BE restart: cumulative_point = 57 ✅ (immediately restored)
Benefits