
[Fix] Sync Tablet::_cumulative_point with TabletMeta::_cumulative_layer_point#60950

Open
LemonCL wants to merge 1 commit into apache:master from LemonCL:bugfix/cumulative-layer-point-sync

Conversation


@LemonCL LemonCL commented Mar 2, 2026

Proposed changes

This PR fixes a synchronization issue between Tablet::_cumulative_point (in-memory) and TabletMeta::_cumulative_layer_point (persisted in RocksDB). Although an auto-recovery mechanism exists (calculate_cumulative_point()), it leaves a problem window after every BE restart during which the compaction system is completely disabled until the first compaction is triggered.

Problem description

Real Production Evidence

Production environment shows this exact issue (tablet_id=4676420):

Runtime value (correct):

```bash
$ curl http://10.92.104.1:8040/api/compaction/show?tablet_id=4676420
{
    "cumulative point": 57,  # ← Runtime value is correct
    "last cumulative success time": "2026-02-27 07:11:20.758",
    "rowsets": [
        "[0-56] 1 DATA NONOVERLAPPING ... 56.33 KB"
    ]
}
```

Persistent value (wrong):

```bash
$ curl http://10.92.104.1:8040/api/meta/header/4676420
{
    "tablet_id": 4676420,
    "cumulative_layer_point": -1,  # ← Persistent value is -1 (wrong!)
    "tablet_state": "PB_RUNNING"
}
```

Root Cause

There are two separate variables tracking the cumulative point:

  • Tablet::_cumulative_point - Runtime value, updated by compaction
  • TabletMeta::_cumulative_layer_point - Persistent value, stored in RocksDB

The core issue: save_meta() never syncs _cumulative_point to _cumulative_layer_point, so the persistent value always remains at -1.

Data flow:

1. Compaction updates _cumulative_point = 57 ✅
2. save_meta() is called
3. ❌ Missing: _tablet_meta->set_cumulative_layer_point(_cumulative_point)
4. TabletMeta saves to RocksDB with cumulative_layer_point = -1 ❌
5. BE restart loads cumulative_layer_point = -1 from RocksDB
6. Tablet constructor initializes _cumulative_point = -1 ❌

What changes were proposed in this pull request?

This PR adds bidirectional synchronization to eliminate the problem window:

1. Save Path - Sync to TabletMeta before persistence

File: be/src/olap/tablet.cpp:341

```cpp
void Tablet::save_meta() {
    check_table_size_correctness();
    // NEW: Sync in-memory value to TabletMeta before persisting
    _tablet_meta->set_cumulative_layer_point(_cumulative_point);
    auto res = _tablet_meta->save_meta(_data_dir);
    // ...
}
```

Effect: Ensures cumulative_layer_point is always synchronized to RocksDB.

2. Load Path - Load from TabletMeta on construction

File: be/src/olap/tablet.cpp:260

```cpp
// Before: hardcoded to -1
_cumulative_point(K_INVALID_CUMULATIVE_POINT),

// After: load from TabletMeta (which was deserialized from RocksDB)
_cumulative_point(_tablet_meta->cumulative_layer_point()),
```

Effect: Correctly restores cumulative_point from RocksDB on BE restart.

Before vs After

Before This Fix

Compaction completes:
  → _cumulative_point = 57 (runtime)
  → cumulative_layer_point = -1 (persistent) ❌

BE restart:
  → Load from RocksDB: cumulative_layer_point = -1
  → _cumulative_point = -1
  → Problem window: compaction disabled for minutes/hours ⚠️
  → Auto-recovery: calculate_cumulative_point() recalculates
  → _cumulative_point = 57 (runtime only, not persisted) ⚠️

Next BE restart:
  → Same problem repeats (cyclic issue) ❌

After This Fix

Compaction completes:
  → _cumulative_point = 57 (runtime)
  → save_meta() syncs to TabletMeta
  → cumulative_layer_point = 57 (persistent) ✅

BE restart:
  → Load from RocksDB: cumulative_layer_point = 57
  → _cumulative_point = 57 ✅
  → No problem window: compaction immediately available ✅
  → No need for auto-recovery ✅

Next BE restart:
  → Works correctly (problem eliminated) ✅

Verification

After this fix, for tablet_id=4676420:

  • Runtime value: cumulative_point = 57
  • Persistent value: cumulative_layer_point = 57 ✅ (synced!)
  • After BE restart: cumulative_point = 57 ✅ (immediately restored)
  • No problem window period

Benefits

  1. Eliminates the problem window: compaction works immediately after restart
  2. Eliminates the cyclic issue: the fix persists across all restarts
  3. Improves query performance: no performance-degradation window
  4. Reduces operational cost: no need to monitor or manually trigger compaction
  5. Supports clone/restore: the cumulative point transfers correctly across replicas

Types of changes

  • Bugfix (non-breaking change which fixes an issue)

Checklist

  • I have read the CLA Document and made sure all commits are signed-off.
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

…er_point

## Problem Description

There is a critical synchronization issue between the in-memory cumulative point
and its persistent storage:

- Tablet::_cumulative_point (in-memory, runtime value)
- TabletMeta::_cumulative_layer_point (persistent, stored in RocksDB)

The root cause is that these two values were never properly synchronized:

1. After compaction updates _cumulative_point, it was never written to _cumulative_layer_point
2. On BE restart, Tablet constructor hardcoded _cumulative_point to -1, ignoring the
   value loaded from RocksDB
3. This caused cumulative compaction to restart from scratch after every BE restart

### Real Production Evidence

Production environment shows this exact issue (tablet_id=4676420):

**Runtime value (correct)**:
```bash
$ curl http://10.92.104.1:8040/api/compaction/show?tablet_id=4676420
{
    "cumulative point": 57,  # ← Runtime value is correct
    "last cumulative success time": "2026-02-27 07:11:20.758",
    "rowsets": [
        "[0-56] 1 DATA NONOVERLAPPING ... 56.33 KB"
    ]
}
```

**Persistent value (wrong)**:
```bash
$ curl http://10.92.104.1:8040/api/meta/header/4676420
{
    "tablet_id": 4676420,
    "cumulative_layer_point": -1,  # ← Persistent value is -1 (wrong!)
    "tablet_state": "PB_RUNNING"
}
```

**Impact**: After a BE restart, the cumulative point resets from 57 to -1, discarding the
recorded compaction progress until calculate_cumulative_point() recomputes it.

## Data Flow Analysis

### BE Startup/Restart Flow:
```
DataDir::load()
  → TabletMetaManager::traverse_headers() [iterate RocksDB]
    → TabletManager::load_tablet_from_meta(meta_binary from RocksDB)
      → TabletMeta::deserialize(meta_binary)
        → TabletMeta::init_from_pb()
          → _cumulative_layer_point = tablet_meta_pb.cumulative_layer_point()
      → std::make_shared<Tablet>(_engine, tablet_meta, data_dir)
        → Tablet::Tablet() constructor
```

**Before this fix**: Constructor hardcoded _cumulative_point = -1, losing RocksDB value
**After this fix**: Constructor loads _cumulative_point from TabletMeta

### Compaction Update Flow:
```
CumulativeCompaction::execute_compact()
  → update_cumulative_point()
    → Tablet::set_cumulative_layer_point(new_point)  [updates _cumulative_point]
  → Tablet::save_meta()
```

**Before this fix**: save_meta() only saved TabletMeta without syncing _cumulative_point
**After this fix**: save_meta() syncs _cumulative_point to TabletMeta before persisting

## Solution

This commit adds bidirectional synchronization:

1. **Load path** (tablet.cpp:260):
   ```cpp
   _cumulative_point(_tablet_meta->cumulative_layer_point())
   ```
   Initialize from TabletMeta on construction (BE restart/clone/restore)

2. **Save path** (tablet.cpp:341):
   ```cpp
   _tablet_meta->set_cumulative_layer_point(_cumulative_point);
   ```
   Sync to TabletMeta before persisting to RocksDB

## Impact

### Fixed Scenarios:
- ✅ BE restart: Cumulative point persists across restarts
- ✅ Clone: Target replica inherits correct cumulative point
- ✅ Restore: Restored tablet keeps original cumulative point

### New Tablet Creation:
- Still correctly initializes to -1 (TabletMeta constructor sets it to -1)

## Verification

After this fix, for tablet_id=4676420:
- Runtime value: cumulative_point = 57
- Persistent value: cumulative_layer_point = 57 (synced!)
- After BE restart: cumulative_point = 57 (restored from RocksDB)

## Related Issue

This also resolves the existing TODO comment:
```cpp
// TODO(ygl): lost some information here, such as cumulative layer point
// engine_storage_migration_task.cpp:348
```
@LemonCL LemonCL force-pushed the bugfix/cumulative-layer-point-sync branch from 96c4efa to 3c75e1a Compare March 2, 2026 08:20
@LemonCL LemonCL changed the title [Fix](Compaction) Sync Tablet cumulative_point to TabletMeta before [Fix] Sync Tablet::_cumulative_point with TabletMeta::_cumulative_layer_point Mar 2, 2026

@lide-reed lide-reed left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 2, 2026

github-actions bot commented Mar 2, 2026

PR approved by at least one committer and no changes requested.


github-actions bot commented Mar 2, 2026

PR approved by anyone and no changes requested.
