When using majority write (quorum success), BE does not distinguish between replicas
with continuous versions and replicas with version gaps (lastFailedVersion >= 0).
This causes inconsistency with FE's commit check, which correctly excludes
version-gap replicas from success counting.
Bad Case
Consider 3 replicas on nodes 1, 2, 3 with load_required_replica_num = 2:
1. First write: nodes 1 and 2 succeed, node 3 fails → overall success.
   Node 3 now has a version gap (lastFailedVersion >= 0).
2. Second write: nodes 1 and 3 succeed, node 2 fails →
   BE counts 2 successes (nodes 1 and 3) and considers it a quorum success.
   FE commit counts only node 1 as a success (node 3 has a version gap),
   so successReplicaNum = 1 < 2 and the commit fails.
This wastes resources since BE already returned success to the client
but FE rejects the transaction.
The correct behavior for the second write:
nodes 1,3 succeed → should FAIL (node 3 has version gap, only node 1 counts)
nodes 1,2 succeed → should SUCCEED (both have continuous versions)
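
The bad case can be replayed with a minimal sketch. This is not the actual Doris code: `count_successful_replicas` and `reaches_quorum` are hypothetical helpers written only to contrast the two counting rules, and the backend IDs 1, 2, 3 stand in for the three nodes above.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical helper (illustration only): counts the replicas that
// acknowledged a write. When exclude_gap_backends is true, replicas on
// backends with a version gap are skipped, mirroring FE's commit check.
int count_successful_replicas(const std::vector<int64_t>& succeeded_backends,
                              const std::unordered_set<int64_t>& gap_backends,
                              bool exclude_gap_backends) {
    int count = 0;
    for (int64_t backend_id : succeeded_backends) {
        if (exclude_gap_backends && gap_backends.count(backend_id) > 0) {
            continue;  // replica has a version gap, so it does not count
        }
        ++count;
    }
    return count;
}

// Hypothetical helper: true if enough replicas count as successful.
bool reaches_quorum(const std::vector<int64_t>& succeeded_backends,
                    const std::unordered_set<int64_t>& gap_backends,
                    bool exclude_gap_backends,
                    int required_replica_num) {
    assert(required_replica_num > 0);
    return count_successful_replicas(succeeded_backends, gap_backends,
                                     exclude_gap_backends) >= required_replica_num;
}
```

With node 3 marked as a gap backend and `required_replica_num = 2`, the second write on nodes 1 and 3 passes when gaps are ignored (the old BE behavior) and fails when gaps are excluded (FE's behavior), while a write on nodes 1 and 2 passes under both rules.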
Solution
Pass per-tablet version-gap backend information from FE to BE via a new Thrift field, map<tablet_id, list<backend_id>> tablet_version_gap_backends, in TOlapTablePartition.
On the BE side, when counting successful replicas for majority write in both VTabletWriter (V1) and VTabletWriterV2, exclude version-gap backends from
the finished_tablets_replica counter. This makes BE's quorum check consistent
with FE's commit check.
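
A rough sketch of the BE-side check, assuming the Thrift map has already been parsed into an in-memory map keyed by tablet ID. The type aliases and the `is_gap_backend` and `quorum_success` functions below are hypothetical and simplified; the real `_quorum_success` in `vtablet_writer.cpp` and `vtablet_writer_v2.cpp` has more state and a different signature. The sketch also assumes `finished` has an entry for every tablet in the load.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

using TabletId = int64_t;
using BackendId = int64_t;
// Assumed in-memory form of tablet_version_gap_backends after parsing.
using GapBackendMap = std::unordered_map<TabletId, std::unordered_set<BackendId>>;
// Assumed bookkeeping: backends that finished writing each tablet.
using FinishedMap = std::unordered_map<TabletId, std::unordered_set<BackendId>>;

// True if the given backend holds a version-gap replica of the tablet.
bool is_gap_backend(const GapBackendMap& gaps, TabletId tablet_id,
                    BackendId backend_id) {
    auto it = gaps.find(tablet_id);
    return it != gaps.end() && it->second.count(backend_id) > 0;
}

// True only if every tablet has at least required_replica_num finished
// replicas on backends without a version gap.
bool quorum_success(const FinishedMap& finished, const GapBackendMap& gaps,
                    int required_replica_num) {
    assert(required_replica_num > 0);
    for (const auto& entry : finished) {
        int counted = 0;
        for (BackendId backend_id : entry.second) {
            if (!is_gap_backend(gaps, entry.first, backend_id)) {
                ++counted;  // only continuous-version replicas count
            }
        }
        if (counted < required_replica_num) {
            return false;
        }
    }
    return true;
}
```

Keying the gap set by tablet matters because a backend can have a gap for one tablet and a continuous version for another, so the same backend may count toward quorum for some tablets and not others.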
Changes
Descriptors.thrift: Add tablet_version_gap_backends field to TOlapTablePartition
OlapTable.java: Add getPartitionVersionGapBackends() to compute gap backends per tablet
OlapTableSink.java: Populate the new field when building partition info
tablet_info.h/cpp: Parse and store gap backends from thrift
vtablet_writer.cpp: Exclude gap backends in _quorum_success
vtablet_writer_v2.cpp: Exclude gap backends in _quorum_success and _create_commit_info