
feat: centralized peer registry, transport abstraction, and LEGACYSYNCING removal#565

Draft
oskarszoon wants to merge 42 commits into bsv-blockchain:main from oskarszoon:feature/legacy-peer-registry

Conversation


@oskarszoon oskarszoon commented Mar 9, 2026

Summary

  • Centralized peer registry in the blockchain service — transport-agnostic, thread-safe, in-memory store with reputation scoring, gRPC API (RegisterPeer, UpdatePeerMetrics, RemovePeer, ListPeers, GetPeer), JSON file persistence with TTL cleanup, and ban management with decay
  • Delegated wire-protocol catchup — BlockValidation delegates catchup to Legacy via a DelegateCatchup streaming gRPC. Legacy runs its existing headers-first sync pipeline and streams progress back. If Legacy is already syncing when the request arrives, it attaches to the running sync rather than starting a duplicate
  • Dashboard integration — catchup progress (blocksValidated/blocksFetched), current height, and phase are reported to the dashboard during delegated catchup. Legacy peers are prefixed with legacy: in the peer registry for easy identification
  • Peer reputation for legacy peers — each accepted block records an interaction success via onBlockAccepted callback, driving reputation above the 50.0 baseline
  • Legacy service integration — FetchHeadersFromPeer/FetchBlockFromPeer gRPC endpoints for ad-hoc fetches, a DelegateCatchup streaming RPC for full sync delegation, and dual-writes to the central registry on connect/disconnect/metrics
  • FSM simplification — removed LEGACYSYNCING state, consolidated into CATCHINGBLOCKS, simplified P2P SyncCoordinator and Legacy SyncManager delegation
  • Catchup orchestration — central registry poller in BlockValidation, per-peer exponential backoff on failure, wire peers dispatched to catchupViaLegacy, HTTP peers to existing catchup pipeline
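The registry described above is transport-agnostic and thread-safe with a 50.0 reputation baseline. A minimal sketch of that bookkeeping (the names `PeerInfo`, `Register`, `RecordSuccess`, and the +1.0 bump are illustrative, not the PR's actual API):

```go
package main

import (
	"fmt"
	"sync"
)

// PeerInfo is a hypothetical registry entry; the PR's real struct has more fields.
type PeerInfo struct {
	ID         string
	Reputation float64 // starts at the 50.0 baseline, rises on successes
}

type Registry struct {
	mu    sync.RWMutex
	peers map[string]*PeerInfo
}

func NewRegistry() *Registry {
	return &Registry{peers: make(map[string]*PeerInfo)}
}

// Register adds a peer at the neutral 50.0 baseline if not already present.
func (r *Registry) Register(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.peers[id]; !ok {
		r.peers[id] = &PeerInfo{ID: id, Reputation: 50.0}
	}
}

// RecordSuccess nudges reputation upward, capped at 100 — e.g. driven by
// an onBlockAccepted callback for legacy peers.
func (r *Registry) RecordSuccess(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if p, ok := r.peers[id]; ok {
		p.Reputation += 1.0
		if p.Reputation > 100 {
			p.Reputation = 100
		}
	}
}

func (r *Registry) Reputation(id string) float64 {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if p, ok := r.peers[id]; ok {
		return p.Reputation
	}
	return 0
}

func main() {
	reg := NewRegistry()
	reg.Register("legacy:peer1")
	reg.RecordSuccess("legacy:peer1")
	fmt.Println(reg.Reputation("legacy:peer1")) // 51
}
```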

Key Design Decisions

  • Delegated catchup over transport abstraction: Instead of BlockValidation driving wire sync block-by-block via FetchHeaders/FetchBlock RPCs (the old WireTransport), catchup is delegated to Legacy's existing sync pipeline. This avoids reimplementing headers-first sync in BlockValidation and lets Legacy use its battle-tested wire-protocol handling
  • Attach to existing sync: Legacy's startSync often fires before the BlockValidation poller, so RunDelegatedCatchup attaches to the running sync by setting delegated.active and waiting for completion — no restart needed
  • Registry lives in blockchain service (already the FSM authority, avoids new service)
  • Dual-write pattern: Legacy and P2P services register peers independently, BlockValidation queries
  • Fire-and-forget for registry updates (best-effort, system continues if registry unreachable)
  • Wire peers do not need BlockHash for catchup — Legacy manages block selection internally, so nil BlockHash is allowed for wire-protocol peers in the poller
  • FSM guarding: During delegated catchup, Legacy suppresses its own FSM transitions (startSync, handleCheckSyncPeer, blockHandler RUN events) since BlockValidation owns FSM state
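The attach-or-start decision can be sketched with a compare-and-swap on the sync-running flag (the `syncState` type and field names here are illustrative stand-ins for Legacy's sync manager, not the PR's code):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// syncState is a hypothetical stand-in for Legacy's sync manager state.
type syncState struct {
	running   atomic.Bool   // is a headers-first sync in flight?
	delegated atomic.Bool   // mirrors the PR's delegated.active flag
	done      chan struct{} // closed when the sync completes
	once      sync.Once
}

// runDelegatedCatchup attaches to an in-flight sync instead of starting a
// duplicate: if running is already set, it only marks the sync as delegated
// and waits for completion.
func (s *syncState) runDelegatedCatchup() string {
	s.delegated.Store(true)
	if s.running.CompareAndSwap(false, true) {
		go s.sync() // no sync in flight: start one
		<-s.done
		return "started new sync"
	}
	<-s.done // sync already in flight: just wait for it
	return "attached to running sync"
}

func (s *syncState) sync() {
	// ... headers-first sync would run here ...
	s.once.Do(func() { close(s.done) })
}

func main() {
	s := &syncState{done: make(chan struct{})}
	s.running.Store(true) // simulate startSync having fired first
	go s.sync()
	fmt.Println(s.runDelegatedCatchup()) // attached to running sync
}
```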

Architecture

BlockValidation (poller)
  |-- HTTP peer --> catchup() [existing DataHub pipeline]
  +-- Wire peer --> catchupViaLegacy()
                      |-- acquireCatchupLock + FSM --> CATCHINGBLOCKS
                      |-- DelegateCatchup gRPC stream --> Legacy Server
                      |     +-- RunDelegatedCatchup
                      |           |-- Attach to existing sync OR start new sync
                      |           +-- handleBlockMsg hooks --> progress channel
                      +-- Read progress stream --> update dashboard counters

Test Plan

  • make test passes (unit tests including peer registry, transport, catchup, legacy registry)
  • make smoketest passes
  • Verify LEGACYSYNCING fully removed: grep -r LEGACYSYNCING returns no Go/proto hits
  • Manual: node syncs from legacy wire peer via delegated catchup
  • Manual: dashboard shows catchup progress with block count and current height
  • Manual: peer registry shows legacy:/Bitcoin SV:x.x.x/ with increasing reputation
  • Manual: central registry poller triggers catchup when peer with higher height appears
  • Manual: peer registry persists and reloads across restart

@oskarszoon oskarszoon force-pushed the feature/legacy-peer-registry branch from cd79d69 to 37eeff4 Compare March 18, 2026 12:22

github-actions bot commented Mar 27, 2026

Benchmark Comparison Report

Baseline: main (unknown)

Current: PR-565 (25b5883)

Summary

  • Regressions: 0
  • Improvements: 0
  • Unchanged: 151
  • Significance level: p < 0.05
All benchmark results (sec/op)
Benchmark Baseline Current Change p-value
_NewBlockFromBytes-4 1.408µ 1.396µ ~ 0.400
SplitSyncedParentMap_SetIfNotExists/256_buckets-4 61.41n 61.56n ~ 0.400
SplitSyncedParentMap_SetIfNotExists/16_buckets-4 61.50n 61.77n ~ 0.200
SplitSyncedParentMap_SetIfNotExists/1_bucket-4 61.56n 61.73n ~ 0.400
SplitSyncedParentMap_ConcurrentSetIfNotExists/256_buckets... 30.31n 30.59n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/16_buckets_... 51.18n 53.27n ~ 0.700
SplitSyncedParentMap_ConcurrentSetIfNotExists/1_bucket_pa... 108.7n 107.0n ~ 0.700
MiningCandidate_Stringify_Short-4 256.9n 261.7n ~ 0.100
MiningCandidate_Stringify_Long-4 1.843µ 1.845µ ~ 1.000
MiningSolution_Stringify-4 939.4n 945.8n ~ 0.400
BlockInfo_MarshalJSON-4 1.762µ 1.751µ ~ 0.400
NewFromBytes-4 127.9n 126.6n ~ 0.100
Mine_EasyDifficulty-4 57.86µ 59.18µ ~ 0.100
Mine_WithAddress-4 4.647µ 4.807µ ~ 0.700
BlockAssembler_AddTx-4 0.02806n 0.02925n ~ 1.000
AddNode-4 10.98 10.75 ~ 0.400
AddNodeWithMap-4 11.32 10.69 ~ 0.100
DirectSubtreeAdd/4_per_subtree-4 60.91n 61.20n ~ 1.000
DirectSubtreeAdd/64_per_subtree-4 31.61n 31.40n ~ 0.200
DirectSubtreeAdd/256_per_subtree-4 30.45n 30.47n ~ 1.000
DirectSubtreeAdd/1024_per_subtree-4 29.42n 29.32n ~ 0.300
DirectSubtreeAdd/2048_per_subtree-4 28.90n 28.88n ~ 1.000
SubtreeProcessorAdd/4_per_subtree-4 291.8n 292.0n ~ 0.800
SubtreeProcessorAdd/64_per_subtree-4 289.0n 288.5n ~ 0.700
SubtreeProcessorAdd/256_per_subtree-4 287.9n 289.3n ~ 1.000
SubtreeProcessorAdd/1024_per_subtree-4 290.1n 287.5n ~ 0.200
SubtreeProcessorAdd/2048_per_subtree-4 290.4n 286.8n ~ 0.100
SubtreeProcessorRotate/4_per_subtree-4 294.5n 295.5n ~ 1.000
SubtreeProcessorRotate/64_per_subtree-4 293.5n 290.5n ~ 0.400
SubtreeProcessorRotate/256_per_subtree-4 295.3n 293.0n ~ 0.200
SubtreeProcessorRotate/1024_per_subtree-4 294.4n 293.1n ~ 0.400
SubtreeNodeAddOnly/4_per_subtree-4 62.95n 63.07n ~ 0.400
SubtreeNodeAddOnly/64_per_subtree-4 38.76n 38.02n ~ 0.200
SubtreeNodeAddOnly/256_per_subtree-4 37.28n 36.70n ~ 0.100
SubtreeNodeAddOnly/1024_per_subtree-4 36.83n 36.25n ~ 0.400
SubtreeCreationOnly/4_per_subtree-4 139.7n 136.2n ~ 0.700
SubtreeCreationOnly/64_per_subtree-4 605.3n 589.2n ~ 0.100
SubtreeCreationOnly/256_per_subtree-4 2.094µ 2.034µ ~ 0.100
SubtreeCreationOnly/1024_per_subtree-4 7.591µ 7.446µ ~ 0.100
SubtreeCreationOnly/2048_per_subtree-4 14.24µ 14.10µ ~ 0.700
SubtreeProcessorOverheadBreakdown/64_per_subtree-4 292.8n 292.9n ~ 1.000
SubtreeProcessorOverheadBreakdown/1024_per_subtree-4 294.2n 291.6n ~ 0.400
ParallelGetAndSetIfNotExists/1k_nodes-4 938.1µ 934.4µ ~ 0.700
ParallelGetAndSetIfNotExists/10k_nodes-4 1.864m 1.852m ~ 0.700
ParallelGetAndSetIfNotExists/50k_nodes-4 8.074m 7.953m ~ 0.100
ParallelGetAndSetIfNotExists/100k_nodes-4 16.09m 15.49m ~ 0.100
SequentialGetAndSetIfNotExists/1k_nodes-4 789.7µ 746.5µ ~ 0.100
SequentialGetAndSetIfNotExists/10k_nodes-4 2.974m 2.919m ~ 0.100
SequentialGetAndSetIfNotExists/50k_nodes-4 10.88m 10.61m ~ 0.400
SequentialGetAndSetIfNotExists/100k_nodes-4 20.71m 20.43m ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/1k_nodes-4 968.7µ 992.2µ ~ 0.100
ProcessOwnBlockSubtreeNodesParallel/10k_nodes-4 4.637m 4.697m ~ 0.200
ProcessOwnBlockSubtreeNodesParallel/100k_nodes-4 18.73m 18.83m ~ 0.700
ProcessOwnBlockSubtreeNodesSequential/1k_nodes-4 802.5µ 804.7µ ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/10k_nodes-4 6.156m 6.161m ~ 1.000
ProcessOwnBlockSubtreeNodesSequential/100k_nodes-4 40.36m 39.79m ~ 0.200
DiskTxMap_SetIfNotExists-4 3.644µ 3.597µ ~ 0.700
DiskTxMap_SetIfNotExists_Parallel-4 3.425µ 3.437µ ~ 1.000
DiskTxMap_ExistenceOnly-4 326.0n 318.6n ~ 0.200
Queue-4 193.1n 195.1n ~ 0.500
AtomicPointer-4 5.044n 4.762n ~ 0.400
ReorgOptimizations/DedupFilterPipeline/Old/10K-4 857.2µ 847.7µ ~ 0.400
ReorgOptimizations/DedupFilterPipeline/New/10K-4 818.2µ 824.8µ ~ 0.700
ReorgOptimizations/AllMarkFalse/Old/10K-4 110.8µ 121.4µ ~ 0.100
ReorgOptimizations/AllMarkFalse/New/10K-4 62.50µ 61.93µ ~ 0.100
ReorgOptimizations/HashSlicePool/Old/10K-4 70.44µ 73.15µ ~ 0.700
ReorgOptimizations/HashSlicePool/New/10K-4 11.58µ 11.69µ ~ 0.200
ReorgOptimizations/NodeFlags/Old/10K-4 5.114µ 6.028µ ~ 0.700
ReorgOptimizations/NodeFlags/New/10K-4 1.785µ 2.420µ ~ 0.100
ReorgOptimizations/DedupFilterPipeline/Old/100K-4 10.78m 10.29m ~ 0.400
ReorgOptimizations/DedupFilterPipeline/New/100K-4 10.20m 10.50m ~ 0.100
ReorgOptimizations/AllMarkFalse/Old/100K-4 1.212m 1.232m ~ 0.100
ReorgOptimizations/AllMarkFalse/New/100K-4 685.8µ 686.0µ ~ 0.700
ReorgOptimizations/HashSlicePool/Old/100K-4 622.1µ 664.7µ ~ 0.400
ReorgOptimizations/HashSlicePool/New/100K-4 307.2µ 307.2µ ~ 0.700
ReorgOptimizations/NodeFlags/Old/100K-4 56.52µ 58.49µ ~ 0.100
ReorgOptimizations/NodeFlags/New/100K-4 19.50µ 20.86µ ~ 0.100
TxMapSetIfNotExists-4 52.06n 51.63n ~ 0.100
TxMapSetIfNotExistsDuplicate-4 38.02n 37.91n ~ 0.400
ChannelSendReceive-4 633.2n 618.7n ~ 0.100
CalcBlockWork-4 524.1n 505.7n ~ 1.000
CalculateWork-4 694.9n 693.3n ~ 0.700
BuildBlockLocatorString_Helpers/Size_10-4 1.521µ 1.292µ ~ 0.200
BuildBlockLocatorString_Helpers/Size_100-4 12.34µ 12.36µ ~ 0.400
BuildBlockLocatorString_Helpers/Size_1000-4 121.8µ 131.2µ ~ 0.100
CatchupWithHeaderCache-4 104.4m 104.7m ~ 0.100
_BufferPoolAllocation/16KB-4 4.734µ 3.310µ ~ 0.400
_BufferPoolAllocation/32KB-4 7.072µ 6.709µ ~ 0.200
_BufferPoolAllocation/64KB-4 14.52µ 13.73µ ~ 0.100
_BufferPoolAllocation/128KB-4 28.22µ 27.24µ ~ 0.100
_BufferPoolAllocation/512KB-4 104.89µ 84.53µ ~ 0.200
_BufferPoolConcurrent/32KB-4 17.07µ 16.78µ ~ 1.000
_BufferPoolConcurrent/64KB-4 27.00µ 26.57µ ~ 0.100
_BufferPoolConcurrent/512KB-4 146.6µ 137.4µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/16KB-4 669.5µ 603.5µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/32KB-4 651.8µ 621.2µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/64KB-4 653.3µ 629.2µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/128KB-4 658.5µ 631.1µ ~ 0.100
_SubtreeDeserializationWithBufferSizes/512KB-4 672.3µ 636.0µ ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/16KB-4 35.70m 34.66m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/32KB-4 35.51m 35.13m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/64KB-4 35.52m 35.04m ~ 0.100
_SubtreeDataDeserializationWithBufferSizes/128KB-4 35.28m 35.06m ~ 0.400
_SubtreeDataDeserializationWithBufferSizes/512KB-4 35.08m 35.10m ~ 0.700
_PooledVsNonPooled/Pooled-4 737.5n 733.1n ~ 0.100
_PooledVsNonPooled/NonPooled-4 7.112µ 6.923µ ~ 0.700
_MemoryFootprint/Current_512KB_32concurrent-4 6.771µ 6.679µ ~ 0.200
_MemoryFootprint/Proposed_32KB_32concurrent-4 11.025µ 9.452µ ~ 0.100
_MemoryFootprint/Alternative_64KB_32concurrent-4 9.748µ 8.819µ ~ 0.100
_prepareTxsPerLevel-4 415.9m 413.8m ~ 0.700
_prepareTxsPerLevelOrdered-4 3.751m 3.765m ~ 0.400
_prepareTxsPerLevel_Comparison/Original-4 431.2m 419.0m ~ 0.100
_prepareTxsPerLevel_Comparison/Optimized-4 3.647m 3.593m ~ 0.700
SubtreeProcessor/100_tx_64_per_subtree-4 80.53m 80.42m ~ 0.700
SubtreeProcessor/500_tx_64_per_subtree-4 382.6m 377.8m ~ 0.200
SubtreeProcessor/500_tx_256_per_subtree-4 395.2m 391.2m ~ 0.400
SubtreeProcessor/1k_tx_64_per_subtree-4 762.8m 754.4m ~ 0.200
SubtreeProcessor/1k_tx_256_per_subtree-4 783.1m 770.3m ~ 0.200
StreamingProcessorPhases/FilterValidated/100_tx-4 2.723m 2.620m ~ 0.700
StreamingProcessorPhases/ClassifyProcess/100_tx-4 236.9µ 233.7µ ~ 0.400
StreamingProcessorPhases/FilterValidated/500_tx-4 13.41m 12.93m ~ 0.100
StreamingProcessorPhases/ClassifyProcess/500_tx-4 596.6µ 588.7µ ~ 0.700
StreamingProcessorPhases/FilterValidated/1k_tx-4 26.62m 25.89m ~ 0.700
StreamingProcessorPhases/ClassifyProcess/1k_tx-4 1.060m 1.040m ~ 0.700
SubtreeSizes/10k_tx_4_per_subtree-4 1.292m 1.331m ~ 0.100
SubtreeSizes/10k_tx_16_per_subtree-4 309.9µ 324.5µ ~ 0.200
SubtreeSizes/10k_tx_64_per_subtree-4 74.67µ 76.49µ ~ 0.100
SubtreeSizes/10k_tx_256_per_subtree-4 18.48µ 19.13µ ~ 0.100
SubtreeSizes/10k_tx_512_per_subtree-4 9.169µ 9.616µ ~ 0.100
SubtreeSizes/10k_tx_1024_per_subtree-4 4.578µ 4.763µ ~ 0.100
SubtreeSizes/10k_tx_2k_per_subtree-4 2.245µ 2.355µ ~ 0.100
BlockSizeScaling/10k_tx_64_per_subtree-4 72.27µ 76.00µ ~ 0.100
BlockSizeScaling/10k_tx_256_per_subtree-4 17.98µ 19.21µ ~ 0.100
BlockSizeScaling/10k_tx_1024_per_subtree-4 4.503µ 4.712µ ~ 0.100
BlockSizeScaling/50k_tx_64_per_subtree-4 381.1µ 395.5µ ~ 0.100
BlockSizeScaling/50k_tx_256_per_subtree-4 91.19µ 95.74µ ~ 0.100
BlockSizeScaling/50k_tx_1024_per_subtree-4 22.38µ 23.60µ ~ 0.100
SubtreeAllocations/small_subtrees_exists_check-4 153.9µ 158.2µ ~ 0.100
SubtreeAllocations/small_subtrees_data_fetch-4 158.7µ 168.9µ ~ 0.100
SubtreeAllocations/small_subtrees_full_validation-4 312.1µ 327.9µ ~ 0.100
SubtreeAllocations/medium_subtrees_exists_check-4 8.882µ 9.386µ ~ 0.100
SubtreeAllocations/medium_subtrees_data_fetch-4 9.220µ 9.753µ ~ 0.100
SubtreeAllocations/medium_subtrees_full_validation-4 18.25µ 19.22µ ~ 0.100
SubtreeAllocations/large_subtrees_exists_check-4 2.132µ 2.209µ ~ 0.100
SubtreeAllocations/large_subtrees_data_fetch-4 2.252µ 2.372µ ~ 0.100
SubtreeAllocations/large_subtrees_full_validation-4 4.557µ 4.750µ ~ 0.100
GetUtxoHashes-4 256.4n 250.1n ~ 0.200
GetUtxoHashes_ManyOutputs-4 42.61µ 42.52µ ~ 1.000
_NewMetaDataFromBytes-4 237.6n 237.3n ~ 1.000
_Bytes-4 625.7n 628.6n ~ 0.400
_MetaBytes-4 571.6n 571.2n ~ 1.000

Threshold: >10% with p < 0.05 | Generated: 2026-03-30 18:57 UTC

@oskarszoon oskarszoon force-pushed the feature/legacy-peer-registry branch from 571adfd to 6a0b37c Compare March 27, 2026 13:53
@oskarszoon oskarszoon self-assigned this Mar 27, 2026
@oskarszoon oskarszoon changed the title from "Improve legacy vs peer syncing" to "feat: centralized peer registry, transport abstraction, and LEGACYSYNCING removal" Mar 27, 2026
@oskarszoon oskarszoon force-pushed the feature/legacy-peer-registry branch 6 times, most recently from 53e28e8 to 3ed504d Compare March 27, 2026 15:45
…CING removal

Introduce a centralized peer registry in the blockchain service that tracks
peers across both HTTP (P2P/DataHub) and wire protocol (legacy Bitcoin)
transports. This replaces the fragmented per-service peer tracking with a
single source of truth for peer information, reputation scoring, and
transport-aware catchup orchestration.

Key changes:

Centralized Peer Registry (blockchain service):
- Thread-safe in-memory registry with reputation scoring algorithm
- gRPC API: RegisterPeer, UpdatePeerMetrics, RemovePeer, ListPeers, GetPeer
- JSON file persistence with atomic writes and TTL-based cleanup
- PeerRegistryClientI interface with connection ownership tracking

Transport Abstraction (blockvalidation):
- CatchupTransport interface abstracting HTTP and wire protocol fetching
- HTTPTransport: extracts existing DataHub HTTP fetch logic
- WireTransport: delegates to Legacy service via gRPC for wire protocol
- Central registry polling for autonomous catchup orchestration
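The shape of the transport abstraction can be sketched as a small interface with two implementations; everything below (`CatchupTransport`, `pickTransport`, the single-method signature) is an assumption for illustration — the PR's real interface has more methods:

```go
package main

import (
	"context"
	"fmt"
)

// CatchupTransport abstracts how headers are fetched from a peer,
// regardless of whether the peer speaks HTTP or the legacy wire protocol.
type CatchupTransport interface {
	FetchHeaders(ctx context.Context, peerEndpoint string, maxHeaders int) ([][]byte, error)
}

// httpTransport would wrap the existing DataHub HTTP fetch logic.
type httpTransport struct{}

func (httpTransport) FetchHeaders(ctx context.Context, peerEndpoint string, maxHeaders int) ([][]byte, error) {
	return nil, fmt.Errorf("http fetch from %s not implemented in this sketch", peerEndpoint)
}

// wireTransport would delegate to the Legacy service over gRPC.
type wireTransport struct{}

func (wireTransport) FetchHeaders(ctx context.Context, peerEndpoint string, maxHeaders int) ([][]byte, error) {
	return nil, fmt.Errorf("wire fetch via Legacy gRPC not implemented in this sketch")
}

// pickTransport mirrors the poller's dispatch: wire-protocol peers go to the
// wire transport, everything else to the HTTP pipeline.
func pickTransport(isWirePeer bool) CatchupTransport {
	if isWirePeer {
		return wireTransport{}
	}
	return httpTransport{}
}

func main() {
	fmt.Printf("%T\n", pickTransport(true))
	fmt.Printf("%T\n", pickTransport(false))
}
```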

Legacy Service Integration:
- FetchHeadersFromPeer/FetchBlockFromPeer gRPC endpoints for wire protocol
- Dual-write to central registry on peer connect/disconnect/metrics
- One-shot request pattern with LoadOrStore for concurrency safety
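The LoadOrStore one-shot pattern makes "register this request exactly once" race-free without a separate mutex; a sketch under assumed names (`registerRequest` and the request-ID key are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// inflight tracks one-shot requests keyed by request ID. sync.Map.LoadOrStore
// atomically returns the existing channel or installs a new one, so exactly
// one caller "wins" and becomes responsible for issuing the fetch.
var inflight sync.Map

// registerRequest returns the response channel for id and whether this
// caller won the race to create it.
func registerRequest(id string) (chan string, bool) {
	ch := make(chan string, 1)
	actual, loaded := inflight.LoadOrStore(id, ch)
	return actual.(chan string), !loaded
}

func main() {
	ch1, first := registerRequest("headers:peer1")
	ch2, second := registerRequest("headers:peer1")
	fmt.Println(first, second, ch1 == ch2) // true false true
}
```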

FSM Simplification:
- Remove LEGACYSYNCING state — consolidate into CATCHINGBLOCKS
- Simplify P2P SyncCoordinator (remove Kafka catchup publishing)
- Legacy SyncManager now delegates catchup to BlockValidation
- All FSM transitions and references updated across codebase

Proto Changes:
- Add PeerRegistryService to blockchain_api.proto
- Add FetchHeadersFromPeer/FetchBlockFromPeer to legacy peer_api.proto
- Reserve removed LEGACYSYNCING/LEGACYSYNC enum values
- Regenerate all protobuf Go code
- Remove all LEGACYSYNCING/LEGACYSYNC references from miner guides,
  CLI docs, sync tutorials, dashboard docs, and protobuf docs
- Update mermaid sync state flow diagram for 3-state FSM
- Remove dead LEGACYSYNC event handling from dashboard UI
- Remove legacy sync button, state colors, and API functions from dashboard
- Fix pre-existing markdown lint issues (MD025, MD036, MD051)
- Delete obsolete fsm_legacy_sync PlantUML diagram and SVG
@oskarszoon oskarszoon force-pushed the feature/legacy-peer-registry branch from 3ed504d to b7cd309 Compare March 27, 2026 15:49
…s, sync coordinator

- Transition FSM to CATCHINGBLOCKS immediately when catchup starts (Step 1.5)
  so subtree validation stops processing network messages during sync
- Central registry poller: fast 3s initial interval for 10 attempts, then 30s
- Central registry poller: skip poll when catchup already in progress
- Central registry poller: prefer full nodes over pruned for catchup
- Propagate storage mode (full/pruned) to central registry via updateStorage
- SyncCoordinator: stop rotating peers when behind (defer to central poller)
- SyncCoordinator: don't clear sync peer in handleRunningState if already set
- Raise max accumulated headers to reach next checkpoint for quick validation
- Pass maxHeadersOverride through catchupGetBlockHeaders without mutating settings
- Add ban scoring with decay, threshold, auto-unban to CentralizedPeerRegistry
- Add BanConfig with defaults matching existing P2P BanManager
- Add gRPC RPCs: AddBanScore, IsPeerBanned, ListBannedPeers, ClearBannedPeers
- Add ReconsiderBadPeers for reputation recovery after cooldown
- Add StartBanDecay background goroutine for score decay
- Update PeerRegistryClientI with ban methods
- Update all mock implementations
- Track per-peer cooldowns after failed catchup attempts
- Exponential backoff: 30s, 60s, 120s, up to 5min max per peer
- Skip peers on cooldown, try next best peer instead
- Clear all cooldowns on successful catchup
- Remove peerRegistry, syncCoordinator, banManager, peerSelector from Server
- centralRegistry is now REQUIRED (checked in Init)
- All peer/ban/metrics ops go through centralRegistry gRPC
- Rewire all test files to use central registry mocks
- Add //go:build ignore to SyncCoordinator test files (code being removed)
- Adapt Server_test.go, server_handler_test.go, report_invalid_block_test.go
BlockValidation now reports catchup metrics (success, failure, malicious,
attempt) directly to the central registry via UpdatePeerMetrics instead
of routing through P2P service RPCs. p2pClient is retained for non-metric
operations (GetPeersForCatchup, RecordBytesDownloaded, parallel fetch).
Call peerRegistry.StartBanDecay(ctx) during blockchain service startup
so ban scores automatically decay over time (1 point/minute).
… PeerSelector

Remove files replaced by the centralized registry in blockchain service:
- peer_registry.go, peer_registry_cache.go and their tests
- sync_coordinator.go and all related tests
- BanManager.go, BanManager_test.go
- peer_selector.go, peer_selector_test.go
- peer_registry_reputation_test.go

Fix remaining references: remove MockPeerBanManager, BanReason refs,
use string reason constants, skip tests needing local registry rewrite.
Ban management tests (18 tests in peer_registry_ban_test.go):
- AddBanScore: scoring, threshold, decay, peer info sync, config lookup
- IsBannedPeer: not banned, banned, auto-unban on expiry
- ListBannedPeers: empty, returns only banned
- ClearBannedPeers: clears all, resets peer info
- ReconsiderBadPeers: old failures reset, recent failures kept, count
- decayBanScores: decay over time, zero-score cleanup, banned entry kept
- StartBanDecay: context cancellation

Catchup poller tests (20 tests in central_registry_poller_test.go):
- nextCooldownForPeer: exponential backoff 30s-5min, per-peer tracking
- selectBestPeersFromCentralRegistry: height filter, full>pruned sort, wire protocol
- pollCentralRegistry: no peers, isCatchingUp skip, cooldown skip, nil hash,
  error handling, expired cooldown retry
…s wiring

- Fix TestCatchup_FSMStateManagement: setFSMCatchingBlocks now calls
  GetFSMCurrentState to check if already in CATCHINGBLOCKS before
  transitioning. Update mock to expect RUNNING state first.
- Fix TestServerInit* tests: Init() now requires centralRegistry to be set.
  Add newPermissiveMockRegistry() helper and SetCentralPeerRegistry calls.
- Wire P2P ban settings (BanThreshold, BanDuration) from settings.conf
  to the central registry's BanConfig instead of using 24h default.
  Fixes TestPeerIDBanExpirationE2E smoketest.
…eer ID encoding

- Add GetFSMCurrentState mock to setupTestCatchupServer (both instances)
  for early Step 1.5 FSM transition in catchup
- Add centralRegistry to all 47 Server struct literals in P2P tests
- Fix TestServer_GetPeer: use peerID.String() for mock expectations
  (peer.ID("non-existent").String() != "non-existent" due to base58 encoding)
- Fix mockPeerRegistryClient ban methods to actually call m.Called()
  instead of returning hardcoded values (IsPeerBanned, AddBanScore, etc)
- Fix TestIsBannedChecksBothBanSystems: remove extra context arg from
  IsPeerBanned mock expectation
- Fix TestCatchup_FSMStateManagement: filter permissive GetFSMCurrentState
  before setting ordered .Once() expectations
- Fix TestCatchup/Empty_Catchup_Headers: add FSM mocks to standalone
  Server setup (CatchUpBlocks, GetFSMCurrentState, Run)
- Fix gci formatting in central_registry_test.go
- Add FSM mocks (CatchUpBlocks, GetFSMCurrentState, Run) to
  createServerWithEnhancedCatchup helper in catchup_test.go for
  TestCatchupIntegrationScenarios/Context_Cancellation_During_Catchup
Must-fix:
- Bounded worker pool (4 workers, chan size 256) replaces unbounded
  fire-and-forget goroutines for central registry updates in P2P
- List() now checks ban expiry via banScores instead of stale IsBanned field
- TransportType only updated when TransportTypeSet=true (fixes wire peer reset)
- blockHashToBytes returns defensive copy to avoid slice aliasing

Should-fix:
- Throttle updatePeerLastMessageTime (30s cooldown per peer)
- Remove duplicate addConnectedPeer (identical to addPeer)
- Simplify nextCooldownForPeer with bit shift math
- Document single-goroutine invariant on cooldown maps
- Add TODO for getPeerIDFromDataHubURL efficiency
- Fix waitForLegacyMockCalls race condition with atomic counter
- Remove dead shouldSkipDuringSync and its test

Nice-to-have:
- Remove redundant nil checks in handle_catchup_metrics.go
- Improve WireTransport error messages
- Rename baseURL to peerEndpoint in CatchupTransport interface
- Improve stub RPC logging (Debug -> Info for no-ops)
100 new test cases across 6 test files:

P2P handle_catchup_metrics_test.go (30 tests):
- All 12 RPC handlers tested with success, nil registry, error propagation

Blockchain peer_registry_grpc_test.go (27 tests):
- All 9 gRPC handlers, proto conversion round-trips, hash aliasing

Blockchain peer_registry_client_test.go (6 tests):
- ownsConn pattern: close with/without ownership, idempotent

Blockchain peer_registry_persistence_additional_test.go (8 tests):
- Full field round-trip, atomic rename, file permissions, TTL expiry

BlockValidation http_transport_additional_test.go (15 tests):
- FetchBlock/FetchBlocks/FetchSubtree/FetchSubtreeData coverage

BlockValidation wire_transport_additional_test.go (14 tests):
- FetchBlock/FetchHeaders, maxHeaders truncation, unsupported methods
peer_metrics_helpers_test.go (35 tests):
- All 8 helper functions: reportCatchup*, isPeerMalicious, isPeerBad,
  reportValidBlockForPeers with nil/empty/error coverage

peer_registry_client_additional_test.go (22 tests):
- Full gRPC client-server integration via real TCP listener
- All 9 client methods, full lifecycle, context cancellation

server_registry_test.go (8 tests):
- Stop saves registry, savePeerRegistryPeriodically, reload round-trip
…trict

Two issues:
- Central registry Register() never set LastMessageTime on peers
- Dashboard filtered peers with last_message_time > 1min ago, but the
  field was always 0, filtering out every peer

Fixes:
- Set LastMessageTime=now on both new and updated peers in Register()
- Relax dashboard filter: include peers with last_message_time=0
  (recently registered), extend timeout to 5 minutes
…ng 100k

The checkpoint-aware header cap only raised the limit but never lowered it.
With default of 100k and first testnet checkpoint at height 546, catchup
always downloaded 100k headers before starting block validation.

Now the cap is SET to the checkpoint-based value (not just raised), so:
- Height 0, checkpoint at 546: cap = 10,546 (~1 iteration, blocks start in ~9s)
- Height 546, checkpoint at 100k: cap = 109,454 (~11 iterations)
- Height 100k, checkpoint at 200k: cap = 110,000 (~11 iterations)

If checkpoint is further than 100k, cap is raised as before.
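The three worked cases above follow from cap = (checkpoint − height) + buffer. A sketch of that arithmetic — `headerCap` is a hypothetical helper, and the +10,000 buffer is the value this commit used (a later commit shrinks it to +100):

```go
package main

import "fmt"

// headerCap computes the checkpoint-aware accumulated-header cap: distance
// to the next checkpoint plus a small buffer, replacing the fixed 100k
// default whether that sets the cap lower or higher.
func headerCap(currentHeight, nextCheckpoint, buffer uint32) uint32 {
	if nextCheckpoint <= currentHeight {
		return 100_000 // past the last checkpoint: fall back to the default
	}
	return nextCheckpoint - currentHeight + buffer
}

func main() {
	// The three cases from the commit message, with the +10,000 buffer:
	fmt.Println(headerCap(0, 546, 10_000))           // 10546
	fmt.Println(headerCap(546, 100_000, 10_000))     // 109454
	fmt.Println(headerCap(100_000, 200_000, 10_000)) // 110000
}
```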
…s during catchup

Header optimization:
- Request only as many headers as needed to reach the accumulated cap,
  instead of always requesting maxBlockHeadersPerRequest (10,000).
  For checkpoint at height 546, first request asks for 546 headers
  instead of 10,000 — single iteration instead of two.

Quick validation fix:
- Skip checkOldBlockIDs during catchup mode (IsCatchupMode=true).
  Blocks verified by checkpoint chain are guaranteed to be on the
  correct chain. The check was failing because quickValidateBlockAsync
  partially commits blocks (AddBlock) before UTXO processing, making
  the block ID chain temporarily inconsistent when quick validation
  falls back to normal validation.
…ng catchup

Header cap: change checkpoint buffer from +10,000 to +100. For checkpoint
at height 546 from height 0, cap is now 646 (single request) instead of
10,546 (two requests with 10k first).

Pruned peer handling: BLOCK_INCOMPLETE errors (pruned node can't provide
full block data) now put the peer on 24h cooldown instead of 30s. A pruned
node will never have genesis block data, so retrying is pointless. The
poller immediately tries the next peer on its 3s tick.
Don't toggle FSM CATCHING→RUNNING→CATCHING between consecutive catchup
rounds when the node is still significantly behind the target. This avoids
unnecessary FSM state change notifications to all subscribers and prevents
subtree validation from briefly activating between rounds.

Only restore to RUNNING when within 100 blocks of the target.
When a peer returns incomplete blocks (pruned node), instead of failing
the entire catchup round and re-downloading headers from scratch, try
alternative peers from the central registry using the same validated
headers. This avoids wasting the header fetch time (~8s per checkpoint)
and keeps the node in CATCHINGBLOCKS without FSM toggling.

Flow: headers fetched once → try best peer for blocks → if pruned,
try next peer → continue until a full node is found or all peers fail.
highestCheckpointHeight was set from ALL configured checkpoints (up to
900k+), causing every block to be quick-validated regardless of which
checkpoints were actually verified in the current header batch.

Now tracks the highest checkpoint actually verified within the fetched
header range. For the first catchup round (headers 1-646, checkpoint
at 546), only blocks 1-546 get quick validation. Blocks 547+ get normal
validation until the next checkpoint is verified.
The poller switched to 30s interval after the first triggered catchup,
causing a 30s gap between successive catchup rounds even when the node
was still far behind. Now the poller checks if we're still behind after
each completed round and resets to the fast 3s interval if so.

The 30s interval is only used for the steady-state monitoring once
we've caught up with all peers.
When a catchup round targets a known checkpoint, fetch headers by walking
backwards from the checkpoint hash using /headers/{hash}?n=1000 instead
of the expensive headers_from_common_ancestor endpoint.

The /headers endpoint walks the parent chain directly (~0.4s per 1000
headers) while headers_from_common_ancestor computes a common ancestor
first (0.4-16s unpredictably). For checkpoint 546 from genesis, this
is a single 0.4s request vs 8-16s.

For blocks beyond the last checkpoint, falls back to the original
headers_from_common_ancestor method since there's no known hash
to walk backwards from.

Headers are fetched in reverse (newest→oldest) then reversed to
produce the forward-order chain needed for validation.
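The reverse step at the end is a plain in-place slice flip; a sketch (with heights standing in for headers, since the real header type isn't shown here):

```go
package main

import "fmt"

// reverseInPlace flips a slice of headers fetched newest→oldest (walking
// parent hashes back from the checkpoint) into the forward order that
// validation expects.
func reverseInPlace[T any](s []T) {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

func main() {
	// Heights as stand-ins for headers, fetched 546 → 543:
	fetched := []int{546, 545, 544, 543}
	reverseInPlace(fetched)
	fmt.Println(fetched) // [543 544 545 546]
}
```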
Catchup rounds take 30-60 seconds (headers + block validation). Reporting
this as the peer's response time gives a 36s avg which triggers the 0.6x
speed penalty in the reputation algorithm, dropping a 100% success peer
from ~80 to ~54 reputation. Pass 0 for responseTimeMs so catchup duration
doesn't skew the speed-based reputation multiplier.
- Use UTXO store height as CurrentHeight fallback (was showing 0 during
  header fetch because catchupCtx.currentHeight is only set after
  findCommonAncestor)
- Add Phase field to catchup status: "downloading_headers",
  "validating_blocks", or "finalizing"
- Dashboard shows "Downloading headers..." during header fetch phase
  instead of "0 / 0 blocks"
- Rename "Starting Height" to "Current Height" in dashboard
…try wiring

- Asset GetPeers now queries blockchain PeerRegistryService directly
  instead of routing through P2P service. Works with legacy-only mode.
- Fix legacy SetCentralPeerRegistry: store client on outer Server struct
  and apply to inner server during Start() (was nil because inner server
  doesn't exist when daemon calls SetCentralPeerRegistry before Start)
- Legacy height updates now record success interactions for reputation
- Add debug logging for legacy peer registration success/failure
…ming

Instead of BlockValidation driving wire-protocol sync block-by-block
through FetchHeaders/FetchBlock RPCs (WireTransport), catchup is now
delegated to Legacy's existing sync pipeline via a new DelegateCatchup
streaming RPC.

Legacy runs its normal headers-first sync and reports progress back to
BlockValidation, which manages FSM transitions, the catchup lock, and
dashboard progress display. If Legacy is already syncing when the
delegation request arrives, it attaches to the running sync.

Also fixes: peer reputation for legacy peers (records interaction
success on each accepted block), peer name prefix (legacy: prefix in
central registry), and BlockHash population for wire-protocol peers.
@sonarqubecloud

Quality Gate failed

Failed conditions
74.9% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud
