If ClusterUtils.mkStormClusterState() threw (e.g. ZooKeeper unreachable), the exception was caught and printed to stderr via e.printStackTrace(), leaving stormClusterState null. Every subsequent blob store operation (startSyncBlobs, setupBlobstore, blobSync) would then crash with a NullPointerException rather than a meaningful error.

Rethrow as RuntimeException to match the existing pattern used a few lines above for FileBlobStoreImpl initialization, and to surface the root cause at startup instead of at first use.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
reiabreu
approved these changes
Mar 2, 2026
rzo1
approved these changes
Mar 3, 2026
Contributor
@jnioche Looks like it needs an adjustment in
…ore.prepare()

Both tests called prepare() without a reachable ZooKeeper, relying on the now-fixed silent catch to let prepare() succeed with a null stormClusterState.

LocalFsBlobStoreSynchronizerTest: add InProcessZookeeper to @BeforeEach/@AfterEach and pass its port in the conf used by initLocalFs(), matching the pattern already established in LocalFsBlobStoreTest.

AsyncLocalizerTest.testKeyNotFoundException: wrap the test body in a try-with-resources InProcessZookeeper and pass its port in the conf, so that prepare() can initialise cluster state successfully before getBlob() is called to exercise the KeyNotFoundException path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor
Author
Sorry, had rushed this one a bit. About to push a fix.

Root cause of the test failures: both tests were accidentally relying on the silent exception swallow. They called prepare() without any ZooKeeper in the config, which caused mkStormClusterState to throw, but the old
rzo1
approved these changes
Mar 3, 2026
Problem

In LocalFsBlobStore.prepare(), the call to ClusterUtils.mkStormClusterState() is wrapped in a try/catch that silently swallows any exception via e.printStackTrace().

If this call fails, for example because ZooKeeper is unreachable at startup, stormClusterState stays null and prepare() returns successfully. The failure is only visible as a stack trace on stderr, which is easily lost in production log aggregation pipelines.

This creates a time bomb: the blob store appears to initialise correctly, but every subsequent operation that touches the cluster state will throw a NullPointerException with no indication of the root cause:

- startSyncBlobs(): this.stormClusterState.blobstore(...)
- setupBlobstore(): state.activeKeys() (where state = stormClusterState)
- blobSync(): state.blobstore(...)

The NPE stack trace points into blob store internals, making it very hard to diagnose that the real problem was a ZooKeeper connection failure that happened earlier during prepare().

Fix

Rethrow the exception as a RuntimeException so that prepare() itself fails with a clear message and the original cause preserved.

This matches the pattern already used a few lines above in the same method for FileBlobStoreImpl initialisation, making the two failure modes consistent.

Nimbus now fails fast at startup with a meaningful error rather than entering a degraded state where blob operations crash with confusing NPEs.
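The difference between the two behaviours can be sketched in isolation. The following is a minimal, self-contained illustration and not the actual Storm code: FailFastSketch, ClusterState, connect(), SwallowingStore and FailFastStore are hypothetical stand-ins, with connect() playing the role of ClusterUtils.mkStormClusterState() when ZooKeeper is unreachable.

```java
public class FailFastSketch {

    /** Hypothetical stand-in for the cluster state object. */
    interface ClusterState {
        String blobstore();
    }

    /** Hypothetical stand-in for a cluster state factory that fails, e.g. ZooKeeper unreachable. */
    static ClusterState connect() throws Exception {
        throw new Exception("ZooKeeper unreachable");
    }

    /** Old behaviour: the exception is printed and dropped, so prepare() "succeeds". */
    static class SwallowingStore {
        ClusterState state;

        void prepare() {
            try {
                state = connect();
            } catch (Exception e) {
                e.printStackTrace(); // only visible on stderr; state stays null
            }
        }

        String startSyncBlobs() {
            return state.blobstore(); // NullPointerException surfaces here, far from the cause
        }
    }

    /** New behaviour: the exception is rethrown, so prepare() itself fails with the cause attached. */
    static class FailFastStore {
        ClusterState state;

        void prepare() {
            try {
                state = connect();
            } catch (Exception e) {
                throw new RuntimeException(e); // original exception preserved as the cause
            }
        }
    }

    public static void main(String[] args) {
        SwallowingStore swallowing = new SwallowingStore();
        swallowing.prepare(); // returns normally despite the failed connection
        try {
            swallowing.startSyncBlobs();
        } catch (NullPointerException e) {
            System.out.println("swallowing store: NPE at first use, no root cause attached");
        }

        try {
            new FailFastStore().prepare();
        } catch (RuntimeException e) {
            System.out.println("fail-fast store: prepare() failed, cause = " + e.getCause().getMessage());
        }
    }
}
```

In the swallowing variant the only trace of the connection failure is the earlier stderr output, while in the fail-fast variant the caller of prepare() receives the original exception through getCause().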