fix(cdk): handle null JSON nodes in partition split boundary deserialization (AI-Triage PR) #74093
Draft
devin-ai-integration[bot] wants to merge 2 commits into master from
…ization

Filter out JSON null nodes and null deserialization results when splitting partition boundaries in DefaultJdbcPartitionFactory.split(). Jackson's treeToValue can return null when deserializing JSON null nodes, violating Kotlin's non-null type safety through Java interop. This caused a NullPointerException in cursorPair() when it was called on null DefaultJdbcStreamStateValue receivers during concurrent partition splitting for cursor-based incremental syncs.

The null boundaries originate from cursorIncrementalCheckpoint(), which produces Jsons.nullNode() when sampled cursor column values are null.

Co-Authored-By: bot_apk <apk@cognition.ai>
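The Java-interop hazard described in the commit message can be reproduced without Jackson. In this hedged sketch, `System.getProperty` (a real JDK method whose return type Kotlin sees as the platform type `String!`) stands in for `treeToValue`; `propertyLengths`, `interopNullThrows`, and `UNSET_KEY` are hypothetical names, and the key is assumed to be unset in the running JVM. Kotlin lets the Java `null` into a list it treats as non-null, and the failure surfaces only when a member is called on the element, much as `cursorPair()` failed.

```kotlin
// Illustration only: System.getProperty stands in for a Java API such as
// Jackson's treeToValue. Its return type is the platform type String!, so
// Kotlin does not force a null check on the lambda result below.
fun propertyLengths(keys: List<String>): List<Int> {
    // Inferred as List<String!>: an unset property contributes a Java null.
    val values = keys.map { System.getProperty(it) }
    // The null only surfaces here, when a member is called on it.
    return values.map { it.length }
}

// Hypothetical key, assumed not to be set in the running JVM.
const val UNSET_KEY = "example.property.that.is.not.set"

// Returns true if dereferencing the interop null throws an NPE.
fun interopNullThrows(): Boolean =
    try {
        propertyLengths(listOf(UNSET_KEY))
        false
    } catch (e: NullPointerException) {
        true
    }
```

The point of the sketch is where the error appears: not at the Java call that produced the `null`, but later, at the first dereference.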
Deploy preview for airbyte-kotlin-cdk ready! ✅ Preview built with commit 7c0188f.
What
Fixes a `NullPointerException` in `DefaultJdbcPartitionFactory.split()` that crashes cursor-based incremental syncs when sampled split boundaries contain null cursor values.

Resolves https://github.com/airbytehq/airbyte-internal-issues/issues/15906
How
When the concurrent partitions creator samples rows to determine split boundaries, rows with null cursor column values produce JSON null nodes via `cursorIncrementalCheckpoint()`. These JSON null nodes are valid `JsonNode` objects (not Kotlin nulls), so they pass through upstream `mapNotNull` filters. When `Jsons.treeToValue` deserializes a JSON null node, Jackson returns Java `null`, which violates Kotlin's non-null type expectation for `List<DefaultJdbcStreamStateValue>`. Subsequent calls to `cursorPair()` on the null receiver cause an NPE.

The fix adds two defensive filters in `split()`:

- `.filter { !it.isNull }` removes JSON null nodes before deserialization.
- `.mapNotNull { ... }` catches any remaining nulls that Jackson may produce.

The net effect is that null-valued split boundaries are silently dropped, resulting in fewer (but valid) partition splits rather than a crash.
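A minimal stdlib-only model of the failure path and the two filters follows. `Node`, `NullNode`, `CursorNode`, `StateValue`, `treeToValue`, and `sampleBoundaries` are hypothetical stand-ins for Jackson's `JsonNode`, `DefaultJdbcStreamStateValue`, `Jsons.treeToValue`, and the upstream `mapNotNull` step; this is a sketch under those assumptions, not the code in `DefaultJdbcPartitionFactory`. It also shows why the upstream `mapNotNull` does not catch the problem: a JSON null node is a non-null object.

```kotlin
// Hypothetical stand-ins (not the CDK or Jackson types): Node plays the role
// of JsonNode, StateValue the role of DefaultJdbcStreamStateValue.
sealed interface Node {
    // Mirrors JsonNode.isNull(): true only for a JSON null node.
    val isNull: Boolean get() = false
}

object NullNode : Node {
    override val isNull: Boolean get() = true
}

data class CursorNode(val cursor: String) : Node

data class StateValue(val cursor: String) {
    // Stand-in for cursorPair(); calling it on a null receiver is what threw.
    fun cursorPair(): Pair<String, String> = "cursor" to cursor
}

// Models the described treeToValue behavior: a JSON null node becomes null.
fun treeToValue(node: Node): StateValue? =
    when (node) {
        is NullNode -> null
        is CursorNode -> StateValue(node.cursor)
    }

// Models the upstream step: mapNotNull drops Kotlin nulls only, so NullNode survives.
fun sampleBoundaries(rows: List<Node?>): List<Node> = rows.mapNotNull { it }

// The two defensive filters, as described for split().
fun splitBoundaries(sampled: List<Node>): List<StateValue> =
    sampled
        .filter { !it.isNull }          // drop JSON null nodes before deserializing
        .mapNotNull { treeToValue(it) } // drop any null that deserialization still returns
```

Given sampled rows `CursorNode("2024-01-01")`, `null`, `NullNode`, and `CursorNode("2024-02-01")`, `sampleBoundaries` keeps three entries (including `NullNode`), and `splitBoundaries` then returns two `StateValue`s, so every later `cursorPair()` call has a real receiver. In this model the `filter` step alone already removes the null; the extra `mapNotNull` mirrors the PR's second filter for any null that deserialization might still return.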
Review guide
`airbyte-cdk/bulk/toolkits/extract-jdbc/src/main/kotlin/io/airbyte/cdk/read/DefaultJdbcPartitionFactory.kt` is the only changed file; the change is in the `split()` method's lazy `splitPartitionBoundaries` computation.

Key questions for reviewer:

1. … `JdbcConcurrentPartitionsCreator.run()` (line 263), where `mapNotNull` filters Kotlin nulls but not JSON null nodes. Should that also be hardened?
2. … `split()` would increase confidence. Worth adding?

User Impact
Users running cursor-based incremental syncs on JDBC sources (Oracle, Postgres, etc.) where the cursor column contains null values will no longer see sync failures with a `NullPointerException`. The sync proceeds with valid partition boundaries only, potentially with slightly less parallelism (fewer split partitions) but no functional difference in data correctness.

Can this PR be safely reverted and rolled back?
Reverting restores the previous behavior, in which null cursor values in sampled rows cause an NPE crash during partition splitting. This is the pre-existing broken behavior, so reverting is safe (it returns to a known state) but re-exposes the bug.
Requested by: bot_apk (apk@cognition.ai)