Skip to content

fix(cdk): handle null JSON nodes in partition split boundary deserialization (AI-Triage PR)#74093

Draft
devin-ai-integration[bot] wants to merge 2 commits intomasterfrom
devin/1772233807-fix-cursor-pair-npe
Draft

fix(cdk): handle null JSON nodes in partition split boundary deserialization (AI-Triage PR)#74093
devin-ai-integration[bot] wants to merge 2 commits intomasterfrom
devin/1772233807-fix-cursor-pair-npe

Conversation

@devin-ai-integration
Copy link
Contributor

What

Fixes a NullPointerException in DefaultJdbcPartitionFactory.split() that crashes cursor-based incremental syncs when sampled split boundaries contain null cursor values.

Resolves https://github.com/airbytehq/airbyte-internal-issues/issues/15906:

How

When the concurrent partitions creator samples rows to determine split boundaries, rows with null cursor column values produce JSON null nodes via cursorIncrementalCheckpoint(). These JSON null nodes are valid JsonNode objects (not Kotlin nulls), so they pass through upstream mapNotNull filters. When Jsons.treeToValue deserializes a JSON null node, Jackson returns Java null, which violates Kotlin's non-null type expectation for List<DefaultJdbcStreamStateValue>. Subsequent calls to cursorPair() on the null receiver cause an NPE.

The fix adds two defensive filters in split():

  1. .filter { !it.isNull } — removes JSON null nodes before deserialization
  2. .mapNotNull { ... } — catches any remaining nulls that Jackson may produce

The net effect is that null-valued split boundaries are silently dropped, resulting in fewer (but valid) partition splits rather than a crash.

Review guide

  1. airbyte-cdk/bulk/toolkits/extract-jdbc/src/main/kotlin/io/airbyte/cdk/read/DefaultJdbcPartitionFactory.kt — the only change, in the split() method's lazy splitPartitionBoundaries computation.

Key questions for reviewer:

  • Silent filtering vs. logging: Should we log a warning when null boundaries are dropped, to aid future debugging?
  • Upstream fix needed? The null boundaries originate from JdbcConcurrentPartitionsCreator.run() (line 263) where mapNotNull filters Kotlin nulls but not JSON null nodes. Should that also be hardened?
  • No test added: This is a defensive null-safety fix. A unit test that passes a JSON null node through split() would increase confidence. Worth adding?
  • Blast radius: This is CDK-level code used by all JDBC connectors (Oracle, Postgres, MySQL, etc.), not just Oracle. The change is small and strictly additive (filter before map), but reviewers should be aware of the scope.

User Impact

Users running cursor-based incremental syncs on JDBC sources (Oracle, Postgres, etc.) where the cursor column contains null values will no longer experience sync failures with NullPointerException. The sync will proceed with valid partition boundaries only, potentially with slightly less parallelism (fewer split partitions) but no functional difference in data correctness.

Can this PR be safely reverted and rolled back?

  • YES 💚

Reverting restores the previous behavior where null cursor values in sampled rows cause an NPE crash during partition splitting. This is the pre-existing broken behavior, so reverting is safe (returns to known state) but re-exposes the bug.


Requested by: bot_apk (apk@cognition.ai)
Devin session

…ization

Filter out JSON null nodes and null deserialization results when
splitting partition boundaries in DefaultJdbcPartitionFactory.split().

Jackson's treeToValue can return null when deserializing JSON null
nodes, violating Kotlin's non-null type safety through Java interop.
This caused a NullPointerException in cursorPair() when called on
null DefaultJdbcStreamStateValue receivers during concurrent partition
splitting for cursor-based incremental syncs.

The null boundaries originate from cursorIncrementalCheckpoint() which
produces Jsons.nullNode() when sampled cursor column values are null.

Co-Authored-By: bot_apk <apk@cognition.ai>
@devin-ai-integration
Copy link
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link
Contributor

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

PR Slash Commands

Airbyte Maintainers (that's you!) can execute the following slash commands on your PR:

  • 🛠️ Quick Fixes
    • /format-fix - Fixes most formatting issues.
    • /bump-version - Bumps connector versions, scraping changelog description from the PR title.
  • ❇️ AI Testing and Review (internal link: AI-SDLC Docs):
    • /ai-prove-fix - Runs prerelease readiness checks, including testing against customer connections.
    • /ai-canary-prerelease - Rolls out prerelease to 5-10 connections for canary testing.
    • /ai-review - AI-powered PR review for connector safety and quality gates.
  • 🚀 Connector Releases:
    • /publish-connectors-prerelease - Publishes pre-release connector builds (tagged as {version}-preview.{git-sha}) for all modified connectors in the PR.
    • /bump-progressive-rollout-version - Bumps connector version with an RC suffix (2.16.10-rc.1) for progressive rollouts (enableProgressiveRollout: true).
      • Example: /bump-progressive-rollout-version changelog="Add new feature for progressive rollout"
  • ☕️ JVM connectors:
    • /update-connector-cdk-version connector=<CONNECTOR_NAME> - Updates the specified connector to the latest CDK version.
      Example: /update-connector-cdk-version connector=destination-bigquery
    • /bump-bulk-cdk-version bump=patch changelog='foo' - Bump the Bulk CDK's version. bump can be major/minor/patch.
  • 🐍 Python connectors:
    • /poe connector source-example lock - Run the Poe lock task on the source-example connector, committing the results back to the branch.
    • /poe source example lock - Alias for /poe connector source-example lock.
    • /poe source example use-cdk-branch my/branch - Pin the source-example CDK reference to the branch name specified.
    • /poe source example use-cdk-latest - Update the source-example CDK dependency to the latest available version.
  • ⚙️ Admin commands:
    • /force-merge reason="<REASON>" - Force merges the PR using admin privileges, bypassing CI checks. Requires a reason.
      Example: /force-merge reason="CI is flaky, tests pass locally"
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

Co-Authored-By: bot_apk <apk@cognition.ai>
@github-actions
Copy link
Contributor

github-actions bot commented Feb 27, 2026

Deploy preview for airbyte-kotlin-cdk ready!

✅ Preview
https://airbyte-kotlin-mxpgtj3sw-airbyte-growth.vercel.app

Built with commit 7c0188f.
This pull request is being automatically deployed with vercel-action

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants