fix(source-marketo): fix CSV column misalignment with CJK characters (AI-Triage PR)#74088

Draft
devin-ai-integration[bot] wants to merge 2 commits into master from devin/1772215378-fix-marketo-cjk-csv-parsing

@devin-ai-integration
Contributor

What

Fixes CSV column misalignment in the Marketo connector when syncing Leads containing CJK (Chinese, Japanese, Korean) characters. The misalignment causes the primary key id to be parsed as null, which triggers BasicAirbyteMessageValidator to abort the sync.

Resolves https://github.com/airbytehq/oncall/issues/11468

Related OSS issue: #74087

How

requests.Response.iter_lines() falls back to splitlines() on each chunk when no delimiter is specified. On decoded (str) chunks, str.splitlines() splits on Unicode line separators (U+2028, U+2029) in addition to \n/\r\n. When CJK text fields contain these characters, a single CSV row is incorrectly split into multiple lines, and the fragments no longer line up with the header columns.
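The difference can be reproduced with plain strings, without requests. This is a minimal sketch; the sample row and variable names are made up for illustration and are not taken from the connector.

```python
# A made-up lead row whose name field contains U+2028 (LINE SEPARATOR).
row = "1,\u5c71\u7530\u2028\u592a\u90ce,tokyo"
body = "id,name,city\n" + row + "\n"

# str.splitlines() treats U+2028 as a line boundary, so the row is cut in two.
by_splitlines = body.splitlines()

# str.split("\n") only breaks on "\n"; drop the empty tail left by the final "\n".
by_newline = [line for line in body.split("\n") if line]

print(len(by_splitlines))  # 3: header plus two fragments of the same row
print(len(by_newline))     # 2: header plus the intact row
```

With splitlines() the second fragment starts with the rest of the name field, so a CSV reader that maps it against the header puts the wrong value in the id column.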

The fix passes delimiter="\n" to iter_lines() so it uses str.split("\n") instead of str.splitlines(), preserving Unicode line separators within field values. An additional rstrip("\r") handles \r\n line endings.
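As a rough illustration of the approach described above (not the connector's actual parse_response, which is not shown here), the same steps can be sketched on an in-memory string. The helper name parse_csv_body is hypothetical, and the sketch filters empty lines after stripping \r, which is an assumption about ordering.

```python
import csv


def parse_csv_body(text):
    """Hypothetical helper: split on "\\n" only, strip "\\r", drop empty lines."""
    stripped = (line.rstrip("\r") for line in text.split("\n"))
    lines = [line for line in stripped if line]
    # csv.DictReader accepts any iterable of strings, one string per row.
    return list(csv.DictReader(lines))


# CRLF line endings and a U+2028 inside the name field (made-up sample data).
body = "id,name\r\n1,a\u2028b\r\n2,c\r\n"
records = parse_csv_body(body)
print(records[0]["id"])    # 1
print(len(records))        # 2
```

The U+2028 stays inside the name value of the first record, and the id column is populated for every row.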

This is the same class of bug previously fixed for Java destinations in #3327.

Review guide

  1. source_marketo/source.py — the core fix (3 lines changed in parse_response). Key things to verify:
    • The if line filter drops empty strings produced by split("\n"). The previous splitlines() behavior also dropped trailing empty lines. Check whether dropping empty lines mid-stream could ever be problematic.
    • rstrip("\r") strips trailing carriage returns. Check whether this could strip legitimate \r from the last field in a CSV row (extremely unlikely, but worth considering).
  2. unit_tests/test_source.py — updated existing mock from splitlines() to split("\n") to match new behavior, and added a new test test_parse_response_with_unicode_line_separator that directly validates U+2028 in a field value doesn't cause misalignment.
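The two edge cases raised in the review guide can be checked in isolation. The sketch below uses a hypothetical split_rows helper that mirrors the described steps; it is an assumption about the shape of the change, not a copy of it.

```python
def split_rows(text):
    # Hypothetical stand-in for the described logic: split on "\n",
    # strip trailing "\r", then drop lines that are empty.
    stripped = (line.rstrip("\r") for line in text.split("\n"))
    return [line for line in stripped if line]


# Case 1: an empty line in the middle of the stream is silently dropped.
blank_mid_stream = split_rows("id,name\n\n1,a\n")

# Case 2: the last field ends with its own "\r" before the "\r\n" terminator.
# rstrip("\r") removes every trailing "\r", so the field's own "\r" is lost too.
trailing_cr_field = split_rows("id,note\n1,ends-with-cr\r\r\n")

print(blank_mid_stream)   # ['id,name', '1,a']
print(trailing_cr_field)  # ['id,note', '1,ends-with-cr']
```

Whether either case matters depends on what the Marketo export can actually emit, which this sketch does not establish.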

User Impact

Marketo connector syncs with CJK data that previously failed with null primary key errors should now succeed. No configuration changes required.

Can this PR be safely reverted and rolled back?

  • YES 💚

Link to Devin run: https://app.devin.ai/sessions/65603bd6dc3540e9b16e6be6c4c379c2
Requested by: bot_apk (apk@cognition.ai)

…ine separator splitting

The CSV parser in source-marketo uses requests.Response.iter_lines(),
which internally calls str.splitlines() on decoded chunks. Python's
splitlines() splits on Unicode line separators (U+2028, U+2029) in
addition to \n and \r\n.

When CJK text fields contain these Unicode characters, splitlines()
incorrectly splits a single CSV row into multiple lines, causing
column misalignment. This results in the primary key 'id' field
being parsed as null, which triggers BasicAirbyteMessageValidator
to abort the sync.

Fix: Pass delimiter='\n' to iter_lines() so it uses str.split('\n')
instead of str.splitlines(), preserving Unicode line separators in
field values. Also strip \r to handle \r\n line endings.

This is the same class of bug fixed in #3327 for
Java destinations.

Resolves airbytehq/oncall#11468

Co-Authored-By: bot_apk <apk@cognition.ai>
@github-actions
Contributor

source-marketo Connector Test Results

74 tests: 71 passed ✅, 3 skipped 💤, 0 failed ❌ (2 suites, 2 files, 19s ⏱️)

Results for commit 8b1b77d.
