feat(source-intercom): add configurable num_workers for sync throughput#74067
feat(source-intercom): add configurable num_workers for sync throughput#74067Lucas Leadbetter (lleadbet) wants to merge 6 commits intomasterfrom
Conversation
Co-Authored-By: lucas.leadbetter@airbyte.io <lucas.leadbetter@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksPR Slash CommandsAirbyte Maintainers (that's you!) can execute the following slash commands on your PR:
📚 Show Repo GuidanceHelpful Resources
|
Co-Authored-By: lucas.leadbetter@airbyte.io <lucas.leadbetter@gmail.com>
…orkers with default=2 Co-Authored-By: lucas.leadbetter@airbyte.io <lucas.leadbetter@gmail.com>
…tion Co-Authored-By: lucas.leadbetter@airbyte.io <lucas.leadbetter@gmail.com>
|
Deploy preview for airbyte-docs ready! ✅ Preview Built with commit 470c08d. |
|
Co-Authored-By: lucas.leadbetter@airbyte.io <lucas.leadbetter@gmail.com>
|
/ai-review
Reviewing PR for connector safety and quality.
|
|
🔍 AI PR Review in progress — gathering evidence and evaluating gates. Will post the full report shortly. Session: https://app.devin.ai/sessions/bbe9a0c047b64e6d8fca244e59c13df6 |
…sive rollout Co-Authored-By: lucas.leadbetter@airbyte.io <lucas.leadbetter@gmail.com>
AI PR Review ReportReview Action: NO ACTION (INCONCLUSIVE)
📋 PR Details & EligibilityConnector & PR InfoConnector(s): Auto-Approve EligibilityEligible: No Review Action DetailsNO ACTION (INCONCLUSIVE) — The Live / E2E Tests gate is UNKNOWN because validation evidence could not be verified (MCP prod database unavailable — Cloud SQL Proxy not running). All other gates pass. Human review is recommended to verify that the concurrency changes work as expected in a live environment. No PR review submitted.
🔍 Gate Evaluation DetailsGate-by-Gate Analysis
Pre-Release Checks Failure Details (informational, excluded from CI Checks)The
These are pre-release validation issues and do not affect the CI Checks gate, but may need to be resolved before publishing. 📚 Evidence ConsultedEvidence
❓ How to RespondProviding Context or JustificationYou can add explanations that the bot will see on the next review: Option 1: PR Description (recommended) ## AI PR Review Justification
### Live / E2E Tests
[Your explanation here — e.g., why live testing is not required for this change, or evidence that it was tested]Option 2: PR Comment After adding your response, re-run Note: For the Live / E2E Tests gate, a sufficient justification explaining why validation is not required for this specific change can lead to PASS. For example: the change preserves default behavior ( |
What
Adds a user-configurable
num_workersoption to the Intercom source connector, controlling the number of concurrent worker threads. Without this, the CDK defaults to only 2 worker threads, which severely underutilizes the Intercom API rate limit (10k–50k requests/min depending on workspace) and causes syncs with large datasets to take weeks — particularly theconversation_partsstream which uses an N+1 substream pattern (one API call per conversation).Motivated by a customer sync taking >2 weeks despite having Intercom bump their rate limit to 50k/min.
How
concurrency_levelblock tomanifest.yamlthat reads from config:default_concurrency: "{{ config.get('num_workers', 2) }}"— defaults to 2 (preserving existing behavior)max_concurrency: 25num_workersoptional integer field to the connector spec (default: 2, min: 1, max: 25,airbyte_hidden: true)0.13.16-rc.3No Python code changes. The existing custom
ErrorHandlerWithRateLimiterincomponents.pyalready provides proactive per-request backoff based onX-RateLimit-Remainingheaders for most streams.Updates since last revision
default_concurrency: 10to a config-drivennum_workersfield that defaults to 2. This preserves existing behavior for all current users while allowing users with higher rate limits to opt in to higher concurrency.num_workersasairbyte_hidden: trueand moved it to the last position in the spec, so it does not appear in the standard connector UI.0.13.16-rc.3(next RC after master's0.13.16-rc.2) to comply with progressive rollout versioning requirements.Review guide
manifest.yaml— adds theconcurrency_levelblock betweenstreamsandspec, and addsnum_workers(hidden) to the spec properties as the last fieldmetadata.yaml— version bump to0.13.16-rc.3docs/integrations/sources/intercom.md— changelog entrycomponents.py(unchanged) — the customIntercomRateLimitersleeps per-thread based on shared rate limit headers. With higher concurrency, multiple threads may read stale remaining-capacity values and collectively overshoot, causing more 429s. The CDK's default retry/backoff should handle this, but it's worth considering whether the proactive rate limiter needs adjustment for higher concurrency.conversation_partsandcompaniesstreams do not use the custom rate limiter — they useDefaultErrorHandlerand rely solely on 429 retry/backoff, so they will be the most aggressive under higher concurrency.Suggested human review checklist
config.get('num_workers', 2)Jinja2 interpolation works correctly withConcurrencyLevelin the CDK (string-to-int coercion)max_concurrency: 25is appropriate given Intercom's default 10k/min rate limitairbyte_hidden: trueachieves the desired UX (hidden from standard UI, settable via API or internal tooling)User Impact
num_workersto 10–25 to significantly improve sync throughputCan this PR be safely reverted and rolled back?
Link to Devin run: https://app.devin.ai/sessions/60341a71384b4c03ab3c54a0f9ee4d2d
Requested by: Lucas Leadbetter (@lleadbet)