Skip to content

fix: data sink worker improvements (CM-1054)#3996

Merged
themarolt merged 11 commits intomainfrom
fix/data-sink-worker-improvements-CM-1054
Apr 6, 2026
Merged

fix: data sink worker improvements (CM-1054)#3996
themarolt merged 11 commits intomainfrom
fix/data-sink-worker-improvements-CM-1054

Conversation

@themarolt
Copy link
Copy Markdown
Contributor

@themarolt themarolt commented Apr 3, 2026

Note

Medium Risk
Touches production cron jobs and worker error-handling, including deleting webhook rows and changing retry/delay behavior; mistakes could drop work or change processing volume. Changes are scoped and guarded by backlog thresholds and retry limits, but operate on core queue/DB flow.

Overview
Adds new prod-only cron jobs to clean up and recover stuck ingestion: incoming-webhooks-check deletes orphaned incomingWebhooks rows (no integration) and re-queues up to 10k webhooks stuck in PENDING for >1 day, while integration-results-reporting posts a daily Slack summary of integration.results states plus top error groups.

Refactors Kafka backlog counting by introducing shared getKafkaMessageCounts (handles -1 offsets) and switching existing monitoring/repair jobs to use it, with proper admin disconnects; integration-results-check is now prod-only.

Improves data-sink robustness by guarding against missing activity payloads (fail only the bad result, not the whole batch), capping identity-conflict delays to respect maxStreamRetries, and stopping automatic re-queuing of terminal ERROR integration results. Webhook processing now deletes processed webhook rows instead of marking them PROCESSED, and adds triggerWebhookProcessingBatch to enqueue webhook reprocessing with the correct priority context.

Reviewed by Cursor Bugbot for commit ff79db6. Bugbot is set up for automated code reviews on this repo. Configure here.

Copilot AI review requested due to automatic review settings April 3, 2026 07:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves robustness and operational behavior of the data sink + integration stream workers by preventing certain deterministic crashes, tightening retry/queueing behavior, and adding cron-based monitoring/reporting jobs.

Changes:

  • Stop re-queuing terminal ERROR integration results and ensure identity-conflict delays respect maxStreamRetries.
  • Prevent batch crashes when activity payloads are missing/invalid; fix identity matching to include platform in-memory.
  • Add cron jobs for integration results Slack reporting and for re-triggering stuck pending webhooks; gate existing integration-results check to prod.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
services/libs/data-access-layer/src/old/apps/integration_stream_worker/incomingWebhook.repo.ts Removes “mark processed” behavior and simplifies webhook deletion.
services/libs/data-access-layer/src/old/apps/data_sink_worker/repo/requestedForErasureMemberIdentities.repo.ts Adjusts in-memory identity matching logic to include platform.
services/libs/data-access-layer/src/old/apps/data_sink_worker/repo/dataSink.repo.ts Excludes ERROR results from “old results to process” queries with rationale.
services/apps/integration_stream_worker/src/service/integrationStreamService.ts Deletes processed webhooks instead of marking them processed.
services/apps/data_sink_worker/src/service/dataSink.service.ts Prevents unbounded retries for identity-conflict delays by respecting retry limit.
services/apps/data_sink_worker/src/service/activity.service.ts Adds guard for missing activity data to avoid crashing whole batches.
services/apps/cron_service/src/jobs/integrationResultsReporting.job.ts New Slack report job summarizing integration results and top error groups.
services/apps/cron_service/src/jobs/integrationResultsCheck.job.ts Runs the job only in prod via enabled.
services/apps/cron_service/src/jobs/incomingWebhooksCheck.job.ts New prod-only job to re-trigger stuck pending webhooks when queue backlog allows.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: Uroš Marolt <uros@marolt.me>
Signed-off-by: Uroš Marolt <uros@marolt.me>
Signed-off-by: Uroš Marolt <uros@marolt.me>
Signed-off-by: Uroš Marolt <uros@marolt.me>
Signed-off-by: Uroš Marolt <uros@marolt.me>
@themarolt themarolt requested review from mbani01 and skwowet and removed request for skwowet April 3, 2026 09:31
Copy link
Copy Markdown
Contributor

@mbani01 mbani01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@themarolt data-sink-worker changes LGTM, left some comments on cron-jobs

Signed-off-by: Uroš Marolt <uros@marolt.me>
@themarolt themarolt force-pushed the fix/data-sink-worker-improvements-CM-1054 branch from b125129 to 2952e09 Compare April 6, 2026 12:43
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 2952e09. Configure here.

@mbani01 mbani01 self-requested a review April 6, 2026 13:00
mbani01
mbani01 previously approved these changes Apr 6, 2026
Signed-off-by: Uroš Marolt <uros@marolt.me>
@themarolt themarolt merged commit 697b076 into main Apr 6, 2026
15 checks passed
@themarolt themarolt deleted the fix/data-sink-worker-improvements-CM-1054 branch April 6, 2026 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants