
refactor: smol moving around db/network calls #745

Merged
Prajna1999 merged 1 commit into main from fix/collection-session-leaks
Apr 10, 2026

Conversation


Prajna1999 (Collaborator) commented Apr 10, 2026

Summary

Target issue is #PLEASE_TYPE_ISSUE_NUMBER
Explain the motivation for making this change. What existing problem does the pull request solve?

Checklist

Before submitting a pull request, please ensure that you complete these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested the change.
  • If you've fixed a bug or added code, confirm that it is tested and has test cases.

Notes

Please add here if any other information is required for the reviewer.

Summary by CodeRabbit

  • Refactor
    • Improved collection creation performance by restructuring document batching workflow and reducing the number of database queries required during processing.


coderabbitai bot commented Apr 10, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 903e0330-c9e8-4016-a372-765acaae98de

📥 Commits

Reviewing files that changed from the base of the PR and between 0ca58bd and a0732ee.

📒 Files selected for processing (2)
  • backend/app/services/collections/create_collection.py
  • backend/app/services/collections/providers/openai.py

📝 Walkthrough

Walkthrough

Document batching logic is relocated from OpenAIProvider.create to the caller (execute_job in create_collection.py). The provider now receives pre-batched documents (list[list[Document]]) instead of a DocumentCrud instance, eliminating secondary database reads.

Changes

  • Collection Creation Service (backend/app/services/collections/create_collection.py): Moves document batching responsibility to execute_job; documents are batched by batch_size before the provider is invoked, removing the secondary DocumentCrud.read_each database query.
  • OpenAI Provider (backend/app/services/collections/providers/openai.py): Updates the create method signature to accept pre-batched documents (docs_batches: list[list[Document]]) instead of a DocumentCrud instance; removes the internal batching logic and related imports.
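The relocation of batching described above can be sketched roughly as follows. This is a minimal, hypothetical illustration: the Document dataclass, the provider's internal upload step, and execute_job's parameters are stand-ins, not the repository's actual definitions.

```python
from dataclasses import dataclass


@dataclass
class Document:
    id: int
    text: str


class OpenAIProvider:
    """Provider now receives pre-batched documents instead of a DocumentCrud."""

    def __init__(self) -> None:
        self.uploaded: list[list[Document]] = []

    # Before the refactor, create() took a DocumentCrud and a batch_size,
    # re-read documents from the database, and batched them internally.
    def create(self, docs_batches: list[list[Document]]) -> None:
        for batch in docs_batches:
            self.uploaded.append(batch)  # stand-in for the real upload call


def execute_job(docs: list[Document], batch_size: int,
                provider: OpenAIProvider) -> None:
    # Batching now happens in the caller, so the provider needs no
    # secondary DocumentCrud.read_each query.
    batches = [docs[i:i + batch_size]
               for i in range(0, len(docs), batch_size)]
    provider.create(batches)
```

The key design point is that the caller already holds the documents when it invokes the provider, so grouping them there avoids a second round trip to the database.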

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Fix/db connection pool limit #744: Implements the same provider signature refactoring pattern, replacing DocumentCrud parameter with pre-batched documents and shifting batching responsibility to the caller.

Suggested reviewers

  • vprashrex

Poem

🐰 Batching hops from here to there,
Pre-grouped docs float through the air!
One less query, cleaner call,
The provider's load grows small. ✨


Prajna1999 merged commit dbcdb20 into main on Apr 10, 2026
0 of 2 checks passed
Prajna1999 deleted the fix/collection-session-leaks branch April 10, 2026 03:28
Prajna1999 restored the fix/collection-session-leaks branch April 10, 2026 03:28
