fix(files): stream content in 1 MB chunks to prevent ConnectionResetError on large Batch API files #2985
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -394,7 +394,17 @@ def __init__(self, response: httpx.Response) -> None: |
| | | |
| | | @property |
| | | def content(self) -> bytes: |
| | | - return self.response.content |
| | | + """Return the response content, streamed in chunks to avoid ConnectionResetError on large files. |
| | | + |
| | | + Streaming in 1 MB chunks prevents issues with large Batch API result files (>200 MB) |
| | | + where reading the entire body at once can trigger a server-side connection reset |
| | | + on long-lived HTTP connections. Fixes #2959. |
| | | + """ |
| | | + _CHUNK_SIZE = 1024 * 1024  # 1 MB |
| | | + buf = bytearray() |
| | | + for chunk in self.response.iter_bytes(chunk_size=_CHUNK_SIZE): |
| | | + buf.extend(chunk) |
| | | + return bytes(buf) |
Comment on lines +404 to +407
In the same non-streaming path, the response body has already been buffered by the time this wrapper is created (…
| | | |
| | | @property |
| | | def text(self) -> str: |
`APIResponse.parse()` reads it
For the normal `client.files.content(...)` flow, `HttpxBinaryResponseContent` is only constructed after `APIResponse.parse()` has already called `self.read()` on the `httpx.Response` (`src/openai/_response.py:323-339`). Any large-file download failure therefore happens before this property is ever reached, so moving the loop here does not change the code path that actually reads from the socket and will not fix the reported `ConnectionResetError` for the non-streaming binary-download API.
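To illustrate the point above, here is a minimal stdlib-only sketch. `FakeResponse` and `chunked_content` are hypothetical stand-ins written for this example, not the SDK's or httpx's real classes; the sketch assumes, as the review describes, that once `read()` has buffered the body, chunked iteration only slices the in-memory bytes and never touches the socket again.

```python
import io
from typing import Iterator, Optional


class FakeResponse:
    """Hypothetical model of an HTTP response; NOT httpx.Response."""

    def __init__(self, body: bytes) -> None:
        self._socket = io.BytesIO(body)         # stands in for the network stream
        self._content: Optional[bytes] = None   # set once read() has buffered the body
        self.socket_reads = 0                   # counts reads against the "socket"

    def _read_socket(self, n: int) -> bytes:
        self.socket_reads += 1
        return self._socket.read(n)

    def read(self) -> bytes:
        """Eagerly buffer the whole body in a single read."""
        if self._content is None:
            self._content = self._read_socket(-1)
        return self._content

    def iter_bytes(self, chunk_size: int) -> Iterator[bytes]:
        if self._content is not None:
            # Already buffered: slice memory, no socket access at all.
            for i in range(0, len(self._content), chunk_size):
                yield self._content[i : i + chunk_size]
            return
        # Not yet buffered: pull chunks from the socket until it is drained.
        while True:
            chunk = self._read_socket(chunk_size)
            if not chunk:
                break
            yield chunk


def chunked_content(response: FakeResponse, chunk_size: int = 1024 * 1024) -> bytes:
    """Same loop shape as the patched `content` property in this PR."""
    buf = bytearray()
    for chunk in response.iter_bytes(chunk_size):
        buf.extend(chunk)
    return bytes(buf)


body = b"x" * (3 * 1024 * 1024 + 5)

# Path described in the review: the body is read eagerly before the wrapper exists.
eager = FakeResponse(body)
eager.read()
eager_data = chunked_content(eager)

# Path the patch seems to assume: nothing has consumed the body yet.
lazy = FakeResponse(body)
lazy_data = chunked_content(lazy)

print(eager.socket_reads, lazy.socket_reads)
```

In this model the eager path performs its single socket read inside `read()`, so the chunked loop in the property changes nothing about how bytes come off the wire. Only the lazy path turns the 1 MB loop into multiple smaller reads, which suggests any chunking would have to happen before the body is buffered.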