
feat!: open_reader accepts optional size to skip HEAD request #664

Merged: kylebarron merged 3 commits into developmentseed:main from fvaleye:feat/open-reader-size-hint on Apr 16, 2026


@fvaleye (Contributor) commented Apr 11, 2026

Summary

Add an optional size parameter to obstore.open_reader / open_reader_async that lets callers skip the HEAD request used to fetch the file size when the size is already known from external metadata.

create_reader currently HEADs the object solely to populate an ObjectMeta, but BufReader::with_capacity uses only meta.location and meta.size from it. That costs one network round-trip per file just to fetch a single u64.

The same pattern already exists one layer up in the arrow-rs workspace: parquet::arrow::async_reader::ParquetObjectReader::with_file_size(u64) lets callers supply the file size up front instead of having it fetched. Similar Python-side examples include fsspec.AbstractBufferedFile(size=None) and pyarrow.dataset.FileFormat.make_fragment(file_size=None).

When size is provided, create_reader constructs the ObjectMeta directly with the hint and skips the HEAD entirely. When size is None, behavior is the same as before.
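
To make the control flow concrete, here is a minimal Python sketch of the behavior described above. It does not call obstore itself: ObjectMeta, FakeStore, and create_reader below are hypothetical stand-ins loosely modelled on the names in this description, and the sketch only counts how many HEAD round-trips are issued with and without a size hint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectMeta:
    # Hypothetical stand-in for object_store's ObjectMeta; only the two
    # fields the buffered reader actually uses are modelled here.
    location: str
    size: int


@dataclass
class FakeStore:
    # Hypothetical in-memory store that counts HEAD round-trips.
    objects: dict
    head_calls: int = 0

    def head(self, path: str) -> ObjectMeta:
        # Simulates the network HEAD request that fetches the object size.
        self.head_calls += 1
        return ObjectMeta(location=path, size=len(self.objects[path]))


def create_reader(store: FakeStore, path: str, size: Optional[int] = None) -> ObjectMeta:
    """Return the metadata a buffered reader would be built from (sketch)."""
    if size is not None:
        # Size known from external metadata: build the meta directly, no HEAD.
        return ObjectMeta(location=path, size=size)
    # No hint: fall back to a HEAD request, as before.
    return store.head(path)


store = FakeStore(objects={"data.parquet": b"x" * 1024})
create_reader(store, "data.parquet")             # issues one HEAD
create_reader(store, "data.parquet", size=1024)  # skips the HEAD
print(store.head_calls)
```

Checking `size is not None` rather than the truthiness of `size` keeps a hint of 0 (an empty object) from silently falling back to a HEAD request.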

Notes

The caller is responsible for ensuring the size value is accurate.
Failure modes:

  • A hint larger than the actual file surfaces as an OSError at read time, when a range request past EOF is issued (Invalid range).
  • A hint smaller than the actual file causes silent truncation: the reader treats the hint as authoritative EOF.
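
The two failure modes can be illustrated with a simplified, hypothetical model; RangeStore and SizedReader below are not obstore APIs. The sketch assumes the store rejects any range request that runs past the object's real end with an OSError, and that the reader treats the hinted size as EOF.

```python
class RangeStore:
    # Hypothetical store: get_range mimics a ranged GET and, by assumption,
    # rejects any range that runs past the real end of the object.
    def __init__(self, data: bytes):
        self.data = data

    def get_range(self, start: int, end: int) -> bytes:
        if start >= len(self.data) or end > len(self.data):
            raise OSError(
                f"Invalid range: {start}..{end} for object of size {len(self.data)}"
            )
        return self.data[start:end]


class SizedReader:
    # Hypothetical buffered reader that trusts the size hint as EOF.
    def __init__(self, store: RangeStore, size: int, chunk: int = 4):
        self.store = store
        self.size = size
        self.chunk = chunk
        self.pos = 0

    def read(self) -> bytes:
        out = bytearray()
        # Issue chunked range requests until the *hinted* end is reached.
        while self.pos < self.size:
            end = min(self.pos + self.chunk, self.size)
            out += self.store.get_range(self.pos, end)
            self.pos = end
        return bytes(out)


data = bytes(range(10))
print(len(SizedReader(RangeStore(data), size=10).read()))  # accurate hint
print(len(SizedReader(RangeStore(data), size=6).read()))   # short hint: truncated
try:
    SizedReader(RangeStore(data), size=12).read()          # long hint
except OSError as exc:
    print(exc)
```

With an accurate hint the full object comes back; a short hint returns only a prefix without raising, which is why the truncation case is the harder one to detect.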

@ds-release-bot added the feat label Apr 11, 2026
@fvaleye force-pushed the feat/open-reader-size-hint branch 3 times, most recently from 7b38e1e to e3a6e25, on April 11, 2026 16:46
@kylebarron (Member) left a comment

Thanks for the PR!

I think it's best to no longer store meta: ObjectMeta internally.

Curious if it's worth making the breaking changes non-breaking and deprecated for a release cycle, or if it's worth just removing .meta now. What do you think?

Review comment threads on obstore/src/buffered.rs (both marked Outdated)
@fvaleye force-pushed the feat/open-reader-size-hint branch from b4a8f2a to 07a5221 on April 16, 2026 11:53
@fvaleye fvaleye requested a review from kylebarron April 16, 2026 11:54
@kylebarron (Member) left a comment

Thank you!

@kylebarron kylebarron merged commit 8874812 into developmentseed:main Apr 16, 2026
10 checks passed
@kylebarron kylebarron changed the title feat: open_reader accepts optional size to skip HEAD request feat!: open_reader accepts optional size to skip HEAD request Apr 22, 2026
@kylebarron (Member) commented

Forgot to mark this as breaking because it removes the meta param
