Parent Issue
Part of #124 (support partitioned table)
Depends on #126 (BinaryRow deserialization), #127 (partition path generation)
Background
Currently, TableScan and TableRead do not support partitioned tables:
- `TableScan::plan_snapshot()` discards partition info and builds wrong bucket paths (`{base_path}/bucket-{bucket}` instead of `{base_path}/{partition_path}/bucket-{bucket}`).
- `TableRead::to_arrow()` explicitly rejects partitioned tables with an `Unsupported` error.
- There are no integration tests for reading partitioned tables.
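The difference between the current and the expected layout can be illustrated with a minimal sketch. It assumes the partition segment has already been computed as a string (empty for an unpartitioned table); `bucket_path` is a hypothetical free function, not the crate's API.

```rust
/// Hypothetical helper illustrating the target layout:
/// `{table_path}/{partition_path}/bucket-{bucket}`, where an empty
/// `partition_path` means the table is unpartitioned.
fn bucket_path(table_path: &str, partition_path: &str, bucket: i32) -> String {
    // Normalize slashes so "a/" + "/b/" does not produce "a//b".
    let base = table_path.trim_end_matches('/');
    let partition = partition_path.trim_matches('/');
    if partition.is_empty() {
        // Same shape as the current behavior, which is only correct
        // for unpartitioned tables.
        format!("{base}/bucket-{bucket}")
    } else {
        format!("{base}/{partition}/bucket-{bucket}")
    }
}
```

For example, `bucket_path("/warehouse/db.db/t", "dt=2024-01-01/hr=10/", 3)` yields `/warehouse/db.db/t/dt=2024-01-01/hr=10/bucket-3`, whereas the current code would produce `/warehouse/db.db/t/bucket-3`.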
What needs to be done
1. Fix TableScan to generate correct partition paths
- Pass partition type info (from `TableSchema`) into `plan_snapshot()`, or change it to an instance method that can access `self.table.schema`.
- Decode partition bytes from `ManifestEntry` into a `BinaryRow` using `BinaryRow::from_bytes(arity, data)`.
- Use the partition path utils from #127 (Implement partition path generation (`PartitionPathUtils`)) to compute the partition path segment.
- Construct `bucket_path` as `{table_path}/{partition_path}/bucket-{bucket}`.
- Store the actual decoded `BinaryRow` in `DataSplit` instead of an empty `BinaryRow::new(0)`.
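The partition path computation in step 1 might look roughly like the following sketch. It assumes #127 produces Hive-style `key=value/` segments in partition-key order (the layout Paimon's Java `PartitionPathUtils` appears to use), that the decoded `BinaryRow` fields have already been rendered to strings, and that null values map to a configurable default partition name. `escape_path_name` and `partition_path` are hypothetical names, and the escape set is deliberately reduced for illustration.

```rust
/// Hypothetical helper: percent-encodes characters that would break a path
/// segment. The escape set here is a reduced illustration, not the one from #127.
fn escape_path_name(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        let needs_escape =
            c.is_ascii_control() || matches!(c, '/' | '\\' | '=' | '%' | ':' | '#' | '?');
        if needs_escape {
            // Every escaped char is ASCII, so it maps to a single "%XX".
            out.push_str(&format!("%{:02X}", c as u32));
        } else {
            out.push(c);
        }
    }
    out
}

/// Hypothetical helper: builds a Hive-style `key=value/` path from partition
/// keys and their already-stringified values (`None` = null partition value).
fn partition_path(keys: &[&str], values: &[Option<&str>], default_name: &str) -> String {
    assert_eq!(keys.len(), values.len(), "expected one value per partition key");
    let mut path = String::new();
    for (key, value) in keys.iter().zip(values.iter().copied()) {
        path.push_str(&escape_path_name(key));
        path.push('=');
        path.push_str(&escape_path_name(value.unwrap_or(default_name)));
        path.push('/');
    }
    path
}
```

With keys `["dt", "hr"]` and values `[Some("2024-01-01"), Some("10")]` this yields `dt=2024-01-01/hr=10/`, which can then be joined with the table path and `bucket-{bucket}`. Keeping the null-value name as a parameter avoids hard-coding whatever default the table options specify.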
2. Remove partitioned table read restriction in TableRead
- Remove the partition key check in `TableRead::to_arrow()` (`read_builder.rs:88-95`).
- `ArrowReader` should work without changes once the paths are correct.
3. Add integration tests
- Prepare test fixtures for partitioned tables (single and multiple partition keys).
- Integration test: read a partitioned table end-to-end and verify that the `DataSplit`s have the correct `bucket_path` with partition segments.
- Verify that partition column values are correct in the returned `RecordBatch`es.
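For the `bucket_path` assertions, a small helper that parses the partition segments back out of a split's path could keep the integration test readable. This is a sketch assuming paths follow the `{table_path}/{key}={value}/.../bucket-{bucket}` layout; `partition_segments` is hypothetical and works on plain strings rather than the crate's `DataSplit`.

```rust
/// Hypothetical test helper: returns the `(key, value)` pairs between the
/// table path and the trailing `bucket-N` directory, or `None` if the path
/// does not have that shape. Values are returned still escaped.
fn partition_segments<'a>(table_path: &str, bucket_path: &'a str) -> Option<Vec<(&'a str, &'a str)>> {
    // Require an exact directory prefix, so "/wh/t" does not match "/wh/t2/...".
    let rest = bucket_path
        .strip_prefix(table_path.trim_end_matches('/'))?
        .strip_prefix('/')?;
    let mut parts: Vec<&'a str> = rest.split('/').collect();
    // The last component must be the bucket directory.
    let last = parts.pop()?;
    if !last.starts_with("bucket-") {
        return None;
    }
    // Every remaining component must be a `key=value` pair.
    parts.into_iter().map(|seg| seg.split_once('=')).collect()
}
```

An unpartitioned path yields an empty list, so a split built with the old `{base_path}/bucket-{bucket}` layout would fail an assertion that expects `[("dt", "2024-01-01")]`.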
Affected files
- `crates/paimon/src/table/table_scan.rs`: `plan_snapshot()` method
- `crates/paimon/src/table/read_builder.rs`: `TableRead::to_arrow()` method
- `crates/integration_tests/tests/`: new test file(s)