Support partitioned table in TableScan and TableRead #131

@luoyuxia

Parent Issue

Part of #124 (support partitioned table)
Depends on #126 (BinaryRow deserialization), #127 (partition path generation)

Background

Currently, TableScan and TableRead do not support partitioned tables:

  1. TableScan::plan_snapshot() discards partition info and builds wrong bucket paths ({base_path}/bucket-{bucket} instead of {base_path}/{partition_path}/bucket-{bucket})
  2. TableRead::to_arrow() explicitly rejects partitioned tables with an Unsupported error
  3. There are no integration tests for reading partitioned tables

What needs to be done

1. Fix TableScan to generate correct partition paths

  • Pass partition type info (from TableSchema) into plan_snapshot(), or change it to an instance method that can access self.table.schema
  • Decode partition bytes from ManifestEntry into BinaryRow using BinaryRow::from_bytes(arity, data)
  • Use the partition path utilities (PartitionPathUtils, from #127) to compute the partition path segment
  • Construct bucket_path as {table_path}/{partition_path}/bucket-{bucket}
  • Store the actual decoded BinaryRow in DataSplit instead of an empty BinaryRow::new(0)
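
The path construction described above can be sketched roughly as follows. This is only an illustration, not the crate's code: `partition_path` and `bucket_path` are hypothetical helper names, the real partition segment should come from the utilities in #127, and the Hive-style `key=value` layout is an assumption made for the example.

```rust
/// Hypothetical stand-in for the partition path utilities from #127.
/// Assumes Hive-style `key=value` segments joined by `/`.
fn partition_path(spec: &[(&str, &str)]) -> String {
    spec.iter()
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join("/")
}

/// Hypothetical helper: builds `{table_path}/{partition_path}/bucket-{bucket}`.
/// An empty partition path (unpartitioned table) yields `{table_path}/bucket-{bucket}`.
fn bucket_path(table_path: &str, partition_path: &str, bucket: i32) -> String {
    let base = table_path.trim_end_matches('/');
    let part = partition_path.trim_matches('/');
    if part.is_empty() {
        format!("{base}/bucket-{bucket}")
    } else {
        format!("{base}/{part}/bucket-{bucket}")
    }
}
```

Keeping the unpartitioned case in the same helper means the current behavior for tables without partition keys would stay unchanged.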

2. Remove partitioned table read restriction in TableRead

  • Remove the partition key check in TableRead::to_arrow() (read_builder.rs:88-95)
  • ArrowReader should work without changes once paths are correct

3. Add integration tests

  • Prepare test fixtures for partitioned tables (single and multiple partition keys)
  • Integration test: read a partitioned table end-to-end, verify DataSplits have correct bucket_path with partition segments
  • Verify partition column values are correct in returned RecordBatches
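
For the bucket_path check, an integration test could use a small path-shape assertion along these lines. The helper `has_expected_layout` is hypothetical and, like the layout it checks, assumes `key=value` partition segments in partition-key order followed by a `bucket-{n}` directory; the actual format is whatever #127 produces.

```rust
/// Hypothetical test helper: returns true if `bucket_path` looks like
/// `{table_path}/{k1}={v1}/.../bucket-{n}` with keys in the given order.
/// The `key=value` layout is an assumption, not confirmed by the crate.
fn has_expected_layout(bucket_path: &str, table_path: &str, partition_keys: &[&str]) -> bool {
    let Some(rest) = bucket_path.strip_prefix(table_path.trim_end_matches('/')) else {
        return false;
    };
    // The remainder must start at a path boundary, not in the middle of a name.
    if !rest.starts_with('/') {
        return false;
    }
    let segments: Vec<&str> = rest.split('/').filter(|s| !s.is_empty()).collect();
    // One segment per partition key plus the trailing bucket directory.
    if segments.len() != partition_keys.len() + 1 {
        return false;
    }
    let Some((bucket, parts)) = segments.split_last() else {
        return false;
    };
    let bucket_ok = bucket
        .strip_prefix("bucket-")
        .map_or(false, |n| n.parse::<u32>().is_ok());
    let parts_ok = parts.iter().zip(partition_keys).all(|(seg, key)| {
        seg.split_once('=')
            .map_or(false, |(k, v)| k == *key && !v.is_empty())
    });
    bucket_ok && parts_ok
}
```

A test would call this once per DataSplit returned by the scan, for both the single-key and the multi-key fixture.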

Affected files

  • crates/paimon/src/table/table_scan.rs: plan_snapshot() method
  • crates/paimon/src/table/read_builder.rs: TableRead::to_arrow() method
  • crates/integration_tests/tests/: new test file(s)
