[fix][cp] Remove DictEncoder dtor checking in Parquet writer#435

Open

WangGuangxin wants to merge 1 commit into bytedance:main from WangGuangxin:cp_10445

WangGuangxin (Collaborator) commented Mar 30, 2026

What problem does this PR solve?

Issue Number: close #191

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

Summary:
The exception below was observed when running spark_aggregation_fuzzer_test with facebookincubator/velox#9559, which writes Velox vectors into Parquet files for Spark to read.

```
velox/dwio/parquet/writer/arrow/Encoding.cpp:513:  Check failed: buffered_indices_.empty()
./velox/functions/sparksql/fuzzer/spark_aggregation_fuzzer_test
```

This PR follows apache/arrow@02ad5ae and removes this check from the DictEncoder destructor.
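To illustrate how a destructor-time `buffered_indices_.empty()` check can fire, here is a minimal, hypothetical C++ sketch. It is not the Velox or Arrow code: `DictEncoderSketch`, `put`, `flush`, `pending`, and `abortedWrite` are illustrative names. It assumes one plausible scenario: a write fails after some dictionary indices have been buffered but before they are flushed, so the encoder is destroyed during stack unwinding with a non-empty buffer. With the check removed, the leftover indices are simply discarded.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Hypothetical, heavily simplified dictionary encoder. Names are
// illustrative and do not match the real Velox/Arrow classes.
class DictEncoderSketch {
 public:
  // Buffers the dictionary index of `value`; nothing is written out yet.
  void put(int64_t value) {
    auto it = dict_.find(value);
    if (it == dict_.end()) {
      // New distinct value: assign it the next dictionary index.
      it = dict_.emplace(value, static_cast<int32_t>(dict_.size())).first;
    }
    bufferedIndices_.push_back(it->second);
  }

  // Drains the buffered indices, as a normal page flush would.
  std::vector<int32_t> flush() {
    std::vector<int32_t> out;
    out.swap(bufferedIndices_);  // leaves bufferedIndices_ empty
    return out;
  }

  std::size_t pending() const { return bufferedIndices_.size(); }

  // In this sketch, the removed check corresponds to asserting
  // bufferedIndices_.empty() here. Without it, unflushed indices are
  // dropped when the encoder is destroyed.
  ~DictEncoderSketch() = default;

 private:
  std::unordered_map<int64_t, int32_t> dict_;
  std::vector<int32_t> bufferedIndices_;
};

// Simulates a write that fails after some values were buffered. Returns
// the number of indices still buffered when the encoder was destroyed.
std::size_t abortedWrite() {
  std::size_t pendingAtDestruction = 0;
  try {
    DictEncoderSketch encoder;
    encoder.put(7);
    encoder.put(7);
    encoder.put(42);
    pendingAtDestruction = encoder.pending();
    throw std::runtime_error("simulated downstream write failure");
  } catch (const std::runtime_error&) {
    // `encoder` was destroyed during stack unwinding with unflushed
    // indices; a destructor-time emptiness check would have fired.
  }
  return pendingAtDestruction;
}
```

On the normal path, `flush()` drains the buffer before the encoder goes out of scope. In `abortedWrite()`, three indices are still buffered at destruction, which is the state the removed check rejected.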

Corresponding PR: facebookincubator/velox#10445

Performance Impact

  • No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).

  • Positive Impact: I have run benchmarks.

  • Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note

Please describe the changes in this PR

Release Note:

Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No

  • Yes (Description: ...)


WangGuangxin (Collaborator, Author)

cc @guhaiyan0221

WangGuangxin force-pushed the cp_10445 branch 3 times, most recently from 96e8da7 to be130ac on April 2, 2026 11:28
Summary:
The exception below was observed when running spark_aggregation_fuzzer_test with
facebookincubator/velox#9559, which writes Velox vectors into Parquet files for Spark to read.

```
velox/dwio/parquet/writer/arrow/Encoding.cpp:513:  Check failed: buffered_indices_.empty()
./velox/functions/sparksql/fuzzer/spark_aggregation_fuzzer_test
```
This PR follows apache/arrow@02ad5ae and removes this check from the DictEncoder destructor.

Pull Request resolved: facebookincubator/velox#10445

Reviewed By: kagamiori

Differential Revision: D59814851

Pulled By: mbasmanova

fbshipit-source-id: a72e6f7608d8bb1763e11e57ef1e5b8daa40f9cf