[improvement](iceberg) Improve VIcebergSortWriter code quality#60978
[improvement](iceberg) Improve VIcebergSortWriter code quality#60978morningman wants to merge 4 commits intoapache:masterfrom
Conversation
Based on PR apache#60540 review suggestions: 6.1: Encapsulate _current_writer in VIcebergTableWriter - Move _current_writer from public to private - Add const getter method current_writer() - Update SpillIcebergTableSinkLocalState to use getter 6.2: Split VIcebergSortWriter header-only into .h + .cpp - Move all method implementations to viceberg_sort_writer.cpp - Replace heavy includes (spill_stream.h, spill_stream_manager.h) with forward declarations in header to reduce compilation dependencies - Add comprehensive documentation comments to all methods and members 6.4: Fix error handling in VIcebergSortWriter::close() - Track internal_status for intermediate operations - Pass actual error status to underlying partition writer's close() instead of the original caller status (which could be OK) 6.5: Add explicit do_sort() in _do_spill() - Call _sorter->do_sort() before prepare_for_read() for clarity - Add detailed comments explaining the relationship with FullSorter::prepare_for_read() internal sorting 6.6: Fix typo 'SpillReadDerializeBlockTime' -> 'SpillReadDeserializeBlockTime' - Fix counter name string in spill_iceberg_table_sink_operator.cpp - Fix counter name string in operator.h (PipelineXSpillLocalState) - Fix counter lookup string in spill_reader.h - Fix member variable name _spill_read_derialize_block_timer
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
There was a problem hiding this comment.
Code Review Summary for PR #60978
Overall Assessment
This PR is a clean refactoring that splits a header-only implementation into .h + .cpp, encapsulates a public field, and fixes a typo. The code quality is improved overall. However, there are a few issues to address.
Critical Checkpoint Conclusions
1. Goal and test coverage: The PR achieves its stated goals (encapsulation, header split, typo fix, close() error handling). No new tests are added, which is acceptable for a refactor, however the typo fix is incomplete—test files still reference the old misspelled counter name, which will cause counter name mismatches at runtime during tests.
2. Is the modification minimal and focused? Yes. The changes are well-scoped.
3. Concurrency: No new concurrency concerns introduced. The refactoring preserves existing concurrency patterns.
4. Lifecycle management: No new lifecycle issues. The Defer pattern in close() is reasonable but has a subtle error-swallowing concern (see inline comment).
5. Configuration items: N/A.
6. Incompatible changes: The counter name rename (SpillReadDerializeBlockTime -> SpillReadDeserializeBlockTime) is technically a metrics name change visible in profiles. Not a compatibility concern for storage/protocol, but worth noting.
7. Parallel code paths: The typo fix must be applied to all paths that reference the counter name. Two test files are missed (see inline comment).
8. compile_check: New .cpp file is missing compile_check_begin.h/compile_check_end.h.
9. Performance: The explicit do_sort() before prepare_for_read(true) is redundant but harmless (no double-sort occurs due to internal guard). A C-style narrowing cast should use the already-existing safe helper method.
10. Other issues: One C-style cast (int)_spill_block_batch_row_count should use the _get_spill_batch_size() method that was specifically created for this purpose.
Issues Found
| # | Severity | File | Description |
|---|---|---|---|
| 1 | Medium | viceberg_sort_writer.cpp:234 |
C-style cast (int) instead of using _get_spill_batch_size() |
| 2 | Medium | viceberg_sort_writer.cpp (close) |
Defer lambda swallows _iceberg_partition_writer->close() error |
| 3 | Medium | test files | Typo fix incomplete: 2 test files still use old counter name |
| 4 | Low | viceberg_sort_writer.cpp |
Missing compile_check_begin.h/compile_check_end.h |
1. Replace C-style cast (int)_spill_block_batch_row_count with the safe _get_spill_batch_size() helper that includes bounds checking, preventing potential narrowing conversion issues from size_t to int. 2. Propagate _iceberg_partition_writer->close() error from Defer block. Previously the close error was logged but silently discarded, which could mask data loss (caller receives OK but file was not properly closed). Now the error is captured in close_status and returned to the caller. 3. Add compile_check_begin.h/compile_check_end.h includes per codebase convention. This enables -Wconversion as error for the new .cpp file. 4. Fix typo in test files that still used old misspelled counter name 'SpillReadDerializeBlockTime' -> 'SpillReadDeserializeBlockTime': - be/test/pipeline/exec/multi_cast_data_streamer_test.cpp (2 occurrences) - be/test/pipeline/operator/spillable_operator_test_helper.cpp (1 occurrence)
|
run buildall |
TPC-H: Total hot run time: 28975 ms |
TPC-DS: Total hot run time: 184376 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
What problem does this PR solve?
Based on PR #60540 review suggestions:
1: Encapsulate _current_writer in VIcebergTableWriter
2: Split VIcebergSortWriter header-only into .h + .cpp
4: Fix error handling in VIcebergSortWriter::close()
5: Add explicit do_sort() in _do_spill()
6: Fix typo 'SpillReadDerializeBlockTime' -> 'SpillReadDeserializeBlockTime'
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)