Skip to content

[improvement](iceberg) Improve VIcebergSortWriter code quality#60978

Open
morningman wants to merge 4 commits intoapache:masterfrom
morningman:fix/iceberg-sort-writer-improvements
Open

[improvement](iceberg) Improve VIcebergSortWriter code quality#60978
morningman wants to merge 4 commits intoapache:masterfrom
morningman:fix/iceberg-sort-writer-improvements

Conversation

@morningman
Copy link
Contributor

@morningman morningman commented Mar 3, 2026

What problem does this PR solve?

Based on PR #60540 review suggestions:

1: Encapsulate _current_writer in VIcebergTableWriter

  • Move _current_writer from public to private
  • Add const getter method current_writer()
  • Update SpillIcebergTableSinkLocalState to use getter

2: Split VIcebergSortWriter header-only into .h + .cpp

  • Move all method implementations to viceberg_sort_writer.cpp
  • Replace heavy includes (spill_stream.h, spill_stream_manager.h) with forward declarations in header to reduce compilation dependencies
  • Add comprehensive documentation comments to all methods and members

4: Fix error handling in VIcebergSortWriter::close()

  • Track internal_status for intermediate operations
  • Pass actual error status to underlying partition writer's close() instead of the original caller status (which could be OK)

5: Add explicit do_sort() in _do_spill()

  • Call _sorter->do_sort() before prepare_for_read() for clarity
  • Add detailed comments explaining the relationship with FullSorter::prepare_for_read() internal sorting

6: Fix typo 'SpillReadDerializeBlockTime' -> 'SpillReadDeserializeBlockTime'

  • Fix counter name string in spill_iceberg_table_sink_operator.cpp
  • Fix counter name string in operator.h (PipelineXSpillLocalState)
  • Fix counter lookup string in spill_reader.h
  • Fix member variable name _spill_read_derialize_block_timer

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Based on PR apache#60540 review suggestions:

6.1: Encapsulate _current_writer in VIcebergTableWriter
  - Move _current_writer from public to private
  - Add const getter method current_writer()
  - Update SpillIcebergTableSinkLocalState to use getter

6.2: Split VIcebergSortWriter header-only into .h + .cpp
  - Move all method implementations to viceberg_sort_writer.cpp
  - Replace heavy includes (spill_stream.h, spill_stream_manager.h) with
    forward declarations in header to reduce compilation dependencies
  - Add comprehensive documentation comments to all methods and members

6.4: Fix error handling in VIcebergSortWriter::close()
  - Track internal_status for intermediate operations
  - Pass actual error status to underlying partition writer's close()
    instead of the original caller status (which could be OK)

6.5: Add explicit do_sort() in _do_spill()
  - Call _sorter->do_sort() before prepare_for_read() for clarity
  - Add detailed comments explaining the relationship with
    FullSorter::prepare_for_read() internal sorting

6.6: Fix typo 'SpillReadDerializeBlockTime' -> 'SpillReadDeserializeBlockTime'
  - Fix counter name string in spill_iceberg_table_sink_operator.cpp
  - Fix counter name string in operator.h (PipelineXSpillLocalState)
  - Fix counter lookup string in spill_reader.h
  - Fix member variable name _spill_read_derialize_block_timer
@Thearas
Copy link
Contributor

Thearas commented Mar 3, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman
Copy link
Contributor Author

/review

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary for PR #60978

Overall Assessment

This PR is a clean refactoring that splits a header-only implementation into .h + .cpp, encapsulates a public field, and fixes a typo. The code quality is improved overall. However, there are a few issues to address.

Critical Checkpoint Conclusions

1. Goal and test coverage: The PR achieves its stated goals (encapsulation, header split, typo fix, close() error handling). No new tests are added, which is acceptable for a refactor, however the typo fix is incomplete—test files still reference the old misspelled counter name, which will cause counter name mismatches at runtime during tests.

2. Is the modification minimal and focused? Yes. The changes are well-scoped.

3. Concurrency: No new concurrency concerns introduced. The refactoring preserves existing concurrency patterns.

4. Lifecycle management: No new lifecycle issues. The Defer pattern in close() is reasonable but has a subtle error-swallowing concern (see inline comment).

5. Configuration items: N/A.

6. Incompatible changes: The counter name rename (SpillReadDerializeBlockTime -> SpillReadDeserializeBlockTime) is technically a metrics name change visible in profiles. Not a compatibility concern for storage/protocol, but worth noting.

7. Parallel code paths: The typo fix must be applied to all paths that reference the counter name. Two test files are missed (see inline comment).

8. compile_check: New .cpp file is missing compile_check_begin.h/compile_check_end.h.

9. Performance: The explicit do_sort() before prepare_for_read(true) is redundant but harmless (no double-sort occurs due to internal guard). A C-style narrowing cast should use the already-existing safe helper method.

10. Other issues: One C-style cast (int)_spill_block_batch_row_count should use the _get_spill_batch_size() method that was specifically created for this purpose.

Issues Found

# Severity File Description
1 Medium viceberg_sort_writer.cpp:234 C-style cast (int) instead of using _get_spill_batch_size()
2 Medium viceberg_sort_writer.cpp (close) Defer lambda swallows _iceberg_partition_writer->close() error
3 Medium test files Typo fix incomplete: 2 test files still use old counter name
4 Low viceberg_sort_writer.cpp Missing compile_check_begin.h/compile_check_end.h

zhangstar333
zhangstar333 previously approved these changes Mar 3, 2026
1. Replace C-style cast (int)_spill_block_batch_row_count with the safe
   _get_spill_batch_size() helper that includes bounds checking, preventing
   potential narrowing conversion issues from size_t to int.

2. Propagate _iceberg_partition_writer->close() error from Defer block.
   Previously the close error was logged but silently discarded, which could
   mask data loss (caller receives OK but file was not properly closed).
   Now the error is captured in close_status and returned to the caller.

3. Add compile_check_begin.h/compile_check_end.h includes per codebase
   convention. This enables -Wconversion as error for the new .cpp file.

4. Fix typo in test files that still used old misspelled counter name
   'SpillReadDerializeBlockTime' -> 'SpillReadDeserializeBlockTime':
   - be/test/pipeline/exec/multi_cast_data_streamer_test.cpp (2 occurrences)
   - be/test/pipeline/operator/spillable_operator_test_helper.cpp (1 occurrence)
@morningman
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 28975 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 557fdd355f6849f6e5ef5eaf82a00e4c7479e210, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17687	4513	4324	4324
q2	q3	10633	752	520	520
q4	4689	355	251	251
q5	7587	1190	1024	1024
q6	172	172	145	145
q7	805	838	665	665
q8	9825	1453	1314	1314
q9	5610	4763	4701	4701
q10	6846	1895	1651	1651
q11	486	250	236	236
q12	759	571	468	468
q13	17774	4234	3438	3438
q14	232	239	212	212
q15	940	810	787	787
q16	761	715	686	686
q17	745	859	457	457
q18	5907	5312	5285	5285
q19	1485	993	614	614
q20	531	494	387	387
q21	4699	1956	1554	1554
q22	389	327	256	256
Total cold run time: 98562 ms
Total hot run time: 28975 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4783	4697	4590	4590
q2	q3	2030	2250	1746	1746
q4	882	1174	763	763
q5	4060	4366	4362	4362
q6	188	175	144	144
q7	1778	1665	1520	1520
q8	2491	2730	2616	2616
q9	7496	7422	7429	7422
q10	2675	2839	2456	2456
q11	550	423	410	410
q12	525	594	460	460
q13	4184	4779	3598	3598
q14	283	298	278	278
q15	868	826	811	811
q16	723	796	709	709
q17	1180	1660	1365	1365
q18	7331	6752	6690	6690
q19	891	868	889	868
q20	2103	2134	1984	1984
q21	3941	3497	3357	3357
q22	473	414	368	368
Total cold run time: 49435 ms
Total hot run time: 46517 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 184376 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 557fdd355f6849f6e5ef5eaf82a00e4c7479e210, data reload: false

query5	4344	641	517	517
query6	331	231	198	198
query7	4209	469	282	282
query8	333	245	240	240
query9	8727	2760	2766	2760
query10	543	386	352	352
query11	17082	17548	17211	17211
query12	204	129	130	129
query13	1369	508	370	370
query14	6626	3424	3157	3157
query14_1	2932	2927	2862	2862
query15	211	207	188	188
query16	2114	449	453	449
query17	1075	816	648	648
query18	2556	471	350	350
query19	220	230	195	195
query20	139	135	131	131
query21	220	139	117	117
query22	5573	5252	4751	4751
query23	17249	16795	16519	16519
query23_1	16588	16734	16706	16706
query24	7171	1624	1226	1226
query24_1	1229	1241	1230	1230
query25	561	455	394	394
query26	1257	259	150	150
query27	2776	468	310	310
query28	4481	1881	1888	1881
query29	824	587	512	512
query30	305	245	209	209
query31	881	714	689	689
query32	81	71	73	71
query33	513	341	272	272
query34	908	909	574	574
query35	625	683	612	612
query36	1122	1126	962	962
query37	130	112	86	86
query38	2994	2916	2904	2904
query39	893	863	849	849
query39_1	828	814	841	814
query40	226	153	135	135
query41	62	60	56	56
query42	104	100	108	100
query43	375	374	346	346
query44	
query45	195	184	187	184
query46	887	987	623	623
query47	2177	2139	2045	2045
query48	312	309	231	231
query49	625	462	379	379
query50	685	274	215	215
query51	4093	4137	4107	4107
query52	107	106	95	95
query53	288	338	272	272
query54	292	262	260	260
query55	89	80	84	80
query56	331	314	317	314
query57	1335	1335	1272	1272
query58	315	271	268	268
query59	2517	2674	2524	2524
query60	349	335	323	323
query61	151	146	146	146
query62	626	547	557	547
query63	320	276	275	275
query64	4925	1267	1006	1006
query65	
query66	1455	449	359	359
query67	16456	16497	16262	16262
query68	
query69	409	306	310	306
query70	1007	977	900	900
query71	340	299	296	296
query72	2799	2680	2614	2614
query73	539	551	333	333
query74	10003	9961	9791	9791
query75	2876	2750	2495	2495
query76	2315	1032	667	667
query77	390	391	312	312
query78	11199	11436	10726	10726
query79	1162	800	603	603
query80	1555	666	570	570
query81	569	301	256	256
query82	1044	154	117	117
query83	364	273	260	260
query84	261	125	114	114
query85	1239	547	425	425
query86	422	317	305	305
query87	3108	3132	3003	3003
query88	3580	2691	2664	2664
query89	433	368	353	353
query90	1861	174	169	169
query91	164	155	135	135
query92	81	78	68	68
query93	988	897	507	507
query94	624	327	278	278
query95	582	358	382	358
query96	644	532	229	229
query97	2488	2494	2425	2425
query98	232	223	226	223
query99	993	998	896	896
Total cold run time: 255576 ms
Total hot run time: 184376 ms

@hello-stephen
Copy link
Contributor

BE UT Coverage Report

Increment line coverage 1.32% (3/228) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.58% (19642/37354)
Line Coverage 36.22% (183463/506567)
Region Coverage 32.49% (142232/437735)
Branch Coverage 33.45% (61694/184436)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 1.32% (3/228) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.57% (26179/36576)
Line Coverage 54.36% (274520/505014)
Region Coverage 51.78% (228810/441872)
Branch Coverage 53.07% (98176/185000)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants