[fix](materialization) Make merge_multi_response detect insufficient backend rows and return error instead of DCHECK#60970

Open
HappenLee wants to merge 1 commit into apache:master from HappenLee:materialization

Conversation

@HappenLee
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings March 3, 2026 04:31
@Thearas
Contributor

Thearas commented Mar 3, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@HappenLee
Contributor Author

run buildall

Copilot AI left a comment

Pull request overview

This PR improves runtime robustness by converting a debug-only DCHECK into a real error in MaterializationSharedState::merge_multi_response, and also expands query-cache related logic to support multiple scan ranges/tablets while adjusting FE shuffle parallelism behavior when query cache is enabled.

Changes:

  • BE: Replace a DCHECK in merge_multi_response() with an explicit bounds check and InternalError to avoid out-of-bounds reads in release builds.
  • BE: Rework query-cache key building to handle multiple scan ranges/tablets, and update the cache source operator + tests accordingly.
  • FE: Add logic to cap UnassignedShuffleJob instance count when query cache is enabled, with new unit tests.
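
The first change in the list above can be illustrated with a minimal sketch. It assumes a heavily simplified stand-in for the real code: `Status`, `BackendBlock`, and `copy_row_checked` are hypothetical names invented for this illustration, not Doris types and not the actual body of `merge_multi_response`. The only point is that a `DCHECK` compiles away in release builds, whereas an explicit bounds check returns an error on every build.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified status type (a stand-in for a real Status class).
struct Status {
    bool ok_flag = true;
    std::string message;

    static Status OK() { return Status(); }
    static Status InternalError(std::string msg) {
        Status s;
        s.ok_flag = false;
        s.message = std::move(msg);
        return s;
    }
    bool ok() const { return ok_flag; }
};

// Hypothetical stand-in for the rows returned by one backend.
struct BackendBlock {
    std::vector<int> rows;
};

// Appends row `row_idx` of `block` to `out`.
// A debug-only DCHECK(row_idx < block.rows.size()) would vanish in release
// builds and allow an out-of-bounds read; this check runs on every build.
Status copy_row_checked(const BackendBlock& block, std::size_t row_idx,
                        std::vector<int>& out) {
    if (row_idx >= block.rows.size()) {
        return Status::InternalError(
                "backend block has insufficient rows: need index " +
                std::to_string(row_idx) + ", block has " +
                std::to_string(block.rows.size()));
    }
    out.push_back(block.rows[row_idx]);
    return Status::OK();
}
```

With this shape, a caller can propagate the error to the query instead of crashing a debug build or reading past the end of the block in a release build.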

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

Summary per file:

File | Description
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/distribute/worker/job/UnassignedShuffleJob.java | Changes parallelism computation to apply a query-cache-based instance cap.
fe/fe-core/src/test/java/org/apache/doris/nereids/trees/plans/distribute/worker/job/UnassignedShuffleJobTest.java | Adds unit tests covering the new degree-of-parallelism and instance-limiting behavior.
be/src/pipeline/query_cache/query_cache.h | Extends QueryCache::build_cache_key to support multiple tablets and adds additional validations.
be/src/pipeline/exec/cache_source_operator.cpp | Removes the single-scan-range restriction and updates profiling output for multiple tablets.
be/test/pipeline/exec/query_cache_test.cpp | Updates/adds tests for multi-tablet cache key behavior and error cases.
be/src/util/bitmap_value.h | Switches BITMAP fast-union implementation to use Roaring64Map::fastunion.
be/src/pipeline/exec/materialization_opertor.cpp | Converts a DCHECK into a production error when backend blocks have insufficient rows.

@doris-robot

TPC-H: Total hot run time: 28908 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1559eefee5715a4aa4f19ba45c1853dfefc9841e, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17650	4539	4328	4328
q2	q3	10645	827	542	542
q4	4694	366	253	253
q5	7850	1196	1039	1039
q6	197	176	151	151
q7	826	851	684	684
q8	10621	1491	1310	1310
q9	5788	4827	4765	4765
q10	6849	1915	1622	1622
q11	496	243	252	243
q12	752	576	468	468
q13	17842	4228	3430	3430
q14	231	235	211	211
q15	973	801	788	788
q16	772	717	667	667
q17	724	873	462	462
q18	6341	5347	5226	5226
q19	1137	984	633	633
q20	523	503	409	409
q21	4494	1876	1430	1430
q22	347	293	247	247
Total cold run time: 99752 ms
Total hot run time: 28908 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4491	4363	4373	4363
q2	q3	1761	2166	1719	1719
q4	838	1163	780	780
q5	4043	4359	4321	4321
q6	185	177	144	144
q7	1730	1584	1459	1459
q8	2410	2696	2524	2524
q9	7857	7479	7274	7274
q10	2671	2879	2477	2477
q11	529	459	431	431
q12	495	616	472	472
q13	3993	4415	3715	3715
q14	280	325	292	292
q15	840	823	857	823
q16	713	753	694	694
q17	1148	1541	1332	1332
q18	7174	6762	6549	6549
q19	938	973	971	971
q20	2091	2194	2217	2194
q21	4052	3615	3362	3362
q22	434	426	398	398
Total cold run time: 48673 ms
Total hot run time: 46294 ms

@doris-robot

TPC-DS: Total hot run time: 184578 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1559eefee5715a4aa4f19ba45c1853dfefc9841e, data reload: false

query5	4337	651	534	534
query6	335	215	202	202
query7	4205	464	271	271
query8	339	244	239	239
query9	8733	2789	2791	2789
query10	534	384	350	350
query11	17058	17023	16673	16673
query12	186	133	129	129
query13	1264	453	364	364
query14	6331	3257	3013	3013
query14_1	2896	2801	2839	2801
query15	199	192	179	179
query16	988	458	454	454
query17	1111	725	624	624
query18	2510	451	356	356
query19	211	210	188	188
query20	143	132	131	131
query21	222	151	123	123
query22	5042	5504	5462	5462
query23	17619	17180	16889	16889
query23_1	16990	17384	16974	16974
query24	7355	1613	1218	1218
query24_1	1225	1243	1240	1240
query25	533	442	401	401
query26	1240	267	149	149
query27	2758	464	286	286
query28	4518	1889	1887	1887
query29	805	553	474	474
query30	313	253	211	211
query31	855	727	648	648
query32	83	79	71	71
query33	514	332	286	286
query34	947	912	561	561
query35	641	669	589	589
query36	1069	1124	960	960
query37	138	92	84	84
query38	2937	2993	2820	2820
query39	882	870	849	849
query39_1	831	880	819	819
query40	231	151	135	135
query41	65	64	60	60
query42	110	104	101	101
query43	368	378	354	354
query44	
query45	197	191	180	180
query46	888	986	626	626
query47	2094	2114	2027	2027
query48	305	332	237	237
query49	639	464	395	395
query50	673	294	223	223
query51	4076	4123	4020	4020
query52	107	107	95	95
query53	294	342	288	288
query54	296	266	263	263
query55	92	88	81	81
query56	324	322	302	302
query57	1362	1330	1272	1272
query58	296	276	275	275
query59	2704	2795	2544	2544
query60	346	333	315	315
query61	155	148	183	148
query62	629	581	526	526
query63	314	283	278	278
query64	4947	1280	1002	1002
query65	
query66	1446	467	352	352
query67	16337	16497	16279	16279
query68	
query69	404	298	286	286
query70	988	914	951	914
query71	336	310	324	310
query72	2792	2702	2480	2480
query73	550	545	325	325
query74	10007	9871	9770	9770
query75	2871	2777	2494	2494
query76	2269	1062	692	692
query77	379	397	325	325
query78	11132	11329	10680	10680
query79	1151	796	599	599
query80	1348	647	576	576
query81	562	286	257	257
query82	1010	152	118	118
query83	336	272	260	260
query84	255	121	104	104
query85	970	574	513	513
query86	418	319	309	309
query87	3139	3156	3021	3021
query88	3610	2732	2679	2679
query89	422	370	350	350
query90	1988	181	173	173
query91	158	154	135	135
query92	74	77	68	68
query93	927	836	510	510
query94	631	328	294	294
query95	594	412	330	330
query96	663	513	234	234
query97	2493	2505	2421	2421
query98	232	219	226	219
query99	1007	1034	900	900
Total cold run time: 253267 ms
Total hot run time: 184578 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.57% (19636/37352)
Line Coverage 36.18% (183288/506545)
Region Coverage 32.47% (142133/437740)
Branch Coverage 33.44% (61675/184438)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.12% (18696/36574)
Line Coverage 35.03% (176913/504992)
Region Coverage 30.96% (136792/441877)
Branch Coverage 32.23% (59627/185002)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 51.11% (18694/36573)
Line Coverage 35.03% (176882/504962)
Region Coverage 30.95% (136768/441852)
Branch Coverage 32.23% (59612/184980)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 56.69% (20733/36574)
Line Coverage 39.77% (200826/504991)
Region Coverage 36.17% (159812/441873)
Branch Coverage 36.94% (68332/184998)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.02% (20855/36574)
Line Coverage 40.10% (202516/504991)
Region Coverage 36.54% (161454/441873)
Branch Coverage 37.34% (69073/184998)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.13% (20894/36574)
Line Coverage 40.19% (202957/504991)
Region Coverage 36.66% (161981/441873)
Branch Coverage 37.46% (69292/184998)

…backend rows and return error instead of DCHECK
@HappenLee
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 28769 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8d0836b792a8c06009d09c8ab2049418d32e3e78, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17633	4392	4267	4267
q2	q3	10649	786	516	516
q4	4670	363	248	248
q5	7549	1188	1026	1026
q6	180	175	144	144
q7	773	830	656	656
q8	9397	1465	1292	1292
q9	4956	4770	4732	4732
q10	6870	1871	1650	1650
q11	465	271	266	266
q12	743	631	475	475
q13	17769	4232	3408	3408
q14	228	230	211	211
q15	945	790	811	790
q16	755	727	675	675
q17	730	864	425	425
q18	5920	5473	5272	5272
q19	1424	991	610	610
q20	508	505	389	389
q21	5043	1909	1428	1428
q22	380	342	289	289
Total cold run time: 97587 ms
Total hot run time: 28769 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4629	4598	4671	4598
q2	q3	1846	2196	1777	1777
q4	875	1201	762	762
q5	4058	4509	4369	4369
q6	188	177	142	142
q7	1765	1642	1544	1544
q8	2506	2700	2558	2558
q9	7493	7312	7333	7312
q10	2724	2849	2488	2488
q11	514	444	416	416
q12	498	584	467	467
q13	4230	4408	3570	3570
q14	283	292	274	274
q15	858	851	828	828
q16	752	829	741	741
q17	1302	1583	1320	1320
q18	7198	6862	6639	6639
q19	887	891	865	865
q20	2113	2197	2017	2017
q21	4034	3499	3376	3376
q22	451	421	372	372
Total cold run time: 49204 ms
Total hot run time: 46435 ms

@doris-robot

TPC-DS: Total hot run time: 183614 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8d0836b792a8c06009d09c8ab2049418d32e3e78, data reload: false

query5	4325	628	514	514
query6	316	217	218	217
query7	4231	473	268	268
query8	335	247	238	238
query9	8734	2757	2812	2757
query10	511	402	336	336
query11	16997	17325	17209	17209
query12	181	135	153	135
query13	1361	477	347	347
query14	6546	3388	3039	3039
query14_1	2979	2965	2890	2890
query15	203	192	182	182
query16	999	489	490	489
query17	1203	749	628	628
query18	2996	486	361	361
query19	201	212	187	187
query20	143	152	136	136
query21	215	151	123	123
query22	5199	5142	4591	4591
query23	17156	16757	16510	16510
query23_1	16741	16628	16632	16628
query24	7217	1608	1217	1217
query24_1	1239	1250	1256	1250
query25	534	456	400	400
query26	1224	252	149	149
query27	2792	476	291	291
query28	4520	1879	1877	1877
query29	820	566	476	476
query30	316	250	212	212
query31	863	742	650	650
query32	77	69	71	69
query33	505	334	285	285
query34	920	908	559	559
query35	623	691	584	584
query36	1063	1133	960	960
query37	135	94	86	86
query38	2986	2949	2864	2864
query39	890	880	855	855
query39_1	828	838	818	818
query40	235	160	138	138
query41	69	68	64	64
query42	108	105	105	105
query43	372	389	365	365
query44	
query45	199	192	188	188
query46	898	997	632	632
query47	2125	2165	2091	2091
query48	330	331	236	236
query49	648	485	394	394
query50	697	282	218	218
query51	4106	4127	4048	4048
query52	109	109	100	100
query53	295	354	293	293
query54	330	289	273	273
query55	95	87	86	86
query56	323	340	336	336
query57	1359	1348	1275	1275
query58	298	285	290	285
query59	2592	2751	2557	2557
query60	345	352	344	344
query61	169	171	171	171
query62	629	605	546	546
query63	322	282	287	282
query64	5078	1343	1059	1059
query65	
query66	1487	465	365	365
query67	16788	16489	16299	16299
query68	
query69	393	328	309	309
query70	1053	1014	970	970
query71	347	320	302	302
query72	2947	2644	2390	2390
query73	535	564	319	319
query74	9999	9915	9715	9715
query75	2861	2743	2447	2447
query76	2283	1025	705	705
query77	371	384	317	317
query78	11149	11439	10642	10642
query79	2439	788	625	625
query80	1808	622	537	537
query81	570	284	248	248
query82	971	160	118	118
query83	338	266	242	242
query84	254	115	102	102
query85	887	470	430	430
query86	442	295	300	295
query87	3106	3110	2974	2974
query88	3572	2696	2638	2638
query89	429	379	349	349
query90	2006	178	175	175
query91	164	167	144	144
query92	78	69	69	69
query93	1176	825	506	506
query94	639	325	301	301
query95	575	343	383	343
query96	644	505	230	230
query97	2438	2476	2397	2397
query98	231	217	215	215
query99	1013	986	893	893
Total cold run time: 256081 ms
Total hot run time: 183614 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 16.67% (1/6) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.57% (19637/37354)
Line Coverage 36.18% (183296/506564)
Region Coverage 32.49% (142215/437735)
Branch Coverage 33.45% (61699/184442)
