Skip to content

[Fix](Cloud)decouple min pipeline executor size from ConnectContext#60958

Open
CalvinKirs wants to merge 5 commits intoapache:masterfrom
CalvinKirs:master-scan-params
Open

[Fix](Cloud)decouple min pipeline executor size from ConnectContext#60958
CalvinKirs wants to merge 5 commits intoapache:masterfrom
CalvinKirs:master-scan-params

Conversation

@CalvinKirs
Copy link
Member

#60648

getMinPipelineExecutorSize in cloud paths implicitly depended on ConnectContext, which caused two issues:

Unstable behavior when no thread-local context is available (e.g. internal/async paths). Unclear API semantics since callers could not explicitly specify the target cluster. This PR makes the API explicit by requiring clusterName.

What Changed
Removed the no-arg getMinPipelineExecutorSize() API and kept only: getMinPipelineExecutorSize(String clusterName). Unified SystemInfoService and CloudSystemInfoService implementations to the string-arg API. Updated SessionVariable#getParallelExecInstanceNum() to call the string-arg API, with cluster resolved from session/auth information (instead of directly depending on ConnectContext). Added synchronization in ConnectContext:

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

apache#60648

getMinPipelineExecutorSize in cloud paths implicitly depended on ConnectContext, which caused two issues:

Unstable behavior when no thread-local context is available (e.g. internal/async paths).
Unclear API semantics since callers could not explicitly specify the target cluster.
This PR makes the API explicit by requiring clusterName.

What Changed
Removed the no-arg getMinPipelineExecutorSize() API and kept only: getMinPipelineExecutorSize(String clusterName).
Unified SystemInfoService and CloudSystemInfoService implementations to the string-arg API.
Updated SessionVariable#getParallelExecInstanceNum() to call the string-arg API, with cluster resolved from session/auth information (instead of directly depending on ConnectContext).
Added synchronization in ConnectContext:
@Thearas
Copy link
Contributor

Thearas commented Mar 2, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@CalvinKirs
Copy link
Member Author

run buildall

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 38.03% (54/142) 🎉
Increment coverage report
Complete coverage report

@@ -4176,6 +4180,10 @@ public void setDebugSkipFoldConstant(boolean debugSkipFoldConstant) {
}

public int getParallelExecInstanceNum() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this unused method

return clusterName;
}

private static String resolveCloudClusterName(ConnectContext connectContext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is same as in SessionVariables.java?

@morningman
Copy link
Contributor

run buildall

morningman
morningman previously approved these changes Mar 3, 2026
Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 3, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

PR approved by anyone and no changes requested.

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Mar 3, 2026
@CalvinKirs
Copy link
Member Author

run buildall

CalvinKirs added a commit to CalvinKirs/incubator-doris that referenced this pull request Mar 3, 2026
CalvinKirs added a commit to CalvinKirs/incubator-doris that referenced this pull request Mar 3, 2026
@doris-robot
Copy link

TPC-H: Total hot run time: 29137 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5dc93b541dc1fd305b5fa6a89aeacc72bddc28b6, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17668	4559	4289	4289
q2	q3	10638	807	523	523
q4	4681	380	270	270
q5	7574	1205	1038	1038
q6	179	174	153	153
q7	791	863	671	671
q8	9297	1483	1434	1434
q9	4891	4744	4672	4672
q10	6849	1883	1656	1656
q11	477	253	243	243
q12	754	584	491	491
q13	17751	4232	3446	3446
q14	225	229	211	211
q15	960	811	789	789
q16	756	730	669	669
q17	708	921	425	425
q18	5951	5412	5234	5234
q19	1122	985	640	640
q20	525	519	400	400
q21	4665	2047	1594	1594
q22	408	325	289	289
Total cold run time: 96870 ms
Total hot run time: 29137 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4591	4521	4532	4521
q2	q3	1807	2211	1787	1787
q4	912	1205	793	793
q5	4098	4400	4379	4379
q6	196	188	146	146
q7	1879	1646	1511	1511
q8	2615	2736	2736	2736
q9	7615	7425	7346	7346
q10	2672	2790	2403	2403
q11	515	432	414	414
q12	495	582	437	437
q13	3955	4446	3686	3686
q14	291	300	271	271
q15	833	806	842	806
q16	725	735	687	687
q17	1178	1455	1312	1312
q18	7026	6815	6652	6652
q19	1018	1008	945	945
q20	2240	2177	2021	2021
q21	4045	3483	3497	3483
q22	483	440	404	404
Total cold run time: 49189 ms
Total hot run time: 46740 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 184054 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5dc93b541dc1fd305b5fa6a89aeacc72bddc28b6, data reload: false

query5	4923	626	508	508
query6	334	222	205	205
query7	4224	462	280	280
query8	330	264	251	251
query9	8758	2720	2750	2720
query10	536	369	332	332
query11	16939	17404	17196	17196
query12	193	126	127	126
query13	1419	465	347	347
query14	6587	3469	3123	3123
query14_1	2994	3070	3023	3023
query15	211	207	181	181
query16	1052	519	514	514
query17	1521	799	686	686
query18	2568	440	345	345
query19	226	207	182	182
query20	138	133	130	130
query21	210	136	122	122
query22	4842	5068	4747	4747
query23	17160	16736	16574	16574
query23_1	16681	16665	16686	16665
query24	7358	1611	1232	1232
query24_1	1246	1258	1245	1245
query25	570	480	428	428
query26	1237	278	158	158
query27	2753	468	298	298
query28	4498	1879	1886	1879
query29	826	581	503	503
query30	319	248	213	213
query31	881	741	651	651
query32	88	76	75	75
query33	525	338	300	300
query34	927	909	550	550
query35	630	684	600	600
query36	1108	1130	927	927
query37	140	98	88	88
query38	2987	2947	2864	2864
query39	887	875	855	855
query39_1	843	824	878	824
query40	243	157	145	145
query41	111	58	59	58
query42	109	109	104	104
query43	387	381	354	354
query44	
query45	207	190	196	190
query46	887	987	604	604
query47	2113	2183	2084	2084
query48	311	317	231	231
query49	619	462	374	374
query50	701	273	211	211
query51	4055	4103	4036	4036
query52	105	110	97	97
query53	287	341	283	283
query54	301	266	253	253
query55	87	83	80	80
query56	313	319	321	319
query57	1360	1353	1280	1280
query58	289	283	278	278
query59	2550	2709	2563	2563
query60	336	330	328	328
query61	144	142	151	142
query62	612	583	543	543
query63	318	281	273	273
query64	4891	1309	1022	1022
query65	
query66	1439	475	359	359
query67	16423	16427	16373	16373
query68	
query69	407	311	289	289
query70	908	944	984	944
query71	328	314	293	293
query72	2855	2673	2419	2419
query73	533	540	319	319
query74	10031	9906	9772	9772
query75	2840	2749	2417	2417
query76	2311	1053	688	688
query77	357	372	309	309
query78	11141	11666	10697	10697
query79	1122	800	598	598
query80	1369	622	559	559
query81	568	283	242	242
query82	1016	149	115	115
query83	351	271	243	243
query84	252	120	100	100
query85	937	462	424	424
query86	424	299	317	299
query87	3109	3102	2979	2979
query88	3585	2706	2667	2667
query89	426	363	340	340
query90	1960	183	178	178
query91	173	154	134	134
query92	85	78	75	75
query93	961	833	512	512
query94	655	310	297	297
query95	623	380	330	330
query96	650	511	227	227
query97	2477	2517	2365	2365
query98	231	223	223	223
query99	1013	1001	913	913
Total cold run time: 253870 ms
Total hot run time: 184054 ms

@hello-stephen
Copy link
Contributor

FE Regression Coverage Report

Increment line coverage 59.52% (100/168) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants