
[opt](TabletScheduler) introduce TableDispatchScheduler for table-level tablet scheduling#60955

Open
uchenily wants to merge 1 commit into apache:master from uchenily:table-level-schd

@uchenily (Contributor) commented Mar 2, 2026

What problem does this PR solve?

This PR optimizes the TabletScheduler by introducing table-level dispatching. Instead of processing tablets sequentially from a single global queue, the scheduler now dispatches each tablet to a queue specific to its table, and a pool of worker threads handles those queues.

The change has two goals: to prevent a long-held lock or a potential deadlock on one table from blocking the entire TabletScheduler, and to improve overall scheduling throughput by allowing multiple tables to be processed concurrently.
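To make the idea concrete, here is a minimal, self-contained sketch of table-level dispatching. It is not the code in this PR: the class name `TableDispatchSketch`, its methods, and the `BiConsumer` handler are all hypothetical, and the real TabletScheduler carries far more state. The sketch only shows the core property described above: each table has its own queue, at most one worker drains a given table at a time, and a handler that blocks on one table does not stop another table from making progress.

```java
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Hypothetical illustration only; names and structure are assumptions, not the PR's API. */
final class TableDispatchSketch {
    // One FIFO queue of tablet ids per table id.
    private final Map<Long, Queue<Long>> tableQueues = new ConcurrentHashMap<>();
    // Tables whose queue is currently being drained by a worker.
    private final Set<Long> activeTables = ConcurrentHashMap.newKeySet();
    private final ExecutorService workers;
    private final BiConsumer<Long, Long> handler; // receives (tableId, tabletId)

    TableDispatchSketch(int workerCount, BiConsumer<Long, Long> handler) {
        this.workers = Executors.newFixedThreadPool(workerCount);
        this.handler = handler;
    }

    /** Enqueue a tablet on its table's queue and make sure a worker will drain it. */
    void dispatch(long tableId, long tabletId) {
        tableQueues.computeIfAbsent(tableId, id -> new ConcurrentLinkedQueue<>()).add(tabletId);
        scheduleDrain(tableId);
    }

    private void scheduleDrain(long tableId) {
        // Only the caller that flips the table to "active" submits a drain task.
        if (activeTables.add(tableId)) {
            workers.execute(() -> drain(tableId));
        }
    }

    private void drain(long tableId) {
        Queue<Long> queue = tableQueues.get(tableId);
        try {
            Long tabletId;
            while ((tabletId = queue.poll()) != null) {
                handler.accept(tableId, tabletId);
            }
        } finally {
            activeTables.remove(tableId);
            // A tablet may have arrived between the last poll and the remove above.
            // (A real implementation would also guard against rescheduling after shutdown.)
            if (!queue.isEmpty()) {
                scheduleDrain(tableId);
            }
        }
    }

    boolean shutdownAndWait(long seconds) {
        workers.shutdown();
        return await(() -> workers.awaitTermination(seconds, TimeUnit.SECONDS));
    }

    private interface Wait { boolean run() throws InterruptedException; }

    private static boolean await(Wait wait) {
        try {
            return wait.run();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns true if table 2 finished its tablets while table 1's handler was still blocked. */
    static boolean demo() {
        CountDownLatch unblockTable1 = new CountDownLatch(1);
        CountDownLatch table2Done = new CountDownLatch(2);
        TableDispatchSketch scheduler = new TableDispatchSketch(2, (tableId, tabletId) -> {
            if (tableId == 1L) {
                await(() -> unblockTable1.await(10, TimeUnit.SECONDS)); // simulate a long-held table lock
            } else {
                table2Done.countDown();
            }
        });
        scheduler.dispatch(1L, 100L);
        scheduler.dispatch(2L, 200L);
        scheduler.dispatch(2L, 201L);
        boolean progressed = await(() -> table2Done.await(5, TimeUnit.SECONDS));
        unblockTable1.countDown();
        return progressed && scheduler.shutdownAndWait(5);
    }

    public static void main(String[] args) {
        System.out.println("table 2 progressed while table 1 was blocked: " + demo());
    }
}
```

Note that in this sketch a blocked table still occupies one worker thread, so isolation holds only while fewer tables are stuck than there are workers. How the PR sizes its pool or handles stuck workers is not stated in the description.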

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@uchenily (Contributor, Author) commented Mar 2, 2026

run buildall

…el tablet scheduling

This PR optimizes the TabletScheduler by introducing a table-level dispatching mechanism.
Instead of processing tablets sequentially from a single global queue, the scheduler now
dispatches tablets to table-specific queues handled by a pool of worker threads.

This optimization aims to prevent a long-held lock or a potential deadlock on a specific
table from blocking the entire TabletScheduler and enhance overall scheduling throughput
by allowing multiple tables to be processed concurrently.

@uchenily (Contributor, Author) commented Mar 3, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 28613 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6f31d1bac0b441c0a33a5a1cf249aa29ec8e9341, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17641	4446	4293	4293
q2	
q3	10649	776	519	519
q4	4687	354	251	251
q5	7551	1191	1024	1024
q6	176	177	147	147
q7	776	827	692	692
q8	9735	1460	1279	1279
q9	4993	4758	4721	4721
q10	6829	1889	1634	1634
q11	452	254	251	251
q12	741	568	468	468
q13	17795	4200	3405	3405
q14	230	234	211	211
q15	962	794	782	782
q16	740	725	653	653
q17	724	897	415	415
q18	5983	5259	5311	5259
q19	1418	955	593	593
q20	502	497	388	388
q21	4499	1834	1395	1395
q22	345	287	233	233
Total cold run time: 97428 ms
Total hot run time: 28613 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4440	4341	4368	4341
q2	
q3	1752	2158	1708	1708
q4	824	1157	760	760
q5	4007	4316	4319	4316
q6	178	175	142	142
q7	1729	1593	1486	1486
q8	2401	2634	2513	2513
q9	7462	7350	7491	7350
q10	2734	2874	2449	2449
q11	513	430	413	413
q12	507	592	453	453
q13	4028	4472	3605	3605
q14	281	304	272	272
q15	865	808	831	808
q16	758	759	719	719
q17	1156	1496	1322	1322
q18	6957	7035	6493	6493
q19	910	849	878	849
q20	2088	2169	2033	2033
q21	3944	3466	3619	3466
q22	447	451	396	396
Total cold run time: 47981 ms
Total hot run time: 45894 ms

@doris-robot

TPC-DS: Total hot run time: 183411 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6f31d1bac0b441c0a33a5a1cf249aa29ec8e9341, data reload: false

query5	4828	650	506	506
query6	337	214	203	203
query7	4227	463	278	278
query8	332	238	226	226
query9	8756	2779	2805	2779
query10	532	370	331	331
query11	17042	17528	17257	17257
query12	192	132	133	132
query13	1262	495	356	356
query14	6706	3329	2999	2999
query14_1	2956	2936	2854	2854
query15	208	196	182	182
query16	1103	473	469	469
query17	1341	749	610	610
query18	2976	446	390	390
query19	260	243	196	196
query20	138	136	135	135
query21	215	141	118	118
query22	5604	5003	4557	4557
query23	17041	16784	16586	16586
query23_1	16593	16677	16548	16548
query24	7289	1629	1219	1219
query24_1	1215	1235	1226	1226
query25	538	457	405	405
query26	1241	261	146	146
query27	2766	467	284	284
query28	4531	1908	1909	1908
query29	784	555	469	469
query30	311	245	211	211
query31	877	730	656	656
query32	79	72	68	68
query33	507	349	284	284
query34	911	921	561	561
query35	633	678	599	599
query36	1039	1130	954	954
query37	134	98	86	86
query38	2989	2996	2913	2913
query39	898	868	851	851
query39_1	833	837	820	820
query40	231	152	138	138
query41	62	59	60	59
query42	104	104	101	101
query43	376	383	345	345
query44	
query45	200	197	186	186
query46	862	984	639	639
query47	2140	2131	2046	2046
query48	340	321	231	231
query49	639	466	393	393
query50	673	278	212	212
query51	4091	4116	4067	4067
query52	112	106	96	96
query53	297	334	275	275
query54	292	276	262	262
query55	91	80	79	79
query56	311	346	307	307
query57	1373	1303	1270	1270
query58	294	282	276	276
query59	2587	2700	2591	2591
query60	341	338	326	326
query61	149	145	142	142
query62	630	595	542	542
query63	320	299	284	284
query64	4903	1276	1048	1048
query65	
query66	1451	468	378	378
query67	16444	16393	16063	16063
query68	
query69	415	326	305	305
query70	1015	945	964	945
query71	384	303	296	296
query72	2826	2640	2426	2426
query73	541	538	315	315
query74	9943	9897	9748	9748
query75	2855	2737	2476	2476
query76	2293	1025	686	686
query77	357	377	324	324
query78	11162	11226	10620	10620
query79	1156	791	593	593
query80	789	642	538	538
query81	503	283	243	243
query82	1315	157	120	120
query83	352	267	242	242
query84	251	119	98	98
query85	897	570	510	510
query86	379	300	321	300
query87	3155	3165	2994	2994
query88	3522	2699	2676	2676
query89	434	375	350	350
query90	1859	179	174	174
query91	180	169	150	150
query92	79	75	75	75
query93	905	821	518	518
query94	522	343	304	304
query95	600	354	324	324
query96	644	524	231	231
query97	2457	2506	2396	2396
query98	231	219	229	219
query99	1010	976	930	930
Total cold run time: 253951 ms
Total hot run time: 183411 ms
