Conversation
|
Please merge or rebase master branch which fixed CI failure. |
There was a problem hiding this comment.
Pull request overview
This PR introduces an experimental Active Task mechanism for bthread workers to run non-blocking maintenance hooks (e.g. harvesting io_uring completions) and adds a local pin + within-worker wake path so waiters can reliably resume on the same worker without cross-thread routing.
Changes:
- Add Active Task registration/types in
bthread/unstable.hand snapshot/init/destroy integration inTaskControl/TaskGroup. - Implement pin-aware scheduling and butex support (
bthread_butex_wait_local,bthread_butex_wake_within, pinned runqueues, and strict rejection for generic butex wake APIs). - Add a comprehensive unit test suite and Chinese documentation for Active Task usage; link docs from READMEs.
Reviewed changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
test/bthread_active_task_unittest.cpp |
Adds end-to-end tests for local wait/wake, strict invariants, pinned/no-steal behavior, and competition cases. |
src/bthread/unstable.h |
Defines the experimental Active Task API surface and callback/context structs. |
src/bthread/task_meta.h |
Extends task metadata with local pin state (home group/control/tag, depth, enabled flag). |
src/bthread/task_group_inl.h |
Adds pinned runqueue push and flushes pinned remote nosignal tasks. |
src/bthread/task_group.h |
Declares active-task worker lifecycle/harvest plumbing and pin-aware routing helpers/queues. |
src/bthread/task_group.cpp |
Implements active-task init/destroy/harvest, worker-loop polling/idle integration, and pin-aware ready-to-run routing. |
src/bthread/task_control.h |
Stores an active-task type snapshot on init; exposes internal snapshot helper. |
src/bthread/task_control.cpp |
Captures active-task registry snapshot during init; initializes/destroys per-worker active tasks; extends signal counting. |
src/bthread/parking_lot.h |
Adds timed wait support (relative timeout) used by active-task idle waiting. |
src/bthread/butex.h |
Documents strict pinned-waiter behavior and exposes butex_wake_to_task_group helper. |
src/bthread/butex.cpp |
Adds within-worker wake helper, strict rejection for pinned waiters, and pin-aware wake routing adjustments. |
src/bthread/bthread.cpp |
Adds active-task type registry + registration API and implements bthread_butex_wait_local/wake_within. |
docs/cn/bthread_active_task.md |
New documentation describing the Active Task model, constraints, and recommended usage pattern. |
README_cn.md |
Links the new Active Task documentation from the Chinese README. |
README.md |
Links the new Active Task documentation from the main README. |
.gitignore |
Adds .cache to ignored paths. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // wake_within() runs on target_group's owner worker. For pinned waiters, | ||
| // if owner-local pinned runqueue is already full, report EAGAIN and | ||
| // keep waiter on butex list for next harvest retry, instead of | ||
| // blocking/spinning here. | ||
| if (is_pinned_waiter(bw) && target_group->pinned_rq_full()) { | ||
| errno = EAGAIN; | ||
| return -1; | ||
| } | ||
| bw->RemoveFromList(); | ||
| bw->container.store(NULL, butil::memory_order_relaxed); | ||
| } | ||
|
|
||
| unsleep_if_necessary(bbw, get_global_timer_thread()); | ||
| target_group->ready_to_run(bbw->task_meta, true); | ||
| return 1; |
There was a problem hiding this comment.
butex_wake_to_task_group tries to avoid blocking in active-task hooks by checking pinned_rq_full() and returning EAGAIN, but the check isn't atomic with the subsequent enqueue (target_group->ready_to_run(...)). If the pinned runqueue becomes full between the check and ready_to_run, push_pinned_rq may spin/sleep inside the hook, violating the non-blocking intent. Consider adding a non-blocking/try-enqueue path for within-wake (or re-check/return EAGAIN on enqueue failure) so hooks never block.
There was a problem hiding this comment.
in this code path, pinned_rq_full() and the following enqueue are executed by the same owner worker, so the TOCTOU case you mentioned does not occur here.
butex_wake_within_active_task() is only allowed from the active-task hook of the current TaskGroup (validated in validate_active_task_hook_ctx). For pinned waiters, ready_to_run() routes to ready_to_run_pinned_local() on the same home worker. Also, _pinned_rq is owner-local (not used by steal path, and remote routing for pinned tasks goes to _pinned_remote_rq), so there is no concurrent producer that can make _pinned_rq become full between the check and enqueue.
so this check is intended as a fast fail (EAGAIN) when already full, not as a cross-thread non-blocking CAS gate.
| bthread::TaskControl& GetSharedSingleWorkerTaskControl() { | ||
| return GetSharedTaskControl(); | ||
| } | ||
|
|
||
| bthread::TaskControl& GetSharedTwoWorkerTaskControl() { | ||
| return GetSharedTaskControl(); | ||
| } |
There was a problem hiding this comment.
These helpers are named as if they return different TaskControl configurations, but both currently return the same shared TaskControl initialized with concurrency=2. This is misleading for readers and can cause incorrect assumptions in future edits; please either rename them to reflect the shared 2-worker instance, or actually initialize distinct single-worker vs two-worker TaskControls (if feasible).
…ou/add_active_task


What problem does this PR solve?
Issue Number: resolve #3212
Problem Summary:
_remote_rq) and incur extra bthread switch / scheduling cost.waiton a request and be resumed by the same worker’s event/harvest loop, without being stolen or migrated during the wait lifecycle.What is changed and the side effects?
Changed:
bthread/unstable.h(startup registration, per-worker init/destroy, singleharvesthook).poll_every_nswitch=1)bthread_butex_wake_within(ctx, butex):home_group)bthread_butex_wait_local(...):_pinned_rq_pinned_remote_rqready_to_run*, timeout, interruption, and genericbutex_wake*exchangebypass bug).wake_within == 0competition cases (timeout/interruption/generic wake)docs/cn/bthread_active_task.md) with Plan-A usage (butex_wait_local + harvest + wake_within).Side effects:
Performance effects:
poll_every_nswitch=1), which improves completion->resume latency but may increase CPU usage.bthread_butex_wait_locallose steal-based load balancing during the wait lifecycle by design (correctness/locality tradeoff).Breaking backward compatibility:
bthread/unstable.h(experimental/UNSTABLE).bthread_butex_wait_local(...).Check List: