[client] Move macOS sleep detection into the daemon (purego)#5926
[client] Move macOS sleep detection into the daemon (purego)#5926
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughReplaces the macOS cgo-based IOKit/CoreFoundation sleep detector with a pure-Go purego dlopen implementation, removes the NotifyOSLifecycle protobuf RPC and UI listener, and starts an in-process detector from Server with serialized runloop lifecycle and per-detector done channels. Changes
Sequence DiagramsequenceDiagram
participant Server
participant Detector as Sleep Detector
participant OS as macOS\n(IOKit / CFRunLoop)
participant Handler as Sleep Handler
Server->>Detector: startSleepDetector() / Register()
Detector->>Detector: dlopen IOKit/CF, resolve symbols, create callback
Detector->>OS: IORegisterForSystemPower() + CFRunLoopRun() (OS thread)
OS-->>Detector: Power event (willSleep / didWake)
Detector->>Detector: powerCallback() -> dispatch (snapshot callbacks)
Detector->>Handler: HandleSleep() / HandleWakeUp() (async)
Handler->>Handler: process event
Server->>Detector: Deregister() / shutdown
Detector->>OS: CFRunLoopStop() + IOKit cleanup
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsGit: Failed to clone repository. Please run the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/internal/sleep/detector_darwin.go`:
- Around line 141-206: Deregister currently closes d.done permanently and
triggerCallback can spawn goroutines that outlive its timeout, causing panics on
re-register and concurrent callback runs; fix by making d.done lifecycle
resettable and serializing callback execution: in Register, if d.done is nil
create a new channel (d.done = make(chan struct{})), and in Deregister close
d.done and set it to nil after removing from serviceRegistry to avoid
double-close; modify triggerCallback to use a per-detector worker goroutine or a
mutex/sequence to ensure only one callback executes at a time and to cancel/stop
a timed-out callback (use a context with timeout or select on a done channel) so
the goroutine doesn't continue running after triggerCallback returns.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fd75d602-7c8b-4ceb-8cff-aa38f1418ca4
⛔ Files ignored due to path filters (2)
client/proto/daemon.pb.gois excluded by!**/*.pb.goclient/proto/daemon_grpc.pb.gois excluded by!**/*.pb.go
📒 Files selected for processing (6)
client/internal/sleep/detector_darwin.goclient/proto/daemon.protoclient/server/server.goclient/server/sleep.goclient/ui/client_ui.gogo.mod
💤 Files with no reviewable changes (2)
- client/ui/client_ui.go
- client/proto/daemon.proto
d74acc9 to
5b4f2d0
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/internal/sleep/detector_darwin.go`:
- Around line 174-197: The current early return when len(serviceRegistry) > 1
lets later registrants skip waiting for the runloop: instead of unlocking and
returning nil, capture the current runLoopReady into a local variable, unlock,
wait on that channel ( <-ready ), then re-lock and check runLoopErr; if
runLoopErr is non-nil remove the current detector from serviceRegistry
(delete(serviceRegistry, d)) and return the error; otherwise return nil. Apply
this change around the existing branch that uses serviceRegistry,
serviceRegistryMu, runLoopReady, runLoopErr and runRunLoop() so all concurrent
registrations properly wait for runloop startup and roll themselves back on
error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 37646692-b298-4f29-9a9e-b2935a576b56
⛔ Files ignored due to path filters (2)
client/proto/daemon.pb.gois excluded by!**/*.pb.goclient/proto/daemon_grpc.pb.gois excluded by!**/*.pb.go
📒 Files selected for processing (6)
client/internal/sleep/detector_darwin.goclient/proto/daemon.protoclient/server/server.goclient/server/sleep.goclient/ui/client_ui.gogo.mod
💤 Files with no reviewable changes (2)
- client/ui/client_ui.go
- client/proto/daemon.proto
✅ Files skipped from review due to trivial changes (1)
- client/server/server.go
0a5a21c to
7019fd2
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/internal/sleep/detector_darwin.go`:
- Around line 163-195: The register/deregister race occurs because Register and
Deregister can observe and replace runLoopReady while a setup/teardown
transition is still in flight; fix by serializing those transitions with a
guarded "transition in progress" flag (e.g., add a runLoopTransition bool
protected by serviceRegistryMu) and update both Register and Deregister to set
runLoopTransition = true before releasing the mutex to wait on the current
runLoopReady, and clear runLoopTransition = false (under the same mutex) once
the wait completes or the new run loop is fully started/rolled back; reference
and update the Register and Deregister code paths, the runLoopReady/runLoopErr
variables, and the runRunLoop/rollbackIfRunLoopFailed flow so only one
transition can create/replace runLoopReady at a time.
- Around line 16-20: The code currently only handles kIOMessageSystemWillSleep
and kIOMessageSystemHasPoweredOn; add support for kIOMessageCanSystemSleep
(define const kIOMessageCanSystemSleep = 0xe0000281) and in the message
dispatch/handler (the routine that receives messages from
IORegisterForSystemPower) detect when msg == kIOMessageCanSystemSleep and call
IOAllowPowerChange with the same root port/notification/reference used for other
power-change acknowledgements, but do not emit or enqueue a sleep event for this
message — just acknowledge via IOAllowPowerChange so macOS can proceed to idle
sleep.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a83cd760-6517-4aca-ab7c-f7e06c944b6c
⛔ Files ignored due to path filters (2)
client/proto/daemon.pb.gois excluded by!**/*.pb.goclient/proto/daemon_grpc.pb.gois excluded by!**/*.pb.go
📒 Files selected for processing (6)
client/internal/sleep/detector_darwin.goclient/proto/daemon.protoclient/server/server.goclient/server/sleep.goclient/ui/client_ui.gogo.mod
💤 Files with no reviewable changes (2)
- client/proto/daemon.proto
- client/ui/client_ui.go
✅ Files skipped from review due to trivial changes (1)
- client/server/server.go
🚧 Files skipped from review as they are similar to previous changes (2)
- go.mod
- client/server/sleep.go
7019fd2 to
ce12c62
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
client/internal/sleep/detector_darwin.go (1)
149-159:⚠️ Potential issue | 🟠 MajorSnapshot callback state with the dispatch list.
dispatchEventsnapshots only*Detector, thentriggerCallbackreadsd.callback/d.donewithout the registry lock. A fast deregister/re-register can make a stale event use the new registration’s callback/channel, and these reads race with Line 187 and Line 188 assignments. Capturecallbackanddonewhile holdingserviceRegistryMu.🔒 Proposed fix
+type detectorCallback struct { + detector *Detector + callback func(event EventType) + done <-chan struct{} +} + func dispatchEvent(event EventType) { serviceRegistryMu.Lock() - detectors := make([]*Detector, 0, len(serviceRegistry)) + detectors := make([]detectorCallback, 0, len(serviceRegistry)) for d := range serviceRegistry { - detectors = append(detectors, d) + detectors = append(detectors, detectorCallback{ + detector: d, + callback: d.callback, + done: d.done, + }) } serviceRegistryMu.Unlock() - for _, d := range detectors { - d.triggerCallback(event) + for _, item := range detectors { + item.detector.triggerCallback(event, item.callback, item.done) } } @@ - close(d.done) + done := d.done + d.callback = nil + d.done = nil + if done != nil { + close(done) + } delete(serviceRegistry, d) @@ -func (d *Detector) triggerCallback(event EventType) { - cb := d.callback - if cb == nil { +func (d *Detector) triggerCallback(event EventType, cb func(event EventType), done <-chan struct{}) { + if cb == nil || done == nil { return } + select { + case <-done: + return + default: + } @@ select { case <-doneChan: - case <-d.done: + case <-done: case <-timeout.C: log.Warn("sleep callback timed out") } }Also applies to: 187-188, 223-224, 262-285
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/sleep/detector_darwin.go` around lines 149 - 159, dispatchEvent currently snapshots pointers to Detector and then calls d.triggerCallback which reads d.callback and d.done without holding serviceRegistryMu, allowing a race where a fast deregister/re-register swaps callbacks/channels; fix by reading and storing the concrete callback and done channel while holding serviceRegistryMu (i.e., inside the loop that builds the detectors slice) and then iterating over that snapshot outside the lock so triggerCallback uses the captured values; apply the same pattern wherever Detector fields are later read without the lock (see direct reads of d.callback and d.done around Lines 187-188, 223-224, and 262-285) so callbacks and done channels are immutable after snapshotting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@client/internal/sleep/detector_darwin.go`:
- Around line 137-146: The code clears the global session before calling
IODeregisterForSystemPower, allowing allowPowerChange (which reads session) to
miss IOAllowPowerChange; move the assignment that sets session = nil to after
the IODeregisterForSystemPower call completes. Specifically, protect access with
serviceRegistryMu as done in allowPowerChange, call IODeregisterForSystemPower
using the existing session.rp while session is still valid, then acquire
serviceRegistryMu and set session = nil only after deregistration returns;
ensure no early unlock removes the session reference before IOAllowPowerChange
can run.
---
Duplicate comments:
In `@client/internal/sleep/detector_darwin.go`:
- Around line 149-159: dispatchEvent currently snapshots pointers to Detector
and then calls d.triggerCallback which reads d.callback and d.done without
holding serviceRegistryMu, allowing a race where a fast deregister/re-register
swaps callbacks/channels; fix by reading and storing the concrete callback and
done channel while holding serviceRegistryMu (i.e., inside the loop that builds
the detectors slice) and then iterating over that snapshot outside the lock so
triggerCallback uses the captured values; apply the same pattern wherever
Detector fields are later read without the lock (see direct reads of d.callback
and d.done around Lines 187-188, 223-224, and 262-285) so callbacks and done
channels are immutable after snapshotting.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 44b0351e-f998-43e0-9c9d-0f42f2bf27ac
📒 Files selected for processing (1)
client/internal/sleep/detector_darwin.go
|
There was a problem hiding this comment.
♻️ Duplicate comments (1)
client/internal/sleep/detector_darwin.go (1)
230-257:⚠️ Potential issue | 🟠 Major
session = nilis still cleared before IOKit teardown — the same window re-opens after therunLoopSessionrefactor.Line 231 nils the global
sessionwhile the notification source is still attached to the runloop (removal happens at line 243, andIODeregisterForSystemPowerat line 247). During this window, ifkIOMessageCanSystemSleeporkIOMessageSystemWillSleepfires on the runloop thread,allowPowerChange(Lines 137-147) readssession == niland skipsIOAllowPowerChange, producing the ~30s IOKit acknowledgement timeout you previously fixed.lifecycleMudoesn’t help here since the callback thread is independent of Register/Deregister.This was flagged previously and marked addressed, but appears to have regressed when the handles were bundled into
runLoopSession.🔧 Proposed fix — clear the global only after IOKit deregistration
sess := session - session = nil serviceRegistryMu.Unlock() log.Info("sleep detection service stopping (deregister)") if sess == nil { return nil } // CFRunLoop and IOKit teardown calls are safe from a non-runloop thread. if sess.rl != 0 && sess.port != 0 { source := ioKit.IONotificationPortGetRunLoopSource(sess.port) cf.CFRunLoopRemoveSource(sess.rl, source, cfCommonModes) } if sess.notifier != 0 { n := sess.notifier ioKit.IODeregisterForSystemPower(&n) } + serviceRegistryMu.Lock() + if session == sess { + session = nil + } + serviceRegistryMu.Unlock() if sess.rp != 0 { ioKit.IOServiceClose(sess.rp) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@client/internal/sleep/detector_darwin.go` around lines 230 - 257, The global session is being cleared (session = nil) before IOKit/CFRunLoop teardown, allowing a concurrent runloop callback (allowPowerChange) to observe session==nil and skip IOAllowPowerChange; fix by deferring clearing the global until after all teardown calls (IONotificationPortGetRunLoopSource/CFRunLoopRemoveSource, IODeregisterForSystemPower, IOServiceClose, IONotificationPortDestroy, CFRunLoopStop) complete: change the sequence in runLoopSession teardown so you copy sess := session, unlock serviceRegistryMu, perform the IOKit/CF teardown using sess, then re-acquire serviceRegistryMu and set session = nil (or clear it just before return) to avoid the race with allowPowerChange.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@client/internal/sleep/detector_darwin.go`:
- Around line 230-257: The global session is being cleared (session = nil)
before IOKit/CFRunLoop teardown, allowing a concurrent runloop callback
(allowPowerChange) to observe session==nil and skip IOAllowPowerChange; fix by
deferring clearing the global until after all teardown calls
(IONotificationPortGetRunLoopSource/CFRunLoopRemoveSource,
IODeregisterForSystemPower, IOServiceClose, IONotificationPortDestroy,
CFRunLoopStop) complete: change the sequence in runLoopSession teardown so you
copy sess := session, unlock serviceRegistryMu, perform the IOKit/CF teardown
using sess, then re-acquire serviceRegistryMu and set session = nil (or clear it
just before return) to avoid the race with allowPowerChange.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0e0f14b0-18d6-4633-b795-bc0f29a96e79
📒 Files selected for processing (1)
client/internal/sleep/detector_darwin.go



Describe your changes
Run macOS sleep/wake detection from the daemon instead of the UI so it keeps working when the UI isn't running (headless installs, UI quit/crashed). Also replaces the CGO implementation with purego.
client/uiinto the daemon; drop theNotifyOSLifecyclegRPC and regenerate protoclient/internal/sleep/detector_darwin.gousing purego (IOKit + CoreFoundation via dlopen), with a single process-lifetime callback thunk and a dedicated OS-locked goroutine for the CFRunLoopNB_DISABLE_SLEEP_DETECTORenv var to opt out of sleep detectionebitengine/puregoto a direct dependencyIssue ticket number and link
Stack
Checklist
Documentation
Select exactly one:
Internal refactor; user-facing behavior is unchanged (sleep/wake handling is daemon-internal).
Docs PR URL (required if "docs added" is checked)
Paste the PR link from https://github.com/netbirdio/docs here:
https://github.com/netbirdio/docs/pull/__
Summary by CodeRabbit
New Features
Refactor
Chores
Breaking Change
Config