Skip to content

[client] Move macOS sleep detection into the daemon (purego)#5926

Merged
lixmal merged 4 commits intomainfrom
sleep-detector-daemon-purego
Apr 29, 2026
Merged

[client] Move macOS sleep detection into the daemon (purego)#5926
lixmal merged 4 commits intomainfrom
sleep-detector-daemon-purego

Conversation

@lixmal
Copy link
Copy Markdown
Collaborator

@lixmal lixmal commented Apr 20, 2026

Describe your changes

Run macOS sleep/wake detection from the daemon instead of the UI so it keeps working when the UI isn't running (headless installs, UI quit/crashed). Also replaces the CGO implementation with purego.

  • Move sleep detection from client/ui into the daemon; drop the NotifyOSLifecycle gRPC and regenerate proto
  • Rewrite client/internal/sleep/detector_darwin.go using purego (IOKit + CoreFoundation via dlopen), with a single process-lifetime callback thunk and a dedicated OS-locked goroutine for the CFRunLoop
  • Make Register synchronous: surface IOKit registration failures to the caller, and have Deregister await runloop setup so teardown can't skip cleanup on a fast register/deregister
  • Add NB_DISABLE_SLEEP_DETECTOR env var to opt out of sleep detection
  • Promote ebitengine/purego to a direct dependency

Issue ticket number and link

Stack

Checklist

  • Is it a bug fix
  • Is a typo/documentation fix
  • Is a feature enhancement
  • It is a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Internal refactor; user-facing behavior is unchanged (sleep/wake handling is daemon-internal).

Docs PR URL (required if "docs added" is checked)

Paste the PR link from https://github.com/netbirdio/docs here:

https://github.com/netbirdio/docs/pull/__

Summary by CodeRabbit

  • New Features

    • Server now starts OS sleep/wake detection automatically; UI no longer initializes it and events are forwarded internally.
  • Refactor

    • Sleep detection rewritten to use runtime dynamic bindings, serialized lifecycle transitions, safer callback dispatch, and robust startup/shutdown.
  • Chores

    • Made native runtime binding dependency explicit in module configuration.
  • Breaking Change

    • Removed legacy OS-lifecycle RPC — external lifecycle notifications must stop.
  • Config

    • Detector can be disabled via NB_DISABLE_SLEEP_DETECTOR.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 90a244bb-bcc9-4aab-8503-fd62efaf990d

📥 Commits

Reviewing files that changed from the base of the PR and between fe4fe40 and 2c2bbdc.

📒 Files selected for processing (1)
  • client/internal/sleep/detector_darwin.go

📝 Walkthrough

Walkthrough

Replaces the macOS cgo-based IOKit/CoreFoundation sleep detector with a pure-Go purego dlopen implementation, removes the NotifyOSLifecycle protobuf RPC and UI listener, and starts an in-process detector from Server with serialized runloop lifecycle and per-detector done channels.

Changes

Cohort / File(s) Summary
macOS Sleep Detector Implementation
client/internal/sleep/detector_darwin.go
Replaced cgo bridge with purego dlopen of IOKit/CoreFoundation, resolved symbols (including kCFRunLoopCommonModes), registered a single purego.NewCallback(powerCallback), refactored event dispatch/acknowledgement, added per-detector done chan struct{}, serialized lifecycle via lifecycleMu, and OS-locked CFRunLoop thread setup/teardown.
Daemon Service Protocol
client/proto/daemon.proto
Removed DaemonService.NotifyOSLifecycle RPC and deleted the OSLifecycleRequest (and CycleType enum) and OSLifecycleResponse messages.
Server Sleep Detection Integration
client/server/server.go, client/server/sleep.go
Server.New() now calls startSleepDetector(); added detector initialization/registration, env var gate NB_DISABLE_SLEEP_DETECTOR, asynchronous forwarding of events to sleephandler, and detector deregistration on server shutdown.
UI Client Sleep Handling
client/ui/client_ui.go
Removed UI-side sleep listener and code that translated OS events to the removed NotifyOSLifecycle RPC.
Dependency Management
go.mod
Promoted github.com/ebitengine/purego v0.8.4 from an indirect to a direct requirement.

Sequence Diagram

sequenceDiagram
    participant Server
    participant Detector as Sleep Detector
    participant OS as macOS\n(IOKit / CFRunLoop)
    participant Handler as Sleep Handler

    Server->>Detector: startSleepDetector() / Register()
    Detector->>Detector: dlopen IOKit/CF, resolve symbols, create callback
    Detector->>OS: IORegisterForSystemPower() + CFRunLoopRun() (OS thread)
    OS-->>Detector: Power event (willSleep / didWake)
    Detector->>Detector: powerCallback() -> dispatch (snapshot callbacks)
    Detector->>Handler: HandleSleep() / HandleWakeUp() (async)
    Handler->>Handler: process event
    Server->>Detector: Deregister() / shutdown
    Detector->>OS: CFRunLoopStop() + IOKit cleanup
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • pappz
  • mlsmaycon
  • heisbrot

Poem

🐇
I gently nudged the C away,
and opened libs at runtime play.
The server listens, soft and close—
no RPC bells, just quiet doze.
Thump-thump: the detector keeps our prose.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title '[client] Move macOS sleep detection into the daemon (purego)' clearly and specifically summarizes the main change: relocating sleep detection logic and replacing the CGO implementation with purego.
Description check ✅ Passed The pull request description is comprehensive and well-structured, covering all required template sections including change description, refactor designation, and documentation rationale.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sleep-detector-daemon-purego

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/internal/sleep/detector_darwin.go`:
- Around line 141-206: Deregister currently closes d.done permanently and
triggerCallback can spawn goroutines that outlive its timeout, causing panics on
re-register and concurrent callback runs; fix by making d.done lifecycle
resettable and serializing callback execution: in Register, if d.done is nil
create a new channel (d.done = make(chan struct{})), and in Deregister close
d.done and set it to nil after removing from serviceRegistry to avoid
double-close; modify triggerCallback to use a per-detector worker goroutine or a
mutex/sequence to ensure only one callback executes at a time and to cancel/stop
a timed-out callback (use a context with timeout or select on a done channel) so
the goroutine doesn't continue running after triggerCallback returns.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fd75d602-7c8b-4ceb-8cff-aa38f1418ca4

📥 Commits

Reviewing files that changed from the base of the PR and between 8ae8f20 and d74acc9.

⛔ Files ignored due to path filters (2)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • client/proto/daemon_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (6)
  • client/internal/sleep/detector_darwin.go
  • client/proto/daemon.proto
  • client/server/server.go
  • client/server/sleep.go
  • client/ui/client_ui.go
  • go.mod
💤 Files with no reviewable changes (2)
  • client/ui/client_ui.go
  • client/proto/daemon.proto

Comment thread client/internal/sleep/detector_darwin.go
@lixmal lixmal force-pushed the sleep-detector-daemon-purego branch from d74acc9 to 5b4f2d0 Compare April 20, 2026 05:13
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/internal/sleep/detector_darwin.go`:
- Around line 174-197: The current early return when len(serviceRegistry) > 1
lets later registrants skip waiting for the runloop: instead of unlocking and
returning nil, capture the current runLoopReady into a local variable, unlock,
wait on that channel ( <-ready ), then re-lock and check runLoopErr; if
runLoopErr is non-nil remove the current detector from serviceRegistry
(delete(serviceRegistry, d)) and return the error; otherwise return nil. Apply
this change around the existing branch that uses serviceRegistry,
serviceRegistryMu, runLoopReady, runLoopErr and runRunLoop() so all concurrent
registrations properly wait for runloop startup and roll themselves back on
error.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 37646692-b298-4f29-9a9e-b2935a576b56

📥 Commits

Reviewing files that changed from the base of the PR and between d74acc9 and 5b4f2d0.

⛔ Files ignored due to path filters (2)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • client/proto/daemon_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (6)
  • client/internal/sleep/detector_darwin.go
  • client/proto/daemon.proto
  • client/server/server.go
  • client/server/sleep.go
  • client/ui/client_ui.go
  • go.mod
💤 Files with no reviewable changes (2)
  • client/ui/client_ui.go
  • client/proto/daemon.proto
✅ Files skipped from review due to trivial changes (1)
  • client/server/server.go

Comment thread client/internal/sleep/detector_darwin.go Outdated
@lixmal lixmal force-pushed the sleep-detector-daemon-purego branch 2 times, most recently from 0a5a21c to 7019fd2 Compare April 20, 2026 05:35
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/internal/sleep/detector_darwin.go`:
- Around line 163-195: The register/deregister race occurs because Register and
Deregister can observe and replace runLoopReady while a setup/teardown
transition is still in flight; fix by serializing those transitions with a
guarded "transition in progress" flag (e.g., add a runLoopTransition bool
protected by serviceRegistryMu) and update both Register and Deregister to set
runLoopTransition = true before releasing the mutex to wait on the current
runLoopReady, and clear runLoopTransition = false (under the same mutex) once
the wait completes or the new run loop is fully started/rolled back; reference
and update the Register and Deregister code paths, the runLoopReady/runLoopErr
variables, and the runRunLoop/rollbackIfRunLoopFailed flow so only one
transition can create/replace runLoopReady at a time.
- Around line 16-20: The code currently only handles kIOMessageSystemWillSleep
and kIOMessageSystemHasPoweredOn; add support for kIOMessageCanSystemSleep
(define const kIOMessageCanSystemSleep = 0xe0000281) and in the message
dispatch/handler (the routine that receives messages from
IORegisterForSystemPower) detect when msg == kIOMessageCanSystemSleep and call
IOAllowPowerChange with the same root port/notification/reference used for other
power-change acknowledgements, but do not emit or enqueue a sleep event for this
message — just acknowledge via IOAllowPowerChange so macOS can proceed to idle
sleep.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a83cd760-6517-4aca-ab7c-f7e06c944b6c

📥 Commits

Reviewing files that changed from the base of the PR and between 5b4f2d0 and 7019fd2.

⛔ Files ignored due to path filters (2)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • client/proto/daemon_grpc.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (6)
  • client/internal/sleep/detector_darwin.go
  • client/proto/daemon.proto
  • client/server/server.go
  • client/server/sleep.go
  • client/ui/client_ui.go
  • go.mod
💤 Files with no reviewable changes (2)
  • client/proto/daemon.proto
  • client/ui/client_ui.go
✅ Files skipped from review due to trivial changes (1)
  • client/server/server.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • go.mod
  • client/server/sleep.go

Comment thread client/internal/sleep/detector_darwin.go
Comment thread client/internal/sleep/detector_darwin.go
@lixmal lixmal force-pushed the sleep-detector-daemon-purego branch from 7019fd2 to ce12c62 Compare April 20, 2026 08:02
Comment thread client/internal/sleep/detector_darwin.go Outdated
Comment thread client/internal/sleep/detector_darwin.go Outdated
Comment thread client/internal/sleep/detector_darwin.go Outdated
Comment thread client/internal/sleep/detector_darwin.go Outdated
Copy link
Copy Markdown
Collaborator

@pappz pappz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait for changes

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
client/internal/sleep/detector_darwin.go (1)

149-159: ⚠️ Potential issue | 🟠 Major

Snapshot callback state with the dispatch list.

dispatchEvent snapshots only *Detector, then triggerCallback reads d.callback/d.done without the registry lock. A fast deregister/re-register can make a stale event use the new registration’s callback/channel, and these reads race with Line 187 and Line 188 assignments. Capture callback and done while holding serviceRegistryMu.

🔒 Proposed fix
+type detectorCallback struct {
+	detector *Detector
+	callback func(event EventType)
+	done     <-chan struct{}
+}
+
 func dispatchEvent(event EventType) {
 	serviceRegistryMu.Lock()
-	detectors := make([]*Detector, 0, len(serviceRegistry))
+	detectors := make([]detectorCallback, 0, len(serviceRegistry))
 	for d := range serviceRegistry {
-		detectors = append(detectors, d)
+		detectors = append(detectors, detectorCallback{
+			detector: d,
+			callback: d.callback,
+			done:     d.done,
+		})
 	}
 	serviceRegistryMu.Unlock()
 
-	for _, d := range detectors {
-		d.triggerCallback(event)
+	for _, item := range detectors {
+		item.detector.triggerCallback(event, item.callback, item.done)
 	}
 }
@@
-	close(d.done)
+	done := d.done
+	d.callback = nil
+	d.done = nil
+	if done != nil {
+		close(done)
+	}
 	delete(serviceRegistry, d)
@@
-func (d *Detector) triggerCallback(event EventType) {
-	cb := d.callback
-	if cb == nil {
+func (d *Detector) triggerCallback(event EventType, cb func(event EventType), done <-chan struct{}) {
+	if cb == nil || done == nil {
 		return
 	}
+	select {
+	case <-done:
+		return
+	default:
+	}
@@
 	select {
 	case <-doneChan:
-	case <-d.done:
+	case <-done:
 	case <-timeout.C:
 		log.Warn("sleep callback timed out")
 	}
 }

Also applies to: 187-188, 223-224, 262-285

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/sleep/detector_darwin.go` around lines 149 - 159,
dispatchEvent currently snapshots pointers to Detector and then calls
d.triggerCallback which reads d.callback and d.done without holding
serviceRegistryMu, allowing a race where a fast deregister/re-register swaps
callbacks/channels; fix by reading and storing the concrete callback and done
channel while holding serviceRegistryMu (i.e., inside the loop that builds the
detectors slice) and then iterating over that snapshot outside the lock so
triggerCallback uses the captured values; apply the same pattern wherever
Detector fields are later read without the lock (see direct reads of d.callback
and d.done around Lines 187-188, 223-224, and 262-285) so callbacks and done
channels are immutable after snapshotting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/internal/sleep/detector_darwin.go`:
- Around line 137-146: The code clears the global session before calling
IODeregisterForSystemPower, allowing allowPowerChange (which reads session) to
miss IOAllowPowerChange; move the assignment that sets session = nil to after
the IODeregisterForSystemPower call completes. Specifically, protect access with
serviceRegistryMu as done in allowPowerChange, call IODeregisterForSystemPower
using the existing session.rp while session is still valid, then acquire
serviceRegistryMu and set session = nil only after deregistration returns;
ensure no early unlock removes the session reference before IOAllowPowerChange
can run.

---

Duplicate comments:
In `@client/internal/sleep/detector_darwin.go`:
- Around line 149-159: dispatchEvent currently snapshots pointers to Detector
and then calls d.triggerCallback which reads d.callback and d.done without
holding serviceRegistryMu, allowing a race where a fast deregister/re-register
swaps callbacks/channels; fix by reading and storing the concrete callback and
done channel while holding serviceRegistryMu (i.e., inside the loop that builds
the detectors slice) and then iterating over that snapshot outside the lock so
triggerCallback uses the captured values; apply the same pattern wherever
Detector fields are later read without the lock (see direct reads of d.callback
and d.done around Lines 187-188, 223-224, and 262-285) so callbacks and done
channels are immutable after snapshotting.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 44b0351e-f998-43e0-9c9d-0f42f2bf27ac

📥 Commits

Reviewing files that changed from the base of the PR and between ce12c62 and fe4fe40.

📒 Files selected for processing (1)
  • client/internal/sleep/detector_darwin.go

Comment thread client/internal/sleep/detector_darwin.go Outdated
@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
client/internal/sleep/detector_darwin.go (1)

230-257: ⚠️ Potential issue | 🟠 Major

session = nil is still cleared before IOKit teardown — the same window re-opens after the runLoopSession refactor.

Line 231 nils the global session while the notification source is still attached to the runloop (removal happens at line 243, and IODeregisterForSystemPower at line 247). During this window, if kIOMessageCanSystemSleep or kIOMessageSystemWillSleep fires on the runloop thread, allowPowerChange (Lines 137-147) reads session == nil and skips IOAllowPowerChange, producing the ~30s IOKit acknowledgement timeout you previously fixed. lifecycleMu doesn’t help here since the callback thread is independent of Register/Deregister.

This was flagged previously and marked addressed, but appears to have regressed when the handles were bundled into runLoopSession.

🔧 Proposed fix — clear the global only after IOKit deregistration
 	sess := session
-	session = nil
 	serviceRegistryMu.Unlock()
 
 	log.Info("sleep detection service stopping (deregister)")
 
 	if sess == nil {
 		return nil
 	}
 
 	// CFRunLoop and IOKit teardown calls are safe from a non-runloop thread.
 	if sess.rl != 0 && sess.port != 0 {
 		source := ioKit.IONotificationPortGetRunLoopSource(sess.port)
 		cf.CFRunLoopRemoveSource(sess.rl, source, cfCommonModes)
 	}
 	if sess.notifier != 0 {
 		n := sess.notifier
 		ioKit.IODeregisterForSystemPower(&n)
 	}
+	serviceRegistryMu.Lock()
+	if session == sess {
+		session = nil
+	}
+	serviceRegistryMu.Unlock()
 	if sess.rp != 0 {
 		ioKit.IOServiceClose(sess.rp)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/internal/sleep/detector_darwin.go` around lines 230 - 257, The global
session is being cleared (session = nil) before IOKit/CFRunLoop teardown,
allowing a concurrent runloop callback (allowPowerChange) to observe
session==nil and skip IOAllowPowerChange; fix by deferring clearing the global
until after all teardown calls
(IONotificationPortGetRunLoopSource/CFRunLoopRemoveSource,
IODeregisterForSystemPower, IOServiceClose, IONotificationPortDestroy,
CFRunLoopStop) complete: change the sequence in runLoopSession teardown so you
copy sess := session, unlock serviceRegistryMu, perform the IOKit/CF teardown
using sess, then re-acquire serviceRegistryMu and set session = nil (or clear it
just before return) to avoid the race with allowPowerChange.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@client/internal/sleep/detector_darwin.go`:
- Around line 230-257: The global session is being cleared (session = nil)
before IOKit/CFRunLoop teardown, allowing a concurrent runloop callback
(allowPowerChange) to observe session==nil and skip IOAllowPowerChange; fix by
deferring clearing the global until after all teardown calls
(IONotificationPortGetRunLoopSource/CFRunLoopRemoveSource,
IODeregisterForSystemPower, IOServiceClose, IONotificationPortDestroy,
CFRunLoopStop) complete: change the sequence in runLoopSession teardown so you
copy sess := session, unlock serviceRegistryMu, perform the IOKit/CF teardown
using sess, then re-acquire serviceRegistryMu and set session = nil (or clear it
just before return) to avoid the race with allowPowerChange.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0e0f14b0-18d6-4633-b795-bc0f29a96e79

📥 Commits

Reviewing files that changed from the base of the PR and between ce12c62 and fe4fe40.

📒 Files selected for processing (1)
  • client/internal/sleep/detector_darwin.go

@lixmal lixmal merged commit 407e9d3 into main Apr 29, 2026
58 of 62 checks passed
@lixmal lixmal deleted the sleep-detector-daemon-purego branch April 29, 2026 06:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants