
cmdrunner: release process handle in _pidAlive to avoid pidfd leak #378

Open
texasich wants to merge 1 commit into hashicorp:main from texasich:fix/pidfd-leak-pidalive

Description

_pidAlive in internal/cmdrunner/process_posix.go calls os.FindProcess(pid) and never releases the returned *os.Process. On Linux with Go 1.23+, that call opens a pidfd under the hood (os.pidfdFind → pidfd_open), so every invocation leaks one file descriptor.

pidWait polls once per second per plugin, so the leak scales linearly with plugin count × uptime. In the downstream Nomad report (hashicorp/nomad#27847) a client with a few hundred allocations saw the host-side FD count cycle between 20k and 130k, eventually tripping EMFILE and breaking CNI config loads, disk-stats collection, pipe2, and docker socket dials.

The fix is a one-liner: defer proc.Release() right after the FindProcess call, so the pidfd is closed on every return path. The Windows implementation already does the equivalent with defer syscall.CloseHandle(h) in process_windows.go, so this just brings the POSIX side in line.

Before:

pidfd_open(686713, 0) = 90771
pidfd_open(690846, 0) = 90772
pidfd_open(123738, 0) = 90774
...

(one new FD per poll iteration per plugin, never closed)

Related Issue

Downstream report: hashicorp/nomad#27847

How Has This Been Tested?

  • go build ./... and go vet ./... clean on both host (Windows) and GOOS=linux
  • go test ./internal/cmdrunner/... passes
  • Reviewed against the Windows _pidAlive which has always done the equivalent cleanup via defer syscall.CloseHandle(h), so behavior is symmetric across platforms

Caching the *os.Process on the runner would avoid the repeated FindProcess entirely and is probably the better long-term change, but it's a larger refactor touching Runner lifecycle. Keeping this PR to the minimal, backport-friendly fix; happy to follow up with the caching variant in a separate PR if preferred.

os.FindProcess on Linux with Go 1.23+ opens a pidfd, and pidWait polls
_pidAlive roughly once per second for every plugin process. Without a
matching Release the pidfd leaks on each poll, and under Nomad with a
few hundred allocations it adds up fast -- one reporter saw it cycle
between 20k and 130k open FDs until the process hit EMFILE.

Defer proc.Release() right after FindProcess so the handle is closed on
every return path. Mirrors the syscall.CloseHandle defer already used in
the Windows implementation.

Reported downstream in hashicorp/nomad#27847.
texasich requested a review from a team as a code owner on April 22, 2026, 03:05

hashicorp-cla-app Bot commented Apr 22, 2026

CLA assistant check
All committers have signed the CLA.
