cmdrunner: release process handle in _pidAlive to avoid pidfd leak #378
Open
texasich wants to merge 1 commit into hashicorp:main
os.FindProcess on Linux with Go 1.23+ opens a pidfd, and pidWait polls _pidAlive roughly once per second for every plugin process. Without a matching Release the pidfd leaks on each poll, and under Nomad with a few hundred allocations it adds up fast -- one reporter saw it cycle between 20k and 130k open FDs until the process hit EMFILE. Defer proc.Release() right after FindProcess so the handle is closed on every return path. Mirrors the syscall.CloseHandle defer already used in the Windows implementation. Reported downstream in hashicorp/nomad#27847.
Description
`_pidAlive` in `internal/cmdrunner/process_posix.go` calls `os.FindProcess(pid)` and never releases the returned `*os.Process`. On Linux with Go 1.23+ that call now opens a pidfd under the hood (`os.pidfdFind` → `pidfd_open`), so every invocation leaks one file descriptor. `pidWait` polls once per second per plugin, so the leak scales linearly with plugin count × uptime. In the downstream Nomad report (hashicorp/nomad#27847) a client with a few hundred allocations saw the host-side FD count cycle between 20k and 130k, eventually tripping `EMFILE` and breaking CNI config loads, disk-stats collection, `pipe2`, and docker socket dials.

The fix is a one-liner: `defer proc.Release()` right after the `FindProcess` call, so the pidfd is closed on every return path. The Windows implementation already does the equivalent with `defer syscall.CloseHandle(h)` in `process_windows.go`, so this just brings the POSIX side in line.

Before: (one new FD per poll iteration per plugin, never closed)
Related Issue
Downstream report: hashicorp/nomad#27847
How Has This Been Tested?
- `go build ./...` and `go vet ./...` clean on both host (Windows) and `GOOS=linux`
- `go test ./internal/cmdrunner/...` passes
- The Windows `_pidAlive` has always done the equivalent cleanup via `defer syscall.CloseHandle(h)`, so behavior is symmetric across platforms

Caching the `*os.Process` on the runner would avoid the repeated `FindProcess` entirely and is probably the better long-term change, but it's a larger refactor touching `Runner` lifecycle. Keeping this PR to the minimal, backport-friendly fix; happy to follow up with the caching variant in a separate PR if preferred.