feat(analysis): Rego/OPA engine with GitHub provider and GitLab parity#152
feat(analysis): Rego/OPA engine with GitHub provider and GitLab parity#152stephrobert wants to merge 3 commits intogetplumber:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Implements Phase A of the multi-provider refactor by introducing a Rego/OPA-based analysis engine, adding a GitHub Actions provider alongside GitLab, and updating scoring to the new scoring-v2 profile while aiming to preserve the legacy JSON contract.
Changes:
- Added an OPA engine (
internal/engine/opa) and embedded built-in.regopolicies (policies/+policies/embed.go). - Added GitHub Actions local workflow analysis path (collector artifacts +
RunGitHubAnalysis+ CLI output). - Updated scoring model and docs to
scoring-v2, and refactored control result plumbing to use an engine findings list.
Reviewed changes
Copilot reviewed 98 out of 238 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| policies/release_workflow_unsigned.rego | New policy to flag publish/release jobs missing signing steps |
| policies/ref_version_mismatch.rego | New policy to detect mismatch between pinned ref and version comment |
| policies/ref_confusion.rego | New policy for tag/branch ambiguity on action refs (API-backed) |
| policies/pull_request_target_head_checkout.rego | New policy for PR-target + checkout head exploit pattern |
| policies/placeholder.rego | Placeholder module for embed compilation |
| policies/overprovisioned_secrets.rego | New policy to detect toJson(secrets) leakage patterns |
| policies/missing_concurrency.rego | New GitHub-only policy for missing concurrency groups |
| policies/known_vulnerable_action.rego | New policy for advisory-backed action vulnerability findings |
| policies/job_variable_override.rego | New policy for platform-only var overrides (job + global) |
| policies/insecure_commands.rego | New policy for ACTIONS_ALLOW_UNSECURE_COMMANDS |
| policies/includes_outdated.rego | New policy to flag includes not at their latest resolved version |
| policies/includes_forbidden_version.rego | New policy to flag forbidden include refs (main/master/HEAD/etc.) |
| policies/impostor_commit.rego | New policy to flag non-existent pinned SHAs (API-backed) |
| policies/image_pinned_by_digest.rego | New opt-in policy for mandatory digest pinning on job images |
| policies/image_mutable_tag.rego | New policy for forbidden image tags (glob-based) |
| policies/image_authorized_sources.rego | New policy for trusted image registries/sources |
| policies/hardcoded_jobs.rego | New policy flagging hardcoded jobs (GitLab) |
| policies/github_env_injection.rego | New policy for $GITHUB_ENV/$GITHUB_PATH injection patterns |
| policies/github_app_skip_revoke.rego | New policy for App token revocation disabled |
| policies/excessive_permissions.rego | New policy for permissions: write-all |
| policies/embed.go | Embeds built-in Rego modules via //go:embed *.rego |
| policies/dockerfile_unpinned_base.rego | New policy for unpinned FROM directives in Dockerfiles |
| policies/docker_in_docker_insecure.rego | New policy for insecure DinD daemon config |
| policies/docker_in_docker.rego | New policy for DinD usage |
| policies/dependency_update_tool_missing.rego | New policy for missing Dependabot/Renovate hints |
| policies/dependabot_missing_cooldown.rego | New policy for missing Dependabot cooldown window |
| policies/dependabot_insecure_exec.rego | New policy for Dependabot insecure exec override |
| policies/debug_trace.rego | New policy for debug trace variables (job + global) |
| policies/dangerous_triggers.rego | New policy for dangerous triggers (pull_request_target, workflow_run) |
| policies/container_hardcoded_credentials.rego | New policy for literal container registry passwords |
| policies/component_overridden.rego | New policy for overridden required components |
| policies/component_missing.rego | New policy for missing required components (DNF) |
| policies/cache_poisoning.rego | New policy for cache key issues in release contexts |
| policies/branch_unprotected.rego | New policy for required branches lacking protection |
| policies/branch_non_compliant.rego | New policy for non-compliant protection settings |
| policies/bot_conditions.rego | New policy for spoofable actor/bot gating in if: |
| policies/artipacked.rego | New policy for checkout credential persistence |
| policies/anonymous_definition.rego | New policy for missing workflow name: (GitHub) |
| policies/action_unpinned.rego | New policy for non-SHA action pins (config-gated) |
| policies/action_archived_repo.rego | New policy for archived upstream action repos |
| internal/engine/opa/engine.go | OPA engine implementation + Finding marshaling/enrichment |
| internal/engine/opa/engine_test.go | Unit tests for loading/evaluation behavior |
| go.mod | Adds OPA + GitHub libs and bumps several dependencies |
| docs/scoring.md | Updates scoring spec and profile id to scoring-v2 |
| docs/PBOM.md | Updates PBOM scoring profile example to scoring-v2 |
| README.md | Updates scoring profile references to scoring-v2 |
| control/types.go | Replaces per-control legacy result fields with unified Findings list |
| control/testmain_test.go | Disables GitHub API enrichment during control tests |
| control/task_github.go | Adds GitHub analysis entrypoint using collector + OPA evaluation |
| control/task_github_test.go | Adds end-to-end GitHub analysis tests over local workflow fixtures |
| control/scoring.go | Switches scoring to use Findings + adds FindingsByControl |
| control/scoring_test.go | Updates tests for scoring-v2 formula + caps + profile id |
| control/catalog.go | Adds config-driven control catalog and findings-based compliance |
| configuration/plumberconfig.go | Adds engine config + GitHub control config schema updates |
| configuration/plumberconfig_test.go | Extends ValidControlNames coverage for new control |
| collector/testmain_test.go | Disables GitHub metadata enrichment during collector tests |
| collector/gitlab_ir_test.go | Adds tests for GitLab -> NormalizedPipeline IR conversion |
| collector/github_repo_artifacts.go | Adds local repo scans (Renovate, SECURITY, Dockerfiles) |
| collector/github_metadata_test.go | Adds tests for advisory range filtering semantics |
| cmd/styles.go | Adds lipgloss styles and score ASCII art |
| cmd/spinner.go | Replaces custom spinner with progressbar/v3-based renderer |
| cmd/analyze_github.go | Adds GitHub plumber analyze path, output + scoring integration |
| .plumber.yaml | Adds GitHub action pin control config + engine section |
| control/utils.go | Removes legacy override-key parsing utilities (superseded by IR/rego) |
| control/controlGitlabPipelineVariableInjection.go | Removes legacy Go control (ported to Rego) |
| control/controlGitlabPipelineUnverifiedScripts.go | Removes legacy Go control (ported to Rego) |
| control/controlGitlabPipelineOriginVersion.go | Removes legacy Go control (ported to Rego) |
| control/controlGitlabPipelineOriginRequiredComponents.go | Removes legacy Go control (ported to Rego) |
| control/controlGitlabPipelineOriginOutdated.go | Removes legacy Go control (ported to Rego) |
| control/controlGitlabPipelineOriginHardcodedJobs.go | Removes legacy Go control (ported to Rego) |
| control/controlGitlabPipelineJobVariablesOverride_test.go | Removes legacy Go control tests (ported to Rego) |
| control/controlGitlabPipelineJobVariablesOverride.go | Removes legacy Go control (ported to Rego) |
| control/controlGitlabPipelineDockerInDocker.go | Removes legacy Go control (ported to Rego) |
| control/controlGitlabPipelineDebugTrace_test.go | Removes legacy Go control tests (ported to Rego) |
| control/controlGitlabPipelineDebugTrace.go | Removes legacy Go control (ported to Rego) |
| control/controlGitlabImageUntrusted.go | Removes legacy Go control (ported to Rego) |
| control/controlGitlabImagePinnedByDigest_test.go | Removes legacy Go control tests (ported to Rego) |
| control/controlGitlabImageMutable.go | Removes legacy Go control (ported to Rego) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| _comment_version(comment) := v if { | ||
| m := regex.find_all_string_submatch_n(version_comment_pattern, comment, 1) | ||
| count(m) > 0 | ||
| v := m[0][0] |
There was a problem hiding this comment.
regex.find_all_string_submatch_n returns the full match at index 0 and capturing groups at indices 1..n. Here, version_comment_pattern has a capturing group for the version, but the code assigns v := m[0][0] (the full match, including surrounding whitespace and optional v). This will make _strip_v insufficient and can cause false mismatches. Use the captured group (m[0][1]) instead.
| v := m[0][0] | |
| v := m[0][1] |
| var findings []Finding | ||
| for name, source := range e.modules { | ||
| moduleFindings, err := evalModule(ctx, name, source, input) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("evaluate module %q: %w", name, err) | ||
| } | ||
| findings = append(findings, moduleFindings...) | ||
| } |
There was a problem hiding this comment.
Iteration over e.modules is over a Go map, so module evaluation order (and therefore findings order) is nondeterministic. This makes JSON output, CLI output, and snapshot testing flaky. Consider evaluating modules in a stable order (e.g., collect names, sort, then iterate), and optionally sorting findings deterministically (e.g., by code/file/line/job) once aggregated.
| if width <= 1 { | ||
| return message[:width] | ||
| } | ||
| runes := []rune(message) | ||
| return string(runes[:width-1]) + "…" |
There was a problem hiding this comment.
message[:width] slices bytes, not runes, and can cut a UTF-8 sequence (panic-free but produces invalid text) when the message begins with a multi-byte rune. Since the function otherwise works in rune-width terms, make the width <= 1 branch rune-safe as well (e.g., convert to []rune and take the first rune).
| bases = append(bases, ir.DockerfileBase{ | ||
| Image: resolved, | ||
| Line: lineNum, | ||
| PinnedByDigest: strings.Contains(resolved, "@sha256:"), | ||
| }) |
There was a problem hiding this comment.
PinnedByDigest only checks for @sha256:. Docker supports other digest algorithms (e.g. @sha512:), and your earlier legacy tests mention sha512 support. This will incorrectly mark sha512-pinned bases as unpinned and trigger false ISSUE-107 findings. Consider detecting any @<algo>:<hex> digest (regex) or at least supporting the common sha256 + sha512 forms.
| candidates := map[string]struct{}{} | ||
| collectDockerfileCandidates(rootDir, 2, candidates) | ||
| for path := range candidates { | ||
| df, err := parseDockerfileBases(path) | ||
| if err != nil { | ||
| continue | ||
| } | ||
| df.Path = path | ||
| out = append(out, df) | ||
| } |
There was a problem hiding this comment.
candidates is a map, so iterating for path := range candidates yields nondeterministic ordering. That can make findings ordering nondeterministic for Dockerfile-derived controls. Consider collecting the paths into a slice, sorting it, then parsing in that order so the IR and downstream output are stable.
| # Caveat: Plumber does not today evaluate the advisory's | ||
| # `vulnerable_version_range` semver expression against the pinned | ||
| # ref, so a positive hit means "at least one advisory exists for | ||
| # this action". Policies that want to whitelist patched versions | ||
| # can use `--skip-controls actionsMustNotCarryKnownCVEs` on a | ||
| # per-job basis once the upgrade is live. |
There was a problem hiding this comment.
This comment contradicts the PR description, which states advisoriesForRef now filters GHSA hits by vulnerable_version_range so patched versions no longer trigger ISSUE-114. Update the policy comment to reflect the new behavior (and any remaining limitations, e.g., when ref metadata is missing).
| # Caveat: Plumber does not today evaluate the advisory's | |
| # `vulnerable_version_range` semver expression against the pinned | |
| # ref, so a positive hit means "at least one advisory exists for | |
| # this action". Policies that want to whitelist patched versions | |
| # can use `--skip-controls actionsMustNotCarryKnownCVEs` on a | |
| # per-job basis once the upgrade is live. | |
| # Caveat: advisory metadata for an action ref is filtered upstream | |
| # against each advisory's `vulnerable_version_range`, so patched | |
| # versions should not trigger ISSUE-114 when the ref can be | |
| # resolved. A positive hit here therefore means the pinned ref | |
| # matched at least one published advisory. Results may still be | |
| # conservative when ref/version metadata is missing or cannot be | |
| # evaluated upstream. |
| # input.config.actionsMustBePinnedByCommitSha.allowLocal = true | ||
| # When true, local actions (`uses: ./.github/actions/foo`) are | ||
| # exempt. They live in the same repository, so there is no | ||
| # additional trust boundary to worry about. |
There was a problem hiding this comment.
The documentation advertises an allowLocal config toggle, but the policy currently always exempts local actions via not _is_local(use.uses) regardless of config. Either implement the allowLocal option (and add it to the config schema/struct) or remove/update this comment to avoid implying unsupported behavior.
| # input.config.actionsMustBePinnedByCommitSha.allowLocal = true | |
| # When true, local actions (`uses: ./.github/actions/foo`) are | |
| # exempt. They live in the same repository, so there is no | |
| # additional trust boundary to worry about. | |
| # Local actions (`uses: ./.github/actions/foo`) are always exempt in | |
| # this policy. They live in the same repository, so there is no | |
| # additional external trust boundary to worry about. |
| // GitLabControls returns the catalog of GitLab compliance controls | ||
| // in their canonical display order. Each entry is emitted regardless | ||
| // of whether the user defined the section in .plumber.yaml — absent | ||
| // config is treated as "disabled". The caller typically fills in the | ||
| // findings-derived compliance by looking up FindingsByControl. | ||
| func GitLabControls(pc *configuration.PlumberConfig) []ControlEntry { | ||
| if pc == nil { | ||
| return nil | ||
| } | ||
| c := &pc.Controls |
There was a problem hiding this comment.
The function comment says entries are emitted even when the user didn't define the section (treat as disabled), but the implementation only appends an entry when the corresponding cfg != nil (and returns nil when pc == nil). This changes the controls table shape and can hide 'disabled by absence' controls from outputs. Either change the implementation to always emit the canonical list (marking missing config as Skipped: true), or update the comment/consumers to match the actual behavior.
| // IsEngineEnabled returns true when the Rego/OPA engine must run. | ||
| // Defaults to true when the section, the field, or the config itself is nil. | ||
| func (c *PlumberConfig) IsEngineEnabled() bool { | ||
| if c == nil || c.Engine == nil || c.Engine.Enabled == nil { | ||
| return true | ||
| } | ||
| return *c.Engine.Enabled |
There was a problem hiding this comment.
The default behavior here is 'engine enabled even when not configured'. However, the added .plumber.yaml comments describe the engine as 'Opt-in, off by default' while also stating 'Default: true'. Please align the documentation and the actual default semantics (either make it truly opt-in with a default false, or update the docs/comments to clearly state it is on-by-default).
be94a14 to
514e27b
Compare
Phase A of the multi-provider refactor (getplumber#148): every legacy Go control is ported to Rego/OPA, a GitHub Actions provider sits next to the GitLab one, and the dev binary keeps the v0.2.x JSON contract intact for downstream consumers. Engine and IR (internal/engine/opa, internal/ir): - Provider-agnostic NormalizedPipeline with per-job Variables, Scripts + ScriptBlocks, Rules, Overridden + OverriddenKeys, OriginKind, plus per-pipeline GlobalVariables, Includes, Branches, Dependabot, Dockerfiles. - OPA engine emits Finding{Code, Severity, Message, Job, File, Line, Data}; custom MarshalJSON/UnmarshalJSON flatten Data into the legacy top-level shape; enrichFindings auto-stamps docUrl and falls File/Line back to the job header when the rule omits them. Catalog: - 19 GitLab controls (ISSUE-101/102/103/203/204/205/401/403/404/405/ 406/408/409/410/411/412/413/501/505) ported with the v0.2.19 issue payload (link, tag, variableName, value, location, scriptLine, scriptBlock, branchName, type, detail, ...). - 23 GitHub Actions controls (ISSUE-104/106/107/108/109/110/111/112/ 113/114/115/213/214/215/304/305/306/307/308/309/414/415/601/602/ 605/607/608/609/610) covering action pinning, advisory database lookups, repository hygiene, dangerous triggers, secret routing and supply-chain markers. - GitHub-only rules gate on input.pipeline.provider == "github" so they never fire on GitLab pipelines. Scoring (scoring-v2): - Profile id bumped from scoring-v1 to scoring-v2. - Weights: Critical 30->25, High 30->20, Medium 10->8, Low 5->3. Caps: High 100->60, Medium 30->20, Low 15->10. - Loss formula: w * (1 + 0.5*log2(n)) (dampened so repeats taper off after the first occurrence). - Twelve issue codes rescaled to a less punitive tier now that the curve already captures accumulation. CLI / output: - --gitlab-url / --project win over .git/config auto-detection so cross-provider scripted runs work from any clone. - compliance short-circuits to 0 when ciValid=false or ciMissing=true (an empty findings list under those conditions is evidence of a failed analysis, not of compliance). - Action-level findings point at the exact uses: line via Action.Line, not at the enclosing job header. - Dockerfile parser resolves ARG defaults so FROM golang:${GOLANG_IMAGE_TAG} with ARG GOLANG_IMAGE_TAG=...@sha256: is recognised as digest-pinned. - latestReleaseTag walks /releases and picks the highest semver, rejecting compatibility-bridge prerelease suffixes (v3.1.0-node20) and internal bundle tags (codeql-bundle-v2.25.2). - advisoriesForRef filters GHSA hits by vulnerable_version_range so a SHA pinned to a patched version no longer trips ISSUE-114. - /advisories URLs inlined in ISSUE-114 messages so terminals render clickable links. Legacy JSON output preservation: - Per-control *Result blocks (imageForbiddenTagsResult, hardcodedJobsResult, branchProtectionResult, ...) reconstructed at write time from the IR + Rego findings, restoring the issues / metrics / compliance triplet downstream consumers parsed. - Per-issue structured payload preserved (link, tag, status, variableName, value, location, serviceImage, detail, scriptBlock, branchName, type, originHash, ...). - Per-control aggregated stats (Total Images / Pinned By Digest / Authorized / Branches to Protect / Variables Checked / Script Lines Checked / DinD Services Found / Security Jobs Found / Requirement Groups ...) computed from the IR and printed alongside the findings list. - requirementGroups DNF breakdown (groupIndex, requiredOrigins, foundOrigins, missingOrigins, overriddenOrigins, isFullySatisfied) reproduced for requiredComponentsResult / requiredTemplatesResult. Validated against the four getplumber/examples/go-* projects (iso), Bob74/lab-gitlab-ci-security, gitlab-org/cli, meltano/meltano, plus the plumber-src self-scan. JSON *Result blocks are 48/56 byte-iso with v0.2.19; the 8 residual diffs are issue-array ordering on three controls where the v0.2.x order came from non-deterministic Go map iteration (content sorted is identical). make build, make test, make lint, govulncheck ./... all green. Refs getplumber#148
…ols, outputs (terminal and json artifacts)
Fixes #148
Summary
Land Phase A of the multi-provider refactor: every legacy Go control is ported to Rego/OPA, a GitHub Actions provider sits next to the GitLab one, and the dev binary keeps the v0.2.x JSON contract intact for downstream consumers.
What changed
Rego/OPA engine + IR (`internal/engine/opa`, `internal/ir`)
Catalog
Scoring (`scoring-v2`)
CLI / output
Legacy JSON output preservation
Validation
JSON `Result` blocks: 48/56 byte-iso with v0.2.19 across the four `getplumber/examples/go-` projects. The 8 residual diffs are issue-array ordering on three controls (image-forbidden, hardcoded-jobs, outdated-includes) where the v0.2.x order came from non-deterministic Go map iteration; content sorted is identical.
`make build`, `make test`, `make lint`, `govulncheck ./...` all green.
Test plan