
feat(hosting): add Helm chart for Agenta OSS Kubernetes deployment#3852

Open
endoze wants to merge 7 commits into Agenta-AI:main from endoze:feat-add-helm-chart-for-agenta-oss

Conversation

Contributor

@endoze endoze commented Feb 26, 2026

Enable self-hosted Kubernetes deployments as an alternative to docker-compose. The chart packages all Agenta OSS components (API, web, services, workers, cron, Redis, SuperTokens, PostgreSQL) with Bitnami PostgreSQL as a subchart dependency, Alembic migrations as a pre-install/pre-upgrade hook, and an optional ingress resource. Includes a CI workflow to publish the chart to GHCR on changes.



@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Feb 26, 2026
@CLAassistant

CLAassistant commented Feb 26, 2026

CLA assistant check
All committers have signed the CLA.

@vercel

vercel bot commented Feb 26, 2026

@endoze is attempting to deploy a commit to the agenta projects Team on Vercel.

A member of the Team first needs to authorize it.

devin-ai-integration[bot]

This comment was marked as resolved.

Member

@mmabrouk mmabrouk left a comment


Thank you for putting this together. This is a solid Helm chart that correctly models our docker-compose architecture. The dual Redis setup, the existingSecret pattern, and the external database support are all done well. I appreciate the comprehensive documentation updates too.

I reviewed the chart against Helm community best practices and tested it locally. Below are my findings.


What I Did

I compared this PR against our current docker-compose infrastructure and an older Helm chart attempt (PR #2775). I also ran the chart through multiple validation layers: helm lint, helm template with dry-run, and a full install in a Kind cluster.
The lint and template steps passed. The cluster install revealed two issues that block deployment.


Critical Issues

1. Helm hook ordering causes a deadlock
The Alembic job uses pre-install,pre-upgrade hooks. However, it depends on PostgreSQL, which is part of the main release. Helm runs hooks before the main release. This means Alembic waits for a PostgreSQL that does not exist yet.
The install times out after 10 minutes with the init container stuck waiting.

To fix this, the agent suggests the following:

Change the hook to post-install,post-upgrade in templates/alembic-job.yaml:

```yaml
annotations:
  helm.sh/hook: post-install,post-upgrade
  helm.sh/hook-weight: "0"
```

2. PostgreSQL image tag does not exist
The bundled Bitnami PostgreSQL subchart (v16.4.16) defaults to an image tag that has been removed from Docker Hub:

```
Failed to pull image "docker.io/bitnami/postgresql:17.4.0-debian-12-r4": not found
```
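One way to reduce exposure to Bitnami's tag cleanup is to pin the subchart image explicitly in values.yaml. A sketch of what that could look like (the key layout follows the Bitnami chart's image block; the tag below is illustrative and should be verified against what Docker Hub currently publishes):

```yaml
# Hypothetical pin for the Bitnami PostgreSQL subchart image.
# Verify the tag is still published before using it.
postgresql:
  image:
    registry: docker.io
    repository: bitnami/postgresql
    tag: "17.4.0-debian-12-r10"  # example tag, not verified
```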

Best Practice Improvements

These are not blockers, but they would strengthen the chart.

Security contexts. The chart defaults to empty security contexts, which means containers run as root. The Helm community recommends setting secure defaults:

```yaml
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: [ALL]
  seccompProfile:
    type: RuntimeDefault
```

Image tags default to latest. This makes deployments unpredictable. Consider defaulting to .Chart.AppVersion instead:

```yaml
{{ .Values.api.image.tag | default .Chart.AppVersion }}
```

No values.schema.json. A JSON Schema catches misconfiguration at install time rather than at runtime. This is especially helpful for required fields like secrets.agentaAuthKey.
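A minimal values.schema.json covering the required secret might look like this (field names follow the values referenced in this review; treat it as a starting sketch rather than a complete schema):

```json
{
  "$schema": "https://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["secrets"],
  "properties": {
    "secrets": {
      "type": "object",
      "required": ["agentaAuthKey"],
      "properties": {
        "agentaAuthKey": {
          "type": "string",
          "minLength": 1,
          "description": "Required auth key for the Agenta API"
        }
      }
    }
  }
}
```

Helm validates values against this file automatically during install, upgrade, lint, and template.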

PostgreSQL password sync. Users must set both secrets.postgresPassword and postgresql.auth.password to the same value. If they mismatch, the app cannot connect. Consider wiring the subchart to use the chart-managed secret via postgresql.auth.existingSecret.
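A sketch of that wiring, assuming the chart-managed secret stores the password under a `postgres-password` key (key names here follow the Bitnami subchart's auth block; the secret name is illustrative):

```yaml
postgresql:
  auth:
    # Point the subchart at the chart-managed secret instead of
    # duplicating the password in postgresql.auth.password.
    existingSecret: agenta-pgauth  # illustrative secret name
    secretKeys:
      adminPasswordKey: postgres-password
```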

No startup probes. The deployments have liveness and readiness probes, but no startup probes. If the API takes longer than 30 seconds to start, Kubernetes will kill it. Startup probes give slow-starting containers more time.
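A startup probe sketch for the API deployment (the endpoint path, port name, and timings are assumptions to be tuned against the real service):

```yaml
startupProbe:
  httpGet:
    path: /health  # assumed health endpoint
    port: http
  # Allow up to 30 x 5s = 150s for startup; liveness/readiness
  # probes only begin once the startup probe succeeds.
  failureThreshold: 30
  periodSeconds: 5
```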

Empty resource defaults. All components default to resources: {}. This means pods get "BestEffort" QoS class and are first to be evicted under memory pressure. Consider adding suggested defaults or a production values example.
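For example, a suggested default that gives pods Burstable QoS (numbers are illustrative and should be tuned per component):

```yaml
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    memory: 1Gi  # omitting a CPU limit avoids throttling
```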

Missing .helmignore. Without this file, the packaged chart may include unnecessary files.

No lint step in CI. The GitHub Actions workflow packages and pushes to GHCR, but it does not run helm lint or ct lint. Adding these steps would catch issues before publishing.
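A lint step could run ahead of the package/push steps, along these lines (the action version and chart path are assumptions based on this PR's layout):

```yaml
- name: Set up Helm
  uses: azure/setup-helm@v4

- name: Lint chart
  run: helm lint hosting/helm/agenta-oss
```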

@endoze endoze force-pushed the feat-add-helm-chart-for-agenta-oss branch from 4a1e166 to e499e94 on February 27, 2026 at 15:53
@endoze
Contributor Author

endoze commented Feb 27, 2026

Thank you very much for such quick feedback on my contribution!

I've updated my commit to address your feedback. One major thing to note: I swapped the Bitnami PostgreSQL chart for a newer version, which will also deploy a newer version of PostgreSQL. My cursory look through the codebase led me to think this is a safe upgrade, but I'm curious about your thoughts on it. As for why I upgraded it: Bitnami only keeps so many old tags around before cleaning them up, so I chose a much newer version to prolong its viability. I can adjust this as needed, however, to use the new chart version while defaulting to a specific version of PostgreSQL that meets the project's database needs.

Let me know if you find any other issues and I'll do my best to address them.

devin-ai-integration[bot]

This comment was marked as resolved.

@endoze endoze force-pushed the feat-add-helm-chart-for-agenta-oss branch from e499e94 to ee80414 on February 28, 2026 at 03:12
devin-ai-integration[bot]

This comment was marked as resolved.

@endoze endoze force-pushed the feat-add-helm-chart-for-agenta-oss branch from ee80414 to 9cf74d2 on February 28, 2026 at 16:09
@mmabrouk
Member

mmabrouk commented Mar 1, 2026

Follow-up: Full Cluster Testing Results

I deployed the chart on a k3s cluster (v1.33) and tested end-to-end in a browser. Thank you for addressing all the points from my first review -- the post-install hook, PostgreSQL upgrade, security contexts, startup probes, values schema, lint CI step, and shared PostgreSQL secret all look good.

The chart works, but I found three bugs and one documentation gap during testing. Details below.


Bugs Found

1. appVersion is missing the v prefix (image pull fails)

Chart.yaml has appVersion: "0.86.8", but the GHCR images are tagged v0.86.8 (with the v). A default install without explicit image tag overrides will fail with ImagePullBackOff because the tag 0.86.8 does not exist.

Fix: change appVersion: "0.86.8" to appVersion: "v0.86.8" in Chart.yaml.
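For context on why the prefix matters: with the tag-defaulting pattern suggested in the first review, the rendered image reference comes straight from appVersion, so a default install resolves to something like (the repository shown is illustrative):

```yaml
# {{ .Values.api.image.tag | default .Chart.AppVersion }} renders as:
image: ghcr.io/agenta-ai/agenta-api:0.86.8   # tag not published on GHCR
# with appVersion: "v0.86.8" it becomes:
image: ghcr.io/agenta-ai/agenta-api:v0.86.8  # matches the published tags
```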

2. Web container is unreachable (Next.js binding)

Next.js 15 defaults to binding on the pod hostname, not 0.0.0.0. Health probes and ingress traffic connect via localhost or the pod IP, so they cannot reach the web server. All readiness/liveness probes fail and the pod enters CrashLoopBackOff.

Fix: add HOSTNAME=0.0.0.0 to the web deployment env vars in templates/web-deployment.yaml:

```yaml
env:
  - name: HOSTNAME
    value: "0.0.0.0"
```

3. runAsNonRoot: true crashes all pods

The security contexts set runAsNonRoot: true, but our Docker images currently run as root (USER is not set in the Dockerfiles). Every pod fails immediately with a security context violation.

Short-term fix: change the default to runAsNonRoot: false in values.yaml for all components.

Long-term: we need to update our Dockerfiles to run as non-root (tracked in #3868). Once that ships, the chart can flip back to true.


Nginx Ingress: Paths Need Regex Capture Groups

The ingress template uses plain paths (/api, /services, /) with pathType: Prefix. This works with Traefik's StripPrefix middleware, but not with nginx's rewrite-target annotation.

For nginx, the rewrite-target: /$1 annotation requires regex capture groups in the paths. Without them, $1 is empty and everything rewrites to /, causing a redirect loop on the web frontend.

The docs correctly tell nginx users to set rewrite-target and use-regex annotations, but the chart's hardcoded paths won't work with those annotations. Users would need to manually patch the ingress paths to:

```
/api/(.*)
/services/(.*)
/(.*)
```

with pathType: ImplementationSpecific.

This is tricky to fix in the template, since Traefik needs Prefix paths and nginx needs ImplementationSpecific regex paths. One option: add an ingress.pathOverrides value, or detect the className and switch path styles. Or just document it clearly for now and fix it in a follow-up.


Testing Summary

| Test | Result |
|------|--------|
| helm lint | Pass |
| helm template --dry-run | Pass |
| Cluster install (all 11 pods) | Pass (with the three fixes above) |
| helm test | Pass |
| Migration job (Alembic) | Completed successfully |
| Web UI in browser | Works (login, navigation) |
| API health | `{"status":"ok"}` |
| Services health | 200 OK |

Cluster: k3s v1.33 on Hetzner, nginx ingress controller, all images pulled with tag: latest.


I pushed a commit with all three fixes plus documentation improvements to your branch.

@mmabrouk
Member

mmabrouk commented Mar 1, 2026

Follow-up: Configurable ingress paths for NGINX support

Pushed a second commit (1c80b77) that makes ingress paths configurable via values.yaml.

Problem

The ingress template hardcoded Prefix paths (/api, /services, /). This works with Traefik's StripPrefix middleware, but NGINX Ingress Controller needs regex capture groups in the paths for rewrite-target to work. Without them, $1 is empty and the web frontend gets stuck in a redirect loop.

Fix

Added ingress.paths.{api,services,web} to values, each with path and pathType. Defaults are unchanged (Prefix), so Traefik setups are not affected.

NGINX users override like this:

```yaml
ingress:
  className: "nginx"
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$1
  paths:
    api:
      path: /api/(.*)
      pathType: ImplementationSpecific
    services:
      path: /services/(.*)
      pathType: ImplementationSpecific
    web:
      path: /(.*)
      pathType: ImplementationSpecific
```

Verified

Upgraded the chart on the test cluster (k3s + NGINX Ingress Controller) with the new path overrides. All routes work:

  • Web: 200 (follows redirect from / to /w)
  • API: {"status":"ok"}
  • Services: 200

Also updated values.schema.json and the Kubernetes deployment docs with the new fields and a complete NGINX example.

devin-ai-integration[bot]

This comment was marked as resolved.

@endoze
Contributor Author

endoze commented Mar 2, 2026

@mmabrouk Do you want me to address Devin-ai's latest comment (which does indeed seem to identify a real logical hole) or did you want to? Happy to do so but I don't want to step on anyone's toes 😄

@mmabrouk
Member

mmabrouk commented Mar 2, 2026

@endoze I'd be thankful if you did :)

endoze and others added 4 commits March 2, 2026 21:39
Enable self-hosted Kubernetes deployments as an alternative to
docker-compose. The chart packages all Agenta OSS components (API, web,
services, workers, cron, Redis, SuperTokens, PostgreSQL) with Bitnami
PostgreSQL as a subchart dependency, Alembic migrations as a
pre-install/pre-upgrade hook, and an optional ingress resource. Includes
a CI workflow to publish the chart to GHCR on changes.

- Fix appVersion to use v-prefixed tag (v0.86.8) matching GHCR images
- Add HOSTNAME=0.0.0.0 to web deployment so Next.js binds to all interfaces
- Change runAsNonRoot default to false (images currently run as root)
- Document PostgreSQL secret name dependency on release name
- Document ingress className default (traefik) with override instructions

The ingress template previously hardcoded Prefix paths which only work
with Traefik. NGINX Ingress Controller requires regex capture groups
and ImplementationSpecific pathType for rewrite-target to work.

Add ingress.paths.{api,services,web} to values.yaml so users can
override path patterns and pathType per backend. Defaults remain
Prefix (backward compatible with Traefik). Update docs with the
full nginx configuration including path overrides.

When users provide a pre-created Kubernetes Secret via
secrets.existingSecret, the Bitnami PostgreSQL subchart silently
fails to find the password unless global.postgresql.auth.existingSecret
is also pointed at the same secret. This adds a fail-fast validation
template and clearer NOTES.txt guidance so users get an actionable
error at install time instead of a broken deployment.
@endoze endoze force-pushed the feat-add-helm-chart-for-agenta-oss branch from 3f1dd6e to 26291b7 on March 3, 2026 at 02:40
@endoze
Contributor Author

endoze commented Mar 3, 2026

@mmabrouk I've rebased the branch off the latest from main as well as addressed the last bit of feedback from Devin-ai's review. Let me know if you need anything else on this branch.

@vercel

vercel bot commented Mar 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---------|------------|---------|---------------|
| agenta-documentation | Ready | Preview, Comment | Mar 3, 2026 11:55am |

@mmabrouk
Member

mmabrouk commented Mar 3, 2026

@all-contributors please add @endoze for infrastructure and docs and infra

@allcontributors
Contributor

@mmabrouk

I've put up a pull request to add @endoze! 🎉

Member

@mmabrouk mmabrouk left a comment


Many thanks @endoze this looks great!

@jp-agenta lgtm from my side, I did a final test locally on k3s and it all worked fine.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Mar 3, 2026
@mmabrouk
Member

mmabrouk commented Mar 3, 2026

Hey @endoze, feel free to share your LinkedIn or Twitter if you would like to be mentioned in a post when we merge this

@endoze
Contributor Author

endoze commented Mar 4, 2026

@mmabrouk Just my GitHub if you'd like. I also sent over a pull request to handle running the containers as non-root, as a complement to this one. #3899 should enable hardening the defaults in the helm chart.

@jp-agenta
Member

jp-agenta commented Mar 6, 2026

Hey @endoze,

Thank you for contributing, and specifically for this PR. 🚀

Only two open issues (caught by our agents) remain before we can merge, leaving aside the root/non-root work.

1. postgresql-auth-secret.yaml is missing hook annotations

It looks like the bridge secret for the Bitnami subchart has no Helm hook lifecycle annotations.

secrets.yaml is a pre-install,pre-upgrade hook with helm.sh/resource-policy: keep.
postgresql-auth-secret.yaml has none of these annotations. It is a plain resource managed in the default Helm sync wave.

Two failure modes. First, Helm does not guarantee resource ordering within the same sync wave. The Bitnami PostgreSQL StatefulSet may attempt to start before this secret exists, causing the init container to fail to read POSTGRES_PASSWORD. The pod enters CrashLoopBackOff until the secret appears. Second, helm uninstall deletes this secret (since it has no resource-policy: keep), but the main secret survives (it does have keep). A subsequent helm install fails because the main secret still exists in the namespace while the pgauth secret is gone and the Bitnami subchart cannot read credentials. The user must manually delete the orphaned main secret or recreate the pgauth secret before re-installing.

Suggestion: Add the same hook annotations as secrets.yaml:

```yaml
annotations:
  helm.sh/hook: pre-install,pre-upgrade
  helm.sh/hook-weight: "-5"
  helm.sh/resource-policy: keep
```

This ensures the pgauth secret is created before any release resources and survives helm uninstall alongside the main secret.

2. Release name mismatch with pgauth secret default

The default global.postgresql.auth.existingSecret value only works for release name agenta

global.postgresql.auth.existingSecret defaults to the hardcoded string agenta-pgauth. The Bitnami PostgreSQL subchart reads this value as a plain string at template render time and cannot evaluate Helm template expressions. The values.yaml comments explain this and advise users with non-default release names to override the value. The existing _validations.tpl only validates the existingSecret case, not the release name mismatch.

A user who runs helm install myrelease hosting/helm/agenta-oss ... gets a chart where every resource is named myrelease-agenta-oss-* except the pgauth secret, which is still named agenta-pgauth. This works (both sides agree on the name) but creates a naming inconsistency that is confusing when inspecting resources. More importantly, two releases in the same namespace would collide on the same agenta-pgauth secret name.

Suggestion: Add a validation in _validations.tpl that detects when the release name would cause a fullname different from agenta-agenta-oss while the pgauth secret still has its default value:

```yaml
{{- if and .Values.postgresql.enabled
          (not .Values.secrets.existingSecret)
          (ne (include "agenta.fullname" .) "agenta-agenta-oss")
          (eq .Values.global.postgresql.auth.existingSecret "agenta-pgauth") }}
{{- fail "..." }}
{{- end }}
```

This catches the mismatch at helm install time rather than producing a silent naming inconsistency.

@jp-agenta jp-agenta added runtime/kubernetes Kubernetes runtime maintenance/community Primarily under community maintenance support/experimental Experimental or best-effort repository surface and removed runtime/kubernetes Kubernetes runtime labels Mar 6, 2026
@jp-agenta jp-agenta moved this from Todo to In Progress in Kubernetes (community) Mar 6, 2026
Contributor

@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 1 new potential issue.

View 22 additional findings in Devin Review.


Comment on lines +63 to +74
```bash
helm install agenta hosting/helm/agenta-oss \
--namespace agenta --create-namespace \
--set secrets.agentaAuthKey=$AG_AUTH_KEY \
--set secrets.agentaCryptKey=$AG_CRYPT_KEY \
--set secrets.postgresPassword=$PG_PASS \
--set postgresql.auth.password=$PG_PASS
```

:::info
`secrets.postgresPassword` and `postgresql.auth.password` must match. The first is used by the application pods; the second is used by the Bitnami PostgreSQL subchart to set the database password.
:::
Contributor


🟡 Documentation instructs setting postgresql.auth.password which is silently ignored

The quick start install command includes --set postgresql.auth.password=$PG_PASS and states that secrets.postgresPassword and postgresql.auth.password must match because "the second is used by the Bitnami PostgreSQL subchart to set the database password." This is factually incorrect. The chart's values.yaml:26 sets global.postgresql.auth.existingSecret: "agenta-pgauth", which causes the Bitnami subchart to read the password exclusively from the pgauth secret (created by hosting/helm/agenta-oss/templates/postgresql-auth-secret.yaml from secrets.postgresPassword). When existingSecret is configured, the Bitnami chart completely ignores auth.password. This means --set postgresql.auth.password=$PG_PASS is a no-op.

If a user later does helm upgrade and only changes postgresql.auth.password (believing it updates the database password), the actual password remains unchanged. Or if a user sets them to different values, the database uses secrets.postgresPassword while the user believes postgresql.auth.password controls it, causing operational confusion during debugging.

Suggested change
```bash
helm install agenta hosting/helm/agenta-oss \
--namespace agenta --create-namespace \
--set secrets.agentaAuthKey=$AG_AUTH_KEY \
--set secrets.agentaCryptKey=$AG_CRYPT_KEY \
--set secrets.postgresPassword=$PG_PASS \
--set postgresql.auth.password=$PG_PASS
```
:::info
`secrets.postgresPassword` and `postgresql.auth.password` must match. The first is used by the application pods; the second is used by the Bitnami PostgreSQL subchart to set the database password.
:::
```bash
helm install agenta hosting/helm/agenta-oss \
--namespace agenta --create-namespace \
--set secrets.agentaAuthKey=$AG_AUTH_KEY \
--set secrets.agentaCryptKey=$AG_CRYPT_KEY \
--set secrets.postgresPassword=$PG_PASS
```

:::info
`secrets.postgresPassword` is used both by the application pods and by the Bitnami PostgreSQL subchart (via the chart-managed pgauth secret).
:::



Member

@jp-agenta jp-agenta Mar 6, 2026


@endoze ☝️

@jp-agenta
Member

On another note,

You prompted us to start community projects, and Kubernetes is the first one: Kubernetes (community).

Thank you @endoze !
