chore(test): stabilize Playwright OSS tests with ephemeral projects and CI by mmabrouk · Pull Request #3925 · Agenta-AI/agenta

mmabrouk · 2026-03-05T18:38:00Z

Summary

Stabilize all 12 OSS Playwright acceptance tests (10 pass, 2 skip gracefully)
Add ephemeral project isolation — each test run creates a fresh project, runs tests in it, then cleans up. Prevents data accumulation from repeated CI runs.
Add CI workflow that runs Playwright tests after Railway preview deploy on every PR
Add test dimension tags (lens, cost, license) to all tests per testing documentation spec
Update design docs with coverage expansion plan based on analysis of 15 legacy BDD feature files

What changed

Ephemeral projects (`global-setup.ts` / `global-teardown.ts`)

Setup: saves original default project, creates e2e-{timestamp} project with make_default=true
All tests navigate to /apps which auto-redirects to the ephemeral project — zero test changes needed
Teardown: restores original default, deletes ephemeral project
Opt-out via AGENTA_EPHEMERAL_PROJECT=false for local dev
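
The setup and teardown steps above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `ProjectClient`, `EphemeralState`, and both function names are hypothetical stand-ins for whatever API calls `global-setup.ts` and `global-teardown.ts` really make.

```typescript
interface Project {
    id: string
    name: string
    isDefault: boolean
}

// Hypothetical client interface (assumption): the real endpoints are not
// shown in this PR description.
interface ProjectClient {
    list(): Promise<Project[]>
    create(name: string, makeDefault: boolean): Promise<Project>
    setDefault(id: string): Promise<void>
    remove(id: string): Promise<void>
}

// What setup must remember so that teardown can undo it.
interface EphemeralState {
    originalDefaultId: string | null
    ephemeralId: string
}

async function setupEphemeralProject(
    client: ProjectClient,
    now: number,
): Promise<EphemeralState> {
    // Save the current default before replacing it.
    const projects = await client.list()
    const original = projects.find((p) => p.isDefault) ?? null

    // Create e2e-{timestamp} and make it the default in one call.
    const ephemeral = await client.create(`e2e-${now}`, true)

    return {
        originalDefaultId: original ? original.id : null,
        ephemeralId: ephemeral.id,
    }
}

async function teardownEphemeralProject(
    client: ProjectClient,
    state: EphemeralState,
): Promise<void> {
    // Restore the original default first, then delete the ephemeral project.
    if (state.originalDefaultId !== null) {
        await client.setDefault(state.originalDefaultId)
    }
    await client.remove(state.ephemeralId)
}
```

Restoring before deleting mirrors the order stated in the teardown bullet, so the workspace always has a default project during cleanup.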

CI workflow (`.github/workflows/10-playwright-oss-tests.yml`)

Runs after Railway preview deploy completes
Installs Playwright, waits for deployment readiness, runs full suite
Uploads HTML report and failure artifacts
Wired into existing build pipeline: 06 (build) → 07 (deploy) → 10 (test)
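
The "waits for deployment readiness" step is typically a bounded polling loop. The workflow's real implementation is not shown in this description, so the sketch below is an assumption: `waitUntilReady` and its parameters are hypothetical, and the `check` callback stands in for a request to the deployment's health endpoint.

```typescript
// Polls `check` up to `attempts` times, sleeping `delayMs` between probes.
// Returns true as soon as a probe succeeds, false if every probe fails.
async function waitUntilReady(
    check: () => Promise<boolean>,
    attempts: number,
    delayMs: number,
): Promise<boolean> {
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            if (await check()) return true
        } catch {
            // A failed probe (e.g. connection refused) just means "not ready yet".
        }
        if (attempt < attempts) {
            await new Promise<void>((resolve) => setTimeout(resolve, delayMs))
        }
    }
    return false
}
```

Bounding the number of attempts keeps a broken deploy from stalling the job until the CI-level timeout.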

Workflow 07 outputs

Added preview_url output so downstream jobs can use the deploy URL

CI secrets needed

PLAYWRIGHT_OSS_OWNER_EMAIL — OSS admin email for test auth
PLAYWRIGHT_OSS_OWNER_PASSWORD — OSS admin password for test auth

Test results (local, ephemeral project)

Result	Count	Details
Pass	6	smoke, 2x app creation, save prompt, prompt registry, model hub
Skip	2	API keys (needs setup), testsets (no data in fresh project)
Expected fail	4	Playground run + deployment + observability (no OpenAI key in fresh project — Phase 4 mock LLM will fix)

Test plan

Smoke test passes with ephemeral project
Full suite: 6 pass, 2 skip, 4 expected failures
Ephemeral project created and deleted correctly
Original default project restored after teardown
CI YAML validates
Configure PLAYWRIGHT_OSS_OWNER_EMAIL / PLAYWRIGHT_OSS_OWNER_PASSWORD secrets in GitHub
Verify CI workflow triggers after preview deploy

🤖 Generated with Claude Code

Harden Playwright auth/setup and runner behavior for deployed OSS environments, add planning docs, and refresh OSS acceptance tests/selectors to better match current UI and routing patterns.

Fix 10 failing acceptance tests against deployed OSS environment. Root cause: direct URL navigation without workspace prefix caused 404s. All tests now navigate via /apps -> sidebar links. Also adds Gherkin BDD feature files documenting each test scenario with caveats. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…state Clean up all design docs - mark Phase 0 complete, update backlog with completed items, rewrite status to remove verbose session logs, update QA profile with full suite commands, and refresh context/research with discovered patterns. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add ephemeral project per CI run, test independence goals, and parallelization strategy using project-scoped isolation. Run full suite on every PR instead of tiered smoke/nightly approach. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add lens:functional, cost:free/paid, and license:oss tags to all tests per the testing docs spec. Playground run tests are cost:paid (call LLM), all others are cost:free. Also add mock LLM phase to plan. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- global-setup: create ephemeral project per test run (make_default=true) so tests run in isolated project context without data accumulation
- global-teardown: restore original default project, then delete ephemeral
- Add CI workflow (10-playwright-oss-tests.yml) that runs OSS acceptance suite after Railway preview deploy
- Wire Playwright job into existing build/deploy pipeline (workflow 06→07→10)
- Add deploy outputs (preview_url) from workflow 07 for downstream use
- Add test:smoke script alias in package.json
- Update design docs with coverage expansion plan and data seeding strategy

Tested locally: full create→test→restore→delete cycle works. 6 pass, 2 skip, 4 expected failures (no OpenAI key in fresh project).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

vercel · 2026-03-05T18:38:05Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	Mar 5, 2026 8:30pm

devin-ai-integration

Devin Review found 2 potential issues.

View 8 additional findings in Devin Review.

devin-ai-integration · 2026-03-05T18:45:14Z

web/tests/tests/fixtures/base.fixture/apiHelpers/index.ts

+export const getAppById = async (page: Page, appId: string) => {
+    const appsResponse = waitForApiResponse<ListAppsItem[]>(page, {
+        route: "/api/apps",
+        method: "GET",
+    })
+
+    // Trigger the API call by going to apps page if not already there
+    const currentUrl = page.url()
+    if (!currentUrl.includes("/apps")) {
+        await page.goto("/apps", {waitUntil: "domcontentloaded"})
+        await page.waitForURL("**/apps", {waitUntil: "domcontentloaded"})
+    }
+
+    const apps = await appsResponse
+
+    const app = apps.find((app) => app.app_id === appId)
+    if (!app) {
+        console.error(`[App Fixture] App not found with ID: ${appId}`)
+        throw new Error(`App not found with ID: ${appId}`)
+    }
+
+    return app


🔴 getAppById hangs forever when page URL already contains /apps

The new getAppById function sets up waitForApiResponse (which internally calls page.waitForResponse) to listen for a GET /api/apps response on line 87, but then only navigates to /apps if the current URL does not already include /apps (line 94). When the page is already on the apps page, no navigation occurs, no new GET /api/apps request is triggered, and the appsResponse promise on line 99 will never resolve — causing the function to hang until the test timeout with no actionable error message. This is the same API interception race condition pattern this PR explicitly fixes in other files (e.g., getApp, getVariants, getTestsets), but the fix was not applied to the newly added getAppById.

Prompt for agents

In web/tests/tests/fixtures/base.fixture/apiHelpers/index.ts, the getAppById function (lines 86-107) has a race condition: it starts listening for a GET /api/apps response before checking whether navigation is needed, but if the page already contains /apps in its URL, no navigation happens and no new API call fires, so the waitForApiResponse promise hangs forever.

Fix: Always trigger a navigation to /apps regardless of the current URL, similar to how getApp (lines 49-56) does it. Move the waitForApiResponse call to be stored as a non-awaited promise first, then always call page.goto('/apps', ...), then await the promise. For example:

export const getAppById = async (page: Page, appId: string) => {
    const appsResponse = waitForApiResponse<ListAppsItem[]>(page, {
        route: '/api/apps',
        method: 'GET',
    })
    await page.goto('/apps', {waitUntil: 'domcontentloaded'})
    await page.waitForURL('**/apps', {waitUntil: 'domcontentloaded'})
    const apps = await appsResponse
    // ... rest of function
}
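
The "register the listener, always trigger, then await" ordering described above can be exercised without a browser. In the sketch below, `FakePage` is a hypothetical stand-in for Playwright's `Page` combined with the repo's `waitForApiResponse` helper; only the ordering of the three steps is meant to carry over to the real fixture.

```typescript
interface AppItem {
    app_id: string
    app_name: string
}

// Hypothetical fake (assumption): navigating to /apps always fires one
// GET /api/apps, which resolves every listener registered before it.
class FakePage {
    private listeners: Array<(apps: AppItem[]) => void> = []
    private currentUrl = "about:blank"
    private apps: AppItem[]

    constructor(apps: AppItem[]) {
        this.apps = apps
    }

    url(): string {
        return this.currentUrl
    }

    // Resolves with the next /api/apps payload observed after this call.
    waitForAppsResponse(): Promise<AppItem[]> {
        return new Promise((resolve) => this.listeners.push(resolve))
    }

    async goto(path: string): Promise<void> {
        this.currentUrl = `http://localhost${path}`
        if (path === "/apps") {
            const pending = this.listeners
            this.listeners = []
            pending.forEach((resolve) => resolve(this.apps))
        }
    }
}

async function getAppByIdAlwaysNavigate(page: FakePage, appId: string): Promise<AppItem> {
    // 1. Register the listener first so the response cannot be missed.
    const appsResponse = page.waitForAppsResponse()
    // 2. Always navigate, even when the URL already contains /apps,
    //    so that a fresh request is guaranteed to fire.
    await page.goto("/apps")
    // 3. Only now await the listener.
    const apps = await appsResponse

    const app = apps.find((a) => a.app_id === appId)
    if (!app) {
        throw new Error(`App not found with ID: ${appId}`)
    }
    return app
}
```

With the conditional navigation from the original diff, step 2 would be skipped whenever `page.url()` already included `/apps`, and step 3 would then wait on a response that never arrives.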


docs/design/playwright-oss-stabilization/status.md

github-actions · 2026-03-05T18:48:46Z

Railway Preview Environment


Preview URL	https://gateway-production-683c.up.railway.app/w
Project	`agenta-oss-pr-3925`
Image tag	`pr-3925-b9c4474`
Status	Deployed
Railway logs	Open logs
Workflow logs	View workflow run
Updated at 2026-03-05T20:36:56.564Z

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ility web/tests is part of the pnpm workspace and has no package-lock.json, causing `npm ci` to fail. Use pnpm with --filter to install only the test package and its dependencies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

web/tests has no workspace: dependencies — its package.json is fully self-contained. Using npm install avoids both the missing lockfile issue (npm ci) and the patchedDependencies mismatch (pnpm frozen). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

npm install fails because the parent web/package.json has workspace: protocol deps that npm can't resolve. pnpm handles this natively. Using --no-frozen-lockfile to avoid patchedDependencies mismatch on feature branches. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

devin-ai-integration

Devin Review found 2 new potential issues.

View 12 additional findings in Devin Review.

devin-ai-integration · 2026-03-05T19:28:26Z

web/oss/tests/playwright/acceptance/app/test.ts

+            const createAppPromise = page.waitForResponse((response) => {
+                if (
+                    !response.url().includes("/apps") ||
+                    response.request().method() !== "POST"
+                ) {
+                    return false
+                }
+
+                const payload = response.request().postData() || ""
+                return payload.includes(appName)
            })


🟡 App creation response listener set up after appType click can miss fast POST responses

In web/oss/tests/playwright/acceptance/app/test.ts:59-69, the page.waitForResponse listener for the POST to /apps is registered after the app name is typed and the app type is clicked. If clicking the app type ever triggered an early POST request (e.g., via an auto-submit or debounce), the listener would miss it. For the current flow the ordering is correct: the listener is registered at line 59 and the click that actually triggers the POST (clickButton) happens at line 70. The remaining concern is the matcher itself: response.url().includes("/apps") on line 61 is very broad. The method !== "POST" check on line 62 filters out GET requests to /apps that happen during UI interactions (e.g., app list fetches), but the URL match could still accept /api/apps/{id}/variants or similar URLs containing /apps.

Suggested change

-            const createAppPromise = page.waitForResponse((response) => {
-                if (
-                    !response.url().includes("/apps") ||
-                    response.request().method() !== "POST"
-                ) {
-                    return false
-                }
-
-                const payload = response.request().postData() || ""
-                return payload.includes(appName)
-            })
+            const createAppPromise = page.waitForResponse((response) => {
+                if (
+                    !response.url().includes("/api/apps") ||
+                    response.request().method() !== "POST"
+                ) {
+                    return false
+                }
+
+                // Exclude URLs like /api/apps/{id}/variants
+                const url = new URL(response.url())
+                if (/\/api\/apps\/.+/.test(url.pathname)) {
+                    return false
+                }
+
+                const payload = response.request().postData() || ""
+                return payload.includes(appName)
+            })
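
One way to make a matcher like this easier to reason about is to pull it out into a pure predicate that can be checked without a browser. The sketch below is a variation on the suggestion rather than the PR's code: `isCreateAppResponse` is a hypothetical name, and it assumes (as the suggestion does) that app creation is a POST to the `/api/apps` collection endpoint.

```typescript
// Returns true only for a POST to the /api/apps collection endpoint whose
// request body mentions the app being created.
function isCreateAppResponse(
    url: string,
    method: string,
    postData: string | null,
    appName: string,
): boolean {
    if (method !== "POST") return false

    // Match on the pathname only, so query strings do not interfere, and
    // require the path to end at /api/apps (optionally with a trailing slash).
    // Nested routes such as /api/apps/{id}/variants are rejected.
    const {pathname} = new URL(url)
    if (!/\/api\/apps\/?$/.test(pathname)) return false

    return (postData ?? "").includes(appName)
}
```

Inside a test this could be called from the `page.waitForResponse` callback with `response.url()`, `response.request().method()`, and `response.request().postData()`.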


devin-ai-integration · 2026-03-05T19:28:28Z

web/tests/playwright/global-setup.ts

+    const ephemeralEnabled =
+        String(process.env.AGENTA_EPHEMERAL_PROJECT ?? "true").toLowerCase() !== "false"


🟡 Ephemeral project enabled by default may break tests relying on pre-existing data

The AGENTA_EPHEMERAL_PROJECT defaults to true at web/tests/playwright/global-setup.ts:332. This means every test run creates a fresh empty project and sets it as default. However, several tests depend on pre-existing data: the deployment test (deployment/index.ts) requires a completion app with variants, the observability test requires prior traces, and the playground tests call apiHelpers.getApp("completion") which navigates to /apps expecting apps to exist. In a fresh ephemeral project, none of these will exist, causing test failures. The CI workflow at .github/workflows/10-playwright-oss-tests.yml:36 explicitly sets AGENTA_EPHEMERAL_PROJECT: "true", so the tests that depend on app creation running first (in sequential order) will work — but only if app creation runs in the same project context. Since the app creation test creates apps by navigating to /apps which auto-redirects to the ephemeral project, this should work sequentially. The real issue is that for non-CI runs (local runs), the ephemeral project is silently enabled by default and will break if users don't expect it.
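
To make the default-on behavior concrete, the flag expression from the diff can be wrapped in a pure function and probed with a few inputs. `isEphemeralEnabled` is a hypothetical name used only for illustration; the expression inside it is the one quoted above.

```typescript
// Same expression as in global-setup.ts, with the environment value passed in
// as a parameter so the behavior is easy to check.
function isEphemeralEnabled(raw: string | undefined): boolean {
    return String(raw ?? "true").toLowerCase() !== "false"
}

// Sample inputs and the resulting decisions.
const cases: Array<[string | undefined, boolean]> = [
    [undefined, true], // unset: enabled by default
    ["false", false], // explicit opt-out
    ["FALSE", false], // comparison is case-insensitive
    ["0", true], // only the literal "false" disables the feature
    ["", true], // empty string is not nullish, so it stays enabled
]
const allMatch = cases.every(([raw, expected]) => isEphemeralEnabled(raw) === expected)
```

Note that values such as `0` or `no` do not opt out; only `false` (in any letter case) does.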


The deploy workflow outputs URLs like https://host/w but the API health endpoint is at /api/health (no /w prefix). Strip the path suffix to construct the correct health check URL. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When AGENTA_WEB_URL includes a subpath (e.g. /w), the API URL was incorrectly constructed as /w/api instead of /api. Extract the origin from the URL to always target the correct API endpoint. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
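
The origin-extraction approach described in the two commit messages above can be sketched with the standard `URL` class. The function names `apiBaseUrl` and `healthUrl` are hypothetical; the `/api` and `/api/health` paths are the ones named in the commit messages.

```typescript
// Drops any subpath (such as /w) and returns the API root on the same origin.
function apiBaseUrl(webUrl: string): string {
    return `${new URL(webUrl).origin}/api`
}

// Builds the health check URL from the same origin.
function healthUrl(webUrl: string): string {
    return `${apiBaseUrl(webUrl)}/health`
}
```

For a deploy URL of `https://host/w`, `apiBaseUrl` yields `https://host/api` rather than `https://host/w/api`, and `healthUrl` yields `https://host/api/health`.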

mmabrouk and others added 6 commits March 5, 2026 10:29

chore(frontend): stabilize playwright OSS deployment workflow

a05cad1


dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. ci/cd tests labels Mar 5, 2026


docs: replace hardcoded URL with placeholder in status.md

1a491f5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mmabrouk force-pushed the chore/playwright-oss-stabilization-wip branch from cf7e41a to 1a491f5 Compare March 5, 2026 18:56


mmabrouk wants to merge 12 commits into main from chore/playwright-oss-stabilization-wip
