chore(test): stabilize Playwright OSS tests with ephemeral projects and CI #3925

Open
mmabrouk wants to merge 12 commits into main from chore/playwright-oss-stabilization-wip

@mmabrouk mmabrouk commented Mar 5, 2026

Summary

  • Stabilize all 12 OSS Playwright acceptance tests (10 pass, 2 skip gracefully)
  • Add ephemeral project isolation — each test run creates a fresh project, runs tests in it, then cleans up. Prevents data accumulation from repeated CI runs.
  • Add CI workflow that runs Playwright tests after Railway preview deploy on every PR
  • Add test dimension tags (lens, cost, license) to all tests per testing documentation spec
  • Update design docs with coverage expansion plan based on analysis of 15 legacy BDD feature files

What changed

Ephemeral projects (global-setup.ts / global-teardown.ts)

  • Setup: saves original default project, creates e2e-{timestamp} project with make_default=true
  • All tests navigate to /apps which auto-redirects to the ephemeral project — zero test changes needed
  • Teardown: restores original default, deletes ephemeral project
  • Opt-out via AGENTA_EPHEMERAL_PROJECT=false for local dev
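The setup/teardown cycle described above could look roughly like the following. This is a minimal sketch, not the code in global-setup.ts / global-teardown.ts: the ProjectClient interface and every method name on it are hypothetical stand-ins for whatever API calls the real scripts make.

```typescript
// Hypothetical client interface (an assumption for illustration only);
// the real setup/teardown scripts call the Agenta API in their own way.
interface ProjectClient {
    getDefaultProjectId(): Promise<string>
    createProject(name: string, makeDefault: boolean): Promise<string>
    setDefaultProject(id: string): Promise<void>
    deleteProject(id: string): Promise<void>
}

interface EphemeralState {
    originalDefaultId: string
    ephemeralId: string
}

// Setup: remember the current default, then create e2e-{timestamp} as the new default.
async function setupEphemeralProject(
    client: ProjectClient,
    now: number = Date.now(),
): Promise<EphemeralState> {
    const originalDefaultId = await client.getDefaultProjectId()
    const ephemeralId = await client.createProject(`e2e-${now}`, true)
    return {originalDefaultId, ephemeralId}
}

// Teardown: restore the original default first, then delete the ephemeral project,
// so the default never points at a project that no longer exists.
async function teardownEphemeralProject(
    client: ProjectClient,
    state: EphemeralState,
): Promise<void> {
    await client.setDefaultProject(state.originalDefaultId)
    await client.deleteProject(state.ephemeralId)
}
```

Passing the client in as a parameter keeps the lifecycle logic testable with an in-memory fake, independent of a deployed environment.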

CI workflow (.github/workflows/10-playwright-oss-tests.yml)

  • Runs after Railway preview deploy completes
  • Installs Playwright, waits for deployment readiness, runs full suite
  • Uploads HTML report and failure artifacts
  • Wired into existing build pipeline: 06 (build) → 07 (deploy) → 10 (test)
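The "waits for deployment readiness" step amounts to polling until the deployment answers. The workflow itself may well do this in shell; the TypeScript below is only a sketch of the idea, and waitUntilReady, its parameters, and its defaults are all hypothetical names and values chosen for illustration.

```typescript
// Poll a readiness probe until it reports true or the attempts run out.
// Returns the attempt number on which the probe succeeded.
async function waitUntilReady(
    probe: () => Promise<boolean>,
    maxAttempts: number = 30,
    delayMs: number = 10_000,
): Promise<number> {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        // A probe that throws (e.g. connection refused) counts as "not ready yet".
        const ready = await probe().catch(() => false)
        if (ready) return attempt
        // Sleep between attempts, but not after the final one.
        if (attempt < maxAttempts) {
            await new Promise((resolve) => setTimeout(resolve, delayMs))
        }
    }
    throw new Error(`Deployment not ready after ${maxAttempts} attempts`)
}
```

A probe for this use case would typically fetch the deployment's health endpoint and return whether the response status was OK.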

Workflow 07 outputs

  • Added preview_url output so downstream jobs can use the deploy URL

CI secrets needed

  • PLAYWRIGHT_OSS_OWNER_EMAIL — OSS admin email for test auth
  • PLAYWRIGHT_OSS_OWNER_PASSWORD — OSS admin password for test auth

Test results (local, ephemeral project)

| Result | Count | Details |
| --- | --- | --- |
| Pass | 6 | smoke, 2x app creation, save prompt, prompt registry, model hub |
| Skip | 2 | API keys (needs setup), testsets (no data in fresh project) |
| Expected fail | 4 | Playground run + deployment + observability (no OpenAI key in fresh project; Phase 4 mock LLM will fix) |

Test plan

  • Smoke test passes with ephemeral project
  • Full suite: 6 pass, 2 skip, 4 expected failures
  • Ephemeral project created and deleted correctly
  • Original default project restored after teardown
  • CI YAML validates
  • Configure PLAYWRIGHT_OSS_OWNER_EMAIL / PLAYWRIGHT_OSS_OWNER_PASSWORD secrets in GitHub
  • Verify CI workflow triggers after preview deploy

🤖 Generated with Claude Code

mmabrouk and others added 6 commits March 5, 2026 10:29
Harden Playwright auth/setup and runner behavior for deployed OSS environments, add planning docs, and refresh OSS acceptance tests/selectors to better match current UI and routing patterns.
Fix 10 failing acceptance tests against deployed OSS environment.
Root cause: direct URL navigation without workspace prefix caused 404s.
All tests now navigate via /apps -> sidebar links. Also adds Gherkin
BDD feature files documenting each test scenario with caveats.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…state

Clean up all design docs - mark Phase 0 complete, update backlog with
completed items, rewrite status to remove verbose session logs, update
QA profile with full suite commands, and refresh context/research with
discovered patterns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ephemeral project per CI run, test independence goals, and
parallelization strategy using project-scoped isolation. Run full
suite on every PR instead of tiered smoke/nightly approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add lens:functional, cost:free/paid, and license:oss tags to all
tests per the testing docs spec. Playground run tests are cost:paid
(call LLM), all others are cost:free. Also add mock LLM phase to plan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- global-setup: create ephemeral project per test run (make_default=true)
  so tests run in isolated project context without data accumulation
- global-teardown: restore original default project, then delete ephemeral
- Add CI workflow (10-playwright-oss-tests.yml) that runs OSS acceptance
  suite after Railway preview deploy
- Wire Playwright job into existing build/deploy pipeline (workflow 06→07→10)
- Add deploy outputs (preview_url) from workflow 07 for downstream use
- Add test:smoke script alias in package.json
- Update design docs with coverage expansion plan and data seeding strategy

Tested locally: full create→test→restore→delete cycle works.
6 pass, 2 skip, 4 expected failures (no OpenAI key in fresh project).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
vercel bot commented Mar 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agenta-documentation | Ready | Preview, Comment | Mar 5, 2026 8:30pm |

dosubot bot added the size:XXL (this PR changes 1000+ lines, ignoring generated files), ci/cd, and tests labels on Mar 5, 2026
@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 potential issues.

View 8 additional findings in Devin Review.

Comment on lines +86 to +107
export const getAppById = async (page: Page, appId: string) => {
    const appsResponse = waitForApiResponse<ListAppsItem[]>(page, {
        route: "/api/apps",
        method: "GET",
    })

    // Trigger the API call by going to apps page if not already there
    const currentUrl = page.url()
    if (!currentUrl.includes("/apps")) {
        await page.goto("/apps", {waitUntil: "domcontentloaded"})
        await page.waitForURL("**/apps", {waitUntil: "domcontentloaded"})
    }

    const apps = await appsResponse

    const app = apps.find((app) => app.app_id === appId)
    if (!app) {
        console.error(`[App Fixture] App not found with ID: ${appId}`)
        throw new Error(`App not found with ID: ${appId}`)
    }

    return app
🔴 getAppById hangs forever when page URL already contains /apps

The new getAppById function sets up waitForApiResponse (which internally calls page.waitForResponse) to listen for a GET /api/apps response on line 87, but then only navigates to /apps if the current URL does not already include /apps (line 94). When the page is already on the apps page, no navigation occurs, no new GET /api/apps request is triggered, and the appsResponse promise on line 99 will never resolve — causing the function to hang until the test timeout with no actionable error message. This is the same API interception race condition pattern this PR explicitly fixes in other files (e.g., getApp, getVariants, getTestsets), but the fix was not applied to the newly added getAppById.

Prompt for agents
In web/tests/tests/fixtures/base.fixture/apiHelpers/index.ts, the getAppById function (lines 86-107) has a race condition: it starts listening for a GET /api/apps response before checking whether navigation is needed, but if the page already contains /apps in its URL, no navigation happens and no new API call fires, so the waitForApiResponse promise hangs forever.

Fix: Always trigger a navigation to /apps regardless of the current URL, similar to how getApp (lines 49-56) does it. Move the waitForApiResponse call to be stored as a non-awaited promise first, then always call page.goto('/apps', ...), then await the promise. For example:

export const getAppById = async (page: Page, appId: string) => {
    const appsResponse = waitForApiResponse<ListAppsItem[]>(page, {
        route: '/api/apps',
        method: 'GET',
    })

    await page.goto('/apps', {waitUntil: 'domcontentloaded'})
    await page.waitForURL('**/apps', {waitUntil: 'domcontentloaded'})

    const apps = await appsResponse
    // ... rest of function
}
github-actions bot commented Mar 5, 2026

Railway Preview Environment

Preview URL: https://gateway-production-683c.up.railway.app/w
Project: agenta-oss-pr-3925
Image tag: pr-3925-b9c4474
Status: Deployed
Railway logs: Open logs
Workflow logs: View workflow run
Updated at: 2026-03-05T20:36:56.564Z

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ility

web/tests is part of the pnpm workspace and has no package-lock.json,
causing `npm ci` to fail. Use pnpm with --filter to install only the
test package and its dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
web/tests has no workspace: dependencies — its package.json is fully
self-contained. Using npm install avoids both the missing lockfile
issue (npm ci) and the patchedDependencies mismatch (pnpm frozen).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
npm install fails because the parent web/package.json has workspace:
protocol deps that npm can't resolve. pnpm handles this natively.
Using --no-frozen-lockfile to avoid patchedDependencies mismatch
on feature branches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

View 12 additional findings in Devin Review.

Comment on lines +59 to 69
const createAppPromise = page.waitForResponse((response) => {
    if (
        !response.url().includes("/apps") ||
        response.request().method() !== "POST"
    ) {
        return false
    }

    const payload = response.request().postData() || ""
    return payload.includes(appName)
})
🟡 App creation response listener set up after appType click can miss fast POST responses

In web/oss/tests/playwright/acceptance/app/test.ts:59-69, the page.waitForResponse listener for the POST to /apps is registered at line 59, after the app name is typed and the app type is clicked, but before the clickButton call on line 70 that actually triggers the POST. The ordering is therefore correct; the listener would only miss the response if clicking the app type itself triggered an early POST (e.g., via an auto-submit or debounce). The more significant concern is the matcher: line 61 checks response.url().includes("/apps"), which is very broad. The method !== "POST" check on line 62 filters out GET requests such as app list fetches, but the URL test would still match POST requests to /api/apps/{id}/variants or any other URL containing /apps.

Suggested change
- const createAppPromise = page.waitForResponse((response) => {
-     if (
-         !response.url().includes("/apps") ||
-         response.request().method() !== "POST"
-     ) {
-         return false
-     }
-     const payload = response.request().postData() || ""
-     return payload.includes(appName)
- })
+ const createAppPromise = page.waitForResponse((response) => {
+     if (
+         !response.url().includes("/api/apps") ||
+         response.request().method() !== "POST"
+     ) {
+         return false
+     }
+     // Exclude URLs like /api/apps/{id}/variants
+     const url = new URL(response.url())
+     if (/\/api\/apps\/.+/.test(url.pathname)) {
+         return false
+     }
+     const payload = response.request().postData() || ""
+     return payload.includes(appName)
+ })
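One way to keep a matcher like this honest is to pull the decision into a pure function that can be checked against sample URLs without a browser. The sketch below is an illustration only: isCreateAppRequest is a hypothetical helper, and it assumes the creation endpoint's path ends in /api/apps (optionally with a trailing slash), which this thread does not confirm.

```typescript
// Hypothetical pure predicate mirroring the suggested matcher.
// Assumption: the create-app POST targets a path ending in /api/apps or /api/apps/.
function isCreateAppRequest(
    rawUrl: string,
    method: string,
    postData: string | null,
    appName: string,
): boolean {
    if (method !== "POST") return false

    // Match on the pathname only, so query strings cannot affect the result.
    const {pathname} = new URL(rawUrl)
    if (!/\/api\/apps\/?$/.test(pathname)) return false

    // Same payload check as the original: the request body must mention the app name.
    return (postData ?? "").includes(appName)
}
```

Inside the test, the predicate would be called from the page.waitForResponse callback with response.url(), response.request().method(), and response.request().postData().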
Comment on lines +331 to +332
const ephemeralEnabled =
String(process.env.AGENTA_EPHEMERAL_PROJECT ?? "true").toLowerCase() !== "false"
🟡 Ephemeral project enabled by default may break tests relying on pre-existing data

AGENTA_EPHEMERAL_PROJECT defaults to true at web/tests/playwright/global-setup.ts:332, so every test run creates a fresh, empty project and sets it as the default. Several tests depend on pre-existing data: the deployment test (deployment/index.ts) requires a completion app with variants, the observability test requires prior traces, and the playground tests call apiHelpers.getApp("completion"), which navigates to /apps expecting apps to exist. None of these exist in a fresh ephemeral project. In CI, the workflow at .github/workflows/10-playwright-oss-tests.yml:36 explicitly sets AGENTA_EPHEMERAL_PROJECT: "true"; tests that depend on app creation should still work there, provided the app creation test runs first and in the same project context, which it does because /apps auto-redirects to the ephemeral project. The real issue is local (non-CI) runs: the ephemeral project is silently enabled by default and will break runs for users who do not expect it.
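To make the default concrete, the quoted expression can be wrapped in a small function and evaluated for a few inputs. The wrapper name isEphemeralEnabled and the env parameter are hypothetical; the expression itself is the one quoted from global-setup.ts.

```typescript
// Wraps the quoted expression so it can be evaluated against arbitrary inputs.
function isEphemeralEnabled(env: Record<string, string | undefined>): boolean {
    return String(env.AGENTA_EPHEMERAL_PROJECT ?? "true").toLowerCase() !== "false"
}

// Unset: falls back to "true", so ephemeral projects are on.
const whenUnset = isEphemeralEnabled({})
// Only the literal string "false" (in any letter case) turns the feature off.
const whenFalse = isEphemeralEnabled({AGENTA_EPHEMERAL_PROJECT: "FALSE"})
// Other falsy-looking values such as "0" or "" leave it on.
const whenZero = isEphemeralEnabled({AGENTA_EPHEMERAL_PROJECT: "0"})
const whenEmpty = isEphemeralEnabled({AGENTA_EPHEMERAL_PROJECT: ""})

console.log({whenUnset, whenFalse, whenZero, whenEmpty})
```

With this parsing, a local developer has to set AGENTA_EPHEMERAL_PROJECT=false explicitly to opt out; values like 0 or an empty string do not disable it.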

The deploy workflow outputs URLs like https://host/w but the API
health endpoint is at /api/health (no /w prefix). Strip the path
suffix to construct the correct health check URL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When AGENTA_WEB_URL includes a subpath (e.g. /w), the API URL was
incorrectly constructed as /w/api instead of /api. Extract the origin
from the URL to always target the correct API endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
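The last two commits both come down to deriving the API location from the URL's origin rather than appending to the full web URL. A minimal sketch of that derivation follows; the function names apiBaseUrl and healthCheckUrl are hypothetical, while the /api and /api/health paths are the ones named in the commit messages.

```typescript
// Build the API base from the origin only, discarding any subpath such as /w.
function apiBaseUrl(webUrl: string): string {
    return `${new URL(webUrl).origin}/api`
}

// The health endpoint lives under the API base, with no /w prefix.
function healthCheckUrl(webUrl: string): string {
    return `${apiBaseUrl(webUrl)}/health`
}

// A web URL with a subpath resolves to https://host.example/api,
// not https://host.example/w/api.
console.log(apiBaseUrl("https://host.example/w"))
console.log(healthCheckUrl("http://localhost:3000/w"))
```

Using URL.origin keeps the scheme, host, and any non-default port, so the same helper works for both the Railway preview URL and a local dev server.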