test: add mock-LLM e2e coverage for recent PRs (2026-06-10)#1291
test: add mock-LLM e2e coverage for recent PRs (2026-06-10)#1291malhotra5 wants to merge 2 commits into
Conversation
Add two new mock-LLM E2E spec files covering features merged in the last 24 hours that lacked end-to-end test coverage: - mock-llm-drawer-and-empty-states.spec.ts (PR #1288): - Browser chrome bar renders with URL placeholder in empty state - Terminal tab shows empty state message - Tab switching between browser, terminal, and files tabs - VS Code drawer link visibility in tab bar - mock-llm-tool-visualizers.spec.ts (PR #1246): - Bash/terminal tool visualizer renders command and output - File editor tool visualizer renders file path and diff content - Agent reply renders correctly after tool call events Both specs use the page.route() mock-conversation pattern established in mock-llm-ui-regressions.spec.ts, matching existing test conventions. Co-authored-by: openhands <openhands@all-hands.dev>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
❌ Mock-LLM E2E Tests55/61 passed · 2 failed · 4 skipped · 🆕 7 new Commit:
🔍 Failure details (2)❌ mock-llm-drawer-and-empty-states.spec.ts › drawer tabs and empty states › terminal tab shows empty state message❌ mock-llm-tool-visualizers.spec.ts › tool visualizers › bash tool visualizer renders command and outputPosted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
🛑 Mock-LLM Docker E2E Test Results46/60 passed · 3 failed · 11 skipped · Commit:
🔍 Failure details (3)❌ chromium › mock-llm-drawer-and-empty-states.spec.ts › drawer tabs and empty states › terminal tab shows empty state message❌ chromium › mock-llm-drawer-and-empty-states.spec.ts › drawer tabs and empty states › terminal tab shows empty state message❌ chromium › mock-llm-tool-visualizers.spec.ts › tool visualizers › bash tool visualizer renders command and outputPosted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
|
🤖 OpenHands is reviewing this PR. Trigger label: This comment was posted by an AI agent (OpenHands). |
|
Thanks for adding targeted E2E coverage. I found a couple of material issues that need to be fixed before this can merge:
I did not see security concerns in the added test code, but the failing/inadequate tests mean this coverage is not merge-ready yet. This review comment was generated by an AI agent (OpenHands) on behalf of the user. 🔄 CHANGES REQUESTED This comment was posted by an AI agent (OpenHands). |
📸 Snapshot Test ReportWarning Snapshot comparison step crashed (timeout, OOM, or runner error) — diff results below may be incomplete or absent. ❌ 25 snapshots differ from the main branch baselines. Add the
🔴 Changed snapshots (25)
|
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
automations — 3 snapshots
automations-list-active-inactive
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
automations-no-automations
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
automations-search-no-results
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
backends-extended — 3 snapshots
backend-add-cloud-no-key-disabled
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
backend-add-invalid-url-disabled
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
backend-dropdown-two-backends
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
backends — 3 snapshots
backend-add-modal
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
backend-manage-modal
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
backend-selector-open
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
changes-tab
changes-empty
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
mcp-page — 4 snapshots
mcp-custom-server-1-editor-open
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
mcp-custom-server-editor
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
mcp-search-filtered
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
mcp-slack-install-2-modal
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
onboarding
onboarding-step-0-check-backend
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
settings-page — 3 snapshots
analytics-consent-modal
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
home-screen
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
settings-page
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
settings-secrets
secrets-list
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
settings-verification
condenser-settings
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
skills-page — 4 snapshots
skills-empty
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
skills-loaded
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
skills-no-match
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
skills-search-filtered
| Expected (main) | Actual (PR) | Diff |
|---|---|---|
![]() |
![]() |
![]() |
✅ Unchanged snapshots (49)
archived-conversation
- conversation-panel-with-archived-badges
- conversation-view-sandbox-error
automations
- automations-delete-modal
backends-extended
- backend-add-blank-disabled
- backend-add-cloud-advanced-open
- backend-add-cloud-with-key-enabled
- backend-add-form-partially-filled
- backend-add-local-ready
- backend-add-name-only-disabled
- backend-add-two-column-layout
- backend-add-whitespace-host-disabled
- backend-after-switch
- backend-cancel-nothing-saved
- backend-edit-prefilled
- backend-manage-after-removal
- backend-manage-two-listed
- backend-remove-cancelled
- backend-remove-confirmation
- backend-switch-overlay
changes-tab
- changes-deleted-file
- changes-diff-viewer
collapsible-thinking
- reasoning-content-collapsed
- reasoning-content-expanded
- think-action-collapsed
- think-action-expanded
mcp-page
- mcp-custom-server-2-url-filled
- mcp-custom-server-3-all-filled
- mcp-custom-server-4-installed
- mcp-empty-installed
- mcp-slack-install-1-marketplace
- mcp-slack-install-3-filled
- mcp-slack-install-4-installed
onboarding
- onboarding-step-1-choose-agent
- onboarding-step-2-setup-llm
- onboarding-step-3-say-hello
projects-workspace-browser
- projects-workspace-browser
settings-page
- add-backend-modal
- settings-app-page
settings-secrets
- secrets-add-form-filled
- secrets-add-form
- secrets-after-save
- secrets-delete-confirm
settings-verification
- verification-settings-critic-enabled
- verification-settings-off
- verification-settings-on
sidebar
- sidebar-collapsed
- sidebar-conversation-panel
- sidebar-filter-menu
skills-page
- skills-type-filter
Generated by the Snapshot Tests workflow. This comment was created by an AI agent (OpenHands) on behalf of the repo maintainers.
❌ Mock-LLM E2E Tests54/60 passed · 2 failed · 4 skipped · 🆕 7 new Commit:
🔍 Failure details (2)❌ mock-llm-drawer-and-empty-states.spec.ts › drawer tabs and empty states › terminal tab shows empty state message❌ mock-llm-tool-visualizers.spec.ts › tool visualizers › bash tool visualizer renders command and outputPosted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
❌ Mock-LLM Docker E2E Test Results49/60 passed · 2 failed · 9 skipped · 🆕 7 new Commit:
🔍 Failure details (2)❌ mock-llm-drawer-and-empty-states.spec.ts › drawer tabs and empty states › terminal tab shows empty state message❌ mock-llm-tool-visualizers.spec.ts › tool visualizers › bash tool visualizer renders command and outputPosted by the Mock-LLM E2E workflow · results are deterministic (scripted LLM responses) |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable — but the test suite has a few weak assertions and the PR description is missing the required Evidence section.
The new specs cover useful real-user paths (drawer tabs, empty states, bash + file-editor visualizers), and the mock event payloads are well-structured. I verified the referenced i18n keys (BROWSER$NO_PAGE_LOADED, TERMINAL$NO_OUTPUT, BROWSER$URL_PLACEHOLDER), the targeted data-testids (chat-interface, browser-chrome-bar, browser-chrome-url, conversation-tab-*, drawer-vscode-link, files-tab-diff-toggle), and the visualizer schemas (ExecuteBashAction/TerminalAction handling observation.command + observation.content; FileEditorAction/StrReplaceEditorAction handling observation.path/old_content/new_content). The mock payloads match what those components consume, so the wiring is correct.
However, a handful of the assertions are weaker than they should be — they would silently pass through a regression instead of catching it. The PR is also missing the Evidence section required by the repo's review policy (commands + output proving the real test paths ran in CI, or a link to the agent conversation).
See inline comments for specific fixes.
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a
.agents/skills/custom-codereview-guide.mdfile to your branch (or edit it if one already exists) with the/codereviewtrigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format. - Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
- When your PR is merged, the guideline file goes through normal code review by repository maintainers.
Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.
Was this review helpful? React with 👍 or 👎 to give feedback.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
Test-only change (two new spec files, no production code modified). Worst case: tests are flaky and need follow-up. No user-facing impact.
VERDICT:
❌ Needs rework — add the Evidence section required by the PR template, and tighten the weak assertions flagged inline.
KEY INSIGHT:
The test wiring against real production components is correct, but two assertions (the .or(files) fallback and the toBeTruthy() URL check) are written to never fail by design — they prove the test ran, not that the code is correct.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
| // No external link should be active when there's no page loaded | ||
| const openExternal = page.getByTestId("browser-chrome-open-external"); | ||
| await expect(openExternal).toHaveCount(0); | ||
| }); |
There was a problem hiding this comment.
🟠 Important: Weak assertion — toBeTruthy() on the URL field proves nothing about the empty state. The comment says it should be the i18n placeholder, but the test only checks the field is non-empty. A regression that renders the last-seen URL (or "https://…") in empty state would pass. Assert on the actual placeholder text:
| }); | |
| const text = (await urlField.textContent()) ?? ""; | |
| expect(text).toContain("No URL loaded"); |
(The BROWSER$URL_PLACEHOLDER i18n value is No URL loaded, verified in src/i18n/translation.json.)
| }); | ||
| }); | ||
|
|
||
| // ── Terminal tab: empty state ────────────────────────────────────── |
There was a problem hiding this comment.
🟡 Suggestion: Brittle i18n string match. getByText("No page loaded yet", { exact: false }) will pass on the current translation but break for any future copy edit, or silently no-op in any non-English locale the test runner picks up. Add a data-testid to the empty-state message component and assert on that instead — it's the same pattern you already use everywhere else in this file.
| }); | ||
| }); | ||
|
|
||
| // ── Tab switching ────────────────────────────────────────────────── |
There was a problem hiding this comment.
🟡 Suggestion: Brittle i18n string match (regex). Same issue as the browser empty-state check: getByText(/No terminal output|No output/i) couples the test to the English copy. Add a data-testid to the empty-terminal component and use that.
| page.locator('[class*="file"]').first(), | ||
| ), | ||
| ).toBeVisible({ timeout: 10_000 }); | ||
| }); |
There was a problem hiding this comment.
🔴 Critical: .or() fallback makes the assertion meaningless. page.getByTestId("files-tab-diff-toggle").or(page.locator('[class*="file"]').first()) passes if any element in the DOM has the substring file in its class — the body container, a dropdown, an unrelated file-tree node. If the files tab is completely broken, this test will still pass. Replace with a real assertion on the files tab's own testid (e.g. assert a container that lives inside the files tab panel is visible):
| }); | |
| await expect( | |
| page.getByTestId("files-tab-diff-toggle"), | |
| ).toBeVisible({ timeout: 10_000 }); |
If files-tab-diff-toggle doesn't always exist, that's a separate bug to fix in the component, not a reason to weaken the test.
| await expect(page.getByTestId("browser-chrome-bar")).not.toBeVisible(); | ||
| }); | ||
| }); | ||
|
|
There was a problem hiding this comment.
🟡 Suggestion: waitForTimeout(500) is a magic number. The comment says "wait for tab switch" but a fixed sleep is exactly what Playwright's auto-waiting was designed to avoid. Either drop the timeout (the next assertion already has its own timeout) or wait for a specific selector that only appears in the terminal tab.
| }); | ||
| }); | ||
|
|
||
| // ── VS Code drawer link ──────────────────────────────────────────── |
There was a problem hiding this comment.
🟡 Suggestion: Negative assertion is too weak. "Verify the browser chrome bar is not visible" doesn't actually prove the terminal tab is the active one — the chrome bar could simply be off-screen, or the panel could be in a transient state during the click. A positive assertion (e.g. that the terminal empty-state testid from line 223 is visible) is much stronger evidence the tab switch succeeded.











































































Summary
This PR adds mock-LLM E2E test coverage for two recently merged PRs that introduced user-facing changes without corresponding E2E tests.
Covered PRs
mock-llm-drawer-and-empty-states.spec.tsmock-llm-tool-visualizers.spec.tsNew test coverage
mock-llm-drawer-and-empty-states.spec.ts(PR #1288):mock-llm-tool-visualizers.spec.ts(PR #1246):Implementation
Both specs use the
page.route()mock-conversation pattern established inmock-llm-ui-regressions.spec.ts, injecting synthetic conversation events to test UI rendering without requiring a real LLM conversation.Verification
npm run typecheckpasses ✅npm run buildpasses ✅This PR was created by an AI agent (OpenHands) on behalf of the user.
HUMAN:
🐳 Docker images for this PR
• GHCR package: https://github.com/OpenHands/agent-canvas/pkgs/container/agent-canvas
ghcr.io/openhands/agent-canvasghcr.io/openhands/agent-server:1.28.1-pythonopenhands-automation==1.0.0a94f4181a16f7c72cf15c3876c87630f2d0eda3829Pull (multi-arch manifest)
# Multi-arch manifest — Docker automatically pulls the correct architecture docker pull ghcr.io/openhands/agent-canvas:sha-4f4181aRun
All tags pushed for this build
About Multi-Architecture Support
sha-4f4181a) is a multi-arch manifest supporting both amd64 and arm64sha-4f4181a-amd64) are also available if needed