5 changes: 4 additions & 1 deletion apps/demo-frontend/public/app.js
@@ -10030,6 +10030,9 @@ function bindEvents() {
}

async function bootstrap() {
const initialTabId = readTabIdFromHash() ?? readStoredTabId();
setActiveTab(initialTabId, { syncHash: false });

const runtimeConfig = await loadRuntimeConfig();
if (runtimeConfig?.wsUrl) {
el.wsUrl.value = runtimeConfig.wsUrl;
@@ -10065,7 +10068,7 @@ async function bootstrap() {
resetOperatorBoardView({ mode: readStoredOperatorBoardMode(), persistMode: false });
renderTaskList();
evaluateConstraints();
setActiveTab(readStoredTabId());
setActiveTab(readStoredTabId(), { syncHash: false });
setUiTaskFieldsVisibility();
initBackgroundVideoLoopBlend();
enhanceSelectControls();
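
The early `setActiveTab(initialTabId, { syncHash: false })` call picks the tab from the URL hash first and falls back to the stored tab. A minimal sketch of that fallback chain, assuming a `#tab=<id>` hash format and a small set of tab ids; `tabIdFromHash`, `resolveInitialTab`, `KNOWN_TABS`, and the `"live"` default are hypothetical names for illustration, not the helpers in `app.js`:

```javascript
// Hypothetical tab ids, for illustration only; the real ids live in app.js.
const KNOWN_TABS = ["live", "story", "operator"];

// Assumed hash format "#tab=<id>". Returns null for a missing or unknown id
// so that the `??` chain below can fall through to the next source.
function tabIdFromHash(hash) {
  const match = /^#tab=([\w-]+)$/.exec(hash ?? "");
  return match && KNOWN_TABS.includes(match[1]) ? match[1] : null;
}

// Hash wins over the stored tab, which wins over a fixed default.
function resolveInitialTab(hash, storedTabId, fallback = "live") {
  return tabIdFromHash(hash) ?? storedTabId ?? fallback;
}
```

Returning `null` rather than an empty string for an unrecognised hash matters here, because `??` only falls through on `null` or `undefined`.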
47 changes: 29 additions & 18 deletions apps/demo-frontend/public/styles.css
@@ -738,9 +738,9 @@ textarea {
}

select {
appearance: none;
-webkit-appearance: none;
-moz-appearance: none;
appearance: none !important;
-webkit-appearance: none !important;
-moz-appearance: none !important;
cursor: pointer;
padding-right: 42px;
--select-surface-start: var(--surface-control);
@@ -751,6 +751,10 @@ select {
linear-gradient(180deg, var(--select-surface-start), var(--select-surface-end));
}

select::-ms-expand {
display: none;
}

select:hover {
--select-surface-start: var(--surface-control-hover);
}
@@ -1750,6 +1754,7 @@ textarea {
min-height: 74px;
padding-top: 11px;
padding-bottom: 11px;
overflow: visible;
}

.panel-live-connection .action-group-primary > .export-menu {
@@ -1759,11 +1764,11 @@ textarea {
}

.panel-live-connection .export-menu[open] {
z-index: 240;
z-index: 520;
}

.panel-live-connection .export-menu-list {
z-index: 260;
z-index: 540;
}

.action-group {
@@ -1788,6 +1793,12 @@ textarea {
.export-menu {
position: relative;
min-width: 0;
z-index: 140;
isolation: isolate;
}

.export-menu[open] {
z-index: 420;
}

.export-menu > summary {
@@ -1890,7 +1901,7 @@ textarea {
position: absolute;
right: 0;
top: calc(100% + 8px);
z-index: 120;
z-index: 460;
min-width: min(320px, calc(100vw - 56px));
padding: 8px;
display: grid;
@@ -2305,7 +2316,7 @@ button:focus-visible {

.meta-row-status-live {
margin-top: 14px;
padding: 8px;
padding: 10px;
border: 1px solid color-mix(in oklch, var(--primary) 20%, var(--border-soft));
border-radius: calc(var(--radius) - 4px);
background:
@@ -2318,14 +2329,14 @@ button:focus-visible {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(162px, 1fr));
grid-template-columns: repeat(4, minmax(0, 1fr));
gap: 8px;
gap: 10px;
overflow: visible;
}

.meta-row-status-live > div {
min-width: 0;
min-height: 46px;
padding: 7px 10px;
min-height: 48px;
padding: 8px 11px;
border-radius: calc(var(--radius) - 10px);
display: inline-flex;
align-items: center;
@@ -2401,18 +2412,18 @@ button:focus-visible {
}

.meta-row-status-live strong {
color: color-mix(in oklch, white 98%, var(--foreground));
font-size: 0.74rem;
color: color-mix(in oklch, white 99.6%, var(--foreground));
font-size: 0.76rem;
text-transform: none;
letter-spacing: 0.015em;
letter-spacing: 0.012em;
white-space: nowrap;
text-shadow: 0 1px 0 color-mix(in oklch, black 30%, transparent);
}

.meta-row-status-live > div > span:not(.status-pill) {
color: color-mix(in oklch, white 99.5%, var(--foreground));
font-size: 0.82rem;
font-weight: 650;
color: color-mix(in oklch, white 99.8%, var(--foreground));
font-size: 0.84rem;
font-weight: 670;
line-height: 1.28;
}

@@ -2435,7 +2446,7 @@ button:focus-visible {
max-width: 100%;
padding: 4px 9px;
border-radius: 999px;
border: 1px solid color-mix(in oklch, var(--primary) 32%, var(--border-soft));
border: 1px solid color-mix(in oklch, var(--primary) 38%, var(--border-soft));
background:
radial-gradient(160px 64px at 12% -34%, color-mix(in oklch, var(--primary) 10%, transparent), transparent 72%),
linear-gradient(
@@ -2453,7 +2464,7 @@ button:focus-visible {
word-break: break-word;
box-shadow:
inset 0 1px 0 color-mix(in oklch, var(--foreground) 10%, transparent),
0 0 0 1px color-mix(in oklch, var(--primary) 16%, transparent);
0 0 0 1px color-mix(in oklch, var(--primary) 20%, transparent);
}

.meta-row > div {
22 changes: 18 additions & 4 deletions docs/judge-quickstart.md
@@ -6,7 +6,7 @@ Fast, judge-facing entry point for a 5-10 minute evaluation run.

This project covers all three challenge categories in one platform:

1. Live Agent (realtime speech, interruption, translation, negotiation)
1. Live Agent (realtime speech, interruption, translation, negotiation, grounded research)
2. Creative Storyteller (text + audio + image + video narrative flow)
3. UI Navigator (computer-use style UI planning/execution with approval guardrails)

@@ -55,6 +55,8 @@ Artifacts:
8. `artifacts/judge-visual-evidence/presentation.md`
9. `artifacts/demo-e2e/epic-summary.json`

If deploy/publish artifacts are present, `manifest.md` and `presentation.md` also surface compact deploy/publish provenance from `artifacts/deploy/railway-deploy-summary.json` and `artifacts/deploy/repo-publish-summary.json`. Ordinary local judge flows omit that section, and raw deploy/publish JSON is not embedded into the judge-facing markdown.

## 3) Validate Release Readiness

```bash
@@ -64,21 +66,30 @@ npm run verify:release
## 4) What Judges Should See in UI

1. Connection + assistant lifecycle (`idle/streaming/speaking`).
2. Live interruption, truncate/delete evidence, and gateway error correlation.
2. Live interruption, truncate/delete evidence, gateway error correlation, and optional `research` citations/source URLs.
3. Operator Console panels:
- Live Bridge Status
- Approvals Queue
- Workflow Runtime / Runtime Guardrails
- Device Nodes Health / Updates
- Bootstrap Doctor / Browser Workers
- Governance Policy Lifecycle
- Skills Registry Lifecycle
- Plugin Marketplace Lifecycle
- Agent Usage Evidence
- Cost & Tokens Evidence
4. Session export controls:
4. Operator support panels:
- `Runtime Drill Runner` for repo-owned dry-run/live recovery drills and `followUpContext` handoff.
- `Workflow Control Panel` for redacted assistive-router/runtime override posture.
- `Operator Session Ops` for saved `operatorPurpose`, session replay, and cross-agent discovery.
- `Bootstrap Doctor & Auth Profiles` for provider/auth-profile/device/fallback posture.
- `Browser Worker Control` for queue/checkpoint posture on long-running UI jobs.
5. Session export controls:
- `Export Session -> Export Markdown`
- `Export Session -> Export JSON`
- `Export Session -> Export Audio (WAV)`
5. Story Timeline panel:
- Confirm exported Markdown/JSON include `runtimeGuardrailsSignalPaths`, `operatorPurpose`, `operatorSessionReplay`, and `operatorDiscovery`.
6. Story Timeline panel:
- Confirm `Timeline State` KPI transitions (`0%` idle -> ready/pending) as story output arrives.
- Segment scrubber/selector reflects `output.story.timeline`
- Preview card shows segment text + `image/video/audio` refs
@@ -100,15 +111,18 @@ npm run verify:release
- Start mic, send live request, then trigger interruption.
- Show truncate/delete/gateway-correlation evidence in Operator Console.
- Mention roundtrip and interrupt KPI lanes in `artifacts/demo-e2e/badge-details.json`.
- If judges ask for grounded-research proof, switch to `intent=research` once and show citation-bearing `answer`, `citations`, and `sourceUrls`.
3. `02:15-03:30` Creative Storyteller category:
- Send storyteller prompt.
- Open `Story Timeline` panel and scrub segments.
- Show image/video/audio refs and async media behavior.
4. `03:30-04:45` UI Navigator category:
- Send `ui_task` intent with grounding fields.
- Show approval flow and damage-control verdict in Operator Console.
- Save a short purpose in `Operator Session Ops`, then open `Bootstrap Doctor & Auth Profiles` and `Browser Worker Control` once to show runtime posture before execution.
- Confirm safety gates before execution.
5. `04:45-05:30` Evidence close:
- Run `npm run demo:epic` (or fallback `npm run demo:e2e:visual:judge` if e2e/policy/badge were already executed).
- Open `artifacts/judge-visual-evidence/presentation.md`.
- Confirm all evidence lanes are `pass` in `artifacts/demo-e2e/badge-details.json`.
- Export session `JSON` or `Markdown` and confirm `runtimeGuardrailsSignalPaths`, `operatorPurpose`, `operatorSessionReplay`, and `operatorDiscovery`.
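
The export check in the last step can be scripted. A minimal sketch, assuming the four fields appear as top-level keys of the parsed JSON export (the quickstart does not state where they sit in the export); `REQUIRED_EXPORT_FIELDS` and `missingExportFields` are hypothetical names:

```javascript
// Field names taken from the quickstart text above.
const REQUIRED_EXPORT_FIELDS = [
  "runtimeGuardrailsSignalPaths",
  "operatorPurpose",
  "operatorSessionReplay",
  "operatorDiscovery",
];

// Assumption: the fields are top-level keys of the exported session object.
// Returns the names that are absent, so an empty array means the export passes.
function missingExportFields(exportedSession) {
  const session = exportedSession ?? {};
  return REQUIRED_EXPORT_FIELDS.filter(
    (field) => !Object.prototype.hasOwnProperty.call(session, field)
  );
}
```

Pass in the result of `JSON.parse` on the exported file; if the real export nests these fields, the lookup needs to change to match.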
13 changes: 10 additions & 3 deletions docs/judge-visual-evidence.md
@@ -5,8 +5,9 @@
Create one reproducible visual bundle for judges:

1. Screenshot checklist status (present/missing).
2. Critical badge-evidence lane status (`pass/fail/unavailable`).
2. Critical badge-evidence lane status (`pass/fail/unavailable`), including runtime guardrails and provider provenance.
3. Single manifest for quick go/no-go before submission.
4. One-page presentation bundle with runtime guardrails snapshot, provider adapter snapshot, and compact deploy/publish provenance when optional Railway/repo-publish artifacts are available.

## Commands

@@ -76,7 +77,9 @@ Defaults used by `scripts/judge-visual-evidence-pack.mjs`:

1. Badge details: `artifacts/demo-e2e/badge-details.json`
2. Demo summary: `artifacts/demo-e2e/summary.json`
3. Screenshot directory: `artifacts/judge-visual-evidence/screenshots`
3. Optional Railway deploy summary: `artifacts/deploy/railway-deploy-summary.json`
4. Optional repo publish summary: `artifacts/deploy/repo-publish-summary.json`
5. Screenshot directory: `artifacts/judge-visual-evidence/screenshots`

Defaults used by `scripts/judge-visual-capture.mjs`:

@@ -94,6 +97,8 @@ Defaults used by `scripts/judge-visual-capture.mjs`:
5. `artifacts/judge-visual-evidence/presentation.md`
6. `artifacts/demo-e2e/epic-summary.json`

`manifest.md` and `presentation.md` surface compact deploy/publish provenance from `railway-deploy-summary.json` / `repo-publish-summary.json` when those optional files are present. Ordinary local judge flows omit that section instead of filling the page with `unavailable` placeholders, and raw deploy/publish JSON is not embedded into the judge-facing markdown.
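
The "omit instead of placeholder" behaviour described above can be sketched as a render function that returns an empty string when neither optional summary was loaded. This is an illustration only: `renderProvenanceSection`, the heading text, and the `status` field are assumptions, not the actual output or schema used by `scripts/judge-visual-evidence-pack.mjs`.

```javascript
// Hypothetical renderer. Each argument is the parsed optional summary JSON,
// or null when the file is absent. `status` is an assumed field name.
function renderProvenanceSection(deploySummary, publishSummary) {
  const lines = [];
  if (deploySummary) lines.push(`- Railway deploy: ${deploySummary.status}`);
  if (publishSummary) lines.push(`- Repo publish: ${publishSummary.status}`);

  // No optional artifacts: emit nothing rather than `unavailable` rows.
  if (lines.length === 0) return "";

  // Only compact, derived lines are emitted; the raw JSON is never embedded.
  return ["## Deploy/Publish Provenance", ...lines].join("\n");
}
```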

## Required Screenshot Filenames

Put files into `artifacts/judge-visual-evidence/screenshots`:
@@ -120,4 +125,6 @@ Pack marks these as critical:
6. `pluginMarketplace`
7. `deviceNodes`
8. `agentUsage`
9. `deviceNodeUpdates` (derived from `deviceNodes` updates fields)
9. `runtimeGuardrailsSignalPaths`
10. `providerUsage`
11. `deviceNodeUpdates` (derived from `deviceNodes` updates fields)
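
A go/no-go check over the critical lanes could look like the sketch below. It assumes a flat `lane -> status` map has already been extracted from `badge-details.json` (the file's real structure is not shown in this diff) and lists only the lanes visible in this hunk; items 1-5 are outside the hunk. `summarizeCriticalLanes` is a hypothetical helper.

```javascript
// Only the lanes visible in this hunk; the first five entries are not shown.
const VISIBLE_CRITICAL_LANES = [
  "pluginMarketplace",
  "deviceNodes",
  "agentUsage",
  "runtimeGuardrailsSignalPaths",
  "providerUsage",
  "deviceNodeUpdates",
];

// Assumption: statusByLane maps lane name -> "pass" | "fail" | "unavailable".
// A lane missing from the map is treated as "unavailable".
function summarizeCriticalLanes(statusByLane, lanes = VISIBLE_CRITICAL_LANES) {
  const blocking = lanes
    .map((lane) => ({ lane, status: statusByLane[lane] ?? "unavailable" }))
    .filter((entry) => entry.status !== "pass");
  return { go: blocking.length === 0, blocking };
}
```

Treating a missing lane as `unavailable` rather than skipping it keeps a stale or partial badge file from reading as a pass.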