This repository contains a collection of example missions for Testronaut, designed to run against the demo sandbox at https://demo.testronaut.app.
Each mission demonstrates how autonomous testing agents can reason about, interact with, and validate UI workflows in a real browser.

| File | Description |
|---|---|
| `login.mission.js` | Logs into Mission Control with sample credentials |
| `addTask.mission.js` | Adds a new task to the mission dashboard |
| `startTask.mission.js` | Begins a task and waits for it to complete automatically |
| `removeTask.mission.js` | Removes a completed or queued task |
| `reportSaboteur.mission.js` | Files a report to identify the Saboteur |
| `completeAllTasks.mission.js` | Completes every task sequentially |
| `fileTransfer.mission.js` | Downloads the sus briefing file and uploads a crewmate alibi log |
| `logout.mission.js` | Logs the user out and returns to the login screen |
| `smokeSuite.mission.js` | Runs a full workflow suite (login → task → report → logout) |

- Install the Testronaut CLI

      npm install -g testronaut
      npx testronaut --init

- Clone this repo

      git clone https://github.com/your-org/testronaut-crew-simulation-missions.git
      cd testronaut-crew-simulation-missions

- Set the environment variables used by the missions

      export URL="https://demo.testronaut.app"
      export USERNAME="Nova"
      export PASSWORD="1234"
      # For OpenAI
      export OPENAI_API_KEY=sk-...
      # Or for Gemini
      export GEMINI_API_KEY=AIza...

- Run a single mission

      npx testronaut login.mission.js

- Run the full suite

      npx testronaut

You can compare regular, agent-friendly, and agent-hostile UI modes by running the same mission repeatedly under each mode. The benchmark runner randomizes the mode order by default, executes each condition the requested number of times, parses Testronaut's JSON reports, and writes the results as CSV, JSON, and Markdown.
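The runner's scheduling code is not shown in this README. As a rough illustration only, one way to randomize mode order while keeping every condition at exactly the requested trial count is to reshuffle the modes independently for each trial with a Fisher-Yates shuffle, so slow drift (rate limits, time of day, sandbox load) is spread across modes instead of landing on whichever mode always runs last. The names `buildRunPlan` and `shuffle`, the `randomize` option, and the per-trial strategy are all assumptions, not the runner's actual implementation.

```javascript
// Hypothetical sketch: build a randomized run plan in which every mode runs
// `trials` times and the mode order is reshuffled for each trial.
// This is NOT the benchmark runner's actual code.

// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
// `random` is injectable so callers can supply a deterministic generator.
function shuffle(items, random = Math.random) {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Returns a flat list of { trial, mode } entries in execution order.
function buildRunPlan(modes, trials, { randomize = true, random = Math.random } = {}) {
  const plan = [];
  for (let trial = 1; trial <= trials; trial++) {
    const order = randomize ? shuffle(modes, random) : modes.slice();
    for (const mode of order) {
      plan.push({ trial, mode });
    }
  }
  return plan;
}

// Example: 3 trials of the three default modes (order varies run to run).
const plan = buildRunPlan(['regular', 'friendly', 'hostile'], 3);
console.log(plan);
```

Shuffling within each trial, rather than shuffling one long list of all runs, guarantees that every mode has completed the same number of runs at the end of each trial, which keeps partial results comparable if a long benchmark is interrupted.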

Run a small pilot:

    npm run benchmark:agent-modes -- --trials 5 --mission missions/login.mission.js

Run a larger comparison:

    npm run benchmark:agent-modes -- --trials 30 --mission missions/login.mission.js

Preview the planned run order without calling Testronaut:

    npm run benchmark:agent-modes -- --trials 3 --dry-run

By default, the runner derives the base app URL from `URL` in `.env` or the current environment, then tests:

- regular: `<base URL>`
- friendly: `<base URL>?agentMode=true`
- hostile: `<base URL>?agentHostile=true`
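The mode-to-URL mapping above can be sketched with the WHATWG `URL` API, which is a global in Node.js and browsers. This is an illustrative helper, not the runner's code: the function name `modeUrls` and the `MODE_PARAMS` table are assumptions, while the query parameters come from the list above.

```javascript
// Hypothetical sketch: derive per-mode URLs from a base URL.
// Query parameters match this README; names are illustrative assumptions.
const MODE_PARAMS = {
  regular: null, // base URL unchanged
  friendly: ['agentMode', 'true'],
  hostile: ['agentHostile', 'true'],
};

function modeUrls(baseUrl, modes = Object.keys(MODE_PARAMS)) {
  const urls = {};
  for (const mode of modes) {
    if (!Object.prototype.hasOwnProperty.call(MODE_PARAMS, mode)) {
      throw new Error(`Unknown mode: ${mode}`);
    }
    // Parse a fresh URL per mode so parameters never leak between modes.
    const url = new URL(baseUrl);
    const param = MODE_PARAMS[mode];
    if (param) {
      // searchParams.set keeps any existing query string intact.
      url.searchParams.set(param[0], param[1]);
    }
    urls[mode] = url.toString();
  }
  return urls;
}

const local = modeUrls('http://localhost:5173/');
// local.friendly === 'http://localhost:5173/?agentMode=true'
console.log(local);
```

Using `searchParams` instead of string concatenation avoids producing a malformed `??` when the base URL already carries a query string. Note that the `URL` API normalizes a bare origin such as `https://demo.testronaut.app` to `https://demo.testronaut.app/` before appending the query.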

Outputs are written under `benchmarks/agent-mode-runs/<timestamp>/`:

- `plan.json` records the randomized run order and configuration.
- `results.csv` contains one row per run.
- `results.json` contains the parsed metrics.
- `report.md` summarizes success rate, duration, tokens, turns, and retries by mode.
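To make the per-mode summary concrete, here is a minimal aggregation sketch that turns per-run rows into the kind of table `report.md` describes. The row fields (`mode`, `success`, `durationMs`, `tokens`, `turns`, `retries`) and the function names are assumptions for illustration; the real `results.json` schema may differ.

```javascript
// Hypothetical sketch: aggregate per-run rows into per-mode statistics.
// Row field names are assumed, not taken from the runner's actual schema.
const METRICS = ['durationMs', 'tokens', 'turns', 'retries'];

function mean(values) {
  return values.length === 0 ? NaN : values.reduce((a, b) => a + b, 0) / values.length;
}

function summarizeByMode(rows) {
  // Group rows by mode, preserving first-seen order.
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row.mode)) groups.set(row.mode, []);
    groups.get(row.mode).push(row);
  }
  const summary = {};
  for (const [mode, runs] of groups) {
    const entry = {
      runs: runs.length,
      successRate: runs.filter((r) => r.success).length / runs.length,
    };
    for (const metric of METRICS) {
      // Ignore missing values, e.g. a run that crashed before reporting tokens.
      const values = runs.map((r) => r[metric]).filter((v) => typeof v === 'number');
      entry[`mean_${metric}`] = mean(values);
    }
    summary[mode] = entry;
  }
  return summary;
}

// Render the summary as a Markdown table; missing means show as "n/a".
function toMarkdownTable(summary) {
  const header = ['mode', 'runs', 'success rate', ...METRICS.map((m) => `mean ${m}`)];
  const lines = [`| ${header.join(' | ')} |`, `|${header.map(() => '---').join('|')}|`];
  for (const [mode, s] of Object.entries(summary)) {
    const cells = [
      mode,
      String(s.runs),
      `${(s.successRate * 100).toFixed(1)}%`,
      ...METRICS.map((m) => (Number.isNaN(s[`mean_${m}`]) ? 'n/a' : s[`mean_${m}`].toFixed(1))),
    ];
    lines.push(`| ${cells.join(' | ')} |`);
  }
  return lines.join('\n');
}

// Synthetic input for illustration only, not benchmark results.
const rows = [
  { mode: 'regular', success: true, durationMs: 1000, tokens: 100, turns: 2, retries: 0 },
  { mode: 'friendly', success: true, durationMs: 800, tokens: 80, turns: 1, retries: 0 },
];
console.log(toMarkdownTable(summarizeByMode(rows)));
```

With small trial counts, a success-rate difference of one or two runs can easily be noise, which is one reason to follow the pilot with the larger comparison.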

Useful options:

    npm run benchmark:agent-modes -- --trials 10 --base-url http://localhost:5173/
    npm run benchmark:agent-modes -- --modes regular,friendly --trials 20
    npm run benchmark:agent-modes -- --sequential --trials 5

For the cleanest comparison, keep the mission file, model, credentials, browser environment, app version, and `maxTurns` fixed across all modes.

Each mission file exports its instructions as a plain-language prompt and runs them with `runMissions`. For example, the login mission:

    import { runMissions } from 'testronaut';

    export const loginMission = `
    Visit ${process.env.URL}.
    Fill in the username field with ${process.env.USERNAME} and password field with ${process.env.PASSWORD}.
    Take a screenshot.
    Then click the button most likely to log in the user.
    After clicking "Dock at Mission Control":
    • Wait for either the URL to change away from /login or for the login form to disappear.
    • Wait for the mission dashboard to appear.
    • Confirm success if the "Mission Tasks" heading or progress bar appears.
    • Consider failure if a visible alert says "incorrect" or "invalid".
    Take a screenshot.
    Report SUCCESS or FAILURE with reasoning.
    `;

    export async function executeMission() {
      return await runMissions({ mission: loginMission }, "login mission");
    }

Repository layout:

    missions/
      addTask.mission.js
      completeAllTasks.mission.js
      login.mission.js
      logout.mission.js
      removeTask.mission.js
      reportSaboteur.mission.js
      smokeSuite.mission.js
      startTask.mission.js
    README.md
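The example mission above interpolates `process.env.URL`, `USERNAME`, and `PASSWORD` directly into its prompt, so an unset variable silently becomes the literal string `undefined` and the agent is asked to visit or type "undefined". A small fail-fast check can catch this before any tokens are spent. The helper below, `requireEnv`, is hypothetical and not part of Testronaut.

```javascript
// Hypothetical helper (not part of Testronaut): read required environment
// variables and throw if any are missing or empty. Without a check like this,
// `${process.env.USERNAME}` in a template literal becomes "undefined".
function requireEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}

// Sketch of use inside a mission file (renaming URL avoids shadowing the
// global URL class):
// const { URL: baseUrl, USERNAME, PASSWORD } = requireEnv(['URL', 'USERNAME', 'PASSWORD']);
// export const loginMission = `Visit ${baseUrl}. ...`;
```

One caveat: on Windows, `USERNAME` is normally already set to the logged-in account name, so forgetting `export USERNAME="Nova"` would pass this check and send the OS user name to the login form instead.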
Built by Sandevistan Tech Inc. as part of the Testronaut ecosystem, a framework for autonomous QA agents and end‑to‑end testing.
MIT © 2025 Sandevistan Tech Inc.