Experimental macOS computer-use sandbox for teaching, replaying, and discovering GUI automation skills.
Neo Lab is a local runtime for building and testing computer-use workflows against real applications. It combines:
- local screen capture, annotation, and input control
- external OCR, visual grounding, and vision-model services
- a generation runtime that can propose new actions for tasks you never recorded
Use it to:
- teach repeatable workflows as YAML skills
- replay those skills with validation and state-aware routing
- synthesize novel action sequences from plain-English instructions
- let Neo explore unfamiliar browser flows step by step
- optionally expose the same runtime through a voice interface
This repository is the orchestration layer. It does not bundle the OCR / DINO / vision services or the generation runtime; it wires them together into a local experimentation environment.
- macOS-only
- experimental and local-first
- intended for research, prototyping, and operator-driven automation
- not a hosted service or production-ready browser agent
Neo Lab supports two complementary modes:
- Teach known behavior
You perform a workflow once, Neo records it as typed YAML steps, and you can replay that skill later with validation and composition.
- Discover unknown behavior
When no recorded skill exists, Neo can use the external generators stack to synthesize candidate action steps from a natural-language instruction. Those steps are executed locally, validated, and repaired if needed. For browser tasks, Neo can also run a perception -> action -> validation loop to discover a path through a page in real time.
The important point is that Neo Lab is not only a recorder for previously taught workflows. It is also a runtime for grounded action generation and live UI discovery.
Neo Lab sits on top of three external pieces:
- generators: natural-language -> action-step synthesis, repair loops, and transcript-backed context
- computer-lab: OCR, DINO, and vision-model HTTP services for perception
- voice-lab: optional voice pipeline used by run_voice.py
Neo Lab itself handles:
- local screen capture and annotation
- low-level macOS actions like click, type, keypress, and scroll
- skill recording, replay, and composition
- state graph matching and transition routing
- generated-action execution and browser exploration
- Screen perception via OCR, DINO, and vision-model calls in neo/screen.py
- Local UI control through click / type / key / scroll actions in neo/actions.py
- Interactive skill teaching in learn.py
- Skill replay and validation in neo/skill.py
- State graph navigation in neo/state_graph.py
- Generated novel actions in neo/generate.py
- Live browser discovery in neo/browser.py
- Optional voice entry point in run_voice.py
- macOS
- Python 3.10+
- running OCR / DINO / vision services
- a generators checkout or install if you want generated actions
- a voice-lab checkout or install only if you want voice mode
Recommended:
./setup.sh
Useful options:
./setup.sh --voice
./setup.sh --no-generators
./setup.sh --generators /path/to/generators
Optional local config:
cp .env.example .env
Then edit .env with your OCR / DINO / vision endpoints and any generators path overrides.
Manual setup:
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
# base runtime only:
# pip install -e .
# optional voice stack:
# pip install -e ".[voice]"
Install generators in the same environment, or point Neo Lab at the checkout with NEO_GENERATORS_PATH.
Defaults target localhost. Override them with environment variables instead of editing code.
| Variable | Default | Purpose |
|---|---|---|
| NEO_VISION_SERVER | http://127.0.0.1:8082 | screen / VLM endpoint |
| NEO_OCR_SERVER | http://127.0.0.1:9003 | OCR / text detection |
| NEO_DINO_SERVER | http://127.0.0.1:9004 | visual grounding / UI element detection |
| OLLAMA_HOST | http://127.0.0.1:11434 | embeddings for neo/skill_index.py |
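The override rule in the table can be sketched in a few lines of Python. This is only an illustration of "environment variable wins, otherwise the documented default"; it is not Neo Lab's actual config loader. The names `DEFAULT_ENDPOINTS`, `resolve_endpoints`, and `is_healthy` are hypothetical, and the `/health` path is taken from the curl checks in this README.

```python
import os
import urllib.error
import urllib.request

# Defaults copied from the table; the dict and helper names are illustrative only.
DEFAULT_ENDPOINTS = {
    "NEO_VISION_SERVER": "http://127.0.0.1:8082",
    "NEO_OCR_SERVER": "http://127.0.0.1:9003",
    "NEO_DINO_SERVER": "http://127.0.0.1:9004",
    "OLLAMA_HOST": "http://127.0.0.1:11434",
}


def resolve_endpoints(env=None):
    """Return each endpoint, preferring a non-empty environment value over the default."""
    env = os.environ if env is None else env
    return {
        name: (env.get(name) or default).rstrip("/")
        for name, default in DEFAULT_ENDPOINTS.items()
    }


def is_healthy(base_url, timeout=2.0):
    """Probe <base_url>/health; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

Passing a plain dict instead of `os.environ` keeps the resolution logic testable without touching the real environment. Whether every listed service answers on `/health` is an assumption; the README only shows that check for the OCR, DINO, and vision servers.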
Example:
export NEO_OCR_SERVER=http://192.168.1.10:9003
export NEO_DINO_SERVER=http://192.168.1.10:9004
export NEO_VISION_SERVER=http://127.0.0.1:8082
Quick health checks:
curl -s "${NEO_OCR_SERVER:-http://127.0.0.1:9003}"/health
curl -s "${NEO_DINO_SERVER:-http://127.0.0.1:9004}"/health
curl -s "${NEO_VISION_SERVER:-http://127.0.0.1:8082}"/health
- Install Neo Lab and make sure your OCR / DINO / vision services are reachable.
- Point NEO_GENERATORS_PATH at your generators checkout if you want generated actions.
- Run python learn.py to teach and replay a small skill.
- Run neo.generate.run(...) or neo.browser.explore(...) when you want Neo to propose or discover actions at runtime.
python learn.py
Example session:
neo> look
neo> click 7
neo> type "search term"
neo> key return
neo> save demo_search
neo> replay demo_search
Saved skills are written as YAML files under skills/. Create that directory first if it does not exist.
Neo can synthesize actions that were never explicitly recorded before.
from neo.generate import run
ok, facts, steps = run(
    "Open Safari, go to example.com, and extract the main headline",
    target_app="Safari",
)
Under the hood, Neo uses the external generators runtime to produce typed action steps, executes them locally, validates them, and attempts a repair pass if the first attempt fails.
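The generate, execute, validate, repair cycle described above can be pictured with a small control-flow sketch. It is an assumption-laden illustration, not the code in neo/generate.py: `run_with_repair`, its callback parameters, and the dict shape of a step are all hypothetical stand-ins for the generators runtime, the local executor, and the validator.

```python
from typing import Callable, List, Optional, Tuple

Step = dict  # e.g. {"type": "click_text", "text": "Search"}; this shape is an assumption


def run_with_repair(
    instruction: str,
    propose: Callable[[str, Optional[str]], List[Step]],
    execute: Callable[[Step], bool],
    validate: Callable[[], bool],
    max_repairs: int = 1,
) -> Tuple[bool, List[Step]]:
    """Propose steps, execute them, validate, and re-propose with feedback on failure."""
    feedback: Optional[str] = None
    steps: List[Step] = []
    for attempt in range(max_repairs + 1):
        steps = propose(instruction, feedback)
        # Execute in order and stop at the first step that reports failure.
        failed_at = next((i for i, s in enumerate(steps) if not execute(s)), None)
        if failed_at is None and validate():
            return True, steps
        feedback = (
            f"attempt {attempt + 1}: step {failed_at} failed"
            if failed_at is not None
            else f"attempt {attempt + 1}: validation failed"
        )
    return False, steps
```

With `max_repairs=1` the loop makes one initial attempt and one repair attempt, which mirrors the single repair pass mentioned above. Feeding the failure description back into `propose` is one plausible way to drive a repair; the real runtime may pass richer context.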
from neo.browser import explore
ok, facts = explore(
    goal="Find the pricing page and list the plan names",
    target_app="Safari",
    max_steps=15,
)
This is different from replaying a known skill: Neo perceives the current page, decides on the next grounded action, executes it, and iterates until it either succeeds or runs out of steps.
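As a rough picture of that perceive, decide, execute loop, the sketch below bounds the iteration with a step budget. It is not the implementation in neo/browser.py; `explore_loop` and its four callbacks are hypothetical placeholders for screen perception, the action policy, the local executor, and the goal check.

```python
from typing import Callable, Optional, Tuple


def explore_loop(
    goal: str,
    perceive: Callable[[], dict],
    decide: Callable[[str, dict], Optional[dict]],
    act: Callable[[dict], None],
    is_done: Callable[[str, dict], bool],
    max_steps: int = 15,
) -> Tuple[bool, int]:
    """Run perceive -> decide -> act until the goal is met or the step budget is spent."""
    for used in range(max_steps):
        observation = perceive()
        if is_done(goal, observation):
            return True, used
        action = decide(goal, observation)
        if action is None:  # the policy has no grounded action to offer
            return False, used
        act(action)
    # Budget spent: take one last look so a goal reached by the final action still counts.
    return is_done(goal, perceive()), max_steps
```

Checking the goal before choosing an action means a page that already satisfies the goal costs zero steps. The second return value is the number of actions taken, which is an addition for illustration and not part of the `explore` signature shown above.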
export NEO_VOICE_LAB_PATH=/path/to/voice-lab
python run_voice.py
If NEO_VOICE_LAB_PATH is set, the default config path is <voice-lab>/config.yaml. Otherwise pass an explicit config path:
python run_voice.py /path/to/config.yaml
Neo Lab stores learned behavior in two forms:
- skills: executable YAML programs made of typed steps like click_text, type, wait, skill, or scrape
- states: named screen signatures with OCR and visual markers, plus transitions between them
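To make the idea of typed steps and composition concrete, here is a minimal sketch of how a `skill` step could expand into the steps of another skill. The step types come from the list above, but the field names ("text", "name", "seconds"), the in-memory `SKILLS` dict, and the `flatten` helper are assumptions for illustration; the real on-disk YAML schema may differ.

```python
from typing import Dict, List

# Hypothetical in-memory form of two saved skills. The field names are assumptions;
# only the step types (click_text, type, wait, skill, scrape) come from this README.
SKILLS: Dict[str, List[dict]] = {
    "open_search": [
        {"type": "click_text", "text": "Search"},
        {"type": "wait", "seconds": 0.5},
    ],
    "demo_search": [
        {"type": "skill", "name": "open_search"},
        {"type": "type", "text": "search term"},
        {"type": "scrape", "name": "results"},
    ],
}


def flatten(name: str, skills: Dict[str, List[dict]], _stack: tuple = ()) -> List[dict]:
    """Expand nested `skill` steps into one flat list, rejecting recursive references."""
    if name in _stack:
        raise ValueError("recursive skill reference: " + " -> ".join(_stack + (name,)))
    flat: List[dict] = []
    for step in skills[name]:
        if step["type"] == "skill":
            flat.extend(flatten(step["name"], skills, _stack + (name,)))
        else:
            flat.append(step)
    return flat
```

Calling `flatten("demo_search", SKILLS)` yields the two `open_search` steps followed by the `type` and `scrape` steps. Tracking the call stack guards against skills that reference each other in a cycle.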
That lets Neo do more than replay a single macro. It can recognize where it is, choose a route to a target state, invoke the right skills along the path, and blend taught behavior with newly generated actions when needed.
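One way to picture "recognize where it is, then choose a route" is marker overlap for recognition plus breadth-first search over transitions. The sketch below is a simplified assumption, not the logic in neo/state_graph.py: the state names, markers, skill names, and both helper functions are hypothetical, and visual markers are left out entirely.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical state signatures (OCR markers only) and transitions labelled with skills.
SIGNATURES: Dict[str, Set[str]] = {
    "login": {"Sign in", "Password"},
    "dashboard": {"Overview", "Reports"},
    "report": {"Export", "Date range"},
}
TRANSITIONS: Dict[str, List[Tuple[str, str]]] = {
    "login": [("dashboard", "do_login")],
    "dashboard": [("report", "open_report")],
    "report": [("dashboard", "back_to_dashboard")],
}


def recognize(ocr_texts: Set[str]) -> Optional[str]:
    """Return the state whose markers overlap the OCR text most, or None if none match."""
    best = max(SIGNATURES, key=lambda s: len(SIGNATURES[s] & ocr_texts))
    return best if SIGNATURES[best] & ocr_texts else None


def route(start: str, target: str) -> Optional[List[str]]:
    """Breadth-first search; returns the skills to run, in order, or None if unreachable."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, skills = queue.popleft()
        if state == target:
            return skills
        for nxt, skill in TRANSITIONS.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, skills + [skill]))
    return None
```

For example, `route("login", "report")` returns `["do_login", "open_report"]`. Breadth-first search gives the route with the fewest transitions; a real router might instead weight edges by reliability or cost.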
neo/                core library
  actions.py        low-level local UI actions
  screen.py         screenshot, OCR, DINO, annotation, vision calls
  skill.py          skill save/load/replay/validation
  state_graph.py    UI state recognition and transition routing
  generate.py       natural-language -> generated action steps
  browser.py        autonomous browser exploration loop
  transcript.py     transcript wrapper for generator context
learn.py            interactive teaching REPL
run_voice.py        voice entry point
session.py          long-lived authenticated-session helper
exercises/          small validation scripts for primitives
tests/              unit and integration tests
- Neo Lab is designed for local experimentation, not unattended production automation.
- Authentication flows should stay human-mediated or be handled with extreme care.
- Keep credentials, private service URLs, and sensitive skills or state graphs out of the repo.
- Treat states.yaml and saved skill files as potentially sensitive data when they describe real internal tools or personal account flows.
MIT — see LICENSE.