belarusian/neo-lab

Neo Lab

Experimental macOS computer-use sandbox for teaching, replaying, and discovering GUI automation skills.

Neo Lab is a local runtime for building and testing computer-use workflows against real applications. It combines:

  • local screen capture, annotation, and input control
  • external OCR, visual grounding, and vision-model services
  • a generation runtime that can propose new actions for tasks you never recorded

Use it to:

  • teach repeatable workflows as YAML skills
  • replay those skills with validation and state-aware routing
  • synthesize novel action sequences from plain-English instructions
  • let Neo explore unfamiliar browser flows step by step
  • optionally expose the same runtime through a voice interface

This repository is the orchestration layer. It does not bundle the OCR / DINO / vision services or the generation runtime; it wires them together into a local experimentation environment.

Status

  • macOS-only
  • experimental and local-first
  • intended for research, prototyping, and operator-driven automation
  • not a hosted service or production-ready browser agent

What It Does

Neo Lab supports two complementary modes:

  1. Teach known behavior

You perform a workflow once, Neo records it as typed YAML steps, and you can replay that skill later with validation and composition.

  2. Discover unknown behavior

When no recorded skill exists, Neo can use the external generators stack to synthesize candidate action steps from a natural-language instruction. Those steps are executed locally, validated, and repaired if needed. For browser tasks, Neo can also run a perception -> action -> validation loop to discover a path through a page in real time.

The important point is that Neo Lab is not only a recorder for previously taught workflows. It is also a runtime for grounded action generation and live UI discovery.

Architecture

Neo Lab sits on top of three external pieces:

  1. generators — natural-language -> action-step synthesis, repair loops, and transcript-backed context
  2. computer-lab — OCR, DINO, and vision-model HTTP services for perception
  3. voice-lab — optional voice pipeline used by run_voice.py

Neo Lab itself handles:

  • local screen capture and annotation
  • low-level macOS actions like click, type, keypress, and scroll
  • skill recording, replay, and composition
  • state graph matching and transition routing
  • generated-action execution and browser exploration

Core Capabilities

What You Need

  • macOS
  • Python 3.10+
  • running OCR / DINO / vision services
  • a generators checkout or install if you want generated actions
  • a voice-lab checkout or install only if you want voice mode

Install

Recommended:

./setup.sh

Useful options:

./setup.sh --voice
./setup.sh --no-generators
./setup.sh --generators /path/to/generators

Optional local config:

cp .env.example .env

Then edit .env with your OCR / DINO / vision endpoints and any generators path overrides.

Manual setup:

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
# base runtime only:
# pip install -e .
# optional voice stack:
# pip install -e ".[voice]"

Install generators in the same environment, or point Neo Lab at the checkout with NEO_GENERATORS_PATH.

Service Endpoints

Defaults target localhost. Override them with environment variables instead of editing code.

Variable           Default                  Purpose
NEO_VISION_SERVER  http://127.0.0.1:8082    screen / VLM endpoint
NEO_OCR_SERVER     http://127.0.0.1:9003    OCR / text detection
NEO_DINO_SERVER    http://127.0.0.1:9004    visual grounding / UI element detection
OLLAMA_HOST        http://127.0.0.1:11434   embeddings for neo/skill_index.py

Example:

export NEO_OCR_SERVER=http://192.168.1.10:9003
export NEO_DINO_SERVER=http://192.168.1.10:9004
export NEO_VISION_SERVER=http://127.0.0.1:8082
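
The override behavior described above can be sketched in Python as follows. This is a minimal illustration, not Neo Lab's actual configuration code: the variable names and defaults come from the table, while `DEFAULT_ENDPOINTS` and `resolve_endpoints` are hypothetical names introduced here.

```python
import os

# Variable names and defaults as documented in the Service Endpoints table.
DEFAULT_ENDPOINTS = {
    "NEO_VISION_SERVER": "http://127.0.0.1:8082",
    "NEO_OCR_SERVER": "http://127.0.0.1:9003",
    "NEO_DINO_SERVER": "http://127.0.0.1:9004",
    "OLLAMA_HOST": "http://127.0.0.1:11434",
}


def resolve_endpoints(env=None):
    """Return each endpoint, preferring a non-empty environment override.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    resolved = {}
    for name, default in DEFAULT_ENDPOINTS.items():
        value = env.get(name, "").strip()
        # Fall back to the documented default and drop any trailing slash.
        resolved[name] = (value or default).rstrip("/")
    return resolved
```

Calling `resolve_endpoints()` with no arguments reads the current process environment, so the exports above would take effect without any code changes.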

Quick health checks:

curl -s "${NEO_OCR_SERVER:-http://127.0.0.1:9003}"/health
curl -s "${NEO_DINO_SERVER:-http://127.0.0.1:9004}"/health
curl -s "${NEO_VISION_SERVER:-http://127.0.0.1:8082}"/health
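
The same checks can be scripted from Python with only the standard library. This sketch assumes, as the curl commands do, that each service answers GET /health; the helper names `health_url`, `is_healthy`, and `report` are hypothetical and not part of Neo Lab.

```python
import urllib.error
import urllib.request


def health_url(base_url):
    """Build the /health URL for a service base URL, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url, timeout=2.0):
    """Return True if GET <base_url>/health answers with a 2xx status."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection errors, HTTP error statuses, timeouts, and malformed
        # URLs all count as "not healthy".
        return False


def report(endpoints):
    """Map each service name to "ok" or "down"."""
    return {
        name: ("ok" if is_healthy(url) else "down")
        for name, url in endpoints.items()
    }
```

For example, `report({"ocr": "http://127.0.0.1:9003", "dino": "http://127.0.0.1:9004"})` would return a dict you can print before starting a teaching session.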

Quick Start

  1. Install Neo Lab and make sure your OCR / DINO / vision services are reachable.
  2. Point NEO_GENERATORS_PATH at your generators checkout if you want generated actions.
  3. Run python learn.py to teach and replay a small skill.
  4. Call neo.generate.run(...) or neo.browser.explore(...) from Python when you want Neo to propose or discover actions at runtime.

Common Workflows

1. Teach a skill interactively

python learn.py

Example session:

neo> look
neo> click 7
neo> type "search term"
neo> key return
neo> save demo_search
neo> replay demo_search

Saved skills are written as YAML files under skills/. Create that directory first if it does not exist.
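
The commands in the example session follow a verb-plus-arguments shape with optional double quotes. As a rough illustration of how such lines can be tokenized (this is an assumption about the general approach, not a description of how learn.py parses input), the standard-library shlex module handles the quoting. `parse_command` is a hypothetical name.

```python
import shlex


def parse_command(line):
    """Split a REPL line into (verb, args), honoring quoted arguments.

    Returns (None, []) for an empty or whitespace-only line.
    """
    tokens = shlex.split(line)
    if not tokens:
        return None, []
    return tokens[0].lower(), tokens[1:]
```

With this sketch, `parse_command('type "search term"')` yields the verb `type` and a single argument `search term`, while `parse_command("click 7")` yields `click` and the argument `7` as a string.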

2. Run generated actions for a new instruction

Neo can synthesize actions that were never explicitly recorded before.

from neo.generate import run

ok, facts, steps = run(
    "Open Safari, go to example.com, and extract the main headline",
    target_app="Safari",
)

Under the hood, Neo uses the external generators runtime to produce typed action steps, executes them locally, validates them, and attempts a repair pass if the first attempt fails.
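
The generate, execute, validate, and repair cycle described above can be sketched as a small control loop. This is a simplified illustration under stated assumptions, not the code behind neo.generate.run: the `generate`, `execute`, and `validate` callables, the dict step shape, and the `max_repairs` parameter are all hypothetical.

```python
from typing import Callable, List, Tuple

Step = dict  # hypothetical step representation, e.g. {"type": "click_text", ...}


def run_with_repair(
    instruction: str,
    generate: Callable[[str, str], List[Step]],
    execute: Callable[[List[Step]], str],
    validate: Callable[[str], bool],
    max_repairs: int = 1,
) -> Tuple[bool, List[Step]]:
    """Generate steps, run them, and retry with failure feedback if validation fails."""
    feedback = ""  # empty on the first attempt
    steps: List[Step] = []
    for attempt in range(max_repairs + 1):
        steps = generate(instruction, feedback)
        outcome = execute(steps)
        if validate(outcome):
            return True, steps
        # Pass the failed outcome back so the generator can propose a repair.
        feedback = f"attempt {attempt + 1} failed: {outcome}"
    return False, steps
```

Passing the collaborators in as callables keeps the loop testable with stubs, which matters here because the real generator and executor depend on external services and a live screen.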

3. Let Neo discover a path through a browser flow

from neo.browser import explore

ok, facts = explore(
    goal="Find the pricing page and list the plan names",
    target_app="Safari",
    max_steps=15,
)

This is different from replaying a known skill: Neo perceives the current page, decides on the next grounded action, executes it, and iterates until it either succeeds or runs out of steps.
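
The perceive, decide, act loop with a step budget can be outlined as follows. This is a sketch of the general pattern only; the callables `perceive`, `decide`, `act`, and `is_done` are hypothetical stand-ins, and the real neo.browser.explore may structure its loop differently.

```python
def explore_loop(goal, perceive, decide, act, is_done, max_steps=15):
    """Iterate perceive -> decide -> act until the goal is met or steps run out.

    Returns (ok, history), where history is the list of actions taken.
    """
    history = []
    for _ in range(max_steps):
        observation = perceive()
        if is_done(goal, observation):
            return True, history
        action = decide(goal, observation, history)
        act(action)
        history.append(action)
    # Budget exhausted: check once more in case the last action reached the goal.
    return is_done(goal, perceive()), history
```

The hard `max_steps` cap is what keeps an exploratory agent from looping indefinitely on a page it cannot make progress on.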

4. Use the voice entry point

export NEO_VOICE_LAB_PATH=/path/to/voice-lab
python run_voice.py

If NEO_VOICE_LAB_PATH is set, the default config path is <voice-lab>/config.yaml. Otherwise pass an explicit config path:

python run_voice.py /path/to/config.yaml
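
The config lookup described above can be expressed as a short resolution function. This is a sketch, not the actual logic in run_voice.py: `resolve_voice_config` is a hypothetical name, and giving an explicit argument precedence over NEO_VOICE_LAB_PATH is an assumption.

```python
import os
from pathlib import Path


def resolve_voice_config(argv, env=None):
    """Pick the voice config path from CLI arguments or NEO_VOICE_LAB_PATH."""
    env = os.environ if env is None else env
    # Assumption: an explicit path on the command line wins over the env var.
    if len(argv) > 1:
        return Path(argv[1])
    voice_lab = env.get("NEO_VOICE_LAB_PATH")
    if voice_lab:
        return Path(voice_lab) / "config.yaml"
    raise SystemExit("Set NEO_VOICE_LAB_PATH or pass a config path")
```

In a script, this would typically be called as `resolve_voice_config(sys.argv)`.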

Skills and State Graphs

Neo Lab stores learned behavior in two forms:

  • skills: executable YAML programs made of typed steps like click_text, type, wait, skill, or scrape
  • states: named screen signatures with OCR and visual markers, plus transitions between them

That lets Neo do more than replay a single macro. It can recognize where it is, choose a route to a target state, invoke the right skills along the path, and blend taught behavior with newly generated actions when needed.
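
Choosing a route to a target state amounts to a path search over the transition graph. The sketch below uses breadth-first search over a hypothetical in-memory structure that maps each state to its reachable states and the skill that performs each transition. The state and skill names are invented for illustration, and neo/state_graph.py may use a different representation or algorithm.

```python
from collections import deque


def find_route(transitions, start, target):
    """Return the skill names along a shortest path from start to target.

    transitions: {state: {next_state: skill_name}} (hypothetical shape).
    Returns [] if already at the target, or None if no route exists.
    """
    if start == target:
        return []
    queue = deque([start])
    came_from = {start: None}  # state -> (previous_state, skill) or None
    while queue:
        state = queue.popleft()
        for nxt, skill in transitions.get(state, {}).items():
            if nxt in came_from:
                continue
            came_from[nxt] = (state, skill)
            if nxt == target:
                # Walk back to the start, collecting skills in reverse.
                route = []
                node = nxt
                while came_from[node] is not None:
                    prev, via = came_from[node]
                    route.append(via)
                    node = prev
                return route[::-1]
            queue.append(nxt)
    return None


# Invented example graph: three skills lead from "login" to "results".
example = {
    "login": {"home": "sign_in"},
    "home": {"search": "open_search", "settings": "open_settings"},
    "search": {"results": "submit_query"},
}
route = find_route(example, "login", "results")
```

Breadth-first search returns a route with the fewest transitions; a weighted search would be the natural swap if some skills are slower or less reliable than others.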

Repository Layout

neo/                 core library
  actions.py         low-level local UI actions
  screen.py          screenshot, OCR, DINO, annotation, vision calls
  skill.py           skill save/load/replay/validation
  state_graph.py     UI state recognition and transition routing
  generate.py        natural-language -> generated action steps
  browser.py         autonomous browser exploration loop
  transcript.py      transcript wrapper for generator context
learn.py             interactive teaching REPL
run_voice.py         voice entry point
session.py           long-lived authenticated-session helper
exercises/           small validation scripts for primitives
tests/               unit and integration tests

Scope and Safety

  • Neo Lab is designed for local experimentation, not unattended production automation.
  • Authentication flows should stay human-mediated or be handled with extreme care.
  • Keep credentials, private service URLs, and sensitive skills or state graphs out of the repo.
  • Treat states.yaml and saved skill files as potentially sensitive data when they describe real internal tools or personal account flows.

License

MIT — see LICENSE.
