wendle is a Python framework for black-box Android UI automation, built on ADB and
uiautomator2. It records a manual walkthrough of a device into a directed graph of
screens that spans multiple apps and system surfaces (Settings, the launcher, the
notification shade), not a single app, then replays that recording and navigates between
any two screens on the graph. Arrivals are verified by screen fingerprint; on a mismatch
it stops and reports rather than guessing. Hooks can run arbitrary code (Frida, an LLM,
plain Python) between replayed steps.
Everything runs from the command line against one phone.json. You don't write Python until
you want custom logic.
wendle record --out phone.json --duration 90 # walk the device by hand; save the map
wendle replay phone.json --param password=… # re-enact it (credentials never logged)
wendle replay phone.json --hooks my_hooks.py # inject a HookRegistry between steps
wendle nodes phone.json # list node ids (verified anchors marked)
wendle navigate phone.json --to <node-id> # route to a node and verify arrival
wendle render phone.json --target dot # offline, redaction-safe map (dot/flow/maestro/python)Exit codes carry the result into the shell:
0 verified success (replay completed / navigate arrived)
1 crash (an uncaught exception)
2 usage error (bad flags, missing file, unknown node)
3 honest stop / refusal (stopped / arrived_unverified / off_graph / ...)
3 is separate from 1 so a script can tell a refusal to guess apart from a failure.
uv sync # or: pip install -e .You need a device reachable over adb (USB or wireless debugging). uiautomator2 is
imported lazily, only when you construct a real driver, so tests and offline tooling need
no device.
import wendle
# RECORD: walk the device by hand for 90s. Returns a navigable map and saves it.
# Connects to the device and calibrates touch input on its own.
graph = wendle.record(duration=90, out="phone.json")
# REPLAY: re-enact the recorded walk on the device.
result = wendle.replay("phone.json", wendle.U2Driver())
print(result.status) # COMPLETED, or STOPPED (with a typed reason) if a step failed to verify
# NAVIGATE: route to any node in the map and confirm arrival.
outcome = wendle.navigate(graph, graph.anchors()[0], target_id, wendle.U2Driver())
print(outcome.status) # ARRIVED, ARRIVED_UNVERIFIED, OFF_GRAPH, ...Pass a serial with wendle.U2Driver("RF8…"), or set ANDROID_SERIAL. For tests without a
phone, use wendle.FakeDriver. Render the map with wendle.render(graph, "map.dot").
Because the map models the device rather than one app, you address screens by node id regardless of which app or surface they belong to. To reach a target, the navigator:
- finds the nearest anchor, a node with a verified way to be forced into existence (an app's launcher entry, or a system keyevent such as HOME, back, or recents);
- forces that anchor, then verifies its fingerprint before trusting it;
- computes a weighted shortest path to the target over the recorded graph (so it prefers reliable routes over fewest hops) and walks it, re-observing and re-planning at each step.
Cross-app edges (a share sheet, an OAuth handoff, an OEM intent) are kept in the graph as
re-anchor checkpoints: when a route leaves one app for another, the navigator re-roots at
the destination app's anchor and continues. How to launch an app is decided by a ladder:
recorded component, then recorded launcher-icon tap, then package default, then monkey
launcher. The icon-tap rung is what reaches entries that share a process with another app
(for example an assistant inside the system Google app), where am start cannot.
# Cross surfaces in one workflow: read a value in Settings, act on it in another app.
graph = wendle.Graph.from_json(open("phone.json").read())
driver = wendle.U2Driver()
# Each target is a node id; the navigator launches and routes to whichever app owns it.
wendle.navigate(graph, start, settings_node, driver) # into a system surface
value = read_something(driver) # your code, off the live screen
wendle.navigate(graph, settings_node, other_app_node, driver) # across the app boundarySystem surfaces (launcher, shade, quick settings) are reached by their keyevent anchors, so a route can pass through them and not only through apps.
A hook runs between two replay steps, keyed by step index or by the screen wendle observes.
It reaches the device only through ctx, and returns a directive that decides what happens
next:
from wendle.replay.hooks import HookRegistry, cont, stop, goto
hooks = HookRegistry()
@hooks.after(2) # fires after recorded step 2 has verified
def inspect(ctx):
# ctx.driver is the device seam; ctx.node_id is the verified map node we are on
# (None when the landing was ambiguous, so check it).
reading = read_live_state(ctx) # a Frida read, an LLM call, or plain Python
ctx.emit("native_modules", reading.count) # record a value-free fact on the result
if reading.blocked:
return stop("policy_blocked") # halt instead of continuing
if reading.needs_detour:
return goto(other_node_id) # reroute; the navigator pathfinds and verifies it
return cont() # continue the recorded path (None also means cont)
wendle.replay("phone.json", wendle.U2Driver(), hooks=hooks)@hooks.before(n)and@hooks.after(n)fire immediately before and after a step;@hooks.screen("pkg/.Activity")fires once each time replay arrives at that screen.cont()continues the recorded path.stop(reason)halts with a value-free label; the framework will not continue past it.goto(node)hands control to the navigator, which pathfinds to the target and confirms arrival, returning a typed stop on an ambiguous landing rather than a claimed success.- A hook reaches the device only through
ctx.driverand steers only through its return value. Step indices always refer to the original recording, even after agoto, and a per-run budget bounds reroute loops.
scripts/demo_hook_frida.py is a working on-device Frida hook between two replay steps.
examples/settings_assistant.py routes a system map by node and steers a hooked replay
with goto, live reads, and a stop.
An LLM agent re-decides every action on every run and can confidently tap the wrong thing. With wendle the path is recorded and verified up front, and the agent is one hook at a known point that reads state and reroutes, rather than the thing driving every tap.
replay and navigate return typed results. Branch on the constants, not on strings.
ReplayStatusisCOMPLETEDorSTOPPED, with a typedStopReasonon a stop (ELEMENT_NOT_PRESENT,AMBIGUOUS_MATCH,CREDENTIAL_REQUIRED,HOOK_STOP,GOTO_FAILED, and others).NavStatusis one ofARRIVED(confirmed),ARRIVED_UNVERIFIED(plausibly there but unconfirmable, so the caller decides),OFF_GRAPH,CONTENT_DRIFT,CROSS_APP_BOUNDARY,FORCE_FAILED,NO_ROUTE,COORDINATE_ONLY_REFUSED, orCREDENTIAL_REQUIRED.
A text-free structural match is reported as ARRIVED only when corroborated by a verified
interaction in the same call (a gated launch or a walked recorded edge). Without that
corroboration it could be an unrecorded look-alike screen (Inbox versus Archive), so it
degrades to ARRIVED_UNVERIFIED. A result's repr never contains a selector value or a
secret.
- Capture. A manual recorder watches you drive the device. Scaled
geteventis the primary signal: it reports that a physical tap happened and where, on every screen. Each tap binds to the settled hierarchy snapshot that was on screen at tap time via a timestamped ring buffer. Typed text becomes aset_textaction, with password fields reduced to a{param}handle at capture, before the literal leaves the buffer. - Fingerprint. Each screen gets a structural signature: a hashed tree of
(class, resource-id, clickable, content-desc shape)with list children collapsed and volatile subtrees (clock, battery, badges) stripped, namespaced by foreground package and window. A separate text-freestructure_iddrives a graded match tier (EXACT, STRUCTURE, WEAK, UNVERIFIABLE) used to decide how confident an arrival is. - Graph. A
networkxMultiDiGraph, persisted to JSON as structure only: no callables, no secret literals. Routing is a weighted shortest path from the nearest forceable anchor. - Replay and verify. A launch step plus a wait-then-verify loop with a tolerance band absorbs A/B and loading-screen variation without false aborts, and stops with a report when it lands off-graph.
wendle takes several pieces from existing tools:
- The launch and per-command wait-then-verify model is from Maestro's flow runner;
wendle render --target maestroemits a Maestro flow. - Re-observing and re-planning at each navigation step, and restarting at an anchor to recover, follow DroidBot's UI Transition Graph.
- The hashed structural-signature approach to fingerprinting is from APE and Fastbot.
set_checked(read, modify, verify; flip only on a mismatch) follows Playwright'ssetCheckedand the Appium check-then-click idiom.- Attaching hooks to graph nodes by reference, rather than serializing them, mirrors
LangGraph's
add_node(name, callable).
v1, in this repo: the manual recorder, deterministic replay, graph navigation, and the inter-step hooks. Record, replay, and navigate are validated on a Galaxy S23, including launching arbitrary apps (tested across 50), launching entries that share a process, passing through system surfaces, and telling structural twins apart.
Limits:
- Device-scoped, not fleet-portable. A recording is bound to one device's calibration, resolution, and OS build. Semantic selector edges port across devices; structural fingerprints, coordinate-only edges, and force actions do not. One graph across a fleet is a v2 design (a logical graph plus a per-device overlay).
- Multi-app handoff within a single recorded run is not yet proven end-to-end on a device. The data model, routing, and per-app launch support it, but a live A→B→C handoff in one run has not been validated.
- Compose, coordinate-only, and WebView. testTag-less single-Activity Compose is the limit of structural fingerprinting; screens with no stable selector replay through flagged raw coordinates; WebView is low-confidence. Flutter and games are out of scope (no usable accessibility tree).
- Maintenance is reduced, not removed. An app UI change invalidates the affected screens' fingerprints, which then need re-recording. Re-recording a changed flow is usually cheaper than re-authoring the equivalent selector code.
Not built (planned for v2): crawling an app to build the map without a manual walk, and a live observability dashboard.
uv run python -m pytest -q # 844 tests against FakeDriver, no phone neededThe device sits behind a DeviceDriver port, so fingerprinting, gesture segmentation, and
the selector ladder are pure functions over driver output (XML in, hash out) and run in CI
against recorded-hierarchy fixtures, including adversarial ones (truncated trees, empty
dumps).