Use this file as context when building agents that control Android/iOS devices via baremobile.
Every agent interaction follows observe-think-act:
import { connect } from 'baremobile';
const page = await connect(); // auto-detect device (auto-reconnects WiFi if needed)
let snapshot = await page.snapshot(); // observe
// Agent reads snapshot, picks action
await page.tap(5); // act
snapshot = await page.snapshot(); // observe again

Always snapshot after every action. Refs reset per snapshot, so never cache them.
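The loop above can be wrapped in a small driver. The following is a minimal sketch, not part of baremobile: `runAgent` and the `decide` callback are hypothetical names, and `decide` stands in for whatever model call picks the next action. It relies only on `page.snapshot()` and `page.tap()` as shown above.

```javascript
// Hypothetical driver for the observe-think-act loop (not a baremobile API).
// `decide(snapshot)` is an assumed callback returning { ref } to tap, or null when done.
async function runAgent(page, decide, maxSteps = 20) {
  const history = [];
  for (let step = 0; step < maxSteps; step++) {
    const snapshot = await page.snapshot(); // observe: always take a fresh snapshot
    const action = await decide(snapshot);  // think: refs come from this snapshot only
    if (!action) break;                     // the agent reports it is finished
    await page.tap(action.ref);             // act
    history.push(action.ref);
  }
  return history; // refs tapped, in order (useful for logging only, never for reuse)
}
```

Because the snapshot is taken inside the loop, a ref never outlives the snapshot it came from.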
- ScrollView [ref=1]
  - Group
    - Text "Settings"
  - Group [ref=2]
    - Text "Search settings"
  - List
    - Group [ref=3]
      - Text "Wi-Fi"
      - Switch [ref=4] (Wi-Fi) [checked]
    - Group [ref=5] [disabled]
      - Text "Airplane mode"
What to read:
- [ref=N]: interactive element, use with tap/type/scroll
- "quoted text": visible text on screen
- (parenthesized): contentDesc / accessibility label
- [checked], [selected], [focused], [disabled]: element state
- Indentation = nesting (parent-child)
Roles: Text, TextInput, Button, Image, ImageButton, CheckBox, Switch, Radio, Toggle, Slider, Progress, Select, List, ScrollView, Group, TabList, Tab. Unknown classes show their short Java class name.
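To make the notation concrete, here is a rough sketch of a parser for snapshot lines. It is not baremobile code: `parseSnapshot` is a hypothetical helper, and the two-space indent per nesting level is an assumption about the YAML layout.

```javascript
// Sketch of a snapshot line parser (hypothetical helper, not a baremobile API).
// Assumes each nesting level is indented by two spaces.
function parseSnapshot(yaml) {
  const nodes = [];
  for (const line of yaml.split('\n')) {
    const m = line.match(/^(\s*)- (\w+)(.*)$/);
    if (!m) continue; // skip blank or unrecognised lines
    const [, indent, role, rest] = m;
    const text = rest.match(/"([^"]*)"/);
    // Remove quoted text first so brackets inside visible text are not misread.
    const unquoted = rest.replace(/"[^"]*"/g, '');
    const ref = unquoted.match(/\[ref=(\d+)\]/);
    const desc = unquoted.match(/\(([^)]*)\)/);
    const states = [...unquoted.matchAll(/\[(checked|selected|focused|disabled)\]/g)]
      .map((s) => s[1]);
    nodes.push({
      depth: indent.length / 2,
      role,
      ref: ref ? Number(ref[1]) : null,
      text: text ? text[1] : null,
      desc: desc ? desc[1] : null,
      states,
    });
  }
  return nodes;
}
```

An agent that reads the YAML directly does not need this; it is mainly useful for scripted checks such as "is ref 4 checked?".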
await page.launch('com.android.settings'); // open app by package
await page.intent('android.settings.BLUETOOTH_SETTINGS'); // deep nav via intent
await page.back(); // press back
await page.home(); // press home
await page.press('recent'); // app switcher

const yaml = await page.snapshot(); // pruned YAML with refs
const png = await page.screenshot(); // PNG buffer

await page.tap(ref); // tap element
await page.tapXY(540, 1200); // tap by pixel coordinates
await page.tapGrid('C5'); // tap by grid cell
await page.type(ref, 'text'); // type into field
await page.type(ref, 'new', {clear: true}); // clear field first, then type
await page.press('enter'); // press key
await page.scroll(ref, 'down'); // scroll within element
await page.longPress(ref); // long press
await page.swipe(x1, y1, x2, y2, 300); // raw swipe

let snap = await page.waitForText('Bluetooth', 10000); // poll until text appears
snap = await page.waitForState(3, 'checked', 10000); // poll until state matches
// States: 'enabled', 'disabled', 'checked', 'unchecked', 'focused', 'selected'

Keys for page.press(): back, home, enter, delete, tab, escape, up, down, left, right, space, power, volup, voldown, recent
Snapshot shows: TextInput [ref=3] "Search settings" [focused]
- If [focused]: just type, no extra tap needed: page.type(3, 'wifi')
- If not focused: page.type(3, 'wifi') will tap first automatically
- To replace existing text: page.type(3, 'new text', {clear: true})
Snapshot shows: ScrollView [ref=1] → List → Group [ref=2] "Wi-Fi" ...
- Tap an item: page.tap(2)
- Scroll for more: page.scroll(1, 'down') then snapshot again
- Items at the bottom may not be visible, so scroll and re-snapshot
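The scroll-and-re-snapshot pattern above could be sketched as follows. `scrollUntilVisible` is a hypothetical helper, not a baremobile API. It assumes the container appears as a `ScrollView [ref=N]` line and looks the ref up again in every snapshot, because refs do not survive across snapshots.

```javascript
// Hypothetical helper: scroll the first ScrollView until `text` shows up in a snapshot.
async function scrollUntilVisible(page, text, maxScrolls = 5) {
  for (let i = 0; i <= maxScrolls; i++) {
    const snapshot = await page.snapshot();
    if (snapshot.includes(text)) return snapshot; // caller picks a ref from this snapshot
    const m = snapshot.match(/ScrollView \[ref=(\d+)\]/); // re-read the ref every time
    if (!m || i === maxScrolls) break;
    await page.scroll(Number(m[1]), 'down');
  }
  return null; // not found within the scroll budget
}
```

Each iteration costs one snapshot, so keep `maxScrolls` small on slow devices.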
Snapshot shows: Text "Allow access?" → Button [ref=5] "Allow" → Button [ref=6] "Deny"
- Read dialog text, decide, tap the appropriate button
- Dialogs always have their buttons in the snapshot with refs
await page.launch('com.android.settings');
await new Promise(r => setTimeout(r, 2000)); // wait for app to load
const snapshot = await page.snapshot();

Common packages: com.android.settings, com.android.chrome, com.google.android.apps.messaging, com.google.android.dialer, com.android.contacts
await page.intent('android.settings.BLUETOOTH_SETTINGS');
await page.intent('android.settings.WIFI_SETTINGS');
await page.intent('android.settings.DISPLAY_SETTINGS');
await page.intent('android.settings.SOUND_SETTINGS');
await page.intent('android.settings.LOCATION_SOURCE_SETTINGS');
await page.intent('android.settings.AIRPLANE_MODE_SETTINGS');
await page.intent('android.settings.APPLICATION_SETTINGS');
// With extras:
await page.intent('android.intent.action.VIEW', { url: 'https://example.com' });

Skip multi-step navigation when you know the intent action.
const png = await page.screenshot(); // get visual
const grid = await page.grid(); // get grid info
console.log(grid.text); // "Screen: 1080×2400, Grid: 10 cols (A-J) × 22 rows..."
// Send screenshot + grid.text to vision model
// Model responds: "tap C5"
await page.tapGrid('C5'); // or page.tapXY(x, y)

Use when: Flutter apps crash uiautomator, WebView content invisible, snapshot seems wrong.
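For intuition about how a grid cell relates to pixel coordinates, here is a small sketch. `gridCellToXY` is a hypothetical function, and the mapping is an assumption (columns lettered from A, rows numbered from 1 top to bottom, tap at the cell centre). baremobile's own `tapGrid()` may compute this differently.

```javascript
// Sketch: convert a grid cell such as 'C5' to the pixel at the centre of that cell.
// Assumes columns are letters starting at A and rows are numbered from 1, top to bottom.
function gridCellToXY(cell, { width, height, cols, rows }) {
  const m = /^([A-Z])(\d+)$/.exec(cell.toUpperCase());
  if (!m) throw new Error(`Bad grid cell: ${cell}`);
  const col = m[1].charCodeAt(0) - 65; // 'A' -> 0
  const row = Number(m[2]) - 1;        // '1' -> 0
  if (col >= cols || row < 0 || row >= rows) {
    throw new Error(`Cell out of range: ${cell}`);
  }
  const cellW = width / cols;
  const cellH = height / rows;
  return {
    x: Math.round((col + 0.5) * cellW),
    y: Math.round((row + 0.5) * cellH),
  };
}
```

With the 1080×2400 screen and 10 × 22 grid from the example output, `gridCellToXY('C5', { width: 1080, height: 2400, cols: 10, rows: 22 })` gives `{ x: 270, y: 491 }`, which could then be passed to `page.tapXY()`.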
- launch('com.google.android.apps.messaging')
- Snapshot → find "Start chat" button → tap(ref)
- Snapshot → find TextInput for "To:" → type(ref, '5551234567')
- Snapshot → find suggestion like "Send to (555) 123-4567" → tap(ref)
- Snapshot → find compose TextInput → type(ref, 'Hello!')
- Snapshot → find "Send SMS" button → tap(ref)
Each step: snapshot, read, decide, act. The agent adapts to whatever the UI shows.
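As an illustration only, the SMS steps above could be scripted like this. `findRef` and `sendSms` are hypothetical helpers, and the labels being matched are assumptions about what the Messages UI shows. A real agent should decide from each snapshot instead of following a fixed script, and would normally wait for the UI to settle between steps.

```javascript
// Hypothetical helper: ref of the first snapshot line that has a ref and contains `label`.
// Assumes the label sits on the same line as the ref.
function findRef(snapshot, label) {
  for (const line of snapshot.split('\n')) {
    const m = line.match(/\[ref=(\d+)\]/);
    if (m && line.includes(label)) return Number(m[1]);
  }
  return null;
}

// Fixed-script version of the flow: every step takes a fresh snapshot before acting.
async function sendSms(page, number, message) {
  await page.launch('com.google.android.apps.messaging');
  const steps = [
    { find: 'Start chat', act: (ref) => page.tap(ref) },
    { find: 'To:', act: (ref) => page.type(ref, number) },
    { find: 'Send to', act: (ref) => page.tap(ref) },
    { find: 'TextInput', act: (ref) => page.type(ref, message) },
    { find: 'Send SMS', act: (ref) => page.tap(ref) },
  ];
  for (const step of steps) {
    const snapshot = await page.snapshot();
    const ref = findRef(snapshot, step.find);
    if (ref === null) throw new Error(`"${step.find}" not found; re-plan from the snapshot`);
    await step.act(ref);
  }
}
```

Throwing when a target is missing hands control back to the agent, which can then read the snapshot and adapt.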
- In compose view, find emoji button (contentDesc contains "emoji") → tap(ref)
- Snapshot → emoji grid appears, each emoji is View [ref=N] (😀) with name in contentDesc
- Tap the emoji ref → it inserts into the TextInput
- Press back or tap outside to close emoji panel
- Find attach/+ button (contentDesc "Show attach" or "Show more options") → tap(ref)
- Snapshot → options appear: Gallery, Files, Location, etc. → tap(ref) for Files
- System file picker opens → snapshot shows folders and files with refs
- Navigate to file → tap(ref) to select
await page.press('power'); // wake
await page.swipe(540, 1800, 540, 800, 300); // swipe up
await page.type(ref, '1234'); // PIN (if needed)
await page.press('enter');

Refs reset every snapshot. Never store a ref and use it after another snapshot. Always re-read.
Snapshot takes 1-5 seconds. uiautomator dump is slow, especially on emulators. Don't snapshot in a tight loop.
Wait after actions. UI needs time to settle. Wait 500ms-2s after taps, 2-3s after launching apps.
Some list items aren't clickable. Android file picker drawer items, some system UI elements don't have clickable=true so they don't get refs. Use raw swipe() to coordinates as fallback.
WebView content is invisible. uiautomator can't see inside WebViews. If the snapshot looks empty/shallow in a browser or hybrid app, that's why. Future: CDP bridge.
Switch/toggle may disappear when off. Android sometimes removes unchecked Switch/Toggle elements from the accessibility tree. On the Bluetooth page, when BT is off the Switch disappears — only Text "Use Bluetooth" remains. No switch present = off. Don't look for Switch [unchecked].
Toggles have transitional states. After tapping a system toggle (Bluetooth, WiFi), it briefly shows [disabled] while the hardware state changes. Use waitForText() or waitForState() instead of fixed delays to confirm the action completed.
HTML entities in text. Decoded at parse time. &amp;amp; → &amp;, &amp;lt; → &lt;, etc. Snapshots show clean text.
Emojis show as entities in contentDesc. View [ref=8] (&amp;#128512;) means the emoji 😀. The agent can read the unicode codepoint or just tap by ref position in the grid.
type() is word-by-word. On API 35+, adb input text is broken for spaces. baremobile splits text into words and injects KEYCODE_SPACE between them. This means typing is slower for long strings. Shell special characters (& | ; $ ~ # % ^ * { } [ ] ! ? and quotes) are escaped automatically.
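To illustrate the word-by-word behaviour described for type(), here is a sketch that plans the shell commands for a string. `planTyping` is a hypothetical function. The backslash escaping rule is an assumption and may differ from what baremobile actually does. `input text` and `input keyevent KEYCODE_SPACE` are standard Android shell commands.

```javascript
// Sketch: plan the `adb shell` commands for typing text word by word.
// Assumption: special characters are escaped with a leading backslash.
function planTyping(text) {
  const escapeWord = (w) => w.replace(/[&|;$~#%^*{}\[\]!?'"\\]/g, '\\$&');
  const commands = [];
  const words = text.split(' ');
  words.forEach((word, i) => {
    if (word) commands.push(`input text ${escapeWord(word)}`);
    // A space between words is sent as a key event instead of text.
    if (i < words.length - 1) commands.push('input keyevent KEYCODE_SPACE');
  });
  return commands;
}
```

A three-word message turns into five commands, which is why long strings type noticeably slower.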
Wireless debugging drops on reboot. Must re-enable in Developer Options and re-pair after every device restart. The connection is not persistent.
Pairing port differs from connect port. The port shown when tapping "Pair device with pairing code" is NOT the port for adb connect. The connect port is shown on the main Wireless debugging screen.
No screen control. Termux:API cannot read the screen, take snapshots, or tap elements. It provides direct Android API access only (SMS, calls, location, etc.). Use Termux ADB for screen control.
Commands are blocking. termux-* commands run synchronously. location() can take several seconds waiting for a GPS fix. cameraPhoto() blocks until capture completes.
Some commands need a real device. smsSend(), call(), location() require hardware (SIM card, GPS) that emulators don't have. batteryStatus(), clipboardGet/Set(), volumeGet(), wifiInfo(), vibrate() work on emulators.
Termux:API addon must be installed separately. The termux-api package (CLI tools) AND the Termux:API Android app (F-Droid) are both required. Missing the app causes silent failures.
baremobile can run inside Termux on the phone itself — no USB, no host machine.
# In Termux:
pkg install android-tools nodejs-lts
# On the phone: Settings → Developer options → Wireless debugging → ON
# Tap "Pair device with pairing code" — note the port + code
adb pair localhost:PORT CODE
# Note the connect port (shown on Wireless debugging screen, different from pairing port)
adb connect localhost:PORT
# Verify
adb devices # should show localhost:PORT device

Then in Node.js:
import { connect } from 'baremobile';
const page = await connect({ termux: true }); // or auto-detects
const snap = await page.snapshot();

Limitations: Wireless debugging must be re-enabled after every reboot. The pairing code is one-time but the connection drops on reboot.
Install Termux:API addon from F-Droid, then:
pkg install termux-api

import * as api from 'baremobile/src/termux-api.js';
// Check availability
if (await api.isAvailable()) {
  await api.smsSend('5551234', 'Hello from baremobile!');
  const inbox = await api.smsList({ limit: 5, type: 'inbox' });
  await api.call('5551234');
  const loc = await api.location({ provider: 'network' });
  const battery = await api.batteryStatus();
  await api.clipboardSet('copied text');
  const text = await api.clipboardGet();
  await api.notify('Agent', 'Task complete', { sound: true });
  await api.torch(true); // flashlight on
  await api.vibrate({ duration: 500 });
}

Termux:API is not screen control; it's direct Android API access. Use it for SMS, calls, location, camera, clipboard. Faster and more reliable than tapping through the UI.
Same snapshot() / tap(ref) pattern as Android. WDA XML is translated into the shared prune/format pipeline, producing identical YAML output.
import { connect } from 'baremobile/src/ios.js';
const page = await connect();
console.log(await page.snapshot());
await page.tap(1);
await page.type(2, 'hello');
await page.launch('com.apple.Preferences');
await page.back();
await page.screenshot();
page.close();

| Method | What it does |
|---|---|
| page.snapshot() | Hierarchical YAML (same format as Android) |
| page.tap(ref) | Coordinate tap at bounds center |
| page.type(ref, text, opts) | Tap to focus + WDA keys. {clear: true} to clear first |
| page.scroll(ref, direction) | Swipe within element bounds (up/down/left/right) |
| page.swipe(x1, y1, x2, y2, duration) | Raw swipe between coordinates |
| page.longPress(ref) | Long press at bounds center (1s) |
| page.tapXY(x, y) | Tap by pixel coordinates |
| page.back() | Find back button in refMap, fallback to swipe-from-left-edge |
| page.home() | WDA homescreen |
| page.launch(bundleId) | Launch app by bundle ID |
| page.screenshot() | PNG buffer |
| page.waitForText(text, timeout) | Poll snapshot until text appears |
| page.press(key) | home, volumeup, volumedown only |
| page.unlock(passcode) | Unlock device (throws if wrong passcode) |
| page.close() | Close connection and clean up |
- Bundle IDs, not package names: com.apple.Preferences not com.android.settings
- No intents: use page.launch(bundleId) for app navigation
- No grid/tapGrid: coordinate tap from bounds is reliable
- Back is semantic: searches refMap for back button, falls back to swipe gesture
- press() is limited: only home, volumeup, volumedown. Use tap(ref) for UI buttons.
- WDA on device — signed with free Apple ID (7-day cert, re-sign weekly)
- pymobiledevice3 — setup only (tunnel, DDI mount, WDA launch). Python 3.12.
- USB cable required — WiFi tunnel needs Mac/Xcode, not possible on Linux
- Developer Mode on iPhone — required for developer services
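An agent that targets both platforms needs to account for the iOS differences listed above. The sketch below shows one way to do that. `APP_IDS`, `PRESS_KEYS`, `appIdFor`, and `canPress` are made-up names for illustration, and the key lists are copied from this document, not read from baremobile at runtime.

```javascript
// Illustrative only: the app key `settings` and both helpers are made up for this sketch.
const APP_IDS = {
  settings: { android: 'com.android.settings', ios: 'com.apple.Preferences' },
};

// Keys this document lists for press() on each platform.
const PRESS_KEYS = {
  android: ['back', 'home', 'enter', 'delete', 'tab', 'escape', 'up', 'down', 'left',
    'right', 'space', 'power', 'volup', 'voldown', 'recent'],
  ios: ['home', 'volumeup', 'volumedown'],
};

// Resolve a logical app name to a package name (Android) or bundle ID (iOS).
function appIdFor(app, platform = 'android') {
  const id = APP_IDS[app]?.[platform];
  if (!id) throw new Error(`No ${platform} ID known for "${app}"`);
  return id;
}

// Check before calling press(); on iOS most keys need tap(ref) on a UI button instead.
function canPress(key, platform = 'android') {
  return (PRESS_KEYS[platform] ?? []).includes(key);
}
```

For example, `page.launch(appIdFor('settings', 'ios'))` would open Settings on an iPhone, and `canPress('enter', 'ios')` returns `false`, which signals that the agent should look for a button to tap.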
baremobile setup # interactive wizard — Android (emulator/USB/WiFi/Termux) + iOS
baremobile ios resign # re-sign WDA when cert expires (every 7 days)
baremobile ios teardown # kill tunnel/WDA processes

MCP server (mcp-server.js) for Claude Code and other MCP clients.
claude mcp add baremobile -- node /path/to/baremobile/mcp-server.js

All tools accept optional platform: "android" | "ios" (default: android).
| Tool | Params | Returns |
|---|---|---|
| snapshot | maxChars?, platform? | YAML tree (or file path if >30K chars) |
| tap | ref, platform? | 'ok' |
| type | ref, text, clear?, platform? | 'ok' |
| press | key, platform? | 'ok' |
| scroll | ref, direction, platform? | 'ok' |
| swipe | x1, y1, x2, y2, duration?, platform? | 'ok' |
| long_press | ref, platform? | 'ok' |
| launch | pkg, platform? | 'ok' |
| screenshot | platform? | base64 PNG |
| back | platform? | 'ok' |
| find_by_text | text, platform? | ref number or null |
Action tools return 'ok' — call snapshot to observe the result. Large snapshots saved to .baremobile/screen-{timestamp}.yml when exceeding maxChars (default 30,000). iOS cert warning prepended to first snapshot if cert is >6 days old.
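The cert timing mentioned above (7-day cert, warning once it is more than 6 days old) comes down to simple date arithmetic. The sketch below is a hypothetical `certStatus` helper. How baremobile records the signing time is not covered here, so the timestamp is passed in by the caller.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Sketch: classify a WDA cert by age. `signedAtMs` is an assumed input
// (epoch milliseconds of the last re-sign), not something baremobile exposes.
function certStatus(signedAtMs, nowMs = Date.now()) {
  const ageDays = (nowMs - signedAtMs) / DAY_MS;
  return {
    ageDays,
    warn: ageDays > 6,     // threshold used for the snapshot warning
    expired: ageDays >= 7, // free Apple ID certs last 7 days
  };
}
```

When `warn` is true, running `baremobile ios resign` before the cert expires avoids an interrupted session.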
Session-based control for shell scripting and automation.
baremobile open [--device=SERIAL] [--platform=android|ios]
baremobile status
baremobile close

# Screen
baremobile snapshot # -> .baremobile/screen-*.yml
baremobile screenshot # -> .baremobile/screenshot-*.png
baremobile grid # screen grid info (for vision fallback)
# Interaction
baremobile tap <ref>
baremobile tap-xy <x> <y>
baremobile tap-grid <cell>
baremobile type <ref> <text> [--clear]
baremobile press <key>
baremobile scroll <ref> <direction>
baremobile swipe <x1> <y1> <x2> <y2> [--duration=N]
baremobile long-press <ref>
baremobile launch <pkg>
baremobile intent <action> [--extra-string key=val ...]
baremobile back
baremobile home
# Waiting
baremobile wait-text <text> [--timeout=N]
baremobile wait-state <ref> <state> [--timeout=N]
# iOS management
baremobile setup
baremobile ios resign
baremobile ios teardown
# Logging
baremobile logcat [--filter=TAG] [--clear]

All output goes to .baremobile/ in the current directory. Action commands print ok. File-producing commands print the file path. Errors go to stderr with non-zero exit.
baremobile open --json # {"ok":true,"pid":1234,"port":40049}
baremobile snapshot --json # {"ok":true,"file":"/path/.baremobile/screen-*.yml"}
baremobile tap 4 --json # {"ok":true}
baremobile status --json # {"ok":false,"error":"No session found."}

Every response has ok: true|false. File-producing commands include file. Errors include error.
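A script consuming `--json` output can branch on the `ok` field. The following is a minimal sketch with a hypothetical `parseCliResult` helper. It assumes each command prints exactly one JSON object, as in the examples above.

```javascript
// Sketch: interpret one line of `--json` output from the baremobile CLI.
function parseCliResult(line) {
  const result = JSON.parse(line);
  if (result.ok !== true) {
    // Failed commands carry a human-readable `error` field.
    throw new Error(result.error ?? 'Unknown baremobile error');
  }
  return result; // may include `file`, `pid`, `port`, depending on the command
}
```

For a file-producing command such as `snapshot`, the caller would then read `parseCliResult(output).file` to find the YAML on disk.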
If an action doesn't seem to work:
- waitForText: use waitForText('expected text', 5000) instead of guessing delays
- Snapshot again: the UI may have changed during the action
- Screenshot + vision: screenshot() + grid() if the ARIA tree looks wrong
- Press back: if stuck in an unexpected state, back out and retry
- Home + relaunch: nuclear option to reset to known state
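The recovery steps above can be chained into an escalation ladder. This is a sketch under stated assumptions, not a baremobile feature. `actWithRecovery` is a hypothetical helper, `action` is a caller-supplied function that takes its own fresh snapshot and picks its own ref, and since this document does not say how `waitForText()` reports a timeout, both a rejection and a falsy result are treated as failure.

```javascript
// Hypothetical recovery ladder (not a baremobile API). `action` must take its own
// fresh snapshot and pick its own ref, because refs from earlier snapshots are stale.
async function actWithRecovery(page, action, expectedText, pkg) {
  // Assumption: a timeout shows up either as a rejection or as a falsy result.
  const appeared = async () => {
    try {
      return Boolean(await page.waitForText(expectedText, 5000));
    } catch {
      return false;
    }
  };

  await action();
  if (await appeared()) return 'ok';

  await page.back(); // back out of whatever state we landed in, then retry
  await action();
  if (await appeared()) return 'recovered-after-back';

  await page.home(); // last resort: reset to a known state
  await page.launch(pkg);
  await action();
  if (await appeared()) return 'recovered-after-relaunch';

  throw new Error(`"${expectedText}" never appeared; try screenshot + vision`);
}
```

The return value records which rung succeeded, which helps when logging how often an action needs recovery.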