Drive any Android phone from Python — no root, no Appium, no app instrumentation.
Pixel-perfect tapping · OpenCV + OCR element finding · text input that actually works in WeChat & Flutter.
Most Android automation hits the same two walls: `adb shell input text` mangles or drops characters the moment a real IME is active, and custom-rendered UIs (WeChat, Flutter, WebView, games) expose no accessibility tree, so uiautomator2 selectors find nothing.
This kit solves both. It uses set_text() where a native EditText exists and falls back to an ADB-Keyboard base64 broadcast where it doesn't, and it locates elements by template matching + OCR when there is no view hierarchy to query. Everything runs over plain ADB, with no root and nothing installed inside the target app.
- Precision Tap Calibration: verify that `adb shell input tap` maps 1:1 to physical screen pixels using grid, boundary, and crosshair tests.
- Universal Tap Controller: click, long-press, swipe, double-tap, pinch, and scroll with boundary protection and relative/absolute coordinates.
- Device Controller: system-level control of volume, brightness, navigation, notifications, app launch, and WiFi/Bluetooth/airplane mode, plus battery and storage queries.
- Vision Controller: template matching (OpenCV), OCR text detection (Tesseract), color region detection, scroll-until-found, screenshot diffing, and wait-for-change.
- IME Controller: dual-strategy text input, using `uiautomator2.set_text()` for native EditText and an ADB Keyboard base64 broadcast for custom inputs (WeChat, Flutter, WebView).
- App Scanner: batch-scan multiple apps to detect EditText availability and OCR readability.
All modules work without root access. A USB-debugging-enabled Android device and adb are sufficient.
```bash
# macOS
brew install android-platform-tools tesseract opencv

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install adb tesseract-ocr libtesseract-dev python3-opencv

# Windows (via Chocolatey)
choco install adb tesseract opencv
```

Then install the Python dependencies:

```bash
pip install -r requirements.txt
```

Core Python packages:
| Package | Purpose |
|---|---|
| `uiautomator2` | UI hierarchy access and gesture simulation |
| `opencv-python` | Template matching, image processing |
| `pytesseract` | OCR text recognition |
| `numpy` | Numerical operations |
Enable USB debugging on your phone:

- Open Settings → About phone
- Tap Build number 7 times to enable Developer options
- Go to Settings → System → Developer options
- Turn on USB debugging
- Connect your phone via USB and authorize the computer
Verify the connection:

```bash
adb devices
# Expected output:
# xxxxxxxx    device
```

Copy the example configuration and edit it with your device serial and screen resolution:

```bash
cp config.example.json config.json
```

Edit `config.json`:

```json
{
  "device_serial": "your_device_serial_here",
  "screen_width": 1200,
  "screen_height": 2670,
  "screenshot_dir": "./screenshots",
  "default_ime": "com.android.inputmethod.latin/.LatinIME",
  "adb_keyboard_ime": "com.android.adbkeyboard/.AdbIME"
}
```

Tip: find your device serial with `adb devices` and your screen resolution with `adb shell wm size`.
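If you would rather fill in `device_serial`, `screen_width`, and `screen_height` programmatically, a minimal sketch follows. It assumes the usual output formats (`<serial>\tdevice` lines from `adb devices`, a `Physical size: WxH` line from `wm size`); the helper names `adb`, `first_device_serial`, and `physical_screen_size` are hypothetical and not part of this toolkit's API.

```python
import re
import subprocess
from typing import Optional, Tuple


def adb(*args: str, serial: Optional[str] = None) -> str:
    """Run an adb command (optionally against one device) and return its stdout."""
    cmd = ["adb"] + (["-s", serial] if serial else []) + list(args)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def first_device_serial(adb_devices_output: str) -> Optional[str]:
    """Return the first serial in the 'device' state, skipping unauthorized/offline ones."""
    for line in adb_devices_output.splitlines():
        parts = line.split()
        # Device lines look like "<serial>\tdevice"; the header line has "of" in that slot.
        if len(parts) >= 2 and parts[1] == "device":
            return parts[0]
    return None


def physical_screen_size(wm_size_output: str) -> Tuple[int, int]:
    """Parse the 'Physical size: WxH' line printed by `adb shell wm size`."""
    match = re.search(r"Physical size:\s*(\d+)x(\d+)", wm_size_output)
    if match is None:
        raise ValueError("no 'Physical size' line in wm size output")
    return int(match.group(1)), int(match.group(2))
```

With a phone attached, `physical_screen_size(adb("shell", "wm", "size", serial=first_device_serial(adb("devices"))))` should return the pair for `screen_width` and `screen_height`. If `wm size` also prints an `Override size` line, the effective resolution may differ; `tap_calibrator.py --verify` is the check to trust.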
Calibrate and verify tap accuracy:

```bash
python tap_calibrator.py --verify              # Check resolution
python tap_calibrator.py --grid                # Grid test (requires Pointer location ON)
python tap_calibrator.py --boundary            # Corner + center test
python tap_calibrator.py --cross 600 1335      # Crosshair around center
```

Tap, swipe, and scroll:

```python
from tap_controller import tap, tap_relative, swipe, scroll_down, double_tap

# Simple tap
tap(500, 1000)

# Long press (1 second)
tap(500, 1000, duration_ms=1000)

# Double tap
double_tap(500, 1000)

# Swipe
swipe(100, 1500, 100, 500, duration_ms=300)

# Scroll down
scroll_down()

# Relative tap (50% across, 80% down)
tap_relative(0.5, 0.8)
```

Type text:

```python
from ime_controller import IMEController

ctrl = IMEController()

# Auto-detects native EditText vs custom input
ctrl.type_text("Hello, world!")

# Force ADB Keyboard for WeChat/Flutter/WebView
ctrl.type_text("Custom input", force_adb=True)

# Clear input
ctrl.clear()
```

Find and tap elements by image or text:

```python
from vision_controller import VisionController

vis = VisionController()

# Tap a template image
vis.tap_template("icons/search_button.png")

# Tap text via OCR
vis.tap_text("Settings")

# Scroll down until text appears
vis.scroll_find_text("Load more", max_scrolls=10)

# Wait for screen to change after a tap
vis.wait_for_change(timeout=5)
```

Control the device:

```python
from device_controller import DeviceController

dev = DeviceController()
dev.press_home()
dev.set_brightness(128)
dev.set_media_volume(10)
dev.launch_app("com.android.settings")
print(dev.get_battery())
```

`tap_calibrator.py`: a diagnostic tool that confirms ADB tap coordinates map 1:1 to screen pixels.
| Flag | Description |
|---|---|
| `--verify` | Check that the device resolution matches the config |
| `--grid` | Tap a grid of points; use with the "Pointer location" developer option to verify that the red dots align |
| `--boundary` | Tap the four corners and the center |
| `--cross X Y` | Tap the center, left, right, top, and bottom points around (X, Y) |
| `--tap X Y` | Single tap at a coordinate |
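The calibrator's internals aren't shown here. As a rough sketch of what `--cross` and `--grid` involve, the hypothetical helpers below compute test points and build the raw `adb shell input tap` command being verified. The 100 px crosshair offset and the cell-center grid layout are assumptions, not values taken from `tap_calibrator.py`.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def clamp(point: Point, width: int, height: int) -> Point:
    """Keep a point on screen: x in [0, width-1], y in [0, height-1]."""
    x, y = point
    return min(max(x, 0), width - 1), min(max(y, 0), height - 1)


def cross_points(x: int, y: int, offset: int = 100,
                 width: int = 1200, height: int = 2670) -> List[Point]:
    """Center, left, right, top, bottom around (x, y); offset is an assumed spacing."""
    raw = [(x, y), (x - offset, y), (x + offset, y), (x, y - offset), (x, y + offset)]
    return [clamp(p, width, height) for p in raw]


def grid_points(rows: int, cols: int, width: int, height: int) -> List[Point]:
    """Centers of a rows x cols grid of equal cells, listed row by row."""
    return [
        (int((c + 0.5) * width / cols), int((r + 0.5) * height / rows))
        for r in range(rows)
        for c in range(cols)
    ]


def tap_command(x: int, y: int) -> List[str]:
    """The raw ADB tap command whose accuracy the calibrator checks."""
    return ["adb", "shell", "input", "tap", str(x), str(y)]
```

With Pointer location on, running `subprocess.run(tap_command(x, y), check=True)` for each point of `grid_points(5, 5, 1200, 2670)` should leave a dot at every cell center. That is 25 points, the same count as `extended_verify.py`, although that script's actual layout may differ.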
`tap_controller.py`: core tapping primitives. All coordinates are absolute pixels measured from the top-left corner (0, 0).

Key functions:

- `tap(x, y, duration_ms=0)`: precise tap or long-press
- `tap_safe(x, y, ...)`: tap with margin protection to avoid system gestures
- `swipe(x1, y1, x2, y2, duration_ms)`: linear swipe
- `double_tap(x, y)`: two rapid taps
- `pinch(center_x, center_y, ...)`: pinch gesture via uiautomator2
- `scroll_down/up/left/right()`: convenience scroll wrappers
- `tap_relative(px, py)`: proportional coordinates (0.0–1.0)
- `CalibrationSet`: save/load named calibration points as JSON
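A minimal sketch of the coordinate handling behind `tap_relative` and `tap_safe`, plus the ADB commands a tap or long press can map to. It assumes the 1200×2670 resolution from the example config and a 40 px safety margin; the function names are hypothetical, and the real module may round, clamp, or issue commands differently. The long press relies on the standard trick of a zero-distance `input swipe` held for the given duration.

```python
from typing import List, Tuple

WIDTH, HEIGHT = 1200, 2670  # assumed: screen_width / screen_height from config.json


def relative_to_absolute(px: float, py: float,
                         width: int = WIDTH, height: int = HEIGHT) -> Tuple[int, int]:
    """Convert proportional (0.0-1.0) coordinates to on-screen pixels."""
    if not (0.0 <= px <= 1.0 and 0.0 <= py <= 1.0):
        raise ValueError("relative coordinates must be within 0.0-1.0")
    # Round to the nearest pixel, then keep 1.0 on the last valid pixel.
    return min(round(px * width), width - 1), min(round(py * height), height - 1)


def safe_point(x: int, y: int, margin: int = 40,
               width: int = WIDTH, height: int = HEIGHT) -> Tuple[int, int]:
    """Pull a point inward so it stays `margin` px from every edge (margin is assumed)."""
    return (max(margin, min(x, width - 1 - margin)),
            max(margin, min(y, height - 1 - margin)))


def tap_cmd(x: int, y: int, duration_ms: int = 0) -> List[str]:
    """Build the ADB command for a tap, or a long press when duration_ms > 0."""
    if duration_ms > 0:
        # Long press: swipe from a point to the same point, held for duration_ms.
        return ["adb", "shell", "input", "swipe",
                str(x), str(y), str(x), str(y), str(duration_ms)]
    return ["adb", "shell", "input", "tap", str(x), str(y)]
```

Clamping in pixels rather than rejecting out-of-range points keeps taps near the edges usable while staying clear of the back-gesture and status-bar zones.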
`device_controller.py`: system-level Android control via ADB keyevents and service calls.
Categories:
- Navigation — back, home, recent, lock, wake
- Volume — media, ring, mute
- Brightness — manual level, auto-toggle
- Screenshot / Screenrecord
- Notifications — expand, collapse, clear
- Apps — launch, force-stop, clear data, get foreground app
- Connectivity — WiFi, Bluetooth, airplane mode
- Media — play/pause, next, prev
- Phone / SMS — dial, compose
- IME — list, switch
- Info — battery, storage, memory
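To give a feel for the ADB calls behind these categories, here is a sketch of three of them: navigation and media keys through `input keyevent`, manual brightness through `settings put system`, and battery info parsed from `dumpsys battery`. The key names and settings keys are standard Android ones; the mapping table and helper names are hypothetical, and the 0–255 brightness range is an assumption that some OEMs do not follow.

```python
from typing import Dict, List

# Standard Android KeyEvent names accepted by `adb shell input keyevent`.
KEYS = {
    "back": "KEYCODE_BACK",
    "home": "KEYCODE_HOME",
    "recent": "KEYCODE_APP_SWITCH",
    "lock": "KEYCODE_SLEEP",
    "wake": "KEYCODE_WAKEUP",
    "play_pause": "KEYCODE_MEDIA_PLAY_PAUSE",
    "next": "KEYCODE_MEDIA_NEXT",
    "prev": "KEYCODE_MEDIA_PREVIOUS",
}


def keyevent_cmd(action: str) -> List[str]:
    """ADB command that sends the keyevent for a named action."""
    return ["adb", "shell", "input", "keyevent", KEYS[action]]


def brightness_cmds(level: int) -> List[List[str]]:
    """Switch to manual brightness (mode 0), then set the level (assumed 0-255 scale)."""
    if not 0 <= level <= 255:
        raise ValueError("brightness level must be 0-255")
    return [
        ["adb", "shell", "settings", "put", "system", "screen_brightness_mode", "0"],
        ["adb", "shell", "settings", "put", "system", "screen_brightness", str(level)],
    ]


def parse_battery(dumpsys_output: str) -> Dict[str, str]:
    """Turn the 'key: value' lines of `adb shell dumpsys battery` into a dict."""
    info: Dict[str, str] = {}
    for line in dumpsys_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():  # skips the "Current Battery Service state:" header
            info[key.strip()] = value.strip()
    return info
```

Values stay as strings in this sketch, so `int(parse_battery(out)["level"])` gives the charge level as a number.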
`vision_controller.py`: computer-vision-based screen interaction for elements that uiautomator2 cannot access.

Capabilities:

- `find_template()` / `tap_template()`: OpenCV template matching with NMS deduplication
- `find_text()` / `tap_text()`: Tesseract OCR with confidence filtering
- `list_texts()`: enumerate all recognized text regions
- `find_color()`: detect regions by BGR color range
- `scroll_find_text()` / `scroll_find_template()`: scroll until the target appears
- `compare_screenshots()`: pixel-level diff with a threshold
- `wait_for_change()`: poll the screen until a region changes
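Template matching usually yields a cluster of overlapping hits around each real match, which is why deduplication is needed. Below is a sketch of greedy non-maximum suppression over such hits. It assumes the candidates were already collected as `(x, y, score)` top-left corners, for example from `cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)` filtered with `np.where(result >= threshold)`. The 0.3 IoU cutoff is an assumed value, and this is not necessarily how `find_template()` implements it.

```python
from typing import List, Tuple

Match = Tuple[int, int, float]  # (x, y, score): top-left corner of a template-sized box


def iou(a: Match, b: Match, w: int, h: int) -> float:
    """Intersection-over-union of two w x h boxes anchored at a and b."""
    ix = max(0, min(a[0], b[0]) + w - max(a[0], b[0]))
    iy = max(0, min(a[1], b[1]) + h - max(a[1], b[1]))
    inter = ix * iy
    return inter / (2 * w * h - inter)


def nms(matches: List[Match], w: int, h: int, iou_threshold: float = 0.3) -> List[Match]:
    """Keep the best-scoring box in each cluster of overlapping matches."""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda m: m[2], reverse=True):
        # Discard m if it overlaps any already-kept (higher-scoring) box too much.
        if all(iou(m, k, w, h) <= iou_threshold for k in kept):
            kept.append(m)
    return kept
```

To tap a surviving match, aim at its center: `(x + w // 2, y + h // 2)`.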
`ime_controller.py`: a text input abstraction that automatically selects the best strategy.
Strategy A — set_text() (preferred):
Uses uiautomator2 to set text directly on native EditText elements, bypassing the IME entirely. Works for SMS and other standard apps.
Strategy B — ADB Keyboard (fallback):
Switches to com.android.adbkeyboard and broadcasts base64-encoded text. Required for WeChat custom input fields, Flutter, WebView, and other non-native text containers.
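Strategy B boils down to a short sequence of ADB commands. This sketch builds them: switch the active IME, broadcast the base64 payload, and optionally clear the field. The broadcast action names `ADB_INPUT_B64` and `ADB_CLEAR_TEXT` come from the ADB Keyboard project's documentation, so check them against the version you installed. The helper names are hypothetical, and switching back to `default_ime` afterwards is left to the caller.

```python
import base64
from typing import List

ADB_KEYBOARD_IME = "com.android.adbkeyboard/.AdbIME"  # adb_keyboard_ime in config.example.json


def switch_ime_cmd(ime_id: str) -> List[str]:
    """Make ime_id the active input method."""
    return ["adb", "shell", "ime", "set", ime_id]


def b64_input_cmd(text: str) -> List[str]:
    """Broadcast text to ADB Keyboard as base64.

    The base64 alphabet (A-Z, a-z, 0-9, +, /, =) needs no shell quoting, so spaces,
    CJK characters, and emoji reach the device intact.
    """
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return ["adb", "shell", "am", "broadcast", "-a", "ADB_INPUT_B64", "--es", "msg", payload]


def clear_cmd() -> List[str]:
    """Ask ADB Keyboard to clear the focused field."""
    return ["adb", "shell", "am", "broadcast", "-a", "ADB_CLEAR_TEXT"]
```

A full round trip would run `switch_ime_cmd(ADB_KEYBOARD_IME)`, then `b64_input_cmd(text)`, then `switch_ime_cmd(<default_ime>)`, each through `subprocess.run(..., check=True)`.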
Why not `adb shell input text`?
Because when an IME is active, `input text` reorders characters or crashes on many devices. This toolkit avoids it entirely.
Additional features:
- `clear()`: empty the input field
- `delete(n)`: send N backspace keyevents
- `select_all()` / `copy()` / `paste()`: clipboard operations (native EditText only)
- `get_text()`: read the current input content
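The automatic choice between the two strategies can be pictured as a small dispatch on the focused element's class. In this sketch the input backends are passed in as callables so the logic runs without a device. With a real phone, the class name could plausibly come from uiautomator2 (for example `d(focused=True).info.get("className")`) and the native backend could be `d(focused=True).set_text`. Those wiring details and the exact-class rule are assumptions, not a description of how `IMEController` decides.

```python
from typing import Callable, Optional

NATIVE_EDIT_TEXT = "android.widget.EditText"


def choose_strategy(focused_class: Optional[str], force_adb: bool = False) -> str:
    """Pick 'set_text' for a focused native EditText, otherwise fall back to ADB Keyboard."""
    if force_adb or focused_class != NATIVE_EDIT_TEXT:
        return "adb_keyboard"
    return "set_text"


def type_text(text: str,
              focused_class: Optional[str],
              set_text: Callable[[str], None],
              adb_keyboard: Callable[[str], None],
              force_adb: bool = False) -> str:
    """Send text through the chosen backend and report which strategy was used."""
    strategy = choose_strategy(focused_class, force_adb)
    backend = set_text if strategy == "set_text" else adb_keyboard
    backend(text)
    return strategy
```

Falling back whenever the class is unknown (`None`) errs on the side of the strategy that works everywhere, at the cost of an IME switch.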
`app_scanner.py`: a batch diagnostic tool. It launches a list of apps, counts native EditText elements, runs OCR, and reports which apps are easy to automate and which need the vision/ADB Keyboard strategies.
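Counting native EditText elements amounts to walking a UI hierarchy dump, such as the XML from `adb shell uiautomator dump` or uiautomator2's `d.dump_hierarchy()`. Here is a sketch of that count together with a hypothetical rule for turning it into a recommendation. The thresholds and labels are illustrative and not the scanner's actual report format.

```python
import xml.etree.ElementTree as ET

EDIT_TEXT_CLASS = "android.widget.EditText"


def count_edit_texts(hierarchy_xml: str) -> int:
    """Count <node> elements whose class is a native EditText in a uiautomator dump."""
    root = ET.fromstring(hierarchy_xml)
    return sum(1 for node in root.iter("node") if node.get("class") == EDIT_TEXT_CLASS)


def suggest_strategy(edit_text_count: int, ocr_word_count: int) -> str:
    """Hypothetical rule of thumb for the scan report."""
    if edit_text_count > 0:
        return "native (set_text)"
    if ocr_word_count > 0:
        return "vision + ADB Keyboard"
    return "manual review"
```

A Flutter or WeChat screen typically dumps as a few generic `android.view.View` nodes, so it scores zero EditTexts and lands in the vision bucket as long as OCR can read it.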
The toolkit reads device settings from config.json in the working directory. If the file is missing, it falls back to environment variables:
| Config key | Environment variable | Default |
|---|---|---|
| `device_serial` | `ADB_DEVICE_SERIAL` | First device in `adb devices` |
| `screen_width` | `SCREEN_WIDTH` | `1200` |
| `screen_height` | `SCREEN_HEIGHT` | `2670` |
| `screenshot_dir` | `SCREENSHOT_DIR` | `./screenshots` |
| `default_ime` | `DEFAULT_IME` | (none) |
| `adb_keyboard_ime` | `ADB_KEYBOARD_IME` | (none) |
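The lookup order described above (file first, environment variables only when the file is missing, then defaults) can be sketched as follows. `load_config` is a hypothetical name, and the sketch leaves `device_serial` as `None` when unset instead of querying `adb devices`. Passing `environ` explicitly keeps it testable.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Mapping, Optional

DEFAULTS: Dict[str, Any] = {
    "device_serial": None,  # None here means "use the first device in adb devices"
    "screen_width": 1200,
    "screen_height": 2670,
    "screenshot_dir": "./screenshots",
    "default_ime": None,
    "adb_keyboard_ime": None,
}
ENV_VARS = {
    "device_serial": "ADB_DEVICE_SERIAL",
    "screen_width": "SCREEN_WIDTH",
    "screen_height": "SCREEN_HEIGHT",
    "screenshot_dir": "SCREENSHOT_DIR",
    "default_ime": "DEFAULT_IME",
    "adb_keyboard_ime": "ADB_KEYBOARD_IME",
}
INT_KEYS = {"screen_width", "screen_height"}


def load_config(path: str = "config.json",
                environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Read config.json if it exists; otherwise fall back to environment variables."""
    environ = os.environ if environ is None else environ
    config = dict(DEFAULTS)
    config_file = Path(path)
    if config_file.exists():
        config.update(json.loads(config_file.read_text(encoding="utf-8")))
    else:
        for key, var in ENV_VARS.items():
            if var in environ:
                # Environment values are strings; convert the numeric keys.
                config[key] = int(environ[var]) if key in INT_KEYS else environ[var]
    return config
```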
```text
android-adb-automation-kit/
├── README.md
├── README.zh-CN.md
├── LICENSE
├── requirements.txt
├── config.example.json
├── tap_calibrator.py        # Calibration & verification CLI
├── tap_controller.py        # Core tap/swipe/scroll API
├── device_controller.py     # System control API
├── vision_controller.py     # CV-based element detection
├── ime_controller.py        # Text input abstraction
├── app_scanner.py           # Batch app diagnostic
├── extended_verify.py       # 25-point precision grid test
└── examples/
    ├── wechat_search.py     # Example: search a contact via vision + tap
    └── settings_flow.py     # Example: navigate system settings
```
If `adb devices` lists the phone as `unauthorized`, unlock it and confirm the "Allow USB debugging?" dialog. Check "Always allow from this computer."
If taps land in the wrong place, run `tap_calibrator.py --grid` with Pointer location enabled (Developer options). If the red dots are offset, your device may have a non-standard coordinate mapping. Some devices with curved edges or display cutouts need margin adjustments; use `tap_safe()` with custom margins.
If OCR misses Chinese text, install the matching Tesseract language packs:

```bash
# macOS
brew install tesseract-lang

# Ubuntu
sudo apt-get install tesseract-ocr-chi-sim tesseract-ocr-eng
```

Then pass `lang="chi_sim+eng"` to the `VisionController` methods.
If text input fails in WeChat, Flutter, or WebView fields, download and install ADB Keyboard on your device:

```bash
adb install adbkeyboard.apk
```

Then set its IME ID in `config.json` under `adb_keyboard_ime`.
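After installing, you can confirm the IME ID that goes into `adb_keyboard_ime` by listing input methods with `adb shell ime list -a -s`, which prints one `<package>/<service>` ID per line. The sketch below parses that list and builds the standard `ime enable` / `ime set` commands. The helper names are hypothetical.

```python
from typing import List

ADB_KEYBOARD_IME = "com.android.adbkeyboard/.AdbIME"


def parse_ime_ids(ime_list_output: str) -> List[str]:
    """Parse `adb shell ime list -a -s`: one '<package>/<service>' ID per line."""
    return [line.strip() for line in ime_list_output.splitlines() if "/" in line]


def activate_cmds(installed_ids: List[str], ime_id: str = ADB_KEYBOARD_IME) -> List[List[str]]:
    """Commands to enable and select ADB Keyboard once it appears in the IME list."""
    if ime_id not in installed_ids:
        raise RuntimeError(f"{ime_id} not found; is adbkeyboard.apk installed?")
    return [
        ["adb", "shell", "ime", "enable", ime_id],
        ["adb", "shell", "ime", "set", ime_id],
    ]
```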
If uiautomator2 cannot connect to the device, re-initialize it:

```bash
# Re-init uiautomator2 on the device
python -m uiautomator2 init
```

License: MIT