Augmented Thinking, Heuristic Evaluation & Navigation AI
This project is currently under active development and experimentation. For detailed research, experiments, and implementation work, please visit the research branch.
ATHENA is a device-integrated AI assistant that automates cross-application workflows using real-time screen vision, LLM-based intent parsing, voice recognition, and native UI extraction across platforms. ATHENA "sees" your screen, "hears" your commands, and interacts with applications through UI tree queries when possible, falling back to GUI automation otherwise.
This project builds on the following AI and automation technologies:
- Magma-8B - Microsoft's vision-language model for multimodal understanding
- UI Automation - Native UI element extraction and interaction using the uiautomation library
- PyAutoGUI - Cross-platform GUI automation for fallback interactions
- LLM Integration - Intent parsing and decision-making capabilities
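The "UI tree first, GUI automation as fallback" idea described above can be sketched roughly as follows. This is only an illustrative outline, not the project's actual code: `Intent`, `UITreeBackend`, `ScreenFallbackBackend`, and `dispatch` are hypothetical names, and the two backends are in-memory stand-ins for where calls into the uiautomation library and PyAutoGUI would presumably go.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Set, Tuple


@dataclass
class Intent:
    """Hypothetical parsed intent, e.g. as produced by the LLM layer."""
    action: str  # e.g. "click"
    target: str  # human-readable element name, e.g. "Save"


class UITreeBackend:
    """Stand-in for native UI tree queries (assumed: uiautomation on Windows)."""
    name = "ui_tree"

    def __init__(self, known_elements: Set[str]) -> None:
        # In a real backend this would be a live query against the UI tree.
        self.known_elements = known_elements

    def try_execute(self, intent: Intent) -> bool:
        # Succeeds only if the element is exposed in the UI tree.
        return intent.target in self.known_elements


@dataclass
class ScreenFallbackBackend:
    """Stand-in for screen vision plus coordinate-based input (assumed: PyAutoGUI)."""
    locate: Callable[[str], Optional[Tuple[int, int]]]
    clicks: List[Tuple[int, int]] = field(default_factory=list)
    name: str = "screen_fallback"

    def try_execute(self, intent: Intent) -> bool:
        point = self.locate(intent.target)
        if point is None:
            return False
        # A real backend would send a click to this point; here we only record it.
        self.clicks.append(point)
        return True


def dispatch(intent: Intent, backends: Sequence) -> Optional[str]:
    """Try each backend in priority order; return the name of the one that handled it."""
    for backend in backends:
        if backend.try_execute(intent):
            return backend.name
    return None
```

Ordering the backends as `[UITreeBackend(...), ScreenFallbackBackend(...)]` means coordinate-based input is attempted only when the element cannot be found in the UI tree, and `dispatch` returns `None` when neither backend can handle the intent.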
Current Focus: Windows
Planned Expansion:
- macOS
- Android
- iOS
- Other desktop and mobile platforms
All experimental code, research notes, and development work can be found in the research branch. This includes:
- UI tree inspection tools
- Screen parsing experiments
- Magma model integration tests
- Cross-application automation prototypes
TBD
This is currently a research project. Contributions and feedback are welcome as the project evolves.