Automate computational experiments end-to-end using AI agents (Jules & OpenHands)
Transform experiment ideas into complete, production-ready research repositories with automated planning, execution, and comprehensive results, all powered by AI.
This orchestrator takes experiment ideas (from CSV) and uses AI agents to:
- Plan experiments - Design comprehensive experiment plans from scratch
- Write code - Implement all necessary scripts (setup, baseline, experiments, analysis)
- Run experiments - Execute via GitHub Actions with automated validation
- Generate results - Create publication-quality RESULTS.md with visualizations
- Document everything - Professional README with methods, findings, and next steps
All without human intervention.
Jules
- Status: ⭐⭐⭐⭐⭐ Production-ready
- API: Full REST API automation
- Best for: Easiest setup, most reliable
- Features: Auto-planning, CI iteration, PR creation
- URL: https://jules.google
OpenHands
- Status: ⭐⭐⭐⭐⭐ Production-ready
- API: Conversation-based API
- Best for: BYO models (use your own LLM API keys)
- Features: Auto-planning, comprehensive workflows, flexible
- URL: https://app.all-hands.dev
Note: Augment and Cosine were tested and removed:
- Augment: Backend failures (100% error rate)
- Cosine: Architectural mismatch (CI fixer, not experiment creator)
# Clone the repository
git clone <repository-url>
cd zero-shot-ai-agents
# Set environment variables
export GITHUB_TOKEN=ghp_your_token_here
export GITHUB_OWNER=your_github_username
export JULES_API_KEY=your_jules_key_here # For Jules
export OPENHANDS_API_KEY=your_key_here # For OpenHands
Create data/ideas.csv:
title,has_experiments,idea,experiments
Analyze Stock Trends,False,Use ML to predict stock movements from historical data,
Build Recommender,False,Create a movie recommendation system using collaborative filtering,
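The orchestrators only need the four columns shown above. As a rough illustration of how such a file can be read (standard library only; this is a sketch, not the orchestrator's actual code):

```python
import csv

# Sketch: read experiment ideas from data/ideas.csv (columns as in the header above).
with open("data/ideas.csv", newline="") as f:
    for row in csv.DictReader(f):
        predefined = row["has_experiments"].strip().lower() == "true"
        print(f"{row['title']}: {'pre-defined experiments' if predefined else 'AI-planned'}")
```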
# Interactive mode (recommended)
./run_experiments.sh
# Or run directly
cd providers/jules
python orchestrator.py --input ../../data/ideas.csv --max-concurrent 1
Monitor progress:
- Jules: Visit https://jules.google to watch progress
- OpenHands: Visit https://app.all-hands.dev
- GitHub: Check your repos for new PRs with results
├── data/                      # Experiment ideas (CSV files)
│   └── ideas.csv              # Your experiment ideas
├── providers/
│   ├── jules/                 # Jules orchestrator (⭐ Recommended)
│   │   ├── orchestrator.py    # Main orchestrator script
│   │   ├── requirements.txt   # Dependencies
│   │   └── templates/         # Repo templates
│   └── openhands/             # OpenHands orchestrator
│       ├── orchestrator.py    # Main orchestrator script
│       ├── requirements.txt   # Dependencies
│       └── templates/         # Repo templates
├── docs/                      # Documentation
│   ├── QUICKSTART.md          # Detailed setup guide
│   ├── CSV_FORMAT.md          # Input format specs
│   └── ...
├── tests/                     # Test suite
├── run_experiments.sh         # Interactive launcher
├── FIXES_APPLIED.md           # Technical fixes documentation
└── README.md                  # This file
- Dual Pipeline: AI planning OR pre-defined experiments
- Automated Code Generation: Scripts, tests, documentation
- GitHub Integration: Auto-create repos, manage branches
- CI/CD Automation: GitHub Actions workflows
- Publication-Quality Results: Visualizations, deep analysis, statistical tests
- Error Handling: Retry logic, comprehensive logging
- Reproducibility: Seeds, versions, hyperparameters documented
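To make the reproducibility point concrete, here is a minimal sketch of the pattern the generated scripts aim for: pin the seed and write the run configuration next to the results (file names and hyperparameters below are illustrative, not what any particular run produces):

```python
import json
import os
import random
import sys

import numpy as np

SEED = 42  # fixed seed so the run is repeatable
random.seed(SEED)
np.random.seed(SEED)

# Record everything needed to reproduce the run alongside the results.
run_config = {
    "seed": SEED,
    "python": sys.version,
    "numpy": np.__version__,
    "hyperparameters": {"learning_rate": 1e-3, "epochs": 10},  # illustrative values
}
os.makedirs("artifacts", exist_ok=True)
with open("artifacts/run_config.json", "w") as f:
    json.dump(run_config, f, indent=2)
```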
Both Jules and OpenHands now generate:
- Visualizations: Model comparisons, learning curves, error distributions
- Deep Analysis: Error analysis, feature importance, edge cases
- Statistical Validation: P-values, confidence intervals
- Implementation Details: Code links, hyperparameters, seeds
- Specific Next Steps: Actionable recommendations with expected improvements
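The statistical validation above typically amounts to a paired significance test plus a confidence interval over per-seed scores; a minimal sketch with SciPy (the numbers are made up for illustration):

```python
import numpy as np
from scipy import stats

# Per-seed accuracy for two models (illustrative numbers).
baseline = np.array([0.902, 0.899, 0.905, 0.901, 0.903])
candidate = np.array([0.911, 0.914, 0.909, 0.913, 0.910])

# Paired t-test: is the per-seed improvement significant?
t_stat, p_value = stats.ttest_rel(candidate, baseline)

# 95% confidence interval for the mean improvement.
diff = candidate - baseline
ci = stats.t.interval(0.95, len(diff) - 1, loc=diff.mean(), scale=stats.sem(diff))
print(f"p={p_value:.4f}, mean diff={diff.mean():.4f}, 95% CI={ci}")
```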
- Smart .gitignore: Results committed, models excluded
- Timeout Handling: 5-hour limits for complex experiments
- Branch Detection: Works with both `main` and `master`
- File Update Logic: Handles existing files correctly
- Connection Resilience: Auto-retry on network errors
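For the connection-resilience fix, the standard pattern is exponential backoff around each API call; a simplified sketch with requests (the endpoint, payload, and retry counts here are illustrative, not the orchestrator's actual values):

```python
import time

import requests

def call_with_retry(url, payload, max_attempts=5):
    """POST with exponential backoff on transient network errors (sketch)."""
    for attempt in range(max_attempts):
        try:
            resp = requests.post(url, json=payload, timeout=30)
            resp.raise_for_status()
            return resp.json()
        except (requests.ConnectionError, requests.Timeout) as exc:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            wait = 2 ** attempt  # 1s, 2s, 4s, ...
            print(f"Transient error ({exc}); retrying in {wait}s")
            time.sleep(wait)
```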
1. Read Ideas (CSV)
        ↓
2. Create GitHub Repo
        ↓
3. Seed with Templates
        ↓
4. Start AI Agent
        ↓
5. AI Plans Experiments
        ↓
6. AI Implements Code
        ↓
7. AI Runs via GitHub Actions
        ↓
8. AI Validates Results
        ↓
9. AI Generates RESULTS.md (with plots!)
        ↓
10. AI Creates PR
        ↓
11. Review & Merge
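In code, that pipeline reduces to a short loop per idea; a sketch of its shape (the helper functions are hypothetical stand-ins for the real GitHub and agent API calls, not the orchestrator's actual interface):

```python
import csv

# Hypothetical stand-ins for the real GitHub / agent API calls.
def create_repo(title): ...
def seed_templates(repo): ...
def start_agent(repo, idea): ...
def wait_for_pr(session): ...

def run_pipeline(ideas_csv):
    """Shape of the orchestration loop: one repo and one agent session per idea."""
    with open(ideas_csv, newline="") as f:
        for idea in csv.DictReader(f):                 # 1. read ideas
            repo = create_repo(idea["title"])          # 2. create GitHub repo
            seed_templates(repo)                       # 3. seed with templates
            session = start_agent(repo, idea["idea"])  # 4-10. agent plans, codes, runs, opens a PR
            wait_for_pr(session)                       # 11. human review & merge
```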
title,has_experiments,idea,experiments
Test Neural Networks,False,Compare CNN vs RNN vs Transformer on MNIST,
your-username/test-neural-networks/
├── README.md                    # Professional documentation
├── RESULTS.md                   # Comprehensive findings with visualizations
├── experiments/
│   └── experiments.yaml         # Complete experiment plan
├── scripts/
│   ├── setup.py                 # Environment setup
│   ├── data_prep.py             # Data preprocessing
│   ├── baseline.py              # Baseline experiments
│   ├── experiment.py            # Main experiments
│   └── analysis.py              # Results analysis
├── artifacts/
│   ├── plots/
│   │   ├── model_comparison.png
│   │   ├── learning_curves.png
│   │   └── confusion_matrix.png
│   ├── metrics.json             # All metrics
│   └── results/                 # Detailed results
└── .github/workflows/
    └── run-experiments.yml      # CI automation
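The generated analysis step usually just reads metrics.json and writes plots under artifacts/plots/; a rough sketch of that pattern (the "accuracy" key is invented, and the code each agent actually writes varies from run to run):

```python
import json
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend, so this also works in CI
import matplotlib.pyplot as plt

metrics = json.loads(Path("artifacts/metrics.json").read_text())

# Bar chart comparing models on one metric ("accuracy" is an illustrative key).
models = list(metrics["accuracy"])
scores = [metrics["accuracy"][m] for m in models]

plt.bar(models, scores)
plt.ylabel("Accuracy")
plt.title("Model comparison")
Path("artifacts/plots").mkdir(parents=True, exist_ok=True)
plt.savefig("artifacts/plots/model_comparison.png", dpi=150)
```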
- Python: 3.8 or higher
- Git: For repository management
- Network: Internet connection for API calls
For Jules:
export GITHUB_TOKEN=ghp_your_github_token
export GITHUB_OWNER=your_github_username
export JULES_API_KEY=your_jules_api_key
For OpenHands:
export GITHUB_TOKEN=ghp_your_github_token
export GITHUB_OWNER=your_github_username
export OPENHANDS_API_KEY=your_openhands_key
- GitHub Token: https://github.com/settings/tokens (requires `repo` scope)
- Jules API Key: https://jules.google → Settings → API Keys
- OpenHands Key: https://app.all-hands.dev → Settings
Jules:
- Free Tier: 15 experiments/day - $0/month
- Pro Tier: 100 experiments/day - ~$30-50/month
- Ultra Tier: 300 experiments/day - ~$100-200/month
OpenHands:
- Cloud: ~$10-100/month (usage-based)
- Self-hosted: Your own LLM API costs
GitHub Actions:
- Free: 2,000 minutes/month
- Additional: $0.008/minute
- Typical: $0-20/month for moderate usage
Total: $0-300/month depending on usage
- QUICKSTART.md - Detailed setup guide
- CSV_FORMAT.md - Input file format
- FIXES_APPLIED.md - All 12 technical fixes applied
- TEST.md - Testing guide
# Test CSV parsing
python -m pytest tests/integration/test_csv_parsing.py
# Test GitHub API
python -m pytest tests/integration/test_github.py
# Test provider APIs
python -m pytest tests/providers/test_jules_api.py
python -m pytest tests/providers/test_openhands_api.py
Use cases:
- A/B Testing: Compare different ML architectures
- Hyperparameter Tuning: Find optimal configurations
- Algorithm Comparison: Benchmark approaches
- Reproducibility: Automated reproducible experiments
- Proof of Concepts: Quickly validate ideas
- Baseline Implementations: Generate starting points
- Code Generation: Automate boilerplate
- Documentation: Auto-generate comprehensive docs
- Learning Projects: Build complete ML projects
- Course Assignments: Automated project scaffolding
- Tutorials: Generate working examples
Test Results (Current):
- ✅ Jules: 100% success rate (multiple test runs)
- ✅ OpenHands: 100% success rate (tested successfully)
- ✅ Artifacts committed: Yes (Fix #11)
- ✅ Visualizations generated: Yes (Fix #12)
- ✅ Results quality: 5/5 stars (publication-ready)
Fixes Applied: 12 major improvements
- GitHub file handling, branch detection, workflow syntax
- Repository indexing, error logging, timeout handling
- Artifacts preservation, results quality enhancement
- Connection resilience, prompt improvements
We welcome contributions! Here's how:
- Test thoroughly - Run the test suite
- Document changes - Update docs for any new features
- Follow style - Match existing code patterns
- Test both providers - Ensure Jules and OpenHands work
"Repository not indexed" (Jules)
- Wait 20 seconds after repo creation
- Jules auto-retries up to 6 times
"Connection aborted"
- Auto-retry logic handles this (Fix #10)
- Transient network errors automatically recovered
"Artifacts not showing up"
- Fixed in v2.0 (Fix #11)
- New repos automatically commit artifacts
"Results lack visualizations"
- Fixed in v2.0 (Fix #12)
- Both providers now generate comprehensive visualizations
v2.0 (current):
- ✅ 12 major fixes applied
- ✅ Enhanced results quality (visualizations, deep analysis)
- ✅ Artifacts preservation
- ✅ Connection resilience
- ✅ Removed Augment (backend failures)
- ✅ Removed Cosine (architectural mismatch)
Initial release:
- 4-provider support
- Basic experiment orchestration
MIT License - See LICENSE file for details
- Jules by Google Labs - Excellent AI coding agent
- OpenHands by All-Hands - Flexible conversation-based agent
- GitHub - Platform for automation and hosting
- Issues: Open a GitHub issue
- Documentation: Check the `docs/` directory
- Technical Details: See `FIXES_APPLIED.md`
Ready to automate your experiments? Run `./run_experiments.sh` to get started! 🚀