ML-Master: An AI-for-AI Agent for Autonomous Machine Learning

This project is a Python implementation of the AI-for-AI agent described in the paper "ML-Master: Towards AI-for-AI via Integration of Exploration and Reasoning". It provides a general framework for an AI agent that can autonomously solve machine learning problems through a process of exploration and reasoning.

Core Concepts

The ML-Master agent is designed to mimic the iterative workflow of a human data scientist. It combines two core concepts:

Balanced Multi-Trajectory Exploration: The agent explores the vast space of possible solutions using Monte Carlo Tree Search (MCTS). Each node in the tree represents a specific version of a solution (e.g., a piece of code). The agent uses the UCT (Upper Confidence bounds applied to Trees) algorithm to balance trying new, uncertain paths (exploration) against refining known, promising paths (exploitation).
Steerable Reasoning: The agent uses a Large Language Model (LLM), such as Google's Gemini, as its reasoning engine. To guide the LLM, ML-Master employs an Adaptive Memory mechanism. Before generating a new solution, it constructs a detailed context (memory) from previous attempts, including the code, execution feedback, and performance scores from parent and sibling nodes in the MCTS tree. This allows the agent to learn from its mistakes and make informed decisions.

The interplay between these two components creates a powerful feedback loop: Exploration generates diverse experiences, and Reasoning learns from these experiences to guide future exploration.
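
The exploration/exploitation balance described above can be illustrated with the standard UCT score. This is a minimal sketch, not the project's actual code: the `Node` class, its field names, and the default exploration constant of 1.414 are assumptions made for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an MCTS node (not the project's MCTSNode)."""
    name: str
    visits: int = 0            # N: number of times this node was visited
    total_reward: float = 0.0  # Q: sum of rewards backed up through it
    children: list["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.414) -> float:
    """Standard UCT: average reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = 1.414) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))
```

A larger `c` favors rarely visited children; `c = 0` reduces selection to picking the best average reward among visited children.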

Project Structure

ml-master/
├── src/
│   ├── ml_master/
│   │   ├── __init__.py
│   │   ├── core.py         # Defines the MCTSNode, the core data structure
│   │   ├── explorer.py     # Implements the MCTS exploration logic
│   │   ├── reasoner.py     # Implements the Steerable Reasoning and LLM interaction
│   │   └── main.py         # Main entry point for running the agent
│   └── __init__.py
├── titanic/                # Example project directory
│   ├── train.csv
│   ├── test.csv
│   ├── gender_submission.csv
│   ├── task_description.txt # Project-specific instructions for the agent
│   └── generated_submissions/ # Output directory for AI-generated solutions
├── .gitignore
├── pyproject.toml          # Project dependencies managed by `uv`
└── README.md               # This file

How It Works: A General Workflow

Initialization: The main.py script is launched, pointing to a specific project (e.g., titanic). It reads the task_description.txt for that project to understand its goal.
MCTS Cycle Begins:
  a. Select: The Explorer traverses the MCTS tree to find the most promising node to expand.
  b. Reason: The Reasoner constructs the adaptive memory from the selected node's ancestors and siblings.
  c. Generate: The Reasoner sends a detailed prompt (containing the task description and the adaptive memory) to the LLM, asking it to Draft, Debug, or Improve the solution.
  d. Verify: The Explorer takes the code generated by the LLM and runs it. For a standard ML problem, this involves:
    i. Writing the code to a file.
    ii. Executing the script, which is expected to train a model and print a performance metric (e.g., accuracy).
    iii. Parsing the metric from the script's output.
  e. Backpropagate: The performance metric is converted into a reward, which is then propagated back up the MCTS tree, updating the statistics of all parent nodes.
Iteration: The agent repeats the MCTS cycle, continuously refining its solution based on the feedback from previous attempts.
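
The Verify and Backpropagate steps can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: the `TreeNode` class is hypothetical, the script is assumed to print a line such as `Accuracy: 0.82`, and the metric is assumed to be used directly as the reward.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    """Hypothetical node; the real MCTSNode may differ."""
    parent: Optional["TreeNode"] = None
    visits: int = 0
    total_reward: float = 0.0


def parse_accuracy(stdout: str) -> Optional[float]:
    """Return the last 'Accuracy: <number>' value in the output, if any."""
    matches = re.findall(r"Accuracy:\s*([0-9]*\.?[0-9]+)", stdout)
    return float(matches[-1]) if matches else None


def backpropagate(node: Optional[TreeNode], reward: float) -> None:
    """Add the reward to the node and to every ancestor up to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
```

A run whose output contains no metric returns `None` from `parse_accuracy`; how such a failure maps to a reward is a design choice this sketch leaves open.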

Usage

Follow these steps to run the ML-Master agent on the included Titanic example project.

1. Prerequisites

Make sure you have Python 3.10+ and uv installed.

2. Install Dependencies

Create the virtual environment and install the required packages:

uv sync

3. Set Environment Variable

The agent requires a Gemini API key to function. Create a .env file in the root of the project directory with the following content:

GEMINI_API_KEY='your_api_key_here'

The application will automatically load this key at runtime.
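
This README does not say how the key is loaded. As a rough, standard-library-only sketch of the idea, the helper below reads simple `KEY=value` lines and fails early when the key is missing. The function names `load_env_file` and `require_api_key` are hypothetical and are not part of the project.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")  # drop optional quotes
    return values


def require_api_key(path: str = ".env") -> str:
    """Return GEMINI_API_KEY from the environment or, failing that, the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env")
    return key
```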

4. Run the Agent

Execute the agent using the following command:

uv run python -m src.ml_master.main [OPTIONS]

Key Options:

--project_name: The name of the project directory (default: titanic).
--task_description_file: Path to the file containing the task description (default: titanic/task_description.txt).
--num_iterations: Total number of MCTS cycles to run (default: 20).
--parallelism: The number of MCTS cycles to run in parallel (default: 3).
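
For reference, the options above map naturally onto `argparse`. The sketch below only mirrors the flags and defaults listed in this README; the actual parser in main.py may be organized differently, and `build_parser` is a name chosen for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the four options documented above."""
    parser = argparse.ArgumentParser(description="Run the ML-Master agent.")
    parser.add_argument("--project_name", default="titanic",
                        help="Name of the project directory.")
    parser.add_argument("--task_description_file",
                        default="titanic/task_description.txt",
                        help="Path to the task description file.")
    parser.add_argument("--num_iterations", type=int, default=20,
                        help="Total number of MCTS cycles to run.")
    parser.add_argument("--parallelism", type=int, default=3,
                        help="Number of MCTS cycles to run in parallel.")
    return parser
```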

Example Command:

To run the agent on the Titanic project for 10 iterations with 2 parallel workers:

uv run python -m src.ml_master.main --project_name titanic --num_iterations 10 --parallelism 2

Note on API Quotas: Running with high parallelism may cause API rate limit errors (HTTP 429). If this occurs, try reducing the --parallelism value to 1.
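
Besides lowering parallelism, a common general-purpose way to cope with rate limits is to retry with exponential backoff. The sketch below is not part of this project: `RateLimitError` is a hypothetical stand-in for whatever exception the LLM client raises on a 429 response, and the retry parameters are arbitrary.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's HTTP 429 exception."""


def call_with_backoff(fn: Callable[[], T],
                      max_retries: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Delay doubles each attempt; jitter spreads out parallel workers.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise ValueError("max_retries must be at least 1")
```

The `sleep` parameter exists only so the waiting behavior can be replaced in tests.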

Case Study: Solving the Titanic Competition

When run on the Titanic problem, ML-Master demonstrates a clear learning progression:

Iteration 1: The agent typically generates a simple baseline script with basic data loading and a simple model, which establishes an initial accuracy to improve on.
Subsequent Iterations: Once a draft earns a positive reward, the agent is prompted to Improve it. It then adds more sophisticated features and techniques, drawing on the LLM's training data:
- Feature Engineering: It learns to handle missing Age values by using passenger titles, create a FamilySize feature, and one-hot encode categorical variables.
- Advanced Modeling: It often progresses from a simple LogisticRegression to a more powerful RandomForestClassifier or GradientBoostingClassifier.
- Hyperparameter Tuning: In later stages, it can even implement RandomizedSearchCV to optimize the model's hyperparameters.

The output logs will show the Q/N (average reward) of the root node increasing over time, reflecting the agent's successful improvement of the solution. The generated scripts, which can be found in titanic/generated_submissions, provide a step-by-step record of the agent's evolving strategy.

Conclusion

This project implements the ML-Master framework and demonstrates that it can autonomously solve a classic machine learning problem, the Titanic competition. It serves as a proof-of-concept for AI-for-AI systems and as a foundation for further research and development.

Future Development and Optimization Directions

To evolve this proof-of-concept into a more robust and capable framework, the following areas are identified as the most valuable for future development, based on the original paper's vision and practical implementation needs.

I. Robustness and Reliability

Code Execution Sandbox
- Problem: Currently, LLM-generated code is executed directly via subprocess, which poses a significant security risk.
- Solution: Implement a secure sandbox to isolate code execution. This could be achieved using technologies like Docker containers or OS-level sandboxing tools (e.g., nsjail). A sandbox would prevent the code from accessing unauthorized files or network resources, making the agent safe to run on any task.
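
As one possible shape for the Docker option, the sketch below only builds a `docker run` command line and does not execute it. The image name, mount points, and memory limit are assumptions; a real image would also need the ML libraries that the generated scripts import.

```python
from pathlib import Path


def build_docker_command(script: Path, data_dir: Path,
                         image: str = "python:3.11-slim",
                         memory: str = "2g") -> list[str]:
    """Build (but do not run) a `docker run` command for one generated script."""
    return [
        "docker", "run", "--rm",
        "--network", "none",    # no network access
        "--memory", memory,     # cap RAM usage
        "--read-only",          # read-only root filesystem
        "--tmpfs", "/tmp",      # writable scratch space only
        "-v", f"{data_dir.resolve()}:/data:ro",
        "-v", f"{script.resolve()}:/work/solution.py:ro",
        image, "python", "/work/solution.py",
    ]
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True, timeout=...)` in place of the direct subprocess call.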
State Persistence and Recovery
- Problem: The entire MCTS tree exists only in memory. If the program crashes or is stopped after hours of exploration, all progress is lost.
- Solution: Implement a mechanism to periodically save the state of the MCTS tree to disk (e.g., as a serialized object like a pickle file, or in a more structured format like a lightweight database). The agent should be able to resume a previous run from the last saved state, making long-running, complex tasks feasible.
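
A minimal sketch of such a checkpoint, using JSON rather than pickle so the saved file stays human-readable. The `SavedNode` class and its fields are hypothetical simplifications of whatever MCTSNode actually stores.

```python
import json
import os
from dataclasses import dataclass, field


@dataclass
class SavedNode:
    """Hypothetical, simplified node; field names are assumptions."""
    code: str
    visits: int = 0
    total_reward: float = 0.0
    children: list["SavedNode"] = field(default_factory=list)


def to_dict(node: SavedNode) -> dict:
    """Recursively convert a subtree into JSON-compatible dicts."""
    return {"code": node.code, "visits": node.visits,
            "total_reward": node.total_reward,
            "children": [to_dict(c) for c in node.children]}


def from_dict(data: dict) -> SavedNode:
    """Rebuild a subtree from the structure produced by to_dict."""
    return SavedNode(code=data["code"], visits=data["visits"],
                     total_reward=data["total_reward"],
                     children=[from_dict(c) for c in data["children"]])


def save_tree(root: SavedNode, path: str) -> None:
    """Write to a temp file first so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(to_dict(root), fh)
    os.replace(tmp, path)  # atomic rename onto the final path


def load_tree(path: str) -> SavedNode:
    """Load a tree previously written by save_tree."""
    with open(path, encoding="utf-8") as fh:
        return from_dict(json.load(fh))
```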
Structured Evaluation and IPC
- Problem: The agent currently relies on parsing stdout (e.g., looking for "Accuracy: X.XX") to evaluate a solution's performance. This is brittle and can easily fail if the output format changes slightly.
- Solution: Define a more robust Inter-Process Communication (IPC) contract. For example, the generated script could be required to write its results (metrics, path to model file, etc.) to a predefined JSON file (e.g., results.json). The main process would then read this file, eliminating the need for fragile string parsing.
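
The file-based contract could look roughly like this. The JSON keys `metric` and `value` and both function names are assumptions made for illustration, not an existing interface.

```python
import json
from pathlib import Path
from typing import Optional


def write_results(path: Path, metric_name: str, value: float) -> None:
    """Called at the end of a generated script to report its score."""
    payload = {"metric": metric_name, "value": value}
    path.write_text(json.dumps(payload), encoding="utf-8")


def read_results(path: Path) -> Optional[float]:
    """Called by the main process; returns None if the file is missing or invalid."""
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
        return float(payload["value"])
    except (OSError, ValueError, KeyError, TypeError):
        return None
```

Returning `None` on any failure lets the caller treat a missing or malformed results file the same way as a crashed script.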

II. Intelligence and Capability Enhancement

Tool-Augmented Reasoning
- Problem: The Reasoner's knowledge is limited to its training data and the adaptive memory. It cannot access external information or inspect the project's context beyond what's provided in the prompt.
- Solution: Empower the LLM with tools, similar to the ReAct (Reason + Act) paradigm. This would involve giving the Reasoner the ability to call functions like:
  - read_file(path): To inspect other files in the project.
  - list_directory(path): To understand the project structure.
  - web_search(query): To look up documentation for a library or find solutions to specific errors.
  This would dramatically increase the agent's problem-solving capabilities.
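
A rough sketch of how such tool calls might be dispatched. Only the two file-system tools are shown, since web_search would need a network backend; the `TOOLS` registry and `run_tool` function are hypothetical names, and how the LLM's tool request is parsed is left out.

```python
import os
from typing import Callable


def read_file(path: str) -> str:
    """Return the text content of a file."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def list_directory(path: str) -> str:
    """Return the directory entries, one per line, in sorted order."""
    return "\n".join(sorted(os.listdir(path)))


# Registry mapping tool names (as the LLM would request them) to functions.
TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": read_file,
    "list_directory": list_directory,
}


def run_tool(name: str, argument: str) -> str:
    """Dispatch one tool request and return an observation string for the LLM."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}"
    try:
        return tool(argument)
    except OSError as exc:
        return f"error: {exc}"  # report failures as text instead of crashing
```

Errors are returned as text so the LLM can read them as an observation and adjust its next action.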
Advanced Memory Retrieval
- Problem: The current adaptive memory is limited to the direct parent and siblings. The agent cannot recall valuable insights from distant, but relevant, branches of the MCTS tree.
- Solution: Implement a more sophisticated memory retrieval system. This could involve:
  - Vector Embeddings: Storing the "Reasoning" part of each node as a vector embedding.
  - Semantic Search: When facing a new problem (e.g., a specific error or an improvement goal), the agent could perform a semantic search across the entire tree to find nodes that dealt with similar issues, even if they are in completely different branches.
  This would allow for more powerful, non-local learning.
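
The retrieval step can be sketched with plain cosine similarity. This sketch assumes each node's reasoning text has already been turned into a vector by some embedding model, which is not shown; the node ids and the `top_k` helper are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], memory: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored nodes most similar to the query vector."""
    ranked = sorted(memory,
                    key=lambda node_id: cosine_similarity(query, memory[node_id]),
                    reverse=True)
    return ranked[:k]
```

For a tree with many nodes, a dedicated vector index would likely replace this linear scan.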

III. Framework Generalization and Usability

Configuration Management
- Problem: Key parameters like MCTS constants (exploration_constant, max_failed_improvements) are hardcoded or passed as numerous command-line arguments.
- Solution: Move all configuration into a dedicated file (e.g., config.yaml). This would make it much easier to manage settings for different projects and experiments, improving the framework's usability.
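
One way to structure this is a typed config object that a file overrides. The sketch below parses JSON to stay within the standard library; a config.yaml would work the same way with a YAML parser. The default values for exploration_constant and max_failed_improvements are assumptions, not the project's actual settings.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AgentConfig:
    exploration_constant: float = 1.414  # assumed default
    max_failed_improvements: int = 3     # assumed default
    num_iterations: int = 20
    parallelism: int = 3


def load_config(text: str) -> AgentConfig:
    """Build a config from a JSON object, rejecting unknown keys."""
    data = json.loads(text)
    known = {f.name for f in fields(AgentConfig)}
    unknown = set(data) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return AgentConfig(**data)  # missing keys fall back to the defaults
```

Rejecting unknown keys catches typos in a config file that would otherwise be silently ignored.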
Abstract Task Interface
- Problem: The system is currently tailored to the Titanic project. Adapting it to a new problem requires manual changes to the code that runs and evaluates the solution.
- Solution: Define an abstract Task base class or interface. A new project would then simply need to provide a task.py file that implements this interface (e.g., setup_data(), evaluate_solution(submission_file)). The main agent logic would remain unchanged, making the framework truly plug-and-play for any machine learning competition or task.
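
A minimal sketch of what such an interface might look like. The method names follow the ones suggested above; `LabelAccuracyTask` and the `id`/`prediction` column names are toy assumptions used only to show how a project would plug in.

```python
import csv
from abc import ABC, abstractmethod


class Task(ABC):
    """Interface a project-specific task.py would implement."""

    @abstractmethod
    def setup_data(self) -> None:
        """Prepare any data the solution needs."""

    @abstractmethod
    def evaluate_solution(self, submission_file: str) -> float:
        """Score a submission file; higher is better."""


class LabelAccuracyTask(Task):
    """Toy task: accuracy of a CSV with 'id' and 'prediction' columns."""

    def __init__(self, labels: dict[str, str]) -> None:
        self.labels = labels  # maps id -> true label

    def setup_data(self) -> None:
        pass  # nothing to prepare in this toy example

    def evaluate_solution(self, submission_file: str) -> float:
        with open(submission_file, newline="", encoding="utf-8") as fh:
            rows = list(csv.DictReader(fh))
        correct = sum(1 for r in rows
                      if self.labels.get(r["id"]) == r["prediction"])
        return correct / len(self.labels) if self.labels else 0.0
```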
