Problem Statement:
Currently, autoresearch treats train.py as the only "editable asset." In many production scenarios, the model architecture is fixed and the primary research bottleneck is the data mixture—balancing domain-specific SFT data with general reasoning/alignment data to prevent catastrophic forgetting.
Proposed Solution:
Introduce a second editable asset, data_recipe.yaml. The agent will iterate on this file to optimize the distribution and synthesis of the training set.
Proposed Features:
Weighted Mixing Search
The agent adjusts the sampling weights $(\omega_1, \omega_2, \dots, \omega_n)$ for different datasets.
Multi-Objective Evaluation
The scoring logic is updated to a composite metric:$$Score = \alpha \cdot \text{Domain_Metric} + \beta \cdot \text{Reasoning_Metric}$$
Autonomous Synthesis Hook
If performance plateaus, the agent can modify synthesis_prompt.md to trigger a data generation pipeline that targets specific failure modes identified in the logs.
Constraint-Aware Research
Allow the human to set "floors" in program.md (e.g., "Do not let reasoning fall below 0.7").
Problem Statement:
Currently, autoresearch treats train.py as the only "editable asset." In many production scenarios, the model architecture is fixed and the primary research bottleneck is the data mixture—balancing domain-specific SFT data with general reasoning/alignment data to prevent catastrophic forgetting.
Proposed Solution:
Introduce a second editable asset, data_recipe.yaml. The agent will iterate on this file to optimize the distribution and synthesis of the training set.
Proposed Features:
Weighted Mixing Search
The agent adjusts the sampling weights$(\omega_1, \omega_2, \dots, \omega_n)$ for different datasets.
Multi-Objective Evaluation
The scoring logic is updated to a composite metric:$$Score = \alpha \cdot \text{Domain_Metric} + \beta \cdot \text{Reasoning_Metric}$$
Autonomous Synthesis Hook
If performance plateaus, the agent can modify synthesis_prompt.md to trigger a data generation pipeline that targets specific failure modes identified in the logs.
Constraint-Aware Research
Allow the human to set "floors" in program.md (e.g., "Do not let reasoning fall below 0.7").