Skip to content

feat(agent): ✨ Add goal-driven plan approval action#230

Open
jorben wants to merge 2 commits into
masterfrom
feat/plan-setgoal-act
Open

feat(agent): ✨ Add goal-driven plan approval action#230
jorben wants to merge 2 commits into
masterfrom
feat/plan-setgoal-act

Conversation

@jorben

@jorben jorben commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add a goal-driven plan approval action that creates a persistent goal before starting implementation.
  • Offer the new approval option only for threads without an existing goal, while preserving context-reset behavior.
  • Update frontend approval metadata parsing, stream bridge types, and English/Chinese labels for the new action.

Test Plan

  • Run cargo test --locked --manifest-path src-tauri/Cargo.toml.
  • Run npm run typecheck.
  • Manually verify a plan approval can choose “Set goal and implement” on a thread without an existing goal.

🤖 Generated with TiyCode

Add a new plan approval option "Set goal and implement" that creates a
persistent goal before starting implementation, enabling the goal
continuation loop to drive execution of approved plans.

Previously, users could only approve plans for direct implementation or
implementation with context reset. This adds a third option that:
- Creates a persistent goal record via GoalManager before the run starts
- Uses a simplified implementation prompt referencing the plan file path
- Resets context similar to the context-reset action
- Is only offered when the thread has no existing goal record
- Supports review checkpoints following the plan at each stage

The action is surfaced in both backend (Rust PlanApprovalAction enum and
approval metadata construction) and frontend (TypeScript types, UI
components, and i18n translations in English and Chinese).
@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown

AI Code Review Summary

PR: #230 (feat(agent): ✨ Add goal-driven plan approval action)
Preferred language: English

Overall Assessment

Detected 2 actionable findings, prioritize CRITICAL/HIGH before merge.

Major Findings by Severity

  • MEDIUM (1)
    • src-tauri/src/core/plan_checkpoint.rs:108 - New async DB query on hot path without early return
  • LOW (1)
    • src/modules/workbench-shell/ui/runtime-thread-surface-metadata.ts:155 - Frontend data-path guard could miss new action values

Actionable Suggestions

  • Add error recovery around goal_manager.create_goal() in agent_run_manager.rs to prevent approval prompt consumption on transient failures.
  • Add an integration test for execute_approved_plan with ApplyPlanWithGoal when a goal already exists to prevent duplicate goal regressions.
  • Extract the valid approval action constants in runtime-thread-surface-metadata.ts to a shared set for better maintainability.
  • Sanitize or replace the plan file path in the goal objective (agent_run_manager.rs:445) to avoid exposing internal filesystem structure in the UI and database.
  • Document that ApplyPlanWithGoal always resets conversation context, or add a variant that preserves context.
  • Consider renaming the placeholder API key in tests to a clearly non-sensitive value like 'test-key'.
  • Verify that the goals table has an index on thread_id to ensure the lookup in build_approval_prompt_metadata stays fast under load
  • Consider wrapping the goal lookup in a short-lived cache (e.g., per-request or per-thread metadata construction) to avoid redundant queries
  • Extract the goal-creation step from execute_approved_plan into a separate function and write a unit test for it instead of relying solely on the full-stack integration test.
  • Consider adding a synchronization mechanism (e.g., a barrier or channel) if the integration test is kept, to avoid the race between run task spawning and immediate cancellation.
  • Fail the test explicitly with separate messages for success vs early-failure paths so failure diagnosis is clearer.

Potential Risks

  • Goal creation failure leaves threads in an unrecoverable state because the approval is already marked as consumed.
  • If a goal already exists and execute_approved_plan is called with ApplyPlanWithGoal, no new goal is created and no error is raised, which is correct but not covered by integration tests, increasing regression risk.
  • Information disclosure of internal paths via goal objective text stored in DB and shown in UI.
  • Usability confusion: ApplyPlanWithGoal implies goal-creation but also forces context reset.
  • If the goals table grows very large and lacks a thread_id index, the approval prompt construction time could degrade significantly
  • The integration test test_execute_approved_plan_with_goal_creates_goal_record may produce false positives or flaky results in CI due to real runtime dependencies and immediate run cancellation.
  • build_approval_prompt_metadata became async and requires a DB pool; callers must correctly await. No compile-time guarantee of this is visible in diff.

Test Suggestions

  • Add integration test for goal creation failure preventing run start.
  • Add integration test for start_run_with_options failure after successful goal creation (orphan goal check).
  • Test ApplyPlanWithGoal UI flow to ensure context usage is cleared and goal is displayed correctly.
  • Add a performance test that benchmarks build_approval_prompt_metadata with a large number of goal records to ensure lookup time remains constant
  • Profile the plan approval flow to ensure the added DB roundtrip does not exceed acceptable latency thresholds
  • Unit-test build_approval_prompt_metadata for edge cases: goal table empty, DB error, goal with non-active status.
  • Add a frontend Vitest test for parseApprovalPromptMetadata handling apply_plan_with_goal action.
  • Add a frontend Vitest test for formatApprovalPromptState with approvedAction === 'apply_plan_with_goal'.
  • Add frontend component-level test for RuntimeThreadSurface resetting context usage on apply_plan_with_goal action.

File-Level Coverage Notes

  • src-tauri/src/commands/agent.rs: ok (Minimal addition; no existing test file for command-handler parsing was in scope, but the change itself is safe.)
  • src-tauri/src/core/agent_run_manager.rs: ok (The new branch correctly creates a persistent goal after building a custom implementation prompt; the integration test covers the high-level flow but unit-level testing would improve robustness.)
  • src-tauri/src/core/agent_run_summary.rs: ok (New variant is handled in the without_plan.tpl.md arm with a distinct action_note; unit test not included in the diff but should be added.)
  • src-tauri/src/core/agent_session_execution.rs: ok (The signature change to async and addition of pool/thread_id is covered by existing plan_checkpoint unit tests.)
  • src-tauri/src/core/plan_checkpoint.rs: ok (Good coverage with the existing metadata_round_trip_is_stable and the new build_approval_prompt_metadata_suppresses_goal_option_when_goal_exists. Test pool setup could be extracted into a test helper for reuse.)
  • src-tauri/tests/agent_run.rs: review_with_caveats (Test design is correct in intent but may become flaky.)
  • src/i18n/locales/en.ts: ok (New i18n keys added; no tests for locale files in scope.)
  • src/i18n/locales/zh-CN.ts: ok (Consistent label additions matching the PlanApprovalAction::label() strings.)
  • src/modules/workbench-shell/ui/runtime-thread-surface-metadata.ts: ok (New action added to TypeScript union type and validation branch; frontend unit tests for this parser may already exist but were not in scope.)
  • src/modules/workbench-shell/ui/runtime-thread-surface-state.ts: ok (New approved action handled; no tests in scope but logic is straightforward.)
  • src/modules/workbench-shell/ui/runtime-thread-surface.tsx: ok (Context reset behavior extended to the new action; no frontend test file in scope.)
  • src/services/bridge/agent-commands.ts: ok (TypeScript union widened to accept the new action.)
  • src/services/thread-stream/thread-stream.ts: ok (Same union widening as agent-commands; no logic change.)

Inline Downgraded Items (processed but not inline)

  • None

Coverage Status

  • Target files: 13
  • Covered files: 13
  • Uncovered files: 0
  • No-patch/binary covered as file-level: 0
  • Findings with unknown confidence (N/A): 0

Uncovered list:

  • None

No-patch covered list:

  • None

Runtime/Budget

  • Rounds used: 1/4
  • Planned batches: 1
  • Executed batches: 1
  • Sub-agent runs: 4
  • Planner calls: 1
  • Reviewer calls: 5
  • Model calls: 6/64
  • Structured-output summary-only degradation: NO

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 4
  • Findings with unknown confidence: 0
  • Inline comments attempted: 4
  • Target files: 12
  • Covered files: 12
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

}
};

// For ApplyPlanWithGoal: create a persistent goal before starting the

This comment was marked as outdated.


// For ApplyPlanWithGoal: create a persistent goal before starting the
// implementation run so the goal continuation loop can drive execution.
if let PlanApprovalAction::ApplyPlanWithGoal = action {

This comment was marked as outdated.

let plan_path = crate::core::plan_checkpoint::plan_file_path(thread_id)
.map(|p| p.display().to_string())
.unwrap_or_default();
format!(

This comment was marked as outdated.


// Only offer the goal-driven option when the thread has no existing goal
// record (any status: active, paused, budget_limited, or complete).
if crate::persistence::repo::goal_repo::find_by_thread_id(pool, thread_id)

This comment was marked as outdated.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated PR review completed.

  • Findings kept: 2
  • Findings with unknown confidence: 0
  • Inline comments attempted: 2
  • Target files: 13
  • Covered files: 13
  • Uncovered files: 0
    See the summary comment for detailed analysis and coverage details.

}

pub fn build_approval_prompt_metadata(
pub async fn build_approval_prompt_metadata(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] New async DB query on hot path without early return

Every plan approval prompt now performs an additional async database query to check for existing goals. While this is a lightweight lookup, it adds latency to the approval flow and could be avoided by caching or deferring the check.

Suggestion: Consider whether the goal-option visibility check can be performed only when the plan message is first built, or cache the existence of goals for the thread in memory. Alternatively, rely on the frontend to always send 'apply_plan_with_goal' and validate server-side only when the action is actually invoked.

Risk: Slight increase in plan approval prompt construction latency (~1-5 ms per DB roundtrip). Not significant for interactive use but adds up under concurrent load.

Confidence: 0.85

[From SubAgent: performance]

const label = readStringField(optionRecord, "label");
if (
(action !== "apply_plan" && action !== "apply_plan_with_context_reset")
(action !== "apply_plan" && action !== "apply_plan_with_context_reset" && action !== "apply_plan_with_goal")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] Frontend data-path guard could miss new action values

The frontend validation regex is duplicated in three places within the same function (option action parsing, and two approvedAction checks), making it easy to miss one when adding new actions in the future.

Suggestion: Extract the valid action list into a constant set or array (e.g., const VALID_ACTIONS = new Set(['apply_plan', 'apply_plan_with_context_reset', 'apply_plan_with_goal'])) and use it for all validation points.

Risk: Future additions of another action could accidentally skip one of the validation branches, resulting in inconsistent behavior where options are allowed but approved actions are rejected or vice versa.

Confidence: 0.85

[From SubAgent: general]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant