Merged
32 commits
0284a26
Initial task doc
dannysmith Mar 23, 2026
54caa38
Tweak task doc
dannysmith Mar 23, 2026
d807d3e
Add AI-powered quick entry processing via Apple Intelligence (#30)
dannysmith Mar 23, 2026
8f0085f
Fix wikilinks using hash IDs instead of entity titles
dannysmith Mar 23, 2026
d4828be
Tigheten Apple Intelligence prompts
dannysmith Mar 23, 2026
5a83b98
Update AI powered stuff.
dannysmith Mar 25, 2026
ef4c24a
WIP
dannysmith Mar 25, 2026
f150bd8
Update task doc with implementation status and next phases
dannysmith Mar 25, 2026
350c168
Add developer doc for Apple Intelligence quick entry processing
dannysmith Mar 25, 2026
1ff6b2c
Add step-by-step walkthrough to Apple Intelligence developer doc
dannysmith Mar 26, 2026
ab47264
Tweak task doc
dannysmith Mar 26, 2026
9257a3f
Add AI evaluation harness for prompt iteration
dannysmith Mar 26, 2026
27714ef
Expand AI eval harness to 32 test cases
dannysmith Mar 26, 2026
d37ba59
Fix eval harness body expectations and make body check optional
dannysmith Mar 26, 2026
90b8532
Update task doc: add deterministic status phase and auto-ready rules
dannysmith Mar 26, 2026
3c79955
Split auto-ready into cherry-pickable Phase 6 and AI-specific Phase 7
dannysmith Mar 26, 2026
92716dd
Auto-promote inbox → ready when task has project/area + scheduled/defer
dannysmith Mar 26, 2026
0fc3589
Remove status from LLM, add keyword detection and near-term auto-ready
dannysmith Mar 26, 2026
4806aad
Tighten status keyword detection to reduce false positives
dannysmith Mar 26, 2026
4aa065d
Mark Phases 6-7 complete, remove duplicate phase, renumber
dannysmith Mar 26, 2026
940e9b4
Fix auto-ready Rule 2 timezone bug for same-day scheduled dates
dannysmith Mar 26, 2026
df686b1
Add deterministic date resolution and fuzzy project/area matching
dannysmith Mar 26, 2026
f4529de
Update task doc and developer doc for Phase 8 completion
dannysmith Mar 26, 2026
274e243
Tigheten Apple Intelligence prompts
dannysmith Mar 26, 2026
00f79e2
Reorder fields and add few-shot examples for project/due matching
dannysmith Mar 26, 2026
20e9e37
Fix beach ball by making AI command async with spawn_blocking
dannysmith Mar 26, 2026
9d241bb
Update all docs for AI processing, auto-ready, and polish
dannysmith Mar 26, 2026
4a4fef4
Code cleanup: fix formatting, clippy warnings, remove debug test
dannysmith Mar 26, 2026
5eb430c
Update task doc
dannysmith Mar 26, 2026
001718b
Address code review findings from CodeRabbit
dannysmith Mar 26, 2026
b37e5c8
Address remaining code review findings
dannysmith Mar 26, 2026
3ca0eda
Address second round of code review findings
dannysmith Mar 26, 2026
172 changes: 172 additions & 0 deletions docs/tasks-todo/task-x-quick-entry-ai-processing.md
@@ -0,0 +1,172 @@
# Quick Entry AI Processing (Smart Dictation)

**GitHub Issue:** https://github.com/dannysmith/taskdn/issues/30
**Product:** tdn-desktop only

## Overview

Add AI-powered processing to the quick entry pane so users can dictate or type free-form natural language (e.g. "Create a new task in the Jengu project with a due date three weeks from now to review the meeting notes") and have it intelligently parsed into structured task fields (title, body, project, area, dates, status).

This does not involve voice-to-text transcription — we assume users have a transcription tool (e.g. macOS dictation). This is about taking transcribed/typed text and intelligently populating the quick entry form so the user can review and confirm before saving.

## Requirements

### From the GitHub Issue

- The contents of the title input field are sent to a local LLM with a short prompt, which returns structured task data for pre-populating the form.
- The prompt includes a list of current areas and projects, context about "now" (today's date), and instructions for lightly cleaning user input, extracting frontmatter fields, and generating a suitable title.
- The raw input text is always included in the body of the task doc.
- The prompt is not user-customizable.
- V1 supports only Apple Intelligence.
- For now, there is no intent to ship downloadable LLMs or to provide an interface for managing them.

### Product Requirements (from discussion)

- **Trigger:** Explicit keyboard shortcut (`Cmd+Shift+A`) + a visible button in the UI. The shortcut is only active when the quick entry pane is visible.
- **UX flow:** User opens quick pane → types/dictates free-form text → triggers AI processing → form fields are populated → user reviews and saves normally.
- **Body behavior:** The raw dictated/typed text is preserved in the body field, unless the AI-generated title is identical to the raw input (in which case no body is added, since nothing was transformed).
- **Invisible when unavailable:** If Apple Intelligence is not available (wrong platform, older macOS, not enabled), the feature must be completely invisible — no button, no shortcut, no trace. It should appear as if the feature doesn't exist.
- **Future provider support:** Don't prematurely optimise for Ollama or other providers, but at decision points, prefer architecture that wouldn't make adding them painful later. Keep the Rust-level interface clean (text + context in, structured result out).
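
A minimal sketch of what a "text + context in, structured result out" interface could look like at the Rust level. Every name here (`AiContext`, `ParsedTaskFields`, `QuickEntryProvider`, `EchoProvider`) is hypothetical and chosen for illustration; the actual types in `commands/ai.rs` may differ.

```rust
/// Context sent alongside the raw text (hypothetical shape).
#[derive(Debug, Clone, Default)]
pub struct AiContext {
    pub today: String, // e.g. "2026-03-25"
    pub projects: Vec<String>,
    pub areas: Vec<String>,
}

/// Structured result; `None` means "not detected" (hypothetical shape).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ParsedTaskFields {
    pub title: String,
    pub body: Option<String>,
    pub project: Option<String>,
    pub area: Option<String>,
    pub due: Option<String>,
}

/// One implementation per provider (Apple Intelligence today, others later).
pub trait QuickEntryProvider {
    /// The UI hides the feature entirely when this returns false.
    fn is_available(&self) -> bool;
    /// Text + context in, structured result out.
    fn process(&self, text: &str, ctx: &AiContext) -> Result<ParsedTaskFields, String>;
}

/// Stand-in provider for tests: uses the trimmed input as the title.
pub struct EchoProvider;

impl QuickEntryProvider for EchoProvider {
    fn is_available(&self) -> bool {
        true
    }

    fn process(&self, text: &str, _ctx: &AiContext) -> Result<ParsedTaskFields, String> {
        Ok(ParsedTaskFields {
            title: text.trim().to_string(),
            ..Default::default()
        })
    }
}
```

Keeping the trait this narrow means the frontend and the Tauri command would not need to change if a second provider were added later.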

### UI Placement

The quick entry pane is a compact floating card with: title input (top), metadata row with status/dates (middle), footer with project/area selectors + cancel/save (bottom). The AI processing button should sit adjacent to the title input area since that's where the action happens.

## Background: Handy Reference Implementation

The Handy codebase (`~/dev/handy`) has a production-grade Apple Intelligence integration. Our Swift bridge is adapted from it.

Key files in Handy for reference: `src-tauri/swift/apple_intelligence.swift`, `src-tauri/swift/apple_intelligence_bridge.h`, `src-tauri/src/apple_intelligence.rs`, `src-tauri/build.rs`.

Critical gotchas discovered via Handy:

- Accessing `SystemLanguageModel.default` during app init causes a SIGABRT; defer access to runtime.
- The async→sync bridge uses `DispatchSemaphore`.
- FoundationModels must be weak-linked for compatibility with older macOS.
- LLMs insert invisible Unicode characters, which must be stripped.
- `@Generable` can fail, so always keep a plain-text fallback.
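
The invisible-character gotcha can be handled with a small filter on the model output. This is a sketch only: the function name `strip_invisible` is hypothetical, and the set of code points below (zero-width space, zero-width non-joiner/joiner, word joiner, BOM, soft hyphen) is an assumption, not the list Handy or this codebase actually strips.

```rust
/// Remove common invisible code points from LLM output.
/// The exact set is an assumption; extend it if other characters show up.
pub fn strip_invisible(input: &str) -> String {
    input
        .chars()
        .filter(|c| {
            !matches!(
                c,
                '\u{200B}' // zero-width space
                | '\u{200C}' // zero-width non-joiner
                | '\u{200D}' // zero-width joiner
                | '\u{2060}' // word joiner
                | '\u{FEFF}' // byte-order mark / zero-width no-break space
                | '\u{00AD}' // soft hyphen
            )
        })
        .collect()
}
```

Running this before any name matching or date parsing avoids comparisons failing on strings that look identical on screen.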

## Current Implementation Status

### Completed (Phases 1-3)

**Swift bridge:** `@Generable ParsedTask` struct with `ParsedStatus` enum, `LanguageModelSession` with structured output + plain-text fallback, availability check. Build script with SDK detection, stub compilation, weak linking. All adapted from Handy.

**Rust layer:** Safe FFI wrapper (`apple_intelligence.rs`), Tauri commands (`commands/ai.rs`), centralized prompt templates (`commands/ai_prompts.rs`). System prompt with step-by-step field instructions, few-shot examples, and structured area→project context. Response parsing with date validation, project/area name→ID matching, body deduplication logic.
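
The body deduplication rule (keep the raw input as the body unless the generated title is identical to it) can be sketched as below. The function name `body_for` is hypothetical, and comparing after trimming whitespace only is an assumption; the real code may normalize differently.

```rust
/// Decide what goes in the task body after AI processing.
/// Returns None when the title is just the raw input, since nothing was transformed.
pub fn body_for(raw_input: &str, ai_title: &str) -> Option<String> {
    let raw = raw_input.trim();
    if raw.is_empty() || raw == ai_title.trim() {
        None
    } else {
        // Preserve the user's original wording so nothing is lost.
        Some(raw.to_string())
    }
}
```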

**Frontend:** Sparkles button in title row (conditionally rendered when AI available + text entered), `Cmd+Shift+A` shortcut (active only when pane open), loading spinner, form field population from AI result. Feature is completely invisible when Apple Intelligence is unavailable.

**Bug fix (pre-existing):** Fixed wikilinks using hash IDs instead of entity titles in all four write paths (create_task, create_project, update_task, update_project).

### Key files

| File | Purpose |
|------|---------|
| `src-tauri/swift/apple_intelligence.swift` | `@Generable` struct, LLM session, FFI functions |
| `src-tauri/swift/apple_intelligence_stub.swift` | Stub for builds without FoundationModels SDK |
| `src-tauri/swift/apple_intelligence_bridge.h` | C header for Swift ↔ Rust FFI |
| `src-tauri/src/apple_intelligence.rs` | Safe Rust wrapper over C FFI |
| `src-tauri/src/commands/ai.rs` | Tauri commands, response parsing, field validation |
| `src-tauri/src/commands/ai_prompts.rs` | All prompt text centralized for iteration |
| `src/components/quick-pane/QuickPaneApp.tsx` | AI processing handler, availability state |
| `src/components/quick-pane/QuickPaneTitle.tsx` | Sparkles button, loading state |
| `src/components/quick-pane/useQuickPaneKeyboard.ts` | `Cmd+Shift+A` shortcut |

## Learnings About Apple Intelligence (~3B Model)

These findings are from hands-on testing and WWDC25 research. They should inform all future prompt work.

### What works well

- **`@Generable` with `@Guide(description:)` for structured output.** The model reliably produces valid JSON matching the struct. `ParsedStatus` enum gives constrained decoding for free.
- **Few-shot examples are the single highest-impact technique.** Adding 2-3 input→output examples dramatically improved field accuracy vs. instructions alone.
- **"Empty string is the safe default" framing works.** Combined with few-shot examples showing empty fields, the model stopped hallucinating dates for simple inputs.
- **Project/area name validation in Rust catches hallucinations.** The model sometimes invents project/area names that don't exist; case-insensitive exact matching in Rust silently drops them.
- **Title generation is reliable.** The model consistently produces clean, concise titles.

### What doesn't work

- **Date arithmetic is unreliable.** "This Friday" from Wednesday March 25 → model returned March 30 (Monday, wrong). "Next Monday" → April 2 (Thursday, wrong). "End of the month" → October 31 (wrong month entirely). Apple explicitly says "avoid asking the model to act as a calculator."
- **Few-shot contamination.** When an input is similar to a few-shot example, the model copies fields from the example. "Submit Q1 tax return by April 15th" copied the body "Gather all receipts first" from the similar example — the input never mentioned receipts.
- **Project name fuzzy matching.** "Japan Trip" in the input didn't match "Japan Trip 2025" in the project list. The model returned empty rather than approximate-matching. Our Rust validation uses exact match only.
- **`@Guide(Regex{...})` breaks `@Generable`.** Regex constraints on date fields caused structured output to fail entirely, falling back to plain text. The `.default` model doesn't support regex-constrained generation well. Removed in favour of description-only guides.
- **`contentTagging` adapter is wrong for this task.** It's optimized for tag generation, not instruction-following. Produced topic tags ("task management, shopping") instead of following our field instructions.
- **Body generation for complex inputs.** The model sometimes fabricates body content not present in the input.

### Key principles for prompt iteration

1. **Positive framing outperforms negative.** "Set only if X is present" beats "Do NOT set unless X."
2. **Short instructions beat long ones.** Every token adds latency. Use `@Guide` for per-field constraints and the prompt for high-level guidance.
3. **Chain-of-thought HURTS models under ~10B.** Don't ask the model to reason step-by-step.
4. **Few-shot examples need to be distinct from likely inputs** to avoid contamination.
5. **Structural constraints (enums, `@Guide(.anyOf)`) are stronger than description text** — but `.anyOf` needs compile-time values, limiting use for dynamic lists.

## Next Steps

### Phase 5: Evaluation Harness ✅

Done. 31 test cases in `commands/ai.rs` covering simple inputs, project/area matching, date extraction, status detection, complex dictation, and hallucination traps. Run with:

```
cd tdn-desktop/src-tauri && cargo test eval_ai --lib -- --ignored --nocapture
```

Current baseline: **16/31 passing**.
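
For orientation, the scoring side of such a harness might look like the sketch below. The names (`EvalCase`, `Output`, `case_passes`, `baseline`) are hypothetical and only the title and body fields are modelled; the optional body check follows the commit "make body check optional", but the real harness in `commands/ai.rs` may be structured differently.

```rust
/// One evaluation case (hypothetical shape).
pub struct EvalCase {
    pub input: &'static str,
    pub expected_title: &'static str,
    /// None means the body is not checked for this case.
    pub expected_body: Option<&'static str>,
}

/// The fields of a model result that this sketch scores.
pub struct Output {
    pub title: String,
    pub body: String,
}

/// A case passes when the title matches and, if requested, the body matches too.
pub fn case_passes(case: &EvalCase, out: &Output) -> bool {
    let title_ok = out.title == case.expected_title;
    let body_ok = case.expected_body.map_or(true, |b| out.body == b);
    title_ok && body_ok
}

/// Summarise results in the "N/M passing" form used for the baseline.
pub fn baseline(results: &[bool]) -> String {
    let passed = results.iter().filter(|r| **r).count();
    format!("{}/{} passing", passed, results.len())
}
```

Because the model output is non-deterministic, tests like these are typically marked `#[ignore]`, which is why the command above passes `--ignored`.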

### Phase 6: Auto-Ready on Quick Entry (non-AI, cherry-pickable) ✅

Done. `useEffect` in `QuickPaneApp.tsx` watches `[projectId, areaId, scheduled, deferUntil]` and promotes `inbox` → `ready` when `(project OR area) AND (scheduled OR defer)` are set. Standalone commit, cherry-pickable.
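
The promotion rule itself is a small pure predicate. The real implementation is a React `useEffect` in `QuickPaneApp.tsx`; the Rust below is only a sketch to make the logic explicit, and the `Status` enum and `auto_ready` function are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Inbox,
    Ready,
    Blocked,
    Icebox,
    InProgress,
}

/// Promote inbox -> ready when (project OR area) AND (scheduled OR defer) are set.
/// Any other status is returned unchanged.
pub fn auto_ready(
    status: Status,
    has_project: bool,
    has_area: bool,
    has_scheduled: bool,
    has_defer: bool,
) -> Status {
    let placed = has_project || has_area;
    let timed = has_scheduled || has_defer;
    if status == Status::Inbox && placed && timed {
        Status::Ready
    } else {
        status
    }
}
```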

### Phase 7: Deterministic Status for AI Processing ✅

Done. Status removed from `@Generable` struct (8→7 fields) and prompt. Status now determined by:
- Keyword detection in Rust: `blocked` / `waiting on` / `waitingon` → blocked; `icebox` / `ice box` / `ice-box` → icebox; `in progress` / `in-progress` / `inprogress` → in-progress; everything else → inbox
- Auto-ready Rule 1 (all quick entry): `useEffect` promotes inbox → ready when (project OR area) AND (scheduled OR defer) are set
- Auto-ready Rule 2 (AI only): if scheduled within 7 days and status is inbox → ready
- 8 unit tests for keyword detection in the normal test suite
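
A rough sketch of the keyword rules and Rule 2. Both function names are hypothetical. The sketch uses plain substring matching, which would misfire on words such as "unblocked"; the real detection was tightened to reduce false positives and is presumably stricter. The precedence between keyword groups and the inclusive 0 to 7 day window are also assumptions.

```rust
/// Map free-form input to a status via keyword detection (naive substring version).
pub fn detect_status(input: &str) -> &'static str {
    let text = input.to_lowercase();
    let has = |keys: &[&str]| keys.iter().any(|k| text.contains(*k));

    if has(&["blocked", "waiting on", "waitingon"]) {
        "blocked"
    } else if has(&["icebox", "ice box", "ice-box"]) {
        "icebox"
    } else if has(&["in progress", "in-progress", "inprogress"]) {
        "in-progress"
    } else {
        "inbox"
    }
}

/// Auto-ready Rule 2: an inbox task scheduled within 7 days becomes ready.
/// `days_until_scheduled` is assumed to be computed from local calendar dates.
pub fn promote_if_near_term<'a>(status: &'a str, days_until_scheduled: i64) -> &'a str {
    if status == "inbox" && (0..=7).contains(&days_until_scheduled) {
        "ready"
    } else {
        status
    }
}
```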

Also fixed few-shot contamination: replaced Q1 tax return example (which leaked "Gather all receipts first" into responses) with Newsletter Setup example.

### Phase 8: Deterministic Date and Project/Area Resolution ✅

Done. The LLM now extracts raw date expressions ("tomorrow", "next Monday", "end of March") instead of computing YYYY-MM-DD dates. Rust resolves them deterministically via the `fuzzydate` crate with custom handlers for patterns fuzzydate doesn't support natively.

**Date resolution (`ai_resolve.rs`):**
- `fuzzydate::parse_relative_to()` handles: today, tomorrow, day names, "this/next [day]", "Month Day" format
- Custom handlers for: "end of [month]", "end of the month", "in N weeks/days", ordinal suffixes ("15th" → "15"), "on/by [day]" prefix stripping
- Falls back to None if unparseable — user sets date manually
- 19 unit tests (deterministic, normal test suite)
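
The custom handlers are mostly string preprocessing around the crate. The sketch below illustrates three of them using only the standard library; it does not call `fuzzydate`, all function names are hypothetical, and details such as handling digits only (not "three weeks") are simplifying assumptions.

```rust
/// Strip a leading "on "/"by " and ordinal suffixes: "by April 15th" -> "april 15".
pub fn normalize_date_expr(expr: &str) -> String {
    let lower = expr.trim().to_lowercase();
    let rest = lower
        .strip_prefix("on ")
        .or_else(|| lower.strip_prefix("by "))
        .unwrap_or(lower.as_str());
    rest.split_whitespace()
        .map(strip_ordinal)
        .collect::<Vec<_>>()
        .join(" ")
}

/// "15th" -> "15"; tokens that are not number + suffix are returned unchanged.
fn strip_ordinal(token: &str) -> &str {
    for suffix in ["st", "nd", "rd", "th"] {
        if let Some(num) = token.strip_suffix(suffix) {
            if !num.is_empty() && num.chars().all(|c| c.is_ascii_digit()) {
                return num;
            }
        }
    }
    token
}

/// "in 3 weeks" -> Some(21), "in 2 days" -> Some(2); anything else -> None.
pub fn relative_offset_days(expr: &str) -> Option<i64> {
    let lower = expr.trim().to_lowercase();
    let mut parts = lower.strip_prefix("in ")?.split_whitespace();
    let n: i64 = parts.next()?.parse().ok()?;
    let unit = parts.next()?;
    if parts.next().is_some() {
        return None; // trailing words: leave it to other handlers
    }
    match unit {
        "day" | "days" => Some(n),
        "week" | "weeks" => Some(n * 7),
        _ => None,
    }
}

/// Last day of a month, for "end of [month]" style expressions.
pub fn last_day_of_month(year: i32, month: u32) -> Option<u32> {
    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => Some(31),
        4 | 6 | 9 | 11 => Some(30),
        2 => Some(if leap { 29 } else { 28 }),
        _ => None,
    }
}
```

Returning `None` for anything unrecognised matches the fallback above: the date field stays empty and the user sets it manually.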

**Project/area matching (`ai_resolve.rs`):**
- Case-insensitive exact match first, then substring match (min 3 chars)
- "Japan Trip" now matches "Japan Trip 2025" via substring
- Bidirectional: matches if the query is contained in the name or the name is contained in the query
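
A sketch of the two-pass matching described above. The function name `match_name` is hypothetical, and applying the 3-character minimum to the shorter of the two strings is an assumption about how "min 3 chars" is enforced.

```rust
/// Resolve a name returned by the LLM against the known project/area names.
pub fn match_name<'a>(query: &str, names: &[&'a str]) -> Option<&'a str> {
    let q = query.trim().to_lowercase();
    if q.is_empty() {
        return None;
    }

    // Pass 1: case-insensitive exact match.
    if let Some(hit) = names.iter().copied().find(|n| n.to_lowercase() == q) {
        return Some(hit);
    }

    // Pass 2: substring match in either direction, ignoring very short strings.
    names.iter().copied().find(|n| {
        let name = n.to_lowercase();
        let shorter = q.chars().count().min(name.chars().count());
        shorter >= 3 && (name.contains(q.as_str()) || q.contains(name.as_str()))
    })
}
```

This version returns the first substring hit in list order; if several projects share a prefix, a real implementation would need a tie-breaking rule.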

**What works well now:**
- Relative dates: "tomorrow" ✓, "this Friday" ✓, "next Monday" ✓
- Absolute dates: "April 15th" ✓, "June 1st" ✓
- End-of-month: "end of March" ✓, "end of the month" ✓
- Deadline detection: "due by Friday" ✓, "deadline is June 1st" ✓

**Remaining failures (15/31):**
- LLM sometimes returns empty for date refs despite clear language ("this afternoon", "tomorrow morning", "end of next week") — the model inconsistently extracts expressions
- LLM sometimes returns empty for project names even when explicitly mentioned — fuzzy matching helps when the LLM returns a name, but can't help when it returns empty
- LLM fills in parent area when only project should be set (hallucination)
- These are prompt refinement problems, not resolution problems

### Phase 9: Prompt Refinement ✅

Iterate on the system prompt and few-shot examples to improve the LLM's extraction reliability. The eval harness (`cargo test eval_ai --lib -- --ignored --nocapture` from `src-tauri/`) makes this a fast feedback loop — edit `ai_prompts.rs`, rebuild, run eval, compare results.

Key areas to improve:
- LLM not extracting date refs when they're present ("this afternoon" → empty, "tomorrow morning" → empty)
- LLM not returning project names even when explicitly mentioned in input
- LLM hallucinating area when only project is referenced (fills in parent area)
- Consider whether additional few-shot examples showing date ref extraction would help

### Phase 10: Polish and Edge Cases ✅

- Re-processing support (user processes, edits title, processes again)
- Cancellation during processing (Escape while LLM is running)

### Phase 11: Docs ✅

Done. Updated:
- `tdn-desktop/docs/developer/quick-panes.md` — added AI shortcut, auto-ready, and Apple Intelligence integration sections
- `tdn-desktop/docs/developer/apple-intelligence.md` — updated eval baseline, added async/spawn_blocking note
- `website/src/content/docs/desktop/quick-entry-pane.mdx` — added auto-ready and AI processing sections
- `website/src/content/docs/reference/desktop-reference/keyboard-shortcuts.mdx` — added `Cmd+Shift+A` shortcut