diff --git a/AGENTS.md b/AGENTS.md index 6946d441..3f55d5d8 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -19,7 +19,8 @@ All API keys live on a Cloudflare Worker proxy — nothing sensitive ships in th - **Text-to-Speech**: ElevenLabs (`eleven_flash_v2_5` model) via Cloudflare Worker proxy - **Screen Capture**: ScreenCaptureKit (macOS 14.2+), multi-monitor support - **Voice Input**: Push-to-talk via `AVAudioEngine` + pluggable transcription-provider layer. System-wide keyboard shortcut via listen-only CGEvent tap. -- **Element Pointing**: Claude embeds `[POINT:x,y:label:screenN]` tags in responses. The overlay parses these, maps coordinates to the correct monitor, and animates the blue cursor along a bezier arc to the target. +- **Input Modes**: Two interchangeable modes selected in the menu bar panel — voice (push-to-talk dictation, default) or text (floating green pill near the cursor for typed prompts). Same Claude → screenshot → response pipeline either way. +- **Element Pointing**: Claude embeds `[POINT:x,y:label:screenN]` tags in responses. The blue cursor is hidden during normal chat — when a tag is parsed, the cursor fades in on the target screen, flies along a bezier arc to point at the element, then fades back out. - **Concurrency**: `@MainActor` isolation, async/await throughout - **Analytics**: PostHog via `ClickyAnalytics.swift` @@ -27,11 +28,11 @@ All API keys live on a Cloudflare Worker proxy — nothing sensitive ships in th The app never calls external APIs directly. All requests go through a Cloudflare Worker (`worker/src/index.ts`) that holds the real API keys as secrets. -| Route | Upstream | Purpose | -|-------|----------|---------| -| `POST /chat` | `api.anthropic.com/v1/messages` | Claude vision + streaming chat | -| `POST /tts` | `api.elevenlabs.io/v1/text-to-speech/{voiceId}` | ElevenLabs TTS audio | -| `POST /transcribe-token` | `streaming.assemblyai.com/v3/token` | Fetches a short-lived (480s) AssemblyAI websocket token | +| Route | Upstream | Purpose | +| ------------------------ | ----------------------------------------------- | ------------------------------------------------------- | +| `POST /chat` | `api.anthropic.com/v1/messages` | Claude vision + streaming chat | +| `POST /tts` | `api.elevenlabs.io/v1/text-to-speech/{voiceId}` | ElevenLabs TTS audio | +| `POST /transcribe-token` | `streaming.assemblyai.com/v3/token` | Fetches a short-lived (480s) AssemblyAI websocket token | Worker secrets: `ANTHROPIC_API_KEY`, `ASSEMBLYAI_API_KEY`, `ELEVENLABS_API_KEY` Worker vars: `ELEVENLABS_VOICE_ID` @@ -48,33 +49,39 @@ Worker vars: `ELEVENLABS_VOICE_ID` **Transient Cursor Mode**: When "Show Clicky" is off, pressing the hotkey fades in the cursor overlay for the duration of the interaction (recording → response → TTS → optional pointing), then fades it out automatically after 1 second of inactivity. +**Input Modes (Voice vs Text)**: The user chooses one of two input modes via a picker in the menu bar panel — voice (push-to-talk dictation) or text (typed prompt). The same `ctrl + option` hotkey serves both modes: in voice mode it is press-and-hold for dictation, in text mode a tap summons the floating green chat bubble near the mouse cursor. Both paths terminate at `sendTranscriptToClaudeWithScreenshot(transcript:)`, so the entire downstream pipeline (screenshot capture, Claude streaming, TTS, pointing) is preserved regardless of mode. 
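+
+Both modes funnel into one entry point. A condensed view of the text-mode half (see `CompanionManager.submitTextInput` later in this diff; the full method also records analytics and the pending prompt):
+
+```swift
+// Text mode: trim the typed prompt, then hand it to the shared pipeline,
+// the exact same call the voice path makes with a finalized transcript.
+func submitTextInput(_ text: String) {
+    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
+    guard !trimmed.isEmpty else { return }
+    sendTranscriptToClaudeWithScreenshot(transcript: trimmed)
+}
+```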
+ +**Cursor Summon-on-Demand**: The blue cursor overlay is hidden by default and only revealed in three cases — during onboarding, during a voice push-to-talk interaction (so the user sees the waveform and spinner), and when Claude's response includes a `[POINT:x,y:label:screenN]` tag. Visibility is driven by `CompanionManager.isCursorTriangleVisible` together with `revealCursorTriangle()` / `hideCursorTriangle()`. The full-screen overlay `NSPanel`s on every connected monitor stay alive across visibility transitions so a `[POINT:]` flight to any screen can reveal the cursor instantly without rebuilding panels (rebuilding would race with the multi-monitor coordinate mapping). + ## Key Files -| File | Lines | Purpose | -|------|-------|---------| -| `leanring_buddyApp.swift` | ~89 | Menu bar app entry point. Uses `@NSApplicationDelegateAdaptor` with `CompanionAppDelegate` which creates `MenuBarPanelManager` and starts `CompanionManager`. No main window — the app lives entirely in the status bar. | -| `CompanionManager.swift` | ~1026 | Central state machine. Owns dictation, shortcut monitoring, screen capture, Claude API, ElevenLabs TTS, and overlay management. Tracks voice state (idle/listening/processing/responding), conversation history, model selection, and cursor visibility. Coordinates the full push-to-talk → screenshot → Claude → TTS → pointing pipeline. | -| `MenuBarPanelManager.swift` | ~243 | NSStatusItem + custom NSPanel lifecycle. Creates the menu bar icon, manages the floating companion panel (show/hide/position), installs click-outside-to-dismiss monitor. | -| `CompanionPanelView.swift` | ~761 | SwiftUI panel content for the menu bar dropdown. Shows companion status, push-to-talk instructions, model picker (Sonnet/Opus), permissions UI, DM feedback button, and quit button. Dark aesthetic using `DS` design system. | -| `OverlayWindow.swift` | ~881 | Full-screen transparent overlay hosting the blue cursor, response text, waveform, and spinner. Handles cursor animation, element pointing with bezier arcs, multi-monitor coordinate mapping, and fade-out transitions. | -| `CompanionResponseOverlay.swift` | ~217 | SwiftUI view for the response text bubble and waveform displayed next to the cursor in the overlay. | -| `CompanionScreenCaptureUtility.swift` | ~132 | Multi-monitor screenshot capture using ScreenCaptureKit. Returns labeled image data for each connected display. | -| `BuddyDictationManager.swift` | ~866 | Push-to-talk voice pipeline. Handles microphone capture via `AVAudioEngine`, provider-aware permission checks, keyboard/button dictation sessions, transcript finalization, shortcut parsing, contextual keyterms, and live audio-level reporting for waveform feedback. | -| `BuddyTranscriptionProvider.swift` | ~100 | Protocol surface and provider factory for voice transcription backends. Resolves provider based on `VoiceTranscriptionProvider` in Info.plist — AssemblyAI, OpenAI, or Apple Speech. | -| `AssemblyAIStreamingTranscriptionProvider.swift` | ~478 | Streaming transcription provider. Fetches temp tokens from the Cloudflare Worker, opens an AssemblyAI v3 websocket, streams PCM16 audio, tracks turn-based transcripts, and delivers finalized text on key-up. Shares a single URLSession across all sessions. | -| `OpenAIAudioTranscriptionProvider.swift` | ~317 | Upload-based transcription provider. Buffers push-to-talk audio locally, uploads as WAV on release, returns finalized transcript. 
| -| `AppleSpeechTranscriptionProvider.swift` | ~147 | Local fallback transcription provider backed by Apple's Speech framework. | -| `BuddyAudioConversionSupport.swift` | ~108 | Audio conversion helpers. Converts live mic buffers to PCM16 mono audio and builds WAV payloads for upload-based providers. | -| `GlobalPushToTalkShortcutMonitor.swift` | ~132 | System-wide push-to-talk monitor. Owns the listen-only `CGEvent` tap and publishes press/release transitions. | -| `ClaudeAPI.swift` | ~291 | Claude vision API client with streaming (SSE) and non-streaming modes. TLS warmup optimization, image MIME detection, conversation history support. | -| `OpenAIAPI.swift` | ~142 | OpenAI GPT vision API client. | -| `ElevenLabsTTSClient.swift` | ~81 | ElevenLabs TTS client. Sends text to the Worker proxy, plays back audio via `AVAudioPlayer`. Exposes `isPlaying` for transient cursor scheduling. | -| `ElementLocationDetector.swift` | ~335 | Detects UI element locations in screenshots for cursor pointing. | -| `DesignSystem.swift` | ~880 | Design system tokens — colors, corner radii, shared styles. All UI references `DS.Colors`, `DS.CornerRadius`, etc. | -| `ClickyAnalytics.swift` | ~121 | PostHog analytics integration for usage tracking. | -| `WindowPositionManager.swift` | ~262 | Window placement logic, Screen Recording permission flow, and accessibility permission helpers. | -| `AppBundleConfiguration.swift` | ~28 | Runtime configuration reader for keys stored in the app bundle Info.plist. | -| `worker/src/index.ts` | ~142 | Cloudflare Worker proxy. Three routes: `/chat` (Claude), `/tts` (ElevenLabs), `/transcribe-token` (AssemblyAI temp token). | +| File | Lines | Purpose | +| ------------------------------------------------ | ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `leanring_buddyApp.swift` | ~89 | Menu bar app entry point. Uses `@NSApplicationDelegateAdaptor` with `CompanionAppDelegate` which creates `MenuBarPanelManager` and starts `CompanionManager`. No main window — the app lives entirely in the status bar. | +| `CompanionManager.swift` | ~1190 | Central state machine. Owns dictation, shortcut monitoring, screen capture, Claude API, ElevenLabs TTS, and overlay management. Tracks voice state (idle/listening/processing/responding), conversation history, model selection, input mode (voice vs text), and cursor visibility. Coordinates the full push-to-talk → screenshot → Claude → TTS → pointing pipeline. | +| `MenuBarPanelManager.swift` | ~243 | NSStatusItem + custom NSPanel lifecycle. Creates the menu bar icon, manages the floating companion panel (show/hide/position), installs click-outside-to-dismiss monitor. | +| `CompanionPanelView.swift` | ~761 | SwiftUI panel content for the menu bar dropdown. Shows companion status, push-to-talk instructions, model picker (Sonnet/Opus), permissions UI, DM feedback button, and quit button. Dark aesthetic using `DS` design system. | +| `OverlayWindow.swift` | ~881 | Full-screen transparent overlay hosting the blue cursor, response text, waveform, and spinner. Handles cursor animation, element pointing with bezier arcs, multi-monitor coordinate mapping, and fade-out transitions. 
| +| `CompanionResponseOverlay.swift` | ~217 | SwiftUI view for the response text bubble and waveform displayed next to the cursor in the overlay. | +| `CompanionScreenCaptureUtility.swift` | ~132 | Multi-monitor screenshot capture using ScreenCaptureKit. Returns labeled image data for each connected display. | +| `BuddyDictationManager.swift` | ~866 | Push-to-talk voice pipeline. Handles microphone capture via `AVAudioEngine`, provider-aware permission checks, keyboard/button dictation sessions, transcript finalization, shortcut parsing, contextual keyterms, and live audio-level reporting for waveform feedback. | +| `BuddyTranscriptionProvider.swift` | ~100 | Protocol surface and provider factory for voice transcription backends. Resolves provider based on `VoiceTranscriptionProvider` in Info.plist — AssemblyAI, OpenAI, or Apple Speech. | +| `AssemblyAIStreamingTranscriptionProvider.swift` | ~478 | Streaming transcription provider. Fetches temp tokens from the Cloudflare Worker, opens an AssemblyAI v3 websocket, streams PCM16 audio, tracks turn-based transcripts, and delivers finalized text on key-up. Shares a single URLSession across all sessions. | +| `OpenAIAudioTranscriptionProvider.swift` | ~317 | Upload-based transcription provider. Buffers push-to-talk audio locally, uploads as WAV on release, returns finalized transcript. | +| `AppleSpeechTranscriptionProvider.swift` | ~147 | Local fallback transcription provider backed by Apple's Speech framework. | +| `BuddyAudioConversionSupport.swift` | ~108 | Audio conversion helpers. Converts live mic buffers to PCM16 mono audio and builds WAV payloads for upload-based providers. | +| `GlobalPushToTalkShortcutMonitor.swift` | ~132 | System-wide push-to-talk monitor. Owns the listen-only `CGEvent` tap and publishes press/release transitions. | +| `ChatInputBubbleManager.swift` | ~190 | Owns the floating chat bubble's borderless `NSPanel`. Positions near the mouse cursor clamped to the active screen's visible frame, hosts `ChatInputBubbleView` via `NSHostingView`, installs a global click-outside dismiss monitor (mirrors `MenuBarPanelManager`). Used only when input mode is `.text`. | +| `ChatInputBubbleView.swift` | ~80 | SwiftUI green capsule input ("Say something" placeholder) shown when the user taps the push-to-talk hotkey in text mode. Auto-focuses on appear via `@FocusState`, return-key submits to `CompanionManager.submitTextInput`, escape dismisses. | +| `ClaudeAPI.swift` | ~291 | Claude vision API client with streaming (SSE) and non-streaming modes. TLS warmup optimization, image MIME detection, conversation history support. | +| `OpenAIAPI.swift` | ~142 | OpenAI GPT vision API client. | +| `ElevenLabsTTSClient.swift` | ~81 | ElevenLabs TTS client. Sends text to the Worker proxy, plays back audio via `AVAudioPlayer`. Exposes `isPlaying` for transient cursor scheduling. | +| `ElementLocationDetector.swift` | ~335 | Detects UI element locations in screenshots for cursor pointing. | +| `DesignSystem.swift` | ~880 | Design system tokens — colors, corner radii, shared styles. All UI references `DS.Colors`, `DS.CornerRadius`, etc. | +| `ClickyAnalytics.swift` | ~121 | PostHog analytics integration for usage tracking. | +| `WindowPositionManager.swift` | ~262 | Window placement logic, Screen Recording permission flow, and accessibility permission helpers. | +| `AppBundleConfiguration.swift` | ~28 | Runtime configuration reader for keys stored in the app bundle Info.plist. | +| `worker/src/index.ts` | ~142 | Cloudflare Worker proxy. 
Three routes: `/chat` (Claude), `/tts` (ElevenLabs), `/transcribe-token` (AssemblyAI temp token). | ## Build & Run diff --git a/leanring-buddy.xcodeproj/project.pbxproj b/leanring-buddy.xcodeproj/project.pbxproj index 75e57261..bda3230f 100644 --- a/leanring-buddy.xcodeproj/project.pbxproj +++ b/leanring-buddy.xcodeproj/project.pbxproj @@ -408,10 +408,11 @@ ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon; ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor; CODE_SIGN_ENTITLEMENTS = "leanring-buddy/leanring-buddy.entitlements"; + "CODE_SIGN_IDENTITY[sdk=macosx*]" = "Apple Development"; CODE_SIGN_STYLE = Automatic; COMBINE_HIDPI_IMAGES = YES; CURRENT_PROJECT_VERSION = 1; - DEVELOPMENT_TEAM = 2UDAY4J48G; + DEVELOPMENT_TEAM = Y4SXF5YHY6; ENABLE_APP_SANDBOX = NO; ENABLE_HARDENED_RUNTIME = YES; ENABLE_OUTGOING_NETWORK_CONNECTIONS = YES; @@ -446,10 +447,11 @@ ASSETCATALOG_COMPILER_APPICON_NAME = AppIcon; ASSETCATALOG_COMPILER_GLOBAL_ACCENT_COLOR_NAME = AccentColor; CODE_SIGN_ENTITLEMENTS = "leanring-buddy/leanring-buddy.entitlements"; + "CODE_SIGN_IDENTITY[sdk=macosx*]" = "Apple Development"; CODE_SIGN_STYLE = Automatic; COMBINE_HIDPI_IMAGES = YES; CURRENT_PROJECT_VERSION = 1; - DEVELOPMENT_TEAM = 2UDAY4J48G; + DEVELOPMENT_TEAM = Y4SXF5YHY6; ENABLE_APP_SANDBOX = NO; ENABLE_HARDENED_RUNTIME = YES; ENABLE_OUTGOING_NETWORK_CONNECTIONS = YES; diff --git a/leanring-buddy/ChatInputBubbleManager.swift b/leanring-buddy/ChatInputBubbleManager.swift new file mode 100644 index 00000000..d5ccf26c --- /dev/null +++ b/leanring-buddy/ChatInputBubbleManager.swift @@ -0,0 +1,291 @@ +// +// ChatInputBubbleManager.swift +// leanring-buddy +// +// Owns the floating chat input bubble's `NSPanel` lifecycle. The panel is +// borderless, transparent, and non-activating so it doesn't steal focus +// from the user's current app — it merely accepts key input while visible. +// +// Mirrors the panel style used by `MenuBarPanelManager`: a `KeyablePanel` +// subclass overrides `canBecomeKey` so the embedded SwiftUI text field can +// receive focus, and a global click-outside monitor dismisses the panel +// when the user clicks anywhere else. +// +// Position: the bubble is anchored to the AI cursor, which itself sits at +// `mouseLocation + (35, -25)` in AppKit screen coords (matching the offset +// used in `OverlayWindow.BlueCursorView`). The bubble's tail points at the +// cursor, and a 60fps timer re-positions the panel as the user moves their +// mouse so the bubble follows the cursor in real time. +// + +import AppKit +import SwiftUI + +/// `NSPanel` subclass whose `canBecomeKey` returns true so the SwiftUI +/// `TextField` inside the chat bubble can become first responder despite +/// the panel being a non-activating floating panel. This is the same +/// trick used by `MenuBarPanelManager.KeyablePanel`. +private final class ChatInputBubblePanel: NSPanel { + override var canBecomeKey: Bool { true } +} + +@MainActor +final class ChatInputBubbleManager: NSObject { + private var panel: ChatInputBubblePanel? + private var clickOutsideMonitor: Any? + /// Local key-event monitor that catches the Escape key while the + /// bubble is the key window. SwiftUI's `.keyboardShortcut(.cancelAction)` + /// is unreliable inside non-activating NSPanels, so we intercept the + /// raw key event here. + private var escapeKeyMonitor: Any? + /// Timer that re-positions the panel as the mouse moves, so the bubble + /// follows the cursor like the AI cursor itself does. + private var mouseTrackingTimer: Timer? 
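+
+    // Caller-side sketch (condensed from CompanionManager's text-mode
+    // handler later in this diff) showing how the bubble is summoned:
+    //
+    //     chatInputBubbleManager.showBubble(
+    //         onSubmit: { [weak self] text in self?.submitTextInput(text) },
+    //         onCancel: { /* nothing to do, the bubble is already dismissed */ }
+    //     )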
+ + /// Bubble dimensions — sized to fit the bubble's natural padded + /// content exactly, with no extra buffer around it. The bubble is + /// flat (no drop shadow) so it doesn't need extra panel margin to + /// avoid clipping at the panel edges. + private let panelWidth: CGFloat = 280 + private let panelHeight: CGFloat = 44 + + /// Horizontal offset from the mouse to the AI cursor's center, mirroring + /// `OverlayWindow.BlueCursorView`'s `swiftUIPosition.x + 35`. + private let cursorOffsetXFromMouse: CGFloat = 35 + /// Vertical offset from the mouse to the AI cursor's center. The mouse- + /// to-cursor offset in screen-local SwiftUI coords is `+25` (downward), + /// which in AppKit (y-up) is `-25`. + private let cursorOffsetYFromMouseInAppKit: CGFloat = -25 + /// Half-width of the cursor triangle's bounding frame (16x16, so 8). + private let cursorHalfWidth: CGFloat = 8 + /// Small visual gap between the cursor and the bubble's tail. + private let gapBetweenCursorAndBubble: CGFloat = 4 + /// Where on the bubble's vertical axis the tail's TIP sits (0 = top + /// of bubble, 1 = bottom). Must match `ChatBubbleShape.tailVerticalAnchor` + /// in `ChatInputBubbleView.swift`. The position formula below uses this + /// so that the tip — not the tail's geometric center — aligns with the + /// cursor's vertical center, which is what the user perceives as "the + /// tail is pointing at the cursor." + private let tailVerticalAnchor: CGFloat = 0.18 + + deinit { + if let monitor = clickOutsideMonitor { + NSEvent.removeMonitor(monitor) + } + if let monitor = escapeKeyMonitor { + NSEvent.removeMonitor(monitor) + } + mouseTrackingTimer?.invalidate() + } + + /// Shows the chat bubble next to the AI cursor. If the bubble is + /// already visible, this is a no-op (avoids stealing focus from a + /// user who's already mid-type). + func showBubble( + onSubmit: @escaping (String) -> Void, + onCancel: @escaping () -> Void + ) { + if let panel, panel.isVisible { + return + } + + let bubbleView = ChatInputBubbleView( + onSubmit: { [weak self] submittedText in + self?.hideBubble() + onSubmit(submittedText) + }, + onCancel: { [weak self] in + self?.hideBubble() + onCancel() + } + ) + .frame(width: panelWidth, height: panelHeight) + + let hostingView = NSHostingView(rootView: bubbleView) + hostingView.frame = NSRect(x: 0, y: 0, width: panelWidth, height: panelHeight) + hostingView.wantsLayer = true + hostingView.layer?.backgroundColor = .clear + + let bubblePanel = ChatInputBubblePanel( + contentRect: NSRect(x: 0, y: 0, width: panelWidth, height: panelHeight), + styleMask: [.borderless, .nonactivatingPanel], + backing: .buffered, + defer: false + ) + + bubblePanel.isFloatingPanel = true + bubblePanel.level = .floating + bubblePanel.isOpaque = false + bubblePanel.backgroundColor = .clear + bubblePanel.hasShadow = false + bubblePanel.hidesOnDeactivate = false + bubblePanel.isExcludedFromWindowsMenu = true + bubblePanel.collectionBehavior = [.canJoinAllSpaces, .fullScreenAuxiliary] + bubblePanel.isMovableByWindowBackground = false + bubblePanel.titleVisibility = .hidden + bubblePanel.titlebarAppearsTransparent = true + bubblePanel.contentView = hostingView + + // Position the panel at the current mouse location before ordering + // it in so it appears in the right place from frame zero. 
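+        // (`NSEvent.mouseLocation` is already in global AppKit screen
+        // coordinates: y-up, origin at the primary screen's bottom-left.
+        // That is the same space `setFrame(_:display:)` expects, so the
+        // positioning helper needs no coordinate conversion.)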
+ positionPanelToFollowCursor(bubblePanel, mouseLocation: NSEvent.mouseLocation) + + bubblePanel.makeKeyAndOrderFront(nil) + bubblePanel.orderFrontRegardless() + + panel = bubblePanel + installClickOutsideMonitor(onCancel: onCancel) + installEscapeKeyMonitor(onCancel: onCancel) + startMouseTracking() + } + + /// Dismisses the bubble immediately. Safe to call when no bubble is shown. + func hideBubble() { + stopMouseTracking() + panel?.orderOut(nil) + panel = nil + removeClickOutsideMonitor() + removeEscapeKeyMonitor() + } + + // MARK: - Position + + /// Places the panel so its tail points at the AI cursor (which itself + /// sits at `mouseLocation + (35, -25)` in AppKit coords). The bubble's + /// left edge sits just past the cursor's right edge with a small gap, + /// and the tail's vertical anchor aligns with the cursor's vertical + /// center so the tail looks like it's emerging directly from the cursor. + /// Clamps to the active screen's visible frame so the bubble never + /// lands off-screen, behind the menu bar, or under the dock. + private func positionPanelToFollowCursor(_ panelToPosition: NSPanel, mouseLocation: NSPoint) { + // Find which screen the mouse is currently on so we clamp against + // the right display in a multi-monitor setup. + let activeScreen = NSScreen.screens.first(where: { $0.frame.contains(mouseLocation) }) + ?? NSScreen.main + ?? NSScreen.screens.first + guard let activeScreen else { return } + + let visibleFrame = activeScreen.visibleFrame + + // Cursor center in AppKit coords (y-up). + let cursorCenterX = mouseLocation.x + cursorOffsetXFromMouse + let cursorCenterY = mouseLocation.y + cursorOffsetYFromMouseInAppKit + + // Place the panel's left edge just past the cursor's right edge + // with a small visual gap. The panel.minX is exactly where the + // bubble's tail tip sits in `ChatBubbleShape`, so this anchors the + // tail right at the cursor. + var panelOriginX = cursorCenterX + cursorHalfWidth + gapBetweenCursorAndBubble + + // Vertically position so the tail tip aligns with the cursor's + // vertical center. In AppKit, panel.top = panelOriginY + panelHeight, + // and the tail sits at `panelHeight * tailVerticalAnchor` below the + // panel's top. So tail Y = panelOriginY + panelHeight * (1 - tailVerticalAnchor). + // Setting that equal to cursorCenterY and solving: + var panelOriginY = cursorCenterY - panelHeight * (1.0 - tailVerticalAnchor) + + // Clamp horizontally — if the bubble would run off the right edge, + // place it to the LEFT of the cursor instead. (We don't bother + // flipping the tail in that case — the bubble still reads as a chat + // bubble; the tail just points the wrong way at edge cases.) + if panelOriginX + panelWidth > visibleFrame.maxX { + panelOriginX = max(visibleFrame.minX + 8, cursorCenterX - cursorHalfWidth - gapBetweenCursorAndBubble - panelWidth) + } + if panelOriginX < visibleFrame.minX { + panelOriginX = visibleFrame.minX + 8 + } + + // Clamp vertically + if panelOriginY < visibleFrame.minY { + panelOriginY = visibleFrame.minY + 8 + } + if panelOriginY + panelHeight > visibleFrame.maxY { + panelOriginY = visibleFrame.maxY - panelHeight - 8 + } + + panelToPosition.setFrame( + NSRect(x: panelOriginX, y: panelOriginY, width: panelWidth, height: panelHeight), + display: true + ) + } + + // MARK: - Mouse Tracking + + /// Starts a 60fps timer that keeps the bubble pinned next to the cursor + /// as the user moves their mouse. Mirrors the cursor-following behavior + /// in `OverlayWindow.BlueCursorView.startTrackingCursor()`. 
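+    /// Note: `Timer.scheduledTimer` registers on the main run loop in
+    /// `.default` mode only, so the bubble briefly stops following while
+    /// the run loop is in an event-tracking mode (open menus, window
+    /// drags) and resumes as soon as the tracking loop ends.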
+ private func startMouseTracking() { + stopMouseTracking() + + mouseTrackingTimer = Timer.scheduledTimer(withTimeInterval: 0.016, repeats: true) { [weak self] _ in + // Capture the manager weakly and the panel separately so the + // timer doesn't keep them alive after `hideBubble()`. + guard let self, let panel = self.panel else { return } + let currentMouseLocation = NSEvent.mouseLocation + self.positionPanelToFollowCursor(panel, mouseLocation: currentMouseLocation) + } + } + + private func stopMouseTracking() { + mouseTrackingTimer?.invalidate() + mouseTrackingTimer = nil + } + + // MARK: - Click Outside Dismissal + + /// Installs a global event monitor that dismisses the bubble when the + /// user clicks anywhere outside it. Same pattern used by + /// `MenuBarPanelManager.installClickOutsideMonitor()`. + private func installClickOutsideMonitor(onCancel: @escaping () -> Void) { + removeClickOutsideMonitor() + + clickOutsideMonitor = NSEvent.addGlobalMonitorForEvents( + matching: [.leftMouseDown, .rightMouseDown] + ) { [weak self] _ in + guard let self, let panel = self.panel else { return } + + let clickLocation = NSEvent.mouseLocation + if panel.frame.contains(clickLocation) { + return + } + + self.hideBubble() + onCancel() + } + } + + private func removeClickOutsideMonitor() { + if let monitor = clickOutsideMonitor { + NSEvent.removeMonitor(monitor) + clickOutsideMonitor = nil + } + } + + // MARK: - Escape Key Dismissal + + /// Installs a *local* key-event monitor (fires only when the bubble's + /// panel is the key window) that catches Escape and dismisses. macOS + /// keycode 53 is Escape. Returning `nil` from the monitor swallows + /// the event so the SwiftUI TextField doesn't also see it. + private func installEscapeKeyMonitor(onCancel: @escaping () -> Void) { + removeEscapeKeyMonitor() + + let escapeKeyCode: UInt16 = 53 + escapeKeyMonitor = NSEvent.addLocalMonitorForEvents(matching: .keyDown) { [weak self] keyEvent in + guard keyEvent.keyCode == escapeKeyCode else { + return keyEvent + } + self?.hideBubble() + onCancel() + // Returning nil consumes the event — the TextField never sees it. + return nil + } + } + + private func removeEscapeKeyMonitor() { + if let monitor = escapeKeyMonitor { + NSEvent.removeMonitor(monitor) + escapeKeyMonitor = nil + } + } +} diff --git a/leanring-buddy/ChatInputBubbleView.swift b/leanring-buddy/ChatInputBubbleView.swift new file mode 100644 index 00000000..7f75a45f --- /dev/null +++ b/leanring-buddy/ChatInputBubbleView.swift @@ -0,0 +1,155 @@ +// +// ChatInputBubbleView.swift +// leanring-buddy +// +// Floating chat bubble that appears next to the AI cursor when the user +// presses the push-to-talk hotkey while Clicky is in `.text` input mode. +// Visually styled to look like a speech bubble emerging from the cursor: +// a true capsule (corner radius = height/2) in the same blue as the +// cursor (DS.Colors.overlayCursorBlue), with a small triangular tail on +// the upper-left edge pointing back toward the cursor. +// +// Submission lifecycle: +// - Pressing return submits the trimmed text to `onSubmit` and dismisses. +// - Pressing escape (or clicking outside, handled by ChatInputBubbleManager) +// dismisses without submitting via `onCancel`. +// - The text field auto-focuses on appear so the user can start typing +// immediately without having to click the bubble first. +// + +import SwiftUI + +/// Speech-bubble shape: a capsule body (corner radius = body height / 2) +/// with a small triangular tail on the upper-left edge. 
+/// +/// Why the tip's y is intentionally aligned with the top body-anchor's y +/// (creating a horizontal upper edge): if the tip sat ABOVE the top +/// body-anchor, the tail's diagonal upper edge would slant up-and-left +/// from the body's curve to the tip. But above the body-anchor's y, the +/// capsule's outline curves outward (rightward) — so between the tail's +/// upper edge and the capsule's curve there's a wedge of empty space +/// that reads as a gap detaching the tail from the bubble. +/// +/// Aligning the tip with the top anchor's y makes the upper edge purely +/// horizontal, eliminating that wedge entirely. The tail then reads as a +/// small flag/pennant emerging from the bubble's upper-left, with all +/// edges either resting on the bubble's curve or extending cleanly out +/// to the tip. +struct ChatBubbleShape: Shape { + /// Where on the bubble's vertical axis the tail's TOP edge attaches + /// (0 = bubble top, 1 = bubble bottom). 0.18 puts the tail's top + /// fairly high on the bubble, with the tail extending downward and + /// outward from there. + var tailVerticalAnchor: CGFloat = 0.18 + /// How far the tail's tip sticks out from the bubble's left edge. + var tailWidth: CGFloat = 11 + /// Vertical extent of the tail (from top edge to the bottom anchor). + var tailHeight: CGFloat = 14 + + func path(in rect: CGRect) -> Path { + var path = Path() + + // Bubble body — capsule. Corner radius is exactly half the body's + // height so the left and right edges round all the way around. + // Inset from the left to leave room for the tail to extrude. + let bodyRect = CGRect( + x: rect.minX + tailWidth, + y: rect.minY, + width: rect.width - tailWidth, + height: rect.height + ) + let capsuleCornerRadius = bodyRect.height / 2.0 + path.addRoundedRect( + in: bodyRect, + cornerSize: CGSize(width: capsuleCornerRadius, height: capsuleCornerRadius), + style: .continuous + ) + + // Tail — flag/pennant shape. The tip and the top body-anchor share + // the same y so the upper edge is horizontal (no wedge of empty + // space above the tip). The bottom body-anchor is below them on + // the curve, with the tip-to-bottom edge slanting down-right. + let tailTopY = bodyRect.minY + (bodyRect.height * tailVerticalAnchor) + let tailBottomY = tailTopY + tailHeight + let tailTipX = rect.minX + let tailTipY = tailTopY + + let topAnchorX = leftEdgeX(at: tailTopY, in: bodyRect, cornerRadius: capsuleCornerRadius) + let bottomAnchorX = leftEdgeX(at: tailBottomY, in: bodyRect, cornerRadius: capsuleCornerRadius) + + path.move(to: CGPoint(x: tailTipX, y: tailTipY)) + path.addLine(to: CGPoint(x: topAnchorX, y: tailTopY)) + path.addLine(to: CGPoint(x: bottomAnchorX, y: tailBottomY)) + path.closeSubpath() + + return path + } + + /// Returns the x-coordinate of the body's left outline at a given y. + /// For a capsule (corner radius = half-height), every y on the left + /// side is within a corner curve, so this solves the circle equation + /// for the quarter-arc. Without this, anchoring at `bodyRect.minX` + /// places the tail outside the actual visible curve, leaving a gap. + private func leftEdgeX(at y: CGFloat, in bodyRect: CGRect, cornerRadius: CGFloat) -> CGFloat { + let centerY = bodyRect.midY + let yOffsetFromCenter = abs(y - centerY) + + // Outside the corner zone (won't happen for a true capsule but + // covered for safety in case future tweaks reduce cornerRadius). 
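+        // Worked example for the solve below, assuming the default
+        // 44pt-tall panel (r = 22): the tail's top anchor sits at
+        // y = 0.18 * 44 ≈ 7.9, so dy ≈ 14.1 and
+        // dx = sqrt(22² - 14.1²) ≈ 16.9, putting the visible left edge
+        // about 5.1pt right of `bodyRect.minX` at that height.
+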
+ if yOffsetFromCenter >= cornerRadius { + return bodyRect.minX + } + + // Solve circle equation: x² + (y - centerY)² = r² → x = sqrt(r² - dy²). + // The body's left edge at this y is `cornerRadius - x` to the right + // of `bodyRect.minX`. + let dx = sqrt(cornerRadius * cornerRadius - yOffsetFromCenter * yOffsetFromCenter) + return bodyRect.minX + (cornerRadius - dx) + } +} + +struct ChatInputBubbleView: View { + /// Called when the user presses return with non-empty trimmed text. + let onSubmit: (String) -> Void + /// Called when the user presses escape to dismiss without submitting. + let onCancel: () -> Void + + @State private var typedText: String = "" + @FocusState private var isTextFieldFocused: Bool + + var body: some View { + HStack(spacing: 0) { + TextField("Say something", text: $typedText) + .textFieldStyle(.plain) + .focused($isTextFieldFocused) + .font(.system(size: 15, weight: .semibold)) + .foregroundColor(.white) + .tint(.white) + .onSubmit { + submitIfNonEmpty() + } + } + // Extra padding on the leading edge so text sits clear of the tail. + .padding(.leading, 24) + .padding(.trailing, 22) + .padding(.vertical, 11) + .background( + ChatBubbleShape() + .fill(DS.Colors.overlayCursorBlue) + ) + .onAppear { + // Slight delay lets the hosting NSPanel finish becoming key + // before we request focus — without this the field can lose + // its first-responder status as the panel finishes ordering in. + DispatchQueue.main.asyncAfter(deadline: .now() + 0.05) { + isTextFieldFocused = true + } + } + } + + private func submitIfNonEmpty() { + let trimmed = typedText.trimmingCharacters(in: .whitespacesAndNewlines) + guard !trimmed.isEmpty else { return } + onSubmit(trimmed) + } +} diff --git a/leanring-buddy/ClaudeAPI.swift b/leanring-buddy/ClaudeAPI.swift index 0c7070b5..e93f7a51 100644 --- a/leanring-buddy/ClaudeAPI.swift +++ b/leanring-buddy/ClaudeAPI.swift @@ -13,11 +13,34 @@ class ClaudeAPI { private let apiURL: URL var model: String private let session: URLSession - + /// Optional Anthropic API key for direct calls. When set, requests go + /// to api.anthropic.com with `x-api-key` + `anthropic-version` headers + /// instead of through the Cloudflare Worker proxy. + private let directAPIKey: String? + + /// Initializer for proxy-based usage (Cloudflare Worker handles auth). + /// `proxyURL` is the full URL of the chat endpoint, e.g. + /// `https://your-worker.workers.dev/chat`. init(proxyURL: String, model: String = "claude-sonnet-4-6") { self.apiURL = URL(string: proxyURL)! self.model = model + self.directAPIKey = nil + self.session = Self.makeURLSession() + warmUpTLSConnectionIfNeeded() + } + + /// Initializer for direct-to-Anthropic usage with a user-supplied key. + /// Bypasses the Cloudflare Worker — the key is sent in the `x-api-key` + /// header on every request. + init(directAnthropicAPIKey apiKey: String, model: String = "claude-sonnet-4-6") { + self.apiURL = URL(string: "https://api.anthropic.com/v1/messages")! + self.model = model + self.directAPIKey = apiKey + self.session = Self.makeURLSession() + warmUpTLSConnectionIfNeeded() + } + private static func makeURLSession() -> URLSession { // Use .default instead of .ephemeral so TLS session tickets are cached. // Ephemeral sessions do a full TLS handshake on every request, which causes // transient -1200 (errSSLPeerHandshakeFail) errors with large image payloads. 
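The `warmUpTLSConnectionIfNeeded()` helper both inits call isn't part of this hunk. A minimal sketch of a plausible shape, going by the removed comment above: a fire-and-forget HEAD request (hypothetical, not the file's actual implementation):

```swift
// Hypothetical sketch: pre-establish the TLS connection in the background
// so the first real call (which carries a large image payload) reuses the
// cached TLS session ticket instead of paying for a cold handshake.
private func warmUpTLSConnectionIfNeeded() {
    var request = URLRequest(url: apiURL)
    request.httpMethod = "HEAD"
    request.timeoutInterval = 10
    session.dataTask(with: request) { _, _, _ in
        // The response is irrelevant; the handshake side effect is the point.
    }.resume()
}
```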
@@ -28,12 +51,7 @@ class ClaudeAPI { config.waitsForConnectivity = true config.urlCache = nil config.httpCookieStorage = nil - self.session = URLSession(configuration: config) - - // Fire a lightweight HEAD request in the background to pre-establish the TLS - // connection. This caches the TLS session ticket so the first real API call - // (which carries a large image payload) doesn't need a cold TLS handshake. - warmUpTLSConnectionIfNeeded() + return URLSession(configuration: config) } private func makeAPIRequest() -> URLRequest { @@ -41,6 +59,13 @@ class ClaudeAPI { request.httpMethod = "POST" request.timeoutInterval = 120 request.setValue("application/json", forHTTPHeaderField: "Content-Type") + // When using a direct API key, add Anthropic's required auth and + // version headers. The Cloudflare Worker injects these server-side + // when proxying, so we only set them in direct mode. + if let directAPIKey { + request.setValue(directAPIKey, forHTTPHeaderField: "x-api-key") + request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version") + } return request } diff --git a/leanring-buddy/CompanionManager.swift b/leanring-buddy/CompanionManager.swift index 0234cf19..f08d6e8d 100644 --- a/leanring-buddy/CompanionManager.swift +++ b/leanring-buddy/CompanionManager.swift @@ -21,6 +21,45 @@ enum CompanionVoiceState { case responding } +/// How the user submits prompts to Clicky. Either holds the push-to-talk +/// hotkey to dictate (.voice) or taps the same hotkey to summon a small +/// floating chat bubble and type (.text). Persisted to UserDefaults. +enum CompanionInputMode: String { + case voice + case text +} + +/// Which AI backend handles chat requests. +/// - `.clickyProxy` — default, route through the Cloudflare Worker which +/// holds the app's shared API keys (no setup needed for the user). +/// - `.userAnthropic` — bypass the proxy and call Anthropic directly with +/// the user's own `x-api-key`. Bills against their own Anthropic account. +/// - `.userOpenAI` — call OpenAI's chat completions endpoint directly with +/// the user's own bearer token. Uses GPT-class models instead of Claude. +/// In all three modes voice transcription and TTS still go through the +/// Worker (those use AssemblyAI / ElevenLabs, not the chat provider). +enum AIProvider: String { + case clickyProxy + case userAnthropic + case userOpenAI +} + +/// One turn in the chat history shown in the streaming response panel. +/// Each turn is a complete user prompt + the assistant's full response. +/// `id` is `Identifiable` for SwiftUI's `ForEach` so list updates animate +/// smoothly as new turns are appended during a follow-up exchange. 
+struct CompanionConversationTurn: Identifiable, Equatable { + let id: UUID + let userMessage: String + let assistantResponse: String + + init(userMessage: String, assistantResponse: String) { + self.id = UUID() + self.userMessage = userMessage + self.assistantResponse = assistantResponse + } +} + @MainActor final class CompanionManager: ObservableObject { @Published private(set) var voiceState: CompanionVoiceState = .idle @@ -65,24 +104,62 @@ final class CompanionManager: ObservableObject { let buddyDictationManager = BuddyDictationManager() let globalPushToTalkShortcutMonitor = GlobalPushToTalkShortcutMonitor() let overlayWindowManager = OverlayWindowManager() + let chatInputBubbleManager = ChatInputBubbleManager() + let streamingResponsePanelManager = StreamingResponsePanelManager() + let processingShimmerManager = ProcessingShimmerManager() // Response text is now displayed inline on the cursor overlay via // streamingResponseText, so no separate response overlay manager is needed. - /// Base URL for the Cloudflare Worker proxy. All API requests route - /// through this so keys never ship in the app binary. + /// Base URL for the Cloudflare Worker proxy. Used for chat (when + /// `aiProvider == .clickyProxy`), TTS (always), and voice + /// transcription tokens (always). User-supplied API keys only affect + /// the chat path — TTS and voice still go through the proxy. private static let workerBaseURL = "https://your-worker-name.your-subdomain.workers.dev" - private lazy var claudeAPI: ClaudeAPI = { - return ClaudeAPI(proxyURL: "\(Self.workerBaseURL)/chat", model: selectedModel) - }() + + /// Keychain account names for user-supplied API keys. + private static let anthropicKeychainAccountName = "userAnthropicAPIKey" + private static let openaiKeychainAccountName = "userOpenAIAPIKey" + + /// Active Claude client. Re-created when `aiProvider` switches between + /// `.clickyProxy` and `.userAnthropic`, or when the user updates their + /// Anthropic API key, or when the model changes. + private var claudeAPI: ClaudeAPI + + /// Active OpenAI client. Non-nil only when `aiProvider == .userOpenAI` + /// AND the user has entered an OpenAI key. Re-created when those + /// preconditions change. + private var openaiAPI: OpenAIAPI? private lazy var elevenLabsTTSClient: ElevenLabsTTSClient = { return ElevenLabsTTSClient(proxyURL: "\(Self.workerBaseURL)/tts") }() + init() { + // Seed `claudeAPI` with the proxy version so the property is + // initialized before any other code runs. `rebuildAPIs()` then + // immediately swaps to the right configuration based on the + // user's persisted provider choice and any stored API keys. + self.claudeAPI = ClaudeAPI(proxyURL: "\(Self.workerBaseURL)/chat", model: "claude-sonnet-4-6") + self.openaiAPI = nil + rebuildAPIs() + } + /// Conversation history so Claude remembers prior exchanges within a session. /// Each entry is the user's transcript and Claude's response. - private var conversationHistory: [(userTranscript: String, assistantResponse: String)] = [] + /// Completed turns of the current chat. `@Published` so the streaming + /// response panel re-renders the chat thread when new turns are + /// appended after a follow-up exchange completes. Cleared when the + /// user dismisses the chat with the X button. 
+ @Published private(set) var conversationHistory: [CompanionConversationTurn] = [] + + /// The user's message that's currently being processed by the AI but + /// hasn't been written to `conversationHistory` yet (the history is + /// only updated after the response completes). Set by `submitTextInput`, + /// cleared at the end of the response pipeline. The chat panel uses + /// this to render the user's prompt immediately on send so the chat + /// doesn't look frozen during the processing window. + @Published private(set) var pendingUserPrompt: String? /// The currently running AI response task, if any. Cancelled when the user /// speaks again so a new response can begin immediately. @@ -90,6 +167,9 @@ final class CompanionManager: ObservableObject { private var shortcutTransitionCancellable: AnyCancellable? private var voiceStateCancellable: AnyCancellable? + /// Subscription that drives the screen-edge shimmer overlay on/off + /// based on whether the AI is currently processing a request. + private var shimmerStateCancellable: AnyCancellable? private var audioPowerCancellable: AnyCancellable? private var accessibilityCheckTimer: Timer? private var pendingKeyboardShortcutStartTask: Task? @@ -113,7 +193,12 @@ final class CompanionManager: ObservableObject { func setSelectedModel(_ model: String) { selectedModel = model UserDefaults.standard.set(model, forKey: "selectedClaudeModel") + // Update the in-flight client. Then rebuild so that if the user + // is in `.userAnthropic` mode the new model is picked up by a + // fresh ClaudeAPI instance (defensive — the current code path + // doesn't strictly need this, but it keeps state coherent). claudeAPI.model = model + rebuildAPIs() } /// User preference for whether the Clicky cursor should be shown. @@ -123,6 +208,185 @@ final class CompanionManager: ObservableObject { ? true : UserDefaults.standard.bool(forKey: "isClickyCursorEnabled") + /// How the user submits prompts. Defaults to `.voice` to preserve the + /// existing push-to-talk experience for users upgrading from older builds. + /// Persisted to UserDefaults so the choice survives app restarts. + @Published private(set) var inputMode: CompanionInputMode = { + let storedRawValue = UserDefaults.standard.string(forKey: "inputMode") ?? "" + return CompanionInputMode(rawValue: storedRawValue) ?? .voice + }() + + /// Which AI backend handles chat. Defaults to `.clickyProxy` so the + /// user has zero setup on first launch. + @Published private(set) var aiProvider: AIProvider = { + let storedRawValue = UserDefaults.standard.string(forKey: "aiProvider") ?? "" + return AIProvider(rawValue: storedRawValue) ?? .clickyProxy + }() + + /// Switches to a new chat backend, persists the choice, and rebuilds + /// the active API client(s). Safe to call repeatedly. + func setAIProvider(_ newProvider: AIProvider) { + aiProvider = newProvider + UserDefaults.standard.set(newProvider.rawValue, forKey: "aiProvider") + rebuildAPIs() + } + + /// User-supplied Anthropic API key, read from the macOS Keychain. + /// Returns the empty string when no key is stored. Read each time + /// rather than cached so the UI text field always reflects the + /// authoritative value. + var userAnthropicAPIKey: String { + KeychainHelper.readString(forAccount: Self.anthropicKeychainAccountName) ?? "" + } + + /// User-supplied OpenAI API key, read from the macOS Keychain. + /// Returns the empty string when no key is stored. + var userOpenAIAPIKey: String { + KeychainHelper.readString(forAccount: Self.openaiKeychainAccountName) ?? 
"" + } + + /// Saves a user-supplied Anthropic key to the Keychain (or deletes it + /// if the trimmed value is empty) and rebuilds the chat client so the + /// next request uses the new key. Triggers a manual `objectWillChange` + /// so any UI bound to this manager reflects the new "has key" state. + func setUserAnthropicAPIKey(_ key: String) { + let trimmed = key.trimmingCharacters(in: .whitespacesAndNewlines) + if trimmed.isEmpty { + KeychainHelper.deleteString(forAccount: Self.anthropicKeychainAccountName) + } else { + KeychainHelper.saveString(trimmed, forAccount: Self.anthropicKeychainAccountName) + } + objectWillChange.send() + rebuildAPIs() + } + + /// Saves a user-supplied OpenAI key to the Keychain (or deletes it + /// if the trimmed value is empty) and rebuilds the chat client. + func setUserOpenAIAPIKey(_ key: String) { + let trimmed = key.trimmingCharacters(in: .whitespacesAndNewlines) + if trimmed.isEmpty { + KeychainHelper.deleteString(forAccount: Self.openaiKeychainAccountName) + } else { + KeychainHelper.saveString(trimmed, forAccount: Self.openaiKeychainAccountName) + } + objectWillChange.send() + rebuildAPIs() + } + + /// Rebuilds the chat clients to match the current `aiProvider` and any + /// user-supplied API keys. Called from `init`, on provider changes, + /// when keys are saved/deleted, and when the model picker changes. + private func rebuildAPIs() { + switch aiProvider { + case .clickyProxy: + // Default: route through the Cloudflare Worker. No user key needed. + claudeAPI = ClaudeAPI(proxyURL: "\(Self.workerBaseURL)/chat", model: selectedModel) + openaiAPI = nil + + case .userAnthropic: + // If the user picked Anthropic but hasn't entered a key yet, + // fall back to the proxy so the app keeps working until they + // do. They'll see a "key required" hint in the panel UI. + let key = userAnthropicAPIKey + if key.isEmpty { + claudeAPI = ClaudeAPI(proxyURL: "\(Self.workerBaseURL)/chat", model: selectedModel) + } else { + claudeAPI = ClaudeAPI(directAnthropicAPIKey: key, model: selectedModel) + } + openaiAPI = nil + + case .userOpenAI: + // OpenAI mode keeps a proxy claudeAPI around as a safety + // fallback (e.g. if the user clears their key while in OpenAI + // mode), but the chat dispatch only uses openaiAPI when the + // key is present. + claudeAPI = ClaudeAPI(proxyURL: "\(Self.workerBaseURL)/chat", model: selectedModel) + let key = userOpenAIAPIKey + openaiAPI = key.isEmpty ? nil : OpenAIAPI(apiKey: key) + } + } + + func setInputMode(_ newInputMode: CompanionInputMode) { + // If the user switches away from voice while a push-to-talk recording + // is in progress, stop it cleanly so the audio engine doesn't keep + // running with no UI to surface its state. + if newInputMode != .voice && buddyDictationManager.isDictationInProgress { + pendingKeyboardShortcutStartTask?.cancel() + pendingKeyboardShortcutStartTask = nil + buddyDictationManager.stopPushToTalkFromKeyboardShortcut() + } + + // If the user switches away from text, dismiss anything still on + // screen from the text-mode flow: the input bubble, the chat + // panel, and the entire chat thread. Otherwise the user is left + // staring at orphaned UI from the old mode. 
+ if newInputMode != .text { + chatInputBubbleManager.hideBubble() + streamingResponseAutoHideTask?.cancel() + streamingResponsePanelManager.hide() + isStreamingResponseBubbleVisible = false + streamingResponseText = "" + pendingUserPrompt = nil + conversationHistory.removeAll() + } + + inputMode = newInputMode + UserDefaults.standard.set(newInputMode.rawValue, forKey: "inputMode") + } + + /// Whether the blue triangle cursor should be visible. Independent from + /// `isOverlayVisible` (which controls the overlay panels' existence) — + /// this is the per-frame opacity of the cursor itself. Default is hidden: + /// the cursor only appears during onboarding or when Claude returns a + /// `[POINT:...]` tag and physically points at something on screen. + @Published private(set) var isCursorTriangleVisible: Bool = false + + /// Reveals the cursor triangle (used when Claude points at something or + /// during onboarding). Idempotent — safe to call repeatedly. + func revealCursorTriangle() { + isCursorTriangleVisible = true + } + + /// Hides the cursor triangle. Called after a pointing animation completes + /// or when onboarding ends. + func hideCursorTriangle() { + isCursorTriangleVisible = false + } + + /// User-initiated dismissal of the chat panel — called by the X + /// close button. Cancels any in-flight response, hides the panel, and + /// clears the entire conversation so the next hotkey press starts a + /// fresh chat. (We deliberately drop `conversationHistory` on dismiss: + /// the user closed the chat to start over, not to suspend it.) + func dismissStreamingResponse() { + currentResponseTask?.cancel() + streamingResponseAutoHideTask?.cancel() + streamingResponsePanelManager.hide() + isStreamingResponseBubbleVisible = false + streamingResponseText = "" + pendingUserPrompt = nil + conversationHistory.removeAll() + } + + /// Claude's response text streamed character-by-character so the + /// overlay can render it as a chat bubble next to the (hidden) cursor + /// when input mode is `.text`. Empty string means no response is + /// being shown. Voice mode leaves this empty since responses are spoken + /// aloud via TTS instead of rendered visually. + @Published private(set) var streamingResponseText: String = "" + + /// Whether the streaming-response chat bubble should be rendered. Stays + /// true from the moment the user submits text input until a few seconds + /// after the response finishes streaming, even while `streamingResponseText` + /// is empty (e.g. while the AI is still processing) so the user sees a + /// "..." thinking indicator immediately after submitting. + @Published private(set) var isStreamingResponseBubbleVisible: Bool = false + + /// Auto-hide task for the streaming response bubble. Cancelled and + /// rescheduled if a new text-mode submission arrives before the + /// previous one's bubble has faded. + private var streamingResponseAutoHideTask: Task? 
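The scheduling side of this auto-hide task isn't visible in this hunk. One plausible shape, inferred from the doc comments above (hypothetical sketch: the helper name and the delay are assumptions):

```swift
// Hypothetical sketch: fade the response bubble a few seconds after the
// response finishes streaming, unless a newer submission cancels it first.
private func scheduleStreamingResponseAutoHide(afterSeconds seconds: Double = 6) {
    streamingResponseAutoHideTask?.cancel()
    streamingResponseAutoHideTask = Task { [weak self] in
        try? await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
        guard let self, !Task.isCancelled else { return }
        self.streamingResponsePanelManager.hide()
        self.isStreamingResponseBubbleVisible = false
        self.streamingResponseText = ""
    }
}
```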
+ func setClickyCursorEnabled(_ enabled: Bool) { isClickyCursorEnabled = enabled UserDefaults.standard.set(enabled, forKey: "isClickyCursorEnabled") @@ -177,6 +441,7 @@ final class CompanionManager: ObservableObject { print("🔑 Clicky start — accessibility: \(hasAccessibilityPermission), screen: \(hasScreenRecordingPermission), mic: \(hasMicrophonePermission), screenContent: \(hasScreenContentPermission), onboarded: \(hasCompletedOnboarding)") startPermissionPolling() bindVoiceStateObservation() + bindProcessingShimmerToVoiceState() bindAudioPowerLevel() bindShortcutTransitions() // Eagerly touch the Claude API so its TLS warmup handshake completes @@ -215,6 +480,10 @@ final class CompanionManager: ObservableObject { // the welcome animation and onboarding video overlayWindowManager.showOverlay(onScreens: NSScreen.screens, companionManager: self) isOverlayVisible = true + // Onboarding is one of the few moments the cursor is visible in + // the new "summon-on-demand" design — the user is being introduced + // to Clicky and needs to see the cursor companion. + revealCursorTriangle() } /// Replays the onboarding experience from the "Watch Onboarding Again" @@ -228,6 +497,7 @@ final class CompanionManager: ObservableObject { overlayWindowManager.hasShownOverlayBefore = false overlayWindowManager.showOverlay(onScreens: NSScreen.screens, companionManager: self) isOverlayVisible = true + revealCursorTriangle() } private func stopOnboardingMusic() { @@ -297,6 +567,8 @@ final class CompanionManager: ObservableObject { currentResponseTask = nil shortcutTransitionCancellable?.cancel() voiceStateCancellable?.cancel() + shimmerStateCancellable?.cancel() + processingShimmerManager.hide() audioPowerCancellable?.cancel() accessibilityCheckTimer?.invalidate() accessibilityCheckTimer = nil @@ -427,6 +699,25 @@ final class CompanionManager: ObservableObject { } } + /// Reacts to `voiceState` transitions and toggles the screen-edge + /// shimmer overlay accordingly. The shimmer is the Apple-Intelligence- + /// style "the system is thinking" feedback — visible only while the + /// AI is actually processing a request, hidden otherwise so the + /// screen edges aren't constantly glowing. + private func bindProcessingShimmerToVoiceState() { + shimmerStateCancellable = $voiceState + .removeDuplicates() + .receive(on: DispatchQueue.main) + .sink { [weak self] newState in + guard let self else { return } + if newState == .processing { + self.processingShimmerManager.show() + } else { + self.processingShimmerManager.hide() + } + } + } + private func bindVoiceStateObservation() { voiceStateCancellable = buddyDictationManager.$isRecordingFromKeyboardShortcut .combineLatest( @@ -473,6 +764,14 @@ final class CompanionManager: ObservableObject { private func handleShortcutTransition(_ transition: BuddyPushToTalkShortcut.ShortcutTransition) { switch transition { case .pressed: + // In text mode the hotkey is a momentary tap — we summon the chat + // bubble on .pressed and ignore .released entirely. Voice mode + // uses the same hotkey as a push-and-hold, handled below. 
+ if inputMode == .text { + handleTextModeShortcutPressed() + return + } + guard !buddyDictationManager.isDictationInProgress else { return } // Don't register push-to-talk while the onboarding video is playing guard !showOnboardingVideo else { return } @@ -488,6 +787,12 @@ final class CompanionManager: ObservableObject { isOverlayVisible = true } + // Voice mode shows the cursor visualizations (waveform during + // recording, spinner during processing, triangle during TTS) so + // the user has feedback that they're being heard. Reveal here; + // scheduleTransientHideIfNeeded() hides it after the interaction. + revealCursorTriangle() + // Dismiss the menu bar panel so it doesn't cover the screen NotificationCenter.default.post(name: .clickyDismissPanel, object: nil) @@ -506,7 +811,7 @@ final class CompanionManager: ObservableObject { self.onboardingPromptText = "" } } - + ClickyAnalytics.trackPushToTalkStarted() @@ -526,6 +831,9 @@ final class CompanionManager: ObservableObject { ) } case .released: + // Text mode treats the hotkey as a tap — releases are no-ops. + if inputMode == .text { return } + // Cancel the pending start task in case the user released the shortcut // before the async startPushToTalk had a chance to begin recording. // Without this, a quick press-and-release drops the release event and @@ -539,6 +847,67 @@ final class CompanionManager: ObservableObject { } } + /// Text-mode handler for a push-to-talk hotkey press: summons the floating + /// chat bubble near the cursor. Mirrors the side effects of the voice + /// path (dismiss the menu bar panel, cancel any in-flight response) so + /// the user gets a fresh interaction every time they tap the shortcut. + private func handleTextModeShortcutPressed() { + // Don't open the bubble during the onboarding video — same guard + // used for the voice path. + guard !showOnboardingVideo else { return } + + // Dismiss the menu bar panel so it doesn't cover the chat bubble. + NotificationCenter.default.post(name: .clickyDismissPanel, object: nil) + + // Cancel any in-progress response and TTS from a previous prompt so + // the user can interrupt by re-summoning the bubble. + currentResponseTask?.cancel() + elevenLabsTTSClient.stopPlayback() + clearDetectedElementLocation() + + // Dismiss the onboarding prompt if it's showing + if showOnboardingPrompt { + withAnimation(.easeOut(duration: 0.3)) { + onboardingPromptOpacity = 0.0 + } + DispatchQueue.main.asyncAfter(deadline: .now() + 0.35) { + self.showOnboardingPrompt = false + self.onboardingPromptText = "" + } + } + + // Text mode chat keeps the cursor hidden until Claude returns a + // [POINT:] tag. If onboarding had revealed it, hide it now so the + // chat experience is bubble-only. + hideCursorTriangle() + + chatInputBubbleManager.showBubble( + onSubmit: { [weak self] submittedText in + self?.submitTextInput(submittedText) + }, + onCancel: { + // Nothing to do — the bubble is already dismissed. + } + ) + } + + /// Submits a text prompt typed into the chat bubble or as a follow-up + /// inside the chat panel. Trims whitespace and forwards to the same + /// Claude + screenshot pipeline used by voice dictation. Empty + /// submissions are ignored. + func submitTextInput(_ text: String) { + let trimmedText = text.trimmingCharacters(in: .whitespacesAndNewlines) + guard !trimmedText.isEmpty else { return } + + lastTranscript = trimmedText + // Render the user's message in the chat thread immediately. 
The + // pending prompt is cleared once the response completes and the + // turn is committed to `conversationHistory`. + pendingUserPrompt = trimmedText + ClickyAnalytics.trackUserMessageSent(transcript: trimmedText) + sendTranscriptToClaudeWithScreenshot(transcript: trimmedText) + } + // MARK: - Companion Prompt private static let companionVoiceResponseSystemPrompt = """ @@ -559,21 +928,38 @@ final class CompanionManager: ObservableObject { - if you receive multiple screen images, the one labeled "primary focus" is where the cursor is — prioritize that one but reference others if relevant. element pointing: - you have a small blue triangle cursor that can fly to and point at things on screen. use it whenever pointing would genuinely help the user — if they're asking how to do something, looking for a menu, trying to find a button, or need help navigating an app, point at the relevant element. err on the side of pointing rather than not pointing, because it makes your help way more useful and concrete. + you have a small blue triangle cursor that can fly to and point at a specific UI element on screen. it's a strong gesture — only use it when pointing genuinely adds value. DEFAULT TO NOT POINTING. - don't point at things when it would be pointless — like if the user asks a general knowledge question, or the conversation has nothing to do with what's on screen, or you'd just be pointing at something obvious they're already looking at. but if there's a specific UI element, menu, button, or area on screen that's relevant to what you're helping with, point at it. + only point when ALL of these are true: + 1. the user is asking about a specific UI element, where to find something, what a button does, or how to navigate an app + 2. the relevant element is clearly visible in the screenshot + 3. pointing at it would teach the user something they couldn't easily figure out on their own - when you point, append a coordinate tag at the very end of your response, AFTER your spoken text. the screenshot images are labeled with their pixel dimensions. use those dimensions as the coordinate space. the origin (0,0) is the top-left corner of the image. x increases rightward, y increases downward. + do NOT point for: + - greetings or casual messages ("hey", "hi", "thanks", "ok", "cool") + - general knowledge questions unrelated to the screen ("what is git?", "explain async/await") + - acknowledgments, agreements, or one-word responses + - follow-up clarifications on a previous response (the user is reading what you said, not looking for a new element) + - questions you can answer with words alone + - anything where the user isn't actively trying to find or click something - format: [POINT:x,y:label] where x,y are integer pixel coordinates in the screenshot's coordinate space, and label is a short 1-3 word description of the element (like "search bar" or "save button"). if the element is on the cursor's screen you can omit the screen number. if the element is on a DIFFERENT screen, append :screenN where N is the screen number from the image label (e.g. :screen2). this is important — without the screen number, the cursor will point at the wrong place. + in every other case, append [POINT:none]. when in doubt, [POINT:none] is the safe default. - if pointing wouldn't help, append [POINT:none]. + when you DO point, append a coordinate tag at the very end of your response, AFTER your spoken text. the screenshot images are labeled with their pixel dimensions. use those dimensions as the coordinate space. 
the origin (0,0) is the top-left corner of the image. x increases rightward, y increases downward. - examples: + format: [POINT:x,y:label] where x,y are integer pixel coordinates in the screenshot's coordinate space, and label is a short 1-3 word description of the element (like "search bar" or "save button"). if the element is on the cursor's screen you can omit the screen number. if the element is on a DIFFERENT screen, append :screenN where N is the screen number from the image label (e.g. :screen2). this is important — without the screen number, the cursor will point at the wrong place. + + examples of when TO point: - user asks how to color grade in final cut: "you'll want to open the color inspector — it's right up in the top right area of the toolbar. click that and you'll get all the color wheels and curves. [POINT:1100,42:color inspector]" - - user asks what html is: "html stands for hypertext markup language, it's basically the skeleton of every web page. curious how it connects to the css you're looking at? [POINT:none]" - user asks how to commit in xcode: "see that source control menu up top? click that and hit commit, or you can use command option c as a shortcut. [POINT:285,11:source control]" - element is on screen 2 (not where cursor is): "that's over on your other monitor — see the terminal window? [POINT:400,300:terminal:screen2]" + + examples of when NOT to point — these all get [POINT:none]: + - user says "hey": "hey! what's up? [POINT:none]" + - user says "thanks": "anytime! [POINT:none]" + - user asks what html is: "html stands for hypertext markup language, it's basically the skeleton of every web page. [POINT:none]" + - user says "explain that more": "sure — the way it works is... [POINT:none]" + - user asks "what time is it?": "i can't actually see your clock, but the menu bar usually shows it in the top right. [POINT:none]" """ // MARK: - AI Response Pipeline @@ -583,9 +969,35 @@ final class CompanionManager: ObservableObject { /// the spinner/processing state until TTS audio begins playing. /// Claude's response may include a [POINT:x,y:label] tag which triggers /// the buddy to fly to that element on screen. - private func sendTranscriptToClaudeWithScreenshot(transcript: String) { + /// Internal so both voice dictation and the text-input chat bubble can + /// hand off a finalized prompt string to the same pipeline. + func sendTranscriptToClaudeWithScreenshot(transcript: String) { currentResponseTask?.cancel() elevenLabsTTSClient.stopPlayback() + streamingResponseAutoHideTask?.cancel() + + // Capture the input mode at submission time — if the user toggles + // mid-flight, this turn should still behave as the mode it started in. + let isTextMode = inputMode == .text + + // In text mode, the chat bubble is the only feedback the user gets + // that the AI is working. Show it immediately as a "..." placeholder + // so there's no visual gap between dismissing the input bubble and + // the response streaming in. + if isTextMode { + streamingResponseText = "" + isStreamingResponseBubbleVisible = true + // The panel observes `CompanionManager` directly via + // SwiftUI, so we only need to ensure it's on screen — the + // chat thread, streaming text, and pending prompt all + // re-render automatically as their `@Published` sources + // update during the response pipeline. 
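+            // show(companionManager:) is documented idempotent (see
+            // StreamingResponsePanelManager) — re-running it on every
+            // submission, including follow-ups while the panel is already
+            // on screen, just refreshes the existing panel's content.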
+ streamingResponsePanelManager.show(companionManager: self) + } else { + streamingResponseText = "" + isStreamingResponseBubbleVisible = false + streamingResponsePanelManager.hide() + } currentResponseTask = Task { // Stay in processing (spinner) state — no streaming text displayed @@ -606,19 +1018,68 @@ final class CompanionManager: ObservableObject { } // Pass conversation history so Claude remembers prior exchanges - let historyForAPI = conversationHistory.map { entry in - (userPlaceholder: entry.userTranscript, assistantResponse: entry.assistantResponse) + let historyForAPI = conversationHistory.map { turn in + (userPlaceholder: turn.userMessage, assistantResponse: turn.assistantResponse) } - let (fullResponseText, _) = try await claudeAPI.analyzeImageStreaming( - images: labeledImages, - systemPrompt: Self.companionVoiceResponseSystemPrompt, - conversationHistory: historyForAPI, - userPrompt: transcript, - onTextChunk: { _ in - // No streaming text display — spinner stays until TTS plays + // Route the request to whichever provider the user picked. + // If they selected OpenAI but haven't set a key, surface + // an error message in the response bubble rather than + // silently falling back to a different AI. + let fullResponseText: String + if aiProvider == .userOpenAI { + guard let openaiAPI else { + let errorMessage = "Add your OpenAI API key in Clicky's settings to use OpenAI mode." + if isTextMode { + streamingResponseText = errorMessage + // Surface the error as a synthetic completed + // turn so it joins the chat thread cleanly + // rather than sitting as a perma-streaming + // message. Then clear the pending prompt. + if let pendingPrompt = pendingUserPrompt { + conversationHistory.append(CompanionConversationTurn( + userMessage: pendingPrompt, + assistantResponse: errorMessage + )) + pendingUserPrompt = nil + } + } + voiceState = .idle + return } - ) + let (responseText, _) = try await openaiAPI.analyzeImage( + images: labeledImages, + systemPrompt: Self.companionVoiceResponseSystemPrompt, + conversationHistory: historyForAPI, + userPrompt: transcript + ) + fullResponseText = responseText + // OpenAI is non-streaming, so the text arrives in one + // shot. Render it immediately so the bubble updates + // before the [POINT:] parsing + cleanup step below. + if isTextMode { + streamingResponseText = responseText + } + } else { + let (responseText, _) = try await claudeAPI.analyzeImageStreaming( + images: labeledImages, + systemPrompt: Self.companionVoiceResponseSystemPrompt, + conversationHistory: historyForAPI, + userPrompt: transcript, + onTextChunk: { [weak self] accumulatedTextSoFar in + // ClaudeAPI delivers the FULL accumulated text on + // every chunk (not deltas), so we assign rather + // than append. The chat panel observes + // `streamingResponseText` directly and re-renders + // automatically on each update. + guard let self else { return } + guard isTextMode else { return } + self.streamingResponseText = accumulatedTextSoFar + print("📡 Stream chunk: \(accumulatedTextSoFar.count) chars total") + } + ) + fullResponseText = responseText + } guard !Task.isCancelled else { return } @@ -633,6 +1094,11 @@ final class CompanionManager: ObservableObject { let hasPointCoordinate = parseResult.coordinate != nil if hasPointCoordinate { voiceState = .idle + // Reveal the cursor BEFORE the location is set so the + // triangle is on screen when the bezier flight starts. + // Critical for text mode where the cursor was previously + // hidden — without this, the flight is invisible. 
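+                        // (The reveal is a 0.25s eased fade — BlueCursorView
+                        // animates on `isCursorTriangleVisible` — so ordering
+                        // it first lets the fade-in overlap the start of the
+                        // flight instead of popping in mid-arc.)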
+ revealCursorTriangle() } // Pick the screen capture matching Claude's screen number, @@ -667,7 +1133,11 @@ final class CompanionManager: ObservableObject { // Convert from top-left origin (screenshot) to bottom-left origin (AppKit) let appKitY = displayHeight - displayLocalY - // Convert display-local coords to global screen coords + // Convert display-local coords to global screen coords. + // The cursor's tip-to-center offset is applied later + // in OverlayWindow.startNavigatingToElement so this + // location reflects the AI's *intended target*, not + // a pre-shifted center position. let globalLocation = CGPoint( x: displayLocalX + displayFrame.origin.x, y: appKitY + displayFrame.origin.y @@ -681,25 +1151,46 @@ final class CompanionManager: ObservableObject { print("🎯 Element pointing: \(parseResult.elementLabel ?? "no element")") } - // Save this exchange to conversation history (with the point tag - // stripped so it doesn't confuse future context) - conversationHistory.append(( - userTranscript: transcript, + print("💾 About to commit turn: history.count=\(conversationHistory.count), pending=\(pendingUserPrompt ?? "nil"), streaming.count=\(streamingResponseText.count)") + + // Order matters: SwiftUI batches @Published writes that + // happen in the same synchronous run, so all three of + // these mutations result in ONE re-render. By writing + // streamingResponseText first, then appending to history, + // and only then clearing pendingUserPrompt, the batched + // re-render sees the completed turn already in + // conversationHistory before the in-flight section + // collapses — there's no frame where the user could see + // the chat with neither the in-flight bubble nor the + // history turn rendered. + streamingResponseText = spokenText + conversationHistory.append(CompanionConversationTurn( + userMessage: transcript, assistantResponse: spokenText )) + pendingUserPrompt = nil // Keep only the last 10 exchanges to avoid unbounded context growth if conversationHistory.count > 10 { conversationHistory.removeFirst(conversationHistory.count - 10) } + print("🔄 Cleared pending. history.count=\(conversationHistory.count), streaming.count=\(streamingResponseText.count)") print("🧠 Conversation history: \(conversationHistory.count) exchanges") ClickyAnalytics.trackAIResponseReceived(response: spokenText) - // Play the response via TTS. Keep the spinner (processing state) - // until the audio actually starts playing, then switch to responding. - if !spokenText.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty { + // In text mode the response was already committed to + // history above. We only need to flip voiceState here so + // the cursor visualizations know the AI is no longer + // processing. The chat panel is the user's read-back + // surface and persists until they dismiss it with X. + if isTextMode { + voiceState = .responding + } else if !spokenText.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty { + // Voice mode: play the response via TTS. Keep the spinner + // (processing state) until the audio actually starts + // playing, then switch to responding. do { try await elevenLabsTTSClient.speakText(spokenText) // speakText returns after player.play() — audio is now playing @@ -725,12 +1216,49 @@ final class CompanionManager: ObservableObject { } } - /// If the cursor is in transient mode (user toggled "Show Clicky" off), - /// waits for TTS playback and any pointing animation to finish, then - /// fades out the overlay after a 1-second pause. 
Cancelled automatically
-    /// if the user starts another push-to-talk interaction.
+    /// Schedules a fade-out of the streaming response chat bubble after a
+    /// dwell time proportional to the response length, so longer responses
+    /// stay on screen longer than short ones. Used only in text mode where
+    /// the bubble is the user's primary read-back surface.
+    private func scheduleStreamingResponseBubbleAutoHide(forResponseLength characterCount: Int) {
+        streamingResponseAutoHideTask?.cancel()
+
+        // Roughly 17 chars per second of reading (0.06 s/char) plus a 3.5s
+        // buffer floor so even one-word answers linger long enough to read
+        // comfortably. E.g. a 200-char reply dwells 3.5 + 200 × 0.06 ≈ 15.5s.
+        let secondsPerCharacter = 0.06
+        let bufferSeconds = 3.5
+        let dwellSeconds = bufferSeconds + Double(max(0, characterCount)) * secondsPerCharacter
+        let nanosecondsToWait = UInt64(dwellSeconds * 1_000_000_000)
+
+        streamingResponseAutoHideTask = Task { [weak self] in
+            try? await Task.sleep(nanoseconds: nanosecondsToWait)
+            guard !Task.isCancelled else { return }
+            await MainActor.run {
+                guard let self else { return }
+                self.isStreamingResponseBubbleVisible = false
+                self.streamingResponsePanelManager.hide()
+                // Wait for the SwiftUI fade animation to finish before
+                // clearing the text so the bubble doesn't go blank during
+                // the fade-out.
+                Task {
+                    try? await Task.sleep(nanoseconds: 500_000_000)
+                    await MainActor.run {
+                        guard !(self.isStreamingResponseBubbleVisible) else { return }
+                        self.streamingResponseText = ""
+                    }
+                }
+            }
+        }
+    }
+
+    /// Waits for TTS playback and any pointing animation to finish, then
+    /// hides the cursor visualizations after a 1-second pause. Cancelled
+    /// automatically if the user starts another push-to-talk interaction.
+    /// In the new "summon-on-demand" design this fires after every chat
+    /// turn so the cursor returns to its hidden default state.
+    /// Also tears down the overlay panels entirely if "Show Clicky" is off.
     private func scheduleTransientHideIfNeeded() {
-        guard !isClickyCursorEnabled && isOverlayVisible else { return }
+        guard isOverlayVisible else { return }

         transientHideTask?.cancel()
         transientHideTask = Task {
@@ -747,11 +1275,26 @@
                guard !Task.isCancelled else { return }
            }

-            // Pause 1s after everything finishes, then fade out
+            // Pause 1s after everything finishes, then hide the cursor.
            try? await Task.sleep(nanoseconds: 1_000_000_000)
            guard !Task.isCancelled else { return }
-            overlayWindowManager.fadeOutAndHideOverlay()
-            isOverlayVisible = false
+
+            // Don't hide while the user is being onboarded — onboarding
+            // shows the cursor as part of its scripted experience.
+            if showOnboardingVideo || showOnboardingPrompt {
+                return
+            }
+
+            hideCursorTriangle()
+
+            // Preserve the original "Show Clicky off" behavior: tear down
+            // the overlay panels entirely. With the toggle on (default),
+            // we only hide the cursor — the panels stay alive so a future
+            // [POINT:] event can reveal the cursor on any screen instantly.
+            if !isClickyCursorEnabled {
+                overlayWindowManager.fadeOutAndHideOverlay()
+                isOverlayVisible = false
+            }
        }
    }
@@ -916,6 +1459,11 @@ final class CompanionManager: ObservableObject {
         DispatchQueue.main.asyncAfter(deadline: .now() + 0.35) {
             self.showOnboardingPrompt = false
             self.onboardingPromptText = ""
+            // Onboarding has finished — hide the cursor. From
+            // here on the cursor only appears during a [POINT:]
+            // flight or, in voice mode, while the user is
+            // actively speaking.
+ self.hideCursorTriangle() } } return @@ -1007,6 +1555,8 @@ final class CompanionManager: ObservableObject { let displayLocalX = clampedX * (displayWidth / screenshotWidth) let displayLocalY = clampedY * (displayHeight / screenshotHeight) let appKitY = displayHeight - displayLocalY + // The cursor's tip-to-center offset is applied later + // in OverlayWindow.startNavigatingToElement. let globalLocation = CGPoint( x: displayLocalX + displayFrame.origin.x, y: appKitY + displayFrame.origin.y diff --git a/leanring-buddy/CompanionPanelView.swift b/leanring-buddy/CompanionPanelView.swift index 76789b4c..819b00ca 100644 --- a/leanring-buddy/CompanionPanelView.swift +++ b/leanring-buddy/CompanionPanelView.swift @@ -25,7 +25,13 @@ struct CompanionPanelView: View { .padding(.top, 16) .padding(.horizontal, 16) - if companionManager.hasCompletedOnboarding && companionManager.allPermissionsGranted { + // Hide the Sonnet/Opus picker when the user is on OpenAI mode — + // those model IDs aren't valid for OpenAI, so the picker would + // be misleading. Clicky-proxy and user-Anthropic both use + // Anthropic models so the picker applies in both cases. + if companionManager.hasCompletedOnboarding + && companionManager.allPermissionsGranted + && companionManager.aiProvider != .userOpenAI { Spacer() .frame(height: 12) @@ -33,6 +39,22 @@ struct CompanionPanelView: View { .padding(.horizontal, 16) } + if companionManager.hasCompletedOnboarding && companionManager.allPermissionsGranted { + Spacer() + .frame(height: 8) + + inputModePickerRow + .padding(.horizontal, 16) + } + + if companionManager.hasCompletedOnboarding && companionManager.allPermissionsGranted { + Spacer() + .frame(height: 8) + + aiProviderSection + .padding(.horizontal, 16) + } + if !companionManager.allPermissionsGranted { Spacer() .frame(height: 16) @@ -641,6 +663,121 @@ struct CompanionPanelView: View { .pointerCursor() } + // MARK: - Input Mode Picker + + private var inputModePickerRow: some View { + HStack { + Text("Input") + .font(.system(size: 13, weight: .medium)) + .foregroundColor(DS.Colors.textSecondary) + + Spacer() + + HStack(spacing: 0) { + inputModeOptionButton(label: "Voice", mode: .voice) + inputModeOptionButton(label: "Text", mode: .text) + } + .background( + RoundedRectangle(cornerRadius: 6, style: .continuous) + .fill(Color.white.opacity(0.06)) + ) + .overlay( + RoundedRectangle(cornerRadius: 6, style: .continuous) + .stroke(DS.Colors.borderSubtle, lineWidth: 0.5) + ) + } + .padding(.vertical, 4) + } + + private func inputModeOptionButton(label: String, mode: CompanionInputMode) -> some View { + let isSelected = companionManager.inputMode == mode + return Button(action: { + companionManager.setInputMode(mode) + }) { + Text(label) + .font(.system(size: 11, weight: .medium)) + .foregroundColor(isSelected ? DS.Colors.textPrimary : DS.Colors.textTertiary) + .padding(.horizontal, 10) + .padding(.vertical, 5) + .background( + RoundedRectangle(cornerRadius: 5, style: .continuous) + .fill(isSelected ? Color.white.opacity(0.1) : Color.clear) + ) + } + .buttonStyle(.plain) + .pointerCursor() + } + + // MARK: - AI Provider + + /// Picker that lets the user route chat through the Cloudflare Worker + /// proxy (default), through their own Anthropic key, or through their + /// own OpenAI key. When a non-default provider is selected, an inline + /// secure field appears so the user can paste their API key. 
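+    /// Submitted keys are persisted to the macOS Keychain via
+    /// `KeychainHelper` — never UserDefaults (see KeychainHelper.swift).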
+ private var aiProviderSection: some View { + VStack(alignment: .leading, spacing: 6) { + HStack { + Text("AI") + .font(.system(size: 13, weight: .medium)) + .foregroundColor(DS.Colors.textSecondary) + + Spacer() + + HStack(spacing: 0) { + aiProviderOptionButton(label: "Clicky", provider: .clickyProxy) + aiProviderOptionButton(label: "Anthropic", provider: .userAnthropic) + aiProviderOptionButton(label: "OpenAI", provider: .userOpenAI) + } + .background( + RoundedRectangle(cornerRadius: 6, style: .continuous) + .fill(Color.white.opacity(0.06)) + ) + .overlay( + RoundedRectangle(cornerRadius: 6, style: .continuous) + .stroke(DS.Colors.borderSubtle, lineWidth: 0.5) + ) + } + + if companionManager.aiProvider == .userAnthropic { + APIKeyEntryRow( + placeholder: "Anthropic API key (sk-ant-...)", + hasSavedKey: !companionManager.userAnthropicAPIKey.isEmpty, + onSave: { newKey in + companionManager.setUserAnthropicAPIKey(newKey) + } + ) + } else if companionManager.aiProvider == .userOpenAI { + APIKeyEntryRow( + placeholder: "OpenAI API key (sk-...)", + hasSavedKey: !companionManager.userOpenAIAPIKey.isEmpty, + onSave: { newKey in + companionManager.setUserOpenAIAPIKey(newKey) + } + ) + } + } + .padding(.vertical, 4) + } + + private func aiProviderOptionButton(label: String, provider: AIProvider) -> some View { + let isSelected = companionManager.aiProvider == provider + return Button(action: { + companionManager.setAIProvider(provider) + }) { + Text(label) + .font(.system(size: 11, weight: .medium)) + .foregroundColor(isSelected ? DS.Colors.textPrimary : DS.Colors.textTertiary) + .padding(.horizontal, 10) + .padding(.vertical, 5) + .background( + RoundedRectangle(cornerRadius: 5, style: .continuous) + .fill(isSelected ? Color.white.opacity(0.1) : Color.clear) + ) + } + .buttonStyle(.plain) + .pointerCursor() + } + // MARK: - DM Farza Button private var dmFarzaButton: some View { @@ -759,3 +896,78 @@ struct CompanionPanelView: View { } } + +/// Inline secure-text-field row for entering an API key. We deliberately +/// do NOT pre-fill the field with the saved key, even masked: the goal is +/// to never echo the secret back to the screen, since SecureField content +/// can be inspected via accessibility or screen recordings. Instead, the +/// row displays a "Saved in Keychain" indicator when a key already exists. +/// To replace, the user types a new value and presses return (or clicks +/// the inline checkmark button). +private struct APIKeyEntryRow: View { + /// SecureField placeholder hint, e.g. "Anthropic API key (sk-ant-...)". + let placeholder: String + /// Whether a key is already stored in the Keychain. Drives the status + /// indicator below the field. + let hasSavedKey: Bool + /// Persists the entered key. Called on return-key submit and on the + /// inline save-button click. Empty values delete the stored key. 
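+    /// In this panel the callback routes to
+    /// `CompanionManager.setUserAnthropicAPIKey` or `setUserOpenAIAPIKey`
+    /// (see the provider section in CompanionPanelView above).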
+ let onSave: (String) -> Void + + @State private var inputText: String = "" + + var body: some View { + VStack(alignment: .leading, spacing: 4) { + HStack(spacing: 6) { + SecureField(placeholder, text: $inputText) + .textFieldStyle(.plain) + .font(.system(size: 12)) + .foregroundColor(DS.Colors.textPrimary) + .padding(.horizontal, 10) + .padding(.vertical, 7) + .onSubmit { + commitKey() + } + + if !inputText.isEmpty { + Button(action: commitKey) { + Image(systemName: "checkmark.circle.fill") + .font(.system(size: 13)) + .foregroundColor(DS.Colors.success) + .padding(.trailing, 8) + } + .buttonStyle(.plain) + .pointerCursor() + } + } + .background( + RoundedRectangle(cornerRadius: 6, style: .continuous) + .fill(Color.white.opacity(0.06)) + ) + .overlay( + RoundedRectangle(cornerRadius: 6, style: .continuous) + .stroke(DS.Colors.borderSubtle, lineWidth: 0.5) + ) + + HStack(spacing: 5) { + Image(systemName: hasSavedKey ? "lock.fill" : "exclamationmark.circle") + .font(.system(size: 9)) + .foregroundColor(hasSavedKey ? DS.Colors.textTertiary : DS.Colors.warningText) + + Text(hasSavedKey + ? "Saved in macOS Keychain. Type a new key to replace." + : "Required — paste your key and press return.") + .font(.system(size: 10)) + .foregroundColor(hasSavedKey ? DS.Colors.textTertiary : DS.Colors.warningText) + } + .padding(.leading, 2) + } + } + + private func commitKey() { + onSave(inputText) + // Clear the field after saving so the secret isn't left on screen + // and the "Saved" indicator becomes the source of truth. + inputText = "" + } +} diff --git a/leanring-buddy/CompanionScreenCaptureUtility.swift b/leanring-buddy/CompanionScreenCaptureUtility.swift index 79784178..d2b12e56 100644 --- a/leanring-buddy/CompanionScreenCaptureUtility.swift +++ b/leanring-buddy/CompanionScreenCaptureUtility.swift @@ -96,6 +96,13 @@ enum CompanionScreenCaptureUtility { configuration: configuration ) + // Diagnostic: ScreenCaptureKit can ignore configuration.width/.height + // and return the image at the display's native pixel resolution + // (especially on Retina displays). If `actual` ≠ `configured`, + // the dimensions we report to the AI must reflect `actual` — not + // what we requested — or all coordinate scaling will be off. + print("📷 Capture: configured=\(configuration.width)×\(configuration.height), actual cgImage=\(cgImage.width)×\(cgImage.height), display points=\(Int(displayFrame.width))×\(Int(displayFrame.height))") + guard let jpegData = NSBitmapImageRep(cgImage: cgImage) .representation(using: .jpeg, properties: [.compressionFactor: 0.8]) else { continue @@ -117,8 +124,16 @@ enum CompanionScreenCaptureUtility { displayWidthInPoints: Int(displayFrame.width), displayHeightInPoints: Int(displayFrame.height), displayFrame: displayFrame, - screenshotWidthInPixels: configuration.width, - screenshotHeightInPixels: configuration.height + // Use the actual cgImage dimensions, NOT configuration.width/.height. + // ScreenCaptureKit can return the image at the display's native + // pixel resolution regardless of what we configured — particularly + // on Retina displays where the backing scale factor is auto-applied. + // Reporting `configuration.width` when the JPEG is actually 2x as + // large means the AI's coordinates are scaled to the wrong space, + // causing the cursor to land at the wrong proportional position + // (typically far below the intended target). 
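+                    // Worked example (illustrative numbers): a 2x Retina
+                    // display of 1728×1117 points can come back as a
+                    // 3456×2234-pixel cgImage even when 1728×1117 was
+                    // configured. An element at the vertical midpoint sits
+                    // at y≈1117 in the JPEG the AI sees; scaling that by a
+                    // reported height of 1117 instead of 2234 maps it to
+                    // the very bottom of the display rather than the middle.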
+ screenshotWidthInPixels: cgImage.width, + screenshotHeightInPixels: cgImage.height )) } diff --git a/leanring-buddy/KeychainHelper.swift b/leanring-buddy/KeychainHelper.swift new file mode 100644 index 00000000..ecfbab51 --- /dev/null +++ b/leanring-buddy/KeychainHelper.swift @@ -0,0 +1,85 @@ +// +// KeychainHelper.swift +// leanring-buddy +// +// Thin wrapper around the Security framework for storing API keys in +// the macOS Keychain. Keys are stored as generic passwords keyed by +// the app's bundle identifier (service) and a per-key account name. +// +// We deliberately do NOT use UserDefaults for API keys: UserDefaults is +// a plist on disk readable by anyone with file access, while the +// Keychain is encrypted and access-controlled by macOS. +// + +import Foundation +import Security + +enum KeychainHelper { + /// Service identifier shared across all Keychain entries written by + /// this app. Using `Bundle.main.bundleIdentifier` keeps the entries + /// scoped to this app — different builds (debug, release) with + /// different bundle IDs won't collide. + private static var serviceIdentifier: String { + Bundle.main.bundleIdentifier ?? "com.clicky.leanring-buddy" + } + + /// Saves a string value to the Keychain under the given account name. + /// Overwrites any existing value. Returns true on success. + @discardableResult + static func saveString(_ value: String, forAccount accountName: String) -> Bool { + guard let data = value.data(using: .utf8) else { return false } + + // Delete any existing entry first so we can perform a clean add. + // SecItemUpdate is finicky about which attributes are required; + // delete-then-add is simpler and reliable for our use case. + deleteString(forAccount: accountName) + + let query: [String: Any] = [ + kSecClass as String: kSecClassGenericPassword, + kSecAttrService as String: serviceIdentifier, + kSecAttrAccount as String: accountName, + kSecValueData as String: data, + // Only allow access when this device is unlocked, and never + // sync to iCloud Keychain — these are device-local secrets. + kSecAttrAccessible as String: kSecAttrAccessibleWhenUnlockedThisDeviceOnly + ] + + let status = SecItemAdd(query as CFDictionary, nil) + return status == errSecSuccess + } + + /// Reads a string value from the Keychain under the given account name. + /// Returns nil if the entry doesn't exist or can't be decoded. + static func readString(forAccount accountName: String) -> String? { + let query: [String: Any] = [ + kSecClass as String: kSecClassGenericPassword, + kSecAttrService as String: serviceIdentifier, + kSecAttrAccount as String: accountName, + kSecReturnData as String: true, + kSecMatchLimit as String: kSecMatchLimitOne + ] + + var item: CFTypeRef? + let status = SecItemCopyMatching(query as CFDictionary, &item) + guard status == errSecSuccess, + let data = item as? Data, + let stringValue = String(data: data, encoding: .utf8) else { + return nil + } + return stringValue + } + + /// Deletes the Keychain entry for the given account name. Safe to + /// call when no entry exists — a missing entry is treated as success. 
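+    /// Hypothetical round-trip across the three helpers (the account
+    /// name here is illustrative, not necessarily one the app uses):
+    ///
+    ///     KeychainHelper.saveString("sk-…", forAccount: "openai-api-key")
+    ///     KeychainHelper.readString(forAccount: "openai-api-key")   // "sk-…"
+    ///     KeychainHelper.deleteString(forAccount: "openai-api-key") // true
+    ///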
+ @discardableResult + static func deleteString(forAccount accountName: String) -> Bool { + let query: [String: Any] = [ + kSecClass as String: kSecClassGenericPassword, + kSecAttrService as String: serviceIdentifier, + kSecAttrAccount as String: accountName + ] + + let status = SecItemDelete(query as CFDictionary) + return status == errSecSuccess || status == errSecItemNotFound + } +} diff --git a/leanring-buddy/OverlayWindow.swift b/leanring-buddy/OverlayWindow.swift index 884ebcbf..5a2bafaa 100644 --- a/leanring-buddy/OverlayWindow.swift +++ b/leanring-buddy/OverlayWindow.swift @@ -85,6 +85,7 @@ struct NavigationBubbleSizePreferenceKey: PreferenceKey { } } + /// The buddy's behavioral mode. Controls whether it follows the cursor, /// is flying toward a detected UI element, or is pointing at an element. enum BuddyNavigationMode { @@ -144,6 +145,7 @@ struct BlueCursorView: View { @State private var navigationBubbleOpacity: Double = 0.0 @State private var navigationBubbleSize: CGSize = .zero + /// The cursor position at the moment navigation started, used to detect /// if the user moves the cursor enough to cancel the navigation. @State private var cursorPositionWhenNavigationStarted: CGPoint = .zero @@ -308,7 +310,7 @@ struct BlueCursorView: View { .rotationEffect(.degrees(triangleRotationDegrees)) .shadow(color: DS.Colors.overlayCursorBlue, radius: 8 + (buddyFlightScale - 1.0) * 20, x: 0, y: 0) .scaleEffect(buddyFlightScale) - .opacity(buddyIsVisibleOnThisScreen && (companionManager.voiceState == .idle || companionManager.voiceState == .responding) ? cursorOpacity : 0) + .opacity(buddyIsVisibleOnThisScreen && (companionManager.voiceState == .idle || companionManager.voiceState == .responding) && companionManager.isCursorTriangleVisible ? cursorOpacity : 0) .position(cursorPosition) .animation( buddyNavigationMode == .followingCursor @@ -317,6 +319,7 @@ struct BlueCursorView: View { value: cursorPosition ) .animation(.easeIn(duration: 0.25), value: companionManager.voiceState) + .animation(.easeInOut(duration: 0.25), value: companionManager.isCursorTriangleVisible) .animation( buddyNavigationMode == .navigatingToTarget ? nil : .easeInOut(duration: 0.3), value: triangleRotationDegrees @@ -324,12 +327,17 @@ struct BlueCursorView: View { // Blue waveform — replaces the triangle while listening BlueCursorWaveformView(audioPowerLevel: companionManager.currentAudioPowerLevel) - .opacity(buddyIsVisibleOnThisScreen && companionManager.voiceState == .listening ? cursorOpacity : 0) + .opacity(buddyIsVisibleOnThisScreen && companionManager.voiceState == .listening && companionManager.isCursorTriangleVisible ? cursorOpacity : 0) .position(cursorPosition) .animation(.spring(response: 0.2, dampingFraction: 0.6, blendDuration: 0), value: cursorPosition) .animation(.easeIn(duration: 0.15), value: companionManager.voiceState) - // Blue spinner — shown while the AI is processing (transcription + Claude + waiting for TTS) + // Blue spinner — shown while the AI is processing (transcription + Claude + waiting for TTS). + // Intentionally NOT gated on `isCursorTriangleVisible` so the + // spinner appears next to the mouse during text-mode chat too, + // mirroring the visual feedback voice mode gets while waiting + // on Claude. The triangle/waveform stay gated on visibility, + // but the spinner is always meaningful when processing. BlueCursorSpinnerView() .opacity(buddyIsVisibleOnThisScreen && companionManager.voiceState == .processing ? 
cursorOpacity : 0) .position(cursorPosition) @@ -457,20 +465,29 @@ struct BlueCursorView: View { // Don't interrupt welcome animation guard !showWelcome || welcomeText.isEmpty else { return } - // Convert the AppKit screen location to SwiftUI coordinates for this screen + // Convert the AppKit screen location to SwiftUI coordinates for this screen. let targetInSwiftUI = convertScreenPointToSwiftUICoordinates(screenLocation) - // Offset the target so the buddy sits beside the element rather than - // directly on top of it — 8px to the right, 12px below. - let offsetTarget = CGPoint( - x: targetInSwiftUI.x + 8, - y: targetInSwiftUI.y + 12 + // Compensate for the cursor's tip-to-center offset so the + // visible TIP of the triangle lands directly on the target + // (not the triangle's geometric center). The Triangle shape's + // top vertex is ~9.24px above the 16x16 frame's center, and a + // -35° rotation shifts that vertex to (-5.30, -7.57) in SwiftUI + // coords. To put the tip ON the target, the cursor's center + // must be at `target + (5.30, 7.57)` — i.e., shift the + // animation destination down-and-right by exactly that amount. + // The navigation speech bubble is positioned offset from the + // cursor anyway (see line ~288), so it ends up beside the + // target as a natural consequence — no extra padding needed. + let tipCompensatedTarget = CGPoint( + x: targetInSwiftUI.x + 5.30, + y: targetInSwiftUI.y + 7.57 ) // Clamp target to screen bounds with padding let clampedTarget = CGPoint( - x: max(20, min(offsetTarget.x, screenFrame.width - 20)), - y: max(20, min(offsetTarget.y, screenFrame.height - 20)) + x: max(20, min(tipCompensatedTarget.x, screenFrame.width - 20)), + y: max(20, min(tipCompensatedTarget.y, screenFrame.height - 20)) ) // Record the current cursor position so we can detect if the user diff --git a/leanring-buddy/ProcessingShimmerManager.swift b/leanring-buddy/ProcessingShimmerManager.swift new file mode 100644 index 00000000..bfb47def --- /dev/null +++ b/leanring-buddy/ProcessingShimmerManager.swift @@ -0,0 +1,118 @@ +// +// ProcessingShimmerManager.swift +// leanring-buddy +// +// Owns the full-screen shimmer overlay panels — one per connected +// display — that render the Apple-Intelligence-style processing +// animation. Each panel is borderless, transparent, click-through, and +// sits at the screen-saver window level so the shimmer appears above +// other windows without ever stealing focus or blocking interaction. +// +// Show/hide is triggered from `CompanionManager` based on voice state +// transitions — visible only while the AI is processing a request, +// hidden the rest of the time so the screen edges aren't constantly +// glowing. +// + +import AppKit +import SwiftUI + +/// Full-screen shimmer overlay window. Critically `ignoresMouseEvents` +/// is true so users can click through the colored edge glow to whatever +/// they were doing — the shimmer is purely visual feedback. +private final class ProcessingShimmerWindow: NSWindow { + init(screen: NSScreen) { + super.init( + contentRect: screen.frame, + styleMask: .borderless, + backing: .buffered, + defer: false + ) + + // Transparent, non-interactive, always-on-top. 
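+        // (Both isOpaque = false and the .clear background below are
+        // needed for a borderless NSWindow to composite as transparent —
+        // either one alone can leave an opaque backing behind the view.)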
+ self.isOpaque = false + self.backgroundColor = .clear + self.level = .screenSaver // matches OverlayWindow so they coexist + self.ignoresMouseEvents = true + self.collectionBehavior = [.canJoinAllSpaces, .stationary, .fullScreenAuxiliary] + self.isReleasedWhenClosed = false + self.hasShadow = false + self.hidesOnDeactivate = false + + // Cover the entire screen including menu bar / dock zones. + self.setFrame(screen.frame, display: true) + } + + override var canBecomeKey: Bool { false } + override var canBecomeMain: Bool { false } +} + +@MainActor +final class ProcessingShimmerManager { + /// One window per screen, keyed by `NSScreen.frame`. + private var windowsByScreenFrame: [NSRect: ProcessingShimmerWindow] = [:] + + /// True while the shimmer is currently shown. Used to make `show()` + /// idempotent — calling it while already visible is a no-op. + private var isShimmerCurrentlyVisible = false + + /// Renders the shimmer on the screen the cursor is currently on. + /// Idempotent: safe to call repeatedly without flicker. We pin to + /// one screen rather than every connected display because the + /// processing-feedback shimmer is meant to be a focused signal at + /// the user's current attention, not a global "the system is busy" + /// overlay across all monitors. + func show() { + guard !isShimmerCurrentlyVisible else { return } + isShimmerCurrentlyVisible = true + + // Pick the cursor's current screen. + let mouseLocation = NSEvent.mouseLocation + let targetScreen = NSScreen.screens.first(where: { $0.frame.contains(mouseLocation) }) + ?? NSScreen.main + ?? NSScreen.screens.first + guard let screen = targetScreen else { return } + + // Hide any windows on OTHER screens — happens if the user moves + // between monitors between chat sessions. + for (frame, oldWindow) in windowsByScreenFrame where frame != screen.frame { + oldWindow.orderOut(nil) + oldWindow.contentView = nil + } + + let window = windowsByScreenFrame[screen.frame] ?? createWindow(for: screen) + windowsByScreenFrame[screen.frame] = window + + // (Re-)install the SwiftUI hosting view. Doing this on every + // show means the rotation animation restarts from 0° each + // request, which feels intentional — the shimmer "powers on" + // when processing begins rather than picking up mid-rotation + // from a previous run. + let hostingView = NSHostingView(rootView: ProcessingShimmerView()) + hostingView.frame = NSRect(origin: .zero, size: screen.frame.size) + hostingView.wantsLayer = true + hostingView.layer?.backgroundColor = .clear + window.contentView = hostingView + + window.orderFrontRegardless() + } + + /// Tears down all shimmer windows. Safe to call when no shimmer is + /// shown. + func hide() { + guard isShimmerCurrentlyVisible else { return } + isShimmerCurrentlyVisible = false + + for window in windowsByScreenFrame.values { + window.orderOut(nil) + // Drop the contentView so the SwiftUI view's animation + // timers stop running while the window is hidden — otherwise + // they'd churn CPU even when nothing is on screen. 
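+            // show() reinstalls a fresh hosting view on the next run
+            // anyway, so dropping it here never leaves the window
+            // pointing at stale content.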
+            window.contentView = nil
+        }
+    }
+
+    private func createWindow(for screen: NSScreen) -> ProcessingShimmerWindow {
+        ProcessingShimmerWindow(screen: screen)
+    }
+}
diff --git a/leanring-buddy/ProcessingShimmerView.swift b/leanring-buddy/ProcessingShimmerView.swift
new file mode 100644
index 00000000..23d92195
--- /dev/null
+++ b/leanring-buddy/ProcessingShimmerView.swift
@@ -0,0 +1,80 @@
+//
+//  ProcessingShimmerView.swift
+//  leanring-buddy
+//
+//  Apple-Intelligence-style screen-edge shimmer rendered while the AI
+//  is processing a request. A multi-color angular gradient (purple,
+//  magenta, cyan, green) is stroked around the border of a full-screen
+//  rectangle and blurred, so only a soft glow at the screen edges is
+//  visible — the interior of the screen stays untouched. The gradient
+//  continuously rotates so the colors flow around the perimeter, giving
+//  the same "the system is thinking" feedback Apple uses on iOS/macOS.
+//
+//  Visual recipe (see comments inline for the numbers):
+//    1. Stroke the border of a full-screen rectangle with an angular
+//       gradient — the stroke alone confines the color to the edges,
+//       no mask needed.
+//    2. Blur the stroke so the thin gradient line reads as a soft,
+//       atmospheric glow band.
+//    3. Animate the gradient's start angle around 360° with a linear
+//       repeat-forever animation for the continuous "shimmer" loop.
+//
+
+import SwiftUI
+
+struct ProcessingShimmerView: View {
+    /// Drives the angular gradient's rotation. Animated from 0 → 360 in
+    /// a linear repeating loop on appear.
+    @State private var rotationDegrees: Double = 0
+
+    /// Multi-stop gradient palette inspired by Apple Intelligence:
+    /// purple → magenta → cyan → green → back to purple. The duplicated
+    /// purple at the end ensures a seamless loop when the gradient
+    /// rotates through 360°.
+    private static let shimmerColors: [Color] = [
+        Color(red: 0.62, green: 0.10, blue: 1.0),  // bright purple
+        Color(red: 1.0, green: 0.20, blue: 0.85),  // magenta
+        Color(red: 0.10, green: 0.85, blue: 1.0),  // cyan
+        Color(red: 0.20, green: 1.0, blue: 0.55),  // green
+        Color(red: 0.62, green: 0.10, blue: 1.0)   // purple (loops)
+    ]
+
+    /// Width of the gradient stroke at the screen edge. Combined with
+    /// `edgeGlowBlurRadius`, this controls how far inside the screen
+    /// the glow reaches: visible glow extends roughly
+    /// `strokeWidth + blurRadius` from the screen's outer edge.
+    private static let strokeWidth: CGFloat = 6
+
+    /// Blur applied to the stroked rectangle. Soft, but not so soft
+    /// that the glow bleeds far into the workspace. With these values
+    /// the visible glow caps at ~22px from the screen edge.
+    private static let edgeGlowBlurRadius: CGFloat = 16
+
+    var body: some View {
+        GeometryReader { geometry in
+            // The whole effect is just: stroke the screen rectangle with
+            // an angular gradient, blur lightly, animate the rotation.
+            // No mask, no compositing group, no inner padding tricks —
+            // a thin gradient line + soft blur is the entire shimmer.
+            Rectangle()
+                .strokeBorder(
+                    AngularGradient(
+                        gradient: Gradient(colors: Self.shimmerColors),
+                        center: .center,
+                        angle: .degrees(rotationDegrees)
+                    ),
+                    lineWidth: Self.strokeWidth
+                )
+                .blur(radius: Self.edgeGlowBlurRadius)
+                .frame(width: geometry.size.width, height: geometry.size.height)
+                .transition(.opacity)
+                .onAppear {
+                    // Continuous linear rotation. 6s per full loop keeps
+                    // the motion noticeable without being distracting.
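+                    // Because hide() drops the contentView, this onAppear
+                    // refires on every show() — the loop needs no manual
+                    // cancellation; it dies with the hosting view.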
+                    withAnimation(.linear(duration: 6).repeatForever(autoreverses: false)) {
+                        rotationDegrees = 360
+                    }
+                }
+        }
+        .ignoresSafeArea()
+    }
+}
diff --git a/leanring-buddy/StreamingResponsePanelManager.swift b/leanring-buddy/StreamingResponsePanelManager.swift
new file mode 100644
index 00000000..9edc6b86
--- /dev/null
+++ b/leanring-buddy/StreamingResponsePanelManager.swift
@@ -0,0 +1,396 @@
+//
+//  StreamingResponsePanelManager.swift
+//  leanring-buddy
+//
+//  Manages the floating chat panel for text-mode interactions — the
+//  rounded dark panel anchored at the top-right of the screen the user
+//  summoned it on, hosting the user ↔ Clicky conversation. The panel is
+//  its own NSPanel because the cursor overlay is `ignoresMouseEvents =
+//  true` and would have swallowed clicks on the close button or the
+//  follow-up input field.
+//
+//  Panels observe `CompanionManager` directly via `@ObservedObject`, so
+//  the manager doesn't need to push every text-chunk update — SwiftUI's
+//  diffing re-renders the chat as `streamingResponseText`,
+//  `conversationHistory`, and `pendingUserPrompt` change.
+//
+//  Behavior:
+//  - One NSPanel, pinned to the screen the cursor was on when the chat
+//    was summoned (see `show`); panels left on other screens are torn
+//    down on the next summon.
+//  - Positioned top-right of `screen.visibleFrame` so it doesn't sit
+//    on top of the menu bar.
+//  - Slides further down when the mouse hovers the menu bar zone, so
+//    the user can interact with menu items without the panel covering
+//    them.
+//  - Non-activating: the panel can become key (so the input field
+//    gets focus) without stealing focus from the user's current app.
+//
+
+import AppKit
+import Combine
+import SwiftUI
+
+/// Subclass overriding `canBecomeKey` so the SwiftUI `TextField` inside
+/// the chat input row can become first responder. Without this, key
+/// presses inside a non-activating panel are silently dropped.
+private final class StreamingResponsePanel: NSPanel {
+    override var canBecomeKey: Bool { true }
+}
+
+/// Thin SwiftUI wrapper that observes `CompanionManager` and feeds the
+/// observable state into `StreamingResponsePanelView`. Lives here so the
+/// view itself stays a pure data → UI mapping (easier to preview and
+/// reason about).
+private struct StreamingResponsePanelHost: View {
+    @ObservedObject var companionManager: CompanionManager
+
+    var body: some View {
+        StreamingResponsePanelView(
+            conversationHistory: companionManager.conversationHistory,
+            pendingUserPrompt: companionManager.pendingUserPrompt,
+            streamingResponseText: companionManager.streamingResponseText,
+            // We're "processing" from the moment the user submits until
+            // the assistant response is committed to history. During the
+            // streaming window `streamingResponseText` is non-empty AND
+            // `pendingUserPrompt` is still set, so this flag toggles to
+            // false only once `pendingUserPrompt` clears.
+            isProcessingCurrentTurn: companionManager.pendingUserPrompt != nil,
+            onSubmitFollowUp: { followUpText in
+                companionManager.submitTextInput(followUpText)
+            },
+            onDismiss: {
+                companionManager.dismissStreamingResponse()
+            }
+        )
+    }
+}
+
+@MainActor
+final class StreamingResponsePanelManager {
+    /// Panels keyed by `NSScreen.frame`. At most one stays alive in
+    /// practice — `show` tears down panels on every other screen.
+    private var panelsByScreenFrame: [NSRect: StreamingResponsePanel] = [:]
+
+    /// Polling timer that keeps each panel's vertical position in sync
+    /// with whether the mouse is hovering the menu bar zone.
+    private var mouseTrackingTimer: Timer?
+
+    /// Combine subscriptions on the @Published state we want the panel
+    /// to react to. Belt-and-suspenders: even if `@ObservedObject`
+    /// inside the SwiftUI host view weren't propagating updates, these
+    /// force a manual rootView reassignment on every change.
+    private var stateSubscriptions: Set<AnyCancellable> = []
+
+    /// Vertical inset from the screen's visible-frame top in normal state.
+    private let normalTopInset: CGFloat = 12
+
+    /// Vertical inset when the mouse is in the menu bar zone — slides
+    /// the panel down further so it doesn't cover menu items.
+    private let menuBarHoverTopInset: CGFloat = 48
+
+    /// Horizontal inset from the screen's visible-frame right edge.
+    private let trailingInset: CGFloat = 16
+
+    /// How tall the menu-bar trigger zone is. Slightly taller than the
+    /// actual menu bar so the slide-down kicks in as the mouse approaches.
+    private let menuBarTriggerZoneHeight: CGFloat = 36
+
+    deinit {
+        mouseTrackingTimer?.invalidate()
+        // `stateSubscriptions` clears automatically when the manager
+        // deinits, but being explicit avoids a stray reference if a
+        // subscription's sink closure captured something we care about.
+        stateSubscriptions.removeAll()
+    }
+
+    // MARK: - Public API
+
+    /// Shows the chat panel on the screen the cursor is currently on,
+    /// bound to the given `CompanionManager`. Idempotent — calling while
+    /// already shown is a no-op for panel creation, but ensures the
+    /// SwiftUI hosting view is up-to-date.
+    func show(companionManager: CompanionManager) {
+        // Pin the chat to the screen the cursor is on at the moment of
+        // show. We deliberately don't follow the cursor across screens
+        // mid-chat — moving panels under the user as they drag their
+        // mouse to another monitor would feel jarring. Lock to one.
+        let mouseLocation = NSEvent.mouseLocation
+        let targetScreen = NSScreen.screens.first(where: { $0.frame.contains(mouseLocation) })
+            ?? NSScreen.main
+            ?? NSScreen.screens.first
+        guard let screen = targetScreen else { return }
+
+        // Tear down any panels we have on OTHER screens — happens if
+        // the user dismissed and re-summoned on a different monitor.
+        for (frame, oldPanel) in panelsByScreenFrame where frame != screen.frame {
+            oldPanel.orderOut(nil)
+            oldPanel.contentView = nil
+        }
+        panelsByScreenFrame = panelsByScreenFrame.filter { $0.key == screen.frame }
+
+        let panel = panelsByScreenFrame[screen.frame] ?? createPanel(for: screen)
+        panelsByScreenFrame[screen.frame] = panel
+
+        // Always (re)install the hosting view on every show. This
+        // guarantees a fresh `@ObservedObject` binding to the
+        // current `companionManager` rather than relying on a
+        // potentially stale view tree from a previous chat session.
+        installContentView(panel: panel, companionManager: companionManager)
+
+        // Initial sizing + positioning. After this, the timer only
+        // touches origin and the Combine sinks handle resizing on
+        // content changes.
+        resizePanelToFitContent(panel, on: screen)
+        repositionPanel(panel, on: screen, mouseLocation: mouseLocation)
+        // makeKeyAndOrderFront orders the window front AND makes it
+        // key in one shot — this is what's needed for the input
+        // field to actually receive focus. Calling makeKey() alone
+        // (without ordering) leaves the panel behind other windows.
+        panel.makeKeyAndOrderFront(nil)
+        panel.orderFrontRegardless()
+
+        startMouseTracking()
+        startObservingCompanionState(companionManager: companionManager)
+    }
+
+    /// Tears down all panels and stops the mouse-tracking timer. Safe to
+    /// call when no panel is shown.
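+    /// (In text mode this is reached from the auto-hide dwell task in
+    /// CompanionManager once the response has lingered long enough to read.)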
+    func hide() {
+        stopMouseTracking()
+        stateSubscriptions.removeAll()
+        for panel in panelsByScreenFrame.values {
+            panel.orderOut(nil)
+            // Drop the SwiftUI hosting view so its observers are released
+            // and any in-flight animations stop. Recreated on next show().
+            panel.contentView = nil
+        }
+        panelsByScreenFrame.removeAll()
+    }
+
+    /// Compatibility shim — the chat panel observes `CompanionManager`
+    /// directly now, so callers don't need to push text chunks. Kept as
+    /// a no-op so existing call sites don't break.
+    func updateResponseText(_ text: String) {
+        // Intentionally empty — see class doc comment.
+    }
+
+    // MARK: - Combine-driven Refresh (defensive)
+
+    /// Subscribes to the @Published properties the chat panel cares
+    /// about and forces a manual rootView reassignment on every change.
+    /// This duplicates work that `@ObservedObject` should already be
+    /// doing, but in practice `@ObservedObject` propagation through
+    /// `NSHostingView` can be unreliable when the hosting view is set
+    /// up imperatively from AppKit code (vs. declaratively in a SwiftUI
+    /// hierarchy). Re-assigning `rootView` is cheap and guarantees the
+    /// panel never goes stale.
+    private func startObservingCompanionState(companionManager: CompanionManager) {
+        stateSubscriptions.removeAll()
+
+        let refresh: (String) -> Void = { [weak self] reason in
+            guard let self else { return }
+            // Single source of truth for "state changed" logging — fires
+            // exactly once per @Published change, in contrast to a print
+            // inside `body` which fires on every body re-evaluation.
+            print("🔍 State change (\(reason)): pending=\(companionManager.pendingUserPrompt ?? "nil"), streaming=\(companionManager.streamingResponseText.count)c, history=\(companionManager.conversationHistory.count)")
+            self.refreshAllRootViews(companionManager: companionManager)
+        }
+
+        companionManager.$conversationHistory
+            .dropFirst() // skip the initial value emitted on subscribe
+            .receive(on: DispatchQueue.main)
+            .sink { _ in refresh("history") }
+            .store(in: &stateSubscriptions)
+
+        companionManager.$pendingUserPrompt
+            .dropFirst()
+            .receive(on: DispatchQueue.main)
+            .sink { _ in refresh("pending") }
+            .store(in: &stateSubscriptions)
+
+        companionManager.$streamingResponseText
+            .dropFirst()
+            .receive(on: DispatchQueue.main)
+            .sink { _ in refresh("streaming") }
+            .store(in: &stateSubscriptions)
+    }
+
+    /// Reassigns the SwiftUI rootView on every panel and re-fits the
+    /// panel size to the new content. Called only on actual @Published
+    /// state changes, not on every mouse-tracking tick.
+    private func refreshAllRootViews(companionManager: CompanionManager) {
+        for (screenFrame, panel) in panelsByScreenFrame {
+            guard let hostingView = panel.contentView as? NSHostingView<StreamingResponsePanelHost> else {
+                continue
+            }
+            hostingView.rootView = StreamingResponsePanelHost(companionManager: companionManager)
+            hostingView.layoutSubtreeIfNeeded()
+
+            // Resize to match the new content's natural size.
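+            // Note: this lookup keys off the frame captured at panel
+            // creation — if the display arrangement changed mid-chat, the
+            // resize is skipped until the next show() re-pins the panel.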
+ if let screen = NSScreen.screens.first(where: { $0.frame == screenFrame }) { + resizePanelToFitContent(panel, on: screen) + } + } + } + + // MARK: - Panel Creation + + private func createPanel(for screen: NSScreen) -> StreamingResponsePanel { + let panel = StreamingResponsePanel( + contentRect: NSRect( + x: 0, + y: 0, + width: StreamingResponsePanelView.panelWidth, + height: 200 + ), + styleMask: [.borderless, .nonactivatingPanel], + backing: .buffered, + defer: false + ) + + panel.isFloatingPanel = true + panel.level = .floating + panel.isOpaque = false + panel.backgroundColor = .clear + panel.hasShadow = false + panel.hidesOnDeactivate = false + panel.isExcludedFromWindowsMenu = true + panel.collectionBehavior = [.canJoinAllSpaces, .fullScreenAuxiliary] + panel.isMovableByWindowBackground = false + panel.titleVisibility = .hidden + panel.titlebarAppearsTransparent = true + + return panel + } + + private func installContentView(panel: StreamingResponsePanel, companionManager: CompanionManager) { + let hostingView = NSHostingView(rootView: StreamingResponsePanelHost(companionManager: companionManager)) + hostingView.wantsLayer = true + hostingView.layer?.backgroundColor = .clear + + // Set a generous initial size. `hostingView.fittingSize` returns + // (0, 0) before SwiftUI has run its layout pass, which would + // make the panel 0×0 and silently invisible. We start at a + // reasonable size and let the mouse-tracking timer's + // `positionPanel` calls re-measure and resize on each tick once + // SwiftUI has actually laid out the content. + let initialSize = NSSize( + width: StreamingResponsePanelView.panelWidth, + height: 200 + ) + hostingView.frame = NSRect(origin: .zero, size: initialSize) + panel.contentView = hostingView + panel.setContentSize(initialSize) + + // Force an immediate layout pass so the very first + // `positionPanel` call after this returns the right fitting + // size, rather than the placeholder we just set. + hostingView.layoutSubtreeIfNeeded() + } + + // MARK: - Position & Size + // + // Reposition and resize are split into two paths to prevent a + // SwiftUI layout-feedback loop. The mouse-tracking timer fires + // every 50ms; if it kept re-measuring `panel.contentView?.fittingSize` + // each tick, that read could subtly shift SwiftUI's layout, which + // would change the next fittingSize, ad infinitum — body re-evaluating + // dozens of times per second with no actual state change. + // + // Now: `repositionPanel(_:on:mouseLocation:)` only updates origin, + // using whatever the panel's *current* size is. Resizing happens + // exclusively in `resizePanelToFitContent(_:on:)`, called from the + // Combine subscriptions when the chat state actually changes. + + /// Updates the panel's origin to keep it pinned at top-right of the + /// screen's visible frame, sliding down when the mouse hovers the + /// menu bar zone. Does NOT change the panel's size. + private func repositionPanel(_ panel: StreamingResponsePanel, on screen: NSScreen, mouseLocation: NSPoint) { + let visibleFrame = screen.visibleFrame + let isMouseInMenuBarZone = isMouseHoveringMenuBar(on: screen, mouseLocation: mouseLocation) + let topInset = isMouseInMenuBarZone ? menuBarHoverTopInset : normalTopInset + + // Use the panel's CURRENT size — don't query fittingSize here. 
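+        // (If the current size is momentarily stale here, the next
+        // Combine-driven resizePanelToFitContent call re-anchors the
+        // origin itself, so the position self-corrects within one
+        // state change.)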
+ let currentSize = panel.frame.size + let panelOriginX = visibleFrame.maxX - currentSize.width - trailingInset + let panelOriginY = visibleFrame.maxY - currentSize.height - topInset + + let targetOrigin = NSPoint(x: panelOriginX, y: panelOriginY) + + if panel.frame.origin != targetOrigin { + NSAnimationContext.runAnimationGroup { context in + context.duration = 0.18 + context.timingFunction = CAMediaTimingFunction(name: .easeOut) + panel.animator().setFrameOrigin(targetOrigin) + } + } + } + + /// Re-measures `fittingSize` and resizes the panel to match. Called + /// from the Combine subscriptions when the chat content changes — + /// not from the timer. After resizing, also nudges the origin so the + /// panel stays anchored at top-right (since growing height would + /// otherwise push the panel below the visible frame). + private func resizePanelToFitContent(_ panel: StreamingResponsePanel, on screen: NSScreen) { + let measured = panel.contentView?.fittingSize ?? .zero + let contentSize = NSSize( + width: max(measured.width, StreamingResponsePanelView.panelWidth), + height: max(measured.height, 120) + ) + + // Skip if nothing changed — avoids unnecessary layout passes. + if abs(panel.frame.size.width - contentSize.width) < 0.5 + && abs(panel.frame.size.height - contentSize.height) < 0.5 { + return + } + + let visibleFrame = screen.visibleFrame + let mouseLocation = NSEvent.mouseLocation + let topInset = isMouseHoveringMenuBar(on: screen, mouseLocation: mouseLocation) + ? menuBarHoverTopInset + : normalTopInset + + let targetFrame = NSRect( + x: visibleFrame.maxX - contentSize.width - trailingInset, + y: visibleFrame.maxY - contentSize.height - topInset, + width: contentSize.width, + height: contentSize.height + ) + + NSAnimationContext.runAnimationGroup { context in + context.duration = 0.18 + context.timingFunction = CAMediaTimingFunction(name: .easeOut) + panel.animator().setFrame(targetFrame, display: true) + } + } + + private func isMouseHoveringMenuBar(on screen: NSScreen, mouseLocation: NSPoint) -> Bool { + let screenFrame = screen.frame + guard screenFrame.contains(mouseLocation) else { return false } + let menuBarBandLowerBound = screenFrame.maxY - menuBarTriggerZoneHeight + return mouseLocation.y >= menuBarBandLowerBound + } + + // MARK: - Mouse Tracking + + private func startMouseTracking() { + stopMouseTracking() + + // Timer only updates ORIGIN — never size. This was the source + // of the runaway re-render loop where querying fittingSize on + // every tick caused SwiftUI to re-layout, which changed + // fittingSize again, infinitely. + mouseTrackingTimer = Timer.scheduledTimer(withTimeInterval: 0.05, repeats: true) { [weak self] _ in + guard let self else { return } + let mouseLocation = NSEvent.mouseLocation + for (screenFrame, panel) in self.panelsByScreenFrame { + guard let screen = NSScreen.screens.first(where: { $0.frame == screenFrame }) else { + continue + } + self.repositionPanel(panel, on: screen, mouseLocation: mouseLocation) + } + } + } + + private func stopMouseTracking() { + mouseTrackingTimer?.invalidate() + mouseTrackingTimer = nil + } +} diff --git a/leanring-buddy/StreamingResponsePanelView.swift b/leanring-buddy/StreamingResponsePanelView.swift new file mode 100644 index 00000000..74b3656a --- /dev/null +++ b/leanring-buddy/StreamingResponsePanelView.swift @@ -0,0 +1,396 @@ +// +// StreamingResponsePanelView.swift +// leanring-buddy +// +// The floating chat panel that hosts the conversation between the user +// and Clicky during text-mode chat. 
Renders as a compact, polished +// message thread with: +// - Right-aligned blue bubbles for the user's prompts +// - Left-aligned flat text for Claude's responses (cleaner reading) +// - An animated "thinking dots" indicator while the AI is processing +// - A bottom input row for sending follow-up messages +// - A small X button in the top-right to dismiss the chat +// +// Designed to feel like a native macOS chat app — soft shadows, hairline +// borders, generous spacing, subtle animations on new content. +// + +import SwiftUI + +struct StreamingResponsePanelView: View { + /// Completed turns of the chat. Rendered top-to-bottom. + let conversationHistory: [CompanionConversationTurn] + /// The user's most recent prompt that's currently being processed. + /// Rendered as the latest user bubble below the completed history, + /// followed by either the streaming response or a thinking indicator. + let pendingUserPrompt: String? + /// Streamed assistant text for the in-flight turn. Empty during the + /// processing window before the first chunk arrives. + let streamingResponseText: String + /// True while the in-flight turn hasn't finished — drives whether + /// the bottom indicator shows the thinking dots or the streamed text. + let isProcessingCurrentTurn: Bool + /// Called when the user sends a follow-up via the input row. + let onSubmitFollowUp: (String) -> Void + /// Called when the user clicks the X close button. + let onDismiss: () -> Void + + /// Local input state for the bottom text field. Cleared on submit. + @State private var followUpText: String = "" + /// Drives the auto-focus behavior so the input field is ready + /// immediately when the panel appears. + @FocusState private var isInputFieldFocused: Bool + + /// Fixed panel width — wide enough to read multi-line responses + /// comfortably without dominating the screen. + static let panelWidth: CGFloat = 440 + + /// Vertical scroll target ID — placed at the very bottom of the + /// thread and used by `ScrollViewReader` to keep the latest content + /// in view as new chunks stream in. + private static let scrollAnchorBottomID = "chat.bottom" + + var body: some View { + VStack(spacing: 0) { + chatThread + inputBar + } + .frame(width: Self.panelWidth) + .frame(minHeight: 88) + .background(panelBackground) + .overlay(panelBorder) + .overlay(closeButton, alignment: .topTrailing) + .clipShape(RoundedRectangle(cornerRadius: 18, style: .continuous)) + .shadow(color: Color.black.opacity(0.45), radius: 24, x: 0, y: 12) + } + + // MARK: - Chat Thread + + private var chatThread: some View { + ScrollViewReader { scrollProxy in + ScrollView(.vertical, showsIndicators: false) { + LazyVStack(alignment: .leading, spacing: 16) { + // Completed turns + ForEach(conversationHistory) { turn in + ChatTurnView( + userMessage: turn.userMessage, + assistantResponse: turn.assistantResponse, + isAssistantStreaming: false, + isAssistantThinking: false + ) + .id(turn.id) + .onAppear { + print("📌 ChatTurnView appeared (history): user=\(turn.userMessage.prefix(20)), assistant.count=\(turn.assistantResponse.count)") + } + } + + // In-flight turn — the user's pending prompt + either + // the streaming response so far, or the thinking dots + // if no chunks have arrived yet. 
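+                    // The two flags passed below are mutually exclusive by
+                    // construction: both require isProcessingCurrentTurn,
+                    // and they branch on whether any streamed text has
+                    // arrived yet — thinking dots first, live text after.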
+                    if let pendingPrompt = pendingUserPrompt {
+                        ChatTurnView(
+                            userMessage: pendingPrompt,
+                            assistantResponse: streamingResponseText,
+                            isAssistantStreaming: isProcessingCurrentTurn && !streamingResponseText.isEmpty,
+                            isAssistantThinking: isProcessingCurrentTurn && streamingResponseText.isEmpty
+                        )
+                        .id("inflight")
+                        .onAppear {
+                            print("📌 ChatTurnView appeared (in-flight): pending=\(pendingPrompt.prefix(20)), streaming.count=\(streamingResponseText.count)")
+                        }
+                    }
+
+                    // Anchor for auto-scroll
+                    Color.clear
+                        .frame(height: 1)
+                        .id(Self.scrollAnchorBottomID)
+                }
+                .padding(.horizontal, 22)
+                .padding(.top, 38)
+                .padding(.bottom, 14)
+            }
+            .frame(maxHeight: 460)
+            // Auto-scroll to the latest content as the chat grows.
+            // Watching `streamingResponseText` keeps the view pinned to
+            // the bottom while text streams in chunk-by-chunk.
+            .onChange(of: streamingResponseText) { _ in
+                scrollToBottom(scrollProxy)
+            }
+            .onChange(of: pendingUserPrompt) { _ in
+                scrollToBottom(scrollProxy)
+            }
+            .onChange(of: conversationHistory.count) { _ in
+                scrollToBottom(scrollProxy)
+            }
+            .onAppear {
+                scrollToBottom(scrollProxy, animated: false)
+            }
+        }
+    }
+
+    private func scrollToBottom(_ proxy: ScrollViewProxy, animated: Bool = true) {
+        if animated {
+            withAnimation(.easeOut(duration: 0.22)) {
+                proxy.scrollTo(Self.scrollAnchorBottomID, anchor: .bottom)
+            }
+        } else {
+            proxy.scrollTo(Self.scrollAnchorBottomID, anchor: .bottom)
+        }
+    }
+
+    // MARK: - Input Bar
+
+    private var inputBar: some View {
+        VStack(spacing: 0) {
+            // Hairline divider between the chat thread and the input.
+            Rectangle()
+                .fill(Color.white.opacity(0.06))
+                .frame(height: 0.5)
+
+            HStack(spacing: 10) {
+                TextField("Ask a follow-up", text: $followUpText)
+                    .textFieldStyle(.plain)
+                    .focused($isInputFieldFocused)
+                    .font(.system(size: 13))
+                    .foregroundColor(.white)
+                    .tint(DS.Colors.overlayCursorBlue)
+                    .onSubmit {
+                        submitFollowUpIfNonEmpty()
+                    }
+
+                sendButton
+            }
+            .padding(.horizontal, 16)
+            .padding(.vertical, 12)
+            .background(
+                RoundedRectangle(cornerRadius: 12, style: .continuous)
+                    .fill(Color.white.opacity(0.05))
+            )
+            .overlay(
+                RoundedRectangle(cornerRadius: 12, style: .continuous)
+                    .stroke(Color.white.opacity(0.08), lineWidth: 0.5)
+            )
+            .padding(.horizontal, 14)
+            .padding(.top, 10)
+            .padding(.bottom, 14)
+            .onAppear {
+                // Auto-focus a beat after appear so the panel has fully
+                // become key — without the small delay the focus is
+                // sometimes stolen back as the panel finishes ordering in.
+                DispatchQueue.main.asyncAfter(deadline: .now() + 0.05) {
+                    isInputFieldFocused = true
+                }
+            }
+        }
+    }
+
+    private var sendButton: some View {
+        let trimmed = followUpText.trimmingCharacters(in: .whitespacesAndNewlines)
+        let canSubmit = !trimmed.isEmpty
+        return Button(action: submitFollowUpIfNonEmpty) {
+            Image(systemName: "arrow.up")
+                .font(.system(size: 11, weight: .bold))
+                .foregroundColor(.white)
+                .frame(width: 24, height: 24)
+                .background(
+                    Circle()
+                        .fill(canSubmit ? DS.Colors.overlayCursorBlue : Color.white.opacity(0.10))
+                )
+                .scaleEffect(canSubmit ? 1.0 : 0.94)
+                .animation(.easeOut(duration: 0.15), value: canSubmit)
+        }
+        .buttonStyle(.plain)
+        .pointerCursor(isEnabled: canSubmit)
+        .disabled(!canSubmit)
+    }
+
+    private func submitFollowUpIfNonEmpty() {
+        let trimmed = followUpText.trimmingCharacters(in: .whitespacesAndNewlines)
+        guard !trimmed.isEmpty else { return }
+        onSubmitFollowUp(trimmed)
+        followUpText = ""
+    }
+
+    // MARK: - Decoration
+
+    private var panelBackground: some View {
+        // Slightly elevated dark with a faint top-down gradient gives the
+        // panel depth without being noisy. Pure flat #101211 looks fine
+        // but the gradient adds a touch of polish at no cost.
+        LinearGradient(
+            colors: [
+                Color(red: 0.08, green: 0.09, blue: 0.10),
+                Color(red: 0.06, green: 0.07, blue: 0.07)
+            ],
+            startPoint: .top,
+            endPoint: .bottom
+        )
+    }
+
+    private var panelBorder: some View {
+        RoundedRectangle(cornerRadius: 18, style: .continuous)
+            .stroke(Color.white.opacity(0.10), lineWidth: 0.5)
+    }
+
+    private var closeButton: some View {
+        Button(action: onDismiss) {
+            Image(systemName: "xmark")
+                .font(.system(size: 9, weight: .bold))
+                .foregroundColor(.white.opacity(0.65))
+                .frame(width: 20, height: 20)
+                .background(
+                    Circle()
+                        .fill(Color.white.opacity(0.08))
+                )
+                .overlay(
+                    Circle()
+                        .stroke(Color.white.opacity(0.06), lineWidth: 0.5)
+                )
+        }
+        .buttonStyle(.plain)
+        .pointerCursor()
+        .padding(.top, 10)
+        .padding(.trailing, 10)
+    }
+}
+
+// MARK: - Per-Turn View
+
+/// Renders one user → assistant exchange. The user message is a
+/// right-aligned blue bubble (matching the cursor color and the floating
+/// input bubble's color so the conversation feels visually unified). The
+/// assistant response is left-aligned flat text — cleaner to read for
+/// longer responses, and avoids a "two-bubbles-stacked" look that feels
+/// busy in a compact panel.
+private struct ChatTurnView: View {
+    let userMessage: String
+    let assistantResponse: String
+    /// True while the assistant text is mid-stream — drives a subtle
+    /// trailing caret to signal "more is coming."
+    let isAssistantStreaming: Bool
+    /// True while we're still waiting for the very first response chunk.
+    /// Replaces the assistant text with an animated thinking indicator.
+    let isAssistantThinking: Bool
+
+    var body: some View {
+        VStack(alignment: .leading, spacing: 10) {
+            userBubble
+            assistantContent
+        }
+    }
+
+    private var userBubble: some View {
+        HStack(spacing: 0) {
+            Spacer(minLength: 48)
+            Text(userMessage)
+                .font(.system(size: 13, weight: .medium))
+                .foregroundColor(.white)
+                .multilineTextAlignment(.leading)
+                .fixedSize(horizontal: false, vertical: true)
+                .padding(.horizontal, 14)
+                .padding(.vertical, 9)
+                .background(
+                    RoundedRectangle(cornerRadius: 16, style: .continuous)
+                        .fill(DS.Colors.overlayCursorBlue)
+                )
+        }
+    }
+
+    @ViewBuilder
+    private var assistantContent: some View {
+        if isAssistantThinking {
+            HStack(spacing: 0) {
+                ThinkingDots()
+                    .padding(.horizontal, 4)
+                    .padding(.vertical, 6)
+                Spacer(minLength: 0)
+            }
+            .transition(.opacity)
+        } else {
+            HStack(spacing: 0) {
+                AssistantTextWithCursor(
+                    text: assistantResponse,
+                    showStreamingCursor: isAssistantStreaming
+                )
+                Spacer(minLength: 32)
+            }
+        }
+    }
+}
+
+/// The assistant's response text, optionally with a trailing blinking
+/// caret while text is still streaming. The caret is a subtle visual
+/// signal that the response isn't done yet — like the typing indicator
+/// in chat apps but inline at the end of the text.
+private struct AssistantTextWithCursor: View {
+    let text: String
+    let showStreamingCursor: Bool
+
+    var body: some View {
+        if showStreamingCursor {
+            // A state toggle inside withAnimation(.repeatForever) can't
+            // drive this blink: swapping the caret string is a discrete,
+            // non-animatable change, so the animation fires once and then
+            // stops. A TimelineView re-evaluates on a fixed cadence
+            // instead, toggling the caret every half second.
+            TimelineView(.periodic(from: .now, by: 0.5)) { context in
+                let caretOn = Int(context.date.timeIntervalSinceReferenceDate * 2) % 2 == 0
+                styledText(caret: caretOn ? "▌" : " ")
+            }
+        } else {
+            styledText(caret: nil)
+        }
+    }
+
+    // Use Text concatenation so the caret hugs the last character on the
+    // line — no awkward HStack wrapping when text spans multiple lines.
+    private func styledText(caret: String?) -> some View {
+        var combined = Text(text)
+        if let caret {
+            combined = combined + Text(caret)
+                .foregroundColor(DS.Colors.overlayCursorBlue)
+        }
+        return combined
+            .font(.system(size: 13, weight: .regular))
+            .foregroundColor(.white.opacity(0.95))
+            .multilineTextAlignment(.leading)
+            .fixedSize(horizontal: false, vertical: true)
+    }
+}
+
+/// Three small dots that pulse in sequence — the "AI is thinking"
+/// indicator shown while waiting for the first response chunk. Sized
+/// and timed to match the visual language of macOS native typing
+/// indicators rather than the heavier voice-mode spinner.
+private struct ThinkingDots: View {
+    @State private var isPulsing = false
+
+    var body: some View {
+        HStack(spacing: 5) {
+            ForEach(0..<3, id: \.self) { dotIndex in
+                Circle()
+                    .fill(Color.white.opacity(0.55))
+                    .frame(width: 6, height: 6)
+                    .scaleEffect(isPulsing ? 1.0 : 0.7)
+                    .opacity(isPulsing ? 1.0 : 0.4)
+                    // Staggered start delays make the dots peak one after
+                    // another instead of pulsing in lockstep.
+                    .animation(
+                        .easeInOut(duration: 0.6)
+                            .repeatForever(autoreverses: true)
+                            .delay(Double(dotIndex) * 0.2),
+                        value: isPulsing
+                    )
+            }
+        }
+        .onAppear { isPulsing = true }
+    }
+}
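+
+// MARK: - Preview
+
+// A minimal canvas sketch for eyeballing the panel states without running
+// the full voice/text pipeline. The history is left empty because
+// CompanionConversationTurn's initializer lives elsewhere; an empty history
+// plus a pending prompt exercises the in-flight branch. Tweak
+// `streamingResponseText` and `isProcessingCurrentTurn` to flip between the
+// thinking dots, the streaming caret, and the finished layout. The sample
+// strings and the print-based handlers are illustrative placeholders, not
+// app behavior.
+#if DEBUG
+struct StreamingResponsePanelView_Previews: PreviewProvider {
+    static var previews: some View {
+        StreamingResponsePanelView(
+            conversationHistory: [],
+            pendingUserPrompt: "What does this compiler error mean?",
+            streamingResponseText: "It usually means the types on either side",
+            isProcessingCurrentTurn: true,
+            onSubmitFollowUp: { prompt in print("follow-up: \(prompt)") },
+            onDismiss: { print("dismissed") }
+        )
+        .padding(40)
+        .background(Color.black)
+    }
+}
+#endif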