Skip to content

Add text input mode and bring-your-own-key API support#80

Open
uziiuzair wants to merge 2 commits intofarzaa:mainfrom
uziiuzair:feature/text-input-and-byok-keys
Open

Add text input mode and bring-your-own-key API support#80
uziiuzair wants to merge 2 commits intofarzaa:mainfrom
uziiuzair:feature/text-input-and-byok-keys

Conversation

@uziiuzair
Copy link
Copy Markdown

@uziiuzair uziiuzair commented May 3, 2026

Summary

  • Text input mode — a floating green chat bubble (ChatInputBubbleManager / ChatInputBubbleView) summoned near the cursor when "text" is selected in the menu bar panel. The same ctrl + option hotkey toggles between voice dictation and the text bubble. Responses render in a dedicated top-right StreamingResponsePanel (its own NSPanel, one per screen) rather than on the click-through cursor overlay, so the close button and follow-up input field actually receive clicks. Both voice and text paths terminate at the existing sendTranscriptToClaudeWithScreenshot pipeline — screenshot capture, Claude streaming, TTS, and [POINT:] pointing all behave identically.
  • Processing shimmer — full-screen Apple-Intelligence-style edge glow rendered during AI processing (ProcessingShimmerManager / ProcessingShimmerView). Click-through (ignoresMouseEvents = true), sits at .screenSaver level alongside the cursor overlay, one panel per connected display. Visible only while voice state is .processing, hidden the rest of the time.
  • Bring-your-own-key (BYOK) API support — provider picker in the menu bar panel: Clicky proxy (default, no setup), user-supplied Anthropic key, or user-supplied OpenAI key. TTS and voice transcription still route through the Worker in all modes. User keys are persisted to the macOS Keychain (new KeychainHelper.swift) — never UserDefaults.
  • Cursor summon-on-demand — the blue cursor overlay is now hidden by default and only revealed during voice interactions, onboarding, and when Claude returns a [POINT:x,y:label:screenN] tag. Overlay panels stay alive across visibility transitions to avoid races with multi-monitor coordinate mapping.

AGENTS.md is updated to document the input-mode picker, cursor visibility model, and the new files.

Test plan

  • Voice mode: hold ctrl + option, speak, release — waveform appears, transcript is sent, Claude responds with TTS playback
  • Text mode: tap ctrl + option — green bubble appears near the cursor, type a prompt, press return — response streams into the top-right StreamingResponsePanel; close button and the follow-up input row both work
  • Processing shimmer: edge glow appears on every connected display while the AI is processing, disappears the moment streaming text starts, and clicks pass through it to underlying windows
  • Cursor visibility: cursor stays hidden during a normal chat response and only flies out for [POINT:...] responses
  • BYOK Anthropic: paste a real Anthropic key in the panel, send a prompt — request goes direct to api.anthropic.com
  • BYOK OpenAI: paste a real OpenAI key, send a prompt — response uses a GPT model
  • Persistence: quit and relaunch — input mode, provider choice, and stored keys all survive (mode/provider via UserDefaults; keys via Keychain)
  • Multi-monitor: trigger a [POINT:] response on a non-primary screen — cursor flies to the correct monitor

🤖 Generated with Claude Code

uziiuzair added 2 commits May 3, 2026 18:14
- Floating chat bubble (ChatInputBubbleManager / ChatInputBubbleView)
  shown near the cursor when input mode is .text. The same ctrl+option
  hotkey toggles dictation in voice mode or summons the bubble in text
  mode, and both paths terminate at the existing screenshot → Claude →
  TTS → pointing pipeline.
- Hide the blue cursor overlay by default; reveal it only during voice
  interactions, onboarding, and when Claude returns a [POINT:...] tag.
- Provider picker: route chat through the Clicky Worker proxy (default),
  the user's own Anthropic key, or the user's own OpenAI key. TTS and
  voice transcription still go through the proxy in all modes.
- Store user-supplied API keys in the macOS Keychain via the new
  KeychainHelper instead of UserDefaults.
- Document the new input modes, cursor visibility model, and added
  files in AGENTS.md.
- Move the top-right text-mode chat box out of the click-through cursor
  overlay (OverlayWindow) and into its own NSPanel
  (StreamingResponsePanelManager / StreamingResponsePanelView). The
  cursor overlay is ignoresMouseEvents=true, which was swallowing
  clicks on the close button and follow-up input field.
- Add ProcessingShimmerManager / ProcessingShimmerView — a full-screen
  click-through edge-glow shimmer rendered during AI processing for
  Apple-Intelligence-style visual feedback. One panel per connected
  display, sits at .screenSaver level, visible only while the AI is
  processing.
- Adjust ChatInputBubble, CompanionManager, and screen capture wiring
  so voice/text-state transitions drive the new panels and shimmer.
@uziiuzair uziiuzair marked this pull request as ready for review May 3, 2026 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant