68 changes: 68 additions & 0 deletions AUDIT_LOG.md
@@ -2,6 +2,74 @@

This log tracks all significant changes, updates, and versions in the PaperCache project.

## 2026-07-01 (Web Speech API Transcription Priority & UI Responsiveness Fix)
**Change:** fix(memo): prioritize Web Speech API live transcription over Whisper API fallback to avoid 401 Unauthorized errors with OpenRouter API keys, and stop click propagation in MemoVoicePanel to prevent focus trapping and editor squishing.

**Details/Why:**
When the user recorded a voice note, `MemoVoicePanel` always called Whisper (`openaiTranscribe`) first whenever an API key was configured. If that key was an OpenRouter key (`sk-or-v1-...`), the request failed with `401 Unauthorized: Incorrect API key provided`; the error message overwrote the Web Speech API transcription, and the AI command restructuring step (`openAIChat`) was skipped. Separately, clicking inside the voice memo results panel caused the main app container to return focus to CodeMirror, and the panel's unbounded height squished the editor, so the app felt frozen and notes could not be edited or deleted. `MemoVoicePanel` now uses the Web Speech API transcription directly and invokes Whisper only when the Web Speech API captured nothing, stops click propagation to keep focus stable, and caps the panel's max-height so the editor remains fully accessible.
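
The transcript-source priority described above can be sketched as follows. This is a minimal illustration only: the real logic lives in `MemoVoicePanel.tsx`, which is not shown in this diff, and `choose_transcript` and its closure parameter are hypothetical names. Rust is used to match the backend code in this PR.

```rust
/// Hypothetical helper (not from the PR): prefer the live transcript and
/// call the fallback transcriber only when the live transcript is empty.
fn choose_transcript<F>(live: &str, fallback: F) -> Result<String, String>
where
    F: FnOnce() -> Result<String, String>,
{
    let trimmed = live.trim();
    if !trimmed.is_empty() {
        // Live transcription captured something: the fallback is never invoked,
        // so a failing fallback (for example a 401) cannot overwrite it.
        return Ok(trimmed.to_string());
    }
    // Nothing captured live: defer to the fallback and surface its error, if any.
    fallback()
}
```

Because the fallback is a closure, it is not evaluated at all when live text exists, which is the property that keeps a rejected API key from replacing a usable transcript.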

**Files changed:** `src/components/MemoVoicePanel.tsx`, `src/App.css`, `AUDIT_LOG.md`, `CHANGELOG.md`.

---

## 2026-07-01 (IPC Parameter Key Mapping Fix for `read_asset`)
**Change:** fix(api): map `assetPath` argument to `{ path: assetPath }` in `tauriApi.readAsset`

**Details/Why:**
The Tauri Rust backend command `pub async fn read_asset(path: String)` expects an argument object keyed by `path`. The frontend wrapper in `src/api.ts` was passing `{ assetPath }`, resulting in a Tauri IPC error (`invalid args 'path' for command 'read_asset': command read_asset missing required key path`) whenever the app attempted to load saved voice recordings or image assets. Updated `src/api.ts` to pass `{ path: assetPath }`.
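
To make the key mismatch concrete, the sketch below derives the JavaScript-side key a frontend would send for a given Rust parameter name, assuming Tauri's default convention of mapping snake_case Rust parameters to camelCase argument keys. `expected_js_key` is a hypothetical helper written for illustration; it is not part of Tauri or of this PR.

```rust
/// Hypothetical helper: convert a snake_case Rust parameter name into the
/// camelCase key expected in the frontend argument object (assumed default
/// Tauri convention).
fn expected_js_key(rust_param: &str) -> String {
    let mut out = String::with_capacity(rust_param.len());
    let mut upper_next = false;
    for ch in rust_param.chars() {
        if ch == '_' {
            // Drop the underscore and capitalize the following character.
            upper_next = true;
        } else if upper_next {
            out.extend(ch.to_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}
```

Under that assumption, `read_asset(path: String)` needs the key `path`, so `{ assetPath }` can never match, while `openai_transcribe(file_path, base_url)` would be called with `filePath` and `baseUrl`.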

**Files changed:** `src/api.ts`, `AUDIT_LOG.md`, `CHANGELOG.md`.

---

## 2026-07-01 (Voice Memo Overlay Visibility, Race Conditions & CSP Fixes)
**Change:** fix(memo): show overlay window before recording/processing, fix early release race condition before getUserMedia resolves, update CSP `media-src` to permit local audio playback, and ensure processing/error states stay visible

**Details/Why:**
1. **Overlay Window Visibility (`MemoVoicePanel.tsx`)**: When recording started via global shortcut while the main app was hidden, the floating `voice-indicator` window received the recording event but remained hidden (`visible: false` in `tauri.conf.json`). Added `getCurrentWindow().show()` when starting recording or entering processing/done states when `isOverlay` is true so the user always sees the live indicator and playback pill.
2. **Push-to-Talk Race Condition**: If the user released `Cmd+Shift+M` before `getUserMedia` finished initializing, `trigger-voice-memo-release` ignored the release because `panelStateRef.current` was still `'idle'`. Removed early returns from release handlers and added delay scheduling so quick taps reliably capture audio and stop recording cleanly.
3. **Audio Playback CSP (`tauri.conf.json`)**: Updated Content Security Policy to include `media-src 'self' data: blob: file: https:;` and expanded `img-src`/`connect-src` so recorded audio pills can load and play without browser CSP blocks.
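
The push-to-talk race in item 2 can be modeled as a small state machine. This is one possible approach under stated assumptions, not the code in `MemoVoicePanel.tsx` (which is TypeScript and not shown here): `Phase`, `Action`, and `PushToTalk` are hypothetical names, and the "remember the release while the microphone is still starting" flag is an assumed design.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Phase {
    Idle,
    Starting, // shortcut pressed, microphone request still pending
    Recording,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    None,
    StartMic,
    StopNow,
    StopAfterMinimum, // stop once a minimum amount of audio exists
}

struct PushToTalk {
    phase: Phase,
    release_pending: bool,
}

impl PushToTalk {
    fn new() -> Self {
        Self { phase: Phase::Idle, release_pending: false }
    }

    fn on_press(&mut self) -> Action {
        if self.phase != Phase::Idle {
            return Action::None; // keyboard auto-repeat while already active
        }
        self.phase = Phase::Starting;
        self.release_pending = false;
        Action::StartMic
    }

    fn on_release(&mut self) -> Action {
        match self.phase {
            Phase::Idle => Action::None,
            Phase::Starting => {
                // Do not drop the release: remember it until the mic is ready.
                self.release_pending = true;
                Action::None
            }
            Phase::Recording => {
                self.phase = Phase::Idle;
                Action::StopNow
            }
        }
    }

    fn on_mic_ready(&mut self) -> Action {
        if self.phase != Phase::Starting {
            return Action::None;
        }
        if self.release_pending {
            self.release_pending = false;
            self.phase = Phase::Idle;
            Action::StopAfterMinimum
        } else {
            self.phase = Phase::Recording;
            Action::None
        }
    }
}
```

A quick tap then yields `StartMic` on press, `None` on the early release, and `StopAfterMinimum` once the microphone resolves, instead of leaving the recorder running.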

**Files changed:** `src/components/MemoVoicePanel.tsx`, `src-tauri/tauri.conf.json`, `AUDIT_LOG.md`, `CHANGELOG.md`.

---

## 2026-07-01 (Voice Memo macOS Audio Permission, IPC Listener Stability & Default State Fix)
**Change:** fix(memo): add `NSMicrophoneUsageDescription` in `Info.plist`, eliminate IPC listener re-registration race conditions using stable refs, ensure `memoEnabled` defaults to true, and require window focus before intercepting shortcuts

**Details/Why:**
1. **macOS Microphone Permission (`Info.plist`)**: Created `src-tauri/Info.plist` containing `NSMicrophoneUsageDescription` and configured `"infoPlist": "Info.plist"` in `tauri.conf.json`. Without this explicit plist description, macOS CoreAudio / TCC blocks WKWebView `getUserMedia` requests.
2. **IPC Event Listener Stability & Race Conditions**: Replaced stateful `panelState` dependencies in `MemoVoicePanel.tsx` with stable `panelStateRef` and `isRecordingRequestedRef`. This prevents event listeners from unregistering and dropping `trigger-voice-memo-release` events over IPC when transitioning from idle to recording. Added an explicit error card that renders when microphone access fails, instead of silently returning to idle.
3. **Default State & Window Focus Routing**: Updated `useSettingsStore` and added an auto-upgrade in `App.tsx` so voice memos are enabled by default (`memoEnabled: true`). Updated `shortcuts.rs` so that if `main_win` is unfocused or hidden while the user holds `Cmd+Shift+M`, recording routes to the bottom-left floating overlay window.
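
The routing rule in item 3 reduces to a single predicate. The sketch below mirrors the `is_visible().unwrap_or(false) && is_focused().unwrap_or(false)` check in `shortcuts.rs`; `Target` and `route_voice_memo` are hypothetical names, and `Option<bool>` stands in for a window query that may fail.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Target {
    MainWindow,
    Overlay,
}

/// Hypothetical helper: decide which window receives the voice-memo event.
/// `None` models a failed visibility or focus query and is treated as `false`.
fn route_voice_memo(main_visible: Option<bool>, main_focused: Option<bool>) -> Target {
    let visible = main_visible.unwrap_or(false);
    let focused = main_focused.unwrap_or(false);
    if visible && focused {
        Target::MainWindow
    } else {
        // Hidden, unfocused, or unknown state: fall back to the floating overlay.
        Target::Overlay
    }
}
```

Treating an unknown state as "not focused" means a failed query sends the event to the overlay rather than to a window the user may not be able to see.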

**Files changed:** `src-tauri/Info.plist` [NEW], `src-tauri/tauri.conf.json`, `src-tauri/src/commands/shortcuts.rs`, `src/store/useSettingsStore.ts`, `src/App.tsx`, `src/components/MemoVoicePanel.tsx`, `AUDIT_LOG.md`.

---

## 2026-07-01 (Voice Memo Push-to-Talk Press/Release Fix & Stop Button Removal)
**Change:** fix(memo): resolve global shortcut push-to-talk press vs release events (`Cmd+Shift+M`), remove unnecessary stop button, prevent empty audio blobs on quick release, and add explicit error reporting for AI transcription

**Details/Why:**
1. **Push-to-Talk Press & Release Handling**: Updated global shortcut registration (`shortcuts.rs`) to emit distinct `trigger-voice-memo-press` on key press and `trigger-voice-memo-release` on key release. Updated `MemoVoicePanel.tsx` to ignore keyboard auto-repeats while recording and stop recording upon key release.
2. **Stop Button Removal**: Removed the `■ Stop` button from the recording pillbox since push-to-talk recording automatically stops when releasing `Cmd+Shift+M`.
3. **Audio Integrity & Error Reporting**: Added `stopRecordingSafe` to ensure at least 400ms of audio is captured on rapid key release, preventing 0-byte unplayable audio blobs. Updated `openai_transcribe` base URL logic and added explicit frontend error display if transcription or AI interpretation fails.
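
The 400ms minimum in item 3 amounts to delaying the stop by whatever part of the minimum has not yet elapsed. A minimal sketch of that arithmetic follows; `stopRecordingSafe` itself is TypeScript and not shown in this diff, so `MIN_CAPTURE` and `remaining_before_stop` are hypothetical names.

```rust
use std::time::Duration;

/// Minimum capture length stated in the audit log entry.
const MIN_CAPTURE: Duration = Duration::from_millis(400);

/// Hypothetical helper: how long to keep recording after the key is released
/// so that at least `MIN_CAPTURE` of audio exists. Returns zero once the
/// minimum has already been reached.
fn remaining_before_stop(elapsed: Duration) -> Duration {
    MIN_CAPTURE.saturating_sub(elapsed)
}
```

For example, a release 150ms after recording starts would wait another 250ms before stopping, while a release after 900ms stops immediately.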

**Files changed:** `src-tauri/src/commands/shortcuts.rs`, `src/components/MemoVoicePanel.tsx`, `AUDIT_LOG.md`.

---

## 2026-07-01 (Voice Memo Plugin & Floating Indicator Overlay)
**Change:** feat(memo): implement voice memo plugin support with hold-to-record global shortcut (`Cmd+Shift+M`), custom waveform pillbox player, floating background overlay indicator, and AI restructuring

**Details/Why:**
1. **Hold-to-Record & Global Shortcut**: Registered `Cmd+Shift+M` global shortcut. Added floating overlay window (`voice-indicator`) that appears in the bottom-left corner when PaperCache is hidden or unfocused, displaying real-time recording waveform and status.
2. **Custom Audio Pillbox & Waveform Player**: Replaced standard mic icon and default `<audio>` element with a custom sleek pillbox featuring a play/pause button (`AudioWaveformPill`) and dynamic CSS waveform animation (`.memo-waveform-visual`, `.memo-wave-bar`).
3. **Transcription & AI Processing**: Captured speech is transcribed and rendered in gray italic text (`.memo-gray-slanted`), then processed via the user's configured AI model with PaperCache slash command context to produce structured action items inserted directly into the active note.
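
The bottom-left placement from item 1 can be expressed as a pure function. The constants 20 and 350 are taken from the `shortcuts.rs` change in this PR; the function name and the tuple return type are illustrative assumptions.

```rust
/// Hypothetical helper: compute the overlay's logical position from the
/// monitor's physical height and scale factor.
fn overlay_position(physical_height: u32, scale_factor: f64) -> (f64, f64) {
    const MARGIN_X: f64 = 20.0; // distance from the left edge
    const OFFSET_FROM_BOTTOM: f64 = 350.0; // distance of the window top from the bottom edge

    // Convert physical pixels to logical pixels before positioning.
    let logical_height = physical_height as f64 / scale_factor;
    (MARGIN_X, logical_height - OFFSET_FROM_BOTTOM)
}
```

Dividing by the scale factor first keeps the overlay at the same visual offset on standard and high-density displays: a 2160-pixel-tall monitor at scale 2.0 and a 1080-pixel-tall monitor at scale 1.0 both give `(20.0, 730.0)`.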

**Files changed:** `src-tauri/Cargo.toml`, `src-tauri/tauri.conf.json`, `src-tauri/src/commands/ai.rs`, `src-tauri/src/commands/fs.rs`, `src-tauri/src/commands/shortcuts.rs`, `src-tauri/src/lib.rs`, `src/App.css`, `src/App.tsx`, `src/main.tsx`, `src/components/MemoVoicePanel.tsx` [NEW], `src/components/Editor.tsx`, `src/hooks/useGlobalHotkey.ts`, `src/Settings.tsx`, `src/store/useSettingsStore.ts`, `src/api.ts`, `src/types.d.ts`, `src/setupTests.ts`, `CHANGELOG.md`, `AUDIT_LOG.md`.

---

## 2026-07-01 (v0.5.9 Release: Image Support & UI Consistency)
**Change:** feat(release): bump version to 0.5.9; implement image paste support and markdown image widget; align background blur and font typography across modals and timers; extract audio recording features to external project

14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- **Voice Memo Plugin & Floating Overlay**: Press and hold `Command+Shift+M` from anywhere to record voice notes. When PaperCache is hidden or unfocused, a floating bottom-left waveform pillbox indicator displays your live recording status.
- **Custom Waveform Player & AI Restructuring**: Voice notes feature sleek audio playback pillboxes with animated waveforms, live transcriptions in gray italic text, and automatic restructuring into structured action items using your configured AI model and PaperCache slash commands.


### Fixed
- **Voice Memo Transcription Priority & Responsiveness**: Fixed an issue where recording a voice note with an OpenRouter API key configured resulted in `401 Unauthorized` errors because the app always attempted Whisper API transcription before AI restructuring. Voice memos now use the Web Speech API transcription directly and send it to your configured AI model (`openAIChat`) to format PaperCache slash commands.
- **Voice Panel Focus Trapping & Editor Layout**: Fixed an issue where clicking inside the voice memo result box caused the app container to force focus away, while large result blocks squished the note editor. Added click propagation stopping and maximum panel height limits so you can easily type under voice notes, delete notes, and interact with the editor normally.
- **Asset Reading (`read_asset`) IPC Mapping**: Fixed an issue where reading saved voice note audio files or pasted images threw `invalid args 'path' for command 'read_asset'` due to a parameter key mismatch between the frontend and Tauri backend.
- **Floating Overlay Visibility (`Command+Shift+M`)**: Fixed an issue where recording via global shortcut while the main app was hidden recorded audio in the background but failed to reveal the floating bottom-left waveform player and transcript result. The overlay indicator window now automatically shows and focuses when recording or processing voice notes.
- **Push-to-Talk Race Condition & Audio Playback**: Fixed an issue where releasing `Command+Shift+M` immediately before microphone access resolved would ignore the release event. Also updated Content Security Policy (`media-src`) to allow recorded audio waveform pills to play smoothly.
- **macOS Microphone Permission (`Info.plist`)**: Added `NSMicrophoneUsageDescription` in the macOS app bundle `Info.plist` so macOS CoreAudio properly grants microphone access instead of silently blocking audio recording.
- **Push-to-Talk Press & Release Recording**: Fixed an issue where holding `Command+Shift+M` immediately stopped recording due to keyboard auto-repeat events. Replaced unnecessary Stop button with seamless press-to-record and release-to-stop behavior.
- **Audio & Transcription Error Display**: Resolved 0-byte audio creation on rapid shortcut release and ensured explicit error messages appear in the transcript block if API keys or endpoints fail. Also ensured the voice memo plugin is enabled by default for all users.

## [v0.5.9] - 2026-07-01

17 changes: 17 additions & 0 deletions src-tauri/Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion src-tauri/Cargo.toml
@@ -18,7 +18,7 @@ tauri-plugin-updater = "2.0.0"
 tauri-plugin-global-shortcut = "2.0.0"
 tauri-plugin-dialog = "2.0.0"
 
-reqwest = { version = "0.11", features = ["json", "stream"] }
+reqwest = { version = "0.11", features = ["json", "stream", "multipart"] }
 tokio = { version = "1", features = ["full"] }
 keyring = { version = "3", features = ["apple-native", "windows-native", "linux-native"] }
 serde = { version = "1", features = ["derive"] }
8 changes: 8 additions & 0 deletions src-tauri/Info.plist
@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+<plist version="1.0">
+<dict>
+    <key>NSMicrophoneUsageDescription</key>
+    <string>PaperCache requires microphone access to record voice notes and memos.</string>
+</dict>
+</plist>
92 changes: 92 additions & 0 deletions src-tauri/src/commands/ai.rs
@@ -1,4 +1,5 @@
 use keyring::Entry;
+use reqwest::multipart::{Form, Part};
 use reqwest::Client;
 use serde_json::json;

@@ -68,3 +69,94 @@ pub async fn openai_chat(
         .await
         .map_err(|e| format!("Failed to parse API response: {}", e))
 }
+
+#[tauri::command]
+pub async fn openai_transcribe(
+    file_path: String,
+    base_url: String,
+) -> Result<String, String> {
+    if file_path.trim().is_empty() {
+        return Err("Invalid file path provided".into());
+    }
+
+    let entry = Entry::new(SERVICE_NAME, "openai_api_key")
+        .map_err(|e| format!("Failed to access keyring: {}", e))?;
+    let api_key = entry
+        .get_password()
+        .map_err(|_| "API key not found. Please set it in settings.".to_string())?;
+
+    let resolved_path = if std::path::Path::new(&file_path).exists() {
+        std::path::PathBuf::from(&file_path)
+    } else {
+        let clean = file_path.trim_start_matches('/');
+        crate::commands::fs::get_papercache_dir()
+            .map_err(|e| format!("Failed to get app directory: {}", e))?
+            .join(clean)
+    };
+
+    let file_bytes = tokio::fs::read(&resolved_path)
+        .await
+        .map_err(|e| format!("Failed to read audio file ({}): {}", resolved_path.display(), e))?;
+
+    let file_name = std::path::Path::new(&file_path)
+        .file_name()
+        .and_then(|n| n.to_str())
+        .unwrap_or("audio.webm")
+        .to_string();
+
+    let client = Client::new();
+
+    let mut base = if base_url.is_empty()
+        || base_url.contains("openrouter.ai")
+        || base_url.contains("googleapis.com")
+        || base_url.contains("anthropic.com")
+    {
+        DEFAULT_BASE_URL.to_string()
+    } else {
+        base_url.trim_end_matches('/').to_string()
+    };
+    if !base.ends_with("/audio/transcriptions") {
+        base.push_str("/audio/transcriptions");
+    }
+
+    let part = Part::bytes(file_bytes)
+        .file_name(file_name)
+        .mime_str("application/octet-stream")
+        .map_err(|e| format!("Failed to create multipart part: {}", e))?;
+
+    let form = Form::new()
+        .part("file", part)
+        .text("model", "whisper-1");
+
+    let response = client
+        .post(&base)
+        .header("Authorization", format!("Bearer {}", api_key))
+        .multipart(form)
+        .send()
+        .await
+        .map_err(|e| format!("Network request failed: {}", e))?;
+
+    if !response.status().is_success() {
+        let status = response.status();
+        let error_text = response
+            .text()
+            .await
+            .unwrap_or_else(|_| "Unknown error".to_string());
+        return Err(format!(
+            "Transcription API request failed with status {}: {}",
+            status, error_text
+        ));
+    }
+
+    let res_json: serde_json::Value = response
+        .json()
+        .await
+        .map_err(|e| format!("Failed to parse transcription API response: {}", e))?;
+
+    if let Some(text) = res_json.get("text").and_then(|t| t.as_str()) {
+        Ok(text.to_string())
+    } else {
+        Err("No transcript returned from API".to_string())
+    }
+}
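
The endpoint resolution inside `openai_transcribe` can be isolated as a pure function for reading or testing. The sketch below mirrors the branch structure in the hunk above; `transcription_endpoint` is a hypothetical name, and the value of `DEFAULT_BASE_URL` is an assumption because its definition is not part of this diff.

```rust
// Assumption: the real constant is defined elsewhere in ai.rs and is not shown.
const DEFAULT_BASE_URL: &str = "https://api.openai.com/v1";

/// Hypothetical extraction of the base-URL logic: providers listed here fall
/// back to the default base, and the transcription path is appended once.
fn transcription_endpoint(base_url: &str) -> String {
    const FALLBACK_HOSTS: [&str; 3] = ["openrouter.ai", "googleapis.com", "anthropic.com"];

    let use_default =
        base_url.is_empty() || FALLBACK_HOSTS.iter().any(|host| base_url.contains(*host));

    let mut base = if use_default {
        DEFAULT_BASE_URL.to_string()
    } else {
        base_url.trim_end_matches('/').to_string()
    };
    // Avoid doubling the suffix when the caller already passed the full endpoint.
    if !base.ends_with("/audio/transcriptions") {
        base.push_str("/audio/transcriptions");
    }
    base
}
```

Note that this only selects the URL. The bearer token still comes from the single `openai_api_key` keyring entry, so a key issued by another provider would presumably still be rejected by the default endpoint, which is consistent with the 401 described in the audit log.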

7 changes: 4 additions & 3 deletions src-tauri/src/commands/fs.rs
@@ -534,9 +534,10 @@ pub async fn save_asset(data_base64: String, ext: String, folder: String) -> Res
     let prefix = folder_name.trim_start_matches('.');
 
     // Generate unique filename with random suffix to avoid collisions
-    use rand::Rng;
-    let mut rng = rand::thread_rng();
-    let random_suffix: u32 = rng.gen();
+    let random_suffix: u32 = {
+        use rand::Rng;
+        rand::thread_rng().gen()
+    };
     let filename = format!("{}_{}_{:08x}.{}", prefix, timestamp, random_suffix, clean_ext);
     let file_path = asset_dir.join(&filename);
 
46 changes: 39 additions & 7 deletions src-tauri/src/commands/shortcuts.rs
@@ -15,7 +15,43 @@ impl Default for GlobalShortcutState {
     }
 }
 
-fn handle_shortcut_trigger(app: &AppHandle, action: &str) {
+fn handle_shortcut_trigger(app: &AppHandle, action: &str, state: ShortcutState) {
+    if action == "voice-memo" {
+        let event_name = if state == ShortcutState::Pressed {
+            "trigger-voice-memo-press"
+        } else {
+            "trigger-voice-memo-release"
+        };
+        let mut handled = false;
+        if let Some(main_win) = app.get_webview_window("main") {
+            if main_win.is_visible().unwrap_or(false) && main_win.is_focused().unwrap_or(false) {
+                let _ = main_win.emit(event_name, ());
+                handled = true;
+            }
+        }
+        if !handled {
+            if let Some(ind_win) = app.get_webview_window("voice-indicator") {
+                if state == ShortcutState::Pressed {
+                    if let Ok(Some(monitor)) = ind_win.current_monitor() {
+                        let size = monitor.size();
+                        let scale = monitor.scale_factor();
+                        let logical_height = size.height as f64 / scale;
+                        let x = 20.0;
+                        let y = logical_height - 350.0;
+                        let _ = ind_win.set_position(tauri::Position::Logical(tauri::LogicalPosition { x, y }));
+                    }
+                    let _ = ind_win.show();
+                }
+                let _ = ind_win.emit(event_name, ());
+            }
+        }
+        return;
+    }
+
+    if state != ShortcutState::Pressed {
+        return;
+    }
+
     if action == "new-note" {
         if let Some(window) = app.get_webview_window("main") {
             if !window.is_visible().unwrap_or(false) {
@@ -54,9 +90,7 @@ pub fn update_global_shortcut(
         let action_clone = action.clone();
         app.global_shortcut()
             .on_shortcut(shortcut, move |app, _shortcut, event| {
-                if event.state() == ShortcutState::Pressed {
-                    handle_shortcut_trigger(app, &action_clone);
-                }
+                handle_shortcut_trigger(app, &action_clone, event.state());
             })
             .map_err(|e| format!("Failed to register shortcut: {}", e))?;
     }
@@ -87,9 +121,7 @@ pub fn resume_shortcuts(app: AppHandle) -> Result<(), String> {
             let _ = app
                 .global_shortcut()
                 .on_shortcut(shortcut, move |app, _, event| {
-                    if event.state() == ShortcutState::Pressed {
-                        handle_shortcut_trigger(app, &action_clone);
-                    }
+                    handle_shortcut_trigger(app, &action_clone, event.state());
                 });
         }
     }
1 change: 1 addition & 0 deletions src-tauri/src/lib.rs
@@ -178,6 +178,7 @@ pub fn run() {
             commands::keychain::safe_storage_encrypt,
             commands::keychain::safe_storage_decrypt,
             commands::ai::openai_chat,
+            commands::ai::openai_transcribe,
             commands::shortcuts::update_global_shortcut,
             commands::shortcuts::pause_shortcuts,
             commands::shortcuts::resume_shortcuts,