Development Thoughts and Analysis

2025-11-16: Initial CLAUDE.md Creation

Context

User requested analysis of NeuralNote codebase to create CLAUDE.md
Initially in /Applications/NeuralNote.app/ (compiled app bundle, not source)
Cloned source from https://github.com/DamRsn/NeuralNote.git to /tmp/NeuralNote/

Repository Analysis

Files Reviewed:

README.md - User-facing documentation, installation, usage workflow
CMakeLists.txt - Build configuration, dependencies, target setup
PACKAGING.md - Release packaging instructions
build.sh - macOS/Linux build script with ONNXRuntime dependency management
Key headers: PluginProcessor.h, TranscriptionManager.h, BasicPitch.h

Architecture Findings:

Two-layer architecture:
- NeuralNote/ - JUCE plugin wrapper + UI (plugin-specific)
- Lib/ - Core transcription engine (reusable library)
State machine pattern:
- Empty → Recording → Processing → Populated
- Managed by PluginProcessor with atomic state variable
Transcription pipeline:
- Audio (22.05kHz) → Features (CQT via ONNXRuntime) → CNN (RTNeural) → Posteriorgrams → Note Events
- Non-causal (processes backward), prevents real-time transcription
Threading:
- Audio thread: processBlock() for recording/playback
- Background ThreadPool: Transcription jobs via TranscriptionManager
- Timer callbacks: Poll for completion and UI updates
Build system complexity:
- Custom ONNXRuntime builds from libonnxruntime-neuralnote
- BasicPitchCNN compiled separately with forced optimization
- Platform-specific handling for Windows MSVC versions

CLAUDE.md Design Decisions

Included:

Platform-specific build commands (first thing developers need)
High-level architecture diagram (directory structure)
Pipeline explanation (audio → MIDI flow)
State machine and threading model (critical for understanding async behavior)
Special compilation notes (BasicPitchCNN optimization, ONNXRuntime dependencies)
Reusability guidance (Lib/Model as standalone library)

Excluded (per instructions):

Generic development practices
Obvious file listings that can be discovered via ls/grep
Common development tasks sections
Support/documentation boilerplate

Key Insights for Future Development

Performance bottleneck: CNN inference is so critical that it gets -O3 even in Debug builds
Model split: BasicPitch split into 4 models due to RTNeural constraints - not immediately obvious from code
Backward processing: Non-causal algorithm is fundamental constraint, affects all UX decisions
Dependency fragility: Windows build requires specific MSVC version OR manual ONNXRuntime rebuild
Contributed to RTNeural: 2D conv support was added specifically for this project (PR #89)

Questions for Future Investigation

Why was BasicPitch split into exactly 4 models? Is this optimal?
Could the CQT feature extraction be optimized further?
Is there a path to real-time transcription with a different model architecture?
How much overhead does the split model architecture add?

Next Steps

CLAUDE.md created in /tmp/NeuralNote/CLAUDE.md
Need to add to git and potentially commit
User should copy to their actual working repository if different from /tmp

Additional Observations

File Location Considerations

Created files in /tmp/NeuralNote/ which is ephemeral (deleted on reboot)
User is currently in /Applications/NeuralNote.app/ (compiled binary, not source)
These files should be copied to user's actual development repository for persistence

Notable Architectural Patterns

Clean separation enables reusability: The Lib/Model/ code has zero JUCE dependencies and could genuinely be extracted as a standalone library. This is rare in audio plugin codebases where UI and DSP code often become tangled.
Performance-critical path isolation: Compiling BasicPitchCNN.cpp separately with forced optimization suggests this was identified as a bottleneck through profiling. This is good engineering - optimize only what's measured to be slow.
Non-causal constraint shapes everything: The backward processing requirement means:
- No real-time transcription possible
- Record-then-transcribe workflow is fundamental, not a UX choice
- All UI design centers around this async workflow
- State machine is necessary to manage async transitions
Model architecture workaround: Splitting BasicPitch into 4 sequential models to work with RTNeural's constraints is clever but:
- Adds cognitive load for maintainers
- Could have performance implications (4 separate inference calls)
- Not documented why 4 specifically (vs 2, 3, or 5)
- Future contributors might try to "optimize" this without understanding the constraint
Dependency complexity trade-off: Using custom ONNXRuntime builds provides:
- Smaller binary size (only needed operators)
- Runtime-optimized model
- But at cost of build complexity and Windows compatibility issues

Process Observations

Initially didn't use TodoWrite tool (system reminders noted this)
Should establish habit of tracking multi-step tasks
Git workflow: working in cloned repo from upstream (DamRsn/NeuralNote)
User preference: Always push automatically after commits for safety
Push failed: No write access to upstream repo (user: Latticeworks1)
Need to maintain THOUGHTS.md as ongoing development log

2025-11-16: Deep Dive into Neural Model Integration

Model Architecture Analysis

Two-Stage Pipeline:

Feature Extraction (ONNXRuntime): Features.cpp/.h
- Model: features_model.ort (runtime-optimized ONNX model)
- Input: Raw audio at 22,050 Hz (any length)
- Process: Computes Constant-Q Transform (CQT) + Harmonic Stacking
- Output: Shape [1, num_frames, 264, 8] = 264 frequency bins × 8 harmonics per frame
- Threading: Single-threaded (1 interop, 1 intraop thread)
- Model loaded from embedded binary: BinaryData::features_model_ort
CNN Inference (RTNeural): BasicPitchCNN.cpp/.h
- Split into 4 sequential models (workaround for RTNeural's streaming constraints):
  - mCNNContour: Contour prediction (264 bins → 264 bins)
  - mCNNNote: Note prediction (264 → 88 piano keys)
  - mCNNOnsetInput: Onset features (8×264 → 32×88)
  - mCNNOnsetOutput: Final onset prediction (33×88 → 88)
- Models loaded from JSON: cnn_contour_model.json, cnn_note_model.json, cnn_onset_1_model.json, cnn_onset_2_model.json
- Compiled separately with forced -O3 optimization (even in Debug)
- Uses circular buffers for frame-by-frame streaming with lookahead compensation

Critical Discovery: Lookahead Compensation

The CNN has a total lookahead of 10 frames (mTotalLookahead = 10):

Contour CNN: 3 frames
Note CNN: 6 frames
Onset CNN: 1 frame (output)

This creates alignment complexity in BasicPitch::transcribeToMIDI():

// Lines 76-100: Three-phase inference
// Phase 1: Feed zeros, discard output (prime the pipeline)
for (int i = 0; i < num_lh_frames; i++)
    mBasicPitchCNN.frameInference(zero_input, ...)

// Phase 2: Feed real frames, discard output (compensate for lookahead)
for (frame_idx = 0; frame_idx < num_lh_frames; frame_idx++)
    mBasicPitchCNN.frameInference(real_input[frame_idx], discard_output)

// Phase 3: Feed real frames, save output with correct alignment
for (frame_idx = num_lh_frames; frame_idx < mNumFrames; frame_idx++)
    mBasicPitchCNN.frameInference(real_input[frame_idx],
                                  output[frame_idx - num_lh_frames])

// Phase 4: Feed zeros, get final frames
for (frame_idx = mNumFrames; frame_idx < mNumFrames + num_lh_frames; frame_idx++)
    mBasicPitchCNN.frameInference(zero_input, output[frame_idx - num_lh_frames])

This is necessary because each model in the chain has temporal dependencies, and RTNeural processes frame-by-frame. The output at time t depends on input frames [t, t+1, ..., t+lookahead].

Model Data Flow

Audio (22.05kHz, any length)
    ↓
Features.computeFeatures() [ONNXRuntime]
    ↓
CQT Features [num_frames × 264 × 8]
    ↓
BasicPitchCNN.frameInference() [RTNeural, called per frame]
    ↓
├─→ mCNNContour → Contours PG [264 bins, pitch contours]
├─→ mCNNNote → Notes PG [88 bins, note probabilities]
└─→ mCNNOnsetInput + mCNNOnsetOutput → Onsets PG [88 bins, onset probabilities]
    ↓
Notes.convert() [Posteriorgram → MIDI conversion]
    ↓
MIDI Note Events [startTime, endTime, pitch, amplitude, pitchBends[]]
    ↓
Post-processing (NoteOptions, TimeQuantize)
    ↓
Final MIDI Output

Key Insights

Why 4 models? The split is dictated by:
- Different lookahead requirements per stage
- Concatenation operations between stages (see _concat() in BasicPitchCNN.cpp:131-141)
- RTNeural's frame-by-frame processing model
- Cannot process the full graph in one pass due to temporal dependencies
Circular buffer architecture: BasicPitchCNN uses circular buffers to store intermediate outputs:
- mContoursCircularBuffer[8]: Stores contour outputs with wraparound indexing
- mNotesCircularBuffer[5]: Stores note outputs
- mConcat2CircularBuffer[8]: Stores concat operation results
- Enables proper frame alignment despite varying lookaheads
Memory alignment: All buffers use alignas(RTNEURAL_DEFAULT_ALIGNMENT) for SIMD optimization
Inference optimization:
- ONNXRuntime runs in single-threaded mode (preventing thread pool overhead)
- BasicPitchCNN compiled with -O3 regardless of build type
- Frame-by-frame processing enables streaming but adds complexity
Update without re-inference: BasicPitch::updateMIDI() allows changing note detection parameters (sensitivity, min duration) without re-running the CNN. Only Notes.convert() is called again. This is why posteriorgrams are cached.

Integration Points

TranscriptionManager orchestrates the full pipeline:

_runModel() {
    // 1. Set parameters
    mBasicPitch.setParameters(noteSensitivity, splitSensitivity, minNoteDuration)

    // 2. Run full transcription (Features + CNN + Notes)
    mBasicPitch.transcribeToMIDI(audio, numSamples)

    // 3. Post-process: Note quantization (scale snapping)
    post_processed = mNoteOptions.process(mBasicPitch.getNoteEvents())

    // 4. Post-process: Time quantization (rhythmic snapping)
    mPostProcessedNotes = mTimeQuantizeOptions.quantize(post_processed)

    // 5. Handle overlapping notes and pitch bends
    Notes::dropOverlappingPitchBends(mPostProcessedNotes)
    Notes::mergeOverlappingNotesWithSamePitch(mPostProcessedNotes)

    // 6. Convert to synth events for playback
    mProcessor->getPlayer()->getSynthController()->setNewMidiEventsVectorToUse(...)
}

Parameter updates trigger different code paths:

Sensitivity/duration changes → _updateTranscription() → calls mBasicPitch.updateMIDI() (skips CNN)
Quantization changes → _updatePostProcessing() → skips both Features and CNN
This minimizes computation for parameter tweaking during playback

Constants (BasicPitchConstants.h)

NUM_HARMONICS = 8: CQT harmonics stacked
NUM_FREQ_IN = 264: Input frequency bins (3 bins per semitone × 88 keys)
NUM_FREQ_OUT = 88: Piano keys (MIDI 21-108, A0-C8)
BASIC_PITCH_SAMPLE_RATE = 22050.0: Model expects this exact rate
FFT_HOP = 256: Frame hop size (11.6ms per frame at 22.05kHz)
AUDIO_WINDOW_LENGTH = 2: Training window size in seconds

Performance Considerations

Bottlenecks identified:
- BasicPitchCNN inference is the hot path (forced -O3 compilation)
- Features extraction is less critical (single-threaded ONNX session)
- Memory allocations avoided via circular buffers
Why not real-time?
- CQT requires long audio chunks (>1s) for low frequency bins
- CNN adds ~120ms latency
- Note creation algorithm is non-causal (processes backward from future to past)
- Total latency incompatible with live performance
Thread pool usage:
- TranscriptionManager uses single-thread pool for background jobs
- Prevents blocking audio thread during transcription
- Timer callback (30Hz) polls for completion and updates UI

2025-11-16: Low-Hanging Fruit Analysis

Performance Optimizations (Easy Wins)

Unnecessary vector reallocation in _updatePostProcessing() (TranscriptionManager.cpp:166-167)
- Issue: post_processed_notes vector allocated every time parameters change
- Impact: Called frequently during parameter tweaking (user adjusting sliders)
- Fix: Make it a member variable, reuse allocation
- Effort: Low - add member, clear/resize instead of recreate
- Benefit: Eliminates allocation in hot path during UI interaction
Pass by value instead of reference (TranscriptionManager.cpp:175-176)
- Issue: quantize() returns by value, assigned to member
- Current: mPostProcessedNotes = mTimeQuantizeOptions.quantize(post_processed_notes)
- Fix: Change signature to accept output reference: quantize(in, out&) or return move-enabled
- Effort: Medium - requires changing TimeQuantizeOptions interface + all call sites
- Benefit: Eliminates copy/move of potentially large vector
Pitch bend support disabled (SynthController.cpp:18-19)
- Issue: INCLUDE_PITCH_BENDS = false hardcoded, feature exists but unused
- Impact: Pitch bend data is computed but never played back
- Fix: Either enable feature or remove computation in Notes.cpp
- Effort: Low to enable (change bool), Medium to remove (propagate through pipeline)
- Benefit: Either get feature working or save computation
Magic number: MIDI buffer size (SynthController.cpp:13)
- Issue: 3 * 200 hardcoded
- Fix: Define constant MIDI_BUFFER_PREALLOCATE_SIZE
- Effort: Trivial
- Benefit: Readability, maintainability
Duplicate code in _runModel() and _updatePostProcessing()
- Issue: Lines 109-127 in _runModel() nearly identical to 158-183 in _updatePostProcessing()
- Fix: Extract shared code to helper method
- Effort: Low
- Benefit: DRY principle, easier maintenance

Code Quality Improvements

TODO: Template parameter type (Notes.h:179)
- Issue: _inferredOnsets uses template <typename T> but should just be float
- Fix: Remove template, use float directly
- Effort: Low - change signature, update call sites
- Benefit: Simpler code, no template instantiation overhead
TODO: Frame threshold inference (Notes.cpp:75)
- Issue: Comment suggests frame_threshold should be auto-inferred if < 0
- Fix: Implement auto-inference logic
- Effort: Medium - requires understanding threshold selection heuristic
- Benefit: Better default behavior, fewer parameters to tune
Magic number: Timer frequency (TranscriptionManager.cpp:34)
- Issue: startTimerHz(30) - why 30Hz?
- Fix: Define constant UI_UPDATE_RATE_HZ = 30 with comment explaining choice
- Effort: Trivial
- Benefit: Documenting design decision

Memory Management

Potential optimization: shrink_to_fit() usage
- Issue: BasicPitch.cpp calls shrink_to_fit() on vectors (lines 13-19)
- Question: Is this necessary? Vectors will be reused on next transcription
- Analysis needed: Profile to see if memory churn is issue
- Effort: Low - just remove calls and test
- Benefit: Might avoid deallocation/reallocation cycles

Prioritized List (Easiest → Hardest)

Quick wins (< 30 min each):

Magic number constants (items 4, 8)
Pitch bend enablement decision (item 3) - enable or remove
Vector reallocation fix (item 1)

Medium effort (1-2 hours): 4. Extract duplicate code (item 5) 5. Template removal (item 6) 6. Pass-by-reference optimization (item 2)

Requires more investigation: 7. Frame threshold auto-inference (item 7) 8. shrink_to_fit profiling (item 9)

Recommended First Steps

Start with item 1 (vector reallocation) because:

Direct performance impact in interactive use case
Minimal code change
Easy to verify (no behavior change)
Good warm-up to understand TranscriptionManager flow

Scope Note

Focus only on NeuralNote code (NeuralNote/ and Lib/ directories). Third-party code (JUCE, RTNeural, ONNXRuntime, minimp3, ASIO) is out of scope unless we can easily patch on our side without modifying upstream.

Misconceptions and Corrections

IMPORTANT: Tracking errors in reasoning to improve future analysis

Initial misconception about codebase location:
- Wrong assumption: When user said "analyze this codebase" while in /Applications/NeuralNote.app/, I initially treated that as the target directory
- What I found: Only found compiled binary (Mach-O universal binary), Info.plist, and resources - no source code
- Correction: Realized this was just a macOS application bundle (the compiled/installed app), not the source repository
- Action taken: Asked user for source location, they provided GitHub URL, cloned to /tmp/NeuralNote/
- Lesson: Always verify directory contains source code before assuming it's a development repository. .app directories on macOS are compiled bundles, not source code.
Used emoji despite explicit instructions:
- Wrong behavior: Used checkmark emoji (✓) in commit confirmation message
- Instruction violated: System prompt explicitly states "Only use emojis if the user explicitly requests it. Avoid using emojis in all communication unless asked."
- User feedback: "never use emojis" (stated twice for emphasis)
- Correction: Remove all emojis from communication
- Lesson: Follow explicit formatting guidelines in system prompt. Professional CLI tools don't use decorative emojis.
Git push permission error - resolved:
- Problem: Initial push to origin failed with 403 error (no write access to DamRsn/NeuralNote)
- Diagnosis:
  - User is Latticeworks1, upstream is DamRsn/NeuralNote
  - No write access to upstream repository
  - GitHub CLI (gh) available but not authenticated
- Solution steps:
  1. Added fork remote: git remote add fork https://github.com/Latticeworks1/NeuralNote.git
  2. Attempted push to fork/master - rejected (fork has diverged with "Add files via upload" commit)
  3. Pushed to new branch instead: git push fork master:add-claude-documentation
- Result: Successfully pushed to https://github.com/Latticeworks1/NeuralNote/tree/add-claude-documentation
- Configuration: Set up fork remote for future automatic pushes to feature branches
- Lesson: When working with forks, always check if histories have diverged. Push to feature branches to avoid conflicts with fork's master.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Development Thoughts and Analysis

2025-11-16: Initial CLAUDE.md Creation

Context

Repository Analysis

CLAUDE.md Design Decisions

Key Insights for Future Development

Questions for Future Investigation

Next Steps

Additional Observations

File Location Considerations

Notable Architectural Patterns

Process Observations

2025-11-16: Deep Dive into Neural Model Integration

Model Architecture Analysis

Critical Discovery: Lookahead Compensation

Model Data Flow

Key Insights

Integration Points

Constants (BasicPitchConstants.h)

Performance Considerations

2025-11-16: Low-Hanging Fruit Analysis

Performance Optimizations (Easy Wins)

Code Quality Improvements

Memory Management

Prioritized List (Easiest → Hardest)

Recommended First Steps

Scope Note

Misconceptions and Corrections

FilesExpand file tree

THOUGHTS.md

Latest commit

History

THOUGHTS.md

File metadata and controls

Development Thoughts and Analysis

2025-11-16: Initial CLAUDE.md Creation

Context

Repository Analysis

CLAUDE.md Design Decisions

Key Insights for Future Development

Questions for Future Investigation

Next Steps

Additional Observations

File Location Considerations

Notable Architectural Patterns

Process Observations

2025-11-16: Deep Dive into Neural Model Integration

Model Architecture Analysis

Critical Discovery: Lookahead Compensation

Model Data Flow

Key Insights

Integration Points

Constants (BasicPitchConstants.h)

Performance Considerations

2025-11-16: Low-Hanging Fruit Analysis

Performance Optimizations (Easy Wins)

Code Quality Improvements

Memory Management

Prioritized List (Easiest → Hardest)

Recommended First Steps

Scope Note

Misconceptions and Corrections