feat: integrate speech recognition functionality in InterviewChat component #4

CUinspace233 merged 2 commits into main.
Conversation
feat: integrate speech recognition functionality in InterviewChat component

- Added a custom hook `useSpeechRecognition` to manage speech recognition features, including start/stop recording and handling transcripts.
- Implemented voice input support in the InterviewChat component with a microphone button, enhancing user interaction.
- Updated the answer input field to reflect recording status and adjusted styles for better visual feedback.
- Ensured recording stops automatically when the answer area is hidden, improving usability.
Force-pushed from `4804719` to `a852b18`.
Pull request overview
This PR adds speech-to-text voice input support to the InterviewChat component. It introduces a custom `useSpeechRecognition` hook that wraps the Web Speech API and integrates it with a microphone toggle button in the answer input area. The feature allows users to dictate their interview answers instead of typing them.
Changes:
- New `useSpeechRecognition` hook managing the Web Speech API lifecycle (start/stop, interim results, auto-restart on Chrome silence, error handling).
- Microphone button added to the InterviewChat answer area with visual feedback (pulse animation, red border on the textarea while recording, dynamic placeholder).
- Recording is auto-stopped when the answer area hides or when an answer is sent, ensuring clean state transitions.
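As a sketch of how a component might wire this up (the hook's return shape and the handler bodies here are assumptions based on the description above, not the PR's exact code):

```typescript
import { useState } from "react";
import { useSpeechRecognition } from "../hooks/useSpeechRecognition";

// Hypothetical consumer component; option names mirror the diff below,
// but the return shape { isRecording, startRecording, stopRecording }
// and the component wiring are assumed for illustration.
function AnswerInput() {
  const [answer, setAnswer] = useState("");
  const { isRecording, startRecording, stopRecording } = useSpeechRecognition({
    language: "en-US",
    onTranscript: (text) => setAnswer(text), // hook delivers base text + transcript combined
    onError: (message) => console.warn(message), // surface recognition errors
    getCurrentText: () => answer, // text already typed before recording started
  });

  return (
    <div>
      <textarea
        value={answer}
        onChange={(e) => setAnswer(e.target.value)}
        placeholder={isRecording ? "Listening…" : "Type your answer"}
      />
      <button onClick={isRecording ? stopRecording : startRecording}>
        {isRecording ? "Stop" : "🎤"}
      </button>
    </div>
  );
}
```

Note that the inline arrow functions passed as options here change identity on every render, which is exactly the situation the review comment below calls out.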
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `frontend/src/hooks/useSpeechRecognition.ts` | New custom hook encapsulating the Web Speech API for start/stop recording, transcript handling, and error management |
| `frontend/src/components/InterviewChat.tsx` | Integrates the speech recognition hook with a mic button, visual recording indicators, and auto-stop behavior |
`frontend/src/hooks/useSpeechRecognition.ts` (excerpt):

```typescript
  new (): SpeechRecogInstance;
}

interface WindowWithSpeech {
  SpeechRecognition?: SpeechRecogConstructor;
  webkitSpeechRecognition?: SpeechRecogConstructor;
}

const getWindow = () => typeof window !== "undefined" ? window as unknown as WindowWithSpeech : undefined;

const isSupported = !!getWindow()?.SpeechRecognition || !!getWindow()?.webkitSpeechRecognition;

export function useSpeechRecognition({
  language,
  onTranscript,
  onError,
  getCurrentText,
}: SpeechRecognitionOptions): SpeechRecognitionHook {
  const [isRecording, setIsRecording] = useState(false);
  const recognitionRef = useRef<SpeechRecogInstance | null>(null);
  const isRecordingRef = useRef(false);
  const baseTextRef = useRef("");

  const stopRecording = useCallback(() => {
    isRecordingRef.current = false;
    setIsRecording(false);
    if (recognitionRef.current) {
      recognitionRef.current.onend = null;
      recognitionRef.current.onerror = null;
      recognitionRef.current.onresult = null;
      try {
        recognitionRef.current.stop();
      } catch {
        // already stopped
      }
      recognitionRef.current = null;
    }
  }, []);

  const startRecording = useCallback(() => {
    if (!isSupported) return;

    const w = getWindow();
    const Ctor = w?.SpeechRecognition ?? w?.webkitSpeechRecognition;
    if (!Ctor) return;

    const recognition = new Ctor();
    recognition.continuous = true;
    recognition.interimResults = true;
    recognition.lang = language;

    baseTextRef.current = getCurrentText();

    recognition.onresult = (event) => {
      let transcript = "";
      for (let i = 0; i < event.results.length; i++) {
        const result = event.results[i];
        const alt = result?.[0];
        if (alt) {
          transcript += alt.transcript;
        }
      }
      const separator = baseTextRef.current ? " " : "";
      onTranscript(baseTextRef.current + separator + transcript);
    };

    // … (onend / onerror handlers and the hook's return value are not part of this excerpt)
```
Stale closure bug: The callbacks `onTranscript`, `onError`, and `getCurrentText` are captured in `startRecording`'s `useCallback` closure, but the `recognition.onresult` and `recognition.onend` event handlers are only set once, when `startRecording` runs. If these callbacks change identity on re-render (which they do, since they are inline arrow functions at the call site), the event handlers will keep using stale versions.
Most critically, `getCurrentText` on line 81 (inside `onend`, beyond the excerpt shown above) will return a stale `currentAnswer` value: the value at the time recording started, not the current value after transcription updates. This means when Chrome auto-restarts recognition after silence, `baseTextRef.current` will be set to the wrong text, causing transcript duplication or data loss.
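For reference, the restart logic this refers to is not shown in the excerpt; based on the comment's description, the `onend` handler plausibly looks something like this (a hypothetical reconstruction, not the exact diff):

```typescript
// Hypothetical reconstruction of the hook's onend handler. Chrome ends
// continuous recognition after a period of silence, so the hook restarts
// it while the user still intends to record.
recognition.onend = () => {
  if (isRecordingRef.current) {
    // Bug described above: this getCurrentText is the closure captured when
    // startRecording ran, so it returns the pre-recording text rather than
    // the answer as updated by transcription since then.
    baseTextRef.current = getCurrentText();
    try {
      recognition.start(); // resume listening after Chrome's silence cutoff
    } catch {
      // start() throws if recognition is already active
    }
  }
};
```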
The existing `useDebounce` hook in this codebase (see `frontend/src/hooks/useDebounce.ts:5-6`) solves this exact problem with the ref pattern: store callbacks in refs and update them on every render, so that event handlers always invoke the latest callback. The same pattern should be applied here: store `onTranscript`, `onError`, and `getCurrentText` in refs (e.g., `const onTranscriptRef = useRef(onTranscript); onTranscriptRef.current = onTranscript;`) and call through the ref inside the event handlers.
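A minimal sketch of that pattern, assuming the rest of the hook stays as in the diff (the `useLatest` helper name is illustrative, not part of the PR):

```typescript
import { useRef } from "react";

// Illustrative helper: keeps a ref pointed at the latest value on every
// render, so long-lived event handlers never capture a stale closure.
function useLatest<T>(value: T) {
  const ref = useRef(value);
  ref.current = value;
  return ref;
}

// Inside useSpeechRecognition (assumed surrounding code):
//   const onTranscriptRef = useLatest(onTranscript);
//   const onErrorRef = useLatest(onError);
//   const getCurrentTextRef = useLatest(getCurrentText);
//
//   recognition.onresult = (event) => {
//     /* ...accumulate transcript as before... */
//     onTranscriptRef.current(baseTextRef.current + separator + transcript);
//   };
//
//   recognition.onend = () => {
//     if (isRecordingRef.current) {
//       baseTextRef.current = getCurrentTextRef.current(); // always fresh
//       recognition.start();
//     }
//   };
```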
- Updated the `useSpeechRecognition` hook to store callbacks in refs, ensuring the latest versions are used in event handlers.
- Improved performance by reducing dependencies in the `useEffect` hook, enhancing the overall efficiency of the speech recognition functionality.