fix(model): detect audio format from magic bytes instead of hardcoding .mp3 #348
Kota-Maeda wants to merge 1 commit into
…g .mp3

invoke_speech_to_text wrote incoming audio to a NamedTemporaryFile with a hardcoded .mp3 suffix, so non-mp3 content (m4a/AAC, wav, ogg, flac, webm) was labeled .mp3 and rejected by OpenAI/Azure Whisper with "Invalid file format".

The model-invoke payload carries only raw bytes (no filename), so detect the container from the leading magic bytes and pick the matching suffix, falling back to .mp3 for unknown content (backward compatible).

Adds unit tests for the detection helper and an end-to-end test asserting the dispatch labels the temp file by format and writes the full payload.
Code Review
This pull request introduces dynamic audio format detection for speech-to-text dispatching by sniffing the leading magic bytes of raw audio payloads (supporting WAV, FLAC, OGG, M4A, and WebM, with a fallback to MP3). It also adds comprehensive unit and end-to-end tests for this detection. Feedback was provided regarding a potential PermissionError on Windows when reopening a tempfile.NamedTemporaryFile while it is still open, suggesting a cross-platform workaround.
    audio_bytes = binascii.unhexlify(data.file)
    suffix = _detect_audio_suffix(audio_bytes[:16])
    with tempfile.NamedTemporaryFile(suffix=suffix, mode="wb", delete=True) as temp:
        temp.write(audio_bytes)
        temp.flush()
On Windows platforms, attempting to open a file created by tempfile.NamedTemporaryFile a second time (via pathlib.Path(temp.name).open("rb")) while the temporary file object is still open will raise a PermissionError.
To ensure cross-platform compatibility (especially for Windows users/developers), we can use a custom context manager wrapper that leverages tempfile.TemporaryDirectory to manage the lifecycle of the temporary file, writing and closing the file immediately so it can be safely opened again.
    import pathlib
    import tempfile
    from typing import Any

    class WinSafeTempFile:
        def __init__(self, suffix: str) -> None:
            # The directory owns the file's lifetime; cleaning it up removes the file.
            self._dir = tempfile.TemporaryDirectory()
            self.name = str(pathlib.Path(self._dir.name) / f"temp{suffix}")

        def __enter__(self) -> "WinSafeTempFile":
            return self

        def __exit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
            self._dir.cleanup()

        def write(self, b: bytes) -> None:
            # write_bytes opens, writes, and closes the file in one call.
            pathlib.Path(self.name).write_bytes(b)

        def flush(self) -> None:
            # Nothing to flush: write() has already closed the file.
            pass

    audio_bytes = binascii.unhexlify(data.file)
    suffix = _detect_audio_suffix(audio_bytes[:16])
    with WinSafeTempFile(suffix=suffix) as temp:
        temp.write(audio_bytes)
        temp.flush()
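For comparison, the same Windows concern can also be handled with only the standard library: create the named temporary file with delete=False, close it before reopening, and remove it by hand. The following is a minimal sketch of that pattern, not code from this PR; the function name write_then_reopen is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_then_reopen(payload: bytes, suffix: str) -> bytes:
    """Write payload to a named temp file, close it, then reopen it by path.

    Hypothetical helper for illustration only; it is not part of the PR.
    """
    # delete=False keeps the file on disk after the handle is closed.
    temp = tempfile.NamedTemporaryFile(suffix=suffix, mode="wb", delete=False)
    try:
        with temp:
            temp.write(payload)
        # The handle is closed here, so a second open by name does not
        # conflict with the first one on Windows.
        with Path(temp.name).open("rb") as reopened:
            return reopened.read()
    finally:
        # Manual cleanup replaces the automatic delete-on-close behavior.
        os.unlink(temp.name)
```

The trade-off is that cleanup becomes the caller's responsibility, which is why the removal sits in a finally block.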
Closes #347
Summary
invoke_speech_to_text wrote the incoming audio to a NamedTemporaryFile(suffix=".mp3") regardless of the real audio container. Since OpenAI-compatible / Azure OpenAI Whisper endpoints infer the format from the multipart filename extension, non-mp3 content (m4a/AAC, wav, ogg, flac, webm) was mislabeled as .mp3 and rejected with "Invalid file format", which wrongly blames the user's input file.

The model-invoke payload carries only raw bytes (no filename), so this detects the container from the leading magic bytes and picks the correct suffix (wav/flac/ogg/m4a/webm), falling back to .mp3 for unknown/undetectable content. This fixes transcription for every speech2text provider that goes through this shared dispatch.
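The detection described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's actual _detect_audio_suffix: the name sniff_audio_suffix is hypothetical, and the signatures checked are the commonly documented ones for each container (RIFF/WAVE, fLaC, OggS, an ftyp box at offset 4, and the EBML magic used by WebM).

```python
def sniff_audio_suffix(header: bytes) -> str:
    """Guess a file suffix from the first bytes of an audio payload.

    Hypothetical sketch; the helper in the PR may check different
    signatures or apply them in a different order.
    """
    # RIFF container whose form type at offset 8 is WAVE.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return ".wav"
    # Native FLAC stream marker.
    if header[:4] == b"fLaC":
        return ".flac"
    # Ogg page capture pattern.
    if header[:4] == b"OggS":
        return ".ogg"
    # ISO base media files (MP4/M4A) carry an "ftyp" box type at offset 4.
    if header[4:8] == b"ftyp":
        return ".m4a"
    # EBML magic, shared by WebM and Matroska.
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return ".webm"
    # Unknown or too-short input keeps the previous behavior.
    return ".mp3"


# A 16-byte WAV header prefix is enough to identify the container.
print(sniff_audio_suffix(b"RIFF\x24\x08\x00\x00WAVEfmt "))  # .wav
```

Slicing rather than indexing means a short or empty header simply fails every comparison and falls through to .mp3, which matches the backward-compatible fallback described in the summary.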
Changes
- Add _detect_audio_suffix(header) that sniffs the container from magic bytes.
- Use it in invoke_speech_to_text instead of the hardcoded .mp3 suffix (unhexlify the payload once and reuse it).
- Add unit tests for the detection helper and an end-to-end test asserting the dispatch labels the temp file by format and writes the full payload.
Pull Request Checklist
Compatibility Check
- … README.md
- … README.md
- … README.md: N/A, not a breaking change
- … README.md: N/A, this is an SDK fix with no plugin version in README.md

Available Checks
- just build has passed

Note: As an external contributor I do not have permission to self-assign this PR or the linked issue (#347); please assign as appropriate.