fix(model): detect audio format from magic bytes instead of hardcoding .mp3 by Kota-Maeda · Pull Request #348 · langgenius/dify-plugin-sdks

Kota-Maeda · 2026-06-09T04:55:34Z

Closes #347

Summary

invoke_speech_to_text wrote the incoming audio to a
NamedTemporaryFile(suffix=".mp3") regardless of the real audio container.
Since OpenAI-compatible / Azure OpenAI Whisper endpoints infer the format from
the multipart filename extension, non-mp3 content (m4a/AAC, wav, ogg, flac,
webm) was mislabeled as .mp3 and rejected with Invalid file format, which
wrongly blames the user's input file.

The model-invoke payload carries only raw bytes (no filename), so this detects
the container from the leading magic bytes and picks the correct suffix
(wav/flac/ogg/m4a/webm), falling back to .mp3 for unknown/undetectable
content. This fixes transcription for every speech2text provider that goes
through this shared dispatch.
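
As a rough illustration of the magic-byte sniffing described above, the sketch below maps a payload's leading bytes to a suffix. It is not the PR's actual `_detect_audio_suffix`; the function name `detect_audio_suffix_sketch` and the exact set of checks are assumptions, based only on the commonly documented signatures for each container.

```python
def detect_audio_suffix_sketch(header: bytes) -> str:
    """Guess a file suffix from the leading bytes of an audio payload.

    Hypothetical stand-in for the PR's helper; falls back to ".mp3".
    """
    # RIFF container with a "WAVE" form type at offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return ".wav"
    # Native FLAC stream marker.
    if header[:4] == b"fLaC":
        return ".flac"
    # Ogg page capture pattern.
    if header[:4] == b"OggS":
        return ".ogg"
    # ISO base media files (MP4/M4A) carry an "ftyp" box type at offset 4.
    # This sketch labels every such file ".m4a" without checking the brand.
    if header[4:8] == b"ftyp":
        return ".m4a"
    # EBML magic shared by Matroska and WebM; treated as WebM here.
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return ".webm"
    # Unknown or too-short input keeps the previous hardcoded behavior.
    return ".mp3"
```

Only the first 12 bytes are inspected, so passing a short slice such as `audio_bytes[:16]` is enough.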

Changes

- Add _detect_audio_suffix(header) that sniffs the container from magic bytes.
- Use it in invoke_speech_to_text instead of the hardcoded .mp3 suffix
  (unhexlify the payload once and reuse it).
- Add unit tests for the detection helper and an end-to-end test asserting the
  dispatch labels the temp file by format and writes the full payload.
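
The "decode once, label by format, write the full payload" flow from the list above could look roughly like the following. This is a sketch under assumptions: `guess_suffix` is a reduced, hypothetical stand-in for the PR's helper, and `write_payload_sketch` is an invented name that returns what an end-to-end test might assert on, namely the temp file's suffix and its size on disk.

```python
import binascii
import os
import tempfile


def guess_suffix(header: bytes) -> str:
    # Reduced, hypothetical stand-in for the PR's detection helper.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return ".wav"
    if header[:4] == b"fLaC":
        return ".flac"
    return ".mp3"


def write_payload_sketch(hex_payload: str) -> tuple[str, int]:
    """Decode once, label the temp file by format, report (suffix, size)."""
    # Decode the hex payload a single time and reuse the bytes below.
    audio_bytes = binascii.unhexlify(hex_payload)
    suffix = guess_suffix(audio_bytes[:16])
    with tempfile.NamedTemporaryFile(suffix=suffix, mode="wb", delete=True) as temp:
        temp.write(audio_bytes)
        temp.flush()  # make sure the bytes are on disk before measuring
        return os.path.splitext(temp.name)[1], os.path.getsize(temp.name)
```

The returned pair lets a test check both the label and that no bytes were dropped, without reopening the file.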

Pull Request Checklist

Compatibility Check

I have checked whether this change affects the backward compatibility of the plugin declared in README.md
I have checked whether this change affects the forward compatibility of the plugin declared in README.md
If this change introduces a breaking change, I have discussed it with the project maintainer and specified the release version in the README.md — N/A, not a breaking change
I have described the compatibility impact and the corresponding version number in the PR description
I have checked whether the plugin version is updated in the README.md — N/A, this is an SDK fix with no plugin version in README.md

Compatibility impact: fully backward compatible. Unknown/undetectable audio
still uses .mp3, identical to the previous behavior; only previously-broken
non-mp3 inputs change. No SDK API, manifest, or schema change.

Available Checks

just build has passed
Relevant documentation has been updated (if necessary) — N/A, no docs/schema change

Note: As an external contributor I do not have permission to self-assign this PR or the linked issue (#347); please assign as appropriate.

…g .mp3

invoke_speech_to_text wrote incoming audio to a NamedTemporaryFile with a hardcoded .mp3 suffix, so non-mp3 content (m4a/AAC, wav, ogg, flac, webm) was labeled .mp3 and rejected by OpenAI/Azure Whisper with "Invalid file format".

The model-invoke payload carries only raw bytes (no filename), so detect the container from the leading magic bytes and pick the matching suffix, falling back to .mp3 for unknown content (backward compatible).

Adds unit tests for the detection helper and an end-to-end test asserting the dispatch labels the temp file by format and writes the full payload.

gemini-code-assist

Code Review

This pull request introduces dynamic audio format detection for speech-to-text dispatching by sniffing the leading magic bytes of raw audio payloads (supporting WAV, FLAC, OGG, M4A, and WebM, with a fallback to MP3). It also adds comprehensive unit and end-to-end tests for this detection. Feedback was provided regarding a potential PermissionError on Windows when reopening a tempfile.NamedTemporaryFile while it is still open, suggesting a cross-platform workaround.

gemini-code-assist · 2026-06-09T04:56:31Z

+        audio_bytes = binascii.unhexlify(data.file)
+        suffix = _detect_audio_suffix(audio_bytes[:16])
+        with tempfile.NamedTemporaryFile(suffix=suffix, mode="wb", delete=True) as temp:
+            temp.write(audio_bytes)
            temp.flush()


On Windows platforms, attempting to open a file created by tempfile.NamedTemporaryFile a second time (via pathlib.Path(temp.name).open("rb")) while the temporary file object is still open will raise a PermissionError.

To ensure cross-platform compatibility (especially for Windows users/developers), we can use a custom context manager wrapper that leverages tempfile.TemporaryDirectory to manage the lifecycle of the temporary file, writing and closing the file immediately so it can be safely opened again.

class WinSafeTempFile:
    def __init__(self, suffix: str) -> None:
        self._dir = tempfile.TemporaryDirectory()
        self.name = str(pathlib.Path(self._dir.name) / f"temp{suffix}")

    def __enter__(self) -> "WinSafeTempFile":
        return self

    def __exit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
        self._dir.cleanup()

    def write(self, b: bytes) -> None:
        pathlib.Path(self.name).write_bytes(b)

    def flush(self) -> None:
        pass

audio_bytes = binascii.unhexlify(data.file)
suffix = _detect_audio_suffix(audio_bytes[:16])
with WinSafeTempFile(suffix=suffix) as temp:
    temp.write(audio_bytes)
    temp.flush()
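
For comparison, a different and commonly used way to avoid the Windows reopen problem is to create the file with `delete=False`, close the handle before reopening the path, and remove the file manually. This is not the reviewer's suggestion and not the PR's code; the function name `roundtrip_via_closed_temp_file` is hypothetical, and the read-back merely stands in for whatever consumer reopens the path.

```python
import os
import tempfile


def roundtrip_via_closed_temp_file(audio_bytes: bytes, suffix: str) -> tuple[bytes, str]:
    """Write bytes to a suffixed temp file, close it, then reopen by path."""
    # delete=False keeps the file on disk after close(), so the path can be
    # reopened on platforms that forbid opening it twice while it is open.
    temp = tempfile.NamedTemporaryFile(suffix=suffix, mode="wb", delete=False)
    try:
        temp.write(audio_bytes)
        temp.close()  # release the handle before reopening the path
        with open(temp.name, "rb") as reopened:
            return reopened.read(), temp.name
    finally:
        temp.close()          # no-op if already closed
        os.unlink(temp.name)  # manual cleanup, since delete=False
```

On Python 3.12 and later, `NamedTemporaryFile` also accepts `delete_on_close=False`, which keeps context-manager cleanup while allowing the file to be closed and reopened inside the `with` block.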

Kota-Maeda wants to merge 1 commit into langgenius:main from Kota-Maeda:fix/speech2text-audio-format-detection
