Add FunASR speech2text plugin #3281
- Open-source ASR from Alibaba DAMO Academy (17K+ stars)
- 170x faster than Whisper (RTF 0.006-0.007)
- 4 models: sensevoice (50+ langs), paraformer (zh), paraformer-en, fun-asr-nano
- OpenAI-compatible API endpoint
- Speaker diarization support (CAM++)
Code Review
This pull request introduces the FunASR Speech Recognition Plugin, enabling open-source speech-to-text capabilities via an OpenAI-compatible API. The feedback focuses on improving robustness and compatibility: implementing direct reachability checks in validate_credentials and validate_provider_credentials to avoid failures on servers that do not support /v1/models; handling missing endpoint_url and empty api_key values properly in _compat_credentials; forwarding the user parameter in _invoke; and populating model_properties with default file upload limits and supported extensions in get_customizable_model_schema.
    def validate_credentials(self, model: str, credentials: dict) -> None:
        compat = self._compat_credentials(credentials)
        super().validate_credentials(model, compat)
The default OAICompatSpeech2TextModel.validate_credentials typically calls the /v1/models endpoint to verify credentials. However, FunASR servers (and many lightweight OpenAI-compatible ASR servers) often do not implement /v1/models, which would cause validation to fail with a 404 or 405 error and prevent users from adding the model. Since httpx is already included in the dependencies, we can perform a direct reachability check on the endpoint instead.
Suggested change:

    def validate_credentials(self, model: str, credentials: dict) -> None:
        compat = self._compat_credentials(credentials)
        import httpx
        try:
            # FunASR server may not support /v1/models, so we verify reachability of the endpoint directly.
            httpx.get(compat["endpoint_url"], timeout=5.0)
        except httpx.RequestError as e:
            raise ValueError(f"Failed to connect to FunASR server at {compat['endpoint_url']}: {e}")

    @classmethod
    def _compat_credentials(cls, credentials: dict) -> dict:
        credentials = credentials.copy()
        base = credentials["endpoint_url"].rstrip("/").removesuffix("/v1")
        credentials["endpoint_url"] = f"{base}/v1"
        credentials.setdefault("api_key", "no-key")
        return credentials
There are two potential issues in _compat_credentials:
- If endpoint_url is missing, accessing credentials["endpoint_url"] raises a KeyError; if it is None, the .rstrip call raises an AttributeError. An empty string raises nothing and silently produces the invalid URL "/v1".
- If the user leaves the optional API Key field blank in the UI, credentials will contain "api_key": "" (an empty string). setdefault will not replace this empty string with "no-key" because the key already exists in the dictionary, which can cause downstream authentication failures.

We should explicitly check for falsy/empty values.
Suggested change:

    @classmethod
    def _compat_credentials(cls, credentials: dict) -> dict:
        credentials = credentials.copy()
        endpoint_url = credentials.get("endpoint_url")
        if not endpoint_url:
            raise ValueError("endpoint_url is required")
        base = endpoint_url.rstrip("/").removesuffix("/v1")
        credentials["endpoint_url"] = f"{base}/v1"
        if not credentials.get("api_key"):
            credentials["api_key"] = "no-key"
        return credentials
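The difference between setdefault and an explicit falsy check can be seen in isolation. The following is a minimal sketch, assuming Python 3.9+ (for str.removesuffix); compat_credentials is a hypothetical standalone copy of the suggested logic, not the plugin's actual method.

```python
def compat_credentials(credentials: dict) -> dict:
    """Hypothetical standalone copy of the suggested normalization logic."""
    credentials = credentials.copy()
    endpoint_url = credentials.get("endpoint_url")
    if not endpoint_url:
        raise ValueError("endpoint_url is required")
    # Strip a trailing slash and any existing /v1 so the suffix is added exactly once.
    base = endpoint_url.rstrip("/").removesuffix("/v1")
    credentials["endpoint_url"] = f"{base}/v1"
    # Treat a blank API key the same way as a missing one.
    if not credentials.get("api_key"):
        credentials["api_key"] = "no-key"
    return credentials


# setdefault leaves an existing empty string untouched ...
blank = {"api_key": ""}
blank.setdefault("api_key", "no-key")
setdefault_result = blank["api_key"]

# ... while the falsy check replaces it, and the URL ends in a single /v1.
fixed = compat_credentials(
    {"endpoint_url": "http://your-server:8000/v1/", "api_key": ""}
)
```

Here setdefault_result stays an empty string, whereas fixed carries the placeholder key and a normalized endpoint URL.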
    def _invoke(self, model: str, credentials: dict, file: IO[bytes], user: Optional[str] = None) -> str:
        compat = self._compat_credentials(credentials)
        return super()._invoke(model, compat, file)
The user parameter is received in _invoke but is not forwarded to the superclass method call. Forwarding user ensures that any user-specific tracking, logging, or rate-limiting in the base class functions correctly.
Suggested change:

    def _invoke(self, model: str, credentials: dict, file: IO[bytes], user: Optional[str] = None) -> str:
        compat = self._compat_credentials(credentials)
        return super()._invoke(model, compat, file, user)
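To illustrate why the missing argument matters, here is a small sketch with a stub base class. StubBaseModel, DroppingModel, and ForwardingModel are hypothetical names made up for this example; the real base class may handle user differently.

```python
import io
from typing import IO, Optional


class StubBaseModel:
    """Hypothetical stand-in for the base class; it records the user it receives."""

    def _invoke(self, model: str, credentials: dict, file: IO[bytes],
                user: Optional[str] = None) -> str:
        self.seen_user = user
        return file.read().decode()


class DroppingModel(StubBaseModel):
    def _invoke(self, model, credentials, file, user=None):
        # user is accepted but never passed on, so the base class sees None.
        return super()._invoke(model, credentials, file)


class ForwardingModel(StubBaseModel):
    def _invoke(self, model, credentials, file, user=None):
        # user is passed through to the base class unchanged.
        return super()._invoke(model, credentials, file, user)


dropping = DroppingModel()
dropping._invoke("sensevoice", {}, io.BytesIO(b"audio"), user="alice")

forwarding = ForwardingModel()
text = forwarding._invoke("sensevoice", {}, io.BytesIO(b"audio"), user="alice")
```

After these calls, dropping.seen_user is None while forwarding.seen_user is "alice", so only the forwarding variant lets base-class logic see the caller.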
    def get_customizable_model_schema(self, model: str, credentials: dict) -> Optional[AIModelEntity]:
        return AIModelEntity(
            model=model,
            label=I18nObject(en_us=model, zh_hans=model),
            fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
            model_type=ModelType.SPEECH2TEXT,
            model_properties={},
            parameter_rules=[],
        )
When adding a customizable model, leaving model_properties empty ({}) means the custom model won't have any defined file upload limits or supported file extensions. This can cause errors or unexpected behavior in Dify when users attempt to upload audio files for transcription. We should populate model_properties with the same default values used by the predefined models.
Suggested change:

    def get_customizable_model_schema(self, model: str, credentials: dict) -> Optional[AIModelEntity]:
        return AIModelEntity(
            model=model,
            label=I18nObject(en_us=model, zh_hans=model),
            fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
            model_type=ModelType.SPEECH2TEXT,
            model_properties={
                "file_upload_limit": 1,
                "supported_file_extensions": [
                    "flac",
                    "mp3",
                    "mp4",
                    "mpeg",
                    "mpga",
                    "m4a",
                    "ogg",
                    "wav",
                    "webm",
                ],
            },
            parameter_rules=[],
        )

    def validate_provider_credentials(self, credentials: dict) -> None:
        pass
Leaving validate_provider_credentials as a pass means that any invalid or unreachable URL configured at the provider level will be accepted without immediate feedback. Implementing a quick reachability check using httpx ensures that configuration errors are caught early.
Suggested change:

    def validate_provider_credentials(self, credentials: dict) -> None:
        endpoint_url = credentials.get("endpoint_url")
        if not endpoint_url:
            raise ValueError("Server URL (endpoint_url) is required")
        import httpx
        try:
            httpx.get(endpoint_url, timeout=5.0)
        except httpx.RequestError as e:
            raise ValueError(f"Failed to connect to FunASR server at {endpoint_url}: {e}")
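Note that httpx.get raises httpx.RequestError only for transport-level failures (connection refused, timeout, and similar); a 404 response does not raise unless raise_for_status() is called, so the suggested check passes as long as the server answers at all. The sketch below mirrors that behavior using the standard library's http.client in place of httpx, so it runs without extra dependencies. check_reachable and NotFoundHandler are hypothetical names, and the local test server is only a stand-in for a FunASR server.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def check_reachable(endpoint_url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status if the server answers; raise ValueError otherwise."""
    parts = urlsplit(endpoint_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"Invalid endpoint_url: {endpoint_url!r}")
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        # Any status code, including 404, counts as "reachable".
        return conn.getresponse().status
    except (OSError, http.client.HTTPException) as e:
        raise ValueError(
            f"Failed to connect to FunASR server at {endpoint_url}: {e}"
        ) from e
    finally:
        conn.close()


class NotFoundHandler(BaseHTTPRequestHandler):
    """Local stand-in server that answers every GET with 404."""

    def do_GET(self):
        self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


# Bind to port 0 so the OS picks a free port for the demo.
server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/v1"

status = check_reachable(url)  # the server answers, so no exception is raised

server.shutdown()
server.server_close()
```

Once the server is shut down, calling check_reachable(url) again raises ValueError, which is the case the review comment wants surfaced at configuration time.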
Summary
Add FunASR speech-to-text plugin for Dify.
FunASR is an open-source ASR toolkit from Alibaba DAMO Academy (17.7K+ GitHub stars, Apache 2.0).
Why FunASR?
Models included
Setup
Deploy FunASR server:
Configure plugin with server URL (e.g. http://your-server:8000).
Implementation
OAICompatSpeech2TextModel (FunASR exposes OpenAI-compatible API)
Links:
Test plan