Add FunASR speech2text plugin #3281

Open
LauraGPT wants to merge 1 commit into langgenius:main from LauraGPT:add-funasr-speech2text-plugin

Conversation

@LauraGPT

Summary

Add FunASR speech-to-text plugin for Dify.

FunASR is an open-source ASR toolkit from Alibaba DAMO Academy (17.7K+ GitHub stars, Apache 2.0).

Why FunASR?

  • 170x faster than Whisper: real-time factor (RTF) of 0.006 for Paraformer and 0.007 for SenseVoice
  • 50+ languages with emotion detection (SenseVoice)
  • Production-grade Chinese recognition with built-in punctuation (Paraformer)
  • Speaker diarization via CAM++ (no pyannote needed)
  • OpenAI-compatible API
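Real-time factor (RTF) is processing time divided by audio duration, so lower is faster. A worked example of the RTF figures above, assuming they hold for a one-hour recording (the PR does not state the hardware or benchmark behind them):

```latex
\begin{aligned}
\mathrm{RTF} &= \frac{T_{\text{proc}}}{T_{\text{audio}}}
  \;\Rightarrow\; T_{\text{proc}} = \mathrm{RTF} \cdot T_{\text{audio}} \\
\text{Paraformer:}\quad T_{\text{proc}} &= 0.006 \times 3600\,\text{s} = 21.6\,\text{s} \\
\text{SenseVoice:}\quad T_{\text{proc}} &= 0.007 \times 3600\,\text{s} = 25.2\,\text{s}
\end{aligned}
```

Equivalently, an RTF of 0.006 processes audio about 1/0.006 ≈ 167 times faster than real time.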

Models included

| Model | Languages | Best for |
|---|---|---|
| sensevoice | 50+ | General multilingual, emotion detection |
| paraformer | zh + mixed | Chinese production use, with punctuation |
| paraformer-en | en | English |
| fun-asr-nano | 31 | Encoder+LLM architecture |

Setup

Deploy FunASR server:

    pip install funasr vllm fastapi uvicorn
    funasr-server --device cuda --host 0.0.0.0 --port 8000

Configure the plugin with the server URL (e.g. http://your-server:8000).
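According to the _compat_credentials code quoted in the review, the plugin normalizes whatever URL is entered so that it ends in /v1 exactly once. Here is a minimal stdlib sketch of that rule. The function name normalize_endpoint is hypothetical, and str.removesuffix requires Python 3.9+:

```python
def normalize_endpoint(url: str) -> str:
    """Return the OpenAI-style base URL (ending in /v1) for a FunASR server URL.

    Same rule as the PR's _compat_credentials: strip trailing slashes,
    drop an existing /v1 suffix, then append /v1 exactly once.
    """
    base = url.rstrip("/").removesuffix("/v1")
    return f"{base}/v1"


for raw in (
    "http://your-server:8000",
    "http://your-server:8000/",
    "http://your-server:8000/v1",
    "http://your-server:8000/v1/",
):
    print(raw, "->", normalize_endpoint(raw))  # each -> http://your-server:8000/v1
```

As a result, users can paste the server root or the /v1 base URL, and either form works.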

Implementation

  • Extends OAICompatSpeech2TextModel (FunASR exposes an OpenAI-compatible API)
  • Predefined models + customizable model support
  • Minimal dependencies: dify_plugin, httpx, openai

Links:

Test plan

  • Deploy FunASR server locally
  • Configure plugin in Dify with server URL
  • Test speech2text with all 4 predefined models
  • Test customizable model with custom model name
  • Verify error handling for unreachable server
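Most items in the test plan above need a real FunASR server, but the plugin's request path can also be exercised offline. Below is a minimal stdlib sketch of a stub server that answers POST /v1/audio/transcriptions with the {"text": ...} JSON shape used by OpenAI-style transcription endpoints. Several parts are hypothetical: the handler, the canned transcript, and the assumption that the plugin reads only the text field. The stub does not reproduce FunASR's actual behavior:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

CANNED_TEXT = "hello world"  # hypothetical transcript returned for every upload


class FakeTranscriptionHandler(BaseHTTPRequestHandler):
    """Stand-in for an OpenAI-style /v1/audio/transcriptions endpoint (not FunASR itself)."""

    def do_POST(self) -> None:
        # Drain the uploaded body (multipart audio in real use) without parsing it.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        if self.path != "/v1/audio/transcriptions":
            self.send_error(404)
            return
        body = json.dumps({"text": CANNED_TEXT}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging during tests


def start_fake_server() -> HTTPServer:
    # Port 0 asks the OS for a free port; the chosen one is in server_address[1].
    server = HTTPServer(("127.0.0.1", 0), FakeTranscriptionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_fake_server()
    url = f"http://127.0.0.1:{server.server_address[1]}/v1/audio/transcriptions"
    request = urllib.request.Request(url, data=b"not-really-audio", method="POST")
    with urllib.request.urlopen(request, timeout=5) as response:
        print(json.loads(response.read()))  # {'text': 'hello world'}
    server.shutdown()
    server.server_close()
```

With the stub running, pointing the plugin at http://127.0.0.1:<port> should route its requests to the stub, since the plugin appends /v1 to the configured URL.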

- Open-source ASR from Alibaba DAMO Academy (17K+ stars)
- 170x faster than Whisper (RTF 0.006-0.007)
- 4 models: sensevoice (50+ langs), paraformer (zh), paraformer-en, fun-asr-nano
- OpenAI-compatible API endpoint
- Speaker diarization support (CAM++)
@dosubot (bot) added the size:M (This PR changes 30-99 lines, ignoring generated files.) and enhancement (New feature or request) labels on Jun 12, 2026

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces the FunASR Speech Recognition Plugin, enabling open-source speech-to-text capabilities via an OpenAI-compatible API. The feedback focuses on improving robustness and compatibility: implementing direct reachability checks in validate_credentials and validate_provider_credentials to avoid failures on servers that do not support /v1/models; handling missing endpoint_url and empty api_key values properly in _compat_credentials; forwarding the user parameter in _invoke; and populating model_properties with default file upload limits and supported extensions in get_customizable_model_schema.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +13 to +15

    def validate_credentials(self, model: str, credentials: dict) -> None:
        compat = self._compat_credentials(credentials)
        super().validate_credentials(model, compat)

high

The default OAICompatSpeech2TextModel.validate_credentials typically calls the /v1/models endpoint to verify credentials. However, FunASR servers (and many lightweight OpenAI-compatible ASR servers) often do not implement /v1/models, which would cause validation to fail with a 404 or 405 error and prevent users from adding the model. Since httpx is already included in the dependencies, we can perform a direct reachability check on the endpoint instead.

    def validate_credentials(self, model: str, credentials: dict) -> None:
        compat = self._compat_credentials(credentials)
        import httpx
        try:
            # FunASR server may not support /v1/models, so we verify reachability of the endpoint directly.
            # httpx.get() does not raise on HTTP error statuses (e.g. 404), so only request failures land here.
            httpx.get(compat["endpoint_url"], timeout=5.0)
        except httpx.RequestError as e:
            raise ValueError(f"Failed to connect to FunASR server at {compat['endpoint_url']}: {e}") from e
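The suggestion fails only when the request itself fails. A server that answers /v1 with a 404 still passes. Below is a minimal stdlib sketch with the same semantics, which makes the behavior easy to check without httpx. The helper name check_reachable is hypothetical and is not part of the plugin or of dify_plugin:

```python
import urllib.error
import urllib.request


def check_reachable(endpoint_url: str, timeout: float = 5.0) -> None:
    """Raise ValueError only when no HTTP server answers at endpoint_url.

    Any HTTP response, including 404 or 405, counts as reachable. This mirrors
    the suggested httpx code: httpx.get() returns error responses instead of
    raising, and httpx.RequestError covers request failures such as connection
    errors and timeouts, not HTTP error statuses.
    """
    try:
        with urllib.request.urlopen(endpoint_url, timeout=timeout):
            pass
    except urllib.error.HTTPError:
        # HTTPError subclasses URLError, so it must be handled first:
        # the server answered, just not with a 2xx status.
        return
    except OSError as e:  # URLError (e.g. connection refused) and socket timeouts
        raise ValueError(f"Failed to connect to FunASR server at {endpoint_url}: {e}") from e
```

If stricter validation is wanted later, the HTTPError branch is the place to reject specific status codes.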

Comment on lines +27 to +33

    @classmethod
    def _compat_credentials(cls, credentials: dict) -> dict:
        credentials = credentials.copy()
        base = credentials["endpoint_url"].rstrip("/").removesuffix("/v1")
        credentials["endpoint_url"] = f"{base}/v1"
        credentials.setdefault("api_key", "no-key")
        return credentials

high

There are two potential issues in _compat_credentials:

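  1. If endpoint_url is missing, credentials["endpoint_url"] raises a KeyError; if it is None, the .rstrip() call raises an AttributeError; and if it is an empty string, nothing is raised and the method silently produces the invalid URL "/v1".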
  2. If the user leaves the optional API Key field blank in the UI, credentials will contain "api_key": "" (an empty string). Using setdefault will not replace this empty string with "no-key" because the key already exists in the dictionary. This can cause downstream authentication failures. We should explicitly check for falsy/empty values.
Suggested change

    @classmethod
    def _compat_credentials(cls, credentials: dict) -> dict:
        credentials = credentials.copy()
        endpoint_url = credentials.get("endpoint_url")
        if not endpoint_url:
            raise ValueError("endpoint_url is required")
        base = endpoint_url.rstrip("/").removesuffix("/v1")
        credentials["endpoint_url"] = f"{base}/v1"
        if not credentials.get("api_key"):
            credentials["api_key"] = "no-key"
        return credentials
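The difference between setdefault and the truthiness check is easy to confirm in isolation. Below is a minimal sketch that copies the suggested logic into a free function. The name compat_credentials is illustrative, and the code needs Python 3.9+:

```python
def compat_credentials(credentials: dict) -> dict:
    """Stand-alone copy of the reviewer's suggested _compat_credentials."""
    credentials = credentials.copy()  # never mutate the caller's dict
    endpoint_url = credentials.get("endpoint_url")
    if not endpoint_url:
        raise ValueError("endpoint_url is required")
    base = endpoint_url.rstrip("/").removesuffix("/v1")
    credentials["endpoint_url"] = f"{base}/v1"
    if not credentials.get("api_key"):  # treats both missing and "" as "no key"
        credentials["api_key"] = "no-key"
    return credentials


# setdefault only fills a *missing* key, so a blank form field survives:
blank = {"endpoint_url": "http://localhost:8000", "api_key": ""}
print(repr(blank.copy().setdefault("api_key", "no-key")))  # ''
print(compat_credentials(blank)["api_key"])  # no-key
```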

Comment on lines +9 to +11

    def _invoke(self, model: str, credentials: dict, file: IO[bytes], user: Optional[str] = None) -> str:
        compat = self._compat_credentials(credentials)
        return super()._invoke(model, compat, file)

medium

The user parameter is received in _invoke but is not forwarded to the superclass method call. Forwarding user ensures that any user-specific tracking, logging, or rate-limiting in the base class functions correctly.

Suggested change

    def _invoke(self, model: str, credentials: dict, file: IO[bytes], user: Optional[str] = None) -> str:
        compat = self._compat_credentials(credentials)
        return super()._invoke(model, compat, file, user)

Comment on lines +17 to +25

    def get_customizable_model_schema(self, model: str, credentials: dict) -> Optional[AIModelEntity]:
        return AIModelEntity(
            model=model,
            label=I18nObject(en_us=model, zh_hans=model),
            fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
            model_type=ModelType.SPEECH2TEXT,
            model_properties={},
            parameter_rules=[],
        )

medium

When adding a customizable model, leaving model_properties empty ({}) means the custom model won't have any defined file upload limits or supported file extensions. This can cause errors or unexpected behavior in Dify when users attempt to upload audio files for transcription. We should populate model_properties with the same default values used by the predefined models.

    def get_customizable_model_schema(self, model: str, credentials: dict) -> Optional[AIModelEntity]:
        return AIModelEntity(
            model=model,
            label=I18nObject(en_us=model, zh_hans=model),
            fetch_from=FetchFrom.CUSTOMIZABLE_MODEL,
            model_type=ModelType.SPEECH2TEXT,
            model_properties={
                "file_upload_limit": 1,
                "supported_file_extensions": [
                    "flac",
                    "mp3",
                    "mp4",
                    "mpeg",
                    "mpga",
                    "m4a",
                    "ogg",
                    "wav",
                    "webm",
                ],
            },
            parameter_rules=[],
        )
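To show concretely what supported_file_extensions gates, here is a hypothetical client-side check against the same list. It is an illustration only, not Dify's actual upload validation, and is_supported_audio is an invented name:

```python
from pathlib import PurePath

# Allow-list copied from the reviewer's suggested model_properties.
SUPPORTED_FILE_EXTENSIONS = ("flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm")


def is_supported_audio(filename: str, extensions: tuple[str, ...] = SUPPORTED_FILE_EXTENSIONS) -> bool:
    """Hypothetical check: is the file's last extension (case-insensitive) allowed?"""
    suffix = PurePath(filename).suffix.lower().lstrip(".")
    return suffix in extensions


for name in ("meeting.WAV", "voice-note.m4a", "transcript.txt", "no_extension"):
    print(name, is_supported_audio(name))
```

The PR does not show whether Dify compares extensions case-insensitively; the .lower() call is a choice made in this sketch.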

Comment on lines +5 to +6

    def validate_provider_credentials(self, credentials: dict) -> None:
        pass

medium

Leaving validate_provider_credentials as a pass means that any invalid or unreachable URL configured at the provider level will be accepted without immediate feedback. Implementing a quick reachability check using httpx ensures that configuration errors are caught early.

Suggested change

    def validate_provider_credentials(self, credentials: dict) -> None:
        endpoint_url = credentials.get("endpoint_url")
        if not endpoint_url:
            raise ValueError("Server URL (endpoint_url) is required")
        import httpx
        try:
            httpx.get(endpoint_url, timeout=5.0)
        except httpx.RequestError as e:
            raise ValueError(f"Failed to connect to FunASR server at {endpoint_url}: {e}")
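The suggested check catches servers that cannot be reached. A common misconfiguration can be rejected with a specific message before any request is made: entering your-server:8000 without a scheme. Below is a minimal stdlib sketch of such an offline pre-check. The name validate_endpoint_shape and its error messages are hypothetical:

```python
from typing import Optional
from urllib.parse import urlparse


def validate_endpoint_shape(endpoint_url: Optional[str]) -> str:
    """Hypothetical offline pre-check for the provider's Server URL field."""
    if not endpoint_url or not endpoint_url.strip():
        raise ValueError("Server URL (endpoint_url) is required")
    url = endpoint_url.strip()
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        # e.g. "your-server:8000" (missing "http://") or "ftp://your-server"
        raise ValueError(f"Server URL must start with http:// or https://, got {url!r}")
    if not parsed.netloc:
        raise ValueError(f"Server URL has no host: {url!r}")
    return url
```

Running this before the httpx reachability call means a typo produces a message about the URL itself rather than a generic connection failure.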
