Improve /talk handling and lazy Whisper load#25

Open
OgeonX-Ai wants to merge 8 commits into codex/analyze-errors-and-create-resolution-plan from codex/fix-json-parsing-issue-in-talk-endpoint-4dfpqv

Conversation

@OgeonX-Ai

Owner

Summary

  • lazily load Whisper to avoid import-time failures and reuse the cached model for audio transcription
  • harden /talk JSON handling with validation and return clearer errors for invalid payloads or unsupported content types
  • document httpx dependency for exercising FastAPI response handling in tests
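
A minimal sketch of the lazy-load idea from the first bullet, not the actual code in backend/main.py. The helper name `get_whisper_model` and the default model size `"base"` are assumptions for illustration. The stub module mirrors the approach used in the test script so the example runs without Whisper installed.

```python
import functools
import importlib
import sys
import types


@functools.lru_cache(maxsize=None)
def get_whisper_model(name: str = "base"):
    """Import whisper on first use and cache the loaded model per name.

    Hypothetical helper: the import is deferred to call time, so a missing
    or broken whisper install cannot fail at module import time.
    """
    whisper = importlib.import_module("whisper")
    return whisper.load_model(name)


# Demo only: register a stub "whisper" module (this replaces a real one, if
# installed, for the duration of this process).
load_calls = []


def _fake_load_model(name):
    load_calls.append(name)
    return types.SimpleNamespace(
        transcribe=lambda path, language=None: {"text": "hello"}
    )


stub = types.ModuleType("whisper")
stub.load_model = _fake_load_model
sys.modules["whisper"] = stub

first = get_whisper_model()
second = get_whisper_model()  # served from the cache, no second load
print(first is second, len(load_calls))
```

Caching with `functools.lru_cache` is one possible choice here; a module-level variable guarded by a lock would work as well if the model must be loaded exactly once under concurrent requests.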

Testing

  • python -m compileall backend
  • `python - <<'PY'
import asyncio
import sys
import types

# Stub huggingface_hub so importing backend.main does not need the real client.
fake_hf = types.ModuleType("huggingface_hub")
class DummyInferenceClient:
    def __init__(self, *args, **kwargs):
        pass
    def chat_completion(self, *args, **kwargs):
        class R:
            choices = [{"message": {"content": "stub reply"}}]
        return R()
fake_hf.InferenceClient = DummyInferenceClient
sys.modules.setdefault("huggingface_hub", fake_hf)

# Stub whisper so no model is downloaded or loaded.
fake_whisper = types.ModuleType("whisper")
fake_whisper.load_model = lambda *args, **kwargs: types.SimpleNamespace(transcribe=lambda path, language=None: {"text": "hello"})
sys.modules.setdefault("whisper", fake_whisper)

from backend import main

# Replace the LLM and TTS calls with local fakes.
main.ask_llm = lambda prompt: f"echo {prompt}"
main.tts_elevenlabs = lambda text, voice_id=None: b"mp3bytes"

class FakeRequest:
    """Minimal request object exposing what main.talk reads in the JSON path."""
    def __init__(self, payload: dict):
        self._payload = payload
        self.headers = {"content-type": "application/json"}
    async def json(self):
        return self._payload
    async def form(self):
        return None

req = FakeRequest({"prompt": "hi there", "voice": "abc"})
response = asyncio.run(main.talk(req))
print(type(response).__name__, response.status_code, len(response.body))
PY`


Codex Task

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread backend/main.py
Comment on lines +201 to +205
elif "multipart/form-data" in content_type or "application/x-www-form-urlencoded" in content_type:
    form = await http_request.form()
    audio_file = form.get("audio") if isinstance(form, dict) else None
    if audio_file is not None and not isinstance(audio_file, UploadFile):
        audio_file = None

[P1] Audio form uploads ignored in /talk

The new /talk handler now calls http_request.form() but only extracts audio when the returned object is a plain dict. Starlette returns FormData, so isinstance(form, dict) is false and audio_file remains None even when clients send multipart audio. As a result the audio branch never runs, STT is skipped, and audio requests are rejected with a 400 (“Provide either text prompt or valid audio”) despite valid uploads. Use the FormData mapping interface directly instead of the dict check.
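
The failure mode can be reproduced without Starlette. The sketch below uses hypothetical stand-ins (`FakeFormData`, `FakeUpload`) for Starlette's `FormData` and `UploadFile`, on the assumption that `FormData` behaves as a `Mapping` that is not a `dict` subclass. It is an illustration of the suggestion, not the patch itself.

```python
from collections.abc import Mapping


class FakeFormData(Mapping):
    """Hypothetical stand-in for FormData: a Mapping that is not a dict."""

    def __init__(self, items):
        self._items = dict(items)

    def __getitem__(self, key):
        return self._items[key]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


class FakeUpload:
    """Hypothetical stand-in for UploadFile."""

    def __init__(self, filename):
        self.filename = filename


def extract_audio_buggy(form):
    # Mirrors the reviewed lines: the dict check is False for a plain Mapping,
    # so the upload is dropped before it is ever looked up.
    audio = form.get("audio") if isinstance(form, dict) else None
    if audio is not None and not isinstance(audio, FakeUpload):
        audio = None
    return audio


def extract_audio_fixed(form):
    # Use the mapping interface directly, then validate the value's type.
    audio = form.get("audio")
    return audio if isinstance(audio, FakeUpload) else None


form = FakeFormData({"audio": FakeUpload("clip.wav")})
print(isinstance(form, dict))
print(extract_audio_buggy(form))
print(extract_audio_fixed(form).filename)
```

In the real handler the type check would stay on `UploadFile`; only the `isinstance(form, dict)` guard needs to go.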

Useful? React with 👍 / 👎.
