Problem
When the bot tries to download a voice file from api.telegram.org and the first TLS handshake fails with ECONNRESET (a transient network blip), the user sees "Failed to transcribe voice message" and loses the audio. There is no retry.
Two things are wrong:
- downloadFile has zero retry logic: a single transient network failure kills the flow
- The user-facing error incorrectly blames transcription, when the real failure happened before whisper was ever invoked
Evidence
bot/src/voice.ts:24-31:
export async function downloadFile(url: string, destPath: string): Promise<void> {
  const resp = await fetch(url);
  if (!resp.ok) {
    throw new Error(`Download failed: HTTP ${resp.status}`);
  }
  const buffer = Buffer.from(await resp.arrayBuffer());
  await writeFile(destPath, buffer, { mode: 0o600 });
}
Single fetch(), no retry, no special handling for network-layer failures (which throw before resp.ok is reached).
Stderr log sample (redacted):
ERROR [telegram-bot] Voice transcription error for chat <redacted-chat-id>: [TypeError: fetch failed] {
  [cause]: Error: Client network socket disconnected before secure TLS connection was established
      at TLSSocket.onConnectEnd (node:internal/tls/wrap:1708:19)
      ...
    code: 'ECONNRESET',
    host: 'api.telegram.org',
    port: 443
  }
}
Frequency on an operator machine: 25 occurrences over 30 days (sporadic, ~0–3 per day). Users experience this as a random "voice recognition broke" — but nothing about voice/whisper actually failed.
Root Cause
Two root causes stacked:
- No retry on transient fetch failure. The cause chain is "fetch failed" → "ECONNRESET during TLS setup": a network-layer fault, not an HTTP status. It's exactly the class of error that benefits from a small retry with backoff, yet downloadFile neither catches nor retries it.
- Error message lies about which layer failed. The handler that wraps this failure logs and reports "Voice transcription error" + "Failed to transcribe voice message", regardless of whether the failure was in downloadFile, convertToWav (ffmpeg), or transcribeAudio (whisper). The user can't tell the difference: they send the voice again, it might download fine this time, and the "bug" feels random.
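The second root cause can be made concrete with a small sketch. Assuming the handler runs the three stages in sequence, each call could be wrapped so that the resulting error records which stage it came from. StageError, runStage, and userMessageFor are hypothetical names that do not exist in the bot, and the message strings are placeholders (only the transcription message is quoted from this report).

```typescript
// Hypothetical helper types: none of these names exist in bot/src/voice.ts.
type Stage = "download" | "convert" | "transcribe";

// Error wrapper that remembers which pipeline stage failed.
class StageError extends Error {
  readonly stage: Stage;
  readonly original: unknown;

  constructor(stage: Stage, original: unknown) {
    super(`voice pipeline failed during ${stage}`);
    this.name = "StageError";
    this.stage = stage;
    this.original = original;
  }
}

// Runs one stage and tags any failure with the stage name.
async function runStage<T>(stage: Stage, step: () => Promise<T>): Promise<T> {
  try {
    return await step();
  } catch (err) {
    throw new StageError(stage, err);
  }
}

// Maps a failure to a stage-specific user-facing message (placeholder wording).
function userMessageFor(err: unknown): string {
  if (err instanceof StageError) {
    switch (err.stage) {
      case "download":
        return "Could not download the voice message (network issue). Please resend it in a few seconds.";
      case "convert":
        return "Could not convert the voice message audio.";
      case "transcribe":
        return "Failed to transcribe voice message.";
    }
  }
  return "Voice message processing failed.";
}
```

In the handler, each existing call would be passed through runStage, for example runStage("download", () => downloadFile(url, destPath)), and the single catch block would log err.stage and reply with userMessageFor(err). The signatures of convertToWav and transcribeAudio are not shown in this report, so they are left out of the sketch.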
Suggested Direction
- Add bounded retry with backoff to downloadFile (e.g. 3 attempts, 500ms / 1s / 2s) scoped to network-layer errors rather than HTTP errors, which indicate real problems (404, 401) that won't be fixed by retry.
- Distinguish the user-facing message by failure stage: download failure vs conversion failure vs transcription failure. A download failure is a network issue the user can retry in a few seconds; a transcription failure is an audio-content or model problem that won't benefit from resending.
- Consider whether other fetch() calls to api.telegram.org have the same vulnerability (photo/document download in the same module pattern).
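A minimal sketch of the first suggestion, under stated assumptions: it treats an error as retryable only when it is a TypeError whose cause carries a socket-level code, which matches the shape in the log sample. The helper names (isTransientNetworkError, withRetry, downloadFileWithRetry) are hypothetical, and the set of codes beyond ECONNRESET is a guess rather than something observed in the logs. The delays follow the 500ms / 1s / 2s example; with three delays the operation runs at most four times.

```typescript
import { writeFile } from "node:fs/promises";

// Assumption: only ECONNRESET was observed; the other codes are plausible additions.
const TRANSIENT_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "EAI_AGAIN"]);

// Reads a property from an unknown value without assuming its type.
function prop(value: unknown, key: string): unknown {
  return typeof value === "object" && value !== null
    ? (value as Record<string, unknown>)[key]
    : undefined;
}

// True for TypeError("fetch failed") whose `cause.code` is a transient socket error.
function isTransientNetworkError(err: unknown): boolean {
  if (!(err instanceof TypeError)) return false;
  const code = prop(prop(err, "cause"), "code");
  return typeof code === "string" && TRANSIENT_CODES.has(code);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `op` once per entry in `delaysMs`, but only for transient network errors.
async function withRetry<T>(
  op: () => Promise<T>,
  delaysMs: number[] = [500, 1000, 2000],
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      const delay = delaysMs[attempt];
      // Give up when the delays are exhausted or the error is not transient.
      if (delay === undefined || !isTransientNetworkError(err)) throw err;
      await wait(delay);
    }
  }
}

// Hypothetical variant of downloadFile that wraps only the fetch in the retry loop.
async function downloadFileWithRetry(url: string, destPath: string): Promise<void> {
  const buffer = await withRetry(async () => {
    const resp = await fetch(url);
    if (!resp.ok) {
      // A plain Error, so isTransientNetworkError rejects it and it is not retried.
      throw new Error(`Download failed: HTTP ${resp.status}`);
    }
    return Buffer.from(await resp.arrayBuffer());
  });
  await writeFile(destPath, buffer, { mode: 0o600 });
}
```

Keeping the retry in a generic withRetry wrapper rather than inside downloadFile would make it reusable if the photo and document download paths turn out to share the problem. The injectable wait parameter exists only so the backoff can be exercised in tests without real delays.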