5/4/2026 · engineering

Why Speechmatics ar_en is the only Arabic ASR that actually works on Vapi

When we were building Rannly we had one non-negotiable: the AI had to understand Khaleeji and Egyptian Arabic at production quality. Most of our prospective customers don't speak fluent English, and any AI receptionist that mishears their dialect is worse than voicemail because it fakes confidence.

Vapi (the orchestration layer we use) supports six ASR providers. Three of them claim Arabic support. We tested all three, plus Deepgram's multilingual mode. Here's what actually happened.

## Test setup

- Test phrase: *"السلام عليكم، التكييف بتاعي خربان من ٣ ساعات والبيت بقى فرن"* (Egyptian dialect, ~5 seconds)
- 20 calls per provider, half Khaleeji speakers, half Egyptian
- Measured: word error rate (WER), dialect identification, end-to-end latency

## Deepgram nova-3 with `language: "multi"`

**Result: doesn't transcribe Arabic at all.**

Deepgram's multilingual model on Vapi does not include Arabic in its trained languages. Arabic audio gets routed to **Hindi**, which produces transliterated nonsense like *"नस्लाम अलैकुम तकीफ खुर्बान"*. The downstream gpt-4o received this Hindi-script gibberish, couldn't make sense of it, and defaulted to English replies. Customer hung up confused, twice.

WER: 100% (effectively zero comprehension).

## Azure Speech `ar-EG-SalmaNeural`

**Result: fast, accurate transcription**, but you're stuck on Microsoft's full Cognitive Services billing model.

Azure's Egyptian Arabic is the gold standard for ASR accuracy (~92% word accuracy on our test set). The catch: Vapi's Azure integration requires a tenant migration, and we spent 3 hours debugging account permissions before we could even create the Speech resource. Once it works, it's great, but the friction made it a non-starter for a one-person shop.

## ElevenLabs Scribe v2 realtime `ar`

**Result: works, but dialect handling is weak.**

Scribe v2 is ElevenLabs' answer to streaming ASR. For pure MSA (Modern Standard Arabic, news-anchor register) it's solid.
For dialect, it's weaker: Egyptian *"بقى فرن"* (became an oven) gets transcribed as *"بكى فرن"* (cried an oven), which is nonsensical but phonetically close. Acceptable for an MSA-heavy business. Not acceptable for SMBs whose customers speak street dialect.

## Speechmatics `ar_en` enhanced

**Result: this is the one.**

Speechmatics is the only provider that ships a dedicated **bilingual `ar_en`** model designed for code-switching speakers, which is exactly what every Gulf SMB caller does (start in Arabic, drop English brand names like "JBR" or "ATM" mid-sentence).

- WER: ~6% on Egyptian, ~9% on Khaleeji
- Code-switches in real time without losing the thread
- `operatingPoint: "enhanced"` adds dialect post-processing
- EU region (Frankfurt) keeps latency under 200 ms for Gulf callers

Cost: comparable to Deepgram. Setup: get a free trial key, register it as a Vapi credential, done in 10 minutes.

## What this means for Rannly

Every Rannly customer's calls run through Speechmatics `ar_en`. The hand-tuned blueprint that ships on every per-customer assistant is:

```typescript
// Value of the `transcriber` key in each per-customer assistant config
const transcriber = {
  provider: "speechmatics",
  model: "default",
  language: "ar_en",
  operatingPoint: "enhanced",
  region: "eu",
  maxDelay: 1500,
};
```

If you're building an Arabic voice product on Vapi, save yourself the week. Skip Deepgram and Azure. Go straight to Speechmatics.

-- Marwan
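
**A note on reproducing the WER figures.** WER is conventionally the word-level edit distance between a reference transcript and the ASR output (substitutions + deletions + insertions), divided by the number of words in the reference. The sketch below is a minimal TypeScript version of that definition, not necessarily the script behind the numbers above. The names `tokenize` and `wordErrorRate` are illustrative, and it assumes plain whitespace tokenization with no Arabic-specific normalization (diacritics, digit forms, punctuation), which a real evaluation would probably want.

```typescript
// Split a transcript into words on whitespace (assumption: no further normalization).
function tokenize(text: string): string[] {
  return text.trim().split(/\s+/).filter((w) => w.length > 0);
}

// WER = word-level Levenshtein distance / number of reference words.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference transcript must contain at least one word");
  }

  // prev[j] holds the edit distance between the reference words processed
  // so far and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);

  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i]; // i deletions to reach an empty hypothesis
    for (let j = 1; j <= hyp.length; j++) {
      const substitution = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = curr[j - 1] + 1;
      curr[j] = Math.min(substitution, deletion, insertion);
    }
    prev = curr;
  }

  return prev[hyp.length] / ref.length;
}
```

Applied to the Scribe example, `wordErrorRate("والبيت بقى فرن", "والبيت بكى فرن")` counts one substitution over three reference words, so that fragment alone scores a WER of about 0.33. A single wrong consonant can cost a whole word, which is why short dialect phrases swing WER so sharply.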