Fix WebSocket Latency & VAD Issues in Real-Time Voice Agent

Upwork | IL | Not specified | expert
TypeScript, PostgreSQL, Node.js, Twilio
About the Project

We have a production voice AI agent built in TypeScript / Node.js that handles inbound phone calls. The voice interaction is in Hebrew, but you do not need to work on language, prompts, or model tuning. Your role is strictly to fix the real-time audio pipeline, VAD behavior, and latency issues.

Architecture: Twilio Phone Number → Fastify Server (Railway) → Google Gemini Live API → Audio response back to caller

The server receives inbound calls via Twilio, plays a Hebrew greeting using TwiML Say, then opens a Twilio Media Stream (WebSocket) that bridges the caller's audio to the Gemini Live API for real-time conversational AI. Audio is transcoded between mu-law 8kHz (Twilio) and PCM 16kHz / 24kHz (Gemini) using FFmpeg child processes.

Twilio works correctly. Gemini works correctly. The issues are in our server-side orchestration logic, specifically around connection timing and voice activity detection (VAD).

Tech Stack
• TypeScript / Node.js
• Fastify (HTTP + WebSocket via @fastify/websocket)
• Twilio Voice + Twilio Media Streams (mu-law 8kHz bidirectional audio)
• Google Gemini Live API (@google/genai with live.connect())
• FFmpeg (ffmpeg-static for mu-law ↔ PCM pipelines)
• Redis (call state management)
• Prisma + PostgreSQL (call/transcript persistence)
• Railway (deployment)
• Pino (structured logging)
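
To make the transcoding step in the architecture concrete, here is a minimal sketch of the inbound direction: mu-law 8kHz bytes decoded to 16-bit PCM and upsampled to 16kHz. This is an illustration only, not the project's code. The listing says the real pipeline uses FFmpeg child processes; this sketch swaps in a pure TypeScript G.711 decode and a naive linear-interpolation upsampler. The function names `muLawToPcm16`, `decodeMuLaw`, and `upsample2x` are hypothetical.

```typescript
// Hypothetical helper: decode one G.711 mu-law byte into a signed
// 16-bit linear PCM sample (standard bias of 0x84).
function muLawToPcm16(byte: number): number {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Hypothetical helper: decode a buffer of mu-law bytes into 8 kHz PCM16 samples.
function decodeMuLaw(bytes: Uint8Array): Int16Array {
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    pcm[i] = muLawToPcm16(bytes[i]);
  }
  return pcm;
}

// Hypothetical helper: naive 8 kHz -> 16 kHz upsampling by linear
// interpolation. No anti-imaging filter is applied, so quality is
// lower than a proper resampler such as FFmpeg's.
function upsample2x(input: Int16Array): Int16Array {
  const out = new Int16Array(input.length * 2);
  for (let i = 0; i < input.length; i++) {
    const cur = input[i];
    const next = i + 1 < input.length ? input[i + 1] : cur;
    out[2 * i] = cur;
    out[2 * i + 1] = Math.round((cur + next) / 2);
  }
  return out;
}
```

In a Node.js handler, the base64 `media.payload` of a Twilio Media Streams `media` message could be passed in as `decodeMuLaw(Buffer.from(payload, "base64"))`, since `Buffer` is a `Uint8Array` subclass. Whether an in-process decode like this is preferable to FFmpeg child processes for latency is an open question for the project, not something the listing states.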

Client

Spent: $5,113.25 | Rating: 5.0 | Verified