Fix WebSocket Latency & VAD Issues in Real-Time Voice Agent
Upwork | IL | Not specified | expert
TypeScript, PostgreSQL, Node.js, Twilio
About the Project
We have a production voice AI agent built in TypeScript / Node.js that handles inbound phone calls.
The voice interaction is in Hebrew, but you do not need to work on language, prompts, or model tuning — your role is strictly to fix the real-time audio pipeline, VAD behavior, and latency issues.
Architecture:
Twilio Phone Number
→ Fastify Server (Railway)
→ Google Gemini Live API
→ Audio response back to caller
The server receives inbound calls via Twilio, plays a Hebrew greeting using TwiML Say, then opens a Twilio Media Stream (WebSocket) that bridges the caller's audio to the Gemini Live API for real-time conversational AI.
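One way connection timing can go wrong in a bridge like this is that caller audio arrives on the Media Stream before the upstream session is ready. The posting does not say this is the actual bug, so the following is only a minimal sketch of that idea. `PendingAudioQueue` and `handleTwilioMessage` are hypothetical names, the frame cap is an assumed value, and only the Twilio message fields used here (`event`, `streamSid`, `media.payload`) are modelled.

```typescript
// Subset of Twilio Media Streams messages that matter for bridging.
type TwilioMessage =
  | { event: "connected" }
  | { event: "start"; streamSid: string }
  | { event: "media"; media: { payload: string } } // base64 mu-law, 8 kHz
  | { event: "stop" };

// Hypothetical helper: holds caller audio until the upstream session is
// ready, so speech during connection setup is not silently dropped.
class PendingAudioQueue {
  private queue: string[] = [];
  private ready = false;
  private readonly send: (base64Audio: string) => void;
  private readonly maxFrames: number;

  // maxFrames is an assumed cap (250 frames is about 5 s at 20 ms per frame).
  constructor(send: (base64Audio: string) => void, maxFrames = 250) {
    this.send = send;
    this.maxFrames = maxFrames;
  }

  push(base64Audio: string): void {
    if (this.ready) {
      this.send(base64Audio);
      return;
    }
    this.queue.push(base64Audio);
    // Drop the oldest frame at the cap to bound memory and staleness.
    if (this.queue.length > this.maxFrames) this.queue.shift();
  }

  // Call once the upstream session has finished its setup.
  markReady(): void {
    this.ready = true;
    for (const frame of this.queue) this.send(frame);
    this.queue = [];
  }
}

// Returns the streamSid on "start"; forwards audio on "media".
function handleTwilioMessage(
  raw: string,
  audio: PendingAudioQueue,
): string | undefined {
  const msg = JSON.parse(raw) as TwilioMessage;
  switch (msg.event) {
    case "start":
      return msg.streamSid;
    case "media":
      audio.push(msg.media.payload);
      return undefined;
    default:
      return undefined;
  }
}
```

In a real server, `send` would hand each frame to the transcoding step, and `markReady()` would be called from whatever callback signals that the Gemini session is open.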
Audio is transcoded between mu-law 8 kHz (Twilio) and PCM (16 kHz into Gemini, 24 kHz out of Gemini) using FFmpeg child processes.
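To illustrate what the inbound conversion involves, here is a rough in-process sketch: G.711 mu-law decoding followed by a crude 2x upsample from 8 kHz to 16 kHz. This is not the project's method (the project uses FFmpeg child processes); it is a swapped-in technique shown only to make the format change concrete. The function names are hypothetical, and linear interpolation without a low-pass filter is a simplification.

```typescript
// Decode one G.711 mu-law byte to a signed 16-bit PCM sample.
function decodeMuLawSample(byte: number): number {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a whole mu-law frame (e.g. one Twilio media payload after base64).
function muLawToPcm16(input: Uint8Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    out[i] = decodeMuLawSample(input[i]);
  }
  return out;
}

// Double the sample rate (8 kHz -> 16 kHz) by linear interpolation.
// The last interpolated sample repeats the final input sample.
function upsample2x(input: Int16Array): Int16Array {
  const out = new Int16Array(input.length * 2);
  for (let i = 0; i < input.length; i++) {
    const current = input[i];
    const next = i + 1 < input.length ? input[i + 1] : current;
    out[2 * i] = current;
    out[2 * i + 1] = (current + next) >> 1;
  }
  return out;
}
```

Whether an in-process conversion would help latency here is an open question; child-process pipes can add buffering delay depending on how they are configured, but that would need to be measured in the actual pipeline.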
Twilio works correctly. Gemini works correctly.
The issues are in our server-side orchestration logic, specifically around connection timing and voice activity detection (VAD).
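The posting does not say whether VAD runs on the server or is left to the Gemini session, so the following is only an assumed illustration of one common server-side approach: an energy (RMS) threshold with a hangover period, which keeps short pauses from being treated as end of speech. `EnergyVad`, `thresholdRms`, and `hangoverFrames` are hypothetical names, and any real thresholds would need tuning on actual call audio.

```typescript
interface VadOptions {
  thresholdRms: number; // RMS level at or above which a frame counts as speech
  hangoverFrames: number; // consecutive quiet frames needed to end speech
}

class EnergyVad {
  private speaking = false;
  private silentFrames = 0;
  private readonly thresholdRms: number;
  private readonly hangoverFrames: number;

  constructor(options: VadOptions) {
    this.thresholdRms = options.thresholdRms;
    this.hangoverFrames = options.hangoverFrames;
  }

  // Feed one PCM16 frame; returns true while the caller is considered speaking.
  process(frame: Int16Array): boolean {
    let sumSquares = 0;
    for (let i = 0; i < frame.length; i++) {
      sumSquares += frame[i] * frame[i];
    }
    const rms = frame.length > 0 ? Math.sqrt(sumSquares / frame.length) : 0;

    if (rms >= this.thresholdRms) {
      // Loud frame: speech starts or continues, and the quiet counter resets.
      this.speaking = true;
      this.silentFrames = 0;
    } else if (this.speaking) {
      // Quiet frame during speech: only end speech after the hangover elapses.
      this.silentFrames++;
      if (this.silentFrames >= this.hangoverFrames) {
        this.speaking = false;
        this.silentFrames = 0;
      }
    }
    return this.speaking;
  }
}
```

With 20 ms frames, a `hangoverFrames` value of 25 would correspond to roughly 500 ms of silence before speech is considered finished; a shorter hangover lowers response latency but raises the risk of cutting the caller off mid-sentence.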
Tech Stack
• TypeScript / Node.js
• Fastify (HTTP + WebSocket via @fastify/websocket)
• Twilio Voice + Twilio Media Streams (mu-law 8kHz bidirectional audio)
• Google Gemini Live API (@google/genai with live.connect())
• FFmpeg (ffmpeg-static for mu-law ↔ PCM pipelines)
• Redis (call state management)
• Prisma + PostgreSQL (call/transcript persistence)
• Railway (deployment)
• Pino (structured logging)
Client
Spent: $5,113.25 | Rating: 5.0 | Verified