Replace Google Speech APIs with Gnani.ai (ASR, TTS, Translation, Voice Biometrics)

Upwork | IN | Not specified | intermediate | Score: 59
REST API, API Integration, Python, Next.js, Speech API, Audio processing
Project Overview

We are building Vani, an AI-enabled multilingual audio interview platform as part of the MySkillsPlus ecosystem (skills profiling and employment platform).

Currently, we use Google APIs for:
- Speech-to-Text (STT)
- Text-to-Speech (TTS)
- Translation

We now want to fully replace Google APIs with Gnani.ai services, including:
- ASR (Speech to Text)
- TTS (Text to Speech)
- Machine Translation
- Voice Biometrics (for speaker verification / fraud detection)

This project requires careful backend integration, audio handling, API abstraction design, and production-grade reliability.

Current Flow (Simplified)

- LLM generates interview questions in English.
- Question is translated into user-selected language.
- Text is converted to speech and played.
- User answers verbally.
- Audio is:
  - Stored
  - Transcribed
  - Displayed as text
- At registration, user reads a consent paragraph aloud.
  - This audio is used to enroll a voice signature.
  - During interviews, speaker verification checks for fraud.

All speech services currently use Google APIs and must be replaced with Gnani.ai.

Scope of Work

1. Speech-to-Text (Gnani ASR)
   a. Integrate Gnani ASR for:
      - Consent enrollment audio
      - Interview answers
   b. Handle:
      - Language selection
      - Audio format compliance (16kHz mono etc.)
      - Streaming or chunk upload (as supported by Gnani)
   c. Return:
      - Transcript
      - Confidence (if available)

2. Text-to-Speech (Gnani TTS)
   - Convert interview questions to speech
   - Support multiple Indian languages
   - Allow voice selection (if available)
   - Cache generated question audio where appropriate

3. Translation (Gnani)
   - English → Selected language (for question display)
   - Selected language → English (optional, for LLM scoring consistency)
   - Must be implemented through a clean provider interface

4. Voice Biometrics (Gnani Armour365)
   a. Enrollment:
      - Capture consent reading (minimum 7 seconds)
      - Create voice profile
   b. Verification:
      - During each answer, verify speaker matches enrolled profile
      - Store match result, decision code, score

5. Clean Architecture Requirement (Important)
   You must implement a Speech Services Abstraction Layer, such as:
   - TTSProvider
   - STTProvider
   - TranslationProvider
   - VoiceBioProvider
   Gnani will be the concrete implementation. This ensures we can swap providers in future without rewriting core logic.

Deliverables

Milestone 1 – ASR Integration
- Gnani STT fully functional
- Audio transcription displayed correctly
- Error handling + retries implemented

Milestone 2 – TTS Integration
- Questions synthesized in selected language
- Audio playback stable
- Caching implemented

Milestone 3 – Translation Integration
- Bidirectional translation working
- Language selection dynamic
- Clean interface abstraction

Milestone 4 – Voice Biometrics
- Enrollment flow complete
- Verification per answer working
- Results stored in database

Milestone 5 – Hardening & Deployment
- Logging (API call latency, errors)
- Secure key management
- Documentation
- Removal of all Google Speech API calls
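
As a rough illustration of the Speech Services Abstraction Layer requested above, the four provider interfaces could be sketched in Python along the following lines. Only the interface names (TTSProvider, STTProvider, TranslationProvider, VoiceBioProvider) come from the listing. The method names, the Transcript and VerificationResult types, and the CachingTTS and FakeTTS helpers are assumptions made for illustration. No Gnani.ai API calls are shown, since their request format is not described here. A Gnani-backed class would implement these same interfaces.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Transcript:
    text: str
    confidence: Optional[float] = None  # None when the provider reports no confidence


@dataclass(frozen=True)
class VerificationResult:
    matched: bool
    decision_code: str
    score: float


# Interface names follow the listing; method signatures are hypothetical.
class STTProvider(Protocol):
    def transcribe(self, audio: bytes, language: str) -> Transcript: ...


class TTSProvider(Protocol):
    def synthesize(self, text: str, language: str, voice: Optional[str] = None) -> bytes: ...


class TranslationProvider(Protocol):
    def translate(self, text: str, source: str, target: str) -> str: ...


class VoiceBioProvider(Protocol):
    def enroll(self, user_id: str, audio: bytes) -> None: ...

    def verify(self, user_id: str, audio: bytes) -> VerificationResult: ...


class CachingTTS:
    """Wraps any TTSProvider and caches audio keyed by (language, voice, text)."""

    def __init__(self, inner: TTSProvider) -> None:
        self._inner = inner
        self._cache: dict[str, bytes] = {}

    def synthesize(self, text: str, language: str, voice: Optional[str] = None) -> bytes:
        key = hashlib.sha256(f"{language}|{voice}|{text}".encode("utf-8")).hexdigest()
        if key not in self._cache:
            # Cache miss: call the wrapped provider once and remember the result.
            self._cache[key] = self._inner.synthesize(text, language, voice)
        return self._cache[key]


class FakeTTS:
    """Hypothetical stand-in provider for tests; counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def synthesize(self, text: str, language: str, voice: Optional[str] = None) -> bytes:
        self.calls += 1
        return f"{language}:{text}".encode("utf-8")
```

Core interview logic would depend only on the Protocol types, so swapping Gnani for another vendor means writing one new class per interface. The in-memory cache here is a placeholder. A production version would more likely store question audio in object storage or a database.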

Client

Spent: $35,512.06 | Rating: 4.9 | Verified