Replace Google Speech APIs with Gnani.ai (ASR, TTS, Translation, Voice Biometrics)
Platform: Upwork | Location: IN | Budget: Not specified | Level: Intermediate
Skills: REST API, API Integration, Python, Next.js, Speech API, Audio processing
Project Overview
We are building Vani, an AI-enabled multilingual audio interview platform, as part of the MySkillsPlus ecosystem (a skills profiling and employment platform).
Currently, we use Google APIs for:
Speech-to-Text (STT)
Text-to-Speech (TTS)
Translation
We now want to fully replace Google APIs with Gnani.ai services, including:
ASR (Speech to Text)
TTS (Text to Speech)
Machine Translation
Voice Biometrics (for speaker verification / fraud detection)
This project requires careful backend integration, audio handling, API abstraction design, and production-grade reliability.
Current Flow (Simplified)
LLM generates interview questions in English.
Question is translated into user-selected language.
Text is converted to speech and played.
User answers verbally.
Audio is:
..Stored
..Transcribed
..Displayed as text
At registration, user reads a consent paragraph aloud.
..This audio is used to enroll a voice signature.
..During interviews, speaker verification checks each answer against this voice signature to detect fraud.
All speech services currently use Google APIs and must be replaced with Gnani.ai.
Scope of Work
1. Speech-to-Text (Gnani ASR)
a. Integrate Gnani ASR for:
....Consent enrollment audio
....Interview answers
b. Handle:
....Language selection
....Audio format compliance (e.g., 16 kHz mono)
....Streaming or chunk upload (as supported by Gnani)
c. Return:
....Transcript
....Confidence (if available)
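Before audio reaches Gnani ASR, the backend can flag files that are not in the expected format and split compliant audio into chunks for upload. The following is a minimal sketch that uses only Python's standard `wave` module and assumes WAV input. The 16 kHz mono target comes from the requirements above. The 16-bit sample width and the 200 ms chunk size are assumptions to confirm against Gnani's ASR documentation. `check_wav_format`, `iter_pcm_chunks`, and `make_wav` are hypothetical helper names.

```python
import io
import wave
from typing import Iterator, List

# Target format. 16 kHz mono comes from the requirements; 16-bit PCM is an assumption.
EXPECTED_RATE = 16_000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPWIDTH = 2  # bytes per sample


def check_wav_format(wav_bytes: bytes) -> List[str]:
    """Return a list of format problems; an empty list means the WAV is compliant."""
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate {wf.getframerate()} Hz, expected {EXPECTED_RATE}")
        if wf.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wf.getnchannels()} channels, expected {EXPECTED_CHANNELS}")
        if wf.getsampwidth() != EXPECTED_SAMPWIDTH:
            problems.append(f"sample width {wf.getsampwidth()} bytes, expected {EXPECTED_SAMPWIDTH}")
    return problems


def iter_pcm_chunks(wav_bytes: bytes, chunk_ms: int = 200) -> Iterator[bytes]:
    """Yield raw PCM frames in chunks of about chunk_ms milliseconds (hypothetical chunk size)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        frames_per_chunk = wf.getframerate() * chunk_ms // 1000
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data


def make_wav(seconds: float, rate: int = EXPECTED_RATE, channels: int = 1) -> bytes:
    """Build a silent 16-bit WAV in memory, handy for tests."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    return buf.getvalue()
```

Audio that fails the check can be converted before upload instead of being rejected (for example with `ffmpeg -i input -ar 16000 -ac 1 output.wav`). Whether Gnani expects raw PCM chunks over a stream or a complete WAV file per request should be confirmed in its API reference.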
2. Text-to-Speech (Gnani TTS)
Convert interview questions to speech
Support multiple Indian languages
Allow voice selection (if available)
Cache generated question audio where appropriate
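Question audio caches well because the same question text, language, and voice typically yield equivalent audio, so repeated questions need not be re-synthesized. The following is a minimal sketch that assumes a disk cache keyed by a SHA-256 hash of provider, text, language, and voice. `synthesize` stands in for whatever call the Gnani TTS client actually exposes. `TTSCache` and the `"gnani"` provider label are hypothetical names.

```python
import hashlib
import json
import uuid
from pathlib import Path
from typing import Callable, Optional

# (text, language, voice) -> audio bytes; a stand-in for the real TTS client call.
SynthesizeFn = Callable[[str, str, Optional[str]], bytes]


class TTSCache:
    """Disk cache for synthesized question audio."""

    def __init__(self, cache_dir: Path, synthesize: SynthesizeFn, provider_name: str = "gnani"):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.synthesize = synthesize
        self.provider_name = provider_name

    def _key(self, text: str, language: str, voice: Optional[str]) -> str:
        # Including the provider means a future provider swap never serves stale audio.
        payload = json.dumps(
            {"provider": self.provider_name, "text": text, "lang": language, "voice": voice},
            sort_keys=True,
            ensure_ascii=False,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_audio(self, text: str, language: str, voice: Optional[str] = None) -> bytes:
        key = self._key(text, language, voice)
        path = self.cache_dir / f"{key}.wav"
        if path.exists():
            return path.read_bytes()  # cache hit: no API call
        audio = self.synthesize(text, language, voice)
        # Write to a uniquely named temp file, then rename, so readers never see a partial file.
        tmp = self.cache_dir / f"{key}.{uuid.uuid4().hex}.tmp"
        tmp.write_bytes(audio)
        tmp.replace(path)
        return audio
```

A shared object store could replace the local directory if several backend instances serve the same questions. The keying scheme stays the same.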
3. Translation (Gnani)
English → Selected language (for question display)
Selected language → English (optional, for LLM scoring consistency)
Must be implemented through a clean provider interface
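The two translation directions can live behind a single provider method. Thin helpers then decide when a call is needed at all, for example skipping the API when the user chose English. This sketch assumes BCP-47-style language tags such as `"kn-IN"`; Gnani's actual language identifiers may differ. The helper names and `EchoTranslator`, a test stand-in, are hypothetical.

```python
from typing import List, Protocol, Tuple


class TranslationProvider(Protocol):
    def translate(self, text: str, source_lang: str, target_lang: str) -> str: ...


def _base(lang: str) -> str:
    """Compare languages on the primary subtag, e.g. 'en-IN' -> 'en'."""
    return lang.split("-")[0].lower()


def question_for_display(provider: TranslationProvider, question_en: str, user_lang: str) -> str:
    """English -> selected language; no API call when the user chose English."""
    if _base(user_lang) == "en":
        return question_en
    return provider.translate(question_en, "en", user_lang)


def answer_for_scoring(provider: TranslationProvider, answer: str, user_lang: str) -> str:
    """Selected language -> English (the optional step, for consistent LLM scoring)."""
    if _base(user_lang) == "en":
        return answer
    return provider.translate(answer, user_lang, "en")


class EchoTranslator:
    """Fake provider for tests: tags the text with the target language and records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def translate(self, text: str, source_lang: str, target_lang: str) -> str:
        self.calls.append((source_lang, target_lang))
        return f"[{target_lang}] {text}"
```

Because callers depend only on `TranslationProvider`, the Gnani-backed class and the fake are interchangeable, which is what the provider-interface requirement asks for.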
4. Voice Biometrics (Gnani Armour365)
a. Enrollment:
....Capture consent reading (minimum 7 seconds)
....Create voice profile
b. Verification:
....For each answer, verify that the speaker matches the enrolled profile
....Store the match result, decision code, and score
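The 7-second minimum can be enforced locally before calling the enrollment API. A short recording then gets a clear message for the user instead of a provider error. This minimal sketch assumes the consent recording arrives as WAV and measures total duration from the header. It does not measure speech-only duration, so long pauses still count toward the minimum. The function names are hypothetical.

```python
import io
import wave

MIN_ENROLLMENT_SECONDS = 7.0  # minimum from the requirements


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration = frame count / sample rate, read from the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_enrollment_audio(wav_bytes: bytes) -> float:
    """Raise ValueError if the consent recording is too short; otherwise return its duration."""
    duration = wav_duration_seconds(wav_bytes)
    if duration < MIN_ENROLLMENT_SECONDS:
        raise ValueError(
            f"Consent recording is {duration:.1f}s; please read the paragraph again "
            f"(at least {MIN_ENROLLMENT_SECONDS:.0f}s of audio is required)."
        )
    return duration
```

If Gnani's enrollment requires a minimum amount of net speech rather than wall-clock length, a voice-activity check would be the stricter gate.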
5. Clean Architecture Requirement (Important)
You must implement a Speech Services Abstraction Layer with interfaces such as:
....TTSProvider
....STTProvider
....TranslationProvider
....VoiceBioProvider
Gnani will be the concrete implementation.
This ensures we can swap providers in the future without rewriting core logic.
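One way to express the abstraction layer in Python is with `typing.Protocol` interfaces, shared result types, and a small registry that builds the active provider set from a configuration name. This is a sketch under assumptions: the method signatures, `Transcript`, `VoiceMatch`, `SpeechServices`, and the registry functions are hypothetical. The concrete Gnani classes would be written against Gnani's own API documentation and registered under a name such as `"gnani"`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Protocol


@dataclass(frozen=True)
class Transcript:
    text: str
    language: str
    confidence: Optional[float] = None  # None when the provider does not return one


@dataclass(frozen=True)
class VoiceMatch:
    matched: bool
    decision_code: str  # provider-specific code, stored verbatim
    score: float


class STTProvider(Protocol):
    def transcribe(self, audio: bytes, language: str) -> Transcript: ...


class TTSProvider(Protocol):
    def synthesize(self, text: str, language: str, voice: Optional[str] = None) -> bytes: ...


class TranslationProvider(Protocol):
    def translate(self, text: str, source_lang: str, target_lang: str) -> str: ...


class VoiceBioProvider(Protocol):
    def enroll(self, user_id: str, audio: bytes) -> str: ...  # returns a voice profile id
    def verify(self, profile_id: str, audio: bytes) -> VoiceMatch: ...


@dataclass
class SpeechServices:
    """Everything the interview flow needs; core logic depends only on these interfaces."""
    stt: STTProvider
    tts: TTSProvider
    translation: TranslationProvider
    voice_bio: VoiceBioProvider


_FACTORIES: Dict[str, Callable[[], SpeechServices]] = {}


def register(name: str, factory: Callable[[], SpeechServices]) -> None:
    _FACTORIES[name] = factory


def build_speech_services(name: str) -> SpeechServices:
    """Build the provider set selected in configuration, e.g. SPEECH_PROVIDER=gnani."""
    factory = _FACTORIES.get(name)
    if factory is None:
        raise ValueError(f"unknown speech provider: {name!r}")
    return factory()
```

Swapping vendors then means registering a new factory and changing one configuration value. Interview, enrollment, and scoring code stay untouched.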
Deliverables
Milestone 1 – ASR Integration
Gnani STT fully functional
Audio transcription displayed correctly
Error handling + retries implemented
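For the retry requirement, a common pattern is exponential backoff with full jitter, applied only to failures that are likely transient. This sketch assumes the Gnani client wrapper raises a hypothetical `TransientError` for timeouts and rate-limit or server errors (for example HTTP 429 and 5xx). Any other exception propagates immediately. `with_retries` and its default delays are illustrative, not tuned values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Raised by a provider client for retryable failures (e.g. timeouts, HTTP 429/5xx)."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying TransientError with exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Full jitter: sleep a random time in [0, capped exponential delay].
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")
```

Usage might look like `with_retries(lambda: services.stt.transcribe(audio, "hi-IN"))`. Retrying is safest for idempotent calls such as transcription. A call that creates state, such as voice enrollment, may need a deduplication check before it is retried.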
Milestone 2 – TTS Integration
Questions synthesized in selected language
Audio playback stable
Caching implemented
Milestone 3 – Translation Integration
Bidirectional translation working
Language selection dynamic
Clean interface abstraction
Milestone 4 – Voice Biometrics
Enrollment flow complete
Verification per answer working
Results stored in database
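The posting does not name the database, so this sketch uses the standard-library `sqlite3` module as a stand-in. The table and column names are hypothetical. Each answer gets one row holding the fields the requirements list (match result, decision code, score). The provider name is stored too, so results from different vendors stay distinguishable after a future swap.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS voice_verifications (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id       TEXT    NOT NULL,
    answer_id     TEXT    NOT NULL UNIQUE,  -- one verification per answer
    matched       INTEGER NOT NULL,         -- 0 or 1
    decision_code TEXT    NOT NULL,         -- provider code, stored verbatim
    score         REAL    NOT NULL,
    provider      TEXT    NOT NULL,
    created_at    TEXT    NOT NULL          -- ISO 8601, UTC
)
"""


def save_verification(conn: sqlite3.Connection, user_id: str, answer_id: str,
                      matched: bool, decision_code: str, score: float,
                      provider: str = "gnani") -> None:
    """Insert one verification result; the connection context manager commits it."""
    with conn:
        conn.execute(
            "INSERT INTO voice_verifications "
            "(user_id, answer_id, matched, decision_code, score, provider, created_at) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (user_id, answer_id, int(matched), decision_code, score, provider,
             datetime.now(timezone.utc).isoformat()),
        )


def mismatch_count(conn: sqlite3.Connection, user_id: str) -> int:
    """Number of answers for this user where the speaker did not match the enrolled profile."""
    row = conn.execute(
        "SELECT COUNT(*) FROM voice_verifications WHERE user_id = ? AND matched = 0",
        (user_id,),
    ).fetchone()
    return row[0]
```

A query like `mismatch_count` is one possible input to the fraud review the registration section describes. The threshold for flagging an interview is a product decision the posting leaves open.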
Milestone 5 – Hardening & Deployment
Logging (API call latency, errors)
Secure key management
Documentation
Removal of all Google Speech and Translation API calls
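For the hardening milestone, a decorator can wrap every provider call to log its latency and any failure. The API key can be read from the environment at startup instead of living in the codebase. This is a sketch under those assumptions: the environment variable name `GNANI_API_KEY`, the logger name, and the helper names are hypothetical. In production the variable would typically be populated from a secrets manager. The log lines deliberately omit request payloads and the key itself.

```python
import functools
import logging
import os
import time

logger = logging.getLogger("speech")


def log_call(provider: str, operation: str):
    """Decorator: log latency for every provider call, plus the exception type on failure."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                elapsed_ms = (time.perf_counter() - start) * 1000
                logger.error("provider=%s op=%s status=error latency_ms=%.1f error=%s",
                             provider, operation, elapsed_ms, type(exc).__name__)
                raise
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info("provider=%s op=%s status=ok latency_ms=%.1f",
                        provider, operation, elapsed_ms)
            return result
        return wrapper
    return decorator


def load_api_key(var: str = "GNANI_API_KEY") -> str:
    """Read the key from the environment; fail fast at startup if it is missing."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key
```

The key=value log format is easy to parse in most log aggregators, which helps compare Gnani latency against the previous Google baseline during the migration.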
Client: Spent $35,512.06 | Rating: 4.9 | Verified