Replace Google Speech APIs with Gnani.ai (ASR, TTS, Translation, Voice Biometrics)

Upwork | IN | Not specified | Intermediate | Score: 59

REST API, API Integration, Python, Next.js, Speech API, Audio processing

Project Overview

We are building Vani, an AI-enabled multilingual audio interview platform that is part of the MySkillsPlus ecosystem (a skills profiling and employment platform).

Currently, we use Google APIs for:

Speech-to-Text (STT)
Text-to-Speech (TTS)
Translation

We now want to fully replace Google APIs with Gnani.ai services, including:
ASR (Speech to Text)
TTS (Text to Speech)
Machine Translation
Voice Biometrics (for speaker verification / fraud detection)

This project requires careful backend integration, audio handling, API abstraction design, and production-grade reliability.

Current Flow (Simplified)
LLM generates interview questions in English.
Question is translated into user-selected language.
Text is converted to speech and played.
User answers verbally.
Audio is:
..Stored
..Transcribed
..Displayed as text
At registration, user reads a consent paragraph aloud.
..This audio is used to enroll a voice signature.
..During interviews, speaker verification checks for fraud.

All speech services currently use Google APIs and must be replaced with Gnani.ai.

Scope of Work
1. Speech-to-Text (Gnani ASR)

a. Integrate Gnani ASR for:
....Consent enrollment audio
....Interview answers
b. Handle:
....Language selection
....Audio format compliance (16 kHz mono, etc.)
....Streaming or chunk upload (as supported by Gnani)
c. Return:
....Transcript
....Confidence (if available)
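
As an illustration of the audio format compliance item above, here is a minimal sketch that checks a WAV payload before it is sent for transcription. It uses only Python's standard `wave` module. The 16 kHz mono target comes from this listing; the WAV container and the 16-bit sample width are assumptions, and the actual formats accepted by Gnani ASR should be confirmed against their documentation.

```python
import io
import wave

TARGET_RATE_HZ = 16_000      # from the "16 kHz mono" requirement in this listing
TARGET_CHANNELS = 1          # mono
ASSUMED_SAMPLE_WIDTH = 2     # assumption: 16-bit PCM (2 bytes per sample)


def check_wav_compliance(wav_bytes: bytes) -> list[str]:
    """Return a list of problems; an empty list means the audio looks compliant."""
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getframerate() != TARGET_RATE_HZ:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {TARGET_RATE_HZ} Hz"
            )
        if wav.getnchannels() != TARGET_CHANNELS:
            problems.append(
                f"{wav.getnchannels()} channels found, expected {TARGET_CHANNELS}"
            )
        if wav.getsampwidth() != ASSUMED_SAMPLE_WIDTH:
            problems.append(
                f"sample width is {wav.getsampwidth()} bytes, expected {ASSUMED_SAMPLE_WIDTH}"
            )
    return problems
```

Audio that fails the check could be resampled or rejected before upload; which of the two is appropriate depends on how the browser records the answer.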

2. Text-to-Speech (Gnani TTS)

Convert interview questions to speech
Support multiple Indian languages
Allow voice selection (if available)
Cache generated question audio where appropriate
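
One possible way to approach the caching item above is to key each audio file on a hash of the question text, language, and voice, so an identical question is synthesized only once. This is a sketch under assumptions: `get_or_synthesize`, `tts_cache_key`, the on-disk cache layout, and the `synthesize` callable are hypothetical names, not part of any Gnani SDK.

```python
import hashlib
from pathlib import Path
from typing import Callable


def tts_cache_key(text: str, language: str, voice: str) -> str:
    """Derive a stable cache key from everything that affects the audio."""
    # The unit separator keeps ("ab", "c") and ("a", "bc") from colliding.
    payload = "\x1f".join([language, voice, text.strip()])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def get_or_synthesize(
    cache_dir: Path,
    text: str,
    language: str,
    voice: str,
    synthesize: Callable[[str, str, str], bytes],  # hypothetical TTS call
) -> bytes:
    """Return cached audio if present; otherwise synthesize and store it."""
    path = cache_dir / f"{tts_cache_key(text, language, voice)}.audio"
    if path.exists():
        return path.read_bytes()
    audio = synthesize(text, language, voice)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return audio
```

Because LLM-generated questions may rarely repeat verbatim, the cache hit rate is worth measuring before investing in shared storage.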

3. Translation (Gnani)

English → Selected language (for question display)
Selected language → English (optional, for LLM scoring consistency)
Must be implemented through a clean provider interface

4. Voice Biometrics (Gnani Armour365)

a. Enrollment:
....Capture consent reading (minimum 7 seconds)
....Create voice profile
b. Verification:
....During each answer, verify speaker matches enrolled profile
....Store match result, decision code, score
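
The enrollment and verification requirements above can be sketched as a duration guard plus a record type for the stored result. The 7-second minimum and the three stored fields (match result, decision code, score) come from this listing; `VerificationRecord`, `answer_id`, and the helper names are hypothetical, and the real field types depend on what Armour365 returns.

```python
from dataclasses import dataclass

MIN_ENROLLMENT_SECONDS = 7.0  # minimum consent-reading length from this listing


def pcm_duration_seconds(num_frames: int, sample_rate_hz: int) -> float:
    """Duration of uncompressed audio given its frame count and sample rate."""
    return num_frames / sample_rate_hz


def is_long_enough_for_enrollment(num_frames: int, sample_rate_hz: int) -> bool:
    """True if the consent recording meets the minimum enrollment length."""
    return pcm_duration_seconds(num_frames, sample_rate_hz) >= MIN_ENROLLMENT_SECONDS


@dataclass(frozen=True)
class VerificationRecord:
    """One row per interview answer (hypothetical schema)."""
    answer_id: str       # hypothetical identifier linking to the stored answer
    matched: bool        # match result
    decision_code: str   # provider decision code, stored as returned
    score: float         # provider similarity score, stored as returned
```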

5. Clean Architecture Requirement (Important)

You must implement a Speech Services Abstraction Layer, such as:
....TTSProvider
....STTProvider
....TranslationProvider
....VoiceBioProvider

Gnani will be the concrete implementation.

This ensures we can swap providers in future without rewriting core logic.
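
A minimal sketch of what such an abstraction layer could look like, using `typing.Protocol` so that core logic depends only on the interfaces. The four provider names come from this listing; every method name, parameter, and return type below is an assumption for illustration, and `prepare_question` is a hypothetical piece of core logic.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Transcript:
    text: str
    confidence: Optional[float]  # None when the provider reports no confidence


@dataclass(frozen=True)
class VerificationResult:
    matched: bool
    decision_code: str
    score: float


class STTProvider(Protocol):
    def transcribe(self, audio: bytes, language: str) -> Transcript: ...


class TTSProvider(Protocol):
    def synthesize(self, text: str, language: str, voice: Optional[str] = None) -> bytes: ...


class TranslationProvider(Protocol):
    def translate(self, text: str, source: str, target: str) -> str: ...


class VoiceBioProvider(Protocol):
    def enroll(self, user_id: str, audio: bytes) -> None: ...
    def verify(self, user_id: str, audio: bytes) -> VerificationResult: ...


def prepare_question(
    question_en: str,
    language: str,
    translator: TranslationProvider,
    tts: TTSProvider,
) -> tuple[str, bytes]:
    """Translate an English question and synthesize it in the target language."""
    localized = translator.translate(question_en, source="en", target=language)
    return localized, tts.synthesize(localized, language)
```

With this shape, a Gnani-backed class and a test double both satisfy the same protocol, so `prepare_question` never imports a vendor SDK directly.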

Deliverables

Milestone 1 – ASR Integration
Gnani STT fully functional
Audio transcription displayed correctly
Error handling + retries implemented
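
For the error handling and retries item, one common approach is exponential backoff around each provider call. The sketch below is not tied to any Gnani client: `TransientSpeechError` is a hypothetical exception that a provider wrapper would raise for retryable failures such as timeouts, and the default attempt count and delay are placeholders.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientSpeechError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> T:
    """Call fn, retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except TransientSpeechError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Delays double each time: base, 2*base, 4*base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

Non-transient errors (for example an invalid language code) propagate immediately, since retrying them would only add latency.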

Milestone 2 – TTS Integration
Questions synthesized in selected language
Audio playback stable
Caching implemented

Milestone 3 – Translation Integration
Bidirectional translation working
Language selection dynamic
Clean interface abstraction

Milestone 4 – Voice Biometrics
Enrollment flow complete
Verification per answer working
Results stored in database

Milestone 5 – Hardening & Deployment
Logging (API call latency, errors)
Secure key management
Documentation
Removal of all Google Speech API calls
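
The logging and key management items could be sketched as a context manager that records per-call latency and a loader that reads the API key from the environment instead of source code. The logger name `speech` and the environment variable `GNANI_API_KEY` are assumptions; a production deployment might use a secrets manager instead.

```python
import logging
import os
import time
from contextlib import contextmanager

logger = logging.getLogger("speech")  # hypothetical logger name


@contextmanager
def log_api_call(service: str, operation: str):
    """Log latency for a provider call, and log the traceback on failure."""
    start = time.perf_counter()
    try:
        yield
    except Exception:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.exception("%s.%s failed after %.1f ms", service, operation, elapsed_ms)
        raise
    else:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s.%s ok in %.1f ms", service, operation, elapsed_ms)


def load_api_key(env_var: str = "GNANI_API_KEY") -> str:
    """Read the API key from the environment; the variable name is an assumption."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key
```

Usage would look like `with log_api_call("asr", "transcribe"): ...` wrapped around each provider call.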


Client

Spent: $35,512.06 | Rating: 4.9 | Verified