AI Data Ops Lead / AI QA Lead (LLM Reliability & Validation)
Remote | Contract | Senior Level
ProoflineAI (Early-Stage, High-Ownership Environment)
⸻
About ProoflineAI
ProoflineAI builds AI reliability infrastructure.
We help companies move from experimental AI to enterprise-trustworthy AI through structured validation, measurable reliability scoring, and continuous monitoring.
We operate through three structured sprint formats:
• Express Sprint (2 weeks) – Rapid reliability diagnosis
• 30-Day Sprint – Full audit + hardening implementation
• Enterprise Sprint (60–90 days) – End-to-end validation + monitoring infrastructure
We are hiring a senior AI Data Ops / QA Lead to own the reliability layer across these sprints.
This is not model training.
This is AI reliability engineering.
⸻
Your Role
You will design and implement structured validation systems for LLM and RAG-based AI products.
You will define how AI performance is measured, stress-tested, and monitored.
You will build the layer that makes AI defensible.
⸻
What You’ll Do Across Sprints
Express Sprint
• Rapid hallucination detection testing
• Reliability baseline scoring
• Risk mapping & failure mode analysis
• Executive summary report
30-Day Sprint
• Design structured evaluation datasets
• Build automated validation pipelines
• Define evaluation metrics (precision, recall, F1, grounding score)
• Implement guardrails & prompt testing
• Reduce hallucination rates measurably
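The evaluation metrics named above can be sketched minimally as follows. This is an illustrative example only, not ProoflineAI's actual pipeline: the function names are hypothetical, and the grounding score shown is a naive token-overlap proxy rather than a production-grade groundedness metric.

```python
# Hypothetical sketch of the metrics above (precision, recall, F1,
# grounding score) for scoring LLM answers against labeled data.

def precision_recall_f1(predictions, labels):
    """Binary metrics over per-answer correctness flags (True = correct)."""
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

def grounding_score(answer, context):
    """Naive grounding proxy: share of answer tokens found in the
    retrieved context. Real pipelines use far stronger checks."""
    answer_tokens = answer.lower().split()
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return sum(t in context_tokens for t in answer_tokens) / len(answer_tokens)

p, r, f1 = precision_recall_f1([True, True, False, True],
                               [True, False, False, True])
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.67 1.0 0.8
```

In practice these per-answer scores are aggregated across a structured evaluation dataset to produce the reliability baseline the sprint reports on.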
Enterprise Sprint
• Build continuous monitoring framework
• Implement drift detection
• Design reporting dashboards
• Create audit-ready documentation
• Establish reliability governance process
Sprint Compensation (Competitive EU Rates)
• Express Sprint: €2,200 – €4,200
• 30-Day Sprint: €4,000 – €7,500
• Enterprise Sprint: €8,000 – €15,000
(Final rate depends on region and seniority.)
⸻
Required Experience
• 5+ years in Data Ops, ML Ops, QA Engineering, or AI Validation
• Hands-on experience testing LLM systems (OpenAI, Anthropic, open-source models)
• Strong Python skills
• Experience building evaluation pipelines
• Understanding of hallucination, bias, model drift, and RAG failure modes
• Experience working in production environments
⸻
Highly Valued
• Experience in regulated industries (FinTech, Health, Legal)
• AI governance / risk frameworks familiarity
• Prompt evaluation & adversarial testing experience
• Monitoring dashboard implementation
Please provide a link to your LinkedIn profile.
⸻
Who You Are
• Structured systems thinker
• Detail-obsessed about accuracy
• Comfortable defining frameworks from scratch
• Strong communicator in English
• Comfortable in early-stage environments with high ownership
⸻
Engagement
• Sprint-based engagement (Express, 30-Day, Enterprise)
• Potential long-term collaboration
• Remote
• High ownership, low bureaucracy
⸻
To Apply
Please include:
1. A summary of your LLM validation experience
2. An example of an evaluation framework you’ve built
3. A GitHub or portfolio link (if available)
4. Your timezone
We are looking for operators who care about building measurable AI trust at scale.