AI Evaluation as a Service
Qorus-NTT DATA Innovation in Insurance Awards 2026
United Kingdom
Category: GenAI Innovation of the Year
Keyword: AI & Generative AI, Health insurance, Agentic AI
Business Line: Health Insurance
Distribution Channel: Partners
Innovation presentation
AI Evaluation as a Service (AI EaaS) is the answer from UnitedHealth Group (UHG) to one of healthcare AI's most urgent unsolved problems: how do you know your AI systems are still performing safely, accurately, and fairly, not just at launch but every single day, across millions of transactions?
Concept and objectives. AI EaaS is a living evaluation layer wrapped around every agentic AI system across UHG's health insurance operations. Rather than testing AI once before deployment and hoping for the best, it continuously monitors outputs across five business units — prior authorization, clinical summarization, claims processing, member navigation, and provider credentialing. Its objectives are threefold: detect errors before they harm patients, reduce the unsustainable burden of manual human review, and give regulators and executives real-time, auditable transparency into AI behavior.
Reasons behind. UHG's AI agents process over 15 million transactions per day. A single systematic error, such as a wrongful denial pattern, a medication allergy omitted from a clinical summary, or a sanctioned provider missed in credentialing, can affect thousands of members and expose the organization to billions in liability. Existing approaches (static test sets, LLM-as-a-Judge scoring, and periodic manual audits covering less than 2% of transactions) were not built for this scale or for consequences of this severity.
State of competition. No comparable system exists in the market. The industry default remains LLM-as-a-Judge evaluation, which published research shows produces 15–30% inter-session variance on identical inputs and cannot learn from real-world deployment feedback. AI EaaS's reinforcement learning evaluator achieves a Cohen's κ of 0.83 versus 0.54 for LLM judges, a gap of 0.29 on this chance-corrected agreement statistic. Three patent applications have been filed covering the reward function architecture, dual data stream integration, and federated evaluation methodology.
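For readers unfamiliar with the statistic, Cohen's κ compares the observed agreement between two raters with the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below is a minimal illustration of that formula only. The function name and the sample labels are assumptions made for the example; they are not UHG data and do not reproduce the evaluation behind the 0.83 and 0.54 figures.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items where both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Illustrative labels only (hypothetical): an evaluator versus an expert reviewer.
evaluator = ["ok", "ok", "error", "error"]
expert = ["ok", "ok", "error", "ok"]
print(cohens_kappa(evaluator, expert))
```

In this toy case the raters agree on three of four items (p_o = 0.75) while chance agreement is 0.5, so κ is 0.5, well below the raw 75% agreement rate.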
Sources of inspiration. The core insight came from observing how human clinical quality programs actually work: not through periodic snapshots, but through continuous feedback loops between frontline operators and expert reviewers. AI EaaS formalizes this into a reinforcement learning framework — operator edits provide high-volume near-real-time signal; expert audit batches provide precision ground truth. The combination, pioneered in collaboration with MIT, the University of Oxford, CSIRO, and CHAI, produces an evaluator that genuinely learns what "good" looks like in each clinical context.
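One way to picture the two feedback streams is as a single reward signal in which expert audit labels, when present, outweigh the noisier operator-edit signal. The sketch below is purely hypothetical: the source does not describe the actual reward function, and the `Feedback` class, the `blended_reward` function, the edit-fraction measure, and the 0.8 weight are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    """Feedback on one AI output (hypothetical structure, not from the source)."""
    operator_edit_fraction: float        # share of the output the operator changed, 0..1
    expert_label: Optional[bool] = None  # True if an expert audit judged it acceptable


def blended_reward(fb: Feedback, expert_weight: float = 0.8) -> float:
    """Combine the high-volume and high-precision signals into a reward in [0, 1]."""
    if not 0.0 <= fb.operator_edit_fraction <= 1.0:
        raise ValueError("operator_edit_fraction must be between 0 and 1")
    if not 0.0 <= expert_weight <= 1.0:
        raise ValueError("expert_weight must be between 0 and 1")

    # Fewer operator edits are read as a better output (an assumption).
    operator_reward = 1.0 - fb.operator_edit_fraction

    # Most transactions never reach an expert, so fall back to the operator signal.
    if fb.expert_label is None:
        return operator_reward

    expert_reward = 1.0 if fb.expert_label else 0.0
    return expert_weight * expert_reward + (1.0 - expert_weight) * operator_reward


print(blended_reward(Feedback(operator_edit_fraction=0.25)))
print(blended_reward(Feedback(operator_edit_fraction=0.5, expert_label=True)))
```

The design choice illustrated here is only the weighting idea: the operator stream supplies a reward for every transaction, while the sparser expert stream pulls the reward toward ground truth wherever it exists.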
Departments involved. The initiative spans UnitedHealthcare (prior authorization), Optum Health (clinical summarization), OptumInsight (claims), UHC Member Services (care navigation), and Optum credentialing — alongside UHG's AI Ethics Board, legal and regulatory teams, and external partners including MIT, Oxford, CSIRO, the NHS AI Lab, and CHAI, which has adopted AI EaaS as its reference implementation for healthcare AI evaluation standards.
Main results so far. Across 47 million transactions evaluated in 2025: prior authorization accuracy rose from 71.3% to 93.7%; safety-critical error detection improved from 61.2% to 91.4%; manual audit burden fell from 100% to 27% of transactions; mean time to identify a systematic AI failure dropped from 47 days to 6 hours; and member appeal rates fell from 14.2% to 5.8%. Total documented financial impact: $2.24 billion annually, with a 4.2x ROI in the first 18 months — while 49 million members receive meaningfully safer AI-assisted care decisions.
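The before-and-after figures above mix percentage-point changes, relative changes, and time reductions. The short sketch below works through that arithmetic using only the numbers quoted in the paragraph; the helper names are assumptions introduced for the example.

```python
def point_change(before_pct, after_pct):
    """Absolute change in percentage points."""
    return after_pct - before_pct


def relative_change(before, after):
    """Change relative to the starting value (negative means a reduction)."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before


# Figures quoted in the results paragraph.
accuracy_gain = point_change(71.3, 93.7)      # prior authorization accuracy
detection_gain = point_change(61.2, 91.4)     # safety-critical error detection
appeal_change = relative_change(14.2, 5.8)    # member appeal rate
audit_change = relative_change(100.0, 27.0)   # manual audit burden
detection_speedup = (47 * 24) / 6             # 47 days expressed in hours, versus 6 hours

print(f"Prior authorization accuracy: +{accuracy_gain:.1f} points")
print(f"Safety-critical error detection: +{detection_gain:.1f} points")
print(f"Member appeal rate: {appeal_change:.1%} relative change")
print(f"Manual audit burden: {audit_change:.1%} relative change")
print(f"Time to identify a systematic failure: {detection_speedup:.0f}x faster")
```

Read this way, the appeal rate falls by roughly 59% in relative terms, and the time to identify a systematic failure shrinks by a factor of 188.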