VAP (Visual Autonomous Pilot): The Next-Gen Interaction Standard for the AI Era
Qorus-NTT DATA Innovation in Insurance Awards 2026

Submitted by

Ping An

02/03/2026 Insurance Innovation
VAP is more than just a smart assistant—it represents the 'L5 autonomous driving system' in mobile application interaction and serves as the universal standard protocol for next-generation human-machine interaction.
Innovation details
Country
China
Category
Operations & Workforce Excellence
Keyword
Operational excellence & efficiency, AI & Generative AI, Innovation, HR & New ways of working, Automation
Business Line
Life Insurance
Distribution Channel
Online / Direct

Innovation presentation

1. Concepts and Objectives

(1). Concept: Redefining the human-machine boundary and establishing the IUI (Intent User Interface) industry standard.

(2). Objective: We are committed to revolutionizing interaction paradigms. Current application interfaces remain stuck in the "manual transmission" era, forcing users to memorize complex menu paths and business logic, resulting in severe "digital cognitive overload." VAP aims to advance interaction into the "autonomous driving" era by establishing a new "sensation-perception-action" integrated interaction standard. Without relying on underlying code modifications, it restructures user-digital world connections through an "AI overlay layer," achieving a frictionless "what you think is what you get (Intent-is-Action)" experience.

2. Background

(1). The efficiency limits of the GUI paradigm:

Functional Overloading and Path Overload: Modern financial apps have evolved into 'super apps,' integrating multiple ecosystems including native, H5, and mini-programs. A simple 'financial risk assessment' operation typically requires navigating 5-8 interface levels and over 20 clicks. This tree-like GUI navigation structure exhibits exponential complexity as the number of functions increases.

Breaking the cognitive load threshold: According to Miller's Law, human short-term memory can only process 7±2 information units. When confronted with a 'functional jungle' of over 2,000 functional points, users are forced to perform extensive visual searches and logical reasoning. Traditional pop-up guidance and red-dot prompts not only fail to solve the problem but also create severe visual interference, diluting user intent across complex menu paths.

Interaction entropy: The GUI paradigm is built on 'humans adapting to machines'. As business logic becomes fragmented, users must understand internal structures (e.g., why 'policies' and 'claims' are placed in separate main channels) to complete tasks. This escalating interaction entropy is the root cause of user attrition and exorbitant customer service costs.

(2). The gap in AI implementation at the 'last mile':

The "Deep Diving" Dilemma of LLMs:While large language models (LLMs)now possess robust inference capabilities, they remain "offline" in mobile environments. Most legacy applications lack open deep APIs, and the dynamic nature of UI rendering logic prevents LLMs from accessing core business logic through traditional hard-coded scripts. This paradox of "advanced cognition, perceptual limitations, and restricted execution" creates the final barrier to AI implementation.

The Semantic-Pixel Divide: While large language models (LLMs) process 'intentional semantics' (e.g., 'I was in a traffic accident and need compensation'), the system's backend only receives 'pixel coordinates' or 'control IDs'. VAP bridges this divide with a real-time mapping protocol: Eye (Perception) – through edge-side vision models, it converts on-screen pixels into LLM-comprehensible semantic topologies in real time. Hand (Control) – it precisely translates LLM-generated operational logic into low-level, driver-level coordinate clicks and gesture simulations.
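To make the divide concrete, the minimal Python sketch below shows how a semantic element reported by the "Eye" can be lowered into the coordinate-level command executed by the "Hand". The names SemanticElement, TouchCommand and to_touch are purely illustrative assumptions, not VAP's actual API.

from dataclasses import dataclass

@dataclass
class SemanticElement:
    """What the 'Eye' reports to the LLM: meaning rather than pixels."""
    label: str               # e.g. text recovered by on-device OCR/vision
    role: str                # e.g. "button", "input", "tab"
    bounds: tuple            # (x, y, width, height) in screen pixels

@dataclass
class TouchCommand:
    """What the 'Hand' executes: a driver-level gesture at concrete coordinates."""
    x: int
    y: int
    gesture: str = "tap"

def to_touch(element: SemanticElement) -> TouchCommand:
    """Lower a semantic target chosen by the LLM to a tap at the element's center."""
    x, y, w, h = element.bounds
    return TouchCommand(x=x + w // 2, y=y + h // 2)

claim_button = SemanticElement(label="File a claim", role="button", bounds=(120, 980, 400, 96))
print(to_touch(claim_button))    # TouchCommand(x=320, y=1028, gesture='tap')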

(3). The Restructuring Dilemma of Legacy Systems

Global financial institutions have accumulated code assets worth hundreds of billions. Implementing intelligent solutions through traditional API integration faces the 'impossible triangle' of cost (billions), time (years), and risk (system stability).

3. How It Is Achieved

VAP has revolutionized the fundamental principles of mobile interaction by establishing an 'independent cognition-execution loop' decoupled from the host environment, enabling a paradigm shift from manual commands to intent-driven operations.
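The following minimal Python sketch illustrates this cognition-execution loop. Every name (perceive, plan_next_step, execute, run_intent) is a hypothetical stand-in rather than VAP's actual interface, and the stubs simulate a trivial two-screen flow purely to show how the loop closes without touching the host application's code.

from dataclasses import dataclass

@dataclass
class Step:
    done: bool
    action: str = ""

_screen = {"name": "home", "elements": ["My Policies", "File a Claim"]}

def perceive() -> dict:
    """Perception-layer stand-in: a semantic snapshot of the current screen."""
    return dict(_screen)

def plan_next_step(intent: str, screen: dict) -> Step:
    """Cognitive-layer stand-in: decide whether the goal is reached, else pick an action."""
    if screen["name"] == "claim_form":
        return Step(done=True)
    return Step(done=False, action="tap('File a Claim')")

def execute(action: str) -> None:
    """Control-layer stand-in: would inject driver-level touch events on a real device."""
    global _screen
    print("executing:", action)
    _screen = {"name": "claim_form", "elements": ["Upload photos", "Submit"]}

def run_intent(intent: str, max_steps: int = 20) -> bool:
    """Perceive -> plan -> act until the intent is satisfied or the step budget runs out."""
    for _ in range(max_steps):
        step = plan_next_step(intent, perceive())
        if step.done:
            return True
        execute(step.action)
    return False

print(run_intent("I was in a traffic accident and need to file a claim"))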

(1). The Perception Layer: Analogous to "Bionic LiDAR"

Technological Breakthrough: From "Attribute Parsing" to "Full-Modal Visual Understanding". VAP has revolutionized the field by abandoning traditional UI tree (DOM/View Tree) parsing models, pioneering a lightweight visual perception engine for mobile devices. This engine directly processes screen pixel streams, using semantic segmentation and feature extraction to dynamically construct "interface topology maps" in real-time, mirroring the human visual system. It not only identifies static components but also accurately captures subtle changes in dynamic rendering, pop-up windows, and cross-framework (Native/H5/Flutter) hybrid interfaces.

Technical features: Perception is fully decoupled from the underlying code environment, ensuring high environmental adaptability and recognition robustness even in highly complex, dynamically changing mobile environments.
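As a rough illustration of how such an "interface topology map" might be assembled from vision-model detections, the Python sketch below nests detected elements by spatial containment. The hard-coded detections stand in for the on-device segmentation/OCR output and are an assumption for illustration, not VAP's actual pipeline.

from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    role: str
    bounds: tuple                      # (x, y, w, h) in screen pixels
    children: list = field(default_factory=list)

def contains(outer, inner):
    """True if the inner box lies fully inside the outer box."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

def build_topology(detections):
    """Nest detected elements by spatial containment to form a screen topology."""
    nodes = [Node(*d) for d in detections]
    roots = []
    for node in nodes:
        parents = [p for p in nodes if p is not node and contains(p.bounds, node.bounds)]
        if parents:
            # attach to the smallest enclosing element
            min(parents, key=lambda p: p.bounds[2] * p.bounds[3]).children.append(node)
        else:
            roots.append(node)
    return roots

detections = [
    ("Claims", "panel", (0, 200, 1080, 600)),
    ("File a claim", "button", (80, 300, 400, 96)),
    ("Track my claim", "button", (80, 450, 400, 96)),
]
for root in build_topology(detections):
    print(root.label, "->", [c.label for c in root.children])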

(2). The Cognitive Layer: Analogous to "High-Precision Maps and Navigation"

Technological Breakthrough: The dynamic path planning cognitive layer based on RAG serves as the "central brain". By integrating RAG (Retrieval-Augmented Generation) technology, the system transforms unstructured business logic and operational specifications into an "action guide library" accessible to AI. When processing user intent, the AI no longer executes preset scripts but dynamically calculates and generates the optimal operational logic chain in real time, leveraging the current interface's visual state and the knowledge base.

Technical features: This plug-and-play knowledge loading architecture endows the system with exceptional versatility. It resolves the logical consistency challenges in AI processing complex, multi-branch business workflows, achieving a paradigm shift from fixed-path execution to dynamic goal navigation.
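A simplified Python sketch of this RAG-grounded planning step follows. The action-guide texts, the keyword-overlap retrieval, and the stubbed call_llm function are illustrative assumptions rather than the production design, which would use embedding search and a real model call.

ACTION_GUIDES = {
    "claim": "To file a claim: open 'Claims', tap 'File a claim', upload photos, confirm.",
    "risk assessment": "To run a risk assessment: open 'Profile', tap 'Risk Assessment', answer the questionnaire.",
}

def retrieve_guide(intent: str) -> str:
    """Pick the guide whose key words overlap the intent most (toy keyword retrieval)."""
    scores = {k: sum(w in intent.lower() for w in k.split()) for k in ACTION_GUIDES}
    best = max(scores, key=scores.get)
    return ACTION_GUIDES[best]

def call_llm(prompt: str) -> str:
    """Stand-in for the online LLM planner; returns a canned next action here."""
    return "tap('File a claim')"

def plan_next_action(intent: str, screen_topology: str) -> str:
    """Ground the planner in the retrieved guide plus the current screen state."""
    guide = retrieve_guide(intent)
    prompt = (
        f"User intent: {intent}\n"
        f"Relevant procedure: {guide}\n"
        f"Current screen: {screen_topology}\n"
        "Reply with the single next UI action."
    )
    return call_llm(prompt)

print(plan_next_action("I was in a traffic accident and need a claim",
                       "home screen: ['My Policies', 'Claims', 'Profile']"))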

(3). The Control Layer: Analogous to a "Neural Superposition" Actuator

Technological Breakthrough: Through its pioneering "neural superposition" mapping technology, VAP establishes a virtual operation plane between the physical display layer and the application interaction layer. By simulating the underlying touch event stream, it achieves highly realistic emulation of complex gestures (e.g., multi-touch, precise curve swiping). Most notably, it features "visual closed-loop calibration": each actuator operation triggers immediate state alignment in the sensing layer, enabling millisecond-level correction and self-healing.

Technical features: By establishing a closed-loop system of "execution-feedback-correction", the control layer fundamentally resolves uncertainties in automated processes. This "neural superposition" model ensures operational precision while preserving system safety and keeping the process fully visible and monitorable to the user.
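The closed loop can be pictured with the short Python sketch below. Tap injection and screen capture are simulated with a toy state machine (an assumption for illustration); the point is that every action is verified against a fresh perception pass and retried within a bounded budget instead of acting blindly.

import random

_state = {"screen": "claims_home"}

def inject_tap(x: int, y: int) -> None:
    """Stand-in for driver-level touch injection; sometimes 'misses' to force a retry."""
    if random.random() > 0.3:                        # simulate occasional missed taps
        _state["screen"] = "claim_form"

def perceive_screen() -> str:
    """Stand-in for the perception layer re-reading the screen after an action."""
    return _state["screen"]

def tap_with_verification(x: int, y: int, expected_screen: str, retries: int = 3) -> bool:
    """Act, then verify via perception; self-heal with a bounded number of retries."""
    for attempt in range(1, retries + 1):
        inject_tap(x, y)
        if perceive_screen() == expected_screen:
            return True                              # closed loop confirms the effect
        print(f"attempt {attempt}: screen unchanged, retrying")
    return False                                     # escalate instead of continuing blindly

print(tap_with_verification(320, 1028, expected_screen="claim_form"))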

4. Main Achievements

(1). Development efficiency revolution: The adaptation and launch cycle for new business scenarios has been reduced from the traditional 14 days to just 16 hours.

(2). Extensive device compatibility: Powered by in-process virtualization technology, VAP seamlessly supports over 300 Android/iOS device models, achieving 98.4% compatibility.

(3). Dimensional reduction in interaction: Complex business processes that typically span 5-10 pages are condensed into 1-2 rounds of natural language conversation.
