This paper analyses multi-agent interoperability frameworks for Conversational AI and describes the novel architecture proposed by the Open Voice Interoperability initiative (Linux Foundation AI & Data), also known as OVON (Open Voice Network). It presents the new approach and its main components, and outlines the key benefits and use cases of deploying standard multi-modal communications for AI agency (or agentic AI). Starting from Universal APIs based on Natural Language, the framework enables interoperable interactions among diverse Conversational AI agents, including chatbots, voicebots, videobots, and human agents. In addition, a new Discovery specification framework is introduced. It is designed to look up agents that provide specific services efficiently and to obtain accurate information about those services through a standard Manifest publication, accessible via an extended set of Natural Language-based APIs. The main purpose of this contribution is to enhance the capabilities and scalability of AI interactions across platforms. The architecture for interoperable Conversational AI assistants is designed to be general, replicable, and accessible via open repositories.