Multimodal and neuro-symbolic paradigms are at the forefront of current artificial intelligence research, with particular emphasis on identifying and interacting with entities and their relations across diverse modalities. To meet the need for complex querying and interaction in this setting, we introduce SNeL (Structured Neuro-symbolic Language), a versatile query language designed to facilitate nuanced interactions with neural networks that process multimodal data. SNeL's expressive interface supports intricate queries built from logical and arithmetic operators, comparators, nesting, and more. Users can target specific entities, specify their properties, and limit results, and so extract information from a scene efficiently. By aligning high-level symbolic reasoning with low-level neural processing, SNeL bridges the neuro-symbolic divide. The language applies to a variety of data types, including images, audio, and text, which makes it a powerful tool for multimodal scene understanding. Our evaluations demonstrate SNeL's potential to reshape how we interact with complex neural networks, underscoring its efficacy in targeted information extraction and in supporting a deeper understanding of the rich semantics captured by multimodal AI models.
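To make the kind of query described above concrete, the following is a minimal Python sketch of composable predicates over a list of detected entities. It is not SNeL syntax, which this abstract does not show. Every name in it (`Entity`, `all_of`, `any_of`, `negate`, `prop`, `select`, and the sample scene) is a hypothetical stand-in chosen for illustration, and the entity list is assumed to come from some upstream neural model.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Entity:
    """Hypothetical entity record, as an upstream neural model might emit it."""
    label: str
    modality: str            # e.g. "image", "audio", "text"
    confidence: float
    properties: dict = field(default_factory=dict)

Predicate = Callable[[Entity], bool]

def all_of(*preds: Predicate) -> Predicate:
    """Logical AND over predicates; nests freely with any_of and negate."""
    return lambda e: all(p(e) for p in preds)

def any_of(*preds: Predicate) -> Predicate:
    """Logical OR over predicates."""
    return lambda e: any(p(e) for p in preds)

def negate(pred: Predicate) -> Predicate:
    """Logical NOT of a predicate."""
    return lambda e: not pred(e)

def label_is(name: str) -> Predicate:
    """Target entities by their label."""
    return lambda e: e.label == name

def prop(name: str, compare: Callable, value) -> Predicate:
    """Comparator on a named property; false when the property is absent."""
    return lambda e: name in e.properties and compare(e.properties[name], value)

def area(e: Entity) -> float:
    """Arithmetic over properties: bounding-box area, 0 if a side is missing."""
    return e.properties.get("width", 0) * e.properties.get("height", 0)

def select(entities, where: Predicate, limit: Optional[int] = None):
    """Return entities matching the predicate, optionally capped at `limit`."""
    matches = [e for e in entities if where(e)]
    return matches if limit is None else matches[:limit]

# Made-up scene mixing three modalities.
scene = [
    Entity("car", "image", 0.92, {"color": "red", "width": 120, "height": 60}),
    Entity("car", "image", 0.55, {"color": "blue", "width": 80, "height": 40}),
    Entity("dog", "audio", 0.81, {"loudness": 0.7}),
    Entity("person", "text", 0.88, {"mentions": 3}),
]

# Nested query: cars that are red OR have a bounding-box area above 5000.
cars = select(
    scene,
    all_of(
        label_is("car"),
        any_of(prop("color", operator.eq, "red"), lambda e: area(e) > 5000),
    ),
)

# Negation, a comparator on confidence, and a result limit.
confident_non_cars = select(
    scene,
    all_of(negate(label_is("car")), lambda e: e.confidence > 0.8),
    limit=1,
)
```

Representing each query as a plain function keeps nesting trivial, since any combinator accepts the output of any other. A real query language would add a parser and a binding to the network's outputs, neither of which is sketched here.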