Despite recent successes in visual AI, several shortcomings remain: a lack of exact logical reasoning, limited abstract generalization abilities, and difficulty understanding complex and noisy scenes. Unfortunately, existing benchmarks were not designed to capture more than a few of these aspects. Whereas deep learning datasets focus on visually complex data but simple visual reasoning tasks, inductive logic datasets involve complex logical learning tasks but lack a visual component. To address this, we propose the visual logical learning dataset, V-LoL, which seamlessly combines visual and logical challenges. Notably, we introduce the first instantiation of V-LoL, V-LoL-Trains, a visual rendition of a classic benchmark in symbolic AI, the Michalski train problem. By incorporating intricate visual scenes and flexible logical reasoning tasks within a versatile framework, V-LoL-Trains provides a platform for investigating a wide range of visual logical learning challenges. We evaluate a variety of AI systems, including traditional symbolic AI, neural AI, and neuro-symbolic AI. Our evaluations demonstrate that even state-of-the-art AI faces difficulties in dealing with visual logical learning challenges, highlighting unique advantages and limitations specific to each methodology. Overall, V-LoL opens up new avenues for understanding and enhancing current abilities in visual logical learning for AI systems.