Current machine learning methods struggle to solve Bongard problems, a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images, and then classifying whether or not a new query image depicts the key concept. On Bongard-HOI, a benchmark of natural-image Bongard problems, most existing methods reach at best 69% accuracy (where chance is 50%). Low accuracy is often attributed to neural networks' inability to induce human-like symbolic rules. In this work, we point out that many existing methods forfeit accuracy due to a much simpler problem: they do not adapt image features using information contained in the support set as a whole, relying instead on information extracted from individual supports. This is a critical issue, because the "key concept" in a typical Bongard problem can often only be distinguished using multiple positives and multiple negatives. We explore simple methods to incorporate this context and show substantial gains over prior work, leading to new state-of-the-art accuracy on Bongard-LOGO (75.3%) and Bongard-HOI (76.4%) compared to methods with equivalent vision backbone architectures, and strong performance on the original Bongard problem set (60.8%).
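To make the distinction concrete, the sketch below contrasts per-image processing with one minimal way of using the support set as a whole: centering all features by the support-set mean before nearest-prototype classification. This is an illustrative assumption, not the paper's actual method; the function name `classify_query` and the synthetic 2-D features are hypothetical.

```python
import numpy as np

def classify_query(pos_feats, neg_feats, query_feat):
    """Nearest-prototype classification that incorporates support-set
    context: all features are centered by the mean of the *whole*
    support set before the query is compared to the positive and
    negative prototypes. Returns 1 (positive) or 0 (negative)."""
    support = np.concatenate([pos_feats, neg_feats], axis=0)
    ctx_mean = support.mean(axis=0)  # context from the full support set

    pos_proto = (pos_feats - ctx_mean).mean(axis=0)
    neg_proto = (neg_feats - ctx_mean).mean(axis=0)
    q = query_feat - ctx_mean

    def cos(a, b):
        # cosine similarity with a small epsilon for numerical safety
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    return 1 if cos(q, pos_proto) > cos(q, neg_proto) else 0
```

A purely per-image approach would embed each support independently and never compute a statistic like `ctx_mean`; the centering step is what lets the classifier focus on the directions along which positives and negatives actually differ within this particular problem.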