Current machine learning methods struggle to solve Bongard problems, a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images and then classifying whether a new query image depicts that concept. On Bongard-HOI, a benchmark for natural-image Bongard problems, existing methods have reached only 66% accuracy (where chance is 50%). Low accuracy is often attributed to neural nets' inability to find human-like symbolic rules. In this work, we point out that many existing methods forfeit accuracy because of a much simpler problem: they do not incorporate information contained in the support set as a whole, and rely instead on information extracted from individual supports. This is a critical issue because, unlike in few-shot learning tasks concerning object classification, the "key concept" in a typical Bongard problem can only be distinguished using multiple positives and multiple negatives. We explore a variety of simple methods that take this cross-image context into account and demonstrate substantial gains over prior methods, leading to new state-of-the-art performance on Bongard-LOGO (75.3%) and Bongard-HOI (72.45%) and strong performance on the original Bongard problem set (60.84%).
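To make the notion of cross-image context concrete, the following is a minimal sketch of one way support-set statistics could inform a query decision. It is an illustrative assumption, not the method evaluated in this work: the function name `classify_query`, the toy two-dimensional features, and the choice of per-dimension standardization followed by nearest-prototype classification are all hypothetical. The point it illustrates is that the mean and standard deviation are computed over the whole support set (positives and negatives together), so a feature dimension that varies widely but does not separate the two sides is down-weighted relative to one that does.

```python
from statistics import mean, pstdev


def standardize(vectors, mu, sigma):
    """Standardize each vector per dimension with the given mean and std."""
    return [[(x - m) / s for x, m, s in zip(v, mu, sigma)] for v in vectors]


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_query(positives, negatives, query, eps=1e-8):
    """Hypothetical sketch: return True if the query looks positive.

    Statistics come from the support set as a whole, so the decision
    depends on cross-image context rather than on each support alone.
    """
    support = positives + negatives
    dims = list(zip(*support))
    mu = [mean(d) for d in dims]
    sigma = [pstdev(d) + eps for d in dims]  # eps avoids division by zero

    pos = standardize(positives, mu, sigma)
    neg = standardize(negatives, mu, sigma)
    q = standardize([query], mu, sigma)[0]

    # One prototype per side: the mean of the standardized supports.
    pos_proto = [mean(d) for d in zip(*pos)]
    neg_proto = [mean(d) for d in zip(*neg)]
    return sqdist(q, pos_proto) < sqdist(q, neg_proto)


# Toy features (assumed): dimension 0 is a high-variance nuisance factor,
# dimension 1 carries the concept that separates positives from negatives.
positives = [[10.0, 0.9], [-8.0, 1.0], [3.0, 1.1]]
negatives = [[-9.0, 0.0], [7.0, 0.1], [2.0, -0.1]]

print(classify_query(positives, negatives, [-6.0, 0.95]))
print(classify_query(positives, negatives, [5.0, 0.05]))
```

In this toy setup, comparing the first query to unnormalized prototypes would place it closer to the negatives, because the nuisance dimension dominates the distance; standardizing with support-set statistics lets the concept dimension decide instead.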