Artificial intelligence continually seeks novel challenges and benchmarks to measure performance effectively and to advance the state of the art. In this paper, we introduce KANDY, a benchmarking framework that generates a variety of learning and reasoning tasks inspired by Kandinsky patterns. By creating curricula of binary classification tasks with increasing complexity and sparse supervision, KANDY can be used to implement benchmarks for continual and semi-supervised learning, with a specific focus on symbol compositionality. Classification rules are also provided in the ground truth to enable the analysis of interpretable solutions. Together with the benchmark generation pipeline, we release two curricula, an easier and a harder one, which we propose as new challenges for the research community. Through a thorough experimental evaluation, we show that both state-of-the-art neural models and purely symbolic approaches struggle to solve most of the tasks, calling for advanced neuro-symbolic methods trained over time.