Most datasets used for supervised machine learning consist of a single label per data point. However, in cases where more information than just the class label is available, would it be possible to train models more efficiently? We introduce two novel model architectures, which we call hybrid concept-based models, that train using both class labels and additional information in the dataset referred to as concepts. In order to thoroughly assess their performance, we introduce ConceptShapes, an open and flexible class of datasets with concept labels. We show that the hybrid concept-based models outperform standard computer vision models and previously proposed concept-based models with respect to accuracy, especially in sparse data settings. We also introduce an algorithm for performing adversarial concept attacks, where an image is perturbed in a way that does not change a concept-based model's concept predictions, but changes the class prediction. The existence of such adversarial examples raises questions about the interpretable qualities promised by concept-based models.