A method for solving the concept-based learning (CBL) problem is proposed. The main idea is to divide each concept-annotated image into patches, transform the patches into embeddings using an autoencoder, and cluster the embeddings under the assumption that each cluster will mainly contain embeddings of patches with certain concepts. To find the concepts of a new image, the method implements frequentist inference by computing prior and posterior probabilities of concepts based on the rates of patches from images with certain values of the concepts. The proposed method is therefore called Frequentist Inference CBL (FI-CBL). FI-CBL allows expert rules in the form of logic functions to be incorporated into the inference procedure: the prior and conditional probabilities of concepts are updated so that they satisfy the rules. The method is transparent because it has an explicit sequence of probabilistic calculations and a clear frequency interpretation. Numerical experiments show that FI-CBL outperforms the concept bottleneck model when the amount of training data is small. The code of the proposed algorithms is publicly available.
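The frequency-based inference step can be illustrated with a minimal sketch. It is not the authors' released code, and it rests on several assumptions that the abstract does not state: the patches have already been embedded and assigned to clusters, there is a single concept with integer values, patches are treated as conditionally independent given the concept value, and Laplace smoothing is used to avoid zero probabilities. The names `fit_frequencies` and `posterior` are hypothetical.

```python
import numpy as np

def fit_frequencies(patch_clusters, patch_concept, n_clusters, n_values=2, alpha=1.0):
    """Estimate P(c = v) and P(cluster = k | c = v) from patch counts.

    patch_clusters: cluster index of each training patch
    patch_concept:  concept value of the image each patch was cut from
    alpha:          Laplace smoothing constant (an assumption of this sketch)
    """
    patch_clusters = np.asarray(patch_clusters)
    patch_concept = np.asarray(patch_concept)
    # counts[v, k] = number of patches in cluster k from images with concept value v
    counts = np.zeros((n_values, n_clusters))
    np.add.at(counts, (patch_concept, patch_clusters), 1)
    # Prior: share of training patches that come from images with each concept value
    prior = counts.sum(axis=1) / counts.sum()
    # Conditional: smoothed rate of each cluster among patches with concept value v
    smoothed = counts + alpha
    cond = smoothed / smoothed.sum(axis=1, keepdims=True)
    return prior, cond

def posterior(new_patch_clusters, prior, cond):
    """Posterior over concept values for a new image, given its patches' clusters."""
    new_patch_clusters = np.asarray(new_patch_clusters)
    # Sum of log-probabilities over patches (independence is assumed here)
    log_post = np.log(prior) + np.log(cond[:, new_patch_clusters]).sum(axis=1)
    log_post -= log_post.max()  # shift for numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Toy data: 8 training patches, 2 clusters, one binary concept
prior, cond = fit_frequencies(
    patch_clusters=[0, 0, 1, 0, 1, 1, 1, 0],
    patch_concept=[0, 0, 0, 0, 1, 1, 1, 1],
    n_clusters=2,
)
post = posterior([1, 1, 0], prior, cond)
print(post)
```

On this toy data the new image has two patches in cluster 1 and one in cluster 0, so the posterior favours concept value 1 with probability 2/3. Every intermediate quantity is a count or a ratio of counts, which is the kind of frequency interpretation the abstract refers to.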