Attention mechanisms have shown promise in improving learning models by identifying salient portions of the input. This is particularly valuable when only limited training samples are available owing to the cost of data collection and labeling. Drawing inspiration from human perception, we posit that a model can be more accurate and dependable if it is exposed only to the essential segments of the raw data rather than the entire input, much as humans attend selectively. However, selecting these informative data segments, a task referred to as hard attention finding, is a formidable challenge. With few training samples, existing methods struggle to locate such informative regions because their large number of parameters cannot be learned effectively from the limited data available. In this study, we introduce FewXAT, a novel and practical framework for explainable hard attention finding tailored to few-shot learning scenarios. Our approach employs deep reinforcement learning to realize hard attention, acting directly on the raw input data and thus keeping the process interpretable to humans. Extensive experiments on several benchmark datasets demonstrate the efficacy of the proposed method.
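To make the notion of hard attention concrete, the following is a minimal sketch of patch-level hard selection on an image: the input is split into patches, a scoring function ranks them, and only the top-k patches are kept while the rest are zeroed out. The abstract does not specify FewXAT's architecture, so the `score_fn` below is a hypothetical stand-in for the learned reinforcement-learning policy, not the paper's actual method.

```python
import numpy as np

def extract_patches(image, patch_size):
    """Split a square image into non-overlapping patches, row-major order."""
    h, w = image.shape
    patches = []
    for i in range(0, h, patch_size):
        for j in range(0, w, patch_size):
            patches.append(image[i:i + patch_size, j:j + patch_size])
    return np.stack(patches)

def hard_attention_select(image, patch_size, k, score_fn):
    """Keep only the k highest-scoring patches; zero out the rest.

    score_fn stands in for the selection policy (in FewXAT this would be
    learned with deep RL; here it is any per-patch scoring callable).
    Returns the masked image and the indices of the kept patches.
    """
    patches = extract_patches(image, patch_size)
    scores = np.array([score_fn(p) for p in patches])
    keep = set(np.argsort(scores)[-k:])        # indices of top-k patches
    masked = np.zeros_like(image)
    n_cols = image.shape[1] // patch_size
    for idx in keep:                           # copy back only kept patches
        i, j = divmod(idx, n_cols)
        ys, xs = i * patch_size, j * patch_size
        masked[ys:ys + patch_size, xs:xs + patch_size] = \
            image[ys:ys + patch_size, xs:xs + patch_size]
    return masked, sorted(keep)

# Toy example: an 8x8 "image" where only one quadrant carries signal,
# using per-patch variance as a naive informativeness score.
rng = np.random.default_rng(0)
img = np.zeros((8, 8))
img[0:4, 4:8] = rng.normal(size=(4, 4))  # one high-variance quadrant
masked, kept = hard_attention_select(img, patch_size=4, k=1,
                                     score_fn=lambda p: p.var())
```

Because the selection is a discrete keep/drop decision on raw input regions, the kept patches can be shown directly to a human, which is what makes hard attention interpretable compared with soft attention weights.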