Few-shot image classification has emerged as a key challenge in computer vision, as it requires models to adapt rapidly to new tasks from minimal labeled data. Existing methods predominantly rely on image-level features or local descriptors, and often overlook the holistic context surrounding these descriptors. In this work, we introduce a novel approach termed "Local Descriptor with Contextual Augmentation (LDCA)". Specifically, this method bridges the gap between local and global understanding by leveraging an adaptive global contextual enhancement module. This module incorporates a visual transformer, endowing local descriptors with contextual awareness that ranges from broad global perspectives to fine-grained surrounding detail. In this way, LDCA goes beyond traditional descriptor-based approaches by ensuring that each local feature is interpreted within its larger visual context. Extensive experiments demonstrate the efficacy of our method, which achieves a maximal absolute improvement of 20% over the next-best method on fine-grained classification datasets, a significant advance in few-shot classification.