Recognizing implicit visual and textual patterns is essential in many real-world applications of modern AI. However, long-tail pattern recognition remains challenging for current pre-trained foundation models such as LLMs and VLMs. While finetuning pre-trained models can improve accuracy in recognizing implicit patterns, it is often infeasible due to the lack of training data and the high computational overhead. In this paper, we propose ADAMAB, an efficient embedding calibration framework for few-shot pattern recognition. To minimize computational cost, ADAMAB trains embedder-agnostic, lightweight calibrators on top of fixed embedding models without accessing their parameters. To mitigate the need for large-scale training data, we introduce an adaptive data augmentation strategy based on the Multi-Armed Bandit (MAB) mechanism. With a modified upper confidence bound (UCB) algorithm, ADAMAB mitigates gradient shift and offers theoretically guaranteed convergence in few-shot training. Our multi-modal experiments demonstrate the superior performance of ADAMAB, with up to 40% accuracy improvement when training with fewer than 5 initial samples per class.
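To make the bandit component concrete, the sketch below shows a generic UCB1-style selector over candidate augmentation strategies. Everything here is an illustrative assumption: the arm names, the `AugmentationBandit` class, and the reward signal (e.g., the calibrator's loss improvement after training on the augmented samples) are placeholders, not ADAMAB's modified UCB algorithm.

```python
import math
import random

# Illustrative UCB1-style bandit over data augmentation "arms".
# This is NOT the paper's modified UCB; it only sketches the
# generic exploration/exploitation loop such a strategy builds on.

class AugmentationBandit:
    def __init__(self, arms, c=1.0):
        self.arms = arms                    # e.g. ["flip", "crop", "paraphrase"]
        self.c = c                          # exploration coefficient
        self.counts = [0] * len(arms)       # times each arm was played
        self.rewards = [0.0] * len(arms)    # cumulative reward per arm
        self.t = 0                          # total number of selections

    def select(self):
        self.t += 1
        # Play each arm once before applying the UCB rule.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        # UCB1 score: empirical mean + confidence radius.
        ucb = [
            self.rewards[i] / self.counts[i]
            + self.c * math.sqrt(2 * math.log(self.t) / self.counts[i])
            for i in range(len(self.arms))
        ]
        return max(range(len(self.arms)), key=lambda i: ucb[i])

    def update(self, i, reward):
        # Reward could be, e.g., the drop in calibrator validation loss
        # after training on samples generated by arm i.
        self.counts[i] += 1
        self.rewards[i] += reward


# Example loop: select an augmentation, observe a reward, update.
bandit = AugmentationBandit(["flip", "crop", "paraphrase"])
for _ in range(10):
    arm = bandit.select()
    reward = random.random()  # stand-in for a measured loss improvement
    bandit.update(arm, reward)
```

In a few-shot setting, biasing selection toward arms whose augmented samples most improve the calibrator is what lets the bandit adapt the augmentation mix without large-scale data; the paper's modified UCB additionally targets the gradient-shift and convergence guarantees stated above.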