We introduce SONO, a novel method leveraging Second-Order Neural Ordinary Differential Equations (Second-Order NODEs) to enhance cross-modal few-shot learning. By employing a simple yet effective architecture consisting of a Second-Order NODEs model paired with a cross-modal classifier, SONO addresses the significant challenge of overfitting, which is common in few-shot scenarios due to limited training examples. Our second-order approach can approximate a broader class of functions than first-order formulations, enhancing the model's expressive power and feature-generalization capabilities. We initialize the cross-modal classifier with text embeddings derived from class-relevant prompts, improving training efficiency by avoiding repeated passes through the text encoder. Additionally, we employ text-based image augmentation, exploiting CLIP's robust image-text alignment to substantially enrich the training data. Extensive experiments across multiple datasets demonstrate that SONO outperforms existing state-of-the-art methods in few-shot learning performance.
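The key mechanism behind Second-Order NODEs is the standard reduction of a second-order ODE x'' = f(x, x') to a coupled first-order system over state and velocity, which any first-order ODE solver can then integrate. The sketch below illustrates this reduction with a simple Euler integrator and a toy dynamics function; it is a minimal illustration of the mathematical idea, not SONO's actual architecture, and the function names and step sizes are illustrative assumptions.

```python
import numpy as np

def second_order_step(x, v, f, dt):
    # Reduce x'' = f(x, v) to the first-order system
    # (x, v)' = (v, f(x, v)) and take one explicit Euler step.
    # Both updates use the state from the start of the step.
    return x + dt * v, v + dt * f(x, v)

def integrate(x0, v0, f, dt, steps):
    # Integrate the coupled (position, velocity) system forward.
    x, v = x0, v0
    for _ in range(steps):
        x, v = second_order_step(x, v, f, dt)
    return x, v

# Toy example: x'' = -x (harmonic oscillator), x(0)=1, v(0)=0,
# whose exact solution is x(t) = cos(t).
x, v = integrate(np.array([1.0]), np.array([0.0]),
                 lambda x, v: -x, dt=0.001, steps=1000)
# At t = 1, x should be close to cos(1) ≈ 0.5403, up to Euler error.
```

In a neural formulation, f would be a learned network acting on feature embeddings rather than a fixed dynamics function, and the Euler loop would be replaced by an adaptive ODE solver.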