ORCA (Shen et al., 2023) is a recent technique for cross-modal fine-tuning, i.e., applying pre-trained transformer models to modalities beyond those they were trained on. The technique consists of two main components: training an embedder, and then fine-tuning the embedder together with the model. ORCA performs well on a variety of downstream tasks, but it is not well understood how much each of these components contributes to its success. We therefore run a series of ablations. Contrary to what the original paper posits, we find that embedder training does not help on 2D tasks at all. On 1D tasks, some amount of embedder training is necessary, but more of it is not better. On 4 of the 6 datasets we experiment with, model fine-tuning makes the biggest difference. Through our ablations and baselines, we contribute a better understanding of the individual components of ORCA.