We present a semi-supervised fine-tuning framework for foundation models that uses mutual information decomposition to address the challenge of training with limited labelled data. Our approach derives two distinct lower bounds: i) one for the downstream task space, such as classification, optimised using conditional and marginal cross-entropy together with Kullback-Leibler divergence, and ii) one for the latent space representation, regularised and aligned via a contrastive-like decomposition. This fine-tuning strategy preserves the pre-trained structure of the foundation model, training only a specialised projector module comprising a small transformer and a token aggregation technique. Experiments on several datasets demonstrate significant improvements in classification under extremely low-label conditions by effectively leveraging unlabelled data.
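To make the described setup concrete, the following is a minimal PyTorch sketch of its general structure: a frozen backbone feeding a small-transformer projector with token aggregation, trained with a conditional cross-entropy term on labelled data, a KL term on the marginal of unlabelled predictions, and an InfoNCE-style contrastive term aligning two latent views. All module names, dimensions, pooling choices, and loss terms here are illustrative assumptions; the paper's exact mutual-information lower bounds are not reproduced.

```python
# Illustrative sketch only -- shapes, hyper-parameters, and loss terms are
# assumptions, not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Projector(nn.Module):
    """Specialised projector: a small transformer over the frozen backbone's
    token sequence, followed by token aggregation (mean pooling here)."""
    def __init__(self, d_model=768, n_heads=8, n_layers=2, d_latent=256):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.to_latent = nn.Linear(d_model, d_latent)

    def forward(self, tokens):              # tokens: (B, T, d_model) from frozen backbone
        h = self.encoder(tokens)
        pooled = h.mean(dim=1)              # token aggregation (one simple choice)
        return self.to_latent(pooled)       # latent representation z: (B, d_latent)

def semi_supervised_loss(logits_l, y_l, logits_u, z_a, z_b, tau=0.1):
    """Composite objective combining the two bound-like terms:
    i) task-space: conditional CE on labelled data plus KL between the
       marginal of unlabelled predictions and a uniform prior;
    ii) latent-space: InfoNCE-style contrastive alignment of two views."""
    # i) downstream task space
    ce = F.cross_entropy(logits_l, y_l)
    p_bar = F.softmax(logits_u, dim=-1).mean(dim=0)
    prior = torch.full_like(p_bar, 1.0 / p_bar.numel())
    kl = torch.sum(p_bar * (torch.log(p_bar + 1e-8) - torch.log(prior)))
    # ii) latent space representation
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    sim = z_a @ z_b.t() / tau               # (B, B) pairwise similarities
    nce = F.cross_entropy(sim, torch.arange(sim.size(0), device=sim.device))
    return ce + kl + nce
```

In a typical training loop under these assumptions, the frozen backbone would embed two augmented views of each input, the projector would map them to z_a and z_b, and a linear head on the latent would produce logits_l and logits_u for the labelled and unlabelled batches respectively.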