Few-shot prompting provides an effective way to adapt auditory large language models to low-resource tasks such as children's speech recognition. However, most auditory large language models are not explicitly trained to perform inference in this demonstration-conditioned format, limiting the extent to which they can benefit from few-shot prompting. To address this limitation, we introduce Few-Shot Aware GRPO (FSA-GRPO), an RL-based post-training recipe that uses a specially designed reward to encourage the model to leverage few-shot demonstrations, thereby strengthening its few-shot adaptation ability. Notably, training with only high-resource adult ASR data improves the model's general few-shot adaptation ability, yielding gains not only in children's speech recognition but also in speech translation and audio understanding. We further study data selection and auxiliary reward weighting to identify an effective training recipe. Our experiments show that when in-domain data are unavailable or cannot be used for training, FSA-GRPO is more effective than direct tuning on related out-of-domain data.
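The abstract does not spell out the reward or the training loop, so the following is only an illustrative sketch of how a demonstration-aware reward could feed a GRPO-style update. Everything specific here is an assumption rather than the authors' formulation: the function names, the `aux_weight` parameter, and the choice to reward the reduction in word error rate (WER) when demonstrations are supplied. The only standard ingredient is the group-relative advantage, which standardizes rewards across the responses sampled for one prompt.

```python
from statistics import mean, pstdev


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def few_shot_aware_reward(reference: str,
                          hyp_with_demos: str,
                          hyp_zero_shot: str,
                          aux_weight: float = 0.5) -> float:
    """HYPOTHETICAL reward, not taken from the paper.

    Combines transcription accuracy with an auxiliary term that is positive
    only when the few-shot output beats the zero-shot output.
    """
    wer_few = word_error_rate(reference, hyp_with_demos)
    wer_zero = word_error_rate(reference, hyp_zero_shot)
    accuracy = 1.0 - min(1.0, wer_few)
    gain_from_demos = wer_zero - wer_few
    return accuracy + aux_weight * gain_from_demos


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-6) -> list[float]:
    """Standardize rewards within one group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    reference = "the cat sat"
    zero_shot = "the bat sat"
    sampled = ["the cat sat", "the bat sat", "a bat sat"]
    rewards = [few_shot_aware_reward(reference, s, zero_shot) for s in sampled]
    print(group_relative_advantages(rewards))
```

In this sketch, `aux_weight` stands in for the auxiliary reward weighting that the abstract says was studied; setting it to zero reduces the reward to plain transcription accuracy.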