Pretrained Language Models (PLMs), which follow a two-stage paradigm of pretraining followed by fine-tuning, have driven substantial advances in natural language processing. In real-world scenarios, however, data labels are often noisy because the annotation process is complex, so strategies for fine-tuning PLMs with noisy labels are essential. To this end, we introduce an approach for fine-tuning PLMs on noisy labels that incorporates guidance from Large Language Models (LLMs) such as ChatGPT. This guidance helps accurately distinguish clean samples from noisy ones and supplies supplementary information beyond the noisy labels, thereby improving learning during PLM fine-tuning. Extensive experiments on synthetic and real-world noisy datasets demonstrate the advantages of our framework over state-of-the-art baselines.