To address the sycophancy problem caused by reinforcement learning from human feedback in large language models, this research applies synthetic data intervention technology to the decoder-only transformer architecture. Based on the research gaps in the existing literature, the researcher designed an experimental process to reduce the tendency of models to cater by generating diversified data, and used GPT4o as an experimental tool for verification. The experiment used 100 true and false questions, and compared the performance of the model trained with synthetic data intervention and the original untrained model on multiple indicators. The results show that the SDI training model supports the technology in terms of accuracy rate and sycophancy rate and has significant effectiveness in reducing sycophancy phenomena. Notably, the data set, experimental process, code and data results have been uploaded to Github, the link is https://github.com/brucewang123456789/GeniusTrail.git.
翻译:为解决大型语言模型中因人类反馈强化学习引发的迎合问题,本研究将合成数据干预技术应用于仅解码器Transformer架构。基于现有文献的研究空白,研究者设计了通过生成多样化数据以降低模型迎合倾向的实验流程,并采用GPT4o作为实验工具进行验证。实验采用100道真假问题,对比了经合成数据干预训练的模型与原始未训练模型在多项指标上的表现。结果表明,SDI训练模型在准确率与迎合率方面均支持该技术,对降低迎合现象具有显著效果。值得注意的是,数据集、实验流程、代码及数据结果已上传至Github,链接为https://github.com/brucewang123456789/GeniusTrail.git。