In this paper, we propose a methodology for aligning a medium-sized GPT model, originally trained on open-domain English text, to a small closed domain in Spanish. The model is fine-tuned for question answering. To achieve this, we also trained and implemented a second neural network, which we call the reward model, that scores an answer and determines whether it is appropriate for a given question. This component is used to improve how the system decodes and generates its answers. The model was evaluated with numerical metrics such as BLEU and perplexity, and human judgment was used to compare the proposed decoding technique with others. The results favor the proposed method and indicate that it is feasible to use a reward model to align response generation.
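The abstract does not specify how the reward model is wired into decoding. One common option is best-of-n reranking: generate several candidate answers, score each (question, answer) pair with the reward model, and return the highest-scoring one. The sketch below illustrates only that reranking step, under that assumption. `rerank_by_reward`, `best_answer`, and `toy_reward` are hypothetical names. `toy_reward` is a placeholder that scores lexical overlap with the question and is not the trained reward model described in the paper.

```python
from typing import Callable, List, Sequence, Tuple

_PUNCT = "¿?¡!.,;:"


def _tokens(text: str) -> List[str]:
    """Lowercase whitespace tokens with surrounding Spanish/Latin punctuation removed."""
    stripped = (tok.strip(_PUNCT).lower() for tok in text.split())
    return [tok for tok in stripped if tok]


def rerank_by_reward(
    question: str,
    candidates: Sequence[str],
    reward_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score every candidate answer with reward_fn and sort by descending reward."""
    scored = [(answer, reward_fn(question, answer)) for answer in candidates]
    # sorted() is stable, so tied candidates keep their original generation order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def best_answer(
    question: str,
    candidates: Sequence[str],
    reward_fn: Callable[[str, str], float],
) -> str:
    """Return the single candidate with the highest reward (best-of-n selection)."""
    if not candidates:
        raise ValueError("need at least one candidate answer")
    return rerank_by_reward(question, candidates, reward_fn)[0][0]


def toy_reward(question: str, answer: str) -> float:
    """HYPOTHETICAL stand-in for a trained reward model: the fraction of answer
    tokens that also appear in the question. A real reward model would be a
    neural network that scores the (question, answer) pair."""
    q_tokens = set(_tokens(question))
    a_tokens = _tokens(answer)
    if not a_tokens:
        return 0.0
    overlap = sum(1 for tok in a_tokens if tok in q_tokens)
    return overlap / len(a_tokens)
```

In use, the candidates would come from sampling the fine-tuned GPT model several times for the same question. For example, `best_answer("¿Cuál es el horario de la biblioteca?", ["No lo sé.", "La cafetería abre temprano.", "El horario de la biblioteca es de 8 a 20."], toy_reward)` selects the last candidate. Because `reward_fn` is passed in as a parameter, the trained reward model can replace the placeholder without changing the reranking logic.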