Fine-tuning approaches in NLP often favor exploitation over exploration, which may lead to suboptimal models. Given the vast search space of natural language, this limited exploration can restrict model performance in complex, high-stakes domains where accurate negation understanding and logical reasoning are crucial. To address this issue, we leverage Reinforcement Learning from Logical Feedback (RLLF) to balance exploration and exploitation in large language models (LLMs). Our approach uses a benchmark dataset for training and evaluation, highlighting the importance of exploration in improving negation understanding. We compare RLLF-enhanced LLMs with baseline models trained without RLLF to show the value of this balanced approach. Furthermore, we showcase the potential of our method in legal AI applications by applying transfer learning and evaluating its impact on negation understanding. Our experimental results demonstrate that balancing exploration and exploitation with RLLF improves LLMs' negation understanding. This has implications for the development of more accurate, reliable, and logically consistent language models in high-stakes domains.