Large Language Models (LLMs) have made significant progress in recent years, achieving remarkable results in question-answering (QA) tasks. However, they still face two major challenges: hallucination, and information that becomes outdated after the training phase. These challenges are especially pressing in critical domains like climate change, where obtaining accurate and up-to-date information from reliable sources within a limited time is both essential and difficult. To overcome these barriers, one potential solution is to give LLMs access to external, scientifically accurate, and robust sources (long-term memory) so that they can continuously update their knowledge and avoid propagating inaccurate, incorrect, or outdated information. In this study, we enhanced GPT-4 by integrating information from the Sixth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR6), the most comprehensive, up-to-date, and reliable source in this domain. We present our conversational AI prototype, available at www.chatclimate.ai, and demonstrate its ability to answer challenging questions accurately in three QA scenarios: posing questions to 1) GPT-4, 2) chatClimate, and 3) hybrid chatClimate. The answers and their sources were evaluated by our team of IPCC authors, who used their expert knowledge to score the accuracy of each answer from 1 (very low) to 5 (very high). The evaluation showed that hybrid chatClimate provided more accurate answers, highlighting the effectiveness of our solution. This approach can be readily scaled to chatbots in other specific domains, enabling the delivery of reliable and accurate information.
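The idea of giving an LLM external "long-term memory" can be illustrated with a minimal retrieval-augmented prompting sketch. This is not the chatClimate pipeline. It assumes a small in-memory corpus of hypothetical placeholder passages (standing in for report excerpts) and uses simple word-overlap scoring in place of whatever retrieval method a real system would use. The function names `retrieve` and `build_prompt` and the source IDs are illustrative only.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and split it into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    # Score each passage by how many question tokens it shares.
    # This word-overlap score is an assumed stand-in for a real retriever.
    q_counts = Counter(tokenize(question))
    scored = []
    for source_id, passage in corpus.items():
        overlap = sum((q_counts & Counter(tokenize(passage))).values())
        scored.append((overlap, source_id, passage))
    # Highest overlap first; ties broken by source ID for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    # Keep at most k passages, and drop any with no overlap at all.
    return [(sid, text) for score, sid, text in scored[:k] if score > 0]


def build_prompt(question: str, passages: list[tuple[str, str]]) -> str:
    # Put the retrieved passages in the prompt and ask the model to rely on them,
    # cite them by ID, and admit when they do not contain the answer.
    context = "\n".join(f"[{sid}] {text}" for sid, text in passages)
    return (
        "Answer using only the sources below and cite them by ID. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical placeholder corpus; these are not real report excerpts.
corpus = {
    "doc-1": "Placeholder excerpt on sea level rise projections.",
    "doc-2": "Placeholder excerpt on glacier mass loss.",
    "doc-3": "Placeholder excerpt on ocean heat content.",
}
question = "What are the sea level rise projections?"
passages = retrieve(question, corpus)
prompt = build_prompt(question, passages)
```

The resulting `prompt` would then be sent to the LLM. The design choice is that the model's answer is grounded in, and traceable to, the retrieved sources rather than to its training data alone.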