Exploring the Effectiveness of GPT Models in Test-Taking: A Case Study of the Driver's License Knowledge Test

Large language models such as OpenAI's Generative Pre-trained Transformer (GPT) models are proficient at answering questions, but their knowledge is confined to the information present in their training data. This limitation renders them ineffective when confronted with questions about recent developments or non-public documents. Our research proposes a method that enables GPT models to answer questions by employing context from an information source not previously included in their training data. The methodology includes preprocessing the contextual information, embedding the contexts and queries, constructing prompts by integrating the context embeddings, and generating answers using GPT models. We applied this method in a controlled test scenario using the California Driver's Handbook as the information source. The GPT-3 model achieved a passing score of 96% on a set of 50 sample driving knowledge test questions. In contrast, without context, the model's score fell to 82%. However, the model still fails to answer some questions correctly even when provided with a library of context, highlighting room for improvement. The research also examined the impact of prompt length and context format on the model's performance. Overall, the study provides insights into the limitations of GPT models in question-answering tasks and into potential improvements.
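The four steps named in the abstract (preprocessing, embedding, prompt construction, answer generation) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`embed`, `split_into_chunks`, `top_contexts`, `build_prompt`), the prompt wording, and the sample text are all hypothetical, and the bag-of-words `embed` function is a toy stand-in for a learned embedding model. The final step, sending the prompt to a GPT model, is omitted so that the sketch runs without any external service.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_into_chunks(document):
    """Preprocessing (assumed scheme): one chunk per non-empty paragraph."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def top_contexts(question, chunks, k=2):
    """Rank chunks by similarity to the question and keep the best k."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, contexts):
    """Place the retrieved contexts ahead of the question in one prompt."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Invented sample text for illustration; NOT quoted from the actual handbook.
sample_text = (
    "A solid red traffic light means the driver must stop.\n\n"
    "Headlights must be turned on when driving in the dark.\n\n"
    "Seat belts must be worn by every passenger in the vehicle."
)

chunks = split_into_chunks(sample_text)
question = "What does a solid red traffic light mean?"
contexts = top_contexts(question, chunks, k=1)
prompt = build_prompt(question, contexts)
print(prompt)  # in a real pipeline, this string would be sent to the model
```

In a real pipeline, `embed` would call an embedding model and the chunk embeddings would be computed once and cached. The number of retrieved chunks `k` is one way to control the prompt length, which the abstract names as a factor examined in the study.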

