Finetuning an LLM on Contextual Knowledge of Classics for Q&A

The open-source release of large language models (LLMs) has opened many new ways for anyone with a computer and a command of language to interact with powerful artificial intelligence tools, particularly for learning and for disseminating knowledge. However, the usefulness of these models in specialized fields such as Classics remains largely unexplored. This project attempts to bring together knowledge of Classics and the capabilities of artificial intelligence by finetuning an LLM to meet the specific needs of learners and professionals. The goal is a model that not only reproduces contextual knowledge accurately but also exhibits a consistent "personality" (and, indeed, a consistent sense of propriety), so that it appeals to a diverse audience with differing levels of knowledge. Following the principle of "garbage in, garbage out," a significant portion of the project was devoted to refining the dataset, so that the model generates relevant, useful, and creative responses to a prompt, whether a statement, a question, or a single word. After training and evaluation, my model handled a wide variety of inputs and prompting styles better than I expected of a 355M-parameter model. It still hallucinates occasionally, especially at high sampling temperatures and particularly in its assertions about historical events or its own identity, which makes it seem somewhat capricious; further work in the form of continued finetuning will be undertaken.
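The abstract links some hallucinations to a high sampling temperature. As a minimal sketch of the standard mechanism (not the author's code), the snippet below divides the logits $z_i$ by a temperature $T$ before the softmax, $p_i = \exp(z_i/T) / \sum_j \exp(z_j/T)$, so that $T > 1$ shifts probability mass from the most likely token toward less likely, and possibly wrong, ones. The candidate tokens, the logit values, and the helper names `softmax_with_temperature` and `count_off_target` are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw next-token scores (logits) into probabilities.

    Dividing by the temperature before the softmax sharpens the
    distribution when temperature < 1 and flattens it when temperature > 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def count_off_target(tokens, logits, temperature, target, draws=1000, seed=0):
    """Sample `draws` tokens with a seeded RNG and count those that differ from `target`."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    samples = rng.choices(tokens, weights=probs, k=draws)
    return sum(1 for tok in samples if tok != target)


# Hypothetical candidates for the token after
# "Julius Caesar was assassinated in" (the correct answer is 44 BC).
# The logits are illustrative values, not taken from any real model.
tokens = ["44", "43", "27", "1066"]
logits = [6.0, 2.0, 1.0, 0.5]

for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    wrong = count_off_target(tokens, logits, t, target="44")
    print(f"T={t}: p={[round(p, 3) for p in probs]}, off-target draws={wrong}/1000")
```

Running the loop shows the probability of the correct token falling and the number of off-target samples rising as `T` grows. Libraries such as Hugging Face Transformers expose the same knob as the `temperature` argument of `generate` (used with `do_sample=True`); the abstract does not say which generation API the project used.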
