Large language models (LLMs) are powerful AI tools that can generate and comprehend natural language text and other complex information. However, the field lacks a mathematical framework to systematically describe, compare and improve LLMs. We propose Hex, a framework that clarifies key terms and concepts in LLM research, such as hallucinations, alignment, self-verification and chain-of-thought reasoning. The Hex framework offers a precise and consistent way to characterize LLMs, identify their strengths and weaknesses, and integrate new findings. Using Hex, we differentiate chain-of-thought reasoning from chain-of-thought prompting and establish the conditions under which they are equivalent. This distinction clarifies the basic assumptions behind chain-of-thought prompting and its implications for methods that use it, such as self-verification and prompt programming. Our goal is to provide a formal framework for LLMs that can help both researchers and practitioners explore new possibilities for generative AI. We do not claim to offer a definitive solution; rather, we offer a tool for opening up new research avenues. We argue that our formal definitions and results are crucial for advancing the discussion on how to build generative AI systems that are safe, reliable, fair and robust, especially in domains like healthcare and software engineering.