This study aims to gain the knowledge needed to build very large language models that are immune to hallucinations. Hallucinations in contemporary large language models are often attributed to a misunderstanding of real-world social relationships. I therefore hypothesize that very large language models capable of thoroughly grasping all of these relationships will be free from hallucinations. I further propose that certain types of equivariant language models are well suited to learning and understanding such relationships. Building on this, I developed a specialized cross-entropy error function that serves as a hallucination scale for language models by measuring the extent to which they have acquired equivariance. Using this scale, I tested language models on their ability to acquire character-level equivariance. In particular, I introduce and employ a novel technique based on T5 (Text-to-Text Transfer Transformer) that efficiently understands permuted input texts without requiring an explicit dictionary to convert token IDs (integers) into text (strings). This T5 model demonstrated a moderate ability to acquire character-level equivariance. I also identified scaling laws that can aid in developing hallucination-free language models at the character level. This methodology can be extended to assess equivariance acquisition at the word level, paving the way for very large language models that comprehensively understand these relationships and consequently avoid hallucinations.
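The abstract does not specify the cross-entropy error function itself. The following is only a hypothetical sketch of how character-level equivariance *could* be probed with cross-entropy. It relabels every character of a text with a bijection `sigma` over a 26-letter alphabet and compares the next-character cross-entropy on the permuted text with that on the original. All names (`make_permutation`, `equivariance_gap`, the two toy models) are illustrative assumptions, and no T5 model is involved.

```python
import math
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def make_permutation(seed: int) -> dict[str, str]:
    """Hypothetical setup: a random bijection sigma on ALPHABET."""
    rng = random.Random(seed)
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def permute_text(text: str, sigma: dict[str, str]) -> str:
    """Relabel every character with sigma (text must use ALPHABET only)."""
    return "".join(sigma[c] for c in text)


def mean_cross_entropy(predict, text: str) -> float:
    """Mean next-character cross-entropy in nats.

    predict(prefix) returns a dict mapping each ALPHABET character
    to its predicted probability.
    """
    losses = [-math.log(predict(text[:i])[text[i]]) for i in range(1, len(text))]
    return sum(losses) / len(losses)


def equivariance_gap(predict, text: str, sigma: dict[str, str]) -> float:
    """Hypothetical measure: |CE(permuted text) - CE(original text)|.

    Zero means the model's loss is unchanged by relabeling the characters.
    """
    permuted = permute_text(text, sigma)
    return abs(mean_cross_entropy(predict, permuted) - mean_cross_entropy(predict, text))


def in_context_model(prefix: str) -> dict[str, float]:
    """Toy model that uses only relations inside the context
    (Laplace-smoothed character counts), so it is equivariant."""
    n = len(prefix) + len(ALPHABET)
    return {c: (prefix.count(c) + 1) / n for c in ALPHABET}


def memorized_model(prefix: str) -> dict[str, float]:
    """Toy model that ignores the context and applies a fixed prior
    favoring 'e', so it is not equivariant."""
    return {c: 0.5 if c == "e" else 0.5 / 25 for c in ALPHABET}


if __name__ == "__main__":
    text = "thequickbrownfoxjumpsoverthelazydog"
    sigma = make_permutation(seed=0)
    print(equivariance_gap(in_context_model, text, sigma))  # 0.0
    print(equivariance_gap(memorized_model, text, sigma))
```

In this toy, the context-only model gets exactly the same loss under any relabeling, because each count it uses is preserved by `sigma`. The memorized model's loss changes whenever `sigma` moves `'e'`. A real evaluation would replace these toy predictors with a trained language model.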