Large language models (LLMs) generate information at a rapid pace, and users increasingly rely on and trust this data. Despite remarkable advances in LLMs, the information they generate is not completely trustworthy because of challenges in information quality. Specifically, the integrity of information quality decreases due to unreliable, biased tokenization during LLM pre-training. These information quality issues in turn lead to hallucination and fabricated information. Unreliable information can lead to flawed business decisions, which affects economic activity. In this work, we introduce a novel mathematical evaluation of LLM information quality, and we further analyze and highlight information quality challenges and the scaling laws used to systematically scale language models.