This paper investigates the uncertainty of Generative Pre-trained Transformer (GPT) models in extracting mathematical equations from images of varying resolutions and converting them into LaTeX code. We employ concepts of entropy and mutual information to examine the recognition process and assess the model's uncertainty in this Optical Character Recognition (OCR) task. By analyzing the conditional entropy of the output token sequences, we provide both theoretical insights and practical measurements of the GPT model's performance given different image qualities. Our experimental results, obtained using a Python implementation available on GitHub, demonstrate a clear relationship between image resolution and GPT model uncertainty. Higher-resolution images lead to lower entropy values, indicating reduced uncertainty and improved accuracy in the recognized LaTeX code. Conversely, lower-resolution images result in increased entropy, reflecting higher uncertainty and a higher likelihood of recognition errors. These findings highlight the practical importance of considering image quality in GPT-based mathematical OCR applications and demonstrate how entropy analysis, grounded in information-theoretic concepts, can effectively quantify model uncertainty in real-world tasks.
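The entropy analysis described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes access to per-token probability distributions (e.g. top-k logprobs returned by the model) and computes the average per-token Shannon entropy of an output sequence, the quantity that should fall as image resolution rises. The example distributions are hypothetical.

```python
import numpy as np

def token_entropy(probs):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i) of one token's distribution."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # drop zero-probability entries to avoid log(0)
    return float(-np.sum(p * np.log2(p)))

def mean_sequence_entropy(per_token_probs):
    """Average per-token conditional entropy over a decoded token sequence."""
    return float(np.mean([token_entropy(p) for p in per_token_probs]))

# Hypothetical per-token distributions: a high-resolution image tends to
# yield peaked distributions (low entropy), a low-resolution image flatter
# ones (high entropy). Real values would come from the model's logprobs.
high_res = [[0.97, 0.02, 0.01], [0.99, 0.005, 0.005]]
low_res  = [[0.40, 0.35, 0.25], [0.50, 0.30, 0.20]]

print(mean_sequence_entropy(high_res))  # lower: confident recognition
print(mean_sequence_entropy(low_res))   # higher: uncertain recognition
```

A uniform distribution over two tokens gives exactly 1 bit of entropy, which makes the function easy to sanity-check.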