This paper investigates the uncertainty of Generative Pre-trained Transformer (GPT) models in extracting mathematical equations from images of varying resolutions and converting them into LaTeX code. We employ concepts of entropy and mutual information to examine the recognition process and assess the model's uncertainty in this Optical Character Recognition (OCR) task. By analyzing the conditional entropy of the output token sequences, we provide both theoretical insights and practical measurements of the GPT model's performance given different image qualities. Our experimental results, obtained using a Python implementation available on GitHub, demonstrate a clear relationship between image resolution and GPT model uncertainty. Higher-resolution images lead to lower entropy values, indicating reduced uncertainty and improved accuracy in the recognized LaTeX code. Conversely, lower-resolution images result in increased entropy, reflecting higher uncertainty and a higher likelihood of recognition errors. These findings highlight the practical importance of considering image quality in GPT-based mathematical OCR applications and demonstrate how entropy analysis, grounded in information-theoretic concepts, can effectively quantify model uncertainty in real-world tasks.
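The entropy analysis described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes access to per-token probability distributions (e.g. top-k logprobs returned by the model) and computes the average per-token Shannon entropy of an output sequence, the quantity that should fall as image resolution rises. The example distributions are hypothetical.

```python
import numpy as np

def token_entropy(probs):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i) of one token's distribution."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # drop zero-probability entries to avoid log(0)
    return float(-np.sum(p * np.log2(p)))

def mean_sequence_entropy(per_token_probs):
    """Average per-token conditional entropy over a decoded token sequence."""
    return float(np.mean([token_entropy(p) for p in per_token_probs]))

# Hypothetical per-token distributions: a high-resolution image tends to
# yield peaked distributions (low entropy), a low-resolution image flatter
# ones (high entropy). Real values would come from the model's logprobs.
high_res = [[0.97, 0.02, 0.01], [0.99, 0.005, 0.005]]
low_res  = [[0.40, 0.35, 0.25], [0.50, 0.30, 0.20]]

print(mean_sequence_entropy(high_res))  # lower: confident recognition
print(mean_sequence_entropy(low_res))   # higher: uncertain recognition
```

A uniform distribution over two tokens gives exactly 1 bit of entropy, which makes the function easy to sanity-check.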