Converting mathematical expressions into LaTeX is challenging. In this paper, we explore newer transformer-based architectures for converting images of handwritten and digital mathematical expressions into equivalent LaTeX code. We use the current state-of-the-art CNN encoder and RNN decoder as a baseline for our experiments. We also investigate improvements to the CNN-RNN architecture by replacing the CNN encoder with the ResNet50 model. Our experiments show that transformer architectures achieve higher overall accuracy and BLEU scores, along with lower Levenshtein distances, compared to the baseline CNN-RNN architecture, with room to achieve even better results through appropriate fine-tuning of model parameters.
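The abstract evaluates models partly by Levenshtein (edit) distance between predicted and reference LaTeX. As an illustration only (this is the standard dynamic-programming edit distance, not the paper's evaluation code), a minimal sketch over tokenized LaTeX sequences:

```python
def levenshtein(a, b):
    """Standard dynamic-programming edit distance between two token sequences.

    Applied here to tokenized LaTeX, as one might score a predicted
    expression against its ground-truth reference.
    """
    m, n = len(a), len(b)
    prev = list(range(n + 1))  # distances from empty prefix of `a`
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n]

# Example: predicted vs. reference LaTeX token sequences (one substitution apart)
pred = [r"\frac", "{", "a", "}", "{", "b", "}"]
ref = [r"\frac", "{", "a", "}", "{", "c", "}"]
```

A lower distance indicates a prediction closer to the reference, which is why the transformer's lower Levenshtein scores are reported as an improvement.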