In this paper we present Texo, a minimalist yet high-performance formula recognition model with only 20 million parameters. Through careful design, distillation, and transfer of the vocabulary and tokenizer, Texo achieves performance comparable to state-of-the-art models such as UniMERNet-T and PPFormulaNet-S while reducing model size by 80% and 65%, respectively. This enables real-time inference on consumer-grade hardware and even in-browser deployment. We also developed a web application that demonstrates the model's capabilities and makes it easy for end users to try.