Transformer-based language models often achieve strong results on mathematical reasoning benchmarks while remaining fragile on basic numerical understanding and arithmetic operations. A central limitation is that numbers are processed as symbolic tokens whose embeddings do not explicitly encode numerical value, leading to systematic errors. We introduce a value-aware numerical representation that augments standard tokenized inputs with a dedicated prefix token whose embedding is explicitly conditioned on the underlying numerical value. This mechanism injects magnitude information directly into the model's input space while remaining compatible with existing tokenizers and decoder-only Transformer architectures. Evaluation on arithmetic tasks shows that the proposed approach outperforms baselines across numerical formats, tasks, and operand lengths. These results indicate that explicitly encoding numerical value is an effective and efficient way to improve fundamental numerical robustness in language models.
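To make the mechanism concrete, the sketch below shows one plausible way a value-conditioned prefix token could be prepended to a number's ordinary subword embeddings. This is a minimal illustration, not the paper's released implementation: the module name `ValuePrefixEmbedding`, the signed-log featurization of the value, and the MLP used to produce the prefix embedding are all assumptions made for the example.

```python
# Minimal sketch (assumed, not the authors' code) of a value-conditioned
# prefix-token embedding prepended to a number's subword-token embeddings.
import torch
import torch.nn as nn


class ValuePrefixEmbedding(nn.Module):
    """Maps a number's raw value to a single embedding vector that is
    prepended to the number's ordinary subword-token embeddings."""

    def __init__(self, d_model: int, hidden: int = 64):
        super().__init__()
        # Small MLP over a signed-log featurization of the value
        # (the exact featurization here is an illustrative assumption).
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden),
            nn.GELU(),
            nn.Linear(hidden, d_model),
        )

    def forward(self, values: torch.Tensor) -> torch.Tensor:
        # values: (batch,) raw numerical values parsed from the input text.
        sign = torch.sign(values)
        log_mag = torch.log1p(values.abs())
        feats = torch.stack([sign, log_mag], dim=-1)  # (batch, 2)
        return self.mlp(feats)                        # (batch, d_model)


if __name__ == "__main__":
    d_model = 16
    value_prefix = ValuePrefixEmbedding(d_model)
    token_embed = nn.Embedding(1000, d_model)  # stand-in for the LM's embedding table

    # Suppose the number 1234 was tokenized into two subword ids, e.g. [12, 34].
    subword_ids = torch.tensor([[12, 34]])
    value = torch.tensor([1234.0])

    prefix = value_prefix(value).unsqueeze(1)      # (1, 1, d_model) value-aware prefix
    subwords = token_embed(subword_ids)            # (1, 2, d_model) ordinary embeddings
    inputs = torch.cat([prefix, subwords], dim=1)  # prefix token + number's subwords
    print(inputs.shape)  # torch.Size([1, 3, d_model])
```

Because the prefix token only adds one extra position per number at the input layer, a scheme like this leaves the tokenizer and the decoder-only Transformer stack unchanged, which is the compatibility property the abstract emphasizes.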