Hieroglyphs, as logographic writing systems, encode rich semantic and cultural information in their internal structural composition. Yet current state-of-the-art Large Language Models (LLMs) and Multimodal LLMs (MLLMs) usually remain structurally blind to this information: LLMs process characters as textual tokens, while MLLMs additionally see them as raw pixel grids. Neither models the underlying logic of character strokes. Furthermore, existing structural analysis methods are often script-specific and labor-intensive. In this paper, we propose the Hieroglyphic Stroke Analyzer (HieroSA), a novel and generalizable framework that enables MLLMs to automatically derive stroke-level structures from character bitmaps without handcrafted data. HieroSA transforms images of both modern logographic characters and ancient hieroglyphs into explicit, interpretable line-segment representations in a normalized coordinate space, which allows for cross-lingual generalization. Extensive experiments demonstrate that HieroSA effectively captures character-internal structures and semantics without relying on language-specific priors. These results highlight the potential of our work as a graphematic analysis tool for a deeper understanding of hieroglyphic scripts. Our code is available at https://github.com/THUNLP-MT/HieroSA.
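To make the idea of a "line-segment representation in a normalized coordinate space" concrete, the following is a minimal sketch, not HieroSA's actual implementation. It assumes that each stroke is approximated by one straight segment given in pixel coordinates and that the normalized space is the unit square [0, 1] x [0, 1]. The names `Segment`, `normalize_segments`, and `to_text` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical stroke primitive: a straight line from (x0, y0) to (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float


def normalize_segments(segments_px, width, height):
    """Map pixel-space segments into an assumed [0, 1] x [0, 1] frame.

    Each axis is divided by the bitmap size, so the same character rendered at
    different resolutions yields the same coordinates. Note that dividing the
    axes separately stretches non-square bitmaps.
    """
    if width <= 0 or height <= 0:
        raise ValueError("bitmap dimensions must be positive")
    return [
        Segment(x0 / width, y0 / height, x1 / width, y1 / height)
        for (x0, y0, x1, y1) in segments_px
    ]


def to_text(segments, precision=2):
    """Serialize segments as one '(x0,y0)-(x1,y1)' line each, e.g. for a text prompt."""
    return "\n".join(
        f"({s.x0:.{precision}f},{s.y0:.{precision}f})-"
        f"({s.x1:.{precision}f},{s.y1:.{precision}f})"
        for s in segments
    )


if __name__ == "__main__":
    # The character 十 ("ten") drawn on a 64x64 bitmap: one horizontal and one vertical stroke.
    strokes = [(8, 32, 56, 32), (32, 8, 32, 56)]
    print(to_text(normalize_segments(strokes, 64, 64), precision=3))
```

Because the representation is just a list of resolution-independent coordinates, it does not depend on any particular script's stroke inventory. This is the property that would let the same format describe modern logographic characters and ancient hieroglyphs alike.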