Zero-shot Handwritten Chinese Character Recognition (HCCR) aims to recognize unseen characters by leveraging radical-based semantic compositions. However, existing approaches often treat characters as flat radical sequences, neglecting the hierarchical topology and the uneven information density of different components. To address these limitations, we propose an Entropy-Aware Structural Alignment Network that bridges the visual-semantic gap through information-theoretic modeling. First, we introduce an Information Entropy Prior that dynamically modulates positional embeddings via multiplicative interaction, acting as a saliency detector that prioritizes discriminative radicals over ubiquitous components. Second, we construct a Dual-View Radical Tree to extract multi-granularity structural features, which are integrated via an adaptive Sigmoid-based gating network to encode both global layout and local spatial roles. Finally, a Top-K Semantic Feature Fusion mechanism is devised to augment the decoding process by utilizing the centroid of semantic neighbors, effectively rectifying visual ambiguities through feature-level consensus. Extensive experiments demonstrate that our method establishes new state-of-the-art performance, achieving an accuracy of 55.04\% on the ICDAR 2013 dataset ($m=1500$) and significantly outperforming existing CLIP-based baselines in the challenging zero-shot setting. Furthermore, the framework exhibits exceptional data efficiency, adapting rapidly from minimal supervision and reaching 92.41\% accuracy with only one support sample per class.
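The three mechanisms summarized above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, shapes, the frequency-based surprisal prior, and the fixed (non-learned) gate are all simplifying assumptions made for illustration only.

```python
import numpy as np

def entropy_prior(radical_freqs):
    # Surprisal -log p as a saliency weight: rare (discriminative) radicals
    # receive higher weights than ubiquitous components. Illustrative form,
    # assuming corpus-level radical frequencies are available.
    p = np.asarray(radical_freqs, dtype=float)
    p = p / p.sum()
    return -np.log(p)

def modulate_positions(pos_emb, saliency):
    # Multiplicative interaction: each radical's positional embedding is
    # scaled by its entropy-based saliency weight.
    return pos_emb * saliency[:, None]

def gated_fusion(global_feat, local_feat):
    # Sigmoid gate blending a global-layout view with a local spatial-role
    # view. A trained gate would use learnable parameters; this fixed gate
    # only demonstrates the convex-combination structure.
    gate = 1.0 / (1.0 + np.exp(-(global_feat - local_feat)))
    return gate * global_feat + (1.0 - gate) * local_feat

def topk_centroid(query, bank, k=3):
    # Centroid of the k most cosine-similar semantic neighbors, used to
    # rectify an ambiguous visual feature by feature-level consensus.
    sims = bank @ query / (
        np.linalg.norm(bank, axis=1) * np.linalg.norm(query) + 1e-8
    )
    idx = np.argsort(sims)[-k:]
    return bank[idx].mean(axis=0)
```

For example, a radical occurring in 50\% of characters gets a lower saliency weight than one occurring in 10\%, so its positional embedding is down-scaled before alignment, while the Top-K centroid pulls a noisy handwritten feature toward the consensus of its nearest semantic prototypes.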