Sign language production from symbolic notation offers a scalable route to accessible sign animation. We present KANMultiSign, a multi-scale sequence generator that translates HamNoSys notation into two-dimensional human pose sequences. Our framework makes two complementary contributions. First, we introduce a coarse-to-fine generation strategy with multi-scale supervision: the model is first guided by an intermediate body--hand--face scaffold that encourages global structural coherence, and then refines hand articulation to recover finger-level detail. Second, we investigate integrating Kolmogorov--Arnold Network (KAN) modules into a Transformer backbone, using learnable univariate function primitives to model the highly non-linear mapping from discrete phonological symbols to continuous body kinematics with a compact parameterization. Experiments on multiple public corpora spanning Polish, German, Greek, and French sign languages show consistent reductions in dynamic-time-warping-based joint error compared with a strong notation-to-pose baseline, while using substantially fewer parameters. Controlled ablations further indicate that the KAN-based variants are not the main driver of the accuracy gains: they substantially reduce parameter count and remain competitive when coupled with multi-scale supervision. These findings position multi-scale supervision as the key mechanism for improving notation-conditioned pose generation, with KAN offering a compact alternative for efficient modeling. Our code will be made publicly available.
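To make the evaluation measure concrete, the following is a minimal sketch of a dynamic-time-warping-based joint error between a predicted and a reference 2D pose sequence. The abstract does not give the exact definition, so the details here are assumptions for illustration: the function names `frame_distance` and `dtw_joint_error` are hypothetical, the per-frame cost is taken to be the mean Euclidean distance over joints, and the accumulated cost is normalized by the length of the warping path.

```python
import math


def frame_distance(a, b):
    """Mean Euclidean distance between corresponding 2D joints of two frames."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def dtw_joint_error(pred, ref):
    """DTW-aligned joint error between two pose sequences.

    pred, ref: non-empty lists of frames; each frame is a list of (x, y) joints.
    Returns the accumulated frame cost along the minimum-cost warping path,
    divided by the number of steps on that path (an assumed normalization).
    """
    n, m = len(pred), len(ref)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    # cost[i][j]: minimal accumulated cost aligning pred[:i] with ref[:j].
    # steps[i][j]: number of aligned frame pairs on that minimal path.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    steps = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(pred[i - 1], ref[j - 1])
            # Extend the cheapest of: match, skip a pred frame, skip a ref frame.
            prev_cost, prev_steps = min(
                (cost[i - 1][j - 1], steps[i - 1][j - 1]),
                (cost[i - 1][j], steps[i - 1][j]),
                (cost[i][j - 1], steps[i][j - 1]),
            )
            cost[i][j] = prev_cost + d
            steps[i][j] = prev_steps + 1

    return cost[n][m] / steps[n][m]
```

Because the alignment is elastic in time, a prediction that only differs from the reference in pacing (for example, by holding one pose for an extra frame) scores zero error, while a spatial offset of the joints is penalized in proportion to its magnitude.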