Leader-follower interaction is an important paradigm in human-robot interaction (HRI), yet assigning roles in real time remains challenging for resource-constrained mobile and assistive robots. While large language models (LLMs) have shown promise for natural communication, their size and latency limit on-device deployment. Small language models (SLMs) offer a potential alternative, but their effectiveness for role classification in HRI has not been systematically evaluated. In this paper, we present a benchmark of SLMs for leader-follower communication, introducing a novel dataset derived from a published database and augmented with synthetic samples to capture interaction-specific dynamics. We investigate two adaptation strategies, prompt engineering and fine-tuning, evaluated under zero-shot and one-shot interaction modes and compared against an untrained baseline. Experiments with Qwen2.5-0.5B reveal that zero-shot fine-tuning achieves robust classification performance (86.66% accuracy) while maintaining low latency (22.2 ms per sample), significantly outperforming the baseline and prompt-engineered approaches. However, the results also indicate performance degradation in one-shot modes, where the increased context length challenges the model's architectural capacity. These findings demonstrate that fine-tuned SLMs provide an effective solution for direct role assignment, while highlighting critical trade-offs between dialogue complexity and classification reliability at the edge.