Co-speech gesture generation for artificial agents has recently gained attention, particularly when it is based on data-driven models. However, end-to-end methods often fail to generate co-speech gestures that are tied to semantics and have specific forms, i.e., Symbolic and Deictic gestures. In this work, we identify which words in a sentence are contextually related to Symbolic and Deictic gestures. First, we selected 12 gestures that are recognized by people from the Italian culture and that different humanoid robots can reproduce. Then, we implemented two rule-based algorithms to label sentences with Symbolic and Deictic gestures. The rules depend on semantic similarity scores, computed with the RoBERTa model, between sentences that heuristically represent gestures and sub-sentences of a target sentence that the artificial agent has to pronounce. We also implemented a baseline algorithm that assigns gestures without computing similarity scores. Finally, to validate the results, we asked 30 participants to label a set of sentences with Deictic and Symbolic gestures through a Graphical User Interface (GUI), and we compared their labels with those produced by our algorithms. To this end, we computed Average Precision (AP) and Intersection Over Union (IOU) scores, and we evaluated the Average Computational Time (ACT). Our results show that semantic similarity scores are useful for finding Symbolic and Deictic gestures in utterances.
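As a rough illustration of the labeling idea described above, the following minimal sketch scores every sub-sentence of a target sentence against one reference sentence per gesture and keeps the best-scoring span when it clears a threshold; it then compares the labeled word indices with a reference annotation using IOU. This is not the rule set used in this work. The function names, the threshold, the maximum span length, and the example gesture sentences are all hypothetical, and `token_overlap` is a simple stand-in that would be replaced by a RoBERTa-based similarity score.

```python
from typing import Callable, Dict, Iterator, Set, Tuple


def token_overlap(a: str, b: str) -> float:
    """Placeholder similarity: Jaccard overlap of lowercase tokens.

    Assumption: this only stands in for a RoBERTa-based semantic score.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def sub_sentences(words: list, max_len: int) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, text) for each contiguous span of up to max_len words."""
    for start in range(len(words)):
        for end in range(start + 1, min(start + max_len, len(words)) + 1):
            yield start, end, " ".join(words[start:end])


def label_sentence(
    sentence: str,
    gesture_sentences: Dict[str, str],
    similarity: Callable[[str, str], float] = token_overlap,
    threshold: float = 0.5,  # hypothetical cut-off
    max_len: int = 4,        # hypothetical maximum span length
) -> Dict[str, Set[int]]:
    """Map each gesture to the word indices of its best-matching span."""
    words = sentence.split()
    labels: Dict[str, Set[int]] = {}
    for gesture, reference in gesture_sentences.items():
        best_score, best_span = 0.0, None
        for start, end, text in sub_sentences(words, max_len):
            score = similarity(text, reference)
            if score > best_score:  # keep the first span with the top score
                best_score, best_span = score, (start, end)
        if best_span is not None and best_score >= threshold:
            labels[gesture] = set(range(*best_span))
    return labels


def iou(predicted: Set[int], reference: Set[int]) -> float:
    """Intersection over union of two sets of word indices."""
    union = predicted | reference
    return len(predicted & reference) / len(union) if union else 1.0


# Hypothetical gesture descriptions and target sentence.
gestures = {"greeting": "hello there", "pointing_you": "you"}
predicted = label_sentence("hello there my friend I am talking to you", gestures)
human = {"greeting": {0, 1, 2}, "pointing_you": {8}}  # made-up annotation
scores = {g: iou(predicted.get(g, set()), human[g]) for g in human}
```

Passing the similarity function as a parameter keeps the span search independent of the scoring model, so a sentence-embedding cosine similarity could be substituted without changing the rest of the sketch.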