Co-speech gesture generation for artificial agents has recently gained attention, particularly with data-driven models. However, end-to-end methods often fail to generate co-speech gestures that are tied to semantics and have specific forms, i.e., Symbolic and Deictic gestures. In this work, we identify which words in a sentence are contextually related to Symbolic and Deictic gestures. First, we selected 12 gestures that are recognized by people from the Italian culture and that different humanoid robots can reproduce. Then, we implemented two rule-based algorithms to label sentences with Symbolic and Deictic gestures. The rules depend on semantic similarity scores, computed with the RoBERTa model, between sentences that heuristically represent gestures and sub-sentences of the target sentence that the artificial agent has to pronounce. We also implemented a baseline algorithm that assigns gestures without computing similarity scores. Finally, to validate the results, we asked 30 participants to label a set of sentences with Deictic and Symbolic gestures through a Graphical User Interface (GUI), and we compared their labels with those produced by our algorithms. To this end, we computed Average Precision (AP) and Intersection over Union (IoU) scores and measured the Average Computational Time (ACT). Our results show that semantic similarity scores are useful for finding Symbolic and Deictic gestures in utterances.
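The labeling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold, the window length, and the gesture names are all hypothetical, and the similarity function is a simple token-overlap stand-in for the RoBERTa-based semantic similarity score used in the work. The sketch scores every contiguous sub-sentence of a target sentence against one reference sentence per gesture, keeps the best-scoring span if it passes a threshold, and then compares the resulting word indices with a human annotation using Intersection over Union.

```python
from typing import Callable, Dict, Iterator, Optional, Set, Tuple

Similarity = Callable[[str, str], float]


def token_overlap(a: str, b: str) -> float:
    """Stand-in similarity (assumption): Jaccard overlap of lowercase tokens.

    In the described approach this role is played by a RoBERTa-based
    semantic similarity score; any function returning a float can be used.
    """
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def sub_sentences(words: list, max_len: int) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, text) for every contiguous window of up to max_len words."""
    for start in range(len(words)):
        for end in range(start + 1, min(start + max_len, len(words)) + 1):
            yield start, end, " ".join(words[start:end])


def label_sentence(
    sentence: str,
    references: Dict[str, str],
    similarity: Similarity = token_overlap,
    threshold: float = 0.5,  # hypothetical value, not taken from the paper
    max_len: int = 4,        # hypothetical maximum sub-sentence length
) -> Dict[str, Set[int]]:
    """Map each gesture to the word indices of its best-matching sub-sentence."""
    words = sentence.split()
    labels: Dict[str, Set[int]] = {}
    for gesture, reference in references.items():
        best_score, best_span = 0.0, None  # type: float, Optional[Tuple[int, int]]
        for start, end, text in sub_sentences(words, max_len):
            score = similarity(text, reference)
            if score > best_score:
                best_score, best_span = score, (start, end)
        # Rule: assign the gesture only if the best span is similar enough.
        if best_span is not None and best_score >= threshold:
            labels[gesture] = set(range(*best_span))
    return labels


def iou(predicted: Set[int], annotated: Set[int]) -> float:
    """Intersection over Union of two sets of word indices.

    Two empty sets are treated as a perfect match (a convention chosen here).
    """
    union = predicted | annotated
    return len(predicted & annotated) / len(union) if union else 1.0


if __name__ == "__main__":
    # Hypothetical gestures and reference sentences, for illustration only.
    refs = {"come_here": "come here", "me": "I myself"}
    predicted = label_sentence("Please come here and sit", refs)
    print(predicted)
    print(iou(predicted.get("come_here", set()), {1, 2, 3}))
```

Passing the similarity function as a parameter keeps the rule logic independent of the scoring model, so the stand-in can be swapped for a sentence-embedding similarity without changing the labeling or evaluation code.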