Individuals with complex communication needs (CCN) often rely on augmentative and alternative communication (AAC) systems to hold conversations and communicate their wants. Such systems allow users to author messages by arranging pictograms in sequence. However, finding the desired item to complete a sentence can become harder as the user's vocabulary grows. This paper proposes using BERTimbau, a Brazilian Portuguese version of BERT, for pictogram prediction in AAC systems. To fine-tune BERTimbau, we constructed a Brazilian Portuguese AAC corpus for training. We tested different approaches to representing a pictogram for prediction: as a word (using pictogram captions), as a concept (using a dictionary definition), and as a set of synonyms (using related terms). We also evaluated the use of images for pictogram prediction. The results demonstrate that embeddings computed from the pictograms' captions, synonyms, or definitions perform similarly. Using synonyms leads to lower perplexity, but using captions leads to the highest accuracy. This paper provides insight into how to represent a pictogram for prediction using a BERT-like model and into the potential of using images for pictogram prediction.
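The abstract reports both perplexity and accuracy, and the two can rank representations differently because they measure different things: perplexity depends on the probability assigned to the correct pictogram, while accuracy depends only on its rank. The following is a minimal sketch of both metrics, not the authors' evaluation code; the function names, the three-pictogram vocabulary, and the probabilities are all made up for illustration.

```python
import math


def perplexity(target_probs):
    """Exponentiated mean negative log-likelihood of the correct pictograms."""
    nll = [-math.log(p) for p in target_probs]
    return math.exp(sum(nll) / len(nll))


def top_k_accuracy(distributions, targets, k=1):
    """Fraction of cases where the target is among the k most probable items."""
    hits = 0
    for dist, target in zip(distributions, targets):
        ranked = sorted(dist, key=dist.get, reverse=True)[:k]
        hits += target in ranked
    return hits / len(targets)


# Hypothetical model outputs over a toy vocabulary (illustrative numbers only).
distributions = [
    {"água": 0.50, "suco": 0.30, "bola": 0.20},
    {"água": 0.40, "suco": 0.35, "bola": 0.25},
]
targets = ["água", "suco"]

target_probs = [dist[t] for dist, t in zip(distributions, targets)]
ppl = perplexity(target_probs)
acc1 = top_k_accuracy(distributions, targets, k=1)
acc2 = top_k_accuracy(distributions, targets, k=2)
print(f"perplexity={ppl:.3f} acc@1={acc1:.2f} acc@2={acc2:.2f}")
```

In the second toy case the correct pictogram receives a fairly high probability yet is not ranked first, so it helps perplexity while counting as a miss for top-1 accuracy. This is one way a representation could score better on perplexity without scoring better on accuracy.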