An idealized, though simplistic, view of the referring expression production and grounding process in (situated) dialogue assumes that a speaker need only specify their expression appropriately so that the addressee can successfully identify the target referent. However, referring in conversation is a collaborative process that cannot be aptly characterized as an exchange of minimally specified referring expressions. Concerns have been raised that assumptions made in prior work on visually grounded dialogue reveal an oversimplified view of conversation and the referential process. We address these concerns by introducing a collaborative image ranking task, a grounded agreement game we call "A Game Of Sorts". In our game, players are tasked with reaching agreement on how to rank a set of images given some sorting criterion through a largely unrestricted, role-symmetric dialogue. By putting emphasis on the argumentation in this mixed-initiative interaction, we collect discussions that involve the collaborative referential process. We describe results of a small-scale data collection experiment with the proposed task. All discussed materials, which include the collected data, the codebase, and a containerized version of the application, are publicly available.