The state of the art in human-computer conversation leaves something to be desired and, indeed, talking to a computer can be downright annoying. This paper describes an approach to identifying ``opportunities for improvement'' in these systems by looking for abuse in the form of swear words. The premise is that humans swear at computers as a sanction and, as such, swear words mark a point of failure where the system did not behave as it should. Having identified where things went wrong, we can work backward through the transcripts and, using conversation analysis (CA), work out how things went wrong. Conversation analysis is a qualitative methodology and can appear quite alien, indeed unscientific, to those of us from a quantitative background. The paper starts with a description of conversation analysis in its modern form, and then goes on to apply the methodology to transcripts of frustrated and annoyed users in the DARPA Communicator project. The conclusion is that there is at least one species of failure caused by the inability of the Communicator systems to handle mixed initiative at the discourse structure level. Along the way, I hope to demonstrate that there is an alternative future for computational linguistics that does not rely on larger and larger text corpora.