Open-vocabulary state tracking is a more practical version of state tracking that aims to track the state changes of entities throughout a process without restricting the state space or the entity space. OpenPI is to date the only dataset annotated for open-vocabulary state tracking. However, we identify issues with both the quality of the dataset and its evaluation metric. For the dataset, we categorize three types of problems, at the procedure level, the step level, and the state change level respectively, and build a clean dataset, OpenPI-C, using multiple rounds of human judgment. For the evaluation metric, we propose a cluster-based metric that fixes the original metric's preference for repetition. On the modeling side, we enhance the seq2seq generation baseline by reinstating two key properties for state tracking: temporal dependency and entity awareness. First, the state of the world after an action is inherently dependent on the previous state. We model this dependency through a dynamic memory bank and allow the model to attend to the memory slots during decoding. Second, the state of the world is naturally a union of the states of the involved entities. Since the entities are unknown in the open-vocabulary setting, we propose a two-stage model that refines the state change prediction conditioned on the entities predicted in the first stage. Empirical results show the effectiveness of our proposed model, especially on the cluster-based metric. The code and data are released at https://github.com/shirley-wu/openpi-c.