Despite the remarkable progress in natural language understanding with pretrained Transformers, neural language models often handle commonsense knowledge poorly. To make models commonsense-aware, prior work has tried to obtain knowledge by means ranging from automatic acquisition to crowdsourcing. However, it is difficult to obtain a high-quality knowledge base at a low cost, especially from scratch. In this paper, we propose PHALM, a method of building a knowledge graph from scratch by prompting both crowdworkers and a large language model (LLM). We used this method to build a Japanese event knowledge graph and trained Japanese commonsense generation models. Experimental results showed that both the built graph and the inferences generated by the trained models are acceptable. We also report the differences between prompting humans and prompting an LLM. Our code, data, and models are available at github.com/nlp-waseda/comet-atomic-ja.