Conversational question answering systems often rely on semantic parsing, the generation of structured database queries from natural language input, to enable interactive information retrieval. For information-seeking conversations about facts stored in a knowledge graph, dialogue utterances are transformed into graph queries, a process called knowledge-based conversational question answering. This paper evaluates the performance of large language models that have not been explicitly pre-trained on this task. Through a series of experiments on an extensive benchmark dataset, we compare models of varying sizes under different prompting techniques and identify common issue types in the generated output. Our results demonstrate that large language models can generate graph queries from dialogues, and that few-shot prompting and fine-tuning yield significant improvements, especially for smaller models with lower zero-shot performance.