Linear programming (LP) problems are pervasive in real-life applications. However, despite their apparent simplicity, an untrained user may find it difficult to formulate the linear model of their specific problem. We envisage the creation of a goal-oriented conversational agent that engages the user in conversation to elicit all the information required for a subsequent agent to generate the linear model. In this paper, we present an approach for generating sample dialogues that can be used to develop and train such a conversational agent. Using prompt engineering, we develop two agents that "talk" to each other: one acts as the conversational agent and the other acts as the user. Only the user agent has access to a text description of a linear problem drawn from the NL4Opt dataset, and the two agents converse until the conversational agent has retrieved all key information from that original description. We also propose an extrinsic evaluation of the dialogues that assesses how well summaries generated from the dialogues match the original problem descriptions. We conduct both human and automatic evaluations, including an evaluation approach that uses GPT-4 to mimic the human evaluation metrics. The results show that the dialogues are of good overall quality, although further research is needed to improve the quality of the GPT-4 evaluation metrics. The resulting dialogues, including human annotations for a subset of them, are available to the research community. The conversational agent used to generate the dialogues can serve as a baseline.