Structured Complex Task Decomposition (SCTD) is the problem of breaking down a complex real-world task (such as planning a wedding) into a directed acyclic graph over individual steps that contribute to achieving the task, with edges specifying temporal dependencies between them. SCTD is an important component of assistive planning tools, and a challenge for commonsense reasoning systems. We probe how accurately SCTD can be done with knowledge extracted from Large Language Models (LLMs). We introduce a high-quality human-annotated dataset for this problem and novel metrics to fairly assess the performance of LLMs against several baselines. Our experiments reveal that LLMs are able to decompose complex tasks into individual steps effectively, with a relative improvement of 15% to 280% over the best baseline. We also propose a number of approaches to further improve their performance, with a relative improvement of 7% to 37% over the base model. However, we find that LLMs still struggle to predict pairwise temporal dependencies, which reveals a gap in their understanding of complex tasks.
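To make the graph structure concrete, the following is a minimal sketch of how one decomposition could be represented and checked. The step names and the helper `valid_order` are hypothetical illustrations, not drawn from the dataset or from the methods evaluated here; the sketch only assumes that each step maps to the set of steps that must come before it.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical steps for "plan a wedding"; not taken from the dataset.
# Each key is a step, and its value is the set of steps that must precede it.
dependencies = {
    "set a budget": set(),
    "choose a date": set(),
    "book a venue": {"set a budget", "choose a date"},
    "send invitations": {"book a venue"},
    "hire a caterer": {"set a budget", "book a venue"},
}


def valid_order(deps):
    """Return one step order consistent with every temporal dependency.

    Raises ValueError if the dependencies contain a cycle, i.e. the
    graph is not a directed acyclic graph.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        raise ValueError("dependencies contain a cycle") from exc


order = valid_order(dependencies)
print(order)
```

Any order returned this way places each step after all of its prerequisites. Several orders can be valid for the same graph, which is one reason a decomposition is modelled as a graph rather than as a single linear list of steps.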