When language models (LMs) are used to solve complex problems, humans may struggle to understand the LM-generated solutions and to repair the flawed ones. To assist humans in repairing them, we propose to automatically decompose complex solutions into multiple simpler pieces that correspond to specific subtasks. We introduce a novel objective for learning task decomposition, termed assistive value (AssistV), which measures how feasibly and how quickly humans can repair the decomposed solution. We collect a dataset of human repair experiences on different decomposed solutions. Using the collected data as in-context examples, we then learn to critique, refine, and rank decomposed solutions to improve AssistV. We validate our method on competitive programming problems: in a 177-hour human study, our method enables non-experts to solve 33.3% more problems, speeds them up by 3.3x, and empowers them to match unassisted experts.