Although powerful language models combined with Chain-of-Thought prompting have made automation increasingly pervasive, they still show weaknesses in long-term or multi-step logical reasoning. For example, without human involvement, users do not always get satisfactory answers to complex mathematical problems. Against this background, we present the Manual Correction System (MCS), a human-in-the-loop system enhanced by Chain-of-Thought prompting, which explores how manually correcting sub-logics in rationales can improve an LLM's reasoning performance. Going one step further, a human-in-the-loop system involves not only having humans improve performance but also controlling the cost. We therefore propose a Cost-utility Analysis Model for Human-in-the-Loop systems (CAMLOP), based on classical economics theory, to analyze, quantify, and balance the utility and the corresponding cost. We conduct experiments with MCS and CAMLOP on twelve datasets. The results show a significant advantage in both cost and utility over strong baselines.