Large Language Models (LLMs) still struggle in challenging scenarios that leverage structured data, complex reasoning, or tool usage. In this paper, we propose Source2Synth: a new method for teaching LLMs new skills without relying on costly human annotations. Source2Synth takes a custom data source as input and produces synthetic data points with intermediate reasoning steps grounded in real-world sources. Source2Synth improves dataset quality by discarding low-quality generations based on their answerability. We demonstrate the generality of this approach by applying it to two challenging domains: we test reasoning abilities in multi-hop question answering (MHQA) and tool usage in tabular question answering (TQA). Our method improves performance by 25.51% for TQA on WikiSQL and 22.57% for MHQA on HotPotQA compared to the fine-tuned baselines.
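The answerability-based curation step could be sketched as follows. This is a minimal illustration, not the paper's implementation: the real pipeline uses an LLM to answer each synthetic question, whereas `answer_fn`, the stub model, and the toy examples below are all hypothetical stand-ins.

```python
def is_answerable(example, answer_fn):
    """Keep a synthetic (question, answer) pair only if the model's
    prediction matches the reference answer (exact match, normalized)."""
    prediction = answer_fn(example["question"], example["context"])
    return prediction.strip().lower() == example["answer"].strip().lower()


def curate(dataset, answer_fn):
    """Discard low-quality generations, mirroring the answerability
    filter described in the abstract."""
    return [ex for ex in dataset if is_answerable(ex, answer_fn)]


# Toy demonstration: a stub "model" that simply looks answers up in context.
synthetic = [
    {"question": "Capital of France?",
     "context": {"Capital of France?": "Paris"},
     "answer": "Paris"},
    {"question": "Unanswerable question?",
     "context": {},
     "answer": "42"},
]
stub_model = lambda q, ctx: ctx.get(q, "")
curated = curate(synthetic, stub_model)  # only the answerable pair survives
```

In a real instantiation, exact-match comparison would likely be replaced by a softer metric (e.g., token-level F1) appropriate to the task.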