Data augmentation for complex logical reasoning relies heavily on costly human annotation, while direct generation with large language models yields uninterpretable and logically homogeneous examples. To address this, we present LFC-DA, a symbolic-logic-controlled pipeline: logical text is first mapped to propositional expressions, a compact rule library is compiled, and a bounded state-space search systematically discovers valid formulas, which are then verbalized back into natural-language questions, ensuring both diversity and logical rigor under propositional logic. Experiments on ReClor and LogiQA show significant improvements in the logical-reasoning accuracy of pretrained models, confirming the effectiveness of LFC-DA for LLM-guided logical data augmentation.
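To make the bounded state-space search concrete, the sketch below enumerates formulas derivable from a set of propositional premises under a toy rule library (modus ponens, hypothetical syllogism, contraposition) up to a fixed depth. The tuple encoding of formulas, the rule set, and all function names are illustrative assumptions, not the paper's implementation.

```python
from collections import deque

# Hypothetical sketch of a bounded state-space search over propositional
# formulas. Formulas are encoded as nested tuples, e.g. ('->', 'a', 'b')
# for "a implies b" and ('not', 'a') for negation. The rule library and
# depth bound are assumptions chosen for illustration.

def contrapositive(f):
    # From p -> q, derive (not q) -> (not p).
    if isinstance(f, tuple) and f[0] == '->':
        return ('->', ('not', f[2]), ('not', f[1]))
    return None

def modus_ponens(f, g):
    # From p and p -> q, derive q.
    if isinstance(g, tuple) and g[0] == '->' and g[1] == f:
        return g[2]
    return None

def syllogism(f, g):
    # From p -> q and q -> r, derive p -> r.
    if (isinstance(f, tuple) and f[0] == '->' and
            isinstance(g, tuple) and g[0] == '->' and f[2] == g[1]):
        return ('->', f[1], g[2])
    return None

def bounded_search(premises, max_depth=3):
    """Enumerate formulas derivable from the premises within max_depth steps."""
    known = set(premises)
    frontier = deque((p, 0) for p in premises)
    while frontier:
        f, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        candidates = [contrapositive(f)]
        for g in list(known):
            candidates += [modus_ponens(f, g), modus_ponens(g, f),
                           syllogism(f, g), syllogism(g, f)]
        for c in candidates:
            if c is not None and c not in known:
                known.add(c)
                frontier.append((c, depth + 1))
    return known - set(premises)

# Example: from "a -> b" and "b -> c", the search discovers "a -> c" and
# the contrapositives, which would then be verbalized as new questions.
derived = bounded_search([('->', 'a', 'b'), ('->', 'b', 'c')])
for formula in sorted(map(str, derived)):
    print(formula)
```

Each discovered formula would then be passed to the verbalization stage to produce a natural-language question, which is where the LLM is used under the control of the symbolic search.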