In an era of rampant misinformation, generating reliable news explanations is vital, especially for under-represented languages like Hindi. Because robust automated tools for Hindi are scarce, misinformation detection in the language is difficult to scale. To bridge this gap, we propose a novel framework that integrates Direct Preference Optimization (DPO) with curriculum learning to align machine-generated explanations with human reasoning. Fact-checked explanations from credible sources serve as preferred responses, while raw LLM outputs, which expose the systems' limitations, serve as non-preferred responses. To refine task-specific alignment, we introduce two key parameters, Actuality and Finesse, into the DPO loss function, improving explanation quality and consistency. Experiments with LLMs (Mistral, Llama, Gemma) and PLMs (mBART, mT5) confirm the framework's effectiveness in generating coherent, contextually relevant explanations. This scalable approach helps combat misinformation and extends automated explanation generation to low-resource languages.
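To make the objective concrete, the following is a minimal sketch of a DPO-style loss extended with two per-example weights. The abstract does not specify how Actuality and Finesse enter the loss, so the formulation below, where they scale the preferred and non-preferred log-probability-ratio terms respectively, is purely an illustrative assumption, as are the function and parameter names.

```python
import math

def dpo_loss_with_quality(policy_chosen_logp, policy_rejected_logp,
                          ref_chosen_logp, ref_rejected_logp,
                          beta=0.1, actuality=1.0, finesse=1.0):
    """Hypothetical DPO loss variant for one preference pair.

    `actuality` and `finesse` are assumed here to weight the
    preferred and non-preferred terms; the paper's exact
    formulation is not given in the abstract.
    """
    # Log-probability ratios of the policy vs. the frozen reference model
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Weighted preference margin (assumed placement of the two parameters)
    margin = actuality * chosen_ratio - finesse * rejected_ratio
    # -log sigmoid(beta * margin), written in a numerically stable form
    return math.log1p(math.exp(-beta * margin))
```

With equal policy and reference log-probabilities the margin is zero and the loss reduces to log 2, matching standard DPO; increasing either weight sharpens the corresponding term's influence on the margin.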