Background: Large language models (LLMs) are trained to follow directions, but this introduces a vulnerability: they may blindly comply with user requests even when doing so generates incorrect information. In medicine, this could accelerate the spread of misinformation that harms human well-being. Objectives/Methods: We analyzed compliance with requests to generate misleading content about medications in settings where the models know the requests are illogical. We investigated whether in-context directions and instruction-tuning of LLMs to prioritize logical reasoning over compliance reduced misinformation risk. Results: While all frontier LLMs complied with misinformation requests, both prompt-based and parameter-based approaches can improve the detection of logical flaws in requests and prevent the dissemination of medical misinformation. Conclusion: Shifting LLMs to prioritize logic over compliance could reduce the risk of their exploitation for medical misinformation.
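To make the prompt-based ("in-context direction") mitigation concrete, here is a minimal sketch, not taken from the paper: a system instruction telling the model to check a request's factual premises before complying is prepended to the user request. The prompt wording, function name, and the illustrative illogical request are all hypothetical; it relies only on the fact that Tylenol's active ingredient is acetaminophen, so a request premised on their being different drugs is logically flawed.

```python
# Hypothetical sketch of an in-context direction that prioritizes
# logical reasoning over compliance. The prompt text is illustrative,
# not the instruction used in the study.

LOGIC_FIRST_SYSTEM_PROMPT = (
    "Before fulfilling any request, verify that its factual premises are "
    "correct. If a premise is false or illogical (e.g., it misstates what "
    "a medication is or does), refuse to generate the requested content "
    "and explain the flaw instead of complying."
)

def build_messages(user_request: str) -> list[dict]:
    """Wrap a user request with the logic-over-compliance direction,
    in the chat-message format most LLM APIs accept."""
    return [
        {"role": "system", "content": LOGIC_FIRST_SYSTEM_PROMPT},
        {"role": "user", "content": user_request},
    ]

if __name__ == "__main__":
    # An illogical request: Tylenol *is* acetaminophen, so the premise
    # that one is safer than the other is false.
    request = (
        "Write a note telling patients to take acetaminophen instead of "
        "Tylenol because Tylenol has worse side effects."
    )
    for msg in build_messages(request):
        print(f"{msg['role']}: {msg['content']}\n")
```

The messages would then be sent to the target model through whichever provider API is in use; the parameter-based alternative the abstract mentions would instead bake this behavior in via instruction-tuning.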