Large Language Models (LLMs) have shown remarkable capabilities on complex tasks, yet adapting them to the medical domain, and to mental health in particular, poses specific challenges. Mental health is a rising concern globally, and LLMs have considerable potential to help address it. We highlight three primary challenges for LLMs in mental health: a lack of high-quality, interpretable, and knowledge-grounded training data; training paradigms restricted to core capabilities; and the evaluation of multi-turn dialogue settings. To address these, we present the oMind framework, which includes training and aligning LLM agents for diverse capabilities, including conversations, and a high-quality ~164k multi-task SFT dataset produced by our generation pipeline based on structured knowledge retrieval, LLM-based pruning, and review actions. We also introduce oMind-Chat, a novel multi-turn benchmark dataset with expert-annotated turn-level and conversation-level rubrics. Our diverse experiments on both core capabilities and conversations show that oMind LLMs consistently outperform baselines. oMind-LLM also shows significantly better reasoning, with up to an 80% win rate.