Although explainability and interpretability have received significant attention in artificial intelligence (AI) and natural language processing (NLP) for mental health, reasoning has not been examined in the same depth. Addressing this gap is essential to bridge NLP and mental health through interpretable and reasoning-capable AI systems. To this end, we investigate the pragmatic reasoning capability of large language models (LLMs) in the mental health domain. We introduce the PRiMH dataset and propose pragmatic reasoning tasks in mental health based on the pragmatic phenomena of implicature and presupposition. In particular, we formulate two tasks on implicature and one task on presupposition. To benchmark the dataset and the proposed tasks, we evaluate four models: Llama3.1, Mistral, MentaLLaMA, and Qwen. The experimental results suggest that Mistral and Qwen show substantial reasoning abilities in the domain. Subsequently, we study the behavior of MentaLLaMA on the proposed reasoning tasks using the attention rollout mechanism. In addition, we propose three StiPRompts to study the stigma around mental health with state-of-the-art LLMs: GPT-4o-mini, DeepSeek-chat, and Claude-3.5-haiku. Our evaluation shows that Claude-3.5-haiku deals with stigma more responsibly than the other two LLMs.