Mental health classification has become increasingly important in contemporary society, where digital platforms serve as crucial sources for monitoring individuals' well-being. However, existing social media mental health datasets consist primarily of text-only samples, which can limit the efficacy of models trained on them. Recognising that humans draw on cross-modal information to comprehend complex situations and issues, we present a novel approach that addresses the limitations of current methodologies. In this work, we introduce a Multimodal and Multi-Teacher Knowledge Distillation model for Mental Health Classification, leveraging insights from cross-modal human understanding. Unlike conventional approaches that often rely on simple concatenation to integrate diverse features, our model addresses the challenge of appropriately representing inputs of differing natures (e.g., text and audio). To mitigate the computational complexity of integrating all features into a single model, we employ a multimodal, multi-teacher architecture: by distributing the learning process across multiple teachers, each specialising in a particular aspect of feature extraction, we enhance overall mental health classification performance. Experimental validation demonstrates that our model achieves improved performance. All relevant code will be made available upon publication.
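The multi-teacher distillation objective described above can be sketched as follows. This is a minimal illustration in PyTorch, assuming each teacher specialises in one modality (e.g., text or audio) and that the student is trained against the average of the teachers' softened predictions; all function names, the averaging scheme, and the hyperparameters (`temperature`, `alpha`) are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal multi-teacher knowledge distillation loss (illustrative sketch;
# the averaging scheme and hyperparameters are assumptions, not the paper's).
import torch
import torch.nn.functional as F


def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          temperature=2.0, alpha=0.5):
    """Combine hard-label cross-entropy with KL divergence to the
    averaged soft targets produced by several teachers."""
    # Supervision from the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # Average the teachers' temperature-softened distributions; each
    # teacher may specialise in one modality's features.
    soft_targets = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    # KL divergence between the student's softened prediction and the
    # averaged teacher distribution, scaled by T^2 as is standard in KD.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        soft_targets,
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * ce + (1 - alpha) * kd


# Toy usage: a batch of 4 samples, 3 classes, two modality-specific teachers.
student = torch.randn(4, 3)
teachers = [torch.randn(4, 3), torch.randn(4, 3)]
labels = torch.tensor([0, 2, 1, 0])
loss = multi_teacher_kd_loss(student, teachers, labels)
```

Distributing supervision across modality-specific teachers in this way avoids forcing a single model to jointly encode heterogeneous inputs, which is the computational motivation stated above.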