An invisible barrier separates healthcare professionals' perception of a patient's clinical experience from the reality of that experience. This barrier may arise from an environment that discourages patients from sharing their experiences openly with healthcare professionals. Because patients have been observed to discuss and exchange knowledge more candidly on social media, these platforms can yield valuable insights. However, the abundance of non-patient posts on social media makes it necessary to filter out such irrelevant content and isolate the genuine voices of patients, a task we refer to as patient voice classification. In this study, we analyse the importance of linguistic characteristics in accurately classifying patient voices. Our findings underscore the essential role of linguistic and statistical text similarity analysis in identifying common patterns among patient groups. These results point to even starker differences in the way patients express themselves at the disease level and across therapeutic domains. Additionally, we fine-tuned a pre-trained language model on the combined datasets with similar linguistic patterns, yielding highly accurate automatic patient voice classification. As the pioneering study on this topic, our work focuses on extracting authentic patient experiences from social media, a crucial step towards advancing healthcare standards and fostering a patient-centric approach.
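The statistical text similarity analysis mentioned above can be illustrated with a minimal sketch. The abstract does not say which similarity measure the study uses, so cosine similarity over bag-of-words term counts is an assumption made here purely for illustration. The function names `term_counts` and `cosine_similarity` and the toy posts are hypothetical, not taken from the study.

```python
import math
import re
from collections import Counter


def term_counts(posts):
    """Lower-case each post, split it into word tokens, and count terms over the corpus."""
    counts = Counter()
    for post in posts:
        counts.update(re.findall(r"[a-z']+", post.lower()))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty corpus is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Made-up toy posts (assumption: not real data from any dataset).
group_a = ["my doctor changed my dose last week", "the side effects made me tired"]
group_b = ["my dose was changed and i feel tired", "side effects hit me after the new dose"]
group_c = ["new clinical trial results announced today", "company reports quarterly earnings"]

sim_ab = cosine_similarity(term_counts(group_a), term_counts(group_b))
sim_ac = cosine_similarity(term_counts(group_a), term_counts(group_c))
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

In this toy setup, the two groups written in a first-person patient style share more vocabulary with each other than either does with the news-style group. This is the kind of signal one might use to decide which datasets are similar enough to combine before fine-tuning; a real analysis would likely need stronger features than raw term counts.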