In the last decade, the United States has lost more than 500,000 people to overdoses involving prescription and illicit opioids (https://www.cdc.gov/drugoverdose/epidemic/index.html), making the opioid epidemic a national public health emergency (USDHHS, 2017). To prevent unintentional opioid overdoses more effectively, medical practitioners require robust and timely tools that can identify at-risk patients. Community-based social media platforms such as Reddit allow users to self-disclose and discuss otherwise sensitive drug-related behaviors, which often act as indicators of opioid use disorder. To this end, we present a moderate-size corpus of 2,500 opioid-related posts from various subreddits spanning six phases of opioid use: Medical Use, Misuse, Addiction, Recovery, Relapse, and Not Using. For every post, we annotate span-level extractive explanations and study their role in both annotation quality and model development. We evaluate several state-of-the-art models in supervised, few-shot, and zero-shot settings. Experimental results and error analysis show that identifying the phases of opioid use disorder is highly contextual and challenging. However, we find that using explanations during modeling leads to a significant boost in classification accuracy, demonstrating their beneficial role in a high-stakes domain such as the study of the opioid use disorder continuum. The dataset will be made available for research on GitHub with the formal version of the paper.