This paper presents our system for the SMM4H-HeaRD 2025 shared tasks, specifically Task 4 (Subtasks 1, 2a, and 2b) and Task 5 (Subtasks 1 and 2). Task 4 focused on detecting mentions of insomnia in clinical notes, while Task 5 addressed the extraction of food safety events from news articles. We participated in all subtasks and report key findings across them, with particular emphasis on Task 5 Subtask 1, where our system placed first with an F1 score of 0.958 on the test set. To attain this result, we employed encoder-based models (e.g., RoBERTa) alongside GPT-4 for data augmentation. This paper outlines our approach, including preprocessing, model architecture, and subtask-specific adaptations.