Automatically identifying the clinical trials for which a patient is eligible is complicated by the fact that eligibility criteria are stated in natural language. One potential solution is to apply text classification methods to common types of eligibility criteria. In this study, we focus on seven common exclusion criteria in cancer trials: prior malignancy, human immunodeficiency virus (HIV), hepatitis B, hepatitis C, psychiatric illness, drug/substance abuse, and autoimmune illness. Our dataset consists of 764 phase III cancer trials with these exclusions annotated at the trial level. We experiment with common transformer models as well as a new pre-trained clinical trial BERT model. Our results demonstrate the feasibility of automatically classifying common exclusion criteria. Additionally, we demonstrate the value of a language model pre-trained specifically for clinical trials, which yields the highest average performance across all criteria.