Automatic identification of clinical trials for which a patient is eligible is complicated by the fact that trial eligibility is stated in natural language. A potential solution to this problem is to employ text classification methods for common types of eligibility criteria. In this study, we focus on seven common exclusion criteria in cancer trials: prior malignancy, human immunodeficiency virus, hepatitis B, hepatitis C, psychiatric illness, drug/substance abuse, and autoimmune illness. Our dataset consists of 764 phase III cancer trials with these exclusions annotated at the trial level. We experiment with common transformer models as well as a new pre-trained clinical trial BERT model. Our results demonstrate the feasibility of automatically classifying common exclusion criteria. Additionally, we demonstrate the value of a pre-trained language model specifically for clinical trials, which yields the highest average performance across all criteria.
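To make the trial-level framing concrete, the following is a minimal sketch of how the seven exclusion criteria could be encoded as a multi-label target and how per-criterion scores could be averaged. It is not the authors' implementation: the label identifiers, the helper names, and the choice of F1 as the metric are all assumptions made for illustration, since the abstract refers only to "average performance across all criteria".

```python
from typing import Dict, List, Sequence, Set

# The seven exclusion criteria named in the abstract.
# The identifier spellings are hypothetical, chosen for this sketch.
CRITERIA: List[str] = [
    "prior_malignancy",
    "hiv",
    "hepatitis_b",
    "hepatitis_c",
    "psychiatric_illness",
    "substance_abuse",
    "autoimmune_illness",
]


def to_multi_hot(excluded: Set[str]) -> List[int]:
    """Encode one trial's annotated exclusions as a 0/1 vector over CRITERIA."""
    return [1 if name in excluded else 0 for name in CRITERIA]


def f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Binary F1 for one criterion (assumed metric; returns 0.0 with no true positives)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_criterion_f1(
    gold_rows: Sequence[Sequence[int]], pred_rows: Sequence[Sequence[int]]
) -> Dict[str, float]:
    """Score each criterion independently across all trials."""
    return {
        name: f1([row[i] for row in gold_rows], [row[i] for row in pred_rows])
        for i, name in enumerate(CRITERIA)
    }


def macro_average(scores: Dict[str, float]) -> float:
    """Unweighted mean of the per-criterion scores."""
    return sum(scores.values()) / len(scores)
```

Under this framing, each trial contributes one row of seven binary labels, and a model's predictions are compared criterion by criterion before the per-criterion scores are averaged into a single figure.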