This paper explores methods for extracting information from radiology reports that generalize across exam modalities, reducing the need for annotated data. We demonstrate that multi-pass T5-based text-to-text generative models generalize better across exam modalities than approaches that employ BERT-based task-specific classification layers. We then develop methods that reduce the inference cost of the model, making large-scale corpus processing more feasible for clinical applications. Specifically, we introduce a generative technique that decomposes complex tasks into smaller subtask blocks and that, when combined with multitask training, improves the performance of a single-pass model. In addition, we leverage target-domain contexts during inference to enhance domain adaptation, which enables the use of smaller models. Our analyses offer insights into the benefits of the different cost reduction strategies.