TACL: Threshold-Adaptive Curriculum Learning Strategy for Enhancing Medical Text Understanding

Medical texts, particularly electronic medical records (EMRs), are a cornerstone of modern healthcare, capturing critical information about patient care, diagnoses, and treatments. These texts hold great potential for advancing clinical decision-making and healthcare analytics. However, their unstructured nature, domain-specific language, and variability across contexts make automated understanding difficult. Despite advances in natural language processing, existing methods often treat all data as equally challenging, ignoring the inherent differences in complexity across clinical records. This oversight limits how well models generalize to rare or complex cases. In this paper, we present TACL (Threshold-Adaptive Curriculum Learning), a novel framework that addresses these challenges by rethinking how models interact with medical texts during training. Inspired by the principle of progressive learning, TACL dynamically adjusts the training process based on the complexity of individual samples. It categorizes data into difficulty levels and prioritizes simpler cases early in training, so that the model builds a strong foundation before tackling more complex records. Applying TACL to multilingual medical data, including English and Chinese clinical records, we observe significant improvements across diverse clinical tasks: automatic ICD coding, readmission prediction, and traditional Chinese medicine (TCM) syndrome differentiation. TACL not only enhances the performance of automated systems but also demonstrates the potential to unify approaches across disparate medical domains, paving the way for more accurate, scalable, and globally applicable medical text understanding.
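
The easy-to-hard idea described above can be illustrated with a minimal sketch. This is not the TACL implementation: the abstract does not specify how difficulty is scored or how the threshold adapts, so the `Record.difficulty` score, the linear schedule in `difficulty_threshold`, and the helper `select_for_epoch` are all hypothetical names and assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    text: str
    difficulty: float  # hypothetical score in [0, 1]; higher means harder


def difficulty_threshold(epoch: int, total_epochs: int, start: float = 0.3) -> float:
    """Assumed linear schedule: the threshold rises from `start` to 1.0."""
    if total_epochs <= 1:
        return 1.0
    progress = epoch / (total_epochs - 1)
    return start + (1.0 - start) * progress


def select_for_epoch(records, epoch, total_epochs):
    """Keep records at or below the current threshold, easiest first."""
    threshold = difficulty_threshold(epoch, total_epochs)
    eligible = [r for r in records if r.difficulty <= threshold]
    return sorted(eligible, key=lambda r: r.difficulty)


# Toy records with made-up difficulty scores (illustrative only).
records = [
    Record("routine follow-up, no complaints", 0.1),
    Record("pneumonia with type 2 diabetes", 0.5),
    Record("multi-organ failure, rare comorbidities", 0.9),
]

for epoch in range(3):
    batch = select_for_epoch(records, epoch, total_epochs=3)
    print(epoch, [r.difficulty for r in batch])
```

With three epochs, the first epoch sees only the easiest record and the last sees all three. An adaptive variant might raise the threshold based on validation performance rather than the epoch index; that choice is likewise an assumption, not something the abstract states.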
