Clinical notes contain valuable, context-rich information, but their unstructured format introduces several challenges, including unintended biases (e.g., gender or racial bias), poor generalization across clinical settings (e.g., a model trained on one EHR system may perform poorly on another due to formatting differences), and poor interpretability. To address these issues, we present ClinStructor, a pipeline that leverages large language models (LLMs) to convert clinical free text into structured, task-specific question-answer pairs prior to predictive modeling. On the ICU mortality prediction task, our method substantially enhances transparency and controllability while incurring only a modest reduction in predictive performance (a 2-3% drop in AUC) compared to direct fine-tuning. ClinStructor lays a strong foundation for building reliable, interpretable, and generalizable machine learning models in clinical environments.
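As a rough illustration of the core idea (not the authors' actual pipeline), the sketch below asks an LLM a fixed set of task-specific questions about a single clinical note and collects the answers as structured fields. The question list, model name, and `note_to_qa_pairs` helper are hypothetical; a real implementation would need careful prompt design and answer validation.

```python
# Hypothetical sketch of the free-text -> structured QA step (illustrative only).
# Assumes the OpenAI Python client; questions and model name are assumptions,
# not taken from the ClinStructor paper.
from openai import OpenAI

client = OpenAI()

# Example task-specific questions for ICU mortality prediction (illustrative).
QUESTIONS = [
    "Is the patient on mechanical ventilation?",
    "Does the note mention vasopressor use?",
    "What is the patient's level of consciousness?",
]

def note_to_qa_pairs(note_text: str, model: str = "gpt-4o-mini") -> dict[str, str]:
    """Convert one clinical note into structured question-answer pairs."""
    answers = {}
    for question in QUESTIONS:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system",
                 "content": "Answer concisely using only information in the note."},
                {"role": "user",
                 "content": f"Note:\n{note_text}\n\nQuestion: {question}"},
            ],
        )
        answers[question] = response.choices[0].message.content.strip()
    return answers

# The resulting QA pairs can then be encoded (e.g., one feature per question)
# and passed to a conventional predictive model instead of the raw free text.
```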