Physicians write notes about patients, and in doing so they reveal much about themselves. Using data from 129,228 emergency room visits, we train a model to identify notes written by fatigued physicians, defined as those who worked 5 or more of the prior 7 days. In a hold-out set, the model accurately identifies notes written by these high-workload physicians. It also flags notes written in other high-fatigue settings: on overnight shifts and after high patient volumes. Model predictions also correlate with worse decision-making on at least one important metric: the yield of testing for heart attack is 18% lower for each standard-deviation increase in model-predicted fatigue. Finally, the model indicates that notes written about Black and Hispanic patients have 12% and 21% higher predicted fatigue, respectively, than notes about White patients. These differences are larger than the gap between overnight and daytime shifts. These results have an important implication for large language models (LLMs). Our model indicates that fatigued doctors write more predictable notes. Perhaps unsurprisingly, because word prediction is the core of how LLMs work, we find that LLM-written notes have 17% higher predicted fatigue than real physicians' notes. This suggests that LLMs may introduce distortions into generated text that are not yet fully understood.
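The abstract links fatigue to note *predictability* but does not say how predictability is measured. One standard way to quantify how predictable a text is, shown here only as an illustrative sketch and not as the authors' method, is perplexity under a language model. Lower perplexity means each next word was easier to guess. The bigram model, the add-one (Laplace) smoothing, the whitespace tokenizer, and the function names `train_bigram` and `perplexity` are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count context unigrams and bigrams over whitespace-tokenized notes.

    Illustrative only: a toy bigram model, not the model used in the paper.
    """
    unigrams, bigrams = Counter(), Counter()
    for note in corpus:
        tokens = ["<s>"] + note.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])                   # counts of each word as a context
        bigrams.update(zip(tokens[:-1], tokens[1:]))   # counts of (previous, next) pairs
    # Possible "next" tokens: every training word plus the end-of-note marker.
    vocab = {t for note in corpus for t in note.lower().split()} | {"</s>"}
    return unigrams, bigrams, len(vocab)


def perplexity(text, unigrams, bigrams, vocab_size):
    """Perplexity of `text`; lower values mean a more predictable note."""
    tokens = ["<s>"] + text.lower().split() + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from having zero probability.
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    n_predictions = len(tokens) - 1
    return math.exp(-log_prob / n_predictions)
```

In a hypothetical usage, the model is trained on a few templated phrases such as `"patient denies chest pain"`. A note that repeats such a template then scores a lower perplexity than a note written in unusual wording. That is the sense in which a text can be called "more predictable." Real systems would use far larger models and proper tokenizers.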