While large language models (LLMs) have shown promise for medical question answering, little work has focused specifically on tropical and infectious diseases. We build on an open-source tropical and infectious diseases (TRINDs) dataset, expanding it with demographic augmentations and semantic clinical and consumer augmentations to yield more than 11,000 prompts. We evaluate LLM performance on these prompts, comparing generalist and medical LLMs, and comparing LLM outcomes with those of human experts. Through systematic experimentation, we demonstrate the benefit of contextual information such as demographics, location, gender, and risk factors for optimal LLM responses. Finally, we develop a prototype of TRINDs-LM, a research tool that provides a playground for exploring how context impacts LLM outputs for health.