ChatGPT explores a strategic blueprint for question answering (QA) in delivering medical diagnoses, treatment recommendations, and other healthcare support. This is achieved through the increasing incorporation of medical domain data via natural language processing (NLP) and multimodal paradigms. By shifting the distribution of text, images, videos, and other modalities from the general domain to the medical domain, these techniques have accelerated progress in medical domain question answering (MDQA). They bridge the gap between human natural language and sophisticated medical domain knowledge or expert manual annotations, and they handle large-scale, diverse, unbalanced, or even unlabeled data analysis scenarios in medical contexts. Our central focus is the use of language models and multimodal paradigms for medical question answering, with the aim of guiding the research community in selecting appropriate mechanisms for their specific medical research requirements. We discuss in detail specialized unimodal tasks such as question answering, reading comprehension, reasoning, diagnosis, relation extraction, and probability modeling, as well as multimodal tasks such as visual question answering, image captioning, cross-modal retrieval, and report summarization and generation. Each section examines the specifics of the method under consideration. This paper highlights the structures and advancements of medical domain work relative to general domain methods, emphasizing their applications across different tasks and datasets. It also outlines current challenges and opportunities for future medical domain research, paving the way for continued innovation and application in this rapidly evolving field.