ChatGPT explores a strategic blueprint for question answering (QA) in delivering medical diagnoses, treatment recommendations, and other healthcare support. This is achieved by increasingly incorporating medical domain data through natural language processing (NLP) and multimodal paradigms. By shifting the distribution of text, images, videos, and other modalities from the general domain to the medical domain, these techniques have accelerated progress in medical domain question answering (MDQA). They bridge the gap between human natural language and sophisticated medical domain knowledge or expert manual annotations, and they handle medical data analysis scenarios that are large-scale, diverse, unbalanced, or even unlabeled. We focus on the use of language models and multimodal paradigms for medical question answering, aiming to guide the research community in selecting appropriate mechanisms for their specific medical research requirements. We discuss in detail unimodal tasks such as question answering, reading comprehension, reasoning, diagnosis, relation extraction, and probability modeling, as well as multimodal tasks such as visual question answering, image captioning, cross-modal retrieval, and report summarization and generation. Each section examines the specifics of the method under consideration. This paper highlights the structures and advances of medical domain methods relative to general domain methods, emphasizing their applications across different tasks and datasets. It also outlines current challenges and opportunities for future medical domain research, paving the way for continued innovation and application in this rapidly evolving field.