Medical Visual Question Answering (Med-VQA) is an important task in the healthcare industry: answering a natural language question about a medical image. Existing VQA techniques from information systems can be applied directly to the task. However, they often suffer from (i) the data insufficiency problem, which makes it difficult to train state-of-the-art (SOTA) models for this domain-specific task, and (ii) the reproducibility problem, in that many existing models have not been thoroughly evaluated in a unified experimental setup. To address these issues, this paper develops a Benchmark Evaluation SysTem for Medical Visual Question Answering, denoted BESTMVQA. Given self-collected clinical data, our system provides a tool for users to automatically build Med-VQA datasets, which helps overcome the data insufficiency problem. Users can also conveniently select from a wide spectrum of SOTA models in our model library to perform a comprehensive empirical study. With simple configuration, our system automatically trains and evaluates the selected models on a benchmark dataset and reports comprehensive results that users can draw on to develop new techniques or to support medical practice. The limitations of existing work are thus overcome (i) by the data generation tool, which automatically constructs new datasets from unstructured clinical data, and (ii) by evaluating SOTA models on benchmark datasets in a unified experimental setup. A demonstration video of our system can be found at https://youtu.be/QkEeFlu1x4A. Our code and data will be available soon.