Large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities across a variety of tasks and have attracted increasing interest as a natural language interface in many domains. Recently, large vision-language models (VLMs) such as BLIP-2 and GPT-4, which learn rich vision-language correlations from image-text pairs, have been intensively investigated. Despite these developments, the application of LLMs and VLMs to image quality assessment (IQA), particularly in medical imaging, remains to be explored, even though it would be valuable for objective performance evaluation and could supplement or even replace radiologists' opinions. To this end, this paper introduces IQAGPT, an innovative image quality assessment system that integrates an image quality captioning VLM with ChatGPT to generate quality scores and textual reports. First, we build a CT-IQA dataset for training and evaluation, comprising 1,000 professionally annotated CT slices with diverse quality levels. To better leverage the capabilities of LLMs, we convert the annotated quality scores into semantically rich text descriptions using a prompt template. Second, we fine-tune the image quality captioning VLM on the CT-IQA dataset to generate quality descriptions; the captioning model fuses image and text features through cross-modal attention. Third, based on the quality descriptions, users can converse with ChatGPT to rate image quality or produce a radiological quality report. Our preliminary results demonstrate the feasibility of assessing image quality with large models. Notably, IQAGPT outperforms GPT-4 and CLIP-IQA, as well as multi-task classification and regression models that rely solely on images.
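The conversion of annotated quality scores into text descriptions via a prompt template can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the four-level scale, the criterion names, the wording of `TEMPLATE`, and the function name `describe_quality` are all hypothetical.

```python
from typing import Mapping

# Hypothetical 4-level scale for individual criteria (assumption: the
# abstract does not state the actual scale or vocabulary).
LEVEL_WORDS = {0: "severe", 1: "moderate", 2: "mild", 3: "no"}

# Hypothetical wording for the overall quality level.
OVERALL_WORDS = {0: "poor", 1: "fair", 2: "good", 3: "excellent"}

# Hypothetical prompt template; the real template may differ.
TEMPLATE = "This CT image shows {findings}. The overall quality is {overall}."


def describe_quality(scores: Mapping[str, int], overall: int) -> str:
    """Turn numeric annotations into a caption suitable for VLM fine-tuning."""
    # Map each criterion's numeric level to a word, e.g. 1 -> "moderate noise".
    findings = ", ".join(
        f"{LEVEL_WORDS[level]} {name}" for name, level in scores.items()
    )
    return TEMPLATE.format(findings=findings, overall=OVERALL_WORDS[overall])


if __name__ == "__main__":
    print(describe_quality({"noise": 1, "streak artifacts": 3}, overall=2))
```

Captions produced this way could serve as text targets when fine-tuning a captioning model, and a downstream LLM could map the generated description back to a score or expand it into a report.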