IQAGPT: Image Quality Assessment with Vision-language and ChatGPT Models

Large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in various tasks and attracted increasing interest as a natural language interface across many domains. Recently, large vision-language models (VLMs) such as BLIP-2 and GPT-4, which learn rich vision-language correlations from image-text pairs, have been intensively investigated. Despite these developments, the application of LLMs and VLMs to image quality assessment (IQA), particularly in medical imaging, remains to be explored. Such an application would be valuable for objective performance evaluation and could supplement or even replace radiologists' opinions. To this end, this paper introduces IQAGPT, an image quality assessment system that integrates an image quality captioning VLM with ChatGPT to generate quality scores and textual reports. First, we build a CT-IQA dataset for training and evaluation, comprising 1,000 professionally annotated CT slices with diverse quality levels. To better leverage the capabilities of LLMs, we convert the annotated quality scores into semantically rich text descriptions using a prompt template. Second, we fine-tune the image quality captioning VLM on the CT-IQA dataset to generate quality descriptions. The captioning model fuses image and text features through cross-modal attention. Third, based on the quality descriptions, users can talk with ChatGPT to rate image quality or produce a radiological quality report. Our preliminary results demonstrate the feasibility of assessing image quality with large models. Notably, IQAGPT outperforms GPT-4 and CLIP-IQA, as well as multi-task classification and regression models that rely solely on images.
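
The step of converting annotated quality scores into text descriptions with a prompt template can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 0 to 4 score scale, the level wording in `LEVELS`, the template string, the criterion names, and the helper `score_to_caption`.

```python
from typing import Mapping

# Hypothetical mapping from an integer score to a quality phrase.
# The 0-4 scale and the wording are assumptions, not the paper's.
LEVELS = {0: "very poor", 1: "poor", 2: "fair", 3: "good", 4: "excellent"}

# Hypothetical prompt template; the actual template is not given in the abstract.
TEMPLATE = "The quality of this CT image is {overall}, with {details}."


def score_to_caption(overall: int, criteria: Mapping[str, int]) -> str:
    """Render an overall score and per-criterion scores as one caption."""
    # One phrase per criterion, e.g. "good noise suppression".
    details = ", ".join(f"{LEVELS[s]} {name}" for name, s in criteria.items())
    return TEMPLATE.format(overall=LEVELS[overall], details=details)


caption = score_to_caption(
    2, {"noise suppression": 3, "structure preservation": 1}
)
print(caption)
```

Captions built this way could serve as text targets when fine-tuning a captioning model, which is the role the abstract assigns to the converted descriptions.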

