Multimodal GPTs represent a watershed in the interplay between Software Engineering and Generative Artificial Intelligence. GPT-4 accepts both image and text inputs, rather than natural language alone. We investigate relevant use cases stemming from these enhanced capabilities of GPT-4. To the best of our knowledge, no other work has investigated similar use cases, in which Software Engineering tasks are carried out via multimodal GPTs prompted with a mix of diagrams and natural language.