Multimodal GPTs represent a watershed in the interplay between Software Engineering and Generative Artificial Intelligence. GPT-4 accepts both image and text inputs, rather than natural language alone. We investigate relevant use cases that stem from this enhanced capability. To the best of our knowledge, no other work has investigated similar use cases in which Software Engineering tasks are carried out by prompting multimodal GPTs with a mix of diagrams and natural language.