Visual prompt engineering is a fundamental technology for artificial general intelligence in vision and imaging, and a key component for achieving zero-shot capabilities. As large vision models continue to develop, prompt engineering becomes increasingly important, and designing suitable prompts for specific visual tasks has emerged as a meaningful research direction. This review summarizes the large vision models used in computer vision and the visual prompt engineering methods applied to them, and it explores the latest advances in visual prompt engineering. We present influential large models in the visual domain, along with a range of prompt engineering methods used with these models. We hope this review provides a comprehensive and systematic description of prompt engineering methods for large vision models, and that it offers valuable insights to future researchers exploring this field.