This study examines whether the Chain-of-Thought approach, which is effective on language tasks because it breaks them down into sub-tasks and intermediate steps, can also improve vision-language tasks that demand sophisticated perception and reasoning. We present the "Description then Decision" strategy, which is inspired by how humans process signals. The strategy improves performance on probing tasks by 50%, laying the groundwork for future research on reasoning paradigms in complex vision-language tasks.