Advancements in multimodal Large Language Models (LLMs), such as OpenAI's GPT-4o, offer significant potential for mediating human interactions across various contexts. However, their use in areas such as persuasion, influence, and recruitment raises ethical and security concerns. To evaluate these models ethically in public influence and persuasion scenarios, we developed a prompting strategy using "Where's Waldo?" images as proxies for complex, crowded gatherings. This approach provides a controlled, replicable environment to assess the model's ability to process intricate visual information, interpret social dynamics, and propose engagement strategies while avoiding privacy concerns. By positioning Waldo as a hypothetical agent tasked with face-to-face mobilization, we analyzed the model's performance in identifying key individuals and formulating mobilization tactics. Our results show that while the model generates vivid descriptions and creative strategies, it cannot accurately identify individuals or reliably assess social dynamics in these scenarios. Nevertheless, this methodology provides a valuable framework for testing and benchmarking the evolving capabilities of multimodal LLMs in social contexts.