Understanding how diverse individuals and communities respond to persuasive messaging holds significant potential for advancing personalized and socially aware machine learning. While large vision-language models (VLMs) offer promise, their ability to emulate nuanced, heterogeneous human responses, particularly in high-stakes domains like public health, remains underexplored, due in part to the lack of comprehensive multimodal datasets. We introduce PHORECAST (Public Health Outreach REceptivity and CAmpaign Signal Tracking), a multimodal dataset curated to enable fine-grained prediction of both individual-level behavioral responses and community-wide engagement patterns to health messaging. This dataset supports tasks in multimodal understanding, response prediction, personalization, and social forecasting, allowing rigorous evaluation of how well modern AI systems can emulate, interpret, and anticipate heterogeneous public sentiment and behavior. By providing a new dataset to enable AI advances for public health, PHORECAST aims to catalyze the development of models that are not only more socially aware but also aligned with the goals of adaptive and inclusive health communication.