Inspiration is linked to various positive outcomes, such as increased creativity, productivity, and happiness. Although inspiration has great potential, there has been limited effort toward identifying content that is inspiring, as opposed to just engaging or positive. Additionally, most research has concentrated on Western data, with little attention paid to other cultures. This work is the first to study cross-cultural inspiration through machine learning methods. We aim to identify and analyze real and AI-generated cross-cultural inspiring posts. To this end, we compile and make publicly available the InspAIred dataset, which consists of 2,000 real inspiring posts, 2,000 real non-inspiring posts, and 2,000 generated inspiring posts evenly distributed across India and the UK. The real posts are sourced from Reddit, while the generated posts are created using the GPT-4 model. Using this dataset, we conduct extensive computational linguistic analyses to (1) compare inspiring content across cultures, (2) compare AI-generated inspiring posts to real inspiring posts, and (3) determine if detection models can accurately distinguish between inspiring content across cultures and data sources.