InstaSynth: Opportunities and Challenges in Generating Synthetic Instagram Data with ChatGPT for Sponsored Content Detection

Large Language Models (LLMs) raise concerns that they lower the cost of generating text that could be used for unethical or illegal purposes, especially on social media. This paper investigates the promise of such models to help enforce legal requirements related to the disclosure of sponsored content online. We investigate the use of LLMs for generating synthetic Instagram captions with two objectives. The first objective (fidelity) is to produce realistic synthetic datasets. For this, we implement content-level and network-level metrics to assess whether synthetic captions are realistic. The second objective (utility) is to create synthetic data that is useful for sponsored content detection. For this, we evaluate the effectiveness of the generated synthetic data for training classifiers to identify undisclosed advertisements on Instagram. Our investigations show that the objectives of fidelity and utility may conflict and that prompt engineering is a useful but insufficient strategy. Additionally, we find that while individual synthetic posts may appear realistic, collectively they lack diversity, topic connectivity, and realistic user interaction patterns.
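The abstract does not define its content-level and network-level metrics. The sketch below is for illustration only and computes two generic stand-ins over a set of captions:

- distinct-n, the share of unique word n-grams across all captions. This is a common lexical-diversity measure.
- The connected components of a hashtag co-occurrence graph. This is a rough proxy for topic connectivity.

The function names `distinct_n` and `hashtag_components`, and the choice of hashtags as topic nodes, are hypothetical assumptions. They are not the paper's metrics.

```python
import re
from itertools import combinations


def distinct_n(captions, n=2):
    """Hypothetical diversity proxy: unique word n-grams / total n-grams across all captions."""
    total, unique = 0, set()
    for caption in captions:
        tokens = caption.lower().split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    # Higher values mean the corpus as a whole repeats fewer phrases.
    return len(unique) / total if total else 0.0


def hashtag_components(captions):
    """Hypothetical connectivity proxy: connected components of a hashtag co-occurrence graph."""
    graph = {}
    for caption in captions:
        tags = sorted(set(re.findall(r"#(\w+)", caption.lower())))
        for tag in tags:
            graph.setdefault(tag, set())  # keep hashtags that never co-occur as isolated nodes
        for a, b in combinations(tags, 2):  # link every pair of hashtags in the same caption
            graph[a].add(b)
            graph[b].add(a)
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search over the undirected graph
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components


if __name__ == "__main__":
    captions = [
        "Loving my new sneakers #fashion #ootd",
        "Morning run in the park #fitness #running",
        "Sunday outfit check #ootd #style",
    ]
    print(distinct_n(captions, n=2))  # 1.0
    print([sorted(c) for c in hashtag_components(captions)])
    # [['fashion', 'ootd', 'style'], ['fitness', 'running']]
```

Either of the following signals across a synthetic corpus would match the collective shortfall the abstract describes, even when each caption looks plausible on its own:

- a low distinct-n score;
- many small, disconnected hashtag clusters.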

