Online reviews, a form of user-generated content (UGC), significantly influence consumer decision-making. However, the reliability of UGC is undermined by fake content, whether written by humans or generated by machines. Recent advances in Large Language Models (LLMs) may make it possible to fabricate fake content that is indistinguishable from authentic content, and at a much lower cost. Leveraging OpenAI's GPT-4-Turbo and DALL-E-2 models, we craft AiGen-FoodReview, a multimodal dataset of 20,144 restaurant review-image pairs divided into authentic and machine-generated subsets. We explore unimodal and multimodal detection models, achieving 99.80% multimodal accuracy with FLAVA. We use attributes from readability and photographic theories to score reviews and images, respectively, and demonstrate their utility as hand-crafted features in scalable and interpretable detection models that reach comparable performance. The paper contributes by open-sourcing the dataset and releasing fake review detectors, recommending the dataset's use in unimodal and multimodal fake review detection tasks, and evaluating linguistic and visual features in synthetic versus authentic data.
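To make the idea of readability-based hand-crafted features concrete, the following is a minimal sketch of how a review text might be scored. It is not the authors' implementation: the abstract does not name the specific readability attributes, so the Flesch Reading Ease formula is an assumed example, and the function names `count_syllables` and `readability_features` as well as the vowel-group syllable heuristic are hypothetical choices made for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels.

    This is a heuristic assumed for illustration, not a linguistic standard.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability_features(text: str) -> dict:
    """Return a few simple readability scores for one review text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    # Guard against division by zero on empty input.
    n_sentences = max(1, len(sentences))
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)

    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words

    # Flesch Reading Ease: higher values indicate easier text.
    flesch = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

    return {
        "words_per_sentence": words_per_sentence,
        "syllables_per_word": syllables_per_word,
        "flesch_reading_ease": flesch,
    }


if __name__ == "__main__":
    print(readability_features("The food was great. We loved it."))
```

Scores of this kind could be collected into a feature vector per review and passed to an interpretable classifier such as logistic regression, which is one plausible reading of "scalable and interpretable detection models" in the abstract.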