With the growth of social media and large language models, content moderation has become crucial. Many existing datasets lack adequate representation of different groups, resulting in unreliable assessments. To tackle this, we propose a socio-culturally aware evaluation framework for LLM-driven content moderation and introduce a scalable method for creating diverse datasets using persona-based generation. Our analysis reveals that these datasets provide broader perspectives and pose greater challenges for LLMs than datasets produced by diversity-focused generation methods that do not use personas. This challenge is especially pronounced in smaller LLMs, emphasizing the difficulties they encounter in moderating such diverse content.