E-commerce live streaming has become a major sales channel in China, particularly on platforms such as Douyin. However, hosts often use morphs (disguised substitutes for sensitive terms) to evade regulatory scrutiny and engage in false advertising. This study introduces the Live Auditory Morph Resolution (LiveAMR) task for detecting such violations. Previous morph research focused on text-based evasion on social media and in underground industries. LiveAMR instead targets pronunciation-based evasion in health and medical live streams. We constructed the first LiveAMR dataset, containing 86,790 samples, and developed a method that reformulates the task as a text-to-text generation problem. We used large language models (LLMs) to generate additional training data, which further improved task performance. We also demonstrated that morph resolution significantly improves the effectiveness of live streaming regulation.
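Framing morph resolution as text-to-text generation means each training example pairs a source text containing morphs with a target text in which every morph is replaced by the standard term it stands for. The sketch below shows one plausible way to build such pairs from span annotations. It is written under stated assumptions and is not the authors' pipeline. The `MorphSpan` annotation format, the `to_text2text` helper, and the English toy transcript are all hypothetical. The real LiveAMR data consists of Chinese live-stream speech transcripts.

```python
from dataclasses import dataclass


@dataclass
class MorphSpan:
    """Hypothetical annotation: a morph's character offsets and its resolved term."""
    start: int   # inclusive character offset of the morph in the transcript
    end: int     # exclusive end offset
    target: str  # standard term the morph stands for


def to_text2text(transcript: str, spans: list[MorphSpan]) -> tuple[str, str]:
    """Return a (source, target) pair for a seq2seq model.

    The source is the raw transcript. The target is the same text with each
    annotated morph replaced by its resolved term.
    """
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            raise ValueError("overlapping morph spans")
        pieces.append(transcript[cursor:span.start])  # text before the morph
        pieces.append(span.target)                     # resolved term replaces the morph
        cursor = span.end
    pieces.append(transcript[cursor:])                 # trailing text after the last morph
    return transcript, "".join(pieces)


if __name__ == "__main__":
    # Toy example: "b-sugar" plays the role of a morph for "blood sugar".
    text = "this tea lowers b-sugar quickly"
    start = text.index("b-sugar")
    src, tgt = to_text2text(text, [MorphSpan(start, start + len("b-sugar"), "blood sugar")])
    print(src, "->", tgt)
```

With pairs in this form, any encoder-decoder or instruction-tuned LLM can be fine-tuned to map morph-laden transcripts to resolved text. Detecting a violation then reduces to checking the resolved output.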