Live streaming has become a cornerstone of today's internet, enabling massive real-time social interactions. However, it faces severe risks arising from sparse, coordinated malicious behaviors among multiple participants, which are often concealed within normal activities and are difficult to detect promptly and accurately. In this work, we present a pioneering study on risk assessment in live streaming rooms, a setting characterized by weak supervision in which only room-level labels are available. We formulate the task as a Multiple Instance Learning (MIL) problem, treating each room as a bag and defining structured user-timeslot capsules as instances. Each capsule represents a subsequence of one user's actions within a specific time window, encapsulating a localized behavioral pattern. Based on this formulation, we propose AC-MIL, an Action-aware Capsule MIL framework that models both individual behaviors and group-level coordination patterns. AC-MIL captures multi-granular semantics and behavioral cues through a serial and parallel architecture that jointly encodes temporal dynamics and cross-user dependencies. These signals are integrated for robust room-level risk prediction, while also offering interpretable evidence at the behavior-segment level. Extensive experiments on large-scale industrial datasets from Douyin demonstrate that AC-MIL significantly outperforms MIL and sequential baselines, establishing new state-of-the-art performance in room-level risk assessment for live streaming. Moreover, AC-MIL provides capsule-level interpretability, enabling the identification of risky behavior segments as actionable evidence for intervention. The project page is available at: https://qiaoyran.github.io/AC-MIL/.
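To make the bag-and-instance formulation concrete, the following is a minimal sketch of how one room's action log could be grouped into user-timeslot capsules and how per-capsule scores could be aggregated into a room-level score. It is not the AC-MIL architecture: the event format, the 60-second window, the function names `build_capsules` and `room_score`, and the stand-in scores are all assumptions made for illustration, and the max-pooling aggregation is a generic MIL baseline rather than the method proposed here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event format (an assumption): (user_id, timestamp_seconds, action_name).
Event = Tuple[str, float, str]
CapsuleKey = Tuple[str, int]  # (user_id, time-slot index)


def build_capsules(events: List[Event], window: float = 60.0) -> Dict[CapsuleKey, List[str]]:
    """Group one room's events (the bag) into user-timeslot capsules (the instances)."""
    capsules: Dict[CapsuleKey, List[str]] = defaultdict(list)
    # Sort by timestamp so each capsule keeps its actions in temporal order.
    for user, ts, action in sorted(events, key=lambda e: e[1]):
        slot = int(ts // window)
        capsules[(user, slot)].append(action)
    return dict(capsules)


def room_score(capsule_scores: Dict[CapsuleKey, float]) -> Tuple[float, CapsuleKey]:
    """Generic max-pooling MIL aggregation: return the highest capsule score and its key."""
    if not capsule_scores:
        raise ValueError("a bag must contain at least one capsule")
    top = max(capsule_scores, key=lambda k: capsule_scores[k])
    return capsule_scores[top], top


# Toy room log; all values are invented for illustration.
room = [
    ("u1", 5.0, "enter"),
    ("u2", 12.0, "comment"),
    ("u1", 30.0, "gift"),
    ("u1", 75.0, "comment"),
]
capsules = build_capsules(room, window=60.0)

# Stand-in for a learned capsule scorer: here simply the number of actions.
stand_in_scores = {key: float(len(actions)) for key, actions in capsules.items()}
score, evidence = room_score(stand_in_scores)
```

In this sketch only the room-level score would be compared against the room label during training, which mirrors the weak-supervision setting; the returned capsule key plays the role of segment-level evidence.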