Video identification attacks pose a significant privacy threat: by revealing which videos a victim watches, they may disclose the victim's hobbies, religious beliefs, political leanings, sexual orientation, and health status. Viewing history can also be exploited for user profiling or advertising, and may lead to cyberbullying, discrimination, or blackmail. Existing video inference techniques usually depend on analyzing the network traffic generated by streaming online videos. In this work, we observe that the content of a subtitle determines the silhouette it displays on the screen, and that identifying each subtitle silhouette also yields the time difference between two consecutive subtitles. We then propose SilhouetteTell, a novel video identification attack that combines spatial- and time-domain information into a spatiotemporal feature of subtitle silhouettes. SilhouetteTell explores the spatiotemporal correlation between the subtitle silhouettes recorded from a video and that video's subtitle file, and it can infer both online and offline videos. Comprehensive experiments on off-the-shelf smartphones confirm that SilhouetteTell is highly effective at inferring video titles and clips under various settings, including at distances of up to 40 meters.
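The matching idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a subtitle's character count is a usable proxy for its silhouette width, that observed widths have already been normalized to the same unit, and that a simple L1 distance over a sliding window is a reasonable stand-in for the spatiotemporal correlation. The names `Subtitle`, `spatiotemporal_feature`, and `best_match` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    start: float  # display start time in seconds
    text: str     # subtitle content from the subtitle file


def spatiotemporal_feature(subs):
    """Build [(width_proxy, gap_to_next), ...] from consecutive subtitles.

    Assumption: character count stands in for silhouette width.
    """
    return [
        (len(cur.text), nxt.start - cur.start)
        for cur, nxt in zip(subs, subs[1:])
    ]


def best_match(observed, candidates):
    """Slide the observed feature sequence over each candidate subtitle file.

    observed:   list of (width, gap) pairs recovered from a recording
    candidates: dict mapping a video title to its list of Subtitle entries
    Returns (title, offset, distance) for the window with the smallest
    L1 distance, or None if no candidate is long enough.
    """
    best = None
    n = len(observed)
    for title, subs in candidates.items():
        feats = spatiotemporal_feature(subs)
        for offset in range(len(feats) - n + 1):
            window = feats[offset:offset + n]
            dist = sum(
                abs(w_obs - w_ref) + abs(g_obs - g_ref)
                for (w_obs, g_obs), (w_ref, g_ref) in zip(observed, window)
            )
            if best is None or dist < best[2]:
                best = (title, offset, dist)
    return best
```

In this sketch the returned title corresponds to inferring the video, and the returned offset corresponds to locating the clip within it. A practical attack would additionally need to extract silhouettes from recorded footage and tolerate measurement noise, which this example does not attempt.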