BehaVR: User Identification Based on VR Sensor Data

Virtual reality (VR) platforms enable a wide range of applications, but they also pose unique privacy risks. In particular, VR devices are equipped with a rich set of sensors that collect personal and sensitive information (e.g., body motion, eye gaze, hand joints, and facial expression), which can be used to uniquely identify a user even without explicit identifiers. In this paper, we study the extent to which a user can be identified based on data collected by different VR sensors. We consider adversaries whose capabilities range from observing the APIs available within a single VR app (app adversary) to observing all, or selected, sensor measurements across all apps on the VR device (device adversary). To that end, we introduce BEHAVR, a framework for collecting and analyzing data from all sensor groups, as collected by all apps running on a VR device. We use BEHAVR to perform a user study and collect data from real users interacting with popular real-world apps. We use that data to build machine learning models for user identification, with features extracted from sensor data available within and across apps. We show that these models can identify users with an accuracy of up to 100%. We also reveal the most important features and sensor groups, which depend on the functionality of the app and the strength of the adversary, as well as the minimum time needed for user identification. To the best of our knowledge, BEHAVR is the first framework to analyze user identification in VR comprehensively, i.e., by jointly considering all sensor measurements available on a VR device (whether within an app or across multiple apps), collected by real-world, as opposed to custom-made, apps.
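To make the idea of identifying users from features extracted from sensor data concrete, the following is a minimal sketch and not BEHAVR's actual pipeline. It assumes a single hypothetical scalar sensor channel (for example, head height in meters), summarizes each time window with four statistics, and uses a nearest-centroid classifier as a stand-in for the machine learning models mentioned above. The user names, the synthetic data generator, and all function names are illustrative assumptions.

```python
import random
import statistics
from collections import defaultdict


def extract_features(window):
    """Summarize one window of scalar samples as (mean, std, min, max)."""
    return (
        statistics.fmean(window),
        statistics.pstdev(window),
        min(window),
        max(window),
    )


def fit_centroids(features, labels):
    """Average the feature vectors of each user to get one centroid per user."""
    grouped = defaultdict(list)
    for feature, label in zip(features, labels):
        grouped[label].append(feature)
    return {
        label: tuple(statistics.fmean(column) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, feature):
    """Return the user whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum(
            (c - f) ** 2 for c, f in zip(centroids[label], feature)
        ),
    )


def synthetic_window(rng, offset, spread, n=50):
    """Hypothetical stand-in for a window of real sensor samples."""
    return [rng.gauss(offset, spread) for _ in range(n)]


# Assumed per-user (offset, spread) of the hypothetical sensor channel.
users = {"user_a": (1.60, 0.02), "user_b": (1.75, 0.05), "user_c": (1.90, 0.03)}

rng = random.Random(0)
train_x, train_y, test_x, test_y = [], [], [], []
for name, (offset, spread) in users.items():
    for i in range(20):
        feature = extract_features(synthetic_window(rng, offset, spread))
        # First 15 windows per user train the model; the rest are held out.
        if i < 15:
            train_x.append(feature)
            train_y.append(name)
        else:
            test_x.append(feature)
            test_y.append(name)

centroids = fit_centroids(train_x, train_y)
correct = sum(predict(centroids, f) == y for f, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The synthetic users here are deliberately well separated, so the resulting accuracy says nothing about real VR data; the sketch only shows the shape of a window-features-classifier pipeline. Shortening the window length `n` is one simple way to explore how much observation time such a model needs.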
