Visual localization is the task of estimating the camera pose from which a given image was taken, and it is central to several 3D computer vision applications. With the rapid growth in popularity of AR/VR/MR devices and cloud-based applications, privacy is becoming an important aspect of the localization process. Existing work on privacy-preserving localization aims to defend against an attacker who has access to a cloud-based service. In this paper, we show that an attacker can learn details of a scene without any such access, simply by querying a localization service. The attack is based on the observation that modern visual localization algorithms are robust to variations in appearance and geometry. While this is in general a desired property, it also means that algorithms localize objects that are merely similar enough to those present in a scene. An attacker can thus query a server with a sufficiently large set of images of objects, e.g., obtained from the Internet, and some of them will be localized. From the camera poses returned by the service (the minimal information such a service returns), the attacker can then learn about object placements. We develop a proof-of-concept version of this attack and demonstrate its practical feasibility. The attack places no requirements on the localization algorithm used and thus also applies to privacy-preserving representations. Current work on privacy-preserving representations alone is therefore insufficient.
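To illustrate why a returned camera pose alone can reveal object placement, the following is a minimal geometric sketch, not the method developed in the paper. It assumes a world-to-camera pose convention (x_cam = R x_world + t), a camera looking along its +z axis, and a guessed camera-to-object distance. The names `localize`, `assumed_distance`, and `collect_placements` are hypothetical and stand in for a generic localization service and its caller.

```python
def camera_center(R, t):
    """Camera center in world coordinates: C = -R^T t (assumes x_cam = R x_world + t)."""
    return [-sum(R[k][i] * t[k] for k in range(3)) for i in range(3)]


def viewing_direction(R):
    """Optical axis (+z in the camera frame) expressed in world coordinates.

    R^T [0, 0, 1] is the third row of R.
    """
    return list(R[2])


def estimate_object_position(R, t, assumed_distance):
    """Place the object on the optical axis at a guessed distance from the camera."""
    center = camera_center(R, t)
    direction = viewing_direction(R)
    return [center[i] + assumed_distance * direction[i] for i in range(3)]


def collect_placements(query_images, localize, assumed_distance=2.0):
    """Query a (hypothetical) service and keep a position for every localized image.

    `localize` is an assumed callable that returns (R, t) or None on failure.
    """
    placements = {}
    for name, image in query_images.items():
        pose = localize(image)
        if pose is None:
            continue  # the service could not localize this image
        R, t = pose
        placements[name] = estimate_object_position(R, t, assumed_distance)
    return placements
```

The sketch only shows that a pose constrains where the depicted object can be; the distance along the viewing ray is a free assumption here, and a real service may use a different pose convention.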