From Street View to Visibility Network: Mapping Urban Visual Relationships with Vision-Language Models

Visibility analysis is one of the fundamental analytical methods in urban planning and landscape research, traditionally conducted through computational simulations based on the Line-of-Sight (LoS) principle. However, when assessing the visibility of named urban objects such as landmarks, geometric intersection alone fails to capture the contextual and perceptual dimensions of visibility as experienced in the real world. This study challenges traditional LoS-based approaches by introducing a new, image-based visibility analysis method. Specifically, a Vision Language Model (VLM) is applied to detect the target object within a direction-zoomed Street View Image (SVI). A successful detection indicates that the object is visible from the corresponding SVI location. Further, a heterogeneous visibility graph is constructed to capture the complex interactions between observers and target objects. In the first case study, the method proves reliable in detecting the visibility of six tall landmark structures in global cities, with an overall accuracy of 87%. It also reveals broader contextual differences in how the landmarks are perceived and experienced. In the second case study, the proposed visibility graph uncovers the form and strength of connections among multiple landmarks along the River Thames in London, as well as the places where these connections occur. Notably, bridges on the River Thames account for approximately 30% of total connections. Our method complements and enhances traditional LoS-based visibility analysis, and showcases the possibility of revealing the prevalent connections among any visual objects in the urban environment. It opens up new research perspectives for urban planning, heritage conservation, and computational social science.
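To make the idea of a heterogeneous visibility graph more concrete, the following minimal Python sketch shows one way such a graph could be assembled from per-image detection results. It is an illustration under stated assumptions, not the construction used in the study: the function names, the observer IDs, the placeholder landmark names, and the rule that two landmarks are "connected" whenever they are both detected from the same SVI location are all hypothetical. The VLM detection step itself is assumed to have already produced a visible/not-visible flag for each observer-landmark pair.

```python
from collections import defaultdict
from itertools import combinations


def build_visibility_graph(detections):
    """Build observer -> set of visible landmarks.

    `detections` is an iterable of (observer_id, landmark, visible) tuples,
    where `visible` is the (assumed) boolean outcome of VLM detection.
    """
    observer_to_landmarks = defaultdict(set)
    for observer_id, landmark, visible in detections:
        if visible:
            observer_to_landmarks[observer_id].add(landmark)
    return dict(observer_to_landmarks)


def landmark_connections(observer_to_landmarks):
    """Derive landmark-landmark connections from shared observers.

    Assumption: two landmarks are connected if both are visible from the
    same observer location. Strength is the number of such locations.
    """
    strength = defaultdict(int)
    places = defaultdict(list)
    for observer_id, landmarks in observer_to_landmarks.items():
        for a, b in combinations(sorted(landmarks), 2):
            strength[(a, b)] += 1
            places[(a, b)].append(observer_id)
    return dict(strength), dict(places)


# Hypothetical detection results for three SVI locations.
detections = [
    ("svi_001", "Landmark A", True),
    ("svi_001", "Landmark B", True),
    ("svi_002", "Landmark A", True),
    ("svi_002", "Landmark B", False),
    ("svi_003", "Landmark A", True),
    ("svi_003", "Landmark B", True),
    ("svi_003", "Landmark C", True),
]

graph = build_visibility_graph(detections)
strength, places = landmark_connections(graph)

for pair, count in sorted(strength.items()):
    print(pair, count, places[pair])
```

Keeping observers and landmarks as separate node types preserves where each connection occurs, so a share of connections attributable to a class of places (bridges, for example) could be computed by tagging observer locations and counting the tagged entries in `places`.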
