Bridging Geographic Bias in Urban Streetscape Inference via Lifelong Learning with Visual-Semantic Pivoting

Visual perception of urban streetscapes underpins evidence-based decisions in landscape planning, public health, and place-making. Yet models trained on a few well-photographed metropolises systematically misjudge underrepresented districts, propagating geographic bias into downstream policy. We address this gap with HVSP-LL, a lifelong learning framework that couples a stratified visual-semantic pivoting module with an equity-aware rehearsal mechanism. The pivoting module organises landscape concepts along a three-tier ontology (macro structure, meso composition, micro element) and aligns image features to learnable semantic anchors at each tier, providing transferable representations that resist distributional drift. The lifelong adaptation component sequentially absorbs new urban regions while constraining inter-region perception gaps through a worst-region sample-reweighting objective and a structure-aware exemplar buffer. We evaluate HVSP-LL on a panoramic streetscape benchmark that covers twelve cities across four continents and is annotated along seven perceptual dimensions. The framework attains a Spearman correlation of 0.834 on the held-out city sequence, an absolute improvement of 6.1 points over the strongest continual baseline, and shrinks the inter-city perception gap to 0.094, a 38% reduction relative to the strongest continual baseline (0.151) and a 57% reduction relative to a representative regularisation baseline (0.218). Ablations confirm that performance improves monotonically as each tier of the pivoting hierarchy is added, and that the equity-aware rehearsal raises mean backward transfer from -0.038 (without retention) to +0.013, eliminating catastrophic forgetting on the held-out sequence. Our results indicate that hierarchical anchoring is a practical pathway toward geographically equitable streetscape inference at city scale.
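The relative gap reductions quoted above follow directly from the reported gap values, and the backward-transfer figures can be read against the usual continual-learning definition. The sketch below is illustrative only: `relative_reduction` and `backward_transfer` are hypothetical helper names, the backward-transfer formula is the commonly used one (final score on each earlier region minus the score right after that region was learned, averaged) and is assumed rather than taken from this work, and the toy score matrix is invented purely to show the calculation.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Fractional reduction of `ours` relative to `baseline`."""
    return (baseline - ours) / baseline


def backward_transfer(scores: list[list[float]]) -> float:
    """Mean backward transfer under the common definition (an assumption here).

    scores[i][j] is the score on region j after training through region i.
    Positive values mean earlier regions improved after later training;
    negative values indicate forgetting.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two regions in the sequence")
    deltas = [scores[n - 1][j] - scores[j][j] for j in range(n - 1)]
    return sum(deltas) / (n - 1)


# Inter-city perception gaps reported in the abstract.
gap_ours, gap_continual, gap_regularisation = 0.094, 0.151, 0.218
print(f"{relative_reduction(gap_continual, gap_ours):.1%}")       # 37.7%
print(f"{relative_reduction(gap_regularisation, gap_ours):.1%}")  # 56.9%

# Toy 3-region score matrix (invented values, not results from the paper).
toy_scores = [
    [0.80, 0.00, 0.00],
    [0.78, 0.82, 0.00],
    [0.81, 0.83, 0.84],
]
print(round(backward_transfer(toy_scores), 3))  # 0.01
```

Rounded to whole percentages, the two printed reductions match the 38% and 57% stated in the abstract.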
