Our study introduces a novel, low-cost, and reproducible framework for real-time, object-level structural assessment and geolocation of roadside vegetation and infrastructure using commonly available but underutilized dashboard camera (dashcam) video data. We developed an end-to-end pipeline that combines monocular depth estimation, depth error correction, and geometric triangulation to generate accurate spatial and structural data from street-level video streams recorded by vehicle-mounted dashcams. Depth maps were first estimated using a state-of-the-art monocular depth model, then refined with a gradient-boosted regression model to correct underestimation, particularly for distant objects. The depth correction model achieved strong predictive performance (R² = 0.92, MAE = 0.31 on the transformed scale), significantly reducing bias beyond 15 m. Object locations were then estimated using GPS-based triangulation, and object heights were calculated using pinhole camera geometry. Our method was evaluated under varying conditions of camera placement and vehicle speed. The low-speed configuration with the camera mounted inside the vehicle gave the highest accuracy, with a mean geolocation error of 2.83 m and a mean absolute error (MAE) in height estimation of 2.09 m for trees and 0.88 m for poles. To the best of our knowledge, this is the first framework to combine monocular depth modeling, triangulated GPS-based geolocation, and real-time structural assessment of urban vegetation and infrastructure using consumer-grade video data. Our approach complements conventional remote sensing (RS) methods, such as LiDAR and aerial imagery, by offering a fast, real-time, and cost-effective solution for object-level monitoring of vegetation risks and infrastructure exposure, making it especially valuable for utility companies and urban planners aiming for scalable and frequent assessments in dynamic urban environments.
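The two geometric steps named in the abstract, pinhole-based height estimation and triangulated geolocation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a focal length expressed in pixels, the local planar east/north frame in metres, and the assumption that each detection yields a compass bearing toward the object are all hypothetical choices made for illustration only.

```python
import math


def object_height_m(pixel_height, depth_m, focal_length_px):
    """Pinhole camera relation: real height = pixel height * depth / focal length.

    Assumes the object is roughly fronto-parallel to the image plane and that
    depth_m is the (corrected) distance from the camera to the object.
    """
    return pixel_height * depth_m / focal_length_px


def triangulate_xy(p1, bearing1_deg, p2, bearing2_deg):
    """Intersect two bearing rays in a local planar frame (x = east, y = north).

    p1 and p2 are camera positions in metres (e.g. GPS fixes projected to a
    local frame); bearings are measured clockwise from north, in degrees.
    Returns the (x, y) intersection point of the two rays.
    """
    # Unit direction vectors for each bearing.
    d1 = (math.sin(math.radians(bearing1_deg)), math.cos(math.radians(bearing1_deg)))
    d2 = (math.sin(math.radians(bearing2_deg)), math.cos(math.radians(bearing2_deg)))

    # Solve p1 + t1 * d1 = p2 + t2 * d2 for t1 using 2D cross products.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        raise ValueError("Bearings are parallel; the rays do not intersect.")
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


if __name__ == "__main__":
    # Hypothetical values: a 200 px tall object at 10 m, focal length 1000 px.
    print(object_height_m(200, 10.0, 1000.0))
    # Two camera positions 10 m apart, sighting the same object.
    print(triangulate_xy((0.0, 0.0), 45.0, (10.0, 0.0), 315.0))
```

With the illustrative inputs above, the height relation gives 2 m, and the two rays meet at approximately (5, 5) in the local frame. In practice the bearings would be noisy, so a least-squares intersection over more than two frames would likely be more robust than a two-ray solution.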