The integration of Large Vision-Language Models (LVLMs) such as OpenAI's GPT-4 Vision into various sectors marks a significant step in artificial intelligence, particularly in the analysis and interpretation of visual data. This paper explores the practical application of GPT-4 Vision in the construction industry, focusing on its ability to monitor and track the progress of construction projects. Using high-resolution aerial imagery of construction sites, the study examines how GPT-4 Vision performs detailed scene analysis and tracks changes in site development over time. The findings show that GPT-4 Vision is proficient at identifying construction stages, materials, and machinery, but struggles with precise object localization and segmentation. Despite these limitations, the technology has considerable potential for future advancement. This research highlights the current state and opportunities of using LVLMs in construction and discusses future directions for enhancing the model's utility through domain-specific training and integration with other computer vision techniques and digital twins.