Computer vision models trained on Google Street View imagery can be used to create material cadastres. However, current approaches require manually annotated datasets, which are difficult to obtain and often suffer from class imbalance. To address these challenges, this paper fine-tunes a Swin Transformer model on a synthetic dataset generated with DALL-E and compares its performance against a model trained on a comparable manually annotated dataset. Although manual annotation remains the gold standard, the performance achieved with the synthetic dataset demonstrates that it is a reasonable alternative. These findings reduce the annotation effort needed to develop material cadastres, offering architects insight into opportunities for material reuse and thus contributing to the reduction of demolition waste.