Recent advances in multimodal generative AI have the potential to democratize specialized architectural tasks, such as interpreting technical drawings and creating 3D CAD models, which traditionally require expert knowledge. This paper presents a comparative evaluation of two systems, GPT-4o and Claude 3.5, on the task of architectural 3D synthesis. We conduct a case study on two buildings from Palladio's Four Books of Architecture (1965): Villa Rotonda and Palazzo Porto. We prepared high-level architectural models and drawings of these buildings, inspired by Palladio's original texts and drawings. Through sequential text and image prompting, we assess the systems' abilities in (1) interpreting 2D and 3D representations of buildings from drawings, (2) encoding the buildings into a CAD software script, and (3) self-improving based on their outputs. Both systems successfully generate individual parts, but they struggle to assemble these parts accurately into the desired spatial relationships. Claude 3.5 performs better, particularly in self-correcting its output. This study contributes to ongoing research on benchmarking the strengths and weaknesses of off-the-shelf AI systems in performing intelligent human tasks that require discipline-specific knowledge. The findings highlight the potential of language-enabled AI systems to act as collaborative technical assistants in the architectural design process.