Cultural heritage restoration in Bangladesh faces a dual challenge of limited resources and scarce technical expertise. Traditional 3D digitization methods, such as photogrammetry or LiDAR scanning, require expensive hardware, expert operators, and extensive on-site access, requirements that are often infeasible in developing contexts. As a result, many of Bangladesh's architectural treasures, from the Paharpur Buddhist Monastery to Ahsan Manzil, remain vulnerable to decay and inaccessible in digital form. This paper introduces Oitijjo-3D, a cost-free generative AI framework that democratizes 3D cultural preservation. Using publicly available Google Street View imagery, Oitijjo-3D reconstructs faithful 3D models of heritage structures through a two-stage pipeline: multimodal visual reasoning with Gemini 2.5 Flash Image for structure-texture synthesis, followed by neural image-to-3D generation with Hexagen for geometry recovery. The system produces photorealistic, metrically coherent reconstructions in seconds, substantially faster than conventional Structure-from-Motion pipelines, without requiring specialized hardware or expert supervision. Experiments on landmarks such as Ahsan Manzil, Choto Sona Mosque, and Paharpur demonstrate that Oitijjo-3D preserves both visual and structural fidelity while drastically lowering economic and technical barriers. By turning open imagery into digital heritage, this work reframes preservation as a community-driven, AI-assisted act of cultural continuity for resource-limited nations.