人工智能生成大师级建筑设计

人工智能生成大师级建筑设计: History

View Latest Version

Please note this is an old version of this entry, which may differ significantly from the current revision.

Subjects: Architecture And Design

Contributor:

Junming Chen

Duolin Wang

Zichun Shao

Xu Zhang

Mengchao Ruan

Huiting Li

Jiaqi Li

建筑大师设计的卓越建筑是人类共同的财富，体现了他们的设计技巧和理念，是普通建筑设计师所不具备的。与依赖大量脑力劳动进行创新设计和绘图的传统方法相比，人工智能（AI）方法大大提高了设计过程的创造力和效率。它克服了传统扩散模型中生成高质量设计的指定风格困难的问题。

architectural design
text to design
design process optimization
design quality
design style
diffusion model

一、简介

1.1. 背景和动机

Often the icon of a city, excellent architecture can attract tourists and promote local economic development [1]. However, designing outstanding architecture via conventional design methods poses multiple challenges. For one thing, conventional design methods involve a significant amount of manual drawing and design modifications [2,3,4], resulting in low design efficiency [4]. For another thing, cultivating designers with superb skills and ideas usually proves difficult [4,5], hence low-quality and inefficient architectural designs [5,6]. Such issues in the construction industry warrant urgent solutions.

1.2. Problem Statement and Objectives

Artificial intelligence (AI) has been widely used in daily life [7,8,9,10]. Specifically, diffusion models can assist in addressing the low efficiency and quality in architectural design. Based on the machine learning concept, diffusion models are trained by learning knowledge from a vast amount of data [11,12] to generate diverse designs based on text prompts [13]. Nevertheless, the current mainstream diffusion models, such as Stable Diffusion [14], Midjourney [15], and DALL E2 [11], have limited applications in architectural design due to their inability to embed specific design style and form in the generated architectural designs (Figure 1).

Figure 1. Mainstream diffusion models compared with the proposed method for generating architectural designs. Stable Diffusion [14] fails to generate an architectural design with a specific style, and the image is not aesthetically pleasing (left panel). The architectural design styles generated by Midjourney [15] (second from the left) and DALL E2 [11] (third from the left) are incorrect. None of these generated images met the design requirements. The proposed method (far right) generates architectural design in the correct design style. (Prompt: “An architectural photo in the Shu Wang style, photo, realistic, high definition”).

2. Architectural Design

Architectural designing relies on the professional skills and concepts of designers. Outstanding architectural designs play a crucial role in showcasing the image of a city [2,3,4]. Moreover, iconic landmark architecture in a city stimulates local employment and boosts the tourism industry [1,25].

Designers typically communicate architectural design proposals with clients through visual renderings. However, this conventional method has low efficiency and low quality. The inefficiency stems from the complexity of the conventional design process involving extensive manual drawing tasks [2,5], such as creating 2D drawings, building 3D models, applying material textures, and rendering visual effects [26]. This linear design process restricts client involvement in the decision-making until producing the final rendered images. If clients find the design not to meet their expectations upon viewing the final images, designers must redo the entire design, leading to repetitive modifications [2,3,4]. Consequently, the efficiency of this design practice needs improvement [26].

建筑设计质量低下的原因在于优秀设计师培养难度大、设计能力提升过程漫长。设计者缺乏设计技能导致设计质量难以提高[ 4,5,6 ]。然而，设计能力的提升是一个渐进的过程，设计师必须不断学习新的设计方法，探索不同的设计风格[ 2,3,6,27,28,29 ] 。同时，在复杂条件下寻求最佳设计方案也给设计者带来了巨大的挑战[ 2 , 5 ]。

所有这些因素最终导致低效和低质量的架构设计 [ 2 , 5 ]。因此，必须及时将新技术引入建筑行业来解决这些问题。

3. 扩散模型

近年来，扩散模型迅速发展成为主流的图像生成模型[ 19,20,21,22,30,31 ] ，可以使设计者快速获取图像，从而显着提高建筑设计效率和质量[ 13 , 32 ]。

传统的扩散模型包括前向过程和后向过程。在前向过程中，噪声不断地添加到输入图像中，将其转变为噪声图像。后向过程的目的是从噪声图像中恢复原始图像[ 33 ]。通过学习图像去噪过程，扩散模型获得了生成图像的能力[ 32 , 34 ]。当需要生成具有特定设计元素的图像时，设计者可以将文本提示纳入扩散模型的去噪过程中，以生成一致的图像并实现对生成结果的可控性 [ 35、36、37、38、39 ]，40 ]。使用文本引导扩散模型进行图像生成的优点在于它们可以简单地控制图像的生成[ 12,14,41,42,43 ]。

尽管扩散模型在大多数领域都表现出色，但它们在建筑设计中的应用仍然有改进的空间[ 35 , 44 ]。具体来说，限制来自于获取大量互联网数据进行训练，而这些数据缺乏具有专业架构术语的高质量注释。结果，该模型在学习过程中无法在建筑设计和建筑语言之间建立联系，这使得使用专业设计词汇对建筑设计生成进行指导变得具有挑战性 [ 45,46,47,48]。因此，有必要收集高质量的建筑设计图像，用相关信息对其进行注释，然后对模型进行微调以使其适应建筑设计任务。

4. 模型微调

扩散模型通过针对新场景的整个再训练或微调来学习新的知识和概念。由于整个模型重新训练的成本巨大，需要大量图像数据集，并且训练时间较长[ 11 , 15 ]，模型微调是目前最可行的。

有四种标准微调方法。第一个是文本反转[ 11,36,46,49 ]，即冻结文本到图像模型，仅提供最合适的嵌入向量来嵌入新知识。该方法提供了快速的模型训练和最少的生成模型，但图像生成效果普通。第二种是Hypernetwork[ 47 ]方法，即在原始扩散模型的中间层插入一个单独的小神经网络来影响输出。该方法训练速度较快，但图像生成效果一般。第三个是LoRA[ 48]，即为跨层注意力分配权重，以允许学习新知识。该方法在中等训练时间后可以生成平均数百MB大小的模型，且图像生成效果较好。第四种是Dreambooth[ 45 ]方法，即对原始扩散模型进行整体微调。使用该方法，设计了先验保留损失来训练扩散模型，使其能够生成符合提示的图像，同时防止过度拟合[ 50 , 51 ]。命名新知识时建议使用稀有词汇，以避免由于与原始模型词汇相似而导致语言漂移 [ 50 , 51]。该方法只需要特定主题的 3 到 5 个图像以及相应的文本描述，即可针对特定情况进行微调，并将特定文本描述与输入图像的特征相匹配。微调模型根据特定主题词和一般描述符生成图像[ 31 , 46 ]。由于整个模型是使用 Dreambooth 方法进行微调的，因此产生的结果通常是这些方法中最好的。

This entry is adapted from the peer-reviewed paper 10.3390/buildings13092285

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply. By using this site, you agree to the Terms and Conditions and Privacy Policy.