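Outstanding buildings designed by master architects are part of humanity's shared heritage. They embody design skills and ideas that ordinary architectural designers lack. Compared with conventional methods, which rely on intensive mental labor for creative design and drawing, artificial intelligence (AI) methods greatly improve the creativity and efficiency of the design process. The method discussed here addresses the difficulty that conventional diffusion models have in generating high-quality designs in a specified style.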
1. Introduction
1.1. Background and Motivation
Often the icon of a city, excellent architecture can attract tourists and promote local economic development [1]. However, designing outstanding architecture with conventional methods poses several challenges. First, conventional design methods involve a significant amount of manual drawing and design modification [2,3,4], resulting in low design efficiency [4]. Second, cultivating designers with superb skills and ideas is difficult [4,5], which leads to low-quality and inefficient architectural designs [5,6]. These issues in the construction industry warrant urgent solutions.
1.2. Problem Statement and Objectives
Artificial intelligence (AI) has been widely used in daily life [7,8,9,10]. In particular, diffusion models can help address the low efficiency and quality of architectural design. As machine learning models, diffusion models are trained on vast amounts of data [11,12] and generate diverse designs from text prompts [13]. Nevertheless, the current mainstream diffusion models, such as Stable Diffusion [14], Midjourney [15], and DALL·E 2 [11], have limited applications in architectural design because they cannot embed a specific design style and form in the generated architectural designs (Figure 1).
Figure 1. Mainstream diffusion models compared with the proposed method for generating architectural designs. Stable Diffusion [14] fails to generate an architectural design with a specific style, and the image is not aesthetically pleasing (left panel). The architectural design styles generated by Midjourney [15] (second from the left) and DALL·E 2 [11] (third from the left) are incorrect. None of these generated images meets the design requirements. The proposed method (far right) generates an architectural design in the correct style. (Prompt: “An architectural photo in the Shu Wang style, photo, realistic, high definition”).
2. Architectural Design
Architectural design relies on the professional skills and concepts of designers. Outstanding architectural designs play a crucial role in showcasing the image of a city [2,3,4]. Moreover, iconic landmark architecture stimulates local employment and boosts the tourism industry [1,25].
Designers typically communicate architectural design proposals to clients through visual renderings. However, this conventional method suffers from low efficiency and low quality. The inefficiency stems from the complexity of the conventional design process, which involves extensive manual drawing tasks [2,5] such as creating 2D drawings, building 3D models, applying material textures, and rendering visual effects [26]. This linear process keeps clients out of decision-making until the final rendered images are produced. If clients find that the design does not meet their expectations when they view the final images, designers must redo the entire design, leading to repeated modifications [2,3,4]. Consequently, the efficiency of this design practice needs improvement [26].
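The low quality of architectural design stems from the difficulty of training excellent designers and the long time needed to improve design ability. A lack of design skills makes it hard for designers to raise design quality [4,5,6]. Improving design ability is a gradual process: designers must continually learn new design methods and explore different design styles [2,3,6,27,28,29]. At the same time, finding the optimal design solution under complex conditions poses a great challenge for designers [2,5].
All of these factors ultimately lead to inefficient and low-quality architectural designs [2,5]. Therefore, new technologies must be introduced into the construction industry promptly to solve these problems.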
3. Diffusion Model
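Although diffusion models perform well in most fields, their application to architectural design still leaves room for improvement [35,44]. Specifically, the limitation arises because these models are trained on large amounts of internet data that lack high-quality annotations using professional architectural terminology. As a result, the model cannot establish a connection between architectural designs and architectural language during learning, which makes it challenging to guide architectural design generation with professional design vocabulary [45,46,47,48]. It is therefore necessary to collect high-quality architectural design images, annotate them with relevant information, and then fine-tune the model to adapt it to architectural design tasks.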
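The annotation step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `AnnotatedImage` class, the `file_name` and `text` keys, the file names, and the descriptor lists are hypothetical and are not taken from the source. Only the caption pattern mirrors the prompt quoted in Figure 1.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedImage:
    """One training image paired with professional design vocabulary (hypothetical layout)."""
    file_name: str          # path to the collected image
    style_token: str        # word or phrase naming the design style
    descriptors: List[str]  # general descriptors appended to the caption

    def caption(self) -> str:
        # Caption pattern modeled on the prompt quoted in Figure 1.
        parts = [f"An architectural photo in the {self.style_token} style"]
        parts.extend(self.descriptors)
        return ", ".join(parts)


def to_jsonl(records: List[AnnotatedImage]) -> str:
    """Serialize image/caption pairs, one JSON object per line."""
    lines = [
        json.dumps({"file_name": r.file_name, "text": r.caption()}, ensure_ascii=False)
        for r in records
    ]
    return "\n".join(lines)


# Illustrative records; the file names are placeholders, not real data.
records = [
    AnnotatedImage("img_001.jpg", "Shu Wang", ["photo", "realistic", "high definition"]),
    AnnotatedImage("img_002.jpg", "Shu Wang", ["photo", "realistic"]),
]
print(to_jsonl(records))
```

Keeping the style term and the general descriptors in separate fields is one way to make sure the same style word appears consistently in every caption, so that fine-tuning can associate it with the images.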
4. Model Fine-Tuning
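Diffusion models learn new knowledge and concepts for a new scenario through either full retraining or fine-tuning. Because retraining the whole model is extremely costly, requiring large image datasets and long training times [11,15], fine-tuning is currently the most feasible option.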
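There are four standard fine-tuning methods. The first is Textual Inversion [11,36,46,49], which freezes the text-to-image model and finds only the most suitable embedding vector to embed the new knowledge. This method offers fast training and the smallest resulting model, but the image generation quality is ordinary. The second is the Hypernetwork [47] method, which inserts a separate small neural network into the intermediate layers of the original diffusion model to influence the output. This method trains relatively quickly, but the image generation quality is average. The third is LoRA [48], which adds weights to the cross-attention layers so that new knowledge can be learned. After a moderate training time, this method produces models averaging several hundred MB in size, with good image generation quality. The fourth is the Dreambooth [45] method, which fine-tunes the original diffusion model as a whole. This method uses a prior-preservation loss to train the diffusion model so that it generates images that match the prompt while preventing overfitting [50,51]. Using a rare word to name the new knowledge is recommended, to avoid language drift caused by similarity to words in the original model's vocabulary [50,51]. The method needs only 3 to 5 images of a specific subject, together with corresponding text descriptions, to fine-tune the model for a particular case and match the specific text description to the features of the input images. The fine-tuned model then generates images from the specific subject word combined with general descriptors [31,46]. Because the entire model is fine-tuned, the Dreambooth method usually produces the best results among these methods.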
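The prior-preservation idea can be sketched numerically. The sketch below assumes the noise-prediction form of the loss, in which a mean squared error on the subject images is added to a weighted mean squared error on images generated by the original model. This formulation, the function names, and the `prior_weight` parameter are assumptions for illustration, and the toy numbers are not real model outputs.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a prediction and its target."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preserving_loss(
    pred_subject: Sequence[float],
    target_subject: Sequence[float],
    pred_prior: Sequence[float],
    target_prior: Sequence[float],
    prior_weight: float = 1.0,  # hypothetical name for the balancing factor
) -> float:
    """Subject term plus weighted prior term.

    The subject term fits the few new images tied to the rare subject word.
    The prior term keeps the model close to what it already generated for
    the general class, which is what discourages overfitting and drift.
    """
    subject_term = mse(pred_subject, target_subject)
    prior_term = mse(pred_prior, target_prior)
    return subject_term + prior_weight * prior_term


# Toy values standing in for predicted and true noise.
loss = prior_preserving_loss(
    pred_subject=[1.0, 1.0, 1.0, 1.0],
    target_subject=[0.0, 0.0, 0.0, 0.0],
    pred_prior=[2.0, 2.0, 2.0, 2.0],
    target_prior=[0.0, 0.0, 0.0, 0.0],
    prior_weight=0.5,
)
print(loss)  # 1.0 + 0.5 * 4.0 = 3.0
```

Setting `prior_weight` to zero reduces the loss to plain fine-tuning on the subject images. Larger values weight the model's existing knowledge more heavily relative to the new subject.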
This entry is adapted from the peer-reviewed paper 10.3390/buildings13092285