Automation in Interior Space Planning

In interior space planning, the furnishing stage usually entails manual iterative processes, including meeting design objectives, incorporating professional input, and optimizing design performance. Machine learning has the potential to automate and improve interior design processes while maintaining creativity and quality.

  • interior space planning
  • CGANs
  • image-to-image translation

1. Introduction

To provide a clear representation of a given design plan, comprehensive drawings of the walls, doors, windows, and furniture are required. Designers tend to manually depict their furnishing work using Computer-Aided Design (CAD) software based on their knowledge and experience and in line with their clients’ requests [1]. Optimally integrating furniture items within floor plans plays a pivotal role in various stages of the design process. Not only do furniture arrangements serve to unveil and communicate the quality of a given space, but they also accentuate the space’s functionality and performance. As such, when incorporating furniture items into floor plans, interior designers meticulously consider a wide array of factors in order to create a space that is both aesthetically pleasing and functional.
To assist with this process, architectural automation has been rapidly improving via the utilization of techniques such as parametric design and generative design within CAD software. The recent emergence of Machine-Learning (ML) and Artificial Intelligence (AI) techniques has introduced processes that impact a wide range of fields [2]. As these advances become more widely incorporated in architectural designs, it is critical to establish effective assessment methods for measuring their impact on design quality. In turn, it is essential to investigate how these technologies might be utilized to improve design processes while examining their possible challenges and limitations.

2. Automation in Interior Space Planning

2.1. Machine Learning in Architectural Design Processes

The rapidly expanding field of ML is frequently employed in applications such as online browsing, information filtering, and credit card fraud detection. One key strength of ML models is their ability to learn complex rules without being explicitly programmed, thereby achieving human-like performance [2]. Most modern ML models are based on artificial neural networks, which require large training datasets in order to achieve good results [3].
In architecture, ML techniques are used for a range of tasks. For example, in legacy 2D floor plan analyses, vision-based detection models are used to identify architectural objects, such as windows, doors, and furniture in the design [4]. In similar drawings, segmentation models are used for identifying room boundaries and tracing walls [5,6]. Optical Character Recognition (OCR) models help identify function types by reading texts on scanned floor plan images [7]. Owners and real estate agents could use such methods to reduce the need to create 3D models, which require greater time and effort. Additionally, semantic and graph-based analyses of Building Information Models (BIM) offer the ability to recognize a room type based on the objects within its boundaries and on the space features [8,9].
Chaillou [10] demonstrates the potential of CGAN-type ML models for generating space and furniture layouts from an apartment’s footprint. DPrix et al. [11] use similar models for generating new exterior designs based on an existing archive of images in their architecture firm. State-of-the-art image synthesis models, such as Midjourney [12], Stable Diffusion [13], and DALL-E 2 [14], have also been used to generate architectural images.
The novel method presented in this research study employs the CGAN-based generation method proposed by Chaillou [10], who provided the foundation of this line of research and demonstrated its feasibility. Chaillou also presented several ways in which CGANs can be used in interior design, such as room allocation, function assignment, and furnishing. The current study focuses on the interior furnishing of a single functional room, i.e., the bathroom, and provides a dataset and a comprehensive workflow that outlines the steps necessary for achieving the research aims. Finally, a set of evaluation criteria is presented for measuring the performance of the suggested method.

2.2. Interior Residential Design Furnishing

Throughout history, scientists and researchers have studied human body movements, with behaviors and patterns being collected as a basis for designing spaces that enhance levels of human comfort [15]. Indeed, object scales and sizes are often defined when positioned next to a human body [16]. Although definitions of terms such as “comfortable” or “qualified” spaces may be subjective, unified guidelines and standards are usually followed when designing spaces for humans.
To reduce the manual work entailed in creating and designing interior spaces, studies suggest employing interactive tools for positioning objects of furniture while responding to the users’ real-time selection of furniture pieces. Doing so could provide users with access to an editable library of furniture pieces [17]. Geman and Geman [18] address the concept of simulated annealing, which can be used to create indoor scenes for optimizing ergonomic factors. In a study conducted by [19], the researchers utilize the hierarchy between furniture objects to enable the generation of indoor furniture arrangements. Generating furniture layouts has also been achieved via geometrical analyses of objects, using algorithmic hyper-relations between and within groups of items to assign the positioning and orientation of each group. For example, the relationship between a group of objects, comprising a sofa, table, and television, could change based on the layout of the room [20]. Alternatively, Kim and Lee [21] generated furniture layouts based on a design-style perspective, whereby different types of rooms were created according to the desired style (casual, modern, classic, or natural). Their model’s dataset comprised perspective images of interior spaces in which each architectural style had distinct features, such as materials, forms, patterns, and colors.

2.3. Interior Residential Design Applications

PlanFinder is a software plugin for CAD and BIM software that enables designers to furnish spaces with greater ease and speed simply by entering a door point and clicking on the furnish button [22]. Finch 3D, another online service, leverages AI and graph-based algorithms to optimize building designs and also provides performance feedback, error detection, and optimal solution identification [23]. Finally, the web-based Rayon Design service offers an intuitive online interface that allows users to make changes to their floor plans, delete walls, and add furniture from a suggested set of furniture objects. Users must, however, have a drawn floor plan ready before uploading the design [24].

2.4. Conditional Generative Adversarial Network Models

Deep neural networks and other recent ML advancements have significantly expanded the scope and accuracy of generative modeling across domains to reflect the diverse and complex nature of real-world data. These highly varied models exhibit a wide range of capabilities and characteristics that depend on the applied algorithms and parameters. They include Generative Adversarial Networks (GANs), which offer a range of solutions based on their ability to learn to generate novel data from a given set of examples [25]. A GAN’s architecture consists of two neural networks that are trained in competition: the generator strives to create fake yet realistic data samples, while the discriminator attempts to distinguish between the real and fake samples. Through this adversarial process, the generator continuously improves its ability to generate increasingly realistic data, while the discriminator continuously improves its ability to tell real data from fake data [26]. GANs have been used in a wide range of applications, including image generation [27], text-to-image synthesis [28], and video synthesis [29]. Conditional GANs (CGANs) extend GAN models by taking a given condition as input, such as the category or class label desired for generation. The CGAN generator is trained to generate images according to the condition injected into it. The CGAN discriminator is trained not only to distinguish between real and fake data but also to take the added conditioning information into account. CGANs have been found to be useful in image-to-image translation tasks, where they translate an input image from one domain into a corresponding output image from a different domain; they do so by maintaining some elements of the original image as the basis of the newly generated one. This process strives to learn the mapping between input and output images so that the generated output image resembles the desired target image as closely as possible [30].
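The adversarial training described above is often written as a min-max objective. The following is a commonly used formulation for conditional GANs, given here as an illustration rather than as the exact objective used in the adapted paper. The symbols are assumptions introduced for this sketch: x is the conditioning input, y the real target, z a noise vector, G the generator, and D the discriminator.

```latex
\mathcal{L}_{cGAN}(G, D) =
  \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big],
\qquad
G^{*} = \arg\min_{G}\,\max_{D}\ \mathcal{L}_{cGAN}(G, D)
```

The discriminator tries to maximize this quantity by scoring real pairs (x, y) high and generated pairs (x, G(x, z)) low, while the generator tries to minimize it.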
The CGAN family includes many different models. For example, the pix2pix model learns a mapping from pairs of input–output images, such as label maps and their corresponding photographs, and then produces a single synthesized image. This model involves training a generator network based on the input image using two losses. The first is a regression loss that helps the generator produce an output image that is similar to a paired ground-truth image. The second is a learned discriminator loss that encourages the generator to produce realistic images [31]. This model has been tested on various architectural datasets, such as labels-to-street and labels-to-facades scenes [30]. An additional CGAN model is SPADE, which is based on a more advanced mechanism for turning a semantic label map into a photorealistic image. The model converts the labels and combines them with the style to generate the desired realistic content [32]. Finally, the BicycleGAN model offers multimodal image-to-image translation, generating multiple potential images for the same input. This model proposes a latent mapping vector between the generator and the discriminator, encouraging a bijection between the output and the latent space, thereby leading to less ambiguous and more diverse results [31].
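The two-part generator loss described for pix2pix can be sketched in a few lines. This is a minimal NumPy illustration, not the training code of the adapted paper: the function name `generator_loss`, the non-saturating form of the adversarial term, and the default weight of 100 (the value reported in the original pix2pix paper) are assumptions made for this example.

```python
import numpy as np

def generator_loss(disc_prob_fake, generated, target, l1_weight=100.0):
    """Pix2pix-style generator loss: adversarial term plus weighted L1 term.

    disc_prob_fake: discriminator probabilities that the generated images are real.
    generated, target: image arrays of identical shape.
    l1_weight: assumed weighting of the regression term (hypothetical default).
    """
    eps = 1e-12  # guards against log(0)
    # Learned discriminator loss: the generator wants D's output close to 1.
    adversarial = -np.mean(np.log(np.clip(disc_prob_fake, eps, 1.0)))
    # Regression loss: mean absolute pixel difference to the paired ground truth.
    l1 = np.mean(np.abs(generated - target))
    return adversarial + l1_weight * l1, adversarial, l1

# Toy usage with two 4x4 RGB "images" and a discriminator that is fully fooled.
target = np.zeros((2, 4, 4, 3))
generated = np.full((2, 4, 4, 3), 0.5)
total, adversarial, l1 = generator_loss(np.array([1.0, 1.0]), generated, target)
```

In this toy case the adversarial term vanishes and the loss is driven entirely by the pixel-wise difference, which shows how the regression term pulls the output toward the ground truth even when the discriminator is satisfied.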

2.5. CGAN Models in Architecture

Huang and Zheng [33] utilized a variant of pix2pix to segment room boundaries and detect doors and windows. In an additional study, Yang et al. [34] applied certain modifications to the standard CGAN model to create functional floor divisions. Next, they deployed a fully connected, three-tier neural network for each type of functional area to place each piece of furniture within each space. Yet, the researchers do not measure or quantify the quality of such placings. Chaillou [10] introduced the capabilities of CGAN models in generating interior design, floor plan layout, and urban footprint. His groundbreaking work demonstrates the potential of ML to generate architectural designs by filling in “empty” black-and-white architectural drawings. Chaillou [10] proposes using this framework to address the style and organization of different scale floor plans. Yet, the researcher does not provide comprehensive descriptions or details of the employed training processes, GAN models, or the employed settings. In this study, CGAN models such as pix2pix, SPADE, and BicycleGAN were trained to generate furnishing layouts in empty floor plans using the room data that were created in this study. This work presents the training process of the models together with the generated results and proposes various metrics for quantifying the quality of the generated results.

2.6. Available Architectural Datasets

Most Internet databases provide photographed perspective images of rooms that can be gathered by anyone, anywhere, via their mobile devices. Examples of such datasets include the 3D-FRONT benchmark, which comprises synthetic indoor scenes and a large number of rooms [35]. Kaggle, an online community of data scientists and ML professionals, hosts images of homes that were taken on mobile telephones and that appear on the RentHop website, which helps people find home rentals [36,37]. Floor plan datasets that depict walls, doors, windows, and room boundaries are available from three online sources: (1) Rent3D (R3D) [36], which contains 222 images of round-shaped layouts, as well as straight ones with nonuniform wall thickness; (2) Raster-to-Vector (R2V), which includes 815 images of rectangular shapes with uniform wall thickness [6,38]; (3) CVC-FP, which includes 500 images in two versions, i.e., original floor plans in black-and-white and color-mapped images [39,40]. The international Ikea furniture chain has an online furniture dataset that uses furniture objects from its product lists and 298 room scene photos to convey the context in which these objects can be used or placed [41]. Each of these public datasets was created for a specific research topic and does not meet the requirements of this research: the case study requires a flexible dataset of top-view room layouts that can be modified to include or exclude furnishings, so that the models can be trained to achieve the defined aims. A customized database was therefore created, as well as a tool that can generate various types of architectural datasets.

2.7. Machine Learning Dataset Formats and Available Annotation Tools

Comma-separated values (CSV) is a popular format for loading text data; a CSV file can contain numbers or strings organized in a tabular format [42]. JavaScript Object Notation (JSON) is another well-known format that is easy for humans to read and easy for machines to parse. This format is interchangeable between several programming languages and may hold a variety of data [43]. The Common Objects in Context (COCO) format is a specific JSON structure. COCO JSON files are commonly used for object detection and segmentation purposes; a single file can store the annotations of an entire dataset, and converters exist for numerous other formats [44]. Some online image annotation platforms provide segmentation polygons and bounding box detection, with a reference label to imported images. One example of such a platform is the CVAT website [45]. Image labeling necessitates a significant amount of time and resources since it is performed manually by annotators. As such, online dataset annotation platforms may hinder the efficiency with which custom data is created. Designers do not typically use these online platforms in their work, and, to the best of the authors’ knowledge, no online tagging tool exists that is compatible with architectural CAD programs. Thus, those who best understand and have the most to gain from ML design-automation methods, i.e., designers, are not included in the development processes.
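To make the difference between these formats concrete, the following sketch builds a minimal COCO-style annotation structure, round-trips it through JSON, and flattens it into CSV rows. The top-level keys (`images`, `categories`, `annotations`) and the `[x, y, width, height]` bounding-box convention follow the COCO format; the file name, labels, coordinates, and the helper `coco_to_csv` are hypothetical values invented for illustration.

```python
import csv
import io
import json

# Minimal COCO-style structure; all values below are made-up examples.
coco = {
    "images": [{"id": 1, "file_name": "bathroom_001.png", "width": 256, "height": 256}],
    "categories": [{"id": 1, "name": "toilet"}, {"id": 2, "name": "sink"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [30, 40, 50, 70], "area": 3500, "iscrowd": 0},
        {"id": 2, "image_id": 1, "category_id": 2,
         "bbox": [120, 40, 60, 45], "area": 2700, "iscrowd": 0},
    ],
}

def coco_to_csv(coco_dict):
    """Flatten COCO annotations into one CSV row per bounding box."""
    names = {c["id"]: c["name"] for c in coco_dict["categories"]}
    files = {i["id"]: i["file_name"] for i in coco_dict["images"]}
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file_name", "label", "x", "y", "width", "height"])
    for ann in coco_dict["annotations"]:
        writer.writerow([files[ann["image_id"]], names[ann["category_id"]], *ann["bbox"]])
    return buffer.getvalue()

# JSON keeps the nested structure intact; CSV needs it flattened into a table.
restored = json.loads(json.dumps(coco))
csv_text = coco_to_csv(restored)
```

The nested JSON keeps images, categories, and annotations as separate linked lists, whereas the CSV version repeats the file name and label on every row.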

2.8. Evaluation Metrics for CGAN Models

2.8.1. GANs Evaluation Metrics

When dealing with image-to-image models, the following two evaluation metrics are commonly used to measure similarities and differences between two sets of images and to evaluate the quality of the images generated by CGAN models: Fréchet Inception Distance (FID) and Kernel Inception Distance (KID) [46].
FID measures the distance between two multivariate Gaussian distributions that are fitted to the real and generated image features extracted using a pre-trained Inception-v3 network. The FID score is calculated by first computing the mean and covariance of the feature representations for the real and generated images separately. Next, the distance between the two sets of images is calculated from these statistics. Lower FID scores mean smaller distances, indicating higher similarity between the real and generated image distributions. However, FID requires a large dataset and can be overly sensitive to the choice of the pre-trained network used to extract the image features. It may also fail to capture certain aspects of image quality, such as diversity or coherence [47,48]. KID is a statistical method for measuring the similarity between two sets of data, such as real and generated images, and is likewise based on feature representations extracted using a pre-trained Inception-v3 network. Both FID and KID measure distances between the distributions of deep features of real and fake images; they differ in how they measure these distances. KID, however, can provide accurate results with less data since its expected value is independent of the sample size [46,49].
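The two distances can be sketched on pre-extracted feature vectors. This is a simplified NumPy illustration under stated assumptions: the features are random stand-ins rather than Inception-v3 activations, the function names are hypothetical, the KID sketch uses the cubic polynomial kernel commonly associated with that metric, and it computes a single unbiased estimate instead of averaging over several subsets as practical implementations usually do.

```python
import numpy as np

def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two feature sets (rows = samples)."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    # tr((cov_r cov_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_f) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt)

def kernel_distance(feats_real, feats_fake):
    """Unbiased squared MMD estimate with kernel k(a, b) = (a.b / d + 1)^3."""
    dim = feats_real.shape[1]

    def kernel(a, b):
        return (a @ b.T / dim + 1.0) ** 3

    k_rr = kernel(feats_real, feats_real)
    k_ff = kernel(feats_fake, feats_fake)
    k_rf = kernel(feats_real, feats_fake)
    m, n = len(feats_real), len(feats_fake)
    # Diagonal terms are excluded so that the estimate is unbiased.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return float(term_rr + term_ff - 2.0 * k_rf.mean())

# Stand-in features: 200 samples with 8 dimensions each (assumed sizes).
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(200, 8))
fake_feats = real_feats + 3.0  # shifted copy, i.e. a clearly different distribution
fid_value = frechet_distance(real_feats, fake_feats)
kid_value = kernel_distance(real_feats, fake_feats)
```

Because the unbiased KID estimate excludes the diagonal kernel terms, it can be slightly negative when the two sets come from the same distribution; values near zero indicate similar distributions.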

2.8.2. Object Detection Metrics

Precision, recall, and intersection over union (IOU) metrics are commonly used for evaluating computer vision tasks, such as object detection, tracking, and segmentation [50,51]. These metrics compare the generated image against its respective ground truth. Precision measures the proportion of true positive predictions out of the total number of positive predictions and can be seen as a measure of quality. Recall, on the other hand, measures the proportion of true positive results out of the total number of actual positive instances and can be seen as a measure of quantity [50]. Finally, the IOU is the area of the intersection of the predicted and ground-truth bounding boxes divided by the area of their union.
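These three metrics can be sketched for axis-aligned bounding boxes as follows. The box layout `(x_min, y_min, x_max, y_max)`, the greedy one-to-one matching, and the IOU threshold of 0.5 are assumptions chosen for this illustration; they are common conventions, not the evaluation procedure of the adapted paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(predicted, ground_truth, iou_threshold=0.5):
    """Greedily match each prediction to its best remaining ground-truth box."""
    unmatched = list(ground_truth)
    true_pos = 0
    for pred in predicted:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            true_pos += 1
            unmatched.remove(best)  # each ground-truth box can be matched once
    false_pos = len(predicted) - true_pos
    false_neg = len(unmatched)
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    return precision, recall

# Toy usage: two predicted furniture boxes against three ground-truth boxes.
predictions = [(0, 0, 2, 2), (10, 10, 12, 12)]
truths = [(0, 0, 2, 2), (5, 5, 7, 7), (20, 20, 22, 22)]
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
precision, recall = precision_recall(predictions, truths)
```

In the toy case one prediction matches a ground-truth box exactly, one prediction matches nothing, and two ground-truth boxes are missed, so precision is 1/2 and recall is 1/3.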
CGAN models are expected to generate new and optimally synthesized images while considering the overall features of the dataset. Although these three detection metrics cannot be used to compare the generated images to overall ground-truth ones, it is important to find an alternative that will enable the evaluation of each generated image against its respective ground truth to evaluate similarities and diversities of the output.

This entry is adapted from the peer-reviewed paper 10.3390/buildings13071793
