A growing number of soundscape studies involving audiovisual factors have been conducted; however, their bimodal and interactive effects on indoor soundscape evaluations have not yet been thoroughly reviewed. The overarching goal of this systematic review was to develop a framework for designing sustainable indoor soundscapes by focusing on audiovisual factors and their relations. Individual studies were searched for in three databases and search engines: Scopus, Web of Science, and PubMed. Based on a qualitative review of the thirty selected papers, a framework for indoor soundscape evaluation concerning visual and audiovisual indicators was proposed. Overall, greenery was the most important visual variable, followed by water features, in moderating the noise annoyance perceived by occupants in indoor environments. The presence of visual information and the visibility of sound sources can moderate perceived noise annoyance and influence other audio-related perceptions. Furthermore, sound sources can affect multiple perceptual responses (audio, visual, cognitive, and emotional) related to the overall soundscape experience when certain visual factors are interactively involved. The proposed framework highlights how the bimodality and interactivity of audiovisual factors can be used to design indoor sound environments more effectively.
In environmental acoustics research, the term “soundscape” is defined as the acoustic environment as perceived or experienced and/or understood by a person or people, in context. Soundscape approaches help create healthy and comfortable sound environments for people to live in by improving the quality of the sound experience rather than merely reducing unwanted sound stimuli, treating sound as a resource rather than a waste. Because soundscapes integrate human perceptual constructs and experiences with physical acoustic phenomena across various environmental settings, they apply to almost any environment in which people perceive or experience sound.
Although the soundscape approach originated in outdoor environmental research, its application has recently been extended to indoor built spaces (i.e., indoor soundscapes). Several categories of indoor spaces have been suggested for indoor soundscape evaluations, such as industrial/commercial buildings, music venues, and transportation spaces. Among these, non-industrial building spaces, including workspaces (e.g., offices, classrooms) and homes (e.g., residential buildings, apartments), should be the primary focus because people living in modern cities spend more of their time in them. Therefore, it is crucial to focus on these types of indoor spaces and to identify potential factors that influence their soundscapes and environmental assessments.
Factors related to the audio environment are frequently used to improve soundscape perceptions. Torresin et al. stated that a large part of the indoor soundscape literature reflects a general effort to minimize noise annoyance by reducing noise exposure (i.e., noise levels). Altering the degree of noise exposure is clearly expected to change, and to predict, responses in the same domain (e.g., audio-related perceptions); however, relying on such a unimodal effect may not be a feasible solution, because reducing noise exposure does not necessarily produce better soundscape perceptions.
Because a variety of multisensory environmental factors and their variations jointly influence the soundscape experience in built environments (pp. 17–41), their potential impacts on human perception should not be neglected; in particular, a variety of non-acoustical factors have been proposed as affecting soundscape perceptions. By reviewing a large body of indoor soundscape studies, we found that various categories of non-acoustical factors have been proposed as influencing acoustic perception in indoor residential buildings: urban context (e.g., presence of green space, sea views from home), house-related (e.g., room location), person-related (e.g., age, gender, noise sensitivity), socio-economic (e.g., education level, income), and so on. Although listing these potential factors enriches the existing framework for soundscape design, the evidence for their effects remains ambiguous; therefore, the current evidence is insufficient to make full use of non-acoustical factors in providing better soundscapes for future sustainable designs and human well-being.
Among the potential factors unrelated to the audio environment, visual or visual-related features are among the most prominent non-auditory factors, for two reasons. First, there is supportive evidence that visual stimuli influence the auditory system, including at the perceptual level. In humans, the auditory and visual systems are two sensory modalities with distinct cortical representations; however, their signals are often associated with the same objects and events, and binding the two together happens naturally and effortlessly. Reviewing recent studies of audiovisual interactions, Bulkin et al. highlighted the perceptual advantages of combining information from these two modalities, as the roles of the visual and auditory systems overlap. It was also stated that predominantly unimodal brain regions play a role in multisensory processing. Thus, it is reasonable to propose visual factors as the most promising features potentially influencing audio-related perceptions. Second, there has been growing interest in combined audiovisual effects or interactions, suggesting that visual factors play a critical role in altering soundscape perceptions.
An initial prescreening of literature review articles published on the topic of soundscapes was conducted to highlight research interest in audiovisual effects on soundscape perceptions. Of 27 recent review articles in the soundscape literature, 12 systematic reviews (i.e., articles that explicitly state in their methods sections that they followed systematic procedures for data extraction or investigation) were identified and further divided into two categories: location-specific and concept-specific. Of the seven articles focusing on specific locations (e.g., outdoors, indoor spaces, residential spaces), two reviews intensively explored existing studies examining audiovisual effects. One focused on the effects of greenery, treated as a visual measure, on annoyance perceptions in indoor residential settings; however, no other soundscape perceptions or visual elements were examined. The other examined audiovisual interactions in urban built spaces, but it used only one search engine (Scopus) and did not specifically examine indoor effects, as its scope was broader. None of the remaining ten systematic reviews involved visual measures, sought bimodal or interactive effects of audio and visual factors, or specified these effects on soundscape evaluations. The remaining fifteen of the 27 review articles were non-systematic (because they did not follow a systematic reviewing process), but one of these, by van Renterghem, focused on the effect of visual factors on acoustical perceptions and highlighted the positive impact of visible vegetation in mitigating negative perceptions of environmental noise, mostly noise annoyance. Some of the non-systematic reviews partially discussed the bimodal and interactive effects of audiovisual factors. In particular, Torresin et al. distinguished the crossed (bimodal, in the terminology of the present study) and interactive effects of four IEQ (Indoor Environmental Quality) factors (i.e., acoustical, thermal, visual, and indoor air quality) and mentioned a few research papers examining audiovisual interactions on human perception in indoor built environments; however, insufficient evidence was identified. The rationale for mapping the identified existing reviews is illustrated in Figure 1.
Figure 1. Rationale for mapping the soundscape review articles (n = 27, published from 2000 to 2020) with respect to audio, Indoor Environmental Quality (IEQ), and audiovisual effects. The initial prescreening of literature review papers published on the topic of soundscapes was conducted to clearly identify the research gap that this systematic review aims to cover.
Although several audiovisual factors have been suggested in previous literature surveys, none of the articles reviewed above deliberately explored the existing literature on audiovisual bimodal and interactive effects on indoor soundscape assessments. Thus, an updated review of individual studies is needed. In addition, there is a lack of a framework showing the bimodal and interactive effects of audiovisual factors on indoor soundscape evaluations, whereas their unimodal influences are commonly acknowledged, and most studies have been conducted in outdoor environments. Moreover, to formulate a framework for sustainable soundscape development, it is crucial to make use of the most prominent visual factors. Therefore, clarifying the impact of visual and visual-related features on soundscape perceptions is essential.
The objective of this paper was to develop a framework for designing sustainable indoor soundscapes by systematically reviewing existing research papers involving audiovisual bimodal and interactive effects on soundscape evaluations, assessing their research methods and procedures, and identifying potential indicators influencing soundscape perceptual responses in indoor environments. Achieving this objective provides concise assessment schemes for indoor soundscape methodologies concerning audiovisual factors and evidence-based suggestions of the indicators that potentially influence indoor soundscapes. The present study addresses the following two questions: (1) What evidence exists for the connections between audio, visual, and combined audiovisual factors and perceptual dimensions, and which perceptual dimensions are affected the most? (2) Which audio and visual factors contribute most to the bimodal and interactive impacts on soundscape-related perceptual dimensions?
As multiple environmental factors are inherently involved in soundscape experiences, their effects should be precisely defined and distinguishable from one another. To clarify this aspect, statistical interpretations and definitions of the terms unimodal, bimodal, and interactive effects are briefly introduced (pp. 129–158). Assuming a linear effect of a factor on a single response criterion, one can examine the main effect of X (the independent variable) on Y (the dependent variable). If Y is an audio-domain variable (e.g., noise annoyance) and X is a visual-domain variable (e.g., presence of greenery), the model estimates a bimodal effect because it involves two different domains (i.e., audio and visual). Potential impacts of visual factors on audio-domain perceptions, or vice versa, are included in the present study. In contrast, when X is an audio-domain variable (e.g., noise level) predicting a Y in the same domain, the model estimates a unimodal effect, which is outside the scope of this study. Additive effects (i.e., the joint effect of a set of multiple independent variables) are not considered, because the independent variables are then treated as a single unit whose contributions are summed, which obscures the unique effects of the target variables. Furthermore, if X is an audiovisual interaction term (i.e., the product of audio- and visual-domain independent variables), the model estimates the audiovisual interactive (combined) effect, which is also included in the present study. If audiovisual interactions are confirmed, Y need not be limited to the audio or visual domains and can include various human perceptions such as psychological/emotional, physiological, behavioral, cognitive, and social responses (e.g., overall satisfaction, cognitive task performance).
Moderators and mediators are excluded from this study because such variables are not commonly used, and bimodality and interactivity are more relevant in the soundscape literature.
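To make the distinction between unimodal, bimodal, and interactive effects concrete, the following Python sketch fits three ordinary least-squares models to simulated data with NumPy. Everything in it is a hypothetical illustration: the variable names, the 40 to 70 dB noise range, the binary greenery indicator, and all coefficients are invented and do not come from the reviewed studies. The interactive model retains both main effects alongside the product term, which is common practice when estimating an interaction. This is our assumption for the sketch, not a claim about how the reviewed studies specified their models.

```python
import numpy as np

# Simulated data only: every variable, range, and coefficient below is a
# hypothetical choice for illustration, not a result from the reviewed studies.
rng = np.random.default_rng(seed=0)
n = 400

noise_db = rng.uniform(40.0, 70.0, n)  # audio-domain factor (noise level, dB)
noise_c = noise_db - 55.0              # centred so main effects stay interpretable
greenery = rng.integers(0, 2, n)       # visual-domain factor (0 = absent, 1 = visible)

# Audio-domain response (e.g., a noise-annoyance score) built from a bimodal
# visual effect (-0.8) and an audio x visual interaction (-0.03 per dB).
annoyance = (5.0 + 0.10 * noise_c - 0.8 * greenery
             - 0.03 * noise_c * greenery + rng.normal(0.0, 0.3, n))


def fit_ols(predictors, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Unimodal: audio-domain X -> audio-domain Y (outside the review's scope).
unimodal = fit_ols([noise_c], annoyance)
# Bimodal: visual-domain X -> audio-domain Y.
bimodal = fit_ols([greenery], annoyance)
# Interactive: product term noise_c * greenery, with both main effects retained.
interactive = fit_ols([noise_c, greenery, noise_c * greenery], annoyance)

print("unimodal slope (noise):     ", round(unimodal[1], 3))
print("bimodal effect (greenery):  ", round(bimodal[1], 3))
print("interaction (noise x green):", round(interactive[3], 3))
```

The negative interaction coefficient in this simulated case means that the noise-annoyance slope is shallower when greenery is visible. A single-predictor bimodal model cannot express this kind of dependence.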
This study systematically reviewed existing research papers involving audiovisual bimodal and interactive effects on soundscape evaluations and identified potential indicators of soundscape perceptual responses in indoor settings. A total of 30 studies were reviewed and summarized in terms of their characteristics, including study designs, methodologies, analytical models and variables, main findings, and effect types. Contextual differences, including the functions of the examined indoor spaces, the context of the target visual factors (i.e., indoor, outdoor, or interior components), and the observation conditions (e.g., the point of view of the evaluations), were also discussed. Overall, most of the visual factors examined were features of outdoor environments, although their contexts were highly diverse. Evaluations of residential spaces typically reflected their function as general living spaces for residents, and the living room was commonly selected as the most representative space for this function. Most findings concerned positive effects of visual factors in moderating noise annoyance in indoor residential spaces, pointing toward annoyance-free soundscapes. Studies focusing on office spaces examined conditions in which people worked or took a break from work, so expectations of this space are likely to be more complex. In addition to window views of outdoor environments, some interior components (e.g., water fountains, partitions, lighting) were suggested for improving soundscapes in office spaces; these may be more practical because office workers (i.e., users) can adjust them. The findings of these studies include positive audiovisual interactive impacts on perceived restorativeness, relaxation, and pleasantness, pointing toward fatigue-free soundscapes, and on task performance, enhancing workers’ job performance.
Based on the selected studies, a framework for designing sustainable indoor soundscapes has been proposed. It further suggests assessment schemes for indoor soundscapes concerning audio, visual, and/or audiovisual factors for designing sustainable sound environments. The two research questions are answered as follows.
The perceptual dimension “noise annoyance” was the one most frequently researched and most affected by audio and visual factors in indoor environments. Six solo-indicator categories and three interactive-indicator categories were identified as potential factors influencing noise annoyance responses, and the parameters and directions of their impacts were highlighted for each indicator. Overall, greenery and water views were found to have positive effects in moderating noise annoyance, whereas views of traffic roads and noise barriers had negative effects. However, the evidence on greenery was contradictory, as its positive impact on noise annoyance reduction may not hold under controlled experimental settings. Additionally, improving the physical properties and presence of visual information about sound sources generally moderated perceived noise annoyance, as such informative visual contexts may act as positive distractions that draw attention away from negative soundscape responses. Furthermore, significant interactions between these indicators and/or other potential audiovisual factors influenced noise annoyance responses. Nonetheless, the directions of the interactive effects remain unstable and inconclusive.
Based on the reviewed literature, greenery was found to be the most promising variable, followed by water features, both of which generally moderate the noise annoyance perceived by occupants in indoor environments. The amount of these visual elements, measured with objective and subjective parameters, frequently predicted the noise annoyance response. As the greenery and water factors identified in this review are generally outdoor landscape components, either perceived from indoors or observed in neighborhood spaces, they should be considered in designs by urban designers or landscape planners.
Although both bimodal and interactive audiovisual effects on noise annoyance in indoor environments were found in this review, the former were more often found in observational studies, whereas both bimodal and interactive effects were extensively examined in interventional studies. This may be because field or in-situ studies cannot effectively control audiovisual variables, whereas laboratory experiments offer much greater control over these interactive variables. As this review found stronger evidence for the bimodal effect of greenery on noise annoyance than for its interactive effect, further investigation of audiovisual interactive effects using valid measures of the interaction factors is required, preferably with in-situ study designs.
Although, as discussed above, greenery was the indicator with the most evidence for a bimodal impact on the noise annoyance response, its interactive effects still seem minor. Regarding the interactivity of audiovisual factors, the combination of the physical properties and presence of visual information and/or other potential audiovisual factors can moderate perceived noise annoyance and influence other audio- and visual-related perceptions. Sound sources emerged as one of the most probable indicators that interact with visual factors and influence perceptions in all four perceptual domains: audio, visual, cognitive, and emotional. Thus, researchers should bear in mind that the choice of sound sources (e.g., masking or background sounds) may significantly affect multiple perceptual responses related to the overall soundscape experience when certain visual factors are interactively involved. Acoustic experts or consultants may consider a proper selection of indoor sound sources or other sound-related interior components and provide suitable recommendations to designers or end users for improving sound environments. In contrast to the noise annoyance response, which was frequently influenced by the bimodality of greenery, other perceptual responses such as loudness, visual pleasantness, and restorativeness tended to be influenced by both the bimodality and the interactivity of audiovisual factors. For these perceptions, the bimodality of audio or visual factors seems more apparent in audio and visual perceptions, whereas the interactivity of audiovisual factors is seen more in perceptions spanning multiple domains, including cognitive and emotional ones. Such results provide useful insight into the practical implementation of soundscape design.
Considering the bimodality and interactivity of audiovisual components, one can use the bimodality of auditory stimuli to influence visual perception, and vice versa. In contrast, the interactivity of audiovisual stimuli may be more suitable for changing perceptions in multiple domains, including non-auditory ones. Although unimodal effects on soundscapes (e.g., the effect of acoustic stimuli on audio perceptions) are better established than bimodal and interactive effects, changing unimodal factors may offer only limited solutions in practice. When bimodality and interactivity are both taken into account, the number of possible solutions multiplies as more factors and factor levels are involved. However, as pointed out in a previous study, some combinations of audiovisual factors may be experienced as incongruent and unrealistic stimuli. Furthermore, selecting the most suitable and feasible implementations, as well as managing their budgets, should be handled rigorously.
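A short Python count can illustrate how quickly the number of candidate audiovisual solutions grows compared with a unimodal intervention. The three factors and their levels below are hypothetical placeholders chosen for illustration and are not drawn from the reviewed studies. The sketch only counts combinations and does not model any perceptual effect. In practice, incongruent or unrealistic combinations, as noted above, would still have to be screened out of the full set.

```python
from itertools import combinations, product
from math import prod

# Hypothetical design options; these factors and levels are illustrative only
# and are not drawn from the reviewed studies.
factors = {
    "sound_source": ["traffic", "birdsong", "water", "masking noise"],
    "window_view": ["none", "street", "greenery"],
    "interior_element": ["none", "water fountain", "plants"],
}

# Options when only the audio factor is varied (a unimodal intervention).
n_unimodal = len(factors["sound_source"])

# Every combination of levels across all factors (a full factorial set).
all_combinations = list(product(*factors.values()))
n_combinations = prod(len(levels) for levels in factors.values())
assert n_combinations == len(all_combinations)

# Number of two-way factor pairs whose interaction could be examined.
n_pairs = len(list(combinations(factors, 2)))

print(n_unimodal, n_combinations, n_pairs)  # prints: 4 36 3
```

Adding a fourth factor with three levels would triple the 36 combinations to 108. This multiplicative growth is why selecting feasible implementations requires careful handling.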
Taken as a whole, instead of reducing unfavorable auditory stimuli (e.g., noise levels), the proposed framework highlights improving occupants’ indoor soundscape experiences by adding non-auditory, in particular visual, factors. This opens up more possibilities and greater versatility for designing indoor sound environments in more sustainable ways.