Immersive Virtual Reality (IVR) is a simulation technology used to deliver multisensory information to people under different environmental conditions. When IVR is applied in urban planning and soundscape research, it opens attractive possibilities for assessing urban sound environments with greater immersion for participants. In virtual sound environments, various topics and measures are designed to collect subjective responses from participants under simulated laboratory conditions. Soundscape or noise assessment studies conducted during virtual experiences adopt an evaluation approach similar to in situ methods.
Ecological validity was introduced in the 1980s to evaluate the outcomes of laboratory experiments on visual perception. It describes the degree to which results obtained in a controlled laboratory experiment relate to those obtained in the real world. Discussion of the ecological approach with respect to internal validity and experimental control began in the 1980s in cognitive and behavioral psychology research, and these two factors remain central to the design and conduct of ecological-approach studies. Under laboratory conditions, researchers should give participants appropriate environmental cues and instructions so that the cognitive processes participants use in actual situations are reactivated. When ecological validity is high, laboratory findings can be generalized to real-life settings. As a simulation technology, IVR places the user inside an experience, which makes it possible to assess how a new environment with complex social interactions and contexts affects participants. In 2001, Bishop et al. reported non-IVR assessments of path choices on a country walk and noted that faster computers and better display systems make the virtual environment experience more credible. Low ecological validity resulting from insufficient immersion could therefore limit the generalizability of data collected in laboratory experiments. Ever since environmental simulation emerged as a research paradigm, researchers have stressed the need for more work on applications of perceptual simulation in general and on the related questions of validity and reliability.
Ecological validity has been conceptualized through two approaches: verisimilitude and veridicality. Verisimilitude refers to the extent to which a virtual experience resembles relevant environmental behaviors. It reflects the similarity between the task demands of the laboratory test and those of the real world. This approach attempts to create new assessments designed around ecological goals. Veridicality refers to the degree of accuracy with which some environmental behaviors are predicted. Establishing veridicality requires comparing the results of the laboratory test with measures taken in the real world. Both approaches have limitations. For veridicality, conditions that are unlikely to occur in the real world, or that are too costly to reproduce there, leave no real-world measures against which experimental results can be correlated. The verisimilitude approach, when used alone, has the opposite weakness: it requires no empirical data to claim that an evaluation resembles real-life settings.
Virtual reality has brought about a functional rapprochement that blurs the boundary between the laboratory and real life. When multisensory stimuli are presented under experimental control, participants tend to respond realistically to virtual situations, as if they were in a real environment. Such realistic responses arise when place illusion (PI) and plausibility illusion (Psi) occur at the same time. Ecological-approach studies based on virtual reality can provide controlled, dynamic presentations of background narratives that enhance affective experience and social interaction. From a methodological viewpoint, environmental conditions and test results can be ecologically validated through virtual reality technologies within a subjective evaluation framework. Numerous researchers have examined ecological validity across different topics and fields by comparing virtual environments with real life.
Spatial audio is a technique for creating sound in 3D space so that a listener can hear sound arriving from any direction on a sphere. Because of this feature, it is often combined with virtual reality to render auditory stimuli. For auralization of spatial audio with binaural reproduction, head-tracking is required to reproduce a dynamic sound field that follows the real-time position of the head in Euclidean space. Binaural recordings capture the sound field at the two ears only as it was at the time of recording, with the head orientation fixed, so they are incompatible with head-tracking during auralization. Ambisonics is a technique for recording and playing back spatial audio that is based on the spherical harmonic decomposition of the sound field. Originally introduced by Gerzon, it enables a listener to experience a spatially accurate perception of the sound field. In first-order Ambisonics (FOA), currently the most widespread Ambisonics recording technique, the signals are recorded as four audio channels, most commonly in the so-called A-format. The audio files needed to reproduce such recordings are known as B-format files and are converted from the A-format. B-format signals can be decoded to any loudspeaker array to meet the needs of dynamic auralization in IVR. The approach also extends to higher-order Ambisonics (HOA), which offers higher spatial resolution through higher-order spherical harmonics. The head-related transfer function (HRTF) is a frequency response describing how sound pressure is transformed between a free-field point source and the eardrum. When binaural filtering does not use the listener's own HRTF (which reflects individual head size, auricle size and shape, etc.), front–back and elevation confusions in localization typically occur.
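A minimal Python sketch can make the A-format/B-format conversion and the head-tracking step concrete. It assumes a tetrahedral microphone with capsules labelled FLU, FRD, BLD and BRU (front-left-up, front-right-down, back-left-down, back-right-up). It omits the capsule-spacing correction filters that real converters apply and uses a 0.5 gain as one possible normalization. The function names are hypothetical. The rotation step shows why B-format suits head-tracking: a head yaw can be compensated by counter-rotating the X and Y channels before binaural decoding. A fixed binaural recording cannot be adjusted this way.

```python
import math
from typing import List, Sequence, Tuple

Signal = List[float]


def a_to_b_format(flu: Sequence[float], frd: Sequence[float],
                  bld: Sequence[float], bru: Sequence[float]) -> Tuple[Signal, Signal, Signal, Signal]:
    """Convert tetrahedral A-format capsule signals to first-order B-format (W, X, Y, Z).

    Simplified sketch: sum/difference matrix only, no capsule equalization.
    The 0.5 gain is an assumed normalization, not a universal convention.
    """
    w, x, y, z = [], [], [], []
    for a, b, c, d in zip(flu, frd, bld, bru):
        w.append(0.5 * (a + b + c + d))  # omnidirectional pressure
        x.append(0.5 * (a + b - c - d))  # front minus back
        y.append(0.5 * (a - b + c - d))  # left minus right
        z.append(0.5 * (a - b - c + d))  # up minus down
    return w, x, y, z


def rotate_yaw(w: Sequence[float], x: Sequence[float], y: Sequence[float],
               z: Sequence[float], head_yaw_rad: float) -> Tuple[Signal, Signal, Signal, Signal]:
    """Counter-rotate an FOA sound field to compensate a head yaw.

    Positive yaw means the head turns left (counterclockwise seen from above),
    so a source at azimuth phi is moved to phi - yaw relative to the head.
    W and Z are unaffected by a rotation about the vertical axis.
    """
    c, s = math.cos(head_yaw_rad), math.sin(head_yaw_rad)
    x_rot = [xi * c + yi * s for xi, yi in zip(x, y)]
    y_rot = [yi * c - xi * s for xi, yi in zip(x, y)]
    return list(w), x_rot, y_rot, list(z)
```

For example, a source directly to the left (X = 0, Y = 1) ends up straight ahead (X close to 1, Y close to 0) after the head turns 90 degrees to the left. A complete renderer would then decode the rotated signals to binaural output through HRTFs, ideally individualized ones, which this sketch does not cover.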
In many urban sound assessment studies, in situ surveys have been widely applied as the conventional method for evaluating specific sound environments. In soundscape or noise assessment studies, researchers often want to present participants with controlled experimental conditions, e.g., recorded audio and reconstructed visual stimuli in a listening room. Researchers therefore introduced laboratory tests to address their research questions with human participants. All simulations under laboratory conditions attempt to represent some aspects of the environment as accurately as possible in order to assess human responses. In urban noise prediction and soundscape assessment research, an audio–visual system is a conventional and valid way to render essential information or cues to participants. As previous studies have shown, audio–visual interaction influences the perception of both the soundscape and the global environment. For interior spaces, VR-based studies have addressed the evaluation of indoor noise protection with head-mounted displays (HMDs), the main uses of auralization, the influence of visual distance, the use of water features and the spatial representations of visually impaired participants. In this review, the urban sound environment refers primarily to sound sources originating outdoors or in urban public spaces, and it reflects, to some extent, the mobility of people and the multifunctionality of urban spaces.
Multisensory evaluation methods are highly significant in helping participants perceive environments holistically. The reproduction system used in listening tests needs to be adapted to the purpose of the study. Subjects can then treat the test samples as potentially familiar experiences and engage the cognitive processes they use in actual situations. With the aid of immersive virtual reality, laboratory setups can be built to reproduce urban sound environments and present a multisensory experience to participants. A subjective test using immersive virtual reality reproduction in urban sound environment assessments would show high veridicality if its results correlated well with perceptual responses measured in the real world.
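One simple way to quantify this veridicality criterion is to correlate ratings of the same scenes collected in the laboratory and in situ. The Python sketch below computes Pearson's r together with the mean paired difference. A high correlation shows only that scenes are ranked consistently; it does not show that ratings sit at the same level. Pairing ratings by scene and the function names are illustrative assumptions, not a prescribed protocol.

```python
import math
from typing import Sequence


def pearson_r(lab: Sequence[float], in_situ: Sequence[float]) -> float:
    """Pearson correlation between paired laboratory (IVR) and in situ ratings.

    Each index is assumed to refer to the same scene rated under both conditions.
    """
    if len(lab) != len(in_situ) or len(lab) < 2:
        raise ValueError("need at least two paired ratings")
    n = len(lab)
    mean_lab = sum(lab) / n
    mean_situ = sum(in_situ) / n
    cov = sum((a - mean_lab) * (b - mean_situ) for a, b in zip(lab, in_situ))
    var_lab = sum((a - mean_lab) ** 2 for a in lab)
    var_situ = sum((b - mean_situ) ** 2 for b in in_situ)
    if var_lab == 0 or var_situ == 0:
        raise ValueError("ratings must vary across scenes")
    return cov / math.sqrt(var_lab * var_situ)


def mean_paired_difference(lab: Sequence[float], in_situ: Sequence[float]) -> float:
    """Average of (lab - in situ) per scene; values near zero suggest similar rating levels."""
    if len(lab) != len(in_situ) or not lab:
        raise ValueError("need equally sized, non-empty rating lists")
    return sum(a - b for a, b in zip(lab, in_situ)) / len(lab)
```

In practice, the two statistics would be read together, and a significance test appropriate to the study design would be added.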
The concept of ecological validity has been extended from psychological experiments to the domain of complex sound environment perception. It relates not only to the evaluation methods used in laboratory tests but also to developing IVR technologies. Efforts to establish a standardized soundscape evaluation protocol with high veridicality in an immersive virtual environment have broader implications for the practice of soundscape planning and design. Research on soundscape standardization has addressed definitions, the variety of contexts, evaluation methods and reporting requirements.
The International Organization for Standardization Technical Specification (ISO TS) 12913-2 introduced two common recording techniques in soundscape research: binaural and Ambisonics. It states that if some environmental factors are absent or different during playback, the resulting impressions may differ from those formed in the original context. Given this statement, the validity of these auralization techniques combined with other environmental factors remains uncertain. ISO TS 12913-3 states that the key factors to consider when conducting ecologically valid laboratory studies are the effect of memory, the duration of exposure to each stimulus and auditory immersiveness. As a multisensory tool, IVR can deliver more environmental stimuli than conventional 2D rendering methods. This entry therefore compares the ecological validity of IVR for urban sound environments across different reproduction techniques and research topics.
Many studies have suggested that urban noise can negatively affect people's cognitive functions and daily life. Even when subjective responses show no annoyance from urban noise, cognitive performance may still be affected. Some laboratory studies therefore also used cognitive tasks to evaluate how the virtual environment affects cognitive performance. For stress recovery, researchers have used measures based on participants' physiological responses. In 2013, Annerstedt et al. investigated whether sounds of nature induce physiological stress recovery. They applied the Trier Social Stress Test (TSST), a highly standardized protocol for inducing stress. Cortisol, heart rate, T-wave amplitude (TWA) and heart rate variability (HRV) were measured to analyze the physiological stress recovery induced by the sounds of nature. In 2019, Hedblom et al. used mild electrical shocks and skin conductance measurements to evaluate stress recovery in virtual environments with a birdsong–traffic noise interaction. Unlike subjective responses, physiological responses do not directly reflect the relationship between subjective sound preferences and the characteristics of acoustic environments. Subjective responses, cognitive performance tests and physiological measures can therefore be used jointly to assess the ecological validity of complex sound environment perception.
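Heart rate and HRV are typically derived from the series of inter-beat (RR) intervals. The source does not state which HRV index the cited studies used. As a hedged illustration, the sketch below computes mean heart rate and RMSSD, one common time-domain HRV index, from RR intervals in milliseconds. One hypothetical way to operationalize stress recovery is to compare such values between a post-stressor baseline and a recovery period in each virtual condition. The function names are illustrative.

```python
import math
from typing import Sequence


def mean_heart_rate_bpm(rr_ms: Sequence[float]) -> float:
    """Mean heart rate in beats per minute from RR intervals given in milliseconds."""
    if not rr_ms:
        raise ValueError("need at least one RR interval")
    mean_rr = sum(rr_ms) / len(rr_ms)
    return 60000.0 / mean_rr  # 60 000 ms per minute


def rmssd_ms(rr_ms: Sequence[float]) -> float:
    """Root mean square of successive RR-interval differences, in milliseconds."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

Real recordings would first need artifact and ectopic-beat correction, which is outside the scope of this sketch.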
For visual rendering, many studies used non-HMD options. Some adopted non-immersive methods, such as a monitor screen, a visual screen or 2D projection. Others used the immersive Cave Automatic Virtual Environment (CAVE) system. The CAVE system was first introduced in 1992 to provide a one-to-many visualization experience using large projection screens. Compared with a CAVE system, an HMD has drawbacks, especially when one user tries to interact with other users. An HMD also does not allow interaction with real objects other than VR control devices. The CAVE system, for its part, has reported limitations: its large footprint, the cost of high-resolution projectors and its human–computer interaction.
Some studies were conducted without visual stimuli. Including a visual component is nonetheless justified when examining the ecological validity of auditory perception. Coupled audio–visual interaction is associated with spatial attributes of sound perception, e.g., distance, width and directionality. It also provides an animated visual anchor that improves the sense of presence and immersiveness during subjective evaluation.
Verisimilitude and veridicality in IVR-based sound environment research have different emphases according to their definitions. Establishing both in a subjective evaluation experiment allows a virtual sound environment to be perceived with reliable ecological validity. IVR research that relies on verisimilitude in soundscape or noise assessments assumes that the test stimuli and the associated cognitive processing are sufficiently similar to the psychological construct of the corresponding real-world scenarios. The verisimilitude approach tends to focus on laboratory tasks that resemble real-world task demands. Evaluation indicators and questionnaires can be formatted in much the same way as in a participatory experiment. Sanchez et al. pointed out in 2017 that their study did not strictly prove that audio–visual designs in a virtual environment would lead to the predicted pleasantness of real environments. Establishing verisimilitude in soundscape evaluation is more intuitive than establishing a new cognitive task or a clinical neuropsychological assessment. However, researchers who discuss the relationship between subjective responses, cognitive performance and physiological responses need to examine the verisimilitude approach carefully. Without empirical data, some aspects of the testing conditions may limit how well a method applies to the real world.
A few studies have validated veridicality in IVR-based soundscape or noise assessments. The pioneering studies examined several fundamental playback systems. In 2005, Guastavino et al. applied linguistic analysis of verbal data to soundscape reproduction through a field survey and two listening tests. Both listening tests compared exposure to stimuli reproduced via stereophonic and Ambisonics approaches. The authors pointed out that ecological validity in tests of the effects of urban background noise requires both neutral visual elements and a good sense of spatial immersion. Both reproduction methods were shown to be ecologically valid tools in terms of source identification, although IVR was not applied in that study. Many perceptual attributes and indicators have been selected to describe the similarity between real-world and laboratory conditions. In 2016, Maffei et al. compared the congruence between audio and visual elements. They found no significant difference in the perceived global quality of the environments between the simulated and real-world conditions. Within a subjective evaluation framework, perceived global quality therefore showed high veridicality. These findings are consistent with the results of audio–visual interaction studies conducted in urban sound environments. In 2019, Hong et al. validated three Ambisonics reproduction methods by testing their veridicality in a virtual sound environment. Immersive virtual reality has been shown to be a valid tool for simulating multisensory environments not only in acoustics but also in clinical neuroscience, cognitive psychology and other research fields. Researchers who adopt the verisimilitude approach assume that the reproduction system and the subjective test are veridical.
Validating veridicality is also difficult because outdoor sound environments have complex contexts and are unpredictable. For outdoor sound environments, real-world measurement is sometimes impossible, e.g., in a planned area that has not yet been built. Moreover, some contextual conditions cannot be varied independently in the real world.
Notably, two studies addressed realism in their subjective experiments. Jeon and Jo showed in 2019 that using an HMD significantly increased perceived realism. Also in 2019, Hong et al. conducted both in situ and laboratory experiments to assess how different Ambisonics reproduction systems performed perceptually. Both studies successfully assessed realism. The former de-emphasized verisimilitude to the real world and instead highlighted the difference in realism between the HMD and non-HMD conditions. The latter conducted a veridicality study with in situ responses and described how closely each reproduction approach resembled reality. The most ecologically valid studies examined both verisimilitude and veridicality, and they revealed congruence between immersive virtual experience and real experience with multisensory stimuli.
This entry reviews approaches to assessing the ecological validity of IVR for the perception of urban sound environments. It also covers the audio–visual reproduction technologies needed to ensure that validity. The review shows qualitatively that immersive virtual reality techniques have great potential as an ecologically valid tool in soundscape or noise assessments. The ecological validity of virtual reality for assessing urban sound environments is multimodal, dynamic and contextual. The main conclusions of this work are as follows.
The ecological validity of complex sound environment perception in IVR can be assessed through laboratory tests that include subjective response surveys, cognitive performance tests and physiological measurements. With participatory experiments both in situ and in the laboratory, the veridicality of IVR can be verified through subjective responses. These include environmental preferences/quality, audio–visual indicators (e.g., pleasantness and annoyance), coupled interactions and reproduction quality (e.g., realism and immersiveness).
A head-tracking unit with a display and synchronized spatial audio (e.g., an HMD with head-tracked FOA binaural playback) is advantageous for assessing ecological validity in immersive virtual environments. When urban sound environment research involves interaction among multiple users, a CAVE system should be considered. With its higher spatial resolution, HOA also shows increasing potential for the ecological validity of IVR in urban sound environment research.
Beyond their individual outcomes, these studies on ecological validity and the evaluation methods they used also contribute towards a normalized framework for soundscape and noise assessment protocols. For standardized soundscape evaluation, the ISO TS 12913 series should give more detailed guidelines and specifications for setting up an IVR system. In particular, delivering a dynamic virtual experience requires more research on how soundscape perception is influenced by the Ambisonics order used at the recording and reproduction stages and by issues such as encoding and decoding Ambisonics formats. Pursuing a standardized soundscape evaluation protocol alongside IVR-based soundscape research can strengthen the field as a whole.
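Two of the format issues mentioned above can be made concrete. First, the number of spherical-harmonic channels grows as (N+1)^2 with the Ambisonics order N, so moving from FOA to third order raises the channel count from 4 to 16. Second, even first-order files differ in channel ordering and normalization. The sketch below assumes the AmbiX convention (ACN channel order, SN3D normalization) as the target. It assumes the traditional FuMa convention (W, X, Y, Z, with W attenuated by 1/√2) as the source. Only first order is handled, and the function names are hypothetical.

```python
import math
from typing import List


def ambisonic_channel_count(order: int) -> int:
    """Number of full-sphere Ambisonics channels for a given order N: (N + 1) ** 2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def encode_foa_ambix(sample: float, azimuth_rad: float, elevation_rad: float) -> List[float]:
    """Encode one mono sample as a plane wave into FOA (ACN order W, Y, Z, X; SN3D).

    Azimuth is measured counterclockwise from the front (positive = left),
    elevation is positive upwards.
    """
    cos_el = math.cos(elevation_rad)
    return [
        sample,                                   # W, ACN 0
        sample * math.sin(azimuth_rad) * cos_el,  # Y, ACN 1
        sample * math.sin(elevation_rad),         # Z, ACN 2
        sample * math.cos(azimuth_rad) * cos_el,  # X, ACN 3
    ]


def fuma_to_ambix_foa(w: float, x: float, y: float, z: float) -> List[float]:
    """Convert one first-order FuMa frame (W, X, Y, Z) to AmbiX (W, Y, Z, X).

    At first order only W differs in gain: FuMa stores it 3 dB lower (1/sqrt(2)).
    """
    return [w * math.sqrt(2.0), y, z, x]
```

Mixing up these conventions silently changes the balance between W and the directional channels, which is one concrete way encoding and decoding choices could affect what participants hear.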