Augmented Reality Environments

Contemporary software applications have shifted focus from 2D representations to 3D. Augmented and Virtual Reality (AR/VR) are two technologies that have captured the industry’s interest, as they show great potential in many areas. As Extended Reality (XR) technology evolves, the development and integration of natural interaction methods into these environments become increasingly important. Existing technologies, both software and hardware, are advancing rapidly, requiring developers to constantly learn new tools and preventing them from reusing existing components. In addition, companies often oblige developers to use specific tools. The lack of frameworks and libraries to facilitate this process can discourage developers from engaging with this technology.

  • extended reality
  • augmented reality
  • user interaction

1. Introduction

Extended Reality (XR) is a term widely used these days. It has emerged as an umbrella term to describe all technologies that blend the real and virtual worlds [1], with the two best-known examples being augmented reality (AR) and virtual reality (VR). These technologies allow users to display virtual content in the real world (AR) [2], blend the real and virtual worlds (Mixed Reality—MR) [3], or fully immerse themselves in a virtual world (VR) [4]. As these technologies continue to evolve, the need for improved interaction in their corresponding applications becomes increasingly important.
Traditional means of interaction, such as a mouse and keyboard, a touchscreen, or a controller, may be convenient in some cases, but they limit the immersive experience and realism expected by users [5]. In addition, external interaction devices require time and effort for familiarization and can shift the user’s focus from the content to the operation of the device [6]. To increase the level of immersion, researchers are exploring more natural and realistic ways to interact within these environments [7][8][9][10].
To this end, various hardware devices capable of tracking hand movements have been developed, such as Microsoft’s Kinect [11], LeapMotion [12], and wearables such as gloves [13]. These devices provide high accuracy, but they are either limited to a specific location or require specialized equipment to set up. A more recent approach is to detect and track hands using a camera, which has the advantage that almost every smart device already has one. While this method requires no external hardware, the hands must remain in the camera’s field of view. Modern VR headsets such as the Meta Quest 2 include hand-tracking functions based on their built-in cameras. Depending on the application requirements, one technique may be preferred over another, as each has advantages and disadvantages, as sketched below.
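To make the camera-only approach concrete, the following is a minimal sketch of per-frame hand landmark tracking in Python using OpenCV and Google’s MediaPipe Hands. The choice of library and the confidence thresholds are illustrative assumptions; none of the cited systems prescribe this particular stack.

```python
import cv2
import mediapipe as mp

# Camera-based hand tracking: no external hardware beyond a webcam,
# but the hand must stay inside the camera's field of view.
mp_hands = mp.solutions.hands

cap = cv2.VideoCapture(0)  # default device camera
with mp_hands.Hands(max_num_hands=1,
                    min_detection_confidence=0.5,   # illustrative threshold
                    min_tracking_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV delivers BGR frames.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            # 21 landmarks per hand, normalized to [0, 1] image coordinates.
            index_tip = results.multi_hand_landmarks[0].landmark[8]
            print(f"index fingertip at ({index_tip.x:.2f}, {index_tip.y:.2f})")
cap.release()
```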
The exact position of a hand can also be estimated through techniques other than hand detection and tracking, allowing high-precision actions to be executed. These methods tend to be computationally intense, using a variety of algorithms to predict and reconstruct the exact hand pose. The recognition of specific hand gestures is a less computationally intense alternative, and it is preferred when high precision is not required or when hardware resources are limited. Both approaches can provide similar functionality and favor a natural type of interaction, including grabbing, moving, and detecting collisions with virtual objects; these are simple tasks that any application in a three-dimensional environment integrates with traditional interaction techniques. A minimal sketch of the gesture-based route follows.
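As a sketch of the lighter, gesture-based approach, the snippet below detects a pinch from two of the landmarks produced above and uses it to grab and move a virtual object. The landmark indices follow MediaPipe’s hand model, while the distance thresholds and the GrabbableObject class are hypothetical choices made purely for illustration.

```python
import math

THUMB_TIP, INDEX_TIP = 4, 8   # MediaPipe hand landmark indices
PINCH_THRESHOLD = 0.05        # normalized image units; tune per setup
GRAB_RADIUS = 0.10            # how close the fingertip must be to "collide"

def is_pinching(landmarks):
    """A pinch gesture: thumb and index fingertips nearly touching."""
    a, b = landmarks[THUMB_TIP], landmarks[INDEX_TIP]
    return math.dist((a.x, a.y), (b.x, b.y)) < PINCH_THRESHOLD

class GrabbableObject:
    """A hypothetical virtual object that can be grabbed, moved, and released."""
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.held = False

def update(obj, landmarks):
    """Run once per frame with the landmarks of the tracked hand."""
    tip = landmarks[INDEX_TIP]
    colliding = math.dist((obj.x, obj.y), (tip.x, tip.y)) < GRAB_RADIUS
    if is_pinching(landmarks) and (obj.held or colliding):
        obj.held = True
        obj.x, obj.y = tip.x, tip.y  # object follows the fingertip while held
    else:
        obj.held = False
```

Classifying a single gesture in this way avoids reconstructing the full hand pose, which is why it suits devices with limited hardware resources.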

2. Augmented Reality Environments

Freehand interaction is critical in AR environments, which is why there is voluminous research on this topic, ranging from the study of hand-tracking techniques to better ways of handling interactions. With regard to hand tracking, there are many different approaches and methodologies, including tracking hands with a camera [14] or with wireless systems [15]. Many devices geared toward AR and VR have adopted hand tracking to offer such functionality to end users; devices that include hand-tracking functionality, or were even built specifically for this task, include LeapMotion, Kinect, HoloLens 2, and Meta Quest 2.
Most developers and creators seeking to build AR applications require more than just the means to track a hand: they need high-level tools that enable the use of hands within their applications. Tools and systems are emerging for such tasks, including GesturAR [16], which enables users to configure their own gestures and their actions on the objects of a scene, allowing users to define behaviors such as transformations of objects within the scene. Another tool is the Hand Interfaces system [17], which lets users use their hands as tools. Likewise, virtualGrasp [18] uses gestures performed during interaction with physical objects to retrieve the corresponding digital objects inside a virtual scene, and MagicalHands [19] employs gestures to create animations in VR environments. While these tools provide a captivating way to define new gestures and animations, they were not designed to act as tools for creating AR applications; rather, they behave as applications that enable the configuration of mini-scenes or behaviors within an application. They require time to set up and do not always cover the creation of new content. In addition, while this type of interaction feels natural to young people engaging with a virtual environment, older people who are unfamiliar with technology might expect more traditional means of configuring their scenes, such as checkboxes and drop-down menus [20]. This is the case addressed by EduARdo, which aims to help educators who are unfamiliar with technology.
Despite the importance of freehand interaction in AR, other means of interaction include voice-based, gaze-based, location-based, and even tactile interaction, which have proven well suited to specific situations. Combined interaction techniques were proposed in [21] to assist the user. The same need applies to these types of interaction: developers and creators require higher-level tools that allow them to configure the desired behaviors and functionality in a few simple steps. EduARdo aims to address this need by providing a simple and intuitive way to configure such functionality for every possible use. Interaction is a large component, but an application has many more crucial components. For instance, objects must be placed within a scene and their actions configured. Another important component is the User Interface (UI), such as menus and text fields that present information. As technology advances, it departs from 2D content, which tends to become obsolete, and moves toward 3D visualizations, where placing content within a scene and interacting with it is a more complex procedure. Even modern websites contain 3D objects or, in some cases, are entire 3D environments.
In AR and VR, terms mostly used with regard to games, a 3D environment is the norm. One study [22] reported the lack of tools for application development in AR and the necessity of creating standards for the development process. One system developed to create AR applications is ZeusAR [23], which was built using JavaScript and the ThreeJS library. This tool provides a wizard that users configure to create AR serious games. A potential drawback of this tool relates to the technologies used, as they currently offer limited functionality in AR environments, which imposes limitations when a user requires a more complex application that the tool cannot support. Additionally, ZeusAR presents games in AR that have already been created with another creation tool; as a result, new content cannot be created with it. EduARdo addresses these issues through its implementation in the Unity 3D game engine [24], which supports every available AR functionality and can exploit low-level APIs to configure extra functionality if required.
AR-Maze [25] is an educational tool that allows children to create their own scenes using textured wooden cubes. Children can arrange these cubes in the real world and create mazes onto which the AR application can project virtual content. The game was developed using Unity 3D and the Qualcomm Vuforia platform for AR support. A similar framework was proposed in [26] for augmenting physical boards with pawns and other objects. Interactive Educational Content Based on Augmented Reality and 3D Visualization [27] was proposed as a tool for creating educational AR content designed for secondary education. ComposAR [28] focuses on associating virtual content with real objects and defining interactions for them. SMADICT [29] is a framework in which teachers and students can participate in the design process. ScoolAR [30] is a system that allows teachers to create AR and VR experiences without requiring programming skills, but its functionality is limited to uploading images and tagging them with text. The BIM-VR framework [31] proposes another approach for 360-degree videos, in which a user can tag structures and items by pointing at them using a gaze interaction method. A general-purpose framework facilitating the development of applications for Unreal Engine was suggested in [32]; however, it was intended only for VR. GLUEPS-AR [33] is a system that can incorporate different tools for creating AR applications.