Multi-Resolution 3D Rendering for High-Performance Web AR: History

In the context of web augmented reality (AR), 3D rendering that satisfies both visual quality and frame-rate requirements remains a challenge. The lack of a dedicated, efficient 3D format often degrades the visual quality of the original data and compromises the user experience.

  • augmented reality
  • multi-resolution
  • AR.js
  • 3D visualization
  • web

1. Introduction

Web AR advances along with high-end mobile computing and alters how experiences are delivered and interpreted. Common digital overlays range from text and animations to 3D graphics and low-poly models. However, there is a demand for more intuitive, realistic and efficient types of media in order to increase user engagement and retention. Examples include the detailed, high-resolution 3D models derived from laser scanning, photogrammetry or fast surface modeling. Their geometry is represented as a mesh of triangles, quadrilaterals or other polygons that encodes the shape, size and position of the object’s vertices, edges and faces. These mesh data often exhibit spatial redundancies, such as repeated vertex positions or similar attribute values, as well as non-manifold geometry, outliers and surface discontinuities. Beyond the inner structure, the accompanying texture maps, material properties, lighting information, animation data, metadata, etc., add to their overall complexity. Streaming these assets on the web is subject to a set of physical and logistical constraints: storage, bandwidth, latency, memory and computational power [1]. The WebGL graphics API and rendering engines typically expect well-formed meshes with consistent topology. In the absence of 3D web standards, a combination of proprietary formats, third-party libraries and custom workflows tailored to the requirements of the current project is usually adopted. Besides hardware acceleration, common rendering practices include adaptive streaming, progressive loading and optimized compression.
Meanwhile, AR is at its core a motion-tracking problem that requires stable poses and accurate localization. Recovering 3D poses from a frame sequence captured by a live camera involves many steps of intense processing. Feature detection and matching, feature tracking, robust estimation techniques such as RANSAC and relocalization algorithms can place pressure on the computing capabilities of mobile devices. Besides a high CPU load, real-time 3D rendering proves to be a resource-intensive, lengthy process, even for 3D models larger than 4 or 5 MB [2]. The absence of a lightweight 3D format or of a standardized method along the graphics pipeline can result in a degraded user experience and performance limitations. This may manifest as reduced frame rates, increased latency or the inefficient use of device resources, ultimately impacting the responsiveness and smoothness of AR interactions. Moreover, the meshes used as AR overlays may undergo modifications or simplifications as a trade-off for quick loading times and low bandwidth, resulting in visual degradation, reduced detail or lower fidelity compared to the originals. As a consequence, the AR overlays may not fully capture the intricate details and realism of the original 3D content, affecting the overall experience.
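The robust-estimation step mentioned above can be sketched generically. The following plain-JavaScript snippet fits a 2D line with RANSAC as a simplified stand-in for the pose models used in feature-based tracking; the point set, iteration count and inlier threshold are illustrative assumptions. Each frame of an AR session runs many such sampling iterations, which is part of what stresses mobile CPUs.

```javascript
// Generic RANSAC sketch: repeatedly fit a model to a random minimal sample
// and keep the candidate with the most inliers. Here the "model" is a 2D
// line (y = a*x + b), an illustrative simplification of pose estimation.
function ransacLine(points, iterations, inlierThreshold) {
  let best = { inliers: 0, a: 0, b: 0 };
  for (let it = 0; it < iterations; it++) {
    // Draw a minimal sample of two distinct points.
    const i = Math.floor(Math.random() * points.length);
    const j = Math.floor(Math.random() * points.length);
    if (i === j) continue;
    const [p, q] = [points[i], points[j]];
    if (q.x === p.x) continue; // skip vertical candidates
    const a = (q.y - p.y) / (q.x - p.x);
    const b = p.y - a * p.x;
    // Consensus step: count points within the threshold of the candidate line.
    const inliers = points.filter(pt => Math.abs(pt.y - (a * pt.x + b)) < inlierThreshold).length;
    if (inliers > best.inliers) best = { inliers, a, b };
  }
  return best;
}

// 20 points on y = 2x + 1 plus two gross outliers (simulated mismatches).
const pts = Array.from({ length: 20 }, (_, k) => ({ x: k, y: 2 * k + 1 }));
pts.push({ x: 3, y: 40 }, { x: 7, y: -25 });
const model = ransacLine(pts, 200, 0.5);
```

The consensus loop visits every point for every candidate, so the cost grows with both the number of features and the number of iterations — the pressure on mobile hardware described above.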

2. Optimized 3D Rendering

Numerous studies have addressed the optimized web 3D visualization of large and complex geometry. In this section, only view-dependent approaches are presented and discussed, as they can adapt both to concurrent operation with the already arduous task of AR and to the tile-based rendering of mobile devices. For instance, progressive mesh streaming preserves geometric attributes but is computationally intensive, since every part of the model is rendered unless certain criteria or other techniques are set [9]. Multi-triangulation is constrained by the high CPU load required to traverse the directed acyclic graphs (DAGs) and by the fact that it cannot fully exploit the GPU processing power [10].

2.1. Compression

Compression significantly reduces the file size of 3D models, making them more efficient to store, transfer and render in real-time applications. General-purpose compression libraries, such as Zstd [11] or Oodle [12], target real-time visualization cases where fast decompression speed is a crucial factor. However, they are not designed to exploit redundancies in vertex and index data, and they may yield low compression ratios. Instead of basic mesh decimation and simplification algorithms, which try to reduce the vertex and face count with minimal shape changes [13], dedicated arbitrary-geometry algorithms utilize one or a combination of the following techniques: quantization, delta encoding, vertex cache optimization or octree-based encoding. The Draco algorithm primarily adopts quantization and delta encoding to reduce the precision of vertex attributes and encode the differences between consecutive ones. While it is highly effective at compressing complex meshes with high vertex and triangle counts, it does not prioritize UV texture mapping quality. Additional compression techniques specific to textures, such as Khronos TeXture (KTX2) [14], may need to be considered alongside Draco for optimal results. Other algorithms convert the connectivity, geometry and vertex attributes into a stream of symbols and bits to reduce the data volume and allow for progressive transmission during decompression [15,16,17]. The decompression time on the client side is of greater importance than the compression rate. In the case of single-rate compression, Google’s WebGL-Loader provides a better decompression ratio than the Draco system, but neither supports progressive encoding/decoding [18]. The Corto library compresses large point clouds and meshes with per-vertex attributes, aiming to support streaming and adaptive rendering [3].
Regarding decoding, edges are connected in a way that allows for the efficient traversal and fast reconstruction of the mesh topology, especially when viewing-distance or angle-based criteria are employed.
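The quantization and delta-encoding steps described above can be sketched in a few lines. The following JavaScript snippet, written in the spirit of Draco-style geometry compression (it is not Draco's actual API), compresses and reconstructs one coordinate channel of a vertex buffer; the 10-bit precision and the helper names are assumptions for illustration.

```javascript
// Quantize float positions in [min, max] to unsigned integers of `bits` precision.
function quantize(positions, min, max, bits) {
  const levels = (1 << bits) - 1;
  const scale = levels / (max - min);
  return positions.map(p => Math.round((p - min) * scale));
}

// Delta-encode: keep the first value, then differences between consecutive values.
// Nearby vertices produce small deltas, which entropy-code far better than raw values.
function deltaEncode(values) {
  const out = [values[0]];
  for (let i = 1; i < values.length; i++) out.push(values[i] - values[i - 1]);
  return out;
}

// Inverse operations, run on the client after download.
function deltaDecode(deltas) {
  const out = [deltas[0]];
  for (let i = 1; i < deltas.length; i++) out.push(out[i - 1] + deltas[i]);
  return out;
}

function dequantize(qs, min, max, bits) {
  const levels = (1 << bits) - 1;
  const scale = (max - min) / levels;
  return qs.map(q => min + q * scale);
}

// Example: one coordinate channel of a small vertex buffer.
const xs = [0.0, 0.1, 0.12, 0.5, 0.51];
const q = quantize(xs, 0, 1, 10);                     // 10-bit precision (illustrative)
const encoded = deltaEncode(q);                       // what would be entropy-coded and sent
const decoded = dequantize(deltaDecode(encoded), 0, 1, 10);
```

The round trip is lossy only up to the quantization step (here 1/1023 of the value range), which is the precision/size trade-off the text refers to.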

2.2. View-Dependent Techniques

A model compression technology can only shorten the download time; the rendering operations on mobile devices still consume a large amount of CPU, memory and battery resources. A natural approach to reducing the model data is simplification, which represents the models at selective levels of detail depending on their properties or user-defined criteria [19]. Level-of-detail (LOD) techniques can be broadly categorized into discrete and continuous LOD based on the transition between levels, as well as into hierarchical, screen-space and temporal LOD when the respective optimization factor is scene hierarchy, screen visibility or temporal coherence. View-dependent LOD considers the viewer’s position and orientation to dynamically select the appropriate LODs, optimizing the rendering for the areas visible to the viewer. Since the display of an overlay is dynamically adjusted based on the viewer’s perspective and the direction of the camera, view-dependent LOD could be a valuable feature for AR. Unlike the view-dependent approaches, the meshoptimizer library reorders triangles to minimize overdraw from all directions, not only from the current point of view [7]. It primarily focuses on mesh optimization, such as vertex cache optimization, index buffer optimization and vertex quantization, rather than on a specific loading strategy. These processes help to decrease memory consumption and enhance rendering performance by minimizing the time required for index fetching. However, quantizing vertex attributes may cause a minimal loss of visual quality.
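The view-dependent selection among discrete LODs can be sketched as follows. The per-level geometric errors, the pixel-error threshold and the perspective projection formula are illustrative assumptions, not the API of any particular library.

```javascript
// Each level stores a precomputed geometric error in object-space units
// (finer levels have smaller error). Values are illustrative.
const lods = [
  { name: 'lod0', geometricError: 0.001 }, // finest
  { name: 'lod1', geometricError: 0.01 },
  { name: 'lod2', geometricError: 0.05 },  // coarsest
];

// Project an object-space error to pixels for a perspective camera:
// pixels = error * screenHeight / (2 * distance * tan(fovY / 2)).
function screenSpaceError(geometricError, distance, fovYRad, screenHeight) {
  return (geometricError * screenHeight) / (2 * distance * Math.tan(fovYRad / 2));
}

// Choose the coarsest LOD whose projected error stays below `maxPixelError`,
// so distant overlays use fewer triangles without visible degradation.
function selectLOD(distance, fovYRad, screenHeight, maxPixelError) {
  for (let i = lods.length - 1; i >= 0; i--) {
    if (screenSpaceError(lods[i].geometricError, distance, fovYRad, screenHeight) <= maxPixelError) {
      return lods[i];
    }
  }
  return lods[0]; // nothing coarse enough is acceptable: fall back to the finest level
}
```

With a 60° vertical field of view on a 1080-pixel-high viewport and a 1-pixel tolerance, the overlay switches from the finest to the coarsest level as the camera moves away, which is exactly the view-dependent behavior described above.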
A hybrid approach that combines the aforementioned techniques should therefore be adopted. Thus, encoding, quantization and compression come as a natural extension to the family of multi-resolution algorithms. Multi-resolution structures for the web have been well studied since 2000 and typically involve representing a 3D model as a hierarchy of levels of detail with varying degrees of geometric complexity [20]. Any type of multi-resolution representation can be naturally extended to view-dependent streaming geometry due to its intrinsic characteristics [21]. For example, in multi-resolution strip masking, tiles can be rendered at various resolutions, according to their global and local properties, using a set of coarse-to-fine masks, which are precomputed triangle strip indices [22]. While it effectively offloads a significant portion of the CPU load to the GPU, it is less geometrically optimal than local LOD algorithms. On the other hand, several algorithms have been introduced for cluster-based partitioning driven by several criteria, such as partition, hierarchy, kernel, quantum theory, etc. [23,24]. Gobbetti and Marton transformed point clouds into two-manifold meshes and encoded them into compact multi-resolution structures consisting of variable-resolution quad patches [25]. The patches store both geometry and texture information in a lightweight, tightly packed texture atlas. These 3D structures have the potential to be employed as AR overlays, since the GPU-accelerated adaptive tessellation algorithm of the method minimizes the CPU overhead. In turn, the Nexus algorithm adopts a patch-based data structure from which view-dependent conforming mesh representations can be efficiently extracted by combining precomputed patches [3].
Since each patch is itself a mesh composed of a few thousand triangles, the multi-resolution extraction cost is amortized over many graphics primitives, and CPU/GPU communication can be optimized to fully exploit the complex memory hierarchy of modern graphics platforms. Nexus also defines a publishing format (NXS) for multi-resolution meshes that stores the sequence of increasingly coarse partitions. More recent multi-resolution schemes offload their parallel compression algorithms to the GPU for hardware acceleration [26,27]. They achieve significant performance improvements on multi-core hardware, currently excluding consumer mobile devices. Hardware tessellation-based patch rendering with continuous LOD is a promising approach for AR, offering a stable frame rate for high-fidelity outputs with optimal memory and computational usage, but it currently addresses only triangular models [28].
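A minimal sketch of the patch-based extraction idea follows (these are not Nexus's actual data structures): starting from a coarse cut of the hierarchy, the patch with the worst projected error is repeatedly replaced by its children until every patch is accurate enough or a triangle budget is exhausted. The patch tree, the precomputed pixel errors and the budget are assumptions for illustration.

```javascript
// Toy patch hierarchy: each node carries a triangle count and a projected
// screen-space error in pixels (assumed precomputed per frame).
const root = {
  id: 'root', triangles: 2000, error: 8.0,
  children: [
    { id: 'a', triangles: 2000, error: 2.5, children: [] },
    { id: 'b', triangles: 2000, error: 0.8, children: [] },
  ],
};

// Extract a view-dependent cut: greedily refine the worst patch while the
// error exceeds the tolerance and the triangle budget is not exceeded.
function extractCut(rootPatch, maxPixelError, triangleBudget) {
  const cut = [rootPatch];
  let triangles = rootPatch.triangles;
  while (true) {
    cut.sort((p, q) => q.error - p.error);       // worst patch first
    const worst = cut[0];
    if (worst.error <= maxPixelError || worst.children.length === 0) break;
    const added = worst.children.reduce((s, c) => s + c.triangles, 0);
    if (triangles - worst.triangles + added > triangleBudget) break; // budget exhausted
    cut.splice(0, 1, ...worst.children);          // replace patch by its children
    triangles += added - worst.triangles;
  }
  return cut.map(p => p.id).sort();
}
```

Because each refinement step swaps in whole patches of thousands of triangles rather than individual primitives, the per-frame extraction cost stays small — the amortization the text describes.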

3. Web AR Approaches for 3D Overlays

In general, the synergy of open-source JavaScript AR software such as AR.js with the Three.js library has found a variety of applications in interior design [29], cultural heritage [30], synthetic visualizations [31] and education [32,33]. Nitika et al. developed a WebRTC marker-based application with AR.js and Three.js to analyze web AR performance, examine potential rendering optimizations and propose a cloud-based solution [34]. Rendering can be offloaded to cloud servers, minimizing the overhead, but data transfer suffers from end-to-end (E2E) latency and the risk of security and privacy threats. Multi-access edge computing (MEC)-oriented web AR solutions compensate for this latency, as well as for the weak computational efficiency of browsers, through a client–server model typically located close to the user’s device. However, both technologies are prone to single-point failures and network overloads while huge or multiple chunks of 3D data are transferred. From a rendering performance standpoint, there is no lightweight 3D file format suited to overlays, while the emergence of dedicated mobile web AR browsers [35,36] has not resolved the compatibility issues that arise from the lack of a standard for web 3D objects. The Universal Scene Description (USDZ) schema, developed by Apple and Pixar, is a compact format that incorporates AR functionalities into 3D content [37]. Apart from the typical storage of geometry, materials, textures, animations, lighting and camera properties, it defines the state, the tracking mode and the location data of the AR experience. Being primarily designed for the iOS operating system, it is tightly integrated with Apple’s ARKit and has limited authoring tools. This conflict between having a uniform cross-platform SDK or API and leveraging the unique characteristics of each platform is still an open problem [38]. glTF and its binary form glb constitute a royalty-free specification of the Khronos Group for the efficient loading of 3D scenes on the web [39].
A remaining issue is that the various glTF renderers do not always produce visually identical results [40]. Apart from the format, large and detailed source data of arbitrary geometry require pre-processing or further optimization steps at runtime by the rendering engine or framework. gltfpack of the meshoptimizer suite [41] applies a set of optimization steps, including encoding, reordering, quantization and memory management, to the main GPU procedures (vertex processing, rasterization and fragment shading), while the glTF Pipeline by CesiumJS [42] supports quantization and Draco compression.
Only a few approaches can be considered complete regarding remote 3D rendering during an AR session. A dynamic adaptive streaming technique represents AR assets at different LODs, and a heuristic selects only the high-fidelity LODs to be visualized each time. It manages to decrease the start-up latency by up to 90% with respect to a download-and-play baseline, as well as the amount of data needed to deliver the AR experience by up to 79%, without sacrificing visual quality [43]. Another work dedicated to the rendering phase of augmentation, developed for Mobile AR (MAR), exploits a meta-file during progressive HTTP streaming to determine the next requested chunk of data and enhance the visual quality of the view-dependent display [44]. An HTTP adaptive AR streaming system that aims to provide high-quality AR streaming services with low latency over time-varying wireless networks has recently been introduced [45]. The system utilizes progressive mesh techniques coupled with a metafile structure to render, display and schedule the streaming of chunks of data according to the wireless network load. Unlike the view-dependent rendering of Nexus.js, it adapts to different network conditions, and it takes into account the displayed scale values of the AR overlays on the screen, ensuring that the requested chunks of content are optimized to enhance human visual perception.
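The chunk-scheduling idea behind such adaptive streaming systems can be sketched as follows. The chunk table, the bandwidth/latency model and the screen-scale rule are assumptions for illustration, not the actual logic of the cited systems.

```javascript
// Candidate chunks for one overlay, one per LOD. Sizes are illustrative.
const chunks = [
  { lod: 0, bytes: 4_000_000 }, // finest
  { lod: 1, bytes: 1_000_000 },
  { lod: 2, bytes: 250_000 },   // coarsest
];

// Pick the next chunk to request, given the measured bandwidth (bits/s),
// a latency budget (s) and the overlay's on-screen scale in [0, 1].
function nextChunk(bandwidthBps, latencyBudgetS, screenScale) {
  // Overlays occupying little screen area never need the finest LOD:
  // the extra detail would fall below what the viewer can perceive.
  const candidates = chunks.filter(c => screenScale >= 0.5 || c.lod > 0);
  // Keep the chunks whose download time fits in the latency budget.
  const feasible = candidates.filter(c => (c.bytes * 8) / bandwidthBps <= latencyBudgetS);
  if (feasible.length === 0) return candidates[candidates.length - 1]; // coarsest fallback
  // Among feasible chunks, request the highest-fidelity (largest) one.
  return feasible.reduce((best, c) => (c.bytes > best.bytes ? c : best));
}
```

On a 10 Mbit/s link with a 1 s budget, the finest chunk (3.2 s to download) is skipped in favor of the middle LOD; when the link drops to 1 Mbit/s, the scheduler degrades gracefully to the coarsest chunk instead of stalling — the network-adaptive behavior the paragraph describes.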

This entry is adapted from the peer-reviewed paper 10.3390/s23156885
