3. Software Integration
From the master–slave testbed to the operating theater, AR plays a pivotal role in the visualization of anatomical landmarks, particularly in the ear, nose, and throat, as well as gastrointestinal areas. AR-assisted robotic surgery has eased the surgeon’s task by reducing hand tremor and the loss of triangulation during complicated procedures. Studies show that the transition from invasive open surgery to indirect, perception-driven surgery has resulted in fewer cases of tissue perforation, blood loss, and post-operative trauma
[20]. In contrast to open surgery, which involves the direct manipulation of tissue, image-guided operations enable the surgeon to map medical image information, virtual or otherwise, spatially onto real-world body structures. The virtual fixture is usually defined as the superposition of augmented sensory data upon the user’s field of view (FoV) by a software-generated platform that uses fiducial markers to register the location of an anatomical section in real time and space with respect to the user’s live scene. The use of publicly available datasets obtained from modalities such as computed tomography (CT) and magnetic resonance imaging (MRI) in such scenarios minimizes human error in data processing and hence improves the success rates of surgeries.
3.1. Patient-To-Image Registration
The preliminary steps in diagnosing the area of concern in a patient include the use of computer guidance software to visualize inner anatomical landmarks. The loss of direct feel of the inner anatomy, reduced depth perception due to the monocularity of cameras, and distorted images have been addressed by novel techniques such as the segmentation of tissue in medical scans and 3D modeling for an augmented 360-degree FoV
[21]. In several papers by Londono et al.
[22] and Pfefferle et al.
[23], case studies of kidney biopsies examine the development of AR systems for the superposition of holograms over experimental phantoms. Studies show that preoperative CT scans acquired in the lateral decubitus position exhibit internal tissue deformation, in addition to discrepancies between preoperative and intraoperative scans. Accurate image-guided surgery depends greatly on registering preoperative medical scans with their corresponding counterparts in the intraoperative anatomy. During the procedure, aligned coordinate frames are mapped onto the output registered image. Compensating for the time lag during registration requires multiple time frames at different regions of interest to enhance the quality of the registered image.
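The mapping of aligned coordinate frames described above can be sketched with homogeneous transforms. The following Python/NumPy fragment is illustrative only; the frame names and offsets are hypothetical, not values from the cited studies.

```python
import numpy as np

def to_homogeneous(R, t):
    """Pack a 3x3 rotation R and a translation t into a 4x4 transform."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def map_points(T, pts):
    """Apply a 4x4 transform to an (N, 3) array of points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))])
    return (ph @ T.T)[:, :3]

# Hypothetical chain: preoperative CT frame -> tracker frame -> intraoperative camera frame
T_ct_to_tracker = to_homogeneous(np.eye(3), np.array([0.0, 0.0, 50.0]))
T_tracker_to_cam = to_homogeneous(np.eye(3), np.array([10.0, 0.0, 0.0]))
T_ct_to_cam = T_tracker_to_cam @ T_ct_to_tracker  # compose right-to-left

# A landmark at the CT origin lands at (10, 0, 50) in the camera frame
landmark = np.array([[0.0, 0.0, 0.0]])
mapped = map_points(T_ct_to_cam, landmark)
```

Chaining transforms in this way is what keeps preoperative and intraoperative frames consistent as new tracking data arrives.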
Usually, the preferred registration method depends on the type of robotic environment the surgeon is navigating, with feature-based registration attracting the most attention within the academic community. These methods are computationally lighter and can effectively match fiducials between preoperative and intraoperative images, primarily through deformable surface registration. Because they rely solely on 2D parameters, however, highly accurate 3D information is difficult to obtain, driving the research community to develop novel sensing technologies for 3D marker tracking. Registration methods such as point-based, feature-based, segmentation-based, and fluoroscopy-based registration are widely used in the image processing of medical scans. The geometric transformations of deformable objects are computed using fiducial markers, which act as positioning cues and can be analyzed for fiducial localization errors (FLEs). In cases where images have varying gray levels, DL algorithms can segregate different features using similarity measures such as the sum of squared differences (SSD), the sum of absolute differences (SAD), correlation coefficients, and the mean squared difference (MSD). For real-time X-ray image processing, a contrast material, such as barium or iodine, is used to produce clearer contrast differences for clinicians to analyze. The process of 2D-to-3D image registration involves aligning matching preoperative and intraoperative features, which can be reconstructed in AR and superposed over a live fluoroscopic image with respect to reference points in the image sequence (Figure 2).
Figure 2. CT scans of the lung with its corresponding 3D reconstruction and marker localization, used by surgeons to locate tumors as indicated by the red marker.
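As a concrete illustration of the intensity-based similarity measures mentioned above (SSD, correlation coefficient, MSD), a minimal NumPy sketch follows; the toy image patches are hypothetical.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized image patches."""
    return float(np.sum((a - b) ** 2))

def msd(a, b):
    """Mean squared difference."""
    return float(np.mean((a - b) ** 2))

def correlation_coefficient(a, b):
    """Normalized correlation coefficient in [-1, 1]; invariant to
    uniform intensity offsets, unlike SSD/MSD."""
    a0, b0 = a - a.mean(), b - b.mean()
    return float(np.sum(a0 * b0) / np.sqrt(np.sum(a0 ** 2) * np.sum(b0 ** 2)))

# Toy patches: an intraoperative view that is the preoperative one plus a
# uniform intensity offset (e.g., different exposure)
pre = np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
intra = pre + 1.0
```

Note the design trade-off the metrics encode: SSD and MSD penalize the offset, while the correlation coefficient still reports a perfect match, which is why correlation-style measures are often preferred across modalities with different intensity scales.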
3.2. Camera Calibration for Optimal Alignment
Automatic camera calibration and corresponding image alignment in intraoperative ultrasound are used to determine camera parameters, such as the focal length, as well as the surface anatomy of different organs. Analysis, visualization, and pre-planning using registered medical images enable the development of patient-specific models of the relevant anatomy. The researchers in
[24] created a cross-modality AR model to correct shifts in positioning using lesion holograms generated during a CT image reconstruction process. A US transducer obtains two-dimensional scans from the site of interest, which are merged with magnetic tracking data to produce a resultant 3D scan using a convolutional neural network (CNN) algorithm. This reduces the probability of false negatives appearing in the dataset, especially when mapping magnetically tracked ultrasound scans onto non-rigidly registered 3D scans for the detection of mismatches in deformation. Furthermore, this method is also used for needle guidance, as mentioned in
[23], to predict trans-operative pathways during navigation, as well as to detect areas of extraction for lesions in Unity3D via its collision avoidance system. Object-to-image registration is optimized by placing markers sufficiently far apart in a non-linear configuration such that their combined center coincides with the projection of the target in the workspace.
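The marker-based registration step above can be illustrated with a closed-form rigid fit over corresponding fiducials. The sketch below uses the standard Kabsch/SVD solution and reports the resulting fiducial registration error; the marker coordinates are hypothetical.

```python
import numpy as np

def rigid_register(src, dst):
    """Least-squares rigid transform (Kabsch/SVD) mapping src fiducials onto dst.
    src, dst: (N, 3) arrays of corresponding marker positions."""
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    H = (src - sc).T @ (dst - dc)            # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dc - R @ sc
    return R, t

# Hypothetical non-collinear markers, rotated 90 degrees about z and shifted
src = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
theta = np.pi / 2
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
dst = src @ Rz.T + np.array([10.0, 5.0, 0.0])

R, t = rigid_register(src, dst)
# Mean residual after alignment: the fiducial registration error (FRE)
fre = np.linalg.norm(src @ R.T + t - dst, axis=1).mean()
```

With noise-free, well-spread markers the FRE is essentially zero; in practice, fiducial localization errors propagate into a non-zero FRE, which is why marker spread and non-linearity matter.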
3.3. 3D Visualization using Direct Volume Rendering
The next steps in creating an AR model include image processing techniques such as direct volume rendering, which are used to remove outliers and delineators from raw DICOM data. A method proposed by Calhoun et al.
[25] involves voxel contrast adjustment and multimodal volume registration of the voxels in the CT images, replacing their existing density with a specific color and enhancing their contrast through thresholding performed by a transfer function. Manual intensity thresholding removes low-intensity artefacts and background noise from the image, leaving it ready for rigid attachment to an organ in the virtual scene. A transparency function is applied to filter out extreme contrasts in anatomical or pathological 3D landmarks, and any blob-like contours detected can be used in the initial registration of CT scans via techniques such as topological structural analysis. The deformation properties of the organs are modeled using software such as OsiriX 12.0, 3D Slicer 5.2.2, or VR-Render IRCAD2010, and the high contrast applied to the output images makes structures such as tumors, bones, and aneurysm-prone vessels more visible to the naked eye.
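A simplified stand-in for the thresholding transfer function described above can be written in a few lines of NumPy. The intensity window and color below are arbitrary choices for illustration, not values from the cited work.

```python
import numpy as np

def transfer_function(volume, lo, hi, color=(1.0, 0.2, 0.2)):
    """Map voxel intensities to RGBA: voxels inside the [lo, hi] window receive
    the given color with opacity ramped linearly across the window; everything
    below the window is fully transparent (low-intensity artefacts, background)."""
    alpha = np.clip((volume - lo) / float(hi - lo), 0.0, 1.0)
    alpha[volume < lo] = 0.0                  # manual intensity thresholding
    rgba = np.zeros(volume.shape + (4,))
    for c in range(3):
        rgba[..., c] = color[c]
    rgba[..., 3] = alpha
    return rgba

# Toy CT-like slab: background 0, soft tissue 100, bone 1000 (arbitrary units)
vol = np.array([[[0.0, 100.0, 1000.0]]])
rgba = transfer_function(vol, lo=300.0, hi=1000.0)
```

Here the window [300, 1000] keeps only the high-density voxel opaque, mimicking how thresholding makes bone or contrast-enhanced structures stand out in the rendered volume.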
3.4. Surface Rendering after Segmentation of Pre-Processed Data
Surface rendering techniques in
[26] depict the conversion of anatomical structures into a mesh for delineation and segmentation. Tamadazte et al.
[27] used the epipolar geometry principle to acquire images from the left and right stereovision cameras. The authors then used a point-correspondence approach to resample local data points and build a 3D triangular mesh from each point’s neighborhood. Current AR techniques are developed using the Unity3D engine and represent patient-specific anatomy with polygon meshes, typically triangles, for rapid processing. Furthermore, anatomical scenes detected using US transducers may be reconstructed using multi-view stereo (MVS), which analyzes pieces of tissue extracted from an area, remeshes them by warping the raw DICOM data, and displays them with appropriate textures using speeded-up robust features (SURF) methods
[28]. In most cases, segmentation may cause the loss of essential information in the original volume data. Therefore, in the quest to improve the quality of segmented images, Pandey et al.
[29] introduced a faster and more robust system for US to CT registration using shadow peak (SP) bone registration. In another study by Hacihaliloglu et al.
[30], similar bone surface segmentation techniques have been used to determine the phase symmetry of bone fractures.
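The epipolar stereo principle used for the left and right camera images in this subsection reduces, for a rectified pair, to depth recovery from disparity via z = f·B/d. The sketch below assumes hypothetical focal length and baseline values; real pipelines add matching, rectification, and outlier rejection.

```python
import numpy as np

def triangulate_depth(disparity, focal_px, baseline_m):
    """Depth from a rectified stereo pair under the pinhole model: z = f * B / d.
    Pixels with zero (or negative) disparity have no finite depth estimate."""
    d = np.asarray(disparity, dtype=float)
    z = np.full_like(d, np.inf)
    valid = d > 0
    z[valid] = focal_px * baseline_m / d[valid]
    return z

# Hypothetical rectified stereo endoscope: focal length 800 px, baseline 4 mm
disp = np.array([[8.0, 16.0, 0.0]])          # disparity in pixels
depth = triangulate_depth(disp, focal_px=800.0, baseline_m=0.004)
```

Each depth value, combined with the pixel coordinates, yields one 3D point; triangulating many such correspondences gives the point cloud that the mesh-building step then resamples into triangles.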
3.5. Path Computational Framework for Navigation and Planning
In studies by El-Hariri et al.
[31] and Hussain et al.
[32], the use of tracking mechanisms for marker-based biopsy guidance has been widely commended and applied in surgery, such as that of the middle ear and the kidneys. Fiducial cues are registered to different locations on the patient’s body, using the robust surface matching of sphere markers with the standard model, alongside laparoscopic video streams. Image-to-patient registration is performed by comparing the acquired live images to the available patient-to-image datasets, which is a crucial operation to eliminate errors during automatic correction, as explained by Wittman et al.
[33]. Leeming et al.
[34] used proximity queries to detect internal changes in anatomy during the manipulation of a continuum robot for surgery around a bone cavity. A covariance tree is used in this case, as a live modeling algorithm, to maintain an explicit safety margin between the surgical tools and the walls of an anatomical landmark during maneuvering. For cases of minimally invasive surgery, precautionary measures such as CO2 insufflation of the patient’s body and highlighting target locations with contrasting colors (for example, with ICG) facilitate the surgeon’s task, especially when performing cross-modality interventions with AR systems such as headsets. A study by Zhang et al.
[35] explained the tracking mechanisms used in US procedures for intraoperative use. The probe was equipped with a HoloLens-tracked reference frame, which contained multiple reflective spheres on an organ. In terms of biopsy needle tracking, Pratt et al.
[36] introduced the concept of registered stylus guidance in line with a simulated 3D replica reconstructed from CT images of the torso. During preoperative surgical navigation, a calibrated probe is used to collect data from internal organs, which are sent to the 3D Slicer software over OpenIGTLink while being combined with tracked data from the input instruments. The stylus tip is calibrated about a pivot and can be moved to various positions in the anatomical plane while being tracked relative to the probe reference frame using an iterative closest point (ICP)-based detection algorithm. Jiang et al.
[35] showed that the projector view for puncture surgery also improves the efficiency of perception-based navigation, using superimposed markers to align the needle tip with a magenta circle. The researchers in the above study generated an accurate AR positioning method using optimization techniques such as the Gauss–Newton method and Lie-algebra parameterization to produce an optimized projection matrix. Any projection is performed toward the target location on the body, hence reducing the probability of parallax errors, as shown by Wu et al.
[37].
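The ICP-based tracking mentioned above alternates nearest-neighbour matching with a closed-form rigid update. The following minimal point-to-point sketch is illustrative only; clinical trackers add robust outlier rejection and multi-resolution matching, and the point sets here are hypothetical.

```python
import numpy as np

def icp(src, dst, iters=20):
    """Minimal point-to-point ICP: match each source point to its nearest
    destination point, then solve the best rigid transform in closed form
    (Kabsch), and repeat. Returns the accumulated rotation and translation."""
    cur = src.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iters):
        # brute-force nearest-neighbour correspondences
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        matched = dst[np.argmin(d2, axis=1)]
        # closed-form rigid update on the current correspondences
        sc, dc = cur.mean(axis=0), matched.mean(axis=0)
        H = (cur - sc).T @ (matched - dc)
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = dc - R @ sc
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

# Hypothetical probe-frame points and the same points shifted slightly
src = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.], [1., 1., 0.]])
dst = src + np.array([0.1, 0.0, 0.0])
R, t = icp(src, dst)
```

Because ICP only refines an initial alignment, it is typically seeded by the marker-based registration described earlier; with a poor initial guess, the nearest-neighbour matches can lock onto the wrong correspondences.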