Challenges in Computer Vision and Artificial Intelligence Surgery: Comparison
Please note this is a comparison between Version 1 by Andrew A. Gumbs and Version 2 by Beatrix Zheng.

Deep learning (DL)-based methods have significantly advanced research and development in robotic surgery across a range of tasks, including object detection and recognition, classification, navigation and the construction of 3D representations of the environment. However, a key requirement for the successful implementation of DL-based methods is the availability of large volumes of carefully annotated data. In the surgical environment, acquiring such data can be very expensive and labor-intensive. More importantly, in some scenarios it may even be impossible. One example is estimating depth information from endoscopic surgery images, an important task for facilitating navigation in a surgical setting with a robot or a semi-autonomous device. In the deep learning era, such a task may be very feasible if researchers can obtain large volumes of good-quality videos with the corresponding depth maps.

  • artificial intelligence surgery
  • autonomous actions
  • computer vision
  • deep learning
  • machine learning

1. Dexemes/Surgemes/Situation Awareness

Surgical maneuvers have been classified into smaller gestures termed surgemes. Using hidden Markov models (HMMs), data from these various movements during tele-manipulation robotic surgery can be registered and analyzed [1][55]. These HMMs have been found to be able to stratify surgeons according to where they sit on the learning curve by analyzing the precise tissue–tool interactions [2][56]. Additionally, analysis of torque and force data has been able to successfully classify skills when using complete robotic surgical systems [3][57]. Subsequent studies calculated kinematic measurements with more dimensions via linear discriminant analysis, and Bayes’ classifiers were used to increase to four dimensions so that the quality of surgical gesture segmentation could be maximized [4][58].
The accuracy of surgical maneuver segmentation was further enhanced by again using HMMs to analyze even smaller parts of surgical maneuvers called dexemes [5][6][59,60]. What this approach to the analysis of surgical gestures shows is that the computer can, in essence, see and process surgical movements that the human body and eye may not even notice or register. The ability of computers to analyze the movements of surgical robots, both those that are tele-manipulated and those that robotic devices create autonomously, may be as relevant as the analysis of more studied and traditional forms of CV.
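As an illustration of how an HMM can segment a recording into gestures, the following minimal sketch decodes the most likely gesture sequence with the Viterbi algorithm. It is not the method used in the cited studies: the two gesture names, the quantized "fast"/"slow" speed symbols and all probabilities are hypothetical values chosen only to make the example runnable.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for a non-empty observation list."""

    def log(p):
        # Log space avoids numerical underflow on long recordings.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: two gestures observed through a quantized tool speed.
states = ["reach", "grasp"]
start_p = {"reach": 0.8, "grasp": 0.2}
trans_p = {
    "reach": {"reach": 0.7, "grasp": 0.3},
    "grasp": {"reach": 0.2, "grasp": 0.8},
}
emit_p = {
    "reach": {"fast": 0.8, "slow": 0.2},
    "grasp": {"fast": 0.1, "slow": 0.9},
}

segmentation = viterbi(["fast", "fast", "slow", "slow"], states, start_p, trans_p, emit_p)
print(segmentation)  # ['reach', 'reach', 'grasp', 'grasp']
```

In practice the observations would be continuous kinematic, force or torque measurements rather than two symbols, and the model parameters would be learned from annotated recordings.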
For instance, an anastomosis of the small intestine would be divided into several steps or surgemes: specifically, the placement of the two ends of the intestine next to each other, the placement of sutures to fix the orientation of the two ends together, the creation of the two enterotomies, the creation of the anastomosis with linear staplers or with sutures and, if indicated, the closure of the mesenteric defect. Dexemes would be the different hand gestures of the tele-manipulator that are needed to perform each one of these various steps/surgemes. A sum of the different dexemes would make one surgeme, and the sum of a set of surgemes would make up an entire procedure, ranging from cholecystectomy or minor liver resection all the way up to more complicated procedures such as a total gastrectomy or major hepatectomy [7][61].
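The procedure, surgeme and dexeme hierarchy described above can be sketched as a simple nested data structure. The class names, the dexeme names and the durations below are hypothetical and serve only to show how dexemes sum to a surgeme and surgemes sum to a procedure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dexeme:
    """Smallest unit: a single hand gesture of the tele-manipulator."""
    name: str
    duration_s: float


@dataclass
class Surgeme:
    """A surgical step made up of an ordered list of dexemes."""
    name: str
    dexemes: List[Dexeme] = field(default_factory=list)

    @property
    def duration_s(self) -> float:
        return sum(d.duration_s for d in self.dexemes)


@dataclass
class Procedure:
    """An entire operation made up of an ordered list of surgemes."""
    name: str
    surgemes: List[Surgeme] = field(default_factory=list)

    @property
    def duration_s(self) -> float:
        return sum(s.duration_s for s in self.surgemes)

    def dexeme_count(self) -> int:
        return sum(len(s.dexemes) for s in self.surgemes)


# Hypothetical fragment of a small-intestine anastomosis.
anastomosis = Procedure(
    name="small-intestine anastomosis",
    surgemes=[
        Surgeme("align bowel ends", [Dexeme("grasp", 4.0), Dexeme("translate", 6.0)]),
        Surgeme("place stay suture", [Dexeme("drive needle", 5.0)]),
    ],
)

print(anastomosis.dexeme_count(), anastomosis.duration_s)  # 3 15.0
```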
Other examples of AI surgemes/dexemes are robots that can autonomously create cochleostomies and others that can carry out knot tying [8][9][62,63]. Another interesting effort to create an autonomous action was the development of an independently functioning robot that was created to be able to autonomously accomplish a peg and ring task via the analysis of motion trajectories that can adapt in real-time [10][64]. To accomplish this task, the research team used the da Vinci Research Kit and dynamic movement primitives and answer set algorithms to develop a working framework. This framework works by imitating the movement of each dexeme, but with the added enhancement of situation awareness that enabled the robot to constantly adapt to a changing environment with new obstacles to overcome. The authors believe that this may be the first time that situation awareness was combined with an autonomous robotic action with a documented ability to correct errors and recover in real-time in a surgical environment [10][64].

2. Phase Recognition

Phase recognition can help hospitals manage workflows; for example, other healthcare providers can be alerted to ongoing progress in the endoscopy suite or operating room. It can also help alert providers during interventions to abnormalities such as hemorrhage or missed injuries. Theoretically, phase recognition may be easier to achieve than some of the other tasks mentioned above. This is because it may be easier to create algorithms that teach the computer to identify steps during cholecystectomy, such as clipping of the cystic duct, and to differentiate them from visually different steps, such as dissecting the gallbladder off the liver bed. Conversely, differentiating more similar steps, such as cystic artery clipping from cystic duct clipping, may prove more challenging, especially when anatomic variations are present. Furthermore, more complex cases with more anatomic variability, more locations in the abdomen and more steps will stress the system even more. For example, during total colectomy, all four quadrants of the abdomen are involved, and during pancreatic head resection, the surgeon may go back and forth between operative fields and, by definition, between operative phases, rendering useful autonomous phase recognition a difficult task indeed.
A review article on ML published in the Annals of Surgery in 2021 noted a significant increase in publications on AI, with 35 articles dedicated to resolving the task of phase recognition [11][65]. Research in this field is not limited to HMMs; there is also considerable interest in the viability of artificial neural networks for creating reliable and effective phase recognition. Most of the datasets analyzed come from feature learning on videos of surgical procedures, with instrument utilization annotated manually by a trained expert, often a surgeon in training or a fully trained surgeon [11][65].
As mentioned above, large datasets are used for the detection of objects, but they are also fundamental to the development of phase recognition models. Some datasets of minimally invasive cholecystectomy are publicly available for training and testing, including the EndoVis workflow challenge dataset, Cholec80 and MICCAI 2016 [11][65]. In summary, standardized procedures such as cholecystectomy and sleeve gastrectomy are ideal minimally invasive surgeries to study when developing phase recognition AI architectures, but less standardized operations such as hepato-pancreato-biliary or colorectal surgery will require much more sophisticated algorithms. Currently, the main obstacles are the relative dearth of usable surgical videos of these procedures and the fact that these procedures are significantly longer than more routine procedures such as gallbladder removal and restrictive bariatric surgery.
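To make the workflow use case in this section more concrete, the sketch below turns per-frame phase labels into contiguous time segments of the kind that could drive a progress alert. It assumes the frame labels already exist (for example, as the output of some classifier); the function name, the phase names and the frame rate are hypothetical.

```python
from itertools import groupby


def phases_to_segments(frame_labels, fps=1.0):
    """Collapse per-frame phase labels into (phase, start_s, end_s) segments."""
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start / fps, (start + length) / fps))
        start += length
    return segments


# Hypothetical labels at one frame per second; the surgeon returns to dissection
# after clipping, so the same phase appears in two separate segments.
labels = ["dissection"] * 3 + ["clipping"] * 2 + ["dissection"]
segments = phases_to_segments(labels, fps=1.0)
print(segments)
# [('dissection', 0.0, 3.0), ('clipping', 3.0, 5.0), ('dissection', 5.0, 6.0)]
```

A repeated phase is reported as a new segment rather than merged with the earlier one, which reflects the back-and-forth between operative phases that makes longer, less standardized procedures harder to handle.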

3. Robotic-Assisted Surgery and Autonomous Actions

The daunting task of realizing more autonomous actions in robotic-assisted surgery is highlighted by the fact that the first complete robotic surgical system on the market (da Vinci Surgical System, Intuitive Surgical, Sunnyvale, CA, USA) was initially conceived as a tele-manipulator that was supposed to allow surgeons to remotely operate on soldiers injured during conflicts via open surgical techniques. At the same time, as the research team began to realize the complexities involved in making robotic tele-manipulation viable, safe and effective, the laparoscopic revolution was in full swing and the company completely shifted focus to become a tool for minimally invasive surgery [12][66]. To date, AI in so-called complete robotic surgical systems is limited to simulator training and evaluating surgical skills, and the only true differences to standard laparoscopy are the added degrees of freedom and ability to intermittently control a third arm [13][51]. Both of these historical advantages are now also available during so-called traditional laparoscopy [14][3].
Just as with the human body, CV is not only dependent on visual cues, but also on information about position and proprioception. Similarly, one of the most useful forms of data for analyzing surgical movements, skills and tasks is motion analysis. Before the era of DL and HMMs, surgical skill could only be evaluated by real-time observation of surgeries or by reviewing recorded videos of surgical procedures [15][67]. Due to the vast amount of data created during procedures that can take several hours, a useful way to hasten the analysis and evaluation of surgical videos is the utilization of time action analysis. This technique only analyzes data from non-continuous fixed intervals so that analysis can be carried out more quickly. It is likely that this type of solution may not be safe for the creation of more complex autonomous actions in surgery such as dissection, but it may be useful for simpler tasks such as the placement of surgical clips. Because of motion priors analysis with HMMs, algorithms have been developed that can automatically calculate gesture and skill metrics without using any visual information [13][51]. The next step will be to combine data from non-visual time action analysis with visual data, but the massive amount of data that has to be analyzed is currently a significant barrier.
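The trade-off behind analyzing only fixed, non-continuous intervals can be illustrated with a small sketch. It subsamples a recorded tool-tip trajectory and compares a simple motion metric (path length) on the full and the subsampled data. The function names and the zigzag trajectory are hypothetical; path length is used here only as an example metric.

```python
import math


def sample_fixed_intervals(samples, step):
    """Keep every `step`-th sample, i.e. analyze only fixed, non-continuous intervals."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return samples[::step]


def path_length(points):
    """Sum of straight-line distances between consecutive tool-tip positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical tool-tip positions (x, y, z) following a zigzag.
trajectory = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (3, 1, 0), (4, 0, 0)]

full_length = path_length(trajectory)
sparse_length = path_length(sample_fixed_intervals(trajectory, step=2))
print(round(full_length, 3), sparse_length)  # 5.657 4.0
```

Halving the data here hides the zigzag entirely, so the subsampled path looks like a straight line. This is one way to see why sparse sampling may be acceptable for simple tasks but risky for fine, continuous actions such as dissection.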
To date, one of the most exciting examples of autonomous actions in surgery is the autonomous creation of a gastrointestinal anastomosis [16][68]. Accomplished via the Smart Tissue Autonomous Robot (STAR) in an animal model, this robot was able to perform an autonomous, but supervised surgeme via an open approach. To do this, the computer was able to create useable CV by combining a plenoptic three-dimensional tracking system with near-infrared fluorescence (NIRF) imaging. The fact that this was successfully carried out is especially impressive because this task was implemented on soft tissue, which is flexible and malleable. The research team also found that the robot had better skill metrics when compared to surgeons with a minimum of 7 years of experience.
The STAR has a vision system, surgeon interface, robotic arm and a force sensor. To generate a working CV, the cut intestine has to first be injected with biocompatible NIRF markers until a “point cloud” is fashioned around the edges of the cut porcine intestine so that the robot can know where to place the sutures. It is clear that even though this robot can carry out some actions autonomously, it is still completely reliant on a human placing fluorescent markers for working CV to be a reality. Nevertheless, this shows that once a robot can see sufficiently, even complex surgical tasks such as sutured gastrointestinal anastomoses can be carried out autonomously. It should be noted that although the robot’s performance was deemed to be superior to humans, this was based on movement criteria and not on clinical criteria such as stenosis or anastomotic leak rates. This emphasizes the dangers of evaluating autonomous actions in surgery on short-term mathematical movement criteria alone.
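As a rough illustration of how marker positions along a cut edge could be turned into suture targets, the sketch below places a chosen number of points at equal arc-length spacing along the line through an ordered list of markers. This is not the STAR planning algorithm, which is not described here; the function name, the two-dimensional coordinates and the equal-spacing rule are all assumptions made for the example.

```python
import math


def evenly_spaced_points(markers, n_points):
    """Place n_points at equal arc-length spacing along the open polyline through markers."""
    if len(markers) < 2 or n_points < 2:
        raise ValueError("need at least two markers and two target points")
    segments = list(zip(markers, markers[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    total = sum(lengths)
    points = []
    for i in range(n_points):
        # Distance from the first marker at which the i-th target should sit.
        remaining = total * i / (n_points - 1)
        for index, ((a, b), length) in enumerate(zip(segments, lengths)):
            is_last = index == len(segments) - 1
            if remaining <= length or is_last:
                # Clamping guards against floating-point overshoot on the final segment.
                t = min(remaining / length, 1.0) if length > 0 else 0.0
                points.append(tuple(p + t * (q - p) for p, q in zip(a, b)))
                break
            remaining -= length
    return points


# Hypothetical marker positions (in mm) along an L-shaped stretch of tissue edge.
markers = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]
targets = evenly_spaced_points(markers, n_points=5)
print(targets)
# [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (4.0, 4.0)]
```

A real system would also have to track the markers as the soft tissue deforms, which is the part of the problem this sketch leaves out.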
The aspiration of blood is a crucial task during surgery and the CV needed to accomplish this task autonomously is surprisingly difficult [17][69]. The initial obstacle that researchers had to overcome was the reliable and accurate detection of the blood contour. After this was carried out, a mask R-CNN method was used to create a robotic prototype that was able to aspirate blood. Known as the Blood Removal Robot (BRR), this system needs a robotic arm, two cameras, an aspirator, a suction tip and tubing. The BRR has been used in an animal model created to simulate skin and then craniotomy. The best trajectory for the aspiration is calculated using a CNN; however, the robot does not yet have the ability to “see” any other instruments in the surgical field and cannot yet take into account the existence of more than one area of bleeding [17][69].
Dissection around scar tissue created by inflammatory tissue from benign or malignant disease, infection or previous disease is the most difficult task for surgeons and will certainly be the most daunting task for AIS. Companies have already begun to gather as much data as possible on the surgical gestures and movements created by the arms of complete robotic surgical systems [18][70]. It is important to know that every time a procedure is carried out with the da Vinci robotic system, the motion data of the robotic arms are being recorded and transmitted to the manufacturer. Engineers and computer scientists hope that the sheer quantity of these data will permit the generation of functioning algorithms that will ultimately result in more autonomous actions by surgical robots [19][71]. Although it is tempting to wonder whether or not this is possible, perhaps this is not the most important question. Instead, maybe researchers should be asking themselves which is the best way forward; more specifically, are complete robotic surgical systems with tele-manipulation really the best way forward? Maybe the surgeon needs to be kept in the loop by remaining at the bedside, with more handheld collaborative robots being developed [20][21][6,72].

4. Haptics vs. Audio-Haptics

AI has become a reality thanks to advances in ML and DL, the latter being a kind of ML approach that uses multi-layer neural networks. Both approaches can be used in supervised and unsupervised tasks. CV, which is the field of AI that enables computers to interpret images, has in turn become a reality because of advances in DL. However, similar to the human body, the concept of sight and the interpretation of digitized visual inputs by computers do not fully define all the ways that a computer can interpret its surroundings, gather information and act effectively and safely on that information. In addition to analyzing pixel data, computers can also incorporate non-visual data such as motion analysis and instrument priors. These types of additional data highlight the potential significance of haptics in the future development of more autonomous actions in surgery. An insightful description of the utilization of random forests to track microsurgical instruments during retinal surgery was published in 2018 in a book entitled Computer Vision for Assistive Healthcare [22][73].
The analysis of sound, or audio-haptics, may also be an interesting tool in the quest for more autonomy. Computer scientists and engineers have been studying sound waves to see if algorithms can be developed that will give surgical robots even more information [23][24][25][74,75,76]. Because sound waves may require less memory, it is hoped that their analysis will give computers more sensitive ways to obtain the relevant information with less data crunching. This may give the robot more useful information with less time lost during the analysis, resulting in more AI that can actually be used in real time. Additionally, these types of data could give pixel data another dimension and theoretically improve the ability of computers and robots to safely perform autonomous tasks [25][26][76,77]. Alternative techniques devised to allow for the differentiation of tissues during surgery involve the utilization of electrical bio-impedance sensing and the analysis of force feedback, but these are still in the prototype phase [27][78].