The application of artificial intelligence (AI) has become increasingly widespread in medicine and dentistry. It may contribute to improved quality of health care as diagnostic methods become more accurate and diagnostic errors become rarer in daily medical practice. This entry presents the accuracy of determining cephalometric landmarks using widely available commercial AI-based software and advanced AI algorithms. Most AI algorithms used for the automated positioning of landmarks on cephalometric radiographs had relatively high accuracy. At the same time, the effectiveness of AI in cephalometry varies depending on the algorithm or the application type, which has to be taken into account when interpreting the results.
1. Introduction
Artificial intelligence (AI) is the ability of a machine to imitate logical human behaviour, including complex activities [1]. The term was first introduced by John McCarthy during a conference at Dartmouth College in 1956 [2]. There are many forms of AI, most notably machine learning (ML), artificial neural networks (ANNs), convolutional neural networks (CNNs) and deep learning (DL) [3]. AI is used daily in internet search engines (e.g., Google) and online personal intelligent assistants (e.g., Siri), and it is rapidly evolving in other areas, including medicine. It may contribute to improved quality of health care through better diagnostic methods and fewer diagnostic errors in daily medical practice [4]. In medicine, it is primarily used in the radiological diagnosis of neoplastic lesions and in assessing histological specimens with regard to the advancement of pathological processes. In gastroenterology, it may assist in detecting and monitoring colon polyps and preventing intestinal cancers; in cardiology, it may assist in the interpretation of ECG results [5]. Medical radiology offers a wide range of AI applications as it relies on digitally coded images that can be easily converted into a computer language [6]. Interest in the use of AI has also increased considerably in many areas of dentistry in recent years [7]. AI algorithms can be useful in the diagnosis of dental caries and periapical or periodontal diseases, in the classification of maxillofacial cysts or tumours and in the localisation of cephalometric landmarks [8].
Analysis of lateral cephalometric radiographs is a method widely used in orthodontic diagnosis and treatment planning. It allows for assessing skeletal relations of the maxilla, mandible and cranial base in the sagittal and vertical dimensions as well as dental relations of the upper and lower teeth to the skeletal bases. It is also used to predict the growth direction in children and adolescents and to evaluate the results of orthodontic treatment. Cephalometric analysis is valuable when planning orthognathic surgery to correct skeletal malocclusions in adults [9]. At present, cephalometric points are identified by digitising them on a computer screen using software for digital cephalometric analysis. In recent years, AI has been employed to perform cephalometric analysis, which is intended to reduce clinicians’ workload and save time. Applications that use AI-based image analysis are becoming more common and available to clinicians.
2. AI Algorithms in Cephalometric Analysis
The accuracy of different types of AI algorithms varies, as demonstrated by the results published in the included studies (Table 1). The authors used different numbers of cephalograms for training, validation and testing, ranging from a few dozen to several thousand. The clinicians performing the manual annotation of landmarks also varied in number and in clinical experience in cephalometric tracing. Moon et al. (2020) concluded that the more data used to train the AI, the smaller the observed detection errors [10]. The development of reliable “gold standards” in the identification of cephalometric landmarks is important to reduce bias in the dataset used for AI training. The time needed for the AI analysis also varied between studies.
Table 1. Studies on the effectiveness of AI in the analysis of lateral cephalometric radiographs.
| No | Study | No. of Cephalograms | Patients’ Age (in Years) | Type of Algorithm | No. of Examiners | No. of Landmarks/Mean SDR | No. of Measurements/Mean Error | Time for Analysis (in Seconds) |
|---|---|---|---|---|---|---|---|---|
| 1 | Leonardi et al., 2009 [11] | 41 | 10–17 | Authors’ algorithm/CNN, Borland C++ | 5 | 10/n.s. | n.s. | 257 for 10 landmarks |
| 2 | Tanikawa et al., 2010 [12] | 859 (400: permanent dentition; 459: mixed dentition) | 5–60; mean age: 23.6 (permanent dentition); 8.9 (mixed dentition) | Authors’ algorithm/PPED system | 2 | 18/n.s. | n.s. | n.s. |
| 3 | Lindner et al., 2016 [13] | 400 | 7–76 | Authors’ algorithm/FALA system, RFRV-CLM | 2 | 19/84.7% in the range of 2 mm | 8/78.4 ± 2.61% | <3 |
| 4 | Park et al., 2019 [14] | 1311 (1028: training set; 283: testing set) | n.s. | Authors’ algorithm/YOLOv3 and SSD | 1 | 80/YOLOv3: 80.4% in the range of 2 mm | n.s. | 0.05 for YOLOv3; 2.89 for SSD |
| 5 | Hwang et al., 2020 [15] | 1311 (1028: training set; 283: testing set) | n.s. | Authors’ algorithm/YOLOv3 and manual analysis | 2 | 80/mean detection error: 1.46 ± 2.97 mm | n.s. | n.s. |
| 6 | Moon et al., 2020 [10] | 2400 (2200: training set; 200: test set) | n.s. | Authors’ algorithm/YOLOv3 | 2 | 80/n.s. | n.s. | n.s. |
| 7 | Lee et al., 2020 [16] | 400 | n.s. | Authors’ algorithm/Bayesian CNN | 2 | 19/82.11% in the range of 2 mm | n.s. | 512/38 for 19 landmarks (1 GPU/4 GPU) |
| 8 | Kunz et al., 2020 [17] | 1792 (96.6%: training set; 3.4%: validation set) | n.s. | Authors’ algorithm/CNN, Keras and Google TensorFlow | 12 | 18/n.s. | 12/<0.37° (angular measurements); <0.20 mm (metric measurements); <0.25% (proportional measurements) | n.s. |
| 9 | Kim et al., 2020 [18] | 2075 | n.s. | Authors’ algorithm/DL, SHG, TensorFlow, Python | 2 | 23/84.7% in the range of 2 mm | n.s. | 0.4 for 23 landmarks |
| 10 | Kim et al., 2021 [19] | 950 (800: training set; 100: validation set; 50: testing set) | n.s. | Authors’ algorithm/CNN | 2 | 13/64.3% in the range of 2 mm | n.s. | n.s. |
| 11 | Tanikawa et al., [20] | 1785 | 5.4–56.5; mean age: 12.2 | Authors’ algorithm/CNN-PC & CNN-PE, Adam | 2 | 26/success rates from 85% to 91% | n.s. | n.s. |
| 12 | Tanikawa et al., 2021 [21] | 2385 | 5.8–77.9 | Authors’ algorithm/CNN-PC&PE, Adam | 2 | 26/success rates from 85% to 90% | n.s. | n.s. |
| 13 | Yao et al., 2022 [22] | 512 (312: training set; 100: validation set; 100: testing set) | 9–40 | Authors’ algorithm/CNN, PyTorch | 2 | 37/45.95% in the range of 1 mm; 97.3% in the range of 2 mm | n.s. | 3 for 37 landmarks |
| 14 | Uğurlu, 2022 [23] | 1620 (1360: training set; 140: validation set; 180: testing set) | 9–20 | Authors’ algorithm/CNN/PyTorch, Python | 1 | 21/76.2% in the range of 2 mm | n.s. | n.s. |
| 15 | Popova et al., 2023 [24] | 890 (387: training set; 43: validation set; 460: testing set) | All ages | Authors’ algorithm/CNN/Keras and TensorFlow, Python | 3 | 16/84.73% in the range of 2 mm | n.s. | n.s. |
| 16 | Jeon et al., 2021 [25] | 35 | Mean age: 23.8 | Commercial analysis/CephX | 1 | 16 | 26/0.1–0.3° (angular measurements); 0.1–0.3% (linear measurements) | n.s. |
| 17 | Bulatova et al., 2021 [26] | 110 | n.s. | Commercial analysis/Ceppro | 2 | 16/±0.13 mm in the range of 2 mm for 75% of landmarks; mean difference 2.0 ± 3.0 in X plane and 2.1 ± 3.0 in Y plane | n.s. | n.s. |
| 18 | Ristau et al., 2022 [27] | 60 | Patients with a full complement of teeth | Commercial analysis/AudaxCeph | 2 | 13/max. mean error: <2.6 mm in X plane; <2.3 mm in Y plane | n.s. | n.s. |
| 19 | Kılınç et al., 2022 [28] | 110 | 10–24, mean age: 15.83 ± 2.85 | Commercial analysis/WebCeph and CephNinja | 1 | n.s. | 11/ICC from 0.170 to 0.884 | n.s. |
| 20 | Çoban et al., 2022 [29] | 105 | >15, mean age: 17.25 ± 2.85 | Commercial analyser/WebCeph | 1 | n.s. | 22/ICC from 0.418 to 0.959 | n.s. |
| 21 | Mahto et al., 2022 [30] | 30 | Mean age: 20.17 ± 6.72 | Commercial analyser/WebCeph | 1 | n.s. | 12/ICC from 0.795 to 0.966 | n.s. |
| 22 | Tsolakis et al., 2022 [31] | 100 | Mean age: 15.9 ± 4.8 | Commercial analyser/CS imaging V8 | 1 | 16 | 18/ICC from 0.70 to 0.92 | n.s. |
| 23 | Jiang et al., 2023 [32] | 9870 (8611: training set; 1000: validation set; 259: testing set) | 6–50 | Commercial analyser/CNN/CephNet | 5/100 | 28/66.15% in the range of 1 mm; 91.73% in the range of 2 mm | 11/89.33% | n.s. |
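The “Mean SDR” column of Table 1 and the 2 mm range quoted in many rows can be illustrated with a short Python sketch. It assumes that SDR denotes the share of landmarks whose radial (Euclidean) distance from the clinician-annotated reference falls within a chosen threshold, as the term is commonly used in landmark-detection studies; the included studies may define or average it differently. The function names, the coordinates and the pixel size of 0.1 mm are hypothetical and serve only as an illustration.

```python
import math


def radial_errors_mm(predicted, reference, mm_per_pixel):
    """Euclidean distance in mm between each predicted and reference landmark."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return [
        math.hypot(px - rx, py - ry) * mm_per_pixel
        for (px, py), (rx, ry) in zip(predicted, reference)
    ]


def success_detection_rate(errors_mm, threshold_mm=2.0):
    """Percentage of landmarks whose radial error is within threshold_mm."""
    if not errors_mm:
        raise ValueError("no errors supplied")
    hits = sum(1 for error in errors_mm if error <= threshold_mm)
    return 100.0 * hits / len(errors_mm)


# Hypothetical pixel coordinates for four landmarks (illustrative only,
# not taken from any of the included studies).
reference = [(100.0, 200.0), (150.0, 260.0), (300.0, 410.0), (220.0, 90.0)]
predicted = [(103.0, 204.0), (150.0, 261.0), (330.0, 410.0), (220.0, 90.0)]

# Assumed pixel size of 0.1 mm; real images need their own calibration.
errors = radial_errors_mm(predicted, reference, mm_per_pixel=0.1)
mean_radial_error = sum(errors) / len(errors)
sdr_2mm = success_detection_rate(errors, threshold_mm=2.0)

print(f"Mean radial error: {mean_radial_error:.2f} mm")
print(f"SDR within 2 mm: {sdr_2mm:.1f}%")
```

In this toy example, three of the four landmarks lie within 2 mm of the reference, so the rate is driven entirely by the one landmark that is far off. A mean error and a success rate therefore describe different aspects of accuracy and are best read together.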
At present, CNN-based algorithms developed by authors for the purpose of their own studies, as well as YOLOv3 and SSD algorithms, none of which are available to clinicians in daily practice, are more effective and accurate than widely available web-based software such as WebCeph, AudaxCeph or CS Imaging.
Most AI algorithms used for the automated tracing of landmarks on lateral cephalometric radiographs are characterised by relatively high accuracy. In most studies, the accepted error margin was 2 mm, and the mean percentage of landmarks detected within this margin was above 80%. However, from the clinical point of view, a localisation error of up to 2 mm can be acceptable for some, but not all, points traced in cephalometric analysis. The localisation of cephalometric points A and B in the horizontal plane is crucial for the determination of maxillary/mandibular relations in the sagittal plane. An inaccurate localisation of these points in the range of 1.5–2 mm would result in a considerable inaccuracy of many angular and linear measurements, especially as the error is propagated whenever the same landmark is used in several measurements. It also has to be stressed that manual cephalometric analysis of lateral head radiographs is a subjective examination, and the localisation of specific anthropometric points may differ between orthodontists. It has been demonstrated that the mean discrepancy between two experienced clinicians can also be up to 1.5 mm. Moreover, repeated tracing of landmarks on the same radiograph by one orthodontist may entail an error of approximately 1 mm between two measurements. Unlike manual tracing of cephalometric landmarks, an AI algorithm always marks the same landmark locations on the same image, which can be an additional asset for its use [15].
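The effect of a horizontal localisation error at point A on an angular measurement can be sketched with a small calculation. The example below computes the ANB angle, taken here as the signed angle at nasion (N) between the lines N–A and N–B, before and after point A is displaced 2 mm anteriorly. The coordinates are hypothetical values in millimetres, with x pointing anteriorly and y superiorly; the function name and sign convention are assumptions made for this illustration, not part of any cited software.

```python
import math


def anb_deg(n, a, b):
    """Signed angle at N between lines N-A and N-B, in degrees.

    Assumes x points anteriorly and y superiorly, so the result is
    positive when point A lies anterior to the line N-B.
    """
    na = (a[0] - n[0], a[1] - n[1])
    nb = (b[0] - n[0], b[1] - n[1])
    cross = nb[0] * na[1] - nb[1] * na[0]
    dot = nb[0] * na[0] + nb[1] * na[1]
    return math.degrees(math.atan2(cross, dot))


# Hypothetical landmark positions in mm (illustrative only).
n = (0.0, 0.0)
a = (0.0, -55.0)
b = (-5.0, -95.0)

anb_reference = anb_deg(n, a, b)

# Point A traced 2 mm too far anteriorly; N and B unchanged.
a_shifted = (a[0] + 2.0, a[1])
anb_shifted = anb_deg(n, a_shifted, b)
anb_change = anb_shifted - anb_reference

print(f"ANB (reference): {anb_reference:.2f} degrees")
print(f"ANB (A shifted 2 mm): {anb_shifted:.2f} degrees")
print(f"Change: {anb_change:.2f} degrees")
```

With these hypothetical coordinates, a 2 mm anterior shift of point A changes the ANB angle by roughly 2°. The exact figure depends on the distances between the landmarks, so it should be read as an order of magnitude rather than a general rule.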
The studies confirmed that most of the popular AI algorithms need only a few seconds to analyse a cephalometric radiograph. This is considerably shorter than the manual tracing of landmarks by clinicians. The most recent algorithms are evolving rapidly, and their computing capacity is increasing, which will probably improve their efficiency and reliability. It can be expected that, in the future, AI algorithms used for the automated localisation of landmarks may be more accurate than manual tracing. At the same time, the interpretation of cephalometric analysis by AI may be inferior to that performed by experienced orthodontists but can still be useful to less experienced specialists or even non-specialists. Further studies are needed to assess the reliability of AI-performed cephalometric analysis in planning, monitoring and analysing orthodontic treatment. The ease and short duration of cephalometric analysis via AI may be a significant factor in facilitating orthodontic treatment in clinical practice.
The use of AI algorithms in radiological diagnostics in orthodontics is not restricted to the automated detection of landmarks in cephalometric analysis. AI provides high accuracy in the assessment of cervical vertebral maturation on radiographs [33][34]. Another AI algorithm described in the literature is designed to predict the need for tooth extractions for orthodontic reasons [35].
The identification of cephalometric landmarks is challenging because the skull is a 3D object projected onto a 2D plane in a lateral head cephalogram. Overlapping structures make precise landmark identification more difficult, especially in patients with facial asymmetry. Moreover, improper head position during image acquisition and radiographic distortions may lead to errors in landmark identification by orthodontic professionals. The quality of the cephalograms used for landmark identification, the level of orthodontic training and experience in landmark identification, and the inter-observer variability between the clinicians who participate in the training and validation of the AI model are important factors and limitations of this diagnostic tool. Another source of AI inaccuracy may be an operator’s mistake when calibrating images for AI cephalometric analysis, as in the Ceppro software (Bulatova et al., 2021) [26]. Even a small error in using a digital ruler alters the number of pixels in 1 mm and can influence the coordinates of all points.
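The way a small calibration error carries over to every distance on the image can be illustrated with a few lines of Python. The sketch assumes that calibration consists of marking a ruler segment of known length and deriving the number of pixels per millimetre from it; the function names and all values are hypothetical and do not describe how Ceppro or any other product performs calibration.

```python
def pixels_per_mm(ruler_length_px, ruler_length_mm):
    """Image scale derived from a ruler segment of known length."""
    return ruler_length_px / ruler_length_mm


def to_mm(distance_px, px_per_mm):
    """Convert a distance measured in pixels to millimetres."""
    return distance_px / px_per_mm


# Hypothetical calibration: a 10 mm ruler segment spans 100 px on the image.
true_scale = pixels_per_mm(100.0, 10.0)

# The operator marks the ruler 3 px too long (assumed error for illustration).
miscalibrated_scale = pixels_per_mm(103.0, 10.0)

# A hypothetical distance between two landmarks, measured in pixels.
distance_px = 700.0
true_mm = to_mm(distance_px, true_scale)
measured_mm = to_mm(distance_px, miscalibrated_scale)
relative_error = (measured_mm - true_mm) / true_mm

print(f"True distance: {true_mm:.2f} mm")
print(f"Distance after miscalibration: {measured_mm:.2f} mm")
print(f"Relative error: {relative_error:.1%}")
```

Because the scale factor multiplies every coordinate, the same relative error applies to all linear distances on the image, so longer distances accumulate larger absolute errors. Angles are unaffected by a uniform change of scale, which suggests that this kind of mistake mainly distorts linear measurements.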
The advantage of an automated system for the identification of cephalometric landmarks over manual annotation is that it always gives the same result for the same image, whereas the accuracy of manual annotation varies widely with the level of training and experience [13]. With improved training and validation, AI algorithms may completely replace manual cephalometric tracing in the future.
The threats and challenges associated with the future use and development of AI in the analysis of patients’ medical records relate to data protection and to the application of the principles of medical ethics whenever computer software that simulates human brain activity is used. New legal regulations concerning the application of AI in the diagnostics and monitoring of orthodontic treatment may have to be proposed and implemented. Pre- and postgraduate curricula and clinical practice must be adjusted to technological advancements so that these can contribute to the optimisation of orthodontic treatment without adversely affecting its effectiveness.