1. Introduction
Clear aligner therapy (CAT) has gained notable popularity among adult patients as of 2022, with conventional orthodontic appliance treatment having “gone out with the ark”. Invisalign® (Align Technology, Santa Clara, CA, USA) treatment has been an esthetic alternative to the conventional orthodontic appliance in treating minor-to-moderate malocclusions since 2007 [1]. As of 2022, there are 13.4 million patients, 1.2 billion total aligners shipped, and about 229 thousand Invisalign®-trained doctors across the globe (Align Technology, Q2 Corporate Fact Sheet). Despite its prevalence, there is not enough evidence on the predictability of ClinCheck®, partly because major advances across the generations of Invisalign® [2] have rendered previous research findings obsolete. This obsolescence has been noted in recent systematic reviews, which state that no conclusion can be drawn from the present literature regarding the predictability of tooth movement and the accuracy of ClinCheck® [3,4,5,6].
In terms of the accuracy of sagittal movements, overjet and interincisal angle have been assessed. In 2012, Krieger et al. [7] conducted a retrospective study comparing pretreatment and post-treatment models with ClinCheck® measurements of 50 patients between the ages of 15 and 63 years. They found that the mean difference in overjet between the achieved and predicted outcome was 0.34 ± 0.54 mm. In terms of the interincisal angle, in 2014, Simon et al. [8] compared pretreatment and post-treatment models and ClinCheck® measurements of 10 patients between 2011 and 2012. They found that the mean accuracy of incisor torque with an attachment was 49.1%, while that of torque with a power ridge was 51.5%.
Upon assessment of the transverse dimension, Houle et al. [9], in 2017, reported an 88.7% accuracy of upper intercanine width when measured from cusp tip to cusp tip, a 100% accuracy of lower intercanine width, a 76.6% accuracy of upper intermolar width, and a 100% accuracy of lower intermolar width. When measured from cervical margin to cervical margin, the results were 67.8% accuracy for upper intercanine width, 61% for lower intercanine width, 52.9% for upper intermolar width, and 70.7% for lower intermolar width. The method of assessment included pretreatment and post-treatment digital models of 64 adult Caucasian patients. They concluded that ClinCheck® predictions overestimated bodily movement and that the movement observed was dental tipping more than bodily movement. Moreover, predictions tended to worsen the more posterior the measurement was taken. More recently, Tien et al. [10], in 2022, reported a mean accuracy of 72.2% for upper intercanine width, 82.3% for lower intercanine width, 63.5% for upper intermolar width, and 79.8% for lower intermolar width. The method of outcome assessment included Invisalign’s® arch width table, centroids, and software calipers for 57 adult patients.
In the vertical dimension, Krieger et al. [7] also reported a mean difference of 0.71 ± 0.87 mm for overbite when comparing pretreatment and post-treatment models and ClinCheck® measurements of 50 patients between the ages of 15 and 63 years. In the same study, Krieger et al. [11] reported a decrease in Little’s Irregularity Index between pretreatment and post-treatment casts in the maxillary (−3.8 mm) and mandibular (−5 mm) casts.
While recent studies evaluated the accuracy of ClinCheck® models provided by Align Technology®, it is important to note that other measurement methods have also been used to evaluate the achieved and predicted results of Invisalign® treatment. For example, some studies have utilized cephalometric radiographs to measure the changes in tooth position before and after treatment [12]. Cephalometric analysis allows for the evaluation of skeletal and dental changes, which may not be fully captured in digital models. Additionally, subjective assessments, such as patient satisfaction and orthodontist evaluation of treatment outcomes, could provide further insight into the effectiveness of Invisalign® treatment [13]. Some studies have therefore combined measurement methods to provide a more comprehensive evaluation of Invisalign® treatment outcomes.
Recent generational enhancements have been made to the aligner material. According to Moshiri et al. [2], G5 improved the predictability of deep bite correction through the adjustment of attachment shape for retention on the upper premolars, the introduction of pressure points on the lingual side of upper and lower incisors, and bite ramps on the lingual side of upper anterior teeth. G8 claims to increase the predictability of deep bite treatment and decrease undesired crown tipping when expanding the posterior arch. It is important to note that most of the previous studies were conducted before the rollout of G8, which renders them obsolete. In addition, predicted tooth movements do not fully correspond with the achieved tooth movements, which warrants further studies to improve the predictability of treatment. Orthodontists may want to exaggerate the digital movements to achieve the desired movement of teeth.
The aim of this study was to assess the reliability of ClinCheck® accuracy before and after Invisalign® treatment.
2. Materials and Methods
The primary objective was to evaluate the accuracy of ClinCheck® in the sagittal, vertical, transverse, and arch length dimensions before and after Invisalign® treatment using iTero®. Secondary objectives included comparing the reliability of ClinCheck® across centers and comparing the experience of the Invisalign® providers. The null hypothesis was that there was no difference between the predicted movement and the actual movement achieved with respect to the sagittal, vertical, transverse, and arch length dimensions. The alternative hypothesis was that there was a difference between the predicted and actual movements. This retrospective study was conducted on available records of patients who underwent orthodontic treatment using Invisalign® clear aligners. This study utilized judgmental sampling, meaning that the samples were selected at our discretion. The data were collected from private practices where treatment plans were executed by multiple orthodontists, all of whom are Invisalign® Diamond providers or above. Each patient underwent non-extraction Invisalign® treatment with its variations and refinements, meaning that each had at least two approved ClinChecks® submitted. Ethical approval was obtained from the Research Ethics Committee, Faculty of Dentistry, King Abdulaziz University (#260-08-21).
The inclusion criteria were healthy adult patients between 18 and 60 years of age, patients treated exclusively with Invisalign®, non-extraction cases, patients who had at least two approved ClinChecks® with complete orthodontic records, and class I malocclusion with crowding. The exclusion criteria were patients who had systemic diseases or syndromes, extraction cases, cases treated before G8, and cases with spacing. All data were obtained from the treatment evaluation tool in iTero®, including overjet, interincisal angle, overbite, intercanine width, intermolar width, and crowding. An example of the parameters evaluated can be seen in Figure 1. Initial refers to the initial measurement taken at the time of the scan, current refers to the achieved movement, and final is the predicted measurement.
Statistical Analysis: The ClinCheck® models of the initial, achieved, and predicted outcome were obtained from Align Technology® and the values were compared using Pearson correlation (p < 0.05) [14] to determine whether the predicted values were correlated with the achieved values. Statistical significance was determined based on the criterion that “if the confidence interval does not contain the null value, it is considered to be a statistically significant finding”; this method is equivalent to the paired t-test [15]. A difference was considered clinically significant when the effect was large, that is, when the two-sided 95% confidence interval lay entirely above the clinical threshold [16]. ANOVA was used to compare the different centers.
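The two comparisons described above can be illustrated with a minimal sketch. It is not the authors' analysis script: the function names, the sample values, and the use of the large-sample normal quantile (z = 1.96) in place of a t quantile are all assumptions made for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_diff_ci(achieved, predicted, z=1.96):
    """Mean paired difference and its approximate two-sided 95% CI.

    Assumption: the normal quantile z = 1.96 is used, which is only
    reasonable for large samples; small samples call for a t quantile.
    """
    diffs = [a - p for a, p in zip(achieved, predicted)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    half_width = z * sd / math.sqrt(n)
    return mean, (mean - half_width, mean + half_width)

# Hypothetical overbite values in mm (illustrative, not study data).
predicted = [2.0, 2.5, 3.0, 3.5, 4.0]
achieved = [2.4, 2.6, 3.5, 3.6, 4.4]

r = pearson_r(achieved, predicted)
mean, ci = mean_diff_ci(achieved, predicted)

# Statistically significant if the CI excludes the null value 0.
statistically_significant = ci[0] > 0 or ci[1] < 0
```

In this sketch, a high correlation and a confidence interval that excludes zero can coexist: the predicted and achieved values move together, yet the achieved value is systematically offset from the prediction.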
3. Results
The total sample comprised 206 patients from three different practices. The mean age was 21.24 ± 10.07 years. There were 75 male subjects (36.4%) and 131 female subjects (63.6%). Practice 1 had 70 subjects (34%), practice 2 had 36 subjects (17.5%), and practice 3 had 100 subjects (48.5%). The strength of the correlation coefficient was interpreted according to Buschang et al. [17].
In terms of overjet, the data show that there was a weak but significant correlation between the achieved stage and the predicted stage proposed by ClinCheck® (p < 0.001) with an accuracy of 75.77%. On average, there was a 0.20 ± 1.11 mm difference between the achieved and predicted ClinCheck® values. The interincisal angle data show that there was a strong and significant correlation between the achieved stage and the predicted stage proposed by ClinCheck® (p < 0.001) with an accuracy of 96.23%. On average, there was a 2.56 ± 6.16° difference between the achieved and predicted ClinCheck® values.
The overbite data show that there was a moderate but significant correlation between the achieved stage and the predicted stage proposed by ClinCheck® (p < 0.001) with an accuracy of 60.04%. On average, there was a 0.77 ± 1.29 mm difference between the achieved and predicted ClinCheck® values.
The upper intercanine width data show that there was a very strong and significant correlation between the achieved stage and the predicted stage proposed by ClinCheck® (p < 0.001) and an accuracy of 97.97%. On average, there was a difference of 0.53 ± 1.05 mm. The lower intercanine width data show that there was a very strong and significant correlation between the achieved stage and the predicted stage proposed by ClinCheck® (p < 0.001) and an accuracy of 97.67%. On average, there was a difference of 0.23 ± 0.98 mm. The upper intermolar width data show that there was a strong and significant correlation between the achieved stage and the predicted stage proposed by ClinCheck® (p < 0.001) and an accuracy of 97.58%. On average, there was a difference of 0.95 ± 2.05 mm between the achieved and predicted final ClinCheck® values. The lower intermolar width data show that there was a strong and significant correlation between the achieved stage and the predicted stage proposed by ClinCheck® (p < 0.001) and an accuracy of 97.72%. On average, there was a difference of 0.49 ± 2.08 mm.
Upper crowding showed a very weak but significant correlation between the achieved stage and the predicted stage proposed by ClinCheck® (p = 0.03) and an accuracy of 38.79%. On average, there was a difference of 1.01 ± 1.55 mm. Lower crowding showed a weak but significant correlation between the achieved stage and the predicted stage proposed by ClinCheck® (p < 0.001) and an accuracy of 30.02%. On average, there was a difference of 1.66 ± 2.51 mm between the achieved and predicted final values (Table 1).
Based on the 95% confidence interval analysis of the difference values, overbite, upper intermolar width, upper crowding, and lower crowding exhibited statistically and clinically significant differences. This highlights the need for clinical intervention and attention to these parameters. When the accuracies of the different practices were compared, no difference was found between the centers in terms of the accuracy of treatment provided (Table 2).
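The confidence interval reasoning behind this classification can be approximated from the summary values reported above. The following is a sketch under stated assumptions, not the authors' computation: it assumes that each "±" value is a standard deviation, that all 206 patients contributed to every parameter, that a normal quantile of 1.96 is adequate at this sample size, and that the clinical thresholds are 0.5 mm and 2° as stated in the Limitations. The names `REPORTED`, `ci95`, and `classify` are hypothetical.

```python
import math

N = 206   # total sample size reported in the Results
Z = 1.96  # assumption: large-sample normal quantile for a 95% CI

# parameter: (mean difference, SD, clinical threshold); mm, or degrees for the angle
REPORTED = {
    "overjet": (0.20, 1.11, 0.5),
    "interincisal angle": (2.56, 6.16, 2.0),
    "overbite": (0.77, 1.29, 0.5),
    "upper intercanine width": (0.53, 1.05, 0.5),
    "lower intercanine width": (0.23, 0.98, 0.5),
    "upper intermolar width": (0.95, 2.05, 0.5),
    "lower intermolar width": (0.49, 2.08, 0.5),
    "upper crowding": (1.01, 1.55, 0.5),
    "lower crowding": (1.66, 2.51, 0.5),
}

def ci95(mean, sd, n=N, z=Z):
    """Approximate two-sided 95% CI of a mean from summary statistics."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

def classify(mean, sd, threshold):
    """Return (statistically significant, clinically significant).

    Statistical: the CI excludes the null value 0.
    Clinical: the CI lies entirely above the clinical threshold.
    """
    low, high = ci95(mean, sd)
    return (low > 0 or high < 0), (low > threshold)

clinically_significant = [
    name for name, (mean, sd, threshold) in REPORTED.items()
    if classify(mean, sd, threshold)[1]
]
```

Under these assumptions, the parameters whose intervals lie entirely above the threshold are overbite, upper intermolar width, upper crowding, and lower crowding, which agrees with the classification reported above. Upper intercanine width (0.53 ± 1.05 mm) has a mean above 0.5 mm, but its interval extends below the threshold, so it is not classified as clinically significant.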
4. Discussion
The goal of this research was to evaluate the reliability of ClinCheck® accuracy of Invisalign® therapy in adults with class I malocclusion and non-extraction cases in all four dimensions. To the best of our knowledge, no previous study has evaluated the reliability of ClinCheck® using only the iTero® evaluation tool. This study aimed to compare predicted and achieved movements of teeth with respect to the sagittal, vertical, transverse, and arch length dimensions using only ClinCheck® to compare before and after.
Previous studies measured the accuracy of ClinCheck® by using various research tools, such as the American Board of Orthodontics Objective Grading System (ABO OGS) by Buschang et al. [18] and Tooth Measure® by Solano-Mendoza et al., Krieger et al., and Kravitz et al. [7,19,20]. Although these research tools can provide a method of accuracy assessment, a simple and convenient alternative is to use ClinCheck® itself to compare initial, current, and predicted movements before and after treatment. It is our belief that including more than one software program would complicate the results and introduce more room for error; however, the algorithm and validity of the ClinCheck® measurements are unknown and have yet to be released by Align Technology® [10].
In terms of demographic data, we had a total sample size of 206 patients, which is the largest in the literature thus far. Krieger et al. [7] had a sample size of 50, Solano-Mendoza et al. [19] had a sample size of 116, Buschang et al. [18] had a sample size of 27, Kravitz et al. [20] had a total of 37, Houle et al. [9] had a total of 64, and Tien et al. [10] had a total of 57. Moreover, we included samples from three different practices, which offered us diversity. Most of the studies mentioned above were single-center studies, which increases the risk of bias [10,18,19]. Few studies included more than two centers [9,21].
Upon assessment of the sagittal dimension, the overjet and interincisal angle were assessed. In terms of overjet, we reported a mean difference of 0.20 ± 1.11 mm, which is in accordance with Krieger et al. [7], who reported a mean difference of 0.34 ± 0.54 mm; however, it is important to note that they included 15-year-old patients in their study, which means growth could have been a factor in the decrease in overjet. With regard to the interincisal angle, we reported a 96% accuracy, whereas Simon et al. [8] reported only about 50%. They had a sample size of only 10 patients and were using an older generation of Invisalign®. When evaluating the vertical dimension, we reported a mean difference in overbite of 0.77 ± 1.29 mm, whereas Krieger et al. [7] reported a mean difference of 0.71 ± 0.87 mm. Although they used an older generation of Invisalign®, the differences were negligible.
Regarding the transverse dimension, we reported an upper intercanine width accuracy of 97.97%, a lower intercanine width accuracy of 97.67%, an upper intermolar width accuracy of 97.58%, and a lower intermolar width accuracy of 97.72%, whereas Tien et al. [10] reported 72.2%, 82.3%, 63.5%, and 79.8%, respectively. They mentioned that ClinCheck® overestimates bodily movement and that what is achieved is mostly dental tipping rather than actual bodily movement. Another paper by Houle et al. [9] showed an upper intercanine width accuracy of 67.8% from cervical margin to cervical margin and 88.7% from cusp tip to cusp tip. They also showed a lower intercanine width accuracy of 61% and 100%, an upper intermolar width accuracy of 52.9% and 100%, and a lower intermolar width accuracy of 70.7% and 100%, respectively. They concluded that ClinCheck® predictions overestimated bodily movement, while the observed movement was more dental tipping than bodily movement. They found that predictions tended to worsen the more posterior the measurement was taken. It is possible that the conflicting results are due to the updated aligner material SmartTrack (LE30) or the updated SmartForce features of G8 compared to the previous studies. The previous studies also had smaller sample sizes. However, it should be noted that multiple potential factors, such as the patient’s age, the initial malocclusion, the treatment plan design, and patient compliance, can influence the accuracy of tooth movement, and accuracy alone is insufficient to determine clinical significance. Further studies are needed that use the difference between predicted and achieved values as the evaluation indicator and control for confounding factors.
With respect to arch length, we reported a mean difference of 1.01 ± 1.55 mm for upper crowding and a mean difference of 1.66 ± 2.51 mm for lower crowding; however, Krieger et al. [7] reported a mean difference of 3.8 mm in the upper arch and a 5 mm difference in the lower arch. They concluded that moderate-to-severe crowding could be resolved by protruding the anterior teeth. The previous study did not consider posterior teeth, which could explain why we had conflicting results.
It is important to note that the method used by Invisalign® and Kravitz et al. to quantify the accuracy of pretreatment and post-treatment orthodontic movements has been criticized for not considering the geometric and biomechanical rules that govern tooth movement [22]. In a recent paper, Pandis et al. argued that the concept of “accuracy” in orthodontic treatment is complex and cannot be reduced to a simple comparison between planned and achieved tooth movements [23]. They argued that the precision of tooth movement may be influenced by a range of factors, including the initial tooth position, the type of malocclusion, and the orthodontic appliances used. Therefore, it is important to reevaluate the data obtained with the ClinCheck® models and consider the limitations of this method for quantifying the accuracy of Invisalign® treatment. Kravitz et al. [20] discussed the limitations of the method used by Invisalign® for evaluating the accuracy of pretreatment and post-treatment orthodontic movements. The authors argued that the method lacks clear geometric and biomechanical rules for defining how teeth move in space and therefore cannot provide a true measure of accuracy. They suggested that alternative methods, such as 3D imaging and finite element analysis, may provide more accurate and comprehensive data for evaluating orthodontic treatment outcomes. The authors also called for more standardized evaluation methods to ensure consistent and reliable data. Overall, the article highlights the importance of developing accurate and reliable methods for evaluating orthodontic treatment outcomes to improve patient care and treatment planning.
Limitations: The results of this study must be interpreted with caution. Values that differed by >0.5 mm or 2° were considered clinically significant [16,24,25]. Compliance was assessed by the treating orthodontist; however, no objective measure of compliance was recorded. Pretreatment and post-treatment scans were used to make comparisons; however, this means that patients who did not require additional aligners were excluded because no post-treatment scan would have been taken. Treatment planning was based on the Invisalign® provider’s discretion, which could limit external validity. There was also no specification as to whether a 7-day or 14-day change protocol was implemented, a factor that has been shown to affect the accuracy of buccal crown torque and other movements of posterior teeth [24]. It is important to note that ClinCheck® is merely a graphic representation of force systems and not the decided final tooth position [22]; moreover, it has been reported that up to 80% of patients require refinements, suggesting that the accuracy of treatment planning with ClinCheck® is weak [26]. The issue with refinements is that they lead to longer treatment duration, increased costs for the orthodontist, and higher manufacturing demand on the clear aligner company. It is our suggestion that Invisalign® providers plan for overcorrection, especially in the posterior region, and incorporate the use of auxiliaries to improve treatment. Another noteworthy limitation is that aligners rely heavily on patient compliance, and patient self-reporting is often unreliable.