**4. Discussion**

In the present analysis, we demonstrated that CAG repeat length accounted for over 80% of the variance in AMO amongst patients with JOHD. This was substantially higher than in either of the AOHD analyses, in which CAG repeat length accounted for 59% (Predict-HD sample) and 57% (Enroll-HD sample) of the variance. These results support previous reports suggesting that CAG repeat length may have greater predictive power at higher repeat lengths [7–9]. Notably, previous reports have also shown that the predictive power of CAG repeat length for AMO appears to decrease at the highest repeat lengths, approximately 80 or above [10,11]. Our cohort included only seven participants with a CAG repeat length above 80, so we did not have sufficient data to formally test whether the relationship between CAG and AMO weakens at these lengths. Informally, however, our results appear consistent with these reports. In Figure 2A, there appears to be a strong linear relationship between CAG repeat length and AMO in participants with a CAG repeat length below 80, with a bend in the regression line at approximately 80 repeats, beyond which the line begins to flatten. The same shape was observed in previous reports with larger numbers of patients [10,11]. This may reflect a floor effect in the ability of CAG repeat length to predict AMO. Specifically, it is possible that neurodegenerative changes unfold over the course of 3–5 years, in which case the earliest possible AMO would be approximately five years of age, regardless of CAG repeat length. This remains a hypothesis.

Another possible explanation for the weakened relationship between CAG and AMO above 80 repeats may relate to the role of the huntingtin protein in neurodevelopment [15,16]. Neurodevelopmental changes have been reported to be more prominent at higher CAG repeat lengths in patients with AOHD [15]. At the highest repeat lengths, neurodevelopmental aberrations may contribute substantially to AMO in addition to neurodegeneration. This would make the true AMO difficult to determine in these patients and would likely result in considerable heterogeneity in age at diagnosis.
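To make the informal observation about Figure 2A more concrete, one could compare the variance explained by a single straight line with that of a "broken-stick" (hinge) model that allows the slope to change at a fixed knot. The sketch below is a hypothetical illustration in plain Python, not the analysis performed in this study: the knot at 80 repeats, the function names `least_squares`, `r_squared`, and `fit_r2`, and the synthetic data are all assumptions.

```python
"""Compare a straight-line fit of AMO on CAG with a broken-stick (hinge) fit.

Illustrative sketch only: the knot at 80 repeats, the function names, and the
synthetic data below are assumptions, not the study's data or method.
"""


def least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        # Partial pivoting: move the row with the largest pivot into place.
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, p):
            f = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= f * aug[col][c]
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):  # back substitution
        s = aug[i][p] - sum(aug[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = s / aug[i][i]
    return beta


def r_squared(y, fitted):
    """Proportion of variance in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def fit_r2(cag, amo, knot=None):
    """R^2 for AMO ~ CAG (knot=None) or AMO ~ CAG + max(0, CAG - knot)."""
    if knot is None:
        rows = [[1.0, float(c)] for c in cag]
    else:
        rows = [[1.0, float(c), max(0.0, c - knot)] for c in cag]
    beta = least_squares(rows, amo)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    return r_squared(amo, fitted)


if __name__ == "__main__":
    # Synthetic toy data (hypothetical): AMO falls steeply until 80 repeats, then flattens.
    cag = list(range(60, 101, 2))
    amo = [70 - 0.8 * c + 0.7 * max(0, c - 80) for c in cag]
    print("straight line R^2:", round(fit_r2(cag, amo), 3))
    print("hinge at 80 R^2:  ", round(fit_r2(cag, amo, knot=80), 3))
```

The hinge term max(0, CAG - knot) keeps the fitted line continuous at the knot while letting its slope change. With only seven participants above 80 repeats, such a comparison would have little power in the present cohort, which is consistent with treating the flattening as an informal observation.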

One possible explanation for why longer CAG repeat lengths (>60) explain more of the variance in AMO is that they may play a greater role in driving the pathologic changes that determine disease onset. Another consideration is that environmental exposures are known to modify disease onset in patients with AOHD [17–19]. Patients with JOHD may not have had the same opportunity to be exposed to such environmental factors, so their AMO may be more closely linked to CAG repeat length alone. Genetic modifiers of disease onset have also been identified in patients with AOHD [20,21]; however, these studies rely on large numbers of patients. Given the rarity of JOHD, we cannot rule out the existence of as-yet-unidentified genetic modifiers that affect AMO in this population.

One of the largest previous reports of the relationship between CAG repeat length and AMO amongst patients with JOHD used data from the Italian Huntington's Disease Databank, which retrospectively collected data from patients at two institutions and includes only 15 patients with a CAG repeat length above 60 [10,22]. The same review also gathered data from seven separate case reports and case series and identified 26 patients with more than 80 CAG repeats [10]. Given how these data were collected, the conclusions of that review may be considered preliminary, as they were not derived from primary data collection. Using the present large dataset of patients with JOHD, we were able to confirm these previous findings that the correlation between CAG repeat length and AMO is stronger in JOHD, with CAG repeat length predicting approximately 84% of the variance in AMO [10].

This study has important limitations. First, despite being one of the largest studies of JOHD in the world, the number of patients is still relatively small. Second, we did not apply natural logarithmic transformations to our data, as has been recommended in previous studies investigating CAG and AMO [23]. We chose not to use these transformations because they appeared to produce disproportionate variance across groups, which would introduce heteroscedasticity. Lastly, as mentioned previously, diagnosing motor symptoms in children with the longest CAG repeat lengths can be difficult and subject to bias and variability.
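One simple way to make the choice of scale explicit is to fit AMO (or ln AMO) on CAG, group the residuals into CAG bands, and compare the largest and smallest band variances, in the spirit of Hartley's F_max. The sketch below is an assumption rather than the procedure used in this study: the band edges, the function names `ols_residuals` and `fmax_ratio`, and the synthetic data are all hypothetical.

```python
"""Check residual variance across CAG bands on the raw and log scale.

Illustrative sketch: the band edges, the F_max-style ratio, and the synthetic
data are assumptions, not the study's procedure.
"""
import math
from statistics import variance


def ols_residuals(x, y):
    """Residuals from a simple least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def fmax_ratio(cag, amo, bands, log_amo=False):
    """Largest / smallest residual variance across CAG bands [lo, hi).

    A ratio near 1 suggests roughly constant variance; a large ratio
    flags heteroscedasticity on the chosen scale.
    """
    y = [math.log(v) for v in amo] if log_amo else [float(v) for v in amo]
    res = ols_residuals(cag, y)
    band_vars = []
    for lo, hi in bands:
        group = [r for c, r in zip(cag, res) if lo <= c < hi]
        band_vars.append(variance(group))  # sample variance; needs >= 2 points per band
    return max(band_vars) / min(band_vars)


if __name__ == "__main__":
    # Synthetic toy data (hypothetical) with multiplicative, proportional error.
    cag = list(range(40, 100))
    amo = [100 * math.exp(-0.02 * c + (0.1 if c % 2 == 0 else -0.1)) for c in cag]
    bands = [(40, 60), (60, 80), (80, 100)]
    print("raw scale F_max:", round(fmax_ratio(cag, amo, bands), 2))
    print("log scale F_max:", round(fmax_ratio(cag, amo, bands, log_amo=True), 2))
```

With this synthetic proportional error the log scale equalizes the band variances; with real data the comparison can go either way, which is why checking both scales is more informative than applying a transformation by default.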
