3.1. Feature Extraction and Selection
All 431 features were extracted across all patient data. These extracted features were added to the existing data collected using RegionFinder (i.e., the total area of GA in mm²/year) and exported into a standard comma-separated values (CSV) file. Using a five-fold cross-validation method, the top 30 features were extracted (
Figure 2). A close-up of the feature ranking plot can be found in
Figure 3. In the context of feature extraction, this meant that the top 30 most highly ranked features were extracted across all five folds. The objective of the five-fold cross-validation process here was to determine whether the top-ranked features were consistently the same across all five cross-validations, which would confirm the importance of these features relative to the output of GA total area (mm²/year). For completeness, features that appeared in only one or two cross-validations, rather than all five, were still listed as candidate variables for the modelling process, to ensure that no potentially important variable was overlooked. The final complete list of features tested for modelling can be found in
Table 2. The feature “lesion_elongation” was always the top-ranked feature across all five cross-validations, with “lesion_elongation” representing the relationship between the largest and second-largest principal component axes of lesions. While the clustering variables “early_cluster”, “late_cluster”, and “lesion_cluster” did not appear in the top-ranked variables across any of the cross-validations, they were still considered and manually tested, given our interest in determining whether there is an association between the new cluster groups and GA progression. The cluster variables were assessed in the traditional univariate linear mixed-effects model style. In this univariate stage, all three clustering variables showed statistical significance at the threshold of p < 0.05.
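The fold-stability check described above can be sketched as follows. This is a minimal illustration only: the univariate F-score is a stand-in for the actual ranking statistic, and the data are synthetic, since neither is reproduced from the study.

```python
# Hypothetical sketch: rank features within each of five folds and keep
# those that appear in the top 30 in every fold; partial appearances are
# retained as candidates, mirroring the procedure described in the text.
import numpy as np
from collections import Counter
from sklearn.feature_selection import f_regression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 431))            # 431 radiomic features (toy data)
y = 2.0 * X[:, 0] + rng.normal(size=120)   # GA total area (mm^2/year), toy outcome

counts = Counter()
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    scores, _ = f_regression(X[train_idx], y[train_idx])
    top30 = np.argsort(scores)[::-1][:30]  # indices of the 30 highest-ranked features
    counts.update(top30.tolist())

stable = [f for f, c in counts.items() if c == 5]          # top-30 in every fold
candidates = [f for f, c in counts.items() if 1 <= c < 5]  # partial appearances, still reviewed
print(len(stable), len(candidates))
```

A feature that is genuinely associated with the outcome (feature 0 here) should appear in the top 30 in all five folds, which is the consistency criterion described above.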
These top features were then manually tested in linear mixed-effects models on the original scale (total area mm²/year) with the random effects of (eye|subject). This originally involved the addition of all the variables of interest identified in
Table 2 in a singular model, and the significance of the variables was assessed using the
p < 0.05 threshold. Then, one by one, variables that were not significant against this threshold were eliminated and the model was rerun. This process continued until all the features tested in the original model were statistically significant. Once these base statistically significant variables had been identified, they were then tested in various combinations, as shown in
Table 3. These combinations were observed to see which combined sets of variables produced the most promising modelling results. These combinations were tested against our various modelling and forecasting metrics: marginal R², conditional R², Pearson’s r, RMSE, ME, MAE, MAD, AIC, BIC, and log likelihood (
Table 4). Furthermore, normal Q-Q plots (
Figure 4) and predicted vs. output plots (
Figure 5) show that all these various combinations did meet assumptions of normality and had good correlations between expected and observed values. However, the most suitable combination belonged to a model named model 1.2, which contained the variables “lesion_elongation”, “lesion_minoraxislength”, “lesion_meshvolume”, “lesion_contrast”, “e_hyp_busyness”, “lesion_10percentile”, “lesion_sphericity”, “lesion_cluster”, “e_hyp_sumsquares”, “lesion_correlation”, “late_stage_clustershade”, “lesion_complexity”, “late_stage_minoraxislength”, “late_stage_dependencevariance”, and “late_stage_smalldependencelowgray”.
This combination of variables had one of the highest marginal and conditional R² values (marginal R² = 0.83, conditional R² = 0.96), a high correlation coefficient (r = 0.981; p < 0.001), some of the smallest forecasting errors (RMSE = 1.32, ME = −7.3 × 10⁻¹⁵, MAE = 0.94, and MAD = 0.999), and the lowest criterion values (AIC = 2084.93, BIC = 2169.97, and log likelihood = −1022.46). Therefore, in the succeeding sections, these variables were further tested using (1) the original, square-root-transformed, and log(y + 1)-transformed outcomes, and (2) all of these model outcomes were tested using another five-fold cross-validation method, similar to the one applied to the linear mixed-effects model. The transformations were tested to see if they improved the fit of the variables to the model.
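The backward-elimination step described above can be sketched as follows. This is a minimal illustration using statsmodels, with synthetic data and illustrative feature names standing in for the Table 2 variables; the `groups="subject"` with `re_formula="~eye"` specification corresponds to the (eye|subject) random effects.

```python
# Hedged sketch of backward elimination in a linear mixed-effects model:
# fit all candidate fixed effects, drop the least significant one, and
# refit until every remaining term satisfies p < 0.05.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_subjects, n = 50, 200
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subjects), 4),   # two eyes x two visits (toy layout)
    "eye": np.tile([0, 1, 0, 1], n_subjects),
    "lesion_elongation": rng.normal(size=n),
    "lesion_sphericity": rng.normal(size=n),
    "noise_feature": rng.normal(size=n),              # deliberately unrelated to the outcome
})
subject_effect = rng.normal(scale=0.5, size=n_subjects)
df["area"] = (1.5 * df["lesion_elongation"] + 0.8 * df["lesion_sphericity"]
              + subject_effect[df["subject"]] + rng.normal(scale=0.5, size=n))

features = ["lesion_elongation", "lesion_sphericity", "noise_feature"]
while True:
    formula = "area ~ " + " + ".join(features)
    fit = smf.mixedlm(formula, df, groups="subject", re_formula="~eye").fit()
    worst = fit.pvalues[features].idxmax()   # least significant fixed effect
    if fit.pvalues[worst] < 0.05:
        break                                # all remaining terms significant
    features.remove(worst)

print(features)
```

The genuinely associated features survive elimination, while uninformative terms are typically dropped on the first pass.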
3.2. Model Selection
The models described above were used to test which of the top featured variables would be most suitable when put into a linear mixed-effects model. These top features were determined using the original scale (total area mm²/year) as the outcome of interest with the random effects of (eye|subject). Initially, assessment of the most suitable variables was performed using the entirety of the dataset. However, to minimise any overfitting, the features identified in
Section 3.1 were then evaluated on the original scale using the cross-validation approach. The process was then repeated for the two additional outcomes of square-root transformation and log(y + 1) transformation. The features identified in model 1.2 were tested against the new, transformed outcomes to see if they maintained any statistical significance with the transformation. Most features still maintained their statistical significance in the square-root-transformed model (i.e., the model named 1.2sq), with only “late_stage_clustershade” and “lesion_complexity” losing statistical significance with the square-root-transformed outcome. However, with the log(y + 1) transformation (i.e., the model named 1.2log), the variable list reduced somewhat, with only “lesion_minoraxislength”, “lesion_meshvolume”, “lesion_contrast”, “e_hyp_busyness”, “lesion_10percentile”, “lesion_sphericity”, “lesion_cluster”, and “late_stage_smalldependencelowgray” maintaining their statistical significance with the threshold of
p < 0.05. A complete list of the final variables used across the three different linear mixed-effects models can be found in
Table 5. The models 1.2sq and 1.2log also underwent five-fold cross-validation for a complete comparison (
Table 6).
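The three outcome scales compared in Table 6 can be evaluated on a common footing by back-transforming predictions to the original mm²/year scale before computing errors. The sketch below uses a plain linear model and synthetic data as stand-ins for the mixed-effects models and study data; whether the study computed errors on the transformed or back-transformed scale is not stated, so this is one reasonable convention, not the authors' exact procedure.

```python
# Hedged sketch: five-fold cross-validation of the original, square-root,
# and log(y + 1) outcome scales, with predictions mapped back to the
# original scale so the RMSEs are directly comparable.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 3))
y = np.abs(2 + X @ np.array([1.5, -0.7, 0.3]) + rng.normal(scale=0.5, size=150))

# (forward transform applied to the outcome, inverse applied to predictions)
transforms = {
    "original": (lambda v: v, lambda v: v),
    "sqrt":     (np.sqrt,     np.square),
    "log1p":    (np.log1p,    np.expm1),
}
rmse = {name: [] for name in transforms}
for tr, te in KFold(n_splits=5, shuffle=True, random_state=2).split(X):
    for name, (fwd, inv) in transforms.items():
        model = LinearRegression().fit(X[tr], fwd(y[tr]))
        pred = inv(model.predict(X[te]))                 # back to original scale
        rmse[name].append(float(np.sqrt(np.mean((pred - y[te]) ** 2))))

for name, vals in rmse.items():
    print(name, round(float(np.mean(vals)), 3))
```

Comparing errors on a single scale avoids the trap of a transformed model looking artificially accurate simply because its outcome has a compressed range.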
At first glance, the performance of the square-root-transformation model quantitatively appears to surpass that of the original scaled and log-transformed models. Additionally, the model produces the tightest-fitting normal Q-Q plots relative to the other two models (
Figure 7) and has a good residual plot (
Figure 13). However, upon closer inspection of
Table 6, it can be seen that, apart from the forecasting metrics of RMSE, ME, MAE, and MAD, all the other metrics have identical results across all five cross-validation folds. This suggests that the square-root-transformed model is most likely an overfitted model [
13]. In our earlier publication, we demonstrated the effect of the square-root transformation on GA growth areas and progression modelling in the absence of features. The transformation had a flattening effect and a tendency to “tighten” the data points. The overfitting may be the result of this effect, and thus a square-root-transformed model most likely is not suitable. Furthermore, we can see with the correlation plot of predicted and observed outputs (
Figure 10) that there is a curvature at the lower tail end of the plot, suggesting that a perfectly linear relationship between predicted vs. output values is absent. Interestingly, while the marginal and conditional R², the correlation coefficient (r), and the AIC/BIC/log-likelihood metrics were consistently the same across all five folds, the varying metrics of RMSE, ME, MAE, and MAD were considerably larger relative to the same values of the original scaled model, 1.2. Thus, given the identical results produced across the five-fold cross-validation, along with the predicted vs. output plot not behaving in the expected manner, the square-root transformation was deemed inappropriate in this study.
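As a quick illustration of the red flag described above, one can check whether a criterion is numerically identical across folds; this helper is written for this discussion and is not taken from the study.

```python
# Illustrative check: a fit criterion (e.g. AIC) that does not vary at all
# across cross-validation folds is suspicious, since each fold fits a
# different subset of the data.
import numpy as np

def identical_across_folds(values, tol=1e-9):
    """Return True if all per-fold values agree to within tol."""
    values = np.asarray(values, dtype=float)
    return bool(np.ptp(values) <= tol)   # ptp = max - min

print(identical_across_folds([2084.93] * 5))                              # True: suspicious
print(identical_across_folds([1718.4, 1724.9, 1710.2, 1716.7, 1722.8]))   # False: expected variation
```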
For the log-transformed variables, the marginal and conditional R² values were as promising as those of the square-root transformation. Additionally, these values did differ between cross-validations, indicating no overfitting while remaining broadly consistent across folds. The AIC/BIC and log likelihood values were also very promising, as they presented the smallest values out of all the models, which would usually suggest that the log-transformed model is the most promising. The Q-Q plot (
Figure 8) and the residual plot (
Figure 14) also suggested that the residuals were normally distributed. However, the forecasting errors for the log transformation were the largest out of the three modelling types. Furthermore, the correlation plot for the log-transformed model (
Figure 11) produced very similar patterns between predicted and observed values to the square-root-transformed model.
As a result, the original scaled model, 1.2, was deemed to be the most appropriate (represented in Equation (10), where β₁, …, βₙ are the fixed coefficients for the variables x₁, …, xₙ highlighted in model 1.2 and u represents the random effects):
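In statsmodels notation, model 1.2 can be sketched as below. The outcome column name `total_area` and the DataFrame `df` are assumptions for illustration; the feature names are those listed for model 1.2, and `groups="subject"` with `re_formula="~eye"` corresponds to the (eye|subject) random effects.

```python
# Sketch of the final model-1.2 specification: fifteen fixed effects on the
# original mm^2/year outcome with eye-within-subject random effects.
import statsmodels.formula.api as smf

model_12_features = [
    "lesion_elongation", "lesion_minoraxislength", "lesion_meshvolume",
    "lesion_contrast", "e_hyp_busyness", "lesion_10percentile",
    "lesion_sphericity", "lesion_cluster", "e_hyp_sumsquares",
    "lesion_correlation", "late_stage_clustershade", "lesion_complexity",
    "late_stage_minoraxislength", "late_stage_dependencevariance",
    "late_stage_smalldependencelowgray",
]
# "total_area" is an assumed column name for the GA total area outcome.
formula = "total_area ~ " + " + ".join(model_12_features)
# With a suitable DataFrame `df` in hand, the fit would be:
# fit = smf.mixedlm(formula, df, groups="subject", re_formula="~eye").fit()
print(formula)
```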
Generally, the quantitative values were very good, with high values for the conditional and marginal R² as well as the correlation coefficient. The forecasting errors were smallest with the original scaled model as compared to the transformed models. The average marginal R² for the original scaled model was 0.84 ± 0.01. For the conditional R², the average was 0.95 ± 0.002, the average Pearson’s r was 0.97 ± 0.006, the RMSE was 1.44 ± 0.06, the ME was 0.09 ± 0.15, the MAE was 1.04 ± 0.06, the MAD was 3.52 ± 0.67, the AIC was 1718.41 ± 11.08, the BIC was 1798.99 ± 11.20, and the log likelihood was −839.21 ± 5.54. While the AIC/BIC and log likelihood were highest for the original scaled model, this was overshadowed by the consistency of the visualisation results. The normal Q-Q plot (
Figure 6) may not have been as tight fitting as for the square-root- and log-transformed models, but it was still in line with the assumptions of normality, and, coupled with the residual plots (
Figure 12), this further validated that the assumptions of normality were met. Furthermore, the original scaled model was the only one that produced visually correct correlation plots (
Figure 9), with the predicted and observed values having a strong linear relationship.
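For reference, the forecasting metrics quoted throughout this section can be computed as below. The MAD definition used here (median absolute deviation of the residuals) is an assumption, since the text does not define it.

```python
# Helper computing the forecasting errors reported in the text from
# observed and predicted outcome values.
import numpy as np

def forecast_metrics(observed, predicted):
    resid = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    return {
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "ME":   float(np.mean(resid)),                           # signed bias
        "MAE":  float(np.mean(np.abs(resid))),
        "MAD":  float(np.median(np.abs(resid - np.median(resid)))),
    }

print(forecast_metrics([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]))
```

A near-zero ME with a larger MAE, as reported for model 1.2, indicates that over- and under-predictions cancel out on average.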
Figure 6.
Normal Q-Q plots for five-fold cross-validation (A–E) of the linear mixed-effects model with the original scale of mm²/year. The plots show a general adherence to the assumption of normality.
Figure 7.
Normal Q-Q plots for five-fold cross-validation (A–E) of the linear mixed-effects model with the square-root scale of mm/year. The plots show a much tighter fit of residuals around the 45-degree line.
Figure 8.
Normal Q-Q plots for five-fold cross-validation (A–E) of the linear mixed-effects model with the log(y + 1) scale of enlargement rate/year. Generally, there was a good adherence to assumptions of normality.
Figure 9.
Predicted vs. observed plots for five-fold cross-validation (A–E) of the linear mixed-effects model with the original scale of mm²/year. There were minimal differences between the predicted and observed values, with a strong linear relationship.
Figure 10.
Predicted vs. observed plots for five-fold cross-validation (A–E) of the linear mixed-effects model with the square-root scale of mm/year. There is a curvature at the lower tail end, and the relationship is not as strong as that of the original scaled model.
Figure 11.
Predicted vs. observed plots for five-fold cross-validation (A–E) of the linear mixed-effects model with the log(y + 1) scale of enlargement rate/year. There is a curvature at the lower tail end, and the relationship is not as strong as that of the original scaled model.
Figure 12.
Residual plots for five-fold cross-validation (A–E) of the linear mixed-effects model with the original scale of mm²/year. Generally, there is an even distribution of residuals, although a slight cluster appears on the left-hand side.
Figure 13.
Residual plots for five-fold cross-validation (A–E) of the linear mixed-effects model with the square-root scale of mm/year. Generally, there is an even distribution of residuals, and the slight cluster on the left-hand side is spread out.
Figure 14.
Residual plots for five-fold cross-validation (A–E) of the linear mixed-effects model with the log(y + 1) scale of enlargement rate/year. Generally, there is an even distribution of residuals, and the slight cluster on the left-hand side is spread out.