*3.1. Model Analysis*

Models were trained on the 34-week training dataset using each of the three approaches. Figure 1 shows the resulting confusion matrices and accuracies. All three models produced similar accuracies, with AutoML achieving a small increase in accuracy over the two averaging approaches. All the models showed a slightly increased propensity for misclassifying periods when the vehicle was not available (true label = 0) as periods when the vehicle was available (predicted label = 1), as shown in the upper right quadrants of the confusion matrices.

**Figure 1.** Confusion matrices and accuracy of all 3 models on the training dataset. The labels indicate predicted and actual vehicle availability. AutoML, automated machine learning (**a**); CMA, cumulative moving average (**b**); EMA, exponential moving average (**c**).
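As an illustration of how such matrices are read, the following minimal sketch builds a 2 × 2 confusion matrix with true labels as rows and predicted labels as columns, so that the upper right cell counts unavailable periods predicted as available. The function names and the toy labels are assumptions made for this example and are not taken from the study's code.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Count outcomes for binary labels; rows = true label, columns = predicted label."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of periods on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    return (matrix[0][0] + matrix[1][1]) / total


# Toy labels (hypothetical): 1 = vehicle available, 0 = not available.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
cm = confusion_matrix_2x2(y_true, y_pred)
upper_right = cm[0][1]  # true label 0 predicted as 1
```

Here `cm[0][1]` corresponds to the upper right quadrant discussed above, and `cm[1][0]` to the opposite error.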

To determine if this performance carried over to novel data, the models were tested against the 8-week test dataset. The results in Figure 2 show that although the accuracy of all 3 models decreased on the test set, performance remained relatively robust, with an accuracy of approximately 90% in all cases. A McNemar test [26] was performed to assess whether the differences between each pair of models were statistically significant. The differences between all models were found to be highly significant (*p* < 0.001). This indicated that although the overall accuracy was similar for all models, the set of classification errors made by each approach was significantly different.

**Figure 2.** Confusion matrices and accuracies of the 3 models on the test dataset. The labels indicate predicted and actual vehicle availability. AutoML, automated machine learning (**a**); CMA, cumulative moving average (**b**); EMA, exponential moving average (**c**).
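The McNemar test compares two classifiers on the same samples using only the discordant cases, i.e., those that one model classifies correctly and the other does not. The sketch below assumes the chi-squared form with continuity correction; the text does not state which variant was used, and the function name and toy data are hypothetical.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """Return (statistic, p-value) for a continuity-corrected McNemar test."""
    # b: model A correct and model B wrong; c: model A wrong and model B correct.
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, so no evidence of a difference
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy data (hypothetical): model A is right on 10 periods where model B is wrong.
y_true = [1] * 10
pred_a = [1] * 10
pred_b = [0] * 10
stat, p_value = mcnemar_test(y_true, pred_a, pred_b)
```

Because the statistic depends only on the discordant counts, two models with near-identical accuracy can still differ significantly if they make their errors on different periods.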

To further explore the differences between the models, accuracy was calculated separately for University term and non-term periods, and for holidays and non-holidays. Figure 3 shows that performance for term and non-term periods was very similar for all models, including the averaging approaches, for which term was not considered during training. This suggested that fleet behaviour was not substantially impacted by this feature. This was not the case, however, for holidays, for which the averaging approaches performed poorly and AutoML performed very well.

**Figure 3.** Accuracy of the 3 models for term/non-term periods and holiday/workdays. Data labels show accuracy for holidays.
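A breakdown of this kind can be sketched by grouping each prediction under a label such as holiday or workday and computing accuracy within each group. The function name, group labels, and toy values below are assumptions for illustration only.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return a dict mapping each group label to the accuracy within that group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


# Toy rows (hypothetical): two holiday periods and two workday periods.
result = accuracy_by_group(
    y_true=[1, 0, 1, 1],
    y_pred=[1, 0, 0, 1],
    groups=["holiday", "holiday", "workday", "workday"],
)
```

The same function can be called a second time with term and non-term labels to produce the other comparison.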

To demonstrate the reasons for this disparity, the average number of available vehicles for each half-hour period during the two holiday Mondays in the test dataset was compared to that predicted by each of the three models. Figure 4 shows that the actual availability was relatively static throughout the holidays, a pattern that was typical for a weekend. AutoML correctly identified this pattern; however, the predictions for both CMA and EMA were representative of a typical non-holiday Monday. This was to be expected, given that the holiday feature was not considered in those approaches.

**Figure 4.** Total available vehicles predicted by the 3 models for the 2 holiday Mondays in the test dataset compared to actual availability.

To accommodate this prediction error, a heuristic was used for the CMA and EMA models that treated any holiday as a Sunday. Therefore, for any rows in the test set with *hol* = 1, the prediction for Sunday was used, i.e., *d* was set to 0 for that row. The revised models, termed CMAh and EMAh, were tested on the same 8-week test dataset using this heuristic, and the revised confusion matrices and accuracies are shown in Figure 5.

**Figure 5.** Confusion matrices and accuracies for the averaging approaches using the holiday heuristic. CMAh, cumulative moving average with holiday heuristic (**a**); EMAh, exponential moving average with holiday heuristic (**b**).
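The holiday heuristic can be sketched as a small change at prediction time: rows with *hol* = 1 are looked up as if *d* = 0 (Sunday). The sketch assumes, purely for illustration, that an averaging model has been reduced to a lookup table keyed by day of week and half-hour period; the names `lookup` and `period` are hypothetical, while `hol` and `d` follow the notation above.

```python
def predict_with_holiday_heuristic(lookup, rows):
    """Predict availability per row, treating any holiday as a Sunday (d = 0)."""
    predictions = []
    for row in rows:
        # Override the day of week only for holiday rows.
        d = 0 if row["hol"] == 1 else row["d"]
        predictions.append(lookup[(d, row["period"])])
    return predictions


# Hypothetical lookup: available on Sunday, unavailable on Monday, at period 18.
lookup = {(0, 18): 1, (1, 18): 0}
rows = [
    {"d": 1, "period": 18, "hol": 1},  # holiday Monday, uses the Sunday value
    {"d": 1, "period": 18, "hol": 0},  # ordinary Monday
]
predictions = predict_with_holiday_heuristic(lookup, rows)
```

The underlying averages are left unchanged; only the key used for the lookup differs on holidays.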

The accuracy of both averaging approaches increased through the use of the holiday heuristic and was now comparable to AutoML. A McNemar test was performed and again showed a highly significant difference between the averaging models and the AutoML model (*p* < 0.001). However, no significant difference was now found between the CMAh and EMAh models (*p* > 0.05).
