**4. Case Study**

To verify the approach proposed in the article, a simulation experiment was carried out. The aim of the example was to forecast the vehicle occupancy state for a selected transit line with a given number of stops. The example used a real data set from the vehicles' automatic counting systems. The analysed time horizon covered two weeks for one of the most crowded bus lines in Cracow. The line under consideration belongs to one of the highest frequency levels and contains 19 bus stops. On business days, the number of trips on the line was *tmax* = 68, whereas on weekends *tmax* = 47. In the computational example, the forecasts of the occupancy state *S* = {1, 2, ..., 6} were determined sequentially for one time step ahead (each departure from the bus stop was a successive time step). The state *s*1 = 1 corresponds to the lowest level of vehicle occupancy, while *s*6 = 6 denotes the highest. For each time step *t*, the initial state distribution *D*(*t*) was updated on the basis of the available historical data. The elements of the transition matrix P were estimated empirically from the historical data set, individually for each bus stop, in order to capture its specificity and dynamics. The forecasted state was taken to be the one with the highest probability of occurrence in the forecasted state distribution *D*(*t*+1). Figure 2 shows an example of how the state forecasts fit the observed states of vehicle occupancy for a selected bus stop on a given day.

**Figure 2.** Adjustment of the forecasted vehicle occupancy states to the observed values for a given bus stop.

The presented sequence of observed vehicle occupancy states and the corresponding forecasts concerns a bus stop located in the second part of the analysed line, as evidenced by the high variability of the observed states during the working day. Despite the errors, the forecast values follow the pace of changes in the observed time series.
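The one-step-ahead forecasting procedure described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy state sequence, and two modelling choices are assumptions made for illustration. Specifically, *D*(*t*) is taken here to be the empirical frequency of states in the history available at step *t*, and a state never observed in the history is given an identity row in P.

```python
from collections import Counter

STATES = [1, 2, 3, 4, 5, 6]  # occupancy states s1..s6


def estimate_transition_matrix(history, states=STATES):
    """Estimate P[i][j] = Pr(next = states[j] | current = states[i]) by counting."""
    index = {s: k for k, s in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]
    for current, nxt in zip(history, history[1:]):
        counts[index[current]][index[nxt]] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption: a state with no observed transitions keeps its state.
            matrix.append([1.0 if j == i else 0.0 for j in range(n)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def estimate_distribution(history, states=STATES):
    """Assumption: D(t) is the relative frequency of each state in the history."""
    counts = Counter(history)
    return [counts[s] / len(history) for s in states]


def forecast_next_state(distribution, matrix, states=STATES):
    """Compute D(t+1) = D(t) P and return the most probable state with D(t+1)."""
    n = len(states)
    nxt = [sum(distribution[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    # Ties are resolved in favour of the lowest state (first maximum found).
    best = max(range(n), key=lambda j: nxt[j])
    return states[best], nxt


# Hypothetical occupancy states observed at one stop on successive departures.
history = [1, 2, 2, 3, 2, 2, 1, 2]
P = estimate_transition_matrix(history)
D_t = estimate_distribution(history)
state, D_next = forecast_next_state(D_t, P)
print(state, D_next)
```

In a per-stop setting, one such matrix would be estimated for each of the 19 stops, and the history would be extended after every departure before the next forecast is made.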

The distribution of the root mean square errors for each bus stop over the considered period is shown in Figure 3.

**Figure 3.** Distribution of root mean square errors for the analysed period.

The distribution of the Root Mean Square Error (RMSE) over the time horizon and the bus stop number indicates that the highest values occur at the bus stops in the second part of the line (counting from the first stop). This results from the specificity of the analysed line, which passes through crucial areas of the city and numerous interchange nodes. This generates greater randomness and variability among the incoming passengers, which leads to larger forecasting errors. Lower errors characterise the weekend periods (t = 6, Saturday, and t = 7, Sunday), owing to the reduced number of trips.
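For reference, the RMSE per bus stop can be computed as in the short sketch below. It assumes that the error is measured on the numeric state labels 1 to 6, which the text does not state explicitly; the function names and the sample values are hypothetical.

```python
import math


def rmse(observed, forecast):
    """Root mean square error between two equally long sequences of states."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - f) ** 2 for o, f in zip(observed, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_by_stop(observed_by_stop, forecast_by_stop):
    """Map each stop identifier to the RMSE of its forecasts."""
    return {
        stop: rmse(observed_by_stop[stop], forecast_by_stop[stop])
        for stop in observed_by_stop
    }


# Hypothetical observed and forecasted states for two stops on one day.
observed = {1: [1, 2, 3, 4], 2: [2, 2, 3, 3]}
forecast = {1: [1, 3, 3, 2], 2: [2, 2, 3, 3]}
errors = rmse_by_stop(observed, forecast)
print(errors)
```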

In order to determine how often the model made an error and how much the predicted occupancy state of the vehicle differed from the observed state, an error histogram was prepared, as shown in Figure 4.

**Figure 4.** Prediction error histogram.

The histogram shows the frequency of occurrence of forecast errors equal to *e*1 = 1, *e*2 = 2, ..., *e*5 = 5, where *e*1 is a difference of one state, *e*2 of two states, etc. In the period covered by the analysis, the most numerous group is the *e*1 error set, where the obtained forecast differs from the observed value by only one state. The second, much less numerous group is *e*2. The error sets *e*3 and *e*4 constitute a small percentage of the whole population, while the remaining errors did not occur at all during the examined time horizon.
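A histogram of this kind can be obtained by counting the absolute differences between the observed and the forecasted states. The sketch below is illustrative only; the function name and the sample sequences are hypothetical, and correct forecasts (a difference of zero) are deliberately left out so that the bins match *e*1 to *e*5.

```python
from collections import Counter


def error_histogram(observed, forecast, max_error=5):
    """Count forecast errors by size: key k holds the number of e_k errors."""
    if len(observed) != len(forecast):
        raise ValueError("sequences must be of equal length")
    diffs = Counter(abs(o - f) for o, f in zip(observed, forecast))
    # Report every bin from 1 to max_error, including empty ones.
    return {e: diffs.get(e, 0) for e in range(1, max_error + 1)}


# Hypothetical observed and forecasted states.
observed = [1, 2, 3, 4, 6]
forecast = [1, 3, 3, 2, 5]
histogram = error_histogram(observed, forecast)
print(histogram)
```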

The mean absolute percentage forecast errors for the analysed period and successive stops are presented in Table 3.


**Table 3.** Mean absolute percentage forecast errors.
