**3. Results**

The DLP and MSSP models were solved using Gurobi (version 9.1). The DLP models are quite small; the largest is the DLP-SH solved at *t* = 1 in the simulations (1231 constraints and 1291 variables). The DLP-RH with a 10-period window has 411 constraints and 431 variables. Both the shrinking and rolling horizon DLP models have average CPU solve times below 0.3 s. The largest MSSP model is the MSSP-SH solved at *t* = 1 (263,958 constraints and 299,941 variables), with an average CPU solve time of 119 s. The MSSP-RH with a 10-period window has 64,698 constraints and 71,521 variables and an average CPU solve time of 12 s.

Table 2 summarizes the performance of each solution method. Figure 6 shows the total inventory profiles at each node for the lost sales case. The total inventory includes both on-hand inventory and pipeline inventory incoming from a node's suppliers. Similar results (not shown) were observed for the backlogging case. Figures 7 and 8 show sample network flow plots for the RL and MSSP-RH models, respectively. The network flow plots for both DLP instances and for MSSP-SH are not shown because they are similar to the MSSP-RH plot. The edge thickness is proportional to the average total amount of material requested through that link. These network flows indicate which suppliers each model policy prioritizes. Figure 9 shows the average unfulfilled market demand at the retailer node (lost sales), which indicates the service level of the supply network. A similar result is obtained for the backlogging case.
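As a minimal sketch of how a profile such as the one in Figure 6 could be assembled, the snippet below sums on-hand and pipeline inventory per period and then computes the mean and a ±1 standard deviation band across simulation runs. The function names, the three runs, and all numbers are hypothetical and chosen only for illustration. The use of the sample standard deviation is also an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def total_inventory(on_hand, pipeline):
    """Total inventory per period = on-hand + pipeline (incoming from suppliers)."""
    return [oh + pl for oh, pl in zip(on_hand, pipeline)]

def profile_band(runs):
    """Per-period mean and mean -/+ 1 sample standard deviation across runs.

    `runs` is a list of equal-length series, one per simulation run.
    """
    means, lower, upper = [], [], []
    for values in zip(*runs):  # iterate over periods
        m = mean(values)
        s = stdev(values) if len(values) > 1 else 0.0
        means.append(m)
        lower.append(m - s)
        upper.append(m + s)
    return means, lower, upper

# Hypothetical data for one node: (on-hand, pipeline) per run, three periods each.
runs = [
    total_inventory([100.0, 80.0, 60.0], [20.0, 30.0, 40.0]),
    total_inventory([90.0, 70.0, 70.0], [10.0, 20.0, 50.0]),
    total_inventory([95.0, 75.0, 65.0], [15.0, 25.0, 45.0]),
]
means, lower, upper = profile_band(runs)
print(means, lower, upper)
```

The `lower` and `upper` series correspond to the edges of the shaded region, under the assumptions stated above.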

**Table 2.** Total reward comparison for the various models used to solve the IMP. *Performance Ratio* is defined as the ratio of the final cumulative profit of the perfect information model to that of the model used. *DLP* = deterministic linear program; *MSSP* = multi-stage stochastic program; *RL* = reinforcement learning; *RH* = rolling horizon; *SH* = shrinking horizon; *Oracle* = perfect information LP.
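The performance ratio defined in the caption can be illustrated with a short sketch. The function name and the profit values below are hypothetical and are not taken from Table 2. Assuming positive profits, a ratio of 1 means the model matches the perfect information model, and larger ratios indicate a larger gap to it.

```python
def performance_ratio(oracle_profit: float, model_profit: float) -> float:
    """Final cumulative profit of the perfect information model divided by
    that of the model being evaluated."""
    if model_profit == 0:
        raise ValueError("performance ratio is undefined for zero model profit")
    return oracle_profit / model_profit

# Hypothetical final cumulative profits, for illustration only.
example_profits = {"Oracle": 500.0, "Model A": 400.0, "Model B": 250.0}

ratios = {
    name: performance_ratio(example_profits["Oracle"], profit)
    for name, profit in example_profits.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```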


**Figure 6.** Average total inventory at each main network node (lost sales mode). Shaded areas denote ±1 standard deviation of the mean value.

**Figure 7.** Average network flow with the RL policy (lost sales mode). Edge thickness is proportional to the total flow.

**Figure 8.** Average network flow with the MSSP-RH policy (lost sales mode). Edge thickness is proportional to the total flow.

**Figure 9.** Average unfulfilled demand at the retailer node (lost sales mode).
