3.2.1. Q-Table Configuration

To carry out the reinforcement learning simulation using the Q-learning algorithm, a Q-table had to be defined in terms of states and actions. The Q-table is a matrix that stores the expected reward (Q-value) for every state–action pair and is updated after each learning step. Here, 600 OD cells were used, corresponding to an OD matrix of 25 zones in Seoul with the 25 intrazonal cells excluded. The state represents the price that can potentially arise in each OD cell in each time slot, so the state space per OD cell is 13 (price levels) × 6 (time rounds, Tr). There were 13 actions in total, comprising the standard price and surge prices ranging from 0.6-fold to 3.0-fold of the standard price in increments of 0.2. The composition of the district OD matrix is shown in Figure 2, with a separate layer constructed for each cell's travel time and base fare (*Pbase*) for use in the analysis.
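The dimensions above can be sketched as a simple array layout. This is a minimal illustration assuming a NumPy representation; the constant names and the ordering of the axes are illustrative, not taken from the paper.

```python
import numpy as np

# Dimensions stated in the text (names are illustrative assumptions)
N_OD_CELLS = 600       # 25 x 25 zones minus 25 intrazonal cells
N_TIME_ROUNDS = 6      # time rounds (Tr)
N_PRICE_LEVELS = 13    # price multipliers 0.6x .. 3.0x in steps of 0.2

# Action set: multipliers applied to the standard price (P_base);
# the standard 1.0x price is one of the 13 actions.
actions = np.round(np.arange(0.6, 3.0 + 1e-9, 0.2), 1)

# One Q-table slice per OD cell: state = (price level, time round),
# 13 possible actions per state, initialized to zero before learning.
q_table = np.zeros((N_OD_CELLS, N_PRICE_LEVELS, N_TIME_ROUNDS, N_PRICE_LEVELS))
print(q_table.shape)   # -> (600, 13, 6, 13)
```

This layout makes the per-cell state–action count explicit: 13 × 6 states and 13 actions, giving 13 × 6 × 13 Q-values per OD cell.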


**Figure 2.** OD matrix, in which each OD cell holds the location data for the corresponding OD movement.
