#### 4.2.1. Overall Comparison

This section demonstrates the effectiveness of the proposed LDA framework and compares it against the baselines.

**Results of Baselines**: Figure 6a,b present the baseline results for rental and drop-off demand, respectively. Baselines without machine learning, such as HA, HSA, and HSW, perform worse than the regression and NN methods. CC-XGB, our proposed framework, outperforms the second-best method by approximately 0.2 to 0.3 RMSLE on average, in both the rental and drop-off settings.
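All comparisons above are measured in RMSLE. As a reference, a minimal sketch of the metric is given below; the function and argument names are illustrative, not taken from the paper's code.

```python
import math

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error between actual and predicted demand.

    log1p keeps the metric defined when a station's demand is zero.
    """
    assert len(y_true) == len(y_pred) and y_true
    sq_errors = [(math.log1p(p) - math.log1p(t)) ** 2
                 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

Because errors are taken on the log scale, RMSLE penalizes relative rather than absolute deviations, which suits demand counts spanning several orders of magnitude.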

**Figure 6.** Performance of baselines. (**a**) represents the rental demand and (**b**) represents the drop-off one.

**Results of Feature Combination**: Figure 7a,b present the results of different feature combinations for rental and drop-off demand, respectively. Feature set E, which omits the category-clustering features, performs noticeably worse in Figure 7b, confirming that G-clustering is effective. Neither IV-D nor IV-K consistently outperforms the other; one possible reason is slight differences in their clustering results. Although the differences between batches are not pronounced, CC-XGB performs markedly better than the other feature sets in Batches 2 and 3, confirming the applicability of our framework.

**Figure 7.** Performance of different feature combinations. (**a**) represents the rental demand and (**b**) represents the drop-off one.

**Analysis of Batches:** Under the prediction results of CC-XGB, our proposed framework, RMSLE decreases from Batch 1 to Batch 3 in drop-off mode, yet the Batch 3 results are worse than those of Batch 2 in rental mode. We infer that the demand for renting bikes downtown is more stable than in other areas; in other words, users are less willing to rent a bike from newly established stations, which makes prediction difficult. On the other hand, drop-off demand is hard to predict for the first-batch stations.

#### 4.2.2. Region Size Setting for Extracted Features

In our experiment, the reachable station region is set to a 500 m radius (Figure 1, left) to capture an appropriate number of POIs and check-ins. Here we compare how different radii affect the results. Features I, III, and V depend on the number of reachable stations. Experiments are conducted for radii from 300 m to 1000 m, as shown in Figure 8. A larger radius does not necessarily yield a better prediction result: 500 m is the superior radius for extracting features around a target station, since the RMSLE for all three batches is relatively low when *r* = 500 m.
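Collecting the POIs and check-ins within radius *r* of a station reduces to a great-circle distance filter. The sketch below illustrates this with the standard haversine formula; the function names and coordinate tuples are assumptions for illustration, not the paper's implementation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def pois_within(station, pois, r=500):
    """Keep the POIs located within r meters of the target station."""
    return [p for p in pois
            if haversine_m(station[0], station[1], p[0], p[1]) <= r]
```

Rerunning the feature extraction with r in {300, ..., 1000} then only changes the `r` argument, which is what the radius sweep in Figure 8 varies.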

**Figure 8.** Results of different regions for feature extraction in (**a**) Rental and (**b**) Drop-off.

#### 4.2.3. Feature Importance (FI)

Figures 9–11 show the feature importance for Batches 1 to 3; the features whose importance ranks in the top five are listed alongside each plot. Figure 9a, Figure 10a, and Figure 11a show rental feature importance, while Figure 9b, Figure 10b, and Figure 11b show drop-off feature importance. Overall, the nearby-station features are the most important for prediction, achieving the highest scores in every setting; the score gap is especially pronounced in Batch 3 (Figure 11a,b), indicating that nearby stations are highly correlated with newly established stations. The features obtained from G-clustering rank in the top five in all of these figures (top six in Figure 11a), demonstrating that our idea of clustering categories is reasonable and useful.
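Ranking features by importance, as in Figures 9–11, amounts to sorting the per-feature scores reported by the trained model and normalizing them for comparison across batches. A minimal sketch, with hypothetical feature names and scores (not values from the paper):

```python
def top_features(importance, k=5):
    """Rank features by importance score; return the top k as (name, share) pairs,
    where share is the score normalized to sum to 1 over all features."""
    total = sum(importance.values())
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, score / total) for name, score in ranked[:k]]
```

In practice the raw scores would come from the gradient-boosting model (e.g., XGBoost's per-feature gain); the normalization makes the score gaps between batches directly comparable.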

#### 4.2.4. Prediction of Different Periods

Our work focuses on long-term prediction, e.g., six months, since short-term prediction (e.g., one month) is too difficult and of limited practical value due to the initially unstable environment. The experiments conducted over one, three, six, and nine months in Figures 12 and 13 show that the six-month prediction performs best. The nine-month case is worse than the six-month case, and the reason lies in the data rather than our model: in our dataset, some new stations were built near existing stations after six months, so the demand at some stations in a given batch was influenced by the new stations, making the prediction less accurate. For Batches 1, 2, and 3, the RMSLE at six months is the lowest compared with one, three, and nine months. In Batch 1, the gap between the six-month setting and the others ranges from 0.02 to 0.31 for rental and from 0.07 to 0.36 for drop-off. In Batch 2, the rental gap ranges from 0.07 to 0.2 and the drop-off gap from 0.01 to 0.11. In Batch 3, the rental gap ranges from 0.09 to 0.2 and the drop-off gap from 0.03 to 0.09.
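The per-batch gap ranges quoted above can be read off mechanically: take the RMSLE of each prediction period, subtract the six-month RMSLE, and report the smallest and largest absolute differences. A small sketch (the RMSLE values in the test are illustrative placeholders, not the paper's measurements):

```python
def gap_range(rmsle_by_period, best="6m"):
    """Min and max absolute RMSLE gap between the best-performing period
    and every other prediction period."""
    base = rmsle_by_period[best]
    gaps = [abs(v - base) for k, v in rmsle_by_period.items() if k != best]
    return min(gaps), max(gaps)
```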

**Figure 9.** Feature importance of Batch 1. (**a**) Represents rental demand, and (**b**) represents the drop-off one.

**Figure 10.** Feature importance of Batch 2. (**a**) Represents rental demand, and (**b**) represents the drop-off one.

**Figure 11.** Feature importance of Batch 3. (**a**) Represents rental demand, and (**b**) represents the drop-off one.

**Figure 12.** Different periods of prediction for Rental.

**Figure 13.** Different periods of prediction for Drop-off.

#### *4.3. Random Prediction Results*

Similar to works that predict demand by splitting the data into training and testing sets without considering establishment time, we repeat the same procedure in our experiments to verify the usefulness of our LDA framework. That is, we conduct the rental/drop-off demand prediction experiment 10,000 times with randomly divided stations and report the average RMSLE (Figure 14). CC-XGB still performs best; however, its advantage is less pronounced, since our proposed features are better suited to batch prediction than to random prediction.
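The evaluation loop above can be sketched as repeated random splits whose RMSLE is averaged. For self-containment, the sketch below uses the training-set mean as a stand-in predictor in place of the actual model; the function and parameter names are assumptions for illustration.

```python
import math
import random

def rmsle(y_true, y_pred):
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))

def random_split_rmsle(demands, trials=10_000, train_frac=0.8, seed=0):
    """Average RMSLE over `trials` random station splits.

    Stations are shuffled each trial, the first train_frac become the
    training set, and the training mean serves as the prediction for
    every held-out station.
    """
    rng = random.Random(seed)
    idx = list(range(len(demands)))
    cut = int(train_frac * len(demands))
    total = 0.0
    for _ in range(trials):
        rng.shuffle(idx)
        train, test = idx[:cut], idx[cut:]
        mean_pred = sum(demands[i] for i in train) / len(train)
        total += rmsle([demands[i] for i in test], [mean_pred] * len(test))
    return total / trials
```

Swapping the mean predictor for a trained model per trial recovers the 10,000-run protocol; fixing the seed keeps the reported average reproducible.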

**Figure 14.** Random prediction for rental/drop-off.

#### **5. Discussion of the Results**

In this research, we address the demand prediction problem of real-world bike-sharing systems. From the preceding experiments, two factors in the LDA setting are worth discussing with real-world applications in mind: batch deployment and the prediction time period. These two factors are highly correlated with each other.

**Discussion of batch deployment**: Existing works have usually aimed to predict human flows for each individual station over a short horizon, such as the next hour, the next day, or the next 1–3 days. However, in real-world applications, we argue that predicting long-term demand for station deployment is also critical for urban planning and construction. We therefore propose the LDA framework, which can help governments and transportation companies make decisions about deploying bike-sharing services in a smart city. We have observed that real-world bike stations are mainly built in batches as cities expand into new areas; that is, only the historical demand data from previously deployed areas are available for prediction. The batch consideration in the LDA framework makes our work the first to address the long-term demand of new stations in future batches, providing the government with a tool to pre-evaluate the bike flow of new stations before deployment. LDA can thereby avoid wasting resources such as personnel expenses and budget.

**Discussion of prediction periods**: In Section 4.2.4, our experiments show that the six-month prediction performs best. The reason is that, in the New York Citi Bike system, some new stations were built near existing stations after six months, so the demand at some stations in a given batch was influenced by the new stations. Nevertheless, we believe our proposed LDA framework is also helpful for making decisions with prediction horizons longer than six months, since the prediction error mainly stems from the crawled future data. To conclude, our LDA framework can work as a web service to assess the effectiveness of new bike stations for expansion areas in different cities.
