*3.3. Feature Extraction*

We group all features into six categories according to their data sources: I. #POI and #Checkins, II. Nearby station features, III. Popular spots, IV. G-clustering, V. Bike route structure, and VI. Season. The effectiveness of each category is evaluated in the experiments. Table 3 gives an overview of the features.


**Table 3.** All Features and their Descriptions.

**I. #POI and #Checkins**. The number of POIs (Points of Interest) and the number of check-ins indicate how prosperous an area is, and a more prosperous area is expected to generate more frequent bike demand. We extract #POI and #Checkins through the Facebook API.

**II. Nearby station features**. A new station is usually strongly related to its nearby stations because of spatial effects and human mobility. We consider three features for each of the top-*k* nearby stations: the difference in establishment dates, the cumulative demand, and the Euclidean distance between the target location and the nearby station. If a nearby station was built later than the target location, its cumulative demand is set to zero. This extraction yields a total of 3*k* features for nearby stations. Such a large number of features might dominate the prediction result of the classifier, so PCA (**P**rincipal **C**omponent **A**nalysis) is applied to reduce the feature dimensions.
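The construction of the 3*k* nearby-station features can be illustrated with a minimal sketch. It is not the implementation used in this work: the function name `nearby_station_features`, the dictionary keys `xy`, `open`, and `demand`, the use of planar (projected) coordinates, and the measurement of the date difference in days are all assumptions made for illustration. The PCA step is left out and would be applied to the resulting vectors afterwards.

```python
from datetime import date
from math import hypot


def nearby_station_features(target_xy, target_open, stations, k):
    """Return a flat list of 3*k features for the k nearest stations.

    target_xy   -- (x, y) of the target location in projected coordinates (assumed)
    target_open -- establishment date of the target location
    stations    -- list of dicts with hypothetical keys:
                   'xy' (x, y), 'open' (date), 'demand' (cumulative demand)
    """
    if len(stations) < k:
        raise ValueError("fewer than k nearby stations available")

    def distance(station):
        # Euclidean distance between the target location and a station
        return hypot(station["xy"][0] - target_xy[0],
                     station["xy"][1] - target_xy[1])

    features = []
    for station in sorted(stations, key=distance)[:k]:
        # Difference in establishment dates, in days (unit is an assumption)
        day_gap = (target_open - station["open"]).days
        # A station built later than the target contributes zero demand
        demand = station["demand"] if station["open"] <= target_open else 0
        features.extend([day_gap, demand, distance(station)])
    return features


stations = [
    {"xy": (3, 4), "open": date(2015, 1, 1), "demand": 100},
    {"xy": (0, 1), "open": date(2016, 6, 1), "demand": 50},
    {"xy": (10, 10), "open": date(2014, 1, 1), "demand": 900},
]
vector = nearby_station_features((0, 0), date(2016, 1, 1), stations, k=2)
```

With *k* = 2 the sketch returns six values, three per neighbour, ordered from the nearest station outward. The nearest station in the toy data opens after the target location, so its demand entry is zero.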

**III. Popular spots**. We specifically define a set of popular POI types (e.g., over 1000 stores in New York) and, for each station, count the POIs and check-ins of those types within its reachable region.

**IV. G-clustering**. We run the G-clustering algorithm and use its clustering results as features. Two alternative clustering methods are used in step 1 of G-clustering: DBSCAN and K-means.

**IV-D**. Category clustering results obtained with DBSCAN.

**IV-K**. Category clustering results obtained with K-means.

**V. Bike route structure**. The more bike routes there are near a station, the more convenient riding is and the more likely its bikes are to be rented. We therefore calculate the total route length and the number of bike-route intersections within the reachable region of station *Si*.

**VI. Season**. The season strongly affects people's willingness to ride a bike. For example, users are more likely to rent a bike in spring than in winter, so demand in December is typically lower than in May. Following Definition 4, if station *Si* starts operating in May, then the numbers of months falling in spring, summer, fall, and winter during the following six months are 1, 3, 2, and 0, respectively.
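The season feature can be reproduced with a short sketch. Two assumptions are made here, chosen because they agree with the May example: the six-month window includes the starting month, and seasons follow the Northern Hemisphere convention (spring is March to May, summer June to August, fall September to November, winter December to February). The names `SEASON_OF_MONTH` and `season_month_counts` are hypothetical.

```python
# Index of each season in the output vector: 0 spring, 1 summer, 2 fall, 3 winter
SEASON_OF_MONTH = {
    3: 0, 4: 0, 5: 0,
    6: 1, 7: 1, 8: 1,
    9: 2, 10: 2, 11: 2,
    12: 3, 1: 3, 2: 3,
}


def season_month_counts(start_month, horizon=6):
    """Count the months per season in a window starting at start_month.

    start_month -- calendar month (1 to 12) in which the station starts operating
    horizon     -- window length in months, including the starting month (assumed)
    """
    counts = [0, 0, 0, 0]
    for offset in range(horizon):
        # Wrap around the end of the year: month 12 is followed by month 1
        month = (start_month - 1 + offset) % 12 + 1
        counts[SEASON_OF_MONTH[month]] += 1
    return counts


may_counts = season_month_counts(5)   # [1, 3, 2, 0], matching the example in the text
```

A station opening in November would instead yield `[2, 0, 1, 3]`, since its window covers November through April.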
