**3. Methodology**

In this section, we introduce (a) our proposed G-clustering algorithm, (b) extracted features correlated with rental/drop-off demand, and (c) demand prediction. Table 1 defines the notation used in this paper. Section 3.1 presents the problem definition and our proposed framework.

**Table 1.** Notations used in this paper.


*3.1. Preliminary and Problem Definition*

**Definition 1.** *Reachable Station Region. To account for how far a resident is willing to travel and to model spatial factors appropriately, we define r as the radius of the farthest area influenced by a new station. In other words, when considering a location for a new bike station, we extract neighborhood characteristics and features within a Euclidean distance r of that location. Figure 2 gives an example: Si is the target location, and within its region we extract the density of our pre-defined POIs, which may be correlated with bike demand.*

**Figure 2.** An illustration of the reachable region of station *Si* and the corresponding bike routes.
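As a minimal sketch of Definition 1, the snippet below counts POIs per category inside the reachable region and divides by the region's area. It assumes coordinates are already projected to a planar system (for example, meters), and that "density" means count per unit area; the function name `poi_density` and the tuple layout of `pois` are hypothetical and not taken from the paper.

```python
import math

def poi_density(target, pois, r):
    """Return (counts, densities) of POIs per category within radius r.

    target: (x, y) of the candidate station location (planar coordinates).
    pois:   iterable of (x, y, category) tuples (hypothetical layout).
    r:      radius of the reachable station region, same unit as x and y.
    """
    tx, ty = target
    counts = {}
    for x, y, category in pois:
        # Keep the POI only if its Euclidean distance to the target is <= r.
        if math.hypot(x - tx, y - ty) <= r:
            counts[category] = counts.get(category, 0) + 1
    area = math.pi * r ** 2
    # Assumption: density is the count divided by the area of the region.
    densities = {c: n / area for c, n in counts.items()}
    return counts, densities
```

A POI lying exactly on the boundary is treated as inside the region here; whether the paper includes or excludes boundary points is not stated.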

**Definition 2.** *Nearby Stations. For the target location of a new station, we extract its top-k nearest stations whose establishment dates are earlier than that of the target station. We consider three features of these nearby stations: the difference in establishment dates, the cumulative demand, and the Euclidean distance between the target location and each nearby station.*
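The nearby-station features can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it reads the filter as "stations established before the target station", measures the date difference in days, and uses a hypothetical dictionary layout with keys `xy`, `established`, and `cumulative_demand`.

```python
import math
from datetime import date

def nearby_station_features(target_xy, target_date, stations, k):
    """Features of the k nearest stations established before target_date.

    stations: list of dicts with hypothetical keys
              'xy' (planar coordinates), 'established' (date),
              'cumulative_demand' (number).
    """
    # Assumption: only stations older than the target are candidates.
    earlier = [s for s in stations if s["established"] < target_date]
    # Rank candidates by Euclidean distance to the target location.
    earlier.sort(key=lambda s: math.dist(target_xy, s["xy"]))
    features = []
    for s in earlier[:k]:
        features.append({
            "days_earlier": (target_date - s["established"]).days,
            "cumulative_demand": s["cumulative_demand"],
            "distance": math.dist(target_xy, s["xy"]),
        })
    return features
```

If fewer than k earlier stations exist, the function simply returns a shorter list; how the paper pads or handles that case is not specified.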

**Definition 3.** *Bike Route Structure. We use the road length of bike routes and the number of intersections in the road structure as features to improve demand prediction. We consider road length because a bike station may see high long-term demand if its surroundings contain many bike routes, which make traveling by bike convenient. A high number of intersections may also indicate a traffic hub with significant human mobility, leading to increased potential bike flows.*

In Figure 3, there are three kinds of bike routes, and a bike route *Ri* is composed of multiple intersections (red points) and road segments (black dotted lines). The route segments and intersections that fall within the reachable station region of *Si* must be included. That is, the features extracted from *R*1, *R*2, and the part of *R*3 inside the region in Figure 2 should be taken into consideration.

**Figure 3.** Examples of bike route intersection.
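One way to compute these route features is to clip every road segment to the circular reachable region and sum the clipped lengths, while counting the intersections that lie inside the circle. The sketch below assumes each route is a list of intersection points in planar coordinates, joined by straight segments; the function names and this representation are hypothetical.

```python
import math

def clipped_length(a, b, center, r):
    """Length of the part of segment a-b inside the circle (center, r)."""
    ax, ay = a[0] - center[0], a[1] - center[1]
    dx, dy = b[0] - a[0], b[1] - a[1]
    qa = dx * dx + dy * dy
    if qa == 0:
        return 0.0  # degenerate segment
    # Solve |a + t*(b - a) - center|^2 = r^2 for t.
    qb = 2 * (ax * dx + ay * dy)
    qc = ax * ax + ay * ay - r * r
    disc = qb * qb - 4 * qa * qc
    if disc <= 0:
        return 0.0  # the line misses or only touches the circle
    root = math.sqrt(disc)
    # Restrict the inside interval [t0, t1] to the segment range [0, 1].
    t0 = max(0.0, (-qb - root) / (2 * qa))
    t1 = min(1.0, (-qb + root) / (2 * qa))
    return max(0.0, t1 - t0) * math.sqrt(qa)

def route_features(routes, center, r):
    """Return (total route length inside region, number of intersections inside)."""
    total_length = 0.0
    inside = set()  # a set avoids double-counting points shared by routes
    for route in routes:
        for a, b in zip(route, route[1:]):
            total_length += clipped_length(a, b, center, r)
        for p in route:
            if math.dist(p, center) <= r:
                inside.add(p)
    return total_length, len(inside)
```

Clipping handles the case of a route that is only partially inside the region, so only the portion within distance r of the station contributes to the length feature.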

**Definition 4.** *Season. The period after a station is built spans multiple seasons, and all of them should be considered because commuting behavior changes with the season. For each target station, we calculate how many months it will operate in each season. Spring is defined as the months from March to May, and the season changes every three months.*
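The season feature can be illustrated with a short sketch. It assumes the establishment month counts as the first operating month and that the horizon is a whole number of calendar months (six by default, matching the prediction period in the problem definition); the season names after spring follow from the three-month rule, and the function name is hypothetical.

```python
from datetime import date

# Spring is March to May; each following season covers the next three months.
SEASON_OF_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}

def months_per_season(established, horizon_months=6):
    """Count how many operating months fall in each season."""
    counts = {"spring": 0, "summer": 0, "autumn": 0, "winter": 0}
    for offset in range(horizon_months):
        # Assumption: the establishment month itself is the first month.
        month = (established.month - 1 + offset) % 12 + 1
        counts[SEASON_OF_MONTH[month]] += 1
    return counts
```

For a station established in April, the six counted months are April through September under these assumptions, giving two spring months, three summer months, and one autumn month.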

**Definition 5.** *Category Vector Pi for Each POI. A POI Pi may belong to more than one category defined in the Facebook Place API. We therefore define Pi as:*

$$P_i = \langle p_{i,j} \rangle \tag{1}$$

*where pi,j = 1 if Pi belongs to CTj and 0 otherwise, and CTj is the jth element in the category set defined by Facebook.*
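The category vector is a multi-hot encoding over the category set, which can be sketched as below. The category names in the usage example are placeholders chosen for illustration, not actual Facebook categories, and the function name is hypothetical.

```python
def category_vector(poi_categories, category_set):
    """Multi-hot vector: entry j is 1 if the POI belongs to category_set[j]."""
    poi_categories = set(poi_categories)
    return [1 if ct in poi_categories else 0 for ct in category_set]

# Placeholder category set; the real set would come from the Facebook data.
category_set = ["food", "education", "shopping"]
vector = category_vector({"food", "shopping"}, category_set)
```

Here `vector` has one entry per category, in the order of `category_set`, with a 1 for every category the POI belongs to.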

**Problem Definition.** *Rental/Drop-off demand prediction.* Given *k* new bike station locations *SN* = {*S*1, *S*2, ... , *Sk*}, we want to predict the rental/drop-off demands of each station six months after its establishment; that is, $S_i^{rent}$ and $S_i^{drop}$ as defined in Table 1.
