### *4.2. Transformation*

The transit journey data may be illustrated as a triangle in which each vertex represents the coordinates of the trip origin, transfer or destination. The size of the journey triangles varies with the actual trip distance, so the journey data need to be converted into a homogeneous coordinate system before the spatial distribution pattern can be analysed. This study performs the conversion by applying a series of transformation techniques. The first step is to transform the journey triangle *OTD* (origin–transfer–destination) from the spherical surface of the Earth to a 2D plane, given the latitude and longitude of each point of interest, as shown in Figure 2.

**Figure 2.** Transformation from a spherical Earth's surface to a 2D plane.

The great-circle distance between two points, which is the shortest distance over the Earth's surface, is calculated based on the spherical law of cosines. The spherical law of cosines states that, for a spherical triangle,

$$
\cos OD = \cos OT \cos TD + \sin OT \sin TD \cos T \tag{1}
$$

where,

*O*, *T*, *D* = Points of interest (vertices) of the journey triangle *OTD*; in cos *T*, *T* denotes the angle of the triangle at the transfer point

*OD* = Distance between origin point and destination point

*OT* = Distance between origin point and transfer point

*TD* = Distance between transfer point and destination point

The location of any point on the Earth can be defined by its latitude and longitude. Following Equation (1), the *OD* distance is obtained as the arccosine of cos *OD*, as shown in Equation (2).

$$\begin{aligned} \cos OD &= \cos OT \cos TD + \sin OT \sin TD \cos T\\ OD \text{ (in rad.)} &= \cos^{-1} \left[ \cos OT \cos TD + \sin OT \sin TD \cos T \right] \end{aligned} \tag{2}$$

Angles are expressed in radians, so Equation (2) gives the distance between origin and destination in radians. Taking the mean radius of the Earth as 6371 km, the distance in km is obtained by multiplying the *OD* distance (in radians) by 6371 km, as shown in Equation (3).

$$OD\ (\text{in km}) = \cos^{-1}\left[\cos OT \cos TD + \sin OT \sin TD \cos T\right] \times 6371 \text{ km} \tag{3}$$
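As a minimal sketch of Equations (2) and (3), the snippet below computes a great-circle distance directly from latitude and longitude. It assumes the common latitude/longitude form of the spherical law of cosines (the spherical triangle formed by the two points and the pole), which the text does not spell out. The function name `great_circle_km` and the sample coordinates are illustrative only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius used in Equation (3)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km via the spherical law of cosines.

    Inputs are decimal degrees. Assumed lat/lon form of Equation (2):
    cos d = sin(lat1) sin(lat2) + cos(lat1) cos(lat2) cos(lon2 - lon1).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_d = (math.sin(phi1) * math.sin(phi2)
             + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_d = max(-1.0, min(1.0, cos_d))
    return math.acos(cos_d) * EARTH_RADIUS_KM


# A quarter of the equator: the angular distance is pi/2 radians.
quarter = great_circle_km(0.0, 0.0, 0.0, 90.0)
print(round(quarter, 1))
```

The arccosine form loses numerical precision for points that are very close together, so a production implementation might prefer a different formulation for short legs.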

The same technique was applied to calculate the great-circle distance of *OT* and *TD*. With the great-circle distance of *OT*, *TD* and *OD*, the respective angles of any triangle on a 2D plane could be calculated using the law of cosines, as shown in Equation (4).

$$\begin{aligned} \cos O &= \frac{OT^2 + OD^2 - TD^2}{2 \cdot OT \cdot OD} \\ O &= \cos^{-1} \left[ \frac{OT^2 + OD^2 - TD^2}{2 \cdot OT \cdot OD} \right] \end{aligned} \tag{4}$$
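Equation (4) can be checked numerically with a short sketch. The function name `angle_at_origin` is hypothetical, and the side lengths are arbitrary test values rather than data from the study.

```python
import math


def angle_at_origin(ot, od, td):
    """Interior angle at O (in radians) from the three side lengths, Equation (4)."""
    cos_o = (ot ** 2 + od ** 2 - td ** 2) / (2.0 * ot * od)
    # Clamp to [-1, 1] to guard against rounding error in near-degenerate triangles.
    cos_o = max(-1.0, min(1.0, cos_o))
    return math.acos(cos_o)


# A 3-4-5 triangle has a right angle between the sides of length 3 and 4.
angle = angle_at_origin(ot=3.0, od=4.0, td=5.0)
print(math.degrees(angle))
```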

Once the journey triangle *OTD* is obtained, it undergoes a series of Euclidean transformations so that all origin, destination and transfer points can be displayed in a standardised Euclidean space, as illustrated in Figure 3.

**Figure 3.** Euclidean transformations.

The first Euclidean transformation is translation. Translation relocates the journey triangle *OTD* so that its origin point, *O*, lies at (0, 0). This transformation preserves the congruence and distances of the journey triangle *OTD*. Applying it to the single-transfer journeys makes all journey triangles start from the same point, (0, 0). The notation for translation (*T*<sub>*h*,*k*</sub>) is shown in Equation (5) for the transfer point, where *h* and *k* are the shifts along the *x* and *y* axes; the origin and destination points undergo the same transformation.

$$T_{h,k}\left(T_x, T_y\right) = \left(T_x + h,\ T_y + k\right) = \left(T'_x, T'_y\right) \tag{5}$$

Still preserving congruence and distance, the journey triangle *OTD* is then rotated about *O* (0, 0) until the side *OD* rests on the *x*-axis. After this rotation, every journey triangle lies along the *x*-axis and its destination point, *D*, has the coordinates (*x*, 0). The notation for rotation is shown in Equation (6).

$$
\begin{bmatrix} T''_{x} \\ T''_{y} \end{bmatrix} = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} \begin{bmatrix} T'_{x} \\ T'_{y} \end{bmatrix} \tag{6}
$$

At this stage, the *OD* side of every journey triangle lies on the same line (the *x*-axis). The next step relaxes the distance-preserving restriction and allows a bijection that preserves the shape and angles of the triangle, but not distance. The aim is to scale all journey triangles *OTD* to the same *OD* unit distance, as shown in Figure 4. The notation for compression and dilation (*CD*<sub>*k*</sub>) is shown in Equation (7).

$$CD_k\left(T''_x, T''_y\right) = \left(kT''_x,\ kT''_y\right) \tag{7}$$

**Figure 4.** Transfer locations on a homogeneous coordinate.
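The translation, rotation and scaling of Equations (5) to (7) can be chained as in the sketch below. It assumes the journey triangle is already expressed in planar coordinates and that *O* and *D* are distinct points. The function names and the sample triangle are hypothetical, not taken from the study.

```python
import math


def translate(point, h, k):
    """Equation (5): shift a point by (h, k)."""
    x, y = point
    return (x + h, y + k)


def rotate(point, theta):
    """Equation (6): rotate a point about (0, 0) by theta radians."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


def scale(point, k):
    """Equation (7): compress or dilate a point by the factor k."""
    x, y = point
    return (k * x, k * y)


def normalise_journey(o, t, d):
    """Map a planar journey triangle so that O -> (0, 0) and D -> (1, 0)."""
    h, k = -o[0], -o[1]                      # translation that moves O to (0, 0)
    t1, d1 = translate(t, h, k), translate(d, h, k)
    theta = -math.atan2(d1[1], d1[0])        # rotation that puts D on the x-axis
    t2, d2 = rotate(t1, theta), rotate(d1, theta)
    factor = 1.0 / d2[0]                     # scaling that gives OD unit length
    return scale(t2, factor), scale(d2, factor)


# Made-up triangle: O = (1, 1), T = (0, 2), D = (1, 3).
t_norm, d_norm = normalise_journey(o=(1.0, 1.0), t=(0.0, 2.0), d=(1.0, 3.0))
print(t_norm, d_norm)
```

After normalisation only the transfer point carries information, since every origin sits at (0, 0) and every destination at (1, 0).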

### *4.3. Transfer Location Map*

Figure 4 illustrates the transfer points of the single-transfer bus journeys, transformed to the scale of the *OD* unit distance on both the *x* and *y* axes. The scale in the figure is not the actual distance; it has been adjusted through either compression or dilation. This study assumes that the plot represents the "acceptable" or "viable" transfer locations in relation to the straight path to the destination. The subsequent analysis quantifies and ranks the viability of the transfer points and validates their impact on travel mode choice.

The distribution of transfer points in Figure 4 may have been influenced by the availability and quality of transit services in the area. The points are the transfer locations that travellers actually chose, possibly from among multiple available alternatives. The transit network structure must be an important determinant of the distribution pattern. The transit network of Brisbane has a typical radial structure, with no trunk or feeder service. A transfer commonly requires a significant deviation from the direct path to the destination, or even travel in the opposite direction from the destination. The distribution pattern in the figure may therefore reflect the inconvenience of service transfers under the existing network structure.

### *4.4. Grid-Based Hierarchical Clustering*

To analyse the spatial distribution pattern of the transfer points, this study used a grid-based hierarchical clustering method, which combines grid-based clustering and hierarchical clustering. Cluster analysis is a data reduction tool that partitions a sample dataset into clusters, where objects within a cluster share many characteristics but are very dissimilar to objects outside that cluster [40]. Grid-based clustering (also known as density-based clustering) is one of the most efficient approaches for mining large data sets. It adopts algorithms that partition the data space into a finite number of cells to form a grid structure [41], as shown in Figure 5. Unsupervised clustering such as K-means was inappropriate for this study because it tends to group the data into clusters of similar size (i.e., point densities). The underlying assumption of the choice modelling in the next step is that travellers consider the cells with higher point densities to be more viable transfer locations.

**Figure 5.** Grid structure on the transformed transfer location map.

For grid clustering, each grid cell is defined by an increment of 0.2 × *OD* (origin to destination) unit distance, and the transfer points are plotted relative to the 1.0 × *OD* unit distance. Figure 5 shows a clear concentration of transfer points in the cells along the direct path between the origin and destination. For cell clustering, the density of each cell is calculated as follows:

$$\text{Cell density} = \frac{\text{Total number of transfer points in grid } x}{\text{Total number of transfer points}} \tag{8}$$
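A minimal sketch of the gridding step and Equation (8) follows. It assumes integer (column, row) cell indices obtained by flooring the normalised coordinates. The paper's own cell labels (such as F1) are not reproduced because their layout is not specified here, and the sample points are made up.

```python
import math
from collections import Counter

CELL_SIZE = 0.2  # grid increment in OD unit distance, as stated in the text


def cell_of(x, y, cell_size=CELL_SIZE):
    """Return hypothetical integer (column, row) indices of the cell containing (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def cell_densities(points, cell_size=CELL_SIZE):
    """Equation (8): share of all transfer points that fall in each cell."""
    counts = Counter(cell_of(x, y, cell_size) for x, y in points)
    total = len(points)
    return {cell: n / total for cell, n in counts.items()}


# Made-up normalised transfer points, for illustration only.
sample_points = [(0.10, 0.05), (0.15, 0.10), (0.90, 0.05), (0.50, 0.30)]
densities = cell_densities(sample_points)
print(densities)
```

Points that fall exactly on a cell boundary are sensitive to floating-point rounding in this sketch, so a real implementation would need an explicit boundary rule.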

The hierarchical clustering method was then applied to sort the cells into clusters. Hierarchical clustering is useful for finding relatively homogeneous clusters of cases based on measured characteristics. It starts with each case as a separate cluster and then combines the clusters sequentially until only one cluster is left. The algorithm uses the dissimilarities or distances between objects when forming the clusters [40]. Figure 6 shows the cell density of each cell and the cluster to which each cell is assigned by the hierarchical clustering method.

Figure 6 presents the preferred transfer locations, which have relatively high cell densities, and Figure 7 shows the result of the hierarchical clustering. Some interesting results were observed in the travellers' transfer selection. The largest shares of bus journeys involved a transfer located in cells F1 and J1. These two cells had the highest transfer point densities, at 13.78% and 15.47%, respectively (Cluster A). They may be regarded as the most preferred transfer locations: having a transfer service in those cells is expected to increase the likelihood of making a transfer, and eventually of taking transit, compared with other cells.

**Figure 6.** Cell-density in respective clusters.

**Figure 7.** Cell density dendrogram.

All the other cells are categorised into five different clusters according to their cell density. The hierarchical clustering uses Ward's method to measure the dissimilarity among clusters. Instead of distance metrics, Ward's method uses an analysis of variance approach to evaluate the distances between clusters: cluster membership is assessed by calculating the total sum of squared deviations from the mean of a cluster [42]. The dendrogram allows tracing backward and forward to any cluster at any level. It indicates how great the distance is between the clusters merged at a particular step, using the 0–25 scale along the top of the chart.
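To illustrate the idea behind Ward's method, the sketch below agglomerates one-dimensional cell densities by repeatedly merging the pair of clusters whose merger adds the least to the within-cluster sum of squares. This is a simplified illustration and not the software procedure used in the study. The function name `ward_clusters` is hypothetical, and only the first two density values come from the text.

```python
def ward_clusters(values, n_clusters):
    """Agglomerate 1-D values with Ward's criterion until n_clusters remain.

    Returns a list of clusters, each a list of indices into `values`.
    """
    clusters = [[i] for i in range(len(values))]

    def mean(cluster):
        return sum(values[i] for i in cluster) / len(cluster)

    while len(clusters) > n_clusters:
        best = None  # (merge cost, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster sum of squares if a and b merge.
                gap = mean(clusters[a]) - mean(clusters[b])
                cost = na * nb / (na + nb) * gap ** 2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Cell densities in percent: the first two are the F1 and J1 values quoted in
# the text; the remaining six are made up for illustration.
densities = [13.78, 15.47, 9.0, 8.5, 8.0, 0.3, 0.2, 0.1]
groups = ward_clusters(densities, n_clusters=3)
print(groups)
```

Recording the merge cost at each step, rather than stopping at a fixed number of clusters, would give the heights needed to draw a dendrogram such as Figure 7.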

Cluster B includes G1, H1 and I1. The transfer points in the five cells of Cluster A and Cluster B account for 54.1% of all transfer points, although these are only five of the 150 cells in the map. This implies that most travellers prefer the transfer point to be located close to the direct path towards their trip destination. Cluster C consists of seven cells: E1, F2 to J2 and K1. The average cell density declined markedly to 3.67%. Some bus users travelled to transfer points in the opposite direction from their destination, but not too far from their origin. Similarly, some travellers made a transfer beyond their destination location. The cell density decreased further for the cells in Cluster D, with an average density of 1.37%. The transfer points located in Clusters A to D accounted for 89.35% of the total transfers. The average density of the cells in Cluster E and Cluster F was negligible at 0.27% and 0.02%, respectively, although these cells accounted for 87.33% of the total map area (131 out of 150 cells).

In general, the transfer point density in each cell shows that the majority of transfers are made at locations along the direct path to the destination. Transit users occasionally travel in the opposite direction from the destination, or slightly farther away from the destination, to make a transfer. When a transit journey requires a transfer that deviates from the direct path between origin and destination, such a trip is unlikely to be realised. This demonstrates the impedance of transfer location, although the interpretation must take the transit network structure into account.

### **5. Mode Choice Analysis**

Two binomial logistic regression models (a base and an expanded model) were estimated for two travel modes: private vehicle and bus. A mode choice analysis requires information on the mode-specific variables of the alternative (unchosen) mode, and this information can only be inferred. This study used GTFS (General Transit Feed Specification) data to infer the bus journey information for those who chose a private vehicle as their travel mode, and Google Maps to infer the private vehicle travel time for those who chose the bus. The analysis considers only home-based work journeys. If a traveller used the bus, the journey must include one service transfer within the same travel mode (bus). Because of these strict criteria, only 330 private vehicle journeys and 63 bus journeys were used for the analysis. The 2009-10 SEQHTS is the most recent and detailed dataset available to demonstrate travellers' travel patterns and mode choice.

The dependent variable of the models is dichotomous, representing travel mode choice (transit or private vehicle). The independent variables tested in this analysis include individual characteristics (gender, age, individual weekly income, household size and number of cars in the household), journey attributes (travel time, initial wait time, first mile walk time and last mile walk time) and transfer attributes (proportion of in-vehicle bus travel time, proportion of transfer walk time, proportion of transfer wait time, type of transfer and transfer location). Table 1 presents the list of independent variables with brief descriptions.


**Table 1.** List of independent variables.

Two different models were developed to test the effectiveness of the transfer location variable. The base model (Model I) takes the conventional approach to account for the effect of transfer by incorporating the proportion of in-vehicle bus travel time, proportion of transfer walk time and proportion of transfer wait time variables. The expanded model (Model II) uses the same set of independent variables plus the additional "transfer location" variable. The results of the two models are presented in Table 2.


**Table 2.** Binomial logit model results: transfer location as an ordinal variable.

Notes: \*\*\*: *p* < 0.01; \*\*: *p* < 0.05; \*: *p* < 0.1. Coefficients that are statistically insignificant (*p* ≥ 0.1) are not shown in this table.

Table 2 shows only the variables that provided the best model fit. For instance, gender, age, network distance, transfer walking time, transfer wait time and transfer type were found not to be significant. The best-fitting base model (Model I) incorporated the following independent variables: individual weekly income, household size, number of cars, car travel time, bus travel time, initial wait time for the first bus service, first mile walk time, last mile walk time and proportion of in-vehicle bus travel time. In Model II, the transfer location variable was found to be significant at the 0.05 level, which is notable because it indicates that the new variable has a substantial influence on mode choice. As for the socioeconomic variables, household size had a positive effect on the utility of transit, whereas individual weekly income and the number of cars in the household had a negative effect. Among the journey attributes, car travel time was found to be significant at the 0.01 level. The first mile and last mile walk times were significant at the 0.1 level in both the base and expanded models, which is consistent with the literature: as access and egress increase, the use of transit decreases [38,43]. The initial wait time for the first bus service was significant at the 0.1 level only in the expanded model. If transit is not available at the time when individuals need to travel, its attractiveness decreases.

As for the transfer-related variables, only the proportion of in-vehicle bus travel time was found to be significant in both the base and expanded models (at the 0.05 level). Exp. β shows the effect of the independent variable on the odds of choosing transit (the odds ratio). The Exp. β value relating the proportion of in-vehicle transit travel time to the likelihood of using transit was 49.45 and 69.27 in the base and expanded models, respectively. These results imply that travellers are more likely to use transit as the proportion of in-vehicle bus travel time increases. This finding is consistent with the literature, which suggests that shorter in-vehicle transit travel times can lead travellers to perceive the walking and wait times during a transfer as more onerous, eventually increasing the relative attractiveness of a private vehicle [1,20,44,45].

The transfer location variable in the expanded model was found to be significant at the 0.05 level. The negative coefficient suggests that a transfer location farther from the OD path decreases the utility of the bus and the probability of choosing the bus mode. In fact, the transfer location turns out to be one of the most important determinants of travel mode choice. This variable has an Exp. β (odds ratio) value of 0.75, which shows that a change in the transfer location from a more preferred cluster to the next less preferred cluster (e.g., from Cluster A to Cluster B) would decrease the probability of choosing the bus to 0.43 and increase the probability of choosing a private vehicle to 0.57.
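The probabilities quoted above can be reproduced from the odds ratio under one assumption that the text does not state: the traveller starts from even odds, i.e., a baseline bus probability of 0.5. The sketch below uses a hypothetical helper, `apply_odds_ratio`, to show the conversion.

```python
def apply_odds_ratio(p_baseline, odds_ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    odds = p_baseline / (1.0 - p_baseline)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Assumed baseline of 0.5 (even odds); 0.75 is the Exp. β reported in the text.
p_bus = apply_odds_ratio(p_baseline=0.5, odds_ratio=0.75)
print(round(p_bus, 2), round(1.0 - p_bus, 2))  # prints: 0.43 0.57
```

With a different baseline probability the same odds ratio yields different probabilities, so the 0.43 and 0.57 figures should be read as an example for a traveller who is otherwise indifferent between the two modes.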

The predictive capability of Model I and Model II was compared using McFadden's rho-squared to demonstrate the effectiveness of the new transfer location variable and its impact on travel mode choice. Model I resulted in a pseudo R-squared (ρ<sup>2</sup>) of 0.29, whereas Model II increased it to 0.31. McFadden suggested that ρ<sup>2</sup> values between 0.2 and 0.4 represent a very good model fit [46]. The increase in ρ<sup>2</sup> demonstrates that, with the inclusion of the new variable, Model II has better explanatory power for mode choice than Model I.

A chi-squared (χ<sup>2</sup>) test was conducted to investigate the statistical improvement from Model I to Model II, by gauging the change in the log-likelihood function relative to the change in degrees of freedom. The χ<sup>2</sup> value of 5.04 exceeds the critical value of 3.84 for 1 degree of freedom at the 0.05 significance level. This gives sufficient evidence to reject the null hypothesis that Model II is no better than Model I. With the inclusion of the transfer location variable, Model II outperforms Model I (the base model).
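As a rough check of this test, the sketch below computes the upper-tail probability of a chi-squared statistic with 1 degree of freedom using only the standard library. It assumes the statistic is the usual likelihood-ratio form, minus two times the difference in log-likelihoods. The log-likelihood values are not reported in this section, so the helper `likelihood_ratio_stat` is shown without data.

```python
import math


def likelihood_ratio_stat(ll_restricted, ll_full):
    """Likelihood-ratio statistic: -2 * (LL of restricted model - LL of full model)."""
    return -2.0 * (ll_restricted - ll_full)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    Uses P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2.0))


# 5.04 is the statistic reported in the text.
p_value = chi2_sf_1df(5.04)
print(round(p_value, 3))
```

The resulting p-value is slightly below 0.025, which agrees with the statistic exceeding the 3.84 critical value at the 0.05 level.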

The transfer location variable in Table 2 is ordered from the most preferred cluster to the least preferred cluster on an ordinal scale. This approach is effective for studying the impact of transfer location as a single variable, but it rests on the assumption that the distance between adjacent clusters is equal. To study the relationship between the clusters, an additional binomial logistic regression model was estimated with the transfer location entered as nominal variables. The result is shown in Table 3.




**Table 3.** *Cont.*

Notes: \*\*\*: *p* < 0.01; \*\*: *p* < 0.05; \*: *p* < 0.10. Coefficients that are statistically insignificant (*p* ≥ 0.10) are not shown in this table.

The results in Tables 2 and 3 did not differ much. The age of the travellers became significant at the 0.05 level, with a negative effect on transit utility. With the transfer location entered as nominal variables, Cluster F was assigned as the reference category. The Exp. β coefficient shows that if a transfer location is in Cluster A, the odds of using the bus are 5.99 times those for a transfer location in Cluster F. Transfer locations in Clusters B, C and D were found to be significant at the 0.05 level, but not those in Cluster E. This implies that as the transfer location changes from a less preferred cluster to a more preferred cluster (e.g., from Cluster F to Cluster A), the probability of choosing the bus over an automobile increases.

### *Transfer Location and Transit Travel Time*

The analysis results indicate that the chance of making a transit trip is likely to decrease if the trip involves a transfer at a location that deviates from the direct path to the destination. In conventional mode choice analysis, the level of deviation is quantified in terms of the travel time incurred during the transfer, whereas the new variable was created to capture the impact of deviation in the travel direction. We take two approaches to examine the potential collinearity between transfer location and transit travel time. Firstly, Spearman's correlation coefficient (ρ) was calculated between the bus travel time (a continuous variable) and the transfer location (a categorical variable) using the Household Travel Survey bus trip data. A weak correlation (ρ = −0.211) was found between the two variables, which implies that the transfer location may act as an independent factor in travel mode choice.
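Spearman's coefficient is the Pearson correlation of the ranked values, which the following sketch computes with average ranks for ties. It assumes the transfer location is encoded as an integer cluster rank (1 for Cluster A up to 6 for Cluster F), which is an illustrative encoding and not necessarily the one used in the study. The sample data are made up.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up bus travel times (min) and assumed cluster ranks (1 = Cluster A).
travel_time_min = [22, 35, 41, 28, 55, 47]
cluster_rank = [1, 2, 2, 1, 3, 4]
rho = spearman_rho(travel_time_min, cluster_rank)
print(round(rho, 3))
```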

The second approach presents three plots of the transfer point distribution by the length of the bus journey time (less than 30 min, between 30 and 45 min and more than 45 min), as shown in Figure 8.

The level of deviation of the transfer points was derived using Equation (9). An arbitrary cell length of 4 was used, so that the *OD* unit distance, which spans five cells of 0.2 × *OD*, corresponds to 20 units. For example, the deviation of a transfer point (*x*, *y*) was calculated as the sum of its distance from the origin point (0, 0) and its distance from the destination point (20, 0).

$$\text{Level of deviation} = \sqrt{x^2 + y^2} + \sqrt{(x - 20)^2 + y^2} \tag{9}$$
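Equation (9) translates directly into code. In the sketch below the function name `level_of_deviation` and the sample points are illustrative, and the coordinates are assumed to be already expressed in the 20-unit *OD* scale.

```python
import math

OD_LENGTH = 20.0  # destination at (20, 0), as in Equation (9)


def level_of_deviation(x, y, od_length=OD_LENGTH):
    """Equation (9): distance from the origin plus distance to the destination."""
    return math.hypot(x, y) + math.hypot(x - od_length, y)


on_path = level_of_deviation(10.0, 0.0)    # midway along the direct path
off_path = level_of_deviation(10.0, 4.0)   # one cell length to the side
behind = level_of_deviation(-4.0, 0.0)     # one cell length behind the origin
print(on_path, round(off_path, 2), behind)
```

Any transfer point on the straight segment between origin and destination yields the minimum value of 20, so values above 20 measure how much longer the two legs are than the direct path.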

The average level of deviation for short (less than 30 min), medium (between 30 and 45 min) and long (more than 45 min) journeys was 24.7, 24.3 and 25.7, respectively. Although slightly more deviation was found among the longer bus trips, the distribution pattern is largely unchanged regardless of travel time. This suggests that the preference for transfer location was not affected by travel time (or distance), and that travellers might disfavour even nearby transfer services depending on their location relative to the destination. The conventional approach of using door-to-door travel time to capture the transfer cost is therefore not sufficient. Transit travel time may be able to capture the effect of deviation in travel distance, but it cannot capture the effect of deviation in the travel direction towards transfer services.

**Figure 8.** Distribution of transfer points.
