1. Introduction
The air transportation system (ATS) is a socio-technical system that has been analyzed as a complex network for many years [
1,
2]. The ATS is analyzed at different geographical scales (see, for example, studies covering the ATSs of China [
3], Europe [
4] and the U.S. [
5]) and at different resolutions starting from the airport–flight network down to the network of the reference points used in the definition of flight routes (called navigation points) [
6].
In the majority of studies, the ATS is investigated by setting up a flight network where nodes are airports and flights connecting airports are links. The flight networks have been investigated by considering them undirected and/or directed networks (in this last case, the direction of the links originates from the departing airport and ends up in the arrival airport), unweighted and/or weighted [
7]. Several studies have considered the problem of the resilience of the ATS to failures and attacks [
5,
8,
9,
10]. Other studies have selected a subset of links (labeled as the “backbone” of the ATS) presenting statistical properties that are not consistent with a specific null hypothesis [
11,
12], making the ATS one of the first systems where statistically validated networks [
13] have been investigated.
The ATS is a complex system composed of well-defined subunits. In fact, flights are operated by different airlines that compete and collaborate with one another. Since 2010, the ATS has been analyzed by distinguishing the role of its subunits (i.e., by analyzing properties of flight networks of single airlines [
5]). Moreover, the presence of different flight networks observed for different airlines made this system a natural candidate for the study of so-called multiplex, which are networks where nodes can have multiple kinds of relations called layers. In fact, the ATS was one of the first socio-technical systems described as a multiplex, where layers represent flights operated by different airlines [
14,
15].
Flight networks have been investigated from different perspectives and at different scales [
16,
17,
18], for example, by considering basic network metrics, topology of the degree distribution, resilience to attack or failures, community detection of large clusters and computation and analysis of network motifs. Motifs are isomorphic subnetworks of a specified number of nodes and shape. Motifs were first investigated in studies of social networks [
19]. In these earlier studies, motifs were primarily investigated as triads (i.e., as subnetworks of three nodes) and put in relation with the properties of the degree sequence. At the beginning of this century, such structures were also investigated in biological systems under the name of motifs [
20]. By considering isomorphic motifs (i.e., subnetworks where the identity of the nodes is not taken into account when considering the shape of the subnetwork), there are 13 distinct motifs for subnetworks with 3 nodes (3-motifs). This number grows rapidly when the subnetwork includes more nodes. For subnetworks with 4 nodes (4-motifs), one counts 199 isomorphic motifs [
20].
Network motifs have been investigated in flight networks both in studies comparing the informativeness of network detection in several types of complex networks [
21] and in studies fully focused on the static and dynamic characteristics of the flight networks [
22,
23,
24,
25]. In this study, we investigate the temporal evolution of 3-motifs and 4-motifs for the 50 European airlines with the highest number of flights in the European Civil Aviation Conference (ECAC) airspace in the year 2017. By investigating the number and temporal evolution of the 3- and 4-motifs, we are able to perform an unsupervised classification of the 50 airlines indicating that main differences among different airlines are due to their regional specialization (including the ability to perform intercontinental flights) and to their business model. We observe that the business model of each airline ranges between the two stylized models of
hub-and-spoke and
point-to-point business models [
26,
27]. In a
hub-and-spoke model, one or more airports act as “hubs”, i.e., as special airports directly connecting all remaining airports. In a
hub-and-spoke structure with a single hub, the network therefore has a star topology with the hub at the center of the star and all the other airports acting as leaves of the network. In the
point-to-point structure, all the airports are equivalent and the network is characterized by direct pairwise connections between airports.
The main goal of our investigation is a reliable and effective classification of airlines. The classification is obtained by an unsupervised methodology that only takes into account the information about the airline flights. We hypothesize that the business models of each airline induce specific constraints on its flight network. These constraints are reflected in the motif occurrence of each airline. Our network analysis shows that European airlines present a heterogeneous profile distributed between the two boundaries of
hub-and-spoke and
point-to-point business models. The heterogeneity is clearly shown by using a measure of concentration of degree in the degree sequence. Specifically, as a measure of concentration, we use an adapted version of the Herfindal–Hirshman index [
28,
29]. For the sake of simplicity, in the remaining text, we will call this index by the more traditional, although imprecise, name of Herfindal index.
The time evolution of motifs shows that the basic temporal unit of the flight schedule is the week. Differences in the degree concentration between winter and summer schedules are detected, but they are negligible for most airlines. Average values of the motif occurrences may therefore be a useful proxy of the average behavior of the airlines over a calendar year. By using average values of the 4-motif occurrences, we are able to obtain an unsupervised classification of airlines. The resulting hierarchical clustering shows that the presence of a given number of hubs, together with the presence or absence of intercontinental flights, characterizes groups of airlines. On the other hand, a hierarchical clustering based on a similarity measure estimated from the co-presence of two airlines on the same origin–destination links is poorly informative.
The paper is organized as follows. In
Section 2, we discuss the data used in our analysis and the metrics and methods used to characterize flight networks. In
Section 3, we present our results about the heterogeneity of the degree concentration and our results about the structure and time evolution of 3- and 4-motifs for the different airlines. Average 4-motif occurrences are used to perform an unsupervised clustering of the 50 airlines providing an informative hierarchical cluster. In
Section 4, we discuss our results.
2. Data and Methods
We investigate the flight networks of the 50 biggest commercial airlines flying over the European flight zone. Specifically, we consider all flights that occurred during the period from 1 January 2017 to 31 December 2017.
A flight network is a network where nodes are airports and links are flights that occurred in a given time interval. By considering that the flight occurs from a departing airport to an arrival airport, flight networks can be described as directed weighted networks (where the weight of a link is the number of flights that occurred from airport i to airport j in the chosen time interval). In this study, we considered flight networks as directed networks while we disregard the weights of the links. Networks are computed using daily and weekly time intervals.
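As a minimal sketch of the construction described above, the following Python snippet builds one unweighted directed network per day from a list of flight records. The record layout `(date, origin, destination)`, the function name `daily_networks` and the airport codes are illustrative assumptions and do not reflect the actual DDR format.

```python
from collections import defaultdict

def daily_networks(flights):
    """Map each day to its set of distinct directed links (origin, destination).

    `flights` is an iterable of (date, origin, destination) tuples; this
    layout is hypothetical and chosen only for illustration. Repeated
    flights on the same route collapse to a single link, i.e., the link
    weights are disregarded.
    """
    networks = defaultdict(set)
    for date, origin, destination in flights:
        networks[date].add((origin, destination))
    return dict(networks)

# Toy records (invented for the example).
flights = [
    ("2017-09-01", "AMS", "CDG"),
    ("2017-09-01", "AMS", "CDG"),  # second flight on the same route
    ("2017-09-01", "CDG", "AMS"),
    ("2017-09-02", "AMS", "FCO"),
]
nets = daily_networks(flights)
```

Weekly networks could be obtained in the same way by replacing the date with a week identifier as the dictionary key.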
The flight networks of each airline and their metrics are analyzed both in their time evolution and in their subunits. Specifically, we investigate the daily degree sequence of each airline. In our analysis, we primarily focus on the concentration of the highest degree values on a limited set of airports usually described as “hubs”. This is performed by adapting the Herfindal index, i.e., a well-known measure of concentration, to the degree sequence. The subunit analysis is carried out by considering all isomorphic small networks with 3 or 4 nodes. These subnetworks are called motifs in the biological literature and triads or subnetworks in the social science literature.
We compare similarity between pairs of airlines both by considering the links, i.e., flights, they are performing on a specific day or week and by considering the motifs they present on a specific day or on average over the full year. Similarity between the airlines is therefore estimated and interpreted by extracting hierarchical trees from the selected similarity matrix.
2.1. Flight Data
Our dataset comprises all the flights that, even partly, crossed the ECAC airspace during the entire year 2017. Data were obtained from EUROCONTROL (
http://www.eurocontrol.int, accessed on 4 February 2022), the European public institution that coordinates and plans air traffic control for all of Europe.
Specifically, we obtained access to the Demand Data Repository (DDR), from which one can obtain all flights performed by any aircraft in the ECAC airspace. Data about flights contain several types of information. In the present study, we focus only on the origin–destination of each flight crossing the ECAC airspace at a given time.
By considering that our focus is on the specific characteristics of airlines, in the present study, we investigate flights of the major 50 airlines performing flights in the ECAC airspace in 2017. In our set, we do not consider Air Berlin because this airline ceased operations on 27 October 2017. Since 2016, Germanwings has been a lease operator for its sister company Eurowings. In our set, we are not considering Germanwings flights. The selected airlines have performed
of the total number of flights of 2017, which corresponds to approximately 3000 flights per company per month on average. The list of the 50 airlines is provided in
Appendix A. The large majority of airlines are commercial airlines. There are 24 flag carrier airlines, 14 low cost carrier (LCC) airlines, 6 regional airlines, 2 leisure airlines, 2 scheduled airlines, 1 cargo airline and 1 rental airline.
2.2. Herfindal Index
The Herfindal index [
28] has been introduced in the economic literature in order to measure the amount of competition among industrial firms. As such, it has also been used as an indicator of concentration, as large firms usually contribute more to the Herfindal index than smaller ones. In the context of complex networks, the Herfindal index can be defined as
$$H = \sum_{i=1}^{N} \left( \frac{k_i}{2E} \right)^{2}, \tag{1}$$
where $k_i$ is the degree of node i and $2E$ is twice the number of directed links.
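The index can be computed directly from the set of directed links as the sum of squared degree shares, each share being the degree of a node divided by twice the number of directed links. The sketch below assumes that the degree of a node counts both incoming and outgoing links, so that the degrees sum to twice the number of links; the function names and the toy star network are ours.

```python
from collections import Counter

def herfindal_index(links):
    """Degree-concentration index of a directed network.

    `links` is a set of distinct directed links (origin, destination).
    The degree of a node is taken as in-degree plus out-degree (an
    assumption of this sketch), so the degrees sum to 2 * len(links).
    """
    degree = Counter()
    for origin, destination in links:
        degree[origin] += 1
        degree[destination] += 1
    total = 2 * len(links)
    return sum((k / total) ** 2 for k in degree.values())

def star(n_leaves):
    """Toy network: a single hub 'H' with bidirectional links to every leaf."""
    links = set()
    for i in range(n_leaves):
        links.add(("H", f"A{i}"))
        links.add((f"A{i}", "H"))
    return links

h_star = herfindal_index(star(100))                      # close to 0.25
h_ring = herfindal_index({(i, (i + 1) % 10) for i in range(10)})  # 1/10
```

In the toy star network the hub holds half of the total degree, which gives a value slightly above 0.25; in the directed ring all nodes have the same degree and the index equals one over the number of nodes.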
2.3. Motifs Detection
The investigation of subnetworks of fixed size (also called motifs) has a long history. Originally investigated as triads and put in relation with the properties of the degree sequence in the investigation of social networks [
19], they were then also introduced in biology where the term “motif” was used for the first time [
20].
In network analysis, a motif of size k is a structure of k nodes not necessarily all linked between each other, as, for example, in
Figure 1. Motifs are different from cliques. A clique is defined in undirected networks, and it is a subgraph such that every two distinct vertices are adjacent.
For size k = 3, there are 13 isomorphic 3-motifs. In
Figure 1, we are showing all of them together with the classification scheme used in [
20]:
Isomorphic 3-motifs present unidirectional links (as in the case of motifs labeled as 6, 12, 36, 38 and 98), bidirectional links (as in the case of motifs 78 and 238) and both types of links (as in the case of motifs 14, 46, 74, 102, 108 and 110).
The number of isomorphic 4-motifs is 199 and therefore much larger than 13. As for the 3-motifs, we use the classification of [
20]. For the shape of each 4-motif, one can consult the motifs dictionary that can be downloaded from the website of Uri Alon laboratory.
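The counts of 13 and 199 isomorphism classes can be checked by brute force. The sketch below enumerates all weakly connected directed graphs on k nodes and labels each class by the smallest integer obtained when the adjacency matrix is read row by row as a binary number over all node relabelings. This labeling convention is an assumption made here for illustration; it reproduces the labels quoted in the text (for example, 78 and 238 for 3-motifs and 4382 for 4-motifs), but we do not claim that it is how the dictionary itself was generated.

```python
from itertools import permutations, product

def canonical_id(adj, k):
    """Smallest integer over all relabelings of the k x k adjacency matrix,
    reading the matrix row by row as a binary number (assumed convention)."""
    best = None
    for perm in permutations(range(k)):
        value = 0
        for i in range(k):
            for j in range(k):
                value = (value << 1) | adj[perm[i]][perm[j]]
        if best is None or value < best:
            best = value
    return best

def weakly_connected(adj, k):
    """True if the graph is connected once link directions are ignored."""
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in range(k):
            if v not in seen and (adj[u][v] or adj[v][u]):
                seen.add(v)
                stack.append(v)
    return len(seen) == k

def motif_classes(k):
    """Sorted canonical labels of all weakly connected directed k-node graphs."""
    pairs = [(i, j) for i in range(k) for j in range(k) if i != j]
    labels = set()
    for bits in product((0, 1), repeat=len(pairs)):
        adj = [[0] * k for _ in range(k)]
        for (i, j), bit in zip(pairs, bits):
            adj[i][j] = bit
        if weakly_connected(adj, k):
            labels.add(canonical_id(adj, k))
    return sorted(labels)

three = motif_classes(3)
four = motif_classes(4)
```

The exhaustive search is practical only for very small k, since the number of candidate graphs grows as 2 to the power k(k-1).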
Network motif analysis can be performed by computational or analytical approaches. In our investigation, we considered a computational approach as it allows for the exact count of network motifs. Computational approaches usually follow a three-step procedure that can be summarized as follows:
(i) Search and enumerate occurrences of a topology with fixed size in the observed network;
(ii) Classify topologies by their isomorphic classes;
(iii) Calculate the statistical significance of each isomorphic class by comparing its occurrences with those in a random ensemble.
In particular, we considered the
mfinder [
30] software developed by Uri Alon laboratory.
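To make the first two steps of the procedure concrete, the following brute-force sketch counts 3-motifs in a small directed network. It is not the mfinder algorithm and it omits the significance test against a random ensemble; the labeling convention (smallest integer obtained by reading the adjacency matrix row by row over all node orderings) and the function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations, permutations

def motif_id(nodes, links):
    """Label of the subgraph induced by `nodes` (assumed convention: smallest
    integer from the adjacency matrix read row by row over all orderings)."""
    best = None
    for perm in permutations(nodes):
        value = 0
        for a in perm:
            for b in perm:
                value = (value << 1) | int((a, b) in links)
        if best is None or value < best:
            best = value
    return best

def count_3motifs(links):
    """Count induced, weakly connected 3-node subgraphs by label.

    `links` is a set of directed links (origin, destination) without
    self-loops. The search over all triples is cubic in the number of
    nodes, so this is suitable only for small networks.
    """
    nodes = sorted({n for link in links for n in link})
    counts = Counter()
    for triple in combinations(nodes, 3):
        adjacent_pairs = sum(
            ((a, b) in links) or ((b, a) in links)
            for a, b in combinations(triple, 2)
        )
        if adjacent_pairs >= 2:  # three nodes are then weakly connected
            counts[motif_id(triple, links)] += 1
    return counts

# Toy hub-and-spoke network: hub 'H' linked in both directions to 5 leaves.
star = {("H", f"A{i}") for i in range(5)} | {(f"A{i}", "H") for i in range(5)}
star_counts = count_3motifs(star)
cycle_counts = count_3motifs({("a", "b"), ("b", "c"), ("c", "a")})
```

In the toy star network every pair of leaves forms, together with the hub, one occurrence of the motif with two bidirectional links sharing a node, so only that motif is observed.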
2.4. Average Linkage Clustering Analysis
We assess the similarity between each pair of the n airlines by estimating the correlation between their profiles of average 4-motif occurrence. The average is computed over the 365 days of the year. To take into account the large interval of values observed for the different 4-motifs, we use the Spearman correlation coefficient. Therefore, by starting from the matrix of records obtained by averaging the occurrence of each 4-motif, we estimate a correlation matrix and we use the correlation as a measure of similarity between airlines i and j.
From the correlation values, we compute a distance according to the relation . This distance is therefore used to extract a hierarchical tree with the method of the average linkage.
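The choice of a rank correlation can be illustrated with a short, self-contained sketch. The implementation below computes the Spearman coefficient as the Pearson correlation of average ranks; the motif-occurrence profiles are invented numbers, and the function names are ours.

```python
from math import sqrt

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical average occurrences of four motifs for three airlines.
airline_a = [1, 10, 100, 1000]
airline_b = [2, 30, 500, 90000]   # same ordering, very different magnitudes
airline_c = [1000, 100, 10, 1]    # reversed ordering
rho_ab = spearman(airline_a, airline_b)
rho_ac = spearman(airline_a, airline_c)
```

Because only the ordering of the occurrences matters, two profiles that differ by orders of magnitude but rank the motifs identically are perfectly correlated.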
The average linkage cluster analysis is a hierarchical clustering procedure [
31,
32]. The procedure gives as an output a rooted tree or dendrogram. In this procedure, at each step, when two elements or one element and a cluster or two clusters
p and
q merge in a wider single cluster
t, the distance
between the new cluster
t and any cluster
r is recursively determined as the average distance between any element of
t and any other element of cluster
r.
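The recursive update described above can be sketched in a few lines. The implementation below is a naive illustration, not the code used in the study: it takes a symmetric distance matrix, repeatedly merges the two closest clusters, and sets the distance from the merged cluster to any other cluster to the size-weighted mean of the two previous distances, which equals the average over all pairs of elements. The labels and distances in the example are invented.

```python
def average_linkage(labels, dist):
    """Agglomerative clustering with the average-linkage update.

    `dist[i][j]` is the distance between items i and j. Returns the list
    of merges as (members_of_p, members_of_q, merge_distance).
    """
    clusters = {i: (labels[i],) for i in range(len(labels))}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        (p, q), height = min(d.items(), key=lambda item: item[1])
        merges.append((clusters[p], clusters[q], height))
        n_p, n_q = len(clusters[p]), len(clusters[q])
        t = next_id
        next_id += 1
        for r in clusters:
            if r in (p, q):
                continue
            d_pr = d[(min(p, r), max(p, r))]
            d_qr = d[(min(q, r), max(q, r))]
            # Size-weighted mean = average distance over all element pairs.
            d[(r, t)] = (n_p * d_pr + n_q * d_qr) / (n_p + n_q)
        # Drop distances that involve the two clusters just merged.
        d = {key: val for key, val in d.items() if p not in key and q not in key}
        clusters[t] = clusters.pop(p) + clusters.pop(q)
    return merges

# Toy example: four items with invented pairwise distances.
labels = ["A", "B", "C", "D"]
dist = [
    [0.0, 1.0, 5.0, 6.5],
    [1.0, 0.0, 4.0, 5.5],
    [5.0, 4.0, 0.0, 1.5],
    [6.5, 5.5, 1.5, 0.0],
]
merges = average_linkage(labels, dist)
```

The sequence of merge distances defines the heights of the branches in the resulting dendrogram.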
3. Results
3.1. Herfindal Index
Our first analysis determines the daily flight network of each investigated airline. The day is defined as the calendar day in Central European Time. For illustrative purposes, we show the networks of the nine biggest airlines on 1 September 2017 in
Figure 2. This day has been retrospectively selected as an example of a day with routinely operational activities.
For each flight network, we extract the degree sequence by considering the network as a directed network. The average values over the year of the number of nodes
N (i.e., the number of airports served by the airline), the number of directed links
E (i.e., the number of distinct origin–destination pairs connected by flights), minimum degree, median degree, mean degree, maximum degree, standard deviation of the degree and Herfindal index are shown in
Table 1. The metrics shown in
Table 1 are quite basic and standard with the exception of the adaptation of the Herfindal index as an indicator of concentration in the degree sequence observed in one or more of the nodes.
Given the definition of Equation (
1), a pure
hub-and-spoke setting of flights would imply a Herfindal index of 0.25 for large values of
N. This is what we observe (see
Table 1) as average yearly value for Brussels Airlines (BEL), Aeroflot (AFL), KLM, Iberia Airlines (IBE) and Finnair (FIN). Networks of these airlines are very close to a pure
hub-and-spoke setting. Other airlines show lower values of the average Herfindal index. The values observed range from 0.213 for Austrian Airlines to 0.016 for Ryanair, showing a high variability of the underlying flight network structure. For the sake of compactness, in
Table 1, we show only the yearly average values of the selected indicators. To assess the degree of variability of the Herfindal index, we show in
Figure 3 the daily profile of this index for the top ten airlines by number of flights: Ryanair (RYR), Lufthansa (DLH), Turkish Airlines (THY), EasyJet (EZY), Air France (AFR), Scandinavian Airlines (SAS), British Airways (BAW), KLM (KLM), Vueling Airlines (VLG) and Alitalia (AZA). Time variation of the Herfindal index is detectable for several airlines, but fluctuations are limited and primarily reflect a weekly or intra-weekly periodicity. Some airlines, for example THY, AZA and VLG, also show detectable winter–summer dynamics. The horizontal dashed lines are the expected values of the Herfindal index for networks with only bidirectional links and with
K pure hubs and all the remaining (large) number of leaves only flying to a single hub for
K ranging from one (top dashed line) to five (bottom dashed line). In particular, KLM networks are compatible with a network structure having a single hub (i.e., Schiphol airport), and Lufthansa (DLH) networks are compatible with a two-hub network (prominent Lufthansa hubs are Frankfurt and Munich airports). Vueling (VLG) and Scandinavian Airlines (SAS) have a pattern compatible with three or more hubs and/or with a prominent section of the flight network based on
point-to-point flight circulation, whereas the Herfindal indices of Ryanair and EasyJet have low values, indicating the limited relevance of the
hub-and-spoke structure in their flight networks.
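The reference levels for K hubs can be recovered with a short calculation. The sketch below assumes, as a simplification of ours that is not stated in the text, that the leaves are split evenly among the K hubs and that hubs are not linked to each other; $N_L$ denotes the number of leaves, $k$ the degree (incoming plus outgoing links) and $E$ the number of directed links.

```latex
% Each leaf has one bidirectional link to its hub, so E = 2 N_L and 2E = 4 N_L.
% Degrees: k_leaf = 2, k_hub = 2 N_L / K (assuming an even split of leaves).
\begin{align}
H_K &= K \left( \frac{2 N_L / K}{4 N_L} \right)^{2}
     + N_L \left( \frac{2}{4 N_L} \right)^{2} \\
    &= \frac{1}{4K} + \frac{1}{4 N_L}
    \;\xrightarrow[N_L \to \infty]{}\; \frac{1}{4K}.
\end{align}
% K = 1, ..., 5 gives 0.25, 0.125, 0.083, 0.0625 and 0.05.
```

For K = 1 this reduces to the value of 0.25 expected for a pure hub-and-spoke network with many leaves. An uneven split of leaves among hubs would give a value above $1/(4K)$.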
Our analysis can therefore confirm that flight network characteristics are deeply related to the business organization of each airline with a prominent role played by the choice of a hub-and-spoke versus a point-to-point structure and with a role played by the number of hubs characterizing the flight network.
In the next section, we investigate 3-motifs to better characterize similarity and differences among the flight networks of airlines.
3.2. 3-Motifs
We have computed the number of 3-motifs present on daily flight networks for all 50 airlines. In
Figure 4, we show a color code map of the occurrence of the 13 isomorphic 3-motifs for the 9 largest airlines.
The occurrence of each 3-motif presents large variability among the different types of motifs and is correlated with properties of the flight networks such as number of nodes, number of links, number of bidirectional links and topology of the network. The most common 3-motif is motif 78. This type of motif clearly indicates that a hub-like structure and bidirectional links are essential ingredients of all flight networks. The other 3-motif with only bidirectional links, i.e., 3-motif 238, is significantly present in airlines presenting flight networks with a pronounced point-to-point structure, such as Ryanair, EasyJet and Vueling, or in airlines having more than a single hub, such as Lufthansa, Turkish Airlines and Scandinavian Airlines. The 3-motifs with only unidirectional links are rarely observed (see average occurrence values of 3-motifs 6, 12, 36, 38 and 98). Some of the 3-motifs with mixed types of links are significantly present (for example, 3-motifs 14 and 74), while others are rather poorly expressed (as in the case of 3-motifs 102 and 108).
The profile of occurrence of the 3-motifs in different airlines is certainly informative. However, the number of 3-motifs is somewhat limited and therefore it is useful to consider motifs of larger size. In the next section, we investigate the occurrence of 4-motifs.
3.3. 4-Motifs
3.3.1. Daily Occurrence of 4-Motifs
We compute the occurrence of all 4-motifs for the daily flight networks of the 50 biggest airlines. In
Figure 5, we show a color code map of the occurrence of the 199 isomorphic 4-motifs for the 9 largest airlines.
The profile of 4-motifs is richer than the one of the 3-motifs. Occurrences of the 4-motifs span about 5 orders of magnitude. For this reason, in
Figure 5, we show the decimal logarithm to provide a comprehensive overview of the results. Airlines characterized by the presence of a single hub such as KLM present only a very limited number of 4-motifs with occurrence different from zero. Airlines with a business model closer to a
point-to-point structure such as Ryanair and EasyJet present a higher number of observed 4-motifs. The other airlines characterized by a different number of hubs present an intermediate behavior between the two extremes. In addition to the presence or absence of a given motif at a given day,
Figure 5 also shows a time variation of the occurrence of a given motif. To investigate the main frequencies associated with this time variation, we compute the periodogram of the occurrence of a set of 4-motifs. Specifically, we consider the twelve 4-motifs with the highest occurrence averaged over all considered days. In
Figure 6, we show the power spectrum of the time evolution of the occurrence of the top twelve 4-motifs of Ryanair. For all 4-motifs, frequency peaks are detected for
$f = 0.14$ day$^{-1}$ and for its second and third harmonics. The main frequency $f = 0.14$ day$^{-1}$ corresponds to a weekly cycle (period $1/f \approx 7$ days), and the second and third harmonics correspond to periods of half a week and one third of a week, respectively. Therefore, the main underlying periodicity is weekly, as already observed in the estimation of the Herfindal index (see periodicity observed in
Figure 3).
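The link between a weekly schedule and a spectral peak near 0.14 cycles per day can be illustrated with a synthetic series. The sketch below computes a periodogram with a direct discrete Fourier transform; the daily counts are invented, and the series length of 364 days is chosen so that the weekly frequency falls exactly on the frequency grid.

```python
import cmath
import math

def periodogram(series):
    """Return (frequencies, power) of a daily series via a direct DFT.

    Frequencies are in cycles per day. The mean is removed before the
    transform and only positive frequencies up to 0.5 are returned.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        freqs.append(k / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power

# Hypothetical motif occurrences: fewer on the last two days of each week.
weekly_pattern = [100, 100, 100, 100, 100, 60, 40]
series = weekly_pattern * 52  # 364 days
freqs, power = periodogram(series)
peak = freqs[max(range(len(power)), key=power.__getitem__)]
```

For this toy series the largest peak sits at one cycle per seven days, and smaller peaks appear at the harmonics of two and three cycles per week.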
Tracking in detail the occurrence of the 199 different 4-motifs is impractical. For this reason, we first consider the ten 4-motifs with the highest occurrence in the nine biggest airlines. Specifically, we rank these 4-motifs by their daily occurrence averaged over all days of the year. Labels of these motifs are listed in
Table 2 according to their rank for the 9 biggest airlines.
The link configuration of these 4-motifs is shown in
Figure 7. In the figure, we show on the left 4-motifs composed by unidirectional links, whereas on the right we have motifs with only bidirectional links. The 4-motifs with both unidirectional and bidirectional links are shown in the middle of the figure. It is worth noting that 4-motifs with only unidirectional links (i.e., 4-motifs 14, 28, 280 and 2184) are only observed for KLM in the top 10. KLM is one of the airlines with an almost pure
hub-and-spoke structure and the flight contributing to this type of motif is, in the majority of cases, an intercontinental flight. All the other airlines have top 10 4-motifs with a high number of bidirectional links. The 4-motif with the highest occurrence for all the top 9 airlines (and indeed the top 4-motif for 49 of 50 airlines) is 4-motif 4382. This motif presents three bidirectional links originating from the same node. As for 3-motif 78, the high occurrence of this motif reflects the fact that at least one important airport is used as a hub by the airline generating the network. The other 4-motifs composed of only bidirectional links (i.e., 4-motifs 4698, 4958, 13260 and 13278) are compatible with a
point-to-point structure or with a
hub-and-spoke structure in the presence of at least two hubs. In fact, these 4-motifs are not observed for KLM and are observed at the highest ranks for airlines more oriented toward a
point-to-point structure, such as Ryanair, EasyJet and Vueling. They are also present when more than one hub is present as, for example, in the case of Lufthansa or Scandinavian Airlines. The ranking of the 4-motifs can therefore be used to evaluate the similarity of airline flight networks, and we investigate this possibility in the next section.
3.3.2. Similarity of 4-Motif Profile
We use the information about the 4-motifs occurrence to obtain a categorization of airlines by using the methodology of
Section 2.4. It is worth recalling here that, given this specific purpose, it is not necessary for us to retain the information about the specific airports that are present in a motif. In fact, since we are interested in extracting a clustering of airlines by using the structural information about the 4-motifs, only the isomorphic motifs are relevant for us.
The result of our analysis is shown in
Figure 8.
The hierarchical tree of
Figure 8 is highly informative with respect to the clustering of groups of airlines. One airline markedly distinct from all others is NetJets Transportes Aéreos, S.A. (NJE). This is the only airline of the set providing jet rental, and its separation from all the others indicates that the flights of this rental company have a 4-motif profile quite distinct from that of every other airline. An inspection of the hierarchical tree indicates the presence of clusters of airlines that are similar to each other and dissimilar from the remaining airlines. Here, we comment on some of them. One cluster comprises KLM, Aeroflot (AFL) and Brussels Airlines (BEL). These three airlines have a single large hub, as shown by a Herfindal index very close to 0.25 (see
Table 1). Another cluster comprises Delta Air Lines (DAL), American Airlines (AAL) and United Airlines (UAL). These three airlines are American airlines primarily performing intercontinental flights. A large cluster is composed by Vueling Airlines (VLG), Volotea (VOE), Norwegian Air International (IBK), EasyJet (EZY), Ryanair (RYR), Eurowings (EWG), Wizz Air (WZZ), Norwegian Air Shuttle (NAX) and Scandinavian Airlines (SAS). These are all airlines with several hubs and/or with a
point-to-point business model. Another cluster comprises Royal Air Maroc (RAM), Pegasus (PGT) and Turkish Airlines (THY). These airlines are primarily serving Middle East destinations and airlines are headquartered in Middle East countries. Another distinct cluster comprises Qatar Airways Company Q.C.S.C. (QTR), Austrian Airlines (AUA), European Air Transport Leipzig (BCS) and Lufthansa (DLH). With the exception of Qatar Airways, the airlines of this cluster are all based in central Europe. In fact, Lufthansa and European Air Transport Leipzig are German airlines (Lufthansa is the second largest commercial airline in Europe and European Air Transport Leipzig is the largest cargo company in Europe by number of flights) and Austrian Airlines is a subsidiary of the Lufthansa Group.
3.4. Airline Networks Overlap
It is worth estimating whether similarity between 4-motif occurrences could just be due to overlap between the links of the flight network of airlines. We rule out, to a large extent, this possibility by investigating the degree of overlap between all pairs of airline networks. Our investigation is conducted by estimating the Jaccard measure $J(A,B)$ between each pair of airline networks $A$ and $B$. The Jaccard similarity is defined as
$$J(A,B) = \frac{|E_A \cap E_B|}{|E_A \cup E_B|},$$
where $|E_A \cap E_B|$ is the number of directed links appearing in both flight networks, and $|E_A \cup E_B|$ is the number of links that appear in at least one of the two networks.
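As a minimal illustration of this overlap measure, the sketch below computes the Jaccard similarity between two sets of directed links. The airport codes and the two toy networks are invented, and returning zero for two empty networks is a convention chosen here.

```python
def jaccard(links_a, links_b):
    """Jaccard similarity between two sets of directed links.

    Each link is an (origin, destination) pair, so a route flown in only
    one direction by one airline and in the opposite direction by the
    other does not count as shared.
    """
    union = links_a | links_b
    if not union:
        return 0.0  # convention of this sketch for two empty networks
    return len(links_a & links_b) / len(union)

# Toy weekly networks of two hypothetical airlines.
net_a = {("AMS", "CDG"), ("CDG", "AMS"), ("AMS", "FCO")}
net_b = {("AMS", "CDG"), ("CDG", "AMS"), ("CDG", "FCO"), ("FCO", "CDG")}
similarity = jaccard(net_a, net_b)           # 2 shared links out of 5
opposite = jaccard({("AMS", "FCO")}, {("FCO", "AMS")})
```

A similarity matrix built in this way for all pairs of airlines can then be passed to the same hierarchical clustering procedure used for the motif profiles.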
To take into account the weekly variability of flight schedules, we have performed this analysis by considering the weekly schedule of each airline. The results obtained at the daily level show a degree of similarity of the same order or less. In
Figure 9, we show the average linkage hierarchical tree obtained by using the Jaccard measure as a similarity measure. The hierarchical tree is poorly informative and only a very limited number of small clusters can be highlighted. This is in marked contrast with what we have obtained in the previous section when the similarity measure between airlines was obtained from the analysis of 4-motifs. The hierarchical tree shown in
Figure 9 is representative of hierarchical trees obtained for all weeks of 2017.
In summary, we are the first to use the Herfindal index to characterize each airline operating in a given period. Moreover, by using the Herfindal index together with 3- and 4-motif analysis, we are able to achieve an unsupervised classification of airlines, clarifying the main characteristics of each airline.
4. Discussion and Conclusions
In the present study, we have analyzed the structure and dynamics of flight networks of 50 airlines performing most of the flights that occurred in the European airspace in 2017. Our analysis of directed flight networks shows that the degree concentration of the different networks is quite heterogeneous among the different airlines. We have been able to quantify this heterogeneity by using an adapted version of a classic measure of concentration, i.e., the Herfindal index. The Herfindal index provides a simple and reliable estimation of the closeness of the airline network to the reference models classified as hub-and-spoke and point-to-point. It can also be informative about the number of main hubs that are present in a network with a hub-and-spoke structure and multiple hubs. It is worth noting that the European ATS presents a very heterogeneous set of airline companies. In other words, business optimization performed at the level of a single airline generates different business models that eventually coexist in the global system.
The time evolution of the different networks presents a basic time cycle that is a weekly cycle. This basic timescale is evident both from the analysis of the time evolution of the Herfindal index and from the analysis of the time evolution of the occurrence of 3-motifs and 4-motifs. The summer–winter cycle primarily detected in the number of flights occurring daily or weekly does not significantly affect the long-term time evolution of the Herfindal index and the occurrence of 3-motifs and 4-motifs. These indicators are therefore more related to the type of business model followed by the airline than to the specific origin–destination links or number of flights operated in a given time interval.
In summary, an unsupervised classification based on hierarchical clustering, obtained by using a correlation coefficient between the 4-motif occurrence profiles of airline networks as a similarity measure, is highly informative with respect to the properties of the different airlines (for example, the number of main hubs, their participation in intercontinental flights, their regional coverage, and their nature as commercial, cargo, leisure or rental airlines). The 4-motifs are therefore distinctive of the airlines and reflect information about their main determinants. This information is distinct from that originating from the overlap of directed links.
Such results indicate that a reliable and effective classification of airlines can be obtained by an unsupervised methodology that only takes into account the information about the airline flights. This is an important result given that the characterization of airlines and their business models has become a fundamental part of modern air transportation systems. An appropriate airline categorization is important not only for practitioners but also because it influences passengers’ perception. An advantage of our approach is its flexibility: it may directly reflect any repositioning of an airline within the general landscape of airlines due to a change in its business model, as reflected in its flight plans.