**Abbreviations**

The following abbreviations are used in this manuscript:


#### **Appendix A. Fitness and Complexity of Economic Sectors**

From the matrices containing the *RCA*<sub>*c*,*p*</sub> time series described in Section 3.3, we can derive the matrix **M**<sup>*y*</sup>, whose entries are given by

$$M\_{c,p}^y = \begin{cases} 1 & \text{if } RCA\_{c,p}^y \ge 1, \\ 0 & \text{otherwise} \end{cases} \tag{A1}$$

where *c* represents a country, *p* represents a product (or service), and *y* represents a given year.
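
As a minimal sketch of Equation (A1), the thresholding can be written in a few lines of Python. The function name `binarise_rca` and the RCA values below are hypothetical and serve only as an illustration; they are not taken from the analysed dataset.

```python
def binarise_rca(rca, threshold=1.0):
    """Return M with M[c][p] = 1 where RCA[c][p] >= threshold, else 0 (Eq. A1)."""
    return [[1 if value >= threshold else 0 for value in row] for row in rca]


# Hypothetical RCA values for one year: 3 countries (rows) x 4 sectors (columns).
rca_example = [
    [1.8, 1.2, 1.0, 2.4],
    [0.3, 1.5, 0.9, 1.1],
    [0.0, 0.2, 0.7, 3.0],
]

m_example = binarise_rca(rca_example)
for row in m_example:
    print(row)
```

Note that an RCA value of exactly 1 maps to an entry of one, following the non-strict inequality in Equation (A1).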

This matrix therefore summarises which countries have a comparative advantage in exporting each product or service in a given year. Two key quantities from the economic complexity literature are defined using this matrix, namely the *fitness* of countries and the *complexity* of products (or services) [17,55]. The intuition behind these quantities is that the higher the fitness of a country, the greater its capability of exporting products of high complexity. It is therefore natural for the fitness of a country to be proportional to the complexity-weighted sum of the products of which it is a competitive exporter. The definition of the complexity of a product is more subtle. In general terms, the complexity of a product should be inversely proportional to the number of countries exporting it. We should also note that more economically developed countries tend to have a highly diversified export basket, while less economically developed countries tend to have much less diversified exports, focused on low-complexity products. Therefore, the upper bound of a product's complexity should be determined by the fitness of the countries exporting it, with a strong bias towards lower-fitness countries: if a product is exported by lower-fitness countries, its complexity cannot be high. The fitness *F*<sub>*c*</sub> of a country and the complexity *Q*<sub>*p*</sub> of a product (or service) are therefore defined using the following set of coupled iterative equations

$$\begin{cases} \tilde{F}\_{c}^{(n)} = \sum\_{p} M\_{c,p}\, Q\_{p}^{(n-1)} \\ \tilde{Q}\_{p}^{(n)} = \dfrac{1}{\sum\_{c} M\_{c,p} \dfrac{1}{F\_{c}^{(n-1)}}} \end{cases} \rightarrow \begin{cases} F\_{c}^{(n)} = \dfrac{\tilde{F}\_{c}^{(n)}}{\left< \tilde{F}\_{c}^{(n)} \right>\_{c}} \\ Q\_{p}^{(n)} = \dfrac{\tilde{Q}\_{p}^{(n)}}{\left< \tilde{Q}\_{p}^{(n)} \right>\_{p}} \end{cases} \tag{A2}$$

which are iterated until a fixed point is reached [56]. This fixed point has been shown to be stable and independent of the initial conditions, which are set to *Q̃*<sub>*p*</sub><sup>(0)</sup> = 1 ∀*p* and *F̃*<sub>*c*</sub><sup>(0)</sup> = 1 ∀*c* [17]. We use the complexity of products and services in our dataset to calculate an assortativity metric on the network *G* as described in Section 4.2.
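
The iteration in Equation (A2) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used for the results in this manuscript: the function name `fitness_complexity`, the iteration cap `n_iter`, the stopping tolerance `tol`, and the small example matrix are all hypothetical. The sketch also assumes that every country exports at least one sector and every sector has at least one exporter, so that no denominator vanishes.

```python
def fitness_complexity(m, n_iter=100, tol=1e-10):
    """Iterate the coupled fitness/complexity map of Eq. (A2) on a binary matrix m[c][p]."""
    n_c, n_p = len(m), len(m[0])
    # Assumption: no empty rows or columns, otherwise the update is undefined.
    if any(sum(row) == 0 for row in m) or any(sum(col) == 0 for col in zip(*m)):
        raise ValueError("m must have no empty rows or columns")

    fitness = [1.0] * n_c      # initial condition: all ones
    complexity = [1.0] * n_p   # initial condition: all ones

    for _ in range(n_iter):
        # Intermediate (un-normalised) values from the previous iterate.
        f_tilde = [sum(m[c][p] * complexity[p] for p in range(n_p))
                   for c in range(n_c)]
        q_tilde = [1.0 / sum(m[c][p] / fitness[c] for c in range(n_c))
                   for p in range(n_p)]

        # Normalise by the mean over countries and over sectors, respectively.
        mean_f = sum(f_tilde) / n_c
        mean_q = sum(q_tilde) / n_p
        new_f = [f / mean_f for f in f_tilde]
        new_q = [q / mean_q for q in q_tilde]

        # Largest absolute change between successive iterates (hypothetical stopping rule).
        change = max(abs(a - b) for a, b in zip(new_f + new_q, fitness + complexity))
        fitness, complexity = new_f, new_q
        if change < tol:
            break
    return fitness, complexity


# Hypothetical binary matrix: 3 countries (rows) x 4 sectors (columns).
m_example = [
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
fitness, complexity = fitness_complexity(m_example)
print("fitness:   ", fitness)
print("complexity:", complexity)
```

In this toy example the most diversified country receives the highest fitness, and the sector exported by every country, including the least fit one, receives the lowest complexity, in line with the intuition described above.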

It is worth noting that the dataset analysed here, like similar datasets explored in the economic complexity literature, exhibits a nested structure [56]. This nested structure appears as a triangular structure in the **M**<sup>*y*</sup> matrices when countries (rows) and sectors (columns) are sorted by their fitness and complexity rank, respectively. This can be seen in Figure A1, which shows the **M**<sup>*y*</sup> matrix for the year *y* = 2005.
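
The reordering that reveals this triangular structure can be sketched as below. The sort directions (countries by decreasing fitness, sectors by increasing complexity) are an assumption made for illustration, as are the function name `sort_by_rank` and the example scores; the orientation used in Figure A1 may differ.

```python
def sort_by_rank(m, fitness, complexity):
    """Reorder rows by decreasing fitness and columns by increasing complexity."""
    row_order = sorted(range(len(m)), key=lambda c: fitness[c], reverse=True)
    col_order = sorted(range(len(m[0])), key=lambda p: complexity[p])
    return [[m[c][p] for p in col_order] for c in row_order]


# Hypothetical unsorted binary matrix with made-up fitness and complexity scores.
m_unsorted = [
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]
fitness_scores = [0.8, 2.0, 0.2]
complexity_scores = [1.6, 0.6, 1.5, 0.3]

m_sorted = sort_by_rank(m_unsorted, fitness_scores, complexity_scores)
for row in m_sorted:
    print(row)
```

After sorting, the ones are concentrated in one corner of the matrix, which is the triangular pattern associated with nestedness.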

**Figure A1.** Binary matrix **M**<sup>2005</sup> displaying high *RCA*<sub>*c*,*p*</sub> values for the year 2005. Blue indicates an entry of one and yellow an entry of zero. The triangular structure of the matrix implies a nestedness in the data.

#### **Appendix B. Confidence and Prediction Interval Calculations**

The 95% confidence interval around a linear fit *μ̂*<sub>*y*|*x*<sub>0</sub></sub> performed on *n* data points (*x*<sub>*i*</sub>, *y*<sub>*i*</sub>), *i* = 1, ..., *n*, contains the mean response *μ*<sub>*y*|*x*<sub>0</sub></sub> at a given value *x*<sub>0</sub> with 95% probability. This is given by

$$\left| \hat{\mu}\_{y|x\_0} - \mu\_{y|x\_0} \right| \le T\_{n-2}^{.975}\, \hat{\sigma} \sqrt{\frac{1}{n} + \frac{\left(x\_0 - \bar{x}\right)^2}{\sum\_{i=1}^n \left(x\_i - \bar{x}\right)^2}} \tag{A3}$$

where *μ̂*<sub>*y*|*x*<sub>0</sub></sub> = *a* + *bx*<sub>0</sub> is computed from the linear fit, *x̄* is the mean of the *x*<sub>*i*</sub>, *T*<sub>*n*−2</sub><sup>.975</sup> is the 97.5th percentile of the Student's t-distribution with *n* − 2 degrees of freedom, and *σ̂* is the standard deviation of the residuals of the linear fit, given by

$$\hat{\sigma} = \sqrt{\sum\_{i=1}^{n} \frac{\left(y\_i - \hat{y}\_i\right)^2}{n-2}},\tag{A4}$$

where *ŷ*<sub>*i*</sub> = *a* + *bx*<sub>*i*</sub> is the fitted value at *x*<sub>*i*</sub>.

The 95% prediction interval around a linear fit *ŷ*<sub>0</sub> is the interval within which a new observation *y*<sub>0</sub> at a given value *x*<sub>0</sub> is found with 95% probability. This is given by

$$\left| \hat{y}\_0 - y\_0 \right| \le T\_{n-2}^{.975}\, \hat{\sigma} \sqrt{1 + \frac{1}{n} + \frac{\left(x\_0 - \bar{x}\right)^2}{\sum\_{i=1}^n \left(x\_i - \bar{x}\right)^2}}\tag{A5}$$

where *ŷ*<sub>0</sub> = *a* + *bx*<sub>0</sub> is computed from the linear fit. See [57] for a more detailed description.
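
The half-widths in Equations (A3) and (A5) can be computed with a short standard-library sketch. The function name `linear_fit_intervals` and the data points are hypothetical, and the fit is assumed to be an ordinary least-squares line. The critical value `t_crit` is passed in by the caller; the value 3.182 used below is the tabulated 97.5th percentile of the t-distribution with 3 degrees of freedom (it could equally be obtained from a statistics library).

```python
import math


def linear_fit_intervals(x, y, x0, t_crit):
    """Return (fitted value at x0, confidence half-width, prediction half-width).

    t_crit is the 97.5th percentile of Student's t with len(x) - 2 degrees of
    freedom, supplied by the caller (e.g. from a table).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)

    # Ordinary least-squares slope b and intercept a (assumed fitting method).
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar

    # Standard deviation of the residuals with n - 2 degrees of freedom (Eq. A4).
    sigma_hat = math.sqrt(
        sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    )

    spread = 1.0 / n + (x0 - x_bar) ** 2 / s_xx
    ci_half = t_crit * sigma_hat * math.sqrt(spread)        # Eq. (A3)
    pi_half = t_crit * sigma_hat * math.sqrt(1.0 + spread)  # Eq. (A5)
    return a + b * x0, ci_half, pi_half


# Hypothetical data: n = 5 points, hence n - 2 = 3 degrees of freedom.
x_data = [1.0, 2.0, 3.0, 4.0, 5.0]
y_data = [2.1, 3.9, 6.2, 7.8, 10.1]
y0_hat, ci_half, pi_half = linear_fit_intervals(x_data, y_data, x0=3.0, t_crit=3.182)
print(f"fit at x0: {y0_hat:.3f}, CI half-width: {ci_half:.3f}, PI half-width: {pi_half:.3f}")
```

The prediction interval is always wider than the confidence interval because of the additional unit term under the square root, which accounts for the scatter of a single new observation around the mean response.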

**Figure A2.** Average correlation matrix **K̄** sorted by communities *ν* found by maximising modularity.

**Figure A3.** Matrix showing average influence values between products *d*(*p* : *p*) sorted by communities *ν* found by maximising modularity. Entries in white indicate that the average influence of a sector on itself is undefined.

#### **Appendix D. Sector List**

**Table A1.** List of product (HS2007) and service (IMF BP6) sector codes in the analysed dataset.

