*Article* **Søren Johansen and Katarina Juselius: A Bibliometric Analysis of Citations through Multivariate Bass Models**

**Fragiskos Archontakis <sup>1,\*</sup> and Rocco Mosconi <sup>2</sup>**


**Abstract:** We showcase the impact of Katarina Juselius (KJ) and Søren Johansen's (SJ) contribution to econometrics using bibliometric data on citations from 1989 to 2017, extracted from the Web of Science (WoS) database. Our purpose is to analyze the impact of KJ and SJ's ideas on applied and methodological research in econometrics. To this end, starting from WoS data, we derived two composite indices that disentangle the authors' impact on applied research from their impact on methodological research. As of 2017, the number of applied citing papers per quarter had not yet reached its peak; conversely, the peak in the methodological literature seems to have been reached around 2000, although the trajectory is very flat after the peak. We analyzed the data using a multivariate dynamic version of the well-known Bass model. Our estimates suggest that the methodological literature is mainly driven by "innovators", whereas "imitators" are relatively more important in the applied literature: this might explain the different location of the peaks. We also find that, in the literature referring to KJ and SJ, the "cross-fertilization" between methodological and applied research is statistically significant and bi-directional.

**Keywords:** Bass diffusion model; bibliometrics; cointegration

#### **1. Introduction**

Using bibliometric methods to assess the quantity and quality of knowledge produced by researchers is increasingly standard practice in most disciplines (Garfield et al. 1978; Redner 1998). In the field of economics, Kalaitzidakis et al. (1999) provided a ranking of European departments based on ten top journals, which was later updated and expanded to include, among others, a ranking of academic journals in economics (Kalaitzidakis et al. 2003). At the same time, Coupé (2003) published a paper including rankings for researchers based on publications and citations; there, he explicitly mentions the highly cited work by Søren Johansen and Katarina Juselius on cointegration, stating that "first in the citation ranking is Søren Johansen. Thanks to his top cited papers on cointegration written at the beginning of the 1990s, he is first on the three different citation rankings" (Coupé 2003, p. 1336).

The aim of this paper is to showcase, through a bibliometric analysis, the impact of Katarina Juselius (KJ) and Søren Johansen's (SJ) contribution to the field of econometrics. An important distinctive trait of their scientific production is that it combines methodological and applied research, placing their work in the so-called "Pasteur's Quadrant" (Stokes 1997), characterized by use-inspired basic research, where applied objectives are pursued in parallel with fundamental scientific creativity. This motivates our main research question: what is the influence of KJ and SJ's work on applied and methodological research in econometrics? We believe that, from this analysis, we can learn something about the mechanisms of scientific discovery in general. Although the methodology used in this paper is different, our analysis has some resemblance to the work by Stigler (1994), who analyzed citation data in the journals of statistics and probability, investigating the mechanisms of knowledge diffusion within and across fields. Among other findings, he observed that "there is a tendency for influence to flow from theory to applications to a much greater extent than in the reverse direction" (see Stigler 1994, p. 94). As we will show in this paper, this is, to some extent, confirmed also in the abundant literature inspired by KJ and SJ, although the flow running from applied econometrics toward econometric methodology is also clearly visible in this case. We think that this depends on the peculiar approach to empirical research inspired by KJ and SJ's work, "in which data would be allowed to speak freely without being silenced by prior restriction and in which basic hypotheses could be adequately tested and empirically relevant structures estimated" (the quote is taken from Juselius (2021, p. 6), in the same Special Issue of *Econometrics* hosting this paper). This approach requires a continuous dialogue with methodologists, posing to them challenging requests for appropriate statistical models and suitable probability results allowing for correct inference within such models.

**Citation:** Archontakis, Fragiskos, and Rocco Mosconi. 2021. Søren Johansen and Katarina Juselius: A Bibliometric Analysis of Citations through Multivariate Bass Models. *Econometrics* 9: 30. https://doi.org/10.3390/econometrics9030030

Academic Editor: In Choi

Received: 22 April 2021 Accepted: 10 August 2021 Published: 12 August 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Our empirical investigation is based on citation data collected through the Web of Science (WoS) database, from which we derived two new composite indices that disentangle the citations originating in applied econometric research from those coming from methodological research. Our analysis reveals that the majority of citations (about 85%) arise from applied research. Of course, to put this figure into perspective, one should compare it with the share of methodological research in econometrics in general: unfortunately, we do not have this information (our impression is that the share is somewhat lower than 15%). Interestingly, the dynamic pattern of the two indices is quite different: the citation peak in the applied literature does not seem to have been reached yet, whereas the peak in the methodological literature seems to have occurred around the turn of the century. To analyze these bibliometric data, we resorted to a multivariate dynamic version of the well-known Bass (1969) model, proposed by Boswijk et al. (2009) building on Franses (2003), Boswijk and Franses (2005) and Fok and Franses (2007). Bibliometric evidence suggests that Bass-type models provide a useful way to fit the citation trajectories of most winners of the Nobel Prize in Economics; see Bjork et al. (2014). This fact might indicate that, up to a point, economic knowledge follows the well-known product life cycle, which is usually characterized by the following phases: introduction, growth, maturity (including peak) and decline, within the context of a scholar's professional lifetime and beyond. An interesting aspect of Bass models is that they describe the diffusion pattern as dependent on two key parameters, *p* and *q*, measuring the relevance of innovation and imitation, respectively: these two parameters are shown in Min et al. (2018) to have an important role in the growth and decay of citation counts in several scientific disciplines. In this paper, we will show that, according to our estimates, the relative importance of imitative and innovative mechanisms is quite different for methodological and applied econometric research: this seems to be responsible for the different trajectories of the two research strands.

The paper is organized as follows: Section 2 describes the data collection and management process to support the analysis. Section 3 presents the univariate and multivariate Bass model, and Section 4 illustrates our empirical findings. Finally, Section 5 concludes and provides directions for further research.

A word on the notation used in the paper. The backshift operator $L$ is defined as $LX\_t = X\_{t-1}$, where $X\_t$ is a time series; the difference operator is defined as $\Delta = 1 - L$, so that $\Delta X\_t = X\_t - X\_{t-1}$. $I\_n$ is the $n \times n$ identity matrix, $u\_{n,i}$ is the $i$-th column of $I\_n$, $\mathbf{1}\_n = \sum\_{i=1}^{n} u\_{n,i}$, $\mathbf{0}\_{m,n}$ is an $m \times n$ matrix of zeros, and $diag\{a\_i\}$ is the block diagonal matrix whose generic diagonal block is the matrix $a\_i$ (of course, any of the $a\_i$'s could also be a scalar).

#### **2. The Data**

This section describes the line of thought and the data collection process, including the source and sample size, while providing some preliminary analysis through stylized facts.

For the purposes of this study, we consider the scientific production on cointegration by KJ and SJ as an indissoluble whole, where economic questions motivate the development of econometric theory and the development of econometric theory sharpens the economic questions. Their papers on cointegration are, therefore, analyzed together, whether single-authored or coauthored and whether the main focus is on methodology (with just an illustrative example) or application (with a pedagogical effort to illustrate how the methodology can be applied to a real problem).

On 9 April 2018, we collected from the Web of Science (WoS) the data on the citations received by KJ and SJ from papers published between 1989:Q1 and 2017:Q3.<sup>1</sup> For practical reasons, we limit the analysis to their 10 most cited papers, which are presented in Table 1, sorted by publication date.


**Table 1.** The 10 "Top Cited" papers by S. Johansen and K. Juselius, in chronological order (data collected on 9 April 2018).

The total number of citations received by the top ten papers amounted at that time to 10,453, whereas the number of citing papers was 6457,<sup>2</sup> so that every citing paper cites, on average, 1.62 of the ten papers, with a maximum of 7 observed four times.<sup>3</sup> In terms of the number of citations, three papers are well ahead of the rest: Johansen (1988) in the *Journal of Economic Dynamics & Control* with 4008 citations, the joint paper by Johansen and Juselius (1990) in the *Oxford Bulletin of Economics & Statistics* with 2567 citations, and Johansen (1991) in *Econometrica* with 2256 citations. For each paper, the last column in Table 1, "new citations", indicates the number of citing papers referring to that paper and to none of the earlier ones: for example, paper number 7 is cited by 251 papers, but only 69 of them cite paper number 7 and none of the earlier ones. A high "new citations"/citations ratio suggests that the paper has broken new ground in the field: for example, the paper by Hendry and Juselius (2001) treats data from the field of energy, and as a result, energy-related papers often cite Hendry and Juselius (2001), rather than the earlier papers. Notice that the first three papers account for 84.5% of the citations and 93.9% of the citing papers.

To avoid double counting, we focus on the number of citing papers rather than on the number of citations,<sup>4</sup> and we denote by $c\_t$ the number of citing papers published in quarter $t$ ($t$ ranges from 1989:Q1 to 2017:Q3, i.e., 115 quarters). Based on the WoS data, we split $c\_t$ into two composite indices aimed at measuring the impact of KJ and SJ's ideas on applied and methodological econometric research, respectively.<sup>5</sup> To this end, we have analyzed each of the 6457 citing papers, classifying them according to their methodological or applied nature. The classification is essentially based on the title of the citing paper.<sup>6</sup> We adopted the following classification:

• Purely applied (PA) papers: the title refers to an application, with no reference to an econometric method, technique or issue. We have found *nPA* = 4198 such papers.


We have, therefore, derived four quarterly time series, labeled $c\_{PA,t}$, $c\_{MA,t}$, $c\_{PM,t}$ and $c\_{MM,t}$, counting the citing papers of each group in each quarter; of course, $c\_t = c\_{PA,t} + c\_{MA,t} + c\_{PM,t} + c\_{MM,t}$. The four time series are reported in Figure 1, where one can observe that the behavior of $c\_{PA,t}$ and $c\_{MA,t}$ is quite similar: both increase steadily over time, with some low-frequency fluctuations that seem to be shared by both series (the correlation is 69.5%). Conversely, $c\_{PM,t}$ has a peak around the year 2000 with about 10 papers per quarter, and then it declines until 2005, seeming to stabilize at around 5 papers per quarter. The series $c\_{MM,t}$ is irregular, due to the small number of MM papers, but resembles $c\_{PM,t}$ to some extent, as it shows a higher frequency around the year 2000; then, the frequency seems to decline slightly.

**Figure 1.** Time series plot of *cPA*,*t*, *cMA*,*t*, *cPM*,*<sup>t</sup>* and *cMM*,*t*, quarterly data from 1989:1 to 2017:3.

Finally, by combining the four series with suitable weights, we obtained two composite indicators, whose purpose is to measure the impact of KJ and SJ ideas on applied (*c*1,*t*) and methodological (*c*2,*t*) research:

$$
c\_{1,t} = c\_{PA,t} + \omega c\_{MA,t} + (1 - \omega)c\_{MM,t} \tag{1}
$$

$$
c\_{2,t} = c\_{PM,t} + \omega c\_{MM,t} + (1-\omega)c\_{MA,t} \tag{2}
$$

Of course, $c\_{1,t} + c\_{2,t} = c\_t$ for any $\omega$ by construction. Composite indicators have several pros and cons, as illustrated for example in Nardo et al. (2008) and Kuc-Czarnecka et al. (2020): they allow one to summarize complex, multi-dimensional realities by reducing the dimensionality. On the other hand, they might simplify too much, and, even more importantly, the selection of indicators and weights could be the subject of dispute. It is, therefore, important to motivate clearly one's weighting choice and to provide an extensive sensitivity analysis. We provide a thorough discussion of both aspects in Appendix B. In short, the baseline results presented in this paper are based on $\omega = 0.85$. This choice is motivated by two main reasons: (i) $\omega$ should be in the range from 0.5 to 1, extremes excluded, since the papers classified as MA (or MM) should contribute mainly ($\omega > 1/2$) to the applied (or methodological) index but also, to a lesser extent ($1 - \omega > 0$), to the methodological (or applied) index; (ii) $\omega = 0.85$ is approximately equal to $n\_{PA}/(n\_{PA} + n\_{PM}) = 0.8545$; in practice, this corresponds to the assumption that the share of "applied research" of an MA paper is similar, on average, to the share of applied research in the econometric literature referring to KJ and SJ papers in general. Notice, however, that, as illustrated in Appendix B, the main results of our econometric analysis are robust to the choice of $\omega$ in the range from 0.5 to 1.

In order to fix ideas, we provide a short example based on the first few citations in the WoS data. For the year 1989, we have to split the only two existing citations by Baillie and Bollerslev (classified as MA) and Gilbert (classified as PM).<sup>7</sup> Thus, for this example we have obtained the series illustrated in Table 2.


**Table 2.** Illustration of the classification scheme: citations in 1989.

The cumulative applied index at time *T* = 115, i.e., 2017:Q3, is equal to the following:

$$\begin{array}{rcl} C\_{1,T} &=& \sum\_{t=1}^{T} (c\_{PA,t} + \omega c\_{MA,t} + (1 - \omega)c\_{MM,t}) \\ &=& 4198 + 0.85 \times 1451 + 0.15 \times 92 = 5445.2, \end{array}$$

while the cumulative methodological index will be equal to the following:

$$\begin{array}{rcl} C\_{2,T} &=& \sum\_{t=1}^{T} (c\_{PM,t} + \omega c\_{MM,t} + (1 - \omega) c\_{MA,t})\\ &=& 716 + 0.85 \times 92 + 0.15 \times 1451 = 1011.8. \end{array}$$
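As a quick arithmetic check, the two cumulative totals can be recomputed from the class counts quoted in the text. The sketch below is only illustrative: the function name `composite_totals` is a hypothetical helper, and the counts for the MA, PM and MM classes (1451, 716 and 92) are read off the two formulas above rather than taken from the raw WoS records.

```python
# Counts of citing papers by class, as they appear in the formulas above.
N_PA, N_MA, N_PM, N_MM = 4198, 1451, 716, 92


def composite_totals(omega):
    """Return the cumulative applied and methodological indices (C1, C2)."""
    c1 = N_PA + omega * N_MA + (1.0 - omega) * N_MM
    c2 = N_PM + omega * N_MM + (1.0 - omega) * N_MA
    return c1, c2


if __name__ == "__main__":
    # 0.85 is the baseline weight; the other two values are arbitrary alternatives.
    for omega in (0.60, 0.85, 0.95):
        c1, c2 = composite_totals(omega)
        total = c1 + c2  # the weights omega and 1 - omega cancel in the sum
        print(f"omega={omega:.2f}  C1={c1:.2f}  C2={c2:.2f}  total={total:.2f}")
```

Because each MA and MM paper is split between the two indices with weights $\omega$ and $1 - \omega$, the two totals add up to the number of citing papers (6457) whatever weight is chosen; only the split between them moves with $\omega$.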

This shows that the majority of citations originate from applied research: defining $C\_t = C\_{1,t} + C\_{2,t}$, the ratio $C\_{1,T}/C\_T$ is 84.4%, whereas $C\_{2,T}/C\_T$ is 15.6%. To check the appropriateness of our classification scheme, we analyzed how these ratios vary by publishing journal. Tracking down the 6457 citing papers, we obtained from the WoS database that they appeared in 696 distinct journals. Table 3 provides the ranked list of the top 20 journals by the number of citing papers: these journals hosted 2676 citing papers, i.e., 41.4%.

The evidence in Table 3 seems to confirm the validity of our classification: the average contribution to $c\_{1,t}$ of the papers that appeared in mainly applied journals (for example, *Energy Policy*, *Journal of International Money and Finance*, *Journal of Policy Modelling*) is above 90%. At the other extreme, the average contribution to $c\_{2,t}$ is above 90% for the *Journal of Econometrics* and for *Econometric Theory* (but also for *Econometrica*, which hosted 24 citing papers). Other journals, such as the *Oxford Bulletin of Economics and Statistics*, the *Journal of Applied Econometrics* and the *Journal of Forecasting*, are more balanced, with an average contribution to $c\_{2,t}$ of around 50%. We believe that the evidence in Table 3 supports the idea that classifying based on the title and the abstract is more accurate than classifying based on the publishing journal.

The time series $c\_{1,t}$ and $c\_{2,t}$ are illustrated in Figure 2. The plot shows some evidence of a "second wind", especially in the applied index $c\_{1,t}$ but to some extent also in the methodological index $c\_{2,t}$: both series seem to have a peak around 1998, after which they start decreasing very slowly, but around 2004 the citations start increasing again, especially for the applied research index, whereas the references found in methodological papers remain rather steady. One conjecture is that the second wind was triggered by the 2003 Nobel Prize in Economics, which popularized the concept of cointegration in a wider variety of scientific disciplines. The trajectory of $c\_{1,t}$ resembles the case of Friedrich Hayek, referred to in Bjork et al. (2014) as "bi-modal", whereas the trajectory of $c\_{2,t}$ resembles more closely the cases of Kenneth Arrow and Milton Friedman, called "staying power" in Bjork et al. (2014). Boswijk et al. (2010) make a similar point with different wording: they report evidence of a "second life" for the famous Engle and Granger (1987) paper in *Econometrica* after the authors were awarded the Nobel prize in 2003, an event which is likely to have revived interest in the work of KJ and SJ as well.


**Table 3.** The top 20 journals supplying citations to S. Johansen and K. Juselius' works.

**Figure 2.** The composite citation indices *c*1,*<sup>t</sup>* (thick red, left scale) and *c*2,*<sup>t</sup>* (thin blue, right scale), quarterly data from 1989:1 to 2017:3.

#### **3. The Bass Diffusion Model**

The Bass diffusion model (Bass 1969) is widely used in many fields. Originally developed for marketing applications, the model has since been adopted also in other fields, such as the analysis of the diffusion of technological innovation (see Guseo and Guidolin 2008), bibliometric analysis (see Bjork et al. 2014) and epidemiology (see Eryarsoy et al. 2021).

The continuous time Bass model assumes a population of $m$ potential adopters. Let $t > 0$ denote the time of adoption of a randomly picked potential adopter: $t$ is therefore a random variable. Denote by $f(t)$ the corresponding density, and by $F(t) = \int\_0^t f(u)\,du$ the cumulative distribution function, i.e., the probability that adoption occurs before $t$. Notice that the expected number of adopters at time $t$ is given by the following:

$$
\bar{C}(t) = mF(t) \tag{3}
$$

and the corresponding "adoption intensity" is given by the following:

$$
\bar{c}(t) = mf(t). \tag{4}
$$

Bass assumes that the hazard rate $f(t)/(1 - F(t))$ is a linear function of the expected number of previous adopters:

$$\frac{f(t)}{1 - F(t)} = p + qF(t),\tag{5}$$

where *q* is defined as the "imitation parameter" (or internal influence, or word-of-mouth effect) since it represents the idea that some potential adopters (imitators) tend not to adopt initially, but are more likely to adopt when the innovation is widespread. Conversely, *p* is defined as the "innovation parameter" (or external influence or advertising effect) since it represents the idea that some potential adopters (innovators) decide to adopt the innovation regardless of the level of diffusion. It is interesting to observe that when *q* = 0, Equation (5) implies a constant hazard, and therefore the Bass model collapses into the exponential distribution. In other words, in the absence of imitators, the adoption peak, as in the exponential distribution, would occur at the beginning of the process.<sup>8</sup>

Using (3) and (4), the differential Equation (5) can be rewritten as follows:

$$
\bar{c}(t) = mp + (q - p)\bar{C}(t) - \frac{q}{m}\bar{C}(t)^2. \tag{6}
$$

The solution to (6) with $\bar{C}(0) = 0$ is the following:

$$\bar{C}(t) = m \frac{1 - e^{-(p+q)t}}{1 + \frac{q}{p}e^{-(p+q)t}}, \tag{7}$$

so that

$$\bar{c}(t) = \frac{\partial \bar{C}(t)}{\partial t} = m \frac{p(p+q)^2 e^{-(p+q)t}}{\left(p + q e^{-(p+q)t}\right)^2}.\tag{8}$$

Starting from the latter equation, one can easily find the timing of the adoption peak $t^P$ (i.e., the inflection point of the diffusion curve), the corresponding peak $\bar{c}^P$, and the level of adoption at the peak $\bar{C}^P$:

$$t^P = \frac{1}{p+q} \ln \left(\frac{q}{p}\right), \tag{9}$$

$$
\bar{c}^P = \bar{c}\left( t^P \right) = m \frac{\left( p + q \right)^2}{4q},
\tag{10}
$$

$$\bar{C}^{P} = \bar{C}\left(t^P\right) = m \frac{q-p}{2q}. \tag{11}$$

Formula (9) shows that the location of the peak depends on the innovation parameter $p$ and the imitation parameter $q$ through the sum $(p + q)$ and the ratio $q/p$: as is clear from Formula (5), when either innovators or imitators or both are very active, so that the sum $(p + q)$ is large, the hazard is large, which leads to a rapid exhaustion of the population at risk and therefore to an early peak. For a given sum, a larger ratio $q/p$ delays the peak, and when $q \le p$ Formula (9) gives $t^P \le 0$, so that the adoption intensity declines from the start.
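The closed forms (7) to (11) are easy to check numerically. The following sketch evaluates the cumulative curve and the adoption intensity and compares them with the peak formulas; the function names and the parameter values ($m = 1000$, $p = 0.01$, $q = 0.10$) are hypothetical choices made for illustration, not estimates from this paper.

```python
import math


def bass_cumulative(t, m, p, q):
    """Expected cumulative adoptions, Equation (7)."""
    e = math.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)


def bass_intensity(t, m, p, q):
    """Expected adoption intensity, Equation (8)."""
    e = math.exp(-(p + q) * t)
    return m * p * (p + q) ** 2 * e / (p + q * e) ** 2


def bass_peak(m, p, q):
    """Peak time, peak intensity and adoption level at the peak, Equations (9)-(11)."""
    t_peak = math.log(q / p) / (p + q)
    c_peak = m * (p + q) ** 2 / (4.0 * q)
    C_peak = m * (q - p) / (2.0 * q)
    return t_peak, c_peak, C_peak


if __name__ == "__main__":
    m, p, q = 1000.0, 0.01, 0.10  # hypothetical values, for illustration only
    t_peak, c_peak, C_peak = bass_peak(m, p, q)
    print("closed forms :", t_peak, c_peak, C_peak)
    # Direct evaluation of (8) and (7) at t_peak should agree with (10) and (11).
    print("direct eval. :", bass_intensity(t_peak, m, p, q), bass_cumulative(t_peak, m, p, q))
```

Raising $p$ and $q$ together moves `t_peak` toward zero, which is the "early peak" effect described above, while raising $q$ relative to $p$ pushes it later.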

#### *3.1. Bass Discrete Time Model*

A number of estimation procedures have been proposed to estimate the parameters $m$, $p$ and $q$ (see for example Satoh 2001). Bass (1969) suggested a simple estimation strategy based on Ordinary Least Squares (OLS) applied to a discretized version of (6), where essentially the expected adoption stock $\bar{C}(t)$ and the expected adoption flow $\bar{c}(t)$ are replaced by their observed counterparts $C\_t$ and $c\_t = C\_t - C\_{t-1}$, and an error term is added. This leads to the following:

$$c\_{t} = mp + (q - p)C\_{t-1} - \frac{q}{m}C\_{t-1}^{2} + u\_{t}.\tag{12}$$

In the standard discrete time Bass model, $u\_t$ is assumed to be $iid\,N(0, \sigma^2)$, so that OLS is the natural candidate for estimation. To apply OLS, (12) is then reparameterized as follows:

$$c\_{t} = \beta\_{0} + \beta\_{1} C\_{t-1} + \beta\_{2} C\_{t-1}^{2} + u\_{t}. \tag{13}$$

The "reduced form" parameters $\beta = (\beta\_0, \beta\_1, \beta\_2)$ are related to the "structural form" parameters $\theta = (m, p, q)$ by the following:

$$\begin{array}{rcl} \beta\_0 & = & mp, \\ \beta\_1 & = & q - p, \\ \beta\_2 & = & -\frac{q}{m}, \end{array} \tag{14}$$

and these relations can be inverted:<sup>9</sup>

$$\begin{array}{rcl} m & = & \frac{-\beta\_1 - \sqrt{\beta\_1^2 - 4\beta\_0\beta\_2}}{2\beta\_2}, \\ p & = & \frac{\beta\_0}{m} = \frac{-\beta\_1 + \sqrt{\beta\_1^2 - 4\beta\_0\beta\_2}}{2}, \\ q & = & -m\beta\_2 = \frac{\beta\_1 + \sqrt{\beta\_1^2 - 4\beta\_0\beta\_2}}{2}. \end{array} \tag{15}$$
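The mapping between the structural and the reduced form can be illustrated with a round trip: start from $(m, p, q)$, compute $\beta$ through (14), and recover the structural parameters through (15). This is a minimal sketch with hypothetical function names and hypothetical parameter values.

```python
import math


def theta_to_beta(m, p, q):
    """Equation (14): structural (m, p, q) -> reduced form (beta0, beta1, beta2)."""
    return m * p, q - p, -q / m


def beta_to_theta(b0, b1, b2):
    """Equation (15): reduced form (beta0, beta1, beta2) -> structural (m, p, q)."""
    # With beta2 < 0 and beta0 > 0 the discriminant equals (p + q)^2.
    delta = math.sqrt(b1 ** 2 - 4.0 * b0 * b2)
    m = (-b1 - delta) / (2.0 * b2)
    p = (-b1 + delta) / 2.0
    q = (b1 + delta) / 2.0
    return m, p, q


if __name__ == "__main__":
    beta = theta_to_beta(1000.0, 0.01, 0.10)  # hypothetical structural values
    print("beta  =", beta)
    print("theta =", beta_to_theta(*beta))
```

The square root in (15) is $p + q$ in terms of the structural parameters, which is why $p$ and $q$ come out as half the difference and half the sum of that root and $\beta_1$.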

Assuming that $u\_t$ is uncorrelated, homoskedastic and normal, ML estimates of the parameter vector, say $\hat{\beta}$, can be obtained by OLS, and the corresponding variance–covariance matrix $\hat{\Sigma}\_{\hat{\beta}}$ can be obtained as usual.<sup>10</sup> Replacing $\beta$ with $\hat{\beta}$ in (15) gives $\hat{\theta} = \theta(\hat{\beta})$. Defining

$$J\_{\theta.\beta} = \frac{\partial\theta}{\partial\beta'} $$

and using the delta method, the variance–covariance matrix associated with $\hat{\theta}$ is given by the following:

$$
\hat{\Sigma}\_{\hat{\theta}} = \hat{J}\_{\theta.\beta} \hat{\Sigma}\_{\hat{\beta}} \hat{J}'\_{\theta.\beta}, \tag{16}
$$

where $\hat{J}\_{\theta.\beta}$ is the estimated counterpart of $J\_{\theta.\beta}$. Tedious computation shows the following:

$$J\_{\theta.\beta} = \frac{\partial\theta}{\partial\beta'} = \frac{1}{\delta\_{\beta}} \begin{bmatrix} 1 & -\frac{\beta\_1 + \delta\_{\beta}}{2\beta\_2} & \frac{\beta\_1(\beta\_1 + \delta\_{\beta}) - 2\beta\_0\beta\_2}{2\beta\_2^2} \\ -\beta\_2 & \frac{\beta\_1 - \delta\_{\beta}}{2} & -\beta\_0 \\ -\beta\_2 & \frac{\beta\_1 + \delta\_{\beta}}{2} & -\beta\_0 \end{bmatrix} \tag{17}$$

where $\delta\_{\beta} = \sqrt{\beta\_1^2 - 4\beta\_0\beta\_2}$. $\hat{J}\_{\theta.\beta}$ is therefore obtained by replacing $\beta$ with $\hat{\beta}$ in (17).
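The analytic Jacobian in (17) can be checked against finite differences of the inverse mapping (15), and then plugged into the delta-method formula (16). The sketch below uses plain Python lists; the function names, the evaluation point and the covariance matrix in the usage example are all hypothetical.

```python
import math


def beta_to_theta(b):
    """Equation (15): reduced-form beta -> structural (m, p, q)."""
    b0, b1, b2 = b
    d = math.sqrt(b1 ** 2 - 4.0 * b0 * b2)
    return [(-b1 - d) / (2.0 * b2), (-b1 + d) / 2.0, (b1 + d) / 2.0]


def jacobian(b):
    """Equation (17): derivative of (m, p, q) with respect to beta'."""
    b0, b1, b2 = b
    d = math.sqrt(b1 ** 2 - 4.0 * b0 * b2)
    rows = [
        [1.0, -(b1 + d) / (2.0 * b2), (b1 * (b1 + d) - 2.0 * b0 * b2) / (2.0 * b2 ** 2)],
        [-b2, (b1 - d) / 2.0, -b0],
        [-b2, (b1 + d) / 2.0, -b0],
    ]
    return [[x / d for x in row] for row in rows]


def delta_method(jac, sigma_beta):
    """Equation (16): J * Sigma_beta * J' for 3 x 3 inputs."""
    n = len(jac)
    js = [[sum(jac[i][k] * sigma_beta[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(js[i][k] * jac[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    beta_hat = [10.0, 0.09, -1e-4]  # hypothetical estimates
    sigma_beta = [[1.0, 0.0, 0.0], [0.0, 1e-4, 0.0], [0.0, 0.0, 1e-10]]  # hypothetical
    sigma_theta = delta_method(jacobian(beta_hat), sigma_beta)
    print("theta_hat:", beta_to_theta(beta_hat))
    print("std. errors:", [math.sqrt(sigma_theta[i][i]) for i in range(3)])
```

The first row of the Jacobian, which concerns $m$, contains $\beta_2$ in the denominator, so the standard error of the market potential can become very large when the quadratic coefficient is close to zero.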

We remark that when one considers *n* "seemingly unrelated" equations such as (13), i.e.,

$$c\_{i,t} = \beta\_{0,i} + \beta\_{1,i}C\_{i,t-1} + \beta\_{2,i}C\_{i,t-1}^2 + u\_{i,t}, \qquad i = 1, \ldots, n,\tag{18}$$

and the variance–covariance matrix of $u\_t = (u\_{1,t}, \ldots, u\_{n,t})'$, say $\Omega\_u$, is not diagonal, then equation-by-equation OLS is no longer equivalent to ML. In this case, the likelihood can be maximized by iterated Seemingly Unrelated Regression Equations (SURE), obtaining $\hat{\beta}\_i = (\hat{\beta}\_{0,i}, \hat{\beta}\_{1,i}, \hat{\beta}\_{2,i})'$, $i = 1, \ldots, n$, $\hat{\Omega}\_u$, and the variance–covariance matrix of $\hat{\beta} = (\hat{\beta}'\_1, \ldots, \hat{\beta}'\_n)'$, i.e.,

$$
\hat{\Sigma}\_{\hat{\beta}} = \left[ \begin{array}{ccc}
\hat{\Sigma}\_{\hat{\beta}\_1} & \cdots & \hat{\Sigma}'\_{\hat{\beta}\_n, \hat{\beta}\_1} \\
\vdots & \ddots & \vdots \\
\hat{\Sigma}\_{\hat{\beta}\_n, \hat{\beta}\_1} & \cdots & \hat{\Sigma}\_{\hat{\beta}\_n} \\
\end{array} \right].
$$

Then, applying (15) and (16) to each pair $(\hat{\beta}\_i, \hat{\Sigma}\_{\hat{\beta}\_i})$, it is easy to obtain the ML estimates of the structural parameters $\theta\_i = (m\_i, p\_i, q\_i)$ and the associated variance–covariance matrices.

#### *3.2. Boswijk and Franses Model*

Boswijk and Franses (2005), henceforth BF, emphasize two major problems in the model (13):


To deal with the first problem, they propose the following alternative model:<sup>11</sup>

$$
\Delta c\_{t} = \alpha \left( c\_{t-1} - mp - (q - p)C\_{t-1} + \frac{q}{m} C\_{t-1}^{2} \right) + u\_{t}. \tag{19}
$$

To understand the relationship between (12) and (19), it is interesting to observe that, adding and subtracting $\alpha \left( -(q - p)C\_{t-2} + \frac{q}{m} C\_{t-2}^{2} \right)$ on the right-hand side of (19), and rearranging, one obtains the following:

$$\Delta c\_{t} = \alpha \left( c\_{t-1} - mp - (q - p) C\_{t-2} + \frac{q}{m} C\_{t-2}^{2} \right) - \alpha \left( (q - p) \Delta C\_{t-1} - \frac{q}{m} \Delta C\_{t-1}^{2} \right) + u\_{t}. \tag{20}$$

To interpret (20), define the following:

$$c\_{t}^{\*} = mp + (q - p)C\_{t-1} - \frac{q}{m}C\_{t-1}^{2},$$

and notice that $c\_t^{\*}$ is the expected value of $c\_t$ according to the Bass discrete time model (12). Using the notation $c\_t^{\*}$, (20) can be rewritten as the following:

$$
\Delta c\_{t} = \alpha \left( c\_{t-1} - c\_{t-1}^{\*} \right) - \alpha \Delta c\_{t}^{\*} + u\_{t}. \tag{21}
$$

The parameter $\alpha$ is expected to be negative. The first term in (21), i.e., $\alpha \left( c\_{t-1} - c\_{t-1}^{\*} \right)$, can be thought of as an Error-Correction Mechanism (ECM): if $c\_{t-1} = c\_{t-1}^{\*}$, the ECM is ineffective; if instead $c\_{t-1} > c\_{t-1}^{\*}$, then the ECM term partly corrects the disequilibrium by reducing $c\_t$ with respect to $c\_{t-1}$; conversely, if $c\_{t-1} < c\_{t-1}^{\*}$, then, through the negative $\alpha$, $c\_t$ will increase with respect to $c\_{t-1}$. The second term in (21), i.e., $-\alpha\Delta c\_t^{\*}$, can be thought of as a "Target Seeking" Mechanism, which induces dynamics in $c\_t$ even if $c\_{t-1} = c\_{t-1}^{\*}$ and $u\_t = 0$: in fact, $\Delta c\_t^{\*}$ will be zero when $\Delta C\_{t-1}$ (and therefore $\Delta C\_{t-1}^{2}$) is zero, which happens when the target level $m$ is reached and therefore $C\_{t-1} = C\_{t-2} = m$. Another viewpoint on the BF model, seen as an AR(2) model for $C\_t$ with state-dependent parameters, is given in Appendix A.

It is important to remark that the standard Bass model (12) is a special case of (19) with $\alpha = -1$, so that one can set up a test of $H\_0: \alpha = -1$ to decide which model is preferable. The interpretation of the parameters $m$, $p$ and $q$ is exactly the same in both models since (19) is a generalized version of the original Bass model, where the "adjustment intensity", instead of being fixed at −1, is represented by the unrestricted parameter $\alpha$. For example, when $\alpha = -0.5$, only half of the disequilibrium observed at the end of a time unit is adjusted within the subsequent time unit: this gives rise to some persistence in the disequilibrium.
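The role of the adjustment parameter can be seen by iterating model (19) without shocks. In the sketch below, the function names, the starting values $C_0 = c_0 = 0$ and the parameter values are assumptions made for illustration; with $\alpha = -1$ the recursion reproduces the discrete Bass model (12) exactly, while $\alpha = -0.5$ closes only half of the gap in each period.

```python
def bass_target(C_prev, m, p, q):
    """Expected flow c*_t implied by the discrete Bass model (12), given C_{t-1}."""
    return m * p + (q - p) * C_prev - (q / m) * C_prev ** 2


def simulate_bf(alpha, m, p, q, periods, shocks=None):
    """Iterate model (19) from C_0 = 0 and c_0 = 0; return the flows c_1, ..., c_T."""
    shocks = shocks or [0.0] * periods
    C_prev, c_prev, flows = 0.0, 0.0, []
    for u in shocks[:periods]:
        # Delta c_t = alpha * (c_{t-1} - c*_t) + u_t, with c*_t evaluated at C_{t-1}
        c = c_prev + alpha * (c_prev - bass_target(C_prev, m, p, q)) + u
        flows.append(c)
        C_prev, c_prev = C_prev + c, c
    return flows


if __name__ == "__main__":
    m, p, q = 1000.0, 0.01, 0.10  # hypothetical values
    full = simulate_bf(-1.0, m, p, q, 60)   # standard Bass dynamics
    half = simulate_bf(-0.5, m, p, q, 60)   # partial adjustment
    print("peak period, alpha = -1.0:", full.index(max(full)) + 1)
    print("peak period, alpha = -0.5:", half.index(max(half)) + 1)
```

Passing a list of random draws as `shocks` turns the same function into a simple simulator for the stochastic model.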

To deal with the second problem (heteroskedasticity), BF propose to model *ut* as the following:

$$u\_t = c\_{t-1}^{\phi} \varepsilon\_{t}, \tag{22}$$

where $\varepsilon\_t$ is assumed to be uncorrelated and homoskedastic with variance $\sigma^2$, so that the variance of $u\_t$ is assumed to be proportional to $c\_{t-1}^{2\phi}$; the authors do not consider $\phi$ as a parameter to be estimated, but rather fix it heuristically to either 1/2 or 1, finding that 1/2 is preferable in their application. In our application, we will use the residuals of the homoskedastic model to test for homoskedasticity vs. heteroskedasticity of the proposed type.

Model (19) can be reparameterized in different ways:

$$
\Delta c\_{t} = \alpha \left( c\_{t-1} - \beta\_0 - \beta\_1 C\_{t-1} - \beta\_2 C\_{t-1}^2 \right) + u\_{t}, \tag{23}
$$

$$
\Delta c\_t = \alpha c\_{t-1} + \gamma\_0 + \gamma\_1 C\_{t-1} + \gamma\_2 C\_{t-1}^2 + u\_t. \tag{24}
$$

The parametrization (24) is suited for estimation, either by OLS when $u\_t$ is assumed to be uncorrelated and homoskedastic, or by WLS (dividing both sides by $c\_{t-1}^{\phi}$) if $u\_t$ is assumed to follow (22). Conversely, the parametrization (23) is useful because the parameters in $\beta = (\beta\_0, \beta\_1, \beta\_2)$ are related to the parameters $\theta = (m, p, q)$ as in (15): therefore, if we obtain estimates of $\beta$ and $\Sigma\_{\hat{\beta}}$, we can map them into estimates of $\theta$ and $\Sigma\_{\hat{\theta}}$ using (15) and (16) directly.<sup>12</sup>

Let us define *γ* = (*γ*<sub>0</sub>, *γ*<sub>1</sub>, *γ*<sub>2</sub>)′. Assuming that *u*<sub>*t*</sub> ∼ *iid N*(0, *σ*<sup>2</sup>), ML estimates of *π* = (*α*, *γ*′)′ can be obtained by OLS in (24), yielding *α̂*, *γ̂* and the corresponding variance–covariance matrix:

$$
\hat{\Sigma}\_{\hat{\pi}} = \left[ \begin{array}{cc} \hat{\sigma}\_{\alpha}^2 & \hat{\Sigma}'\_{\hat{\gamma},\hat{\alpha}} \\ \hat{\Sigma}\_{\hat{\gamma},\hat{\alpha}} & \hat{\Sigma}\_{\hat{\gamma}} \end{array} \right].
$$

Notice that *β* = −(1/*α*)*γ*; therefore, the ML estimate of *β* is given by the following:

$$
\hat{\beta} = -\frac{1}{\hat{\alpha}}\hat{\gamma}. \tag{25}
$$

We then obtain the following:

$$J\_{\beta.\pi} = \frac{\partial \beta}{\partial \pi'} = \frac{1}{\alpha^2} \begin{bmatrix} \gamma\_0 & -\alpha & 0 & 0\\ \gamma\_1 & 0 & -\alpha & 0\\ \gamma\_2 & 0 & 0 & -\alpha \end{bmatrix} = \frac{1}{\alpha^2} (\gamma, -\alpha I\_3),$$

and, using the delta method, we have the following:

$$\hat{\Sigma}\_{\hat{\beta}} = \hat{J}\_{\beta.\pi} \hat{\Sigma}\_{\hat{\pi}} \hat{J}'\_{\beta.\pi} = \frac{1}{\hat{\alpha}^2} \left( \frac{\hat{\gamma}\hat{\gamma}'}{\hat{\alpha}^2} \hat{\sigma}\_{\alpha}^2 - \frac{\hat{\Sigma}\_{\hat{\gamma},\hat{\alpha}} \hat{\gamma}' + \hat{\gamma}\hat{\Sigma}'\_{\hat{\gamma},\hat{\alpha}}}{\hat{\alpha}} + \hat{\Sigma}\_{\hat{\gamma}} \right), \tag{26}$$

where *Ĵ*<sub>*β*.*π*</sub> is the estimated counterpart of *J*<sub>*β*.*π*</sub>. Starting from (25) and (26), one can obtain *θ̂* and **Σ̂**<sub>*θ̂*</sub> using (15) and (16). In particular, replacing **Σ̂**<sub>*β̂*</sub> = *Ĵ*<sub>*β*.*π*</sub> **Σ̂**<sub>*π̂*</sub> *Ĵ*′<sub>*β*.*π*</sub> in (16), one obtains the following:

$$
\hat{\Sigma}\_{\hat{\theta}} = \hat{J}\_{\theta.\beta} \hat{\Sigma}\_{\hat{\beta}} \hat{J}'\_{\theta.\beta} = \hat{J}\_{\theta.\beta} \hat{J}\_{\beta.\pi} \hat{\Sigma}\_{\hat{\pi}} \hat{J}'\_{\beta.\pi} \hat{J}'\_{\theta.\beta}.
$$
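The step from the OLS output to *β̂* and its variance–covariance matrix can be sketched numerically as follows. The function name `beta_from_pi` is a hypothetical helper; it assumes that the 4 × 4 matrix `Sigma_pi` is ordered as (*α*, *γ*<sub>0</sub>, *γ*<sub>1</sub>, *γ*<sub>2</sub>), and it stops at *β* because the further mapping to *θ* relies on (15) and (16).

```python
import numpy as np

def beta_from_pi(alpha, gamma, Sigma_pi):
    """Map (alpha, gamma) and their covariance into beta = -gamma/alpha and its covariance."""
    gamma = np.asarray(gamma, dtype=float)
    beta = -gamma / alpha                                   # Equation (25)
    # Jacobian d beta / d pi' = (1/alpha^2) * [gamma, -alpha * I_3]
    J = np.hstack([gamma[:, None], -alpha * np.eye(3)]) / alpha ** 2
    Sigma_beta = J @ np.asarray(Sigma_pi, dtype=float) @ J.T   # delta method, Equation (26)
    return beta, Sigma_beta
```

Computing the product of the Jacobian and the covariance matrix directly, rather than coding the expanded expression in (26), keeps the sketch short and makes it easy to chain a second Jacobian for *θ*.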

Additionally, in this case, when one considers *n* "seemingly unrelated" equations such as (24), i.e.,

$$\Delta c\_{i,t} = \alpha\_{i} c\_{i,t-1} + \gamma\_{0,i} + \gamma\_{1,i} C\_{i,t-1} + \gamma\_{2,i} C\_{i,t-1}^{2} + u\_{i,t}, \qquad i = 1, \ldots, n, \tag{27}$$

and the variance–covariance matrix of *u*<sub>*t*</sub> = (*u*<sub>1,*t*</sub>, ..., *u*<sub>*n*,*t*</sub>)′, say **Ω**<sub>*u*</sub>, is not diagonal, then equation-by-equation OLS is no longer equivalent to ML. In this case, the likelihood can be maximized by iterated SURE, obtaining *π̂*<sub>*i*</sub> = (*α̂*<sub>*i*</sub>, *γ̂*<sub>0,*i*</sub>, *γ̂*<sub>1,*i*</sub>, *γ̂*<sub>2,*i*</sub>)′, *i* = 1, ..., *n*, **Ω̂**<sub>*u*</sub>, and the variance–covariance matrix of *π̂* = (*π̂*′<sub>1</sub>, ..., *π̂*′<sub>*n*</sub>)′, i.e.,

$$
\hat{\Sigma}\_{\hat{\pi}} = \left[ \begin{array}{ccc}
\hat{\Sigma}\_{\hat{\pi}\_1} & \dots & \hat{\Sigma}'\_{\hat{\pi}\_n, \hat{\pi}\_1} \\
\vdots & \ddots & \vdots \\
\hat{\Sigma}\_{\hat{\pi}\_n, \hat{\pi}\_1} & \dots & \hat{\Sigma}\_{\hat{\pi}\_n}
\end{array} \right].
$$

Then, starting from each pair (*π̂*<sub>*i*</sub>, **Σ̂**<sub>*π̂*<sub>*i*</sub></sub>), one can obtain the ML estimates of the structural parameters *θ*<sub>*i*</sub> = (*m*<sub>*i*</sub>, *p*<sub>*i*</sub>, *q*<sub>*i*</sub>) and the associated variance–covariance matrices as illustrated above.

As for the asymptotic properties of ML estimates of the structural parameters, Boswijk and Franses (2005) prove that *m̂* is consistent in *T* (as the time span increases, *C*<sub>*T*</sub> ideally coincides with *m*), whereas *p̂* and *q̂* are not; moreover, they show that the asymptotic distribution cannot be proved to be normal. However, they demonstrate with an extensive simulation that when the frequency is allowed to go to infinity along with the time span, then *m̂*, *p̂* and *q̂* are essentially unbiased and asymptotically normal; they also show that this is approximately valid, even with a fixed time span, at least if it includes the inflection point (ln *q* − ln *p*)/(*p* + *q*). In other words, if the observed time span includes the inflection point and the sampling frequency is reasonably high, their results suggest that using the standard normal and the *χ*<sup>2</sup> distributions for making inference on the parameters is a reasonable approximation.
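The inflection-point condition is easy to check once estimates of *p* and *q* are available. The sketch below uses two hypothetical helpers, `bass_inflection_time` and `sample_covers_inflection`; it assumes that time is measured from the start of the diffusion process and in the same units (e.g., quarters) in which *p* and *q* are expressed.

```python
import math

def bass_inflection_time(p, q):
    """Inflection point of the Bass diffusion curve, (ln q - ln p) / (p + q)."""
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive")
    return (math.log(q) - math.log(p)) / (p + q)

def sample_covers_inflection(p, q, n_periods):
    """True if the inflection point falls within a sample of n_periods time units."""
    return 0.0 <= bass_inflection_time(p, q) <= n_periods
```

When *q* is smaller than *p* the formula returns a negative time, which the second helper treats as falling outside the sample.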

#### *3.3. Boswijk et al. Multivariate Model*

Boswijk et al. (2009), henceforth BFF, propose a multivariate generalization of (19). The BFF model is made up of *n* equations, and can be written as follows:

$$\Delta c\_{i,t} = \sum\_{j=1}^{n} \alpha\_{ij} \left( c\_{j,t-1} - (q\_j - p\_j) C\_{j,t-1} + \frac{q\_j}{m\_j} C\_{j,t-1}^2 - p\_j m\_j \right) + u\_{i,t}, \qquad i = 1, \ldots, n,\tag{28}$$

where, in a simplified homoskedastic version of the model, we might assume that *u*<sub>*t*</sub> = [*u*<sub>1,*t*</sub>, ..., *u*<sub>*n*,*t*</sub>]′ ∼ *iid N*<sub>*n*</sub>(**0**, **Ω**).<sup>13</sup> Along the lines of the BF model, (28) may be reparametrized as follows:

$$\Delta c\_{i,t} = \sum\_{j=1}^{n} \alpha\_{ij} \left( \beta\_{j}^{\prime} X\_{j,t-1} - \beta\_{0j} \right) + u\_{i,t}, \qquad i = 1, \ldots, n,\tag{29}$$

with

$$X'\_{j,t} = \left[ c\_{j,t}, C\_{j,t}, C\_{j,t}^2 \right], \quad \beta'\_{j} = \left[ 1, -\beta\_{1j}, -\beta\_{2j} \right],$$

or, more compactly,

$$\mathbf{Y}\_t = \alpha\beta' \mathbf{X}\_{t-1} + \mathbf{u}\_{t}, \tag{30}$$

where

$$\begin{array}{c} \mathbf{Y}\_{t} = \begin{bmatrix} \Delta c\_{1,t} \\ \vdots \\ \Delta c\_{n,t} \end{bmatrix}, \quad \mathbf{X}\_{t} = \begin{bmatrix} X\_{1,t} \\ \vdots \\ X\_{n,t} \end{bmatrix}, \quad \mathbf{u}\_{t} = \begin{bmatrix} u\_{1,t} \\ \vdots \\ u\_{n,t} \end{bmatrix}, \\ \underset{n\times n}{\alpha} = \begin{bmatrix} \alpha\_{11} & \cdots & \alpha\_{1n} \\ \vdots & \ddots & \vdots \\ \alpha\_{n1} & \cdots & \alpha\_{nn} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta\_{1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \beta\_{n} \\ \beta\_{01} & \cdots & \beta\_{0n} \end{bmatrix}. \end{array}$$

Since this paper's main goal is to celebrate Søren Johansen and Katarina Juselius, it is nice to remark that, apart from the exclusion restrictions in *β* and the fact that the rank of *αβ*′ is actually full, (30) has the mathematical form of the "reduced rank regression" popularized by Søren and Katarina; therefore, in estimating and interpreting the model, we can benefit directly from the results inspired by their work, in particular Hansen (2003). Notice that the (exclusion) restrictions on the matrix *β* can be written as follows:

$$\text{vec}(\beta) = H\_{\beta} \varphi\_{\beta} + h\_{\beta} \tag{31}$$

for suitable restriction matrices *H*<sub>*β*</sub> and *h*<sub>*β*</sub>.<sup>14</sup> It might also be interesting to consider restrictions on *α* of the following type:

$$\text{vec}(\alpha) = H\_{\alpha} \varphi\_{\alpha}, \tag{32}$$

for example to test the hypothesis that the matrix *α* is diagonal, under which (28) would collapse into *n* "seemingly unrelated" BF equations such as (19).<sup>15</sup> Of course, when *α* is unrestricted, we have *H*<sub>*α*</sub> = *I*<sub>*n*<sup>2</sup></sub> and *ϕ*<sub>*α*</sub> = vec(*α*).

Assuming that *u*<sub>*t*</sub> ∼ *iid N*(**0**, **Ω**), the log-likelihood function is given by

$$\ell(\varphi, \Omega) = -\frac{T}{2} \left[ n \ln(2\pi) + \ln|\Omega| + \text{tr}\left(\Omega^{-1} M\_{uu}\right) \right],$$

where *M*<sub>*uu*</sub> = *T*<sup>−1</sup> ∑<sub>*t*=1</sub><sup>*T*</sup> *u*<sub>*t*</sub>*u*′<sub>*t*</sub>. Since the log-likelihood score is bilinear in the parameters *α* and *β*, one can employ the generalized reduced rank regression algorithm proposed by Hansen (2003) for likelihood maximization of I(1) VAR models under linear restrictions. This provides maximum likelihood estimates of the parameters *ϕ* = (*ϕ*′<sub>*α*</sub>, *ϕ*′<sub>*β*</sub>)′ and **Ω**, say *ϕ̂* and **Ω̂**.<sup>16</sup>

To work out the variance–covariance matrix associated with *ϕ̂*, notice that the model (30) under the restrictions (31) and (32) is a sub-model of the following regression model:

$$\mathbf{Y}\_t = \Pi \mathbf{X}\_{t-1} + \mathbf{u}\_{t},$$

where **Π** = *αβ*′ = **Π**(*ϕ*) is a smooth function of the parameter vector *ϕ*. The second derivatives of the log-likelihood with respect to vec(**Π**) are given by −*T* *M*<sub>*XX*</sub> ⊗ **Ω**<sup>−1</sup>, see e.g., Johansen (2006, Equation (13)), where *M*<sub>*XX*</sub> = *T*<sup>−1</sup> ∑<sub>*t*=1</sub><sup>*T*</sup> *X*<sub>*t*−1</sub>*X*′<sub>*t*−1</sub>. Because the parameters in *ϕ* and in **Ω** are asymptotically independent, one finds that the Hessian with respect to *ϕ* equals the following:

$$\mathcal{H}\_{\varphi} = \frac{\partial^2 \ell(\varphi, \Omega)}{\partial \varphi \partial \varphi'} = -T J'\_{\Pi.\varphi} \left( M\_{XX} \otimes \Omega^{-1} \right) J\_{\Pi.\varphi}, \tag{33}$$

where

$$J\_{\Pi.\varphi} = \frac{\partial \text{vec}\, \Pi(\varphi)}{\partial \varphi'}.$$

In order to describe *J*<sub>**Π**.*ϕ*</sub> in more detail, observe that, in the present case, one has vec **Π**(*ϕ*) = vec(*αβ*′) = (*β* ⊗ *I*<sub>*n*</sub>) vec(*α*) = (*I*<sub>(3*n*+1)</sub> ⊗ *α*) vec(*β*′). Therefore, using (31) and (32), one finds the following:

$$\begin{aligned} \partial \text{vec}\left(\alpha\beta'\right) / \partial \varphi\_{\alpha}' &= \left(\beta \otimes I\_{n}\right) H\_{\alpha}, \\ \partial \text{vec}\left(\alpha\beta'\right) / \partial \varphi\_{\beta}' &= \left(I\_{(3n+1)} \otimes \alpha\right) K\_{(3n+1),n} H\_{\beta}, \end{aligned}$$

where *K*<sub>*mn*</sub> is a commutation matrix, which satisfies *K*<sub>*mn*</sub> vec(*M*) = vec(*M*′) when *M* is *m* × *n*. Therefore

$$J\_{\Pi.\varphi} = \text{blkdiag} \left( (\beta \otimes I\_n) H\_{\alpha}, \left( I\_{(3n+1)} \otimes \alpha \right) K\_{(3n+1),n} H\_{\beta} \right).$$

The variance–covariance matrix of *ϕ̂* can then be estimated by the following:

$$
\hat{\Sigma}\_{\hat{\varphi}} = -\hat{\mathcal{H}}\_{\varphi}^{-1},
$$

where ℋ̂<sub>*ϕ*</sub> is obtained by plugging the ML estimates *ϕ̂* and **Ω̂** in place of *ϕ* and **Ω** in (33).
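The two vec identities behind the derivative blocks can be verified numerically. The following sketch builds the commutation matrix explicitly and checks both identities for randomly drawn *α* and *β* with *n* = 2. The helper names (`vec`, `commutation`, `jacobian_blocks`) are illustrative assumptions, and the restriction matrices *H*<sub>*α*</sub> and *H*<sub>*β*</sub> are taken to be identities, i.e., no restrictions are imposed.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector."""
    return np.asarray(M).reshape(-1, order="F")

def commutation(m, n):
    """Commutation matrix K_mn with K_mn @ vec(M) == vec(M.T) for M of shape (m, n)."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

def jacobian_blocks(alpha, beta):
    """Derivatives of vec(alpha @ beta.T) w.r.t. vec(alpha) and vec(beta), no restrictions."""
    n, r = alpha.shape[0], beta.shape[0]
    d_alpha = np.kron(beta, np.eye(n))                       # (beta kron I_n)
    d_beta = np.kron(np.eye(r), alpha) @ commutation(r, n)   # (I_r kron alpha) K_{r,n}
    return d_alpha, d_beta

rng = np.random.default_rng(0)
n = 2
alpha = rng.normal(size=(n, n))
beta = rng.normal(size=(3 * n + 1, n))
d_alpha, d_beta = jacobian_blocks(alpha, beta)
Pi = alpha @ beta.T
print(np.allclose(d_alpha @ vec(alpha), vec(Pi)))   # checks vec(Pi) = (beta kron I_n) vec(alpha)
print(np.allclose(d_beta @ vec(beta), vec(Pi)))     # checks vec(Pi) = (I kron alpha) K vec(beta)
```

With restrictions in place, each block would simply be post-multiplied by the corresponding *H* matrix before being used in (33).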

#### **4. Results**

Our statistical analysis is based on two equations, one for *c*<sub>1,*t*</sub> and one for *c*<sub>2,*t*</sub> (see Section 2 for a definition of the indices). In this section, we first discuss the estimates of the reduced form models and then the corresponding estimates of the structural form models.

#### *4.1. Analysis of the Reduced Form—Comparing Bass, BF, BFF*

As illustrated in the previous section, the two univariate Bass Equations (18) may be seen as a restricted version of the bivariate BFF model (29), with four restrictions: *α*<sub>11</sub> = *α*<sub>22</sub> = −1 and *α*<sub>12</sub> = *α*<sub>21</sub> = 0. Similarly, the univariate dynamic BF model (27) may also be seen as a restricted version of the bivariate BFF model (29), with only two restrictions: *α*<sub>12</sub> = *α*<sub>21</sub> = 0. In all cases, assuming that the errors in the two equations are simultaneously correlated, i.e.,

$$
\Omega\_{u} = \begin{bmatrix}
\sigma\_1^2 & \rho \sigma\_1 \sigma\_2 \\
\rho \sigma\_1 \sigma\_2 & \sigma\_2^2
\end{bmatrix},
$$

efficient estimates of all models may be obtained by maximum likelihood, using the Hansen (2003) algorithm as illustrated in Section 3.3.<sup>17</sup> The results are shown in Table 4.


**Table 4.** ML estimates of the reduced form parameters.

Table 5 reports some misspecification tests based on the residuals illustrated in Figure 3.

The two rows headed AC in the table report the results of tests for autocorrelation of the residuals of the applied and methodological equations, respectively. Specifically, we tested for serial correlation up to *k* = 20 lags using the Ljung-Box Q-statistic, whose null hypothesis is that the errors are uncorrelated.<sup>18</sup> The *p*-value is zero for the standard Bass model (18): therefore, residual serial correlation is a major problem for that model. Conversely, in models (27) and (29), the white noise assumption is not rejected for the methodological equation. For the applied equation, there is a clear improvement over model (18), but some autocorrelation seems to remain in both models; this suggests that the dynamic specification deserves further work, which is left for further research.
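For reference, the Ljung-Box statistic used in these tests can be computed in a few lines. The sketch below implements the textbook formula *Q* = *T*(*T* + 2) ∑<sub>*k*</sub> *ρ̂*<sub>*k*</sub><sup>2</sup>/(*T* − *k*); the function name `ljung_box_q` is an assumption, and the sketch returns only the statistic, leaving the choice of degrees of freedom for the *χ*<sup>2</sup> reference distribution to the user.

```python
import numpy as np

def ljung_box_q(resid, max_lag=20):
    """Ljung-Box Q-statistic for serial correlation up to max_lag lags."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()                       # center the residuals
    T = len(e)
    denom = np.sum(e ** 2)
    q = 0.0
    for k in range(1, max_lag + 1):
        rho_k = np.sum(e[k:] * e[:-k]) / denom   # sample autocorrelation at lag k
        q += rho_k ** 2 / (T - k)
    return T * (T + 2) * q
```

Large values of the statistic, relative to the chosen *χ*<sup>2</sup> quantile, point to residual autocorrelation.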

The four rows headed HSK in the table report the results of two different types of Breusch–Pagan tests for heteroskedasticity for the applied and methodological equations, respectively. In all cases, the null hypothesis is that the errors are homoskedastic, but we introduced two different alternatives. In fact, as seen in Equation (22), Boswijk and Franses (2005) suggest that the standard deviation should be proportional to *c*<sub>*i*,*t*−1</sub><sup>*φ*</sup> (with *φ* = 1/2 or *φ* = 1); therefore, we included the constant and *c*<sub>*i*,*t*−1</sub><sup>2*φ*</sup> in the auxiliary regression, with two alternative values for *φ*. For model (18), the null is rejected in most cases.<sup>19</sup> Conversely, in spite of the very convincing argument supporting heteroskedasticity made by the cited authors, we did not find statistically significant evidence of it for this data set in (27) and (29); therefore, for the analysis in this paper, we did not consider the heteroskedastic versions of the BF and BFF models.

The log-likelihood increases by 33.75 from model (18) to model (27): the LR test is therefore *χ*<sup>2</sup><sub>2</sub> = 67.50, and the *p*-value is essentially zero. According to this result, the standard Bass model seems unable to capture the persistent swings clearly visible in Figure 2 and in the first plot of Figure 3: notice in fact that both parameters *α*<sub>11</sub> and *α*<sub>22</sub> estimated in (27) are approximately −0.5 and statistically different from −1, which implies that only half of the distance from the ideal Bass path is corrected within one quarter, giving rise to persistent disequilibria. However, even model (27) is not satisfactory: in fact, the log-likelihood of model (29) is significantly higher (the LR test is *χ*<sup>2</sup><sub>2</sub> = 14.6, *p*-value 0.00111). This result is interesting since it suggests the existence of Granger causality running from methodological research to applied research and/or vice versa.
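The arithmetic behind the first comparison can be reproduced as follows. The sketch relies on the fact that the survival function of a *χ*<sup>2</sup> distribution with two degrees of freedom is exp(−*x*/2), so no statistical library is needed; the function name `lr_test_2df` is a hypothetical helper, and only the difference between the two log-likelihoods matters.

```python
import math

def lr_test_2df(loglik_restricted, loglik_unrestricted):
    """LR statistic and p-value for a test with two restrictions."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    p_value = math.exp(-lr / 2.0)     # survival function of a chi-square with 2 df
    return lr, p_value

# A log-likelihood gain of 33.75, as in the comparison of models (18) and (27)
lr, p_value = lr_test_2df(0.0, 33.75)
print(lr, p_value)
```

For tests with a different number of restrictions, the closed-form survival function no longer applies and a general *χ*<sup>2</sup> routine would be needed.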

**Table 5.** Residual based tests for autocorrelation and heteroskedasticity. AC: Ljung-Box Q-statistics up to *k* = 20 lags (see footnote 25 for *p*). HSK: Breusch–Pagan test, including the constant and *c*<sub>1,*t*−1</sub><sup>2*φ*</sup> (or *c*<sub>2,*t*−1</sub><sup>2*φ*</sup>) in the regression where the dependent variable is *û*<sup>2</sup><sub>1,*t*</sub>/*σ̂*<sup>2</sup><sub>1</sub> (or *û*<sup>2</sup><sub>2,*t*</sub>/*σ̂*<sup>2</sup><sub>2</sub>).


**Figure 3.** Residuals of different models: Bass = model (18), BF = model (27), BFF = model (29). Thick red line = "Applied" (left scale). Thin blue line = "Methodological" (right scale).

To shed some light on this, we observe that the estimates of *α*<sub>12</sub> and *α*<sub>21</sub> in (29) are both positive and statistically significant, suggesting that an increase in methodological research leads one to expect more applications in the future, and that an increase in applications stimulates further methodological research, with a continuous dialogue between economic problems and econometric methods, which is exactly in the spirit of KJ and SJ's main message to the profession.

To provide a visual illustration of the relevance of the dynamic interaction between methodological and applied research in this field, we carry out a simulation exercise, similar in spirit to impulse response analysis. Impulse response functions, being the reactions of the variables to shocks entering the system, are useful for studying the interactions between variables in a vector autoregressive model (Lütkepohl 2016). In a more general non-linear setting, Potter (2000) and Koop et al. (1996) remark that nonlinear models produce impulse responses that are history- and shock-dependent; to overcome this problem, they introduce the notion of "generalized impulse response functions", based on a stochastic simulation, which can be applied in both the linear and non-linear case. We considered this tool, but since the non-linearity is relatively mild in our case, we opted for a tailored solution that is closer to the traditional deterministic impulse response analysis.

We initialize *C*<sub>*i*,0</sub> = *c*<sub>*i*,0</sub> = 0,<sup>20</sup> and then we compute two alternative trajectories for *c*<sub>*i*,*t*</sub> based on the estimated counterpart of (29). In the first dynamic simulation, *u*<sub>*i*,*t*</sub> is set to zero for all *i* and *t*: this leads to the "unshocked" paths *c*<sup>*U*</sup><sub>*i*,*t*</sub>, corresponding to the deterministic trajectory that would take place in the absence of any innovation, starting from the assumed initial conditions. In the second dynamic simulation, we set *u*<sub>*j*,1</sub> = *σ̂*<sub>*j*</sub> in the *j*-th equation, whereas all other innovations (different equations and/or different times) are set to zero, so that the impulse corresponds to one standard deviation in just one of the equations;<sup>21</sup> this exercise leads to the shocked paths *c*<sup>*S*</sup><sub>*j*,*i*,*t*</sub>, where the first subscript, *j*, indicates which equation has been shocked. The standardized responses of the *i*-th equation to an impulse on the *j*-th equation are then given by the difference of the two trajectories, standardized by the standard deviation of the output variable as follows:

$$IR\_{j,i,t} = \frac{c\_{j,i,t}^{S} - c\_{i,t}^{U}}{\hat{\sigma}\_i}, \qquad j = 1, \ldots, n; \quad i = 1, \ldots, n; \quad t = 1, \ldots$$

The IRs therefore isolate that part of the trajectory *c*<sup>*S*</sup><sub>*j*,*i*,*t*</sub> which can be attributed to the shock. The first 20 IRs are illustrated in Figure 4. Notice that, by construction, *IR*<sub>*j*,*i*,1</sub> is equal to 1 for *j* = *i*, and 0 otherwise. According to Figure 4, in the short run, the (standardized) response of the methodological literature to a (standardized) impulse in the applied literature appears qualitatively very similar to the (standardized) response of the applied literature to a (standardized) impulse in the methodological literature.

**Figure 4.** Standardized impulse responses based on model (29). Initialization: *C*<sub>*i*,0</sub> = *c*<sub>*i*,0</sub> = 0, *i* = 1, 2.
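The simulation exercise can be sketched in a few lines of Python. In the sketch below, the disequilibrium term is written in the standard Bass form, *c* − (*p* + *qC*/*m*)(*m* − *C*), which is an assumption of the sketch. The values of `m` and `sigma` are the rounded figures quoted in the text, whereas `alpha`, `p` and `q` are hypothetical values chosen only for illustration (they are not the estimates in Tables 4 and 6), so the output does not reproduce Figure 4.

```python
import numpy as np

def simulate(alpha, m, p, q, shock=None, horizon=20):
    """Path of c_{i,t}, t = 1..horizon, starting from C_{i,0} = c_{i,0} = 0.

    shock: optional vector of innovations added at t = 1 only.
    """
    n = len(m)
    c, C = np.zeros(n), np.zeros(n)
    path = np.zeros((horizon, n))
    for t in range(horizon):
        # Distance of c_{t-1} from the Bass path evaluated at C_{t-1}
        gap = c - (p + q * C / m) * (m - C)
        u = shock if (shock is not None and t == 0) else 0.0
        c = c + alpha @ gap + u
        C = C + c
        path[t] = c
    return path

def impulse_responses(alpha, m, p, q, sigma, horizon=20):
    """ir[j, i, t-1]: standardized response of equation i at time t to a shock in equation j."""
    n = len(m)
    base = simulate(alpha, m, p, q, None, horizon)          # unshocked paths
    ir = np.zeros((n, n, horizon))
    for j in range(n):
        shock = np.zeros(n)
        shock[j] = sigma[j]                                 # one standard deviation in equation j
        shocked = simulate(alpha, m, p, q, shock, horizon)
        ir[j] = ((shocked - base) / sigma).T
    return ir

# Hypothetical parameter values, for illustration only
alpha = np.array([[-0.5, 0.05], [0.02, -0.5]])
m = np.array([15351.0, 1228.0])
p = np.array([0.001, 0.005])
q = np.array([0.03, 0.03])
sigma = np.array([8.15, 2.72])

ir = impulse_responses(alpha, m, p, q, sigma)
cumulative_ir = ir.cumsum(axis=2)       # cumulative responses over the horizon
```

Because both paths start from the same initial conditions and differ only by the impulse at *t* = 1, the first response equals 1 in the shocked equation and 0 in the other, as noted in the text.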

Some more insight on the relationship between methodological and applied research can be obtained by analyzing the cumulative IRs. It is important to remark that, given the mathematical nature of the model, the shocks do not have permanent effects. In fact, as *t* goes to infinity, the cumulative citations *C*<sub>*i*,*t*</sub> will eventually reach the saturation point *m*<sub>*i*</sub> irrespective of the initial conditions and/or the shocks they undergo: this implies that ∑<sub>*t*=1</sub><sup>∞</sup> *c*<sup>*U*</sup><sub>*i*,*t*</sub> = ∑<sub>*t*=1</sub><sup>∞</sup> *c*<sup>*S*</sup><sub>*j*,*i*,*t*</sub> = *m*<sub>*i*</sub> − *C*<sub>*i*,0</sub> for any *i*, *j* and *C*<sub>*i*,0</sub>, and therefore ∑<sub>*t*=1</sub><sup>∞</sup> *IR*<sub>*j*,*i*,*t*</sub> = 0. As a consequence, although the first IRs illustrated in Figure 4 are positive, at some point they turn negative (although with a very small magnitude) so that, in the limit, the cumulative sum is zero. This behavior is better illustrated through the cumulative IRs, shown in Figure 5 for a much longer period (500 quarters).

**Figure 5.** Cumulative standardized impulse responses based on model (29). Initialization: *C*<sub>*i*,0</sub> = *c*<sub>*i*,0</sub> = 0, *i* = 1, 2.

Figure 5 shows that an impulse equal to *σ̂*<sub>1</sub> (i.e., 8.15 papers) in the applied literature is strongly "self-exciting", giving rise to a very long sequence of positive IRs in the applied literature itself, adding up to 10.5*σ̂*<sub>1</sub> (about 85 papers) in the subsequent 150 quarters (almost 40 years), before it starts fading away. Conversely, the cumulative impact of the same impulse on the methodological literature is shorter-lived and far smaller (1.2*σ̂*<sub>2</sub>, i.e., about 3 papers). On the other hand, an impulse equal to *σ̂*<sub>2</sub> (i.e., 2.72 papers) in the methodological literature is not so "self-exciting" (the peak of the cumulative IRs is only 3.6*σ̂*<sub>2</sub>, about 10 papers, roughly 50 quarters after the impulse), whereas the cumulative impact on the applied literature seems very important (the peak is equal to 4.4*σ̂*<sub>1</sub>, about 35 papers, roughly 150 quarters after the impulse). This evidence seems to suggest that, although in the short run the cross-fertilization is rather balanced, in the long run the methodological literature triggers the applications more than the other way around.<sup>22</sup>

Actually, the extremely long sequence of positive IRs, well beyond the observed period of 115 quarters, casts some doubt on the validity of the implicit assumption that the impulses do not have a permanent effect. We think that a hint for future research arising from the current study is to develop an alternative model where the saturation point *m*<sub>*i*</sub> is not already set at the beginning of the process, but is to some extent "path dependent". In fact, if an idea appears more successful than what was initially assumed (i.e., we observe some unexpected citations), we should reconsider the expected total number of citations in the long run, leading to an upward revision. Conversely, when an idea is suddenly abandoned, possibly in favor of an alternative paradigm (i.e., we observe an unexpected reduction in the number of citations), we should reasonably revise downwards the expected total number of citations in the long run.

#### *4.2. Analysis of the Structural Form*

Table 6 reports the "structural" parameters *m*, *p* and *q* in the models (12), (19) and (28), which are based on the ML estimates of (18), (27) and (29), respectively. The associated standard errors are computed using the delta method, as illustrated in Sections 3.1–3.3. It is important to remark that the standard errors reported for model (12) are not reliable: they appear to be much lower than in the other two models, but the assumptions for applying ML (in particular, the absence of serial correlation) are clearly invalid for that model, as illustrated in Table 5.


**Table 6.** Estimates of the structural parameters of the Bass, BF and BFF models. The standard errors of the *t̂*<sup>*P*</sup>'s are measured in quarters.

The timing of the citation peaks *t̂*<sup>*P*</sup><sub>1</sub> and *t̂*<sup>*P*</sup><sub>2</sub>, the associated peaks *c̄*<sup>*P*</sup><sub>1</sub> and *c̄*<sup>*P*</sup><sub>2</sub>, and the corresponding cumulative number of citations at the peak *C̄*<sup>*P*</sup><sub>1</sub> and *C̄*<sup>*P*</sup><sub>2</sub>, are obtained by plugging the estimated structural parameters in (9)–(11), and the associated standard errors are computed using the delta method.<sup>23</sup>

According to the evidence provided in Table 6, the estimates of the structural parameters are rather robust to the model used. Our comments are focused on the results based on model (28), which is statistically preferable.

It is interesting, and not surprising, that the "innovation parameter" *p* is much higher for the methodological literature, whereas the "imitation parameter" *q* is quite similar in the two strands of the literature: this makes imitation relatively more important than innovation in the applied literature. As for the citation peaks, it seems that the peak in the methodological literature (11 papers per quarter) was reached in 2001, whereas the peak in the applied literature (88 papers per quarter) is expected in 2020, 12 quarters after the end of the estimation sample (although the associated standard error, 32 quarters, is extremely large). Based on the discussion of the properties of the estimates provided in Boswijk and Franses (2005), the estimates of the methodological equation should, therefore, be regarded as more reliable since the inflection point of the diffusion curve appears to be within the sample; this is less so for the estimates of the applied equation, where the estimated inflection point is outside the sample (of course we do not know the "true" inflection point). Not surprisingly, the standard error of *m̂*<sub>1</sub> is quite large (the coefficient of variation *σ̂*<sub>*m̂*<sub>1</sub></sub>/*m̂*<sub>1</sub> is about 42%), while the standard error of *m̂*<sub>2</sub> is much smaller (the coefficient of variation *σ̂*<sub>*m̂*<sub>2</sub></sub>/*m̂*<sub>2</sub> is less than 10%).

Figure 6 illustrates the observed time series along with the estimated unconditional expectation obtained by plugging the estimated structural parameters in Equation (8). Strictly speaking, since at the end of the sample (2017:Q3) we observe *C*<sub>1,*T*</sub> = 5445.2 and *C*<sub>2,*T*</sub> = 1011.8, our point estimates would ideally imply that we should expect *m̂*<sub>1</sub> − *C*<sub>1,*T*</sub> = 9905 applied WoS papers and *m̂*<sub>2</sub> − *C*<sub>2,*T*</sub> = 216 methodological WoS papers citing KJ and SJ in the future. We think that this interpretation is hazardous, to say the least. It is worth observing that the estimates of the structural parameters, especially *m*<sub>1</sub> and *m*<sub>2</sub>, are very unstable, as observed among others in Chandrasekaran and Tellis (2018), and they mainly seem to represent the history of the process in a descriptive sense rather than being a reliable forecasting tool in an inferential sense. For example, if we re-estimate the parameters based on the subsample 1989:Q1-2005:Q4, so that the end of the sample occurs right before the "second wind" clearly visible in the plot, we would obtain *m̂*<sub>1</sub> = 2710.6 (*σ̂*<sub>*m̂*<sub>1</sub></sub> = 142.5), *m̂*<sub>2</sub> = 684.3 (*σ̂*<sub>*m̂*<sub>2</sub></sub> = 34.7), *t̂*<sup>*P*</sup><sub>1</sub> = 1999:Q1 and *t̂*<sup>*P*</sup><sub>2</sub> = 1997:Q4.<sup>24</sup> Therefore, estimating the same model 15 years ago, one would have been convinced of the following: (i) that the citations peak had already been reached several years before (and then the estimates would be regarded as reliable); (ii) that the potential for this literature was about one fifth of what it appears now; and (iii) that by 2020, the interest in KJ and SJ's work would have disappeared (see Figure 7). However, as this Special Issue confirms, no prediction could have proved more wrong!

**Figure 6.** Observed time series along with the estimated unconditional expectation based on Equation (8). **Left**—applied index; **right**—methodological index.

**Figure 7.** Forecasting fallacy: observed time series along with the estimated unconditional expectation based on (8), parameters re-estimated based on the trimmed sample 1989:Q1-2005:Q4. **Left**—applied index; **right**—methodological index.

#### **5. Conclusions and Suggestions for Further Research**

Our main purpose in writing this paper was to contribute to the Festschrift in honor of Katarina Juselius and Søren Johansen as a sign of gratitude for their being for us a constant source of inspiration. We tried to find a way to show how profoundly they contributed to the development of economic ideas, emphasizing one key aspect of their approach, namely, the dialogue between empirical economics and econometric methodology. To this aim, we have proposed an operational way to disentangle, as much as possible, their contribution to applied and methodological econometric research, through the development of two indices based on the Web of Science database. We hope that this can also be a contribution to bibliometric studies, since a similar approach to assess in a quantitative way the impact of new ideas on methodological and applied research, and on the interaction between them, can be used for other areas. Ideally, similar analyses might be employed to investigate even more general epistemological issues, such as the relationship between theoretical and empirical research.

We think that the data we describe in Section 2 are very interesting per se. They show that KJ and SJ's influence on the literature is extremely important: their top 10 papers sum up to about 10,500 WoS citations (more than 50,000 in GS) from about 6500 citing papers, an average of more than 200 papers per year. Based on our indicators, 85% of the citing papers are essentially applied, whereas 15% are methodological: we do not have a benchmark for comparison, but we have the impression that the share of methodology is somewhat larger than in the econometric literature in general. As of 2017, the number of applied citing papers per quarter had not yet reached the peak (although a "false peak" seems to have occurred around 2000); conversely, the peak in the methodological literature seems to have been reached around 2001, although the shape of the trajectory is very flat after the peak, similar to what Bjork et al. (2014) have identified in a minority of Nobel prize winners and defined as "staying power".

To model the data, we resorted to an innovative dynamic multivariate version, proposed in Boswijk et al. (2009), of the well-known Bass (1969) model. It was a pleasure for us to observe and emphasize that this model resembles so closely the Vector ECM model popularized by KJ and SJ; in particular, the bilinear nature of the model allows us to use the Hansen (2003) algorithm to maximize the likelihood, which generalizes Johansen's ML algorithm, adapting it to a rather general class of restrictions, which includes our case.

The estimated model conveys very interesting information. As seen in Formula (9), the location of the citation peaks depends on the relative importance of the "innovation parameter" *p* and the "imitation parameter" *q*. Our estimates suggest that the different location of the peaks might be explained by the higher value of the parameter *p*<sub>2</sub> with respect to *p*<sub>1</sub>, whereas *q*<sub>1</sub> and *q*<sub>2</sub> are quite similar: using the standard terminology in the Bass model literature, the difference in the parameters suggests that the methodological literature is mainly driven by "innovators", whereas "imitators" are relatively more important in the applied literature.

Another interesting finding is that, in the literature referring to KJ and SJ, the "cross-fertilization" between methodological and applied research is statistically significant and bi-directional (although possibly more effective from methodology to applications than the other way round). According to our impulse response analysis, rounding our figures, 8 unexpected applied papers in one quarter lead one to predict that 3 methodological papers will follow, whereas 3 unexpected methodological papers lead one to predict that 40 applied papers will follow (this is not so unbalanced as it seems at first sight, since the scale of the two strands of literature is different). These results testify that one of the most important messages that Katarina Juselius and Søren Johansen have emphasized in their writings (i.e., that the applications should pose challenging problems to the methodology and that the methodology should sharpen the ability of applied researchers to ask meaningful questions to the data) has become a common heritage in this literature.

As for the estimated dimension of KJ and SJ's influence, as measured by the parameters *m*<sub>1</sub> and *m*<sub>2</sub> (often called "saturation point" or "ceiling"), a word of caution is in order. Our estimates, *m̂*<sub>1</sub> = 15,351 and *m̂*<sub>2</sub> = 1228, imply that we should expect about 10,000 applied WoS papers and 200 methodological WoS papers citing KJ and SJ in the future. We do not consider these figures very reliable. Indeed, early in the literature, it was pointed out by Heeler and Hustad (1980) and others (e.g., Hyman 1988) that the predictive ability of the Bass model depends on the generation of accurate estimates of *m*. Srinivasan and Mason (1986) report problems with convergence when the data set does not contain the peak time period (i.e., the inflection point of the curve). The parameter *m* is, again, under attack in Van den Bulte and Lilien (1997): there is evidence of downward bias in the estimation of the saturation point. Finally, in their review article, Chandrasekaran and Tellis (2018) point out the overall poor forecasting ability, the unstable parameter estimates and the difficulty of defining a clear stopping rule for the time window regarding data collection for the Bass model (since the data should, in theory, end when the entire market has adopted). We add one more critique to the list: the parameter *m*, in the logic of the Bass model, appears to be in the DNA of the process since the onset and to be immutable over time. In all versions of the model that we have considered, the "shocks" (i.e., the unexpected citing papers) have no permanent effect, in the sense that they determine, at most, a persistent (but transitory) departure from a path that eventually leads to *m*. In the spirit of the unit roots literature, so much inspired by the contribution of Katarina Juselius and Søren Johansen, our suggestion for further methodological research is to try to conceive a new model in which the shocks are allowed to have a permanent effect on the "ceiling". We think that this is absolutely needed in applications, such as the bibliometric ones, where the notion of "population at risk" or "potential" is not obvious. However, in marketing, epidemiology, or the analysis of technological innovation as well, the final diffusion is likely to be influenced in a crucial way by events that are largely unpredictable; therefore, pretending that the same differential equation (where *m* is fixed since *t* = 0) drives the dynamics of the process along its entire history might not be a realistic representation of the observed phenomena.

**Author Contributions:** The authors contributed equally to the paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** The data are available from the authors upon request.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Bass vs. Autoregressive Models**

It is interesting to observe that (12) can be regarded as a univariate AR(1) model for *C<sub>t</sub>* with state-dependent parameters. In fact, the model can be rewritten as follows:

$$
\Delta C_t = \mu + \pi_t C_{t-1} + u_t \tag{A1}
$$

with

$$\begin{aligned} \mu &= pm, \\ \pi_t &= -p + q\left(1 - \frac{C_{t-1}}{m}\right). \end{aligned}$$

*m* can be seen as a steady state for *C<sub>t</sub>*: in fact, if *C*<sub>*t*−1</sub> = *m*, then *π<sub>t</sub>* = −*p*, so that in the absence of shocks (i.e., *u<sub>t</sub>* = 0), we have Δ*C<sub>t</sub>* = *pm* − *pm* = 0. The state-dependent parameter *π<sub>t</sub>* controls the strength of the adjustment to the steady state.


When *C*<sub>*t*−1</sub> < *m*(1 − *p*/*q*), *π<sub>t</sub>* > 0 and the process is locally explosive; when *C*<sub>*t*−1</sub> = *m*(1 − *p*/*q*), *π<sub>t</sub>* = 0 and the system locally behaves like a random walk with drift. When *C*<sub>*t*−1</sub> > *m*(1 − *p*/*q*), then *π<sub>t</sub>* < 0 so that the system starts adjusting. An illustrative example, based on the estimated parameters for the methodological index, is given in Figure A1.

**Figure A1.** Illustration of *π<sub>t</sub>* as a function of *C*<sub>*t*−1</sub>, with *p* = 0.00549, *q* = 0.0242, *m* = 1227.7 as in the estimated equation for *C*<sub>*M*,*t*</sub>.
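As a minimal sketch of the mechanism behind Figure A1, the following Python snippet evaluates *π<sub>t</sub>* = −*p* + *q*(1 − *C*<sub>*t*−1</sub>/*m*) at the parameter values quoted in the caption and computes the level of *C*<sub>*t*−1</sub> at which *π<sub>t</sub>* changes sign. The function and variable names are illustrative and are not taken from the paper.

```python
# Parameter values quoted in the caption of Figure A1 (methodological index).
p, q, m = 0.00549, 0.0242, 1227.7


def pi_t(c_prev, p=p, q=q, m=m):
    """State-dependent coefficient pi_t = -p + q * (1 - C_{t-1} / m)."""
    return -p + q * (1.0 - c_prev / m)


# Level of C_{t-1} at which pi_t switches from positive to negative.
threshold = m * (1.0 - p / q)

if __name__ == "__main__":
    for c_prev in (0.0, threshold, m):
        print(f"C_(t-1) = {c_prev:8.1f}   pi_t = {pi_t(c_prev):+.5f}")
```

At *C*<sub>*t*−1</sub> = 0 the coefficient equals *q* − *p*, it crosses zero at the threshold, and it equals −*p* at the steady state *C*<sub>*t*−1</sub> = *m*.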

Similarly, the model (19) can be seen as a univariate AR(2) model for *C<sub>t</sub>* with state-dependent parameters:

$$
\Delta C_t = \mu + \pi_t C_{t-1} + \gamma \Delta C_{t-1} + u_t \tag{A2}
$$

with:

$$\begin{aligned} \mu &= -\alpha m p, \\ \pi_t &= \alpha \left( p - q \left( 1 - \frac{C_{t-1}}{m} \right) \right), \\ \gamma &= 1 + \alpha, \end{aligned}$$

so that when *α* = −1, (A2) collapses into (A1). We remark that, as long as *α* is negative, the sign of *π<sub>t</sub>* is the same in both models and depends only on the sign of *C*<sub>*t*−1</sub> − *m*(1 − *p*/*q*). The magnitude of *π<sub>t</sub>* is instead affected by *α*: everything else being fixed, when −1 < *α* < 0, the process is less explosive at the beginning, and the strength of adjustment is weaker in the end, as compared to the case *α* = −1.

Finally, the model (28) can be seen as a VAR(2) model for *C<sub>t</sub>* = [*C*<sub>1,*t*</sub>, ..., *C*<sub>*n*,*t*</sub>] with state-dependent parameters:

$$
\Delta \mathbf{C}_t = \boldsymbol{\mu} + \boldsymbol{\alpha} \boldsymbol{\beta}_t^{\prime} \mathbf{C}_{t-1} + \boldsymbol{\Gamma} \Delta \mathbf{C}_{t-1} + \mathbf{u}_t \tag{A3}
$$

where

$$\begin{array}{l} \mathbf{C}_t = \begin{bmatrix} C_{1,t} \\ \vdots \\ C_{n,t} \end{bmatrix}, \quad \mathbf{u}_t = \begin{bmatrix} u_{1,t} \\ \vdots \\ u_{n,t} \end{bmatrix}, \quad \boldsymbol{\mu} = -\begin{bmatrix} \sum_{j=1}^{n} \alpha_{1j} p_j m_j \\ \vdots \\ \sum_{j=1}^{n} \alpha_{nj} p_j m_j \end{bmatrix}, \\\\ \underset{n \times n}{\boldsymbol{\alpha}} = \begin{bmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & \ddots & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{bmatrix}, \quad \boldsymbol{\beta}_t = \operatorname{diag}\left\{ p_i - q_i \left( 1 - \frac{C_{i,t-1}}{m_i} \right) \right\}, \quad \boldsymbol{\Gamma} = I_n + \boldsymbol{\alpha}. \end{array}$$

It is easily seen that, when *α* is diagonal, (A3) collapses into *n* seemingly unrelated equations such as (A2), while when *α* = −*I<sub>n</sub>*, (A3) collapses into *n* seemingly unrelated equations such as (A1).
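The collapse of (A3) into (A2) and (A1) can be checked numerically. The sketch below, in plain Python, computes one step of Δ*C<sub>t</sub>* from the multivariate recursion and from the univariate one, writing the scalar coefficient as *α*(*p* − *q*(1 − *C*<sub>*t*−1</sub>/*m*)), i.e., the *n* = 1 case of *αβ<sub>t</sub>* in (A3). The function names and all numerical values are hypothetical and serve only as an illustration.

```python
def step_a2(c_prev, dc_prev, alpha, p, q, m, u=0.0):
    """One step of the univariate AR(2) form: returns Delta C_t."""
    mu = -alpha * m * p
    pi = alpha * (p - q * (1.0 - c_prev / m))
    gamma = 1.0 + alpha
    return mu + pi * c_prev + gamma * dc_prev + u


def step_a3(c_prev, dc_prev, alpha, p, q, m, u=None):
    """One step of the VAR(2) form; alpha is an n x n nested list."""
    n = len(c_prev)
    u = u if u is not None else [0.0] * n
    # Diagonal elements of beta_t.
    beta = [p[j] - q[j] * (1.0 - c_prev[j] / m[j]) for j in range(n)]
    out = []
    for i in range(n):
        mu_i = -sum(alpha[i][j] * p[j] * m[j] for j in range(n))
        adj_i = sum(alpha[i][j] * beta[j] * c_prev[j] for j in range(n))
        # Gamma = I_n + alpha applied to the lagged difference.
        lag_i = dc_prev[i] + sum(alpha[i][j] * dc_prev[j] for j in range(n))
        out.append(mu_i + adj_i + lag_i + u[i])
    return out


if __name__ == "__main__":
    # Hypothetical parameter values and states, chosen only for illustration.
    p, q, m = [0.004, 0.0055], [0.05, 0.024], [15000.0, 1200.0]
    c_prev, dc_prev = [2000.0, 600.0], [40.0, 8.0]
    alpha_diag = [[-0.3, 0.0], [0.0, -0.5]]
    print(step_a3(c_prev, dc_prev, alpha_diag, p, q, m))
    print([step_a2(c_prev[i], dc_prev[i], alpha_diag[i][i], p[i], q[i], m[i])
           for i in range(2)])
```

With a diagonal *α* the two printed lists coincide; with off-diagonal elements different from zero, each series also reacts to the disequilibrium and to the lagged change of the other.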

#### **Appendix B. Sensitivity to** *ω*

As discussed in Section 2, the parameter *ω* in (1) and (2) controls the weight of the "Mainly Applied" (MA) and "Mainly Methodological" (MM) papers in the aggregate indices *c*<sub>1,*t*</sub> (applied) and *c*<sub>2,*t*</sub> (methodological). Meaningful values of *ω* are in the range (0.5; 1): with *ω* = 0.5 the papers classified as MA and MM are essentially pooled together, and allowed to contribute evenly to both indices. In the opposite polar case, *ω* = 1, MA (or MM) is considered equivalent to PA (or PM). We observe that, in principle, instead of a single weight *ω*, it would be possible to consider two different weights for MA and MM papers, for example, *ω<sub>A</sub>* and *ω<sub>M</sub>*, defining the following:

$$\begin{aligned} c_{1,t} &= c_{PA,t} + \omega_A c_{MA,t} + (1 - \omega_M) c_{MM,t} \\ c_{2,t} &= c_{PM,t} + \omega_M c_{MM,t} + (1 - \omega_A) c_{MA,t} \end{aligned}$$

We remark, however, that in our dataset, any value 0.5 < *ω<sub>M</sub>* < 1 would leave the two indices essentially unchanged, since there are only 92 papers classified as MM compared with 716 classified as PM and 4198 classified as PA; therefore, our choice to set *ω<sub>M</sub>* = *ω<sub>A</sub>* is a minor problem. Conversely, in our dataset, the critical issue is *ω<sub>A</sub>*, mainly because of its impact on *c*<sub>2,*t*</sub>: in fact, there are 1451 papers classified as MA and 716 classified as PM, so that, setting *ω<sub>A</sub>* = 0.5, the MA papers would be as influential as the PM papers in the index *c*<sub>2,*t*</sub>. This argument induced us to set *ω<sub>A</sub>* = *ω<sub>M</sub>* = 0.85. With this choice, (1 − *ω<sub>A</sub>*) is relatively close to 0 so that *c*<sub>2,*t*</sub> reflects mainly the 716 PM papers and, therefore, is a more reliable measure of the methodological research. Notice that this choice has a minor impact on the reliability of the applied index *c*<sub>1,*t*</sub> for two reasons: (i) the 4198 papers classified as PA outnumber the 1451 MA papers, and (ii) the correlation between *c*<sub>*PA*,*t*</sub> and *c*<sub>*MA*,*t*</sub> is quite high, 69.5% (see Figure 1), whereas the correlation between *c*<sub>*PM*,*t*</sub> and *c*<sub>*MA*,*t*</sub> is only 1.7%. Finally, notice that setting *ω<sub>A</sub>* = 0.85 would make it approximately equal to *n<sub>PA</sub>*/(*n<sub>PA</sub>* + *n<sub>PM</sub>*) = 0.8545: in practice, this corresponds to the assumption that the share of "applied research" of an MA paper is similar, on average, to the share of applied research in the econometric literature referring to KJ and SJ papers in general. Let us now discuss how a different choice of *ω* would affect our results.
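To make the role of the weights concrete, the following Python sketch applies the two-weight definition above to the total paper counts quoted in the text (4198 PA, 1451 MA, 92 MM, 716 PM). Applying the formula to totals rather than to the quarterly series is a simplification adopted here for illustration only, and the function name `composite_indices` is hypothetical.

```python
# Total numbers of citing papers per class, as quoted in the text.
N_PA, N_MA, N_MM, N_PM = 4198, 1451, 92, 716


def composite_indices(c_pa, c_ma, c_mm, c_pm, omega_a, omega_m=None):
    """Return (c1, c2): the applied and the methodological composite index."""
    if omega_m is None:  # single-weight case, omega_A = omega_M
        omega_m = omega_a
    c1 = c_pa + omega_a * c_ma + (1.0 - omega_m) * c_mm
    c2 = c_pm + omega_m * c_mm + (1.0 - omega_a) * c_ma
    return c1, c2


if __name__ == "__main__":
    for omega in (0.5, 0.85, 1.0):
        c1, c2 = composite_indices(N_PA, N_MA, N_MM, N_PM, omega)
        print(f"omega = {omega:4.2f}   applied = {c1:8.2f}   methodological = {c2:8.2f}")
```

Whatever the weights, the two indices always add up to the total number of classified papers, so changing *ω* only reallocates the MA and MM papers between the two indices.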

As illustrated in Table A1, changing *ω* considerably affects the magnitude of the indices (especially *c*<sub>2,*t*</sub>), as well as the correlation between them. To explain the impact on the magnitude, remember that, when *ω* = 1, the 1451 MA papers are treated *de facto* as "Purely Applied" (PA) ones, whereas when *ω* = 0.5, only half of them (725.5) are treated as applied, while the other half are treated as methodological and, therefore, also contribute to the methodological index *c*<sub>2,*t*</sub>. <sup>25</sup> The impact on the correlation is instead explained by the fact that the correlation between *c*<sub>*PA*,*t*</sub> and *c*<sub>*MA*,*t*</sub> is quite high (69.5%), whereas the correlation between *c*<sub>*PM*,*t*</sub> and *c*<sub>*MA*,*t*</sub> is negligible (1.7%); see Figure 1.

**Table A1.** Sensitivity to *ω*: impact of *ω* on some characteristics of the composite citation indices.


Given this impact of *ω* on the composite indices, it is interesting to analyze to what extent the results of the econometric model depend on it. The analysis is limited to the general model (29) since our analysis shows that it is preferable to its restricted counterparts (27) and (18). In this appendix, we show that our results are essentially robust to changes in *ω*.

Table A2 shows how the estimates of the reduced-form parameters change when *ω* is changed.


**Table A2.** Sensitivity to *ω*: ML estimates of the reduced form parameters based on model (29).

The sign and significance of the parameters are essentially the same irrespective of *ω*. Interestingly, the differences in the correlation between the two indices induced by *ω*, illustrated in Table A1, are reflected in different estimates of *ρ*, whereas the estimates of *α*<sub>1,2</sub> and *α*<sub>2,1</sub> (i.e., the parameters controlling the dynamic interaction between the two processes) remain quite stable. Due to this, the (unreported) patterns of the standardized IRs and cumulative IRs computed with *ω* = 0.5 and *ω* = 1 are very similar to those illustrated in Figures 4 and 5 for *ω* = 0.85. Unreported results show that the misspecification tests are also qualitatively unchanged for all values of *ω* with respect to those reported in Table 5 for model (29): homoskedasticity and uncorrelatedness appear acceptable for any value of *ω*.

Table A3 illustrates how the structural parameters change when *ω* is changed. The influence on the *m*s has an obvious interpretation: as *ω* increases, a larger share of the MA papers is removed from the methodological index (so that *m*<sub>2</sub> declines) and added to the applied index (so that *m*<sub>1</sub> increases). As for the *p*s and the *q*s, we observe that as *ω* increases, *p*<sub>1</sub> and *q*<sub>1</sub> decrease, whereas *p*<sub>2</sub> and *q*<sub>2</sub> increase. As a consequence of these changes, the timing of the peaks, obtained by formula (9), changes: specifically, as *ω* increases, the peak in the applied literature moves to the right, whereas the peak in the methodological literature moves to the left. The distance between the peaks is 13 years with *ω* = 0.5, and about 24 years when *ω* = 1. This is not surprising: as illustrated in Figure 1, the dynamic behavior of the MA papers closely resembles that of the PA papers and, therefore, when 50% of them are considered methodological, *c*<sub>1,*t*</sub> and *c*<sub>2,*t*</sub> become more similar, and the two peaks become closer (although they still remain quite far away from each other).

An interesting consequence of the fact that, increasing *ω*, the peak of the applied literature moves ahead is that the quality of the estimates of the structural parameters for the applied curve (already quite poor with *ω* = 0.5) decreases considerably: when *ω* = 1, the standard error associated with *m̂*<sub>1</sub> is as large as 9516, and the standard error associated with the estimated timing of the peak *t̂*<sub>1</sub><sup>*P*</sup> turns out to be 42 quarters, more than 10 years. It is a well-known fact in the literature that the estimates of the Bass model are quite poor if the sample period does not include the inflection point, which is quite likely the case for the applied literature if we trust the point estimates, and even more so when *ω* = 1.



#### **Notes**

<sup>1</sup> Many thanks for the provision of the initial Web of Science data to Evi Sachini, Antonis Kardasis and Penny Nikolaidou of the National Documentation Centre/N.H.R.F. based in Athens, Greece.


<sup>10</sup> Maintaining the assumption that *u<sub>t</sub>* is i.i.d. normal, an alternative estimation strategy could be based on Non Linear Least Squares (NLLS). Estimates of *m*, *p* and *q* would be based on the following:

$$\min_{m,p,q} \sum_{t=2}^{T} \left( c_t - mp - (q-p) C_{t-1} + \frac{q}{m} C_{t-1}^2 \right)^2.$$

The advantage of NLLS is that it provides directly the estimates of the parameters of interest (*m*, *p* and *q*) and the corresponding standard error, without having to resort to the delta method. The disadvantage is that convergence of the numerical optimization routines is sometimes not easy: this is partly due to the strong collinearity, and partly to the fact that the optimization problem has two solutions. In the following, we opt for OLS and the delta method.
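As a minimal sketch of the OLS route, the Python code below regresses *c<sub>t</sub>* on a constant, *C*<sub>*t*−1</sub> and *C*<sub>*t*−1</sub><sup>2</sup>, and then maps the three regression coefficients back to (*m*, *p*, *q*). It assumes NumPy is available and uses a noise-free simulated path, so it only illustrates the algebra of the mapping; the sample length is hypothetical, and the delta-method standard errors are not computed here.

```python
import numpy as np


def simulate_bass(p, q, m, n_periods):
    """Noise-free discrete Bass path: cumulative C_0, ..., C_T with C_0 = 0."""
    C = np.zeros(n_periods + 1)
    for t in range(1, n_periods + 1):
        C[t] = C[t - 1] + m * p + (q - p) * C[t - 1] - (q / m) * C[t - 1] ** 2
    return C


def bass_ols(C):
    """OLS of c_t on [1, C_{t-1}, C_{t-1}^2], then map (a, b, g) to (m, p, q)."""
    c = np.diff(C)                      # c_t = C_t - C_{t-1}
    lag = C[:-1]
    X = np.column_stack([np.ones_like(lag), lag, lag ** 2])
    (a, b, g), *_ = np.linalg.lstsq(X, c, rcond=None)
    # a = m p, b = q - p, g = -q / m  =>  g m^2 + b m + a = 0.
    # The quadratic has two roots; with g < 0 and a > 0 this is the positive one.
    m = (-b - np.sqrt(b ** 2 - 4.0 * a * g)) / (2.0 * g)
    return m, a / m, -g * m


if __name__ == "__main__":
    # p, q, m as quoted for the methodological index; 115 periods is hypothetical.
    C = simulate_bass(p=0.00549, q=0.0242, m=1227.7, n_periods=115)
    print(bass_ols(C))
```

The second root of the quadratic is negative, which is one way to see why the NLLS problem has two solutions, only one of which is meaningful.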


<sup>13</sup> Actually, Boswijk et al. (2009) propose a heteroskedastic version of the model, where *u<sub>t</sub>* = diag{*c*<sub>*i*,*t*−1</sub><sup>*φ*</sup>} *ε<sub>t</sub>*, with *ε<sub>t</sub>* = [*ε*<sub>1,*t*</sub>, ..., *ε*<sub>*n*,*t*</sub>] ∼ i.i.d. *N<sub>n</sub>*(**0**, **Ω**<sub>*ε*</sub>) and *φ* fixed to either 1/2 or 1. In this paper we only briefly discuss the heteroskedastic BFF model, since in our application suitable heteroskedasticity tests seem to accept the hypothesis of homoskedasticity.

<sup>14</sup> Precisely,

$$\underset{(3n^2+n) \times 3n}{\mathbf{H}_{\beta}} = \operatorname{diag}\{\mathbf{H}_i\}, \qquad \underset{(3n^2+n) \times 1}{\mathbf{h}_{\beta}} = \operatorname{diag}\{\mathbf{h}_i\}\, \mathbf{1}_{n \times 1}$$

with

$$\mathbf{H}_i = \begin{bmatrix} \mathbf{u}_{n,i} \otimes [\mathbf{u}_{3,2}, \mathbf{u}_{3,3}] & \mathbf{0}_{3n,1} \\ \mathbf{0}_{1,2} & 1 \end{bmatrix}, \qquad \mathbf{h}_i = \begin{bmatrix} \mathbf{u}_{n,i} \otimes \mathbf{u}_{3,1} \\ 0 \end{bmatrix}, \quad i = 1, \ldots, n.$$


$$\mathbf{W}_t^{-1}\mathbf{Y}_t = \left[ \left( \mathbf{X}_{t-1}^{\prime} \otimes \mathbf{W}_t^{-1} \right) \left( \boldsymbol{\beta} \otimes I_n \right) \right] \operatorname{vec}(\boldsymbol{\alpha}) + \mathbf{u}_t$$

or the following:

$$\mathbf{W}_t^{-1}\mathbf{Y}_t = \left[ \left( \mathbf{X}_{t-1}^{\prime} \otimes \mathbf{W}_t^{-1} \right) \left( I_{3n+1} \otimes \boldsymbol{\alpha} \right) \right] \operatorname{vec}\left( \boldsymbol{\beta}^{\prime} \right) + \mathbf{u}_t$$

The first equation allows *α* to be estimated by GLS when *β* and **Ω** are known, while the second allows *β* to be estimated by GLS when *α* and **Ω** are known. A "switching" iterative algorithm similar to Hansen (2003) is therefore possible also in this case. Of course, linear restrictions on vec(*α*) or vec(*β*) are easily dealt with also in this case.
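The two vectorizations rest on the identity vec(*ABC*) = (*C*′ ⊗ *A*) vec(*B*), and they can be checked numerically. The sketch below assumes NumPy, *n* = 2, a (3*n* + 1) × *n* matrix *β* (as implied by the factor *I*<sub>3*n*+1</sub>) and randomly generated matrices; all values are hypothetical, and the code verifies only the algebra, not the GLS steps of the switching algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2                                   # number of series (illustrative)
k = 3 * n + 1                           # rows of beta, as implied by I_{3n+1}

alpha = rng.normal(size=(n, n))
beta = rng.normal(size=(k, n))
X = rng.normal(size=(k, 1))             # plays the role of X_{t-1}
W_inv = np.diag(1.0 / rng.uniform(0.5, 2.0, size=n))   # plays the role of W_t^{-1}


def vec(A):
    """Stack the columns of A into a single vector (column-major order)."""
    return A.reshape(-1, order="F")


# Systematic part of the weighted model: W^{-1} alpha beta' X.
direct = (W_inv @ alpha @ beta.T @ X).ravel()

# Linear in vec(alpha) for given beta.
via_alpha = np.kron(X.T, W_inv) @ np.kron(beta, np.eye(n)) @ vec(alpha)

# Linear in vec(beta') for given alpha.
via_beta = np.kron(X.T, W_inv) @ np.kron(np.eye(k), alpha) @ vec(beta.T)

if __name__ == "__main__":
    print(np.allclose(direct, via_alpha), np.allclose(direct, via_beta))
```

Because the same systematic part is linear in vec(*α*) for given *β* and linear in vec(*β*′) for given *α*, each step of a switching scheme reduces to a linear regression.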


intuitive way: the peak of the cumulative IRs occurs earlier, and the intensity becomes weaker. This can be explained in the light of the discussion presented in Appendix A: in the initial stages of the process, when both *C*<sub>1,*t*</sub> and *C*<sub>2,*t*</sub> are close to zero and much lower than *m*<sub>1</sub> and *m*<sub>2</sub>, respectively, the processes behave as explosive AR(2), and therefore, the shocks are initially amplified; however, as *C*<sub>1,*t*</sub> and *C*<sub>2,*t*</sub> grow, the processes become less and less explosive, until eventually they start adjusting and the cumulative impact of the shock is driven down to zero. However, some characteristics of the cumulative IRs do not change, even when the initial conditions are modified: the cumulative cross impact seems to be relatively stronger from the methodological to the applied literature than vice versa.

<sup>23</sup> Defining *ψ* = [*t<sub>P</sub>*, *c̄<sub>P</sub>*, *C̄<sub>P</sub>*]′, starting from (9)–(11), we have the following:

$$J_{\psi,\theta} = \frac{\partial \psi}{\partial \theta'} = \begin{bmatrix} (p+q)^2 & 0 & 0 \\ 0 & 4q & 0 \\ 0 & 0 & 2q \end{bmatrix}^{-1} \begin{bmatrix} 0 & \ln p - \ln q - 1 - \frac{q}{p} & \ln p - \ln q + 1 + \frac{p}{q} \\ (p+q)^2 & 2m(p+q) & \frac{m}{q}(p+q)(q-p) \\ q-p & -m & m\frac{p}{q} \end{bmatrix}$$

The variance-covariance matrix for *ψ̂* is then obtained as follows:

$$
\Sigma_{\hat{\psi}} = J_{\psi,\theta}\, \Sigma_{\hat{\theta}}\, J_{\psi,\theta}^{\prime}.
$$
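As a hedged illustration of this delta-method step, the Python sketch below codes the Jacobian for *θ* = [*m*, *p*, *q*]′ and the sandwich formula for the variance-covariance matrix. It assumes that (9)–(11) are the standard Bass expressions *t<sub>P</sub>* = ln(*q*/*p*)/(*p* + *q*), *c̄<sub>P</sub>* = *m*(*p* + *q*)<sup>2</sup>/(4*q*) and *C̄<sub>P</sub>* = *m*(*q* − *p*)/(2*q*), whose derivatives reproduce the entries of the Jacobian. The covariance matrix of *θ̂* used in the example is purely hypothetical.

```python
import math


def peak_quantities(m, p, q):
    """psi = [t_P, c_P, C_P]: peak time, level at the peak, cumulative at the peak."""
    t_p = (math.log(q) - math.log(p)) / (p + q)
    c_p = m * (p + q) ** 2 / (4.0 * q)
    big_c_p = m * (q - p) / (2.0 * q)
    return [t_p, c_p, big_c_p]


def jacobian(m, p, q):
    """Analytic d psi / d theta' with theta = [m, p, q]."""
    lnpq = math.log(p) - math.log(q)
    s = p + q
    return [
        [0.0, (lnpq - 1.0 - q / p) / s ** 2, (lnpq + 1.0 + p / q) / s ** 2],
        [s ** 2 / (4 * q), 2 * m * s / (4 * q), (m / q) * s * (q - p) / (4 * q)],
        [(q - p) / (2 * q), -m / (2 * q), (m * p / q) / (2 * q)],
    ]


def delta_method(J, sigma_theta):
    """Sigma_psi = J Sigma_theta J' for small square matrices given as nested lists."""
    n = len(J)
    JS = [[sum(J[i][k] * sigma_theta[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(JS[i][k] * J[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    m, p, q = 1227.7, 0.00549, 0.0242          # values quoted for Figure A1
    # Hypothetical covariance matrix of (m, p, q), for illustration only.
    sigma_theta = [[100.0 ** 2, 0.0, 0.0],
                   [0.0, 0.0005 ** 2, 0.0],
                   [0.0, 0.0, 0.002 ** 2]]
    sigma_psi = delta_method(jacobian(m, p, q), sigma_theta)
    print([math.sqrt(sigma_psi[i][i]) for i in range(3)])
```

The printed values are the implied standard errors of the three peak quantities under the hypothetical covariance matrix; with estimated covariances in place of the placeholder, the same two functions would give the standard errors of *ψ̂*.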


#### **References**

Bass, Frank M. 1969. A new product growth for model consumer durables. *Management Science* 15: 215–27.

