2.1. Data and Descriptive Statistics
This study’s data are drawn from the fourth wave of the Nigerian General Household Survey Panel, a recent nationally representative dataset. The data were collected in 2018/2019 by the Nigeria National Bureau of Statistics in collaboration with the World Bank Living Standards Measurement Study–Integrated Surveys on Agriculture (LSMS-ISA) program. The sample comprised approximately 5000 households across 36 states and the Federal Capital Territory. Dedehouanou and McPeak [23] and Dilion et al. [24] also describe this dataset.
The survey collected agricultural data at disaggregated levels (crop, plot, and household). Detailed information on inputs, harvested outputs, and soil characteristics was collected at the plot level. In contrast to the three previous waves, this LSMS-ISA wave recorded whether farmers used improved seeds on a specific plot. This feature was essential to the identification strategy because it allowed farm plots to be separated into improved and traditional crop plots. Other plot characteristics, such as soil quality, organic fertilizer use, plot slope, and whether machines were used on the plot, were also included in the questionnaire.
At the crop level, data are available on the amount of seed planted, fertilizer and chemical use, total labor used, and the approximate percentage of the plot planted with each crop, which helps account for the crop plot size when two or more crops are interplanted. The key socio-economic data collected at the household level include characteristics of the household head (such as sex and age), household assets and endowments (such as household size, wealth index, and farm size), and other household characteristics, such as participation in non-farm enterprises, access to extension agents, and credit access. The data and documentation are freely available online (for information on the LSMS-ISA project and links to the data, see https://microdata.worldbank.org/index.php/catalog/lsms/ (accessed on 20 January 2022)).
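The area adjustment for interplanted crops mentioned above can be illustrated with a minimal sketch. It is not the survey's own procedure: the function and argument names (`crop_area_ha`, `plot_area_ha`, `pct_planted`, `harvest_kg`) and the sample values are hypothetical and do not correspond to actual LSMS-ISA variable names.

```python
def crop_area_ha(plot_area_ha, pct_planted):
    """Area attributed to one crop on a plot, in hectares.

    pct_planted is the approximate percentage (0-100) of the plot
    planted with the crop.
    """
    if not 0 <= pct_planted <= 100:
        raise ValueError("pct_planted must be between 0 and 100")
    return plot_area_ha * pct_planted / 100.0


def yield_kg_per_ha(harvest_kg, plot_area_ha, pct_planted):
    """Crop yield based on the crop-specific area, not the whole plot."""
    area = crop_area_ha(plot_area_ha, pct_planted)
    if area == 0:
        raise ValueError("crop area is zero; yield is undefined")
    return harvest_kg / area


# Hypothetical plot: 0.8 ha, maize on 50% of it, 600 kg harvested.
example_yield = yield_kg_per_ha(600.0, 0.8, 50.0)
```

Dividing by the whole plot area instead would understate the yield of each interplanted crop, which is why the planted share matters when plots are compared.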
The descriptive statistics of the variables are summarized in Table 1, which shows that 11.8% of the sample farm plots were planted with improved maize varieties (Table 1, row 1). Several studies have also reported low adoption of improved maize in Nigeria [25,26,27]. In contrast, the Consultative Group for International Agricultural Research (CGIAR) reported that 95% of Nigerian maize land was under improved maize varieties in 2012 [28]. Two reasons may explain the large difference between the adoption rate in this study and the CGIAR estimate. First, the CGIAR estimates are based on studies in a specific region and on expert opinions [28,29], which may not reflect national adoption. Second, farmers who previously adopted improved varieties may have reverted to traditional varieties by 2019 [30,31].
Before estimating the regressions discussed in the next section, it is instructive to examine the differences between improved and traditional variety plots in the data. We present the mean differences in characteristics between adopters and non-adopters of improved maize varieties. The first three columns of Table 2 show that adopters of improved varieties are systematically different from traditional variety growers. For example, adopters tend to be wealthier and younger and are more likely to have access to extension services than traditional variety growers.
In addition, Table 2 shows that the average yield (production) of improved variety growers was 40% higher than that of traditional variety growers. The yield difference in this study is very similar to those reported by Wossen et al. [14] and Abdoulaye et al. [3]. On the other hand, Table 2 also shows that adopters use more resources (such as land, seed, fertilizer, and chemicals). Hence, the descriptive analysis alone cannot establish whether the improved varieties perform better than the traditional varieties. The next section discusses the econometric models used to isolate the effects of improved varieties on crop yields.
2.2. Empirical Model
This study uses a multistage approach to account for both observable and unobservable sources of heterogeneity between improved and traditional variety growers. First, propensity score matching (PSM) was used to address selection bias from observable characteristics by pairing improved-variety plots with traditional-variety plots that have comparable time-invariant features. Second, after obtaining the matched sample, a linear endogenous treatment–effect method was used to control for unobservable characteristics. A stochastic production frontier (SPF) was then estimated to test whether the correction for sample selection is necessary [32]. Finally, the stochastic metafrontier proposed by Huang et al. [33] was employed to separate the impact on technical efficiency from technological progress. Details are as follows.
2.2.1. Propensity Score Matching
Although PSM does not account for estimation bias due to unobserved characteristics, we first employed PSM, as in other studies (for example, Refs. [3,15,34]), to create an appropriate counterfactual dataset and mitigate biases arising from observable factors. Empirically, the adoption model can be expressed as follows:
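A form of Equation (1) consistent with the definitions that follow is sketched here. The linear-index notation is an assumption; in practice the propensity score is usually obtained from a probit or logit fit of such an index.

```latex
\begin{equation}
D_i = \alpha' X_i + \varepsilon_i \tag{1}
\end{equation}
```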
In Equation (1), D is a binary variable equal to 1 if improved varieties are adopted and 0 otherwise; α is a vector of parameters to be estimated; i indexes the ith plot; and ε is a random error term. Finally, X is a vector of household and plot-level characteristics. Specifically, household characteristics included the household head’s age, sex, and education. The plot-level characteristics included farm size, plot slope, soil quality, and machine use.
After identifying suitably matched samples, the effect of adopting improved varieties was measured as the average treatment effect on the treated (ATT) [35], which denotes the average impact of planting improved varieties on the farm plots that adopted them. Following Villano et al. [34], the empirical model is expressed as follows:
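In the usual potential-outcomes notation, Equation (2) can be sketched as below. The symbols are assumed here: $Y_1$ and $Y_0$ are the outcomes with and without adoption, and $D = 1$ marks improved-variety plots.

```latex
\begin{equation}
ATT = E(Y_1 \mid D = 1) - E(Y_0 \mid D = 1) \tag{2}
\end{equation}
```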
where $Y_1$ and $Y_0$ are the average output values in kilograms per hectare for the improved and traditional varieties, respectively.
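The matching step and the ATT can be illustrated with a short sketch. It assumes that propensity scores have already been estimated and uses one-to-one nearest-neighbor matching with replacement. The text does not state which matching algorithm was applied, so this choice, the function name `att_nearest_neighbor`, and the sample values are all hypothetical.

```python
def att_nearest_neighbor(scores, treated, outcome):
    """ATT from 1-to-1 nearest-neighbor matching with replacement.

    scores  : estimated propensity scores, one per plot
    treated : 1 for improved-variety plots, 0 for traditional plots
    outcome : yield (kg/ha), one per plot
    """
    controls = [i for i, d in enumerate(treated) if d == 0]
    treated_idx = [i for i, d in enumerate(treated) if d == 1]
    if not controls or not treated_idx:
        raise ValueError("need at least one treated and one control plot")

    gaps = []
    for i in treated_idx:
        # Control plot whose propensity score is closest to that of plot i.
        match = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        gaps.append(outcome[i] - outcome[match])
    return sum(gaps) / len(gaps)


# Hypothetical data: two improved plots followed by three traditional plots.
scores = [0.70, 0.40, 0.65, 0.45, 0.10]
treated = [1, 1, 0, 0, 0]
outcome = [2000.0, 1500.0, 1600.0, 1300.0, 900.0]
att = att_nearest_neighbor(scores, treated, outcome)
```

In this toy sample the traditional plot with a score of 0.10 is never used as a match, which shows how matching discards controls that are poor counterfactuals for any adopter.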
2.2.2. Linear Endogenous Treatment–Effect Model
This study employs a linear endogenous treatment–effect model with an instrumental variable (IV) to account for the impact of unobserved characteristics. Because the IV estimation is based on the PSM-matched data, estimation bias from both observed and unobserved characteristics is addressed. However, when the endogenous regressor is binary, as it is here, a linear first-stage model may not be appropriate [35]. Empirically, our IV model can be specified as follows:
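A two-equation form consistent with this description is sketched below. The coefficient symbols ($\gamma$, $\delta$, $\beta$, $\theta$), the latent index $D_i^*$, and the error terms ($u_i$, $\varepsilon_i$) are assumed notation; only D, X, Z, and Y are named in the text.

```latex
\begin{align}
D_i^* &= \gamma' Z_i + \delta' X_i + u_i, \qquad D_i = \mathbf{1}[D_i^* > 0] \tag{3} \\
Y_i &= \beta D_i + \theta' X_i + \varepsilon_i \tag{4}
\end{align}
```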
In Equation (3), the instrument, Z, must meet the following criteria: (1) it is correlated with the adoption decision, and (2) it is uncorrelated with the error term of the outcome equation [35]. Following previous literature on agricultural technology adoption [3,36,37,38], sources of information on maize varieties, including extension agents and farmer networks, were used as instruments for adoption. According to technology adoption theory, farmers will adopt an improved variety only if they have access to information about it. Information about improved varieties is therefore expected to spread through extension services or neighboring farmers. A district-based instrumental variable, the share of improved variety growers in the enumeration area, was also used. This variable is a proxy for local adoption norms; it may influence farmers’ adoption of improved varieties but does not directly influence maize yield (Y) in Equation (4).
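The area-level instrument can be computed as in the following minimal sketch, which assigns to every observation the share of improved variety growers in its enumeration area (EA). The function name `ea_adoption_share` and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def ea_adoption_share(ea_ids, adopted):
    """Share of improved-variety growers in each enumeration area (EA).

    Returns one value per observation: the share for that observation's EA.
    """
    totals = defaultdict(int)
    adopters = defaultdict(int)
    for ea, d in zip(ea_ids, adopted):
        totals[ea] += 1
        adopters[ea] += d
    return [adopters[ea] / totals[ea] for ea in ea_ids]


# Hypothetical sample with two EAs, "A" and "B".
ea_ids = ["A", "A", "A", "A", "B", "B"]
adopted = [1, 0, 0, 0, 1, 1]
shares = ea_adoption_share(ea_ids, adopted)
```

A leave-one-out variant, which excludes the farmer's own adoption status from the share, is a common alternative; the text does not say which version was used.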
2.2.3. Stochastic Metafrontier Approach
Finally, this study employs stochastic metafrontier analysis, which has been widely used to measure the technology gap between observed output and the potential output that each farm could produce on the most productive frontier [34,39,40,41,42]. The main feature of this method is that firms are split into groups based on a priori sample separation information, such as variety type, ownership, or location. Once this classification has been made, a separate analysis is performed for each group. Inefficiency is then estimated relative to the group-specific frontier, and the difference in technology frontiers across groups is viewed as the technology gap [42,43]. This study employs the two-step stochastic metafrontier approach proposed by Huang et al. [33], whose second-step estimates have the statistical properties needed for inference.
We assume that the production functions have a translog functional form owing to its well-known flexibility in representing technology [43]. The first step of the stochastic metafrontier model can be written as follows:
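Following the general setup of Huang et al. [33], written for a single cross-section, the first-step group frontier can be sketched as below. The input index k, the coefficient symbols, and the explicit translog expansion are assumptions made for illustration.

```latex
\begin{align}
\ln Y_{ji} &= \ln f^{j}(X_{ji}) + v_{ji} - u_{ji} \tag{5} \\
\ln f^{j}(X_{ji}) &= \beta_0^{j} + \sum_{k} \beta_k^{j} \ln x_{kji}
  + \tfrac{1}{2} \sum_{k} \sum_{l} \beta_{kl}^{j} \ln x_{kji} \ln x_{lji} \notag
\end{align}
```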
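where $\ln Y_{ji}$ represents the natural logarithm of the total maize production in kilograms of the ith farm; $v_{ji}$ and $u_{ji}$ are the two components of the composed error term, $\varepsilon_{ji} = v_{ji} - u_{ji}$ (statistical noise and technical inefficiency, respectively). Subscript j denotes the jth group. This study had two groups (i.e., j = 1 and 2): traditional and improved varieties, respectively.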
Following Huang et al. [33], the stochastic metafrontier model estimated in the second step is expressed as:
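In notation adapted from Huang et al. [33] to a single cross-section (an assumption about the exact form used), the second step regresses the fitted group frontier on the metafrontier:

```latex
\begin{equation}
\ln \hat{f}^{j}(X_{ji}) = \ln f^{M}(X_{ji}) + v_{ji}^{M} - u_{ji}^{M} \tag{6}
\end{equation}
```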
The dependent variable is the maximum likelihood (ML) estimate of the group-specific frontier obtained from Equation (5) for each group. The superscript M denotes the metafrontier.
However, the estimation result of Equation (6) might be biased because of the endogeneity of improved varieties caused by unobserved characteristics. As discussed above, the observed characteristics can be controlled using a matched sample selected by PSM. Unobserved characteristics should also be controlled in Equation (5) to obtain an unbiased estimation result. Specifically, unobserved characteristics (such as farming ability) may affect both the error term in the selection equation (i.e., improved variety adoption) and the noise term in the stochastic frontier model, making the two correlated. To deal with biases from unobserved variables within an SPF formulation, we used the model introduced by Greene [32]. This method has been widely used in previous studies, such as Bravo-Ureta et al. [44] and Villano et al. [34].
After estimating the metafrontier, we calculated the technical efficiency and technology gap ratio (TGR) for the improved and traditional varieties. Specifically, the farm-level technical efficiency estimates can be obtained as follows:
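Under the composed-error form of the group frontier, farm-level technical efficiency takes the standard form sketched below. The subscripted symbols are assumed notation, with $u_{ji} \ge 0$ the inefficiency term and $v_{ji}$ the noise term of the group frontier.

```latex
\begin{equation}
TE_{ji} = \frac{Y_{ji}}{f^{j}(X_{ji})\, e^{v_{ji}}} = e^{-u_{ji}} \tag{7}
\end{equation}
```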
Technical efficiency (TE) denotes each farm’s distance from the frontier of its own variety group. Aside from TE, the distance from the group frontier to the metafrontier is expressed as:
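In the two-step approach, this distance is the ratio of the group frontier to the metafrontier, sketched below with assumed notation ($u_{ji}^{M} \ge 0$ is the one-sided term of the metafrontier regression):

```latex
\begin{equation}
TGR_{ji} = \frac{f^{j}(X_{ji})}{f^{M}(X_{ji})} = e^{-u_{ji}^{M}} \le 1 \tag{8}
\end{equation}
```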
The TGR was estimated for each variety (j = 1 for traditional varieties; j = 2 for improved varieties). As in O’Donnell et al. [42] and Huang et al. [33], a high TGR score indicates that the group frontier lies close to the metafrontier, so that this group has a yield advantage over groups with lower TGR values.
Finally, the metafrontier technical efficiency (MTE) is the product of the two previous metrics and gives the overall distance of each farm from the metafrontier, as follows:
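Written with assumed subscripts (j for the variety group, i for the farm), the product is:

```latex
\begin{equation}
MTE_{ji} = TGR_{ji} \times TE_{ji} \tag{9}
\end{equation}
```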
Note that the technical efficiency estimates (TE) alone cannot determine which variety has the most productive technology, because each is measured against its own group frontier. The TGR shows which variety frontier operates closest to the metafrontier, and the MTE compares each variety type against the metafrontier.
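The relationship among the three measures can be illustrated numerically. This sketch takes the one-sided terms as given, whereas in applied work they are replaced by conditional estimates from the fitted frontiers; the function name `efficiency_scores` and the input values are hypothetical.

```python
import math


def efficiency_scores(u_group, u_meta):
    """Return (TE, TGR, MTE) for one farm.

    u_group : non-negative inefficiency term from the group frontier
    u_meta  : non-negative one-sided term from the metafrontier step
    """
    if u_group < 0 or u_meta < 0:
        raise ValueError("one-sided terms must be non-negative")
    te = math.exp(-u_group)  # distance to the farm's own group frontier
    tgr = math.exp(-u_meta)  # distance of group frontier to metafrontier
    mte = te * tgr           # overall distance to the metafrontier
    return te, tgr, mte


# Hypothetical farm with u_group = 0.20 and u_meta = 0.10.
te, tgr, mte = efficiency_scores(0.20, 0.10)
```

Because both TE and TGR lie in (0, 1], the MTE can never exceed either of them, so a farm that is efficient within its own group can still be far from the metafrontier when its group's TGR is low.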