**5. An Empirical Application: Obtaining Disaggregated Information on Wages**

To illustrate the performance of the proposed estimator, we apply it to an empirical problem: disaggregating data on average wages for Spain. The most detailed information about non-agricultural wages in Spain is published in the Wage Structure Survey (*Encuesta de Estructura Salarial*). The complete version of this survey is conducted by the Spanish Statistical Office (*INE*) every four years, the 2010 edition being one of the most recent. In intermediate years, however, only partial data are collected and the microdata are not released. If, for example, we want to explore differences in average wages across industries by gender and type of working day in a year in which the complete statistical operation is not conducted, the only information available is at the aggregate level. This is the case in 2011, for which the only available data are the aggregates reported in Table 2, which do not allow differences between male and female workers to be analyzed by industry:


**Table 2.** Available information on annual wages by industry, type of working day and gender. Wage Structure Survey, 2011.



In such a context, a researcher who wants to study gender wage gaps across industries needs an estimation procedure that produces disaggregated values for this specific year, since the official aggregated data do not allow for this type of analysis. The values in Table 2 provide the aggregates required for applying our DWP estimator. Vector *y*, with dimension (18 × 1) and elements *yi*·, contains the mean wage for each industry, and our estimation targets are the unknown elements *yij*, where sub-index *j* refers to the type of worker (classified into four categories: full-time males, full-time females, part-time males and part-time females). The information in Table 2 is also useful for setting a regressor (*xij*) for our analysis. In particular, the aggregate mean wages for each type of worker (*xi*·, in the four bottom rows of Table 2) are used for this purpose, assuming that *xi*· = *xij*, *j* = 1, ... , 4. The additional information required to define the weights (θ*ij*) has been taken from the Spanish Labor Force Survey (EPA) for that year, which reports the number of workers classified by industry, type of working day and gender. With all this information, the DWP estimator has been applied, specifying the same support vectors as those described for the numerical simulation in the previous section. The estimates obtained are shown in Table 3:

**Table 3.** DWP estimates on disaggregated mean annual wages (EUR) by industry, type of working day and gender, 2011.
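The role of the weights θ*ij* in this setup can be illustrated with a minimal sketch. It assumes that the weights are employment shares within each industry and that the industry mean wage is the weighted average of the disaggregated means; both are assumptions made for illustration only, and all worker counts and wages below are hypothetical, not EPA or Wage Structure Survey figures.

```python
# Minimal sketch: employment-share weights and the aggregation identity
# y_i. = sum_j theta_ij * y_ij (assumed form, for illustration only).

def employment_weights(counts):
    """Turn worker counts by type into shares that sum to one."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("worker counts must sum to a positive number")
    return {worker_type: n / total for worker_type, n in counts.items()}


def aggregate_mean(weights, wages):
    """Weighted average of disaggregated mean wages for one industry."""
    return sum(weights[worker_type] * wages[worker_type] for worker_type in weights)


# Hypothetical worker counts for a single industry (not EPA data).
counts = {
    "full_time_male": 500,
    "full_time_female": 300,
    "part_time_male": 50,
    "part_time_female": 150,
}

# Hypothetical disaggregated mean annual wages in EUR (not DWP estimates).
wages = {
    "full_time_male": 30000.0,
    "full_time_female": 26000.0,
    "part_time_male": 12000.0,
    "part_time_female": 10000.0,
}

theta = employment_weights(counts)
industry_mean = aggregate_mean(theta, wages)
print(theta)
print(industry_mean)
```

Under these assumptions, any set of disaggregated estimates can be checked for consistency by comparing the weighted average with the published industry aggregate.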


The aggregate information classified by industry in Table 2 displays high variability, ranging from slightly more than EUR 14,000 for the average worker in the Accommodation industry to almost three times that amount in Financial and Insurance services. The aggregates also show that male workers earned more on average than female workers. Specifically, full-time male workers earned on average around 16% more than their female counterparts, whereas this gap was around 11% in the case of part-time workers. This information, however, does not allow us to check whether these gender wage differences remain stable across industries. The estimates obtained by the DWP estimator and reported in Table 3 help to shed some light on this matter.
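The gap figures quoted above can be read as relative differences. The following sketch assumes the gap is measured as the male-female difference expressed as a fraction of the female wage, which is one common convention and is not confirmed by the text; the industry names and wages are hypothetical placeholders, not values from Table 2 or Table 3.

```python
# Minimal sketch: relative gender wage gap by industry.
# Assumed convention: (male - female) / female.

def relative_gap(male_wage, female_wage):
    """Male-female wage gap as a fraction of the female wage."""
    if female_wage <= 0:
        raise ValueError("female_wage must be positive")
    return (male_wage - female_wage) / female_wage


# Hypothetical mean annual wages in EUR for two illustrative industries.
estimates = {
    "Industry A": {"male": 33000.0, "female": 27500.0},
    "Industry B": {"male": 24000.0, "female": 25000.0},
}

gaps = {
    name: relative_gap(w["male"], w["female"])
    for name, w in estimates.items()
}

for name, gap in gaps.items():
    # A negative value means the estimated female wage is the higher one.
    print(f"{name}: {gap:.1%}")
```

Applying the same function to each industry separately is what makes it possible to see whether an aggregate gap is driven by a subset of industries.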

According to the estimation results, the gender gap for full-time workers is much larger in economic branches related to mining, manufacturing or construction than in service activities. Furthermore, for the specific case of Education and Health and social services activities, we estimate a significant positive difference in favor of full-time female workers. Something similar, but to a lesser extent, happens with part-time workers: the mean gender gap in favor of male workers, according to the estimates, is mainly produced by the higher wages received in mining, manufacturing and construction, whereas service activities in general tend to reduce this gap. Detecting these differential patterns across industries is possible thanks to the disaggregated information contained in the estimates, which was partially hidden in the aggregated averages. Additionally, we have explored how robust the estimates and the patterns found are to changes in the support vectors, which in turn affect the priors, as shown in equation (15). The estimates reported in Table 3 correspond to a case where the support vectors have been defined as *b*′ = [−100, 0, 100], with *M* = 3, and are common to parameters α*i* and β*ij*. Appendix B reports the corresponding estimates when the support vectors are defined as *b*′ = [−10, 0, 10] (Table A1) and *b*′ = [−1000, 0, 1000] (Table A2), in order to check whether wider or narrower vectors affect the results. Despite some minor numerical differences, the general patterns seem to be robust to this specification.
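Why the width of the support vector matters for the prior can be sketched as follows. The example assumes the usual entropy-based reparameterization, in which each parameter is the probability-weighted sum of its *M* support points, together with uniform prior probabilities; this is an assumption about the setup, since equation (15) is not reproduced here. Under that assumption the three symmetric supports share a prior mean of zero and differ only in prior spread.

```python
# Minimal sketch: prior mean and variance implied by a support vector
# under uniform probabilities (assumed reparameterization:
# parameter = sum_m b_m * p_m).

def prior_moments(support, probs=None):
    """Return (mean, variance) of a discrete prior over the support points."""
    m = len(support)
    if probs is None:
        probs = [1.0 / m] * m  # uniform prior over the M support points
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    mean = sum(b * p for b, p in zip(support, probs))
    variance = sum(p * (b - mean) ** 2 for b, p in zip(support, probs))
    return mean, variance


supports = {
    "narrow": [-10.0, 0.0, 10.0],
    "baseline": [-100.0, 0.0, 100.0],
    "wide": [-1000.0, 0.0, 1000.0],
}

moments = {name: prior_moments(b) for name, b in supports.items()}

for name, (mean, variance) in moments.items():
    print(f"{name}: prior mean = {mean:.1f}, prior variance = {variance:.1f}")
```

Because only the spread changes, a robustness check of this kind mainly tests how strongly the estimates are pulled toward the center of the support.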

#### **6. Conclusions**

In this paper, we have tackled the problem of providing reliable estimates of a target variable in a set of small geographical areas by showing that, under certain conditions, the generalized cross-entropy (GCE) solution for a matrix adjustment problem and the GME estimator of a DWR equation differ only in terms of the a priori information considered. A composite estimator that combines the priors considered in both approaches is then proposed, and the performance of the three approaches is evaluated through Monte Carlo experiments.

The proposed method may represent a new basis for recovering estimates at a disaggregated level in the presence of (i) sampling and response errors and (ii) small samples. Within this framework, minimal distributional assumptions are necessary, and a dual loss function is used to take into account both the estimation precision and the prediction objectives. The choice of the prior is data based and endogenously determined, and the method provides a simple way of introducing and evaluating prior information in the estimation process. The DWP estimation procedure seems to be a promising alternative model-based estimation technique because its implementation requires minimal computational effort, it does not depend on any hypotheses regarding the form of the error distribution in the model, and it produces good results for small samples, especially in the presence of spatial heterogeneity. Finally, theoretical and other non-sample information may be imposed directly on the DWP estimates much more easily than with classic maximum likelihood and Bayesian estimation techniques.

The results indicate that for low values of the parameter δ (which measures the weight given to the uniform prior for each parameter), it seems preferable to use the GCE approach, which does not introduce any area-specific effect and takes the indicator observed at the area level as the best prediction in the absence of observable information. The larger the value of this scalar, the better the relative performance of the GME-DWR approach (based on a priori uniform distributions).

The proposed estimation procedure has also been illustrated by applying it to the estimation of average wages for Spanish industries in 2011, classified by gender and type of working day. Our results show that DWP estimation has the potential to produce disaggregated estimates based on minimal assumptions about the data-generating process.

**Author Contributions:** R.B.P. and E.F.V. designed the methodology, ran the numerical simulation and the empirical application, and wrote the document. E.F.V. obtained the data for the empirical application. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work has been completed within the BLU-ETS project "Blue-Enterprises and Trade Statistics", a small or medium-scale focused research project funded by the Seventh Framework Programme of the European Commission, FP7-COOPERATION-SSH (*Cooperation* Work Programme: Socio-Economic Sciences and the Humanities). Additionally, this project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 726950 corresponding to the IMAJINE project.

**Conflicts of Interest:** The authors declare no conflicts of interest.
