Article

Self-Perceived Health, Life Satisfaction and Related Factors among Healthcare Professionals and the General Population: Analysis of an Online Survey, with Propensity Score Adjustment

by Ramón Ferri-García 1, María del Mar Rueda 1 and Andrés Cabrera-León 2,3,*
1 Department of Statistics and Operations Research, University of Granada, 18071 Granada, Spain
2 Andalusian School of Public Health, 18080 Granada, Spain
3 Network Biomedical Research Center of Epidemiology and Public Health (CIBERESP), 28029 Madrid, Spain
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(7), 791; https://doi.org/10.3390/math9070791
Submission received: 24 February 2021 / Revised: 27 March 2021 / Accepted: 4 April 2021 / Published: 6 April 2021

Abstract

Healthcare professionals (HCPs) often suffer high levels of depression, stress, anxiety and burnout. Our main study aims were to estimate the prevalences of poor self-perceived health, life dissatisfaction, chronic disease and unhealthy habits among HCPs and to explore the use of machine learning classification algorithms to remove selection bias. A sample of Spanish HCPs was asked to complete a web survey. Risk factors were identified by multivariate ordinal regression models. To counteract the absence of probabilistic sampling and representation, the sample was weighted by propensity score adjustment algorithms. The logistic regression algorithm was considered the most appropriate for dealing with misestimations. Male HCPs had significantly worse lifestyle habits than their female counterparts, together with a higher prevalence of chronic disease and of health problems. Members of the general population reported significantly poorer health and less satisfaction with life than the HCPs. Among HCPs, the prior existence of health problems was most strongly associated with worsening self-perceived health and decreased life satisfaction, while obesity had an important negative impact on female practitioners’ self-perception of health. Finally, the HCPs who worked as nurses had poorer self-perceptions of health than other HCPs, and the men who worked in primary care had less satisfaction with their lives than those who worked in other levels of healthcare.

1. Introduction

One of the elements of the physician’s pledge in the 2017 revision of the Declaration of Geneva, adopted by the World Medical Association (WMA), states: ‘I will attend to my own health, well-being, and abilities in order to provide care of the highest standard [1]’. This addition to the previous Declaration of Geneva acknowledges that patients suffer when the well-being of healthcare professionals (HCPs) is compromised [2] and was adopted in response to the growing awareness that physicians and nurses present high levels of depression, stress, anxiety and burnout [3]. In fact, suicide is the only cause of death that has a higher prevalence among physicians than in the general population [4], and the situation among nurses is likely to be similar [5]. Moreover, the prevalence of substance abuse and/or addiction among physicians is likely to be similar to that found among the general public, or even higher [6].
The WMA recommends that more research be conducted into physicians’ health and well-being and into the impact of these parameters on the patient care provided [7]. In view of these considerations, the main objectives of this research were to estimate the prevalence among HCPs of ill health, dissatisfaction, chronic disease and unhealthy lifestyle habits and to identify and analyse factors associated with life satisfaction and perceived health status.
We addressed these study goals by means of an online survey, an approach that offers substantial advantages over traditional survey techniques in terms of financial and time savings. Health surveys have traditionally used probability sampling of addresses, with data collected by an interviewer who visits each address, but this approach has limitations, such as its high cost in money and time and its susceptibility to nonresponse bias. The main motivation for using nonprobability samples (such as volunteer web surveys) is their low cost, low respondent burden and quick turnaround, since they allow estimates to be produced shortly after the information needs have been identified.
Although the validity of internet research for subjective surveys of personal well-being is well established [8] and online questionnaires are recognised as an important tool for epidemiological research [9], many surveys of this type are subject to self-selection [10,11]. A health study [12] found that the bias in web surveys remains too large, even when additional quotas are set. Statistical adjustments are therefore key to obtaining reliable estimates from online survey data. Among the various techniques for removing bias in web surveys, we highlight propensity score adjustment (PSA). This method, originally developed for reducing selection bias in non-randomised clinical trials [13], was adapted to nonprobability surveys in [14,15]. PSA aims to estimate each individual’s propensity to participate in a survey, usually by means of logistic regression. The authors of [16] assessed the ability of PSA to remove bias in the context of sensitive sexual health research and the potential of web panel surveys to replace or supplement probability surveys.
Another goal of this research was to explore the use of machine learning (ML) classification algorithms to remove selection bias by reweighting the study variables via PSA. ML techniques are commonly employed in epidemiology [17,18,19], and statistical algorithms have been used to weight variables in recent health surveys [20,21,22]. These techniques have also shown good properties in simulated data in terms of bias reduction [23,24], but at the cost of increasing the variance of the estimates. However, the mean square error (MSE), which combines bias and variance, is reduced with PSA in some situations, meaning that its application can be recommended in nonprobability sampling contexts. The objective of this study was to compare the performance and applicability of ML algorithms for PSA, using several transformations to convert the probabilities provided by PSA into weights, in a real-world context. This work pioneers the use of ML techniques to adjust for voluntary response bias in a real health survey and shows the capabilities of the different methods compared with the usual unadjusted approach.

2. Materials and Methods

2.1. Target Population

In 2014, according to census data, the Public Health System of Andalusia (SAS) employed 137,882 HCPs. However, for the purposes of this study, only those with a university degree were considered for inclusion, and so the target population was composed of the 73,465 HCPs who had this academic qualification.

2.2. Sample

In 2014, the participants in an online course on holistic care for patients with chronic diseases were asked to complete a web survey. These participants (n = 1797) were all university graduates working in the SAS as HCPs.

2.3. Variables

The following variables were present in both datasets (web survey and census): sex, age, degree and type of medical care provided (Table 1).
In addition to the variables presented in the table, the following variables were also addressed in the web survey:
  • Self-perceived health status (scored on a 5-point Likert scale, ranging from 1 = very bad to 5 = very good)
  • Satisfaction with life (scored on a 10-point Likert scale, ranging from 1 = completely unsatisfied to 10 = completely satisfied)
  • Alcohol intake (once a day/once a week/once a month/less than once a month/never)
  • Tobacco use (never/ex-smoker/occasional smoker/regular smoker)
  • Physical activity (none/occasional/regular/intensive)
  • Body mass index (BMI), obtained by dividing the weight (in kilograms) by the square of the height (in metres) and categorised as low or normal weight (<25 kg/m2), overweight (25–29 kg/m2) and obesity (≥30 kg/m2) [25]
  • Hours of sleep per night (numeric)
  • Physical, mental or sensorial disability (presence/absence)
  • Chronic disease (presence/absence)
  • Health problems (none/one/two or more)
In order to make the prevalences of the healthcare professional survey comparable with those of the general population, the same categorisation and cut-off points of the Andalusian Health Survey [26] were applied for those study variables considered in both surveys, as follows: poor health ≤3 (i.e., fair, bad or very bad); dissatisfaction with life ≤6; ≥1 alcoholic drink per month; and insufficient sleep <7 h of sleep per night.
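The categorisation and cut-off points above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code (the analysis was carried out in R); the function names are hypothetical, height is assumed to be in metres so that BMI comes out in kg/m2, and the overweight band is assumed to cover values from 25 up to, but not including, 30.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2 (height is assumed to be in metres)."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Three-level BMI categorisation described in the variable list."""
    if value < 25:
        return "low or normal weight"
    if value < 30:  # assumption: 25 <= BMI < 30 counts as overweight
        return "overweight"
    return "obesity"


def dichotomise(health, satisfaction, sleep_hours):
    """Apply the survey cut-off points to one respondent.

    health: 1 (very bad) to 5 (very good)
    satisfaction: 1 (completely unsatisfied) to 10 (completely satisfied)
    sleep_hours: hours of sleep per night
    """
    return {
        "poor_health": health <= 3,
        "dissatisfied": satisfaction <= 6,
        "insufficient_sleep": sleep_hours < 7,
    }
```

For instance, `dichotomise(3, 7, 6.5)` flags a respondent with fair health, a satisfaction score of 7 and 6.5 hours of sleep as having poor health and insufficient sleep, but not as dissatisfied with life.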

2.4. Sampling Weights

As shown in Table 1, HCPs aged 36–55 years and primary care HCPs were over-represented in the web survey sample with respect to the target population. On the other hand, HCPs with a degree in nursing were under-represented.
Given a volunteer survey $s_v$, the usual estimator of the population proportion is the Horvitz–Thompson estimator given by
$$p_{ht} = \frac{1}{N} \sum_{i \in s_v} A_i w_i$$
where $A_i = 1$ if unit $i$ in the sample $s_v$ has the desired characteristic and 0 otherwise, $w_i$ is the weight (the inverse of the sampling rate) and $N$ is the population size.
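As a minimal sketch of the Horvitz–Thompson estimator just defined (in Python rather than the R used for the study, and with a hypothetical function name), the weighted indicators are summed and divided by the population size:

```python
def horvitz_thompson_proportion(indicators, weights, population_size):
    """Horvitz-Thompson estimate of a population proportion.

    indicators: 0/1 values A_i for the sampled units
    weights: weights w_i (inverse of the sampling rate) for the same units
    population_size: N, the size of the target population
    """
    if len(indicators) != len(weights):
        raise ValueError("indicators and weights must have the same length")
    return sum(a * w for a, w in zip(indicators, weights)) / population_size


# Toy data (not from the survey): 4 units drawn from a population of 100,
# each representing 25 population units.
estimate = horvitz_thompson_proportion([1, 0, 1, 0], [25, 25, 25, 25], 100)
```

Because the denominator is the known population size rather than the sum of the weights, the estimate can fall outside the unit interval when the weights do not add up to the population size, which is one reason why ratio-type (Hajek) estimators are often preferred with estimated weights.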
To adjust for the lack of probability sampling and the resulting non-representativeness, the sample was weighted, using the standard procedure of propensity score adjustment (PSA) for web surveys [14,15].
This approach aims to estimate the propensity of an individual to be included in the nonprobability sample by combining the data from the sample $s_v$ with a reference probability sample $s_r$ and training a predictive model on the variable $\delta$, with $\delta_i = 1$ if $i \in s_v$ and $\delta_i = 0$ if $i \in s_r$. PSA assumes that the selection mechanism of $s_v$ is ignorable and follows a parametric model:
$$P(\delta_i = 1 \mid x_i) = \pi(x_i, \gamma)$$
for some function $\pi$ of the observed covariates $x_i$ and a parameter $\gamma$. The usual procedure is to estimate the parameter $\gamma$ by using logistic regression and to transform the estimated propensities to weights by inverting them:
$$p_{PSA1} = \frac{1}{\sum_{i \in s_v} 1/\hat{\pi}(x_i)} \sum_{i \in s_v} \frac{A_i}{\hat{\pi}(x_i)}$$
where $\hat{\pi}(x_i)$ denotes the estimated propensity for the individual $i \in s_v$. This transformation is equivalent to the Hajek estimator of the population proportion. An alternative that takes into account the fact that individuals of $s_v$ must be excluded from the target population of $s_r$ is the formula presented in [27]:
$$p_{PSA2} = \frac{1}{\sum_{i \in s_v} (1 - \hat{\pi}(x_i))/\hat{\pi}(x_i)} \sum_{i \in s_v} A_i \, \frac{1 - \hat{\pi}(x_i)}{\hat{\pi}(x_i)}$$
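The two transformations can be illustrated with a short Python sketch. It assumes that the propensities have already been estimated by some classifier, so the model-fitting step is not shown; the function names and the toy numbers are hypothetical and are not taken from the survey data.

```python
def psa_weights(propensities, kind="odds"):
    """Turn estimated propensities into weights.

    kind="inverse" gives 1 / pi, the transformation used in p_PSA1;
    kind="odds" gives (1 - pi) / pi, the transformation used in p_PSA2.
    """
    if kind not in ("inverse", "odds"):
        raise ValueError("kind must be 'inverse' or 'odds'")
    weights = []
    for pi in propensities:
        if not 0.0 < pi < 1.0:
            raise ValueError("propensities must lie strictly between 0 and 1")
        weights.append(1.0 / pi if kind == "inverse" else (1.0 - pi) / pi)
    return weights


def hajek_proportion(indicators, weights):
    """Weighted proportion normalised by the sum of the weights."""
    return sum(a * w for a, w in zip(indicators, weights)) / sum(weights)


# Toy example: three volunteers with assumed propensities.
propensities = [0.5, 0.25, 0.2]
indicators = [1, 0, 1]
p_psa1 = hajek_proportion(indicators, psa_weights(propensities, "inverse"))  # 7/11
p_psa2 = hajek_proportion(indicators, psa_weights(propensities, "odds"))     # 5/8
```

In both cases, units with a low estimated propensity to volunteer receive larger weights; the second transformation shrinks the weight of units that are very likely to appear in the volunteer sample towards zero.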
We considered the following algorithms for estimating the aforementioned propensities:
  • Logistic regression
  • Decision trees (C5.0 algorithm [28])
  • The k-nearest neighbours algorithm, with k = 5 (5-NN)
  • Naïve Bayes with no Laplace smoothing
  • Random forest with 500 trees
  • Gradient boosting machine (GBM) with 100 trees, interaction depth of 1 and learning rate of 0.1
  • Feed-forward neural networks with one hidden layer, initialising weights to 0 and considering three cases with 1, 3 and 5 units in the hidden layer
In all cases, the probabilities calculated in PSA were transformed into weights for Hajek estimators, following the formula for pPSA2 stated in [27]. Weights for Horvitz–Thompson estimators were also calculated, in accordance with [15]. PSA was performed in R 3.5.1 [29] using the packages sampling [30], survey [31], C50 [32], randomForest [33], gbm [34], e1071 [35], caret [36] and nnet [37].
The weights for the Horvitz–Thompson estimators were discarded, as they were unstable and produced unacceptably high variances. In general, the Horvitz–Thompson weights, although they correlated with the Hajek weights obtained by the same methods, presented higher levels of skewness, probably caused by the grouping features of the weighting method (see Appendix A). Moreover, the weights obtained by PSA using decision trees and neural networks with five units were also discarded, as they were found to be equal to the design weights and so provided the same outputs as in the unadjusted case.

2.5. Statistical Analysis

Several weights were applied in estimating the prevalence of each of the variables considered. To reflect potential differences between male and female HCPs in these prevalence values, sex was taken as a stratification variable. The variances of the proportion estimators were calculated using the leave-one-out jackknife algorithm [38], implemented in the bootstrap package in R [39]. Prevalence values for the study population were compared with those for the general population [26] in the same age range (22–67 years).
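The leave-one-out jackknife can be sketched as follows for a weighted (Hajek-type) proportion. This Python sketch is only an illustration under simplifying assumptions: the weights are held fixed when a unit is dropped (the propensity model is not refitted), and the function name is hypothetical rather than part of the bootstrap package mentioned above.

```python
def jackknife_variance(indicators, weights):
    """Leave-one-out jackknife variance of a weighted proportion.

    Each unit is dropped in turn, the weighted proportion is recomputed
    from the remaining units, and the spread of these replicates is
    scaled by (n - 1) / n.
    """
    n = len(indicators)
    if n < 2 or n != len(weights):
        raise ValueError("need at least two units and matching lengths")
    total_w = sum(weights)
    total_aw = sum(a * w for a, w in zip(indicators, weights))
    # Replicate estimates with unit i removed from numerator and denominator.
    replicates = [
        (total_aw - a * w) / (total_w - w) for a, w in zip(indicators, weights)
    ]
    mean_rep = sum(replicates) / n
    return (n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates)
```

With equal weights the result reduces to p(1 - p)/(n - 1), the usual unbiased variance estimate for a sample proportion, which offers a quick sanity check.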
Multivariate ordinal logistic regression models were run to characterise the ordinal variables of life satisfaction and self-perceived health status. Sampling weights were applied in the models, which were constructed independently for male and female HCPs. In the statistical analysis, the scales for life satisfaction and self-perceived health status were inverted; thus, odds ratios (OR) >1 mean that the explanatory variable increases the probability of dissatisfaction with life or of poor self-perceived health. In addition, those reference categories of the explanatory variables which obtained a better interpretation of odds ratios (i.e., OR > 1) were chosen. The following explanatory variables were included in the models:
  • Health problems (none/one/two or more)
  • Tobacco use (never/ex-smoker/occasional smoker/regular smoker)
  • Hours of sleep per night (<7 h/≥7 h)
  • Physical activity (none/occasional/regular/intensive)
  • Body mass index (BMI), categorised as low or normal weight (<25 kg/m2), overweight (25–29 kg/m2) and obesity (≥30 kg/m2) [25]
  • Level of healthcare (Primary/ Other)
  • Age in years (numeric)
  • Degree (Medicine/Nursing/Other)
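As a sketch of the model form, and assuming the proportional-odds (cumulative logit) parameterisation, which the text does not specify, an ordinal outcome $Y$ with categories $1, \dots, J$ (coded so that higher values are worse, as in the inverted scales) and covariate vector $x$ could be written as follows. The symbols $\theta_j$ (cut-points) and $\beta$ (coefficients) are notation introduced here for illustration.

```latex
\operatorname{logit} P(Y \le j \mid x) = \theta_j - x^{\top}\beta,
\qquad j = 1, \dots, J - 1,
\qquad \mathrm{OR}_k = e^{\beta_k}
```

Under this assumed form, $\mathrm{OR}_k > 1$ means that the $k$-th explanatory variable raises the odds of falling in a worse category, which matches the reading of the odds ratios described above.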
Multicollinearity of the independent variables was assessed using the variance inflation factor (VIF) [40], large values of which indicate collinearity. Variables with VIF > 3 were discarded [41]. Therefore, ‘chronic diseases’ and ‘physical, mental or sensorial disability’ were not included in the final model. Alcohol consumption was also excluded because of its low association with the dependent variables of the models, as assessed in a preliminary regression analysis in which the alcohol variable was not significant and had a beta coefficient close to zero; the remaining coefficients and test statistics were almost unchanged when the alcohol consumption variable was removed. To show the range of values within which the coefficients would be applicable to the entire population, 95% confidence intervals were calculated. Hypothesis testing of the beta coefficients was performed with the Wald test. Statistical and graphical analyses were performed in R 3.5.1 using the packages poliscidata [42] and ggplot2 [43], respectively, in addition to those mentioned above.
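The Wald test and the 95% confidence interval for an odds ratio can be sketched as below. This is a generic Python illustration under the usual normal approximation, not the authors' R code; the function name and the example coefficient are hypothetical.

```python
import math


def wald_summary(beta, std_error, z_crit=1.959964):
    """Wald z statistic, two-sided p-value, odds ratio and its 95% CI.

    beta: estimated regression coefficient (log odds ratio)
    std_error: its estimated standard error
    z_crit: normal quantile for the interval (1.959964 for 95%)
    """
    z = beta / std_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    odds_ratio = math.exp(beta)
    ci = (
        math.exp(beta - z_crit * std_error),
        math.exp(beta + z_crit * std_error),
    )
    return {"z": z, "p_value": p_value, "odds_ratio": odds_ratio, "ci": ci}


# Hypothetical coefficient: log(2) with a standard error of 0.2.
example = wald_summary(math.log(2), 0.2)
```

The interval is computed on the coefficient scale and then exponentiated, so it is asymmetric around the odds ratio; a coefficient is significant at the 5% level exactly when this interval excludes 1.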

3. Results

3.1. Prevalence Estimations

According to results provided by PSA with logistic regression, 10.3% of male HCPs (Table 2) and 12.6% of female HCPs (Table 3) were dissatisfied with their life and 8.4% of male and 7.8% of female professionals perceived their own health as poor. Regarding lifestyle habits, 62.3% of the men and 42.8% of the women drank alcohol at least once a week, while 31.1% of the men and 26.7% of the women slept for less than seven hours a day. Finally, 31.8% of the men and 22.3% of the women reported having at least one chronic disease. Moreover, 26.3% of the men and 20.6% of the women had one health problem, 10.4% and 6%, respectively, had two or more health problems, and 7% of men and 6% of women had a disability (Table 2 and Table 3).
Figure A8 and Figure A9 of Appendix B show the 95% confidence intervals for the prevalence of each of the variables considered. All of the estimates were very similar, whichever method was applied, although some point estimates varied slightly due to the influence of certain algorithms on the propensity estimation step. Consequently, there were no statistically significant differences between the prevalences estimated with any of the weighting methods applied. The logistic regression algorithm obtained the best results in terms of both prevalence and variance deviations compared with no weighting adjustment (see Table A3 and Table A4 of Appendix B). As stated before, PSA increased the variance of the estimators but reduced their bias, meaning that the estimates based on PSA might be more valuable, as they mitigated the effect of non-sampling errors in the final estimates. Given that the estimates provided by PSA with different algorithms were very similar (and therefore might reduce the bias by the same amount), the choice that minimises the MSE might be the estimate with the lowest variance.
Table 4 shows the prevalences of the study variables for the general population [26] and the HCPs. The latter group self-reported significantly better health and greater satisfaction with life than the general population. In addition, while women in the general population reported a significantly worse perception of their health than men (17.5% and 12.1%, respectively, reported poor health), female HCPs had a better, although non-significant, perception in this respect, compared with their male counterparts (7.8% and 8.5%, respectively). In contrast, women reported significantly less satisfaction with their life than men, both in the general population (19.2% vs. 16.3%, respectively) and among the HCPs (12.6% vs. 10.3%, respectively).
With respect to alcohol consumption (at least once a month), the men in the general population and among HCPs reported significantly higher prevalences than women. In addition, alcohol consumption was significantly more prevalent among male and female HCPs than among men and women in the general population (79.8% and 60%, 62.5% and 37.1%, respectively). Regarding hours of sleep per day, significantly more HCPs than persons in the general population slept for less than 7 h. This difference was especially marked among men (31.2% vs. 17.7%, respectively). In addition, significantly more male than female HCPs slept for less than 7 h per day (31.2% vs. 26.7%, respectively), which is contrary to the pattern observed in the general population.
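The argument about choosing the lowest-variance estimate rests on the standard decomposition of the mean square error, shown here for an estimator $\hat{p}$ of a true proportion $p$ (notation introduced here for illustration):

```latex
\mathrm{MSE}(\hat{p}) = E\left[(\hat{p} - p)^2\right]
                      = \left(E[\hat{p}] - p\right)^2 + \mathrm{Var}(\hat{p})
```

If two adjusted estimators are assumed to have roughly the same bias, the squared-bias term is shared and the one with the smaller variance has the smaller MSE.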
The presence of chronic disease was much more prevalent among women in the general population than among female HCPs (45.3% vs. 22.3%, respectively), but no such difference was observed between the two groups of men (35.9% vs. 31.8%, respectively). The prevalence of disability was almost twice as high among HCPs as in the general population (6% vs. 3.5%, respectively). In this respect, there were no differences between men and women.

3.2. Regression Modelling

As described above, the regression modelling was performed using three types of weighting: no adjustment, PSA using logistic regression for prevalence estimation and PSA using a neural net with one unit for prevalence estimation. These weighting methods were selected taking into account the low degree of variability among them, which means that one or more could be discarded if necessary to avoid redundancy (see Appendix A for further information on the similarity among weights).
In almost every case, the strength of evidence against the explanatory variable having a null effect weakened with reweighting, not only because the variance increased (for example, with larger confidence intervals) but also when the beta coefficient shifted towards zero (or towards one; see Table 5, Table 6, Table 7 and Table 8). In other words, when reweighting was performed, it merely addressed misestimation of the association between explanatory variables, caused by the nonprobabilistic sampling method applied in the survey.
Table 5 and Table 6 depict the results for the models assessing self-perceived health, and Table 7 and Table 8 depict those concerning satisfaction with life. Figure 1 and Figure 2 illustrate the OR for self-perceived health and satisfaction with life, respectively, for male and female participants.
The strongest OR for poor self-perceived health was obtained when the respondent had one or more pre-existing health problems. Thus, the prior existence of one health problem increased the odds of poor health approximately threefold for men and twofold for women. In the case of two or more health problems, the odds rose to 8 and 10 times, respectively (see Table 5 and Table 6). In addition, there was evidence that obesity, as measured by BMI, was significantly associated with poorer self-perceived health among women (OR = 2.1).
Regarding the type of university degree held, nursing qualifications were significantly associated with poorer self-perceived health, compared with respondents with a degree in medicine, regardless of sex (OR = 1.8), or even among women those whose degree subject was reported as neither medicine nor nursing (OR = 2). However, no significant differences in OR were observed between those who worked in primary care or other level of healthcare.
In relation to lifestyle habits, smoking every day was associated with a greater likelihood of poorer self-perceived health in women; no physical activity or only occasional activity was also associated with poorer self-perception of health, especially in men, as was sleeping less than seven hours per night.
The results obtained from the analysis of self-perceived life satisfaction are detailed in Table 7 and Table 8 and illustrated in Figure 2. As in the case of self-perceived health, the strongest negative association with life satisfaction was measured for prior health problems, and this relationship became significantly stronger for both male and female respondents as the number of pre-existing health problems increased. For men, furthermore, working in primary rather than other levels of healthcare was also associated with less life satisfaction. Another important factor was that of physical inactivity, which was also associated with lower levels of life satisfaction, especially among men, although the differences with women in this respect were not statistically significant. Thus, male and female HCPs who performed no physical activity at all were 5 and 2.5 times, respectively, more likely to have less satisfaction with life than their more physically active counterparts. With respect to tobacco consumption, women who smoked (whether every day or less frequently) were more likely to report lower levels of life satisfaction than those who had never smoked. Finally, HCPs who slept less than seven hours per night were around 1.5 and 1.8 times (for men and women, respectively) more likely to report low levels of life satisfaction than those who slept for longer, assuming all other variables remained constant.

4. Discussion

The stress of addressing the COVID-19 pandemic is having significant ill effects on HCPs’ mental and physical health [44]. In consequence, the analysis of relevant data compiled before the present crisis is of crucial assistance to efforts to maintain and/or improve HCPs’ well-being and to facilitate the application of more effective supportive interventions targeting policies, institutions and individuals [45]. In this regard, attention to personal welfare and service quality is of the utmost importance [46].
Regarding the methodological aspects of this study, in the analysis of nonprobability samples, any inference drawn must take into account the selection bias inherent in the sampling procedure, which in most internet surveys is equivalent to self-selection bias. Propensity score adjustment can be a useful means of overcoming the effects of this kind of bias, although additional calibration may be needed to remove the bias completely [47,48]. In our study, PSA alone produced no substantial changes in the estimates except for the effect of certain variables on the indicators of health and life satisfaction. From this, we conclude that either the original sample was sufficiently representative of the target population or the variables in question did not properly model the self-selection mechanism.
When decision trees and 5-unit neural networks were used instead of logistic regression to estimate the propensities, the resulting prevalence estimates did not differ from the unadjusted ones. In the first case, this was because the algorithm was unable to grow any branch for the tree, as it did not detect any variable enabling it to classify an individual, either in the self-selected or in the reference sample. In the second case, the feed-forward technique achieved convergence in the first iteration, and therefore no adjustment was made (see Appendix A for further information). Either or both of these cases might reflect a lack of predictive power in the covariates available for both samples. On the other hand, the Horvitz–Thompson weights, which were also obtained for each PSA performed, had to be discarded as they resulted in a higher variance of the estimators and produced unstable and misleading point estimates.
The study has several limitations that have to be pointed out. First of all, there were no available measures to assess whether the bias removal had been successful or not. It is reasonable to assume that adjustments to mitigate selection bias may have a significant effect; however, model misspecification in PSA can increase the bias of the estimates, although the logistic regression model that was used as the reference result showed a relative robustness to changes in the covariates or sample size [23]. Further studies could consider the use of estimators that ensure robustness against model misspecifications, such as the doubly robust estimator proposed in [49].
Moreover, the available covariates did not show a very different behaviour in the online sample in comparison with the full population. This can indicate that the online sample was fairly representative of the population but can also indicate that the available covariates failed to capture the differences between the sampled and the non-sampled population, which could reduce the potential of PSA to mitigate the selection bias.
It was also observed that PSA increased the variance of the estimators in comparison with the unadjusted case. As stated in Section 1, it is known that PSA can reduce the selection bias at the cost of increasing the variance because of the complexity added by the predictive models. However, the bias–variance trade-off is often favourable, as the mean square error is reduced after the application of PSA in certain situations, according to the literature [11,14,15,23,24].
Our analysis shows that, although there were no significant differences between male and female HCPs regarding self-rated health and dissatisfaction with life, male personnel had significantly poorer lifestyle habits than their female counterparts, together with a higher prevalence of chronic disease, of disability and of health problems. A different tendency was observed in sleep, chronic disease and health problems when compared with the general population. Further research is needed in this area in order to justify interventions which encourage male HCPs to modify their lifestyle habits in order to prevent problems from spiralling through the burnout cascade stages of reduced activity, distress and despair [50].
In our survey, members of the general population reported significantly poorer health and less satisfaction with life than the HCPs consulted. Although female HCPs were significantly more likely than women in the general population to consume alcohol at least once a month, they were only half as likely to suffer chronic disease. A limitation of that result is that the quantity of alcohol consumed was not reported in the survey. Other studies have also found a lower prevalence of chronic diseases among physicians than in the general population, with percentages similar to ours, ranging from 13% to 44% [51,52]. Nevertheless, further detailed, up-to-date research is needed in this area.
Among HCPs, the prior existence of health problems was the factor most strongly associated with worsening self-perceived health and decreased life satisfaction, while obesity had an important negative impact on female practitioners’ self-perceived health. Our study did not include work environment, workplace characteristics and other factors such as quality of management, professional development and colleague support/team spirit. All of those factors have a stronger positive association with HCPs’ satisfaction compared with personal and intrinsic factors [53].

5. Conclusions

For almost all of the explanatory variables, any misestimations caused by the nonprobabilistic nature of the sampling process for the online survey were corrected by reweighting. There were some differences across the estimations provided by different adjustments and estimators, although several groups of algorithms for PSA with similar behaviours could be identified according to the weights that they provided. Horvitz–Thompson estimates had larger estimated variances, and tree-based bagging algorithms provided more skewed weights, which contributed to an increase in the variance of the estimates. The point estimates finally considered were similar, meaning that they probably removed bias to the same extent, but some adjustments presented lower variances, which made them more desirable in terms of reducing estimation error.
According to our analysis, male HCPs reported poorer lifestyle habits and health conditions than their female counterparts, although men and women had similar perceptions of health and life satisfaction. All HCPs self-reported much better health conditions and life satisfaction than the general population. The prevalence of chronic disease among female HCPs was half that measured among the general population, but the prevalence of disability among all HCPs was almost twice that of the general population. Prior health problems, sleeping for less than seven hours per night, physical inactivity and smoking (by women) were all associated with the perception of poorer health, while obesity (among women), working as a nurse or in primary healthcare (among male HCPs) were associated with less satisfaction with life. Accurate knowledge of HCPs’ self-perceived health, life satisfaction and associated factors is essential to enabling policy makers and healthcare managers to design and implement effective programmes to improve the attention paid to human resources.
The study results we report can be used as a baseline for monitoring the health effects produced in HCPs by the COVID-19 pandemic and for assessing interventions to benefit the welfare of these professionals, whose current role makes them priority beneficiaries of such attention.

Author Contributions

Conceptualisation, A.C.-L.; methodology, R.F.-G., M.d.M.R. and A.C.-L.; software, R.F.-G.; validation, R.F.-G., M.d.M.R. and A.C.-L.; formal analysis, R.F.-G.; investigation, A.C.-L.; resources, R.F.-G., M.d.M.R. and A.C.-L.; data curation, A.C.-L.; writing—original draft preparation, R.F.-G., M.d.M.R. and A.C.-L.; writing—review and editing, R.F.-G., M.d.M.R. and A.C.-L.; visualisation, R.F.-G.; supervision, M.d.M.R. and A.C.-L.; project administration, M.d.M.R. and A.C.-L.; funding acquisition, M.d.M.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministerio de Ciencia e Innovación, Spain, under project PID2019-106861RB-I00/AEI/10.13039/501100011033 and, for the first author, by an FPU grant from the Spanish Ministry of Science, Innovation and Universities, grant number FPU17/02177.

Institutional Review Board Statement

Ethical review and approval were waived for this study, as it had an observational design with no personal data involved.

Informed Consent Statement

Study participants voluntarily enrolled in the online course approved by the Andalusian Health Quality Agency of the Junta de Andalucía (March 2014). They cannot be identified, as their responses were anonymous.

Data Availability Statement

Data for the HCP sample and the Hajek and Horvitz–Thompson weights are available from the OSF repository at https://osf.io/tj6a7/?view_only=7d820d6b242147cd87836100b0b4fa1a (accessed on 5 April 2021).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Descriptive statistics of weights obtained through PSA with Horvitz–Thompson weighting applying each predictive algorithm can be observed in Table A1.
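Table A1. Descriptive statistics for Horvitz–Thompson weights.

| Statistic | Logistic Regression | C5.0 | 5-NN | Naïve Bayes | Random Forest | GBM | Neural Net (1 Unit) | Neural Net (3 Units) | Neural Net (5 Units) |
|---|---|---|---|---|---|---|---|---|---|
| Mean | 40.67 | 7.38 | 40.67 | 40.88 | 15.67 | 40.88 | 40.67 | 40.67 | 7.38 |
| Std. Dev. | 26.22 | 0 | 109.39 | 33.7 | 28.07 | 40.65 | 33.44 | 52.35 | 0 |
| CV | 0.64 | 0 | 2.69 | 0.82 | 1.79 | 0.99 | 0.82 | 1.29 | 0 |
| Minimum | 20.19 | 7.38 | 13.02 | 17.52 | 7.96 | 17.93 | 17.87 | 14 | 7.38 |
| Q1 | 20.19 | 7.38 | 13.02 | 17.52 | 7.96 | 17.93 | 17.87 | 14 | 7.38 |
| Median | 31.62 | 7.38 | 13.02 | 38.39 | 7.96 | 33.24 | 30.24 | 14 | 7.38 |
| Q3 | 50.85 | 7.38 | 36.8 | 54.9 | 7.96 | 49 | 56.25 | 64.61 | 7.38 |
| Maximum | 115.89 | 7.38 | 1373.45 | 166.92 | 117.85 | 231.42 | 139.74 | 323.55 | 7.38 |
| MAD | 16.94 | 0 | 0 | 27.75 | 0 | 22.7 | 18.33 | 0 | 0 |
| IQR | 30.66 | 0 | 23.78 | 37.38 | 0 | 31.07 | 38.38 | 50.62 | 0 |
| Skewness | 1.63 | NaN | 11.1 | 2.58 | 3.36 | 3.67 | 1.75 | 4.12 | NaN |
| Kurtosis | 2.15 | NaN | 131.95 | 7.24 | 9.32 | 14.52 | 2.22 | 19.39 | NaN |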
Using C5.0 or a neural network with 5 units in the hidden layer for propensity estimation yields constant weights, which is equivalent to making no adjustment at all and using the design weights. The remaining weights are centred on similar values, as their means show (except for the weights obtained with random forest in PSA), but their variability differs. Relative variability is smallest for logistic regression (CV = 0.64), followed by naïve Bayes, neural networks with 1 unit in the hidden layer and gradient boosting machines (CVs between 0.82 and 0.99). It becomes relatively high when 3 units are placed in the hidden layer of the neural network (CV = 1.29) and very high with random forest (CV = 1.79) and 5-NN (CV = 2.69). In these last two cases, very extreme outliers are present. All of the non-constant weightings are strongly right-skewed, and most of them also show high kurtosis.
Histograms and boxplots for each weighting can be observed in Figure A1 and Figure A2, where some of the patterns detected in the descriptive statistics are evident. Positive skew is present in all weights; in some cases the distribution is fairly even (such as the weights obtained with logistic regression in PSA), whereas in others the skew is more pronounced and even attributable exclusively to outliers. For example, when using GBM in PSA, most of the weights are below 80, except for 65 weights (3.6% of the individuals) that take values over 220. The most striking cases, however, are those of random forest and 5-NN. With random forest, all of the individuals have a weight of 7.96, except for 126 individuals (around 7% of the sample) whose weight is 117.85, much higher than the rest, which increases both skewness and variability. With 5-NN, weights are under 200 (most of them under 36.8, the third quartile in Table A1), while a small subset of 11 individuals (0.6% of the sample) has a weight of almost 1400. This pattern greatly increases both variability and skewness.
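The random forest column of Table A1 can be checked by hand from the description above. The following Python sketch rebuilds that two-valued weight vector (126 weights of 117.85 and the remaining weights of 7.96, taking n = 1797 from the valid sample size in Table 1) and computes the same kind of summary statistics. The function name `describe_weights`, the plain moment formulas for skewness and excess kurtosis, and the unscaled MAD are assumptions made for illustration; the software settings behind the published table are not stated here, so small differences in the last decimal are to be expected.

```python
import numpy as np

def describe_weights(w):
    """Summary statistics of the kind reported in Table A1 (illustrative)."""
    w = np.asarray(w, dtype=float)
    mean = w.mean()
    sd = w.std(ddof=1)                          # sample standard deviation
    q1, median, q3 = np.percentile(w, [25, 50, 75])
    mad = np.median(np.abs(w - median))         # unscaled MAD (assumption)
    if w.max() == w.min():
        # Constant weights: the moment ratios are 0/0, reported as NaN.
        skewness = kurtosis = float("nan")
    else:
        dev = w - mean
        m2 = np.mean(dev ** 2)
        skewness = np.mean(dev ** 3) / m2 ** 1.5
        kurtosis = np.mean(dev ** 4) / m2 ** 2 - 3.0   # excess kurtosis
    return {
        "mean": mean, "sd": sd, "cv": sd / mean,
        "median": median, "mad": mad, "iqr": q3 - q1,
        "skewness": skewness, "kurtosis": kurtosis,
    }

# Two-valued weights described in the text for random forest in PSA.
n_total, n_outliers = 1797, 126
rf_weights = np.concatenate([
    np.full(n_total - n_outliers, 7.96),
    np.full(n_outliers, 117.85),
])

rf_stats = describe_weights(rf_weights)
constant_stats = describe_weights(np.full(n_total, 7.38))  # e.g. the C5.0 column

for name, value in rf_stats.items():
    print(f"{name:>9}: {value:.2f}")
```

Under these assumptions the reconstructed vector gives a mean, standard deviation and CV that agree with the random forest column to within rounding, and a constant weight vector produces NaN for skewness and kurtosis, as in the C5.0 and 5-unit neural network columns.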
Figure A1. Histograms of Horvitz–Thompson weights.
Figure A2. Histograms of Horvitz–Thompson weights.
Descriptive statistics of weights obtained through PSA with Hajek weighting applied to each predictive algorithm can be observed in Table A2.
Table A2. Descriptive statistics for Hajek weights.

| Statistic | Logistic Regression | C5.0 | 5-NN | Naïve Bayes | Random Forest | GBM | Neural Net (1 Unit) | Neural Net (3 Units) | Neural Net (5 Units) |
|---|---|---|---|---|---|---|---|---|---|
| Mean | 0.00056 | 0.00056 | 0.00056 | 0.00056 | 0.00056 | 0.00056 | 0.00056 | 0.00056 | 0.00056 |
| Std. Dev. | 0.00030 | 0 | 0.00063 | 0.00040 | 0.00015 | 0.00031 | 0.00052 | 0.00063 | 0 |
| CV | 0.53 | 0 | 1.12 | 0.71 | 0.27 | 0.56 | 0.93 | 1.14 | 0 |
| Minimum | 0.00022 | 0.00056 | 0.00001 | 0.00019 | 0.0000002 | 0.00011 | 0.00025 | 0.0000067 | 0.00056 |
| Q1 | 0.00032 | 0.00056 | 0.00019 | 0.00027 | 0.0005985 | 0.00032 | 0.00027 | 0.000217 | 0.00056 |
| Median | 0.00049 | 0.00056 | 0.00033 | 0.00046 | 0.0005985 | 0.00050 | 0.00028 | 0.0003098 | 0.00056 |
| Q3 | 0.00071 | 0.00056 | 0.00071 | 0.00064 | 0.0005985 | 0.00068 | 0.00069 | 0.0008019 | 0.00056 |
| Maximum | 0.00202 | 0.00056 | 0.00710 | 0.00340 | 0.0005985 | 0.00365 | 0.00392 | 0.0048296 | 0.00056 |
| MAD | 0.00028 | 0 | 0.00027 | 0.00028 | 0 | 0.00027 | 0.00004 | 0.000314 | 0 |
| IQR | 0.00039 | 0 | 0.00052 | 0.00037 | 0 | 0.00036 | 0.00042 | 0.0005849 | 0 |
| Skewness | 1.42 | NaN | 3.28 | 2.28 | −3.26 | 2.57 | 3.51 | 4.28 | NaN |
| Kurtosis | 2.77 | NaN | 16.88 | 8.29 | 8.70 | 14.08 | 16.91 | 25.10 | NaN |
Weights obtained for Hajek estimators are generally more stable than those obtained for Horvitz–Thompson ones. In each weighting, values lie around the same numbers (the mean is identical in all cases), and the coefficient of variation is relatively low and, in nearly all cases, below its counterpart for Horvitz–Thompson weights (the neural network with 1 unit in the hidden layer is the exception, at 0.93 versus 0.82). Skewness coefficients again show that weights tend to be right-skewed, except for weighting with PSA using random forest, which provides strongly left-skewed values. Kurtosis coefficients are high as well, showing leptokurtic distributions.
Figure A3 and Figure A4 show histograms and boxplots for the Hajek weights obtained with each algorithm in PSA. In this case, skewness appears in a smoother manner because, unlike for the Horvitz–Thompson weights, propensities were not grouped into strata. This keeps the weights closer to the arithmetic mean, which results in the decrease in variability mentioned above. The use of 5-NN or random forest gives the least stable weights because of the presence of outliers.
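The contrast between weights built from propensity strata and weights built from individual propensities can be illustrated with a minimal Python sketch. Everything specific in it is an assumption for illustration only: the propensities are simulated, the number of strata (five), the use of the inverse of the mean propensity within each stratum and the use of the inverse individual propensity are common textbook choices and not necessarily the formulas used in this study. The only figures taken from the article are n = 1797 (Table 1) and the common mean of 0.00056 in Table A2, which is consistent with weights rescaled to sum to one (1/1797 ≈ 0.00056).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1797                                        # valid web-survey sample (Table 1)
propensity = rng.beta(2.0, 50.0, size=n)        # simulated, NOT the study's values

def stratified_weights(p, n_strata=5):
    """One weight per stratum: inverse of the stratum's mean propensity (assumed)."""
    edges = np.quantile(p, np.linspace(0.0, 1.0, n_strata + 1))
    # Index of the quantile interval each propensity falls into (0 .. n_strata - 1).
    strata = np.clip(np.searchsorted(edges, p, side="right") - 1, 0, n_strata - 1)
    stratum_mean = np.array([p[strata == g].mean() for g in range(n_strata)])
    return 1.0 / stratum_mean[strata]

def normalised_weights(p):
    """Individual inverse-propensity weights rescaled to sum to one (assumed)."""
    w = 1.0 / p
    return w / w.sum()

def cv(w):
    """Coefficient of variation, which does not depend on the scale of w."""
    return w.std(ddof=1) / w.mean()

w_strata = stratified_weights(propensity)
w_norm = normalised_weights(propensity)

print("distinct stratified weights:", len(np.unique(w_strata)))
print("mean of normalised weights :", round(float(w_norm.mean()), 5))
print("CV of stratified weights   :", round(float(cv(w_strata)), 2))
print("CV of normalised weights   :", round(float(cv(w_norm)), 2))
```

Stratification collapses the weights onto a handful of distinct values, which is one way ties such as an identical minimum and first quartile can arise, whereas individual propensities give a continuous spread. Because the coefficient of variation is scale-free, it remains comparable between the two weight scales.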
Figure A3. Histograms of Hajek weights.
Figure A4. Histograms of Hajek weights.
Following one-dimensional analysis, Pearson bivariate correlations between weights were analysed. Results of correlations can be observed in Figure A5 and Figure A6.
Figure A5. Representation of Pearson correlations between weights. The darker and larger the circle, the closer the correlation is to 1 (in cases with a blue circle) or −1 (in cases with a red circle).
Figure A6. Pearson’s bivariate correlations between weights.
Correlations are generally positive and relatively high, with two exceptions: Horvitz–Thompson weighting using 5-NN in PSA and Horvitz–Thompson weighting using random forest. In the former case, correlations with the other sets of weights are positive but weaker than elsewhere (the relationship is only slightly stronger with the Hajek weights obtained from the same algorithm). The random forest case is more remarkable: correlations with any other set of weights are very low, except with the Hajek weights obtained from the same algorithm, where the correlation is strongly negative. This lack of correspondence is probably caused by the propensities estimated by the random forest algorithm, which assigns probabilities very close to the limits 0 and 1, so that the correlation depends almost exclusively on the few individuals who were assigned probabilities far from those limits.
In order to better visualise the existing relationships between weights, the correlation matrix was used as an input for multidimensional scaling (MDS) in two dimensions, which explains 89.65% of the total variance. Results of the analysis can be observed in Figure A7.
Figure A7. Multidimensional scaling for two dimensions of the correlations between weights.
The scaling reveals two differentiated groups: one composed of the weights obtained using PSA with logistic regression, GBM and naïve Bayes, and another composed of those obtained with neural networks and 5-NN (for Hajek estimators). For 5-NN, if Horvitz–Thompson weighting is used, the weights separate from both groups but lie closer to the second group than to the first. Weights obtained with PSA using random forest are far removed from the rest of the weights, regardless of the estimator for which they were developed.
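The correlation-then-scaling procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis behind Figure A7: the weight sets are simulated, the labels are placeholders, the conversion of correlations into distances via d = sqrt(2(1 − r)) is one standard choice that the article does not specify, and classical (Torgerson) MDS is assumed as the scaling method.

```python
import numpy as np

def classical_mds(distance, k=2):
    """Classical MDS: coordinates in k dimensions and share of variance explained."""
    d = np.asarray(distance, dtype=float)
    m = d.shape[0]
    centering = np.eye(m) - np.ones((m, m)) / m
    gram = -0.5 * centering @ (d ** 2) @ centering    # double-centred Gram matrix
    eigval, eigvec = np.linalg.eigh(gram)
    order = np.argsort(eigval)[::-1]                  # largest eigenvalues first
    eigval = np.clip(eigval[order], 0.0, None)        # drop tiny negative round-off
    eigvec = eigvec[:, order]
    coords = eigvec[:, :k] * np.sqrt(eigval[:k])
    explained = eigval[:k].sum() / eigval.sum()
    return coords, explained

# Simulated weight sets (hypothetical): a shared component plus increasing noise.
rng = np.random.default_rng(1)
n = 1797
labels = ["set_a", "set_b", "set_c", "set_d", "set_e", "set_f"]   # placeholders
noise_sd = [0.1, 0.2, 0.3, 0.8, 1.0, 3.0]
base = rng.lognormal(mean=3.0, sigma=0.6, size=n)
weights = np.column_stack(
    [base * np.exp(rng.normal(0.0, s, size=n)) for s in noise_sd]
)

corr = np.corrcoef(weights, rowvar=False)             # Pearson correlations
dist = np.sqrt(2.0 * np.clip(1.0 - corr, 0.0, None))  # assumed distance transform

coords, explained = classical_mds(dist, k=2)
for label, (x, y) in zip(labels, coords):
    print(f"{label}: ({x:+.3f}, {y:+.3f})")
print(f"variance explained by two dimensions: {explained:.2%}")
```

With this distance transform the Gram matrix equals the double-centred correlation matrix, so the share of variance explained is simply the sum of the two leading eigenvalues divided by the sum of all of them. Weight sets that correlate highly land close together in the plot, which is how groupings of algorithms become visible.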

Appendix B

Table A3. Point estimate, variance and difference from the non-adjusted case of estimators of prevalence in male HCPs for each propensity score adjustment (PSA) (algorithms are sorted from the least to the most complex).

| Algorithm used in PSA | Poor self-perceived health: Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance | Dissatisfied with life (score of 6 or less): Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance |
|---|---|---|---|---|---|---|---|---|
| No adjustment | 0.088 | 0.00014 | | | 0.1002 | 0.00016 | | |
| Logistic regression | 0.084 | 0.00016 | −4.34% | 17% | 0.1031 | 0.00023 | 2.93% | 45% |
| 5-NN | 0.086 | 0.00029 | −2.29% | 103% | 0.1019 | 0.00041 | 1.68% | 159% |
| Naïve Bayes | 0.081 | 0.00017 | −8.24% | 21% | 0.1049 | 0.00031 | 4.67% | 98% |
| Random Forest | 0.087 | 0.00015 | −1.12% | 8% | 0.1026 | 0.00018 | 2.38% | 11% |
| GBM | 0.082 | 0.00016 | −6.12% | 11% | 0.0965 | 0.00020 | −3.68% | 28% |
| Neural net (1 unit) | 0.087 | 0.00023 | −0.58% | 62% | 0.1090 | 0.00043 | 8.84% | 174% |
| Neural net (3 units) | 0.086 | 0.00025 | −1.77% | 76% | 0.1190 | 0.00061 | 18.75% | 285% |

| Algorithm used in PSA | Alcohol once a week: Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance | <7 h of sleep: Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance |
|---|---|---|---|---|---|---|---|---|
| No adjustment | 0.6232 | 0.00041 | | | 0.3093 | 0.00038 | | |
| Logistic regression | 0.6234 | 0.00053 | 0.02% | 29% | 0.3118 | 0.00049 | 0.82% | 30% |
| 5-NN | 0.5940 | 0.00095 | −4.69% | 129% | 0.3252 | 0.00083 | 5.12% | 121% |
| Naïve Bayes | 0.6240 | 0.00066 | 0.12% | 59% | 0.3055 | 0.00059 | −1.25% | 56% |
| Random Forest | 0.6145 | 0.00046 | −1.40% | 10% | 0.3136 | 0.00041 | 1.40% | 10% |
| GBM | 0.6107 | 0.00058 | −2.02% | 40% | 0.3034 | 0.00048 | −1.91% | 27% |
| Neural net (1 unit) | 0.6004 | 0.00085 | −3.66% | 106% | 0.3395 | 0.00085 | 9.76% | 126% |
| Neural net (3 units) | 0.5942 | 0.00109 | −4.67% | 163% | 0.3609 | 0.00114 | 16.69% | 204% |

| Algorithm used in PSA | Disability (physical, mental or sensorial): Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance | Chronic disease: Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance |
|---|---|---|---|---|---|---|---|---|
| No adjustment | 0.0645 | 0.00011 | | | 0.3369 | 0.00040 | | |
| Logistic regression | 0.0695 | 0.00016 | 7.74% | 46% | 0.3179 | 0.00048 | −5.63% | 19% |
| 5-NN | 0.0587 | 0.00020 | −8.96% | 81% | 0.3280 | 0.00082 | −2.66% | 104% |
| Naïve Bayes | 0.0688 | 0.00017 | 6.64% | 54% | 0.3065 | 0.00055 | −9.03% | 37% |
| Random Forest | 0.0574 | 0.00011 | −10.98% | −3% | 0.3412 | 0.00044 | 1.27% | 10% |
| GBM | 0.0707 | 0.00016 | 9.51% | 45% | 0.3211 | 0.00050 | −4.70% | 26% |
| Neural net (1 unit) | 0.0584 | 0.00015 | −9.42% | 43% | 0.3065 | 0.00065 | −9.03% | 63% |
| Neural net (3 units) | 0.0506 | 0.00013 | −21.56% | 16% | 0.2974 | 0.00077 | −11.73% | 91% |

| Algorithm used in PSA | One health problem: Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance | Two or more health problems: Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance |
|---|---|---|---|---|---|---|---|---|
| No adjustment | 0.2742 | 0.00036 | | | 0.1072 | 0.00017 | | |
| Logistic regression | 0.2630 | 0.00044 | −4.09% | 22% | 0.1037 | 0.00019 | −3.23% | 13% |
| 5-NN | 0.2487 | 0.00067 | −9.30% | 89% | 0.1158 | 0.00038 | 8.06% | 128% |
| Naïve Bayes | 0.2527 | 0.00048 | −7.83% | 35% | 0.1003 | 0.00020 | −6.43% | 21% |
| Random Forest | 0.2684 | 0.00038 | −2.12% | 8% | 0.1084 | 0.00019 | 1.12% | 10% |
| GBM | 0.2634 | 0.00045 | −3.95% | 26% | 0.1059 | 0.00020 | −1.23% | 20% |
| Neural net (1 unit) | 0.2361 | 0.00054 | −13.90% | 51% | 0.1048 | 0.00024 | −2.22% | 41% |
| Neural net (3 units) | 0.2235 | 0.00062 | −18.49% | 74% | 0.1044 | 0.00025 | −2.58% | 51% |
Table A4. Point estimate, variance and difference from the non-adjusted case of estimators of prevalence in female HCPs for each propensity score adjustment (PSA) (algorithms are sorted from the least to the most complex).

| Algorithm used in PSA | Poor self-perceived health: Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance | Dissatisfied with life (score of 6 or less): Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance |
|---|---|---|---|---|---|---|---|---|
| No adjustment | 0.0839 | 0.00006 | | | 0.1205 | 0.00009 | | |
| Logistic regression | 0.0784 | 0.00007 | −6.49% | 15% | 0.1261 | 0.00012 | 4.61% | 39% |
| 5-NN | 0.0597 | 0.00006 | −28.78% | −3% | 0.1234 | 0.00022 | 2.41% | 158% |
| Naïve Bayes | 0.0774 | 0.00009 | −7.71% | 41% | 0.1270 | 0.00015 | 5.34% | 68% |
| Random Forest | 0.0833 | 0.00007 | −0.74% | 7% | 0.1183 | 0.00009 | −1.82% | 6% |
| GBM | 0.0753 | 0.00006 | −10.22% | 3% | 0.1261 | 0.00013 | 4.62% | 45% |
| Neural net (1 unit) | 0.0720 | 0.00008 | −14.21% | 29% | 0.1270 | 0.00019 | 5.36% | 114% |
| Neural net (3 units) | 0.0638 | 0.00007 | −23.90% | 6% | 0.1292 | 0.00025 | 7.16% | 187% |

| Algorithm used in PSA | Alcohol once a week: Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance | <7 h of sleep: Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance |
|---|---|---|---|---|---|---|---|---|
| No adjustment | 0.4223 | 0.00020 | | | 0.2671 | 0.00016 | | |
| Logistic regression | 0.4275 | 0.00026 | 1.23% | 30% | 0.2670 | 0.00021 | −0.03% | 29% |
| 5-NN | 0.4451 | 0.00048 | 5.42% | 139% | 0.2574 | 0.00038 | −3.65% | 138% |
| Naïve Bayes | 0.4277 | 0.00031 | 1.28% | 56% | 0.2607 | 0.00023 | −2.40% | 43% |
| Random Forest | 0.4239 | 0.00021 | 0.38% | 8% | 0.2671 | 0.00017 | 0.02% | 8% |
| GBM | 0.4251 | 0.00026 | 0.67% | 33% | 0.2599 | 0.00020 | −2.70% | 23% |
| Neural net (1 unit) | 0.4281 | 0.00039 | 1.39% | 95% | 0.2547 | 0.00028 | −4.64% | 79% |
| Neural net (3 units) | 0.4227 | 0.00049 | 0.12% | 144% | 0.2503 | 0.00034 | −6.27% | 113% |

| Algorithm used in PSA | Disability (physical, mental or sensorial): Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance | Chronic disease: Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance |
|---|---|---|---|---|---|---|---|---|
| No adjustment | 0.0628 | 0.00005 | | | 0.2230 | 0.00014 | | |
| Logistic regression | 0.0602 | 0.00006 | −4.14% | 23% | 0.2228 | 0.00019 | −0.05% | 29% |
| 5-NN | 0.0583 | 0.00010 | −7.10% | 114% | 0.2353 | 0.00036 | 5.55% | 151% |
| Naïve Bayes | 0.0605 | 0.00008 | −3.67% | 64% | 0.2224 | 0.00022 | −0.27% | 54% |
| Random Forest | 0.0612 | 0.00005 | −2.44% | 5% | 0.2219 | 0.00015 | −0.48% | 7% |
| GBM | 0.0618 | 0.00008 | −1.51% | 56% | 0.2241 | 0.00020 | 0.52% | 37% |
| Neural net (1 unit) | 0.0581 | 0.00008 | −7.46% | 74% | 0.2253 | 0.00029 | 1.05% | 99% |
| Neural net (3 units) | 0.0627 | 0.00014 | −0.06% | 183% | 0.2273 | 0.00038 | 1.94% | 162% |

| Algorithm used in PSA | One health problem: Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance | Two or more health problems: Estimate | Variance | Diff. from no adj. (%), estimate | Diff. from no adj. (%), variance |
|---|---|---|---|---|---|---|---|---|
| No adjustment | 0.2122 | 0.00014 | | | 0.0562 | 0.00004 | | |
| Logistic regression | 0.2056 | 0.00017 | −3.13% | 22% | 0.0601 | 0.00006 | 6.95% | 50% |
| 5-NN | 0.2164 | 0.00032 | 1.95% | 133% | 0.0566 | 0.00011 | 0.65% | 150% |
| Naïve Bayes | 0.2013 | 0.00019 | −5.16% | 39% | 0.0616 | 0.00008 | 9.63% | 96% |
| Random Forest | 0.2136 | 0.00015 | 0.66% | 8% | 0.0542 | 0.00004 | −3.58% | 3% |
| GBM | 0.2047 | 0.00017 | −3.56% | 21% | 0.0607 | 0.00008 | 8.08% | 81% |
| Neural net (1 unit) | 0.2095 | 0.00025 | −1.26% | 83% | 0.0568 | 0.00010 | 1.17% | 123% |
| Neural net (3 units) | 0.2152 | 0.00034 | 1.41% | 150% | 0.0575 | 0.00013 | 2.26% | 205% |
Figure A8. The 95% confidence intervals for the prevalence of variables related to self-perceived health and lifestyle satisfaction among male HCPs, according to the algorithms used in the propensity score adjustment (facets are sorted by confidence interval values in order to obtain common y axis limits in each row).
Figure A9. The 95% confidence intervals for the prevalence of variables related to self-perceived health and lifestyle satisfaction among female HCPs, according to the algorithms used in the propensity score adjustment (facets are sorted by confidence interval values in order to obtain common y axis limits in each row).
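The quantities in this appendix can be related to one another with a few lines of Python. The sketch below takes the tabulated estimate and variance for one outcome in Table A3 (male HCPs, dissatisfaction with life, no adjustment versus logistic regression) and computes the relative differences and a 95% interval. The helper names are hypothetical, and the normal-approximation interval (estimate ± 1.96 standard errors) is an assumption, since the construction of the intervals in Figures A8 and A9 is not described here. Because the tabulated values are rounded, the recomputed differences only approximate the published 2.93% and 45%.

```python
import math

def relative_diff_pct(adjusted, unadjusted):
    """Percentage change of an adjusted quantity relative to the unadjusted one."""
    return 100.0 * (adjusted - unadjusted) / unadjusted

def normal_ci95(estimate, variance):
    """Normal-approximation 95% interval (assumed construction)."""
    half_width = 1.959964 * math.sqrt(variance)
    return estimate - half_width, estimate + half_width

# Table A3, male HCPs, "Dissatisfied with life (score of 6 or less)".
no_adj_est, no_adj_var = 0.1002, 0.00016
logit_est, logit_var = 0.1031, 0.00023

est_diff = relative_diff_pct(logit_est, no_adj_est)   # published value: 2.93%
var_diff = relative_diff_pct(logit_var, no_adj_var)   # published value: 45%
low, high = normal_ci95(logit_est, logit_var)

print(f"difference in estimate: {est_diff:.2f}%")
print(f"difference in variance: {var_diff:.2f}%")
print(f"approximate 95% CI    : ({low:.4f}, {high:.4f})")
```

Read this way, a small change in the point estimate combined with a large increase in variance means that the adjustment shifts the prevalence slightly but widens the interval around it noticeably.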

References

  1. Parsa-Parsi, R.W. The revised Declaration of Geneva: A modern day physician’s pledge. J. Am. Med. Assoc. 2017, 318, 1971–1972.
  2. Hall, L.H.; Johnson, J.; Watt, I.; Tsipa, A.; O’Connor, D.B. Healthcare staff wellbeing, burnout, and patient safety: A systematic review. PLoS ONE 2016, 11, 1–12.
  3. Jadad, A.R.; Jadad Garcia, T.M. From a digital bottle: A message to ourselves in 2039 2. J. Med. Internet Res. 2019, 21, e16274.
  4. Albuquerque, J.; Tulk, S. Physician suicide. Can. Med. Assoc. J. 2019, 191, E505.
  5. Mumba, M.; Kraemer, K. Substance use disorders among nurses in medical-surgical, long-term care, and outpatient services. Medsurg. Nurs. 2019, 28, 118.
  6. Angie, C.C.; Leung, T. Substance use disorders. In The Art and Science of Physician Wellbeing: A Handbook for Physicians and Trainees; Weiss Roberts, L., Trockel, M., Eds.; Springer International Publishing: Berlin, Germany, 2019.
  7. World Medical Association. WMA Statement on Physicians Well-Being. Adopted by the 66th WMA General Assembly, Moscow, 2015. Available online: https://www.wma.net/policies-post/wma-statement-on-physicians-well-being/ (accessed on 27 November 2019).
  8. Howell, R.T.; Rodzon, K.S.; Kurai, M.; Sánchez, A.M. A validation of well-being and happiness surveys for administration via the Internet. Behav. Res. Methods 2010, 42, 775.
  9. Ekman, A.; Klint, A.; Dickman, P.W.; Adami, H.; Litton, J. Optimizing the design of web-based questionnaires—Experience from a population-based study among 50,000 women. Eur. J. Epidemiol. 2007, 22, 293. [Google Scholar] [CrossRef] [PubMed]
  10. Beaumont, J.F.; Rao, J.N.K. Pitfalls of making inferences from non-probability samples: Can data integration through probability samples provide remedies? Surv. Stat. 2021, 83, 11–22.
  11. Castro-Martín, L.; Rueda, M.; Ferri-García, R. Combining Statistical Matching and Propensity Score Adjustment for inference from non-probability surveys. J. Comput. Appl. Math. 2021, 113414.
  12. Erens, B.; Burkill, S.; Couper, M.P.; Conrad, F.; Clifton, S.; Tanton, C.; Phelps, A.; Datta, J.; Mercer, C.H.; Sonnenberg, P.; et al. Nonprobability Web surveys to measure sexual behaviors and attitudes in the general population: A comparison with a probability sample interview survey. J. Med. Internet Res. 2014, 16, e276, PMCID:PMC4275497.
  13. Rosenbaum, P.R.; Rubin, D.B. The central role of the propensity score in observational studies for causal effects. Biometrika 1983, 70, 41–55.
  14. Lee, S. Propensity score adjustment as a weighting scheme for volunteer panel web surveys. J. Off. Stat. 2006, 22, 329–349.
  15. Lee, S.; Valliant, R. Estimation for volunteer panel web surveys using propensity score adjustment and calibration adjustment. Sociol. Method Res. 2009, 37, 319–343.
  16. Copas, A.; Burkill, S.; Conrad, F.; Couper, M.P.; Erens, B. An evaluation of whether propensity score adjustment can remove the self-selection bias inherent to web panel surveys addressing sensitive health behaviours. BMC Med. Res. Methodol. 2020, 20, 1–10.
  17. Flouris, A.D.; Duffy, J. Applications of artificial intelligence systems in the analysis of epidemiological data. Eur. J. Epidemiol. 2006, 21, 167–170.
  18. Keil, A.P.; Edwards, J.K. You are smarter than you think: (super) machine learning in context. Eur. J. Epidemiol. 2018, 33, 437–440.
  19. Naimi, A.I.; Balzer, L.B. Stacked generalization: An introduction to super learning. Eur. J. Epidemiol. 2018, 33, 459–464.
  20. Bentley, R.; Baker, E.; Simons, K.; Simpson, J.A.; Blakely, T. The impact of social housing on mental health: Longitudinal analyses using marginal structural models and machine learning-generated weights. Int. J. Epidemiol. 2018, 1414–1422.
  21. Mayr, A.; Weinhold, L.; Hofner, B.; Titze, S.; Gefeller, O.; Schmid, M. The betaboost package—A software tool for modelling bounded outcome variables in potentially high-dimensional epidemiological data. Int. J. Epidemiol. 2018, 1383–1388. [Google Scholar] [CrossRef] [Green Version]
  22. Torres, J.M.; Rudolph, K.E.; Sofrygin, O.; Glymour, M.M.; Wong, R. Longitudinal associations between having an adult child migrant and depressive symptoms among older adults in the Mexican Health and Aging Study. Int. J. Epidemiol. 2018, 1432–1442.
  23. Ferri-García, R.; Rueda, M.D.M. Propensity score adjustment using machine learning classification algorithms to control selection bias in online surveys. PLoS ONE 2020, 15, e0231500.
  24. Castro-Martín, L.; Rueda, M.D.M.; Ferri-García, R. Inference from non-probability surveys with statistical matching and propensity score adjustment using modern prediction techniques. Mathematics 2020, 8, 879.
  25. WHO. Body Mass Classification; World Health Organization: Geneva, Switzerland, 2015; Available online: https://www.who.int/news-room/fact-sheets/detail/obesity-and-overweight (accessed on 1 April 2021).
  26. Cabrera-León, A.; Cantero-Braojos, M.; Garcia-Fernandez, L.; de Hoyos Guerra, J.A. Living with disabling chronic pain: Results from a face-to-face cross-sectional population-based study. BMJ Open 2018, 8, e020913.
  27. Schonlau, M.; Couper, M. Options for conducting web surveys. Stat. Sci. 2017, 32, 279–292.
  28. Quinlan, R. C4.5: Programs for Machine Learning; Morgan Kaufmann: San Francisco, CA, USA, 1993.
  29. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2018; Available online: https://www.R-project.org/ (accessed on 1 April 2021).
  30. Tillé, Y.; Matei, A. Sampling: Survey Sampling; R Package Version 2.7; R Foundation for Statistical Computing: Vienna, Austria, 2015; Available online: http://CRAN.R-project.org/package=sampling (accessed on 1 April 2021).
  31. Lumley, T. Survey: Analysis of Complex Survey Samples, R package version 3.30; R Foundation for Statistical Computing: Vienna, Austria, 2014.
  32. Kuhn, M.; Quinlan, R. C50: C5.0 Decision Trees and Rule-Based Models. R Foundation for Statistical Computing: Vienna, Austria, 2018. Available online: https://CRAN.R-project.org/package=C50 (accessed on 1 April 2021).
  33. Liaw, A.; Wiener, M. Classification and regression by random forest. R News 2002, 2, 18–22.
  34. Greenwell, B.; Boehmke, B.; Cunningham, J.; Developers, G. Package ‘gbm’. R Foundation for Statistical Computing: Vienna, Austria, 2018. Available online: https://cran.r-project.org/web/packages/gbm/gbm.pdf (accessed on 1 April 2021).
  35. Meyer, D.; Dimitriadou, E.; Hornik, K.; Weingessel, A.; Leisch, F. e1071: Misc Functions of the Department of Statistics (e1071); R Foundation for Statistical Computing: Vienna, Austria, 2018; Available online: https://cran.rproject.org/web/packages/e1071 (accessed on 1 April 2021).
  36. Kuhn, M. Caret: Classification and Regression Training. Available online: https://cran.rproject.org/web/packages/caret/index.html (accessed on 1 April 2021).
  37. Venables, W.N.; Ripley, B.D. Modern Applied Statistics with S, 4th ed.; Springer: New York, NY, USA, 2002; ISBN 0-387-95457-0.
  38. Quenouille, M.H. Notes on bias in estimation. Biometrika 1956, 43, 353–360.
  39. Tibshirani, R.; Leisch, F. Bootstrap: Functions for the Book “An Introduction to the Bootstrap”. 2017. Available online: https://cran.r-project.org/web/packages/bootstrap/index.html (accessed on 1 April 2021).
  40. Stine, R.A. Graphical interpretation of variance inflation factors. Am. Stat. 1995, 49, 53–56.
  41. Hair, J.F.; Black, W.C.; Babin, B.; Anderson, R.E. Multivariate Data Analysis, 7th ed.; Prentice Hall: Hoboken, NJ, USA, 2010.
  42. Edwards, B.; Pollock, P. Poliscidata: Datasets and Functions Featured in Pollock and Edwards, An R Companion to Essentials of Political Analysis. 2018. Available online: https://CRAN.R-project.org/package=poliscidata (accessed on 1 April 2021).
  43. Wickham, H. ggplot2—Elegant Graphics for Data Analysis, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2016; Available online: https://github.com/hadley/ggplot2-book (accessed on 1 April 2021).
  44. Lim, R.; Aarsen, K.; Van Aarsen, K.; Gray, S.; Rang, L.; Fitzpatrick, J.; Fischer, L. Emergency medicine physician burnout and wellness in Canada before COVID19: A national survey. Can. J. Emerg. Med. 2020, 22, 603–607.
  45. Khan, A.; Vinson, A.E. Physician well-being in practice. Anesth. Analg. 2020, 131, 1359–1369.
  46. López-Cabarcos, M.Á.; López-Carballeira, A.; Ferro-Soto, C. New ways of working and public healthcare professionals’ well-being: The response to face the covid-19 pandemic. Sustainability 2020, 12, 8087.
  47. Valliant, R.; Dever, J.A. Estimating propensity adjustments for volunteer web surveys. Sociol. Method Res. 2011, 40, 105–137.
  48. Ferri-García, R.; Rueda, M. Efficiency of propensity score adjustment and calibration on the estimation from non-probabilistic online surveys. SORT Stat. Oper. Res. Trans. 2018, 42, 159–182.
  49. Chen, Y.; Li, P.; Wu, C. Doubly robust inference with nonprobability survey samples. J. Am. Stat. Assoc. 2020, 115, 2011–2021.
  50. Weber, A.; Jaekel-Reinhard, A. Burnout syndrome: A disease of modern societies? Occup. Med. 2000, 50, 512–517.
  51. Campbell, S.; Delva, D. Physician do not heal thyself. Survey of personal health practices among medical residents. Can. Fam. Phys. 2003, 49, 1121–1127.
  52. Robert Koch-Institut. Gesundheitstrends Bei Erwachsenen in Deutschland Zwischen 2003 und 2012. In Daten und Fakten: Ergebnisse der Studie ‘Gesundheit in Deutschland Aktuell 2012’; Beiträge zur Gesundheitsberichterstattung des Bundes; Robert Koch-Institut, Ed.; RKI: Berlin, Germany, 2014; pp. 13–33.
  53. Domagała, A.; Bała, M.M.; Storman, D.; Peña-Sánchez, J.N.; Świerz, M.J.; Kaczmarczyk, M.; Storman, M. Factors associated with satisfaction of hospital physicians: A systematic review on european data. Int. J. Environ. Res. Public Health 2018, 15, 2546.
Figure 1. Confidence intervals at 95% for the odds ratio for each explanatory variable on self-perception of health, using logistic regression for the propensity score adjustment. Reference classes for categorical variables: no health problems, never smoked, ≥7 h of sleep, physical exercise several days a week, normal weight or underweight, other level of healthcare and degree in medicine. The x axis scale is logarithmic to facilitate interpretation of the data.
Figure 2. Confidence intervals at 95% for the odds ratio for each explanatory variable on self-perceived life satisfaction after applying logistic regression to the propensity score adjustment. The following reference classes are assumed for the qualitative variables: no health problems, never smoked, seven or more hours of sleep per night, physical exercise several days a week, normal weight or underweight, working in other level of healthcare and holding a degree in medicine. The x axis scale is logarithmic to facilitate interpretation of the data.
Table 1. Variables present in both datasets.

| Variable | Web Survey (%) | Census (%) |
|---|---|---|
| Sex | | |
| Male | 31.66 | 33.12 |
| Female | 68.34 | 66.88 |
| Age ¹ | | |
| ≤25 | 1.11 | 2.16 |
| 26–35 | 21.09 | 26.96 |
| 36–45 | 34.78 | 25.74 |
| 46–55 | 32.78 | 24.50 |
| >55 | 10.24 | 20.65 |
| Healthcare area | | |
| Specialised | 50.97 | 66.74 |
| Primary | 49.03 | 33.26 |
| Degree subject area | | |
| Medicine | 43.80 | 40.44 |
| Nursing | 44.46 | 52.86 |
| Other | 11.74 | 6.69 |
| Valid sample | n = 1797 | n = 73,465 |

¹ Age data were not available for 383 individuals (0.52%) in the census data.
Table 2. Point estimate, variance and difference from the non-adjusted case of estimators of prevalence in male healthcare professionals (HCPs) for each propensity score adjustment (PSA) (algorithms are sorted from the least to the most complex).

Algorithm used in PSA     Estimate   Variance   Diff. from no adj. (%)
                                                Estimate   Variance

Poor self-perceived health
No adjustment             0.088      0.00014
Logistic regression       0.084      0.00016    −4.34%     17%
Neural net (1 unit)       0.087      0.00023    −0.58%     62%

Dissatisfied with life (score of 6 or less)
No adjustment             0.1002     0.00016
Logistic regression       0.1031     0.00023    2.93%      45%
Neural net (1 unit)       0.1090     0.00043    8.84%      174%

Alcohol once a week
No adjustment             0.6232     0.00041
Logistic regression       0.6234     0.00053    0.02%      29%
Neural net (1 unit)       0.6004     0.00085    −3.66%     106%

<7 h of sleep
No adjustment             0.3093     0.00038
Logistic regression       0.3118     0.00049    0.82%      30%
Neural net (1 unit)       0.3395     0.00085    9.76%      126%

Disability (physical, mental or sensorial)
No adjustment             0.0645     0.00011
Logistic regression       0.0695     0.00016    7.74%      46%
Neural net (1 unit)       0.0584     0.00015    −9.42%     43%

Chronic disease
No adjustment             0.3369     0.00040
Logistic regression       0.3179     0.00048    −5.63%     19%
Neural net (1 unit)       0.3065     0.00065    −9.03%     63%

One health problem
No adjustment             0.2742     0.00036
Logistic regression       0.2630     0.00044    −4.09%     22%
Neural net (1 unit)       0.2361     0.00054    −13.90%    51%

Two or more health problems
No adjustment             0.1072     0.00017
Logistic regression       0.1037     0.00019    −3.23%     13%
Neural net (1 unit)       0.1048     0.00024    −2.22%     41%
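The "Diff. from no adj. (%)" figures in Table 2 appear to be relative changes with respect to the unadjusted row. The following is a minimal Python sketch of that arithmetic, assuming this reading; the function name `relative_diff_pct` is illustrative, and values recomputed from the printed figures can differ slightly from the published ones because the printed estimates and variances are rounded.

```python
def relative_diff_pct(adjusted: float, unadjusted: float) -> float:
    """Relative change (%) of an adjusted figure with respect to the unadjusted one."""
    return 100.0 * (adjusted - unadjusted) / unadjusted


# Rounded values as printed in Table 2 (men, "<7 h of sleep").
no_adj_estimate = 0.3093
neural_net_estimate = 0.3395

diff = relative_diff_pct(neural_net_estimate, no_adj_estimate)
print(f"{diff:.2f}%")  # prints 9.76%, the value shown in the table
```

The same function applies to the variance column, for example `relative_diff_pct(0.00085, 0.00038)` for the variance of the same estimator, although there the rounding of the printed variances has a larger effect on the result.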
Table 3. Point estimate, variance and difference from the non-adjusted case of estimators of prevalence in female HCPs for each propensity score adjustment (PSA) (algorithms are sorted from the least to the most complex).

Algorithm used in PSA     Estimate   Variance   Diff. from no adj. (%)
                                                Estimate   Variance

Poor self-perceived health
No adjustment             0.0839     0.00006
Logistic regression       0.0784     0.00007    −6.49%     15%
Neural net (1 unit)       0.0720     0.00008    −14.21%    29%

Dissatisfied with life (score of 6 or less)
No adjustment             0.1205     0.00009
Logistic regression       0.1261     0.00012    4.61%      39%
Neural net (1 unit)       0.1270     0.00019    5.36%      114%

Alcohol once a week
No adjustment             0.4223     0.00020
Logistic regression       0.4275     0.00026    1.23%      30%
Neural net (1 unit)       0.4281     0.00039    1.39%      95%

<7 h of sleep
No adjustment             0.2671     0.00016
Logistic regression       0.2670     0.00021    −0.03%     29%
Neural net (1 unit)       0.2547     0.00028    −4.64%     79%

Disability (physical, mental or sensorial)
No adjustment             0.0628     0.00005
Logistic regression       0.0602     0.00006    −4.14%     23%
Neural net (1 unit)       0.0581     0.00008    −7.46%     74%

Chronic disease
No adjustment             0.2230     0.00014
Logistic regression       0.2228     0.00019    −0.05%     29%
Neural net (1 unit)       0.2253     0.00029    1.05%      99%

One health problem
No adjustment             0.2122     0.00014
Logistic regression       0.2056     0.00017    −3.13%     22%
Neural net (1 unit)       0.2095     0.00025    −1.26%     83%

Two or more health problems
No adjustment             0.0562     0.00004
Logistic regression       0.0601     0.00006    6.95%      50%
Neural net (1 unit)       0.0568     0.00010    1.17%      123%
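To illustrate what a PSA-weighted prevalence estimator of the kind compared in Tables 2 and 3 looks like, here is a minimal Python sketch. It is not the authors' implementation: the weight formula w = (1 − π)/π is an assumption (one common choice in the PSA literature), the function names are hypothetical, and the toy data are made up. In practice the propensities π would come from a fitted classifier such as logistic regression or a neural net.

```python
from typing import List, Sequence


def psa_weights(propensities: Sequence[float]) -> List[float]:
    """Turn estimated participation propensities into weights.

    ASSUMPTION: w_i = (1 - pi_i) / pi_i. The weighting formula actually
    used in the study is not stated in this section.
    """
    if any(not 0.0 < p < 1.0 for p in propensities):
        raise ValueError("propensities must lie strictly between 0 and 1")
    return [(1.0 - p) / p for p in propensities]


def weighted_prevalence(outcomes: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted (Hajek-type) prevalence: sum(w_i * y_i) / sum(w_i)."""
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Toy data, invented for illustration only (not taken from the survey).
y = [1, 0, 0, 1]                # 1 = respondent reports the condition
pi_hat = [0.5, 0.2, 0.8, 0.5]   # hypothetical estimated propensities

unadjusted = sum(y) / len(y)
adjusted = weighted_prevalence(y, psa_weights(pi_hat))
print(unadjusted, round(adjusted, 4))
```

Respondents with a low estimated propensity to take part receive larger weights, which is why the adjusted estimate can move away from the raw sample proportion and why its variance tends to grow, as the variance columns of both tables show.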
Table 4. Prevalence of the study variables in the general population [26] and in healthcare professionals according to survey data (Andalusia).

                  General population      Healthcare professionals (weighted with propensity score adjustment using logistic regression)
Study variables   %       95% CI          %       95% CI

Poor self-perceived health (fair/bad/very bad) in the last 12 months
Total             14.8    (13.5; 16)      8.1     (6.7; 9.5)
Men               12.1    (10.6; 14)      8.4     (5.9; 10.9)
Women             17.5    (15.6; 19)      7.8     (6.2; 9.5)

Dissatisfied with life (6 or less on a scale from 1 to 10)
Total             17.8    (16.2; 20)      10.7    (9.2; 12.3)
Men               16.3    (14.6; 18)      10.3    (7.3; 13.3)
Women             19.2    (17.1; 21)      12.6    (10.5; 14.8)

Alcohol consumption (at least once in a month)
Total             49.5    (47; 52)        66.4    (63.9; 68.8)
Men               62.5    (59.9; 65)      79.8    (76.1; 83.5)
Women             37.1    (33.7; 41)      60.0    (56.9; 63.1)

Less than 7 h of sleep
Total             20      (17.8; 22)      27.9    (25.6; 30.3)
Men               17.7    (15.3; 20)      31.2    (26.8; 35.5)
Women             22.1    (19.7; 25)      26.7    (23.9; 29.5)

Presence of a chronic disease
Total             40.7    (38.6; 43)      26.6    (24.2; 28.9)
Men               35.9    (33.6; 38)      31.8    (27.5; 36.1)
Women             45.3    (42.7; 48)      22.3    (19.6; 25)

Physical, mental or sensorial disability
Total             3.54    (2.94; 4)       6.0     (4.8; 7.2)
Men               3.95    (3.16; 5)       7.0     (4.5; 9.4)
Women             3.16    (2.45; 4)       6.0     (4.5; 7.5)
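Several of the healthcare professional intervals in Table 4 appear consistent with a normal-approximation (Wald) interval built from the logistic regression estimates and variances printed in Tables 2 and 3. The sketch below shows that calculation in Python; it is an inference from the printed numbers rather than a statement of the authors' procedure, and the function name `wald_ci_pct` and the value z = 1.96 are assumptions.

```python
import math
from typing import Tuple


def wald_ci_pct(estimate: float, variance: float, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation CI for a prevalence, returned in percent.

    ASSUMPTION: interval = estimate +/- z * sqrt(variance), with z = 1.96.
    """
    half_width = z * math.sqrt(variance)
    return 100.0 * (estimate - half_width), 100.0 * (estimate + half_width)


# Men, dissatisfied with life, PSA with logistic regression (Table 2).
low, high = wald_ci_pct(0.1031, 0.00023)
print(f"({low:.1f}; {high:.1f})")  # prints (7.3; 13.3), as in Table 4

# Women, poor self-perceived health, PSA with logistic regression (Table 3).
low_w, high_w = wald_ci_pct(0.0784, 0.00007)
print(f"({low_w:.1f}; {high_w:.1f})")  # prints (6.2; 9.5), as in Table 4
```

Because the variances are printed with only one or two significant digits, not every interval in Table 4 can be expected to reproduce to the last decimal from the rounded inputs.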
Table 5. Regression models for poorer self-perceived health among men according to each weighting adjustment method. Reference classes for categorical variables: no health problems, never smoked, ≥7 h of sleep, physical exercise several days a week, normal weight or underweight, working in a specialised field of healthcare and degree in medicine (n = 558 observations, Nagelkerke R² = 0.281).

                              No PSA adjustment             PSA with logistic regression  PSA with neural net (1 unit)
Predictors                    Odds ratio  95% CI            Odds ratio  95% CI            Odds ratio  95% CI
1|2 intercept                 7.44        2.71–20.4         9.43        3.26–27.3         10.15       2.90–35.5
2|3 intercept                 279.11      249–313           331.25      289–380           314.56      264–375
3|4 intercept                 2411.0      1733–3354         2979.3      2078–4273         3151.0      2100–4728
4|5 intercept                 5792.5      2086–16,085       7636.5      2635–22,132       6489.2      1890–22,276
One health problem            3.23        2.14–4.86         2.82        1.78–4.45         2.59        1.53–4.38
Two or more health problems   8.31        4.11–16.8         7.24        2.99–17.6         7.18        1.79–28.9
Daily smoker                  1.30        0.66–2.58         1.46        0.72–2.96         1.43        0.67–3.05
Non-daily smoker              0.45        0.18–1.12         0.39        0.16–1.00         0.19        0.07–0.50
Ex-smoker                     0.88        0.59–1.31         0.85        0.54–1.33         0.60        0.34–1.05
<7 h of sleep                 1.78        1.23–2.59         1.83        1.19–2.81         1.95        1.17–3.26
No physical activity at all   2.94        1.36–6.35         2.76        1.02–7.43         1.98        0.32–12.3
Occasional physical activity  1.60        1.03–2.46         1.65        1.01–2.69         1.78        1–3.17
Regular physical activity     1.36        0.84–2.19         1.36        0.81–2.28         1.45        0.78–2.71
Obesity                       1.39        0.78–2.49         1.40        0.71–2.77         1.78        0.81–3.95
Overweight                    1.50        1.01–2.22         1.50        0.96–2.36         1.60        0.93–2.75
Age (5 years)                 1.14        1.02–1.27         1.16        1.02–1.31         1.17        1.00–1.36
Primary care                  1.16        0.77–1.75         1.24        0.78–1.98         1.37        0.77–2.44
Nursing degree                1.91        1.33–2.74         1.85        1.25–2.76         1.86        1.20–2.89
Other degree                  0.92        0.46–1.87         1.07        0.51–2.27         1.17        0.51–2.67
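The odds ratios and intervals in these regression tables can be mapped back to the log-odds scale on which an ordinal regression is fitted. The Python sketch below recovers a coefficient and an approximate standard error from a printed odds ratio and its 95% CI. It assumes a Wald interval that is symmetric on the log scale with z = 1.96, which the printed intervals appear to follow but which this section does not state; the function name `log_odds_and_se` is illustrative.

```python
import math
from typing import Tuple


def log_odds_and_se(odds_ratio: float, ci_low: float, ci_high: float,
                    z: float = 1.96) -> Tuple[float, float]:
    """Recover the coefficient and its standard error from an odds ratio and CI.

    ASSUMPTION: the CI was computed as exp(beta +/- z * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    return beta, se


# "One health problem", men, no PSA adjustment (Table 5): OR 3.23, CI 2.14-4.86.
beta, se = log_odds_and_se(3.23, 2.14, 4.86)
print(round(beta, 3), round(se, 3))

# Consistency check: under the assumption above, the odds ratio should sit
# close to the geometric mean of the interval limits.
geometric_mean = math.sqrt(2.14 * 4.86)
print(round(geometric_mean, 2))
```

A large gap between the printed odds ratio and the geometric mean of its limits would suggest either a different interval method or a transcription error in the table.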
Table 6. Regression models for poorer self-perceived health among women according to each weighting adjustment method. Reference classes for categorical variables: no health problems, never smoked, ≥7 h of sleep, physical exercise several days a week, normal weight or underweight, working in other level of healthcare and degree in medicine (n = 1211 observations, Nagelkerke R² = 0.23).

                              No PSA adjustment             PSA with logistic regression  PSA with neural net (1 unit)
Predictors                    Odds ratio  95% CI            Odds ratio  95% CI            Odds ratio  95% CI
1|2 intercept                 6.96        3.65–13.2         6.7         3.24–13.8         7.2         2.97–17.5
2|3 intercept                 252.80      234–273           242.22      222–264           253.71      230–280
3|4 intercept                 2705.6      2141–3419         2481.2      1886–3264         2252.8      1643–3088
4|5 intercept                 6655.2      3093–14,319       5758.5      2337–14,191       4897.8      1816–13,210
One health problem            2.27        1.64–3.14         1.90        1.33–2.72         1.76        1.15–2.70
Two or more health problems   10.81       6.22–18.8         10.25       5.32–19.8         10.15       4.91–21.0
Daily smoker                  1.54        1.07–2.23         1.64        1.10–2.45         1.60        1.02–2.51
Non-daily smoker              1.56        0.98–2.51         1.59        0.96–2.64         1.46        0.83–2.59
Ex-smoker                     0.93        0.71–1.22         0.96        0.71–1.29         0.99        0.70–1.41
<7 h of sleep                 1.27        0.97–1.65         1.46        1.09–1.97         1.53        1.10–2.13
No physical activity at all   1.94        1.25–3.00         1.54        0.93–2.55         1.48        0.82–2.65
Occasional physical activity  1.50        1.13–2.00         1.43        1.04–1.97         1.47        1.02–2.11
Regular physical activity     1.17        0.842–1.64        1.13        0.78–1.65         1.10        0.72–1.69
Obesity                       2.14        1.23–3.72         2.10        1.10–4.02         1.84        0.81–4.20
Overweight                    1.39        1.04–1.85         1.27        0.91–1.77         1.16        0.81–1.67
Age (5 years)                 1.19        1.11–1.28         1.18        1.09–1.28         1.19        1.07–1.32
Primary care                  1.24        0.95–1.60         1.21        0.92–1.59         1.29        0.98–1.70
Nursing degree                1.67        1.29–2.16         1.78        1.33–2.38         1.87        1.36–2.56
Other degree                  1.93        1.30–2.88         1.99        1.31–3.03         2.20        1.45–3.33
Table 7. Regression models for poorer self-perceived life satisfaction among men according to each weighting adjustment method. Reference classes for categorical variables: no health problems, never smoked, ≥7 h of sleep, physical exercise several days a week, normal weight or underweight, other level of healthcare and degree in medicine (n = 558 observations, Nagelkerke R² = 0.266).

                              No PSA adjustment             PSA with logistic regression  PSA with neural net (1 unit)
Predictors                    Odds ratio  95% CI            Odds ratio  95% CI            Odds ratio  95% CI
1|2 intercept                 0.35        0.14–0.87         0.21        0.08–0.57         0.24        0.07–0.80
2|3 intercept                 3.05        2.60–3.58         2.43        2.05–2.89         3.25        2.69–3.93
3|4 intercept                 18.54       16.4–20.9         15.01       13.1–17.1         20.24       17.4–23.6
4|5 intercept                 91.95       77.6–109          75.94       63.1–91.4         100.31      80.4–125
5|6 intercept                 227.57      162–320           180.36      121–269           255.27      156–418
6|7 intercept                 509.65      296–877           442.63      245–799           577.23      300–1111
7|8 intercept                 1597.5      776–3290          1938.1      975–3852          2483.3      1187–5196
8|9 intercept                 1597.6      612–4175          2281.3      838–6212          2919.1      1047–8136
9|10 intercept                3223.0      790–13,165        4045.2      853–19,181        4846.0      993–23,640
One health problem            2.60        1.79–3.77         2.58        1.70–3.91         2.65        1.67–4.20
Two or more health problems   3.98        2.13–7.44         4.44        2.18–9.04         3.65        1.38–9.70
Daily smoker                  1.53        0.84–2.76         1.43        0.74–2.75         1.41        0.69–2.89
Non-daily smoker              0.91        0.41–2.03         0.94        0.43–2.07         0.74        0.29–1.89
Ex-smoker                     0.81        0.57–1.16         0.82        0.55–1.21         0.65        0.40–1.07
<7 h of sleep                 1.51        1.07–2.14         1.69        1.13–2.53         1.87        1.15–3.05
No physical activity at all   5.10        2.73–9.51         4.39        2.09–9.26         3.69        1.26–10.8
Occasional physical activity  1.95        1.29–2.96         2.03        1.27–3.23         2.09        1.21–3.61
Regular physical activity     1.94        1.30–2.90         1.84        1.18–2.89         1.94        1.12–3.35
Obesity                       0.99        0.58–1.70         0.92        0.50–1.70         1.03        0.52–2.02
Overweight                    1.20        0.83–1.73         1.02        0.69–1.52         1.07        0.68–1.68
Age (5 years)                 1.08        0.98–1.19         1.06        0.96–1.18         1.11        0.98–1.27
Primary care                  1.43        1.00–2.05         1.54        1.03–2.30         1.42        0.89–2.25
Nursing degree                1.00        0.71–1.41         0.84        0.57–1.25         0.75        0.49–1.15
Other degree                  0.90        0.51–1.60         0.77        0.42–1.43         0.82        0.42–1.58
Table 8. Regression models for poorer self-perceived life satisfaction among women according to each weighting adjustment method. Reference classes for categorical variables: no health problems, never smoked, ≥7 h of sleep, physical exercise several days a week, normal weight or underweight, other level of healthcare and degree in medicine (n = 1211 observations, Nagelkerke R² = 0.159).

                              No PSA adjustment             PSA with logistic regression  PSA with neural net (1 unit)
Predictors                    Odds ratio  95% CI            Odds ratio  95% CI            Odds ratio  95% CI
1|2 intercept                 0.23        0.12–0.42         0.24        0.12–0.49         0.26        0.11–0.59
2|3 intercept                 1.48        1.32–1.66         1.52        1.33–1.72         1.58        1.37–1.83
3|4 intercept                 7.20        6.62–7.84         7.55        6.86–8.31         7.93        7.08–8.88
4|5 intercept                 32.10       28.8–35.8         35.79       31.7–40.5         40.47       35.0–46.8
5|6 intercept                 79.91       64.3–99.3         81.36       63.5–104          88.66       66.7–118
6|7 intercept                 177.46      127–248           185.18      127–269           194.14      127–298
7|8 intercept                 437.33      273–700           441.97      259–755           411.03      223–758
8|9 intercept                 820.65      365–1847          938.97      396–2224          949.42      385–2344
9|10 intercept                2901.2      1147–7338         2710.6      892–8233          2447.5      690–8684
One health problem            1.57        1.19–2.07         1.50        1.12–2.01         1.55        1.13–2.13
Two or more health problems   3.71        2.33–5.92         3.34        2.03–5.51         3.74        2.20–6.33
Daily smoker                  1.83        1.25–2.66         1.78        1.18–2.70         1.57        0.99–2.49
Non-daily smoker              1.92        1.13–3.25         1.90        1.01–3.54         1.40        0.71–2.77
Ex-smoker                     1.21        0.94–1.54         1.11        0.84–1.46         1.07        0.79–1.47
<7 h of sleep                 1.80        1.42–2.30         1.72        1.31–2.25         1.86        1.34–2.58
No physical activity at all   2.47        1.65–3.70         2.56        1.62–4.03         2.17        1.33–3.56
Occasional physical activity  1.57        1.21–2.04         1.49        1.12–1.99         1.28        0.93–1.77
Regular physical activity     1.10        0.82–1.50         1.08        0.77–1.50         1.05        0.71–1.55
Obesity                       1.54        0.94–2.51         1.48        0.85–2.60         1.43        0.84–2.43
Overweight                    1.11        0.87–1.43         1.07        0.81–1.42         1.13        0.82–1.57
Age (5 years)                 1.06        0.99–1.13         1.08        1.00–1.16         1.10        1.00–1.21
Primary care                  1.10        0.87–1.38         1.11        0.87–1.42         1.04        0.81–1.34
Nursing degree                0.88        0.70–1.11         0.88        0.68–1.15         0.87        0.65–1.16
Other degree                  1.14        0.78–1.66         1.08        0.73–1.60         1.08        0.71–1.63
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Ferri-García, R.; Rueda, M.d.M.; Cabrera-León, A. Self-Perceived Health, Life Satisfaction and Related Factors among Healthcare Professionals and the General Population: Analysis of an Online Survey, with Propensity Score Adjustment. Mathematics 2021, 9, 791. https://doi.org/10.3390/math9070791