1. Introduction
In 2009, one in four Australian adults was obese, and a further one-third were overweight [1]. Over the last two decades, the Australian population has shifted steadily towards the higher end of the body mass index (BMI) distribution, driven mainly by weight gain rather than by changes in height. The BMI, a simple index of weight for height, is commonly used to classify people as overweight or obese. It is defined as weight in kilograms divided by the square of height in metres (kg/m²) [2]. An Australian study estimates that excess body weight is costly, with an economic cost, including direct health costs, productivity losses, and carer costs, of AUD 60 billion per year [3]. The increasing prevalence of obesity is linked to the onset of chronic diseases including type 2 diabetes, hypertension, coronary heart disease, elevated cholesterol levels, depression, and musculoskeletal disorders [4,5,6,7]. Other studies have shown that obesity is strongly associated with poorer health-related quality of life, in both the physical and the mental health domains [8]. Obesity has also been shown to reduce workforce participation and to increase the risk of occupational injury [9,10,11]. This has created a growing demand for research to better understand both the factors that determine obesity [12] and the socio-economic impact of being overweight.
This paper explores the factors that influence the incidence of obesity among Australians using a random effects generalized ordered probit model. It uses data from the Household, Income and Labour Dynamics in Australia (HILDA) Survey, an annual household-based panel survey. HILDA is a nationally representative survey of Australian households that aims to provide longitudinal data on households and their members; the same households and members are interviewed every year. It began in 2001 with a national probability sample of 7682 households, comprising 13,969 interviewed persons aged 15 and over and 4784 children under 15. Year-to-year sample retention is approximately 95%, and the sample also includes new households formed when members split off, for example when children leave home or couples separate [13].
A component of the survey is a self-completion questionnaire (SCQ) given to all household members aged 15 and over. Since 2006, the SCQ has included additional questions on height and weight, which respondents self-report [14]. This allows a BMI score to be calculated for each person, from which a categorical variable based on World Health Organization (WHO) guidelines can be derived. BMI is recognized as an imperfect measure of obesity that does not take into account sex, age, fat distribution, or muscle mass [15]. However, it is the measure available for HILDA respondents, and, although it is an indirect measure of body fat, it has been found to correlate well with direct measures such as dual-energy X-ray absorptiometry [15].
This paper examines the influence of economic and social factors on the probability of an individual being in the WHO overweight and obese categories. The conditioning variables are demographic, economic, social, and lifestyle related, and many have been used in other studies [16,17,18]. A potential issue is that such studies may ignore a predisposition towards overweight/obesity, which could confound the relationships with the candidate covariates. One way to capture this predisposition is through a latent class model [16,19]. Alternatively, it could be argued that any predisposition to obesity may be captured by variables identifying personality traits. Sullivan et al. argue that personality traits can influence diet and may therefore be important in determining the propensity to obesity [20]. This study examines the influence of personality traits on the incidence of obesity. HILDA collected personality data in waves five (2005) and nine (2009), from which factor scores for the five factor model (FFM) personality traits were computed. Personality traits have been found to have a multifaceted impact on body weight [21]. The five traits are emotional stability (the inverse of neuroticism), extraversion, openness to experience, agreeableness, and conscientiousness [22]. Individuals with low emotional stability, and thus high neuroticism, often experience feelings of anxiety, hostility, worry, and depression [21,23,24]. Such individuals have been found to weigh more and to be at greater risk of obesity [24,25], with an odds ratio of 1.02. Openness is intellectual curiosity, the need for variety, and willingness to explore new things [21,26]. In the meta-analysis of Jokela et al., higher openness to experience was associated with slightly lower odds of obesity, but this association disappeared after adjustment for education (odds ratio 0.95) [27]. Extraversion is sensitivity to positive emotions and social assertiveness [28]. In European samples, higher extraversion was associated with obesity in men (odds ratio 1.09) but not in women, and no such association was found in American samples [27]. The agreeableness trait describes individuals who demonstrate trust, altruism, and generosity [28]. Jokela et al. found no association between agreeableness and obesity (odds ratio 1.02) [27]. Conscientious individuals prefer planned rather than spontaneous behaviour [28] and have been found to maintain a healthy weight through healthy eating habits [18,21,27]. Jokela et al. found that high conscientiousness was associated with lower obesity risk, with an odds ratio of 0.84. They also found that, among initially obese individuals, higher conscientiousness was associated with a greater likelihood of reverting to being non-obese after 5.4 years (odds ratio 1.09) [27]. These five personality scores are included in this paper as conditioning variables in the probability model.
The next section deals with the empirical model and how it is used to test for those factors that influence obesity. A third section gives a description of the data used in the paper, along with summary statistics. This is followed by an analysis of the estimates and tests, and the paper ends with some concluding remarks.
2. The Econometric Model
Obesity is usually described in terms of an ordered response model, in which the underlying latent variable is the BMI score [16,19,29].
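For the ordered responses, the outcome for a categorical response variable is defined as:

$$y_i \in \{1, 2, \ldots, J\} \tag{1}$$

where the $J$ outcomes have a natural integer ordering. Further, a latent variable (in this case the BMI score), which underlies the response variable, is defined as [30] (p. 655):

$$y_i^* = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i \tag{2}$$

where the variables in vector $\mathbf{x}$ are seen to govern the ordered responses of individuals and, for identification, do not contain a constant. The observed responses can be associated with the underlying latent variable (in this case BMI):

$$y_i = j \quad \text{if} \quad \gamma_{j-1} < y_i^* \le \gamma_j, \qquad j = 1, \ldots, J \tag{3}$$

where the $\gamma_j$ are thresholds or cut points to be estimated.

The response probability is given by [31] (p. 520):

$$\Pr(y_i \le j \mid \mathbf{x}_i) = F(\gamma_j - \mathbf{x}_i'\boldsymbol{\beta}), \qquad j = 1, \ldots, J \tag{4}$$

with the restrictions that $\gamma_0 = -\infty$ and $\gamma_J = +\infty$. The function $F$ is an appropriate cumulative distribution function for $\varepsilon_i$.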
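As an illustration of Equation (4), the sketch below computes ordered probit category probabilities, taking $F$ to be the standard normal distribution function. The function name and the coefficient and cut-point values are hypothetical and chosen only for illustration; they are not estimates from this paper.

```python
import math
from typing import List, Sequence


def norm_cdf(z: float) -> float:
    """Standard normal CDF; math.erf maps +/-inf to +/-1, giving 1.0 and 0.0."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(x: Sequence[float], beta: Sequence[float],
                         cuts: Sequence[float]) -> List[float]:
    """P(y = j | x) for j = 1..J under Equation (4) with F = standard normal CDF.

    `cuts` holds the J - 1 finite thresholds gamma_1 < ... < gamma_{J-1};
    x contains no constant, as required for identification.
    """
    index = sum(xk * bk for xk, bk in zip(x, beta))  # x'beta
    gammas = [-math.inf, *cuts, math.inf]            # gamma_0 = -inf, gamma_J = +inf
    cum = [norm_cdf(g - index) for g in gammas]      # P(y <= j) = F(gamma_j - x'beta)
    return [cum[j] - cum[j - 1] for j in range(1, len(gammas))]


if __name__ == "__main__":
    # Hypothetical values: two covariates, J = 3 categories
    probs = ordered_probit_probs(x=[1.0, 0.5], beta=[0.3, -0.2], cuts=[0.0, 1.0])
    print([round(p, 4) for p in probs])
```

Because the cut points are common to all individuals, a change in $\mathbf{x}$ shifts every cumulative probability through the same index $\mathbf{x}'\boldsymbol{\beta}$; this is the parallel lines property that the generalized model relaxes.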
Greene et al. cogently argue that, when modelling BMI category outcomes where those categories are rigidly bounded by WHO guidelines, it might be more appropriate to model ordered responses with flexible boundaries, allowing for sources of individual heterogeneity in the relationship between well-being and BMI category [16,19].
In the generalized ordered response model, the thresholds are not fixed (parallel) but are allowed to vary across individuals. Individual heterogeneity is captured by allowing the thresholds to vary with the variables that condition the category probabilities [32]. That is:

$$\gamma_{ij} = \gamma_j + \mathbf{x}_i'\boldsymbol{\delta}_j \tag{5}$$

Substitution of Equation (5) into the cumulative distribution of Equation (4) gives [33]:

$$\Pr(y_i \le j \mid \mathbf{x}_i) = F(\gamma_j + \mathbf{x}_i'\boldsymbol{\delta}_j - \mathbf{x}_i'\boldsymbol{\beta}) = F(\gamma_j - \mathbf{x}_i'\boldsymbol{\beta}_j) \tag{6}$$

where $\boldsymbol{\beta}_j = \boldsymbol{\beta} - \boldsymbol{\delta}_j$, leading to a separate set of coefficients for each category. The generalized model of Equation (6) is estimated as a series of $J - 1$ binary response models [34], proceeding sequentially from the first model, which contrasts category 1 with categories 2, ..., $J$, to the last model, which contrasts categories 1, ..., $J - 1$ with category $J$.
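For the three-category case used in this paper, Equation (6) amounts to two probit equations: eq1 (normal versus overweight/obese) and eq2 (normal/overweight versus obese), each with its own coefficient vector and cut point. The sketch below shows how category probabilities follow from these binary equations. It is a minimal illustration under assumed values, with hypothetical function names, and is not the REGOPROBIT2 implementation.

```python
import math
from typing import List, Sequence


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gen_ordered_probit_probs(x: Sequence[float],
                             betas: Sequence[Sequence[float]],
                             cuts: Sequence[float]) -> List[float]:
    """Category probabilities under Equation (6) with F = standard normal CDF.

    betas[j] and cuts[j] belong to binary equation j + 1 (eq1, eq2, ...), which
    contrasts categories 1..j+1 with categories j+2..J.
    """
    cum = [0.0]
    for beta_j, gamma_j in zip(betas, cuts):
        index_j = sum(xk * bk for xk, bk in zip(x, beta_j))  # x'beta_j
        cum.append(norm_cdf(gamma_j - index_j))             # P(y <= j) = F(gamma_j - x'beta_j)
    cum.append(1.0)
    # With category-specific betas nothing forces cum to be increasing,
    # so a difference can turn negative for extreme values of x.
    return [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def prob_higher(x: Sequence[float], beta_j: Sequence[float], gamma_j: float) -> float:
    """One binary probit equation: P(y > j | x) = F(x'beta_j - gamma_j)."""
    return norm_cdf(sum(xk * bk for xk, bk in zip(x, beta_j)) - gamma_j)


if __name__ == "__main__":
    # Hypothetical eq1 and eq2 coefficients for two covariates
    x = [1.0, 0.5]
    betas = [[0.3, -0.2], [0.1, 0.4]]
    cuts = [0.0, 1.0]
    print(gen_ordered_probit_probs(x, betas, cuts))  # normal, overweight, obese
    print(prob_higher(x, betas[0], cuts[0]))         # eq1: overweight or obese
```

If every row of `betas` is identical, the model collapses to the standard (parallel lines) ordered probit; the sign convention of `prob_higher` means a positive coefficient in equation $j$ raises the probability of being above category $j$.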
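In panel random effects, individual heterogeneity is also introduced by augmenting Equation (6) with the variable $\alpha_i$, which has mean zero and constant variance $\sigma_\alpha^2$. That is, the latent variable is specified as [30] (p. 662):

$$y_{it}^* = \mathbf{x}_{it}'\boldsymbol{\beta} + \alpha_i + \varepsilon_{it} \tag{7}$$

leading to the cumulative distribution function [35]:

$$\Pr(y_{it} \le j \mid \mathbf{x}_{it}, \alpha_i) = F(\gamma_j - \mathbf{x}_{it}'\boldsymbol{\beta}_j - \alpha_i) \tag{8}$$

where individual heterogeneity is captured by both the non-parallel cut-offs and the panel random effects component.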
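Because $\alpha_i$ in Equation (8) is unobserved, each individual's likelihood contribution must integrate it out over all of that individual's periods. Assuming $\alpha_i \sim N(0, \sigma_\alpha^2)$ and a probit $F$, the sketch below approximates that integral with Gauss-Hermite quadrature using NumPy's `hermgauss`. This is our illustration of the general approach; the function name and the number of nodes are assumptions, and the quadrature settings used by REGOPROBIT2 are not described in this paper.

```python
import math
from typing import Sequence

import numpy as np


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def re_likelihood_i(y: Sequence[int], X: Sequence[Sequence[float]],
                    betas: Sequence[Sequence[float]], cuts: Sequence[float],
                    sigma_alpha: float, n_nodes: int = 12) -> float:
    """Likelihood of individual i's observed categories y_i1..y_iT (coded 1..J).

    Assumes alpha_i ~ N(0, sigma_alpha^2) and a probit F. The integral over
    alpha_i is approximated by Gauss-Hermite quadrature:
        E[g(alpha)] ~= sum_k w_k / sqrt(pi) * g(sqrt(2) * sigma_alpha * z_k).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    total = 0.0
    for z_k, w_k in zip(nodes, weights):
        alpha = math.sqrt(2.0) * sigma_alpha * float(z_k)
        prod = 1.0  # product over periods, conditional on alpha_i = alpha
        for y_it, x_it in zip(y, X):
            # Equation (8): P(y_it <= j | x_it, alpha) = F(gamma_j - x_it'beta_j - alpha)
            cum = [0.0]
            cum += [norm_cdf(g - float(np.dot(x_it, b)) - alpha)
                    for b, g in zip(betas, cuts)]
            cum.append(1.0)
            prod *= cum[y_it] - cum[y_it - 1]
        total += float(w_k) / math.sqrt(math.pi) * prod
    return total
```

The product over periods inside the quadrature sum is what links an individual's observations: a large $\sigma_\alpha$ makes repeated observations in the same category much more likely than independent draws would.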
Conditional on $\alpha_i$ in Equation (7), we estimate a random effects generalized ordered probit for a three-category variable based on individual BMI scores for the last five available years of the HILDA survey (2006 to 2010). The three ordered categories, based on the WHO guidelines, are normal, overweight, and obese. The generalized model nests alternative models obtained by restricting parameters to be identical across categories. Clearly, the most restrictive of these is the standard ordered probit, in which all parameters are identical across categories. We adopt the sequential procedure advocated by Pfarr et al., following Williams, of testing down from the generalized model [34,35]. In the first round, a Wald test is performed on the restriction that all parameters are the same across categories. The model is then re-estimated with the least significant parameter from the first round restricted to be identical across all categories, and the Wald test is applied again. This cycle of estimation, testing, and restriction continues until only parameters that differ significantly across categories remain unrestricted. The model was estimated using REGOPROBIT2 (Statistical Software Components, Chestnut Hill, MA, USA) [33].
3. Data and Descriptive Statistics
All data come from the HILDA panel. Note that some of the data were imputed following non-response by a panel member or a household's failure to provide some information. Imputation of items such as income uses responses from similar individuals or households [13]. For this paper, we further restricted the data by dropping observations with missing responses on key variables, to obtain a balanced panel.
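A minimal pandas sketch of that balancing step, assuming a long-format data frame with one row per person and wave. The column names are hypothetical placeholders rather than HILDA variable names, and the paper does not state whether balancing was applied in exactly this way.

```python
from typing import Sequence

import pandas as pd


def balanced_panel(df: pd.DataFrame, id_col: str, wave_col: str,
                   key_vars: Sequence[str], waves: Sequence[int]) -> pd.DataFrame:
    """Keep only individuals with complete key variables in every listed wave."""
    sub = df[df[wave_col].isin(waves)].dropna(subset=list(key_vars))
    n_waves = sub.groupby(id_col)[wave_col].nunique()
    keep = n_waves.index[n_waves == len(waves)]
    return sub[sub[id_col].isin(keep)].reset_index(drop=True)


if __name__ == "__main__":
    toy = pd.DataFrame({"id": [1, 1, 2, 2],
                        "wave": [2006, 2007, 2006, 2007],
                        "bmi": [22.0, 23.0, 27.0, None]})
    print(balanced_panel(toy, "id", "wave", ["bmi"], [2006, 2007]))
```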
Table 1 presents the list of variables included in the model. According to the WHO international classification, the adult BMI cut-off points are: less than 18.5 for underweight, 18.5 to less than 25 for normal weight, 25 to less than 30 for overweight, and 30 or more for obese. The underweight category is excluded from the dependent variable ordobese, since our analysis focuses on the overweight and obese categories relative to the normal category. This paper uses BMI categories rather than numerical BMI values. The use of BMI categories has been criticized because BMI is calculated from self-reported height and weight, which may be misstated [17,18,36]. In addition, using BMI categories rather than numerical values results in a loss of information. We acknowledge these criticisms but note, following Greene et al., that the BMI category is nevertheless likely to be correct [16,18]. The correlation between self-reported and measured weight and height has been found to be very high (greater than 0.9) [37]. In addition, policy-makers are interested in BMI categories and in individuals' movements between them, rather than in marginal changes in BMI [16].
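The derivation of ordobese from self-reported height and weight can be sketched as follows. Two details are assumptions: height is taken to be reported in centimetres, and the codes 1 = normal, 2 = overweight, 3 = obese are a hypothetical coding of ordobese. The WHO classes are treated as half-open intervals.

```python
from typing import Optional

# Hypothetical coding of ordobese: 1 = normal, 2 = overweight, 3 = obese
ORDOBESE_CODES = {"normal": 1, "overweight": 2, "obese": 3}


def bmi(weight_kg: float, height_cm: float) -> float:
    """BMI = weight (kg) / height (m) squared; height assumed self-reported in cm."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def who_category(bmi_value: float) -> str:
    """WHO adult classes as half-open intervals: [18.5, 25), [25, 30), [30, inf)."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"


def ordobese(weight_kg: float, height_cm: float) -> Optional[int]:
    """Ordered outcome; None for underweight, which is dropped from the sample."""
    return ORDOBESE_CODES.get(who_category(bmi(weight_kg, height_cm)))


if __name__ == "__main__":
    for w in (50, 70, 80, 95):
        print(w, round(bmi(w, 175), 1), ordobese(w, 175))
```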
Age is expected to have a quadratic association with BMI [38]. The general increase in BMI with age may be attributed to age-related losses in lean body mass, which lower energy expenditure. Later in life, however, BMI is expected to decline with age for biological reasons. To capture this pattern, both age and age squared are included in the model.
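The quadratic specification implies a turning point in the latent index. Writing $\beta_{1j}$ and $\beta_{2j}$ for the coefficients on age and age2 in equation $j$ (notation introduced here for illustration), assuming age2 is the unscaled square of age in years, and holding the other covariates fixed:

$$\frac{\partial\, \mathbf{x}_{it}'\boldsymbol{\beta}_j}{\partial\, \text{age}} = \beta_{1j} + 2\beta_{2j}\,\text{age} = 0 \quad\Longrightarrow\quad \text{age}_j^{*} = -\frac{\beta_{1j}}{2\beta_{2j}}$$

If $\beta_{1j} > 0$ and $\beta_{2j} < 0$, the index, and hence the probability of being above category $j$ in equation $j$, rises with age up to $\text{age}_j^{*}$ and falls thereafter. Substituting the estimates for age and age2 from each equation gives an equation-specific turning point.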
Many studies have shown a significant relationship between education and obesity [39,40]: those with higher levels of education have a significantly lower risk of obesity. To capture this relationship, the model includes educ, the participant's self-reported highest level of education attained, collapsed into four categories as outlined in Table 1.
The respondent's employment status, empstatus, was recoded as a binary variable, scoring 1 for employed and 0 for unemployed or not in the labour force. Income is captured by the variable lndinc/p, the logarithm of the ratio of annual household disposable income to the number of household members included in the survey at the time of data collection. The covariate losat, satisfaction with life, was collapsed from ten to three categories, in particular to group together those who rated themselves as dissatisfied with their life.
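The construction of lndinc/p is simple enough to state directly. The function name is hypothetical, and because the paper does not say how zero or negative disposable incomes were handled, this sketch simply rejects them.

```python
import math


def lndinc_per_person(hh_disposable_income: float, n_persons: int) -> float:
    """log(annual household disposable income / household members in the survey).

    The log is undefined for zero or negative income; how such cases were
    treated is not stated, so this sketch raises an error for them.
    """
    if n_persons <= 0:
        raise ValueError("household must have at least one person")
    per_person = hh_disposable_income / n_persons
    if per_person <= 0:
        raise ValueError("log requires positive income per person")
    return math.log(per_person)


if __name__ == "__main__":
    # Hypothetical household: AUD 60,000 disposable income, two members
    print(round(lndinc_per_person(60000.0, 2), 4))
```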
The covariates area (remoteness area) and advantage (the SEIFA 2001 decile of the index of relative socio-economic advantage/disadvantage) are included to capture a likely association between living in a remote area, or having relatively low socio-economic status, and a higher risk of being overweight. Spending on alcohol (alcohol) and on food prepared outside the home (meals) is generally associated with increased obesity. Potential differences across household types are captured by the variables marstatus and hhtype, the latter designed to control for single parents.
The five FFM personality traits are included to capture respondents' personalities. The obesity panel runs from 2006 to 2010 inclusive, whereas personality data were collected only in 2005 and 2009. Because personality scores are relatively stable [41], the 2005 scores were applied to 2006 and 2007, and the 2009 scores to 2008 and 2010, to complete the panel. This approach follows Cobb-Clark and Schurer, whose study of the FFM in HILDA showed that personality traits are stable among working-age adults [41]. The FFM is well established in the psychology literature [42] but is used less often in econometric work [43]. HILDA respondents complete a version of the Big Five personality inventory based on Saucier (1994), which uses the trait-descriptive adjective approach [44]. Respondents are asked how well each of 36 adjectives describes them, and 28 of these are used to derive scales for the five traits. Each item is scored from 1 to 7, with a higher score indicating that the adjective describes the individual better, and each trait score is the average of its item scores [22,41,42]. The five personality traits are opene, openness to experience; consc, conscientiousness; extrv, extraversion; agree, agreeableness; and emote, emotional stability. Wooden showed that the internal reliability coefficients (Cronbach's alpha) for these traits are satisfactorily high in HILDA (greater than 0.7) and identical in waves five and nine of the survey [22]. Wooden also examined how much these traits changed across age groups between the two survey years (2005 and 2009), and concluded that, for those aged 25 to 64, most individuals' personality scores change little over time [22]. Some work has linked personality traits and obesity [20,45,46]. These studies use personality measures other than the FFM, namely the Karolinska Scales of Personality (KSP) and, in Sullivan et al., the Temperament and Character Inventory (TCI). Fortunately, the TCI can be linked to the FFM [47].
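The scoring and carry-forward rules described above can be sketched as follows. Function names are hypothetical, and any reverse-scoring of individual items is not described in the text and is therefore omitted here.

```python
from typing import Dict, Sequence

# Survey year whose FFM scores are used for each panel year, as described above
SOURCE_YEAR = {2006: 2005, 2007: 2005, 2008: 2009, 2009: 2009, 2010: 2009}


def trait_score(item_scores: Sequence[int]) -> float:
    """Average of a trait's adjective items, each scored 1 (poor fit) to 7 (good fit).

    Reverse-scoring of items, if any, is not described in the text and is omitted.
    """
    if not item_scores or any(not 1 <= s <= 7 for s in item_scores):
        raise ValueError("each item must be scored from 1 to 7")
    return sum(item_scores) / len(item_scores)


def score_for_panel_year(scores_by_wave_year: Dict[int, float], panel_year: int) -> float:
    """Carry the 2005 or 2009 score into the 2006-2010 obesity panel."""
    return scores_by_wave_year[SOURCE_YEAR[panel_year]]


if __name__ == "__main__":
    consc = {2005: trait_score([5, 6, 6, 4]), 2009: trait_score([6, 6, 5, 5])}
    print({year: score_for_panel_year(consc, year) for year in SOURCE_YEAR})
```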
The means and standard deviations of the continuous variables in Table 1, age, lndinc/p, alcohol, and meals, have their usual meaning. The personality scores are ordinal but take 36 different values between 1 and 7 inclusive; their reported means and standard deviations can therefore be interpreted in the usual way. The means of the binary variables, gender, empstatus, hhtype, and marstatus, give the proportion of the estimation sample scoring 1. The means and standard deviations of the remaining variables, which are all ordered categorical, should be interpreted with caution. The relative frequency distributions of these categorical variables are given in Table 2.
In Table 2, advantage is the SEIFA index, which is simply the decile of socio-economic advantage from the lowest to the highest ten percent. The fact that the relative frequencies are all close to ten percent indicates that the HILDA sample is broadly representative. The final column gives the distribution of scores for the BMI categories over the five panel years. Table 3 complements this column by giving the transition probabilities between categories from the first to the last year of the sample. Reading down the columns in Table 3 gives the BMI category in 2006, and reading across the rows gives the category in 2010. The elements on the principal diagonal give the probability of remaining in the same category; these are relatively large, indicating stability over time. The off-diagonal elements give the probability of moving between categories. The probability of moving from normal to overweight is 0.152, and the probability of moving from obese to normal is 0.016. These are unconditional transition probabilities. The next section examines the conditional probabilities of being overweight or obese, estimated with the random effects generalized ordered probit model.
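Unconditional transition probabilities of this kind can be computed as a row-normalized cross-tabulation of each individual's 2006 and 2010 categories. In the sketch below the series names are hypothetical, rows index the 2006 category, and the result can be transposed with `.T` to match a table laid out the other way round.

```python
import pandas as pd


def transition_matrix(cat_start: pd.Series, cat_end: pd.Series) -> pd.DataFrame:
    """Row-normalized cross-tabulation: entry (a, b) estimates P(end = b | start = a).

    Both series are assumed to be indexed by the same individuals.
    """
    return pd.crosstab(cat_start, cat_end, normalize="index")


if __name__ == "__main__":
    start = pd.Series(["normal", "normal", "overweight", "obese"], name="bmi2006")
    end = pd.Series(["normal", "overweight", "overweight", "obese"], name="bmi2010")
    print(transition_matrix(start, end))
```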
4. Results
The random effects specification was applied to Equation (6), and this model was first estimated without restrictions. As described in Section 2, a sequence of Wald tests was then applied to this unrestricted model, each testing the parallel lines restriction for an individual variable. The variable with the highest p-value was restricted, the model was re-estimated with that restriction, and the parallel lines restriction was tested again for each remaining variable. This cycle of testing and restriction continued until only variables with a Wald test p-value below 0.05 remained unrestricted. See Table A1 of Appendix A for the p-values of the sequential Wald tests for all variables. The parallel lines restriction was found to apply to six variables; that is, the estimated coefficients of these variables were deemed to be the same in both equations of the model. Table 4 reports a Wald test of jointly restricting these variables to have the same coefficients in the two equations, and clearly indicates that these restrictions cannot be rejected.
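A joint test of this kind compares each restricted variable's coefficient in eq1 with its coefficient in eq2. The sketch below computes $W = (\mathbf{R}\mathbf{b})'(\mathbf{R}\mathbf{V}\mathbf{R}')^{-1}(\mathbf{R}\mathbf{b})$ for a stacked coefficient vector $\mathbf{b}$ with covariance matrix $\mathbf{V}$, using a closed-form chi-square tail for integer degrees of freedom so that only NumPy is needed. The inputs are hypothetical; the Table 4 values are not reproduced here.

```python
import math
from typing import Sequence, Tuple

import numpy as np


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for integer df (closed form, no SciPy)."""
    y = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))
    if y > 0:
        for j in range(1, (df - 1) // 2 + 1):
            total += math.exp(-y + (j - 0.5) * math.log(y) - math.lgamma(j + 0.5))
    return total


def parallel_lines_wald(b: np.ndarray, V: np.ndarray,
                        idx_eq1: Sequence[int], idx_eq2: Sequence[int]) -> Tuple[float, float]:
    """Wald test of H0: b[idx_eq1[k]] == b[idx_eq2[k]] for every k."""
    R = np.zeros((len(idx_eq1), len(b)))
    for row, (i, j) in enumerate(zip(idx_eq1, idx_eq2)):
        R[row, i], R[row, j] = 1.0, -1.0  # eq1 coefficient minus eq2 coefficient
    d = R @ b
    W = float(d @ np.linalg.solve(R @ V @ R.T, d))
    return W, chi2_sf(W, len(idx_eq1))
```

A large p-value, as reported in Table 4, means the equality restrictions are consistent with the data.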
Table 5 gives the results for the generalized ordered probit of Equation (6) in Section 2, estimated with the parallel lines restrictions of Table 4. The results for the generalized ordered probit without parallel lines restrictions are reported without comment in Table A2 of Appendix A. The first two columns of Table 5 give the coefficients and standard errors for eq1, which contrasts the normal category of ordobese with the overweight and obese categories combined. The next two columns give the coefficients and standard errors for eq2, which contrasts the normal and overweight categories combined with the obese category. The underlined coefficients are those of the variables for which, according to the tests in Table 4, the parallel lines restriction is retained. Note that the coefficients of three of these variables, hhtype, alcohol, and meals, are not significantly different from zero in either eq1 or eq2. Further, the unrestricted estimates in Table A2 of Appendix A show that the coefficients of the remaining three restricted variables, area, agree, and emote, are all significant at the 1% level and of similar magnitude in both equations.
Before discussing the estimated coefficients and their implications for the BMI categories in detail, it is useful to consider two statistics reported in the header and footer of Table 5. The Wald test in the header is the usual test of the joint restriction that all slope parameters are zero. Note that there are 30 slope parameters, because the model is estimated with six parameters common to eq1 and eq2. In the footer, ρ is the ratio

$$\rho = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_\varepsilon^2}$$

where $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$ are the variances of the unobserved individual effect and the idiosyncratic error, respectively. The statistic ρ lies in the unit interval and gives the proportion of the total variance accounted for by the unobserved individual effect. It is the correlation between an individual's latent errors in any two periods, and so indicates the degree of persistence for individuals in the overweight/obese category relative to normal weight, and in the obese category relative to the normal/overweight category [48,49]. Here the estimate, 0.852, is close to 1.0, indicating high persistence for individuals over time.
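Assuming the usual probit normalization $\sigma_\varepsilon^2 = 1$ (an assumption here, since the normalization used by REGOPROBIT2 is not stated in the text), the reported ρ can be translated back into the spread of the individual effect:

$$\rho = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + 1} \;\Longrightarrow\; \sigma_\alpha^2 = \frac{\rho}{1 - \rho} = \frac{0.852}{0.148} \approx 5.76, \qquad \sigma_\alpha \approx 2.40$$

Under this normalization, the unobserved individual effect has a standard deviation roughly 2.4 times that of the idiosyncratic error.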
Ten variables are significant at the 1% level in both equations. Of these, three, area, agree, and emote, are restricted to have the same coefficient in both equations. The remoteness variable, area, is ordered categorically by increasing remoteness, so in both equations the probability of being in the higher BMI category increases with remoteness. Recall that the personality scores range from 1 to 7, with low scores reflecting negative aspects of the trait and high scores reflecting positive aspects. Greater emotional stability, emote, is associated with a lower probability of being in the higher BMI category, whereas the reverse is true for agreeableness, agree: greater agreeableness is associated with a higher probability.
The seven variables for which the parallel lines restriction does not apply and which are significant in both equations are age, age2, educ, advantage, marstatus, consc, and opene. The results for age and age2 indicate that the probability increases with age, non-linearly, in both equations. The education variable, educ, is an ordinal scale of qualifications achieved, ranked from highest to lowest, so the positive sign is as expected: the probability decreases with educational attainment [39,40]. The ordinal scale for advantage, the SEIFA decile of relative socio-economic advantage, increases with advantage, so the negative sign is as expected. Being in a married or de facto relationship, marstatus, shifts the probability up in both equations. One rationale lies in marriage markets: once married, individuals no longer compete for a partner and become less concerned with appearance, so that marriage leads to weight gain. However, the association may also reflect selection into marriage, with leaner people more likely to be selected as spouses [50], although Lin et al. found some evidence that when single females face adverse marriage market conditions (low male-to-female ratios), they have less incentive to remain fit and healthy [51]. Discussion of the two personality traits consc and opene is deferred to a joint examination of the results for all five personality variables.
Two variables are significant in eq2 but not in eq1. Being employed, empstatus, is associated with a significant downward shift in the probability of being obese rather than in the normal/overweight category. The categorical variable losat is ordered by increasing life satisfaction, so the negative relationship with the probability in eq2 is as expected. Three variables, gender, lndinc/p, and extrv, are significant in eq1 but not in eq2. All three coefficients are positive, so each increases the probability of being overweight or obese rather than normal weight, without a significant effect on the probability of being obese rather than normal/overweight. The results indicate that being male increases the likelihood of being overweight, as does higher household disposable income per person, lndinc/p.
All of the FFM personality traits are significant, with the exception of extraversion in eq2. Three have negative coefficients: consc, emote, and opene. That is, the probability of being overweight/obese decreases with increasing conscientiousness, emotional stability, and openness to experience. Given the nature of these traits, the negative signs would be expected. Kim (2016) reported similar findings for the association of conscientiousness with BMI and body weight. That study used participants from the National Longitudinal Study of Adolescent to Adult Health (Add Health) and concluded that a one standard deviation increase in conscientiousness was associated with a 0.89 decrease in BMI and a 12% reduction in the probability of being obese [21]. Nothing in the nature of the remaining traits, agreeableness and extraversion, would suggest an effect on the probability. However, the results here replicate those for the community sample in Sullivan et al. [20], who found that obesity was positively associated with novelty seeking and reduced reward dependence, which parallels the positive sign for extraversion here. Further, the negative signs for conscientiousness and emotional stability are consistent with Sullivan et al., who found lower persistence and self-directedness to be associated with obesity [20].
This study has a number of limitations. The analysis relied solely on HILDA panel data, without linking to other datasets for one or more periods, such as data on national or state health campaigns. In addition, although HILDA households may comprise one or several persons, no attempt was made in the analysis to group respondents by household. A third limitation is the exclusion of the underweight category, because of the small proportion of respondents in it, and the grouping of all persons with a BMI of 30 or more into the obese category, with no separate category for the morbidly obese.