**Promoting Healthy Eating among Young People—A Review of the Evidence of the Impact of School-Based Interventions**

#### **Abina Chaudhary <sup>1</sup>, František Sudzina <sup>2,3,\*</sup> and Bent Egberg Mikkelsen <sup>4</sup>**


Received: 1 July 2020; Accepted: 3 September 2020; Published: 22 September 2020

**Abstract:** Intro: Globally, the prevalence of overweight and obesity is increasing among children and younger adults and is associated with unhealthy dietary habits and lack of physical activity. School food is increasingly brought forward as a policy instrument to address unhealthy eating patterns among young people. Aim: This study investigated the evidence for the effectiveness of school-based food and nutrition interventions on health outcomes by reviewing scientific, evidence-based intervention studies among children at the international level. Methods: This study was a systematic review following the PRISMA guidelines. Three electronic databases were systematically searched, and reference lists were screened, for studies evaluating school-based food and nutrition interventions that promoted children's dietary behaviour and health and aimed at changes in children's body composition. Articles published from 2014 to 2019 that reported effects on anthropometry, dietary behaviour, nutritional knowledge and attitude were selected. Results: The review showed that school-based interventions in general were able to affect attitudes, knowledge, behaviour and anthropometry, but that the design of the intervention affects the size of the effect. In general, food-focused interventions taking an environmental approach seemed to be most effective. Conclusions: School-based interventions (including multicomponent interventions) can be an effective and promising means of promoting healthy eating and improving dietary behaviour, attitude and anthropometry among young children. Schools as a system therefore have the potential to make lasting improvements, ensuring healthy school environments around the globe for the benefit of children's short- and long-term health.

**Keywords:** school children; food and nutrition; intervention; healthy eating

#### **1. Introduction**

Childhood is one of the critical periods for health and development in human life [1,2]. During this period, the physiological need for nutrients increases, and a diet of high nutritional quality is particularly important. Evidence suggests that lifestyle, behaviour patterns and eating habits adopted during this period persist into adulthood and can significantly influence health and wellbeing in later life [3,4]. Furthermore, the transition from childhood into adolescence is often associated with unhealthy dietary changes. It is therefore important to establish healthful eating behaviours early in life, with a particular focus on this transition period. A healthy diet during the primary school years reduces the risk of immediate nutrition-related health problems of primary concern to school children, namely obesity, dental caries and lack of physical activity [5–7]. Moreover, young people who adopt healthy habits during childhood are more likely to maintain their health and thus to be at reduced risk of chronic ailments in later life [7–9]. Healthy behaviours learnt at a young age might therefore be instrumental in reaching the good health and wellbeing goals of the 2030 Agenda for Sustainable Development, which has implications at the global level.

Globally, the prevalence of overweight and obesity rose by 47.1% for children and 27.5% for adults between 1980 and 2013 [10]. A recent WHO (World Health Organization) Commission report [10] stated that, if these trends continue, 70 million children are predicted to be affected by 2025 [11]. This increased prevalence might therefore negatively affect child and adult morbidity and mortality around the world [12,13]. Worldwide, dietary recommendations for healthy diets call for the consumption of at least five portions of fruits and vegetables a day, a reduced intake of saturated fat and salt, and an increased consumption of complex carbohydrates and fibre [14]. However, studies show that most children and adolescents do not meet these guidelines [15,16], and, as a result, childhood and adolescent obesity has reached alarming levels nearly everywhere [17]. Recent figures show that the prevalence has tripled in many countries, making it one of the major public health issues of the 21st century [18–21]. According to WHO [4], 1 in 3 children aged 6–9 years were overweight or obese in 2010, up from 1 in 4 children of the same age in 2008.

The increased prevalence of overweight and obesity has fuelled efforts to counteract this trend, as seen, for instance, in the action plan on childhood obesity [17]. Policy makers have increasingly turned to the school setting as a well-suited arena for promoting healthier environments [18]. As a result, schools have received increased attention from the research community, both to develop interventions and to examine how the school environment can promote healthful behaviours, including healthy eating habits.

Globally, interventions in the school environment to promote healthier nutrition among young people have received considerable attention from researchers in recent years. However, there is far from a consensus on the most effective ways to make the most of schools' potential to contribute to better health through food-based actions. Is it the environment that makes a difference? Is it education, or is it the overall attention given to food and eating that plays the biggest role? School food and nutrition intervention strategies have gradually shifted from a knowledge orientation to a behavioural orientation [22] and from a focus on the individual to a focus on the food environment. Research evidence has shown that adequate nutrition knowledge and positive attitudes towards nutrition do not necessarily translate into good dietary practices. Similarly, research has shown that the food environment plays a far bigger role in behaviour than originally believed [23,24].

A priori, school-based interventions can be considered an effective way to promote better eating at the population level. Schools reach a large number of participants across diverse ethnic groups, and they reach not only children but also school staff, family members and community members [8,25]. Schools can be considered protected places where certain rules apply and where policies of public priority can be deployed relatively easily. In addition, schools are professional spaces in which learning and personal development are at the heart of activities, guided by skilled professional staff. As such, schools represent a powerful social environment that holds the potential to promote and provide healthy nutrition and education. Besides the potential to create health and healthy behaviours, good nutrition at school has, according to several studies, the potential to improve educational outcomes and academic performance [26–28].

However, given the growth in research studies and papers in the field, it is difficult for both the research community and policy makers to stay up to date on how successful school-based interventions have been in improving dietary behaviours, nutritional knowledge and anthropometry among children. Knowledge of, and insight into, how to intervene in the different parts of the school food environment have also developed over recent decades, which has influenced how programmes and interventions are designed. It has also become clear that food at school is more than the food eaten; it also includes curricular and school policy components. The findings from school-based studies on the relationship between school-, family- and community-based interventions and health impact suggest that health impacts depend on the context in which the interventions were carried out as well as on the methodology. An updated overview and a more detailed analysis of initiatives are therefore needed to develop our understanding of the mechanisms through which schools can contribute to shaping healthier dietary behaviour among children and adolescents, before more precise policy instruments can be developed. Our study attempted to meet the need for better insight into which of the many intervention components work best. It examined school food and nutrition interventions reported in the literature, covering healthy eating programmes, projects, interventions and initiatives.

School-based interventions in the Western world have traditionally targeted obesity and overnutrition, but school food interventions also address undernutrition, and their role from a double burden of disease perspective should therefore not be underestimated. Many studies have reported micronutrient malnutrition among school-aged children in developing countries (for instance, [29–31]), but it has also been reported in developed countries [32]. Against this backdrop, the aim of this study was to analyse the evidence for the effectiveness of school-based food interventions by reviewing recent scientific, evidence-based intervention studies on healthy eating promotion at school. The specific objective was to identify which interventions had an effect on primary outcomes, such as BMI, or on secondary outcomes, such as dietary behaviour, nutritional knowledge and attitude.

#### **2. Materials and Methods**

The functional unit of the review was healthy eating programmes, projects or initiatives performed using the school as a setting. We included only programmes, projects or initiatives that were studied in a research context, in the sense that they were planned by researchers, carried out under controlled settings using a research protocol, and reported in the literature. School-based programmes, projects, interventions or initiatives are, by definition, cluster samples: a number of schools are first chosen for intervention, and outcomes are then measured before and after the intervention, in most cases also in one or more control schools. In the studies reviewed, the outcome measurement was performed on a sample of students drawn from each school (cluster). The review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, and the quality of the included studies was assessed with the standardised "effective public health practice project (EPHPP) quality assessment tool for quantitative studies" [33]. The EPHPP instrument can be used to assess the quality of quantitative studies with a variety of study designs.

#### *2.1. Literature Search*

The literature search covered the PubMed, Web of Science and Cochrane Library databases. The search strategy was designed to be inclusive and focused on three key elements: population (e.g., children), intervention (e.g., school-based) and outcome (e.g., diet and nutrition, knowledge, attitude and anthropometrics). The search terms used in the PubMed database were: "effectiveness of school food AND nutrition AND primary school children", "effectiveness of school food AND nutrition AND interventions OR programs AND among primary school children AND increase healthy consumption", "primary school children and education and food interventions", "Effectiveness of school-based food interventions among primary school", "effectiveness of school-based nutrition and food interventions", "primary school interventions and its effectiveness", and "obesity prevention intervention among Primary schools". Search terms such as "effectiveness of school-based food interventions among primary school", "effectiveness of school based food and nutrition interventions", "primary school interventions and its effectiveness" and "obesity prevention interventions" were used in the Web of Science database. Lastly, search terms such as "nutrition interventions in primary schools" and "Nutrition education interventions in school" were used in the Cochrane Library database. The search strategy was initially developed in PubMed and adapted for use in the other databases. In addition, the reference lists of all retrieved articles and review articles [34] were screened for potentially eligible articles, and snowballing of the reference lists of the selected articles was conducted.

#### *2.2. Inclusion Criteria*

Studies were included if they investigated the effectiveness of school-based interventions with food and nutrition behaviour, healthy eating and nutrition education as a primary focus. Only articles published from 2014 to 2019 were eligible, and of these, only articles targeting primary school children aged 5 to 14 years were included. Participants included both boys and girls, regardless of socio-economic background. Eligible study designs were randomized controlled trials (RCT), cluster randomized controlled trials (RCCT), controlled trials (CT), pre-test/post-test studies with and without control (PP) and quasi-experimental designs (Quasi). Studies that did not include the intervention components/exposures, such as information and teaching (mostly for the target group, with parents as an additional target), family focus on social support, and food focus (mainly the availability of free foods, including food from school gardening), were excluded. Systematic reviews and studies published in languages other than English were also excluded, as were studies that met the intervention criteria but involved after-school programmes.

#### *2.3. Age Range*

Since the review covers a broad range of countries with quite different school systems, the sampling principle had to include some simplification and standardisation. The goal of the review was to cover elementary (primary) and secondary education, and the age range of 5–14 was therefore chosen as the best fit, although secondary education in some countries also covers those 15–18 years of age. In most countries, elementary/primary education is the first, and normally obligatory, phase of formal education. It begins at approximately age 5 to 7 and ends at about age 11 to 13 (14 in some countries). In the United Kingdom and some other countries, the term primary is used instead of elementary. In the United States, the term primary refers only to the first three years of elementary education, i.e., grades 1 to 3. In most countries, elementary education is preceded by some kind of kindergarten/preschool for children aged 3 to 5 or 6 and normally followed by secondary education.

#### *2.4. Assessment of Study Eligibility*

To select relevant studies, all titles and abstracts generated by the searches were examined. Articles were rejected on initial screening if the title and abstract did not meet the inclusion criteria or met the exclusion criteria. If an abstract did not provide enough information to decide on exclusion, or was not available, the full text was obtained and evaluated against the aforementioned inclusion and exclusion criteria. Studies that met the predefined inclusion criteria were selected for this review.

#### *2.5. Analytical Approach*

The first step of data collection aimed at organizing all studies with their key information. In the second step, we created coded columns, which served as a basis for further statistical analysis. In a coded column, we added a new construct not originally found in the papers, a kind of dummy variable that standardized otherwise non-standardized information and allowed otherwise non-quantifiable data to be treated statistically. For the impact columns, impacts were coded on a 4-point Likert-type scale, with 1 being "ineffective", 2 "partially effective", 3 "effective" and 4 "very effective".
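The impact coding amounts to a lookup from a verbal rating to an ordinal code. The following is a minimal Python sketch, assuming the reviewers have already summarised each study's reported effect into one of the four labels; the names `IMPACT_SCALE` and `code_impact` are hypothetical illustrations, not part of the review's materials.

```python
# Minimal sketch (hypothetical names): the ordinal impact codes of the
# coded impact columns, 1 = "ineffective" ... 4 = "very effective".
IMPACT_SCALE = {
    "ineffective": 1,
    "partially effective": 2,
    "effective": 3,
    "very effective": 4,
}


def code_impact(label: str) -> int:
    """Return the 1-4 impact code for a reviewer-assigned effectiveness label."""
    key = " ".join(label.lower().split())  # normalise case and inner whitespace
    if key not in IMPACT_SCALE:
        raise ValueError(f"unknown impact label: {label!r}")
    return IMPACT_SCALE[key]


if __name__ == "__main__":
    for label in ["Ineffective", "partially  effective", "Very effective"]:
        print(label, "->", code_impact(label))
```

Raising an error on unknown labels, rather than silently defaulting, keeps uncoded studies from slipping into later averages. Note that the codes are ordinal, so any averaging of them treats the scale as if it were interval.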

For the design column, the approach illustrated in Table 1 was adopted. Quasi-experimental/pre–post studies were labelled QED and were considered always to include a baseline and a follow-up outcome measurement. We constructed a power column and assigned a value of 1 to QED, the simplest design, which has no comparison group but only a pre/post measurement of the same group. Controlled trials (CT) were assigned a power of 2. A controlled trial is the same as a QED but adds a comparison/control group that receives no intervention, without randomization. We considered a study to be a CT if some form of control, for instance matching, was applied. All CTs in our study included two types of comparison: pre versus post (baseline and follow-up) and intervention versus no intervention. RCTs/RCCTs, that is, trials controlled through randomization, were assigned a power of 3. This "top of the hierarchy" design includes a case (intervention) and a control (no intervention) and normally involves two types of comparison: pre versus post and intervention versus no intervention. In this study, we did not differentiate between RCTs and RCCTs. The latter term is sometimes used to stress that the school (or the class) is the sampling unit from which the subjects are recruited, but since, in the context of schools, an RCCT is simply a variation of an RCT, we coded both in the same power class. We assumed that when authors referred to an RCT, they in fact meant an RCCT, since they could not have sampled subjects without using the school as the unit.
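The design coding can be sketched as two lookups: author-reported design acronyms (those listed in the inclusion criteria) are collapsed into the coded designs, and each coded design is mapped to its power value. This is a minimal Python sketch; `DESIGN_CODE`, `POWER` and `code_power` are hypothetical names, and designs outside these acronyms (for example, the cross-sectional comparison listed in Table 2) are deliberately not covered and raise a `KeyError`.

```python
# Minimal sketch (hypothetical names): collapse author-reported designs
# into the coded designs and assign the power value used in the analysis.
DESIGN_CODE = {
    "PP": "QED",     # pre-test/post-test of the same group
    "QUASI": "QED",  # quasi-experimental
    "QED": "QED",
    "CT": "CT",      # controlled, not randomised
    "RCT": "RCT",
    "RCCT": "RCT",   # cluster RCTs are pooled with RCTs
}

POWER = {"QED": 1, "CT": 2, "RCT": 3}


def code_power(reported_design: str) -> int:
    """Return the power value (1-3) for an author-reported design acronym."""
    coded = DESIGN_CODE[reported_design.strip().upper()]
    return POWER[coded]


if __name__ == "__main__":
    for design in ["PP", "Quasi", "CT", "RCT", "RCCT"]:
        print(design, code_power(design))
```

Keeping the two mappings separate mirrors the distinction between the "study design coded" and "power" columns, so the pooling of RCCTs with RCTs is visible in one place.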

**Table 1.** Coding table for study designs. The table shows the types of studies examined in the review and the power assigned to them.


Codes and categories were used to standardize the information found in the papers for our statistical analysis. The age/class level was categorised using codes such as EA (early age), EML (early middle late) and EL (early late).

For the intervention components ("what was done"), we coded all studies into three columns: information and teaching; family and social support; and environmental components (food provision and availability). The environmental column was further expanded into three columns: focus on and provision of F & V; free food availability through school gardening; and availability of food and a healthier food environment. Our inclusion criteria required studies to contain at least one of these components. For the environmental (food provision and availability) components, we identified two distinct types: a broad focus on healthier eating or a narrow, more targeted focus on fruit and vegetables. After the coding, we began to query the data. Most importantly, we wanted to know whether there was a relationship between "what was done" and "what the impact was", that is, whether there was a pattern linking the way the studies intervened to their outcomes.
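The component coding can be sketched as a record of boolean flags, one per component column, collapsed into the three mechanisms of influence the review distinguishes (cognitive for teaching and learning, social for family/peer involvement, environmental for provision, gardening and availability). This is a minimal Python sketch; the class and field names are hypothetical, and treating "multicomponent" as two or more distinct mechanisms is an assumption, since the text does not define the threshold.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StudyComponents:
    """Hypothetical record of one study's coded intervention components."""

    information_teaching: bool = False   # cognitive mechanism
    family_social_support: bool = False  # social mechanism
    fv_provision: bool = False           # environmental: focus on/provision of F & V
    school_gardening: bool = False       # environmental: free food through gardening
    healthier_food_env: bool = False     # environmental: availability, healthier food environment

    def mechanisms(self) -> set[str]:
        """Collapse the component columns into mechanisms of influence."""
        found = set()
        if self.information_teaching:
            found.add("cognitive")
        if self.family_social_support:
            found.add("social")
        if self.fv_provision or self.school_gardening or self.healthier_food_env:
            found.add("environmental")
        return found

    def is_eligible(self) -> bool:
        """Inclusion required at least one of the components."""
        return bool(self.mechanisms())

    def is_multicomponent(self) -> bool:
        """Assumption: two or more distinct mechanisms count as multicomponent."""
        return len(self.mechanisms()) >= 2


if __name__ == "__main__":
    study = StudyComponents(information_teaching=True, school_gardening=True)
    print(sorted(study.mechanisms()), study.is_multicomponent())
```

Counting mechanisms rather than raw columns means that a study combining, say, F & V provision with a healthier food environment is still treated as single-mechanism; counting columns instead would be an equally defensible choice.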

#### *2.6. Queries Made*

We performed queries for each intervention component (the independent variables in columns K, L and M) against each individual outcome measure.
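One way to run such a query is to compare the mean impact code of studies that did and did not include a given component, one outcome at a time. The following is a minimal Python sketch, assuming each study is a dict holding boolean component flags and 1-4 impact codes per outcome (None when the outcome was not measured); `compare_component` and the column names are hypothetical, and the review does not state which statistic it used.

```python
from statistics import mean


def compare_component(rows, component, outcome):
    """Mean impact code (1-4) for studies with vs. without one component.

    Each row is a dict of coded columns; studies whose code for `outcome`
    is None or missing did not measure it and are skipped.
    Returns (mean_with, mean_without); a side with no studies gives None.
    """
    with_c, without_c = [], []
    for row in rows:
        score = row.get(outcome)
        if score is None:
            continue
        if row.get(component, False):
            with_c.append(score)
        else:
            without_c.append(score)
    return (
        mean(with_c) if with_c else None,
        mean(without_c) if without_c else None,
    )


if __name__ == "__main__":
    # Toy rows for illustration only; these are not data from the review.
    toy = [
        {"gardening": True, "attitude": 3},
        {"gardening": True, "attitude": 4},
        {"gardening": False, "attitude": 2},
        {"gardening": False, "attitude": None},
    ]
    print(compare_component(toy, "gardening", "attitude"))
```

Skipping unmeasured outcomes, rather than coding them as "ineffective", matters here because most studies report only a subset of the outcome measures.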

Is there a relationship between age and outcome? We used the coded column (EA, EML, etc.) to study that relationship.

In addition, we made queries regarding study design. For instance, would the duration of a study influence whether an effect could be found? Would more powerful designs result in greater impact?

Furthermore, we compared single-component and multicomponent interventions with respect to their effect on the outcome measures. Queries were also made on target groups, using codes such as S and NS (see Table 4) in the corresponding column. In our analysis, a distinction was made between "standard" and "extreme" (special) cases. From the reviewed papers, it was clear that some studies put little emphasis on how the schools were selected; we classified these as standard (S). However, a few papers used a stratification approach and case/cluster selection that can be classified as an "extreme" or non-standard case, for instance studies targeting only refugees or subjects of low socio-economic status. We coded these as non-standard (NS). It can be speculated that being a special or extreme case could influence the outcome. We therefore reserved a code for these cases, although it became clear that they represented only a minority.
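The queries on age code, design power, target group (S/NS) and single versus multicomponent interventions share one shape: group the studies by a coded column and summarise the impact codes within each level. The following is a minimal Python sketch, assuming each study is a dict of coded columns with 1-4 impact codes per outcome (None when not measured); `mean_impact_by` and the column names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_impact_by(rows, key, outcome):
    """Group studies by a coded column and summarise one outcome.

    Returns {level: (number_of_studies, mean_impact_code)}; studies whose
    impact code for `outcome` is None or missing are left out.
    """
    groups = defaultdict(list)
    for row in rows:
        score = row.get(outcome)
        if score is not None:
            groups[row[key]].append(score)
    return {level: (len(scores), mean(scores)) for level, scores in groups.items()}


if __name__ == "__main__":
    # Toy rows for illustration only; these are not data from the review.
    toy = [
        {"power": 1, "target": "S", "age": "EA", "diet": 2},
        {"power": 3, "target": "S", "age": "EML", "diet": 4},
        {"power": 3, "target": "NS", "age": "EA", "diet": 3},
        {"power": 2, "target": "NS", "age": "EL", "diet": None},
    ]
    for column in ("power", "target", "age"):
        print(column, mean_impact_by(toy, column, "diet"))
```

The study count is returned alongside each mean because some levels, such as the NS group, contain only a few studies, and a mean of ordinal codes from a handful of studies says little on its own.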

In our study, availability plays a central role, since it is used in many food-at-school intervention studies. Availability signals that food is "pushed", as opposed to the "pull" mode, in which individuals are expected to request food; in the pull mode, it is the individual's behaviour, rather than the "out-thereness" of the food, that becomes the driving force. In most studies, availability is used in combination with the idea of a food environment. The literature shows that availability can be of two types. In the first, food is made available for the individual to take, and visibility, salience, product placement, etc., are used as factors. In the second, food is made available free of charge, so the individual does not have to pay. Free availability has been studied extensively in intervention studies but, for obvious reasons, is difficult to sustain after a study ends, since it requires permanent financing. The only exceptions are the collective meal models found in countries such as Sweden, Finland, Estonia and Brazil, as well as the EU scheme in which the EU subsidizes fruit.

Study design and other characteristics are provided in Table 2, and their findings are provided in Table 3.

**Table 2.** The review sample: study design/characteristics. The table shows the 43 studies in the review, illustrating the study design and characteristics of the included studies.


| Author [Reference] | Year | Title | Main Aim (from Abstract) | Main Aim in Brief | Program Name | Location & Country | Study Design | Study Design Coded (RCT, PP, CT, RCCT, Quasi) | Power | Intervention Components (Information and Teaching; Family/Social Support; Environmental/Food Focus on Healthy Meal Availability; Environmental/Food Focus through School Gardening) |
|---|---|---|---|---|---|---|---|---|---|---|
| Piana N., et al. [39] | 2017 | An innovative school-based intervention to promote healthy lifestyles | To describe an innovative school-based intervention to promote healthy lifestyles. To evaluate its effects on children's food habits and to highlight the key components which contribute most to the beneficial effects obtained from children's, teachers' and parents' perspectives. | HE & FV, Nutritional knowledge, Physical activity | Kidmed test | Spoleto, Umbria | Pre-test post-test | PP | 1 | x x |
| Battjes-Fries M.C.E., et al. [40] | 2017 | Effectiveness of Taste Lessons with and without additional experiential learning activities on children's willingness to taste vegetables | The aim of this study was to assess the effect of Taste Lessons with and without extra experiential learning activities on children's willingness to taste unfamiliar vegetables, food neophobia, and vegetable consumption. | HE & FV, attitude | TLVM | Dutch province of Gelderland | Quasi experimental design | QED | 1 | x |
| Bogart L.M., et al. [41] | 2014 | A Randomized Controlled Trial of Students for Nutrition and eXercise (SNaX): A Community-Based Participatory Research Study | To conduct a randomized controlled trial of Students for Nutrition and eXercise (SNaX), a 5-week middle-school-based obesity-prevention intervention combining school-wide environmental changes, multimedia, encouragement to eat healthy school cafeteria foods, and peer-led education. | HE & FV, Nutritional knowledge | SNaX | Los Angeles Unified School District | Randomized Controlled Trial | RCT | 3 | x |
| Shriqui V.K., et al. [42] | 2016 | Effects of a School-Based Intervention on Nutritional Knowledge and Habits of Low-Socioeconomic School Children in Israel: A Cluster Randomized Controlled Trial | Examining the effect of a school-based comprehensive intervention on nutrition knowledge, eating habits, and behaviours among low socioeconomic status (LSES) school-aged children was performed. | Anthropometry, HE & FV, Nutritional knowledge | NRI & PA | Beer Sheva, a big metropolis in southern Israel | Randomized Controlled Cluster Trial | RCCT | 3 | x x |

**Table 2.** *Cont.*

| Author [Reference] | Year | Title | Main Aim (from Abstract) | Main Aim in Brief | Program Name | Location & Country | Study Design | Study Design Coded (RCT, PP, CT, RCCT, Quasi) | Power | Intervention Components (Information and Teaching; Family/Social Support; Environmental/Food Focus on Healthy Meal Availability; Environmental/Food Focus through School Gardening) |
|---|---|---|---|---|---|---|---|---|---|---|
| Sharma S.V. et al. [43] | 2016 | Evaluating a school-based fruit and vegetable co-op in low-income children: A quasi-experimental study | The purpose of this study was to evaluate the effectiveness of a new school-based food co-op program, Brighter Bites (BB), to increase fruit and vegetable intake, and home nutrition environment among low-income 1st graders and their parents. | HE & FV, Nutritional knowledge | BB | Houston, Texas | Quasi-experimental non-randomized controlled study | QED | 1 | x x x |
| Lawlor A.D. et al. [44] | 2016 | The Active for Life Year 5 (AFLY5) school-based cluster randomised controlled trial: effect on potential mediators | To determine the effect of the intervention on potential mediators | Anthropometry, HE & FV | AFLY5 | South East of England | Cluster RCT | RCCT | 3 | x x |
| Steyn P.N. et al. [45] | 2016 | Did Health kick, a randomised controlled trial primary school nutrition intervention improve dietary quality of children in low-income settings in South Africa? | To promote healthy eating habits and regular physical activity in learners, parents and educators by means of an action planning process | HE & FV, PA | HK | Western Cape (WC) Province | Cluster RCT | RCCT | 3 | x |
| Jones M. et al. [46] | 2017 | Association between Food for Life, a Whole Setting Healthy and Sustainable Food Programme, and Primary School Children's Consumption of Fruit and Vegetables: A cross Sectional Study in England | The aim of the study was to examine the association between primary school engagement in the Food for Life programme and the consumption of fruit and vegetables by children aged 8–10 years. | HE & FV, Nutritional knowledge | FLP | England | Cross sectional school matched comparison approach | Cross-sectional study design | 1 | x x |
| Larsen L.A. et al. [47] | 2015 | RE-AIM analysis of a randomized school-based nutrition intervention among fourth-grade classrooms in California | To promote healthy eating behaviours and attitudes in children | HE & FV, Nutritional knowledge, Attitude | NPP | California | RCT with pre-, post-, and follow-up assessments | RCT | 3 | x x |


**Table 2.** *Cont.*


| Author [Reference] | Year | Title | Main Aim (from Abstract) | Main Aim in Brief | Program Name | Location & Country | Study Design | Study Design Coded (RCT, PP, CT, RCCT, Quasi) | Power | Intervention Components (Information and Teaching; Family/Social Support; Environmental/Food Focus on Healthy Meal Availability; Environmental/Food Focus through School Gardening) |
|---|---|---|---|---|---|---|---|---|---|---|
| Hutchinson J. et al. [57] | 2015 | Evaluation of the impact of school gardening interventions on children's knowledge of and attitudes towards fruit and vegetables. A cluster randomised controlled trial | To evaluate whether ongoing gardening advice and gardening involvement from the Royal Horticultural Society (RHS) gardening specialists was associated with better fruit and vegetable outcomes in children than those at teacher-led schools that obtained standard advice from the RHS Campaign for School Gardening | Nutritional knowledge, Attitude | CFSG | London boroughs of Wandsworth, Tower Hamlets, Greenwich and Sutton | Randomised Controlled Cluster Trial | RCCT | 3 | x x |
| Viggiano A et al. [58] | 2018 | Healthy lifestyle promotion in primary schools through the board game Kaledo: a pilot cluster randomized trial | The board game Kaledo seems to improve knowledge in nutrition and helps to promote a healthy lifestyle in children attending middle and high schools. So, this study was conducted to investigate whether similar effects of Kaledo could be found in younger children in primary school. | Anthropometry, HE & FV, Nutritional knowledge | Kaledo | Campania, Italy | Pilot cluster randomized trial | RCCT | 3 | x |
| Waters E. et al. [59] | 2017 | Cluster randomised trial of a school-community child health promotion and obesity prevention intervention: findings from the evaluation of fun 'n healthy in Moreland! | Fun 'n healthy in Moreland! aimed to improve child adiposity, school policies and environments, parent engagement, health behaviours and child wellbeing | Anthropometry, HE & FV | FHM | Victoria, Australia | Randomised Controlled Cluster Trial | RCCT | 3 | x |
| Xu F et al. [60] | 2015 | Effectiveness of a Randomized Controlled Lifestyle Intervention to Prevent Obesity among Chinese Primary School Students: CLICK-Obesity Study | To evaluate whether the lifestyle intervention was able to reduce obesity risk and increase healthy behaviors and knowledge | Anthropometry, Nutritional knowledge | CLICK-Obesity | Mainland China | Randomised Controlled Cluster Trial | RCCT | 3 | x x |
| Jung et al. [61] | 2018 | Influence of school-based nutrition education program on healthy eating literacy and healthy food choice among primary school children | To examine the effectiveness of a school-based healthy eating intervention program, the Healthy Highway Program, for improving healthy eating knowledge and healthy food choice behavior among elementary school students | Nutritional knowledge, HE & FV | Healthy highway program | Oswego County, New York State | Pre-/post-test | QED | 1 | x |

**Table 2.** *Cont.*

| Author [Reference] | Year | Title | Main Aim (from Abstract) | Main Aim in Brief | Program Name | Location & Country | Study Design | Study Design Coded (RCT, PP, CT, RCCT, Quasi) | Power | Intervention Components (Information and Teaching; Family/Social Support; Environmental/Food Focus on Healthy Meal Availability; Environmental/Food Focus through School Gardening) |
|---|---|---|---|---|---|---|---|---|---|---|
| Jhou W et al. [62] | 2014 | Effectiveness of a school-based nutrition and food safety education program among primary and junior high school students in Chongqing, China | To examine the effectiveness of a school-based nutrition and food safety education program among primary and junior high school students in China | Nutritional knowledge, attitude | school-based nutrition and food safety education | Chongqing, China | Pre-/post-test | QED | 1 | x |
| Anderson EL, et al. [63] | 2016 | Long-term effects of the Active for Life Year 5 (AFLY5) school-based cluster-randomised controlled trial | To investigate the long-term effectiveness of a school-based intervention to improve physical activity and diet in children. | HE & FV, PA | AFLY5 | Southwest of England | Randomised Controlled Cluster Trial | RCCT | 3 | x |
| Griffin T.L. et al. [64] | 2015 | A Brief Educational Intervention Increases Knowledge of the Sugar Content of Foods and Drinks but Does Not Decrease Intakes in Scottish Children Aged 10–12 Years | To assess the effectiveness of an educational intervention to improve children's knowledge of the sugar content of food and beverages | Nutritional knowledge, attitude | NEMS | Aberdeen, Scotland | Randomised Controlled Cluster Trial | RCCT | 3 | x |
| Kipping R.R. et al. [65] | 2014 | Effect of intervention aimed at increasing physical activity, reducing sedentary behaviour, and increasing fruit and vegetable consumption in children: Active for Life Year 5 (AFLY5) school-based cluster randomised controlled trial | To investigate the effectiveness of a school-based intervention to increase physical activity, reduce sedentary behaviour, and increase fruit and vegetable consumption in children | HE & FV, PA | AFLY5 | South west of England | Randomised Controlled Cluster Trial | RCCT | 3 | x |
| Gaar V.M. et al. [66] | 2014 | Effects of an intervention aimed at reducing the intake of sugar-sweetened beverages in primary school children: a controlled trial | Aimed at reducing children's SSB consumption by promoting the intake of water | Nutritional knowledge, attitude | Water campaign | Rotterdam, Netherlands | Controlled trial | CT | 2 | x |
| Moore GF et al. [67] | 2014 | Impacts of the Primary School Free Breakfast Initiative on socio-economic inequalities in breakfast consumption among 9–11-year-old schoolchildren in Wales | To examine the impacts of the Primary School Free Breakfast Initiative in Wales on inequalities in children's dietary behaviours and cognitive functioning | HE & FV | FSM | Wales, UK | Randomised Controlled Cluster Trial | RCCT | 3 | x |

**Table 2.** *Cont.*



**Table 3.** The review sample findings. The table shows the findings from the 43 studies of the review.



**Table 3.** *Cont.*



The information from the abstracts was organized in a table with the following columns:

Column A: Authors. The column lists the researchers/authors conducting the study.

Column B: Year. The column shows the year of the publication of the article.

Column C: Title/Reference. The column lists the title of the article.

Column D: Main aim. The column lists the main aim presented by authors in the abstract of each article.

Column E: Main aim in brief. This column is a constructed variable that refers to the main aim of each study. The idea was to give, in brief, the study idea and the outcome measures on which the study focused.

Column F: Program name. The column gives the name of the project, program or intervention reported in the article.

Column G: Location and Country. The column lists the specific place or location where the study was performed.

Column H: Study design. The column shows the research design of the study according to the authors.

Column I: Study design coded. This column is a constructed variable that captures the research design of the study and was used to make an analysis of power possible (see Column J).

Column J: Power. The column was constructed to express the strength of the design. It is a coded variable that assigns each design a numerical value, which allowed for a quantitative analytical approach.
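Because Section 3.4 reports the anthropometric effect per unit increase in power, the power score enters the regression as a single numeric predictor rather than as a set of dummies. A minimal sketch of the coding, assuming only the three design codes that appear in Table 2 (QED = 1, CT = 2, RCCT = 3); the dictionary and function names are hypothetical, and scores for other codes such as RCT or PP would have to be taken from Table 1:

```python
# Hypothetical mapping from the coded study design (Column I) to the
# power score (Column J). Only the pairs visible in Table 2 are included.
DESIGN_POWER = {
    "QED": 1,   # quasi-experimental design, e.g. pre-/post-test
    "CT": 2,    # controlled trial
    "RCCT": 3,  # randomised controlled cluster trial
}


def power_score(design_code: str) -> int:
    """Return the power score for a coded design; raise if the code is unknown."""
    try:
        return DESIGN_POWER[design_code]
    except KeyError:
        raise ValueError(f"no power score defined for design {design_code!r}") from None


print([power_score(code) for code in ("QED", "CT", "RCCT")])  # [1, 2, 3]
```

Raising on unknown codes, rather than defaulting to a value, keeps designs without a documented score out of the analysis instead of silently misclassifying them.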

Columns K–O: Intervention components. These columns show which intervention components were used in the study. We used a model that categorizes components into three different mechanisms of influence: cognitive (K), environmental (L, M, N) and social (O).

The environmental component includes actions that increased the availability of meals or of fruit and vegetables (F & V), either through passive provision (F & V and meals) or through active participation such as gardening. The social category included actions where families and/or peers actively influenced the participants. The cognitive category included teaching and learning.

Column L: Environmental/food focus on F & V. In this column, interventions that targeted fruits and vegetables were flagged. This includes interventions that focused on providing cooking lessons and maintaining healthy cafeterias during the intervention periods. Here, maintaining a healthy cafeteria refers to school canteens adding healthy options to their menus, so that children buying food have healthier options to choose from.

Column M: Environmental/food focus on increasing availability through school gardening. In this column, interventions that provided free food to participants through gardening within the school were listed.

Column N: Environmental/food interventions focused on healthy meal availability. Interventions that provided healthy meals, breakfast or snacks during school hours, or distributed fresh fruit to participants, were listed in this column.

Column O: Family/social support. In this column, interventions that included social components were flagged. These interventions included peer and family influence mechanisms.

Column P: Age. The column lists the age of the targeted groups of the intervention expressed in years according to the primary article data provided by authors.

Column Q: Age construct EA. This column shows a constructed variable for the age categorization based on the primary data given by authors. The constructed code was made to make statistical analyses possible. The construct Early Age (EA) was assigned if the intervention was carried out in early school.

Column R: Age construct EML. This column shows a constructed variable for the age categorization based on the primary data given by authors. The code Early Middle Late (EML) was assigned if the intervention targeted all age groups.

Column S: Age construct EL. This column shows a constructed variable for the age categorization based on the primary data given by authors. The code EL refers to Early Late and was assigned if the intervention targeted early and late school.
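The three age constructs can be read as mutually exclusive indicators, with any other combination of school stages acting as the reference category. A hypothetical sketch, assuming each study has already been mapped to the set of school stages it targeted ("early", "middle", "late"); the function name and stage labels are illustrative, and the mapping from ages to stages is not specified here beyond early age corresponding to 4–7 years (Section 3.8):

```python
# Hypothetical coding of the age constructs in Columns Q-S.
# Input: the set of school stages an intervention targeted.
def age_dummies(stages):
    """Return the EA, EML and EL indicators for a set of targeted school stages."""
    s = frozenset(stages)
    return {
        "EA": int(s == {"early"}),                     # early school only
        "EML": int(s == {"early", "middle", "late"}),  # all age groups
        "EL": int(s == {"early", "late"}),             # early and late school
    }


print(age_dummies({"early"}))            # {'EA': 1, 'EML': 0, 'EL': 0}
print(age_dummies(["early", "middle"]))  # {'EA': 0, 'EML': 0, 'EL': 0} (reference)
```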

Column T: Sample size. The number of young people enrolled in the intervention was listed in this column.

Column U: Time duration. This column shows the length of the intervention expressed in months. It is a constructed variable based on the primary data given by authors and was made to standardize duration and make it ready for cross study analysis.

Columns V, W, X, Y: Outcome measures. In Columns V, W, X and Y, the outcome measures named Anthropometry, HE/FV (healthy eating, fruits and vegetables), Nutritional knowledge and Attitude, respectively, were listed according to our outcome model shown in Figure 1. Only a few studies included all outcome measures, but all studies included at least one of them.

**Figure 1.** Outcome measures model. The figure illustrates the four types of outcome measures found in the interventions.

Columns Z, AA, AB, AC: Effectiveness. The effectiveness, as measured by the outcomes, is listed in these columns. Each outcome measure was rated using a Likert scale from 0–4. The effectiveness for the outcome measures in our model (Figure 1), namely anthropometry, HE/FV, nutritional knowledge and attitude, was listed in Columns Z, AA, AB and AC, respectively.

Column AD: Target group. This column provides information on the target group of interventions such as information on grades of subjects and municipalities.

Columns AE, AF: Target group (coded). These columns are constructed variables created to capture whether the intervention had a special ethnic or socio-economic focus. Columns AE and AF contain the coded target group, named Standard (S) or Non-Standard (NS). "NS" here represents target groups of refugees, immigrants or lower socio-economic classes.

Column AG: Keywords. This column lists the keywords found in the interventions.

Ordinary least squares regression was applied in this study; specifically, we used the linear regression function in IBM SPSS 22. We opted for a multivariable approach, i.e., multiple linear regression. Anthropometry, behaviour (healthy eating and food focus), attitude and nutritional knowledge were used as dependent variables. To better account for control variables such as sample size and study length, a dummy variable was introduced for a study length of one year or more, and the logarithm of the sample size was used instead of the actual sample size to eliminate scaling effects. We grouped countries by continent (splitting Europe into North and South, as there were enough studies and no countries in between) and introduced corresponding dummy variables. The remaining variables were used as independent variables without any additional manipulation.
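The variable transformations described above can be sketched as follows. This is not the authors' SPSS procedure but a minimal NumPy equivalent of an ordinary least squares fit, assuming a base-10 logarithm for sample size (the base is not stated; it only rescales the coefficient), Northern Europe as the reference region, and hypothetical function and column names. The data in the usage example are synthetic and generated without noise, only to show that the fit recovers known coefficients:

```python
import numpy as np


def build_design(duration_months, sample_size, region, reference="N_Europe"):
    """Return (X, names): intercept, >=1-year dummy, log10 sample size, region dummies."""
    duration = np.asarray(duration_months, dtype=float)
    n = np.asarray(sample_size, dtype=float)
    cols = [np.ones_like(duration),          # intercept
            (duration >= 12).astype(float),  # study length of one year or more
            np.log10(n)]                     # log scale removes sample-size scaling effects
    names = ["intercept", "long_study", "log10_n"]
    # One dummy per non-reference region; the reference region is absorbed by the intercept.
    for r in sorted(set(region) - {reference}):
        cols.append(np.array([1.0 if x == r else 0.0 for x in region]))
        names.append(f"region_{r}")
    return np.column_stack(cols), names


def ols(X, y):
    """Ordinary least squares coefficients via a least-squares solve."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


# Synthetic illustration (not data from the review).
duration = [6, 12, 24, 3, 18, 9, 36, 1.5]
size = [100, 1000, 300, 65, 2997, 500, 250, 150]
region = ["N_Europe", "S_Europe", "Asia", "America",
          "N_Europe", "Asia", "America", "S_Europe"]
X, names = build_design(duration, size, region)
true_beta = np.array([3.0, 1.0, -1.0, 0.5, -0.5, 0.8])
y = X @ true_beta
for name, b in zip(names, ols(X, y)):
    print(f"{name}: {b:.3f}")
```

Dropping one region from the dummies is what keeps the design matrix full rank; with an intercept plus a dummy for every region, the region columns would sum to the intercept column.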

Since the aim was to create models consisting only of independent variables that significantly influence the dependent variables, we used the backward elimination method. Because there were too many independent variables for backward elimination in the attitude model (with only eight observations), the stepwise method was used instead.
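Backward elimination starts from the full model and repeatedly drops the weakest predictor until every remaining one meets a retention criterion. SPSS uses a probability-of-F-to-remove criterion; for a single coefficient the F-to-remove equals the squared t statistic, so this NumPy sketch substitutes a fixed |t| cutoff. The cutoff of 1.68 is an assumption (roughly a two-sided p of 0.10 with about 40 residual degrees of freedom), the intercept is assumed to be column 0 and is never dropped, the function names are hypothetical, and the toy data are constructed so that x2 carries no information:

```python
import numpy as np


def t_statistics(X, y):
    """OLS coefficients and their t statistics."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)  # residual variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se


def backward_eliminate(X, y, names, t_crit=1.68):
    """Drop the predictor with the smallest |t| until all remaining |t| >= t_crit."""
    keep = list(range(X.shape[1]))
    while True:
        beta, t = t_statistics(X[:, keep], y)
        # Candidate positions for removal, excluding the intercept (column 0).
        candidates = [(abs(t[pos]), pos) for pos, col in enumerate(keep) if col != 0]
        if not candidates:
            break
        weakest_t, pos = min(candidates)
        if weakest_t >= t_crit:
            break
        del keep[pos]
    return [names[c] for c in keep], beta


# Toy data: y depends on x1 only; the noise is orthogonal to 1, x1 and x2.
x1 = np.arange(8.0)
x2 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
noise = 0.5 * np.array([1, 1, -1, -1, -1, -1, 1, 1], dtype=float)
y = 2 + 3 * x1 + noise
X = np.column_stack([np.ones(8), x1, x2])
kept, beta = backward_eliminate(X, y, ["intercept", "x1", "x2"])
print(kept)  # ['intercept', 'x1']
```

With very few observations, as in the attitude model, the full model may leave too few residual degrees of freedom for this procedure to start, which is consistent with the switch to a stepwise method described above.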

Information and teaching was present in all but one study. Free food was found in only two studies and a focus on fruit and vegetables in three studies. Because these three variables barely varied across the studies, it is not surprising that none of them was found to be significant in any of the models.
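The reasoning can be made concrete: a 0/1 indicator present in k of n studies has variance p(1 − p) with p = k/n, and in a simple regression the standard error of its coefficient scales with 1/√(n·p(1 − p)). A short sketch using the counts above and n = 42 studies (Section 2.7), compared with a hypothetical balanced split; the function name is illustrative:

```python
import math


def indicator_variance(k, n):
    """Variance p(1 - p) of a 0/1 variable that equals 1 in k of n studies."""
    p = k / n
    return p * (1 - p)


n = 42
for label, k in [("information and teaching", 41), ("free food", 2),
                 ("F & V focus", 3), ("balanced (hypothetical)", 21)]:
    v = indicator_variance(k, n)
    # Standard error relative to the balanced case, other things being equal.
    rel_se = math.sqrt(0.25 / v)
    print(f"{label}: variance {v:.3f}, relative SE {rel_se:.2f}")
```

An indicator that is 1 in 41 of 42 studies has roughly a tenth of the variance of a balanced one, so its coefficient would need to be several times larger to reach the same significance.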

#### *2.7. Study Sample*

The search strategy resulted in 1826 titles, which were screened for duplicates and potential relevance. After this initial screening, 345 titles and abstracts were assessed against the inclusion and exclusion criteria. Articles that studied school interventions after school hours were excluded. In addition, articles that studied interventions among children in out-of-school contexts, such as at community level, were excluded. The justification is that both "after school" and "out of school" settings can be regarded as non-typical school environments. We aimed to study the "school" as an artefact that can be considered a "standard" across countries despite some national differences. For both "after school" and "out of school" settings, we argue that there are considerable differences among countries and that including such studies would negatively influence our analytical approach. In total, 42 articles were identified as relevant, and their full papers were obtained as the final sample. Figure 2 below illustrates the search terms and selection process of articles.

**Figure 2.** Review flow chart. The figure shows the progress of the literature review process following the PRISMA 2009 approach.

#### *2.8. Intervention Study Characteristics*

For all 43 items in our sample, Table 2 provides information about the study, the intervention methodologies, characteristics, strategies, etc. In our set of studies, the sample size ranged from 65 to 2997 participants, and the intervention duration ranged from 1.5 to 36 months. The study locations were: 26 from Europe [21,36,38–40,44,46,49,52,54,57,58,63–75], six from Asia [35,42,48,59,60,62], 10 from America [37,41,43,45,47,50,51,53,55,61] and one from Africa [56]. We categorized all interventions according to their intervention components. To this end, we constructed three classes: Information and Teaching, Food Focus and Family/Social Support. The intervention characteristics of each included study are shown in Table 2.

Of the total study sample, the majority of studies (*n* = 41) involved "Information and Teaching" components consisting mainly of classroom-based activities (e.g., an adapted curriculum, distribution of educational materials, or a health and nutrition education program). Twelve studies combined "Information and Teaching" with a food focus and availability component. These food and availability components consisted mainly of supervised school gardening and environmental modifications to stimulate a more healthful diet, such as increased availability and accessibility of healthy foods, free food distribution programmes, free school breakfast, school lunch modifications and incentives. Only two studies combined all three intervention components. Family/social support was clearly a focus of the intervention in nine studies. Other studies, although their interventions were not primarily or secondarily focused on a family/social support component, indirectly acknowledged the importance of parents and included them in their studies.

All of the reviewed studies included intervention components that were delivered in school settings and within school hours. Our sample showed that a focus on fruit and vegetable consumption was the most used intervention component and was included in more than half of the interventions. Most studies were designed and carried out in a way where a research assistant was trained by senior researchers/co-authors to ensure that each member of the research team followed the same procedures for data collection. Since all studies were "in situ" studies, they included close cooperation between researchers and school staff. In most of the listed studies, teachers, who were responsible for implementing the interventions, were trained beforehand.

#### *2.9. Types of Interventions*

Table 2 shows an overview of the programmes and their intervention components. From the table, it can be seen that studies differed in how broadly they intervened. Some studies used a narrow intervention (i.e., only one intervention component, targeting behaviour), whereas others used multicomponent approaches in which all three intervention components were used.

#### **3. Results**

Finding the right approach to intervening for healthier eating at school is a major challenge. In other words, which interventions create which impacts, and how should the public best invest in new policies, strategies, and practices at school if long-term health is the intended end point?

The purpose of this review was to compile the evidence regarding the effectiveness of successful school-based interventions in improving dietary behaviours, nutritional knowledge, attitudes and anthropometry among children. The analysis of the data showed a number of relationships between the outcome effect and other characteristics of the intervention (i.e., age, location/region, intervention type, duration). Descriptive statistics are provided in Table 4.


**Table 4.** Descriptive statistics.

The linear regression models for each outcome measure are reported in the text, with references to the associated tables. Of the 42 studies, 36 reported outcomes on the HE/FV behaviour scale, while anthropometry and attitude impacts were reported in 18 and six studies, respectively. Section 3.1 presents the most general finding from the literature review; Section 3.2 describes the variable found significant in two models, while the remaining variables were each significant in one model. Sections 3.4, 3.5 and 3.6 concern "design" effects, in the sense that they relate not to intervention components but to how the studies were designed. The rest relate to intervention components rather than designs. Tables 5–8 present the regression models for the outcome measures in which an effect could be seen; the linear regression model describing what influences attitude is provided in Table 5.


**Table 5.** Linear regression model for attitude.


**Table 6.** Linear regression model for anthropometry.

With regard to the explanatory power of the attitude model, *R*<sup>2</sup> = 0.789, adjusted *R*<sup>2</sup> = 0.719, and significance = 0.009. The linear regression model describing what influences anthropometry is provided in Table 6. With regard to the explanatory power of this model, *R*<sup>2</sup> = 0.683, adjusted *R*<sup>2</sup> = 0.586, and significance = 0.003. The linear regression model describing what influences behaviour is provided in Table 7.
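The reported adjusted R² values follow from the usual formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), with n observations and p predictors. As a consistency check for the anthropometry model, assuming n = 18 (the number of studies reporting anthropometric outcomes) and p = 4 (power, a duration of a year or more, log sample size and healthy meal availability, the predictors discussed in Sections 3.4–3.7); both values are assumptions, since the model tables are not reproduced here:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


print(round(adjusted_r2(0.683, n=18, p=4), 3))  # 0.585
```

The result, 0.585, agrees with the reported 0.586 up to the rounding of R² to three decimals, which supports reading the anthropometry model as four predictors fitted on 18 studies.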


**Table 7.** Linear regression model for behaviour.

With regard to the explanatory power of the model, *R*<sup>2</sup> = 0.121, adjusted *R*<sup>2</sup> = 0.096, and significance = 0.037.

An alternative linear regression model describing what influences the behaviour is provided in Table 8.

**Table 8.** Alternative linear regression model for behaviour.


With regard to the explanatory power of the model, *R*<sup>2</sup> = 0.449, adjusted *R*<sup>2</sup> = 0.432, and significance < 0.001.

#### *3.1. School-Based Interventions in General Create Impact*

Looking across the whole study sample, it can be seen that in general the interventions created an impact in one or more ways either on knowledge, intentions, eating habits and/or anthropometry. In other words, it was hard to find studies that created no impact. This finding adds to the body of evidence that suggests that food-based interventions are a well-suited and effective policy tool when it comes to promoting healthier eating among young people.

#### *3.2. Family Support Affects Healthier Eating Behaviour and Attitude*

Of all the included studies, nine focused on family support as an intervention component. Our analysis showed that, in these studies, family involvement was impactful when it came to promoting healthier food choices among participants. Parents, as influencers and role models in the family, seemed to help shape children's dietary habits. Studies that involved participants' parents in the intervention and provided them with nutritional knowledge and healthy cooking skills (i.e., knowledge about the importance of healthy food and nutrition during their children's early years) seemed to help young people prepare more healthy and nutritious food at home. According to these studies, this seemed to increase children's intentions to eat more fruits and vegetables and eventually resulted in the consumption of more healthy foods. However, this did not seem to be the case for all ages. An intention to eat more fruits and vegetables was seen among early age (EA) participants, either alone or with family support. It should be noted that the regression models did not include interactions, since the number of analysed studies was only about 40. It was not possible to include age as a continuous variable in the models because, as can be seen in Table 5, age was reported as a range, sometimes a wide one (e.g., 8–11 or 4–11). Family support increases the outcome measure by approximately 1 in both models. Please refer to Tables 5 and 7 for the detailed linear regression models for attitude and behaviour.

#### *3.3. Interventions Done in Northern Europe (7 Studies) Had a Smaller Impact on Behaviour than the Studies Conducted in the Rest of the World (22 Studies)*

The results from the model created to measure the effectiveness of the interventions on HE/FV showed that the HE/FV score depended only on the region where the intervention was done. The behaviour outcome for Northern Europe was on average 1.5, while the average for the rest of the world was 3.2 (please refer to Table 8).
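With a single 0/1 region predictor, ordinary least squares reproduces the two group means: the intercept is the mean of the reference group (the rest of the world) and the coefficient is the difference between the groups, here about 1.5 − 3.2 = −1.7. A sketch with synthetic scores chosen only so that the group means equal the reported averages (they are not the studies' data):

```python
import numpy as np

# Synthetic behaviour scores; only the group means (1.5 and 3.2) match the text.
north_europe = np.array([1.0, 2.0, 1.5])
rest_of_world = np.array([3.0, 3.4, 3.2])

y = np.concatenate([north_europe, rest_of_world])
is_north = np.concatenate([np.ones(3), np.zeros(3)])  # region dummy
X = np.column_stack([np.ones_like(y), is_north])      # intercept + dummy

(intercept, coef), *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(intercept, 2), round(coef, 2))  # 3.2 -1.7
```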

#### *3.4. Effect of Anthropometry Measures Increases with Study Power*

The results suggested that the design of the study plays a role in the ability to show the impact of interventions. The anthropometric outcome measured among the participants increased with the power of the study. That is, the stronger the design, the greater the likelihood of being able to measure an impact on anthropometric outcomes; a unit increase in design power is associated with an outcome increase of approximately 1.5 (please refer to Table 6). To examine the influence of study design, we used the score constructed for this purpose (please refer to Table 1). This score assigns a higher power to randomized designs than to non-randomized ones.

#### *3.5. Study Duration Impacts Anthropometric Outcomes*

It was also clear that the intervention duration has an impact on the outcome, i.e., the longer the duration, the better the anthropometric results among the children. Interventions that lasted a year or more had an outcome measure on average almost one unit higher than shorter studies (please refer to Table 6).

#### *3.6. Larger Samples Impact Anthropometry Measures*

Results showed that the anthropometric outcome decreased with sample size. Increasing the sample size by a factor of 10, from approximately 100 to 1000, decreased the outcome measure by almost 2.5 (please refer to Table 6). Thus, larger samples were associated with smaller anthropometric effects. The studies whose intervention ran for a long period (i.e., a couple of months or a year) with small numbers of participants were found to be effective in the outcome. It might be that it was harder to administer the same intervention consistently to a large sample, which could have decreased the anthropometric outcome among the participants.
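On a logarithmic scale, the effect depends on the ratio between sample sizes rather than on their difference: tripling the sample has the same predicted effect whether it starts at 100 or at 1000. A sketch using the approximate coefficient implied by the text, assuming a base-10 logarithm (with a natural logarithm the coefficient would be about −2.5/ln 10 ≈ −1.09 and the predictions would be identical); the constant and function names are hypothetical:

```python
import math

# Approximate coefficient from the text: about -2.5 per tenfold increase in
# sample size, assuming the model used log10(sample size).
COEF_LOG10_N = -2.5


def sample_size_effect(n_from, n_to, coef=COEF_LOG10_N):
    """Predicted change in the anthropometry outcome when the sample grows from n_from to n_to."""
    return coef * math.log10(n_to / n_from)


print(sample_size_effect(100, 1000))             # -2.5
print(round(sample_size_effect(100, 300), 2))    # -1.19
print(round(sample_size_effect(1000, 3000), 2))  # -1.19 (same ratio, same effect)
```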

#### *3.7. Food Availability Interventions Influence Anthropometric Outcomes*

Our analyses showed that a food focus, specifically healthy meal availability, had an impact on children's anthropometric outcomes, increasing the outcome measure by almost 3.5 on average (please refer to Table 6).
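Because the anthropometry model is additive, the predicted difference between two study profiles is the sum, over predictors, of each coefficient times the difference in that predictor, and the unknown intercept cancels. A sketch using the approximate effect sizes reported in Sections 3.4–3.7 (about 1.5 per unit of power, almost 1 for a duration of a year or more, about −2.5 per tenfold sample size under a base-10 assumption, and almost 3.5 for healthy meal availability); the rounded values, function names and profile field names are assumptions, not the fitted coefficients in Table 6:

```python
import math

# Approximate effects taken from Sections 3.4-3.7 (rounded, not the fitted values).
COEFS = {"power": 1.5, "long_study": 1.0, "log10_n": -2.5, "meal_availability": 3.5}


def predictors(profile):
    """Turn a hypothetical study profile into model predictors."""
    return {
        "power": profile["power"],
        "long_study": 1 if profile["duration_months"] >= 12 else 0,
        "log10_n": math.log10(profile["sample_size"]),
        "meal_availability": profile["meal_availability"],
    }


def predicted_difference(a, b):
    """Predicted outcome of profile a minus profile b; the intercept cancels."""
    pa, pb = predictors(a), predictors(b)
    return sum(COEFS[k] * (pa[k] - pb[k]) for k in COEFS)


base = {"power": 1, "duration_months": 6, "sample_size": 200, "meal_availability": 0}
rcct = dict(base, power=3)               # same study, randomised cluster design
meals = dict(base, meal_availability=1)  # same study, plus healthy meal provision
print(predicted_difference(rcct, base))   # 3.0
print(predicted_difference(meals, base))  # 3.5
```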

#### *3.8. Interventions among Younger Students Influence Attitude Among Participants*

Results showed that the younger the study subjects were, the more influence interventions had on attitudes (the outcome was on average 0.75 higher than for other age groups). Thus, the results suggest that interventions improved attitudes most among participants of early age (EA), i.e., 4–7 years old. Furthermore, the results suggest that family support was associated with a more positive attitude towards healthy eating among participants, which helps in changing their behaviour. Early age (EA) and family support seemed to have a positive impact both alone and together; that is, the interventions had positive impacts on the attitudes of EA participants towards healthy eating with or without the involvement of family support. Please refer to Table 5 for the detailed linear regression model for attitude.
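Since the attitude model has no interaction term (Section 3.2), the effects of early age and family support simply add up. A sketch using the approximate effects reported in the text (about 0.75 for early age and about 1 for family support); these rounded values and the function name are assumptions standing in for the fitted coefficients in Table 5:

```python
# Approximate attitude effects relative to a study with neither feature.
EARLY_AGE_EFFECT = 0.75
FAMILY_SUPPORT_EFFECT = 1.0


def attitude_shift(early_age, family_support):
    """Additive model without interaction: each 0/1 feature adds its own effect."""
    return EARLY_AGE_EFFECT * early_age + FAMILY_SUPPORT_EFFECT * family_support


for ea in (0, 1):
    for fs in (0, 1):
        print(f"EA={ea}, family support={fs}: {attitude_shift(ea, fs):+.2f}")
```

An interaction term would be needed to test whether family support matters more (or less) for younger children; with only a handful of studies reporting attitude, such a term could not be estimated reliably.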

#### *3.9. No Effect of School Based Interventions on Nutritional Knowledge*

Findings showed that, among participants of all age groups, nutritional knowledge did not depend on any of the characteristics of the school-based interventions: none of the collected variables had a significant influence on nutritional knowledge.

#### **4. Discussion**

#### *4.1. Discussion of Results of This Review in Relation to Others*

In the discussion, we aim to relate our findings to what has been found in previous studies, discuss our methodological approach and reflect on the policy implications. Since the question of how to counteract unhealthy eating patterns and the worrying increase in nutrition-related disorders among young people, and of how the school could contribute, is attracting much attention, we aim to give policy makers and practitioners up-to-date insight into the potential of the school to act as a hub for the promotion of healthier eating, and to provide inspiration for the development of new types of school-based interventions and strategies.

The huge interest in using the infrastructure of the school to initiate and promote healthier eating among young people has resulted in a large number of intervention studies over the past decades. This research interest by definition also creates a need for syntheses of the findings, so that they can feed into the public health and school policy cycle and "send the results to work". Given the substantial investment that better school food strategies will require from states, it is worth appreciating that the Evidence-Informs-Policy pathway seems to be working. At the same time, the conceptual approaches and the understanding of which intervention components might work better than others, which age groups might benefit the most, etc., have developed considerably, which again adds to the rationale for synthesizing intervention study findings. The most recent reviews, by Julie et al. [76], Noguera et al. [77], Evans et al. [78], Cauwenberghe et al. [34] and Brown et al. [79], leave a time gap of almost five years. Covering the last five years of research, our review makes a needed contribution, and we argue that it also contributes to the standardization and conceptualization of both sampling and intervention design methodologies.

Overall, the findings from this review suggest that school-based interventions that include intervention components such as information and teaching, food focus and family support are effective in improving HE/FV behaviour, anthropometric measurements and attitudes towards healthy dietary behaviour among the participants. On the other hand, nutritional knowledge among participants did not seem to be influenced much by any of the intervention components used.

Impacts on HE/FV behaviours were observed, but mostly among early age children, revealing a distinct age pattern in the findings. Accordingly, age was a significant factor in determining effectiveness in several studies [35,37,39,42]. Impact was greater among young children in the 4–7-year age range, suggesting that dietary influences may vary with age.

Multicomponent approaches that include good-quality instruction and programmes, a supportive social environment both at school and at home, and family support have been effective in addressing diet-related childhood diseases by focusing on diet and physical activity. Most of the studies in this review, implemented by a combination of school staff and intervention specialists, provide evidence for the effectiveness of the programmes. Thus, the evidence supports that family involvement and a nutrition education curriculum delivered by teachers under the supervision of intervention specialists can alter the intake of fruit and vegetables while positively impacting anthropometric measurements. Teacher-led interventions have been effective and can be the most sustainable approach for the long-term impact of a programme. The same conclusion was reached in a review investigating the effectiveness of school-based interventions in Europe, which reported the effectiveness of multicomponent interventions promoting a healthy diet in school-aged children in Europe [34].

Studies with a food focus in their intervention approaches showed significant improvements in BMI [35,54,58]. Significant improvements in BMI here refer to studies reporting a probability value less than or equal to 0.05; in these cases, the interventions reduced participants' body mass. We looked at studies whose aim was to focus on interventions for obesity prevention or reduction among primary school children. Thus, search terms such as "obesity prevention intervention among primary schools" were used, as explained in the Methods section. When performing the search for school-based interventions, we did not encounter any studies that focused on underweight. Making healthy food choices available in school cafeterias and offering free food from school gardens decreases the consumption of sugar-sweetened beverages and junk food among children, thus resulting in improvements in BMI. This review further highlights that the duration of the intervention (i.e., a year or more) has an impact on anthropometric measurements. This is in contrast to the reviews of Julie et al. [76] and Cauwenberghe et al. [34], which found that better food options and study duration were effective in reducing sedentary behaviour and improving BMI. This study also found that larger sample sizes reverse the outcome of anthropometric measurements (i.e., sample size negatively influences the outcome). This might be because it is harder to administer the same intervention to more individuals. Thus, more studies are needed to examine the effects of larger sample sizes.

Our study is far from being the first to create an overview of the large number of studies of interventions that can promote healthier eating habits and counteract the worrying increase in obesity and overweight among young people in general. The huge interest is reflected in the number of studies trying to assess the impact and effectiveness of school-based interventions, as well as in the number of reviews aiming to synthesize the findings from the growing body of evidence on school-based food interventions into actionable school food policies. Our study adds to this body of knowledge and fills a gap, since it looks at the most recent studies.

Comparing our review with others, we find that the majority of studies on school food-based interventions have been conducted in high-income countries. This is also the case in our study, and this fact is important to keep in mind since it introduces a bias in the insight created by school food effectiveness reviews. It is also important to keep in mind that studies, and as a result also reviews, cover different types of school food cultures. These cultures can roughly be divided into collective, semi-collective and non-collective types. In the collective type, found in countries such as Sweden, Finland, Estonia and Brazil, school food provision is an integrated (and mainly free) part of the school day. In semi-collective approaches, food is in most cases traditionally part of what is offered at school, but subject to payment. In the non-collective approach, found in countries such as Denmark, Norway and the Netherlands, there is little infrastructure and tradition for school-organized food service. In this approach, parent-organized lunch boxes as well as competitive foods traditionally play a bigger role.

A further important point is the distinction between narrow F & V approaches and broader healthier eating intervention approaches. This classification can also be seen in previous studies and in more recent reviews. The first type, interventions that follow the six-a-day tradition (which to some extent has been fuelled by the European School Fruit programme introduced by the EU in 2009), was reviewed by Noguera et al. [77] and by Evans et al. [78]. Noguera et al. [77] performed a meta-analysis of F&V interventions, but it was limited to educational interventions in the sense that it only looked at computer-based interventions, and it covered mostly European research. The study showed that this targeted but narrow approach was effective in increasing FV consumption, but that broader multicomponent interventions, including free/subsidized FV interventions, were not effective. The 2012 review by Evans et al. [78] examined studies conducted in the United Kingdom, United States, Canada, Denmark, New Zealand, Norway and the Netherlands. Evans and co-workers [78] found that school-based interventions were able to moderately improve fruit intake but had only minimal impact on vegetable intake. These reviews and previous ones generally conclude that F&V-targeted interventions are able to shift young people's eating patterns towards a higher intake of fruit.

In the category of reviews taking a broader approach to healthier lifestyle promotion, we find studies and reviews that look at the promotion of healthier eating in general, in some cases including physical activity. A review by Julie et al. [76] covered studies from the United States, United Kingdom, Australia, Spain and the Netherlands. This review also included physical activity as part of broader school-based obesity prevention interventions. In particular, it indicated that interventions should focus on extending physical education classes, incorporating activity breaks, and reducing sedentary behaviours to improve anthropometric measures. Julie et al. concluded that interventions taking a broader approach should employ a combination of school staff and intervention specialists to implement programmes, include psychosocial/psychoeducational components, involve peer leaders, use incentives to increase fruit and vegetable consumption, and involve the family. Cauwenberghe et al. [34] reviewed intervention studies conducted in the European Union. Like our study, this review made an age distinction, categorizing participants as children or adolescents. Among children, the authors found strong evidence of effect for multicomponent interventions on fruit and vegetable intake. For educational interventions, Cauwenberghe et al. [34] found limited evidence of effect on behaviour and on fruit and vegetable intake. The study also found limited evidence of effectiveness for interventions that specifically targeted children from lower socio-economic status groups. For adolescents, Cauwenberghe et al. [34] found moderate evidence of effect for educational interventions on behaviour and limited evidence of effect for multicomponent programmes on behaviour.

As in our review, the authors distinguished between behaviour and anthropometrics, and found that effects on anthropometrics were often not measured in their sample, so the evidence was inconclusive. Cauwenberghe et al. [34] concluded that there was evidence for the effectiveness of multicomponent interventions in particular in promoting a healthy diet, but that evidence for effectiveness on anthropometric obesity-related measures was lacking. A review by Brown et al. [79], covering studies mostly from Europe but also from the United States, New Zealand, Canada and Chile, found that the intervention components most likely to influence BMI positively included increased physical activity, decreased sugar-sweetened beverage intake, and increased fruit intake.

Our review adds to the increasing support for the idea that schools should play a role in promoting healthier eating habits among young people. As such, the school can be seen as an important actor in the promotion of human rights. In particular, schools play an integral part in realizing the right to adequate food, the right to the highest attainable standard of health and the right to education, as has also been highlighted in the new statement of the United Nations System Standing Committee on Nutrition on school-based nutrition interventions [25]. Furthermore, Mikkelsen and colleagues [80] have suggested that the international human rights framework should be invoked in school-related strategies, policies, and regulations, and that national, regional, and local actors have important roles to play. They have also highlighted that ensuring healthy eating in the school environment can be a good investment in children's short- and long-term health and educational achievement. Thus, schools as a system have the potential to make lasting improvements in students' nutrition, in terms of both quality and quantity, and simultaneously to contribute to the realization of human rights around the globe [25].

#### *4.2. Discussion of Methods*

#### Strengths and Limitations

All attempts to reduce the complexity of research studies in a research field suffer from inbuilt weaknesses. Standardising the work of others in an attempt to make generalizations is always difficult. By definition, a review attempts to standardize its study material in order to create an overview of "what works" and of what "that which works" depends on. For obvious reasons, research protocols depend very much on the context of the study: what is doable in one setting or country might not work in other settings. Additionally, reporting procedures vary among authors. The aim of a review is to standardize this heterogeneity into something that is homogeneous and computable. In our case, our constructs represent an attempt to make studies with similar but slightly different approaches and methodologies comparable by making them computable. This obviously has some disadvantages.

Another limitation is that our review was restricted to published English-language articles. Therefore, publication bias cannot be excluded, as the inclusion of unpublished articles and articles written in languages other than English might have affected the results of this review. Second, most of the studies included in the present review were carried out in countries in the southern and northern parts of Europe. This raises questions about the generalisability of these results to other countries in Europe, especially because contextual variables were often lacking in the included studies. The same questions about generalisability could be raised for other parts of the world, e.g., Latin America, North America, Asia and Africa, as very few studies were reported from these regions.

Furthermore, large dropouts were reported in many of the listed studies, and follow-up was reported in only a few studies and covered short time periods. In the studies that did include a follow-up, it took place right after the end of the intervention period, which could have affected the effectiveness observed for these study outcomes. Long-term post-intervention follow-up would help to study the retention of behaviour change and the effect on body composition among participants. Thus, long-term post-intervention studies are needed to draw conclusions about the sustainability of an intervention. Additionally, to improve the quality of the evidence on the effectiveness of this kind of intervention, future studies should be of high quality, with a rigorous design, an appropriate sample size, long-term post-intervention follow-up, and assessment of implementation issues and of the cost effectiveness of the intervention.

On the strength side, the standardisation approach helps to find patterns and to create an overview of a large body of material within a given field of research. The strength of this study is that it provides a broad, up-to-date overview of what is known about the relationship between school-based interventions and policies and healthy eating outcomes among children, and that it contributes to a deeper understanding of how limited current research findings are. This is among the very few recent reviews that evaluated the effect of school-based food and nutrition interventions among children only. The systematic review approach of this study aimed to integrate existing information efficiently and to provide a rationale for researchers' decisions about future research. Furthermore, the explicit methods applied in this study limited bias and contributed to the improved reliability and accuracy of the conclusions drawn. Another advantage is that this study looks specifically at the evidence available in Northern and Southern Europe. One of the biggest strengths of this study is that statistical analyses of pooled data facilitated a more thorough synthesis of the results.

#### *4.3. Policy Implications*

The evidence of the impact of school interventions derived from our review suggests several topics to be dealt with in future research, not only in Europe but also in other parts of the world. First, this review highlights the need for researchers to recognize the importance of further investigation of anthropometric measures, nutritional knowledge, and attitudes. Among the 42 studies carried out in different regions, very few looked at the effects on participants' attitudes and anthropometric measures. Those that did showed a positive impact when family support was provided, when the intervention started at an early age, and when a food focus was part of the intervention. Additionally, most of the included studies did not aim to contribute to obesity prevention. Thus, there is an urgent need for more studies that include measures of participants' attitudes towards healthy behaviour and a healthy lifestyle, as well as anthropometric measures. Second, to increase the comparability between studies and to facilitate the assessment of effectiveness, more agreement is needed on the best dietary measures and questionnaires. Third, more research is needed among specific groups such as low socio-economic groups, immigrants or minorities. As mentioned earlier, only a few of the listed studies included these groups. Furthermore, evidence suggests that health inequalities such as the prevalence of overweight result from dietary habits, and ethnicity and socio-economic status have been identified as determinants of healthy eating. Thus, future research should not exclude these groups, as European countries have become ethnically diverse.

To reduce childhood conditions such as overweight and obesity and to improve other aspects of health, many policy documents have called for the development of effective strategies among children and adolescents. Even though only limited to moderate evidence of impact was found for these school-based interventions, it should be noted that the interventions were not primarily targeting obesity prevention but, in many cases, had a broader scope. Thus, in order to deliver these evidence-based recommendations to policy makers, factors such as the sustainability of an intervention, its context and its cost effectiveness should be considered. Additionally, policy makers should ensure school policies and environments that encourage physical activity and a healthy diet.

#### **5. Conclusions**

Findings from this systematised review suggest that multicomponent interventions (environmental, educational, and physical strategies), along with parental involvement and long-term initiatives, may be promising for improving dietary habits and reducing other childhood-related diseases among primary school children. Although it was challenging to find experimental studies in related fields, the studies found showed a positive trend. To conclude, evidence of an effect was found for school-based food and nutrition initiatives among primary school children. However, to strengthen the perspectives of this study, further systematic reviews targeting long-term studies that assess the long-term sustainability of interventions should be considered. Also, future school-based interventions aiming to improve anthropometric outcomes could include increasing PA, increasing fruit and vegetable intake and decreasing sedentary behaviour. This study has provided fundamental background on which further research in the area of school-based food and nutrition interventions can build. Thus, the findings from this systematic review can be used as guidelines for future food and nutrition interventions in school settings. We also see the categorization of intervention components as useful for the planning of future interventions.

**Author Contributions:** Conceptualization, B.E.M. and A.C.; methodology, B.E.M., A.C. and F.S.; validation B.E.M.; formal analysis, F.S.; investigation, A.C.; resources B.E.M. and A.C.; data curation, A.C. and F.S.; writing—original draft preparation, B.E.M., A.C. and F.S.; writing—review and editing, B.E.M., A.C. and F.S.; project administration, B.E.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Effects of the Preschool-Based Family-Involving DAGIS Intervention Program on Children's Energy Balance-Related Behaviors and Self-Regulation Skills: A Clustered Randomized Controlled Trial**

**Carola Ray 1,2,\* , Rejane Figuereido 1,3, Henna Vepsäläinen <sup>2</sup> , Reetta Lehto 1,2 , Riikka Pajulahti 1,2, Essi Skaffari 1,2 , Taina Sainio 1,4, Pauliina Hiltunen 1,2, Elviira Lehto 1,4 , Liisa Korkalo <sup>2</sup> , Katri Sääksjärvi <sup>4</sup> , Nina Sajaniemi 4,5, Maijaliisa Erkkola <sup>2</sup> and Eva Roos 1,2,6**


Received: 25 June 2020; Accepted: 24 August 2020; Published: 26 August 2020

**Abstract:** The study examines the effects of a preschool-based family-involving multicomponent intervention on children's energy balance-related behaviors (EBRBs), such as food consumption, screen time and physical activity (PA), and on their self-regulation (SR) skills, and whether the intervention effects differed between children from low and high parental educational level (PEL) backgrounds. The Increased Health and Wellbeing in Preschools (DAGIS) intervention was conducted as a clustered randomized controlled trial, clustered at preschool level, over five months in 2017–2018. Altogether, 802 children aged 3–6 years participated. Parents reported children's consumption of sugary everyday foods and beverages, sugary treats, fruits, and vegetables in a food frequency questionnaire, and screen time in a 7-day diary. Physical activity was assessed by a hip-worn accelerometer. Cognitive and emotional SR skills were reported by parents in a questionnaire. General linear mixed models with and without repeated measures were used as statistical methods. At follow-up, no differences were detected in EBRBs or SR skills between the intervention and control groups, nor did differences emerge in children's EBRBs between the intervention and control groups when stratified by PEL. The improvement in cognitive SR skills among low PEL intervention children differed from that among low PEL control children, although the difference was only of borderline significance. The DAGIS multicomponent intervention did not significantly affect children's EBRBs or SR skills. Further sub-analyses and a comprehensive process evaluation may shed light on the non-significant findings.

**Keywords:** energy balance-related behaviors; self-regulation skills; preschoolers; children; randomized controlled trial; intervention effects; parental educational level; intervention mapping; multicomponent intervention

#### **1. Introduction**

Young children's food intake, screen time, and physical activity (PA), commonly referred to as energy balance-related behaviors (EBRBs) [1], are important because they can predict the future weight status and health of children [2–4]. A socio-economic status (SES) gradient already exists in preschoolers' EBRBs; children from low-SES family backgrounds tend to have less healthy EBRBs, such as a higher intake of sugary foods or beverages and excessive screen time [5–7].

Home and early childhood education and care centers, hereafter preschools, are the settings where three- to six-year-olds spend most of their time, and it is therefore important that these environments promote healthy EBRBs, including sufficient PA and fruit and vegetable (FV) consumption [8–10]. Reviews have concluded that EBRB interventions should be conducted at preschools and homes simultaneously in order to be successful [11,12]. Preschool-based family-involving interventions have been reported to be promising [12–15], although some studies show no effects on EBRBs [12,14,16]. This has raised discussion on intervention design and implementation in families [12]. Interventions designed for the general population should reach, and show greater effects among, those who need them most, namely those with low SES backgrounds [5,17]. To date, knowledge of the equity effectiveness of EBRB interventions among children is sparse [18,19]. Promoting several EBRBs simultaneously is challenging, as the aim can be both to promote healthy behaviors and to discourage unhealthy ones. Strategies can also differ: one review concluded that promoting PA among young children is successful when the focus is on the preferred behavior rather than on decreasing sedentary time such as lying or sitting down [20].

Strengthening children's self-regulation (SR) skills in parallel with promoting their healthy EBRBs could be an effective strategy in interventions [21,22]. Self-regulation is a multidimensional concept, briefly described as the capacity to regulate actions, emotions, and cognitions in goal-directed behavior [23]. Cognitive SR skills refer to executive functioning such as self-monitoring to plan and proceed toward long-term goals [24–26], whereas emotional SR skills refer to capacities such as being able to recognize one's own feelings and to stay calm in stressful situations [24,25]. Associations between children's SR skills and less favorable EBRBs and weight status have been found [21,22,24,25]. The Head Start study tested the strategy of strengthening young children's SR skills alongside promoting their healthy EBRBs [27]. The intervention included four arms: intervening on EBRBs and SR skills; intervening on EBRBs; intervening on SR skills; and no intervention. Effects were seen as lower sugar-sweetened beverage consumption in the study arm promoting both EBRBs and SR skills compared with the other arms [27].

The Increased Health and Wellbeing in Preschools (DAGIS) intervention aimed to promote preschoolers' (aged 3–6 years) healthy EBRBs and SR skills. The assumption was that effects would be greater among children from families with low parental educational level (PEL), thereby also reducing any health gaps between children with low and high PEL backgrounds [28]. The intervention development process was guided by the Intervention Mapping (IM) framework [29], and the process is described elsewhere [28]. A cross-sectional study served as the needs assessment [7,28], and based on its findings, there were three main aims: to reduce children's screen time; to reduce the consumption of sugary everyday foods and beverages; and to increase vegetable consumption. For these three behaviors, the needs assessment showed less favorable behaviors among children with a low PEL background [28]. To promote alternatives to the reduced behaviors, additional aims were to increase fruit and berry consumption and total PA (light, moderate, and vigorous intensity) [28]. In addition, the intervention aimed to strengthen children's SR skills. Activities were planned to suit families with low PEL backgrounds.

In Finland, 78–86% of three- to six-year-olds attend municipality-driven preschools [30]. Therefore, preschools offer a good setting for interventions. As screen time and sugary food and beverage consumption occur mostly at home [31], homes were considered an equally important intervention setting. The developed program lasted 23 weeks and was divided into five themes: SR skills; PA; fruit and vegetables; screen time; and sugary foods and beverages. Each theme was in focus for four to five weeks.

In this study, we aimed: (1) to evaluate the effects of a preschool-based family intervention on children's EBRBs and SR skills, and (2) to evaluate whether effects were stronger among children with low PEL background than among those with high PEL background.

#### **2. Materials and Methods**

The DAGIS intervention study is a preschool-level clustered randomized controlled trial (RCT) that aimed to promote preschoolers' healthy EBRBs and SR skills so that those from low SES backgrounds would benefit most from the program. The study was conducted between September 2017 and May 2018, including baseline and follow-up measurements [28]. Early educators delivered the program and all of its activities to all preschoolers, regardless of whether the children participated in the study. Prospective trial registration number: ISRCTN57165350 (the 8th of January 2015).

#### *2.1. Recruitment*

We aimed to invite municipalities that had a high number of preschools and a large variety in educational and income levels among inhabitants, and that were located within a convenient distance from the Helsinki region. Municipalities to invite were selected by comparing municipality statistics from southern and western Finland [32], excluding municipalities that had already been part of the previous comprehensive DAGIS survey in 2015–2016 [7]. Power calculations prior to recruitment for the intervention were based on the DAGIS survey results; specifically, we used the averages (about 1.7 times/week for all and about 2 times/week for the low PEL group) and standard deviations of children's sugary food and beverage consumption frequency [7]. Based on those values, we decided to aim at a decrease of 0.74 times/day in sugary food and beverage consumption frequency. To detect a change of 0.74 times/day less sugary foods and beverages, the required sample size was calculated to be 432 children, considering an attrition rate of 70% (Fpower macro, SAS version 9.4). The significance level was set at 5% and the power at 80%.
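The power calculation described above can be illustrated with the standard normal-approximation formula for comparing two means, in which the number needed per group is $2(z_{1-\alpha/2}+z_{1-\beta})^2\sigma^2/\delta^2$, followed by inflation for expected dropout. This is a sketch under stated assumptions, not a reproduction of the SAS Fpower calculation. The survey standard deviation is not given in this section, so it is left as a parameter. Preschool-level clustering (the design effect) is ignored, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Children needed per arm to detect a mean difference `delta` (two-sided test).

    Normal approximation for two independent groups with common SD `sd`.
    Assumption: clustering by preschool (design effect) is ignored here.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)  # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def inflate_for_dropout(n: int, dropout: float) -> int:
    """Number to recruit so that about `n` remain after losing a fraction `dropout`."""
    if not 0 <= dropout < 1:
        raise ValueError("dropout must be in [0, 1)")
    return math.ceil(n / (1 - dropout))
```

For the target stated above, one would call `n_per_group(delta=0.74, sd=...)` with the survey SD. The section does not specify how the 70% attrition figure entered the original calculation, so `inflate_for_dropout` simply divides by the expected retained fraction.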

Altogether, seven municipalities were invited to participate in the study and were offered an oral presentation on the study. Five municipalities accepted the oral presentation, and two of these chose to participate. One municipality decided that all of its preschools (*n* = 29, preschool managers *n* = 19) would participate. The other municipality allowed its preschool managers to decide individually, and the managers of three preschools there chose to participate. We decided that these 32 preschools and 1702 eligible preschoolers were sufficient for our study (Figure 1).

Researchers visited each preschool to inform early educator professionals about the project and their role in the project. The recruitment phase lasted 1–2 weeks, and families returned informed consents (or refusals to participate) to preschools in sealed envelopes. Thereafter, the researchers returned to preschools to distribute the baseline research material for early educators, parents, and children.

**Figure 1.** Flow chart in the Increased Health and Wellbeing in Preschools (DAGIS) intervention study, in accordance with the Consolidated Standards of Reporting Trials (CONSORT) 2010 statement [33].

#### *2.2. Ethical Issues*

The DAGIS intervention study received ethics approval from the Helsinki Ethics Review Board in humanities and social and behavioral sciences (22/2017; 16 May 2017). Early education professionals were informed about the study through site visits. The early educators' questionnaire stated that participation was voluntary and that the early educators had the option to withdraw at any stage of the study. Early educators gave their consent by filling in the questionnaire. Families returned written informed consent, and thereafter, the questionnaires were delivered.

#### *2.3. Data Collection and Measurements*

The baseline data collection occurred in four waves over five weeks and the follow-up data collection in three waves over five weeks. Data collection in waves was necessary due to the limited number of accelerometers available for measuring children's PA. Research staff visited each preschool to instruct early educators and left printed screen time diaries for families, study questionnaires for families who had requested paper copies, and accelerometers for children. These materials were picked up from the preschools one week later. However, most parents requested electronic questionnaires. These parents received the main parental questionnaire as a personal link and the food frequency questionnaire link by email.

#### 2.3.1. Measurements

Screen time was assessed by a printed screen time diary. In the diary, parents recorded their child's use of screens outside preschool time whenever the child used a screen for more than 10 min in a row. Screen use was recorded separately for different screens: TV, DVD, computer, tablet, or cell phone. The screen time diary was a slightly modified version of a previously validated diary [34], as the original did not include portable screens or questions about screen contexts. The screen time diary has shown good reproducibility [35]. Screen time was calculated for children with data for at least three weekdays and one weekend day. Total screen time (min/day) was calculated as a weighted mean: (5 × weekday mean + 2 × weekend mean)/7.
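The inclusion rule and weighted mean above translate directly into code. This minimal sketch assumes the diary has already been summed to total minutes per day, separately for weekdays and weekend days. The function name is illustrative.

```python
from statistics import mean
from typing import Optional, Sequence


def daily_screen_time(weekday_min: Sequence[float], weekend_min: Sequence[float]) -> Optional[float]:
    """Total screen time (min/day) as (5 * weekday mean + 2 * weekend mean) / 7.

    Each sequence holds one total per diary day, in minutes. Returns None when
    the diary does not cover at least three weekdays and one weekend day.
    """
    if len(weekday_min) < 3 or len(weekend_min) < 1:
        return None  # child excluded from the screen time analysis
    return (5 * mean(weekday_min) + 2 * mean(weekend_min)) / 7
```

Weighting the two means by 5 and 2 mirrors the composition of a week. A diary with, for example, two recorded weekend days and only three weekdays therefore does not give weekend viewing more weight than it has in an ordinary week.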

Children's PA was assessed by a hip-worn accelerometer, the ActiGraph wGT3X-BT (ActiGraph, LLC, Pensacola, FL, USA), worn 24 h/day over seven consecutive days, and parents kept a screen time diary over the same days. A 15-s epoch length was used for data derived from the accelerometers, and more than ten minutes of consecutive zeroes was defined as non-wear time [36]. In the analyses, the cut-off points of Evenson et al. [37] for children aged 5–15 years were used, which means that total PA, including light, moderate, and vigorous intensity PA, is defined as more than 100 counts/min. For a child's PA data to be included in the analyses, there had to be data for at least four days, of which one was a weekend day. In addition, each day needed to have 600 min or more of awake wearing time. The mean total PA (min/day) was used in the analyses.
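The epoch-level rules described above can be sketched as follows, assuming raw counts per 15-s epoch for each day. In this sketch, zero-count runs longer than 10 min are treated as non-wear, epochs above 25 counts (the 15-s equivalent of 100 counts/min) count toward total PA, and a child is retained only with at least four valid days (600 min of wear) including a weekend day. Sleep is not detected here, so the sketch assumes each day's counts have already been restricted to waking hours. The processing software and any further rules used in the study are not described in this section, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

EPOCHS_PER_MIN = 4  # 15-s epochs
NONWEAR_EPOCHS = 10 * EPOCHS_PER_MIN  # zero runs longer than 10 min are non-wear
PA_CUTOFF = 100 / EPOCHS_PER_MIN  # >100 counts/min, i.e. >25 counts per 15-s epoch
MIN_WEAR_MIN = 600  # minimum wear time (min) for a valid day


def wear_mask(counts: List[int]) -> List[bool]:
    """Flag each epoch as worn (True) or non-wear (False)."""
    worn = [True] * len(counts)
    i = 0
    while i < len(counts):
        if counts[i] != 0:
            i += 1
            continue
        j = i
        while j < len(counts) and counts[j] == 0:
            j += 1  # extend the run of consecutive zeros
        if j - i > NONWEAR_EPOCHS:
            worn[i:j] = [False] * (j - i)
        i = j
    return worn


def day_summary(counts: List[int]) -> Tuple[float, float]:
    """Return (wear minutes, total PA minutes) for one day of epoch counts."""
    worn = wear_mask(counts)
    wear_min = sum(worn) / EPOCHS_PER_MIN
    pa_epochs = sum(1 for c, w in zip(counts, worn) if w and c > PA_CUTOFF)
    return wear_min, pa_epochs / EPOCHS_PER_MIN


def mean_total_pa(days: Dict[str, List[int]], weekend: Set[str]) -> Optional[float]:
    """Mean total PA (min/day) over valid days, or None if the child is excluded."""
    valid_pa = {}
    for day, counts in days.items():
        wear_min, pa_min = day_summary(counts)
        if wear_min >= MIN_WEAR_MIN:
            valid_pa[day] = pa_min
    if len(valid_pa) < 4 or not any(day in weekend for day in valid_pa):
        return None  # fewer than four valid days, or no valid weekend day
    return sum(valid_pa.values()) / len(valid_pa)
```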

The original 47-item food frequency questionnaire (FFQ) was designed for the DAGIS survey particularly to measure the consumption frequencies of vegetables and fruits as well as sugary foods and beverages [38]. It has shown acceptable validity for ranking food group consumption compared with 3-day food records [38], and testing the reproducibility of the items has yielded acceptable results [35]. In the DAGIS intervention, the FFQ was expanded into a 51-item FFQ that included six food groups (vegetables, fruit, and berries; dairy products; fish, meat, and eggs; cereal products; beverages; and other foods such as sweets and snacks). A link to the electronic 51-item FFQ was sent to all parents, and hard copies were sent to those who did not fill in the electronic version. Parents reported how many times during the past week the child had consumed each food outside preschool hours. The FFQ included three answer options: not at all, times per week, and times per day. The instruction was either to tick the 'not at all' box or to write a number in one of the other columns. The FFQ was intentionally restricted so as not to cover municipality-provided foods and beverages consumed during preschool hours, because parents would not have been able to report these foods reliably.

The three food consumption frequency variables ('sugary everyday foods and beverages', 'sugary treats', and 'fruit and vegetables (FV)') were formed by summing up the consumption frequencies (times/week). The sugary everyday foods and beverages variable included flavored yogurt and quark; puddings; sugar-sweetened cereals and muesli; berry, fruit, and chocolate porridge with added sugar; berry and fruit soups with added sugar; soft drinks; flavored and sweetened milk- and plant-based beverages; and sugar-sweetened juices. The sugary treats variable included ice cream, chocolate, sweets, cakes, cupcakes, sweet rolls, Danish pastries, pies and other sweet pastries, and sweet biscuits and cereal bars. The FV variable included fresh vegetables, cooked and canned vegetables, fresh fruit, and fresh and frozen berries.
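The construction of the three frequency variables can be sketched as follows. Answers could be given either per week or per day, while the variables are expressed in times/week, so this sketch assumes a daily answer is multiplied by 7. That conversion, the answer encoding, and the rule of returning no value when an item is missing are assumptions for illustration, not details given in the text. Only the FV item list is shown; the two sugary variables would be built the same way from their own item lists.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of one FFQ answer: ("none", 0), ("week", n) or ("day", n).
Answer = Tuple[str, float]

# Items of the FV variable as listed in the text; the questionnaire wording may differ.
FV_ITEMS = [
    "fresh vegetables",
    "cooked and canned vegetables",
    "fresh fruit",
    "fresh and frozen berries",
]


def times_per_week(answer: Answer) -> float:
    """Convert one answer to a weekly consumption frequency."""
    unit, n = answer
    if unit == "none":
        return 0.0
    if unit == "week":
        return float(n)
    if unit == "day":
        return 7.0 * n  # assumption: daily frequency converted to weekly by multiplying by 7
    raise ValueError(f"unknown answer unit: {unit!r}")


def food_group_frequency(answers: Dict[str, Answer], items: List[str]) -> Optional[float]:
    """Sum of weekly consumption frequencies over the items of one food group.

    Returns None if any item is unanswered (an assumed missing-data rule).
    """
    if any(item not in answers for item in items):
        return None
    return sum(times_per_week(answers[item]) for item in items)
```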

Children's SR skills were assessed with 10 items derived from the Child Social Behavior Questionnaire, previously used in the Millennium Cohort Study on 3-year-olds [26]. Five items assessed cognitive and five items emotional SR skills. Each statement had three response options: disagree; agree to some extent; and fully agree. The mean score for each sub-dimension was calculated and used in the analyses. The internal consistency reliability (Cronbach's alpha) was 0.68 for cognitive and 0.78 for emotional SR skills.
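The scoring and reliability figures mentioned above can be sketched with the usual formula for Cronbach's alpha, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$. Here $k$ is the number of items, $s_i^2$ is the variance of item $i$, and $s_T^2$ is the variance of the summed score. The numeric coding of the three response options (for example 1 to 3) is an assumption, since the text does not state it.

```python
from statistics import mean, variance
from typing import List, Sequence


def subscale_score(item_scores: Sequence[float]) -> float:
    """Mean of one child's item scores within a sub-dimension (e.g. the five cognitive items)."""
    return mean(item_scores)


def cronbach_alpha(responses: List[Sequence[float]]) -> float:
    """Cronbach's alpha for a k-item scale; responses[r][i] is child r's score on item i.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score)
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_variances = [variance(item) for item in zip(*responses)]  # one column per item
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```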

#### 2.3.2. Parental Educational Level

The parent filling in the guardian's questionnaire reported his/her own highest educational achievement and the education of a partner living in the same household. The six answer options were categorized as follows: low educational level (comprising comprehensive school, vocational school, or high school); middle educational level (bachelor's degree or college); and high educational level (master's degree or licentiate/doctor). The highest educational level among parents was used as the parental educational level (PEL) variable in the analyses. In four cases, the highest education was not the education level of the mother or the father of the child, but that of a spouse living in the same household.
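The PEL variable described above can be derived by mapping each reported education to one of the three categories and keeping the higher of the two. The option labels in this minimal sketch are illustrative renderings of the categories in the text, not the questionnaire's exact wording.

```python
from typing import Optional

# Illustrative option labels mapped to the three PEL categories (1 = low, 2 = middle, 3 = high).
LEVEL = {
    "comprehensive school": 1,
    "vocational school": 1,
    "high school": 1,
    "bachelor's degree or college": 2,
    "master's degree": 3,
    "licentiate/doctor": 3,
}
LABEL = {1: "low", 2: "middle", 3: "high"}


def parental_education_level(respondent: Optional[str], partner: Optional[str] = None) -> Optional[str]:
    """Highest educational level among the respondent and a partner in the same household."""
    levels = [LEVEL[edu] for edu in (respondent, partner) if edu is not None]
    if not levels:
        return None  # no education reported
    return LABEL[max(levels)]
```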

#### 2.3.3. Confounding Factors

The parent reported the date of birth and gender of the participating child. In the statistical analysis, adjustments were made for the child's gender, age at baseline (continuous), PEL (categorical), and municipality.

#### *2.4. Randomization, the Intervention, and the Program Content*

Randomization was done at the preschool manager level, separately for the two municipalities, using an online randomization program (https://www.randomlists.com/team-generator). Preschools were divided into small and large preschools before randomization. After the baseline measurements, preschools were informed whether they had been randomized into the intervention (*n* = 13) or control (*n* = 19) group (Figure 1).
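Stratified allocation of this kind can be sketched in a few lines. The study itself used the online randomlists.com team generator, so the code below is not the authors' procedure. It only illustrates the principle of shuffling randomization units (here, hypothetical preschool-manager units) within strata defined by municipality and preschool size, and allocating them alternately to the two arms.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Stratum = Tuple[str, str]  # hypothetical stratum key: (municipality, size class)


def stratified_cluster_randomization(units: Dict[str, Stratum], seed: int) -> Dict[str, str]:
    """Allocate each randomization unit to 'intervention' or 'control' within its stratum."""
    rng = random.Random(seed)  # seeded for a reproducible allocation list
    strata: Dict[Stratum, List[str]] = defaultdict(list)
    for unit, stratum in units.items():
        strata[stratum].append(unit)
    allocation: Dict[str, str] = {}
    for stratum in sorted(strata):
        members = sorted(strata[stratum])  # fixed order before shuffling
        rng.shuffle(members)
        # Random starting arm, so an odd-sized stratum's extra unit is not always in one arm.
        first = rng.choice(["intervention", "control"])
        second = "control" if first == "intervention" else "intervention"
        for i, unit in enumerate(members):
            allocation[unit] = first if i % 2 == 0 else second
    return allocation
```

Alternating allocation within each shuffled stratum keeps the arms balanced in every stratum to within one unit. Because a unit may contain several preschools, the number of preschools per arm can still differ.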

In intervention preschools, all early educators received program training. The training was split into a longer session after the baseline measurements and a shorter session around the middle of the 23-week program, approximately 8 h in all [28]. Throughout the intervention, two researchers kept in contact by email with the early educators conducting the program. The preschool program was based on the international MindUp™ program [39]. Strategies and methods promoting healthy EBRBs were added to those already in the program, and a program for families was developed [28]. The program was run in both preschools and homes and divided into five themes, each lasting 4–5 weeks: SR skills; physical activity; fruit and vegetables; screen time; and sugary foods and beverages. SR skills, along with each EBRB, were emphasized throughout the program in the preschool activities. SR skills were promoted by brain breaks: calming-down and breathing sessions of a few minutes, held three times per day and led by early educators. In addition, early educators were trained to teach children to recognize and reflect on different feelings. In the family activities, the focus was on the children's EBRBs and on how parents could influence them by acting as role models and by changing availability and accessibility in the home environment. The methods used for families included information letters, emails containing videos or articles, bingos related to EBRBs, and two fairy tales written for the project. For each of the five themes, preschools arranged one activity afternoon. Early educators received the instructions and materials needed for the activities at the program training sessions. The activity afternoons were conducted as workshops for children and parents, to which all families were invited. 
An activity afternoon could consist of a worksheet about vegetable-eating habits and favorite vegetables, or a vegetable tasting session that children and parents conducted together. Materials produced during the afternoons were expected to be displayed at the preschool, so that families could see each other's work. The early educators in the control preschools received training for the program after the intervention was finished.

#### *2.5. Statistical Analyses*

Differences in participants' characteristics between the two groups (intervention/control) at baseline were analyzed by the Chi-square test (categorical variables) and *t*-test (continuous variables). Our main outcomes were total screen time (min/day), total PA (min/day), two variables related to sugar consumption (sugary everyday foods and beverages, and sugary treats, as times/week), total FV consumption frequency (times/week), and SR skills (cognitive and emotional dimensions, as scores). As a first step, a simple model was used to compare the intervention and control groups: general linear mixed models adjusted for the baseline value of the outcome. This first model served as a simple description of the results at follow-up. As a second step, a more complete and appropriate model was used, with the main interest being to evaluate the change between baseline and follow-up in the control and intervention groups. For this aim, we used linear mixed models with repeated measures for all outcomes, taking into account the interaction between the two groups and the two time-points (baseline and follow-up). In the mixed models, normality was checked visually. The preschool unit was used as a random effect in order to adjust for variability between preschools. All aforementioned analyses were adjusted for the child's gender, age at baseline, municipality, and PEL. Furthermore, accelerometer wearing time was included as an adjustment variable in the analyses where PA was the outcome. We also evaluated linear mixed models with three-way interactions between group (intervention and control), time-point (baseline and post-intervention), and PEL. For these models, the results for the comparison between the two groups and time-points were presented stratified by PEL group. In all analyses, multiple imputation was applied for independent variables with missing values. The number of children included in the analysis of each dependent variable and the missing values are presented in Supplementary Table S1, and the complete results for the linear mixed models with repeated measures and the respective effect sizes for the interaction are presented in Supplementary Table S3.
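As a sketch, one way to write the repeated-measures model described above is shown below. Here $i$ indexes children, $j$ preschools, and $t \in \{0, 1\}$ baseline and follow-up. $G_j = 1$ for intervention preschools, $T_t = 1$ at follow-up, and $\mathbf{x}_{it}$ holds the adjustment variables (gender, age at baseline, municipality, PEL, and wear time for PA). The preschool random intercept $u_j$ follows the text. The child-level random intercept $v_i$, used to handle the two measurements per child, is our assumption, since the text states only that preschool was a random effect. In lme4 formula syntax, this would correspond to something like `y ~ group * time + gender + age + municipality + pel + (1 | preschool) + (1 | child)`, with hypothetical variable names.

$$
y_{ijt} = \beta_0 + \beta_1 G_j + \beta_2 T_t + \beta_3\, G_j T_t + \boldsymbol{\gamma}^{\top}\mathbf{x}_{it} + u_j + v_i + \varepsilon_{ijt},
\qquad u_j \sim N(0, \sigma_u^2),\; v_i \sim N(0, \sigma_v^2),\; \varepsilon_{ijt} \sim N(0, \sigma^2)
$$

At fixed covariates, the expected change from baseline to follow-up is $\beta_2 + \beta_3$ in the intervention group and $\beta_2$ in the control group. The group-by-time interaction $\beta_3$ is therefore the difference in change between the arms:

$$
\big(E[y \mid G=1, T=1] - E[y \mid G=1, T=0]\big) - \big(E[y \mid G=0, T=1] - E[y \mid G=0, T=0]\big) = (\beta_2 + \beta_3) - \beta_2 = \beta_3
$$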

All analyses were based on the intention-to-treat principle, so that all randomized participants were analyzed in the group to which they were randomized. General statistical analyses were performed, and tables were created, using SPSS version 25. Mixed models, effect sizes for the models' interactions, and multiple imputation were conducted in R version 3.4.3 using the lme4, MuMIn, and MICE packages, respectively. For all analyses, a 5% significance level was adopted.
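Multiple imputation yields one set of estimates per imputed dataset, and these must then be combined. The pooling step is not described here; Rubin's rules, which `pool()` in the R mice package implements, are the conventional choice. The sketch below reproduces them in Python for a single coefficient, with hypothetical inputs and an illustrative function name.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with one variance each")
    q_bar = mean(estimates)            # pooled point estimate
    w = mean(variances)                # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (n - 1 denominator)
    t = w + (1 + 1 / m) * b            # total variance
    return q_bar, t ** 0.5             # pooled estimate and its standard error


# Hypothetical coefficient from three imputed datasets
q, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q, se)
```

The between-imputation term inflates the standard error to reflect uncertainty about the missing values, which a single imputation would ignore.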

#### **3. Results**

The mean (±SD) age of children in the study was 5.24 (±1.06) years in the control group and 5.14 (±1.04) years in the intervention group. Although most characteristics were similar in the two groups, a higher percentage of children whose parents had a high educational level was found in the control group (26%) than in the intervention group (18%) (Table 1).
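Categorical baseline characteristics such as PEL were compared with the Chi-square test. A minimal sketch of Pearson's test for a 2 × 3 table (group by low/middle/high PEL) follows; the counts are hypothetical, not the DAGIS data. With (2 - 1)(3 - 1) = 2 degrees of freedom, the upper tail of the chi-square distribution is exactly exp(-x/2), so no statistics library is needed.

```python
from math import exp


def chi_square_2x3(table):
    """Pearson chi-square test of independence for a 2 x 3 contingency table."""
    if len(table) != 2 or any(len(row) != 3 for row in table):
        raise ValueError("expected a 2 x 3 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # df = 2, for which the chi-square survival function is exp(-x / 2)
    p_value = exp(-stat / 2)
    return stat, p_value


# Hypothetical counts: rows = control, intervention; columns = low, middle, high PEL
counts = [[20, 10, 10],
          [10, 10, 20]]
stat, p = chi_square_2x3(counts)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

For other table shapes the closed form no longer applies, and a general chi-square distribution function (e.g., from SciPy or SPSS) is needed.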


**Table 1.** Children's characteristics by the control and intervention group at baseline (*n* = 802).

\* SD, standard deviation; <sup>a</sup> comparison using *t*-test; <sup>b</sup> comparison using Chi-square test; <sup>c</sup> one missing value for age; <sup>d</sup> low educational level (comprehensive school, vocational school, or high school), middle (bachelor's degree or college), high (master's degree or licentiate/doctor).

Table 2 shows the descriptive results for children's EBRBs and SR skills in the intervention and control groups at baseline and at follow-up; the corresponding results by PEL are presented in Supplementary Table S2. Children had about the same daily screen time in the intervention and control groups at baseline (Table 2), but children with low PEL had higher screen time than children in the other PEL groups (Supplementary Table S2). At baseline, FV consumption was higher in the high PEL groups than in the other groups (Supplementary Table S2).

Table 3 shows the comparison between the intervention and control groups at follow-up adjusted for respective baseline outcome values. Figures 2 and 3 present the mean of the main outcomes (descriptive values from Table 2) at the baseline and follow-up for the intervention and control groups, and for the PEL subgroups of the intervention group.



\* EBRBs, energy balance-related behaviors; SR, self-regulation. \*\* SD, standard deviation.



\* *n* = 645–737; estimates and their 95% confidence intervals (C.I.); <sup>a</sup> models adjusted for gender, age, municipality, and parental educational level; <sup>b</sup> models adjusted for gender, age, municipality, parental educational level, and accelerometer wear time; <sup>c</sup> models adjusted for gender, age, municipality, parental educational level, accelerometer wear time (when PA was the outcome), and baseline value of the outcome.

*Nutrients* **2020**, *12*, 2599


**Figure 3.** Children's EBRBs (headings (**A**–**E**)) and SR skills (headings (**F**,**G**)) within the intervention group separated by highest parental educational level (PEL) (means). For exact mean values, please see Supplementary Table S2 (\* *p*-value < 0.05 for difference between follow-up and baseline within the group).

There were no significant differences at follow-up between the intervention and control groups in children's total screen time, total PA, consumption frequencies of sugary everyday foods and beverages, sugary treats, and FV, or in cognitive and emotional SR skills (Table 3).

Some EBRBs and SR skills changed between baseline and follow-up within the control and intervention groups (Table 3; see means in Figure 2). In the intervention group, the change in total screen time between baseline and follow-up was not significant, whereas screen time in the control group increased significantly, by approximately 4.5 min/day (*p* = 0.028, Table 3, Figure 2A). Total PA increased significantly in the control group, by 24 min/day on average (*p* < 0.001), and in the intervention group, by 27 min/day (*p* < 0.001, Table 3 and Figure 2B). Sugary treat consumption frequency increased in both groups (*p* < 0.001 in both groups, Table 3). In the intervention group, FV consumption frequency tended to increase, although not significantly (*p* = 0.088, Table 3, Figure 2E). Cognitive SR skill scores increased significantly in the intervention group (*p* = 0.011, Table 3, Figure 2F).

Similar comparisons of children's EBRBs and SR skills at follow-up stratified by PEL, and the comparisons between baseline and follow-up for the intervention and control groups stratified by PEL, are presented in Table 4. To illustrate the results within the separate PEL intervention groups, Figure 3 presents the means of the main outcomes at baseline and follow-up.

No significant differences between the intervention and control groups were found when examining EBRBs and SR skills stratified by PEL (Table 4). At follow-up, there was a borderline significant difference in cognitive SR skills between the low PEL intervention and control groups (*p* = 0.051).

Within the groups, the low PEL control group decreased their cognitive SR skills (borderline significance, *p* = 0.052). The total PA increased significantly within all intervention and control groups when stratified by PEL (*p* < 0.001 for all subgroups, Table 4, Figure 3B). The sugary treat consumption frequency increased within low PEL control and intervention groups (*p* < 0.001 in both groups), and in the middle PEL control group (*p* = 0.027, Table 4, Figure 3D). Cognitive SR skills strengthened in the middle PEL intervention group (*p* = 0.038, Table 4, Figure 3F).


**Table 4.** Comparison between the intervention and control groups by parental educational level and changes within groups \*.

\* Estimates and their 95% confidence intervals (C.I.); <sup>a</sup> models adjusted for gender, age in years, municipality, and parental educational level; <sup>b</sup> models adjusted for gender, age in years, municipality, parental educational level, and accelerometer wear time; <sup>c</sup> models adjusted for gender, age in years, municipality, parental educational level, accelerometer wear time (when PA was the outcome), and baseline value of the outcome.

#### **4. Discussion**

We detected no differences in EBRBs or SR skills between the intervention and control groups in our preschool-based, family-involving RCT. Furthermore, changes in children's EBRBs and SR skills by PEL did not differ between the intervention and control groups at follow-up, although a borderline significant result emerged for low PEL children in the intervention group, whose cognitive SR skills improved compared with the corresponding control group (*p* = 0.051).

A possible reason for not detecting significant intervention effects might be that the goals set were unrealistic (a decrease of 0.74 times/day in sugary foods and beverages), or that detecting effects would have required a larger number of children. Our study was a complex multicomponent intervention of relatively short duration. Each of the five program themes was the focus for 4–5 weeks, which may have been too short for changes to occur. Therefore, further evaluation of the effects is needed. Furthermore, the analysis did not show stronger intervention effects in low PEL children. Still, cognitive SR skills strengthened in the low PEL intervention group compared with the low PEL control group, and the result bordered on statistical significance. Within the low PEL control group, cognitive SR skills decreased; here too, the result bordered on statistical significance. In addition, a significant improvement in cognitive SR skills occurred among middle PEL intervention children. Since the above-mentioned differences in cognitive SR scores between the control and intervention groups were small, these results may lack practical significance. The Head Start intervention showed improvements in SR skills and a decrease in sugar-sweetened drink consumption in the group that received the intervention promoting both EBRBs and SR skills, compared with the other three groups [27]. Although the aims of that study and ours were similar, the results are not fully comparable: the children in Head Start were slightly older (4–9 years), and SR skills were measured with a different instrument. In both studies, activities to strengthen SR skills were mainly conducted in preschools, whereas parents were the main target when promoting healthy EBRBs. In Head Start, it was suggested that parents might not have been sufficiently engaged, which may have led to null results regarding the children's EBRBs; the same may be true of the DAGIS study.

Within the intervention and control groups, several significant changes occurred in the EBRBs. The control group increased their screen time by approximately 4.5 min/day, whereas no change was detected within the intervention group. This corresponds to about 30 min/week more screen time in the control group, which might eventually harm energy balance, weight status, and the development of SR skills. The results of the control children followed the trend of screen time increasing with age among young children [40]. The ToyBox study also did not show an overall positive effect on screen time [16]; however, when process evaluation data were included, a reduction in computer/video game time was shown [14]. Subgroup analyses in ToyBox showed less TV time during weekends among intervention girls [16], and subgroup analyses should also be considered in the DAGIS study.
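The weekly figure follows directly from the daily estimate reported in the Results:

$$
4.5\ \frac{\text{min}}{\text{day}} \times 7\ \frac{\text{days}}{\text{week}} = 31.5\ \frac{\text{min}}{\text{week}} \approx 30\ \frac{\text{min}}{\text{week}}
$$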

Total PA increased in both the control and intervention groups. A recently published European study reported that moderate-to-vigorous PA increased from the age group of 2–3 years to 4–5 years, and further to 6–7 years [41]; this age trend might explain the results in the DAGIS study. Moreover, the follow-up took place in spring, when there are more daylight hours than at baseline in autumn. Studies have shown that the higher the temperature and the more daylight, the higher the level of PA among children [42,43]. The municipality, in which all preschools participated, simultaneously runs a training program for all early educators aimed at increasing preschool PA, which has increased all children's preschool PA independently of intervention status. Previous interventions have reported no effects on children's PA [44–46], and it has been debated whether short durations, such as six weeks of PA promotion, are sufficient to detect an increase in children's PA [16,47].

The follow-up results for consumption of sugary everyday foods and beverages outside preschool hours did not differ between the intervention and control groups. The reduction was mainly expected to occur at home, as these foods are seldom served at Finnish preschools [31]. Program implementation in families might have been weak, leading to no changes; this needs to be studied further by analyzing the intervention processes. We found an increase in sugary treat consumption in both the control and intervention low PEL groups (Supplementary Table S2), but no changes in the middle or high PEL intervention groups. It seems that as children grow older, consumption increases, especially in low PEL groups, which might widen the gap between PEL groups. The change in FV consumption did not differ between the intervention and control groups. However, while the control group had a stable consumption of FV at both time-points, the consumption frequency in the intervention group increased by 1.3 times/week. Similarly, some intervention studies have shown improvements in FV consumption [48], although a systematic review concluded that multicomponent FV interventions have provided only weak evidence of increasing FV consumption [49].

When developing the DAGIS intervention, the focus was on understanding the low educational level context and on how a universal intervention could reach those with low PEL backgrounds [28]. One strategy was to produce easy-to-read materials, because the ToyBox intervention study had suggested that its lack of significant results for children's food consumption might have been due to intervention materials that were insufficiently tailored to those with low education levels [13]. The DAGIS logic model of change included primary outcomes, which were seen as the most important determinants for explaining socio-economic differences in children's EBRBs. The main primary outcomes (i.e., adults' role modeling and changes in the availability and accessibility of, for example, foods and screens in the environment) should be examined next; given the relatively short duration of the intervention, changes are more likely to be seen in these. Generally, it has been concluded that availability and accessibility (foods, screens) in the home environment are of great importance for children's health behaviors in low PEL families [13].

Because this study reports an intention-to-treat analysis, all intervention preschools and families were analyzed as if they had conducted the program in the same manner and at the same intensity. Further analyses including the fidelity and degree of implementation of the program will yield a deeper understanding of the effects. The importance of the degree of implementation has been discussed in conjunction with null results in multicomponent interventions [50].

The DAGIS intervention study had limitations that should be acknowledged. The short intervention period, five months in total, was a limitation, but the project as a whole had to be conducted within one preschool year. Previous studies have questioned whether a short time period is adequate for children to change their EBRBs [13,44]. In addition, children's baseline FV consumption outside preschool time (on average three times/day) was fairly high, which makes an increase harder to achieve. Furthermore, measuring food consumption reliably is challenging. However, the reproducibility and validity of our parental FFQ have been tested [36,38]. Still, the FFQ reflects the foods eaten during the last week outside preschool time and does not allow analysis of whether food consumption changed at preschool. The 10-item questionnaire assessing two dimensions of children's SR skills had three answer categories, which might not have been sensitive enough to capture changes. Many instruments are available to assess children's SR skills, but there is no consensus on their validity in evaluating this multidimensional concept [51]. Finally, the sample size might not have been large enough to detect significant effects. The power calculations were based on means and standard deviations from the DAGIS cross-sectional survey [7]. The two studies differ in some respects, such as the number of preschools and municipalities and the proportion of participating low PEL families, which might have led to an underpowered study.
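The formula and parameter values behind the power calculation are not reported here. As a rough illustration of why differences in cluster structure or sample composition can leave a cluster-randomized trial underpowered, the sketch below uses the standard normal-approximation sample size for comparing two means, inflated by the design effect 1 + (m - 1) × ICC, where m is the average cluster size and ICC the intraclass correlation. All inputs and the function name are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.80, cluster_size=1, icc=0.0):
    """Children per arm needed to detect a mean difference `delta` (two-sided test),
    inflated by the design effect for cluster randomization."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n_individual = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    design_effect = 1 + (cluster_size - 1) * icc
    return ceil(n_individual * design_effect)


# Hypothetical effect of half a standard deviation
print(n_per_arm(0.5, 1.0))                                # 63 under individual randomization
print(n_per_arm(0.5, 1.0, cluster_size=20, icc=0.05))     # 123 with clustering
```

Even a modest ICC nearly doubles the required sample here, so a survey-based calculation made under a different cluster structure can understate the numbers needed.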

A strength of the study is that its development was guided by the IM framework [28], which enabled systematic planning. The logic model of change was based on the best existing knowledge and on a comprehensive evaluation of the Finnish preschool-family context [10,28], which enables further systematic evaluation of the processes. The fairly high response rate among families (47%) and the participation of all preschools in one municipality, including diverse preschools as well as diverse families, can be seen as strengths. The high response rate suggests a lower risk of selection bias among the participants. In addition, slightly more than 30% of the participating families had low education levels, which is notable because the less educated often tend not to participate in intervention studies [52]. The study also combined instruments such as accelerometers for assessing PA, a validated screen time diary, and a validated FFQ for robust assessment [35,38].

The fairly new approach of simultaneously strengthening children's SR skills and promoting their EBRBs can be seen as both a strength and a risk. To the best of our knowledge, this approach has been evaluated in only one other study [27], where it was suggested that the next step should be to integrate SR skill promotion into the EBRB context. In the DAGIS study, this integration can be seen as a strength, as the program targeted SR skills while simultaneously promoting EBRBs by adding more materials to the existing program. The materials and methods for the program were also pretested [28].

#### **5. Conclusions**

The DAGIS intervention study aimed to promote preschoolers' EBRBs and SR skills through a preschool-based, family-involving intervention conducted as a clustered RCT. We detected no significant differences in the preschoolers' EBRBs between the intervention and control groups at follow-up. Within PEL groups, no differences were found at follow-up between the intervention and control groups, except for cognitive SR skills, where a borderline significant result emerged between the low PEL control and intervention groups. Within the middle PEL intervention group, cognitive SR skills increased. Although the intervention did not achieve its aims, further analyses should examine whether changes can be seen in the determinants of children's EBRBs, especially those of importance for children with low PEL. In addition, a thorough process evaluation may provide insight into the non-significant findings.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2072-6643/12/9/2599/s1, Table S1: Number of children and missing values in each outcome, Table S2: Descriptors for study outcomes by the control and intervention groups and by parental educational level (PEL), Table S3: Adjusted differences and their 95% confidence interval (C.I.) between intervention and control group separated for baseline and follow-up; and adjusted differences between follow-up and baseline for each study group.

**Author Contributions:** Conceptualization, C.R., R.F., and E.R.; Investigation, C.R., H.V., R.L., R.P., E.S., T.S., P.H., E.L., and L.K.; Formal analysis, R.F.; Data Curation, R.L.; Writing—Original Draft Preparation, C.R. and R.F.; Writing—Review & Editing, C.R., R.F., H.V., R.L., R.P., E.S., T.S., P.H., E.L., L.K., K.S., N.S., M.E., and E.R.; Visualization, R.F.; Project Administration, C.R.; Funding Acquisition, N.S., M.E., and E.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was financially supported by the Ministry of Education and Culture in Finland, The Ministry of Social Affairs and Health, The Academy of Finland (Grants: 285439, 287288, 288038, 315816), the Päivikki and Sakari Sohlberg Foundation, Signe and Ane Gyllenberg Foundation, and the Medicinska Föreningen Liv och Hälsa. Folkhälsan Research Center and University of Helsinki provided the infrastructure and the funding for PIs (N.S., M.E., E.R.) and key personnel (C.R., R.L.). Open access funding was provided by University of Helsinki. The funding bodies were not involved and did not interfere with the study at any stage.

**Acknowledgments:** The authors thank the preschools, the preschool personnel, and the families for their participation in the DAGIS study, and the research staff for the data collection. The authors thank the collaborating partners of the DAGIS study for providing assistance in designing the DAGIS study.

**Conflicts of Interest:** L.K. is a board member of the company TwoDads. The other authors declare that they have no competing interests.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Review* **Machine Learning Models to Predict Childhood and Adolescent Obesity: A Review**

#### **Gonzalo Colmenarejo**

Biostatistics and Bioinformatics Unit, IMDEA Food, CEI UAM+CSIC, E28049 Madrid, Spain; gonzalo.colmenarejo@imdea.org

Received: 30 June 2020; Accepted: 13 August 2020; Published: 16 August 2020

**Abstract:** The prevalence of childhood and adolescent overweight and obesity is rising at an alarming rate in many countries. This poses a serious threat to current and near-future health systems, given the association of these conditions with different comorbidities (cardiovascular diseases, type II diabetes, and metabolic syndrome) and even death. To design appropriate prevention strategies, as well as to understand its origins, the development of predictive models for childhood/adolescent overweight/obesity and related outcomes is extremely valuable. Obesity has a complex etiology, and in childhood and adolescence this etiology also includes specific factors such as (pre)gestational ones; weaning; and the large anthropometric, metabolic, and hormonal changes the body undergoes during this period. In this context, Machine Learning models are becoming extremely useful tools in this area, given their excellent predictive power; their ability to model complex, nonlinear relationships between variables; and their capacity to deal with the high-dimensional data typical of this area. This is especially important given the recent appearance of large repositories of Electronic Health Records (EHR) that allow models to be developed from datasets with many instances and predictor variables, from which Deep Learning variants can generate extremely accurate predictions. In the current work, Machine Learning models to predict childhood and adolescent obesity and related outcomes are comprehensively and critically reviewed, including the latest ones using Deep Learning with EHR. These models are compared with traditional statistical models, which mainly use logistic regression. The main features and applications emerging from these models are described, and future opportunities are discussed.

**Keywords:** childhood obesity; obesity; overweight; machine learning; deep learning; statistical models; data science; BMI

#### **1. Introduction**

The prevalence of obesity and overweight among children and adolescents has increased substantially over the last four decades [1,2]. For instance, the prevalence of overweight and obesity among children and adolescents aged 5–19 years soared from about 4% in 1975 to 18% in 2016 [3]. This increase is especially dramatic in developing countries [4], while in developed countries it seems to be slowing down and mainly affects low-income sub-populations [5]. In absolute numbers, it is currently estimated that about 38 million children under the age of 5, and about 340 million children and adolescents aged 5–19 years, are overweight or obese [3].

This high prevalence poses a threat to current and future health systems. Childhood and adolescent obesity is related to different comorbidities at these ages [6–10], as well as to a lower quality of life [11], and it is also associated with *adult* comorbidities, such as metabolic syndrome and diabetes [12], cardiovascular risk [13,14], and death [15,16]. This is probably because obesity is difficult to eradicate once established, which justifies preventive rather than therapeutic measures in childhood [9].

Obesity, that is, excess adipose tissue in the body [17], has a complex, multifactorial etiology. The most important factors involved in its development include genetics, physical activity, sedentary lifestyle, and diet [18]. Analyzing obesity during childhood and adolescence poses additional complications, largely because of the large changes in height and weight during this period. If the Body Mass Index (BMI) is followed through this period, it first increases to a peak at about 1 year of age, then decreases until about 6 years of age, when it starts to rise again (the so-called *adipose rebound*) [18]. These changes are so large that there is no universal consensus on BMI-based definitions of "overweight" and "obese" at these ages [17]; in most cases, they are defined using sex-, age- and population-specific percentiles, normally ≥ 85th percentile for overweight and ≥ 95th percentile for obese, as discussed in detail in Section 4. (Note that in this Review the term "obesity" is used in two ways: as excess adipose tissue in the body in general, and as a BMI-based category used to classify individuals, normally BMI ≥ 30 kg/m<sup>2</sup> for adults and with multiple definitions for children, as described in the text.)
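The two uses of BMI above can be made concrete with a short sketch. It assumes that a sex-, age- and population-specific BMI percentile has already been obtained from a growth reference (not computed here), applies the common ≥ 85th/≥ 95th percentile cut-offs for children, and applies the adult BMI ≥ 30 kg/m² threshold. The function names and category labels are illustrative.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index in kg/m^2."""
    return weight_kg / height_m ** 2


def classify_child(bmi_percentile):
    """Percentile-based category for children; the percentile must come from a
    sex-, age- and population-specific growth reference (assumed, not computed here)."""
    if bmi_percentile >= 95:
        return "obese"
    if bmi_percentile >= 85:
        return "overweight"
    return "below overweight cut-off"


def classify_adult(bmi_value):
    """Adult obesity category based on the fixed BMI >= 30 kg/m^2 threshold."""
    return "obese" if bmi_value >= 30 else "not obese"


print(round(bmi(24.0, 1.15), 1))    # BMI of a hypothetical child
print(classify_child(90))            # overweight
print(classify_adult(31.2))          # obese
```

The child classification depends entirely on the reference chosen, which is one reason definitions differ across the studies reviewed.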

Moreover, large metabolic and hormonal changes occur during this period that strongly influence adiposity at different ages. On top of that, specific pre-gestational and gestational factors still have a large impact, especially during early childhood. The additional risk factors for obesity in childhood and adolescence have been reviewed recently [18,19]. Some of the most prominent are parental BMI, maternal gestational weight gain, gestational diabetes, maternal smoking, birth weight, rapid infant growth, and high protein and/or free sugar consumption. There are also psychological factors, especially during adolescence.

To prevent childhood and adolescent obesity, predictive models that identify individuals at high risk are of great utility. They allow preventive measures to be focused on the high-risk subpopulation, enabling a more cost-effective and personalized approach to weight reduction interventions. In addition, analyzing predictive models makes it possible to rank the different risk factors by importance and thus identify those that would be most effective to target in these interventions. Moreover, the models can be used as simulation tools for "what-if" analyses, varying one or more predictor variables and observing the effect on obesity in particular sub-populations (defined by, e.g., sex, age, or diet).

Given the great complexity of obesity, especially during childhood and adolescence, with many multidomain factors interacting in convoluted ways, traditional statistical methods such as (generalized) linear models show limitations and have focused mainly on analyses with a reduced number of predictor variables and limited predictive power. As we will see in Section 3, in most cases these models use more or less the same set of predictor variables, transformed in one way or another and aggregated in a linear functional form. Another limitation of these methods is their inability to deal with high-dimensional data, where the number of predictor variables (columns) is close to, or even much higher than, the number of dataset instances (rows): they typically require many more instances than predictor variables to provide reliable inferences and avoid overfitting. As a result, they need very large samples to be used with large sets of predictor variables, which makes practical implementation difficult.

Machine Learning (ML) techniques are therefore especially well suited to modelling these datasets, which are typically high-dimensional and involve complex relationships between many multidomain variables. This is due to their capacity to deal with high-dimensional data, so that they can be applied to relatively small datasets with large numbers of predictor variables while limiting overfitting. In addition, ML methods can find complex, nonlinear relationships among the predictor variables, and between these and the response variable or variables, in an automated way, without the need to manually predefine and test a large set of potential relationships between these variables. Therefore, the predictive capacity, ease of application, and robustness of these models for complex data far outclass those of the traditional statistical models. This is even more true of the recent Deep Learning (DL) branch of ML, which can draw on datasets that are huge in both instances and predictor variables to obtain models with extremely good predictive capacity. DL methods, in addition, can directly use complex data such as images, text, social media, and time series, avoiding the need for lengthy *feature engineering* processes, as we will see in Section 2. This is dramatically widening the range of data sources that can be used in this field, making it possible to identify novel risk factors.

Given the advantages of ML over statistical methods for this problem described above, it is no surprise that ML has started to be used in the area. This paper therefore attempts a critical and comprehensive review of the work done so far on ML models in the area of childhood and adolescent obesity. This includes a brief, unbiased summary of each available work that predicts childhood or adolescent BMI and/or obesity/overweight with ML, followed by a thorough discussion of the collective patterns found, the results obtained and novel risk factors identified, the advantages and limitations of the approach, and future perspectives. The discussion also includes a comparison with statistical models of the same outcomes, which are briefly reviewed beforehand. In addition, models to predict related outcomes (e.g., success of weight-reduction therapies, social obesogenic environments, pediatric attention to obesity, etc.) are also reviewed, as they are of increasing interest, especially in the area of preventive interventions. We will see that this is a new field that has experienced a recent explosion, especially during the last five years, mainly through the use of massive databases of Electronic Health Records (EHR) and the first applications of DL techniques, which are starting to allow a more systematic analysis of large cohorts with many multidomain predictor variables and the introduction of complex data sources as predictors.

As the reader will see, this is also a very heterogeneous field in terms of type of model (cross-sectional, longitudinal), label predicted by the model (obesity, overweight, success of obesity therapies, pediatric attention to obesity, etc.), aim of the predictions (explanatory, predictive, and simulation), and application of the model (prediction of risk subpopulations, optimization of obesity therapy, suggestion of novel therapeutic approaches, etc.), further extending those typical of statistical models. This review is intended to provide an updated view of the field to researchers from multiple disciplines and with multiple interests: statisticians, engineers, data scientists, epidemiologists, pediatricians, nurses, and nutritionists.

The article is organized as follows. After this Introduction, first, a summary of the ML field is given to provide basic knowledge for readers who are not experts in the field, making the work as self-contained as possible; second, the procedure used to search for and select the reviewed works is described; third, the statistical models in the childhood/adolescence obesity area are reviewed, to provide a point of comparison for the ML models; fourth, ML models targeted to the prediction of BMI or categorized versions of BMI are reviewed; fifth, ML models targeted to the prediction of related outcomes are reviewed; and sixth, a final wrap-up discussion of the main patterns in the summarized models closes the paper.

#### **2. Basic Concepts in Machine Learning**

*Machine Learning* (ML) exploded in the 1990s as a new field of data analysis at the interface between Statistics and Artificial Intelligence. Although initial concepts such as Rosenblatt's perceptron [20] (a basic, 1-layer artificial neural network that performs binary classification), Naïve Bayes [21], Decision Trees [21], and k-Nearest Neighbors date back to the 1950s–1960s, it was during the 1990s that the field reached full maturity and began to be applied massively. This happened with the appearance of multi-layer neural networks, thanks to the invention of the *backpropagation* training algorithm [22], as well as other ML paradigms such as Support Vector Machines [23] and, in the first decade of the 21st century, Random Forests [24] and Gradient Boosting Machines [25]. This emergence was fostered by the confluence of CPU miniaturization and falling costs, massive accessibility of computational capacity, and the development of completely new ideas for statistical modeling.
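To make the perceptron concrete, the sketch below implements the classic perceptron learning rule for a single layer with a binary output. It is an illustrative Python version, not Rosenblatt's original formulation, trained on the linearly separable logical AND function; the function names are hypothetical.

```python
def train_perceptron(samples, labels, epochs=10, lr=1.0):
    """Perceptron learning rule for binary labels in {0, 1}."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if activation > 0 else 0
            error = y - prediction                 # 0 if correct, +1 or -1 if wrong
            # Move the separating line only when the sample is misclassified
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Logical AND is linearly separable, so the rule converges to a correct boundary
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```

A single perceptron can only separate classes with a straight line (a hyperplane); problems such as XOR require the multi-layer networks that backpropagation made trainable.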

This explosion was followed, in the second decade of the 21st century, by that of *Deep Learning* (DL). DL is an outgrowth of ML that comprises mainly artificial neural networks with very large numbers of layers (hence the term "deep"), together with specialized layers, such as *convolutional* and *recurrent* ones, and additional adaptations that allow these huge networks to be trained: non-saturating activation functions, new weight initialization schemes, faster optimizers, and training on small, random batches of the data (so-called *mini-batch* training). DL models typically contain millions of trainable parameters. The specialized layers extract patterns ("feature maps") directly from complex data such as images, sound, text, and music, and these patterns are then fed into fully connected multi-layer perceptrons. This allows such data to be modeled directly, without the need to manually generate compressed representations of them, the so-called "feature engineering".

DL has likewise benefited from a further increase in readily accessible computational power, mainly through the use of GPUs instead of CPUs and of cloud computing, as well as from the availability of huge public datasets (e.g., YouTube, San Bruno, CA, USA; Wikipedia; Facebook, Menlo Park, CA, USA; etc.) and open competitions (Kaggle, San Francisco, CA, USA, etc.).

Generally speaking, ML has put more emphasis on *prediction* than on the *testing of predefined hypotheses*, as in traditional statistical models, where the emphasis is on inference. Likewise, its focus is on a practical, engineering-oriented approach rather than on a rigorous theoretical background. ML can be defined as a set of algorithms that *automatically learn simplified representations of the data*. For example, we can present an ML algorithm with a set of data instances, such as pictures of animals, together with a label for the species present in each picture. The algorithm would then be *trained* to associate each image with its label by automatically learning abstract internal rules that minimize some measure of the prediction error, or *loss*. When presented with new pictures, the algorithm would then be able to assign a label (species name) to each of them.

ML models are able to cope with very complex datasets, even those with many more predictor variables than instances (*high-dimensional datasets*). Partly because of this flexibility, they tend to be more difficult to interpret ("black-box" models), although, as we will see later, new techniques have been developed to help understand their inner workings.

For our purposes in this Review, we can distinguish two main groups of ML models: *supervised* and *unsupervised*. Supervised models use datasets comprising both a set of *predictor variables* and one or more *target variables* or *labels*. The model is trained to predict the label(s) from new instances of the predictor variables: for instance, to predict whether a child will be obese from the child's age, sex, parents' BMI, and food consumption. Unsupervised models, on the other hand, do not use labels; they attempt to find transformations of the input data that are easier to visualize, less noisy, etc., or to identify groups in the data. These include *Dimensionality Reduction* and *Clustering* techniques.

Within supervised models, which are the ones covered in this Review, there are two main groups: *classification* models, where the predicted label is categorical (e.g., obese child: yes or no), and *regression* models, where the label is numeric (e.g., BMI).

The most important type of classification model is *binary classification*, where the label has only two categories, for instance "+" and "−". In this case, the model frequently outputs a probability *p* for one of the two classes (e.g., "+"; the probability of the alternative class "−" is then 1 − *p*). Once we define a threshold *t* for this probability, if *p* ≥ *t* for a new instance, we assign the category "+" to that instance; if, on the contrary, *p* < *t*, we assign the category "−". Several concepts are then used to characterize the performance of the model (Figure 1), depending on whether the real category is "+" or "−" and whether the predicted category is "+" or "−".

**Figure 1.** Measures of the performance of a binary classifier. Class labels are "+" and "−". The category predicted by the model is represented vs. the real category for all possible situations.

*Sensitivity* (or *recall*) is the proportion of real positives that are predicted as positives. *Specificity* is the proportion of real negatives that are predicted as negatives. *Positive Predictive Value* (PPV), or *precision*, is the proportion of predicted positives that are real positives, and *Negative Predictive Value* (NPV) is the proportion of predicted negatives that are real negatives. *Accuracy* is the proportion of all predictions that are correct.
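These definitions can be checked with a few lines of code. The following is a minimal sketch in plain Python, assuming classes coded as 1 ("+") and 0 ("−"); the helper name `binary_metrics` and the eight labels and probabilities are hypothetical toy values, not data from any reviewed study.

```python
def binary_metrics(y_true, p, t=0.5):
    """Sensitivity, specificity, PPV, NPV and accuracy of a thresholded probability.

    y_true: real classes coded 1 ("+") / 0 ("-"); p: predicted probabilities of "+".
    """
    y_pred = [1 if pi >= t else 0 for pi in p]  # "+" if p >= t, otherwise "-"
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))  # real "+", predicted "+"
    tn = pairs.count((0, 0))  # real "-", predicted "-"
    fp = pairs.count((0, 1))  # real "-", predicted "+"
    fn = pairs.count((1, 0))  # real "+", predicted "-"

    def ratio(a, b):
        return a / b if b else float("nan")  # undefined when the denominator is empty

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": (tp + tn) / len(pairs),
    }


# Made-up real classes and predicted probabilities for eight children
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
p_hat = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.8]
metrics = binary_metrics(y_true, p_hat, t=0.5)
print(metrics)
```

On these toy data each of the four cells of the confusion matrix is easy to count by hand, which makes the sketch a convenient check of the formulas.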

A perfect model would have all these measures equal to 1. This is almost never the case, so we have to cope with some proportion of errors. We can choose the threshold *t* to suit the purpose of our model. For example, if we are mainly interested in identifying as many real positives (e.g., future obese children) as possible, in order to apply a preventive weight-loss treatment to them, we would select a lower *t* and thus increase the sensitivity, at the cost of more false positives and, therefore, lower specificity and PPV. This approach would eventually reach a point where we would identify so many false positives that treating them would entail a prohibitive cost or, if the treatment has a negative effect on a child who would otherwise have had normal weight, unnecessary harm to too many members of our population. Alternatively, if we are more interested in finding a sample of children most of whom will be obese in the future, even if it is small (e.g., to use it later for genotyping purposes), we would focus on optimizing the PPV; in this case, we would use a higher *t*, thereby increasing the false negatives and decreasing the sensitivity and NPV. Again, we cannot increase *t* indefinitely, because at some point the sample would become too small to be useful. Therefore, there is always a balance between cost and benefit, not just from the statistical point of view but also from that of the practical application of the model, which must be taken into account when optimizing the threshold.
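To make this trade-off concrete, the sketch below recomputes sensitivity, specificity, and PPV for three thresholds. It is only an illustration under assumed toy data (the same kind of invented labels and probabilities one might use to check the definitions); the helper name `rates` is hypothetical.

```python
def rates(y_true, p, t):
    """Sensitivity, specificity and PPV when "+" is predicted for p >= t."""
    tp = sum(1 for y, pi in zip(y_true, p) if y == 1 and pi >= t)
    fn = sum(1 for y, pi in zip(y_true, p) if y == 1 and pi < t)
    tn = sum(1 for y, pi in zip(y_true, p) if y == 0 and pi < t)
    fp = sum(1 for y, pi in zip(y_true, p) if y == 0 and pi >= t)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return tp / (tp + fn), tn / (tn + fp), ppv


# Invented data: 1 = future obese child, 0 = not
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
p_hat = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.8]
sweep = {t: rates(y_true, p_hat, t) for t in (0.25, 0.5, 0.75)}
for t, (sens, spec, ppv) in sweep.items():
    print(f"t={t:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}  PPV={ppv:.2f}")
```

On these toy data, lowering *t* to 0.25 reaches a sensitivity of 1 at the price of a specificity of 0.5, whereas raising it to 0.75 gives a PPV of 1 but halves the sensitivity.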

To characterize the discriminative capacity of the model before selecting *t*, it is customary to use Receiver Operating Characteristic (ROC) curves, in which the sensitivity is plotted against 1 − specificity for all values of the threshold (Figure 2).

**Figure 2.** ROC curve of a binary classifier.

For a random classifier, the curve is a diagonal going from (0, 0) to (1, 1). For a perfect classifier, the curve goes from (0, 0) to (0, 1) and then to (1, 1). Intermediate classifiers have curves between these two extremes. A frequent measure of the discriminatory power of the classifier is the area under the ROC curve (AUCROC). A random classifier has an AUCROC of 0.5, and a perfect classifier has an AUCROC of 1; real-life classifiers have values in between, and the closer to 1, the better. For a binary outcome, the AUCROC equals the so-called *concordance index* or *c-index*: the probability that a randomly chosen positive instance receives a higher predicted probability than a randomly chosen negative one.
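The equality between the AUCROC and the c-index can be verified numerically. This is a minimal sketch, assuming 0/1 labels with at least one instance of each class and invented scores; the function names are hypothetical. The ROC points are built by sweeping the threshold over every observed score, the area is obtained with the trapezoidal rule, and the c-index counts correctly ranked (positive, negative) pairs, with ties counted as 1/2.

```python
def roc_points(y_true, scores):
    """(1 - specificity, sensitivity) for every threshold, from strictest to most lenient."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return points  # the last point is always (1, 1)


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as sorted (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def c_index(y_true, scores):
    """Share of (positive, negative) pairs in which the positive gets the higher score."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
p_hat = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.3, 0.8]
pts = roc_points(y_true, p_hat)
auc = auc_trapezoid(pts)
cidx = c_index(y_true, p_hat)
print(auc, cidx)
```

Counting pairs is quadratic in the sample size; it is used here only because it mirrors the definition of the c-index most directly.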

When the classifier predicts a multi-class label, that is, one with more than two classes, one measure of prediction performance is accuracy, defined above as the proportion of instances whose label is predicted correctly. Another is the *categorical cross-entropy*, which, for a prediction instance *i* and an *M*-category label, is defined by Equation (1):

$$-\sum\_{j=1}^{M} I\_{ij} \log P\_{ij} \tag{1}$$

where *I<sub>ij</sub>* is an indicator variable that equals 1 if the observed class of instance *i* is *j* and 0 otherwise, and *P<sub>ij</sub>* is the predicted probability of class *j* for instance *i*. For *n* predicted instances, the categorical cross-entropy is the sum (or, in many implementations, the average) of the individual cross-entropies. It therefore measures the agreement between the predicted class probabilities and the observed classes: the better the agreement, the smaller the categorical cross-entropy. It is thus an *error* or *loss* function that is minimized as the model is trained on the training data (whereas accuracy would be maximized). Binary classifiers correspond to the special case *M* = 2, which gives the *binary cross-entropy*.
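A direct transcription of Equation (1), summed over instances, is sketched below. It assumes that classes are coded as integer indices 0, …, *M* − 1 and that each row of predicted probabilities sums to 1; the function name and the probability values are hypothetical. Because *I<sub>ij</sub>* is 1 only for the observed class, a single logarithm survives per instance.

```python
import math


def categorical_cross_entropy(true_classes, probs):
    """Equation (1) summed over instances.

    true_classes: observed class index j of each instance i (0..M-1).
    probs: one list of M predicted probabilities P_ij per instance.
    """
    total = 0.0
    for c, p_i in zip(true_classes, probs):
        # I_ij is 1 only for the observed class j = c, so only that term of the sum survives
        total += -math.log(p_i[c])
    return total


observed = [0, 2]
good = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]  # high probability on the observed classes
poor = [[0.2, 0.5, 0.3], [0.4, 0.4, 0.2]]  # low probability on the observed classes
loss_good = categorical_cross_entropy(observed, good)
loss_poor = categorical_cross_entropy(observed, poor)
print(loss_good, loss_poor)
```

The better-calibrated predictions give the smaller loss, and dividing by the number of instances yields the averaged version used by many libraries.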

For regression, common measures of the error or loss are the *Mean Squared Error* (MSE, Equation (2)):

$$MSE = \frac{1}{n} \sum\_{i=1}^{n} (y\_i - \hat{y}\_i)^2 \tag{2}$$

where *n* is the number of predicted instances, *y<sub>i</sub>* is the actual continuous label for instance *i*, and *ŷ<sub>i</sub>* is the predicted value of the label for that instance. Another common measure is the *Mean Absolute Error* (MAE, Equation (3)):

$$MAE = \frac{1}{n} \sum\_{i=1}^{n} |y\_i - \hat{y}\_i| \tag{3}$$

where |.| means absolute value.
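Equations (2) and (3) translate directly into code. The sketch below uses invented BMI values (kg/m²) purely for illustration; it shows how the single large error of 4 units dominates the MSE far more than the MAE.

```python
def mse(y, y_hat):
    """Mean Squared Error, Equation (2)."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def mae(y, y_hat):
    """Mean Absolute Error, Equation (3)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


# Hypothetical observed and predicted BMI values (kg/m^2)
y_obs = [18.0, 21.0, 25.0, 30.0]
y_pred = [19.0, 21.0, 23.0, 34.0]
print(mse(y_obs, y_pred), mae(y_obs, y_pred))
```

Because errors are squared, MSE is more sensitive to occasional large errors, whereas MAE weighs all errors linearly.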

When a model is trained on training data, there is a risk that it learns too many details of those data, which makes it perform worse when presented with new data. In this case, we say that the model is *overfit*. In general, any fitted model will fit its training data better than new data. Therefore, to assess the *practical* predictive performance of a model, we need to *validate* it with new data. There are different approaches to model validation, which can be divided into two groups: *internal validation* methods and *external validation* methods.

The main feature of internal validation methods is that we resample several times from the whole dataset, fit a new model to each resample, and evaluate it on the instances left out of that resample. From this repeated resampling, fitting, and evaluation, we obtain an estimate of the predictive performance of the modeling *process* on new data, although we do not actually test a final model on new data. Internal validation is normally used when data are scarce, so that it is very difficult to obtain a new dataset to validate the model externally.

There are two main general approaches for internal validation: *cross-validation* and *bootstrap*. In *k-fold* cross-validation, the total sample is divided into *k* random subsets ("folds"; *k* = 5 or 10 is typical). Then, for each fold, a model is fitted with the remaining *k* − 1 folds and validated on that fold. The estimated validation performance of the model (accuracy, cross-entropy, MSE, etc.) is the average of the performances of the *k* models, each fitted with *k* − 1 folds and evaluated on the corresponding held-out fold.
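The k-fold procedure can be written generically, taking the fitting, prediction, and loss functions as arguments. This is a minimal sketch with hypothetical helper names; as a stand-in "model" it uses a deliberately trivial predictor that always returns the training mean, so that the focus stays on how folds are built and averaged.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of (almost) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(x, y, fit, predict, loss, k=5, seed=0):
    """Average hold-out loss of k models, each fitted on k-1 folds and scored on the remaining fold."""
    folds = k_fold_indices(len(y), k, seed)
    scores = []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, x[i]) for i in test_idx]
        scores.append(loss([y[i] for i in test_idx], preds))
    return sum(scores) / k


# A deliberately trivial "model" that predicts the training mean, plus the MSE loss
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, xi):
    return model


def mse(ys, preds):
    return sum((a - b) ** 2 for a, b in zip(ys, preds)) / len(ys)


cv_error = cross_validate(list(range(20)), [float(v % 5) for v in range(20)],
                          fit_mean, predict_mean, mse, k=5)
print(cv_error)
```

Any model that can be expressed as a `fit`/`predict` pair can be plugged in; the function assumes at least *k* instances so that every fold is non-empty.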

Cross-validation schemes can also be used to select *hyperparameters* of a model (e.g., the number of nearest neighbors in the k-Nearest Neighbors method, see below). In this case, the cross-validation is performed within a double loop: in one loop, we vary the hyperparameter among several candidate values, and in the other, we estimate the validation performance of each candidate across the folds. We select the hyperparameter value that optimizes the cross-validation performance estimate, and that performance also serves as an estimate of the external performance of the model fitted with the optimal value. This estimate tends to be somewhat optimistic, however, because the same folds are used both to choose the hyperparameter and to assess performance; a fully *nested* cross-validation, in which the whole selection procedure is repeated inside an outer loop of folds used only for evaluation, avoids this bias.
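The double loop can be sketched as follows. As an assumption for brevity, the model is a one-predictor ridge regression without intercept (see the section on regularized linear models below), and its penalty λ plays the role of the hyperparameter; the function names and the candidate grid are hypothetical. The cross-validated score of the winning λ is the (slightly optimistic) performance estimate discussed above.

```python
import random


def fit_ridge_1d(xs, ys, lam):
    """Ridge slope without intercept: argmin_b sum (y - b x)^2 + lam * b^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, k=5, seed=0):
    """Inner loop: k-fold cross-validated MSE for one candidate value of lam."""
    idx = list(range(len(ys)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    total = 0.0
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        b = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - b * xs[i]) ** 2 for i in test) / len(test)
    return total / k


def select_lambda(xs, ys, grid, k=5):
    """Outer loop over candidate hyperparameter values; keep the best CV score."""
    scores = {lam: cv_mse(xs, ys, lam, k) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Noise-free toy data y = 2x: no shrinkage (lam = 0) should win
xs = list(range(1, 21))
ys = [2 * x for x in xs]
best_lam, cv_scores = select_lambda(xs, ys, [0.0, 1.0, 10.0, 100.0])
print(best_lam, cv_scores)
```

With noisy data, a moderate λ would typically win instead; the noise-free case is used only so that the expected choice is obvious.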

It must be taken into account that the models fitted during cross-validation use less data than the whole dataset, which can be a source of error in the performance estimate. The other approach to internal validation, the bootstrap, avoids this issue by repeatedly generating samples of the same size as the original one by sampling with replacement (so that instances can be randomly repeated). The model is refitted on each of these random samples and then evaluated both on that sample and on the original sample (or on the left-out instances). By averaging the difference between the training performance in each bootstrap sample and the performance in the original sample, we obtain an estimate of the so-called *optimism* of the training performance. We then derive the model from the whole dataset, evaluate its performance, and correct it by the estimated optimism.
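The optimism correction can be sketched in a few lines. We assume, for illustration only, a one-predictor least-squares line as the model and the MSE as the performance measure; the function names and the simulated data are hypothetical. Because MSE is an error, the estimated optimism is *added* to the apparent (training) error of the final model.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def mse(model, xs, ys):
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


def optimism_corrected_mse(xs, ys, n_boot=200, seed=0):
    rng = random.Random(seed)
    n = len(ys)
    optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # n instances sampled with replacement
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        m = fit_line(bx, by)
        # error on the original sample minus the (optimistic) error on its own bootstrap sample
        optimism += mse(m, xs, ys) - mse(m, bx, by)
    optimism /= n_boot
    apparent = mse(fit_line(xs, ys), xs, ys)  # final model fitted on the whole dataset
    return apparent, apparent + optimism


xs = [float(i) for i in range(10)]
exact = optimism_corrected_mse(xs, [1.0 + 2.0 * x for x in xs])  # no noise: no optimism
noise = random.Random(1)
noisy = optimism_corrected_mse(xs, [1.0 + 2.0 * x + noise.gauss(0, 1) for x in xs])
print(exact, noisy)
```

With noise-free data every bootstrap fit is perfect, so the correction vanishes; with noisy data the corrected error is usually larger than the apparent one.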

We see that in both cross-validation and the bootstrap, we do not make a real evaluation of the external performance of the model but rather estimate it from data that are ultimately used to derive the final model. The alternative is to use an *external validation sample*: data that are not used to derive the model and are used only to validate it. A simple approach is to randomly split the original sample into a training dataset (e.g., 60–80% of the data), used to fit the model, and a validation/testing dataset (the remaining 40–20%), used to evaluate external performance. Compared with internal validation methods, this has two drawbacks: on the one hand, part of the data is not used to derive the model; on the other hand, the external performance is estimated with a usually small dataset, which results in an estimate with high variance (depending on how "lucky" we are with the random split, we can obtain very different estimates). This is not an issue if the dataset is very large and the validation set is therefore also large. However, with a small dataset, internal validation is preferable, despite being somewhat more optimistic than external validation.

In addition, the random split approach has the further problem that both the training and the validation datasets come from the same sample and are therefore likely to be very similar, a situation that quite possibly does not occur when the model is used in real life. There are ways to mitigate this, such as clustering the original sample and then building the training and validation datasets from different clusters. Another approach is to train the model on one dataset and validate it on a different one, e.g., a dataset collected later in time, a dataset from another country, etc. This is a more demanding test but is probably the closest to the real performance of the model in production. Obviously, this approach is very expensive in terms of data, so it is feasible only in a limited number of situations.
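The difference between a random split and a split that keeps whole groups apart can be sketched as follows. The group labels (e.g., schools, clusters, or countries) are hypothetical, and the function names are illustrative only; in the group split, no group contributes instances to both the training and the validation set.

```python
import random


def random_split(n, train_frac=0.7, seed=0):
    """Random training/validation split of the indices 0..n-1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def group_split(groups, test_groups):
    """Hold out every instance of the chosen groups (e.g. hypothetical schools or countries)."""
    train = [i for i, g in enumerate(groups) if g not in test_groups]
    test = [i for i, g in enumerate(groups) if g in test_groups]
    return train, test


train_r, test_r = random_split(10, 0.7)
train_g, test_g = group_split(["A", "A", "B", "C", "B"], {"B"})
print(train_r, test_r, train_g, test_g)
```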

We will finish this section by briefly describing the ML models we will see in the Review.

#### *2.1. Naïve Bayes (NB)*

This method uses Bayes' rule together with the assumption that the predictor variables are conditionally independent given the response class. Bayes' rule gives the posterior probability of the target variable *y* (label) taking the value *j*, conditioned on the predictor variables *x*<sub>1</sub>, . . . , *x<sub>n</sub>* (Equation (4)):

$$P(y = j | \mathbf{x}\_1, \dots, \mathbf{x}\_n) = \frac{P(\mathbf{x}\_1, \dots, \mathbf{x}\_n | y = j)P(y = j)}{P(\mathbf{x}\_1, \dots, \mathbf{x}\_n)} \tag{4}$$

where *P*(*y* = *j*) is the prior probability of *y* taking the value *j*, *P*(*x*<sub>1</sub>, . . . , *x<sub>n</sub>* | *y* = *j*) is the conditional probability (likelihood) of the predictor variables given that *y* takes the value *j*, and *P*(*x*<sub>1</sub>, . . . , *x<sub>n</sub>*) is the marginal probability of the predictor variables. These probabilities can be estimated from the corresponding empirical frequencies when the predictor variables are categorical. When they are continuous, the densities can be approximated with kernel functions or with parametric distributions (e.g., Gaussian). When the conditional independence assumption of NB is applied, the expression simplifies considerably (Equation (5)):

$$P(y=j|\mathbf{x}\_1, \dots, \mathbf{x}\_n) = \frac{P(y=j) \prod\_{i=1}^{n} P(\mathbf{x}\_i|y=j)}{P(\mathbf{x}\_1, \dots, \mathbf{x}\_n)} \tag{5}$$

The predicted class for a given set of predictor values *x*<sub>1</sub>, . . . , *x<sub>n</sub>* is the value *j* that maximizes the numerator, *P*(*y* = *j*) ∏ *P*(*x<sub>i</sub>* | *y* = *j*), since the denominator does not depend on *j*.
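For categorical predictors, the whole method reduces to counting. The sketch below estimates the prior and the per-class value frequencies empirically and predicts the class that maximizes the numerator of Equation (5). The predictors (sugary drink intake, physical activity), the label values, and the function names are hypothetical, and no smoothing is applied, so a value never seen with a class gives that class a score of zero.

```python
from collections import Counter, defaultdict


def fit_nb(X, y):
    """Class counts and per-class counts of each categorical predictor value."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (predictor index, class) -> Counter of values
    for row, label in zip(X, y):
        for i, v in enumerate(row):
            value_counts[(i, label)][v] += 1
    return class_counts, value_counts


def predict_nb(model, row):
    """Class j maximizing P(y = j) * prod_i P(x_i | y = j), the numerator of Equation (5)."""
    class_counts, value_counts = model
    n = sum(class_counts.values())
    scores = {}
    for j, n_j in class_counts.items():
        score = n_j / n  # prior P(y = j)
        for i, v in enumerate(row):
            score *= value_counts[(i, j)][v] / n_j  # empirical P(x_i = v | y = j)
        scores[j] = score
    return max(scores, key=scores.get)


# Hypothetical rows: (sugary drink intake, physical activity) -> obese?
X = [("high", "low"), ("high", "low"), ("high", "high"),
     ("low", "high"), ("low", "high"), ("low", "low")]
y = ["yes", "yes", "no", "no", "no", "no"]
nb_model = fit_nb(X, y)
print(predict_nb(nb_model, ("high", "low")), predict_nb(nb_model, ("low", "high")))
```

In practice, Laplace (add-one) smoothing of the counts is commonly used to avoid zero probabilities.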

#### *2.2. k-Nearest Neighbors (kNN)*

The idea of this method is quite simple: for a new instance with predictor variables *x*<sub>1</sub>, . . . , *x<sub>n</sub>*, assign the most frequent label among the *k* training instances whose predictor variables are closest (most similar) to those of the new instance, i.e., its *k nearest neighbors*. This is called *majority voting* class assignment. When the label is continuous (regression), the predicted value is the (weighted) average of the labels of the k nearest neighbors. Different metrics can be used to measure the distance between sets of predictor variables; probably the most frequent is the Euclidean one. The best value of *k* varies considerably and depends heavily on the dataset; it can be chosen through cross-validation.
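A minimal sketch of both variants, assuming Euclidean distance via `math.dist` (available from Python 3.8) and unweighted voting or averaging; the function names and the two well-separated toy clusters are hypothetical.

```python
import math
from collections import Counter


def nearest(train_X, train_y, x_new, k):
    """The k training (x, label) pairs closest to x_new in Euclidean distance."""
    return sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x_new))[:k]


def knn_classify(train_X, train_y, x_new, k=3):
    votes = Counter(label for _, label in nearest(train_X, train_y, x_new, k))
    return votes.most_common(1)[0][0]  # majority voting


def knn_regress(train_X, train_y, x_new, k=3):
    neighbours = nearest(train_X, train_y, x_new, k)
    return sum(label for _, label in neighbours) / k  # unweighted average of the labels


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
classes = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 1.0, 1.0, 10.0, 10.0, 10.0]
print(knn_classify(train_X, classes, (0.5, 0.5)), knn_regress(train_X, values, (5.5, 5.5)))
```

An odd *k* avoids ties in binary majority voting; in practice predictors are usually standardized first, so that no single variable dominates the distance.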

#### *2.3. Decision Trees (DT)*

This method can be used for both regression and classification. The idea is to generate rectangular partitions of the predictor space by successively splitting the data with (usually binary) splits on one variable at a time, each chosen to optimize some loss function (e.g., minimizing the MSE for regression). The label assigned to each final partition is a function of the labels of the training instances belonging to it, e.g., their mean or the majority class. A new instance is then assigned the label of the partition it falls into. A simple scheme with only two predictor variables is depicted in Figure 3.

**Figure 3.** Depiction of a Decision Tree for two variables, X<sub>1</sub> and X<sub>2</sub>. R<sub>1</sub>, R<sub>2</sub>, R<sub>3</sub>, and R<sub>4</sub> are partitions generated by the splits s<sub>1</sub>, s<sub>2</sub>, and s<sub>3</sub>. The labels for the partitions would be a function of the labels of the instances in each partition in the training set.
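The basic building block of a regression tree, a single binary split, can be sketched as an exhaustive search over predictors and thresholds that minimizes the total squared error of the two resulting partitions. This is a minimal illustration with hypothetical names and toy data; a full tree would apply the same search recursively inside each partition.

```python
def best_split(X, y):
    """Greedy search over every predictor j and threshold s for the binary split
    (left: x_j < s, right: x_j >= s) with minimum total squared error."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for j in range(len(X[0])):
        for s in sorted({row[j] for row in X})[1:]:  # skipping the minimum keeps both sides non-empty
            left = [yi for row, yi in zip(X, y) if row[j] < s]
            right = [yi for row, yi in zip(X, y) if row[j] >= s]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, s, sum(left) / len(left), sum(right) / len(right))
    return best[1:]  # (predictor index, threshold, left label, right label)


def predict_split(split, x):
    j, s, left_label, right_label = split
    return left_label if x[j] < s else right_label


# Toy data: the label jumps when the first predictor reaches 4
X = [(1, 5), (2, 3), (3, 8), (4, 1), (5, 7), (6, 2)]
y = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
split = best_split(X, y)
print(split, predict_split(split, (10, 0)))
```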

Growing a tree can become a very complicated task, given the combinatorial number of possible splits and variable sequences. Therefore, simplified (greedy) algorithms for generating trees have been devised, which differ in the split criterion, the selection of variables at each split, and the pruning of terminal nodes. The main ones are CART [26] and ID3 [27], which evolved into C4.5 (implemented in Java in Weka as J48) and later into C5.0. There is also the CHAID [28] algorithm, which is based on statistical tests and allows non-binary splits.

The advantage of DTs is their ease of interpretation, which can be aided by graphical displays; however, they are known for the high variance of their predictions: small variations in the dataset can result in very different trees and predictions.

#### *2.4. Support Vector Machines (SVM)*

This method was initially developed as a binary classifier. The approach is to build a hyperplane in the space of the predictor variables with maximal margin, so that one side of the predictor space yields a "+" label and the other a "−" label. Maximal margin means that, of the infinitely many possible separating hyperplanes, we choose the one with the largest minimum perpendicular distance to the training instances (the "margin" being the minimum distance from the training points to the hyperplane). Figure 4 displays a dataset of two predictor variables and the corresponding maximal margin hyperplane for the training instances.

**Figure 4.** Maximum margin hyperplane for a predictor space of two variables. The two categories are perfectly classified by this hyperplane. The dashed lines indicate the maximum margin to the training set obtained with this particular hyperplane. Training instances are presented as points in the plane, blue points corresponding to class "+" and red points to "−". The points located exactly at the margin are the *support vectors*, since the hyperplane depends only on these points of the training set.

For new testing instances, we just need to find which side of the hyperplane the new point lies in order to predict a label for it.

In practice, the points are usually not perfectly separable. Therefore, instead of a maximum margin hyperplane, a "soft" margin one is obtained by allowing, according to some specific criterion, some latitude for misclassified points. This also makes the method more robust against small modifications of the dataset. In addition, in many situations, the boundary between the two classes is not linear. In this case, the predictors are implicitly mapped into a higher-dimensional space through so-called *kernel* functions, in which the dataset becomes (approximately) linearly separable. There is a variety of kernels yielding different types of SVM: linear, polynomial, radial, etc. It also turns out that the hyperplane depends only on the points closest to the boundary, called the *support vectors*, which makes the computation more efficient; the method takes its name from them.
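A linear soft-margin SVM can be trained with very little code. The sketch below uses the Pegasos stochastic sub-gradient algorithm on the regularized hinge loss, which is our choice of solver for illustration and not necessarily the one used in the reviewed works. It assumes labels coded as +1/−1, a linear kernel only, and no bias term (so the classes should be roughly symmetric around the origin); the parameter values and toy points are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.1, n_iter=1000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the soft-margin objective
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i <w, x_i>)). Labels y_i in {+1, -1}; no bias term."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)  # decreasing step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        w = [(1 - eta * lam) * wj for wj in w]  # shrink: gradient step on the regularizer
        if margin < 1:  # inside the margin or misclassified: hinge-loss sub-gradient step
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Which side of the hyperplane <w, x> = 0 the point lies on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


X = [(2, 2), (3, 1), (1, 3), (-2, -2), (-3, -1), (-1, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(w, [predict(w, x) for x in X])
```

A smaller `lam` penalizes margin violations more heavily (a "harder" margin); non-linear boundaries would require a kernelized solver instead.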

Later developments of the method allowed it to deal with multiclass classification as well as regression.

#### *2.5. Random Forest (RF)*

RFs are an example of ensemble methods, in which a higher-quality model is built by aggregating multiple lower-quality models. The prediction for new instances is obtained by averaging the predictions of all the simple models in the case of regression or, for classification problems, by majority voting. In this way, predictions become much more robust, with much lower variance and usually higher accuracy.

In the case of RFs, we use an ensemble of hundreds or thousands of DTs. These DTs are grown without pruning, so that they have low bias, although large variance; since many of them are averaged, however, the final variance is also low. The DTs are built from bootstrap samples of the original training dataset (this is called *bagging*, or bootstrap aggregating). Moreover, to decorrelate the trees, only a random subset of the predictor variables is considered at each split. In this way, averaging the trees reduces the variance more efficiently.

A very interesting property of RFs is that they internally provide a direct estimate of the external validation error. Since the DTs are derived from bootstrap samples, for each instance in the training set there will be a set of trees (approximately B/3, where B is the number of trees, because each bootstrap sample leaves out on average about 37% of the instances) that have been derived without that instance. By comparing the average prediction of these trees for that instance with its actual label, we obtain what is called an out-of-bag (OOB) estimate for that instance. Averaging over all the instances results in an estimate of the external performance of the RF without the need for cross-validation or bootstrap.
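The OOB mechanism can be illustrated with a toy forest. This sketch simplifies heavily: each "tree" is a single split (a stump) rather than an unpruned deep tree, the random predictor subset is drawn once per stump (equivalent to per split for one-split trees), and the data are invented. It keeps track, for every instance, of the predictions made by trees whose bootstrap sample did not contain it, and also reports the average share of trees that left each instance out, which should be close to (1 − 1/*n*)<sup>*n*</sup> ≈ 0.37.

```python
import random


def fit_stump(X, y, features):
    """Best single split (minimum squared error) using only the given predictor indices."""
    best = None
    for j in features:
        for s in sorted({row[j] for row in X})[1:]:
            left = [v for row, v in zip(X, y) if row[j] < s]
            right = [v for row, v in zip(X, y) if row[j] >= s]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            cost = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or cost < best[0]:
                best = (cost, j, s, ml, mr)
    if best is None:  # every candidate predictor is constant in this sample
        m = sum(y) / len(y)
        return (features[0], float("inf"), m, m)
    return best[1:]


def stump_predict(stump, x):
    j, s, ml, mr = stump
    return ml if x[j] < s else mr


def bagged_stumps_oob(X, y, n_trees=200, m_try=1, seed=0):
    """Bagging of depth-1 trees with random predictor subsets, plus the OOB error estimate."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    oob_preds = [[] for _ in range(n)]
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        feats = rng.sample(range(p), m_try)          # random subset of predictors
        tree = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        trees.append(tree)
        for i in set(range(n)) - set(idx):           # instances this tree never saw
            oob_preds[i].append(stump_predict(tree, X[i]))
    scored = [(sum(ps) / len(ps), yi) for ps, yi in zip(oob_preds, y) if ps]
    oob_mse = sum((pred - yi) ** 2 for pred, yi in scored) / len(scored)
    oob_share = sum(len(ps) for ps in oob_preds) / (n * n_trees)
    return trees, oob_mse, oob_share


def forest_predict(trees, x):
    return sum(stump_predict(t, x) for t in trees) / len(trees)


# Invented data: the label depends on the first predictor only; the second is noise
X = [(i / 30, (7 * i % 30) / 30) for i in range(30)]
y = [0.0 if i < 15 else 10.0 for i in range(30)]
bag_trees, bag_oob_mse, bag_share = bagged_stumps_oob(X, y, n_trees=300, m_try=2)
rf_trees, rf_oob_mse, rf_share = bagged_stumps_oob(X, y, n_trees=300, m_try=1)
print(bag_oob_mse, bag_share, rf_oob_mse, rf_share)
```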

RFs are a very powerful predictive method, for both regression and classification, and very robust across types of datasets. Their drawback is that they are difficult to interpret (as are all ensemble methods), since they contain many different, decorrelated DTs using different predictor variables. One approach used to analyze them is the so-called *variable importance* techniques, which analyze the effect that each predictor has (on average over all the DTs) on the error of the RF. One technique calculates, for each predictor, the error reduction achieved each time it has been used for a split in the trees. This is summed over all the trees, and the largest sum corresponds to the most important predictor, since on average it has produced the largest reduction of error. Another technique uses permutation of the variables. For each tree, we compute its OOB prediction accuracy on its OOB samples. Then, the *j*th variable is permuted, the OOB accuracy is recalculated, and the difference from the previous value is computed. This is averaged over all the trees and repeated for every predictor variable. We thus obtain a ranking of the variables, with those whose permutation causes the largest reduction in OOB performance ranked at the top.
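The permutation idea can be sketched independently of the forest. As a simplification, the sketch below permutes each predictor in a single fixed evaluation set and measures the increase in MSE of an already fitted model, instead of doing it per tree on OOB samples; the "fitted" model, which by construction uses only the first predictor, and the data are hypothetical.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each predictor column is randomly permuted."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and the label
            permuted = [row[:j] + (v,) + row[j + 1:] for row, v in zip(X, column)]
            increase += mse(permuted) - baseline
        importances.append(increase / n_repeats)
    return importances


# Hypothetical fitted model that only uses the first predictor
def fitted_model(row):
    return 2.0 * row[0]


X = [(float(i), float(3 * i % 7)) for i in range(7)]
y = [2.0 * a for a, _ in X]
importances = permutation_importance(fitted_model, X, y)
print(importances)
```

Permuting the unused second predictor leaves the error unchanged, so its importance is exactly zero, while permuting the first one degrades the predictions.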

#### *2.6. Gradient Boosting Machines (GBM)*

This is another ensemble method, but it uses *boosting* instead of bagging. *Boosting* means the iterative improvement of a weak model by sequentially adding new models that improve the previous fit. In gradient boosting machines, the models are normally DTs, and each new model is fitted to the residuals of the current ensemble or, more generally, to the negative gradient of the loss function being used. Newer implementations, such as XGBoost, also use second derivatives of the loss, in addition to the first derivatives, to improve speed and performance.
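For the squared-error loss, the negative gradient is simply the residual, so a bare-bones gradient booster fits each new stump to the current residuals and adds it with a small learning rate. This is a minimal one-predictor sketch with hypothetical names and toy data, not the algorithm of any particular library.

```python
def fit_stump_1d(x, r):
    """Best single threshold on one predictor for targets r, by squared error."""
    best = None
    for s in sorted(set(x))[1:]:
        left = [ri for xi, ri in zip(x, r) if xi < s]
        right = [ri for xi, ri in zip(x, r) if xi >= s]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        cost = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or cost < best[0]:
            best = (cost, s, ml, mr)
    return best[1:]


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting: each stump is fitted to the current residuals."""
    f0 = sum(y) / len(y)  # initial model: the mean of the label
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of (y - f)^2 / 2
        s, ml, mr = fit_stump_1d(x, residuals)
        stumps.append((s, ml, mr))
        pred = [pi + learning_rate * (ml if xi < s else mr) for xi, pi in zip(x, pred)]
    return f0, learning_rate, stumps


def boost_predict(model, xi):
    f0, lr, stumps = model
    return f0 + lr * sum(ml if xi < s else mr for s, ml, mr in stumps)


def train_mse(model, x, y):
    return sum((boost_predict(model, xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)


x = list(range(1, 11))
y = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0]
mean_y = sum(y) / len(y)
variance = sum((v - mean_y) ** 2 for v in y) / len(y)
m5 = gradient_boost(x, y, n_rounds=5)
m50 = gradient_boost(x, y, n_rounds=50)
print(variance, train_mse(m5, x, y), train_mse(m50, x, y))
```

The training error falls steadily with more rounds; in practice the number of rounds and the learning rate are tuned by validation to avoid overfitting.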

GBM, especially XGBoost, are currently the most used ML algorithms for models using numeric tabular data or (when modeling more complex data) feature pre-engineered data. For problems using complex data directly (e.g., computer vision, speech recognition, natural language processing, etc.) Deep Learning methods are used instead (see below).

As it happens with RF, the interpretation of these ensemble models is complicated. However, in the same way, techniques like variable importance can be used to facilitate interpretation.

#### *2.7. Regularized Linear Models (LASSO)*

When fitting linear models, the residuals of the least-squares fit decrease as more predictor variables are added. However, if the number of instances *n* is not much larger than the number of predictor variables *p*, the variance of the least-squares estimates increases as *p* approaches *n*, so that the model becomes overfit and its external or test performance decreases. When *n* < *p*, no unique least-squares fit exists (the variance becomes infinite), and the method becomes useless. However, this high-dimensional situation is very typical of ML datasets. One way to fix this problem is to shrink or *regularize* the estimates, so that they remain small and have low variance and, in some cases, even become zero. One approach to regularization is *ridge regression*, where all predictor variables are kept, but their coefficients (betas) are kept small by constraining the sum of the squared betas to be less than or equal to a small value. Although this approach improves the external performance of the model, it does not make the model easier to interpret, since no irrelevant variables are removed. An alternative approach is the LASSO, where the sum of the absolute values of the betas is constrained to be less than or equal to a small value. This has the advantage of setting some betas exactly to zero, thus performing an effective selection of important variables.
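How the LASSO sets coefficients exactly to zero can be seen in a cyclic coordinate descent solver for the equivalent penalized form, (1/2*n*)‖y − Xβ‖² + λ‖β‖₁, where each coefficient update is a soft-thresholding step. This is a minimal sketch assuming centered predictors and response (no intercept); the data, in which the second predictor has a weak effect of 0.2, and the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam, returning exactly 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n) * sum (y - X b)^2 + lam * sum |b_j| by cyclic coordinate descent.
    Assumes centered predictors and response (no intercept)."""
    n, p = len(y), len(X[0])
    b = [0.0] * p
    r = list(y)  # residuals y - X b
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            rho = sum(c * (ri + c * b[j]) for c, ri in zip(col, r)) / n  # partial-residual fit
            new = soft_threshold(rho, lam) / z
            r = [ri - c * (new - b[j]) for c, ri in zip(col, r)]  # keep residuals in sync
            b[j] = new
    return b


# Centered toy data: strong effect of predictor 1, weak effect (0.2) of predictor 2
X = [(1, 1), (-1, 1), (2, -1), (-2, -1), (0.5, 0), (-0.5, 0)]
y = [3 * a + 0.2 * c for a, c in X]
b_ols = lasso_cd(X, y, lam=0.0)    # no penalty: least-squares solution
b_lasso = lasso_cd(X, y, lam=0.5)  # penalty removes the weak predictor
print(b_ols, b_lasso)
```

With λ = 0 the least-squares coefficients (3 and 0.2) are recovered; with λ = 0.5 the weak predictor is dropped entirely and the strong one is shrunk.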

#### *2.8. Bayesian Networks (BN)*

A Bayesian Network is a directed acyclic graph whose nodes correspond to the predictor variables, plus one or more nodes that represent the label(s). The directed edges between the nodes represent dependence relationships between the variables (often, but not necessarily, interpreted as causal), quantified through conditional probabilities, and Bayes' rule is used to determine the probability of the different possible values of the labels conditioned on particular values of the predictor variables. Missing edges encode conditional independence relationships among the variables. There is a large set of techniques to infer the structure and parameters of the network.
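Inference in a small Bayesian Network can be done by brute-force enumeration: the joint probability factorizes along the graph, and a conditional probability is a ratio of sums of joint probabilities. The three-node network below (parental obesity → unhealthy diet, and both → child obesity) and all of its probabilities are made-up numbers for illustration only; real networks use more efficient inference algorithms.

```python
from itertools import product

# Hypothetical network: P (parental obesity) -> D (unhealthy diet); P, D -> O (child obesity).
# All probabilities are invented for illustration.
p_P = {1: 0.3, 0: 0.7}
p_D_given_P = {1: 0.6, 0: 0.2}  # P(D = 1 | P)
p_O_given_PD = {(1, 1): 0.5, (1, 0): 0.3, (0, 1): 0.2, (0, 0): 0.05}  # P(O = 1 | P, D)


def joint(p, d, o):
    """Factorization along the graph: P(P) * P(D | P) * P(O | P, D)."""
    pd = p_D_given_P[p] if d == 1 else 1 - p_D_given_P[p]
    po = p_O_given_PD[(p, d)] if o == 1 else 1 - p_O_given_PD[(p, d)]
    return p_P[p] * pd * po


def query_O(evidence):
    """P(O = 1 | evidence) by enumerating all assignments consistent with the evidence."""
    num = den = 0.0
    for p, d, o in product((0, 1), repeat=3):
        assignment = {"P": p, "D": d, "O": o}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        w = joint(p, d, o)
        den += w
        if o == 1:
            num += w
    return num / den


print(query_O({}), query_O({"D": 1}), query_O({"D": 1, "P": 1}))
```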

#### *2.9. Artificial Neural Networks (ANN)*

ANNs are ML methods loosely inspired by the structure and mechanisms of the nervous system. They are composed of layers of artificial neurons, with connections between neurons in consecutive layers. Each artificial neuron is an abstract unit that computes a weighted sum of its numeric inputs plus a bias parameter and passes the result through a so-called "activation function" to generate a numeric output. The first layer corresponds to the input variables; these are used as inputs of the neurons of the next layer, each of which generates an output that is in turn used as input of the neurons of the following layer, and so on. The last layer typically contains a single neuron for one label, or more neurons for multilabel models. Figure 5 displays a typical fully connected, feedforward ANN (multilayer perceptron).

**Figure 5.** Typical structure of an artificial neuron and a fully connected feedforward neural network. The x<sub>i</sub> are the predictor variables, the w<sub>i</sub> are the weights, and b is the bias.

The input layer contains the input variables (no transformation), while the last layer generates the output of the model and is called the output layer. In between, there are one or more layers, which are called "hidden" layers. Each neuron has a weight per input plus a bias parameter; all these weights and biases of all the neurons are the parameters of the network, which are optimized to minimize a loss function.
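The forward pass of such a network follows directly from the description of the artificial neuron. The sketch below computes it for a tiny network with two inputs, a hidden layer of two ReLU neurons, and one sigmoid output neuron; the weights and biases are hand-picked hypothetical values, whereas in practice they would be learned with backpropagation.

```python
import math


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus the bias, passed through the activation function."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, layers):
    """layers: list of (weight matrix, bias vector, activation); one weight row per neuron."""
    a = x
    for W, b, act in layers:
        a = [neuron(a, w_row, b_j, act) for w_row, b_j in zip(W, b)]  # outputs feed the next layer
    return a


network = [
    ([[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0], relu),  # hidden layer: 2 neurons
    ([[1.0, -0.5]], [1.0], sigmoid),                 # output layer: 1 neuron
]
output = forward([1.0, 2.0], network)
print(output)
```

A sigmoid output neuron returns a value in (0, 1), which can be read as the probability *p* of the "+" class discussed earlier for binary classifiers.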

The first ANN model was the perceptron by Rosenblatt [20], which was designed as a simple one-neuron binary classifier after the mathematical neuron previously devised by McCulloch and Pitts [29]. The development of ANNs able to solve problems that are not linearly separable was made possible by the invention of the backpropagation algorithm [22], which allowed multilayer perceptrons to be trained.

ANNs became very popular in the 1990s, when they were widely used in many areas. At that time, they required feature engineering for many problems, and they were more or less abandoned in the first decade of the 21st century after the appearance of RFs and GBMs, since ANNs were slow to train, computationally expensive, and prone to overfitting. However, they have become very popular again in the second decade of this century with the advent of Deep Learning.

#### *2.10. Deep Learning (DL)*

The ML models we have seen so far have two main issues. On the one hand, their performance in many cases shows *saturation*: they reach a point at which, no matter how much the training set grows, performance does not increase significantly. On the other hand, they work with numeric, tabular data, so they are unable to handle complex data such as images, speech, or text directly. To model this type of data, it must first be converted into numeric predictor variables in a very ad hoc, manual fashion: the so-called "feature engineering" problem. They are "shallow" methods, that is, unable to learn hierarchical representations of complex data.

These two problems are solved to a large extent by DL. DL consists mainly of ANNs with very large numbers of layers (hence the "Deep" in the name) and, therefore, huge numbers of trainable parameters. In this way, they are able to exploit huge datasets and to keep improving their performance without saturating.

In addition, specialized layers have been developed that automatically generate numeric representations (*feature maps*) of complex data. This is the case of *convolutional layers*, which extract local patterns from tensor data of different dimensions. For example, 1D convolutional layers can find representations of serial data such as text, e.g., for language translation models; 2D convolutional layers are appropriate for images, as in computer vision models; and 3D layers can handle volumetric data, such as 3D medical images or video.
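What a 1D convolutional layer computes can be sketched as a small kernel sliding along a sequence, followed by an activation. In the minimal sketch below, the two kernels are hand-set edge detectors chosen for illustration, whereas in a convolutional network the kernel weights would be learned; like most DL libraries, the code actually computes a cross-correlation ("valid" positions only, no padding) and applies ReLU.

```python
def conv1d(signal, kernel, bias=0.0):
    """'Valid' 1D convolution (a cross-correlation, as in most DL libraries) followed by ReLU."""
    k = len(kernel)
    feature_map = []
    for start in range(len(signal) - k + 1):  # slide the kernel along the sequence
        z = sum(w * signal[start + i] for i, w in enumerate(kernel)) + bias
        feature_map.append(max(0.0, z))
    return feature_map


signal = [0, 0, 1, 1, 1, 0, 0]
rising = conv1d(signal, [-1, 1])   # responds where the signal steps up
falling = conv1d(signal, [1, -1])  # responds where the signal steps down
print(rising, falling)
```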

Other specialized layers are the *recurrent layers*, in which the output of the layer is fed both to the next layer and back to itself. Dedicated recurrent layers have been developed to capture long-term and long-distance patterns: Long Short-Term Memory (LSTM) [30] and Gated Recurrent Units (GRU) [31]. These layers are also well suited to serial data and are mostly used in natural language processing (NLP) applications.
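The recurrence itself can be illustrated with a plain (Elman-style) recurrent cell, in which the hidden state computed at one step is fed back as input at the next: h_t = tanh(W x_t + U h_{t-1} + b). This is a minimal NumPy sketch with hypothetical random weights and sizes; it deliberately omits the gating mechanisms that LSTM and GRU layers add to capture long-term dependencies.

```python
import numpy as np


def simple_rnn(inputs, W, U, b):
    """Run a plain recurrent layer over a sequence.

    inputs: array of shape (T, n_in); W: (n_hidden, n_in);
    U: (n_hidden, n_hidden); b: (n_hidden,).
    Returns the hidden states for all T steps, shape (T, n_hidden).
    """
    h = np.zeros(U.shape[0])            # initial hidden state h_0 = 0
    states = []
    for x_t in inputs:
        # The previous state h is fed back into the layer (the recurrence).
        h = np.tanh(W @ x_t + U @ h + b)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
T, n_in, n_hidden = 4, 3, 5             # hypothetical sizes
X = rng.normal(size=(T, n_in))
W = rng.normal(size=(n_hidden, n_in))
U = rng.normal(size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
H = simple_rnn(X, W, U, b)
print(H.shape)  # (4, 5)
```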

Many of these specialized layers can be stacked sequentially, thus automatically generating hierarchies of representations with increasing levels of abstraction. This allows the model to learn very intricate aspects of the data, which is not possible with traditional ML methods. In addition, this hierarchical representation of the data can be used to build new specialized models from small datasets by reusing more general models fitted to much larger datasets. For example, we can develop a very efficient model to recognize cats from a dataset of relatively few cat pictures by reusing some of the more abstract pre-fitted layers of a more general model trained to classify animals on a huge dataset of pictures, and adding some new layers that are fitted with the new, small dataset of cat pictures. The reused layers would have learned to identify the general shape of an animal, while the new layers would fit the specific features of cats. This process is called *transfer learning*.
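The transfer-learning idea can be reduced to its simplest form: reuse a layer with frozen weights as a feature extractor and fit only a new output layer on the small dataset. In the NumPy sketch below, the "pre-trained" weights are random stand-ins (a hypothetical assumption; in practice they would come from a model fitted on a large dataset), and the new head is a single logistic unit trained by gradient descent. Real applications usually reuse many layers and may also fine-tune some of them.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "pre-trained" layer. In practice these weights would come from a
# general model fitted on a large dataset; here they are random stand-ins.
W_pretrained = rng.normal(size=(8, 4))


def frozen_features(X):
    """Reused layer: maps inputs to an 8-dimensional ReLU representation.
    Its weights are never updated below (they are 'frozen')."""
    return np.maximum(0.0, X @ W_pretrained.T)


def train_new_head(F, y, lr=0.1, epochs=500):
    """Fit only the new output layer (a logistic unit) by gradient descent
    on the mean log-loss, using the frozen features F."""
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
        w -= lr * F.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


# Small hypothetical dataset for the new, specialized task.
X_small = rng.normal(size=(40, 4))
y_small = (X_small[:, 0] + X_small[:, 1] > 0).astype(float)

W_before = W_pretrained.copy()
F = frozen_features(X_small)
w_head, b_head = train_new_head(F, y_small)
```

Only `w_head` and `b_head` are learned from the small dataset; `W_pretrained` is left untouched.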

Normally, a multilayer perceptron is added after these layers to generate the output, whether numerical (regression) or categorical (classification).

DL is revolutionizing the ML area and is being applied in completely new fields, such as drug discovery, music generation, and self-driving cars. It is also being applied in biomedicine [32,33] and, as we will see, has started to be used in the childhood obesity area.

After this summary of the main types of ML models, we proceed to describe the selection process of works reviewed in this paper.

#### **3. Bibliographic Search and Selection of Works for Reviewing**

We aimed for comprehensiveness in the bibliographic search, both in terms of time span and publication media. Since ML/DL applications are a very active field, growing extremely fast, it is not uncommon to find material published in conference proceedings, arXiv, etc. In addition, since this field is highly interdisciplinary, lying at the interface between statistics, artificial intelligence, and biomedicine, and including statisticians, engineers, pediatricians, nutritionists, and nurses among its researchers, the search engines typically used in biomedicine, such as Scopus and PubMed, were not used, as they missed many of the available references. Instead, Google Scholar was used for the bibliographic search. The search was performed by iteratively querying the engine with appropriate keywords in order to find papers that applied ML to predict childhood/adolescent obesity/overweight (e.g., childhood OR child OR adolescent AND machine learning OR data mining, etc.), extracting the matches and the matching references in their bibliographies, and updating the queries based on the titles of the matching references if necessary. This procedure was repeated until no new matches were obtained. Concept papers not applied to a particular dataset were not included.

In the next section, the most prominent statistical models in the literature for predicting childhood and adolescent obesity will be briefly reviewed. These will serve as a point of comparison for the ML models, which will be reviewed afterwards.

#### **4. Statistical Models to Predict Childhood/Adolescent Obesity**

Much work has been devoted to deriving statistical models to predict childhood/adolescent obesity [34–47]. Although in principle it is advisable not to categorize variables when deriving models, whether predictors or targets, since categorization results in a loss of information, most of the work in this area has focused on classification models for overweight, obesity, or combinations of them. The reason is that most of the clinical interest lies in detecting the conditions that can lead to pathological complications, namely overweight and obesity, rather than BMI or similar endpoints in general. Another pathological nutritional status is undernutrition, but it is outside the scope of this Review. In the case of children and adolescents, given the large variability of both height and weight during this period of life, there is no general consensus on the definitions of overweight and obesity [17], and the single-cutoff definitions used for adults, namely BMI ≥ 25 kg/m<sup>2</sup> for overweight and BMI ≥ 30 kg/m<sup>2</sup> for obesity, following the WHO definition [3], are not valid. Instead, the common practice for children/adolescents is to compare their BMI against an age- and sex-specific (in some cases also ethnicity-specific) BMI distribution of the reference population. The most common criterion is to define as overweight a child whose BMI is equal to or above the 85th percentile for that sex and age, and as obese a child whose BMI is equal to or above the 95th percentile. As we will see, in most cases these percentiles are obtained from the Centers for Disease Control and Prevention (CDC) data if the sample is from the US [48], or alternatively from the WHO growth charts [3], the charts of the International Obesity Task Force (IOTF) [49], or growth charts from samples in other countries (e.g., UK90 for the UK [50]).
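To make the percentile-based definitions concrete, here is a minimal Python sketch. It assumes that the reference chart is expressed through age- and sex-specific LMS parameters (the Box-Cox power L, median M, and coefficient of variation S), which is how, for example, the CDC growth charts are distributed; the LMS values used below are hypothetical placeholders, not real chart values.

```python
import math


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def lms_z_score(value: float, L: float, M: float, S: float) -> float:
    """Box-Cox (LMS) z-score of `value` given the reference L, M, S parameters."""
    if L == 0:
        return math.log(value / M) / S
    return ((value / M) ** L - 1) / (L * S)


def z_to_percentile(z: float) -> float:
    """Percentile (0-100) of a z-score under the standard normal distribution."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


def classify_child(percentile: float) -> str:
    """Common percentile cutoffs: >= 95th obese, >= 85th overweight."""
    if percentile >= 95:
        return "obese"
    if percentile >= 85:
        return "overweight"
    return "below overweight cutoff"


# Hypothetical LMS values for one age/sex group (NOT real chart values).
L_, M_, S_ = -2.0, 16.0, 0.1
z = lms_z_score(20.0, L_, M_, S_)        # BMI of 20 kg/m^2
print(round(z, 2), classify_child(z_to_percentile(z)))  # 1.8 obese
```

In practice, the L, M, and S values would be looked up for the child's exact sex and age in the chosen reference (CDC, WHO, IOTF, UK90, etc.), which is why the same BMI can be classified differently under different references.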

Recent previous reviews in the area are those of Butler et al. [51], Ziauddeen et al. [52], and Butler et al. [53], all from 2018. Here, we briefly review all the works found there, as well as additional ones, in order to provide baseline predictive models against which to compare the ML ones. A total of 14 papers were found. Table 1 summarizes the main features of these models.

The tool most used to develop the statistical models is logistic regression, which is applied to predict binary outcomes. Here, a linear equation is used to predict the log-odds of a binary variable taking one of its two categories rather than the other, such as being obese (or overweight) vs. being normal weight. This is the case in all the works, with two exceptions. One is the work by Cortés-Martín et al. [47], where proportional-odds ordinal logistic regression, a statistical model appropriate for predicting ordinal variables, is used. In this case, the predicted outcome was the three ordered BMI categories, namely normal weight vs. overweight vs. obese, for children and adolescents (5–17 years). The other is the work by Mayr et al. [38], where the authors use quantile regression with boosting to derive prediction intervals (which are ultimately quantiles of the BMI for future observations) for BMI at different ages in childhood.
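The two kinds of models can be contrasted with a short Python sketch. For binary logistic regression, the log-odds is a linear function of the predictors, so exp(coefficient) is the odds ratio for a one-unit increase in a predictor. For the proportional-odds model, a single linear predictor eta is shared by all cumulative logits, P(Y <= k) = logistic(theta_k - eta), with one increasing cutpoint theta_k per boundary between ordered categories (here NW|OW and OW|OB). All coefficients and cutpoints below are hypothetical, and the sign convention for eta differs between software packages.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def logistic_probability(intercept, coefs, x):
    """Binary logistic regression: log-odds = intercept + sum(b_j * x_j)."""
    log_odds = intercept + sum(b * v for b, v in zip(coefs, x))
    return sigmoid(log_odds)


def proportional_odds_probabilities(cutpoints, coefs, x):
    """Proportional-odds model: P(Y <= k) = sigmoid(theta_k - eta), where
    eta = sum(b_j * x_j) is shared by all k and cutpoints are increasing.
    Returns the probability of each ordered category."""
    eta = sum(b * v for b, v in zip(coefs, x))
    cumulative = [sigmoid(theta - eta) for theta in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


# Hypothetical coefficients, for illustration only.
odds_ratio = math.exp(0.7)               # odds multiplier per unit increase in x
p_nw, p_ow, p_ob = proportional_odds_probabilities([0.5, 2.0], [1.0], [0.3])
```

Increasing the linear predictor shifts probability mass toward the higher (more severe) categories, which is the single "proportional odds" effect the model estimates.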

In addition, in the paper by Pei et al. [40], the standardized BMI at 5 years was also predicted by means of linear regression, together with obesity at 10 years by means of logistic regression. Moreover, in the paper by Druet et al. [35], a meta-analysis is performed on the odds ratios obtained from logistic regressions in 10 cohorts of different nationalities, in order to estimate an odds ratio for childhood obesity as a function of the 0–1 year weight gain standard deviation score (SDS).
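The pooling step of such a meta-analysis can be sketched as follows. This is a fixed-effect, inverse-variance pooling of log odds ratios, with standard errors recovered from reported 95% confidence intervals; it is a generic illustration, and we do not assume it is the exact method used by Druet et al. [35] (a random-effects model, for instance, would add a between-cohort variance term).

```python
import math


def pooled_odds_ratio(studies, z=1.96):
    """Fixed-effect inverse-variance pooling of odds ratios.

    studies: list of (odds_ratio, ci_lower, ci_upper) with 95% CIs.
    Returns (pooled OR, pooled CI lower, pooled CI upper).
    """
    num = den = 0.0
    for or_, lo, hi in studies:
        log_or = math.log(or_)
        # SE of log(OR) recovered from the CI width on the log scale.
        se = (math.log(hi) - math.log(lo)) / (2 * z)
        w = 1.0 / se ** 2                  # inverse-variance weight
        num += w * log_or
        den += w
    pooled = num / den
    se_pooled = math.sqrt(1.0 / den)
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled))


# Hypothetical per-cohort results (OR, lower, upper).
cohorts = [(1.8, 1.2, 2.7), (2.2, 1.5, 3.2), (1.5, 0.9, 2.5)]
print(pooled_odds_ratio(cohorts))
```

More precise cohorts (narrower intervals) receive larger weights, so large cohorts dominate the pooled estimate.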


**Table 1.** Summary of statistical models.

\* NW = normal weight; OW = overweight; OB = obese; \*\* L = longitudinal model; CS = cross-sectional model.

The rest of the papers aim at predicting overweight or obesity at one or several ages, or over a range of ages, exclusively by means of logistic regression. Papers focused on the prediction of overweight are those of Steur et al. [34] (at 8 years), Weng et al. [41] (at 3 years), Graversen et al. [43] (in adolescence), and Redsell et al. [46] (at 5 years). Papers focused on the prediction of obesity are those of Druet et al. [35] (at 7–14 years), Levine et al. [36] (at 5 years, stratified by sex), Manios et al. [39,44] (at 9–13 years), Pei et al. [40] (at 10 years, as mentioned above), Santorelli et al. [42] (at 2 years), and Robson et al. [45] (at 5 years). In the paper by Morandi et al. [37], both endpoints are predicted: overweight and obesity at both 7 and 16 years; in addition, predictions are made for *persistent* overweight and obesity, that is, overweight and obesity at *both* 7 and 16 years. Regarding the definitions of overweight and/or obesity in these works, some [34,35,37,39,41,46] used the IOTF criteria, others [40,44,47] used the WHO criteria, and one [45] used the CDC criteria.

When logistic regression is used, in most cases [34,35,37,41,42,45,46] stepwise variable selection is performed from a pool of candidate predictor variables to choose the final ones for the definitive model or models. In one case [39,44], a score is derived "by hand" by combining odds ratios obtained from simple logistic regressions on different variables, and this score is then used in a simple [39] or multiple [44] logistic regression to estimate its odds ratio. In two other cases [36,43], the predictor variables are predefined, and in one case [40], several predefined predictor variables are used at the beginning, but the model is then rederived with only the significant ones.
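Stepwise selection can be sketched with NumPy alone. The version below is forward selection using the Akaike information criterion (AIC) as the stopping rule, with logistic regression fitted by Newton-Raphson; the reviewed works may have used backward or bidirectional procedures or p-value thresholds instead, and the simulated data are purely hypothetical.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).

    X: (n, k) predictors (k may be 0); an intercept column is added.
    Returns (coefficients, log-likelihood).
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (y - p)                          # score vector
        hess = X1.T @ (X1 * (p * (1 - p))[:, None])    # information matrix
        beta += np.linalg.solve(hess + 1e-8 * np.eye(len(beta)), grad)
    p = np.clip(1.0 / (1.0 + np.exp(-X1 @ beta)), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik


def forward_stepwise_aic(X, y):
    """Greedy forward selection: add the predictor that lowers AIC the most,
    stop when no candidate lowers it. AIC = 2 * n_params - 2 * loglik."""
    selected, remaining = [], list(range(X.shape[1]))
    _, ll = fit_logistic(np.empty((len(y), 0)), y)     # intercept-only model
    best_aic = 2 * 1 - 2 * ll
    while remaining:
        aic, j = min(
            (2 * (len(selected) + 2)
             - 2 * fit_logistic(X[:, selected + [c]], y)[1], c)
            for c in remaining
        )
        if aic >= best_aic:
            break
        best_aic, selected = aic, selected + [j]
        remaining.remove(j)
    return selected, best_aic


# Hypothetical data: only predictors 0 and 2 influence the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
true_logit = 1.5 * X[:, 0] - 1.0 * X[:, 2]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
selected, aic = forward_stepwise_aic(X, y)
print(selected)
```

Because each step is chosen greedily on the same data, stepwise selection tends to give optimistic in-sample performance, which is one reason external validation matters.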

Regarding the predictor variables, the most popular ones, in decreasing order, are parental BMI (8 times), sex and birth weight (7 times each), maternal smoking during gestation (6 times), weight gain during some previous period (5 times), parental education (4 times), exclusive breastfeeding during some initial period (3 times), etc. Sometimes, variants of these variables are used, such as categorized ones (e.g., obesity instead of BMI) or standardized ones. In other cases, the mother's version is used instead of the parental one, e.g., mother's BMI or mother's education. In two cases, a set of genetic polymorphisms is used: in one [37], incorporated as a score obtained as the sum of risk alleles, they appeared to add no significant predictive capacity, but in the other [47], in the form of components of a Multiple Component Analysis (MCA), they did.

The cohorts used to derive the models are of varied origin: the Netherlands [34], UK [36,41,42], Finland [37,43], Germany [38,40], Greece [39,44], USA (Latino community) [45], and Spain [47]. In the case of the meta-analysis mentioned above [35], the 10 cohorts are also from multiple countries: UK, France, Finland, Sweden, USA, and Seychelles. Most of the work has thus been performed in developed countries with mostly Caucasian samples, which limits its applicability. The cohort sizes also vary, ranging from 166 [45] to around 13,000 [41]. The meta-analysis [35] includes more than 47,000 cases across the 10 cohorts.

In terms of model validation, some of the models [35,37,41,43,44,46] were externally validated (in fact, the works by Manios et al. [44] and Redsell et al. [46] are external validations of the models previously described in Manios et al. [39] and Weng et al. [41], respectively), while other models were internally validated through bootstrap [34,45,47] or cross-validation [38,40]. In two cases [42,43], both internal and external validation were used, while in one case [36], no validation was performed at all.

If we focus on comparing the performance of the different logistic regression models, we can use the AUROC (which equals the so-called c-index, or concordance index) as a criterion for discrimination. Depending on where the threshold on the model's linear predictor is set to assign one category or the other to the predictions, we can obtain very different sensitivities and specificities, as well as positive and negative predictive values (PPV, also called precision, and NPV); to select the threshold, we must take into account the purpose of the model, as well as the possible costs of false positives and/or false negatives. However, as a *global* measure of the discriminative capacity of the model, before its practical application with a selected threshold, the AUROC is a well-established criterion. Obviously, of two models with the same AUROC, one internally validated and the other externally validated, we will prefer the externally validated one, especially if the validation used a large, unrelated cohort, because it more closely approximates a real-life prediction than internal validation, which is based on data reuse.
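The distinction between the threshold-free AUROC and threshold-dependent metrics can be made explicit in a few lines of Python. The AUROC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties counting one half), which is the concordance (c-index) interpretation; the scores and labels are made up for illustration.

```python
def auroc(scores, labels):
    """AUROC as P(score of a random positive > score of a random negative),
    counting ties as 1/2 (the c-index for a binary outcome)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def threshold_metrics(scores, labels, threshold):
    """Confusion-matrix metrics when predicting 'positive' for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
    }


# Hypothetical predicted risks and observed outcomes (1 = obese).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(auroc(scores, labels))  # 0.75
```

With these eight hypothetical cases, the AUROC is 0.75 regardless of the threshold, whereas moving the threshold from 0.5 to 0.65 changes the sensitivity from 0.75 to 0.5 and the specificity from 0.5 to 0.75.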

Accordingly, the models of the different works using external validation would be ranked in the following order of decreasing AUROC: Santorelli et al. [42] (0.89), Morandi et al. [37] (0.79), Druet et al. [35] (0.77), Weng et al. [41] (0.75), Redsell et al. [46] (0.67), and Manios et al. [44] (0.64). In the case of the paper by Graversen et al. [43], the AUROC is provided only for the internal validation. These values should be taken with caution, given that they do not reflect the same "difficulty" of prediction: if the test cohort is very similar to the training one, a very large AUROC could be obtained very easily. For example, external validation with a cohort different from the training one is a more demanding task than external validation with a random split of the same cohort, even if the latter split is not used for training. Moreover, the difficulty depends on how closely related the predictor and target variables are; e.g., predicting obesity at age 9 is more difficult if the predictor variable is weight gain between 0 and 1 years than if it is weight gain between 7 and 8 years.

On the other hand, the ranking of models for internal validation by decreasing AUROC is Robson et al. [45] (0.78) and Steur et al. [34] (0.75). Mayr et al. [38] and Pei et al. [40] do not provide AUROC values. In principle, the evidence of predictive capacity of these models is weaker given that they have not been externally validated.

Finally, we should mention that, in terms of the type of prediction, all the models have a longitudinal setting, that is, they aim at predicting the endpoint *in the future* from *predictor variables taken at a previous point in time*, at least partially, e.g., predicting overweight at 8 years from birth weight and maternal smoking during gestation. These are designated in Table 1 as the "L" type of prediction. The only exception is the work by Cortés-Martín et al. [47], which has an explanatory rather than a predictive purpose and therefore uses a cross-sectional setting, where the predictor variables are measured at the same time as the endpoint. This is designated in Table 1 as the "CS" type of prediction. Here, the aim is to obtain the relative strengths of association of variables from different domains with a putative explanatory character (diet, age, sex, genetic polymorphisms, microbiota); however, given the cross-sectional setting of the model, it cannot demonstrate causality, only identify putative variables to be tested further in a longitudinal setting.

#### **5. Machine Learning Models to Predict Childhood/Adolescent Obesity Based on BMI**

In this section, we will review the ML models derived to predict BMI (regression) and/or categorized versions of it (classification), e.g., normal-weight, overweight, obesity, etc.

To our knowledge, there are only two previous reviews of ML models to predict childhood/adolescent obesity. An early paper from 2010 by Adnan et al. [54] described the scarce work performed before then; another very recent paper [55] reviews the work up to 2020, together with the area of computerized decision support for the prevention and treatment of childhood obesity. However, the latter paper, being arranged as a systematic review, misses many of the publications in the area of ML, and some of the works it describes could be more appropriately defined as statistical models (e.g., generalized linear mixed models and linear and logistic regression) or are aimed at predicting physical activity in children.

In what follows, we describe the works conducted in the area in roughly chronological order. As we will see, the field has grown explosively very recently, especially through the use of electronic health records (EHR) as sources of very large datasets. ML methods will be abbreviated as in Section 2. Table 2 summarizes the main features of the models that will be described.

The first attempts to use ML to predict childhood obesity are those of Novak and Bigec, back in 1995 [56] and 1996 [57]. In these papers, they describe the use of ANNs to predict childhood obesity. However, the work is preliminary in nature and is more a description of ANN theory and methods, without reporting the results of a particular model derived from a particular sample.

This work was followed by that of Zhang et al. in 2009 [58]. Here, the aim was to compare the performance of ML models with that of the traditional logistic regression model. Using a UK cohort (the so-called Wirral database of >16,000 children), they developed several models to predict overweight at 3 years from earlier data, using predictor variables available at 8 months or at 2 years. These variables were all child features, such as sex, BMI at 8 months, adjusted SDS of height at different visits, and weight gain between pairs of visits. Different ML methods were used: DT, Association Rules, ANN, linear SVM, RBF (Radial Basis Function) SVM, BN, and NB. For the prediction made at 8 months, the ANN showed the largest accuracy, although the RBF SVM displayed the largest sensitivity (probably more useful for clinical purposes). For the prediction made at 2 years, the largest accuracy was obtained with the Bayesian methods, although the largest sensitivity was again observed for the RBF SVM. Logistic regression had the largest specificity, but its sensitivity and accuracy were much worse than those of the ML models. They also derived models to predict obesity, but their quality was very low. None of the models developed were validated.


**Table 2.** Machine Learning (ML) models to predict BMI or its categories.

\* When several models are derived, the largest number of predictors is reported; \*\* NW = normal weight; OW = overweight; OB = obese; \*\*\* L = longitudinal model; CS = cross-sectional model; \*\*\*\* ND = not described. ML method abbreviations as in Section 2.

A 2011 work by Rehkopf et al. [59] used an American cohort (the NHLBI Growth and Health Study) of ca. 2000 white or black girls aged 8 or 9 years, followed for 10 years, to predict the change in CDC BMI percentile from 9 to 19 years and the transition from normal weight to overweight or obesity by means of RF models. They took 41 predictor variables from different domains: diet, physical activity, psychological, social, and parental health. They applied permutation-based variable importance techniques to estimate the relative importance of these variables. For the first outcome, body dissatisfaction, drive for thinness, and physical appearance (psychological), income and parental education (social), and other psychological variables were the most important. For the transition to overweight or obesity, the most important predictor was income, followed by psychological variables. Again, no internal or external validation of the models was performed.
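Permutation-based variable importance can be sketched independently of the model type: shuffle one predictor column, re-score the already fitted model, and record the loss in performance. The Python sketch below uses a hypothetical toy model in place of an RF, and accuracy as the metric; RF implementations often compute this on out-of-bag samples instead, which is not shown here.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Mean drop in `metric` when one predictor column is randomly shuffled.

    predict: callable mapping a list of rows (lists) to predictions.
    metric: callable(y_true, y_pred) -> score, higher is better.
    Returns one mean importance value per column.
    """
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)              # break the link between column j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical "fitted" model that only uses column 0.
def toy_model(rows):
    return [1 if r[0] > 0.5 else 0 for r in rows]


data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = toy_model(X)
importances = permutation_importance(toy_model, X, y, accuracy)
print(importances)  # column 1 gets exactly 0.0; column 0 a clearly positive drop
```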

Following their 2010 review [54], Adnan et al. published three papers in this area in 2012 [60–62], predicting nutritional status (normal weight, overweight, and obese) by means of NB in a cohort of 140 Malaysian children aged 9–11 years. They used 19 predictor variables from different domains, obtained from a literature review: child features, lifestyle (including physical activity and diet), and family/environment. In the first work [60], they observed that the use of these variables improved the accuracy of obesity prediction by NB compared with the work by Zhang et al. [58]. This approach was improved in the second paper [61] by using a genetic algorithm to select predictor variables, in order to avoid a problem of NB with many variables: the predicted posterior probability of a class becomes zero whenever the estimated conditional probability of at least one predictor value given that class is zero. The third paper [62] adopted two additional methods of variable selection for NB models: variable importance with CART and Euclidean distances. The models were not validated in any of the papers.
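The zero-probability problem of NB can be reproduced in a few lines of Python. Note that Adnan et al. addressed it through variable selection; the additive (Laplace) smoothing parameter `alpha` shown below is a different, commonly used remedy, swapped in here purely for illustration, and the tiny dataset is hypothetical.

```python
from collections import Counter, defaultdict


def fit_categorical_nb(X, y, alpha=1.0):
    """Categorical naive Bayes with additive (Laplace) smoothing `alpha`.

    X: list of rows of categorical values; y: list of class labels.
    Returns a function mapping a row to normalized posterior probabilities.
    """
    class_counts = Counter(y)
    n_features = len(X[0])
    n_levels = [len({row[j] for row in X}) for j in range(n_features)]
    value_counts = defaultdict(Counter)          # (class, feature) -> value counts
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            value_counts[(c, j)][v] += 1

    def predict_proba(row):
        scores = {}
        for c, n_c in class_counts.items():
            p = n_c / len(y)                     # class prior P(c)
            for j, v in enumerate(row):
                # P(v | c); with alpha = 0 an unseen (c, v) pair gives exactly 0
                p *= (value_counts[(c, j)][v] + alpha) / (n_c + alpha * n_levels[j])
            scores[c] = p
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()} if total > 0 else scores

    return predict_proba


X = [["high", "yes"], ["high", "no"], ["low", "no"], ["low", "no"]]
y = ["obese", "obese", "normal", "normal"]
new_child = ["high", "yes"]
print(fit_categorical_nb(X, y, alpha=0.0)(new_child))  # {'obese': 1.0, 'normal': 0.0}
print(fit_categorical_nb(X, y, alpha=1.0)(new_child))  # both classes non-zero
```

Without smoothing, a single value never observed with a class in training ("high" for "normal") forces that class's posterior to exactly zero, regardless of all the other predictors.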

Another 2012 work is that of Lazarou et al. [63], where diet variables were used to predict overweight + obesity vs. normal weight. A Cypriot cohort of ca. 600 children aged 10–12 years was used in a cross-sectional setting. Questionnaire-based eating frequencies of food groups (fried food, fish and seafood, delicatessen meat, soft drinks, and sweets and junk food) were used as predictor variables. By developing many DTs for both boys and girls, they were able to derive rules of overweight + obesity risk as a function of diet patterns and sex. The approach was validated by bootstrap, but the results were not shown. Finally, they developed logistic regression models using PCA components of the diet variables as predictors; only one of the PCs in the girls' model was significant.

A 2014 paper by Pochini et al. [64] predicted overweight and obesity in high-school students (14–18 years old) from 9 lifestyle predictor variables, using both logistic regression and DT, again in a cross-sectional setting. The modeled sample was a cohort of ca. 15,000 high-school students in Columbia, USA (from the 2011 CDC Youth Risk Behavior Survey). For obesity, the significant factors in the logistic regression were consumption of fruit/vegetables, smoking, being physically active, having breakfast regularly, drinking fruit juice, and drinking soda; the variables remaining in the DT after pruning were being physically active and tobacco use. For the overweight prediction, the significant logistic regression variables were having breakfast regularly and being physically active. For the DT, no variable remained after pruning; before pruning, the variables were breakfast, fruit juice, and sleep. The DT models were externally validated with 30% of the original sample.

Dugan et al. [65], in 2015, used multiple ML methods to predict obesity at 2 years using predictor variables obtained before that age. The data came from a clinical decision support system, CHICA, which contained information from a multiethnic cohort of >7000 children in the USA. Random Tree, RF, J48, ID3, NB, and BN were tried. The best-performing algorithm was ID3, with an accuracy of 85%, sensitivity of 89%, PPV of 84%, and NPV of 88%. Using a form of variable importance based on removing variables one at a time, they found that the strongest predictors were overweight before 24 months, followed by being very tall before 6 months. All the models were internally validated through 10-fold cross-validation.
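For reference, k-fold cross-validation, as used in several of the works above, can be sketched as follows: the data are shuffled, split into k folds, and each fold is scored once by a model fitted on the remaining k-1 folds. The `fit` and `score` callables and the majority-class baseline are hypothetical placeholders for a real learner; this plain version is not stratified by outcome class.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(fit, score, X, y, k=10, seed=0):
    """Mean held-out score over k folds (assumes n >= k so no fold is empty)."""
    n = len(y)
    results = []
    for test in k_fold_indices(n, k, seed):
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        model = fit([X[i] for i in train], [y[i] for i in train])
        results.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(results) / k


# Hypothetical baseline: always predict the majority class of the training folds.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def accuracy_of_constant(label, X_test, y_test):
    return sum(1 for t in y_test if t == label) / len(y_test)


X = [[i] for i in range(100)]
y = [1] * 70 + [0] * 30
print(cross_validate(fit_majority, accuracy_of_constant, X, y))
```

Every observation is used for testing exactly once, but the k models are all fitted on data from the same cohort, which is why this remains an internal validation.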

In 2016, Lingren et al. [66] published a paper aimed at identifying putative cases of severe early childhood obesity among children aged 1–6 years above the 99th BMI percentile, separating them from cases due to medications, pathologies, etc. The objective was to build a cohort for further genotyping studies, in order to understand the genetic basis of severe early childhood obesity. Therefore, they attempted to optimize the PPV, in order to be most effective in detecting these children. The dataset corresponded to a cohort of >5000 EHRs from two children's hospitals, one in Boston and another in Cincinnati. The predictor variables were structured data (demographics, anthropometrics, ICD-9 diagnosis codes, and medications) as well as unstructured (narrative) data processed by NLP. They used both rule-based methods and ML methods (SVM and NB), which were tested on an external split of the original data. In general, the rule-based method worked better, but the ML methods offered more flexibility to trade off PPV against sensitivity and to select variable sets.

Abdullah et al. published a paper [67] in 2017, where they used ML to predict obesity at 12 years from a Malaysian cohort of >4000 children 12 years old. The predictor variables were obtained from questionnaires and included three domains: socio-demographic, physical activity, and diet. Multiple methods for variable selection were tested, as well as multiple ML methods: BN, DT (J48), NB, ANN, and SVM. The best results were obtained with J48, together with consistency + linear forward variable selection. In this case, the models were not validated.

Later in the same year, Rios-Julián et al. [68] attempted to predict obesity + overweight (following the CDC criteria) vs. normal weight using BMI and other anthropometric variables in a community of Me'Phaa ethnicity in Mexico. They modeled a cohort of 221 children aged 6–13 years using different models: J48, logistic model trees, ANN, RF, and logistic regression. Three groups of variables were tried: all variables; all but skinfold thickness; and sex, age, height, weight, BMI, and skinfold thickness. The results were similar across variable groups and models, and in general, all the models yielded excellent predictions. All the models were internally validated by 10-fold cross-validation.

Also in 2017, Wiechman et al. published a paper [69] that used DTs (C4.5 type) to gain insight into the factors influencing obesity in Hispanic preschoolers in the USA. The analyzed sample was a cohort of children aged 2–5 years from 238 families of Hispanic ethnicity. They developed shallow C4.5 decision trees to predict overweight using variables from different domains: demographics, caregiver feeding style, feeding practices, home environment, dietary information, beverage consumption, social support, family life, integrated behavior model, and spousal support. They found some clues to obesity development: if the mother cares for the child, or if she works but the father has a high level of education, the child is less likely to be overweight. If the child is fed to avoid tantrums, the child tends to be more obese. The models were not validated.

The last 2017 paper is that of Zheng and Ruggiero [70]. They used a dataset comprising a cohort of >5000 high-school students (14–18 years) in the USA. They predicted obesity between 14 and 18 years from 9 variables within three domains: energy intake, physical activity, and sedentary behavior. They used logistic regression, DT, kNN, and ANN. The best models were ANN and kNN, and all the ML models performed much better than logistic regression. All were internally validated by 10-fold cross-validation.

The year 2019 saw an explosion of ML and DL models to predict childhood obesity. We have identified as many as 7 papers in this area, together with others aimed at related endpoints that will be described in the next section. Several of them use EHR as data sources. We finish this section by describing these works.

An example of a DL model is that of Gupta et al. [71]. They used a cohort of EHRs from ca. 68,000 children/adolescents with visits to medical centers over at least 5 years to predict BMI and obesity from 3 to 20 years, in windows of 3 consecutive years, using data from the 3 previous years, resulting in multiple models. Recurrent NNs of the LSTM type were used, with predictor variables from the EHR, including observed medical conditions, prescribed drugs, requested procedures, and measurements taken, together with static demographic data. The data were split into three subsets: 60% for training, 20% for hyperparameter validation, and 20% for external validation. The whole training dataset was used to train a global model, and then, through transfer learning, specialized models for each sub-cohort were obtained by retraining the global model with the corresponding subset of data. To identify important variables, they used embeddings, while to identify important time intervals, they used attention techniques. The RNN was compared with RF and linear regression, which do not take the longitudinal information into account, and the RNN performed much better. As expected, the performance of the models decays with the temporal distance between the acquisition of the predictor variables and the future time of BMI prediction.

Another work that used EHR data is that of Hammond et al. [72], who used a multiethnic cohort of >3000 children obtained from multiple providers in a safety-net network in New York City. The authors predicted obesity at 5 years using penalized logistic regression, RF, and GB. In addition, obesity was predicted by deriving regression models for z-BMI using LASSO, RF, and GB and applying an obesity cutoff to the predicted z-BMI. They used feature engineering to generate predictor variables from the EHR: demographic information, home address, vital signs, and medications of the children when they were < 2 years old, and, from the mothers, vital signs, diagnosis codes, procedures, and laboratory results before, during, and after pregnancy. They developed separate models for boys and girls. The most important predictors were the weight-for-length z-score, BMI between 19 and 24 months, and the last BMI measurement before age 2. The best models had an AUROC of 0.817 for girls and 0.761 for boys. Internal validation was conducted by bootstrap CV, and external validation with a previously selected test split.

One work aimed at understanding risk factors for childhood obesity is that of Lee et al. [73]. They used a South Korean longitudinal cohort of ca. 1 million children and DT models to predict obesity vs. normal weight between 24 and 80 months (overweight children were excluded). They used a set of 21 predictor variables from different domains: socioeconomic status (SES, modeled as receiving medical aid or not), maternal factors (e.g., pregestational obesity, abdominal obesity, hypertension, smoking), paternal factors (obesity, abdominal obesity, and hypertension), and child factors (e.g., preterm birth, exclusive breastfeeding, high consumption of sugar-sweetened beverages). The model was externally validated with a 40% test split, resulting in an accuracy of 93%. Using CHAID-type variable selection, the most important predictor variable was maternal obesity, followed by parental obesity and SES; other important factors were older maternal age at pregnancy, gestational diabetes, and hypertension. The relevant child factors were exclusive breastfeeding, consumption of sugar-sweetened beverages, and irregular breakfast. Interestingly, they observed that the child's z-scores for birth weight and for weight-for-height were not selected.

A South Korean dataset was also used by Kim et al. [76] to understand factors affecting obesity, although in this case the focus is *adolescent* obesity. They used a cohort of >11,000 students from South Korea and 19 predictor variables from questionnaires covering different domains: sociological, anthropometric, smartphone use, obesity, and other. They predicted three BMI categories (underweight, normal, and overweight) by means of a General Bayesian Network (GBN) and compared it with many other ML methods, with the GBN displaying the best fit: the best accuracy is 53.7%, and the AUROC is 0.758. No validation was performed. The variable most related to BMI class is pocket money. More interestingly, they used the GBN to perform a "what-if" analysis, modifying the values of different variables or combinations of variables to gain an understanding of putative mechanisms of obesity risk. For instance, the combination of high pocket money and low wealth greatly increases the probability of obesity.

An adolescent cohort was also used by Singh et al. [75], but in this case from the UK. The UK Millennium Cohort of children born in 2000–2001, specifically the subsets MC2 to MC5, was modeled to predict BMI at 14 years (MC6). The models were externally validated with a 25% test split. Linear SVM, linear regression, and ANN were tried, and the best performance was obtained by the ANN, followed by the SVM.

Pang et al. [77] used XGBoost. The authors predicted obesity at 2–7 years of age from data in time windows within the 0–2 year period, using a cohort of ca. 27,000 children from Philadelphia in the Pediatric Big Data repository. Variables included vital signs, laboratory values, and provider information, for a total of 102 predictors. Data were divided into train 1 (40%), train 2 (40%), and hold-out (20%) sets to determine hyperparameters iteratively and to train and test the model. Different ML models were tried, and the best was XGBoost, with an AUCROC of 0.81; at the threshold giving a recall of 0.8, the precision, F1, accuracy, and specificity were 30.9%, 44.6%, 66.14%, and 63.27%, respectively. Variable importance analysis identified weight-for-height at month 24, weight at month 24, weight-for-height at month 18, and race as the most important predictors. Importance distributions differed across races, ethnicities, and caregivers. A sensitivity analysis showed that, as expected, prediction of obesity degrades at later ages.
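As a quick consistency check on the figures above, F1 = 2PR/(P + R) with P = 0.309 and R = 0.8 gives about 0.446, matching the reported 44.6%. The sketch below shows one way to fix a decision threshold at a target recall and read off the other metrics; the scores and labels are hypothetical, and the rule "take the highest threshold whose recall reaches the target" is an assumption about how such an operating point is chosen.

```python
def confusion_at(scores, labels, threshold):
    """Counts (tp, fp, tn, fn) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        pred = s >= threshold
        if pred and y:
            tp += 1
        elif pred:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def metrics_at_recall(scores, labels, target_recall=0.8):
    """Highest threshold reaching target_recall, plus precision, F1, accuracy, specificity."""
    for t in sorted(set(scores), reverse=True):   # recall can only grow as t decreases
        tp, fp, tn, fn = confusion_at(scores, labels, t)
        recall = tp / (tp + fn)
        if recall >= target_recall:
            precision = tp / (tp + fp)
            return {
                "threshold": t,
                "recall": recall,
                "precision": precision,
                "f1": 2 * precision * recall / (precision + recall),
                "accuracy": (tp + tn) / len(labels),
                "specificity": tn / (tn + fp) if tn + fp else float("nan"),
            }
    raise ValueError("target recall not reachable (no positive labels?)")

# Hypothetical model scores on a hold-out set
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(metrics_at_recall(scores, labels))
```

With a low-prevalence outcome, forcing recall to 0.8 typically pushes precision down, which is consistent with the modest precision reported for the XGBoost model.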

An interesting alternative to the predictor variables described so far is the use of neuroimaging biomarkers. Park et al. [74] used resting-state functional magnetic resonance imaging (rs-fMRI) to derive predictive models for BMI progression (and, indirectly, future BMI) in adolescents. They used a cohort of 76 white and African American preadolescents (average age 11.94 years) from the Enhanced Nathan Kline Institute Rockland Sample (NKI-RS) database. BMI was measured at a first visit and again at a second visit about 1.5 years later. From the brain fMRI at the first visit, considering both subcortical volume and cortical surface, 379 degree centrality (DC) values for different brain regions were extracted. These were used with LASSO to predict BMI progression (ΔBMI/Δt) and, indirectly, BMI at the second visit. Only six DC values remained after LASSO variable selection, and these were entered into a linear regression model. The model was internally validated with leave-one-out cross-validation (LOOCV), giving an intraclass correlation (ICC) of 0.70 for ΔBMI and 0.98 for BMI, and (when predicting the binary variable BMI increase/decrease) an AUCROC of 0.82. The brain regions of the selected DCs were associated with eating disorders, anxiety, and depression. The approach was also applied to a local South Korean dataset of 22 young adults (average age 21.4 years) with similar results, suggesting that the first model is robust.
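The LASSO-then-regression pipeline with leave-one-out validation can be sketched in plain Python as below, assuming numeric feature rows such as DC values. This is an illustrative implementation, not Park et al.'s code: LASSO is solved by cyclic coordinate descent with soft-thresholding, the selected features are refitted by ordinary least squares, and, as a design choice to avoid information leakage, selection is repeated inside every LOOCV fold. The penalty `lam` and the synthetic data are assumptions.

```python
import random

def soft_threshold(rho, lam):
    """Proximal step for the L1 penalty: shrink towards zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/2n)||y - b0 - Xc b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]   # centred features
    y_mean = sum(y) / n
    r = [yi - y_mean for yi in y]                                # residuals at b = 0
    z = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue
            rho = sum(Xc[i][j] * (r[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z[j]
            if new != b[j]:
                for i in range(n):
                    r[i] -= Xc[i][j] * (new - b[j])   # keep residuals in sync
                b[j] = new
    return b

def solve(A, v):
    """Solve A w = v by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [list(A[i]) + [v[i]] for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda row_i: abs(M[row_i][c]))
        M[c], M[piv] = M[piv], M[c]
        for k in range(c + 1, m):
            f = M[k][c] / M[c][c]
            for t in range(c, m + 1):
                M[k][t] -= f * M[c][t]
    w = [0.0] * m
    for c in range(m - 1, -1, -1):
        w[c] = (M[c][m] - sum(M[c][t] * w[t] for t in range(c + 1, m))) / M[c][c]
    return w

def ols(X, y, cols):
    """Least-squares refit with intercept on the selected columns; returns [b0, b...]."""
    A = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(A[0])
    AtA = [[sum(a[r] * a[c] for a in A) for c in range(k)] for r in range(k)]
    Aty = [sum(a[r] * yi for a, yi in zip(A, y)) for r in range(k)]
    return solve(AtA, Aty)

def loocv_predictions(X, y, lam):
    """Leave-one-out: LASSO selection and OLS refit on n-1 rows, predict the held-out row."""
    preds = []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        cols = [j for j, bj in enumerate(lasso_cd(X_tr, y_tr, lam)) if bj != 0.0]
        w = ols(X_tr, y_tr, cols)
        preds.append(w[0] + sum(wj * X[i][j] for wj, j in zip(w[1:], cols)))
    return preds

# Synthetic stand-in for DC features: only features 0 and 1 carry signal
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(40)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.1) for row in X]
print([j for j, bj in enumerate(lasso_cd(X, y, lam=0.3)) if bj != 0.0])
preds = loocv_predictions(X, y, lam=0.3)
```

Agreement metrics such as ICC or AUCROC would then be computed between `preds` and the observed values.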

#### **6. Machine Learning Models to Predict Related Outcomes**

Some other works in the literature make use of ML to derive predictive models not of BMI or BMI categories, but of related endpoints. Table 3 summarizes these models.


**Table 3.** Summary of ML models to predict BMI-related outcomes.

ML method abbreviations as in Section 2.

For instance, Duran et al. [82] describe the use of ANN models to predict body fat percentage (BF%) and its excess (BF% above the 85th percentile), an alternative measure of obesity to those based on BMI. A cohort of ca. 2000 non-Hispanic white children younger than 20 years was used. Separate models were derived for boys and girls, with age, height, weight, and waist circumference as predictors. The ANN models were compared with predictions based on z-BMI and z-WC. For boys, the ANN had better accuracy, sensitivity, and specificity than the simple models, especially the z-WC one; for girls, the ANN performed similarly to the z-BMI model and better than the z-WC one. The models were internally validated and externally validated with a test split.

On the other hand, some models aim to predict the success of therapies or treatments to reduce childhood obesity. One case is the work by Hasan et al. [79], who used RNNs (both LSTM and GRU types) and probabilistic models to predict whether obese adolescents would respond positively or negatively to a counselor's communication sequences in interviews promoting weight reduction behavior. The authors used a dataset of 129 motivational interviews between a counselor and an adolescent (accompanied by a caregiver). These interviews included 50,239 encoded sequences of utterances, each ending or not in positive change talk or commitment language by the adolescent or caregiver. Given the strong class imbalance (most sequences are successful), they evaluated the models using either synthetic oversampling of the negative sequences or undersampling of the positive ones. The models were trained on 80% of the data and externally evaluated on the remaining 20%. With undersampling, LSTM models with target replication (LSTM-TR) were the best in terms of F1, precision, and recall, while the probabilistic models performed much worse. With oversampling, LSTM-TR was again the best model. These models can therefore be used to design the most successful communication strategies.
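The two rebalancing strategies can be sketched as below. Note that this uses plain *random* oversampling (duplicating minority items) rather than the synthetic oversampling used by Hasan et al., and the sequence IDs and labels are hypothetical. Rebalancing is meant to be applied to the training split only, so that the 20% evaluation set keeps its natural class ratio.

```python
import random
from collections import defaultdict

def _group(items, labels):
    """Collect items per class label."""
    groups = defaultdict(list)
    for x, y in zip(items, labels):
        groups[y].append(x)
    return groups

def random_undersample(items, labels, seed=0):
    """Keep a random subset of every class, sized like the smallest class."""
    rng = random.Random(seed)
    groups = _group(items, labels)
    n_min = min(len(xs) for xs in groups.values())
    out = [(x, y) for y, xs in groups.items() for x in rng.sample(xs, n_min)]
    rng.shuffle(out)
    return out

def random_oversample(items, labels, seed=0):
    """Duplicate random members of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    groups = _group(items, labels)
    n_max = max(len(xs) for xs in groups.values())
    out = []
    for y, xs in groups.items():
        extra = [rng.choice(xs) for _ in range(n_max - len(xs))]
        out.extend((x, y) for x in xs + extra)
    rng.shuffle(out)
    return out

# Hypothetical training sequences: 8 successful (1), 2 unsuccessful (0)
seq_ids = [f"seq{i}" for i in range(10)]
seq_labels = [1] * 8 + [0] * 2
print(len(random_undersample(seq_ids, seq_labels)))  # 4 (2 per class)
print(len(random_oversample(seq_ids, seq_labels)))   # 16 (8 per class)
```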

Another example of predicting therapy success is the work by Öksüz et al. [80]. They used a cohort of 20 overweight or obese children aged 11–16 years in Switzerland to predict the success of a 6-month weight-reduction therapy (defined as a BMI after therapy more than 0.4 units lower than before). As predictors, they measured heart rate at several intervals during a running test and a cooldown period, plus weight, age, BMI, and height. They tried different ML methods: several SVM variants, kNN, DT, and GB. Given the small sample size, nested cross-validation was used to train and internally validate the models. The best model was a linear SVM, with an accuracy of 85%. Permutation tests were used to estimate the relative importance of the predictors, and several heart rate variables were the most important. These ML models performed better than the predictions of two domain experts.
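Permutation importance measures how much a model's accuracy drops when one predictor's values are shuffled, which breaks that predictor's link to the outcome. The sketch below assumes an already fitted classifier exposed as a plain function; the threshold rule on a hypothetical heart-rate feature stands in for the SVM, and the data are synthetic. Öksüz et al. may have used a different permutation-test variant.

```python
import random

def accuracy(model, X, y):
    """Share of rows where the model's prediction equals the label."""
    return sum(model(row) == yi for row, yi in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean accuracy drop when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical data: feature 0 = heart-rate recovery, feature 1 = age (ignored by the model)
X = [[float(i), float(10 + i % 5)] for i in range(20)]
y = [int(i >= 10) for i in range(20)]

def model(row):
    """Stand-in for a fitted classifier: uses feature 0 only."""
    return int(row[0] >= 10)

print(permutation_importance(model, X, y))
```

A feature the model never uses gets an importance of exactly zero, while an informative one shows a clear accuracy drop.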

A related task is detecting from EHR whether pediatricians attend to childhood obesity and its associated medical risks. This is the case of the paper by Turer et al. [81]. They used a dataset of doctor visits by >7000 overweight/obese children aged 6–12 years at several centers in Texas. They developed a rule-based classification algorithm to detect, from EHR, physician behaviors indicating therapeutic "attention towards excess BMI", "attention towards excess BMI + comorbidities (medical risk)", or "no attention". In addition to pathology codes, they used several types of EHR evidence: diagnosis codes, laboratory orders, medications, and referrals. The algorithm was externally validated by manual review of the EHR data of 309 additional visits. Sensitivity was 96% for attention to BMI alone and 96.1% for attention to BMI plus medical risk.
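A rule-based classifier of this kind is a set of explicit evidence checks per visit, whose output can then be compared against manual chart review. The sketch below is hypothetical throughout: the evidence vocabularies are placeholders rather than Turer et al.'s code lists, and the rule that medical-risk evidence only upgrades a visit that already shows attention to BMI is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Visit:
    """EHR evidence recorded for one visit (hypothetical structure)."""
    diagnoses: set = field(default_factory=set)
    lab_orders: set = field(default_factory=set)
    medications: set = field(default_factory=set)
    referrals: set = field(default_factory=set)

# Placeholder vocabularies (hypothetical, not the published code lists)
BMI_DIAGNOSES = {"obesity", "overweight"}
BMI_REFERRALS = {"dietitian", "weight_management_clinic"}
RISK_LAB_ORDERS = {"lipid_panel", "alt", "hba1c"}
RISK_DIAGNOSES = {"dyslipidemia", "prediabetes", "elevated_blood_pressure"}
RISK_MEDICATIONS = {"metformin"}

def classify_visit(v: Visit) -> str:
    """Assign one of the three attention categories from explicit evidence rules."""
    bmi = bool(v.diagnoses & BMI_DIAGNOSES or v.referrals & BMI_REFERRALS)
    risk = bool(v.lab_orders & RISK_LAB_ORDERS
                or v.diagnoses & RISK_DIAGNOSES
                or v.medications & RISK_MEDICATIONS)
    if bmi and risk:
        return "attention to excess BMI + medical risk"
    if bmi:
        return "attention to excess BMI"
    return "no attention"

def sensitivity(predicted, reference, category):
    """Share of visits labelled `category` by manual review that the rules also flag."""
    hits = [p == category for p, r in zip(predicted, reference) if r == category]
    return sum(hits) / len(hits)

print(classify_visit(Visit(diagnoses={"obesity"}, lab_orders={"lipid_panel"})))
```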

We end this section with an interesting paper by Nau et al. [78] describing a predictive model for obesogenic vs. obesoprotective community environments. Here, the aim is to predict not obesity in a particular child or adolescent, but whether the features of a community foster childhood obesity within it or, on the contrary, protect against it. The authors analyzed 99 communities in Pennsylvania, 50 in the highest quartile of childhood obesity prevalence and 49 in the lowest quartile. The model therefore uses community-aggregated data to predict obesogenic vs. obesoprotective communities. They used 44 potential predictor variables from different domains: food services, social, physical activity establishments, and land use. Variable importance measures from RF were used to identify the most important variables. A total of 13 were deemed important above noise; unemployment was the most important, followed by population density, social disorganization, proportion of people with less than high school education, population change, no car ownership, etc. These are physical activity and social variables. The most important food services variables were the counts of snack stores and of fast food chains. Models were also obtained without the social variables, which are considered causally upstream of the others; the results gave a similar ranking of the remaining variables. It seems that well-off communities are more protected against obesity. It was also observed that classification accuracies differed for high- and low-obesity communities, indicating different structures/hierarchies of variables in these two groups. The models, however, were not internally or externally validated.

#### **7. Discussion**

In the present Review, we have seen a large number of models to predict childhood/adolescent obesity, which we have grouped into two types: statistical and ML. The former use traditional statistical techniques, mainly logistic regression [34–37,39–46], although there are cases using linear regression [40], quantile regression [38], and ordinal logistic regression [47]. The ML models use a wide variety of ML methods: ANN [56–58,67,68,70,75], SVM [58,66,67], DT [58,64,65,67–70,73], NB [58,60–62,66,67], BN [58,65,67,76], LASSO [72,74], kNN [70], RF [59,65,68,72], GBM [72,77], and DL (RNN [71]).

In general, when logistic/linear regression and ML models are fitted to the same dataset within the same work [58,64,68,70,72,75], the latter give better prediction performance than the former. This indicates that ML techniques are able to yield better predictions, not just by fitting the training set better but also by giving better results in internal and/or external validations.

On the other hand, if we analyze the models in terms of predictor variables, we see that the statistical models mostly use a reduced set of well-established risk factors for childhood obesity: parental BMI, sex and birth weight, maternal smoking during gestation, weight gain during some previous period, parental education, exclusive breastfeeding during some initial period, etc. Only the work by Cortés-Martín et al. [47] uses a wider set of predictor variables, including a Mediterranean diet score, multiple SNPs, and a marker of microbiota (urolithin metabotype), in addition to sex, age, and ethnicity. In contrast, many ML models use other types of variables, alone or in combination with the "traditional" predictor variables. For example, the work of Rehkopf et al. [59] uses psychological predictor variables, that of Lazarou et al. [63] focuses mainly on diet, and that of Park et al. [74] uses rs-fMRI predictor variables. Several papers also use lifestyle variables (including both diet- and physical activity-related variables) [59–62,64,69,70]. Works that stand out for their especially wide sets of multidomain predictor variables are those of Rehkopf et al. [59] (diet; physical activity; and psychological, social, and parental health); Wiechman et al. [69] (demographics, caregiver feeding style, feeding practices, home environment, diet, social support, spousal support, family life, etc.); and Kim et al. [76] (wealth, smartphone use, pocket money, academic performance, sleep quality, etc.). The latter work is also interesting because it performs a "what-if" analysis in which some variables are modified and their concerted effect on the predicted obesity is evaluated; this is an interesting way to use ML models as simulation tools to suggest possible therapeutic or preventive interventions.

Therefore, we could say that the statistical models are probably better suited to earlier ages, when fewer and less variable factors are at play, or to predicting over shorter horizons. In those cases, we would mainly be performing a short extrapolation of the BMI curve: obese children would be those who were obese a short time before, and for babies or young children, gestational factors such as maternal smoking or gestational diabetes would also be important. These simpler models can be implemented immediately in clinics, as they contain a small number of easily retrieved predictor variables. On the contrary, once the multidomain factors of obesity (diet, physical activity, psychological, genetic, family environment, sociological, etc.) enter the scene, which happens in late childhood or adolescence, ML models are more appropriate. This also holds for predictions spanning long periods of time, like the model by Gupta et al. [71], which was developed to predict BMI and obesity from 3 to 20 years, or when higher prediction accuracy is required. Beyond prediction, these ML models are useful for ranking such wide sets of variables by importance, allowing researchers to better identify the strongest risk factors and generate new ideas for future preventive interventions [59,71,73,74,77,83]. In the case of longitudinal models, the periods of strongest influence can also be derived through attention techniques [71].

An especially interesting case from the point of view of predictor variables is that of ML models using EHR [65,66,71,72,77]. These appear to be very powerful approaches to predicting childhood/adolescent obesity, as they tap into large databases of medical records covering many patients and extended sets of predictor variables, including measurements, drug prescriptions, observed conditions, and requested procedures. Such data are especially amenable to DL techniques of the RNN type, which are specialized in handling time-series data. As described above, one such model is that of Gupta et al. [71], who were able to deliver excellent predictions of BMI and obesity along the whole childhood and adolescence growth curve. Another interesting use of RNNs in this framework would be the extraction of information from narrative data in medical records by means of NLP; the work by Lingren et al. [66] is an example of predictor variables extracted through NLP.

These ML/DL models using EHR could be implemented in hospitals and primary health care centers to provide predictions and alerts through *dynamic*, *online* training. By this, we mean a model that is continuously fed with new data and periodically retrained to improve its predictions. This is opposed to *static*, *offline* training, where the model is fitted once and for all on a fixed dataset. All the models we have reviewed fall into the latter category.

EHR also offer very interesting opportunities as Big Data sources for data mining through ML/DL models. For example, we have seen how Lingren et al. [66] exploited EHR to identify a cohort with severe early childhood obesity for further genotyping efforts. Many other applications are possible, such as analyzing and predicting comorbidities through statistical network analysis [84–86], phenotyping, diagnosis, pharmacoepidemiology, and pharmacovigilance [87].

In addition, we should mention the fruitful application of ML/DL models to childhood obesity prevention, not just through the identification of at-risk subpopulations but through the analysis of different aspects of preventive interventions. We have seen that these models can be useful to optimize obesity prevention strategies [79], predict their success [80], identify whether physicians' behaviors in the clinic attend to childhood obesity and related risks [81], and, from a community point of view, identify social environments with obesogenic properties that should be targeted by preventive government policies [78].

To summarize, ML/DL approaches offer extraordinary advantages and new insights for childhood and adolescent obesity prediction and prevention over statistical methods. The following points summarize them:


To be fair, we should also mention the *disadvantages* of ML methods relative to statistical ones. The first is that statistical inference (parameter estimation and hypothesis testing) is more complicated for these models than for statistical models. However, it is not impossible, and resampling and simulation techniques can be used if required. Another drawback of ML models is that they are more difficult to interpret, and they are typically called "black-box" models. This is an area of intense research, and we have seen above several examples of techniques that address this problem, namely variable importance, embedding, and attention.

Given the advantages and disadvantages of both types of methods, we can ask: when is ML more appropriate, and when are statistical models preferable? The following patterns have emerged:


Hopefully, this Review will help the wide range of researchers in the field, including pediatricians, nurses, nutritionists, statisticians, data scientists, engineers, and epidemiologists, gain an updated view of these novel approaches and the opportunities they open, so that the prevention of childhood and adolescent obesity can be approached more effectively and creatively. We are at the beginning of a qualitatively new phase that can revolutionize this field in the near future.

**Author Contributions:** Conceptualization and writing, G.C. The author has read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The author acknowledges the Community of Madrid Government for providing the funds for Open Access publication of this article.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

## **The Effect of Supportive Implementation of Healthier Canteen Guidelines on Changes in Dutch School Canteens and Student Purchase Behaviour**

**Irma J. Evenhuis 1,\* , Suzanne M. Jacobs <sup>2</sup> , Ellis L. Vyth <sup>1</sup> , Lydian Veldhuis <sup>2</sup> , Michiel R. de Boer <sup>1</sup> , Jacob C. Seidell <sup>1</sup> and Carry M. Renders <sup>1</sup>**


Received: 16 July 2020; Accepted: 10 August 2020; Published: 12 August 2020

**Abstract:** We developed an implementation plan including several components to support implementation of the "Guidelines for Healthier Canteens" in Dutch secondary schools. This study evaluated the effect of this plan on changes in the school canteen and on students' food and drink purchases. In a 6-month quasi-experimental study, ten intervention schools (IS) received support implementing the guidelines, and ten control schools (CS) received only the guidelines. Changes in the health level of the cafeteria and vending machines were assessed and described. Effects on students' self-reported purchase behaviour were analysed using mixed logistic regression analyses. After the intervention, IS scored higher than CS on healthier availability in the cafeteria (77.2% vs. 60.1%) and on accessibility (59.0% vs. 50.0%). IS also showed larger changes than CS in healthier offers in the cafeteria (IS: range −3 to 57%, mean change 31.4%; CS: range −9 to 46%, mean change 9.7%) and in accessibility (IS: range 0 to 50%, mean change 15%; CS: range −30 to 20%, mean change 7%). Multi-level logistic regression analyses of intervention/control status and the health level of the canteen in relation to purchase behaviour showed no relevant associations. In conclusion, the offered support resulted in healthier canteens. However, there was no direct effect on students' purchase behaviour during the intervention.

**Keywords:** schools; nutrition; canteen; adolescents; implementation; purchase behaviour

#### **1. Introduction**

To support adolescents in making healthier food choices, many national governments have formulated food policies to encourage a healthy offering of foods and drinks in schools and their canteens [1]. To create healthier canteens, nudging strategies are used, which make the healthier option easier without restricting freedom of choice [2]. Such strategies focus on availability and accessibility by offering mainly healthier products, discouraging the consumption of unhealthy foods by making them less readily available, making the healthier option the default, and promoting healthier products [3–6]. Evaluations of such strategies show improvements in the foods and drinks offered in schools, which are likely to influence students' consumption of healthier foods and drinks [4–7]. However, these results are only seen when the policy is implemented adequately [8,9], and adequate implementation can be increased with supportive implementation tools [10–12]. The provision and type of such tools differ within and across countries, though training, modelling, continuous support (such as helpdesks) and incentives are commonly provided [12].

In the Netherlands, most schools have no tradition of offering school meals, but do offer complementary foods and drinks in a cafeteria and/or vending machines. Most students bring their lunch from home and buy additional food and drinks at school or at shops around the school [13]. The national Healthy School Canteen Programme of the Netherlands Nutrition Centre, financed by the Dutch Ministry of Health, Welfare and Sports, provides schools with free support to create healthier canteens (cafeteria and/or vending machines) [14–16]. This includes, for example, a visit and advice from school canteen advisors (i.e., nutritionists), regular newsletters, and a website with information about and examples of healthier school canteens. The programme has been shown to lead to greater attention to nutrition in schools and a small increase in the offering of healthier foods and drinks in cafeterias, but not in vending machines [15,17,18]. However, up to that point, the programme included only availability criteria.

Based on the literature and in collaboration with future users and experts in the field of nutrition, the Netherlands Nutrition Centre developed the "Guidelines for Healthier Canteens" in 2014 and updated them in 2017 [19]. These guidelines include criteria on both the availability and accessibility of healthier foods and drinks (including tap water) and on an anchoring policy. The guidelines distinguish three incremental health levels: bronze, silver and gold [19]. Only silver (≥60%) and gold (≥80%) qualify for the label "healthier school canteen". The guidelines define healthier products as foods and drinks recommended in the Dutch Wheel of Five Guidelines, as well as products not included in it that contain limited amounts of calories, saturated fat and sodium [20]. To increase dissemination of the guidelines, an implementation plan was developed, based on experience within the Healthy School Canteen Programme and in collaboration with stakeholders from policy, practice and science [21]. This study investigated the effect of this implementation plan on both changes in the health level of the canteen and students' purchase behaviour. Moreover, the relation between the health level of the canteen and purchase behaviour was examined.

#### **2. Materials and Methods**

#### *2.1. Study Design*

The effect of the implementation plan was evaluated in a 6 month quasi-experimental controlled trial with 10 intervention and 10 control schools, between October 2015 and June 2016. The control schools were matched to intervention schools on the pre-defined characteristics: school size (fewer or more than 1000 students); level of secondary education (vocational or senior general/pre-university); and how the catering was provided (by a catering company or the school itself). Additionally, we aimed to match the control schools to intervention schools on contextual factors: the availability of shops near the school and the presence of school policy to oblige students to stay in the schoolyard during breaks. Intervention schools received support to implement the Guidelines for Healthier Canteens according to the plan (the intervention), while control schools received only general information about the guidelines, although they also received the support after the intervention period. Further details about the study design are provided in the study protocol [22]. This study was registered in the Dutch Trial Register (NTR5922) and approved by the Medical Ethical Committee of the VU University Amsterdam (Nr. 2015.331).

#### *2.2. Study Population*

The schools, in the western and central Netherlands, were recruited via the Netherlands Nutrition Centre and caterers. Inclusion criteria were (a) presence of a cafeteria, (b) willingness to create a healthier school canteen, and (c) willingness to provide time, space and consent for the researchers to collect data from students, employees and canteen workers. The exclusion criteria were (a) the school had already started to implement the Guidelines for Healthier Canteens, and (b) the school had already received personalized support on implementing a healthier canteen from a school canteen advisor from the Netherlands Nutrition Centre in 2015. In all participating schools, students were recruited by class. In each school, we recruited 100 second- or third-year Dutch-speaking students (aged 13–15 years), equally distributed over the school's offered education levels. Parents and students received information about the study and the option to decline participation. Figure 1 shows the flow diagram of the inclusion of the schools and students.

**Figure 1.** The CONSORT flow diagram of the present study [23].

#### *2.3. Intervention*

The intervention consisted of the implementation plan to support schools in creating a healthier school canteen, as defined by the Guidelines for Healthier Canteens. This plan was developed in a 3-step approach based on the "Grol and Wensing Implementation of Change model" [24] in collaboration with stakeholders, as described elsewhere [21], and delivered by school canteen advisors of the Netherlands Nutrition Centre, in collaboration with researchers of the Vrije Universiteit Amsterdam.

The intervention started with gaining insight into the context and current situation of the school and the canteen. For this purpose, involved stakeholders (e.g., teacher, school management, caterer, canteen employee) filled out a questionnaire on the schools' characteristics (educational level, number of students) and their individual (e.g., knowledge, motivation) and environmental (e.g., need for support, the innovation) determinants. School canteen advisors also measured the extent to which canteens met the Guidelines for Healthier Canteens, using the online tool "the Canteen Scan" [25]. Based on these findings, school canteen advisors provided tailored advice in an advisory meeting where all involved stakeholders discussed aims and actions to achieve a healthier canteen. Stakeholders also received communication materials about the Guidelines for Healthier Canteens, including a brochure with examples of, and advice on, how to promote healthier products. All stakeholders of all intervention schools were invited to a closed Facebook community to share experiences, ask questions and to support each other. In addition, to remind and motivate stakeholders, a newsletter with information and examples was sent by email once every 6 weeks. Finally, to gain insight into their students' opinion, students were asked to fill in a questionnaire (the same as used for the effect evaluation), and the results were fed back to schools in an attractive fact sheet.

#### *2.4. Measurements*

Measurements in the school canteens and among students were performed before and directly after the intervention period. The "health level" of the school canteen was measured in all participating schools using the online Canteen Scan [25], filled out by a school canteen advisor. When used by a school canteen advisor, the tool has shown satisfactory inter-rater reliability and criterion validity, with a weighted Cohen's kappa > 0.60 [22]. Only intervention schools received the results of the Canteen Scan as part of the intervention.
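Weighted Cohen's kappa credits partial agreement between ordinal ratings: κ_w = 1 − Σ w_ij O_ij / Σ w_ij E_ij, where O and E are the observed and chance-expected proportion tables and w_ij is a disagreement weight. The weighting scheme used for the Canteen Scan evaluation is not stated here, so the sketch below assumes linear weights by default (quadratic weights are optional); the ratings in the example are hypothetical.

```python
def weighted_kappa(r1, r2, categories, weights="linear"):
    """Weighted Cohen's kappa for two raters on ordered categories."""
    k, n = len(categories), len(r1)
    idx = {c: i for i, c in enumerate(categories)}
    obs = [[0.0] * k for _ in range(k)]          # observed proportion table
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]        # rater 1 marginals
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]  # rater 2 marginals

    def w(i, j):
        """Disagreement weight: 0 on the diagonal, 1 at the most distant categories."""
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d ** 2

    observed = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa undefined when all ratings fall in one category")
    return 1.0 - observed / expected

# Hypothetical ordinal ratings of the same canteen items by two advisors
levels = ["low", "medium", "high"]
advisor_a = ["low", "medium", "high", "high", "medium", "low"]
advisor_b = ["low", "medium", "medium", "high", "medium", "low"]
print(weighted_kappa(advisor_a, advisor_b, levels))
```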

Students reported their purchases via an online questionnaire filled out in a classroom under supervision of a teacher and/or researcher. Data on demographics and behavioural and environmental determinants were also collected [26]. The questions were derived from validated Dutch questionnaires [27–31], and the questionnaire was pretested for comprehensibility and length in a comparable population using the think-aloud cognitive interview method [32].

#### 2.4.1. Health Level of the School Canteen

The Canteen Scan assessed the extent to which a canteen complies with the four subtopics of the Guidelines for Healthier Canteens: (1) a set of four basic conditions for all canteens, (2) the percentage of healthier foods and drinks available in the cafeteria (at the counter, display, racks) and (3) in vending machines and (4) the percentage of accessibility for healthier food and drink products [19,25]. According to these guidelines, a canteen is healthy if all basic conditions are fulfilled, if the percentage of healthier foods and drinks available is at least 60% in the cafeteria and in vending machines, if fruit or vegetables are offered, and if the percentage of fulfilled accessibility criteria is also at least 60%. As the basic conditions overlap with the availability and accessibility scores, this subtopic was not used in the analyses. For the other three subtopics, the change between pre- and post-measurement was calculated for each school.
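The compliance rule just described can be written as a simple check, together with the per-school pre/post change computed for the three analysed subtopics. The field names and example values are hypothetical; the 60% thresholds and the conditions come from the text.

```python
from dataclasses import dataclass

THRESHOLD = 60.0  # % required for availability (cafeteria, vending) and accessibility

@dataclass
class CanteenScan:
    """One Canteen Scan result (hypothetical field names)."""
    basic_conditions_met: bool
    pct_healthier_cafeteria: float
    pct_healthier_vending: float
    offers_fruit_or_veg: bool
    pct_accessibility: float

def is_healthier_canteen(s: CanteenScan) -> bool:
    """All conditions of the guideline must hold simultaneously."""
    return (s.basic_conditions_met
            and s.pct_healthier_cafeteria >= THRESHOLD
            and s.pct_healthier_vending >= THRESHOLD
            and s.offers_fruit_or_veg
            and s.pct_accessibility >= THRESHOLD)

def subtopic_changes(pre: CanteenScan, post: CanteenScan) -> dict:
    """Post-minus-pre change for the three subtopics used in the analyses."""
    return {
        "cafeteria": post.pct_healthier_cafeteria - pre.pct_healthier_cafeteria,
        "vending": post.pct_healthier_vending - pre.pct_healthier_vending,
        "accessibility": post.pct_accessibility - pre.pct_accessibility,
    }

pre = CanteenScan(False, 50.0, 40.0, True, 30.0)
post = CanteenScan(True, 70.0, 45.0, True, 50.0)
print(is_healthier_canteen(post), subtopic_changes(pre, post))
```

In this example the post-intervention canteen still fails the guideline because both vending availability and accessibility remain below 60%, even though all three subtopics improved.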

In the Canteen Scan, all visible foods and drinks available in the cafeteria (counter, display, racks) and in vending machines were entered. The scan automatically identifies whether, according to the Dutch Wheel of Five Guidelines [30], an entered product is healthier or less healthy, and calculates the percentage of healthier products. In addition, to assess the accessibility of healthier foods and drinks, nine criteria (eight multiple-choice and one multiple-answer) were answered, creating a score ranging from 0 to 90%. These questions relate to the attractive placement of healthier products in the cafeteria and vending machines; the offer at the cash desk; the offer along the route through the cafeteria; attractive presentation of fruit and vegetables; promotions for healthier products only; mostly healthier items on the menu/price list; and advertisements/visual materials for healthier products only. Questions include, for example, "Are only healthier foods and drinks offered at the cash desk?" and "Are fruit and vegetables presented in an attractive manner?"
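The availability score is simply the share of entered products that are classified as healthier, computed separately per location. The sketch below assumes each product has already been flagged as healthier or less healthy (in the Canteen Scan the tool does this automatically against the Wheel of Five); the product names and flags are hypothetical.

```python
def pct_healthier(products):
    """Percentage of (name, is_healthier) entries flagged as healthier."""
    if not products:
        raise ValueError("no products entered")
    return 100.0 * sum(1 for _, healthier in products if healthier) / len(products)

# Hypothetical entries per location
cafeteria = [("tap water", True), ("wholemeal sandwich", True),
             ("candy bar", False), ("apple", True)]
vending = [("cola", False), ("water", True)]
print(pct_healthier(cafeteria), pct_healthier(vending))  # 75.0 50.0
```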

#### 2.4.2. Self-Reported Purchase Behaviour of Students

Purchase behaviour was measured by assessing the frequency of purchases per food group (sugary drinks, sugar-free drinks, fruit, sweet snacks, etc.) over the previous week, separately for the cafeteria and the vending machines. If students stated that they bought less than once per week, they reported the frequency of purchases in the last month. Students who did not buy anything at either time point were excluded (*n* = 192), as they provide no information about the relation between the intervention and their purchases. Groups of foods and drinks were classified as healthier or less healthy, as defined by the Dutch Wheel of Five Guidelines [20]. All reported healthier purchases in the cafeteria and vending machines, respectively, were summed, as were the less healthy purchases. As the data were not normally distributed, we dichotomised the variable: the pre-intervention frequencies were subtracted from the post-intervention frequencies, and the differences were categorised into a dichotomous variable indicating a healthy or unhealthy change in purchase behaviour. A healthy score was defined as (1) a higher increase in healthier products compared with less healthy products; (2) a higher decrease in less healthy products compared with healthier products; or (3) purchases remained stable over time and consisted mainly of healthier products. An unhealthy score was defined as (1) a higher increase in less healthy products compared with healthier products; (2) a higher decrease in healthier products compared with less healthy products; or (3) purchases remained stable over time and consisted mainly of less healthy products or of an equal number of healthier and less healthy products.
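The dichotomisation rules above reduce to comparing the change in healthier purchases with the change in less healthy ones. The sketch below is one reading of those rules, with two labelled assumptions: "stable" is taken to mean that both changes are equal, and in that case composition is judged on post-intervention counts.

```python
def change_score(pre_healthier, pre_less_healthy, post_healthier, post_less_healthy):
    """Dichotomise a student's change in purchases as 'healthy' or 'unhealthy'."""
    d_healthier = post_healthier - pre_healthier
    d_less_healthy = post_less_healthy - pre_less_healthy
    if d_healthier > d_less_healthy:
        return "healthy"      # healthier rose more, or less healthy fell more
    if d_healthier < d_less_healthy:
        return "unhealthy"    # less healthy rose more, or healthier fell more
    # Assumption: equal changes count as "stable"; judge composition after the intervention
    return "healthy" if post_healthier > post_less_healthy else "unhealthy"

# Hypothetical weekly purchase counts (pre healthier, pre less healthy, post healthier, post less healthy)
print(change_score(2, 2, 5, 2))  # healthy: healthier purchases increased more
print(change_score(1, 1, 1, 1))  # unhealthy: stable with an equal mix
```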

#### 2.4.3. Other Student Variables

Demographic student variables included age (in years), gender and current school level (vocational (i.e., VMBO), senior general education (i.e., HAVO) or pre-university education (i.e., VWO)). Determinants of purchase behaviour included attitudes, subjective norms, perceived behavioural control and intention, all towards buying healthier products at school. For each variable, multiple questions (range 2–5) were asked on a 5-point Likert scale (answers ranging from, e.g., 1 = very unlikely to 5 = very likely), derived from existing validated Dutch questionnaires [27,28]. The mean score of each variable was calculated, and the reliability of the measurements was assessed with Cronbach's alpha [33]. The measured environmental determinants were having breakfast (Yes, No); amount of money spent on food/drink purchases at school per week (<€1, €1–2, ≥€2); food/drink purchases outside school (<1 time p/w, 1–3 times p/w, ≥4 times p/w); and foods/drinks brought from home (<4 times p/w, ≥4 times p/w).
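Scale scores and reliability can be computed as in the Python sketch below. Each determinant is the mean of its Likert items, and Cronbach's alpha uses the standard formula $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$ for $k$ items, where $\sigma_T^2$ is the variance of the summed item scores. This is a minimal illustration that assumes complete responses (no missing items). The function names are hypothetical, and it is not the SPSS routine the authors used.

```python
from statistics import mean, variance


def scale_scores(responses: list[list[float]]) -> list[float]:
    """Mean Likert score per respondent; responses[r][i] is item i of respondent r."""
    return [mean(row) for row in responses]


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple of answers per item
    item_var = sum(variance(item) for item in items)  # sample variances
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical three-item attitude scale answered by four students (1-5).
answers = [[4, 5, 4], [2, 2, 3], [3, 4, 3], [5, 5, 4]]
print(scale_scores(answers))
print(round(cronbach_alpha(answers), 2))
```

Using the sample variance for both the item and total variances keeps the ratio identical to the population-variance version, so either convention gives the same alpha.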

#### *2.5. Sample Size*

The sample size was calculated based on the outcome purchase behaviour, an expected 10% dropout, 80% power and a 5% significance level [34]. The calculation showed that 20 schools with 100 students per school were necessary to detect a 10% difference in students' purchase behaviour (continuous variable), taking into account the expected multilevel structure (students within schools, intra-class correlation of 0.05).
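A common way to combine the ingredients listed above (power, significance level, dropout and clustering) is to start from the two-sample normal-approximation formula for a standardised difference $d$ and multiply by the design effect $1 + (m-1)\,\mathrm{ICC}$ for clusters of size $m$. The Python sketch below follows that textbook approach. It is an assumption, not necessarily the exact calculation in [34]. Because the SD behind the "10% difference" is not reported here, the effect size `effect_size_d` is a hypothetical input, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation due to clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def n_per_arm(effect_size_d: float, cluster_size: int, icc: float,
              dropout: float = 0.10, alpha: float = 0.05, power: float = 0.80) -> int:
    """Students per arm for a two-sided comparison of two means (standardised d)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    n_individual = 2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2
    n_clustered = n_individual * design_effect(cluster_size, icc)
    return math.ceil(n_clustered / (1 - dropout))  # inflate for expected dropout


# With 100 students per school and ICC = 0.05, the design effect is about 5.95.
print(design_effect(100, 0.05))
print(n_per_arm(effect_size_d=0.3, cluster_size=100, icc=0.05))  # hypothetical d
```

The design effect shows why clustering matters. With 100 students per school, an ICC of only 0.05 inflates the required number of students nearly six-fold compared with an individually randomised study.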

#### *2.6. Statistical Analyses*

Student baseline characteristics, pre- and post-intervention canteen outcomes, and student purchase behaviour were described by means and standard deviations. Canteen outcomes included three subtopics of the health level of the canteen: healthier foods and drinks available in the cafeteria, healthier foods and drinks available in the vending machines, and accessibility of healthier foods and drinks. Mean (SD) pre- and post-intervention values and mean changes were described, and changes in the subtopics per school were presented in histograms.
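As a minimal illustration of these descriptive summaries (not the authors' SPSS procedure), the Python sketch below computes the per-school change in one subtopic score as post minus pre. It then reports the mean (SD), the range and the number of schools with a positive change, which are the quantities given in the canteen results. The function name and the example scores are hypothetical.

```python
from statistics import mean, stdev


def summarise_changes(pre: list[float], post: list[float]) -> dict:
    """Summarise per-school changes (post - pre, in percentage points)."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need paired pre/post scores for at least two schools")
    changes = [after - before for before, after in zip(pre, post)]
    return {
        "changes": changes,
        "mean": mean(changes),
        "sd": stdev(changes),  # sample SD across schools
        "range": (min(changes), max(changes)),
        "n_increased": sum(c > 0 for c in changes),
    }


# Hypothetical cafeteria scores (% healthier products) for three schools.
summary = summarise_changes(pre=[40, 50, 60], post=[70, 50, 55])
print(summary["range"], summary["n_increased"])  # prints (-5, 30) 1
```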

A mixed logistic regression analysis [35] was performed to investigate the effect of the intervention (independent variable) on purchase behaviour (dependent variable). Correlated errors of student scores (level 1) nested within schools (level 2) were taken into account by including a random intercept for schools in all analyses (model 1). The analyses were stratified by gender, as boys seem to react more to environmental changes than girls [36]. Models were first extended with demographic variables (model 2), then with students' behavioural determinants (model 3) and finally with students' environmental determinants (model 4).
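In equation form, a standard way to write this random-intercept logistic model is shown below. The notation is ours, not the authors'. Model 1 contains only the intervention term, models 2 to 4 add the listed covariates $\mathbf{x}_{ij}$ as fixed effects, and each model is fitted separately for boys and girls.

$$
\operatorname{logit}\Pr(Y_{ij}=1) = \beta_0 + \beta_1\,\mathrm{Intervention}_j + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij} + u_{0j},
\qquad u_{0j} \sim \mathcal{N}(0, \sigma_u^2)
$$

Here $Y_{ij} = 1$ if student $i$ in school $j$ has a healthy change score, $\mathrm{Intervention}_j$ is a school-level indicator, and the reported odds ratio for the intervention is $e^{\beta_1}$, interpreted conditionally on the school random intercept $u_{0j}$.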

The effect of a healthier canteen (independent variable) on student purchase behaviour (dependent variable) was also assessed using mixed logistic regression analyses with a random intercept for schools, for boys and girls separately. We used the health level of the canteen at follow-up for each of the three subtopics of a healthier canteen. Because the relation with student purchase behaviour was non-linear, a dichotomous variable was again created, based on the guidelines, which state that a score of 60% or higher indicates healthier availability or accessibility, respectively. Again, the model was extended with demographic variables (model 2) and students' behavioural (model 3) and environmental determinants (model 4). Statistical analyses were performed using IBM SPSS Statistics version 24.0 (IBM Corporation (IBM Nederland), Amsterdam, The Netherlands). Odds ratios and 95% confidence intervals (CIs) are presented.
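Two steps in this paragraph can be sketched in Python: dichotomising each canteen subtopic at the 60% guideline threshold, and converting a logistic-regression coefficient into an odds ratio with a 95% CI. This is an illustration under assumptions. The function names are hypothetical, and the interval is a Wald-type interval on the log-odds scale, which may differ slightly from the intervals SPSS reports for mixed models.

```python
import math
from statistics import NormalDist


def healthier_canteen(score_percent: float, threshold: float = 60.0) -> bool:
    """Guideline cut-off: a subtopic score of 60% or higher counts as healthier."""
    return score_percent >= threshold


def odds_ratio_ci(beta: float, se: float, level: float = 0.95) -> tuple[float, float, float]:
    """Odds ratio and Wald CI from a coefficient and its SE on the log-odds scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


print(healthier_canteen(77.2), healthier_canteen(50.0))  # prints True False
or_, lower, upper = odds_ratio_ci(beta=-0.08, se=0.20)  # hypothetical coefficient
print(f"OR {or_:.2f} (95% CI {lower:.2f}; {upper:.2f})")
```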

#### **3. Results**

#### *3.1. Baseline Characteristics*

We included data from 645 students of the intervention schools and 731 students of the control schools in the analyses (Table 1). Both groups consisted of more girls than boys (56% and 53%, respectively). The included schools offered education at the vocational level (*n* = 6), the senior general/pre-university level (*n* = 5), or a combination of both levels (*n* = 9). The level of education was broadly similar for intervention and control schools. However, in intervention schools, slightly more girls than boys followed the vocational education level (46.6% vs. 41.4%), while the opposite was the case in control schools (girls, 39.5%; boys, 46.2%). Most students indicated that they brought food and drinks from home to school four or more times a week (for food, intervention schools (IS) 91.8% and control schools (CS) 89.2%; for drinks, IS 90.4% and CS 88.5%). The majority of students reported that they bought foods or drinks in the school cafeteria (IS 55.5%; CS 64.4%) or vending machine (IS 63.6%; CS 61.1%) less than once per week. During school time, 62.2% and 67.6% of the students in the IS reported buying food or drinks outside school less than once a week, compared to 65.6% and 73.6% in the CS.


**Table 1.** Baseline characteristics of students divided by intervention or control school and gender.



<sup>a</sup> Per variable, multiple questions (range 2–5) were asked on a 5-point Likert scale (answers ranging from 1 = very unlikely to 5 = very likely). <sup>b</sup> This variable was not used as a confounder in the multi-level analyses due to its similarity with the outcome variable purchase behaviour per week. <sup>c</sup> For this variable, the control group has 40 fewer students (19 boys, 21 girls), as one school did not have a vending machine.

#### *3.2. Intervention Effect on Health Level of the Canteen*

Table 2 shows that, after the intervention, intervention schools (IS) scored higher than control schools (CS) on the healthier offering in the cafeteria (77.2% vs. 60.1%). Figure 2 confirms this and shows that nine of the ten IS increased the healthier offering (range of all IS: −3 to 57%, mean change 31.4%). In comparison, eight of the ten CS showed positive changes, but the change (range of all CS: −9 to 46%, mean change 9.7%) was smaller than in the IS. The healthier offering in vending machines increased in five of the ten IS (range of all IS: −15 to 33%, mean change 5.1%) and in three of the nine CS (range of all CS: −14 to 48%, mean change 5.3%) (Figure 3); on average, both groups made broadly similar changes in their offer (Table 2). With regard to the accessibility criteria, both groups showed overall increases, although two CS also showed decreases (Figure 4). The change in IS was larger than in CS (range of all IS: 0 to 50%, mean change 15%; range of all CS: −30 to 20%, mean change 7%), resulting in mean scores of 59% (IS) and 50% (CS) fulfilled accessibility criteria after the intervention.



**Table 2.** Subscores of a healthier canteen pre- and post-intervention, stratified by intervention and control schools.

<sup>a</sup> Mean score (SD). <sup>b</sup> Scores in percentage (0–100%). <sup>c</sup> One control school did not have a vending machine (*N* = 9, in control schools). <sup>d</sup> Nine criteria could be fulfilled, scoring 10% per criterion (0–90%).

(**a**) Control Schools (**b**) Intervention Schools

**Figure 2.** Histogram of the changes in healthier products available in the cafeteria.

(**a**) Control Schools (**b**) Intervention Schools

**Figure 4.** Histogram of the changes in fulfilled accessibility criteria.

#### *3.3. Purchases in the Cafeteria*

Data on self-reported purchase behaviour at the cafeteria were included in the analysis from 1213 students (548 boys, 665 girls) (Table 3). Mean purchases of all foods and drinks per week varied between 0.46 and 1.72 per person. Both boys and girls bought more "less healthy" than healthier products. With regard to changes in weekly purchases in the cafeteria after 6 months, 50% of the boys of the IS maintained or changed to healthier purchase behaviour (Table 3). In boys of the CS, this percentage was 51.5%. Among girls, 53.6% maintained or changed to a healthier purchase behaviour in the IS, compared to 46.5% in the CS.


**Table 3.** Weekly food and drink purchases in the cafeteria.

<sup>a</sup> From each student, the difference between T0 and T1 has been calculated. Equal or bigger change in healthier products compared to less healthy products has been defined as a healthy score.

#### *3.4. Purchases at the Vending Machines*

Data on self-reported purchase behaviour at vending machines were available for 1217 students (542 boys, 675 girls) (Table 4). In the IS, boys and girls, respectively, bought on average 0.79 and 1.48 healthier, and 0.88 and 1.40 less healthy products per week in vending machines after the intervention. Boys and girls in the CS bought on average 1.13 and 0.87 healthier, and 1.40 and 0.83 less healthy products per week in vending machines after the intervention, respectively. After 6 months, in both the IS and CS, about half of the boys maintained or changed to a healthier purchase behaviour (both 49.3%). Among girls, approximately half in the IS (47.3%) and CS (52.0%) maintained or changed to a healthier purchase behaviour after 6 months.

**Table 4.** Weekly food and drink purchases at the vending machine.


<sup>a</sup> From each student, the difference between T0 and T1 has been calculated. Equal or bigger change in healthier products compared to less healthy products has been defined as a healthy score.

#### *3.5. Purchase Behaviour Analysed by Mixed Logistic Regression Analyses*

The results of the mixed logistic regression analyses showed that the odds of healthier versus less healthy purchase behaviour were approximately equal for students in the intervention and control schools (Table 5). In boys, we found odds ratios of 0.92 (95%CI 0.62; 1.36) for cafeteria purchases and 1.02 (95%CI 0.62; 1.67) for vending machine purchases. Girls showed an odds ratio of 1.29 (95%CI 0.85; 1.96) for cafeteria purchases and 0.84 (95%CI 0.62; 1.14) for vending machine purchases. Adjustment for demographic (model 2), behavioural (model 3) and environmental variables (model 4) did not materially change the results.
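Each of these intervals contains 1, so none rules out a null effect. As a rough cross-check, the Python sketch below back-calculates an approximate two-sided p-value from a reported odds ratio and its 95% CI. It assumes a Wald-type interval that is symmetric on the log scale, so that the SE of the log odds ratio is the CI width on the log scale divided by $2 \times 1.96$. Because the published values are rounded, the results are approximations, not the p-values from the authors' models. The function name is hypothetical.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96


def p_from_or_ci(odds_ratio: float, lower: float, upper: float) -> float:
    """Approximate two-sided p-value from an odds ratio and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)  # SE of log(OR)
    z = math.log(odds_ratio) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Odds ratios and 95% CIs as reported in the text above.
reported = {
    "boys, cafeteria": (0.92, 0.62, 1.36),
    "boys, vending machines": (1.02, 0.62, 1.67),
    "girls, cafeteria": (1.29, 0.85, 1.96),
    "girls, vending machines": (0.84, 0.62, 1.14),
}
for label, (or_, lo, hi) in reported.items():
    print(f"{label}: p approx. {p_from_or_ci(or_, lo, hi):.2f}")
```

All four approximate p-values lie well above 0.05, which is consistent with the intervals containing 1.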

**Table 5.** Mixed logistic regression analyses on the effect of the intervention (ref. group is control group) on changes in purchase behaviour.


<sup>a</sup> Dichotomous outcome: healthier vs. less healthy changes in purchases over time. <sup>b</sup> Model 1 = mixed logistic regression analysis, corrected for school. <sup>c</sup> Model 2 = Model 1, plus corrected for demographic variables (age, education). <sup>d</sup> Model 3 = Model 2, plus corrected for behavioural determinants (attitude, subjective norm, perceived behavioural control, intention); <sup>e</sup> Model 4 = Model 3, plus corrected for environmental determinants (amount of money spent in school p/w, breakfast, food purchases outside school, drink purchases outside school, food brought from home, drinks brought from home).

The analyses of the effect of a healthier canteen (healthier versus less healthy (ref. group) availability in the cafeteria, vending machine or accessibility) on purchase behaviour showed ORs ranging from 0.87 (95%CI 0.61–1.26) for combined purchases in girls to 1.27 (95%CI 0.75–2.17) for purchases in vending machines in boys (Table 6). Adjustment for demographic (model 2), behavioural (model 3) and environmental variables (model 4) again did not materially change the results.


**Table 6.** Mixed logistic regression analyses on the effect of a healthier canteen (ref. group not healthy) on changes in purchase behaviour.

<sup>a</sup> Dichotomous outcome: healthier vs. less healthy changes in purchases over time. <sup>b</sup> Healthier canteen, measured with the subtopic healthier products available in cafeteria (≥60%, <60% (ref. group)). <sup>c</sup> Healthier canteen, measured with the subtopic healthier products available at vending machines (≥60%, <60% (ref. group)). <sup>d</sup> Healthier canteen, measured with the subtopic fulfilled healthier accessibility criteria (≥60%, <60% (ref. group)). <sup>e</sup> Model 1 = mixed logistic regression analysis, corrected for school. <sup>f</sup> Model 2 = Model 1, plus corrected for demographic variables (age, education). <sup>g</sup> Model 3 = Model 2, plus corrected for behavioural determinants (attitude, subjective norm, perceived behavioural control, intention); <sup>h</sup> Model 4 = Model 3, plus corrected for environmental determinants (amount of money spent in school p/w, breakfast, food purchases outside school, drink purchases outside school, food brought from home, drinks brought from home).

#### **4. Discussion**

We investigated the effect of support in implementing the "Guidelines for Healthier Canteens" on changes in the school canteen (cafeteria and vending machines) and on students' food and drink purchases. Our results show that the support led to actual changes in the availability and accessibility of healthier products in the canteen. We did not observe changes in students' purchase behaviour. The large majority of students (90%) reported that they usually bring food or drinks from home, and most students (approximately 80%) reported buying food or drinks in school only once a week or less.

Schools that received support showed a larger increase in the availability of healthier products in the cafeteria compared to control schools. The intervention schools also complied with more criteria for the accessibility of healthier products than the control schools. These results are in line with previous studies which also showed that implementation support is likely to increase the use of guidelines, especially if it consists of multiple components and is both practice and theory-based [24,37]. The support we offered was targeted at different stakeholder-identified impeding factors related to implementation of the guidelines, such as knowledge and motivation. The process evaluation already showed that our implementation plan favourably influenced these factors [38].

With regard to vending machines, changes were smaller and occurred in fewer schools than in the cafeteria. This may be explained by the fact that schools do not always own or regulate the content of the vending machines themselves, but outsource them to external parties such as caterers or vending machine companies. Some schools were therefore unable to change the offering and positioning of products in the machines within the study period. Previous research showed that vending machines were healthier if arrangements about a healthy offer were included in agreements with caterers or vending machine companies [39]. Making agreements about the availability and accessibility of healthy products in the machines is therefore recommended.

In contrast to the changes in the canteen, we did not observe relevant differences in the change in healthier purchases between students in intervention and control schools, nor between students from schools with a healthier canteen and students from schools with a less healthy canteen. A possible explanation is that the intervention lasted only four to six months, which proved too short for the schools to make changes: we noticed that in most canteens, changes were made just before the post-measurements. As a result, students did not have enough time to get used to the new situation and to adapt their purchases. The effects of a healthier canteen on students' purchases therefore remain unknown. Our results contrast with many other studies showing that increasing the offering of healthier products and changing placement and promotion in favour of healthier products are likely to lead to healthier food choices among customers [4,40–43]. However, reviews have also reported contradictory results [44] and emphasised the low quality of the studies [43], indicating that more research is needed.

Changing dietary behaviour is complex and is affected by multiple individual, social and environmental factors [45–47], for example, the palatability, price and convenience of foods offered in environments that youth visit regularly, including the school canteen and shops around schools [13,45,48]. During adolescence, many factors that influence youth's dietary choices change: adolescents become more independent, parental influence decreases and the influence of peers increases, living environments expand, and they have more money to spend [49,50]. These changes provide opportunities to develop healthy dietary habits that are likely to be sustained over time [51]. Even though our study did not show a relation between a healthier canteen and healthier purchase behaviour, we recommend that healthier food choices be facilitated in school canteens, including vending machines, as these are places that students visit regularly and where they can autonomously choose what they buy. This might influence student purchase behaviour directly at the school canteen or in shops around schools, and contributes to educating adolescents about healthy norms [52]. This enables all youth to experience that healthy eating is important, tasty and very common, an experience they can draw on throughout their lives.

A strength of our study is that the support consisted of multiple implementation tools, and stakeholders could decide which to use, as well as when and how. Moreover, our study included tailored advice. Previous research has shown that both a combination of components and tailored advice could increase the likelihood of an effective implementation plan [37,53]. Other strengths of our study are the measurement of outcomes at both the canteen and the student level and the separate analyses for boys and girls. In general, boys are more likely to make impulsive, intuitive choices [41]. In contrast, girls are more likely to overthink their choices, limiting the effect of an attractive food offering. In our study, subtle differences between genders were observed, with boys reporting buying food and drinks outside school more often. However, this finding should be further explored in future studies.

There are also some study limitations that should be mentioned. First, purchase behaviour was investigated with self-reported questionnaires. These measurements are potentially subject to reporting bias and socially desirable answers, likely leading to a smaller number of reported purchases overall and a larger number of reported healthier products. Possibilities to measure students' dietary behaviour more objectively and regularly include, for example, meal observations, sales data or Ecological Momentary Assessment (EMA) [54,55]. We could not use these options due to feasibility constraints; for example, using sales data was not possible because of differing registration systems. Another limitation is the study duration of four to six months. A study duration of at least one school year would align with the schools' daily practice and would give schools the opportunity to form a team of involved people, to embed actions and to make changes.

The fact that the intervention was individualized to the contextual factors and needs of each school is both a strength and a limitation. Aligning the advice with a school's situation might lead to more useful support but can also make it more difficult to compare results between intervention schools. Therefore, it is important (1) to describe the core intervention functions of each tool of the implementation plan, so that all schools can be offered the same core support, and (2) to measure whether the tools have been delivered and used as planned [12,56,57]. In our case, the core elements of the intervention have been described in the study design [34]. In addition to the effect evaluation, we also evaluated the quality of implementation to assess whether schools received each implementation tool [38].

A final limitation is that, due to the skewness of our purchase data and the non-linearity of some of the relations under study, we decided to dichotomize our data. This reduced statistical power and led to some loss of information.

Based on our results, we recommend that future studies investigate the sustainability of supportive implementation of food environment policy. In addition, we recommend longer-term studies that assess changes in students' purchases, both inside school and in shops around school, after an adaptation period.

Our results confirm that adolescents in the Netherlands bring most of their food and drinks from home and additionally buy food inside as well as outside school. Attention to the home environment and the environment around school is therefore needed. The complexity of the food environment at schools within this broader food environment makes the use of whole system-based approaches important [13,46]. Different relevant stakeholders such as parents, shopkeepers, and local policy makers should be actively involved in this approach. Moreover, a healthy school environment not only consists of a healthy canteen, including vending machines, but also includes food education and integration with other school health promotion policies [58]. This is important, as schools contribute to the personal development of youth, in which learning to make healthy lifestyle choices in an obesogenic environment is an essential part.

#### **5. Conclusions**

This study investigated the changes in Dutch school canteens and self-reported student purchase behaviour after support to implement the Guidelines for Healthier Canteens compared to no support. We conclude that such support appears to contribute to healthier canteens. Our results did not show an effect of the implementation on the healthiness of students' purchase behaviour, perhaps due to the short time between the changes made in the canteen and our follow-up measurements. Because this study was performed in collaboration with the Netherlands Nutrition Centre and involved stakeholders, our research results are likely to be implemented in daily practice. More system-based approaches are warranted to influence students' dietary behaviour. Additionally, long-term research investigating the effects of healthier school canteens is needed.

**Author Contributions:** C.M.R., E.L.V. and J.C.S. designed the research. I.J.E. conducted the research, supported by S.M.J. and L.V. I.J.E. performed the data analysis, supported by M.R.d.B. I.J.E. drafted the manuscript, and all other authors helped refine the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Netherlands Organisation for Health Research and Development [ZonMw, Grant Number 50-53100-98-043].

**Acknowledgments:** We thank all schools, coordinators, school canteen advisors, students and other involved stakeholders who participated in this study. We also thank Renate van Zoonen and our Health Sciences students (Tamara Coppenhagen, Samantha Holt, Katelyn Sadee and Andrea Thoonsen) who supported us in the gathering of data.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

## **Do Parent–Child Dyads with Excessive Body Mass Differ from Dyads with Normal Body Mass in Perceptions of Obesogenic Environment?**

**Karolina Zarychta <sup>1,\*</sup>, Anna Banik <sup>1</sup>, Ewa Kulis <sup>1</sup>, Monika Boberska <sup>1</sup>, Theda Radtke <sup>2</sup>, Carina K. Y. Chan <sup>3</sup>, Karolina Lobczowska <sup>1</sup> and Aleksandra Luszczynska <sup>1,4,\*</sup>**


Received: 15 June 2020; Accepted: 17 July 2020; Published: 19 July 2020

**Abstract:** Background: This study addressed differences between parent–child dyads with excessive body mass (overweight or obesity) and dyads with normal body mass in obesity determinants derived from social-ecological models. It was hypothesized that parents and their 5–11-year-old children with excessive body mass would (1) report lower availability of healthy food at home, (2) perceive fewer school/local community healthy eating promotion programs, and (3) report lower persuasive value of food advertising. Methods: Data were collected twice (T1, baseline; T2, 10-month follow-up), including *n* = 129 parent–child dyads with excessive body mass and *n* = 377 parent–child dyads with normal body mass. Self-reported data were collected from parents and children, and body weight and height were assessed objectively. General linear models (including analysis of variance with repeated measures) were used to test the hypotheses. Results: Compared to dyads with normal body mass, dyads of parents and children with excessive body mass perceived lower availability of healthy food at home and fewer healthy eating promotion programs at school/local community (T1 and T2). These effects remained significant after controlling for sociodemographic variables. No significant differences in persuasive value of food advertising were found. Conclusions: Perceptions of the availability of healthy food at home and of healthy nutrition promotion may be relatively low in parent–child dyads with excessive weight, which, in turn, may constitute a risk factor for the maintenance of obesity.

**Keywords:** childhood obesity; parent–child dyads; food availability; advertising; healthy diet; promotion programs

#### **1. Introduction**

The prevalence of overweight and obesity among children has doubled in recent decades, both in developed and developing countries [1,2]. Obesity is often considered to result from children's exposure to an unhealthy environment (also called an obesogenic environment) and from children's perceptions of and responses to it [2,3]. The role of the obesogenic environment and the ways it is perceived are highlighted in several theoretical approaches explaining childhood obesity. For example, according to the ecological model of predictors of childhood obesity [4], characteristics of the at-home environment (e.g., types of food available at home) and the out-of-home environment (e.g., community, demographic and societal characteristics, food policies at school or in the local community, policies regulating food advertising to children, etc.) represent facets of a broader context, interacting with each other in the development and maintenance of childhood overweight/obesity.

Availability of various types of food at home is often considered a key determinant of children's nutrition behaviors [4,5]. In turn, unhealthy nutrition (a diet low in fruit and vegetables and high in energy-dense food) is significantly associated with excessive body mass [6]. Systematic reviews of environmental correlates of obesity-related behaviors in children showed that the availability of healthy food at home was associated with higher fruit and vegetable intake in children [7,8]. On the other hand, home availability of sugar-sweetened beverages was associated with a higher intake of these products by 8- to 13-year-olds [9], and with intake of sweet and savory snacks among 12- to 13-year-old girls [10]. Lower perceived at-home availability of snacks and sweetened beverages was directly associated with lower intake of the respective foods among 10- to 11-year-olds [11]. Most of the studies, however, accounted only for children's perceptions of home food availability and did not consider parental perceptions. Parental perceptions may operate together with children's perceptions of availability, as parents are the key food gatekeepers at home. Furthermore, it is unclear whether children's and parental perceptions of the availability of healthy food differ depending on the body mass status of parent and child (normal body mass versus excessive body mass, i.e., overweight or obesity).

It is unclear whether parents and children with normal body mass differ from those with excessive body mass in their perceptions of the availability of healthy food at home. A cross-sectional study comparing 35 families with parents and children with excessive body mass with 47 families with normal body mass indicated that lower vegetable availability (rated by an independent observer) was associated with obesity issues [12]. This study, however, does not clarify how availability was perceived by parents and children. Determining the levels of parental and child perceptions of food availability in the homes of families with overweight parents and children may be of practical relevance. Identifying whether families differ in perceptions of the at-home and out-of-home environment (depending on the body mass status of family members) would allow the design of more effective obesity prevention programs targeting the general population, and of family treatment programs for parents and children with excessive body mass [13].

Children's healthy nutrition and favorable changes in body mass are also shaped by perceptions of the out-of-home environment, such as school and local community promotion of healthy eating, which, in turn, may influence both parents' and children's behaviors and cognitions related to healthy food intake [4]. The World Health Organization [2] has recommended comprehensive programs promoting the intake of healthy food and a reduction of unhealthy food intake in schools as key environmental strategies to address childhood obesity. An analysis of the effectiveness of 124 nutrition and physical activity programs indicated that programs covering three settings (community, school, and home) were the most effective in terms of childhood obesity prevention [14]. The target population's awareness of out-of-home programs promoting healthy nutrition may be a condition for the successful implementation and effectiveness of such programs [15]. Previous dyadic research has found that parental perceptions of school and community-based physical-activity promotion programs are related to lower body mass in children [16]. It is unclear, however, whether perceptions of the availability of nutrition programs differ between parents and children with excessive body mass and those with normal body mass.

Previous research investigating children's perceptions of the healthy food environment indicated that 5–9-year-olds perceive their parents and mass media as the primary sources of nutrition information [17]. Thus, at-home availability of healthy food and perceptions of food advertising have been investigated in children as young as 5–9 years old [17,18]. Although children report teachers as a source of information on healthy food, qualitative research did not elicit perceptions of programs in the local community or school setting as relevant sources of information about health or healthy diet among young children [17]. Therefore, an adequate approach to investigating the perceptions of 5–11-year-old children may be to focus on at-home availability of food or perceptions of advertising, instead of testing young children's perceptions of a broader environment (e.g., local community).

In parallel to perceptions of food availability at home and availability of nutrition programs in the local community, perceptions of advertising have been shown to determine children's nutrition behaviors [19,20]. Food marketing practices are considered an environmental factor that can affect adults' and children's beliefs, attitudes, and knowledge about healthy eating, and their body mass [21]. Children's food decisions are made in an environment where food is extensively advertised to stimulate consumption at home, and where the respective types of food are perceived as easily available [22,23]. Compared to children with normal body mass, 4–11-year-olds with excessive body mass had a higher recognition of energy-dense food advertisements [24] or of food advertisements in general [25]. On the other hand, research suggested that children who are obese may know less about the persuasive value of food advertising [20]. Parents and their perceptions of advertising may play a role in modifying the impact of food advertising on children, e.g., by explaining the nature and selling intent of advertising [26,27]. To date, research has not clarified whether parental and child perceptions of food advertising (e.g., its persuasiveness) differ between families with a parent and child with excessive body mass and those with normal body mass.

This study investigated differences between parent–child dyads with excessive body mass and parent–child dyads with normal body mass in their perceptions of the at-home environment (availability of healthy food at home) and the out-of-home environment (school and local community promotion of healthy eating, and the persuasiveness of advertising). In particular, it was hypothesized that, compared to parents and children from dyads with normal body mass, parents and their 5–11-year-old children from dyads with excessive body mass would (1) report lower availability of healthy food at home, (2) perceive fewer programs promoting healthy eating in schools and the local community, and (3) rate the persuasive value of food advertising as lower.

Moreover, we explored the stability of these differences over 10 months, testing whether perceptions of the at-home and out-of-home environment changed over time in parent–child dyads with normal body mass and in dyads with excessive body mass. During middle childhood (5–11 years old), children's perceptions of the healthy food environment are influenced by their age and developmental stage [28]; consequently, these perceptions may change within a year. We therefore investigated whether children's perceptions of the at-home environment and of food advertising would change over a 10-month period. Finally, to account for potential confounding by parental education, parental perceived economic status, and place of residence [29], the hypothesized effects were tested with adjustment for these sociodemographic covariates.

#### **2. Materials and Methods**

#### *2.1. Participants*

Parents (98.6%) or legal guardians (1.4%; henceforth called "parents") who were the main caregivers in terms of food preparation and time spent with the child were included in the study, together with their 5–11-year-old children. The initially recruited sample included 924 dyads (1848 individuals) of parents and their 5–11-year-old children who took part in the measurement at Time 1 (T1, baseline), and 571 dyads (1142 individuals) who took part at Time 2 (T2, 10-month follow-up). Data were collected as part of a larger study testing parental and child psychosocial determinants of body mass [30,31].

At T1, the majority of parents in the initially recruited sample had normal body mass (*n* = 547, 59.2%), *n* = 355 (38.4%) had excessive body mass, and *n* = 22 (2.4%) were underweight. Among children, *n* = 617 (66.8%) had normal body mass, *n* = 222 (24.0%) had excessive body mass, and *n* = 85 (9.2%) were underweight, as classified by the age- and gender-specific International Obesity Task Force cut-off points [32]. All participants were Caucasian (as are 98% of Poland's population [33]).

Dyads in which either the parent or the child was underweight (*n* = 126 dyads) were excluded from further analyses, as the factors underlying underweight were not investigated in this study. The remaining sample was divided into subgroups of parents and children recruited from dyads with a specific body mass composition (e.g., both parent and child with excessive body mass). Dyads with a mixed body mass composition (e.g., a parent with obesity and a child with normal body mass) were included in additional analyses only (see Appendix A). The mixed dyads comprised *n* = 193 dyads of parents with excessive body mass and children with normal body mass, and *n* = 88 dyads of parents with normal body mass and children with excessive body mass.

The main analyzed sample consisted of *N* = 506 parent–child dyads (1012 individuals): *n* = 129 dyads in which both parent and child had excessive body mass and *n* = 377 dyads in which both had normal body mass. We use the term 'dyads' to highlight how the subgroups were defined; dyads were not treated as the unit of analysis.
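The grouping rules described in this subsection can be expressed as a small decision function. The sketch below is illustrative only: the function name `classify_dyad` and the string labels are hypothetical, and it assumes each parent's and child's weight status has already been determined (age- and gender-specific IOTF cut-offs for children; adult categories for parents).

```python
from typing import Optional

# Hypothetical weight-status labels, assumed to be assigned beforehand.
UNDERWEIGHT, NORMAL, EXCESSIVE = "underweight", "normal", "excessive"
VALID = {UNDERWEIGHT, NORMAL, EXCESSIVE}


def classify_dyad(parent_status: str, child_status: str) -> Optional[str]:
    """Assign a parent-child dyad to a subgroup; None means the dyad is excluded."""
    if parent_status not in VALID or child_status not in VALID:
        raise ValueError("unknown weight status")
    if UNDERWEIGHT in (parent_status, child_status):
        return None  # excluded: underweight was not investigated
    if parent_status == child_status == NORMAL:
        return "both normal"  # main analysis
    if parent_status == child_status == EXCESSIVE:
        return "both excessive"  # main analysis
    if parent_status == EXCESSIVE:
        return "mixed: parent excessive, child normal"  # additional analyses only
    return "mixed: parent normal, child excessive"  # additional analyses only


# Invented example dyads (not study data)
examples = [(NORMAL, NORMAL), (EXCESSIVE, EXCESSIVE), (EXCESSIVE, NORMAL),
            (NORMAL, EXCESSIVE), (UNDERWEIGHT, NORMAL)]
for parent, child in examples:
    print(parent, child, "->", classify_dyad(parent, child))
```

Tallying the returned labels over all dyads would yield the subgroup sizes; the example dyads above are invented.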

Demographic characteristics of the main analyzed sample (*N* = 506 dyads) and both subsamples (dyads with normal body mass, dyads with excessive body mass), as well as the differences between the subsamples are presented in Table A1.

#### *2.2. Procedure*

The convenience sample was recruited in 26 locations in six administrative regions of Poland, representing three levels of mean household income (average, below average, and above average [33]). Data from parents and children were collected at schools, in general practitioners' offices, or at participants' homes. Where data were collected at a school, dyads with children attending that school (as well as dyads with children attending other schools but living in the local community) were invited and recruited. In dyads recruited via general practitioners' offices, children attended various schools in the respective city or town.

Study personnel informed participants about the research aims and procedure. Parents provided informed consent (for their own and their child's participation), and children gave assent. De-identified codes were then assigned to participants to preserve their anonymity across measurement points. Younger children (aged 5–8) took part in a structured interview, whereas older children (aged 9–11) completed a questionnaire. Parents completed their questionnaires separately from the children (e.g., in a different room). Participants' body mass and height were measured with certified scales and rods at both T1 and T2.

At both T1 and T2 (10 months later), parents reported their perceptions of the at-home environment (availability of healthy food at home) and the out-of-home environment (school and local community promotion of healthy eating, and the persuasiveness of advertising). Children reported their perceptions of the availability of healthy food at home and of food advertising at both T1 and T2. For the follow-up, study personnel contacted parents by phone and then revisited the study sites. Attrition resulted from parents' decisions to change school or general practitioner, or from parents' or children's decisions to discontinue participation at T2.

The study was approved by the Internal Review Board at SWPS University of Social Sciences and Humanities, Wroclaw, Poland. All procedures were in accordance with the ethical standards of the institutional research ethics committee and with the 1964 Declaration of Helsinki and its later amendments.

#### *2.3. Materials*

Variables measured in both members of the dyad were assessed with the same measures [34]. The feasibility of the item wording for children was tested in a pilot study with *n* = 18 children (aged 5–11 years) and found to be satisfactory.

#### 2.3.1. Parental and Child Perceptions of Availability of Healthy Food at Home (T1 and T2)

Parental and child perceptions of the availability of healthy food at home were measured with four items based on the Comprehensive Feeding Practices Questionnaire (CFPQ [35]; e.g., "Most of the food I keep in the house is healthy"). Participants were given a definition of healthy food, indicating that healthy meals include plenty of raw fruit and vegetables but limited amounts of products with added sugar or salt (e.g., salty or sweet snacks) and of highly processed products (e.g., sausage, cheese). Responses ranged from 1 (*definitely not*) to 4 (*definitely yes*), with higher scores representing higher perceived availability of healthy food at home. The mean item score for parents was *M* = 3.05, *SD* = 0.40, α = 0.54 at T1 and *M* = 3.07, *SD* = 0.32, α = 0.56 at T2; for children it was *M* = 2.84, *SD* = 0.44, α = 0.56 at T1 and *M* = 3.07, *SD* = 0.32, α = 0.58 at T2. Although these reliability coefficients are relatively low, they may be considered acceptable given that each scale comprised only four items [36].
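The α values above are Cronbach's alpha coefficients. As a minimal sketch of how alpha is obtained for a short scale, the function below applies the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score). The function name is hypothetical and the responses are invented, not study data.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; each inner list holds one item's scores across respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("all items must have the same number of respondents")
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sum score
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Invented answers of five respondents to four 1-4 items
items = [
    [3, 4, 2, 3, 4],
    [3, 3, 2, 4, 4],
    [2, 4, 3, 3, 3],
    [3, 4, 2, 2, 4],
]
print(round(cronbach_alpha(items), 2))
```

Because alpha grows with the number of items for a given average inter-item correlation, a four-item scale tends to yield lower values than longer scales, which is the reasoning behind the acceptability argument above.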

#### 2.3.2. Parental Perceptions of School and Local Community Promotion of Healthy Eating (T1 and T2)

Parental perceptions of school and local community promotion of healthy eating were measured with two items based on Stok et al. [37]: "At school my child draws attention to the issues of healthy revival" and "A lot of things are being done to help me and my child to eat more healthily". Responses ranged from 1 (*definitely not*) to 4 (*definitely yes*), with higher scores representing stronger perceived school and local community promotion of healthy eating. The mean item score was *M* = 2.80, *SD* = 0.67, *r*<sub>s</sub> = 0.58 at T1 and *M* = 2.81, *SD* = 0.53, *r*<sub>s</sub> = 0.51 at T2.
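For this two-item scale, reliability is reported as *r*<sub>s</sub>. The text does not define the coefficient; assuming it denotes Spearman's rank correlation between the two items, a standard-library sketch is shown below. Tied scores, which are common on a 1–4 scale, receive the average of their ranks. Function names and data are hypothetical.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either variable is constant


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented answers of six parents to the two items (1-4 scale)
item_1 = [3, 2, 4, 3, 1, 2]
item_2 = [3, 2, 3, 4, 1, 2]
print(f"r_s = {spearman_rho(item_1, item_2):.2f}")
```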

#### 2.3.3. Parental and Child Perceptions of Food Advertising (T1 and T2)

Parental and child perceptions of food advertising (its persuasive value) were measured with one item each, based on the Food Advertising Questionnaire [38] (e.g., "Advertising makes food products seem better than they really are"). Responses ranged from 1 (*definitely not*) to 4 (*definitely yes*), with higher scores representing greater parental or child knowledge of the persuasive value of food advertising. The item score for parents was *M* = 2.42, *SD* = 0.98 at T1 and *M* = 2.51, *SD* = 1.06 at T2; for children it was *M* = 2.57, *SD* = 0.80 at T1 and *M* = 2.51, *SD* = 0.80 at T2.

#### 2.3.4. Body Weight and Height (T1)

Child and parental body weight and height were assessed with standard, medically approved telescopic height-measuring rods and floor scales (scale type: BF-100 or BF-25; Beurer, Germany; measurement error <5%). For children, age- and gender-specific BMI z-scores were calculated with the WHO AnthroPlus macro [39]. For parents, BMI was calculated from body weight and height: BMI = weight (kg)/height<sup>2</sup> (m<sup>2</sup>).
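The parental BMI formula can be illustrated with a short sketch. The adult categories below are an assumption (the standard WHO cut-offs of 18.5 and 25 kg/m<sup>2</sup>), since the text does not state which adult cut-offs were used; children's age- and gender-specific classification (IOTF cut-offs, WHO AnthroPlus z-scores) is not reproduced here. Function names and the example values are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def adult_weight_status(bmi_value: float) -> str:
    """Assumed WHO adult cut-offs; overweight and obesity are pooled as 'excessive'."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal"
    return "excessive"  # overweight (25 to <30) and obesity (>=30) combined


# Invented example: a parent weighing 82 kg and measuring 1.75 m
value = bmi(82.0, 1.75)
print(f"BMI = {value:.1f} -> {adult_weight_status(value)}")
```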

#### 2.3.5. Sociodemographic Variables (T1)

Parental education was measured on a 5-point scale ranging from 1 to 5 (primary, uncompleted secondary/vocational, secondary, ≤3 years of higher education, ≥4 years of higher education), with higher scores indicating a higher level of education. Perceived economic status was assessed with one item, "Compared to the average economic situation of the family in the country, how would you rate the economic situation of your family?", with responses ranging from 1 (*much below the average*) to 5 (*much above the average*); higher scores indicate a higher economic status. The size of the place of residence was assessed with one question, "What is the number of inhabitants in the city/town/village where your family lives?", with four response options (<10,000 inhabitants; between 10,000 and 100,000 inhabitants; between 100,000 and 500,000 inhabitants; >500,000 inhabitants); higher scores indicate a larger population in the place of residence.

#### *2.4. Data Analysis*

Assuming an effect size of *f* = 0.15, power of 0.95, and a Type I error rate of 0.05, the required sample size was estimated with the G\*Power calculator [40]. The estimation indicated that at least 120 dyads per subsample should be recruited if the analyses were to account for potential covariates. Results with *p* < 0.05 were considered statistically significant. Missing data were handled with the full information maximum likelihood procedure in IBM AMOS 25 [41]; all other analyses were conducted with SPSS version 25. Analyses of variance were performed to test differences in parental and/or child perceptions of the at-home environment (availability of healthy food at home) and the out-of-home environment (school and local community promotion of healthy eating, food advertising) between parent–child dyads with excessive body mass and dyads with normal body mass. General linear models with repeated measures were performed to test (1) the effects of time on perceptions of the at-home and out-of-home environment measured at T1 and T2 in parent–child dyads with excessive body mass vs. dyads with normal body mass, and (2) the interaction of time with the type of subsample (excessive body mass vs. normal body mass dyads). Sensitivity analyses were conducted to test the robustness of the findings [42] and to determine whether the pattern of effects held when controlling for the parental education level, the parental perceived economic status, and the size of the place of residence.
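The repeated-measures interaction described above has a simple special case that may help interpret it. With two measurement points, two groups, and no covariates, the Time × Group interaction *F*(1, *n*<sub>a</sub> + *n*<sub>b</sub> − 2) equals the squared pooled-variance *t* statistic comparing T2 − T1 change scores between groups. The sketch below illustrates that equivalence only; it does not reproduce the covariate-adjusted models or the FIML handling of missing data, and the function names and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def change_scores(t1: list[float], t2: list[float]) -> list[float]:
    """T2 - T1 difference per participant (complete cases only)."""
    if len(t1) != len(t2):
        raise ValueError("T1 and T2 lists must be paired")
    return [b - a for a, b in zip(t1, t2)]


def time_by_group_f(changes_a: list[float], changes_b: list[float]) -> float:
    """Interaction F for a 2 (time) x 2 (group) mixed design without covariates,
    computed as the squared pooled-variance t statistic on change scores."""
    n_a, n_b = len(changes_a), len(changes_b)
    pooled_var = ((n_a - 1) * variance(changes_a)
                  + (n_b - 1) * variance(changes_b)) / (n_a + n_b - 2)
    t = (mean(changes_a) - mean(changes_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t ** 2


# Invented 1-4 scale scores for two small groups (not study data)
excessive = change_scores([3.0, 2.5, 3.25, 2.75], [3.0, 2.75, 3.25, 2.5])
normal = change_scores([3.0, 3.25, 3.5, 2.75], [3.25, 3.25, 3.5, 3.0])
df_error = len(excessive) + len(normal) - 2
print(f"F(1, {df_error}) = {time_by_group_f(excessive, normal):.2f}")
```

A non-significant interaction, as reported in this study, means the mean change from T1 to T2 did not differ detectably between the two types of dyads.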

#### **3. Results**

#### *3.1. Preliminary Analysis*

The differences between parents who participated in both the T1 and T2 measurements and those who dropped out were not statistically significant in terms of perceptions of availability of healthy food at home, perceptions of school and local community promotion of healthy eating, perceptions of food advertising, age, and BMI, all *F*s < 2.32, *p*s > 0.129, or gender, χ<sup>2</sup>(1) = 2.37, *p* = 0.306. The differences between children who participated in both measurements and those who dropped out were not statistically significant in terms of perceptions of availability of healthy food at home, perceptions of food advertising, age, or BMI, all *F*s < 2.36, *p*s > 0.137. However, dyads with boys tended to drop out more often than dyads with girls, χ<sup>2</sup>(1) = 3.26, *p* = 0.072.
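The attrition comparisons above report χ<sup>2</sup> statistics with their *p*-values. For one or two degrees of freedom, the upper-tail chi-square probability has a closed form, so the correspondence between a statistic, its degrees of freedom, and its *p*-value can be checked with the Python standard library alone. This is a minimal sketch; the function name `chi2_sf` is hypothetical, and other degrees of freedom would require a statistics library.

```python
from math import erfc, exp, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail p-value of a chi-square statistic (closed forms for df = 1 or 2)."""
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    if df == 1:
        # chi-square(1) is the square of a standard normal: P = 2 * (1 - Phi(sqrt(x)))
        return erfc(sqrt(x / 2))
    if df == 2:
        # chi-square(2) is exponential with mean 2
        return exp(-x / 2)
    raise NotImplementedError("only df = 1 or df = 2 are handled in this sketch")


# The statistic reported for the gender difference in drop-out
print(round(chi2_sf(3.26, 1), 3))
```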

Parents from dyads with excessive body mass differed from parents from dyads with normal body mass in gender, χ<sup>2</sup>(1) = 12.83, *p* = 0.002, education level, and economic status, all *F*s > 4.41, *p*s < 0.036. Parents in dyads with excessive body mass were more often men and reported a lower level of education and a lower perceived economic status than parents in dyads with normal body mass. The differences between the two types of dyads were not statistically significant in terms of parental and child age, children's gender, or the size of the place of residence. For details, see Table A1.

Bivariate correlations between the study variables obtained for the main analyzed sample of *N* = 506 dyads (*N* = 1,012 individuals) are presented in Table A2. At both T1 and T2, healthy food availability and advertisement perceptions reported by parents were positively associated with children's perceptions of healthy food availability and perceptions of persuasiveness of advertisement. A higher level of parental education was related to higher availability of healthy food reported by children (T1 and T2). A higher level of parental perceived economic status (T1) was positively associated with healthy food availability, reported by parents and children (T1 and T2), and negatively with parental and children's BMI (T1 and T2).

#### 3.1.1. Differences between Parent–Child Dyads with Excessive and Normal Body Mass: Perceptions of At-Home and Out-of-Home Environment

Compared to parents from dyads with normal body mass, parents from dyads with excessive body mass reported lower availability of healthy food at home (T1 and T2) and less school and local community promotion of healthy eating (T1 and T2). There were no statistically significant differences between parents from the two types of dyads in their perceptions of the persuasiveness of food advertising (T1 and T2). The respective findings are reported in Table 1.

**Table 1.** Differences in at-home and out-of-home environment: Comparisons of dyads of parents and children with excessive body mass (*n* = 129) and dyads of parents and children with normal body mass (*n* = 377).


\*\*\* *p* < 0.001; \* *p* < 0.05; † *p* < 0.10; P = parent; Ch = child; T1 = Time 1 (baseline); T2 = Time 2 (10-month follow-up); for all analyses *df* = 1, 504; Advertisement perception = perceptions of persuasiveness of food advertising; Local promotion = perceptions of school and local community promotion of healthy nutrition; Healthy food availability = perceptions of availability of healthy food at home. Covariates included the parental education level, parental perceived economic status, and size of the place of residence. Significant differences (with both significant *p*-levels and a significant 95% CI for Cohen's *d*) are marked in bold.
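Table 1 marks a difference as significant when both the *p*-value and the 95% CI for Cohen's *d* indicate significance. The text does not state how that interval was computed; the sketch below assumes the pooled-SD definition of *d* and one common large-sample normal approximation for its standard error. The function name is hypothetical, and the means and SDs are invented; only the subgroup sizes (*n* = 377 and *n* = 129) come from the text.

```python
from math import sqrt


def cohens_d_with_ci(m1: float, sd1: float, n1: int,
                     m2: float, sd2: float, n2: int,
                     z: float = 1.96) -> tuple[float, tuple[float, float]]:
    """Cohen's d with a pooled SD and an approximate (normal-theory) 95% CI."""
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    # Large-sample standard error of d (an assumed approximation, not the paper's stated method)
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)


# Invented means/SDs on a 1-4 scale: normal-body-mass dyads vs excessive-body-mass dyads
d, (lo, hi) = cohens_d_with_ci(3.10, 0.40, 377, 2.95, 0.40, 129)
print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
print("CI excludes 0:", lo > 0 or hi < 0)
```

An interval that excludes zero corresponds, under this approximation, to a difference that would also be judged significant at the 0.05 level.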

Compared to children from dyads with normal body mass, children from dyads with overweight/obesity reported lower availability of healthy food at their homes at T1 and T2 (see Table 1). However, there was no statistically significant difference between children from the two types of dyads in terms of perceptions of out-of-home environment (perceptions of persuasiveness of food advertisement at T1 and T2).

Sensitivity analyses, which compared parent–child dyads with normal body mass and dyads with excessive body mass while controlling for parental education level, parental perceived economic status, and the size of the place of residence, showed the same pattern of differences in parental and/or child perceptions of the at-home environment (availability of healthy food at home) and the out-of-home environment (school and local community promotion of healthy eating, persuasiveness of food advertising) (see Table 1).

The results of additional analyses comparing the four types of dyads (i.e., dyads with normal body mass, dyads with excessive body mass, and the two types of dyads with mixed body mass composition) are presented in Appendix A. These analyses showed only two significant differences, both in parental perceptions: parents from dyads consisting of a parent with excessive body mass and a child with normal body mass reported lower healthy food availability than parents from dyads with normal body mass, at both T1 and T2.

#### 3.1.2. Changes over Time in Perceptions of At-Home and Out-of-Home Environment among Parent–Child Dyads with Normal and Excessive Body Mass

Over the 10-month period, parental and children's perceptions remained stable. Furthermore, none of the Time × Group interactions was significant, either without or with control for parental education, parental perceived economic status, and place of residence (see Table A3). These findings suggest that the gap in perceptions of healthy nutrition options in the at-home and out-of-home environment did not narrow over time: families with excessive body mass continued to perceive relatively low availability of healthy food and less school and local community promotion of healthy eating, even after controlling for the confounding effects of socio-economic variables.

#### **4. Discussion**

This study examined differences between parent–child dyads with excessive versus normal body mass in their perceptions of a healthy food-promoting environment. The findings support the assumption that perceptions of the at-home and out-of-home environment differ depending on body mass status [4]. In particular, parents and children from dyads with excessive body mass perceived lower availability of healthy food at home than parents and children from dyads with normal body mass. Additionally, parents from dyads with excessive body mass reported lower levels of school and local community promotion of healthy eating than parents from dyads with normal body mass. These differences remained significant after controlling for parental education level, economic status, and the size of the place of residence. There were no statistically significant differences between parents or children from the two types of dyads in their perceptions of the persuasiveness of food advertising.

The findings showing differences in perceived home availability of healthy food products are partially in line with existing evidence [12]. Previous studies, however, used ratings by external observers to assess the availability of fresh vegetables in households [12]. Our study adds to these findings [12] by showing that healthy food availability at home is perceived differently in families in which parents and children have overweight/obesity than in families in which both have normal body mass. Thus, dyads with excessive body mass may be at risk of further body mass increase due to perceived low availability of healthy food. As documented in previous research, low perceived availability of healthy food may trigger unhealthy nutrition habits [8], which in turn determine a further increase in body mass [13].

The present study also showed that parents from dyads with overweight/obesity perceived lower availability of community- and school-based healthy nutrition programs. Previous longitudinal research showed that when parents perceive limited promotion of physical activity in the local community or schools, their overweight children gain even more weight [16]. Therefore, in families in which parents and children have excessive body mass and parents report few community and school healthy nutrition programs, children may be at risk of a further increase in body mass.

The results did not show statistically significant differences between the two types of parent–child dyads (with excessive versus normal body mass) in perceptions of food advertising. Previous research suggested that children who are obese know less about the persuasive value of food advertising [20], yet the number of studies addressing this issue is limited. The lack of statistically significant differences in the present study was observed even when controlling for age, which is among the key determinants of children's food advertising knowledge and literacy [43]. Previous studies, however, did not account for parental perceptions of the persuasiveness of food advertising. Our study showed that parents from dyads with excessive body mass and parents from dyads with normal body mass did not differ significantly in their perceptions of the persuasiveness of food advertising. It is possible that parents from both types of dyads interacted similarly with their children, for example when explaining the persuasive value of advertising. Parents may use strategies such as mediation, including deliberate comments and judgments about TV commercials, or explaining the nature and purpose of advertising [26,44]. Similar parental strategies across dyads may result in a lack of differences in children's perceptions of the persuasiveness of advertising. Future research may also look more carefully into interactions between parental education [29] and parental practices [31] that may jointly predict children's perceptions of food advertising. Furthermore, perceptions or judgements other than the persuasiveness of food advertising may better differentiate between dyads with excessive body mass and those with normal body mass. For example, logo recognition (higher recognition of fast food logos, compared with logos of other types of food, among children with excessive body mass [24]) and the effect of exposure to food advertisements on energy intake in children with excessive body mass [45] were found to differentiate between children with normal body mass and children with overweight/obesity. Yet, the findings of the present study suggest that it may be relevant to account for parental perceptions as well.

This study has several limitations. Only healthy food availability was assessed, whereas previous research suggested that assessing the availability of both healthy and unhealthy food is relevant. Fruit and vegetable intake may be inversely associated with the availability of unhealthy food; at the same time, however, greater availability of low-calorie, nutrient-dense food was associated with higher child intake of sweet and savory snacks [18]. This may suggest that certain products are considered less healthy than others and that home environments can be healthy in some ways and unhealthy in others (e.g., the availability of both healthy and unhealthy food products might be perceived as high). Future studies should account for perceptions of the availability of both healthy and unhealthy food. Moreover, only self-reports of food availability were used. Perceived food availability is likely to be a different construct from the actual availability of food at home, and the two are only moderately related [46]. Therefore, the conclusions of the present study should not be generalized to differences in the actual availability of healthy food. A combination of subjective and objective indicators of at-home food availability (e.g., photographs of food stored in the family's pantry or scanning food barcodes during grocery shopping) would be preferable [47], although the feasibility of using objective measures of food intake in large samples is limited. The study also did not account for the actual school-based and local community promotion of healthy eating. Objective measures of such promotion would allow checking whether parental and children's perceptions of the at-home or out-of-home environment correspond to the actual presence of policies and programs in schools and communities. Moreover, the single-item measure of perceptions of the persuasiveness of food advertising may have limited reliability. Future research could use more comprehensive measures of various aspects of perceptions of advertising to examine whether differences between groups depend on the construct investigated (e.g., perceived persuasiveness versus food advertising knowledge). The data collection procedures did not allow children to be clustered by school; therefore, the effects of school-level variables could not be analyzed. Next, this study defined excessive body mass as including both overweight and obesity, whereas previous research showed that distinguishing between overweight and obesity may be relevant. For example, parents may misperceive their children's body mass, especially when it comes to differentiating between a child being overweight or obese [46]. There is also evidence that obese children have more accurate perceptions of their body mass than overweight children [48,49]. Future studies should verify whether perceptions of at-home and out-of-home environmental factors differ between parent–child dyads with overweight and parent–child dyads with obesity. The sample was not representative of the general population of the country (e.g., in terms of parental education), which limits the generalizability of the findings. Generalization to ethnically diverse populations should be made with caution, as the analyzed sample was ethnically homogeneous (all participants were Caucasian).

To conclude, this is the first study to assess differences between parent–child dyads with normal body mass and dyads with excessive body mass in terms of perceptions of at-home (perceptions of the availability of healthy food at home) and out-of-home environment (perceptions of school and community promotion of healthy eating, perceptions of persuasiveness of food advertising). Future programs targeting obesity reduction may address specific perceptions of at-home and out-of-home environment, in particular when designing interventions targeting parents and children who already have excessive body weight. The perceptions of availability of healthy food at home, and perceptions of school and local community promotion of healthy eating may be relatively low in parent–child dyads with excessive weight, which in turn may constitute a risk factor for the maintenance of excessive body weight.

**Author Contributions:** Conceptualization, K.Z., A.B., E.K., M.B., T.R., C.K.Y.C., K.L. and A.L.; Data curation, K.Z. and A.L.; Investigation, K.Z., A.B., E.K., M.B., T.R., C.K.Y.C., K.L. and A.L.; Methodology, K.Z. and A.L.; Writing—original draft, K.Z., A.B., E.K., M.B., T.R., C.K.Y.C. and A.L.; Writing—review & editing, K.Z., A.B., E.K., M.B., T.R., C.K.Y.C., K.L. and A.L. All authors contributed to the manuscript revision, read and approved the submitted version. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by grant no. 2017/27/B/HS6/00092 from National Science Centre, Poland, awarded to A.L. The contribution by A.B. was supported by grant no. 2017/27/N/HS6/0208 from National Science Centre, Poland. The contribution of M.B. was supported by a doctoral scholarship no. 2018/28/T/HS6/00021 from National Science Centre, Poland. Open access of this article was financed by the Ministry of Science and Higher Education in Poland under the 2019-2022 program "Regional Initiative of Excellence", project number 012/RID/2018/19.

**Conflicts of Interest:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

#### **Appendix A**


**Table A1.** Demographic and clinical characteristics of *N* = 798 parent–child dyads and of the main analyzed sample (*N* = 506 dyads), including parent–child dyads with excessive body mass (*n* = 129) and parent–child dyads with normal body mass (*n* = 377).



\*\*\* *p* < 0.001; \*\* *p* < 0.01; \* *p* < 0.05; † *p* < 0.10; T1 = Time 1 (baseline); T2 = Time 2 (10-month follow-up); BMI = body mass index; Education = the parental education level; Economic status = the parental perceived economic status (reports on comparison to the economic situation of the average family in the country). Significant differences (with *p* < 0.05 and a significant 95% CI for Cohen's *d*) are marked in bold.

**Table A2.** Correlations and descriptive statistics for the study variables: Characteristics of the main analyzed sample (*N* = 506 parent–child dyads with normal body mass and parent–child dyads with excessive body mass) and for *N* = 798 (four types of dyads: parent with excessive body mass and child with normal body mass; parent with normal body mass and child with excessive body mass; parent–child dyads with excessive body mass; parent–child dyads with normal body mass).


P = parent; Ch = child; T1 = Time 1 (baseline); T2 = Time 2 (10-month follow-up). BMI = body mass index; Advertisement perception = perception of persuasiveness of food advertising; School and local promotion = perception of school and local community promotion of healthy eating; Healthy food availability = perceptions of availability of healthy food at home; Education = the parental education level (1 = primary, 2 = uncompleted secondary/vocational, 3 = secondary, 4 = ≤3 years of higher education, 5 = ≥4 years of higher education); Economic status = the parental perceived economic status (reports on comparison to the economic situation of the average family in the country; 1 = much below the average, 2 = below average, 3 = similar to average, 4 = above the average, 5 = much above the average); Place of residence (1 = <10,000 inhabitants, 2 = between 10,000 and 100,000 inhabitants, 3 = between 100,000 and 500,000 inhabitants, 4 = >500,000 inhabitants); Gender (1 = male; 2 = female). Pearson's *r* for continuous variables and Spearman's *rho* for categorical variables are provided. Significant (at *p* < 0.05) coefficients are marked in bold.


**Table A3.** Differences in perceptions of at-home and out-of-home environment: Parent–child dyads with excessive body mass (*n* = 129) versus parent–child dyads with normal body mass (*n* = 377).

None of the *F* values reported in this table is significant, *p*s > 0.05; P = parent; Ch = child; T1 = Time 1 (baseline); T2 = Time 2 (10-month follow-up); Advertisement perception = perceptions of persuasiveness of food advertising; Local promotion = perceptions of school and local community promotion of healthy eating; Healthy food availability = perceptions of availability of healthy food at home. Covariates included the parental education level, parental perceived economic status, and place of residence.

**Table A4.** Differences in the study variables and demographic variables between excessive body mass parent-normal body mass child dyads (*n* = 193), normal body mass parent-excessive body mass child dyads (*n* = 88), parent–child dyads with excessive body mass (*n* = 129), and parent–child dyads with normal body mass (*n* = 377).


\*\*\* *p* < 0.001; \* *p* < 0.05; † *p* < 0.10; T1 = Time 1 (baseline); T2 = Time 2 (10-month follow-up); BMI = body mass index; Education = parental education level; Economic status = parental perceived economic status (reported in comparison with the economic situation of the average family in the country). Cohen's *d* is provided only for significant between-group differences. Significant differences (*p* < 0.05 and a 95% CI for Cohen's *d* excluding zero) are marked in bold.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Importance of Self-Efficacy in Eating Behavior and Physical Activity Change of Overweight and Non-Overweight Adolescent Girls Participating in Healthy Me: A Lifestyle Intervention with Mobile Technology**

#### **Anna Dzielska <sup>1,\*</sup>, Joanna Mazur <sup>1,2</sup>, Hanna Nałęcz <sup>1</sup>, Anna Oblacińska <sup>1</sup> and Anna Fijałkowska <sup>3</sup>**


Received: 25 June 2020; Accepted: 15 July 2020; Published: 17 July 2020

**Abstract:** Very little is known about how multicomponent interventions directed at entire populations work in selected groups of adolescents. The aim was to evaluate the effectiveness of the one-year Healthy Me program in changing healthy eating and physical activity among overweight and non-overweight female students. Randomization involved the allocation of full, partial or null intervention. The randomized field trial was implemented among 1198 15-year-old girls in 48 secondary schools (clusters) across Poland. In this study, a sample of N = 1111 girls who participated in all three evaluation surveys was analyzed. Multimedia technologies were used to improve health behaviors and increase self-efficacy. The main outcome was a health behavior index (HBI), built from six nutritional indicators and one physical activity indicator. The HBI was measured before the intervention, immediately after it and at the three-month follow-up, and the HBI change was modeled. Statistical analysis included nonparametric tests and generalized linear models with two-way interactions. Comparing the first and third surveys, the HBI improved by 0.348 (SD = 3.17) in overweight girls, while it worsened in non-overweight girls. After adjusting for other factors, a significant interaction between body weight status and level of self-efficacy as predictors of HBI changes was confirmed. The program turned out to be more beneficial for overweight girls.

**Keywords:** healthy lifestyle intervention; school-based intervention; eating behavior; MVPA; overweight and obesity; self-efficacy; adolescent girls

#### **1. Introduction**

According to the World Health Organization (WHO), adolescence corresponds to the second decade of life [1]. This period requires special attention because of its specific health and developmental needs and rights [2]. During adolescence, the transition from childhood to adulthood, health behaviors are shaped and consolidated. A healthy lifestyle is therefore crucial for adolescents' proper growth and development. Moreover, health behavior interventions targeting adolescents can reduce the burden of disease in adulthood, improving health through a ripple effect [3,4].

Nearly 40 years of the cross-sectional Health Behavior in School-aged Children (HBSC) study have consistently identified pressing problems and the most vulnerable groups of adolescents in the WHO European region and Canada [5]. Comparisons of health behaviors between the sexes showed positive trends in boys co-occurring with negative trends in girls. This eliminated gender differences in the frequency of many negative behaviors [6] and exposed 15-year-old girls, the future mothers of the next generations, as an extremely vulnerable population, especially given the persistently poorer self-rated health of girls observed in many countries.

According to the international HBSC report [5], 14% of 15-year-old girls had overweight or obesity, and 36% considered themselves too fat. Moreover, among girls aged 15, 52% do not regularly eat breakfast on school days, 62% do not eat fruit and 61% do not eat vegetables every day, whereas 28% eat sweets and 15% drink sweet carbonated drinks every day. In addition, only 11% meet the recommendations for moderate-to-vigorous physical activity.

Both systematic reviews of intervention programs [7] and guidelines for the prevention of obesity in children and adolescents [8] indicate that obesity prevention programs have limited effectiveness. Their effectiveness was low in children under 12 years of age, and interventions in young people aged 13–18 did not reduce BMI. Unfortunately, there is little research in this older age group, so it is difficult to reliably assess the effectiveness of such interventions [9].

Health-related behaviors are correlated, and many different consolidated behavior patterns can be observed in different environments [10]. Systematic reviews confirm that interventions aimed at improving moderate-to-vigorous physical activity simultaneously strengthen other health-related behaviors, such as healthy eating or weight management [11]. Meta-analyses show that school-based interventions combining healthy eating and physical activity may prevent overweight in the longer term [12]; they also indicate that educational interventions are moderately effective in improving eating behaviors, with ambiguous results for anthropometric changes [13].

Likewise, better intervention outcomes are associated with longer interventions [14] and with the use of a larger number of behavior change techniques [15–17]. Incorporating behavior change techniques focused on self-regulation was found to be effective in changing physical activity and eating behaviors, as confirmed by Avery et al. (2012) in adults [18] and by Martyn-Nemeth et al. (2009) in adolescents [19]. Furthermore, some studies demonstrate the effectiveness of interventions using interactive modern media to improve the diet and physical activity of adolescents, although only a few indicate that the effect is maintained in the long term [20].

Effective behavior change requires acquiring the skills needed to initiate actions consistent with one's knowledge, and it is extremely sensitive to environmental context [21]. One of the personal competences necessary to successfully change health behavior is self-efficacy, which has a proven link to motivation, behavior control and goal achievement [22]. A person convinced of their own effectiveness is able to initiate and continue changes even when faced with emerging challenges [23].

To date, the effectiveness of the Healthy Me program has been assessed in the whole study group, without distinguishing between girls with and without excess body weight [24]. The program was a universal prevention intervention aimed at the whole population of 15-year-old girls. Systematic reviews of community obesity prevention programs often take a reduction in the prevalence of obesity as the main outcome [25], and less attention is paid to changes in the health behaviors of students with and without excess body weight. This raises several questions: to what extent do overweight teenagers use universal programs? Do they represent a group with less favorable health behaviors, and do any beneficial effects of the program persist in this group after its completion? The present paper fills this knowledge gap while also providing a picture of the effectiveness of this innovative program, which sought to reach its target audience through modern multimedia technologies.

The aim of the study was to evaluate the effectiveness of the Healthy Me intervention program in changing the prevalence of healthy eating behaviors and the level of physical activity among 15-year-old girls in Poland. We hypothesized that the effectiveness of the intervention may differ between overweight and non-overweight girls, and that improved personal competence may strengthen the effectiveness of the intervention [26]. The main issue was therefore to determine in which groups of girls the Health Behavior Index (HBI), consisting of seven indicators of eating behaviors and physical activity, improved, taking into account their body weight status, change in self-efficacy, the type of intervention provided and a possible school environment effect.

#### **2. Materials and Methods**

#### *2.1. Study and Intervention Design*

The data were obtained from the randomized field trial with cluster randomization by school and repeated measures. In total, 1198 15-year-old girls from 48 randomly selected secondary schools across Poland participated in the one-year Healthy Me program in 2017–2018. Schools were randomly assigned to the following groups: full intervention group (24 schools, 636 girls), partial intervention group (12 schools, 277 girls) and null intervention group (12 schools, 285 girls) (Figure 1).

**Figure 1.** Location of schools participating in the Healthy Me program by the type of intervention.

The main area of interest was improving physical activity, although the intervention activities were conducted in four thematic phases: physical activity, eating behavior, risk behavior, and personal and social competencies. The multicomponent intervention used mobile technology (a dedicated mobile application and a fitness band) and combined several techniques. The Healthy Me program used Social Cognitive Theory [27] as its theoretical foundation and was based on an interactive technology approach [26]. The intervention included behavioral and environmental components. Self-efficacy was shaped by setting goals, observing others and receiving feedback from the technologies (fitness band, app) that supported self-monitoring. The components delivered depended on the intervention group, which made it possible to assess the effectiveness of particular sets of intervention methods and techniques (Table 1).


**Table 1.** Intervention components by type of intervention.

The study and the intervention procedure were approved by the Bioethics Committee of the Mother and Child Institute in Poland (number 32/2017 of 22 June 2017) and by the funding body (Ministry of Health in Poland, Grant no. 6/7/K/6/NPZ/2017/106/622).

#### *2.2. Evaluation Surveys*

The project has been fully evaluated, and as part of the evaluation of the intervention results, questionnaire surveys were conducted three times during the project implementation:


Each questionnaire contained a similar set of questions to allow comparisons to be made about changes in subjective health, different health-related behaviors and related factors.

Anthropometric measurements (e.g., weight, height) were conducted three times by school nurses, once in each survey round.

#### *2.3. Sample Characteristics*

The present analyses cover girls (N = 1111) who completed all three rounds of the survey (Table 2). About half of the girls were in the full intervention group, and the other half belonged to the partial and null intervention groups. Based on the WHO standards [28], almost a quarter of participants were assessed as overweight or obese (23.5%); this frequency was higher than in girls of similar age in the cross-sectional HBSC study, probably because of the different cut-off points used to estimate body weight status [6]. The percentage of missing BMI data in the studied sample was very low (0.8%). A similar percentage of girls with excess body weight occurred in each type of intervention group. At baseline, the overweight and non-overweight groups did not differ in their HBI scores or their general self-efficacy index (GSE), both described below.


**Table 2.** Sample characteristics at the baseline.

<sup>1</sup> Missing BMI data 0.8% (n = 9); <sup>2</sup> HBI—health behavior index; <sup>3</sup> GSE—general index of self-efficacy.

#### *2.4. Measures*

#### 2.4.1. Health Behavior Indicators

Six indicators related to eating behaviors and one measure of physical activity were tested in these analyses.

	- Frequency of eating fruit, vegetables and sweets and of drinking soft drinks with added sugar. Girls answered how often they eat or drink these products by choosing one of seven answer categories, from "never" to "daily, more than once".
	- Breakfast consumption. Girls were asked to answer the questions on the frequency of eating breakfast on schooldays, choosing from six answer categories, from "never" to "five days a week", and during the weekends, choosing from three options, from "never" to "both days".
	- Moderate-to-vigorous physical activity. Girls answered the question: "Over the past seven days, on how many days were you physically active for a total of at least 60 min per day? Please add up all the time you spent in physical activity each day". The question had eight response categories, from "zero days" to "seven days".

The frequency distribution of girls undertaking the above-mentioned eating behaviors in subsequent study periods, by type of intervention and body weight status, is presented in the Supplementary Materials, Table S1. The above questions come from the HBSC study protocols and have been tested repeatedly in Poland in a population similar in age [6].

#### 2.4.2. HBI

The summary HBI was estimated for all three study periods. The index consists of seven indicators of eating and physical activity behaviors mentioned above. The response categories in each behavior were recoded and scored from 0 to 3 points, as follows, with a higher value indicating a more favorable result:


The highest value (3 points) attributed to the recoded answers to the above questions was consistent with the national recommendations on the frequency of eating different groups of products and meals [29], as well as the global moderate-to-vigorous physical activity guidelines for children and adolescents [30].

The summary HBI score ranged from 0 to 21 points. HBI scores in each of the three evaluation surveys are presented in Tables 3 and 4.
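The scoring rule above (seven components, each worth 0 to 3 points, summed to a 0 to 21 total) can be sketched in Python. This is a minimal illustration, not the authors' code: the component names are hypothetical, and because the category-to-points recoding is not reproduced in this section, the sketch assumes every answer has already been recoded to 0–3 points.

```python
from typing import Mapping

# Hypothetical names for the six eating indicators and the one
# physical activity indicator listed in Section 2.4.1.
HBI_COMPONENTS = (
    "fruit",
    "vegetables",
    "sweets",
    "soft_drinks",
    "breakfast_schooldays",
    "breakfast_weekend",
    "mvpa",
)


def hbi_score(scores: Mapping[str, int]) -> int:
    """Sum the seven component scores, each already recoded to 0-3.

    Assumes the recoding of raw answer categories into points was done
    beforehand; this function only validates and sums.
    """
    missing = [name for name in HBI_COMPONENTS if name not in scores]
    if missing:
        raise ValueError(f"missing components: {missing}")
    total = 0
    for name in HBI_COMPONENTS:
        points = scores[name]
        if not 0 <= points <= 3:
            raise ValueError(f"{name} must be scored 0-3, got {points}")
        total += points
    return total  # always within 0-21


def hbi_change(before: Mapping[str, int], after: Mapping[str, int]) -> int:
    """HBI change between two survey rounds (positive = improvement)."""
    return hbi_score(after) - hbi_score(before)


if __name__ == "__main__":
    # Illustrative, made-up scores for one girl.
    study1 = {"fruit": 2, "vegetables": 1, "sweets": 1, "soft_drinks": 3,
              "breakfast_schooldays": 3, "breakfast_weekend": 2, "mvpa": 1}
    study3 = dict(study1, sweets=2, mvpa=2)
    print(hbi_score(study1), hbi_score(study3), hbi_change(study1, study3))
```

With the made-up scores above, the sketch prints `13 15 2`, i.e. a two-point HBI improvement between the first and third rounds.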



<sup>1</sup> Differences in HBI between the three study rounds: Kendall's W test for repeated measures. <sup>2</sup> Differences by body weight status: Mann–Whitney U test for independent groups.


**Table 4.** Changes in the self-efficacy before and after the Healthy Me program by the body weight status.

<sup>1</sup> Missing data in GSE: 6.6% (Study 1) and 6.2% (Study 3). <sup>2</sup> Differences in self-efficacy between the first and third study rounds: Wilcoxon signed-rank test (Z) for repeated measures. <sup>3</sup> Differences by body weight status: Mann–Whitney U test for independent groups.

Six different variants of the HBI were considered. Some factors were excluded, and attempts were made to additionally include intense physical activity and meals eaten together with parents. The psychometric properties of the individual indices in the three study periods and the significance of their changes were evaluated. None of the analyzed indices had a single-factor structure, and internal consistency was slightly below the recommended level of 0.70, which is acceptable in analyses of larger samples [31]. The advantage of the chosen index is that it takes into account the level of physical activity, which was a key element of the intervention. Eating healthy food contributes most strongly to the variability of the selected index, and eating sweets appeared to be the weakest component. However, this component was retained because the frequency of eating sweets decreased considerably during the project implementation period (Table S1). There was only one case of missing HBI data (n = 1).

#### 2.4.3. Self-Efficacy—Personal Competence Scale

To measure the change in self-efficacy, the KompOs scale was used. This is a two-dimensional, 12-item standardized questionnaire by Z. Juczynski, used with younger and older adolescents to assess their self-efficacy [32]. In older adolescents (15–17 years), the tool has a two-dimensional structure and measures the strength to initiate behavior and the perseverance to sustain it. Psychometric analysis of our sample at baseline showed good reliability for the full scale (Cronbach's α = 0.757) and for the component scales (strength: α = 0.736; perseverance: α = 0.677). In other studies, the test–retest reliability of the scale in older adolescents was 0.51. The theoretical validity of the scale was tested and showed positive correlations with General Self-Efficacy [33] (*r* = 0.43) and the Coopersmith Self-Esteem Inventory (CSEI) [34,35] (*r* = 0.30).
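The reliability coefficients above are Cronbach's α, which for k items is α = k/(k − 1) · (1 − Σ item variances / variance of the total score). The sketch below implements this textbook formula with sample variances; it is an illustration rather than the software used in the study, the function name and data layout are hypothetical, and it assumes complete responses with no missing items.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k items answered by the same n respondents.

    items[j][i] is respondent i's score on item j (complete cases only).
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("every item needs the same number (>= 2) of respondents")
    # Total scale score of each respondent.
    totals = [sum(item[i] for item in items) for i in range(n)]
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum_item_variances / total_variance)
```

For a two-dimensional scale such as the one described here, the same function would be called once with all 12 items and once with each subscale's items.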

In the following description, instead of the national scale abbreviation (KompOs), the term self-efficacy is used. The general self-efficacy score (GSE), as well as two partial scores of strength and perseverance, were analyzed. The percentage of missing data in GSE was 6.6% and 6.2% in the first and third study, respectively.

#### 2.4.4. Body Weight Status

Results from the anthropometric measurements (body weight, height) conducted by school nurses before the intervention (November 2017) were used. BMI classification was made using WHO standards [28]. For the analysis, the BMI variable was recoded into two categories of body weight status: (1) overweight (overweight and obese categories) and (2) non-overweight (other categories).
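The binary recoding above can be sketched as follows, assuming the WHO 2007 growth reference for 5–19-year-olds, in which BMI-for-age more than +1 SD above the median counts as overweight and more than +2 SD as obesity. The z-score uses Cole's LMS transformation with age- and sex-specific L, M and S parameters that must be looked up in the WHO tables; the numbers in the usage example are placeholders, not WHO values. WHO's own tools reportedly apply an extra adjustment for z-scores beyond ±3 SD, which this sketch omits.

```python
import math


def lms_z_score(bmi: float, L: float, M: float, S: float) -> float:
    """Cole's LMS transformation of a measurement into a z-score."""
    if bmi <= 0 or M <= 0 or S <= 0:
        raise ValueError("bmi, M and S must be positive")
    if L == 0:
        return math.log(bmi / M) / S
    return ((bmi / M) ** L - 1) / (L * S)


def body_weight_status(bmi: float, L: float, M: float, S: float) -> str:
    """Binary status: 'overweight' (overweight or obese) vs 'non-overweight'.

    Assumes the WHO 2007 cut-off z > +1 for overweight; obesity (z > +2)
    falls into the same category, as in the recoding described above.
    """
    z = lms_z_score(bmi, L, M, S)
    return "overweight" if z > 1 else "non-overweight"


if __name__ == "__main__":
    # Placeholder L, M, S values for illustration only (NOT taken from WHO tables).
    print(body_weight_status(23.0, L=1.0, M=20.0, S=0.1))
    print(body_weight_status(21.0, L=1.0, M=20.0, S=0.1))
```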

#### *2.5. Statistical Analysis*

A combined analysis of independent and dependent observations resulting from repeated measurements, which is an approach commonly used in the case of mixed data, was applied.

HBI change was the main outcome variable. Changes were analyzed by comparing successive measurements and by examining the determinants of the changes themselves; the latter only required techniques for comparing independent samples. The most important variable was the HBI change between the first study and the follow-up three months after the intervention, because competence was measured simultaneously at these time points.

Because the HBI values and HBI changes were not normally distributed, non-parametric methods were used to compare two groups of girls (body weight status) and three groups (type of intervention): Wilcoxon and Kendall tests for dependent data, and Mann–Whitney and Kruskal–Wallis tests for independent data.
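Two of the rank-based statistics named above can be sketched in plain Python: the Mann–Whitney U statistic for two independent groups (for example, overweight vs non-overweight girls) and Kendall's W for the same girls measured in three rounds. This is an illustrative sketch, not the SPSS procedures used in the study; the function names are hypothetical, only the test statistics are computed, and p-values would come from statistical software (for example `scipy.stats.mannwhitneyu`).

```python
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    return ranks


def mann_whitney_u(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """U for group_a: pairs with a > b, counting ties as 1/2."""
    n_a = len(group_a)
    ranks = midranks(list(group_a) + list(group_b))
    return sum(ranks[:n_a]) - n_a * (n_a + 1) / 2


def kendalls_w(rows: Sequence[Sequence[float]]) -> float:
    """Kendall's W for N subjects (rows), each measured in the same k rounds.

    Values are ranked within each subject; the tie correction subtracts
    N * sum(t**3 - t) over within-subject tie groups from the denominator.
    """
    n_subjects = len(rows)
    k = len(rows[0])
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in rows:
        if len(row) != k:
            raise ValueError("every subject needs the same number of rounds")
        for j, r in enumerate(midranks(row)):
            rank_sums[j] += r
        counts = {}
        for v in row:
            counts[v] = counts.get(v, 0) + 1
        tie_term += sum(t ** 3 - t for t in counts.values())
    mean_rank_sum = n_subjects * (k + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    denominator = n_subjects ** 2 * (k ** 3 - k) - n_subjects * tie_term
    if denominator == 0:
        raise ValueError("W is undefined when every subject has all rounds tied")
    return 12 * s / denominator
```

W ranges from 0 (no systematic difference between rounds) to 1, and it is linked to the Friedman statistic by W = χ²/(N(k − 1)).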

The school effect was also examined by estimating the ICC (intraclass correlation coefficient). A mixed linear model with school as a random effect was used for this purpose. The ICC values for different types of interventions were compared separately for the absolute value of the HBI and the changes in this index.
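The ICC is the share of outcome variance lying between schools, τ²/(τ² + σ²), where τ² is the between-school variance and σ² the within-school variance. The study obtained these variance components from a mixed linear model; the sketch below swaps in the classic one-way ANOVA estimator (ICC(1), allowing unequal school sizes), which targets the same quantity but can differ slightly from a likelihood-based fit. The function name and input layout (one list of HBI changes per school) are hypothetical.

```python
from typing import Sequence


def icc_oneway(groups: Sequence[Sequence[float]]) -> float:
    """ICC(1) from a one-way ANOVA with schools as groups (unequal sizes allowed).

    Estimates tau^2 = (MSB - MSW) / n0 and returns tau^2 / (tau^2 + MSW),
    truncating tau^2 at 0 when MSB < MSW.
    """
    groups = [list(g) for g in groups if len(g) > 0]
    n_groups = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    if n_groups < 2 or n_total <= n_groups:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    # Effective group size for unbalanced designs.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_groups - 1)
    tau2 = max((ms_between - ms_within) / n0, 0.0)
    if tau2 + ms_within == 0:
        raise ValueError("all observations are identical")
    return tau2 / (tau2 + ms_within)
```

On this scale, an ICC of 0.012 means that about 1.2% of the variance in the outcome lies between schools.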

In a multifactor analysis, a generalized linear model was estimated (GENLIN procedure in IBM SPSS software, v.23). It is a method that does not impose strict conditions as to the distribution of the analyzed variables, allowing various types of variables to be included as predictors (binary, categorical, continuous) and enabling a transparent analysis of the interaction effect.

Three GENLIN models were estimated, describing the determinants of the HBI change between the Study 1 and Study 2, Study 2 and Study 3, and Study 1 and Study 3 evaluation surveys. After checking model variants, the following predictors were included: body weight status, type of intervention and the interaction between body weight status and the change in self-efficacy. The analyses of the HBI change were also adjusted for the initial HBI level and the self-efficacy score. The overall quality of the models was assessed with the omnibus test, which indicates whether the variance explained by the model is significantly greater than the unexplained variance.

#### **3. Results**

#### *3.1. HBI*

The mean scores of the HBI in all three study periods in the overall sample, by body weight status are presented in Table 3, and by the type of intervention group in Table S2.

The HBI in Study 1 did not differ by body weight status. In Study 2, it was slightly higher in overweight than in non-overweight girls, but the difference only approached significance (*p* = 0.052). Three months after the intervention (Study 3), overweight girls had significantly higher HBI scores than non-overweight girls (*p* < 0.01). The highest HBI scores were found in the full and null intervention groups in all three study rounds, and the lowest in the partial intervention group.

In the total sample, as well as in both body weight groups, HBI scores differed significantly between the three study rounds. Between baseline and three months after the completion of the Healthy Me program, the crude change in HBI was 0.026 (SD = 2.89). Girls with overweight or obesity showed an improvement (0.348 ± 3.17), whereas health behaviors worsened in girls without excess body mass.

#### *3.2. Self-Efficacy—Personal Competence*

Table 4 compares the distributions of the self-efficacy indices at the two available measurement points: the beginning of the Healthy Me program (Study 1) and the three-month follow-up (Study 3). A decreasing trend in GSE was observed, driven by a considerable deterioration in the strength dimension, with only slight changes in perseverance. Unfavorable changes were observed only in non-overweight girls; in the overweight or obese group, changes in the general index and sub-indices were not statistically significant. The two body weight groups did not differ considerably in the general score or in either dimension of the self-efficacy scale, either at the beginning of the program or three months after its completion.

Table S3 compares the results of non-parametric tests of the distributions of GSE and the domain scores in the three intervention groups. At the onset, girls from the schools covered by the full intervention achieved the best results, while these indices were lowest in the control group (null intervention). Differences were found only in Study 1, for GSE and the strength dimension; the third measurement point (three-month follow-up) revealed no significant differences between the intervention groups. Paired tests of the change in self-efficacy showed a significant deterioration in the full intervention group, again for the overall score and the strength dimension. A clear trend toward deterioration in perseverance was also found in the partial intervention group.

To test the initial hypothesis, HBI changes were examined according to the level of GSE change. By convention, a change of more than two points was taken to indicate deterioration or improvement. The groups representing worsening, no change and improvement in GSE comprised 32.5%, 39.9% and 27.6% of girls, respectively. The percentage of girls with improved GSE was 28.4% in the overweight group and 27.0% in the non-overweight group (*p* = 0.792). As Figure 2 shows, a significant change in GSE is associated with an improvement in dietary behavior and physical activity, as measured by the change in HBI. The impact of improved self-efficacy is more evident in overweight girls: in this group, a slight improvement in HBI was noted even with a GSE change of around zero.
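The three-category rule above (a GSE change of more than two points in either direction counts as deterioration or improvement) can be sketched as follows. The function names and the input format (pairs of baseline and follow-up GSE scores) are hypothetical; the sketch only illustrates the classification and the resulting group percentages.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def classify_gse_change(gse_before: float, gse_after: float, threshold: float = 2) -> str:
    """Label a GSE change; changes of exactly +/-threshold count as 'no change'."""
    delta = gse_after - gse_before
    if delta < -threshold:
        return "worsened"
    if delta > threshold:
        return "improved"
    return "no change"


def category_shares(pairs: Sequence[Tuple[float, float]], threshold: float = 2) -> Dict[str, float]:
    """Percentage of girls in each category, from (baseline, follow-up) pairs."""
    labels = Counter(classify_gse_change(b, a, threshold) for b, a in pairs)
    n = len(pairs)
    return {k: 100 * labels[k] / n for k in ("worsened", "no change", "improved")}
```

Applied to the whole sample, such shares would correspond to the 32.5%, 39.9% and 27.6% reported above.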

**Figure 2.** Change in HBI comparing baseline and 3 months' follow-up after intervention according to BMI group and change in self-efficacy (GSE).

#### *3.3. School Effect*

Taking the HBI change between the first and third evaluation studies as the most important outcome, we examined to what extent this change depends on local school conditions by calculating the ICC. In the whole sample of 48 schools, it equaled 0.012. For the individual types of intervention, it was 0.006 (full), 0.020 (partial) and 0.015 (null). This means that the proportion of variance in the HBI change lying between schools is very small and varies slightly with the type of intervention. At the same time, the low ICC value justifies forgoing multilevel analyses that account for the hierarchical data structure.

For comparison, the school effect on the absolute HBI value in the whole study group equaled 0.031 at the onset (Study 1), 0.044 in Study 2 and 0.039 in Study 3. An increase in the ICC may signal that schools did not implement the intervention program to an equal extent throughout its duration.

#### *3.4. Independent Predictors of the Change in HBI*

Table 5 shows the results of the estimation of generalized linear models in which the dependent variable was the HBI change, calculated on the basis of the results of different surveys (1 and 2; 2 and 3; 1 and 3).

The models were adjusted for the initial levels of the HBI and the self-efficacy score. The selected predictors described the fluctuations of the HBI changes well between the first and second measurement points and between the first and third points (deferred effect). The middle model (2 and 3) described the determinants of the HBI changes immediately after the end of the program only to a small extent. Overweight girls achieved significantly higher HBI gains than peers without excess body weight in both the first and the last models (Table 5). The intervention effect was best demonstrated in the last model, describing the change between the first and third surveys; with the partial intervention, the changes were less beneficial. A significant interaction between the changes in self-efficacy and body weight status was also shown. In the first and third models, improvement in personal competence contributed more to the increase in the HBI among girls with excess body weight. For example, when comparing the first and third measurement points, a one-unit increase in self-efficacy was associated with an HBI increase of 0.132 (*p* = 0.006) in the overweight and obese group. In non-overweight girls, the corresponding HBI increase was only 0.037, and this regression coefficient did not differ significantly from zero (*p* = 0.198).
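To make the interaction concrete, one way to write a model that yields a separate self-efficacy slope for each body weight group is shown below. This is an assumed parameterization (identity link, with the change in self-efficacy entering only through its interaction with body weight status), not the exact specification behind Table 5. Here OW = 1 for overweight or obese girls and 0 otherwise, **I** collects the intervention-type indicators, and the subscript 1 denotes Study 1 values.

$$
\Delta\mathrm{HBI} = \beta_0 + \beta_1\,\mathrm{OW} + \boldsymbol{\beta}_2^{\top}\mathbf{I} + \beta_3\,\mathrm{HBI}_1 + \beta_4\,\mathrm{GSE}_1 + \gamma_{\mathrm{OW}}\,\mathrm{OW}\,\Delta\mathrm{GSE} + \gamma_{\mathrm{NOW}}\,(1-\mathrm{OW})\,\Delta\mathrm{GSE} + \varepsilon
$$

With the reported Study 1 to Study 3 slopes, γ_OW = 0.132 and γ_NOW = 0.037, and all other terms held fixed, a 3-point GSE increase (the smallest integer change counted as an improvement under the more-than-two-points rule, assuming integer GSE scores) implies:

$$
3 \times 0.132 = 0.396 \ \text{(overweight)}, \qquad 3 \times 0.037 = 0.111 \ \text{(non-overweight)}, \qquad 0.396 - 0.111 = 0.285
$$

The non-overweight slope is not significantly different from zero, so the 0.111 figure should be read as a point estimate only.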


**Table 5.** Determinants of change in the HBI around the period of Healthy Me intervention identified by generalized linear models.

<sup>1</sup> HBI—health behavior index. <sup>2</sup> GSE—general index of self-efficacy.

On the basis of the above three models of the HBI change determinants, it is possible to estimate the theoretical values at the second and third measurement points in two groups of girls with different body weight statuses, starting from the actual initial value (Figure 3).

In both groups, the estimated HBI increased between the beginning and the end of the Healthy Me program and then decreased by the measurement three months after the end of the program (Study 3). This initial improvement in health behaviors was clearly greater in the overweight group. Comparing the first and third measurement points allows a conclusion about the effectiveness of the program as a tool for improving health behaviors. In girls without overweight or obesity, the program was less effective: the first and last measurements indicate a return to baseline and even a slight deterioration in the HBI. Attempts to devise alternative models did not lead to better results. Among other things, the independent influence of the partial self-efficacy indices (strength and perseverance) was studied, and attempts were made to include the main effect of self-efficacy in the model. The model including the interaction of body weight status with self-efficacy was considered optimal.

**Figure 3.** Changes in HBI in the three survey waves of the Healthy Me program by body weight status group, adjusted for type of intervention, initial values of HBI and self-efficacy, and the body weight status × self-efficacy interaction.

#### **4. Discussion**

In our research, we assumed that a change in the HBI may be related to a change in the level of personal competence among girls participating in the Healthy Me intervention program, which used mobile technologies. Our analyses confirmed this assumption: the change in the HBI was explained by the interaction of self-efficacy with body weight status. Although there was no positive change in health behavior among girls without excess weight, girls with excess body weight (overweight or obesity) achieved a better health behavior index score at the follow-up three months after the intervention.

Before the intervention, there were no observable differences in the health behavior index between overweight and non-overweight adolescent girls. By type of intervention, baseline values were slightly lower in the partial group than in the other two groups. Other studies among Polish schoolchildren also support our finding that a diet rich in beneficial products is not limited to adolescents without overweight or obesity, and is even more common among overweight or obese adolescents [36]. Conversely, some problems are more frequently observed in overweight teenagers than in their peers without excess body weight, such as skipping meals [37], eating fewer meals during the day [38] and lower physical activity [39,40].

The HBI changed by the second evaluation study, but three months after the end of the intervention it returned to a level close to baseline. In girls without overweight or obesity, the HBI score was slightly lower at the last measurement than at the first, whereas an improvement was observed in girls with excess body weight. Moreover, the deferred effect revealed a significant difference in the average HBI in favor of overweight and obese teenagers. This result is all the more interesting because our intervention was aimed not at adolescents at risk (overweight) but at the general population of 15-year-old girls, chosen because of the significant deterioration in health behaviors in this age group. The aim of the program was to assess whether the proposed intervention could help slow down the unfavorable trend before they reached the age of 15. As effects were observed in girls with excess body weight, it may be hypothesized that even when a program is not addressed to adolescents from risk groups (selective prevention), overweight participants may be more motivated to engage in it, which makes this group more receptive to its benefits [41]. Our findings, together with arguments put forward by other researchers, indicate the need for caution in drawing conclusions about intervention-induced changes in health behavior, especially in long-term programs carried out during development [42]. In particular, negative changes resulting from developmental factors should be taken into account. Moreover, in a longer program, the rate of change could be altered by repeated interim measurements, promotion of the program before it starts, overlapping parallel trends or a negative effect of withdrawal from the program.
Thus, the absence of a significant, positive change in the HBI in the general population of 15-year-old girls can be considered as a satisfactory result, taking into account developmental considerations, and a negative trend in health behaviors which increase with age, observed in other studies among girls [43].

At baseline, self-efficacy did not differ by intervention type or by body weight status. The main changes reflecting the impact of the Healthy Me program concerned the general score and the strength dimension of self-efficacy (the ability to initiate behavior), both of which decreased among participants in the full intervention group. In the partial intervention group, the perseverance dimension also deteriorated, which suggests that the multicomponent but moderate intervention had a particular effect on participants' conviction that they could sustain behaviors. This result might reflect an ongoing, dynamic process in which participants re-evaluated their self-efficacy during the program.

According to Social Cognitive Theory, building self-efficacy is one of the most effective strategies in health behavior change programs targeting diet and/or physical activity among children [44]. After their pilot study with overweight and obese school-age children, Jacobson and Mazurek Melnyk (2011) concluded that healthy lifestyle interventions including cognitive behavioral skill building may be key to strengthening children's healthy beliefs and facilitating healthy lifestyle choices and behaviors [45]. Morano et al. (2016) recommend that childhood obesity programs target psychosocial correlates of physical activity [46], the most crucial of which, as Kołoło et al. (2010) indicate, is self-efficacy [47]. Higher self-efficacy is related to better decision-making and goal achievement [22]. Therefore, girls who rate their self-efficacy highly are more likely to initiate (the strength dimension of self-efficacy) and sustain (the perseverance dimension) health behaviors.

Our study shows a significant interaction between self-efficacy and body weight: among overweight girls, improvement in self-efficacy resulted in better health behaviors. The interplay between self-efficacy and other social competences, and their mediating role in changing health behaviors, is well established in the literature. With regard to overweight and obesity in particular, Goffman's theory of spoiled identity [48] and subsequent randomized controlled studies of the effect of stigma on health behaviors [49] indicate that children with low social competence are at higher risk of obesogenic behaviors. This relationship was also confirmed in a national sample of American children, in which nine-year-olds with lower social competence were at higher risk of becoming overweight or obese by age 11 [50]. In groups with low social competence, attempts to avoid the stress caused by complex psychosocial factors, including negative feedback related to excess body weight, may manifest in unhealthy behaviors such as solitary or sedentary activities and unhealthy eating. According to Melnyk et al. (2009), psychosocial factors may inhibit healthy behaviors in adolescents or create barriers to them [51]. Conversely, Vila et al. (2004), using the Child Behavior Checklist, found that obese adolescents demonstrated significantly poorer social skills [52]. Together, these studies point to a reciprocal relationship among competences, body weight and health behaviors.

Additionally, the school effect was measured in the analysis. The effect of the school proved to be small, indicating a fairly consistent approach by schools to implementing the intervention activities. Interventions to improve health behavior are largely implemented in the school environment, and many of them have a positive impact on nutrition and physical activity [53]. Because of the availability of target groups as well as methods, resources and qualified staff, the school seems to be an ideal environment for health promotion and education [54]. However, much depends on the quality of the proposed interventions, the way they are implemented, the consistency of the activities [55], proper preparation of those delivering them, financial resources and the duration of the intervention [54,56]. In this regard, the small school effect obtained in our study may result from how schools were prepared to implement the intervention activities. Preparations included providing clear instructions, training those directly delivering the activities, using unified educational methods and content, and following a strictly determined sequence of activities. Moreover, the high competence of the physical education teachers responsible for coordinating and carrying out activities at schools may have contributed to the uniform implementation of the program in the school setting.
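A school effect of this kind is often summarized as an intraclass correlation (ICC), the quantity referred to as low ICC values in the limitations below: the share of outcome variance lying between schools rather than between pupils within a school. The sketch below uses the one-way ANOVA estimator ICC(1), with an adjusted average cluster size for schools of unequal size. It is an illustrative assumption, not the estimation procedure used in the study (which is not described here), and the school labels and scores are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def icc1(groups: dict[str, list[float]]) -> float:
    """One-way ANOVA estimate of ICC(1) for clusters of possibly unequal size.

    groups maps a (hypothetical) school label to the outcome scores of its pupils.
    """
    values = [x for scores in groups.values() for x in scores]
    n_total, n_groups = len(values), len(groups)
    grand_mean = mean(values)
    group_means = {name: mean(scores) for name, scores in groups.items()}

    # Between-school and within-school sums of squares
    ss_between = sum(len(s) * (group_means[n] - grand_mean) ** 2 for n, s in groups.items())
    ss_within = sum((x - group_means[n]) ** 2 for n, s in groups.items() for x in s)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)

    # Effective cluster size k0, which reduces to the common size for balanced data
    k0 = (n_total - sum(len(s) ** 2 for s in groups.values()) / n_total) / (n_groups - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


# Hypothetical scores: identical school means, so no between-school variance
print(icc1({"school_A": [1.0, 2.0, 3.0], "school_B": [1.0, 2.0, 3.0]}))  # -0.5
```

An ICC near zero means that knowing which school a girl attended explains little of the variation in her score; the estimator can fall slightly below zero when schools differ less than chance alone would produce.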

#### *Strengths and Limitations*

We are aware of some limitations of our study, which were partly due to the schedule of the Healthy Me program and partly to the assumptions adopted in this article. First, the final deferred effect of the program should be assessed over the long term. Second, one of the most important variables, the level of self-efficacy, was measured not immediately after the end of the program but three months after the end of the intervention. Third, only a few factors were considered in the analyses, which focused on differences in the level of the HBI and its changes in the groups of overweight and non-overweight girls. The Healthy Me program was implemented in a variety of environments (48 schools all over Poland), over a long period of time (a year), and covered a large group of girls (*n* = 1111). This environmental variability was undoubtedly an asset but also created additional limitations. In such a large and diverse group, it was difficult to control the involvement of individual schools in implementing the program, and clear differences between schools are evident in the results of qualitative studies and in participants' differing subjective evaluations of the program [24]. However, the low ICC values in this study may indicate fairly consistent implementation of the intervention by the schools involved. It was not possible to analyze changes in participants' diets in detail. The main outcome variable, the HBI, contained a strong nutritional component but was corrected to account for the level of physical activity. This version of the HBI was chosen because improving physical activity was the main focus of the Healthy Me intervention.

Despite these limitations, the analyses presented have a number of advantages and add to knowledge on the evaluation of multifaceted intervention programs. Attention was paid to the heterogeneity of the intervention group. It is commonly assumed that intervention components will benefit different subgroups of participants equally, in the hope of offering something for everyone. The main conclusion of these analyses is that girls with excess body weight benefited more from participating in the Healthy Me program. This was partly due to the change in their self-efficacy, which was at a relatively low level, but any improvement in it resulted in better health behaviors. Taking personal competences into account is one of the strengths of this program. Self-efficacy was measured with a robust tool designed for adolescents. Usually, motivational and strengthening factors are mentioned only as a theoretical basis for an intervention; in our program, self-efficacy was one of the components evaluated. In addition, we introduced an interaction effect into the statistical analyses, which is now considered an important part of the search for an optimal intervention model [57].
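The interaction effect mentioned above can be illustrated with a binary moderator. In a fully interacted linear model, outcome ~ self_efficacy + overweight + self_efficacy × overweight, the interaction coefficient equals the difference between the self-efficacy slopes fitted separately in the two weight-status groups. The sketch below relies on that equivalence. It is a simplified, hypothetical illustration (ordinary least squares, no school clustering, no other covariates), not the authors' statistical model, and all variable names and data are illustrative.

```python
from __future__ import annotations

from statistics import mean


def ols_slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope of y on x (simple linear regression)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def interaction_coefficient(x: list[float], y: list[float], group: list[int]) -> float:
    """Coefficient of x*group in y ~ x + group + x*group, for a 0/1 group.

    Equal to the slope of y on x among group == 1 minus the slope among group == 0.
    """
    x0 = [xi for xi, g in zip(x, group) if g == 0]
    y0 = [yi for yi, g in zip(y, group) if g == 0]
    x1 = [xi for xi, g in zip(x, group) if g == 1]
    y1 = [yi for yi, g in zip(y, group) if g == 1]
    return ols_slope(x1, y1) - ols_slope(x0, y0)


# Hypothetical data: slope 2 for non-overweight girls, slope 5 for overweight girls
self_efficacy = [0, 1, 2, 3, 0, 1, 2, 3]
overweight = [0, 0, 0, 0, 1, 1, 1, 1]
hbi = [1, 3, 5, 7, 3, 8, 13, 18]
print(interaction_coefficient(self_efficacy, hbi, overweight))  # 3.0
```

Fitting the two subgroup regressions separately gives the same estimate as the single interacted model, which makes the meaning of the term easy to see: it is how much steeper the self-efficacy gradient is for overweight girls. A real analysis would also need standard errors and adjustment for clustering by school.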

#### **5. Conclusions**

In summary, our results demonstrate a significant effect of self-efficacy, in interaction with body weight status, on improvements in eating behavior and physical activity among adolescent girls. The authors conclude that the positive impact of the intervention was stronger for overweight girls. Three months after completing the intervention, girls with excess body weight showed a higher level of favorable health behaviors than girls without excess body weight. Further work is certainly required to disentangle the complex indirect effects of interventions on health behavior change among adolescent girls. When analyzing the effects of such programs, it is necessary to take into account the multiple interrelationships between factors that may modify the effects obtained. Our paper opens new conceptual and practical directions in research on the evaluation of health interventions. First, effective interventions targeting adolescent girls should include a strengthened component for developing personal competences, whose growth appears to be most beneficial to girls at risk. Second, the level of change in personal competences should be monitored throughout the evaluation process. This goes well beyond treating psychological factors only as the theoretical basis for an intervention.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2072-6643/12/7/2128/s1. Table S1: Prevalence of health behavior by the type of intervention and body weight status (%). Table S2: Health Behavior Index (HBI) change in 3 study periods by the type of intervention. Table S3: Changes in the self-efficacy before and after the Healthy Me program by the type of intervention.

**Author Contributions:** Conceptualization, A.D., J.M., H.N., A.O., A.F.; methodology, J.M., A.D.; analysis, J.M., A.D.; writing—original draft preparation, A.D., J.M., H.N., A.O., A.F.; writing—review and editing, A.D., J.M., H.N., A.O., A.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Health Program of the Ministry of Health in Poland (Grant no. 6/7/K/6/NPZ/2017/106/622).

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript or in the decision to publish the results.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*

### **Parent Stress as a Consideration in Childhood Obesity Prevention: Results from the Guelph Family Health Study, a Pilot Randomized Controlled Trial**

**Valerie Hruska <sup>1</sup>, Gerarda Darlington <sup>2</sup>, Jess Haines <sup>3</sup> and David W. L. Ma <sup>1,\*</sup> on behalf of the Guelph Family Health Study**


Received: 7 May 2020; Accepted: 17 June 2020; Published: 19 June 2020

**Abstract:** Parents' stress is independently associated with increased child adiposity, and it may also interfere with childhood obesity prevention programs. The disruptions to the family dynamic caused by participating in a behaviour change intervention may exacerbate parent stress and undermine overall intervention efficacy. This study explored how family stress levels were affected by participation in a home-based obesity prevention intervention. Data were collected from 77 families (56 fathers, 77 mothers) participating in the Guelph Family Health Study (GFHS), a pilot randomized controlled trial of a home-based obesity prevention intervention. Four measures of stress were investigated: general life stress, parenting distress, depressive symptoms, and household chaos. Multiple linear regression was used to compare the level of stress between the intervention and control groups at post-intervention and 1-year follow-up, adjusted for baseline stress. Analyses for mothers and fathers were stratified, except for household chaos, which was measured at the family level. Results indicate no significant differences between intervention and control groups for any stress measure at any time point, suggesting a neutral effect of the GFHS intervention on family stress. Future work should investigate the components of family-based intervention protocols that make participation minimally burdensome and consider embedding specific stress-reduction messaging to promote family health and wellbeing.

**Keywords:** stress; mental health; family; health behavior; childhood obesity; health intervention

#### **1. Introduction**

Childhood overweight and obesity are associated with several health concerns, such as an increased risk of chronic illnesses like cardiovascular disease, type 2 diabetes and cancer, a reduced overall lifespan, and an increased risk of being bullied and of developing disordered eating habits owing to societal bias against those in larger bodies [1–4]. While there is a well-recognized genetic predisposition to body composition, childhood obesity prevention has focused mainly on health behaviours such as dietary patterns, physical activity, sedentary or screen-based time, and sleep quality. There appears to be a critical window of development in early childhood in which lifelong health behaviour patterns are largely established [5,6]. This makes early life an especially advantageous target for prevention programs seeking to maximize the benefit of healthful behavioural patterns. Parental involvement has repeatedly been shown to play a key role in the success of childhood obesity prevention programs [7–10]. These family-based behaviour change interventions typically focus on changing parenting practices and/or family behaviours, such as eating meals as a family or doing physical activities together. However, parents engaged in a home-based childhood obesity prevention program manage several roles: they are participants changing their own behaviours, they oversee their child's compliance, and they serve many other roles outside the intervention context. The competing demands on parents' time and resources are numerous and dynamic, which makes it especially complex to engage them effectively in childhood obesity prevention programs.

Parents' stress may be an additional important consideration for family-based childhood obesity prevention programs, for two reasons. First, past research has established cross-sectional associations between parent stress or household dysfunction and several child health outcomes, including behaviours such as increased screen viewing [11] and fast food consumption [12], as well as overall child weight status [12–15]. Second, parents who are overwhelmed may have difficulty adhering to an obesity prevention program, thus undermining the program's efficacy. It is well understood that family routines are an important contributor to family well-being and positively influence children's development [16–19], but participating in a family-based childhood obesity prevention program is likely to impose substantial changes on families' typical routines and activities. This perturbation of existing habits, even if intended to bring about healthful changes, may inadvertently disrupt the balance within the home. Alternatively, promoting new behaviours as part of healthful routines could help families establish more order and regularity within the home, thus decreasing overall family stress. The impact of health promotion programs on parents' wellbeing has not been widely explored.

In addition, dominant expectations of parenting place much more responsibility on mothers than on fathers for actively managing children's health and health behaviours [20,21]. Studies in Canada, the US, and Europe consistently demonstrate that, despite men's increasing involvement, women take on the bulk of responsibility for house and family work, including responsibility for the health and well-being of family members, organizing their children's lives, and planning and preparing meals [20,21]. Thus, family-based health interventions may inadvertently reinforce the gendered division of labour and could result in higher levels of stress among mothers than among fathers. Furthermore, perceptions and consequences of stress have repeatedly been shown to differ between males and females [22–26], which makes gender an important consideration when exploring how participation in a family-based intervention may influence family stress.

The purpose of this study was to investigate the longitudinal changes in parents' perceived general life stress, parenting distress, depressive symptoms, and household chaos as a function of participation in a family-based health promotion intervention program among a cohort of Canadian mothers and fathers of young children. This study also examined whether these changes in family stress were moderated by parent gender.

#### **2. Materials and Methods**

#### *2.1. Study Participants*

This study used data from Pilot Phases 1 and 2 of the Guelph Family Health Study (GFHS), a pilot randomized controlled trial of a home-based obesity prevention intervention (clinical trials registration number NCT02223234, University of Guelph Research Ethics Board REB14AP008). The primary aim of the pilot studies was to test the feasibility of the intervention and assessment protocols. Detailed procedures of the pilot are published elsewhere [27] and briefly summarized below. Participants were recruited using posters and rack cards displayed at local family health teams and early childhood education centres, as well as posts on these agencies' social media accounts. To be eligible, families had to have at least one child aged 18 months to 5 years, live in Wellington County, Ontario, Canada, with no plans to move in the following year, and have at least one parent able to respond to surveys in English.

Data for these analyses were collected at baseline, at 6 months (post-intervention) and at 18 months (1-year post-intervention). Participating families received grocery gift cards as compensation at each assessment time point.

#### *2.2. Exclusions and Losses to Follow-Up*

As shown in Figure 1, 151 parent participants from 86 families met eligibility criteria and were enrolled in the study, though three families (three mothers, one father) later declined to participate before completing baseline assessment. The remaining 83 families (147 parents; 83 mothers, 64 fathers) were randomized to the three treatment groups: two home visits with a health educator (2HV), four home visits with a health educator (4HV), and a minimal-attention control, the protocols of which are explained further below. One family (one mother) randomized to the 4HV group later declined to receive the intervention and was eventually lost to follow-up. The remaining 82 families (146 parents) completed all components of the intervention program, though five families (five mothers, six fathers) were later lost to follow-up, resulting in a 92.8% retention rate of the GFHS Pilot 1 and 2 cohorts at 1-year post-intervention. No harms of the intervention were detected.

**Figure 1.** Study design and participant flow of the analytic sample from the Guelph Family Health Study Pilot Phase 1 and Phase 2 parent participants.

In addition to the 11 participants who were lost to follow-up, two fathers did not complete baseline stress measures and were therefore excluded from this analytic sample. Thus, a final analytic sample of 133 parent participants (77 mothers, 56 fathers) from 77 families was used for these analyses.
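The participant counts above can be reconciled arithmetically. The sketch below recomputes the analytic sample from the numbers reported in this section. It assumes, since the text does not say so explicitly, that the 92.8% retention rate is computed per family against the 83 randomized families; that is the denominator that reproduces the reported figure.

```python
# Counts reported in Section 2.2 (Exclusions and Losses to Follow-Up)
enrolled = {"families": 86, "parents": 151}
declined_before_baseline = {"families": 3, "mothers": 3, "fathers": 1}
randomized = {"families": 83, "mothers": 83, "fathers": 64}
declined_intervention = {"families": 1, "mothers": 1, "fathers": 0}
lost_to_follow_up = {"families": 5, "mothers": 5, "fathers": 6}
fathers_without_baseline_stress = 2

# 86 enrolled families minus 3 = 83; 151 enrolled parents minus 4 = 147 = 83 + 64
assert enrolled["families"] - declined_before_baseline["families"] == randomized["families"]
assert (enrolled["parents"] - declined_before_baseline["mothers"]
        - declined_before_baseline["fathers"]) == randomized["mothers"] + randomized["fathers"]

retained = {key: randomized[key] - declined_intervention[key] - lost_to_follow_up[key]
            for key in randomized}

# Assumption: retention is per family, relative to the 83 randomized families
retention_pct = round(100 * retained["families"] / randomized["families"], 1)

analytic_mothers = retained["mothers"]
analytic_fathers = retained["fathers"] - fathers_without_baseline_stress

print(retained)                            # {'families': 77, 'mothers': 77, 'fathers': 58}
print(retention_pct)                       # 92.8
print(analytic_mothers, analytic_fathers)  # 77 56
```

The 11 participants lost to follow-up are the five mothers and six fathers in `lost_to_follow_up`; removing the two fathers without baseline stress data gives the analytic sample of 133 parents.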

#### *2.3. GFHS Intervention*

The GFHS was designed as a home-based childhood obesity prevention program, informed by the Family Systems [28] and Self Determination [29] theories. The program used motivational interviewing, a collaborative and client-centred counselling technique that increases the likelihood of successful behaviour change by providing families with a sense of autonomy, confidence, and support with respect to health goals that the families set for themselves. Suggested goals in the GFHS included increasing fruit and vegetable intake, replacing sugar-sweetened beverages with water, reducing screen time, establishing a bedtime routine to promote adequate sleep, encouraging physical activity, or another goal of the family's own creation. The intervention program was delivered by a health educator, a registered dietitian trained in motivational interviewing, who worked with the families to develop personalized and self-directed health goals and provided support throughout the 6-month intervention period. These sessions were held in the family's home and typically were an hour in duration. Complementary to the home visits were a series of emails and mailed materials tailored to the family's goals, such as colourful plates to encourage more family meals or children's books to encourage regular sleep routines. Full details of the intervention protocol have been published previously [27].

All participants completed the baseline assessment, including a series of surveys and health visits at the University of Guelph, where measurements such as height, weight, blood pressure, and body composition were taken by trained research assistants. After baseline assessment, families were randomized by the study coordinator into one of three parallel groups (in Pilot 1) or into one of two parallel groups (in Pilot 2) using a pseudo-random number generator. The three groups in Pilot 1 consisted of a minimal-attention control group (general health advice through monthly emails, such as current Canadian physical activity guidelines), a two home visit intervention group (home visits with a health educator, weekly emails, and monthly mailed incentives), and a four home visit intervention group (differing from the two home visit group only in the number of visits). In Pilot 2, families were randomized to control or four home visits based on early feedback from Pilot 1 participants that two home visits were not preferred. Baseline data were collected between December 2014 and November 2016 at the University of Guelph, Ontario, Canada; follow-up data collection was completed by November 2018.
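The allocation step can be sketched as simple randomization with a seeded pseudo-random number generator. This is an assumption for illustration only: the text does not specify the generator, the seed, or whether allocation was blocked or stratified, and the function name, family IDs and seed below are hypothetical.

```python
from __future__ import annotations

import random


def allocate(family_ids: list[str], arms: list[str], seed: int) -> dict[str, str]:
    """Assign each family to one arm by simple (unrestricted) randomization.

    Each family is allocated independently, so group sizes can be unequal.
    A fixed seed makes the allocation list reproducible for auditing.
    """
    rng = random.Random(seed)
    return {fid: rng.choice(arms) for fid in family_ids}


PILOT1_ARMS = ["control", "2HV", "4HV"]  # Pilot 1: three parallel groups
PILOT2_ARMS = ["control", "4HV"]         # Pilot 2: two parallel groups

families = [f"family_{i:02d}" for i in range(1, 7)]  # hypothetical IDs
allocation = allocate(families, PILOT2_ARMS, seed=2014)
print(allocation)
```

Simple randomization keeps allocation unpredictable but can leave small pilots unbalanced; permuted-block designs are a common alternative when equal group sizes matter.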

#### *2.4. Stress Measures*

Four different types of stress (general life stress, parenting distress, parental depression, and household chaos) were assessed via paper (*n* = 152) or online (*n* = 238) surveys. Data collection was conducted at baseline, then repeated post-intervention (6 months from baseline) and at 1-year post-intervention (18 months from baseline).

General life stress was examined with the question "Using a scale from 1 to 10, where 1 means 'no stress' and 10 means 'an extreme amount of stress', how much stress would you say you have experienced in the last year?" [12].

Levels of stress specific to the role of being a parent were examined using the 12-item Parent Distress subscale of the Parenting Stress Index (PSI) [30]. Participants were asked to respond on a 5-point Likert scale from 1 (strongly disagree) to 5 (strongly agree) to items such as "I often have the feeling that I cannot handle things very well", "I feel trapped by my responsibilities as a parent", and "Having a child has caused more problems than I expected in my relationship with my spouse (or male/female friend)". For parents who completed the paper version of the surveys, the response options were on a 4-point Likert scale (i.e., the neither disagree nor agree option was not included). This discrepancy between the paper and online response options was managed by recoding the paper survey responses as 1 = strongly disagree, 2 = disagree, 4 = agree, and 5 = strongly agree. Analyses combining the paper and online survey data gave results similar to those using only the online survey data; thus, paper and online survey results were combined for these analyses. A total score out of 60 was calculated by summing the responses; higher scores indicate greater parental distress. At baseline, standardized Cronbach's alpha in this sample was 0.86 for mothers and 0.78 for fathers. The PSI has been validated for use among both mothers [30] and fathers [31] of young children.
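The harmonization of paper and online responses described above can be written as a small scoring function. The mapping assumes that the 4-point paper options were coded 1 to 4 in order (the text gives only the target codes 1, 2, 4 and 5), and the function and constant names are hypothetical.

```python
from __future__ import annotations

# Assumption: the 4-point paper options were stored as 1-4 in the order
# strongly disagree, disagree, agree, strongly agree.
PAPER_TO_FIVE_POINT = {1: 1, 2: 2, 3: 4, 4: 5}
N_ITEMS = 12


def parent_distress_total(responses: list[int], mode: str) -> int:
    """Total Parent Distress score (12 items, range 12-60; higher = more distress).

    Paper responses are first recoded onto the online 5-point scale, leaving the
    neutral code 3 unused for paper respondents.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item responses")
    if mode == "paper":
        if any(r not in PAPER_TO_FIVE_POINT for r in responses):
            raise ValueError("paper responses must be coded 1-4")
        responses = [PAPER_TO_FIVE_POINT[r] for r in responses]
    elif mode == "online":
        if any(r not in range(1, 6) for r in responses):
            raise ValueError("online responses must be coded 1-5")
    else:
        raise ValueError("mode must be 'paper' or 'online'")
    return sum(responses)


print(parent_distress_total([3] * 12, "paper"))   # all "agree" on paper -> 48
print(parent_distress_total([3] * 12, "online"))  # all neutral online -> 36
```

Because paper respondents cannot choose the midpoint, their totals can only reach 36 by mixing agreeing and disagreeing answers, which is one reason the authors checked that combining the two modes did not change the results.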

Parental depressive symptoms were assessed with the Andresen short form of the Center for Epidemiologic Studies Depression Scale (CES-D) [32]. Items such as "My sleep was restless", "Everything I did was an effort", and "I felt fearful" were scored as 0 (less than one day in the last week), 1 (1–2 days), 2 (3–4 days), or 3 (5–7 days). A total score out of 30 was calculated by summing the responses; higher scores indicate more depressive symptoms. At baseline, standardized Cronbach's alpha in this sample was 0.87 for mothers and 0.80 for fathers.
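Scoring the Andresen short form is a plain sum over ten items, and the interpretation used in Section 3.2 (a score of 10 or greater indicating significant depressive symptoms) is a simple threshold. In the published short form, two positively worded items are normally reverse-scored; the text above does not mention this, so in the sketch below the positions to reverse are a caller-supplied parameter rather than an assumed item order. Function names are hypothetical.

```python
from __future__ import annotations

CUTOFF = 10  # a total of 10 or more indicates significant depressive symptoms


def cesd10_total(responses: list[int], reverse_items: tuple[int, ...] = ()) -> int:
    """Total CES-D-10 score (ten items coded 0-3, range 0-30).

    reverse_items lists 0-based positions of positively worded items, scored 3 - r.
    """
    if len(responses) != 10 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected ten responses coded 0-3")
    return sum(3 - r if i in reverse_items else r for i, r in enumerate(responses))


def above_cutoff(total: int) -> bool:
    return total >= CUTOFF


total = cesd10_total([1, 0, 1, 2, 0, 1, 0, 1, 0, 0])
print(total, above_cutoff(total))  # 6 False
```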

Household dynamics and chaos were examined using the 15-item Confusion, Hubbub, and Order Scale (CHAOS) [33]. This scale captures noise, disorganization, and confusion within the home environment. Participants responded to items such as "We almost always seemed to be rushed" or "It's a real zoo in our home" on a 4-point Likert scale from 1 (very much like your own home) to 4 (not at all like your own home). The CHAOS survey was completed only by Parent 1 in this sample (the first parent to sign up for the study, 76% of whom were female), and this was used as a family-wide measure. At baseline, standardized Cronbach's alpha for this scale was 0.88.
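The reliability coefficients reported in this section are standardized Cronbach's alphas, which depend only on the number of items k and the mean inter-item correlation r̄ through alpha = k·r̄ / (1 + (k − 1)·r̄). A minimal sketch of that textbook formula follows; how the authors computed it in their software is not stated, and the item data are hypothetical.

```python
from __future__ import annotations

from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation between two equally long lists of item responses."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def standardized_alpha(items: list[list[float]]) -> float:
    """Standardized Cronbach's alpha: k * r_bar / (1 + (k - 1) * r_bar).

    items holds one list of respondents' answers per item; r_bar is the mean
    Pearson correlation over all pairs of items.
    """
    k = len(items)
    r_bar = mean(pearson(a, b) for a, b in combinations(items, 2))
    return k * r_bar / (1 + (k - 1) * r_bar)


# Two hypothetical items answered by four respondents (r = 0.6)
print(round(standardized_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]), 2))  # 0.75
```

Because alpha rises with k for a fixed r̄, a 15-item scale such as CHAOS can reach 0.88 with fairly modest average inter-item correlations.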

#### *2.5. Statistical Methods*

In intent-to-treat complete-case analyses, we used multiple linear regression models to examine differences between the study groups (control, 2HV, and 4HV) in post-intervention and 1-year follow-up stress measures after controlling for baseline. Results for the 2HV and 4HV groups were not substantively different (see Table A1); thus, we present results with the two intervention groups combined. General stress, parenting distress, and depressive symptoms were analysed for each participant; household chaos was considered a shared, family-level measure and was analysed at the household level. Data from males and females were analysed separately to account for potential gender-based differences in stress perception [22–26] and to allow better comparison of these results with the predominantly mother-focused parenting research in the field [34]. Household chaos was examined as one observation per family, regardless of the gender of the parent who reported it. No demographic covariates were included in the models because, with a randomized design, any differences in demographic characteristics across study groups would be due to chance. Statistical analyses were performed using SAS University Edition Version 3.6 [35]. A *p*-value < 0.05 was considered statistically significant for all analyses.
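The baseline-adjusted comparison can be sketched as ordinary least squares of the follow-up score on a treatment indicator and the baseline score, solved through the normal equations. This is a minimal illustration assuming a single indicator (intervention groups combined, as in the text) and no other covariates; it returns only point estimates, the function names are hypothetical, and the data are made up so that the true treatment coefficient is −0.6.

```python
from __future__ import annotations


def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def baseline_adjusted_effect(post: list[float], baseline: list[float],
                             treated: list[int]) -> float:
    """Coefficient on treatment in post ~ intercept + treated + baseline."""
    X = [[1.0, float(t), float(b)] for t, b in zip(treated, baseline)]
    return ols(X, post)[1]


# Hypothetical data generated as post = 2 - 0.6 * treated + 0.8 * baseline
treated = [0, 0, 0, 1, 1, 1]
baseline = [5, 6, 8, 5, 7, 9]
post = [6.0, 6.8, 8.4, 5.4, 7.0, 8.6]
print(round(baseline_adjusted_effect(post, baseline, treated), 3))  # -0.6
```

Adjusting for the baseline score rather than analysing change scores keeps the comparison valid under randomization while usually improving precision.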

#### **3. Results**

#### *3.1. Descriptive Data*

As shown in Table 1, this analytic sample contained 56 fathers (42%) and 77 mothers (58%). The average age of participants at baseline was 35 years. Over 80% of participants identified as white and over 40% had received postgraduate education. Of the 77 participating families, approximately 85% had parents who were married, nearly 80% contained two or more children, and 45% had an annual household income of \$100,000 or more. Baseline characteristics (Table 1) and levels of stress (Table 2) were similar among the intervention and control groups.


**Table 1.** Baseline characteristics of parent participants in the Guelph Family Health Study.

**Table 2.** Linear regression results comparing intervention and control groups with respect to stress levels at post-intervention and at 1-year follow-up after controlling for baseline, stratified by parent gender. Household chaos model analysed at the family level (one observation per household).


<sup>1</sup> Linear regression coefficient after controlling for baseline.

#### *3.2. Mean Stress Levels*

As shown in Table 2, mothers and fathers reported moderate levels of stress on all measures at all time points and across all treatment groups. Across the three time points, mothers' mean general stress scores ranged from 6.0 to 6.6 out of a maximum score of 10, and fathers' from 5.9 to 6.8. Mothers' mean parenting distress scores ranged from 26.8 to 30.9 out of a maximum score of 60, corresponding to the 59th to 68th percentiles of the PSI scoring reference [30]; fathers' scores ranged from 27.5 to 29.0, corresponding to the 62nd to 64th percentiles. Mothers' depressive symptom scores ranged from 6.0 to 6.8 and fathers' from 6.1 to 7.9. Although these CES-D means may seem low relative to the maximum score of 30 points, they should be interpreted as moderate, given that a CES-D score of 10 or greater indicates significant depressive symptomatology consistent with clinical diagnosis [32]. Household chaos means ranged from 30.3 to 33.0 out of a maximum score of 60 points.

#### *3.3. Post-Intervention*

No intervention effect was observed for any of the stress measures among mothers or fathers at post-intervention after controlling for baseline measures. For general stress, mothers randomized to the intervention had a non-significant difference of −0.60 (95% CI: −1.47, 0.27) compared to control, after adjustment for baseline; among fathers, the difference was 0.56 (95% CI: −0.43, 1.56), also non-significant. For parenting distress, mothers randomized to the intervention had a non-significant difference of −0.62 (95% CI: −4.90, 3.65) compared to control after adjustment for baseline, and fathers a non-significant difference of −1.28 (95% CI: −4.60, 2.04). Differences in depressive symptoms followed a similar pattern, with no significant differences for either mothers or fathers: after adjustment for baseline, the difference between intervention and control was −0.57 (95% CI: −2.98, 1.84) among mothers and −0.91 (95% CI: −3.48, 1.67) among fathers.

At the family level, household chaos at post-intervention did not differ significantly between intervention and control families after adjustment for baseline (difference −0.65, 95% CI: −3.06, 1.77).

#### *3.4. 1-Year Follow-Up*

Similar to the results at post-intervention, no intervention effect was observed for any of the stress measures among mothers or fathers at 1-year follow-up after controlling for baseline (Table 2). For general stress, the difference between intervention and control among mothers was not significant (−0.15, 95% CI: −1.13, 0.83), nor was it among fathers (−0.90, 95% CI: −2.08, 0.27). For parenting distress, mothers randomized to the intervention had a non-significant difference of −1.92 (95% CI: −5.37, 1.53) compared to control; likewise, fathers randomized to the intervention did not differ significantly from control (−0.41, 95% CI: −4.56, 3.74). For depressive symptoms, the mean score of mothers randomized to the intervention was not significantly different from that of mothers randomized to the control (−0.92, 95% CI: −2.87, 1.04), and the comparison among fathers likewise yielded a non-significant difference of −0.70 (95% CI: −2.98, 1.58). All of these estimates are adjusted for baseline.

At 1-year follow-up, mean household chaos among families randomized to the intervention compared to the control resulted in a non-significant difference of −2.57 (95% CI: −5.34, 0.21) after controlling for baseline.
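The 1-year household chaos interval (−5.34, 0.21) only just includes zero, so it can help to translate a reported interval back into an approximate p-value. The sketch below assumes a normal-approximation (Wald) 95% interval, recovers the standard error as width / (2 × 1.96), and computes a two-sided p-value from the standard normal CDF. This is a back-of-the-envelope approximation, not a value reported by the authors; if the original intervals were t-based, as is common for small-sample linear regression, the exact p-values will differ somewhat.

```python
import math

# Standard normal quantile for a two-sided 95% interval (assumes a Wald-type CI).
Z_975 = 1.959964


def se_from_ci(lower: float, upper: float, z: float = Z_975) -> float:
    """Recover the standard error from a symmetric CI: width / (2 * z)."""
    return (upper - lower) / (2.0 * z)


def approx_p_from_ci(estimate: float, lower: float, upper: float) -> float:
    """Approximate two-sided p-value for H0: difference = 0 (normal approximation)."""
    z = abs(estimate) / se_from_ci(lower, upper)
    # Standard normal CDF expressed through the error function.
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)


if __name__ == "__main__":
    # 1-year follow-up values as reported in the text.
    print(f"household chaos:         p ~ {approx_p_from_ci(-2.57, -5.34, 0.21):.3f}")
    print(f"mothers, general stress: p ~ {approx_p_from_ci(-0.15, -1.13, 0.83):.3f}")
```

With these inputs the household chaos approximation comes out at roughly 0.07, consistent with an interval whose upper bound barely crosses zero, while mothers' general stress is far from significance.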

#### **4. Discussion**

The purpose of this study was to investigate differences in family-based stress between intervention and control groups at post-intervention and 1-year follow-up in a sample of Canadian mothers and fathers participating in the GFHS, a home-based obesity prevention randomized controlled trial. Our results suggest no harmful impact of the intervention program on the family environment across the four dimensions examined.

The GFHS pilot studies demonstrated success in increasing children's fruit and vegetable consumption [36] and at post-intervention, children and parents had lower indices of body fat [27,37]. This suggests that the GFHS intervention program did meaningfully change some family behaviours, but until the present study, it was unknown how these changes could impact families' stress levels.

The program may have encouraged families to adopt more structured, organized behavioural patterns focused on these health goals, thus calming the home environment and increasing parenting confidence; however, it is also possible that the program caused conflict or confusion by disrupting families' typical behaviours. Our results suggest that family stress levels did not differ between intervention and control families, despite evidence that behavioural changes did occur among both parents and children [27,36,37].

There are several potential explanations for these results. The GFHS intervention was carefully designed to minimize participant burden: health educator visits took place in the family's home rather than at a research centre, online surveys allowed more convenient completion, and families received financial compensation for their time. Thus, participation in the study may not have been particularly burdensome to families. In addition, the use of motivational interviewing, a client-centred counselling technique that empowers participants to choose their own goals and strategies to achieve them [38], may have reduced the burden on participants compared with more expert-led intervention techniques. Disentangling which characteristics of the intervention protocol contributed to these effects would require further research, but all of these factors likely had an influence.

The current body of evidence on household stress is based mostly on clinical populations such as children with behavioural problems, developmental delays, or chronic illness [39–42], or special interest family situations such as parents who are military service members or incarcerated [43–45], including the few studies that have examined family stress over the course of an intervention program [46–49]. This study extends the literature by providing insight into the impact of a home-based health intervention on the family environment in a community-based, non-clinical sample of families. Additionally, our inclusion of both mothers' and fathers' perceptions addresses a substantial gap in the literature [34]. The present study also includes follow-up beyond the post-intervention period to better understand the nature of these associations.

Despite this study's many strengths, there are some limitations that merit consideration. First, these analyses are based on a small cohort of families because the GFHS pilot was not designed as a fully powered study; thus, important effects may not have been detected. Second, with respect to the general stress measure, a single item may not be sufficient to capture the many dimensions of everyday stress. Third, our protocol asks only Parent 1 (defined as the first parent to enrol in the study) to complete items relating to the household; as such, perceptions of household chaos may differ between cohabitants. Fourth, the majority of families in our sample identified as Caucasian and nearly half had an annual household income of over \$100,000, which limits the generalizability of our results. Additional research with a diverse sample of families is needed because the socio-cultural environment, including ethnic and economic factors, is an important consideration for parenting practices and family stress [50–52]. Finally, while any differences in stress due to the intervention would most likely be evident in the post-intervention period, a longer follow-up period may be needed to reveal the true nature of these associations. Continued longer-term monitoring of participants' stress may also be an important consideration for retention in the study. Indeed, any family-focused or home-based intervention program should consider how disruptions to the family dynamic may influence participants' willingness to adhere to the program.

#### **5. Conclusions**

The GFHS has several behaviour change goals aimed at preventing childhood obesity; however, reducing family stress levels was not among the primary intentions of the program. While these results show no differences in family stress between the intervention and control groups, the overall mean stress levels seen here indicate that families may benefit from intervention strategies specifically aimed at reducing family stress. Program designs that integrate family physical and mental health promotion should be further investigated. In conclusion, these results demonstrate a need for continued research into how home-based health interventions influence the family environment. In particular, there is a need for intervention programs that incorporate specific stress-reduction messaging into family health programs.

**Author Contributions:** Conceptualization, V.H. and J.H.; Formal Analysis, V.H.; Writing—Original Draft Preparation, V.H.; Writing—Review & Editing, V.H., G.D., J.H., and D.W.L.M.; Visualization, V.H.; Supervision, J.H. and D.W.L.M.; Project Administration, J.H. and D.W.L.M.; Funding Acquisition, J.H. and D.W.L.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partially supported by the University of Guelph's Health for Life Initiative.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **Appendix A**

**Table A1.** Linear regression results comparing intervention (two home visit and four home visit groups) and control groups with respect to stress levels at post-intervention and at 1-year follow-up after controlling for baseline, stratified by parent gender. Household chaos model analysed at the family level (one observation per household).




<sup>1</sup> Linear regression coefficient after controlling for baseline.
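Table A1 reports linear regression coefficients for the intervention indicator after controlling for baseline, that is, an ANCOVA-style model of the form post-score ~ intercept + group + baseline score. The Python/NumPy sketch below fits such a model by ordinary least squares and returns the group coefficient with a large-sample 95% CI. It assumes a single pooled intervention indicator (two- and four-visit groups coded 1, control coded 0) and no other covariates; the authors' exact specification and software are not described in this section, and the function names and simulated data are hypothetical.

```python
import numpy as np


def baseline_adjusted_difference(post, baseline, group):
    """Fit post ~ 1 + group + baseline by ordinary least squares.

    Returns (estimate, standard_error) for the group coefficient, i.e. the
    intervention-minus-control difference at follow-up adjusted for baseline.
    `group` is 1 for intervention families and 0 for control families.
    """
    y = np.asarray(post, dtype=float)
    X = np.column_stack([
        np.ones_like(y),                    # intercept
        np.asarray(group, dtype=float),     # intervention indicator
        np.asarray(baseline, dtype=float),  # baseline score (covariate)
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Classical OLS covariance: sigma^2 * (X'X)^-1, with sigma^2 = RSS / dof.
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = float(resid @ resid) / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(coef[1]), float(np.sqrt(cov[1, 1]))


def wald_ci(estimate, se, z=1.959964):
    """Large-sample 95% CI; a t quantile would be more appropriate for small n."""
    return estimate - z * se, estimate + z * se


if __name__ == "__main__":
    # Simulated (hypothetical) data with a true adjusted difference of -1.0.
    rng = np.random.default_rng(seed=1)
    n = 200
    group = rng.integers(0, 2, size=n)
    baseline = rng.normal(20.0, 5.0, size=n)
    post = 5.0 - 1.0 * group + 0.7 * baseline + rng.normal(0.0, 2.0, size=n)

    est, se = baseline_adjusted_difference(post, baseline, group)
    lo, hi = wald_ci(est, se)
    print(f"adjusted difference = {est:.2f} (95% CI: {lo:.2f}, {hi:.2f})")
```

Adjusting for the baseline score, rather than comparing raw follow-up means, removes pre-existing differences between groups and usually tightens the interval; the household chaos model in Table A1 would use one row per household, while the parent-level models would be fitted separately for mothers and fathers.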

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

*Article*
