*2.1. Dataset Explanation*

This study analyzes data on eating habits and health to estimate the prevalence of obesity among individuals from Mexico, Peru, and Colombia. The dataset consists of 2111 records and 17 attributes. Each record is labeled through the class variable NObesity (obesity level), which takes one of seven values: insufficient weight, normal weight, overweight level I, overweight level II, obesity type I, obesity type II, and obesity type III. A web platform collected 23% of the data directly from users, while the remaining 77% were generated synthetically using the Weka tool and its SMOTE filter.

The attributes fall into three groups. **Food intake indicators:** FAVC (frequent consumption of high-calorie foods), FCVC (frequency of vegetable consumption), NCP (number of main meals), CAEC (food intake between meals), CH2O (daily water intake), and CALC (alcohol intake). **Body attributes:** TUE (time spent using technological devices), FAF (frequency of regular exercise), SCC (calorie intake monitoring), and MTRANS (mode of transportation used). **Other attributes:** gender, age, height, weight, smoking habit, and family history.
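
The grouping above can be written down as a small lookup table, which also serves as a consistency check on the stated attribute count. This is only an illustrative sketch: the acronyms come from the text, while the lowercase names for the "other" attributes (such as `family_history`) and the dictionary keys are placeholders, not necessarily the column names in the data file.

```python
# Attribute groups as described in the text. The acronyms are taken from the
# section; the lowercase names and the group keys are illustrative placeholders.
ATTRIBUTE_GROUPS = {
    "food_intake": ["FAVC", "FCVC", "NCP", "CAEC", "CH2O", "CALC"],
    "body": ["TUE", "FAF", "SCC", "MTRANS"],
    "other": ["gender", "age", "height", "weight", "smoke", "family_history"],
}

# Class variable holding the obesity level of each record.
CLASS_VARIABLE = "NObesity"

# Number of input attributes across the three groups.
n_features = sum(len(names) for names in ATTRIBUTE_GROUPS.values())

# Input attributes plus the class variable.
n_attributes = n_features + 1

print(n_features, n_attributes)
```

Counting the three groups gives 16 input attributes, which together with the class variable matches the 17 attributes reported for the dataset.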

$$\mathrm{BMI} = \frac{\mathrm{weight}}{\mathrm{height}^2} \tag{1}$$

Each record was assigned an obesity level according to the individual's body mass index (BMI), computed as in Equation (1) with weight in kilograms and height in meters; the results were compared with the reference values provided by the WHO and the Mexican official standards.
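
As a minimal sketch of how Equation (1) translates into a label, the snippet below computes BMI and maps it to a category. The cut-offs used are the standard WHO adult thresholds (18.5, 25, 30, 35, 40), which is an assumption about the labeling rule rather than something stated in this section. In particular, the boundary between overweight levels I and II is not given in the text, so the sketch returns a single "Overweight" class; the function names `bmi` and `who_category` are illustrative.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index as in Equation (1): weight (kg) / height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# (exclusive upper bound, label) pairs using the standard WHO adult cut-offs.
# Assumption: the dataset's split of overweight into levels I and II is not
# described in the text, so both levels are merged into one label here.
WHO_CUTOFFS = [
    (18.5, "Insufficient weight"),
    (25.0, "Normal weight"),
    (30.0, "Overweight"),
    (35.0, "Obesity type I"),
    (40.0, "Obesity type II"),
]


def who_category(bmi_value: float) -> str:
    """Return the first category whose upper bound exceeds the BMI value."""
    for upper_bound, label in WHO_CUTOFFS:
        if bmi_value < upper_bound:
            return label
    return "Obesity type III"


# Example: 70 kg at 1.75 m gives a BMI of about 22.9.
example_bmi = bmi(70.0, 1.75)
print(round(example_bmi, 1), who_category(example_bmi))
```

Keeping the thresholds in a single table makes it easy to substitute a different standard, such as the Mexican reference values mentioned above, without touching the lookup logic.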


BMI is considered a poor indicator of the proportion of body fat because it depends on age and does not account for how fat is distributed across different body sites. Therefore, a detailed analysis of individual eating habits, physical activity, and other attributes is needed to understand obesity better.
