**1. Introduction**

In recent decades, industrial emissions, domestic waste, and the overuse of pesticides and fertilizers have caused serious environmental pollution, which has been confirmed as an important factor causing alarming deterioration in public health [1–5]. In particular, food contamination arising from soil and water pollution has been reported to be involved in almost all types of gastrointestinal diseases [6–8]. However, modeling the effects of food contamination on gastrointestinal morbidity is still a challenging task because the pathogenic mechanisms of gastrointestinal diseases are very complex, the number of contaminants is large, and the pathogenic roles of contaminants in the diseases are often unknown or uncertain.

There are numerous studies on the effects of environmental pollution on public health. A majority of studies have been devoted to the relationships between air pollution and respiratory diseases. Using logistic regression and weighted linear regression, Zhang et al. [9] examined the association between children's respiratory morbidity prevalence and district-specific ambient levels of main air pollutants in four Chinese cities, and their results showed that morbidity prevalence was positively associated with the levels of NOx, SO2, and coarse particles. Jayaraman and Nidhi [10] used a generalized additive Poisson regression model to evaluate the association between air pollutants and daily variations in respiratory morbidity in Delhi in 2004–2005. Based on a log-linear Poisson regression model, Sousa et al. [11] performed time-series analysis to assess the impact of air pollution on emergency hospitalization for respiratory disease in Rio de Janeiro, Brazil, in 2000–2005. Zhao et al. [12] used a time-series model with a quasi-Poisson link to examine the association between PM pollution and respiratory morbidities in Dongguan City, China, in 2013–2015. Qiu et al. [13] used a similar approach to estimate the short-term effects of ambient air pollutants (PM10, PM2.5, NO2, and SO2) on hospital admissions for overall and cause-specific respiratory diseases in 17 cities of Sichuan Province, China, during 2015–2016. Although such regression models can demonstrate the associations between pollution and diseases, they are often incapable of providing sufficiently accurate morbidity prediction for healthcare management.

To overcome the limitations of classical multivariable linear and logistic models in handling multifactorial effects, Bibi et al. [14] used an artificial neural network (ANN) to predict the effect of atmospheric changes on emergency department visits for respiratory symptoms. The results showed that the average prediction error of the ANN on the test set was much lower than that of the classical models. Wang et al. [15] applied the Granger causality method to identify the main air pollutants correlated with the mortality of respiratory diseases, and then constructed an ANN model for respiratory mortality prediction in Beijing during 2005–2008, which also achieved higher accuracy than classical correlation-analysis methods. Junk et al. [16] used an ANN to predict the mortality rates of respiratory diseases associated with air pollution under different weather conditions in Western Europe. Moustris et al. [17] developed an ANN model to predict the weekly number of childhood asthma admissions in the greater Athens area in Greece from ambient air-pollution data during 2001–2004. Zhu et al. [18] studied the effects of air pollutants on lower respiratory disease in Lanzhou City, China, during 2001–2005, and constructed an ANN based on a group method of data handling to forecast the number of patients in a hospital. Sundaram et al. [19] developed an Elman neural network to predict respiratory mortality and cardiovascular mortality from a set of air-pollution indicators, and the results indicated that this dynamic ANN performed well on time-series prediction. Recently, Liu et al. [20] employed long short-term memory recurrent neural networks to forecast influenza trends from multiple data sources, including virologic surveillance, influenza geographic spread, Google Trends, climate, and air pollution; their results also exhibited high prediction accuracy.

Although it is known that many diseases are related to food contamination, studies on their correlations are relatively few, mainly because the number of food contaminants is much larger than the number of air pollutants, and thus classical regression methods and shallow ANNs become inefficient in handling complex correlations in such a high-dimensional feature space. Recently, deep neural networks (DNNs) have emerged as a powerful tool for modeling complex probabilistic distributions over a large number of influence factors by automatically discovering intermediate abstractions, layer by layer. Song et al. [21] developed a DNN based on a denoising autoencoder [22] to predict gastrointestinal-infection morbidity from food-contamination data in four counties in China during 2015–2016, and the results showed that the deep-learning model had significantly higher prediction accuracy than shallow ANNs. However, their work considered only the aggregate morbidity of all acute gastrointestinal infections; that is, it neither considered other gastrointestinal diseases, such as chronic gastritis and gastrointestinal tumors, nor differentiated the morbidities of different gastrointestinal infections, such as acute gastritis and dysentery.

This study investigates the effects of food contamination on six main gastrointestinal diseases: acute gastroenteritis, chronic gastroenteritis, gastrointestinal ulcers, gastrointestinal tumors, food poisoning, and other acute gastrointestinal infections. We employed five methods for correlation analysis and gastrointestinal-morbidity prediction: multiple linear regression (MLR), a three-layer feed-forward ANN, a deep belief network (DBN) [23], a deep autoencoder (DAE), and a deep denoising autoencoder (DDAE) [22]. For each of the three deep-learning methods (DBN, DAE, and DDAE), we constructed two models, one using the basic gradient-based training algorithm and the other using an evolutionary training algorithm. The results showed that the deep-learning models achieved significantly higher accuracies than the MLR and shallow ANN models, and that the DDAE with evolutionary training exhibited the highest prediction accuracy.
