*2.1. Study Area Description and Dataset Analysis*

The Jinjiang River is 182 km long, with a watershed area of 5629 square kilometers, an average slope of 0.19%, and an average annual runoff of 5.13 billion cubic meters. It is the largest river in Quanzhou and the third largest river in Fujian Province. Figure 1 shows the geographical location of the Jinjiang River.

**Figure 1.** Geographical overview of Jinjiang River basin.

The Jinjiang River has two tributaries, the east stream and the west stream. The west stream, which is the source of the Jinjiang River, is 153 km long, with a watershed area of 3101 square kilometers and an average annual runoff of 3.65 billion cubic meters. The east stream originates at the southern foot of Xueshan Mountain in Jindou, Yongchun. It is 120 km long, with a watershed area of 1917 square kilometers and an average annual runoff of 1.4 billion cubic meters. Quanzhou City, through which the Jinjiang River flows downstream, is one of the most economically developed regions in Fujian Province. Located in the southeastern part of the province, Quanzhou is one of its three central cities, and its total economic output has ranked first in Fujian Province for 22 consecutive years. In 2020, the city's population was over 7 million, the largest in the province. As the Jinjiang River basin covers 53.8% of Quanzhou's land area, water resources are very important for the city's sustainable development.

At the same time, the Jinjiang River basin has a serious pollution problem [45,46]. The traditional industrial development model has caused great damage to local sustainable development: the pressure on the water environment is increasing, pollution from some enterprises is rebounding, the construction of environmental protection infrastructure is lagging behind, and the proportion of domestic pollution sources is increasing day by day. Therefore, the accurate prediction of water quality in the Jinjiang River basin will provide crucial decision support for future pollution control programs.

The dataset used in this study was taken from the weekly reports of automatic water quality monitoring at the Shilong section of the Jinjiang River basin. Among the many water quality evaluation indexes, we selected dissolved oxygen (DO), permanganate index (CODMn), ammonia nitrogen (NH3-N), and total phosphorus (TP), which are the four most representative indexes of the research object. The data were collected from 7 January 2013 to 21 June 2021 and updated once a week, giving a total of 443 groups of data. We used the first 421 groups of data as the training set and the last 22 groups as the test set. The dataset is shown in Figure 2.
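The chronological split described above can be illustrated with a minimal sketch. The function name `chronological_split` and the placeholder list `weekly` are hypothetical and stand in for the 443 weekly records; this is not the code used in the study.

```python
def chronological_split(records, n_test=22):
    """Split time-ordered records into (train, test).

    The last n_test records form the test set, so the model is
    never trained on observations that follow the test period.
    """
    if not 0 < n_test < len(records):
        raise ValueError("n_test must be between 1 and len(records) - 1")
    return records[:-n_test], records[-n_test:]


# Placeholder for the 443 weekly groups of data (indices only, no real values).
weekly = list(range(443))
train, test = chronological_split(weekly, n_test=22)
```

Because the records are ordered in time, the split is taken from the tail of the series rather than sampled at random, which keeps the test period strictly after the training period.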

**Figure 2.** Overview of the dataset. (**a**) Dissolved oxygen (DO); (**b**) CODMn; (**c**) NH3-N; (**d**) Total phosphorus (TP).

Next, we analyzed the dataset and identified its missing values. The results are shown in Table 2.
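A minimal sketch of such an analysis for a single index is given below. It assumes that missing observations are stored as `None` or NaN. The helper `describe`, the choice of statistics, and the sample values are illustrative assumptions, not the contents of Table 2.

```python
import math
import statistics


def describe(values):
    """Count missing entries and summarize the remaining observations."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),  # sample standard deviation
        "min": min(present),
        "max": max(present),
    }


# Invented values for illustration only (not taken from the monitoring data).
sample_do = [7.0, 8.0, None, 9.0, float("nan")]
summary = describe(sample_do)
```

Here `summary["missing"]` gives the number of gaps in the series, and the other entries are computed from the observed values only.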

Then, we used Pearson's correlation coefficient to analyze the correlation between the datasets. The results are shown in Table 3. The DO dataset was negatively correlated with the CODMn, TP, and NH3-N datasets; the CODMn dataset showed a weak positive correlation with the TP dataset and a significant positive correlation with the NH3-N dataset; and the TP dataset showed a significant positive correlation with the NH3-N dataset.
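For reference, Pearson's correlation coefficient between two equally long series is their covariance divided by the product of their standard deviations. The sketch below implements this definition directly. The function name `pearson_r` and the two short series are invented for illustration and do not reproduce the values in Table 3.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and the two root sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)


# Invented series: one index falls steadily while the other rises.
do_values = [8.0, 7.0, 6.0, 5.0]
nh3_values = [0.2, 0.4, 0.6, 0.8]
r = pearson_r(do_values, nh3_values)  # close to -1 for this perfectly inverse pair
```

The coefficient ranges from -1 to 1: negative values correspond to the inverse relationship reported between DO and the other three indexes, and positive values to the relationships among CODMn, TP, and NH3-N.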


**Table 2.** Descriptive statistics of experimental dataset.

**Table 3.** Correlation coefficients for each dataset.

