**1. Introduction**

Accurate weather forecasting is crucial for social and economic development and for human activities and safety. With the rapid development of atmospheric modeling, observation systems, and high-performance computing, the capability and accuracy of numerical weather prediction (NWP) have improved significantly [1]. Nevertheless, owing to the chaotic nature of weather processes and unavoidable uncertainties in the various numerical model components, numerical weather forecasts still contain significant errors. Correcting these forecast errors is therefore necessary to improve the utility of the model outputs. Several statistical post-processing techniques have been developed for model forecast correction. Among them, model output statistics (MOS) [2] and the perfect prognosis (PP) method [3] are widely used in current numerical model forecast correction. While the PP approach achieves a correction by establishing a linear statistical relationship between observations and the NWP model analysis, the MOS method pairs observation data with the output of the NWP model and then obtains the correction based on linear regression. In addition, the Kalman filter technique has also been applied for bias correction; unlike MOS, the Kalman filter adjusts its filter coefficients in real time [4,5]. An analog ensemble method proposed in [6] showed improved capability and has been successfully applied to wind and solar energy forecasting [7–9].

**Citation:** Qin, Y.; Liu, Y.; Jiang, X.; Yang, L.; Xu, H.; Shi, Y.; Huo, Z. Grid-to-Point Deep-Learning Error Correction for the Surface Weather Forecasts of a Fine-Scale Numerical Weather Prediction System. *Atmosphere* **2023**, *14*, 145. https://doi.org/10.3390/atmos14010145

Academic Editors: Xiaolei Zou and Zhemin Tan

Received: 8 December 2022; Revised: 28 December 2022; Accepted: 30 December 2022; Published: 9 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In addition to statistical model output post-processing, several machine learning techniques have been investigated and have demonstrated benefits and great potential [10]. Li et al. [11] proposed a model output machine learning (MOML) scheme that uses multiple linear regression and random forest methods to correct the 2 m temperature of the ECMWF model for the Beijing area. Compared with MOS, which corrects individual station sites separately, MOML incorporates the spatial and temporal structure of the gridded data. Cho et al. [12] used machine learning methods, including random forests, support vector regression, artificial neural networks, and multimodel ensembles, to establish statistical relationships between predictors and predictands to correct model forecasts of extreme temperatures in urban areas, demonstrating some ability of machine learning algorithms to model the nonlinearities of weather processes.

In the last decade, convolutional neural network (CNN)-based deep-learning technology has made significant strides and offers a natural upgrade to the traditional model output post-processing methods. Rasp et al. [13] proposed a neural-network-based model for correcting 2 m temperature forecasts in Germany. Han et al. [14] proposed a CU-net model to correct the gridded forecasts of four weather variables of the European Centre for Medium-Range Weather Forecasts Integrated Forecasting System (ECMWF-IFS) global model: 2 m temperature, 2 m relative humidity, 10 m wind speed, and 10 m wind direction. Their approach recast post-processing as an image transformation problem in the context of image processing. Zhang et al. [15] constructed Dense-CUnet and Fuse-CUnet models based on the CU-net model of Han et al. [14]; by introducing a dense convolution module and a variety of meteorological elements and terrain features into the model, they improved upon the results of Han et al. [14].

In the last decade, fine-grid numerical weather forecasts with grid intervals of 1–3 km have become popular. It is therefore desirable to explore deep-learning approaches that extract the meso- and small-scale features of the weather circulations simulated by high-resolution numerical models and apply them to model forecast error correction. In theory, the multiscale features of weather circulations contain information about model forecast errors that traditional error correction models, based on single-point time-sequence data, cannot capture. In this study, we developed a grid-to-multipoint (G2N) deep-learning model for correcting the 2 m temperature, 2 m relative humidity, and 10 m wind speed forecasts of a rapid-updating, high-resolution weather model (PRUFS: Precision Rapid Updated Forecast System) at multiple weather stations in East China. Sensitivity tests were conducted to study the impact of the scale of the input area (the model grids) and of the target area, i.e., the number of target stations. The former helps to determine the scale of the mesoscale circulation features that yields optimal error corrections for the target stations, and the latter assesses the effect of the number of target stations corrected simultaneously with multitask learning.

#### **2. Data and Method**

#### *2.1. Data Description*

PRUFS is a precision rapid update forecasting system based on the U.S. Weather Research and Forecasting (WRF) model and four-dimensional data assimilation (FDDA) technology [16]. The model uses the analysis and forecast fields of the NOAA/NCEP Global Forecast System (GFS) to generate boundary conditions and assimilates various regional observations during model initialization. PRUFS has run with four nested levels, with a 3 km grid covering east-central China (Figure 1), since the beginning of 2020. The system runs in hourly cycles, and in each cycle it generates 24 h forecasts with output at hourly intervals.

**Figure 1.** The 3 km horizontal resolution area of PRUFS and the distribution of the 311 automatic weather stations (black dots). The background color shows the terrain height, and the red star marks the station "Jintan", discussed in a later section.

The weather observations were obtained from the state standard ground-based weather stations of the China Meteorological Administration. In this paper, the grid forecasts of the PRUFS 3 km domain are corrected using these ground-based meteorological station data. The PRUFS model output and observations were collected for the period from August 2020 to December 2021, excluding the equipment maintenance period of September–October 2021. The selected computational domain is (*lon* ∈ [113.5° E, 125.5° E], *lat* ∈ [27.3° N, 36.3° N]), with 301 × 401 grid points. The observation sites and topographic information are shown in Figure 1. The meteorological elements to be corrected are the 24 h hourly forecasts of 2 m temperature (*T*2), 2 m relative humidity (*RH*2), and 10 m wind speed (*W*10) generated by PRUFS. With 24 forecast cycles each day, there is a total of 576 model forecast fields (i.e., 24 × 24) per day.

#### *2.2. G2N Model*

Unlike the traditional MOS and PP models, which are based on individual-station time-sequence weather observations and the model forecast values interpolated to the weather stations (i.e., one-to-one), G2N uses the two-dimensional gridded meteorological information of the model forecasts. Thus, G2N can exploit the two-dimensional gridded meteorological structures (G) of the model forecasts for multiple-site (N sites) weather forecast error correction (hence G2N). The gridded model forecast variables can be treated as images in the field of image processing, with the forecast at each grid point corresponding to a pixel. Therefore, the characteristics of the meteorological structures can be extracted using image feature engineering technology.

AlexNet, first proposed by Krizhevsky et al. [17], is among the simplest and most effective image-processing deep-learning network models based on a convolutional neural network (CNN). AlexNet consists of five convolutional layers, three pooling layers, and three fully connected layers. The AlexNet model contains a local response normalization (LRN) layer for the amplification or suppression of neural activations, which works together with the Dropout method to prevent the network from overfitting. We construct G2N based on the AlexNet framework.

The change in data distribution caused by changes in the network parameters during training is called internal covariate shift (ICS) [18]. To avoid ICS, which slows down convergence and can even degrade network performance, batch normalization (BN) [19] was introduced into AlexNet to replace LRN and Dropout, making the network more robust to changes in the network parameters and activation functions during training. BN also mitigates the vanishing-gradient problem and reduces the negative impact of ICS. A BN layer normalizes the input values of the neurons in a layer, redistributing them to a distribution with a mean of 0 and a variance of 1, so that an increasingly distorted distribution is returned to the standard distribution. In this study, to adapt AlexNet for processing weather variables, we replaced LRN and Dropout with BN layers and used the resulting network as the core of G2N.
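For reference, the BN transform of [19] can be written explicitly. For each activation $x_i$ in a mini-batch $B$, BN standardizes the value and then applies a learnable scale and shift:

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta$$

where $\mu_B$ and $\sigma_B^2$ are the mini-batch mean and variance, $\epsilon$ is a small constant for numerical stability, and $\gamma$ and $\beta$ are learnable parameters that restore representational capacity after the normalization.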

The structure of the G2N model is shown in Figure 2a. G2N takes 2D gridded model data as input and has 6 convolutional layers, 4 pooling layers, and 3 fully connected layers. The convolutional and pooling layers extract 2D features of the model forecasts, and the fully connected layers integrate the local information and flatten the feature information. First, G2N convolves and pools the input weather forecast data several times to extract multiscale features. Then, three fully connected operations are performed to obtain the correction results at the weather stations. Each convolutional layer in the figure is followed by a BN layer and then a ReLU activation function; ReLU is also used as the activation function in the fully connected layers.

**Figure 2.** Multisite (G2N, (**a**)) and single-site (G2-One, (**b**)) forecast error correction models for grid forecasting. (**a**) Whole grid area as input for multisite correction of 311 sites. (**b**) Different proportions of grid regions as input for single-point correction experiments.
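As an illustration of this architecture, the following PyTorch sketch assembles a network with 6 convolutional layers (each followed by BN and ReLU), 4 max-pooling layers, adaptive pooling, and 3 fully connected layers producing corrections at 311 stations. The channel widths, kernel sizes, and adaptive-pooling output size are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

class G2N(nn.Module):
    """Sketch of the grid-to-multipoint correction network: 6 conv layers
    (each followed by BN and ReLU), 4 pooling layers, adaptive pooling, and
    3 fully connected layers. Layer widths are illustrative assumptions."""

    def __init__(self, in_channels=1, n_stations=311):
        super().__init__()

        def conv_bn(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.BatchNorm2d(cout),   # BN replaces AlexNet's LRN/Dropout
                nn.ReLU(inplace=True),
            )

        self.features = nn.Sequential(
            conv_bn(in_channels, 32), nn.MaxPool2d(2),
            conv_bn(32, 64),          nn.MaxPool2d(2),
            conv_bn(64, 128),         nn.MaxPool2d(2),
            conv_bn(128, 128),
            conv_bn(128, 256),        nn.MaxPool2d(2),
            conv_bn(256, 256),
            # Adaptive pooling lets the model accept flexible input sizes.
            nn.AdaptiveAvgPool2d((6, 6)),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024),        nn.ReLU(inplace=True),
            nn.Linear(1024, n_stations),  # corrected values at the N stations
        )

    def forward(self, x):
        return self.regressor(self.features(x))
```

A forward pass on a batch of PRUFS-sized grids, e.g. `G2N()(torch.randn(2, 1, 301, 401))`, yields a tensor of shape `(2, 311)`: one correction per station per sample.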

With the G2N grid-to-station deep-learning architecture, the size of the model grid forecasts (G) and the number of sites (N, weather stations) to be corrected are the two most important model configuration parameters. Thus, we conducted extensive sensitivity modeling experiments to study the impact of different G and N. One special case is N = 1, in which the model corrects the forecast at only a single site from an input of 2D gridded model data. This configuration is named G2-One (Figure 2b) and is used to study the influence of the size of G on the correction results. When changing the size of the input 2D grid data in the G2-One tests, a proper simplification of the G2N model is needed, as given in Figure 2b. Adaptive pooling is used in the last pooling layer of the network model, allowing the model to accept inputs of flexible sizes.
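The role of adaptive pooling can be seen in isolation: it maps any input spatial extent to a fixed output size, so the fully connected head that follows never has to change when G changes. A minimal PyTorch sketch (channel count and sizes are illustrative):

```python
import torch
import torch.nn as nn

# Adaptive average pooling produces a fixed spatial output size regardless
# of the input grid extent, which is what lets G2-One accept different
# proportions of the model grid as input.
pool = nn.AdaptiveAvgPool2d((6, 6))

small_grid = torch.randn(1, 64, 20, 20)    # e.g., a small G around a station
large_grid = torch.randn(1, 64, 150, 200)  # e.g., a much larger G

assert pool(small_grid).shape == pool(large_grid).shape == (1, 64, 6, 6)
```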

During training, overfitting can easily occur, manifesting as small training errors but large generalization errors. Regularization is a common remedy for overfitting in machine learning; here, the model was trained with L2-regularized weight decay to control overfitting [20]. Adam was chosen as the optimizer for the gradient descent process of the neural network. The loss function is defined as the mean squared error (*MSE*):

$$Loss = MSE = \frac{1}{M}\frac{1}{N}\sum_{m=1}^{M}\sum_{n=1}^{N}\left(output_n^m - obs_n^m\right)^2 \tag{1}$$

where *output* and *obs* denote the corrected forecast and the corresponding observation, *m* and *n* the sample and site indices, and *M* and *N* the batch size and the number of sites, respectively. In each epoch, the entire training set is traversed once, and the network parameters are updated at each mini-batch iteration. The loss functions of the training and validation datasets are computed after each epoch. In the experiments below, training is terminated after 200 epochs.
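A minimal PyTorch training loop matching this setup (Adam optimizer, L2 regularization via weight decay, MSE averaged over batch and stations) might look as follows; the tiny linear model, learning rate, epoch count, and synthetic data are placeholders rather than the actual G2N configuration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
M, N_SITES, FEATS = 8, 311, 16          # batch size, stations, dummy features
model = nn.Linear(FEATS, N_SITES)       # stand-in for the G2N network
# weight_decay implements the L2 regularization described in the text
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()                  # Equation (1): mean over M and N

x = torch.randn(64, FEATS)              # synthetic training inputs
y = torch.randn(64, N_SITES)            # synthetic "observations"

init_loss = loss_fn(model(x), y).item()
for epoch in range(5):                  # the paper trains for 200 epochs
    for i in range(0, len(x), M):       # one pass over the training set
        optimizer.zero_grad()
        loss = loss_fn(model(x[i:i + M]), y[i:i + M])
        loss.backward()
        optimizer.step()
final_loss = loss_fn(model(x), y).item()
```

After training, the loss on the training set should be lower than before, confirming that each mini-batch step moves the parameters downhill on Equation (1).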

#### *2.3. Data Pre-Processing and Dataset Partitioning*

The PRUFS model output and surface observations from August 2020 to August 2021 were selected to construct the training and validation sets, and the real-time operational PRUFS output and surface observations from November and December 2021 were used as the test dataset to evaluate the G2N model. The input data of G2N are the PRUFS hourly gridded 0–24 h forecasts, and the training labels are the surface weather observations valid at the corresponding time of each forecast of a given cycle and lead time. A bilinear interpolation method was used to interpolate the PRUFS model forecasts to the 311 surface weather stations to match the observations (Figure 3). Samples with empty or invalid values were rejected. The interpolated PRUFS forecasts, G2N outputs, and observations are used to calculate the loss function during G2N model training and to evaluate model performance.
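The grid-to-station matching step can be sketched as a plain bilinear interpolation on the regular lat/lon grid. The grid extent follows the domain described above, while the example station coordinates are illustrative:

```python
import numpy as np

# Sketch of bilinear interpolation from a regular lat/lon forecast grid to a
# station location, used to match gridded forecasts with observing sites.
lons = np.linspace(113.5, 125.5, 401)   # 401 grid points in longitude
lats = np.linspace(27.3, 36.3, 301)     # 301 grid points in latitude
rng = np.random.default_rng(0)
field = rng.random((301, 401))          # one gridded forecast variable

def bilinear(field, lats, lons, lat, lon):
    """Bilinearly interpolate `field` (lat x lon) to a single point."""
    j = np.searchsorted(lons, lon) - 1   # column of the lower-left neighbor
    i = np.searchsorted(lats, lat) - 1   # row of the lower-left neighbor
    x = (lon - lons[j]) / (lons[j + 1] - lons[j])
    y = (lat - lats[i]) / (lats[i + 1] - lats[i])
    return ((1 - x) * (1 - y) * field[i, j] + x * (1 - y) * field[i, j + 1]
            + (1 - x) * y * field[i + 1, j] + x * y * field[i + 1, j + 1])

value = bilinear(field, lats, lons, 31.7, 119.6)  # illustrative station point
```

Since the result is a convex combination of the four surrounding grid values, it always lies within the range of the input field, and interpolating at a grid node reproduces the grid value exactly.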

**Figure 3.** Flow chart of data pre-processing.

After pre-processing, the number of valid labeled data samples is 167,683,000. The dataset was divided into training and validation sets at an 8:2 ratio. To prevent information leakage, the 20% validation set is selected randomly in blocks of 24 consecutive forecast cycles, i.e., whole-day periods.
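The block-wise split can be sketched as follows: validation days (blocks of 24 consecutive forecast cycles) are drawn at random, so no day contributes samples to both sets. The number of days is illustrative:

```python
import numpy as np

# Block-wise 8:2 train/validation split: 20% of whole-day blocks (24
# consecutive forecast cycles each) are drawn at random for validation,
# which limits information leakage between the two sets.
rng = np.random.default_rng(0)
n_days = 365                         # illustrative number of whole days
sample_ids = np.arange(n_days * 24)  # one sample per forecast cycle

days = rng.permutation(n_days)
val_days = days[: n_days // 5]       # 20% of the whole-day blocks
val_mask = np.isin(sample_ids // 24, val_days)

train_ids = sample_ids[~val_mask]
val_ids = sample_ids[val_mask]
```

Drawing whole days rather than individual hourly samples prevents forecasts from the same day from straddling the train/validation boundary.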

#### *2.4. Model Evaluation Statistics*

The model evaluation is conducted by computing the root mean square error (RMSE), the systematic bias (BIAS), and the Pearson correlation coefficient (CC), which are among the most commonly used metrics in weather forecasting. The root mean square error is calculated as

$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{M}\frac{1}{N}\sum_{m=1}^{M}\sum_{n=1}^{N}\left(output_n^m - obs_n^m\right)^2} \tag{2}$$

The systematic bias (BIAS) formula is

$$BIAS = \frac{1}{M}\frac{1}{N}\sum_{m=1}^{M}\sum_{n=1}^{N}\left(output_n^m - obs_n^m\right) \tag{3}$$

The Pearson correlation coefficient (CC) formula is

$$Corr(output, obs) = \frac{\sum\left(output - \overline{output}\right)\left(obs - \overline{obs}\right)}{\sqrt{\sum\left(output - \overline{output}\right)^2 \sum\left(obs - \overline{obs}\right)^2}} \tag{4}$$
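Equations (2)–(4) translate directly into vectorized code; a minimal numpy sketch over an (M, N) array of outputs and observations:

```python
import numpy as np

# Evaluation metrics of Equations (2)-(4), computed over all samples (M)
# and stations (N) at once.
def rmse(output, obs):
    return np.sqrt(np.mean((output - obs) ** 2))

def bias(output, obs):
    return np.mean(output - obs)

def corr(output, obs):
    o = output - output.mean()
    b = obs - obs.mean()
    return np.sum(o * b) / np.sqrt(np.sum(o ** 2) * np.sum(b ** 2))

output = np.array([[1.0, 2.0], [3.0, 4.0]])  # toy corrected forecasts
obs = np.array([[1.5, 2.0], [2.5, 4.5]])     # toy observations
```

For this toy pair, `bias(output, obs)` is −0.125 (errors of −0.5, 0, 0.5, −0.5 averaged), and `rmse(output, obs)` is √0.1875 ≈ 0.433.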
