3.1.2. MET Dataset

The meteorological radars at MET collect data every 5 min (288 acquisitions per day). The experiments used measurements of *composite reflectivity*, a derived (computed) radar product that aggregates the radar data from all elevations into a single two-dimensional field. Each acquisition is a matrix of floating-point numbers containing 2134 × 1694 = 3,614,996 data points, with values ranging from −33 to 80. All acquisitions from a single day are stored in a single NetCDF file (2134 × 1694 × 288 = 1,041,118,848 data points).
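The sizes quoted above follow directly from the acquisition interval and the grid dimensions. The short sketch below recomputes them and adds a rough in-memory footprint for one day. The 4-byte (float32) value size is an assumption for illustration only; the text does not state the storage type, and the on-disk NetCDF size may differ because of compression or packing.

```python
# Grid and acquisition sizes quoted in the text for the MET composite reflectivity product.
ROWS, COLS = 2134, 1694
MINUTES_PER_DAY = 24 * 60
INTERVAL_MIN = 5  # one acquisition every 5 minutes

acquisitions_per_day = MINUTES_PER_DAY // INTERVAL_MIN          # 288
points_per_acquisition = ROWS * COLS                             # 3,614,996
points_per_day = points_per_acquisition * acquisitions_per_day   # 1,041,118,848

# Assumption: values held in memory as 32-bit floats (4 bytes each).
# The actual NetCDF encoding on disk may differ (compression, packed integers, etc.).
BYTES_PER_VALUE = 4
bytes_per_day = points_per_day * BYTES_PER_VALUE

print(f"{acquisitions_per_day} acquisitions/day")
print(f"{points_per_acquisition:,} points/acquisition")
print(f"{points_per_day:,} points/day")
print(f"~{bytes_per_day / 2**30:.2f} GiB/day if loaded as float32")
```

Under that assumption, a single day of data already needs several gigabytes of memory once decoded, which is worth keeping in mind when batching days for training.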

Figure 6 shows the data points that are missing in every acquisition of a day in the MET dataset. Each pixel in the picture summarizes 288 data points (all acquisitions during the day) and is red when the data were missing for the entire day at that location. The left-hand picture shows the data matrix, and the right-hand picture shows the same data projected onto a map.

**Figure 6.** A visualization of data points that are missing for all acquisitions during a day from the MET dataset. Each pixel in the picture represents 288 data points (all acquisitions during a day). A pixel is colored in red when the data were missing for the entire day at that pixel. On the left, there is a visualization of the data matrix; on the right, there is a visualization of the data projected onto a map.
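A mask like the one in Figure 6 can be obtained by checking, for every pixel, whether all acquisitions of the day are missing. The sketch below assumes, as stated later in this section, that missing points are stored as zero values, and it assumes a (acquisitions, rows, cols) axis order for the daily array; the function name `missing_all_day_mask` is hypothetical. Note that because 0 lies inside the valid range −33 to 80, this encoding cannot distinguish a genuine zero reading from a missing one.

```python
import numpy as np

def missing_all_day_mask(day: np.ndarray, missing_value: float = 0.0) -> np.ndarray:
    """Return a (rows, cols) boolean mask that is True where every acquisition is missing.

    Assumes `day` has shape (acquisitions, rows, cols), e.g. (288, 2134, 1694),
    and that missing points are encoded as `missing_value` (zero by default).
    """
    return np.all(day == missing_value, axis=0)

# Tiny synthetic example (not real MET data): 3 acquisitions on a 2x3 grid.
day = np.array([
    [[0.0, 12.5, 0.0], [0.0, 0.0, 40.0]],
    [[0.0,  0.0, 0.0], [0.0, 5.0, 38.0]],
    [[0.0, 20.0, 0.0], [0.0, 0.0, 41.5]],
])
mask = missing_all_day_mask(day)
print(mask)  # True marks pixels that would be drawn red in a Figure 6-style plot
```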

The available MET data posed several challenges for applying deep learning methods. Approximately 50% of the data were missing (stored as zero values) for various reasons. Figure 7 depicts data availability during a day: each pixel represents the number of acquisitions in which the data were present. A pixel is colored dark red when the data were present in all acquisitions of the day and white/transparent when the data were missing for the entire day. On the left, there is a visualization of the data matrix; on the right, there is a visualization of the data projected onto a map. As Figure 7 shows, data are never collected in some regions, either because those regions are not covered by the radars or because the terrain prevents data collection. In other areas, data are sometimes present and sometimes not; for example, data at a given point become temporarily unavailable when measurements are eliminated from the composite product.
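The per-pixel availability shown in Figure 7, the split into never, always, and intermittently observed regions, and the overall missing fraction can all be derived from one boolean array. The sketch below again assumes zero-encoded missing values and a (acquisitions, rows, cols) layout; `availability_summary` is a hypothetical helper, and the tiny input is synthetic.

```python
import numpy as np

def availability_summary(day: np.ndarray, missing_value: float = 0.0):
    """Per-pixel availability statistics for one day of acquisitions.

    Assumes `day` has shape (acquisitions, rows, cols) and that missing
    points are encoded as `missing_value`.
    """
    present = day != missing_value          # True where a measurement exists
    counts = present.sum(axis=0)            # acquisitions with data, per pixel
    n_acq = day.shape[0]
    never = counts == 0                     # e.g. outside radar coverage
    always = counts == n_acq                # data in every acquisition
    sometimes = ~never & ~always            # intermittently available
    missing_fraction = 1.0 - present.mean() # share of all points that are missing
    return counts, never, always, sometimes, missing_fraction

# Synthetic example (not real MET data): 3 acquisitions on a 2x3 grid.
day = np.array([
    [[0.0, 12.5, 0.0], [0.0, 0.0, 40.0]],
    [[0.0,  0.0, 0.0], [0.0, 5.0, 38.0]],
    [[0.0, 20.0, 0.0], [0.0, 0.0, 41.5]],
])
counts, never, always, sometimes, missing_fraction = availability_summary(day)
print(counts)
print(f"missing fraction: {missing_fraction:.1%}")
```

`counts` is the quantity a Figure 7-style plot would color (0 as transparent up to `n_acq` as dark red), while `missing_fraction` is the kind of aggregate behind the roughly 50% figure quoted for the real data.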

Figure 8 presents a histogram of the non-missing values in the MET dataset over the entire region and all acquisitions of a day. As the figure shows, the distribution of the actual values in the dataset was highly imbalanced, as was also the case for the NMA dataset (Section 3.1.1). Larger values were of greater meteorological interest because they indicated severe weather phenomena, but they were relatively rare. This severe imbalance posed a challenge for supervised learning.
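A histogram like Figure 8 can be built by discarding the missing (zero) points and binning the remaining values over the stated −33 to 80 range. The sketch below uses 1-unit bins, a synthetic skewed sample in place of real MET data, and a hypothetical `THRESHOLD` for "large" values; the bin width, the sample distribution, and the threshold are illustrative assumptions, not values from the text.

```python
import numpy as np

def nonmissing_histogram(day, missing_value=0.0, lo=-33.0, hi=80.0, bin_width=1.0):
    """Histogram of non-missing values using fixed-width bins spanning [lo, hi]."""
    values = day[day != missing_value]                 # drop missing points, flatten
    edges = np.arange(lo, hi + bin_width, bin_width)   # lo, lo + w, ..., hi
    counts, edges = np.histogram(values, bins=edges)
    return counts, edges

# Synthetic, deliberately skewed stand-in for one day of reflectivity (NOT real MET data):
# most values are low and large values are rare.
rng = np.random.default_rng(0)
day = np.clip(-10.0 + rng.exponential(scale=8.0, size=(288, 20, 20)), -33.0, 80.0)
day[:, :10, :] = 0.0                                   # simulate a region with no radar coverage

counts, edges = nonmissing_histogram(day)

# Hypothetical threshold for "large" values, chosen only for illustration.
THRESHOLD = 35.0
values = day[day != 0.0]
share_large = np.mean(values >= THRESHOLD)
print(f"{values.size} non-missing values, {share_large:.4%} at or above {THRESHOLD}")
```

To reproduce the log-scaled y-axis of Figure 8, the counts could be drawn with matplotlib, for example `ax.stairs(counts, edges)` followed by `ax.set_yscale("log")`; on a linear axis the rare high-value bins would be practically invisible.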

**Figure 7.** A visualization of data availability in the MET dataset during a day. Each pixel in the picture represents the number of acquisitions in which the data were present during the day. A pixel is colored dark red when the data were present in all acquisitions of the day and white/transparent when the data were missing for the entire day. On the left, there is a visualization of the data matrix; on the right, there is a visualization of the data projected onto a map.

**Figure 8.** A histogram of the non-missing values in the MET dataset for the entire region and for all acquisitions during a day. A logarithmic scale is used on the y-axis.
