**4. Results**

In this section, typical independent individual cases are selected to determine whether the classifications are correct. In addition, an overall analysis is used to assess the performance of the KNN algorithm.

#### *4.1. Evaluation Method*

The KNN classification results were compared with the 2A23 product and evaluated using the probability of detection (POD), false alarm rate (FAR), and critical success index (CSI):

$$\text{POD} = \frac{n_s}{n_s + n_f}, \tag{10}$$

$$\text{FAR} = \frac{n_{fa}}{n_s + n_{fa}}, \tag{11}$$

$$\text{CSI} = \frac{n_s}{n_s + n_f + n_{fa}}. \tag{12}$$

In the above three formulas, $n_s$, $n_f$ and $n_{fa}$ are the numbers of successful classifications, failed classifications and false alarm classifications, respectively. A success is counted when the method's classification agrees with the PR 2A23 classification, a failure is counted when it does not agree with the PR 2A23 class, and a false alarm is counted when a pixel is assigned the type opposite to the PR 2A23 classification.

The POD is the fraction of successful classification points among the successful and failed points combined; the higher the POD value is, the better the classification performance. The FAR is the fraction of false alarm points among the successful and false alarm points combined; the lower the FAR value is, the better the classification performance. The CSI reflects the overall classification performance: it is the fraction of successful points among all successful, failed and false alarm points, so a high CSI indicates satisfactory classification performance.
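As a minimal sketch of Equations (10)-(12), the following Python snippet counts $n_s$, $n_f$ and $n_{fa}$ for one precipitation type and then computes the three scores. It assumes the usual contingency-table reading of the definitions above (a failure is a pixel that the reference labels as the target type but the method does not, and a false alarm is the reverse); the function names, the label strings `"C"`/`"S"` and the toy label lists are illustrative and are not taken from the paper.

```python
import numpy as np


def contingency_counts(predicted, reference, target):
    """Count successes, failures and false alarms for one precipitation type.

    Assumed reading: the reference plays the role of the PR 2A23 labels.
    """
    predicted = np.asarray(predicted)
    reference = np.asarray(reference)
    pred_is_target = predicted == target
    ref_is_target = reference == target
    n_s = int(np.sum(pred_is_target & ref_is_target))    # both say target
    n_f = int(np.sum(~pred_is_target & ref_is_target))   # reference only
    n_fa = int(np.sum(pred_is_target & ~ref_is_target))  # method only
    return n_s, n_f, n_fa


def skill_scores(n_s, n_f, n_fa):
    """Return (POD, FAR, CSI); a score with a zero denominator is NaN."""
    nan = float("nan")
    pod = n_s / (n_s + n_f) if (n_s + n_f) else nan
    far = n_fa / (n_s + n_fa) if (n_s + n_fa) else nan
    csi = n_s / (n_s + n_f + n_fa) if (n_s + n_f + n_fa) else nan
    return pod, far, csi


# Toy labels (hypothetical): "C" = convective, "S" = stratiform.
reference = ["C", "C", "C", "S", "S", "S", "S", "C"]
predicted = ["C", "C", "S", "S", "S", "C", "S", "C"]

counts = contingency_counts(predicted, reference, target="C")
pod, far, csi = skill_scores(*counts)
print(counts, pod, far, csi)
```

For the toy labels, three convective pixels are matched, one is missed and one is a false alarm, so POD = 3/4, FAR = 1/4 and CSI = 3/5.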

#### *4.2. K Value*

For a finite set classification, the classification error rate of the KNN tends to converge to a certain value as k increases [40]. When k is too large, the classification accuracy does not increase significantly and computational resources are wasted. When k is too small, the classification accuracy is low. Choosing a suitable k value therefore improves the classification accuracy while reducing the amount of computation and increasing the calculation speed.

Figure 2 shows the classification of an embedded convective process in the Guangzhou area at 05:28 (UTC) on 6 June 2008, using the standardized Euclidean distance as the distance measure. The choice of k has little effect on the overall classification results; the results differ slightly only at the junctions between different types of precipitation. When k is equal to 5, the boundary between stratiform and convective precipitation is rough, and when k is chosen to be 10 or more, the boundary is smooth.
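To make the role of k and of the standardized Euclidean distance concrete, here is a minimal NumPy sketch of a KNN majority vote. It is not the authors' implementation: the two features (loosely, a VIL-like and a reflectivity-like value), the training samples and the function name `knn_predict` are assumptions chosen only for illustration.

```python
import numpy as np


def knn_predict(train_x, train_y, query_x, k=5):
    """Classify each query point by majority vote among its k nearest
    training points under the standardized Euclidean distance."""
    train_x = np.asarray(train_x, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    train_y = np.asarray(train_y)

    # Standardized Euclidean distance: divide each feature difference by
    # that feature's standard deviation in the training set.
    std = train_x.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    diff = (query_x[:, None, :] - train_x[None, :, :]) / std
    dist = np.sqrt((diff ** 2).sum(axis=2))

    nearest = np.argsort(dist, axis=1)[:, :k]  # indices of the k neighbours
    labels = []
    for neighbour_labels in train_y[nearest]:
        values, counts = np.unique(neighbour_labels, return_counts=True)
        labels.append(values[np.argmax(counts)])  # ties: first label in sorted order
    return np.array(labels)


# Hypothetical training samples: [VIL-like value, reflectivity-like value].
train_x = [[0.5, 20], [0.8, 22], [1.0, 25], [0.6, 18],
           [5.0, 45], [8.0, 50], [6.0, 42], [10.0, 52]]
train_y = ["S", "S", "S", "S", "C", "C", "C", "C"]

result = knn_predict(train_x, train_y, [[0.7, 21], [7.0, 48]], k=3)
print(result)
```

Scaling by the per-feature standard deviation keeps a feature with a large numeric range from dominating the distance. A larger k averages the vote over more neighbours, which is one way to read the smoother class boundaries reported for k of 10 or more.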

Other cases from Anhui, Jiangsu and Wenzhou were selected for analysis. The results are shown in Table 2. Although the classification boundary is rough when k is equal to 5, this value yields the highest POD and CSI and a low FAR among the tested k values. When k is equal to 10, the smallest FAR is observed, although the POD is not high and the CSI is low. When k is greater than 10, the POD, FAR and CSI differ, although the differences are not obvious. Overall, the performance is best when k is equal to 5; therefore, the value of k is set to 5 in this paper.
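The selection rule described above can be sketched as a small ranking function: prefer the k with the highest CSI, break ties by higher POD, then by lower FAR. This ordering is an assumed formalization of the reasoning in the text, and the scores below are placeholders for illustration only; they are not the values in Table 2.

```python
def select_k(scores_by_k):
    """Return the k with the highest CSI; ties go to the higher POD,
    then to the lower FAR."""
    return max(
        scores_by_k,
        key=lambda k: (
            scores_by_k[k]["CSI"],
            scores_by_k[k]["POD"],
            -scores_by_k[k]["FAR"],
        ),
    )


# Placeholder scores (hypothetical, NOT the values reported in Table 2).
example_scores = {
    5: {"POD": 0.90, "FAR": 0.12, "CSI": 0.80},
    10: {"POD": 0.85, "FAR": 0.10, "CSI": 0.77},
    15: {"POD": 0.86, "FAR": 0.13, "CSI": 0.76},
}

best_k = select_k(example_scores)
print(best_k)
```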


**Table 2.** The POD, FAR and CSI for different k values in the same case.

Classifying the precipitation types of different kinds of weather processes can fully reflect the KNN classification performance. Squall line, embedded convective and stratiform cases are therefore selected for KNN classification analyses.

#### *4.3. Squall Line Case*

Figure 3 shows a squall line case. Figure 3a shows the 2A23 product. Two northeast-southwest-oriented convective belts are classified within the scanning range, with tiny gaps between the two band-shaped convective cells. The cluster of convective cells is independent of the band-shaped cells. Outside the convective cells, stratiform precipitation covers a large area. The southeastern part of Figure 3a is classified as an unknown type of precipitation. In this case, precipitation may occur, although the type of precipitation is unknown. Figure 3b shows the results of the KNN classification. There is a band-shaped northeast-southwest-trending convective cell, which is also observed in the 2A23 product. However, the boundary between the two band-shaped convective cells is not obvious. To the northeast of the band-shaped convective cell, a cluster of convective cells is also classified, and the cluster shape is similar to that of the 2A23 product. There is also a massive convective cell in the northeast portion of the cluster of convective cells. In the 2A23 product, due to the sweep coverage, there are no corresponding data for this area. The northeast corner of the radar corresponds to the area classified as unknown in the 2A23 product. Because the KNN categorical variable data have no values in that area, no classification is provided. The southwest corner of Figure 3b is a void area due to the radar elevation angle.

**Figure 3.** A squall line in Lianyungang at 13:25 (UTC) on 4 July 2012: (**a**) is the classification of the 2A23 product, (**b**) is the KNN classification, (**c**) is VIL, and (**d**) is ref2km. The bold black line represents the boundary of the PR scan range.

Figure 3c shows the VIL. In Figure 3c, there is a northeast-southwest band-shaped high-value area. There are multiple independent high-value centers in the high-value area. The values of all these centers exceed 14 kg/m<sup>2</sup>. The value near the center also reaches or exceeds 4 kg/m<sup>2</sup>, and there is a block-shaped high-value area in the northeast of the band-shaped high-value area, the value of which exceeds 6 kg/m<sup>2</sup>. Additionally, in the northeastern part of the high-value area, there is an area with values exceeding 4 kg/m<sup>2</sup>. The VIL values in the other areas are less than 2 kg/m<sup>2</sup>. Figure 3d shows ref2km. The high-value area in the figure corresponds to the high-value area in Figure 3c, and the value of each high-value center exceeds 50 dBZ. To the northeast of the band-shaped high-value area, there are also areas exceeding 40 dBZ. The two high-value areas in the northeast direction of the band-shaped high-value area in Figure 3c,d are consistent with the area classified as convective by the KNN algorithm.

#### *4.4. Embedded Convective Case*

Figure 4 shows the classification results for an embedded convective scenario. In Figure 4a, an arched area in the center of the figure is classified as convective by the 2A23 product, and a large area on the west side of the arched convective area is classified as an unknown type of precipitation. There are small stratiform precipitation areas in the northwest corner and a large stratiform precipitation area on the east side of the arched convective cell. Due to the extent of the sweep, there are no data available for the south side. The arched area in the center of Figure 4b is classified as convective precipitation. There is also a convective precipitation area outside the 2A23 product range, and there is a clear boundary between the two arched convective cells. There are large stratiform areas in the northeast portions of the two arched convective cells, and there are stratiform areas in the northwestern parts of the convective cells. The shape and location of the scattered stratiform areas are consistent with those in the 2A23 product. Most of the areas classified as unknown precipitation in the 2A23 product have missing values for the variables used for the classification.

In Figure 4c, there are two arched high-value areas. The VIL values of the two high-value areas are greater than 4 kg/m<sup>2</sup>, and there are obvious gaps between the two arched high-value areas. The VIL of the interval area is between 2 kg/m<sup>2</sup> and 3 kg/m<sup>2</sup>. On the northeast side of the high-value area, the VIL is above 2 kg/m<sup>2</sup>, and in some other areas, the value is more than 3 kg/m<sup>2</sup>. These areas are classified as stratiform in the 2A23 product. There are scattered blocks with VIL values exceeding 1 kg/m<sup>2</sup> on the west side, the northwest side and the south side of the arched area, and the remaining VIL values are all below 0.5 kg/m<sup>2</sup>. In Figure 4d, the radar reflectivity at the corresponding position of the high-value area in Figure 4c exceeds 36 dBZ, and the reflectivity in the northeastern area of the arched high-value area exceeds 24 dBZ. The reflectivity in some of this area exceeds 30 dBZ, and in the scattered block area near this arched area, the reflectivity also reaches or exceeds 24 dBZ.

**Figure 4.** An embedded convective system in Fuyang at 01:41 (UTC) on 8 July 2007: (**a**) is the classification of the 2A23 product, (**b**) is the KNN classification, (**c**) is VIL, and (**d**) is ref2km. The bold black line represents the boundary of the PR scan range.

#### *4.5. Stratiform Case*

Figure 5 shows the classification result of a stratiform case. In Figure 5a, a large northwest-southeast-trending band-shaped area is classified as a stratiform area by the 2A23 product, and a small stratiform block is classified on the northwest side of the band-shaped area. In addition, parts of this area are classified as unknown or no precipitation areas. The southern part of the figure is beyond the PR scanning range; thus, there are no data in this area for the 2A23 product. The north side of the solid black line in Figure 5b is within the PR satellite scanning range, and the area and shape of the region classified as stratiform in this range are consistent with those of the 2A23 product. In Figure 5c, the VIL value of the northwest-southeast-trending area exceeds 0.5 kg/m<sup>2</sup>, and the VIL value of the high-value area exceeds 1 kg/m<sup>2</sup>. The VIL values of other areas are less than 0.5 kg/m<sup>2</sup>. The reflectivity of these areas (Figure 5c,d) is greater than 18 dBZ, with some areas exceeding 24 dBZ.

**Figure 5.** A stratiform precipitation system in Fuyang at 13:47 (UTC) on 7 June 2010: (**a**) is the classification of the 2A23 product, (**b**) is the KNN classification, (**c**) is VIL, and (**d**) is ref2km. The bold black line represents the boundary of the PR scan range.

#### *4.6. Stability of the Algorithm*

KNN can classify precipitation types well in individual cases, but its performance on continuous data is unknown. In practice, continuous data are more widely used and more meaningful. One month of continuous radar data from Lianyungang, from 1 July 2012 to 31 July 2012, is therefore used for a continuous analysis, and Table 3 shows the result.


**Table 3.** Continuous analysis of Lianyungang in July 2012.

In Table 3, the time period of 00:00-12:00 (UTC) is daytime in Lianyungang, and the time period of 12:00-00:00 (UTC) is evening in Lianyungang. Both the stratiform and convective classification results are better in the evening than in the daytime; however, the differences between the daytime and evening results are small, and the classification results are stable across the time periods.
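The day/evening split used for Table 3 can be sketched as a simple grouping of scan times by UTC hour. The function names are hypothetical, and assigning a scan at exactly 12:00 UTC to the evening group is an assumption, since the text does not say how the boundary is handled.

```python
from datetime import datetime, timezone


def period_of_day(scan_time_utc):
    """Label a UTC scan time for Lianyungang: 00:00-12:00 UTC is
    'daytime', 12:00-24:00 UTC is 'evening' (12:00 itself is assumed
    to belong to 'evening')."""
    return "daytime" if scan_time_utc.hour < 12 else "evening"


def group_by_period(scan_times_utc):
    """Split a list of UTC scan times into the two periods."""
    groups = {"daytime": [], "evening": []}
    for scan_time in scan_times_utc:
        groups[period_of_day(scan_time)].append(scan_time)
    return groups


# The first time is the squall line scan of Figure 3; the second is a
# made-up example time.
scans = [
    datetime(2012, 7, 4, 13, 25, tzinfo=timezone.utc),
    datetime(2012, 7, 10, 5, 28, tzinfo=timezone.utc),
]
groups = group_by_period(scans)
print({name: len(times) for name, times in groups.items()})
```

The POD, FAR and CSI can then be computed separately for the pixels in each group.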

Different geographical conditions may have an impact on the type of precipitation. The precipitation types at the Lianyungang, Fuyang and Guangzhou stations were therefore classified to analyze the influence of geographical conditions on KNN. Lianyungang and Fuyang are both located in the subtropical zone, but Lianyungang is near the sea and Fuyang is inland; comparing them shows the impact of coastal conditions on classification. Guangzhou and Lianyungang are both near the sea, but Lianyungang is located in the subtropical zone and Guangzhou is located in the tropical zone; comparing them shows the influence of latitude on classification. The comparisons of the three sites are shown in Table 4.


**Table 4.** Comparison of the classification results of different geographical conditions.

In Table 4, the classification results for Guangzhou show the best performance, and those for Lianyungang show the worst. The POD values of the three sites are nearly the same for both precipitation types; Guangzhou has the lowest FAR and the highest CSI. However, the CSI values of the three sites differ little, and the classification results are stable under different geographical conditions.
