**3. Results**

#### *3.1. Evaluations of Classification in Machine Learning*

Among all 374,327 cases analyzed, there were 3924 *drug D*<sub>1</sub>–*drug D*<sub>2</sub>–SJS combinations. Of these, 923 combinations were detected by all three algorithms: the additive model [18], the multiplicative model [18], and the chi-square statistics model [19]. In this study, these combinations were treated as "hypothetical" true data.

The evaluation of the analysis model is shown in Tables 3 and 4.

**Table 3.** The number of True positive, False positive, True negative, and False negative.


TP: True positive, FP: False positive, TN: True negative, FN: False negative.



*PPV: Positive predictive value, NPV: Negative predictive value.*

Table 3 shows the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

The previous subset analysis detected 1793 combinations (*True positive*: 542, *False positive*: 1251). In contrast, the newly proposed subset analysis detected 909 combinations (*True positive*: 542, *False positive*: 367), that is, the same number of true positives with 884 fewer false positives (Table 3).

The detection accuracy shown in Table 4 was calculated from the values shown in Table 3.

Compared with the previous subset analysis, the newly proposed subset analysis improved signal detection on the indicators of *Accuracy* (0.584 to 0.809), *Precision* (*PPV*) (0.302 to 0.596), *Specificity* (0.583 to 0.878), *Youden's index* (0.170 to 0.465), *F*-*measure* (0.399 to 0.592), and *NPV* (0.821 to 0.874) (Table 4).

For the Ω shrinkage measure model, the indicator values were *Accuracy* (0.858), *Precision* (*PPV*) (0.756), *Recall* (*Sensitivity*) (0.583), *Specificity* (0.942), *Youden's index* (0.525), *F*-*measure* (0.658), and *NPV* (0.880) (Table 4).
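As a check on the arithmetic, the indicators reported for the two subset analyses can be recomputed from the counts given in the text. The following is a minimal sketch rather than the authors' code. It assumes that FN and TN follow from the 923 "hypothetical" true combinations and the 3924 total combinations, and the names `Confusion`, `from_margins`, and `indicators` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int


def from_margins(tp: int, fp: int, n_true: int, n_total: int) -> Confusion:
    """Derive FN and TN from the totals (assumption: margins are fixed
    by the 923 hypothetical true combinations out of 3924)."""
    fn = n_true - tp                 # true combinations that were missed
    tn = n_total - n_true - fp       # non-true combinations not flagged
    return Confusion(tp=tp, fp=fp, tn=tn, fn=fn)


def indicators(c: Confusion) -> dict[str, float]:
    """Standard confusion-matrix indicators."""
    recall = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "youden": recall + specificity - 1,
        "f_measure": 2 * precision * recall / (precision + recall),
        "npv": c.tn / (c.tn + c.fn),
    }


N_TOTAL, N_TRUE = 3924, 923  # counts stated in Section 3.1
previous = indicators(from_margins(542, 1251, N_TRUE, N_TOTAL))
proposed = indicators(from_margins(542, 367, N_TRUE, N_TOTAL))

for name in previous:
    print(f"{name:12s} {previous[name]:.3f} -> {proposed[name]:.3f}")
```

Under these assumptions, the recomputed values agree with the reported indicators to three decimals. *Recall* is identical for both analyses because both detect 542 of the 923 hypothetical true combinations.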

#### *3.2. Cohen's Kappa Coefficient*

The agreement between the detection results of the Ω shrinkage measure model and those of the newly proposed subset analysis was κ = 0.375 (95% CI: 0.355–0.395), with *P*<sub>positive</sub> = 0.502 and *P*<sub>negative</sub> = 0.870. When targeting three or more reports, the agreement was κ = 0.355 (95% CI: 0.327–0.384), with *P*<sub>positive</sub> = 0.678 and *P*<sub>negative</sub> = 0.674 (Table 5).


*n*<sub>111</sub>: targeting three or more reports, κ: *Cohen's kappa coefficient*, *P*<sub>positive</sub>: proportionate agreement for positive rating, *P*<sub>negative</sub>: proportionate agreement for negative rating.
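To illustrate how these agreement statistics relate to a 2×2 cross-classification of two detection methods, the following sketch computes κ, *P*<sub>positive</sub>, and *P*<sub>negative</sub> from four cell counts. The counts are hypothetical and are not taken from Table 5. The function name `agreement` is likewise hypothetical. The confidence interval uses a simple large-sample approximation for the standard error of κ, which is an assumption, since the method used for the reported intervals is not stated here.

```python
import math


def agreement(a: int, b: int, c: int, d: int, z: float = 1.96) -> dict[str, float]:
    """Agreement statistics for two binary raters (detection methods).

    a: signal in both methods      b: signal in method 1 only
    c: signal in method 2 only     d: signal in neither method
    """
    n = a + b + c + d
    p_o = (a + d) / n                                          # observed agreement
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2     # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    # Simple large-sample standard error (an approximation, see lead-in).
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    return {
        "kappa": kappa,
        "ci_low": kappa - z * se,
        "ci_high": kappa + z * se,
        "p_positive": 2 * a / (2 * a + b + c),   # agreement on positive ratings
        "p_negative": 2 * d / (2 * d + b + c),   # agreement on negative ratings
    }


# Hypothetical counts, chosen only for illustration.
example = agreement(a=40, b=10, c=20, d=30)
for key, value in example.items():
    print(f"{key:10s} {value:.3f}")
```

Because *P*<sub>positive</sub> and *P*<sub>negative</sub> are computed separately, they can differ markedly when signals are rare, which is consistent with the gap between 0.502 and 0.870 reported for the full data.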
