*4.2. Classification Models*

SVM, XGB, Random Forest, Gradient Boosting, and KNN were then used for classification. To assess the overall performance of the models, measured as the average values obtained over 10 runs, it is useful to analyze Table 3 in conjunction with the confusion matrices determined for each model, which are displayed in Figures 5–9. Table 3 indicates that the SVM and XGB models are the most accurate, with nearly identical accuracy. SVM has the highest specificity, while XGB is the most sensitive model; all other models are significantly less specific or sensitive. SVM and XGB have the best (and comparable) Matthews correlation coefficients, while the coefficient determined for the other models is significantly smaller. The value of this coefficient is positive for all the models, which indicates that the predicted and true classes are positively correlated in all cases. The SVM and XGB models also have the highest ROC AUC, with the same value (0.91) for both. As this ROC AUC is high (close to 1), we may conclude that these two models have very good predictive performance.
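As a reminder of how metrics of this kind are defined, the following minimal sketch computes accuracy, sensitivity, specificity, and the Matthews correlation coefficient from the four cells of a binary confusion matrix and averages them over several runs. It is an illustration only, not the code used in this study: the function names are hypothetical, the counts are invented, and the reduction of the multi-class problem to binary counts (e.g., one class versus the rest) is an assumption. ROC AUC is omitted because it requires classifier scores rather than counts.

```python
import math
from statistics import mean


def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from the four cells of a binary confusion matrix."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


def average_over_runs(runs):
    """Average each metric over a list of (tp, fp, tn, fn) tuples, one per run."""
    per_run = [binary_metrics(*run) for run in runs]
    return {name: mean(m[name] for m in per_run) for name in per_run[0]}


# Invented counts for three runs (NOT data from this paper).
runs = [(18, 2, 27, 3), (19, 3, 26, 2), (17, 1, 28, 4)]
averaged = average_over_runs(runs)
for name, value in averaged.items():
    print(f"{name}: {value:.3f}")
```

A positive Matthews coefficient means the predictions agree with the true labels better than chance, a value of 0 corresponds to chance-level agreement, and a value of 1 to perfect agreement.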


**Table 3.** Standard performance metrics calculated for the machine learning models.

**Figure 5.** Confusion matrix for the SVM model.

**Figure 6.** Confusion matrix for the XGB model.

**Figure 7.** Confusion matrix for the Random Forest model.

**Figure 8.** Confusion matrix for the Gradient Boosting model.

**Figure 9.** Confusion matrix for the KNN model.

If we take into account that the tested models are tree-based models (XGB, Random Forest, and Gradient Boosting), decision boundary models (SVM), and non-parametric models (KNN), we may conclude that the decision boundary models performed best, followed by the tree-based models and the non-parametric models.

The confusion matrices (Figures 5–9) indicate that, except for the Gradient Boosting model, all the models classify the amphetamines with 100% accuracy. The Gradient Boosting model is not that far behind, with an accuracy of 85.71%. The main difference between the models, regarding the class of amphetamines, is related to the rate of false positives, which is 11.11% for the SVM model, 33.33% for the XGB model, 60% for the Random Forest model, 50.29% for the Gradient Boosting model, and 67.27% for the KNN model. In other words, the classification of amphetamines with the Random Forest, Gradient Boosting, and KNN models is only marginally better than a random guess.
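The percentages quoted above are read row-wise from the confusion matrices, i.e., as the share of each true class that is assigned to a given predicted class. The sketch below shows this arithmetic on a small matrix of counts. The counts, the class order, and the function name are assumptions made purely for illustration and are not the data behind Figures 5–9.

```python
CLASSES = ["amphetamines", "cannabinoids", "opioids", "negatives"]

# Invented counts (NOT data from this paper).
# Rows are the true class, columns are the predicted class.
counts = [
    [7, 0, 0, 0],   # true amphetamines
    [0, 8, 2, 0],   # true cannabinoids
    [0, 0, 9, 1],   # true opioids
    [1, 0, 1, 7],   # true negatives
]


def row_percentages(matrix):
    """Convert each row of counts into percentages of that row's total."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([100.0 * c / total if total else 0.0 for c in row])
    return result


percent = row_percentages(counts)
amph = CLASSES.index("amphetamines")
neg = CLASSES.index("negatives")

# Share of true amphetamines recognized as amphetamines.
print(f"amphetamines correctly classified: {percent[amph][amph]:.2f}%")
# Share of true negatives wrongly assigned to the amphetamine class.
print(f"negatives classified as amphetamines: {percent[neg][amph]:.2f}%")
```

With these invented counts, one of nine negatives is assigned to the amphetamine class, which is how a value such as 11.11% arises.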

The opioids are 100% correctly classified by the XGB model. The second-best correct classification rate (90%) is recorded for the SVM model, with 10% of the opioids being misclassified as negatives. The other models fail to assign the correct class identity for a significant number of opioids.

The cannabinoids are recognized as such with 100% accuracy only by the SVM model. The second-best model is XGB; the remaining models perform significantly worse and often fail to distinguish the cannabinoids, especially from the opioids.

Taking into account both the accuracy and the misclassification rates, the negatives seem to be the hardest to classify correctly for all models, most probably because of the large variety of substances that form this class in the dataset.

The availability of screening tools able to screen for illicit substances harmful to humans in a fast and reliable way is essential for public safety. The models presented in this paper can work in harmony with the currently recommended methodology of designer drug detection.

We explored five distinct and substantially different multivariate models and discussed their classification performance, together with an interpretation of the confusion matrices that addresses the specifics of each class of substances used in the classification. All the models are more specific than sensitive (see Table 3).

Both the SVM and XGB models yielded accuracy results close to those of other systems previously built for screening for drugs of abuse [28,29]. However, it should be noted that the latter systems were built to detect only one (cannabinoids) [28] or two (hallucinogenic amphetamines and cannabinoids) [29] classes of illicit drugs. In our case, the balanced accuracy is calculated for three classes of positives (amphetamines, opioids, and cannabinoids). Hence, the results obtained with SVM and XGB may be considered very good, as both models screen simultaneously for a larger number of classes of drugs of abuse, i.e., (2C-x, DOx, and NBOMe) hallucinogenic amphetamines, cannabinoids, and opioids. Moreover, it is reasonable to expect that their accuracy will increase once more ATR-FTIR spectra of substances belonging to the targeted classes of compounds become available.

From the point of view of overall accuracy, the best-performing model was SVM. As forensic screening systems designed to operate with field (portable) ATR-FTIR analytical instruments, the developed models should be able to perform cost-effective, non-destructive, real-time, direct, on-site tests. However, the main objective of these models is to narrow down the number of samples further subjected to in-depth analysis with more sophisticated stationary analytical instruments in the laboratory. Only the samples tested on-site and assigned a positive class identity (hallucinogenic amphetamines, cannabinoids, or opioids) will be analyzed in the laboratory in order to determine their individual identity (not only their class membership).

Hence, the essential feature of such a screening system is its efficiency in detecting positives. In our case, no hallucinogenic amphetamine, cannabinoid, or opioid should be misclassified as a (false) negative. For this reason, XGB is a better fit for the purpose than SVM, as XGB yields no false negatives. While 10% of the opioids are erroneously classified as negatives by SVM, no amphetamine, opioid, or cannabinoid is misclassified as a negative by XGB.

It is true that XGB assigns an incorrect positive class more often than the SVM model. XGB misclassifies 33.33% of the negatives as amphetamines and 20% of the cannabinoids as opioids, while SVM misclassifies only 11.11% of the negatives as amphetamines and 11.11% as opioids. However, the false positives (false hallucinogenic amphetamines, cannabinoids, and opioids), although undesirable, are less critical. As mentioned before, the individual identity (molecular structure) of every sample flagged as positive will be determined during the tests subsequently performed in the laboratory, based on a series of analytical methods that are recommended for each class of drugs of abuse by specialized international agencies such as the United Nations Office on Drugs and Crime [30,31]. In conclusion, SVM performs better than the other tested models in terms of overall accuracy, but XGB is a better choice from a forensic point of view.
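The selection argument above (first minimize the share of positives that are missed, then consider overall accuracy) can be sketched as follows. This is a hypothetical illustration rather than the procedure implemented in this work: the function names, the placeholder model names `model_a` and `model_b`, the tie-breaking rule, and all counts are assumptions.

```python
def screening_rates(matrix, negative_index):
    """Return (false_negative_rate, accuracy) for a confusion matrix of counts.

    Rows are true classes and columns are predicted classes. A false negative
    is any sample of a positive class that is predicted as the negative class.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    positive_rows = [row for i, row in enumerate(matrix) if i != negative_index]
    positives = sum(sum(row) for row in positive_rows)
    missed = sum(row[negative_index] for row in positive_rows)
    fn_rate = missed / positives if positives else 0.0
    accuracy = correct / total if total else 0.0
    return fn_rate, accuracy


def pick_screening_model(matrices, negative_index):
    """Pick the model with the lowest false negative rate.

    Ties are broken in favour of the higher overall accuracy.
    """
    def sort_key(name):
        fn_rate, accuracy = screening_rates(matrices[name], negative_index)
        return (fn_rate, -accuracy)

    return min(matrices, key=sort_key)


# Invented counts (NOT data from this paper); the last class is "negatives".
matrices = {
    # More accurate overall, but one positive is missed.
    "model_a": [[7, 0, 0, 0], [0, 10, 0, 0], [0, 0, 9, 1], [1, 0, 1, 7]],
    # Less accurate overall, but no positive is missed.
    "model_b": [[7, 0, 0, 0], [0, 8, 2, 0], [0, 0, 10, 0], [3, 0, 0, 6]],
}

best = pick_screening_model(matrices, negative_index=3)
print(best)
```

Under this rule the model that misses no positives is preferred even though the other model has the higher overall accuracy, which mirrors the reasoning that favours XGB over SVM for screening.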

#### **5. Conclusions**

The high classification accuracy of the presented models indicates that artificial intelligence-based strategies represent an important route to follow for automating the processing of ATR-FTIR spectra during field operations. The model that performs best when only the overall accuracy is taken into account is SVM. However, as these are forensic tools, the classification strategy should also consider the false negative rate. For this reason, XGB was found to be the best choice, as it has a significantly lower false negative rate, while its overall accuracy is only very slightly lower than that of SVM.

We believe that the screening systems presented in this paper still have significant potential for improvement, especially in distinguishing better between the classes of positives (amphetamines, cannabinoids, and opioids). We aim to continue our work by using strategies such as the following: increasing the number of positives included in the training set; applying the classification algorithms not to the spectra themselves, but to the PCA or ICA scores derived from these spectra [32]; preprocessing the input with a feature weight that enhances the variables having the largest modeling and/or discriminating power [33]; and using as input only the most relevant variables, as selected with techniques such as Genetic Algorithms (GA) [29].

**Author Contributions:** Conceptualization, I.-F.D., S.R.A. and M.P.; methodology I.-F.D., S.R.A. and M.P.; software, I.-F.D. and S.R.A.; validation, I.-F.D., S.R.A. and M.P.; formal analysis, I.-F.D., S.R.A. and M.P.; investigation, I.-F.D., S.R.A. and M.P.; resources, I.-F.D. and S.R.A.; data curation, I.-F.D.; writing—original draft preparation, I.-F.D., S.R.A. and M.P.; writing—review and editing, I.-F.D., S.R.A. and M.P.; visualization, I.-F.D., S.R.A. and M.P.; supervision, M.P.; project administration, M.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** The ATR-FTIR spectra used in this study were extracted from the Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) public spectral library (www.swgdrug.org).

**Acknowledgments:** The authors appreciate the "Wiley Online Library" and the forensic spectral data science platform SWGDRUG, important and useful tools used to develop the system architectures presented in this paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


