Uncertainty Quantification of Machine Learning Model Performance via Anomaly-Based Dataset Dissimilarity Measures
Abstract
1. Introduction
- An Uncertainty Quantification (UQ) method aimed at predicting the level at which an ML model will perform, and the uncertainty associated with such a performance prediction, using data dissimilarity measures;
- A novel family of data dissimilarity measures based on anomaly detection algorithms, which are computed from the features represented by ANN activation values.
2. Materials and Methods
2.1. Related Work
2.1.1. Measures of ML Performance
2.1.2. Measures of Dataset Dissimilarity
2.1.3. Relationship between Dataset Dissimilarity Measures and Anomaly Detection Algorithms
2.2. A UQ Method for Model Performance Prediction Based on Data Dissimilarity Measures
2.3. Anomaly-Based Dataset Dissimilarity (ADD) Measures
2.3.1. Description of ADD Measures
2.3.2. Feature Extraction Process
- (i) A raw dataset is fed to a CNN, the chosen ANN. The activation values output by a fixed subset of CNN neurons are recorded for each data instance. Each data instance is thus effectively mapped to a feature vector whose components are the recorded activation values of the corresponding neurons. This is the intermediate representation of the raw dataset;
- (ii) A transformation is applied to these feature vectors to determine a further feature vector representation. This is the secondary representation of the raw dataset.
- Principal component analysis (PCA) accounts for linear interdependencies between the variables corresponding to CNN neuron activation values;
- PCA also reduces the dimensionality of the feature vectors (i.e., compresses the data), which in turn lowers the computational cost of the subsequently applied HBOS algorithm.
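The two-step pipeline above, recording activation values, compressing them with PCA, and then scoring with HBOS, can be sketched as follows. This is an illustrative reconstruction, not the authors' exact implementation: the activation matrix is simulated random data standing in for recorded CNN neuron activations, and `hbos_scores` is a minimal histogram-based outlier score (per-dimension histograms, summed negative log densities).

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for the intermediate representation: one row per data instance,
# one column per recorded CNN neuron activation value.
activations = rng.normal(size=(1000, 256))

# PCA captures linear interdependencies between neuron activations and
# compresses the feature vectors (the secondary representation).
pca = PCA(n_components=16)
features = pca.fit_transform(activations)

def hbos_scores(train, test, n_bins=10):
    """Minimal HBOS sketch: fit per-dimension histograms on `train`,
    score `test`. Higher score = more anomalous (sum over dimensions
    of the negative log histogram density at each point's bin)."""
    scores = np.zeros(len(test))
    for d in range(train.shape[1]):
        hist, edges = np.histogram(train[:, d], bins=n_bins, density=True)
        idx = np.clip(np.searchsorted(edges, test[:, d]) - 1, 0, n_bins - 1)
        density = np.maximum(hist[idx], 1e-12)  # avoid log(0)
        scores += -np.log(density)
    return scores

scores = hbos_scores(features, features)  # one anomaly score per instance
```

In the paper's setting, the histograms would be fitted on the secondary representation of a reference (training) dataset and applied to an operational dataset; scoring the training features against themselves, as above, simply illustrates the mechanics.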
2.3.3. Anomaly Scores
2.3.4. Anomaly-Based Dataset Dissimilarity—Formula
2.4. Materials and Experimental Setting
2.4.1. Datasets
- A training set comprising 51,000 images;
- A validation set comprising 9000 images;
- A test set comprising 10,000 images.
- A training set comprising 50,000 images;
- A validation set comprising 9000 images;
- A test set comprising 1000 images.
2.4.2. Neural Networks
3. Results
- The ability of the proposed ADD measures to indicate progressively greater dissimilarity when applied to a series of datasets which have been systematically transformed to a progressively greater extent;
- The applicability of the ADD measures to the UQ method designed to predict ML model performance and quantify the associated uncertainty, as outlined in Section 2.2.
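The first evaluation relies on datasets that have been systematically transformed to a progressively greater extent. As a sketch of how such a series might be generated, the snippet below applies the four transform families used in the results (rotation, blur, Gaussian noise, brightness) at increasing magnitudes; the random image is a stand-in for test data, and the specific parameter ranges are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = rng.random((28, 28))  # stand-in for a test image


def transform(img, kind, magnitude):
    """Apply one of the four transform families at a given magnitude."""
    if kind == "rotation":        # rotation angle in degrees
        return ndimage.rotate(img, angle=magnitude, reshape=False)
    if kind == "blur":            # Gaussian kernel standard deviation
        return ndimage.gaussian_filter(img, sigma=magnitude)
    if kind == "gaussian_noise":  # additive noise standard deviation
        return img + rng.normal(scale=magnitude, size=img.shape)
    if kind == "brightness":      # additive brightness offset
        return np.clip(img + magnitude, 0.0, 1.0)
    raise ValueError(f"unknown transform: {kind}")


# A systematically transformed series: one version of the data per
# progressively larger parameter value.
series = [transform(image, "blur", sigma) for sigma in (0.5, 1.0, 2.0, 4.0)]
```

An ADD measure computed between the untransformed dataset and each member of such a series should then increase monotonically with the transform parameter.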
3.1. Numerical Results and Evaluation: The Relationship between the Magnitude of Image Transform Parameters and ADD Measures
3.2. Numerical Results and Evaluation: The Relationship between ADD Measures and CNN Classification Performance
3.3. Numerical Results and Evaluation: UQ Method for Predicting CNN Performance Using ADD Measures
4. Discussion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Spearman’s Coefficient | Rotation | Blur | Gaussian Noise | Brightness |
|---|---|---|---|---|
| PM0-based ADD | 1.00 | 1.00 | 0.94 | 0.32 |
| PM1-based ADD | −0.03 | 0.43 | 0.60 | −0.06 |
| PM2-based ADD | 1.00 | 0.83 | 0.89 | 1.00 |
| PM3-based ADD | 1.00 | 1.00 | 1.00 | 0.94 |
| Spearman’s Coefficient | Rotation | Blur | Gaussian Noise | Brightness |
|---|---|---|---|---|
| PM0-based ADD | 0.83 | 1.00 | 1.00 | 1.00 |
| PM1-based ADD | −0.60 | 1.00 | 0.66 | 1.00 |
| PM2-based ADD | 0.94 | 1.00 | 0.94 | 1.00 |
| PM3-based ADD | 0.66 | 1.00 | 1.00 | 1.00 |
| Spearman’s Coefficient | Rotation | Blur | Gaussian Noise | Brightness |
|---|---|---|---|---|
| PM2-based ADD | 1.00 | 1.00 | 1.00 | 1.00 |
| PM3-based ADD | 0.94 | 1.00 | 0.94 | 1.00 |
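The coefficients above measure the monotonic association between the magnitude of each image transform parameter and the resulting ADD values. A minimal sketch of that computation, on illustrative made-up numbers and assuming SciPy is available:

```python
from scipy.stats import spearmanr

# Illustrative values only: transform parameter magnitudes and the ADD
# measure computed on each correspondingly transformed dataset.
magnitudes = [0, 5, 10, 15, 20]
add_values = [0.02, 0.11, 0.35, 0.41, 0.78]

# Spearman's rank correlation: 1.00 indicates a perfectly monotonic
# increase of the ADD measure with the transform magnitude.
rho, p_value = spearmanr(magnitudes, add_values)
print(round(rho, 2))  # → 1.0, since add_values increase monotonically
```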
| RMSE | LeNet-5 (tanh) | LeNet-5 (ReLU) |
|---|---|---|
| PM0-based ADD | 0.132 | 0.056 |
| PM2-based ADD | 0.049 | 0.053 |
| PM3-based ADD | 0.071 | 0.052 |
| RMSE | ResNet-18 |
|---|---|
| PM2-based ADD | 0.127 |
| PM3-based ADD | 0.099 |
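The RMSE values above summarize how closely the ADD-based predictions track the classification performance actually observed. A minimal sketch of the metric itself, on made-up accuracy values:

```python
import numpy as np

# Illustrative values only: predicted vs. observed accuracy on a
# series of transformed test sets.
predicted = np.array([0.95, 0.88, 0.71, 0.52])
observed = np.array([0.93, 0.85, 0.76, 0.55])

# Root-mean-square error between prediction and observation.
rmse = np.sqrt(np.mean((predicted - observed) ** 2))
print(round(rmse, 3))  # → 0.034
```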
Share and Cite
Incorvaia, G.; Hond, D.; Asgari, H. Uncertainty Quantification of Machine Learning Model Performance via Anomaly-Based Dataset Dissimilarity Measures. Electronics 2024, 13, 939. https://doi.org/10.3390/electronics13050939