An Explainable Scheme for Memorization of Noisy Instances by Downstream Evaluation
1. Introduction
2. Related Work
3. Methods
3.1. Workflow of the Method
3.2. Mixing in Noise
3.3. Integrated Gradients (IG)
3.4. Memorization Matrix
4. Experimental Setup and Results
4.1. Experimental Setup
4.2. Experimental Results
4.3. Ablation Test of Fraction Removing from Training Data
- Across all datasets and noise sampling distributions, one pattern is consistent: as the exclusion ratio increases, models trained after excluding the training data with the highest memorization scores (red lines) show a significantly greater drop in accuracy than models trained after randomly excluding the same fraction of data (blue lines). This indicates that images with high memorization scores are critical to model learning, and that their absence leads to a substantial decline in test accuracy.
- Across all datasets, noise sampled from a uniform distribution is more harmful to the model than noise sampled from a normal distribution at the same noise intensity. We attribute this to the higher diversity of the uniformly sampled noise.
- Across all datasets, as the noise intensity increases, the red line moves closer to the blue line when less than 10% of the training data is excluded. The trend is most noticeable when the noise intensity reaches 1.0. The reason is that data with exceptionally high memorization scores are mostly concentrated in the top 10% (green lines). As the noise intensity increases, this top 10% no longer consists entirely of the data that had high memorization scores in the original, noise-free dataset. Many of these instances are likely data that originally had low memorization scores but, after being subjected to high-intensity noise, are misidentified by the model as high memorization-score data and are therefore treated as good training data. Under such interference, the top 10% has a weaker influence on the model, so excluding it yields performance nearly identical to randomly excluding 10% of the data.
- The feature complexity of the STL-10 dataset is likely lower than that of the other two datasets: even after a high fraction of high memorization-score training data is excluded, its decrease in test accuracy is not as pronounced. In contrast, the CIFAR-100 dataset, with its larger number of classes and more complex features, experiences the greatest drop in test accuracy as the proportion of excluded high memorization-score data increases. The performance of the CIFAR-10 dataset falls between the two. For example, consider the fraction between 30% and 50%: although the memorization scores of these data are relatively lower and the data might therefore seem less important, excluding this portion has a significantly greater impact on model performance for CIFAR-10 and CIFAR-100 than for STL-10.
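The two exclusion strategies compared above, and the top-10% overlap argument, can be sketched as follows. This is a minimal illustration on synthetic scores, not the authors' code: the function names `indices_to_keep` and `top_fraction_overlap`, the strategy labels, and the way the noisy scores are generated are all assumptions made for the example. Training and evaluating a model on the kept indices is left out.

```python
import numpy as np


def indices_to_keep(scores, fraction, strategy, rng=None):
    """Return sorted indices of the training samples that remain after exclusion.

    Hypothetical helper: `scores` are per-sample memorization scores,
    `fraction` is the share of the training set to exclude.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    n_remove = int(round(fraction * n))
    if strategy == "top_memorization":
        # Exclude the n_remove samples with the highest scores (red-line setting).
        removed = np.argsort(-scores, kind="stable")[:n_remove]
    elif strategy == "random":
        # Exclude n_remove samples uniformly at random (blue-line setting).
        rng = np.random.default_rng() if rng is None else rng
        removed = rng.choice(n, size=n_remove, replace=False)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    keep = np.ones(n, dtype=bool)
    keep[removed] = False
    return np.flatnonzero(keep)


def top_fraction_overlap(clean_scores, noisy_scores, fraction=0.1):
    """Share of the clean top-`fraction` set that is also in the noisy top-`fraction` set."""
    k = max(1, int(round(fraction * len(clean_scores))))
    top_clean = set(np.argsort(-np.asarray(clean_scores), kind="stable")[:k].tolist())
    top_noisy = set(np.argsort(-np.asarray(noisy_scores), kind="stable")[:k].tolist())
    return len(top_clean & top_noisy) / k


# Synthetic scores only; the perturbation stands in for the effect of strong noise.
rng = np.random.default_rng(0)
clean = rng.random(1000)
noisy = clean + rng.normal(scale=0.5, size=1000)

kept_top = indices_to_keep(clean, 0.10, "top_memorization")
kept_rand = indices_to_keep(clean, 0.10, "random", rng)
overlap = top_fraction_overlap(clean, noisy, 0.10)
print(len(kept_top), len(kept_rand), overlap)
```

A low overlap value means that the samples ranked in the top 10% under noise are largely not the ones ranked there without noise, which is the situation in which top-score exclusion would be expected to behave like random exclusion.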
5. Conclusions
Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
β | S | E on S | E/S | A | E on A | E/A
0 | 5000 | 34 | 0.68% | -- | -- | --
0.1 | 5000 | 30 | 0.60% | 4966 | 2 | 0.04%
0.2 | 5000 | 33 | 0.66% | 4966 | 9 | 0.18%
0.5 | 5000 | 41 | 0.82% | 4966 | 22 | 0.44%
0.7 | 5000 | 64 | 1.28% | 4966 | 42 | 0.85%
1.0 | 5000 | 93 | 1.86% | 4966 | 79 | 1.59%

β | S | E on S | E/S | A | E on A | E/A
0 | 5000 | 50 | 1.00% | -- | -- | --
0.1 | 5000 | 56 | 1.12% | 4950 | 19 | 0.38%
0.2 | 5000 | 51 | 1.02% | 4950 | 30 | 0.61%
0.5 | 5000 | 80 | 1.60% | 4950 | 60 | 1.21%
0.7 | 5000 | 120 | 2.40% | 4950 | 99 | 2.00%
1.0 | 5000 | 233 | 4.66% | 4950 | 211 | 4.26%

β | S | E on S | E/S | A | E on A | E/A
0 | 10,000 | 2075 | 20.75% | -- | -- | --
0.1 | 10,000 | 1905 | 19.05% | 7925 | 101 | 1.27%
0.2 | 10,000 | 1633 | 16.33% | 7925 | 219 | 2.76%
0.5 | 10,000 | 1359 | 13.59% | 7925 | 476 | 6.01%
0.7 | 10,000 | 1223 | 12.23% | 7925 | 429 | 5.41%
1.0 | 10,000 | 1228 | 12.28% | 7925 | 234 | 2.95%

β | S | E on S | E/S | A | E on A | E/A
0 | 10,000 | 2805 | 28.05% | -- | -- | --
0.1 | 10,000 | 2514 | 25.14% | 7195 | 405 | 5.63%
0.2 | 10,000 | 2298 | 22.98% | 7195 | 572 | 7.95%
0.5 | 10,000 | 1719 | 17.19% | 7195 | 447 | 6.21%
0.7 | 10,000 | 1633 | 16.33% | 7195 | 524 | 7.28%
1.0 | 10,000 | 1784 | 17.84% | 7195 | 729 | 10.13%

β | S | E on S | E/S | A | E on A | E/A
0 | 10,000 | 4801 | 48.01% | -- | -- | --
0.1 | 10,000 | 4507 | 45.07% | 5199 | 94 | 1.81%
0.2 | 10,000 | 3878 | 38.78% | 5199 | 250 | 4.81%
0.5 | 10,000 | 2985 | 29.85% | 5199 | 548 | 10.54%
0.7 | 10,000 | 2799 | 27.99% | 5199 | 573 | 11.02%
1.0 | 10,000 | 2769 | 27.69% | 5199 | 634 | 12.19%

β | S | E on S | E/S | A | E on A | E/A
0 | 10,000 | 5411 | 54.11% | -- | -- | --
0.1 | 10,000 | 4656 | 46.56% | 4589 | 461 | 10.05%
0.2 | 10,000 | 4417 | 44.17% | 4589 | 809 | 17.63%
0.5 | 10,000 | 3599 | 35.99% | 4589 | 679 | 14.80%
0.7 | 10,000 | 3377 | 33.77% | 4589 | 638 | 13.90%
1.0 | 10,000 | 3506 | 35.06% | 4589 | 790 | 17.22%
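The column definitions are not stated alongside the tables, but the numbers are consistent with one reading, which is offered here as an assumption inferred from the values alone: A equals S minus the error count E on S at β = 0, E/S is E on S divided by S, and E/A is E on A divided by A. The short check below recomputes the percentages of the last table above under that reading. The names `retained_count` and `percent` are hypothetical helpers.

```python
# Values copied from the last table above: (beta, E on S, E on A, reported E/A in percent).
S = 10_000
E_ON_S_AT_BETA_0 = 5411
ROWS = [
    (0.1, 4656, 461, 10.05),
    (0.2, 4417, 809, 17.63),
    (0.5, 3599, 679, 14.80),
    (0.7, 3377, 638, 13.90),
    (1.0, 3506, 790, 17.22),
]


def retained_count(total, errors_at_zero):
    """Assumed definition of A: the samples not counted as errors at beta = 0."""
    return total - errors_at_zero


def percent(part, whole):
    """Ratio expressed as a percentage, rounded to two decimals like the table."""
    return round(100.0 * part / whole, 2)


A = retained_count(S, E_ON_S_AT_BETA_0)

# Recompute E/S and E/A for every row and compare E/A with the reported value.
for beta, e_on_s, e_on_a, reported in ROWS:
    e_over_s = percent(e_on_s, S)
    e_over_a = percent(e_on_a, A)
    print(f"beta={beta}: E/S={e_over_s:.2f}%  E/A={e_over_a:.2f}%  (reported E/A {reported:.2f}%)")
```

The same relation A = S − (E on S at β = 0) also holds for the other five tables (for example, 5000 − 34 = 4966 in the first one), which supports the assumed reading without confirming it.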
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (
Tsai, C.-Y.; Tsai, P.-H.; Chung, Y.-W. An Explainable Scheme for Memorization of Noisy Instances by Downstream Evaluation. Appl. Sci. 2025, 15, 2392.