Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach
Abstract
1. Introduction
2. Related Work
3. Finite Query-Testing Scheme
3.1. Problem Statement
3.2. Decoding
3.3. Labeling Error
4. Bounds on Performance for Recovery of Corrupted Labels
4.1. Lower Bound
4.2. Upper Bound
4.3. Proof of Theorem 2
4.4. Discussion
5. Conclusions
Funding
Data Availability Statement
Conflicts of Interest
References
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Seong, J.-T. Bounds on Performance for Recovery of Corrupted Labels in Supervised Learning: A Finite Query-Testing Approach. Mathematics 2023, 11, 3636. https://doi.org/10.3390/math11173636