Towards a Taxonomy Machine: A Training Set of 5.6 Million Arthropod Images
Abstract
1. Summary
2. Data Generation
3. Data Description
3.1. Geographic Coverage
3.2. Taxonomic Coverage
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Steinke, D.; Ratnasingham, S.; Agda, J.; Ait Boutou, H.; Box, I.C.H.; Boyle, M.; Chan, D.; Feng, C.; Lowe, S.C.; McKeown, J.T.A.; et al. Towards a Taxonomy Machine: A Training Set of 5.6 Million Arthropod Images. Data 2024, 9, 122. https://doi.org/10.3390/data9110122