Automatic Sorting of Dwarf Minke Whale Underwater Images †
Abstract
1. Introduction
1.1. Related Work
1.2. Method Overview
- A negative-labelling technique was proposed and verified to be effective in assisting approximate object localisation via classification CNNs.
- Simple and very effective architectural modifications (Table 1) to modern off-the-shelf classification CNNs were shown to be valuable to end-users by simultaneously yielding approximate object localisation and image classification. The combined localisation heatmaps and original images help explain the CNN's results to the users (marine biologists in this study) and hence accelerate overall acceptance of deep CNN technologies.
- We developed a very accurate (below 0.1% false negatives and below 1% false positives) pipeline for processing large volumes (1.8 TB) of digital imagery of dwarf minke whales.
- The following CNN training techniques were demonstrated to work in a complementary manner for this study's domain of near-surface underwater imagery: linear learning rate annealing, uniform class undersampling, layer-specific learning rate reduction, trainable conversion of greyscale images for ImageNet-pretrained CNNs, and weak cross-domain negative supervision (VOC [35] was used); a code sketch illustrating three of these techniques is given after this list.
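To make these training techniques concrete, the following is a minimal PyTorch sketch of three of them: uniform class undersampling via weighted sampling, layer-specific learning rate reduction, and linear learning rate annealing. The dataset object `train_ds`, its `labels` attribute, the batch size and the learning rates are illustrative assumptions, not the exact values or fastai-based pipeline used in this study.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision.models import resnet50

# Uniform class sampling: draw whale/no-whale images with equal probability
# by weighting every image inversely to its class frequency.
labels = np.asarray(train_ds.labels)            # hypothetical: 0 = no whale, 1 = whale
class_counts = np.bincount(labels)
weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(torch.as_tensor(weights, dtype=torch.double),
                                num_samples=len(labels), replacement=True)
loader = DataLoader(train_ds, batch_size=32, sampler=sampler)

# Layer-specific learning rates: the ImageNet-pretrained backbone is updated
# more conservatively than the newly added (randomly initialised) layers.
backbone = resnet50(pretrained=True)
head = torch.nn.Conv2d(2048, 1, kernel_size=1)  # heatmap head, cf. Table 1
optimizer = torch.optim.SGD(
    [{"params": backbone.parameters(), "lr": 1e-4},   # reduced LR, pretrained layers
     {"params": head.parameters(), "lr": 1e-3}],      # full LR, new layers
    momentum=0.9)

# Linear learning-rate annealing: decay every group's LR linearly towards zero
# over the planned number of epochs (call scheduler.step() once per epoch).
epochs = 20
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: max(0.0, 1.0 - epoch / epochs))
```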
2. Results
2.1. First Train-Predict Cycle
2.2. Second Train-Predict Cycle
2.3. Switching from VGG-13bn to ResNet-50
2.4. Negative Labelling
2.5. Uniform Class Sampling
2.6. Switching to the Photo-ID Definition of Whale-Detection
2.7. Third Train-Predict Cycle with ResNet-50-Based MWD
2.8. Final Train-Predict Cycle
2.9. Localisation Heatmaps
2.10. Negative-Labelling Viability
2.11. Final Sorting of the 2018 Season Imagery
3. Discussion
4. Materials and Methods
4.1. Minke Whale Photographic Database
4.2. Requirements and Constraints
- Image-level: MWD should work on a per-image level since many thousands of still digital images are collected in each observation season. The videos were converted to individual frames for processing.
- Below 0.1% false-negative rate per image and 0% false negatives per seasonal imagery: the key goal was to detect every individual whale at least once within all imagery available for a given season. If a whale was missed in one frame, there were typically many other frames or images (in the same or a different video clip) in which MWD would detect the same whale with high certainty for photo-ID purposes. Due to the nature of the encounters and whale behaviour, our testing confirmed that each of the small number of whales missed in some frames was detected in other identifiable imagery from the corresponding season, i.e., the effective per-season error rate was 0% once missed whales were cross-checked. This very demanding accuracy requirement was the main practical challenge of this study.
- Below 1% false-positive rate: MWD should have a sufficiently low false-positive rate, where a classification error of less than 1% was deemed acceptable, so that at least 99% of negative (whale-free) images or video frames would be correctly filtered out. Note the deliberate trade-off in favour of the lowest possible false-negative rate (not missing a whale) rather than a balance between false negatives and false positives; a sketch of how such a trade-off translates into a decision threshold is given after this list.
- Practical training times and processing speed: MWD should be trainable and then able to process the current yearly volume of imagery in a matter of days, where 300 GB, 1.8 TB and 1.9 TB were collected in 2017, 2018 and 2019, respectively.
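The balance between the two error rates is ultimately set by the decision threshold applied to MWD's per-image score. The sketch below is illustrative only and is not the authors' procedure: it shows one way a threshold could be chosen on a labelled validation set so that the false-negative rate stays below the 0.1% target while the resulting false-positive rate is reported for that choice. The function name and arguments are assumptions made for illustration.

```python
import numpy as np

def pick_threshold(scores, labels, max_fnr=0.001):
    """Choose the decision threshold with the lowest false-positive rate
    whose false-negative rate on the validation set stays below max_fnr."""
    scores = np.asarray(scores, dtype=float)   # per-image whale probabilities
    labels = np.asarray(labels, dtype=int)     # 1 = whale present, 0 = no whale
    pos, neg = scores[labels == 1], scores[labels == 0]
    best_t, best_fpr = 0.0, 1.0
    for t in np.unique(scores):
        fnr = np.mean(pos < t)                 # missed whales at this threshold
        fpr = np.mean(neg >= t)                # whale-free images passed through
        if fnr <= max_fnr and fpr < best_fpr:
            best_t, best_fpr = t, fpr
    return best_t, best_fpr

# Example usage on hypothetical validation arrays:
# threshold, fpr = pick_threshold(val_scores, val_labels, max_fnr=0.001)
```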
4.3. Dwarf Minke Whale Detector
4.4. Training Pipeline
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| CNN | Convolutional Neural Network |
| GBR | Great Barrier Reef |
| GPU | Graphics Processing Unit |
| MWD | Minke Whale Detector |
| MWP | Minke Whale Project |
| Photo-ID | Photo Identification |
References
- Risch, D.; Norris, T.; Curnock, M.; Friedlaender, A. Common and Antarctic minke whales: Conservation status and future research directions. Front. Mar. Sci. 2019, 6, 247.
- Best, P.B. External characters of southern minke whales and the existence of a diminutive form. Sci. Rep. Whales Res. Inst. 1985, 36, 1–33.
- Mangott, A.H.; Birtles, R.A.; Marsh, H. Attraction of dwarf minke whales Balaenoptera acutorostrata to vessels and swimmers in the Great Barrier Reef world heritage area—The management challenges of an inquisitive whale. J. Ecotourism 2011, 10, 64–76.
- Birtles, R.A.; Arnold, P.W.; Dunstan, A. Commercial swim programs with dwarf minke whales on the northern Great Barrier Reef, Australia: Some characteristics of the encounters with management implications. Aust. Mammal. 2002, 24, 23–38.
- Curnock, M.I.; Birtles, R.A.; Valentine, P.S. Increased use levels, effort, and spatial distribution of tourists swimming with dwarf minke whales at the Great Barrier Reef. Tour. Mar. Environ. 2013, 9, 5–17.
- Gedamke, J.; Costa, D.P.; Dunstan, A. Localization and visual verification of a complex minke whale vocalization. J. Acoust. Soc. Am. 2001, 109, 3038–3047.
- Arnold, P.W.; Birtles, R.A.; Dunstan, A.; Lukoschek, V.; Matthews, M. Colour patterns of the dwarf minke whale Balaenoptera acutorostrata sensu lato: Description, cladistic analysis and taxonomic implications. Mem. Qld. Mus. 2005, 51, 277–307.
- Sobtzick, S. Dwarf Minke Whales in the Northern Great Barrier Reef And Implications for the Sustainable Management of the Swim-With Whales Industry. Ph.D. Thesis, James Cook University, Townsville, Australia, 2010. Available online: https://bit.ly/2DORPRM (accessed on 4 April 2020).
- Arnold, P.; Marsh, H.; Heinsohn, G. The occurrence of two forms of minke whales in east Australian waters with description of external characters and skeleton of the diminutive form. Sci. Rep. Whales Res. Inst. 1987, 38, 1–46.
- Konovalov, D.A.; Hillcoat, S.; Williams, G.; Birtles, R.A.; Gardiner, N.; Curnock, M. Individual minke whale recognition using deep learning convolutional neural networks. J. Geosci. Environ. Prot. 2018, 6, 25–36.
- Zuiderveld, K. Contrast Limited Adaptive Histogram Equalization; Graphic Gems IV; Academic Press Professional: San Diego, CA, USA, 1994; pp. 474–485.
- Verma, G.K.; Gupta, P. Wild animal detection using deep convolutional neural network. In Proceedings of the 2nd International Conference on Computer Vision & Image Processing, Roorkee, India, 9–12 September 2017; Chaudhuri, B.B., Kankanhalli, M.S., Raman, B., Eds.; Springer: Singapore, 2018; pp. 327–338.
- Pelletier, D.; Leleu, K.; Mou-Tham, G.; Guillemot, N.; Chabanet, P. Comparison of visual census and high definition video transects for monitoring coral reef fish assemblages. Fish. Res. 2011, 107, 84–93.
- Fawcett, T. An introduction to ROC analysis. Pattern Recogn. Lett. 2006, 27, 861–874.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception architecture for computer vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
- Zagoruyko, S.; Komodakis, N. Wide residual networks. In Proceedings of the British Machine Vision Conference (BMVC), York, UK, 19–22 September 2016; Wilson, R.C., Hancock, E.R., Smith, W.A.P., Eds.; BMVA Press: Guildford, UK, 2016; pp. 87.1–87.12.
- Konovalov, D.A.; Jahangard, S.; Schwarzkopf, L. In situ cane toad recognition. In Proceedings of the 2018 Digital Image Computing: Techniques and Applications (DICTA), Canberra, Australia, 10–13 December 2018; pp. 1–7.
- Olsen, A.; Konovalov, D.A.; Philippa, B.; Ridd, P.; Wood, J.C.; Johns, J.; Banks, W.; Girgenti, B.; Kenny, O.; Whinney, J.; et al. DeepWeeds: A multiclass weed species image dataset for deep learning. Sci. Rep. 2019, 9, 2058.
- Zhang, Q.; Yang, Y.; Ma, H.; Wu, Y.N. Interpreting CNNs via decision trees. In Proceedings of the CVPR IEEE, Long Beach, CA, USA, 16–20 June 2019; pp. 6254–6263.
- Mahendran, A.; Vedaldi, A. Understanding deep image representations by inverting them. In Proceedings of the CVPR IEEE, Boston, MA, USA, 7–12 June 2015; pp. 5188–5196.
- Goebel, R.; Chander, A.; Holzinger, K.; Lecue, F.; Akata, Z.; Stumpf, S.; Kieseberg, P.; Holzinger, A. Explainable AI: The new 42? In Machine Learning and Knowledge Extraction; Holzinger, A., Kieseberg, P., Tjoa, A.M., Weippl, E., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 295–303.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. 2017, 39, 1137–1149.
- Park, M.; Yang, W.; Cao, Z.; Kang, B.; Connor, D.; Lea, M.A. Marine vertebrate predator detection and recognition in underwater videos by region convolutional neural network. In Knowledge Management and Acquisition for Intelligent Systems; Ohara, K., Bai, Q., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 66–80.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the CVPR IEEE, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–12 December 2015; pp. 1440–1448.
- Huang, J.; Rathod, V.; Sun, C.; Zhu, M.; Korattikara, A.; Fathi, A.; Fischer, I.; Wojna, Z.; Song, Y.; Guadarrama, S.; et al. Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3296–3297.
- Kuznetsova, A.; Rom, H.; Alldrin, N.; Uijlings, J.; Krasin, I.; Pont-Tuset, J.; Kamali, S.; Popov, S.; Malloci, M.; Duerig, T.; et al. The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale. arXiv 2018, arXiv:1811.00982.
- Krasin, I.; Duerig, T.; Alldrin, N.; Ferrari, V.; Abu-El-Haija, S.; Kuznetsova, A.; Rom, H.; Uijlings, J.; Popov, S.; Kamali, S.; et al. OpenImages: A Public Dataset for Large-Scale Multi-Label and Multi-Class Image Classification. 2017. Available online: https://bit.ly/34lGYLn (accessed on 4 April 2020).
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
- Shelhamer, E.; Long, J.; Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. 2017, 39, 640–651.
- Konovalov, D.A.; Saleh, A.; Bradley, M.; Sankupellay, M.; Marini, S.; Sheaves, M. Underwater fish detection with weak multi-domain supervision. IEEE IJCNN 2019, 1–8.
- Everingham, M.; Eslami, S.M.A.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal visual object classes challenge: A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
- Buda, M.; Maki, A.; Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 2018, 106, 249–259.
- Birtles, A.; Arnold, P.; Curnock, M.; Salmon, S.; Mangott, A.; Sobtzick, S.; Valentine, P.; Caillaud, A.; Rumney, J. Code of Practice for Dwarf Minke Whale Interactions in the Great Barrier Reef World Heritage Area. 2008. Available online: https://bit.ly/36mObKD (accessed on 4 April 2020).
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2015, arXiv:1409.1556.
- Oquab, M.; Bottou, L.; Laptev, I.; Sivic, J. Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the CVPR IEEE, Columbus, OH, USA, 23–28 June 2014; pp. 1717–1724.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the CVPR IEEE, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Liu, L.; Jiang, H.; He, P.; Chen, W.; Liu, X.; Gao, J.; Han, J. On the variance of the adaptive learning rate and beyond. arXiv 2019, arXiv:1908.03265.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Sutskever, I.; Martens, J.; Dahl, G.; Hinton, G. On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; Dasgupta, S., McAllester, D., Eds.; PMLR: Atlanta, GA, USA, 2013; Volume 28, pp. 1139–1147.
- Schaul, T.; Zhang, S.; LeCun, Y. No more pesky learning rates. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; Dasgupta, S., McAllester, D., Eds.; PMLR: Atlanta, GA, USA, 2013; Volume 28, pp. 343–351.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 630–645.
- Cubuk, E.D.; Zoph, B.; Mané, D.; Vasudevan, V.; Le, Q.V. AutoAugment: Learning augmentation policies from data. arXiv 2018, arXiv:1805.09501.
- Loshchilov, I.; Hutter, F. SGDR: Stochastic gradient descent with restarts. arXiv 2016, arXiv:1608.03983.
- Smith, L.N. No more pesky learning rate guessing games. arXiv 2015, arXiv:1506.01186.
- Konovalov, D.A.; Saleh, A.; Efremova, D.B.; Domingos, J.A.; Jerry, D.R. Automatic weight estimation of harvested fish from images. In Proceedings of the 2019 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Perth, Australia, 2–4 December 2019; pp. 1–7.
- Howard, J.; Gugger, S. Fastai: A layered API for deep learning. Information 2020, 11, 108.
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in PyTorch. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
- Krizhevsky, A. Learning Multiple Layers of Features From Tiny Images. 2009. Available online: https://bit.ly/2HfABiJ (accessed on 4 April 2020).
- Buslaev, A.; Iglovikov, V.I.; Khvedchenya, E.; Parinov, A.; Druzhinin, M.; Kalinin, A.A. Albumentations: Fast and flexible image augmentations. Information 2020, 11, 125.
Table 1. Architectural modifications converting an ImageNet-pretrained classification CNN into the Minke Whale Detector (MWD).

| Input Channels | Layer Description | Output Channels |
|---|---|---|
| 1 | Trainable conversion to 3 channels * | 3 |
| 3 | An ImageNet-trained CNN without its classification top (ResNet-50 was used) | 2048 ** |
| 2048 | Trainable object localization heatmap + sigmoid *** | 1 |
| 1 | Image classification output via maxpool | 1 |
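The following is a minimal PyTorch sketch of the architecture summarised in Table 1. The class name, the 1×1 kernel sizes of the added convolutions and the use of `torchvision`'s ResNet-50 are illustrative assumptions and may differ from the implementation used in the study.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class MWDSketch(nn.Module):
    """Classification CNN with an auxiliary localisation heatmap (cf. Table 1)."""

    def __init__(self):
        super().__init__()
        # Trainable conversion of a 1-channel (greyscale) image to the
        # 3-channel input expected by an ImageNet-pretrained backbone.
        self.to_rgb = nn.Conv2d(1, 3, kernel_size=1, bias=True)
        backbone = resnet50(pretrained=True)
        # ImageNet-trained CNN without its classification top: keep the
        # convolutional stages, drop the final average pool and fc layer.
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # 2048 channels out
        # Trainable object-localisation heatmap (1x1 convolution) + sigmoid.
        self.heatmap = nn.Conv2d(2048, 1, kernel_size=1)

    def forward(self, x):
        x = self.to_rgb(x)                     # (B, 1, H, W)    -> (B, 3, H, W)
        x = self.features(x)                   # (B, 3, H, W)    -> (B, 2048, H/32, W/32)
        heat = torch.sigmoid(self.heatmap(x))  # (B, 2048, h, w) -> (B, 1, h, w)
        # Image-level classification output via max-pooling the heatmap:
        # "is there a whale anywhere in this image?"
        score = torch.amax(heat, dim=(2, 3))   # (B, 1)
        return score, heat

# Example usage on a dummy greyscale batch:
# model = MWDSketch(); score, heat = model(torch.rand(4, 1, 224, 224))
```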
| Cycle | Training Images (Count) | Test Images (Count) | * (/Count, %) | * (/Count, %) |
|---|---|---|---|---|
| 1 | MWPID-2014 [10] (1300) + VOC-2012 [35] (17,000) | MWS-2018i (11,704) | 908 (7.8%) | 395 (3.4%) |
| 2 | + MWS-2018i (11,704) | MWS-2018-s100 (8373) | 1973 (23.6%) | 61 (0.73%) |
| 3 | + MWS-2018-s100 ** | MWS-2018-s10a (≈40,000) | 377 (0.9%) | not recorded |
| Final | + MWS-2018-s10a ** (16,471) *** | MWS-2018-s10b (≈243,000) | (<0.04%) | (<0.4%) |
| Metric | Description | Mean (±Std) |
|---|---|---|
| TP, FP | Predicted * true (TP) and false (FP) positives | |
| FN, TN | Predicted false (FN) and true (TN) negatives | |
| P | Actual test positives, P = TP + FN | |
| N | Actual test negatives, N = TN + FP | |
| Recall | TP / P | 97.84% (±0.32%) |
| Precision | TP / (TP + FP) | 99.57% (±0.15%) |
| Accuracy | (TP + TN) / (P + N) | 97.94% (±0.22%) |
| F1 score | 2 × Precision × Recall / (Precision + Recall) | 98.70% (±0.14%) |
| ROC AUC | Area Under the ROC Curve [14] | 99.58% (±0.12%) |
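For reference, the metrics in the table above follow the standard binary-classification definitions of [14]. The short sketch below computes them from per-image ground-truth labels and predicted whale probabilities; the arrays, the function name and the 0.5 decision threshold are illustrative assumptions (the study's operating threshold favours a low false-negative rate, Section 4.2).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def classification_metrics(y_true, y_prob, threshold=0.5):
    """Standard binary-classification metrics (cf. Fawcett [14])."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = (np.asarray(y_prob, dtype=float) >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))   # predicted true positives
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))   # predicted false positives
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))   # predicted false negatives
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))   # predicted true negatives
    p, n = tp + fn, tn + fp                           # actual positives / negatives
    return {
        "recall": tp / p,
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (p + n),
        "f1": 2 * tp / (2 * tp + fp + fn),            # equals 2PR / (P + R)
        "roc_auc": roc_auc_score(y_true, y_prob),
    }
```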
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).