Waveshift 2.0: An Improved Physics-Driven Data Augmentation Strategy in Fine-Grained Image Classification
Abstract
1. Introduction
- Biological variability: The characteristics of diseases or biological traits do not remain constant across samples due to differences in growth conditions, disease progression, environmental influences, or genetic variations. Even within the same disease category, intra-class diversity arises from natural variations in symptom expression, making it difficult for models to learn universally valid patterns.
- Acquisition inconsistency: Unlike controlled industrial imaging environments, medical and biological imaging lacks a standardized acquisition process, leading to substantial variation in lighting conditions, image resolutions, object-to-sensor distances, and camera focus. These differences introduce domain shifts in the visual characteristics of images, further complicating generalization across datasets.
- Enhanced physical realism: WS 2.0 introduces aperture modulation, capturing real-world diffraction effects beyond WS 1.0’s infinitesimally small aperture assumption.
- Increased diversity: The added hyperparameter enables finer frequency control for broader, more adaptable augmentation, especially in data-scarce scenarios.
- Seamless integration: WS 2.0 maintains the simplicity and modularity of WS 1.0, making it easily incorporable into existing preprocessing pipelines without architectural changes or high computational overhead.
2. Related Works
2.1. Traditional Data Augmentation Methods
2.2. Fractal- and Pixel-Level Augmentation Techniques
2.3. Augmentation Strategies Using Data Synthesis
2.4. Frequency Transform-Based Data Augmentation Techniques
2.5. Optical Model-Based Augmentations
3. Waveshift 2.0: The Framework
3.1. Evolution: Threads Between WS 1.0 and WS 2.0
Advancement in WS 2.0: Introducing Aperture Modulation
3.2. Theoretical Formulation of WS 2.0
3.3. Expanded Representation of the Propagator with Two Hyperparameters
3.4. Deployment Strategy
Algorithm 1: WS 2.0 augmentation implementation
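The core of the WS 2.0 augmentation, as described in Sections 3.2 and 3.3, is a frequency-domain modulation with two hyperparameters: the wavefront distance and the aperture coefficient R. A minimal sketch is given below, assuming an angular-spectrum propagator and a Gaussian aperture mask; the exact expressions, as well as the `wavelength` and `pitch` defaults, are illustrative assumptions rather than the paper's published formulas.

```python
import numpy as np

def waveshift2(channel, z, R, wavelength=550e-9, pitch=10e-6):
    """Apply a WS 2.0-style wavefront shift to one image channel in [0, 1].

    z : wavefront propagation distance (hyperparameter 1)
    R : aperture coefficient controlling high-frequency attenuation
        (hyperparameter 2, new in WS 2.0)
    """
    h, w = channel.shape
    fy = np.fft.fftfreq(h, d=pitch)[:, None]   # spatial frequencies (cycles/m)
    fx = np.fft.fftfreq(w, d=pitch)[None, :]
    f2 = fx ** 2 + fy ** 2

    # Angular-spectrum propagator exp(i*2*pi*z*sqrt(1/lambda^2 - f^2));
    # evanescent components (negative argument) are clipped to zero.
    arg = np.maximum(1.0 / wavelength ** 2 - f2, 0.0)
    propagator = np.exp(2j * np.pi * z * np.sqrt(arg))

    # Aperture modulation (the WS 2.0 addition): an assumed Gaussian
    # low-pass whose passband widens as R grows.
    aperture = np.exp(-(f2 / f2.max()) / (2.0 * R))

    spectrum = np.fft.fft2(channel) * propagator * aperture
    return np.clip(np.abs(np.fft.ifft2(spectrum)), 0.0, 1.0)
```

For an RGB image the transform is applied per channel. Setting z = 0 with a very large R makes the operation approximately the identity, which is a convenient sanity check before wiring it into a preprocessing pipeline.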
4. Methodology: Setup and Procedures
4.1. Datasets and Model Architectures
4.2. Augmentation Pipeline and Benchmarking
4.3. Hyperparameter Optimization with Optuna
5. Results
5.1. Performance Comparison
5.2. Execution Metrics
5.3. Hyperparameter Optimization Visualizations
6. Discussion
6.1. Impact Analysis of a New Hyperparameter
6.2. Joint Hyperparameter Analysis: Ablation and Range
6.3. Advantages
- Expanded Search Space: The addition of aperture control refines frequency modulation, enhancing adaptability to datasets with high intra-class variance.
- Improved Feature Preservation: Aperture-based modulation prevents excessive smoothing, preserving structural details in medical and fine-grained datasets.
- Robustness across Architectures: WS 2.0 remains effective in both CNN-based (EfficientNetV2 and ConvNeXt) and transformer-based (Swin Transformer) models, reinforcing its generalizability.
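The feature-preservation point can be made concrete with a toy model of the aperture response. The Gaussian form below is an illustrative assumption, not the paper's exact expression: gain is exactly 1 at DC, decays smoothly with normalized spatial frequency, and a larger aperture coefficient R widens the passband, so structural detail is attenuated gradually rather than truncated outright.

```python
import numpy as np

def aperture_gain(f_norm, R):
    """Illustrative Gaussian aperture response: gain at normalized spatial
    frequency f_norm (0 = DC, 1 = Nyquist) for aperture coefficient R.
    The functional form is an assumption, not the paper's expression."""
    return np.exp(-np.asarray(f_norm, dtype=float) ** 2 / (2.0 * R))

# Sweep the tuned range R in [1e-4, 1e-2]: DC always passes unchanged,
# and larger R leaves more of the mid/high frequency band intact.
for R in (1e-4, 1e-3, 1e-2):
    g = aperture_gain(np.linspace(0.0, 1.0, 5), R)
    print(f"R={R:g}: gains {np.array2string(g, precision=3)}")
```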
6.4. Limitations and Future Directions
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
WS | Waveshift Augmentation |
WS 1.0 | First version of Waveshift Augmentation |
WS 2.0 | Improved version of Waveshift Augmentation |
DA | Data augmentation |
CNN | Convolutional neural network |
ViT | Vision Transformer |
TPE | Tree-structured Parzen estimator (used in Optuna) |
GPU | Graphics processing unit |
FT | Fourier transform |
CLAHE | Contrast Limited Adaptive Histogram Equalization |
Appendix A
Appendix A.1
Appendix A.2
Appendix A.3
References
- Wayama, R.; Sasaki, Y.; Kagiwada, S.; Iwasaki, N.; Iyatomi, H. Investigation to Answer Three Key Questions Concerning Plant Pest Identification and Development of a Practical Identification Framework. Comput. Electron. Agric. 2024, 222, 109021.
- Wei, X.-S.; Song, Y.-Z.; Mac Aodha, O.; Wu, J.; Peng, Y.; Tang, J.; Yang, J.; Belongie, S. Fine-Grained Image Analysis with Deep Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 8927–8948.
- Xu, M.; Yoon, S.; Fuentes, A.; Park, D.S. A Comprehensive Survey of Image Augmentation Techniques for Deep Learning. Pattern Recognit. 2023, 137, 109347.
- Habe, H.; Yoshioka, Y.; Ikefuji, D.; Funatsu, T.; Nagaoka, T.; Kozuka, T.; Nemoto, M.; Yamada, T.; Kimura, Y.; Ishii, K. Image Augmentation Using Fractals for Medical Image Diagnosis. Adv. Biomed. Eng. 2024, 13, 327–334.
- Alimisis, P.; Mademlis, I.; Radoglou-Grammatikis, P.; Sarigiannidis, P.; Papadopoulos, G.T. Advances in Diffusion Models for Image Data Augmentation: A Review of Methods, Models, Evaluation Metrics, and Future Research Directions. Artif. Intell. Rev. 2025, 58, 112.
- Pozzi, M.; Noei, S.; Robbi, E.; Cima, L.; Moroni, M.; Munari, E.; Torresani, E.; Jurman, G. Generating and Evaluating Synthetic Data in Digital Pathology Through Diffusion Models. Sci. Rep. 2024, 14, 28435.
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Commun. ACM 2021, 65, 99–106.
- Kayal, S.; Dubost, F.; Tiddens, H.A.W.M.; de Bruijne, M. Spectral Data Augmentation Techniques to Quantify Lung Pathology from CT-Images. In Proceedings of the 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 586–590.
- Imeraj, G.; Iyatomi, H. Waveshift Augmentation: A Physics-Driven Strategy in Fine-Grained Plant Disease Classification. IEEE Access 2025, 13, 31303–31317.
- Bartleson, C.J.; Grum, F. (Eds.) Diffraction. In Optical Radiation Measurements, Volume 5: Visual Measurements; Academic Press: New York, NY, USA, 1980; Chapter 8.
- Iwana, B.K.; Uchida, S. An Empirical Survey of Data Augmentation for Time Series Classification with Neural Networks. PLoS ONE 2021, 16, e0254841.
- Goceri, E. Medical Image Data Augmentation: Techniques, Comparisons and Interpretations. Artif. Intell. Rev. 2023, 56, 12561–12605.
- Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. RandAugment: Practical Automated Data Augmentation with a Reduced Search Space. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 3008–3017.
- Garcea, F.; Serra, A.; Lamberti, F.; Morra, L. Data Augmentation for Medical Imaging: A Systematic Literature Review. Comput. Biol. Med. 2023, 152, 106391.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Commun. ACM 2020, 63, 139–144.
- Kebaili, A.; Lapuyade-Lahorgue, J.; Ruan, S. Deep Learning Approaches for Data Augmentation in Medical Imaging: A Review. J. Imaging 2023, 9, 81.
- Xu, Q.; Zhang, R.; Zhang, Y.; Wang, Y.; Tian, Q. A Fourier-Based Framework for Domain Generalization. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 14378–14387.
- Shao, S.; Wang, Y.; Liu, B.; Liu, W.; Wang, Y.; Liu, B. FADS: Fourier-Augmentation Based Data-Shunting for Few-Shot Classification. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 839–851.
- Schwabedal, J.T.C.; Snyder, J.C.; Cakmak, A.; Nemati, S.; Clifford, G.D. Addressing Class Imbalance in Classification Problems of Noisy Signals by Using Fourier Transform Surrogates. arXiv 2018, arXiv:1806.08675.
- Arabi, D.; Bakhshaliyev, J.; Coskuner, A.; Madhusudhanan, K.; Uckardes, K.S. Wave-Mask/Mix: Exploring Wavelet-Based Augmentations for Time Series Forecasting. arXiv 2024, arXiv:2408.10951.
- Nanni, L.; Paci, M.; Brahnam, S.; Lumini, A. Comparison of Different Image Data Augmentation Approaches. J. Imaging 2021, 7, 254.
- Wang, X.; Courant, R.; Christie, M.; Kalogeiton, V. AKiRa: Augmentation Kit on Rays for Optical Video Generation. arXiv 2024, arXiv:2412.14158.
- Zhou, Y.; MacPhee, C.; Suthar, M.; Jalali, B. PhyCV: The First Physics-Inspired Computer Vision Library. In Proceedings of the SPIE PC12438, AI and Optical Data Sciences IV, San Francisco, CA, USA, 17 March 2023; p. PC124380T.
- Blackledge, J.M. Optical Image Formation. In Digital Image Processing; Woodhead Publishing: Cambridge, UK, 2005; pp. 343–394.
- Cap, Q.H.; Uga, H.; Kagiwada, S.; Iyatomi, H. LeafGAN: An Effective Data Augmentation Method for Practical Plant Disease Diagnosis. IEEE Trans. Autom. Sci. Eng. 2022, 19, 1258–1267.
- Shibuya, S.; Cap, Q.H.; Nagasawa, S.; Kagiwada, S.; Uga, H.; Iyatomi, H. Validation of Prerequisites for Correct Performance Evaluation of Image-Based Plant Disease Diagnosis Using Reliable 221K Images Collected from Actual Fields. In Proceedings of the AI for Agriculture and Food Systems, Vancouver, BC, Canada, 28 February 2021. Available online: https://openreview.net/forum?id=md2UDQ7W_IV (accessed on 7 August 2024).
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '19), Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631.
Category | Dataset Name | Classes | Training Data (#) | Test Data (#) |
---|---|---|---|---|
Private symptomatic | Cucumber (plant disease) | 10 | 78,468 | 18,059 |
 | Eggplant (plant disease) | 6 | 32,516 | 3400 |
Public symptomatic | Skin Cancer | 9 | 33,126 | 11,000 |
 | Ocular Disease | 2 | 5000 | 1000 |
Public unique images | CUB-200-2011 (birds) | 200 | 5994 | 5794 |
 | STL-10 (objects) | 10 | 5000 | 8000 |
Black-and-white medical imaging | Chest X-Ray (pneumonia) | 2 | 5216 | 624 |
 | Brain Tumor MRI | 4 | 2560 | 660 |
Characteristic | EfficientNetV2-S | ConvNeXt Base | Swin Transformer |
---|---|---|---|
Architecture Type | CNN-based | CNN-based | Transformer-based |
Characteristics | Compact, optimized for efficiency | Improved ResNet-like structure | Multi-scale self-attention |
Input Size | 512 × 512 | 512 × 512 | 384 × 384 (4 × 4 patches) |
Optimizer | Adam (learning rate = ) | | |
Training Set-up | 50 epochs max, early stopping (patience = 7), batch size = 32 | | |
Hyperparameter | Range | Purpose |
---|---|---|
Wavefront Distance () | 15–151 m | Controls light propagation shift |
Aperture Coefficient (R) | 0.0001–0.01 | Modulates high-frequency attenuation |
Dataset | Method | EfficientNetV2: R | Upper (m) | Macro-F1 (%) | ConvNeXt Base: R | Upper (m) | Macro-F1 (%) | Swin Transformer: R | Upper (m) | Macro-F1 (%) |
---|---|---|---|---|---|---|---|---|---|---|
Cucumber | No WS | - | - | 56.96 | - | - | 58.28 | - | - | 55.48 |
(10 classes) | WS 1.0 | - | 65 | 58.22 | - | 52 | 59.94 | - | 15 | 57.01 |
WS 2.0 | 0.0037 | 71 | 59.35 | 0.0019 | 69 | 60.73 | 0.0045 | 27 | 57.62 | |
Eggplant | No WS | - | - | 82.68 | - | - | 85.53 | - | - | 82.91 |
(6 classes) | WS 1.0 | - | 47 | 85.29 | - | 68 | 87.56 | - | 52 | 85.56 |
WS 2.0 | 0.0018 | 41 | 86.74 | 0.0010 | 87 | 88.12 | 0.0070 | 57 | 86.41 | |
Skin Cancer | No WS | - | - | 64.41 | - | - | 63.56 | - | - | 69.49 |
(9 classes) | WS 1.0 | - | 41 | 65.25 | - | 103 | 70.34 | - | 70 | 69.49 |
WS 2.0 | 0.0004 | 22 | 73.73 | 0.0050 | 46 | 72.03 | 0.0058 | 25 | 74.58 | |
Ocular Disease | No WS | - | - | 76.88 | - | - | 80.00 | - | - | 81.04 |
(2 classes) | WS 1.0 | - | 41 | 78.18 | - | 144 | 80.00 | - | 133 | 81.56 |
WS 2.0 | 0.0099 | 115 | 78.70 | 0.0084 | 64 | 81.04 | 0.0093 | 79 | 81.56 | |
CUB | No WS | - | - | 75.46 | - | - | 88.31 | - | - | 85.98 |
(200 classes) | WS 1.0 | - | 33 | 77.30 | - | 87 | 88.31 | - | 109 | 88.15 |
WS 2.0 | 0.0013 | 29 | 76.46 | 0.0023 | 138 | 88.81 | 0.0047 | 56 | 87.31 | |
STL10 | No WS | - | - | 94.81 | - | - | 96.45 | - | - | 98.56 |
(10 classes) | WS 1.0 | - | 28 | 95.66 | - | 76 | 96.90 | - | 77 | 99.03 |
WS 2.0 | 0.0016 | 24 | 95.83 | 0.0585 | 82 | 96.93 | 0.0780 | 72 | 99.10 | |
Chest X-Ray | No WS | - | - | 94.23 | - | - | 94.23 | - | - | 94.87 |
(2 classes) | WS 1.0 | - | 60 | 94.71 | - | 54 | 95.52 | - | 106 | 95.35 |
WS 2.0 | 0.0079 | 146 | 95.67 | 0.0054 | 38 | 96.31 | 0.0058 | 100 | 95.67 | |
Brain Tumor | No WS | - | - | 99.47 | - | - | 99.62 | - | - | 99.01 |
(4 classes) | WS 1.0 | - | 62 | 99.85 | - | 33 | 99.69 | - | 32 | 99.77 |
WS 2.0 | 0.0032 | 81 | 99.85 | 0.0053 | 72 | 99.92 | 0.0012 | 32 | 99.54 |
Dataset | DA Employed | No WS (Macro-F1 %) | With WS 1.0 (Macro-F1 %) | With WS 2.0 (Macro-F1 %) |
---|---|---|---|---|
Cucumber (10 classes) | Geometric | 56.96 | 58.22 | 59.35 |
Geo + CLAHE | 53.99 | 55.43 | 55.42 | |
Geo + AugMix | 51.72 | 54.56 | 53.72 | |
Geo + RandAug | 53.56 | 54.97 | 55.13 | |
Geo + CutMix | 57.59 | 56.32 | 55.68 | |
Geo + MixUp | 55.75 | 55.85 | 56.06 | |
Eggplant (6 classes) | Geometric | 82.68 | 85.29 | 86.74 |
Geo + CLAHE | 81.82 | 84.79 | 84.97 | |
Geo + AugMix | 83.52 | 84.82 | 84.44 | |
Geo + RandAug | 83.62 | 86.76 | 85.35 | |
Geo + CutMix | 84.59 | 85.62 | 84.97 | |
Geo + MixUp | 81.74 | 82.19 | 82.29 | |
Ocular Disease (2 classes) | Geometric | 76.88 | 78.18 | 78.70 |
Geo + CLAHE | 77.40 | 78.70 | 78.21 | |
Geo + AugMix | 76.36 | 77.40 | 77.85 | |
Geo + RandAug | 77.14 | 77.40 | 77.58 | |
Geo + CutMix | 73.77 | 77.14 | 77.15 | |
Geo + MixUp | 71.43 | 71.95 | 71.69 | |
Improvement | - | +1.39 | +1.38 |
System details: GPU RTX 3090, CUDA 12.1, 23.48 GB memory; CPU x86_64, 24 cores, 251.78 GB memory.

DA Employed | Core Mechanism | Total Time GPU (s) | Total Time CPU (s) | Speedup (GPU vs. CPU) |
---|---|---|---|---|
Geometric | affine warps and flips | 38.37 | 55.86 | 1.46× | |
CLAHE | local histogram equalization | 19.35 | 21.93 | 1.13× | |
AugMix | multi-op blending | 101.68 | 580.10 | 5.71× | |
RandAug | randomized op sequences | 49.76 | 214.36 | 4.31× | |
CutMix | patch substitution | 109.38 | 174.09 | 1.59× | |
MixUp | global pixel blending | 141.74 | 616.96 | 4.35× | |
WS 1.0 | phase modulation via FFT | 4.01 | 523.47 | 130.42× | |
WS 2.0 | phase + amplitude modulation | 6.33 | 820.06 | 129.48× |
Dataset | Method | EfficientNetV2: R | Upper (m) | Macro-F1 (%) | ConvNeXt Base: R | Upper (m) | Macro-F1 (%) | Swin Transformer: R | Upper (m) | Macro-F1 (%) |
---|---|---|---|---|---|---|---|---|---|---|
Cucumber | WS 1.0 | - | 71 | 57.96 | - | 69 | 57.83 | - | 27 | 52.23 |
(10 classes) | WS 2.0 | 0.0037 | 71 | 59.35 | 0.0019 | 69 | 60.73 | 0.0045 | 27 | 57.62 |
Eggplant | WS 1.0 | - | 98 | 84.94 | - | 87 | 88.29 | - | 57 | 85.26 |
(6 classes) | WS 2.0 | 0.0018 | 98 | 85.21 | 0.0010 | 87 | 88.12 | 0.0070 | 57 | 86.41 |
Skin Cancer | WS 1.0 | - | 22 | 51.69 | - | 46 | 64.41 | - | 25 | 60.17 |
(9 classes) | WS 2.0 | 0.0004 | 22 | 73.73 | 0.0050 | 46 | 72.03 | 0.0058 | 25 | 74.58 |
Ocular Disease | WS 1.0 | - | 115 | 74.29 | - | 64 | 79.74 | - | 79 | 78.96 |
(2 classes) | WS 2.0 | 0.0099 | 115 | 78.70 | 0.0084 | 64 | 81.04 | 0.0093 | 79 | 81.56 |
CUB | WS 1.0 | - | 29 | 75.46 | - | 138 | 89.82 | - | 56 | 87.65 |
(200 classes) | WS 2.0 | 0.0013 | 29 | 76.46 | 0.0023 | 138 | 88.81 | 0.0047 | 56 | 87.31 |
STL10 | WS 1.0 | - | 83 | 81.44 | - | 82 | 96.90 | - | 72 | 98.76 |
(10 classes) | WS 2.0 | 0.0011 | 83 | 82.49 | 0.0585 | 82 | 96.93 | 0.0780 | 72 | 99.10 |
Chest X-Ray | WS 1.0 | - | 146 | 92.95 | - | 38 | 93.27 | - | 100 | 94.71 |
(2 classes) | WS 2.0 | 0.0079 | 146 | 95.67 | 0.0054 | 38 | 96.31 | 0.0058 | 100 | 95.67 |
Brain Tumor | WS 1.0 | - | 81 | 99.31 | - | 72 | 98.55 | - | 32 | 99.77 |
(4 classes) | WS 2.0 | 0.0032 | 81 | 99.85 | 0.0053 | 72 | 99.92 | 0.0012 | 32 | 99.54 |
Share and Cite
Imeraj, G.; Iyatomi, H. Waveshift 2.0: An Improved Physics-Driven Data Augmentation Strategy in Fine-Grained Image Classification. Electronics 2025, 14, 1735. https://doi.org/10.3390/electronics14091735