Article

Waveshift 2.0: An Improved Physics-Driven Data Augmentation Strategy in Fine-Grained Image Classification

by Gent Imeraj * and Hitoshi Iyatomi
Department of Applied Informatics, Graduate School of Science and Engineering, Hosei University, Tokyo 184-8584, Japan
* Author to whom correspondence should be addressed.
Electronics 2025, 14(9), 1735; https://doi.org/10.3390/electronics14091735
Submission received: 30 March 2025 / Revised: 16 April 2025 / Accepted: 21 April 2025 / Published: 24 April 2025
(This article belongs to the Special Issue New Trends in Computer Vision and Image Processing)

Abstract

This paper presents Waveshift Augmentation 2.0 (WS 2.0), an enhanced version of the previously proposed Waveshift Augmentation (WS 1.0), a novel data augmentation technique inspired by light propagation dynamics in optical systems. While WS 1.0 introduced phase-based wavefront transformations under the assumption of an infinitesimally small aperture, WS 2.0 incorporates an additional aperture-dependent hyperparameter that models real-world optical attenuation. This refinement enables broader frequency modulation and greater diversity in image transformations while preserving compatibility with well-established data augmentation pipelines such as CLAHE, AugMix, and RandAugment. Evaluated across a wide range of tasks, including medical imaging, fine-grained object recognition, and grayscale image classification, WS 2.0 consistently outperformed both WS 1.0 and standard geometric augmentation. Notably, when benchmarked against geometric augmentation alone, it achieved average macro-F1 improvements of +1.48 (EfficientNetV2), +0.65 (ConvNeXt), and +0.73 (Swin Transformer), with gains of up to +9.32 points in medical datasets. These results demonstrate that WS 2.0 advances physics-based augmentation by enhancing generalization without sacrificing modularity or preprocessing efficiency, offering a scalable and realistic augmentation strategy for complex imaging domains.

1. Introduction

The increasing reliance on deep learning-based image diagnosis in fields such as medical imaging and industrial inspection has underscored the critical need for robust and generalizable models. While convolutional neural networks (CNNs) and vision transformers (ViTs) have demonstrated remarkable accuracy in controlled environments, their effectiveness often deteriorates when applied to unseen data distributions. This decline is fundamentally due to domain shift between the training and test distributions, where the statistical properties of the training set do not sufficiently cover the variability present in real-world test samples [1]. These challenges are particularly evident in fine-grained classification tasks, where subtle inter-class differences and limited annotated data further exacerbate generalization difficulties [2]. As a result, models trained on limited and non-diverse datasets fail to generalize effectively when exposed to previously unseen conditions.
In medical imaging applications, this issue manifests through two key challenges:
  • Biological variability: The characteristics of diseases or biological traits do not remain constant across samples due to differences in growth conditions, disease progression, environmental influences, or genetic variations. Even within the same disease category, intra-class diversity arises from natural variations in symptom expression, making it difficult for models to learn universally valid patterns.
  • Acquisition inconsistency: Unlike controlled industrial imaging environments, medical and biological imaging lacks a standardized acquisition process, leading to substantial variation in lighting conditions, image resolutions, object-to-sensor distances, and camera focus. These differences introduce domain shifts in the visual characteristics of images, further complicating generalization across datasets.
To address overfitting caused by insufficient data diversity and improve model robustness, data augmentation techniques have been extensively employed. Comprehensive studies have categorized various augmentation strategies, providing insights into their effectiveness across different applications [3].
Commonly adopted data augmentation techniques include geometric transformations (e.g., rotation, scaling, flipping, and color adjustments), pixel-level enhancements (e.g., CLAHE), and compositional augmentations (e.g., MixUp, AugMix, and RandAugment) [4]. While effective, these techniques rely on mathematical transformations and basic image processing, often failing to simulate natural variations in image modalities and pathological structures.
In parallel, data synthesis strategies, notably diffusion models (DMs), and latent diffusion models (LDMs), have gained attention for their ability to generate high-fidelity synthetic samples from learned data distributions [5]. While these models have demonstrated strong performance in medical imaging and data-scarce scenarios, their focus on generating entirely new images—rather than augmenting existing ones—can limit their usefulness in contexts where preserving diagnostic fidelity is essential [6].
With the growing interest in combining physics with AI, physics-based augmentation methods have emerged, incorporating real-world image formation principles to generate more realistic variations. For instance, neural rendering [7] leverages deep learning to model radiance fields and material properties, enabling the synthesis and manipulation of photorealistic images. However, its high computational cost limits scalability in certain applications. Additionally, some Fourier-based augmentations, which modify global frequency spectra [8], have been explored for domain generalization in medical imaging, benefiting from high GPU processing efficiency. However, previous methods have struggled with low-frequency dominant images, such as medical scans, where preserving broad contextual information is essential.
To bridge the gap, our previous work introduced Waveshift Augmentation (WS 1.0) [9], a Fourier-based augmentation technique inspired by the physics of light propagation. WS 1.0 simulated shifting an image along its light wavefronts (i.e., propagation along the z axis), accounting for phase-based transformations to generate physically realistic augmentations. This pioneering method proved particularly effective in plant disease diagnosis and fine-grained classification tasks, where photography inconsistencies and symptom variability significantly impact model performance. While WS 1.0 demonstrated solid results, it operated under a simplified optical model, assuming an infinitesimally small aperture with uniform illumination. Although this approach provided a precise idealized framework, it left room for further refinement. In real-world imaging, the aperture size influences light intensity distribution and diffraction, shaping how images are captured.
Building on this foundation, we present Waveshift Augmentation 2.0 (WS 2.0), a more advanced augmentation method that extends the original framework by incorporating realistic optical aperture effects. The key innovation in WS 2.0 is an aperture-controlled hyperparameter which models the radially symmetric amplitude distribution in the propagator, as observed in Fraunhofer diffraction and Airy disk formation [10]. This refinement enables WS 2.0 to simulate light intensity decay, capturing how images are naturally formed in finite apertures rather than idealized optical conditions. As a result, WS 2.0 effectively attenuates high frequencies, generating a wider range of augmented images that enhance data diversity and model generalization by addressing both biological variability and acquisition inconsistencies. WS 2.0 is designed as a general-purpose augmentation framework applicable to a variety of image-based machine learning tasks, although fine-grained tasks such as medical and agricultural imaging remain its primary applications.
This work establishes WS 2.0 as a robust augmentation method that bridges the gap between traditional augmentation and physics-inspired transformations. Our key contributions include the following:
  • Enhanced physical realism: WS 2.0 introduces aperture modulation, capturing real-world diffraction effects beyond WS 1.0’s infinitesimally small aperture assumption.
  • Increased diversity: The added hyperparameter enables finer frequency control for broader, more adaptable augmentation, especially in data-scarce scenarios.
  • Seamless integration: WS 2.0 maintains the simplicity and modularity of WS 1.0, making it easily incorporable into existing preprocessing pipelines without architectural changes or high computational overhead.

2. Related Works

In the realm of diagnostic analysis, data augmentation (DA) has been pivotal in enhancing model robustness and addressing data scarcity. Over time, various DA techniques have evolved, with each leveraging different mechanisms to enhance machine learning performance. Iwana and Uchida [11] conducted a large-scale evaluation of 12 DA methods across 128 datasets and six neural network architectures, providing valuable insights into their effectiveness and limitations. This section examines the progression of DA strategies, their applications in different model architectures, and the challenges they aim to overcome.

2.1. Traditional Data Augmentation Methods

Traditional data augmentation (DA) techniques have long been used to expand training datasets, mitigating overfitting and improving generalization. Common approaches include geometric transformations (e.g., rotation, scaling, and flipping), which improve spatial invariance by enabling models to recognize features regardless of position or orientation [12]. Color space modifications, such as changes in brightness, contrast, and hue, simulate varying lighting conditions and sensor sensitivities, enhancing robustness. Additionally, noise injection (e.g., Gaussian or salt-and-pepper noise) introduces realistic imperfections, encouraging models to learn more discriminative features. While these methods are effective for general tasks and widely used in training architectures like EfficientNet and ConvNeXt, they are often limited in capturing the nuanced variations present in medical and fine-grained imaging domains.

2.2. Fractal- and Pixel-Level Augmentation Techniques

Beyond traditional methods, more advanced augmentation strategies have emerged. Compositional techniques such as AugMix and RandAugment combine multiple transformations to generate complex, diverse variants, improving model robustness and uncertainty estimation [13]. While effective in natural image tasks, their use in medical imaging must be carefully constrained to avoid introducing unrealistic artifacts. Pixel-level approaches like Contrast Limited Adaptive Histogram Equalization (CLAHE) enhance image contrast by redistributing pixel intensities, improving visibility in modalities such as retinal scans [14]. However, when overapplied, CLAHE may amplify noise, reducing diagnostic reliability.

2.3. Augmentation Strategies Using Data Synthesis

To address the limitations of mathematically defined augmentations, image synthesis methods have gained traction. Generative adversarial networks (GANs) generate realistic synthetic samples by training a generator-discriminator pair in an adversarial set-up and have been used to augment scarce medical datasets such as MRI and CT scans [15]. Despite their effectiveness, GANs require intensive tuning and may produce artifacts that compromise diagnostic integrity. More recently, diffusion models (DMs) have emerged as a powerful alternative, generating high-fidelity images through iterative denoising processes [16]. However, their application in medical imaging remains exploratory due to computational complexity and a lack of standardization.

2.4. Frequency Transform-Based Data Augmentation Techniques

Recent advances have explored the use of frequency-based transformations, such as Fourier and wavelet transforms, for data augmentation. These techniques allow for spectral manipulation, improving model generalization by introducing frequency-domain variability. Xu et al. [17] proposed amplitude spectrum swapping to enhance domain robustness, while Shao et al. [18] demonstrated that frequency manipulation boosts performance in few-shot classification tasks. Schwabedal et al. [19] used randomized Fourier phase components to address class imbalance in biomedical signals, reporting a 7% gain in F1 scores. In time series forecasting, Arabi et al. [20] introduced wavelet-based masking and mixing strategies that improved temporal pattern learning. Additionally, Nanni et al. [21] showed that combining multiple spectral techniques—Fourier, Radon, and DCT—can yield classifiers that outperform conventional approaches. Overall, these methods demonstrate the potential of frequency-based DA to enhance robustness across domains and tasks.

2.5. Optical Model-Based Augmentations

Optics-inspired augmentation techniques simulate real-world imaging conditions by incorporating principles such as wavefront propagation, diffraction, and aperture modulation. These approaches enhance domain generalization by introducing transformations grounded in physical image formation. For instance, the AKiRa framework [22] enables precise control over optical parameters like aperture size and focus depth, increasing feature diversity without compromising diagnostic relevance. Similarly, PhyCV [23] integrates physics-driven convolutional operators to emulate light propagation effects in neural networks. Among these, Waveshift Augmentation (WS 1.0) introduced a novel Fourier-domain strategy to simulate image shifts along wavefronts, improving model robustness against structural variations, particularly in fine-grained classification tasks such as plant disease diagnosis.

3. Waveshift 2.0: The Framework

3.1. Evolution: Threads Between WS 1.0 and WS 2.0

WS 1.0 introduced a physics-inspired augmentation framework based on wave optics, simulating how light propagates from an object to a camera through free space. Unlike conventional methods that apply transformations in the pixel domain, WS 1.0 operates in the frequency domain, leveraging principles from Fourier optics. The original image $s_{0}^{c}(x, y)$ is first transformed via a two-dimensional Fourier transform to obtain its spectral representation $S_{0}^{c}(u, v)$, where $c \in \{\mathrm{red} \mid \lambda_{\mathrm{red}} = 620\,\mathrm{nm},\ \mathrm{green} \mid \lambda_{\mathrm{green}} = 535\,\mathrm{nm},\ \mathrm{blue} \mid \lambda_{\mathrm{blue}} = 450\,\mathrm{nm}\}$ denotes the RGB color channels, each associated with its characteristic wavelength. The image is then decomposed into its spatial frequency components—a standard technique in computational optics—which enables the simulation of light propagation behavior [24].
Wavefront propagation is then simulated by applying a physics-derived propagator function to the image spectrum (derived using Equation (5) in [9]). The propagator modulates the phase of each frequency component based on a virtual wavefront distance z:
$$W_{1.0}^{c}(u, v, z) = \exp\!\left(j \pi \lambda r^{2} z\right), \qquad (1)$$
where $u$ and $v$ are spatial frequency indices, $r = \sqrt{u^{2} + v^{2}}$ is the spatial frequency radius, and $\lambda$ is the wavelength of the light for channel $c$. The parameter $z$ controls the simulated distance between the object and the virtual camera, introducing a physically meaningful phase shift across the spectrum.
The frequency-modulated spectrum is then brought back to the spatial domain via an inverse Fourier transform to yield the augmented image:
$$S_{z}^{c}(u, v) = S_{0}^{c}(u, v) \cdot W_{1.0}^{c}(u, v, z), \qquad s_{z}^{c}(x, y) = \mathcal{F}^{-1}\{S_{z}^{c}(u, v)\}. \qquad (2)$$
This augmentation simulates how an object would appear at varying distances from the camera by modeling phase changes in light propagation, based on scalar diffraction theory. By applying wavefront-induced frequency modulation, WS 1.0 introduced a physically grounded augmentation technique. However, its assumption of an infinitesimally small aperture limited its ability to reproduce real-world optical effects, leaving room for further enhancement.
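To make the pipeline concrete, below is a minimal NumPy sketch of the WS 1.0 transform in Equations (1) and (2). This is an illustrative reading, not the authors' released code: the function name, the integer FFT-frequency-index convention, and keeping only the real part of the inverse transform are our assumptions (the official implementation is linked in Appendix A).

```python
# Minimal sketch of WS 1.0 (Equations (1) and (2)) for a single channel.
# Assumptions: u, v are integer frequency indices on the FFT grid, and the
# real part of the inverse FFT is taken as the augmented channel.
import numpy as np

def ws1_propagate(channel: np.ndarray, z: float, wavelength: float) -> np.ndarray:
    """Simulate shifting one image channel by a wavefront distance z."""
    h, w = channel.shape
    v = np.fft.fftfreq(h, d=1.0 / h)            # integer indices in FFT order
    u = np.fft.fftfreq(w, d=1.0 / w)
    uu, vv = np.meshgrid(u, v)
    r2 = uu ** 2 + vv ** 2                      # squared frequency radius r^2
    propagator = np.exp(1j * np.pi * wavelength * r2 * z)   # W_1.0^c(u, v, z)
    spectrum = np.fft.fft2(channel)                          # S_0^c(u, v)
    return np.real(np.fft.ifft2(spectrum * propagator))      # s_z^c(x, y)
```

Applied channel-wise with the per-channel wavelengths given above, this reproduces the WS 1.0 augmentation for an RGB image.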

Advancement in WS 2.0: Introducing Aperture Modulation

WS 2.0 builds upon the WS 1.0 framework by introducing a second hyperparameter—the aperture coefficient R—thus extending the modulation space from a single dimension (z) to two dimensions ( z , R ), as illustrated in Figure 1. While WS 1.0 introduced phase-based frequency modulation to simulate wavefront propagation, WS 2.0 incorporates spatially dependent amplitude modulation derived from diffraction theory. This addition allows more adaptive attenuation of high-frequency components, better capturing the physical behavior of light as it passes through real optical systems.
The updated propagator in WS 2.0 is modeled as the product of the original WS 1.0 phase modulation and an Airy disk-based amplitude decay [24]:
$$W_{2.0}^{c}(u, v, z, R) = \mathrm{norm}\!\left[I_{\mathrm{Airy}}(u, v)\right] W_{1.0}^{c}(u, v, z) = \mathrm{norm}\!\left[\left(\frac{2 J_{1}(Rr)}{Rr}\right)^{2}\right] \exp\!\left(j \pi \lambda r^{2} z\right). \qquad (3)$$
Here, $J_{1}$ is the first-order Bessel function representing the diffraction pattern formed by a circular aperture (see Section 3.2). The aperture coefficient $R$ (derived from the physical aperture diameter $D$, as defined in Equation (6)) governs the radial attenuation of the frequency components. The function $\mathrm{norm}[\cdot]$ denotes normalization such that the amplitude at the center frequency is scaled to one. In practice, this is achieved by dividing all values in the Airy pattern by its maximum, ensuring that the central peak remains at unit intensity while the surrounding components decay according to the diffraction-based formulation. The exponential term is inherited from WS 1.0 and models the phase shift induced by the virtual wavefront distance $z$.
As in WS 1.0, the augmented frequency representation is obtained by multiplying the original image spectrum by the new propagator:
$$S_{z, R}^{c}(u, v) = S_{0}^{c}(u, v) \, W_{2.0}^{c}(u, v, z, R). \qquad (4)$$
WS 2.0 retains the computational efficiency of its predecessor by remaining entirely in the Fourier domain, but it achieves a higher degree of realism and variability through physics-informed attenuation. This makes it more adaptable to datasets with varying structural complexity and imaging conditions.
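As a sketch of Equation (3), the WS 2.0 propagator can be assembled by multiplying the normalized Airy intensity by the WS 1.0 phase term. scipy.special.j1 is the first-order Bessel function of the first kind; the frequency-index convention and variable names are our assumptions rather than the published implementation.

```python
# Sketch of the WS 2.0 propagator (Equation (3)): a normalized Airy-disk
# amplitude envelope multiplied by the WS 1.0 phase term.
import numpy as np
from scipy.special import j1   # first-order Bessel function J_1

def ws2_propagator(h: int, w: int, z: float, R: float, wavelength: float) -> np.ndarray:
    v = np.fft.fftfreq(h, d=1.0 / h)        # integer frequency indices (FFT order)
    u = np.fft.fftfreq(w, d=1.0 / w)
    uu, vv = np.meshgrid(u, v)
    r = np.sqrt(uu ** 2 + vv ** 2)          # frequency radius
    x = R * r
    airy = np.ones_like(x)                  # limit of (2*J1(x)/x)^2 as x -> 0 is 1
    nz = x != 0
    airy[nz] = (2.0 * j1(x[nz]) / x[nz]) ** 2
    airy /= airy.max()                      # norm[.]: unit gain at the DC component
    phase = np.exp(1j * np.pi * wavelength * r ** 2 * z)   # inherited from WS 1.0
    return airy * phase
```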

3.2. Theoretical Formulation of WS 2.0

In optical systems, when light propagates through a finite circular aperture such as a camera lens, diffraction occurs. This wave behavior is modeled well by Fraunhofer diffraction theory and produces the characteristic Airy disk pattern, consisting of a bright central peak with concentric rings. The corresponding intensity profile is given by
$$I_{\mathrm{Airy}}(u, v) = \left(\frac{2 J_{1}\!\left(\pi D r / \lambda\right)}{\pi D r / \lambda}\right)^{2}, \qquad (5)$$
where D is the physical diameter of the imaging aperture. This intensity profile describes how the aperture affects the distribution of energy in the frequency domain, particularly by attenuating high-frequency components more strongly.
To incorporate these diffraction effects into augmentation, WS 2.0 reparameterizes the aperture impact using a dimensionless aperture coefficient, labeled R:
$$R = \frac{\pi D}{\lambda}. \qquad (6)$$
In optical physics, this ratio reflects the resolving power of an imaging system and determines the angular spread of the diffraction pattern. Smaller R values correspond to tighter apertures that preserve high-frequency details, while larger values simulate broader apertures with stronger suppression of fine spatial structures. Since images in our experiments were typically captured at close ranges (within 3 m), a relatively large aperture diameter D was required to meaningfully modulate the diffraction pattern.
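One concrete way to see this (our own remark, using the known first positive zero of $J_{1}$ at $x \approx 3.8317$) is to locate the first null of the Airy envelope in Equation (3):

$$\frac{2 J_{1}(Rr)}{Rr} = 0 \;\Longleftrightarrow\; Rr \approx 3.8317 \;\Longrightarrow\; r_{\mathrm{null}} = \frac{3.8317}{R},$$

so doubling $R$ halves the frequency radius at which the envelope first vanishes, pulling the attenuation boundary toward lower spatial frequencies.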
Incorporating the Airy-based modulation into WS 2.0 results in a compound propagator with two tunable components, namely z, controlling phase-based wavefront progression, and R, governing amplitude-based frequency attenuation. This dual-parameter design allows for more flexible and physically realistic transformations.
The final augmented image s z , R c ( x , y ) is reconstructed from its modulated spectrum using an inverse Fourier transform:
$$s_{z, R}^{c}(x, y) = \mathcal{F}^{-1}\{S_{z, R}^{c}(u, v)\} = \mathcal{F}^{-1}\{S_{0}^{c}(u, v) \, W_{2.0}^{c}(u, v, z, R)\}. \qquad (7)$$
Together, z and R offer complementary controls for simulating wavefront shifts and aperture-induced attenuation, enabling more expressive and domain-adaptive augmentations that preserve structural integrity. WS 2.0 thus bridges the gap between computational efficiency and physical realism, enhancing the generalization capacity of machine learning models across diverse tasks.

3.3. Expanded Representation of the Propagator with Two Hyperparameters

The transition from WS 1.0 to WS 2.0 introduces a second hyperparameter—the aperture coefficient R—alongside the wavefront distance z, thereby expanding the control space and enhancing the expressiveness of the propagator. While z modulates the phase evolution of frequency components, R governs radial amplitude attenuation based on physical aperture constraints. Together, they enable more flexible and realistic augmentation strategies grounded in diffraction theory.
In WS 1.0, the propagator W ( u , v , z ) depended solely on the wavefront distance, allowing it to be visualized in two dimensions. WS 2.0 generalizes this formulation to W ( u , v , z , R ) , introducing a fundamentally two-dimensional hyperparameter space.
Figure 2 presents the propagator surface plots across selected $(z, R)$ pairs, offering direct insight into how each hyperparameter affects frequency modulation. The frequency of wavefront oscillations increases with $z$, while $R$ introduces a radial tapering effect, simulating physical attenuation from aperture diffraction.
To complement the structural perspective, Figure 3 illustrates the corresponding image space effects across the same ( z , R ) configurations. Larger z values simulate stronger wavefront propagation, increasing optical blurring, whereas higher R values result in stronger suppression of fine image details. These combined transformations lead to smooth and coherent augmentations that respect the structural integrity of the original image.
As illustrated, the current WS 2.0 formulation strikes a balance between physical interpretability and operational flexibility. The two hyperparameters, z and R, offer intuitive control over the strength and structure of augmentations, with their grounding in optics ensuring meaningful modulation patterns (for readers interested in extended theoretical behavior, an exploratory visualization is provided in Appendix A.1). These visualizations provide both theoretical and empirical justification for the dual-hyperparameter design of WS 2.0, which we further refine through automated search in Section 4.3.

3.4. Deployment Strategy

To operationalize WS 2.0 within standard image classification pipelines, the augmented image is generated in the frequency domain using both wavefront shift (z) and aperture modulation (R) as hyperparameters. RGB images are processed channel-wise to apply Fourier-based transformations independently, as outlined in Algorithm 1. For grayscale inputs, the same process is applied without channel splitting.
Algorithm 1 WS 2.0 augmentation implementation
Require: Image $s_{0}^{c}(x, y)$; hyperparameters: $z \in \mathbb{R}^{+}$ (wavefront distance), $R \in \mathbb{R}^{+}$ (aperture coefficient); $c \in \{R, G, B\}$
Ensure: Augmented image $s_{z, R}^{c}(x, y)$
1: Fixed parameters: wavelengths $\{R \mid \lambda_{\mathrm{red}},\ G \mid \lambda_{\mathrm{green}},\ B \mid \lambda_{\mathrm{blue}}\}$
2: Split the image $s_{0}^{c}(x, y)$ into RGB channels: $s_{0}^{R}(x, y)$, $s_{0}^{G}(x, y)$, $s_{0}^{B}(x, y)$
3: for each channel $c \in \{R, G, B\}$ do
4:   Compute the Fourier transform: $S_{0}^{c}(u, v) = \mathcal{F}\{s_{0}^{c}(x, y)\}$
5:   Construct the WS 2.0 propagator $W_{2.0}^{c}(u, v, z, R)$ using Equation (3)
6:   Apply the propagator in Fourier space: $s_{z, R}^{c}(x, y) = \mathcal{F}^{-1}\{S_{0}^{c}(u, v) \cdot W_{2.0}^{c}(u, v, z, R)\}$
7: end for
8: Merge the channels $s_{z, R}^{R}(x, y)$, $s_{z, R}^{G}(x, y)$, $s_{z, R}^{B}(x, y)$ to form the final augmented image
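A runnable sketch of Algorithm 1 is given below, reusing the ws2_propagator sketch from Section 3.1. It assumes an H × W × 3 floating-point RGB array with values in [0, 1]; the channel wavelengths follow Section 3.1, while the final clipping step is our own safeguard rather than part of the published algorithm.

```python
# Sketch of Algorithm 1: channel-wise FFT -> WS 2.0 propagator -> inverse FFT.
# Assumes ws2_propagator from the earlier sketch is in scope.
import numpy as np

WAVELENGTHS = (620e-9, 535e-9, 450e-9)   # lambda_red, lambda_green, lambda_blue (m)

def waveshift2(image: np.ndarray, z: float, R: float) -> np.ndarray:
    """Apply WS 2.0 to an H x W x 3 RGB image with values in [0, 1]."""
    out = np.empty_like(image, dtype=np.float64)
    for idx, lam in enumerate(WAVELENGTHS):                  # c in {R, G, B}
        spectrum = np.fft.fft2(image[..., idx])              # S_0^c(u, v)
        prop = ws2_propagator(*image.shape[:2], z=z, R=R, wavelength=lam)
        out[..., idx] = np.real(np.fft.ifft2(spectrum * prop))  # s_{z,R}^c(x, y)
    return np.clip(out, 0.0, 1.0)    # our safeguard: keep a valid pixel range
```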

4. Methodology: Setup and Procedures

Our proposed method follows the same modular framework as WS 1.0, allowing seamless integration into existing augmentation pipelines.

4.1. Datasets and Model Architectures

To evaluate the robustness of WS 2.0 across diverse imaging conditions, we curated datasets spanning four distinct categories: private plant disease datasets, public symptomatic datasets, object recognition datasets with unique distributional characteristics, and grayscale medical imaging datasets (Table 1). The private dataset (cucumber and eggplant) comprises 132,443 high-resolution images (≥2000 × 1500 px), each accurately labeled and split into training and testing sets based on farm origin to prevent domain leakage between the splits [25,26]. All samples were collected under strict disease control protocols from geographically distinct farms. The public datasets, sourced from Kaggle—https://www.kaggle.com/datasets (accessed on 10 November 2024)—included skin cancer (9 classes), ocular disease (2 classes), STL-10 (10 classes), and CUB-200-2011 (200 classes).
The experimental set-up maintained a strict train-test separation, ensuring that test data originated from distinct sources. For private symptomatic datasets, images were split based on geographic origin (different farms), preventing direct overlap between the training and test domains. For public datasets, standard benchmark splits were preserved. Performance was evaluated using macro-F1 scores for the test data. This set-up provides an upper bound on achievable performance under controlled conditions, allowing a standardized comparison of WS 2.0 against baseline augmentation methods.
WS 2.0 was evaluated using three state-of-the-art deep learning models (see Table 2), each representing different architectural paradigms. All models used ImageNet-pretrained weights and were fine-tuned on each dataset while following the same configuration.

4.2. Augmentation Pipeline and Benchmarking

To validate the effectiveness and versatility of WS 2.0, we conducted two benchmarking experiments, each comparing WS 2.0 against WS 1.0 and a baseline without wavefront augmentation.
Experiment 1 (geometric augmentation baseline): The first evaluation isolated the contribution of wavefront-based augmentation. All models were trained under three configurations: (1) standard geometric augmentation only (random flips, rotations from 0 to 2π, and random cropping from 80% to 100% with a fixed 1:1 aspect ratio), (2) geometric + WS 1.0 using fixed z-based phase modulation, and (3) geometric + WS 2.0, extending WS 1.0 with aperture-based attenuation via R.
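One plausible way to realize configuration (3) is sketched below. It is an assumption on our part, not the authors' released pipeline: whether (z, R) stay fixed for a run or are resampled per image is a design choice, and here they are resampled within the tuned bounds; the class name, application probability, and NumPy-array interface are illustrative.

```python
# Illustrative torchvision-style transform wrapping waveshift2 from Section 3.4,
# applied after the geometric stage. (z, R) are resampled per image here.
import random
import numpy as np

class RandomWaveshift2:
    def __init__(self, z_range=(15.0, 151.0), r_range=(1e-4, 1e-2), p=0.5):
        self.z_range, self.r_range, self.p = z_range, r_range, p

    def __call__(self, image: np.ndarray) -> np.ndarray:
        if random.random() >= self.p:
            return image                         # apply with probability p
        z = random.uniform(*self.z_range)        # wavefront distance (m)
        R = random.uniform(*self.r_range)        # aperture coefficient
        return waveshift2(image, z=z, R=R)
```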
Experiment 2 (integration with established augmentation techniques): In the second experiment, we benchmarked WS 2.0 within more sophisticated data augmentation pipelines, namely CLAHE, AugMix, RandAugment, CutMix and MixUp. In all cases, WS 2.0 was compared against WS 1.0 and the corresponding baseline without WS to confirm its modularity and reinforce its value as a composable, physics-based augmentation method.
Together, these two experiments assessed both the standalone effectiveness of WS 2.0 and its seamless integration into advanced augmentation pipelines, demonstrating its versatility across diverse preprocessing workflows.

4.3. Hyperparameter Optimization with Optuna

To tune WS 2.0 effectively while preserving both classification performance and visual coherence, we defined the hyperparameter search bounds for the wavefront distance (z) and aperture coefficient (R) based on a combination of qualitative image assessments and preliminary experiments with extended ranges. This process ensured that the selected values produced realistic transformations without compromising structural features. The final parameter bounds are summarized in Table 3.
To automate hyperparameter tuning, we employed Optuna [27], a Bayesian optimization framework that efficiently navigates high-dimensional search spaces. The objective was to identify (z, R) combinations that maximized the macro-F1 score for the validation set, ensuring that WS 2.0 operated under its best-performing configuration. Unlike WS 1.0, where z was fixed manually, WS 2.0 applies a data-driven strategy using tree-structured Parzen estimators (TPEs), which iteratively refine the search by focusing on promising regions and discarding unproductive ones.
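A minimal Optuna sketch of this search might look as follows. train_and_validate is a hypothetical placeholder for the dataset- and model-specific training routine that returns the validation macro-F1; the trial budget and the log-scale sampling of R are illustrative choices.

```python
# Sketch of the (z, R) search with Optuna's TPE sampler.
import optuna

def train_and_validate(z: float, R: float) -> float:
    """Hypothetical placeholder: train with WS 2.0(z, R) and return val macro-F1."""
    raise NotImplementedError   # dataset- and model-specific by design

def objective(trial: optuna.Trial) -> float:
    z = trial.suggest_float("z", 15.0, 151.0)            # wavefront distance (m)
    R = trial.suggest_float("R", 1e-4, 1e-2, log=True)   # aperture coefficient
    return train_and_validate(z, R)

study = optuna.create_study(
    direction="maximize",                    # maximize validation macro-F1
    sampler=optuna.samplers.TPESampler(),    # tree-structured Parzen estimator
)
study.optimize(objective, n_trials=50)       # illustrative trial budget
print(study.best_params)                     # best (z, R) configuration found
```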
This process enables adaptive fine-tuning across datasets, balancing augmentation strength and classification performance. A deeper quantitative analysis of the impact of these hyperparameters is presented in Section 6.2.

5. Results

5.1. Performance Comparison

Table 4 summarizes the performance across all datasets using EfficientNetV2, ConvNeXt Base, and Swin Transformer. The evaluation includes three augmentation settings: geometric only, geometric with WS 1.0, and geometric with WS 2.0. Across all configurations, WS 2.0 provided consistent improvements in the macro-F1 score, with no degradation observed in any model-dataset pair. Compared with WS 1.0, WS 2.0 achieved a higher macro-F1 score in 87.5% of the cases, confirming the added value of aperture-controlled modulation. Notably, the average improvements were +1.48 points (EfficientNetV2), +0.65 points (ConvNeXt), and +0.73 points (Swin Transformer). The most significant gain was observed in the skin cancer dataset (+9.32 points), emphasizing the effectiveness of physics-based augmentation in fine-grained classification tasks, where subtle textural differences are critical for accuracy.
Table 5 extends the analysis by evaluating WS 2.0’s integration within broader augmentation pipelines (CLAHE, AugMix, RandAugment, CutMix, and MixUp). In most cases, WS 2.0 matched or outperformed WS 1.0 and standalone methods, confirming its modularity and compatibility with state-of-the-art preprocessing strategies.
Together, these results demonstrate that WS 2.0 strengthens frequency-domain augmentation by incorporating physically grounded amplitude modulation, yielding measurable gains across diverse classification scenarios.

5.2. Execution Metrics

Despite their higher theoretical time complexity (Table 6, “Time Complexity” column), both Waveshift variants demonstrated the fastest GPU execution times among all tested augmentation methods (Table 6, “Total Time GPU (s)” column). This efficiency is largely attributed to their frequency-domain formulation, which is highly parallelizable and well suited for GPU acceleration. While WS 2.0 was slightly slower than WS 1.0 due to the added amplitude modulation step involving the Bessel function, the absolute difference remained minimal and still significantly faster than other methods. WS 2.0 outperformed all other methods in terms of execution speed and also achieved the highest GPU-to-CPU speedup ratio, underscoring its practicality for real-time data augmentation during training.

5.3. Hyperparameter Optimization Visualizations

Optuna’s adaptive optimization process iteratively refines the search space for hyperparameter configurations of WS 2.0 across different datasets and model architectures. The contour plots in Figure 4 and Figure A2 illustrate the distribution of high-performance regions, revealing dataset-specific trends in the wavefront distance (z) and aperture coefficient (R).
Datasets requiring fine-grained classification, such as CUB-200 and skin cancer (EfficientNetV2), benefitted from lower z values (15–42 m), where subtle texture differences were preserved more effectively. In contrast, datasets involving broader structural features, such as the brain tumor, ocular disease, and chest X-ray datasets, exhibited improved performance at higher z values (75–145 m), likely due to the need for greater retention of contextual information. The aperture coefficient R plays a critical role in modulating high-frequency attenuation, with optimal values for many datasets clustering around 0.0007–0.004, ensuring adequate frequency suppression without signal degradation.
Notably, different architectures exhibited distinct optimization patterns even on the same dataset, suggesting that WS 2.0 responds differently to each model’s design. This highlights its adaptability and reinforces the importance of architecture-aware tuning, as the optimal parameter pairs shifted across datasets and model configurations.

6. Discussion

6.1. Impact Analysis of a New Hyperparameter

The transition from WS 1.0 to WS 2.0 introduced aperture-controlled augmentation, expanding the hyperparameter search space from a single variable (z) to a two-dimensional space (z, R). This modification allowed for finer control over both global structural shifts (via wavefront propagation) and local contrast variations (via aperture modulation), enhancing the adaptability of the augmentation technique across diverse datasets. A key observation was the stability of WS 2.0 across multiple imaging domains, including symptomatic medical conditions, fine-grained object classification, unique feature datasets, and grayscale medical imaging. In all cases, WS 2.0 improved model generalization, demonstrating its robustness beyond any single application.
The introduction of an additional hyperparameter also revealed dataset-specific sensitivities. Certain datasets, such as CUB-200 (fine-grained bird classification), were more affected by wavefront shifting (z), which preserves macro-structural integrity, whereas medical datasets (e.g., skin cancer and chest X-ray) benefited more from aperture control (R), which modulates high-frequency details while simulating imaging variability. This indicates that WS 2.0 can flexibly adapt to the structural properties of diverse imaging domains, enabling a more targeted and domain-aware augmentation strategy.
Furthermore, WS 2.0 maintained stability across different neural network architectures, including convolution-based models (EfficientNetV2, ConvNeXt) and transformer-based models (Swin Transformer). While CNNs exhibited greater sensitivity to the augmentation due to their reliance on spatial frequency representations, ViTs also showed measurable gains, suggesting that the augmentation enhances global image structure and contrast in ways that benefit self-attention mechanisms. On average, the CNN-based models improved by 2.58 and 2.24 points, while ViTs achieved gains of 1.81 points, reinforcing the broad applicability of the technique.

6.2. Joint Hyperparameter Analysis: Ablation and Range

Analysis of the high-performance regions revealed dataset-specific sensitivities to the wavefront distance (z) and aperture modulation (R). In datasets where optimal values clustered along the R axis (cucumber, skin cancer, and ocular disease), aperture modulation was the dominant factor, enhancing contrast and high-frequency attenuation. Conversely, datasets where optimal values aligned along the z axis (eggplant, CUB-200, and brain tumor) relied primarily on wavefront propagation, suggesting phase-based transformations as key contributors. For datasets where high-performance regions were equidistant from both axes, both hyperparameters were essential, reinforcing the complementary nature of WS 2.0.
To isolate the contribution of aperture modulation (R), we performed a targeted ablation by taking the best-performing (z, R) pairs obtained from WS 2.0 and re-evaluating the performance after setting R = 0 , effectively reducing the model to WS 1.0. This controlled set-up allowed us to directly assess the added value of the amplitude modulation introduced in WS 2.0. As shown in Table 7, noticeable performance improvements were retained across several datasets—particularly in medical imaging—highlighting the importance of R in enhancing contrast sensitivity and fine structural feature learning.
While the full search space was defined as $z \in [15\ \mathrm{m}, 151\ \mathrm{m}]$ and $R \in [0.0001, 0.01]$, Optuna's optimization process revealed a "favored exploration region" where the majority of the top-performing trials were concentrated (notable in Figure 4c,g,k,l,o and Figure A2b,f). This elliptical region, centered around dataset-specific optimal values, exhibited a constrained range of approximately ±42 m in z and ±0.001 in R. The emergence of this localized high-density performance region suggests that hyperparameter tuning can be significantly accelerated by adapting search boundaries dynamically. Instead of exhaustive full-range exploration, an iterative refinement strategy, where initial trials broadly sample the search space before progressively narrowing around the detected favored exploration region, can enhance computational efficiency and fine-tuning precision. This adaptive approach ensures effective generalization across datasets while reducing the computational overhead of exhaustive hyperparameter sweeps.
To better understand the effect of hyperparameter boundaries in WS 2.0, we conducted targeted experiments by extending the search space beyond the default settings. As shown in Figure A3, increasing the aperture coefficient to R = 0.1 or the wavefront distance to z = 500 m consistently led to performance degradation across the tested datasets. Excessively large values of z and R introduced severe frequency modulation or attenuation, distorting important structural features in the images. While such extended ranges may benefit certain cases, we chose the default tuning ranges ($z \in [15, 151]$ m, $R \in [0.0001, 0.01]$) to ensure a fair and consistent evaluation across diverse datasets. Together with the qualitative observations in Figure 3, these results support the practicality of the selected range, striking a balance between augmentation diversity, stability, and general applicability.
These findings emphasize the necessity of properly constraining the hyperparameter search space. Without careful regulation, unbounded optimization can lead to unstable training dynamics, as seen in cases where high R values introduced non-ideal augmentation effects. In contrast, restricting R within an empirically validated range maintained augmentation effectiveness while preserving feature integrity.

6.3. Advantages

WS 2.0 builds upon WS 1.0 by introducing aperture modulation (R), expanding the hyperparameter search space from 1D (z) to 2D (z, R), and improving adaptability across diverse datasets. Several limitations from WS 1.0 have been addressed:
  • Expanded Search Space: The addition of aperture control refines frequency modulation, enhancing adaptability to datasets with high intra-class variance.
  • Improved Feature Preservation: Aperture-based modulation prevents excessive smoothing, preserving structural details in medical and fine-grained datasets.
  • Robustness across Architectures: WS 2.0 remains effective in both CNN-based (EfficientNetV2 and ConvNeXt) and transformer-based (Swin Transformer) models, reinforcing its generalizability.

6.4. Limitations and Future Directions

While WS 2.0 introduces a more physically grounded and expressive augmentation framework, several limitations and open challenges remain. First, the selection of hyperparameters (z, R) is still semi-manual. Although Bayesian optimization via Optuna aids in efficient tuning, a fully automated and interpretable mechanism that adapts these values based on a dataset’s characteristics has yet to be developed. Second, the augmentation’s impact varies across domains—particularly in low-frequency-dominant datasets like medical imaging—suggesting that a globally applied augmentation may not always be optimal. This raises the potential for adaptive propagators or region-aware augmentation strategies that respond to local frequency distributions within an image.
These are not strict limitations but rather avenues for future exploration. Addressing them could significantly improve WS 2.0’s versatility and performance across broader imaging scenarios. Future work will focus on incorporating dataset-aware hyperparameter prediction and exploring spatially adaptive modulation mechanisms.

7. Conclusions

This study introduced Waveshift Augmentation 2.0 (WS 2.0), a significant advancement in physics-driven augmentation, integrating aperture-controlled modulation with wavefront shifting to enhance augmentation diversity and structural feature preservation. Expanding the hyperparameter space ( z , R ) allowed for finer frequency modulation control, leading to consistent performance improvements across CNNs and ViTs. Bayesian optimization identified stable hyperparameter regions, reinforcing WS 2.0 as a robust and scalable augmentation method. These advancements establish WS 2.0 as a solid foundation for frequency-based augmentation, with future directions focusing on automated hyperparameter tuning and localized augmentation strategies to further extend its applicability.

Author Contributions

Conceptualization, G.I. and H.I.; methodology, G.I.; software, G.I.; validation, G.I. and H.I.; formal analysis, G.I.; investigation, G.I.; resources, H.I.; data curation, H.I.; writing—original draft preparation, G.I.; writing—review and editing, G.I. and H.I.; visualization, G.I.; supervision, H.I.; project administration, H.I.; funding acquisition, H.I. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Agriculture, Forestry and Fisheries of Japan (MAFF), who commissioned a project study on the development of pest diagnosis technology using AI (JP17935051), and by the Cabinet Office, Public/Private R&D Investment Strategic Expansion Program (PRISM).

Data Availability Statement

Two of our experimental datasets (cucumber and eggplant) comprised proprietary plant disease images collected through the JP17935051 project. While these datasets cannot be made publicly available, they were pivotal in testing our technique, originally aimed at enhancing plant disease classification.

Acknowledgments

We would like to express our sincere thanks to all of the experts and test site personnel who actually grew the plants in their respective fields, strictly controlled pests and diseases, and took an extremely large number of high-quality photographs in conducting this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
WS: Waveshift Augmentation
WS 1.0: First version of Waveshift Augmentation
WS 2.0: Improved version of Waveshift Augmentation
DA: Data augmentation
CNN: Convolutional neural network
ViT: Vision Transformer
TPE: Tree-structured Parzen estimator (used in Optuna)
GPU: Graphics processing unit
FT: Fourier transform
CLAHE: Contrast Limited Adaptive Histogram Equalization

Appendix A

Appendix A.1

Beyond the physically interpretable regime of real, non-negative ( z , R ) values, we explored a broader set of hyperparameter combinations to examine whether the theoretical formulation of the WS 2.0 propagator exhibited any engineering flexibility in its structural behavior. A systematic sweep across real (positive and negative) and imaginary values is visualized in Figure A1.
Sign-inverted configurations—such as $(-z, R)$ and $(z, -R)$—preserve both the intensity structure and the propagation phase in Fourier space, resulting in mirrored patterns that remain consistent with the underlying physical formulation.
Imaginary-valued configurations (i.e., $z \to jz$ and $R \to jR$) produce qualitatively different surfaces. Imaginary $z$ values produce inverse propagators with characteristics resembling high-pass filtering, while imaginary $R$ values disrupt the expected diffraction structure, leading to unstable or incoherent patterns compared with the physics-derived formulation.
Although these configurations (shown as screenshots from animated visualizations (GIFs) rendered over $z \in [1, 151]$ m with fixed $R = 0.01$) were not used in our practical augmentation, they are included here to highlight the sensitivity and structured constraints of the WS 2.0 propagator. The full code and GIFs are available at https://github.com/IyatomiLab/Waveshift_Augmentation (accessed on 11 April 2025).
This emphasizes that the propagator's expressiveness stems from its physical derivation, in contrast to arbitrary mathematical modifications that lack such grounding, interpretability, and coherence.
Figure A1. Theoretical propagator surfaces under extended hyperparameter configurations. Fourier space intensity patterns are visualized for various fixed combinations of $z$ and $R$, where bright yellow indicates high intensity and dark blue indicates low. (a) Sign-inverted configurations—such as $(-z, R)$ and $(z, -R)$—yielded mirrored but structurally preserved surfaces, reflecting the inherent physical symmetry of the formulation. (b) Imaginary-valued configurations—$z \to jz$ and $R \to jR$—introduced structural instability or high-pass-dominant patterns, deviating from physical diffraction behavior. Although only real, non-negative $(z, R)$ values were used during actual augmentation, these visualizations emphasize the sensitivity and structured behavior of the WS 2.0 propagator. Its expressiveness stems directly from its physics-based derivation, without benefiting from any arbitrary mathematical perturbation that lacks such grounded coherence.

Appendix A.2

This appendix provides additional visualizations of the hyperparameter search space for WS 2.0, complementing the results presented in the main text. Figure A2 displays contour maps across different datasets and model architectures, illustrating dataset-specific variations in high-performance regions. Here, we also examine the effects of an extended aperture coefficient range in Figure A3, highlighting the performance degradation observed when exceeding the optimized bounds.
Figure A2. Additional contour maps visualizing the hyperparameter search space for WS 2.0 across different datasets and model architectures. Rows correspond to datasets, and columns correspond to EfficientNetV2, ConvNeXt, and Swin Transformer. Each dot represents a sampled trial (z, R), with the surrounding color indicating the macro-F1 score. Shading highlights performance regions estimated by Optuna, revealing dataset-specific trends in optimal parameter configurations.
Figure A3. Exploration of extended hyperparameter ranges in WS 2.0. The left column explores the effect of extending the wavefront distance z to 500 m (default max: 151 m) while keeping the aperture coefficient R within its proposed range. The right column shows the reverse, extending R to 0.1 (default max: 0.01) while keeping z within the proposed range. In both cases, the top panels (a,b) visualize the propagator at maximum complexity for the respective extended ranges using 3D and 2D surface plots. The middle panels (c,d) show Optuna search contour plots for the respective experiments, where a clear degradation trend emerges as the hyperparameters move beyond the proposed bounds (highlighted by yellow boxes). The bottom panels (e,f) zoom into the proposed parameter ranges, where the search space reveals richer and better-defined optima, suggesting that these bounds support interpretability while avoiding over-suppression or instability.

Appendix A.3

Figure A4. Visual impact of different aperture coefficients (R) and wavefront distances (z) on augmented images across diverse domains. Unlike Figure 3, which shows zoomed-in views of propagator effects, these images are presented at full scale (512 × 512 pixels) across (z, R) combinations. For interactive exploration, please visit our GitHub repository at https://github.com/IyatomiLab/Waveshift_Augmentation (commit a27e885, accessed on 11 April 2025).

References

  1. Wayama, R.; Sasaki, Y.; Kagiwada, S.; Iwasaki, N.; Iyatomi, H. Investigation to Answer Three Key Questions Concerning Plant Pest Identification and Development of a Practical Identification Framework. Comput. Electron. Agric. 2024, 222, 109021. [Google Scholar] [CrossRef]
  2. Wei, X.-S.; Song, Y.-Z.; Mac Aodha, O.; Wu, J.; Peng, Y.; Tang, J.; Yang, J.; Belongie, S. Fine-Grained Image Analysis with Deep Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 8927–8948. [Google Scholar] [CrossRef] [PubMed]
  3. Xu, M.; Yoon, S.; Fuentes, A.; Park, D.S. A Comprehensive Survey of Image Augmentation Techniques for Deep Learning. Pattern Recognit. 2023, 137, 109347. [Google Scholar] [CrossRef]
  4. Habe, H.; Yoshioka, Y.; Ikefuji, D.; Funatsu, T.; Nagaoka, T.; Kozuka, T.; Nemoto, M.; Yamada, T.; Kimura, Y.; Ishii, K. Image Augmentation Using Fractals for Medical Image Diagnosis. Adv. Biomed. Eng. 2024, 13, 327–334. [Google Scholar] [CrossRef]
  5. Alimisis, P.; Mademlis, I.; Radoglou-Grammatikis, P.; Sarigiannidis, P.; Papadopoulos, G.T. Advances in Diffusion Models for Image Data Augmentation: A Review of Methods, Models, Evaluation Metrics, and Future Research Directions. Artif. Intell. Rev. 2025, 58, 112. [Google Scholar] [CrossRef]
  6. Pozzi, M.; Noei, S.; Robbi, E.; Cima, L.; Moroni, M.; Munari, E.; Torresani, E.; Jurman, G. Generating and Evaluating Synthetic Data in Digital Pathology Through Diffusion Models. Sci. Rep. 2024, 14, 28435. [Google Scholar] [CrossRef] [PubMed]
  7. Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
  8. Kayal, S.; Dubost, F.; Tiddens, H.A.W.M.; de Bruijne, M. Spectral Data Augmentation Techniques to Quantify Lung Pathology from CT-Images. In Proceedings of the 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 586–590. [Google Scholar] [CrossRef]
  9. Imeraj, G.; Iyatomi, H. Waveshift Augmentation: A Physics-Driven Strategy in Fine-Grained Plant Disease Classification. IEEE Access 2025, 13, 31303–31317. [Google Scholar] [CrossRef]
  10. Bartleson, C.J.; Grum, F. (Eds.) Diffraction. In Optical Radiation Measurements, Volume 5: Visual Measurements; Academic Press: New York, NY, USA, 1980; Chapter 8. [Google Scholar]
11. Iwana, B.K.; Uchida, S. An empirical survey of data augmentation for time series classification with neural networks. PLoS ONE 2021, 16, e0254841.
12. Goceri, E. Medical image data augmentation: Techniques, comparisons and interpretations. Artif. Intell. Rev. 2023, 56, 12561–12605.
13. Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. RandAugment: Practical automated data augmentation with a reduced search space. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 3008–3017.
14. Garcea, F.; Serra, A.; Lamberti, F.; Morra, L. Data augmentation for medical imaging: A systematic literature review. Comput. Biol. Med. 2023, 152, 106391.
15. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
16. Kebaili, A.; Lapuyade-Lahorgue, J.; Ruan, S. Deep Learning Approaches for Data Augmentation in Medical Imaging: A Review. J. Imaging 2023, 9, 81.
17. Xu, Q.; Zhang, R.; Zhang, Y.; Wang, Y.; Tian, Q. A Fourier-based Framework for Domain Generalization. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 14378–14387.
18. Shao, S.; Wang, Y.; Liu, B.; Liu, W.; Wang, Y.; Liu, B. FADS: Fourier-Augmentation Based Data-Shunting for Few-Shot Classification. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 839–851.
19. Schwabedal, J.T.C.; Snyder, J.C.; Cakmak, A.; Nemati, S.; Clifford, G.D. Addressing Class Imbalance in Classification Problems of Noisy Signals by Using Fourier Transform Surrogates. arXiv 2018, arXiv:1806.08675.
20. Arabi, D.; Bakhshaliyev, J.; Coskuner, A.; Madhusudhanan, K.; Uckardes, K.S. Wave-Mask/Mix: Exploring Wavelet-Based Augmentations for Time Series Forecasting. arXiv 2024, arXiv:2408.10951.
21. Nanni, L.; Paci, M.; Brahnam, S.; Lumini, A. Comparison of Different Image Data Augmentation Approaches. J. Imaging 2021, 7, 254.
22. Wang, X.; Courant, R.; Christie, M.; Kalogeiton, V. AKiRa: Augmentation Kit on Rays for Optical Video Generation. arXiv 2024, arXiv:2412.14158.
23. Zhou, Y.; MacPhee, C.; Suthar, M.; Jalali, B. PhyCV: The First Physics-Inspired Computer Vision Library. In Proceedings of the SPIE PC12438, AI and Optical Data Sciences IV, San Francisco, CA, USA, 17 March 2023; p. PC124380T.
24. Blackledge, J.M. Optical Image Formation. In Digital Image Processing; Woodhead Publishing: Cambridge, UK, 2005; pp. 343–394.
25. Cap, Q.H.; Uga, H.; Kagiwada, S.; Iyatomi, H. LeafGAN: An Effective Data Augmentation Method for Practical Plant Disease Diagnosis. IEEE Trans. Autom. Sci. Eng. 2022, 19, 1258–1267.
26. Shibuya, S.; Cap, Q.H.; Nagasawa, S.; Kagiwada, S.; Uga, H.; Iyatomi, H. Validation of Prerequisites for Correct Performance Evaluation of Image-Based Plant Disease Diagnosis Using Reliable 221K Images Collected from Actual Fields. In Proceedings of the AI for Agriculture and Food Systems, Vancouver, BC, Canada, 28 February 2021. Available online: https://openreview.net/forum?id=md2UDQ7W_IV (accessed on 7 August 2024).
27. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’19), Anchorage, AK, USA, 4–8 August 2019; ACM: New York, NY, USA, 2019; pp. 2623–2631.
Figure 1. Approximation of light propagation from the target (e.g., astrolabe), forming spherical wavefronts along a direction z with an aperture diameter D. WS 2.0 models how a camera would perceive the scene when positioned at different wavefront distances along the propagation axis (z), capturing only light-induced variations in appearance.
Figure 2. Structural visualization of the WS 2.0 propagator W 2.0 c ( u , v , z , R ) across varying aperture coefficients (R) and wavefront distances (z). For each ( z , R ) pair, a 3D surface plot and corresponding 2D top-down view are shown, illustrating the propagator’s modulation pattern in the frequency domain, where u and v denote spatial frequency indices. The wavefront distance z governs phase evolution, visible in the density of crest-valley rings, while the aperture coefficient R controls radial intensity decay due to diffraction, reflected in the shading (bright white/yellow indicates high intensity; dark black/blue indicates low). Together, they define how WS 2.0 modulates spectral content prior to reconstructing the augmented image.
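To make the structure in Figure 2 concrete, the sketch below builds such a propagator grid over spatial frequency indices (u, v). It is a minimal illustration only, assuming a Fresnel-like phase term, an exponential aperture attenuation, and nominal optical constants (wavelength, pixel pitch); the paper's exact definition of W 2.0 c ( u , v , z , R ) is given in the main text and may differ.

```python
# Minimal sketch of a WS 2.0-style propagator grid (illustrative form only).
import numpy as np

def ws2_propagator(h, w, z, R, wavelength=550e-9, pitch=1e-5):
    """Complex (h, w) grid: z sets the phase rings, R sets the radial decay."""
    v = np.fft.fftfreq(h, d=1.0 / h)        # integer-valued frequency indices
    u = np.fft.fftfreq(w, d=1.0 / w)
    uu, vv = np.meshgrid(u, v)              # grids of shape (h, w)
    rho2 = uu**2 + vv**2                    # squared radial frequency index
    extent2 = (max(h, w) * pitch)**2        # assumed physical sensor extent
    phase = np.exp(-1j * np.pi * wavelength * z * rho2 / extent2)  # z: rings
    amplitude = np.exp(-R * rho2)           # R: high-frequency attenuation
    return amplitude * phase
```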
Figure 3. Visual impact of different aperture coefficients (R) and wavefront distances (z) on the augmented image. The matrix layout illustrates how increasing R leads to stronger attenuation of high-frequency components—those responsible for fine image details—resulting in smoother, more blurred images. Simultaneously, increasing z emphasizes wavefront propagation effects, introducing phase-based distortions that further degrade spatial sharpness. Together, these visualizations provide an intuitive understanding of how WS 2.0 hyperparameters influence the frequency content and perceived image clarity.
Figure 3. Visual impact of different aperture coefficients (R) and wavefront distances (z) on the augmented image. The matrix layout illustrates how increasing R leads to stronger attenuation of high-frequency components—those responsible for fine image details—resulting in smoother, more blurred images. Simultaneously, increasing z emphasizes wavefront propagation effects, introducing phase-based distortions that further degrade spatial sharpness. Together, these visualizations provide an intuitive understanding of how WS 2.0 hyperparameters influence the frequency content and perceived image clarity.
Electronics 14 01735 g003
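The image-level effects in Figure 3 then follow from applying the propagator channel-wise in the frequency domain. The sketch below, under the same assumed functional form as above, performs FFT, modulation, and inverse FFT, sampling one (z, R) pair per image from the ranges later listed in Table 3.

```python
# Hedged sketch of a WS 2.0-style augmentation step (assumed propagator form).
import numpy as np

def waveshift2(img, z, R, wavelength=550e-9, pitch=1e-5):
    """img: float array (H, W, C) in [0, 1]; returns an augmented copy."""
    h, w = img.shape[:2]
    v = np.fft.fftfreq(h, d=1.0 / h)
    u = np.fft.fftfreq(w, d=1.0 / w)
    uu, vv = np.meshgrid(u, v)
    rho2 = uu**2 + vv**2
    extent2 = (max(h, w) * pitch)**2
    prop = np.exp(-R * rho2) * np.exp(-1j * np.pi * wavelength * z * rho2 / extent2)
    out = np.empty_like(img)
    for c in range(img.shape[2]):           # modulate each channel's spectrum
        out[..., c] = np.abs(np.fft.ifft2(np.fft.fft2(img[..., c]) * prop))
    return np.clip(out, 0.0, 1.0)

# Example draw per training sample (ranges as in Table 3).
rng = np.random.default_rng(0)
z = rng.uniform(15.0, 151.0)                # wavefront distance in metres
R = rng.uniform(1e-4, 1e-2)                 # aperture coefficient
augmented = waveshift2(rng.random((512, 512, 3)), z, R)
```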
Figure 4. Contour maps visualizing the hyperparameter search space for WS 2.0 across different datasets and model architectures. Rows correspond to datasets, and columns correspond to EfficientNetV2, ConvNeXt, and Swin Transformer. Each dot represents a sampled trial (z, R), with the surrounding color indicating the macro-F1 score. Shading highlights performance regions estimated by Optuna, revealing dataset-specific trends in optimal parameter configurations.
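A search of the kind visualized in Figure 4 can be run with Optuna [27]. In the sketch below, train_and_eval is a hypothetical placeholder for a full training run that returns the validation macro-F1; the suggested ranges follow Table 3.

```python
# Sketch of the (z, R) hyperparameter search; train_and_eval is hypothetical.
import optuna

def objective(trial):
    z = trial.suggest_float("z", 15.0, 151.0)            # upper wavefront distance (m)
    R = trial.suggest_float("R", 1e-4, 1e-2, log=True)   # aperture coefficient
    return train_and_eval(upper_z=z, aperture_R=R)       # returns macro-F1 (%)

study = optuna.create_study(direction="maximize")        # maximize macro-F1
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```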
Table 1. Overview of datasets used for experimental evaluation.

| Category | Dataset Name | Classes | Training Data (#) | Test Data (#) |
|---|---|---|---|---|
| Private symptomatic | Cucumber (plant disease) | 10 | 78,468 | 18,059 |
| | Eggplant (plant disease) | 6 | 32,516 | 3400 |
| Public symptomatic | Skin Cancer | 9 | 33,126 | 11,000 |
| | Ocular Disease | 2 | 5000 | 1000 |
| Public unique images | CUB-200-2011 (birds) | 200 | 5994 | 5794 |
| | STL-10 (objects) | 10 | 5000 | 8000 |
| Black-and-white medical imaging | Chest X-Ray (pneumonia) | 2 | 5216 | 624 |
| | Brain Tumor MRI | 4 | 2560 | 660 |
Table 2. Baseline model architectures and training configuration.

| Characteristic | EfficientNetV2-S | ConvNeXt Base | Swin Transformer |
|---|---|---|---|
| Architecture Type | CNN-based | CNN-based | Transformer-based |
| Characteristics | Compact, optimized for efficiency | Improved ResNet-like structure | Multi-scale self-attention |
| Input Size | 512 × 512 | 512 × 512 | 384 × 384 (4 × 4 patches) |
| Optimizer | Adam (learning rate = 5 × 10⁻⁵), shared across all models | | |
| Training Set-up | 50 epochs max, early stopping (patience = 7), batch size = 32, shared across all models | | |
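As a rough sketch, the Table 2 set-up could be instantiated as follows; the timm model identifiers and the NUM_CLASSES value are illustrative assumptions, not the authors' code.

```python
# Hedged instantiation of the Table 2 backbones and optimizer settings.
import timm
import torch

NUM_CLASSES = 10  # dataset-dependent (see Table 1); 10 is illustrative
models = {
    "EfficientNetV2-S": timm.create_model("tf_efficientnetv2_s",
                                          pretrained=True, num_classes=NUM_CLASSES),
    "ConvNeXt Base": timm.create_model("convnext_base",
                                       pretrained=True, num_classes=NUM_CLASSES),
    "Swin Transformer": timm.create_model("swin_base_patch4_window12_384",
                                          pretrained=True, num_classes=NUM_CLASSES),
}
optimizers = {name: torch.optim.Adam(m.parameters(), lr=5e-5)
              for name, m in models.items()}
# Training loop (omitted): at most 50 epochs, early stopping with
# patience = 7, batch size = 32, inputs resized per Table 2.
```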
Table 3. Hyperparameter ranges of Waveshift Augmentation and their purposes.

| Hyperparameter | Range | Purpose |
|---|---|---|
| Wavefront Distance (z₀) | 15–151 m | Controls light propagation shift |
| Aperture Coefficient (R) | 0.0001–0.01 | Modulates high-frequency attenuation |
Table 4. Model performance across datasets with WS 2.0 hyperparameters. For each backbone, cells list the aperture coefficient R, the upper wavefront distance z (m), and the macro-F1 score (%).

| Dataset | Method | EfficientNetV2 R / z (m) / F1 (%) | ConvNeXt Base R / z (m) / F1 (%) | Swin Transformer R / z (m) / F1 (%) |
|---|---|---|---|---|
| Cucumber (10 classes) | No WS | – / – / 56.96 | – / – / 58.28 | – / – / 55.48 |
| | WS 1.0 | – / 65 / 58.22 | – / 52 / 59.94 | – / 15 / 57.01 |
| | WS 2.0 | 0.0037 / 71 / 59.35 | 0.0019 / 69 / 60.73 | 0.0045 / 27 / 57.62 |
| Eggplant (6 classes) | No WS | – / – / 82.68 | – / – / 85.53 | – / – / 82.91 |
| | WS 1.0 | – / 47 / 85.29 | – / 68 / 87.56 | – / 52 / 85.56 |
| | WS 2.0 | 0.0018 / 41 / 86.74 | 0.0010 / 87 / 88.12 | 0.0070 / 57 / 86.41 |
| Skin Cancer (9 classes) | No WS | – / – / 64.41 | – / – / 63.56 | – / – / 69.49 |
| | WS 1.0 | – / 41 / 65.25 | – / 103 / 70.34 | – / 70 / 69.49 |
| | WS 2.0 | 0.0004 / 22 / 73.73 | 0.0050 / 46 / 72.03 | 0.0058 / 25 / 74.58 |
| Ocular Disease (2 classes) | No WS | – / – / 76.88 | – / – / 80.00 | – / – / 81.04 |
| | WS 1.0 | – / 41 / 78.18 | – / 144 / 80.00 | – / 133 / 81.56 |
| | WS 2.0 | 0.0099 / 115 / 78.70 | 0.0084 / 64 / 81.04 | 0.0093 / 79 / 81.56 |
| CUB (200 classes) | No WS | – / – / 75.46 | – / – / 88.31 | – / – / 85.98 |
| | WS 1.0 | – / 33 / 77.30 | – / 87 / 88.31 | – / 109 / 88.15 |
| | WS 2.0 | 0.0013 / 29 / 76.46 | 0.0023 / 138 / 88.81 | 0.0047 / 56 / 87.31 |
| STL10 (10 classes) | No WS | – / – / 94.81 | – / – / 96.45 | – / – / 98.56 |
| | WS 1.0 | – / 28 / 95.66 | – / 76 / 96.90 | – / 77 / 99.03 |
| | WS 2.0 | 0.0016 / 24 / 95.83 | 0.0585 / 82 / 96.93 | 0.0780 / 72 / 99.10 |
| Chest X-Ray (2 classes) | No WS | – / – / 94.23 | – / – / 94.23 | – / – / 94.87 |
| | WS 1.0 | – / 60 / 94.71 | – / 54 / 95.52 | – / 106 / 95.35 |
| | WS 2.0 | 0.0079 / 146 / 95.67 | 0.0054 / 38 / 96.31 | 0.0058 / 100 / 95.67 |
| Brain Tumor (4 classes) | No WS | – / – / 99.47 | – / – / 99.62 | – / – / 99.01 |
| | WS 1.0 | – / 62 / 99.85 | – / 33 / 99.69 | – / 32 / 99.77 |
| | WS 2.0 | 0.0032 / 81 / 99.85 | 0.0053 / 72 / 99.92 | 0.0012 / 32 / 99.54 |
Table 5. Summary of proposed Waveshift (WS) diagnostic performance (in terms of macro-F1 score (%)) for datasets.

| Dataset | DA Employed | No WS | With WS 1.0 | With WS 2.0 |
|---|---|---|---|---|
| Cucumber (10 classes) | Geometric | 56.96 | 58.22 | 59.35 |
| | Geo + CLAHE | 53.99 | 55.43 | 55.42 |
| | Geo + AugMix | 51.72 | 54.56 | 53.72 |
| | Geo + RandAug | 53.56 | 54.97 | 55.13 |
| | Geo + CutMix | 57.59 | 56.32 | 55.68 |
| | Geo + MixUp | 55.75 | 55.85 | 56.06 |
| Eggplant (6 classes) | Geometric | 82.68 | 85.29 | 86.74 |
| | Geo + CLAHE | 81.82 | 84.79 | 84.97 |
| | Geo + AugMix | 83.52 | 84.82 | 84.44 |
| | Geo + RandAug | 83.62 | 86.76 | 85.35 |
| | Geo + CutMix | 84.59 | 85.62 | 84.97 |
| | Geo + MixUp | 81.74 | 82.19 | 82.29 |
| Ocular Disease (2 classes) | Geometric | 76.88 | 78.18 | 78.70 |
| | Geo + CLAHE | 77.40 | 78.70 | 78.21 |
| | Geo + AugMix | 76.36 | 77.40 | 77.85 |
| | Geo + RandAug | 77.14 | 77.40 | 77.58 |
| | Geo + CutMix | 73.77 | 77.14 | 77.15 |
| | Geo + MixUp | 71.43 | 71.95 | 71.69 |
| Improvement | | – | +1.39 | +1.38 |
Table 6. Comparison of computational metrics for applying various data augmentation techniques on GPU and CPU over 1000 random batches (16 transformed images per batch, 16 workers) from our selected datasets. For our benchmarks, the default input resolution was 512 × 512 pixels. System details: GPU: RTX 3090, CUDA 12.1, 23.48 GB memory; CPU: x86_64, 24 cores, 251.78 GB memory.

| DA Employed | Time Complexity | Core Mechanism | Total Time GPU (s) | Total Time CPU (s) | Speedup (GPU vs. CPU) |
|---|---|---|---|---|---|
| Geometric | O(k·C·HW) | affine warps and flips | 38.37 | 55.86 | 1.46× |
| CLAHE | O(C·HW log HW) | local histogram equalization | 19.35 | 21.93 | 1.13× |
| AugMix | O(k·C·HW) | multi-op blending | 101.68 | 580.10 | 5.71× |
| RandAug | O(k·C·HW) | randomized op sequences | 49.76 | 214.36 | 4.31× |
| CutMix | O(C·hw) | patch substitution | 109.38 | 174.09 | 1.59× |
| MixUp | O(C·HW) | global pixel blending | 141.74 | 616.96 | 4.35× |
| WS 1.0 | O(C·HW log HW) | phase modulation via FFT | 4.01 | 523.47 | 130.42× |
| WS 2.0 | O(C·HW log HW) | phase + amplitude modulation | 6.33 | 820.06 | 129.48× |
In this analysis, time complexity is expressed using H, W, C, and optionally k, depending on the nature of each augmentation: H × W is the spatial resolution of the input image tensor, C is the number of channels (typically 3 for RGB), and k is the number of operations applied. Total time covers only the data augmentation (DA) operations executed directly on the image tensors on the device (CPU or GPU); it excludes data loading from disk, memory transfer, model inference, backpropagation, and all other pipeline components beyond augmentation. We used the Kornia library for tensor-based augmentations, which applies transformations natively on the GPU or CPU once images are converted to tensors; Kornia is well suited to on-device augmentation and supports batch-wise and differentiable operations. See more at https://kornia.readthedocs.io/en/latest/augmentation.html (accessed on 13 April 2025).
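For reference, a timing run in the spirit of Table 6 might look like the sketch below: 1000 passes over a pre-generated 16-image batch of 512 × 512 tensors, measuring only the on-device transform. The specific Kornia ops are stand-ins, not the exact pipeline benchmarked above.

```python
# Illustrative on-device augmentation timing with Kornia (stand-in ops).
import time
import torch
import kornia.augmentation as K

device = "cuda" if torch.cuda.is_available() else "cpu"
aug = K.AugmentationSequential(
    K.RandomAffine(degrees=15.0, p=1.0),    # stand-in geometric warp
    K.RandomHorizontalFlip(p=0.5),
).to(device)

batch = torch.rand(16, 3, 512, 512, device=device)  # pre-generated batch
start = time.perf_counter()
for _ in range(1000):                        # 1000 batches, as in Table 6
    _ = aug(batch)                           # augmentation only, no I/O
if device == "cuda":
    torch.cuda.synchronize()                 # flush queued GPU kernels
print(f"augmentation-only time ({device}): {time.perf_counter() - start:.2f} s")
```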
Table 7. Model performance comparison using the same upper bound of z (WS 1.0 vs. WS 2.0). For each backbone, cells list the aperture coefficient R, the upper wavefront distance z (m), and the macro-F1 score (%).

| Dataset | Method | EfficientNetV2 R / z (m) / F1 (%) | ConvNeXt Base R / z (m) / F1 (%) | Swin Transformer R / z (m) / F1 (%) |
|---|---|---|---|---|
| Cucumber (10 classes) | WS 1.0 | – / 71 / 57.96 | – / 69 / 57.83 | – / 27 / 52.23 |
| | WS 2.0 | 0.0037 / 71 / 59.35 | 0.0019 / 69 / 60.73 | 0.0045 / 27 / 57.62 |
| Eggplant (6 classes) | WS 1.0 | – / 98 / 84.94 | – / 87 / 88.29 | – / 57 / 85.26 |
| | WS 2.0 | 0.0018 / 98 / 85.21 | 0.0010 / 87 / 88.12 | 0.0070 / 57 / 86.41 |
| Skin Cancer (9 classes) | WS 1.0 | – / 22 / 51.69 | – / 46 / 64.41 | – / 25 / 60.17 |
| | WS 2.0 | 0.0004 / 22 / 73.73 | 0.0050 / 46 / 72.03 | 0.0058 / 25 / 74.58 |
| Ocular Disease (2 classes) | WS 1.0 | – / 115 / 74.29 | – / 64 / 79.74 | – / 79 / 78.96 |
| | WS 2.0 | 0.0099 / 115 / 78.70 | 0.0084 / 64 / 81.04 | 0.0093 / 79 / 81.56 |
| CUB (200 classes) | WS 1.0 | – / 29 / 75.46 | – / 138 / 89.82 | – / 56 / 87.65 |
| | WS 2.0 | 0.0013 / 29 / 76.46 | 0.0023 / 138 / 88.81 | 0.0047 / 56 / 87.31 |
| STL10 (10 classes) | WS 1.0 | – / 83 / 81.44 | – / 82 / 96.90 | – / 72 / 98.76 |
| | WS 2.0 | 0.0011 / 83 / 82.49 | 0.0585 / 82 / 96.93 | 0.0780 / 72 / 99.10 |
| Chest X-Ray (2 classes) | WS 1.0 | – / 146 / 92.95 | – / 38 / 93.27 | – / 100 / 94.71 |
| | WS 2.0 | 0.0079 / 146 / 95.67 | 0.0054 / 38 / 96.31 | 0.0058 / 100 / 95.67 |
| Brain Tumor (4 classes) | WS 1.0 | – / 81 / 99.31 | – / 72 / 98.55 | – / 32 / 99.77 |
| | WS 2.0 | 0.0032 / 81 / 99.85 | 0.0053 / 72 / 99.92 | 0.0012 / 32 / 99.54 |