Article

A Fast Prediction Framework for Multi-Variable Nonlinear Dynamic Modeling of Fiber Pulse Propagation Using DeepONet

Department of Electronics, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(18), 8154; https://doi.org/10.3390/app14188154
Submission received: 12 August 2024 / Revised: 6 September 2024 / Accepted: 9 September 2024 / Published: 11 September 2024
(This article belongs to the Section Optics and Lasers)

Featured Application

This study presents a novel method for simulating fiber pulse propagation using the DeepONet architecture, significantly reducing computation time compared to traditional methods. The approach is highly applicable in fields requiring real-time fiber optic system control and optimization, such as telecommunications, medical imaging, and high-precision laser systems. The ability to accurately simulate complex fiber dynamics with minimal computational resources opens new possibilities for real-time applications and system design.

Abstract

Traditional femtosecond laser modeling relies on the iterative solution of the Nonlinear Schrödinger Equation (NLSE) using the Split-Step Fourier Method (SSFM). However, SSFM’s high computational complexity leads to significant time consumption, particularly in automatic control and system optimization, thus limiting control model responsiveness. Recent studies have suggested using neural networks to simulate fiber dynamics, offering faster computation and lower costs. In this study, we introduce a novel fiber propagation method utilizing the DeepONet architecture for the first time. By separately managing fiber parameters and input–output pulses in the branch and trunk networks, this method can simulate various fiber configurations with high accuracy and without altering the architecture. Additionally, while SSFM generation time increases linearly with fiber length, the GPU-accelerated AI generation time remains consistent at around 0.0014 s, regardless of length. Notably, in high-order soliton (HOS) compression over a 12 m distance, the AI method is approximately 56,865 times faster than SSFM.

1. Introduction

Machine learning has made significant inroads into various disciplines, including optics [1,2]. For instance, machine learning strategies have been employed to design and optimize mode-locked fiber lasers [3], enhance optical supercontinuum sources [4], and predict soliton properties [5]. Compared to numerical methods like the Split-Step Fourier Method (SSFM) [6], machine learning approaches offer several advantages. First, high-precision fiber systems often exhibit rapid changes. Although numerous studies have discussed automatic stability control, computed values lag behind the rapid changes in fiber systems, creating discrepancies with the actual state [7,8]. Second, numerical methods may not account for the various influences affecting lasers in real environments, and the complexity of simulations and calculations increases with system complexity [9]. In contrast, data-driven machine learning methods can model actual optical systems and even predict their future states by inputting the corresponding input–output states [10], a feat that numerical methods like SSFM cannot achieve.
A state-of-the-art architecture, DeepONet, has been proposed for solving partial differential equations [11,12]. The DeepONet architecture efficiently learns and approximates complex nonlinear systems by mapping input functions to output functions [11,12]. Its core idea combines traditional neural networks with operator learning, thereby enabling the handling of highly complex dynamic systems. Both operator learning and Physics-Informed Neural Networks (PINNs) leverage neural networks to approximate a mapping [13]. According to the universal approximation theorem, a feedforward neural network with a single hidden layer, given sufficient width and nonlinear activation functions, can approximate a mapping to any desired accuracy [11]. The main difference between the two is that PINNs approximate the solution of the differential equation itself, while operator learning approximates the mapping from the input function of the differential equation to the solution function. In this context, the independent variable in the mapping is the input function of the differential equation, while the dependent variable is the output function.
The advantage of operator learning over traditional PINNs is that when the input function, initial conditions, or boundary conditions of the differential equation change, the neural network does not need retraining [14]. Regular neural networks approximate the solution of the differential equation itself, causing any slight change in initial conditions to result in a different solution. Conversely, the DeepONet neural network learns the mapping from the input function to the solution, allowing it to manage input functions not included in the training set and derive the corresponding solutions [14]. As a result, a trained DeepONet exhibits stronger generalization capabilities, theoretically achieving similar results with less data, and does not require retraining when switching to a new simulation object as long as the type of differential equation remains the same. This represents a significant advancement given the high costs associated with training neural networks [11,12].
Although studies have shown that various data-driven neural networks, such as Recurrent Neural Networks (RNNs) [15], Convolutional Neural Networks (CNNs) [16], and Long Short-Term Memory networks (LSTMs) [17], perform well in simulating optical systems, they also have drawbacks. These models necessitate specific architectural designs and training for particular conditions, treating the fiber system as a black box and learning the time and frequency domain signals of input–output relationships. Once the fiber configuration changes, the learned input–output relationships also change. Consequently, while these neural networks can handle different lengths and input signals, they cannot adapt to fiber systems with varying parameters [15,16,17].
In this study, we propose an advanced DeepONet model architecture integrated with an attention mechanism for full-field pulse transmission simulation. Research on RNNs, CNNs, and LSTMs suggests their potential; thus, they are incorporated into the Branch network of DeepONet to process the input function values at fixed sensing points. This enables the learning of relationships between input and output signals through multilayer neural networks. The Trunk network handles the locations encoding the output function, which includes the initial pulse parameters such as Full Width at Half Maximum (FWHM) of the spectrum, input peak power, fiber length and propagation distance, and linear and nonlinear model parameters like dispersion and nonlinear coefficients.
The integration of an attention mechanism [18] aims to enhance the model’s competency in managing complex input–output relationships. By dynamically assigning importance weights to various input features, the attention mechanism effectively captures crucial information within the input signals, thereby improving the model’s learning efficiency and prediction accuracy. In the Branch network, the attention mechanism highlights essential features within the input signals, enhancing the comprehension of multi-level input information. In the Trunk network, it ensures the precise handling of the relationship between output location encoding and the corresponding initial pulse and transmission parameters, bolstering the overall simulation accuracy.
The integrated attention mechanism within the DeepONet model not only excels in solving complex nonlinear fiber transmission problems but also enhances the overall accuracy and robustness of full-field pulse transmission simulation by emphasizing critical input features.
By training the DeepONet model architecture on the NLSE equation, we observed several advantages:
  • Extremely fast generation speed: Since the simulation process eliminates the need for iterative calculations, only a single computation is required regardless of the parameters and initial conditions. The pulse generation speed, recorded at approximately 0.0014 s (GPU: RTX 4090, PyTorch V2.3.1, CUDA V12.1), is significantly faster than the step-by-step calculations of SSFM and previous iterative neural network models for long-distance transmission.
  • Enhanced generalization ability and high accuracy: Compared to RNN, CNN, and LSTM models [15,16,17], DeepONet demonstrates superior generalization ability without a substantial loss in accuracy. The training results showcase its high stability and strong generalization capability.
  • Flexible input pulse processing: The Branch network can handle dynamic input pulse lengths without the need to truncate pulses to reduce complexity. Pulse length only affects the uniformity of input pulse accuracy without losing critical data.

2. Materials and Methods

2.1. Data Generation via SSFM

To verify and evaluate the performance of DeepONet in modeling the nonlinear dynamics of fiber propagation, we generate a series of high-quality simulation data. In this study, we utilize the Split-Step Fourier Method (SSFM) to generate these data [6]. SSFM is a widely used numerical method for solving the Nonlinear Schrödinger Equation (NLSE) by iteratively simulating the propagation of optical pulses in fibers through both the time and frequency domains [19].
The NLSE describes the evolution of the optical field within the cavity and can be written as follows:
$$\frac{\partial A}{\partial Z} + \frac{\alpha - g}{2}A + \frac{i\beta_2}{2}\frac{\partial^2 A}{\partial T^2} - \frac{\beta_3}{6}\frac{\partial^3 A}{\partial T^3} - \frac{g}{2\Omega_g^2}\frac{\partial^2 A}{\partial T^2} = i\gamma|A|^2A + \frac{i\gamma}{\omega_0}\frac{\partial\left(|A|^2A\right)}{\partial T} + i\gamma T_R\frac{\partial|A|^2}{\partial T}A$$
where A denotes the pulse envelope, which describes the complex envelope of the optical pulse propagating in the fiber. Here, Z is the propagation distance, α is the linear loss coefficient representing absorption or scattering loss, and g is the gain coefficient depicting the gain due to stimulated emission in a fiber amplifier. The term β₂ is the second-order dispersion coefficient associated with group velocity dispersion (GVD) effects on pulse propagation, while β₃ is the third-order dispersion coefficient affecting pulse asymmetry. The parameter Ω_g represents the gain bandwidth, equivalent to the width of the gain spectrum. The nonlinear coefficient γ accounts for nonlinear effects such as self-phase modulation (SPM) and cross-phase modulation (XPM). ω₀ is the carrier frequency, representing the central frequency of the optical pulse. The term T_R is the Raman response time, typically around 3 fs for silica fibers.
The left-hand side of the equation describes changes in the pulse envelope along the propagation distance, including linear loss and gain effects, second-order dispersion, third-order dispersion, and gain bandwidth effects. The right-hand side captures the effects of SPM, self-steepening (the term containing ω₀), and Raman scattering (the last term, containing T_R).
Consider a traveling wave solution of the form A(Z, T) = A(ζ), where ζ = T − vZ (with v being the group velocity). This transforms the partial differential equation (PDE) into an ordinary differential equation (ODE) with respect to ζ.
Using the variable transformation
$$\zeta = T - vZ,$$
the derivatives are transformed as follows:
$$\frac{\partial A}{\partial Z} = -v\frac{dA}{d\zeta}, \qquad \frac{\partial A}{\partial T} = \frac{dA}{d\zeta}, \qquad \frac{\partial^2 A}{\partial T^2} = \frac{d^2A}{d\zeta^2}, \qquad \frac{\partial^3 A}{\partial T^3} = \frac{d^3A}{d\zeta^3}$$
Substituting these expressions into the original equation, we obtain
$$-v\frac{dA}{d\zeta} + \frac{\alpha - g}{2}A + \frac{i\beta_2}{2}\frac{d^2A}{d\zeta^2} - \frac{\beta_3}{6}\frac{d^3A}{d\zeta^3} - \frac{g}{2\Omega_g^2}\frac{d^2A}{d\zeta^2} = i\gamma|A|^2A + \frac{i\gamma}{\omega_0}\frac{d\left(|A|^2A\right)}{d\zeta} + i\gamma T_R\frac{d|A|^2}{d\zeta}A$$
Thus, the equation is transformed from a PDE to an ODE with respect to the variable ζ. Our objective is to learn an operator G that maps an input function A_in to its output values A_out. After transforming the NLSE to its ODE form, the target operator can be simplified as follows:
$$A_{out} = G(A_{in}) = -v\frac{dA_{in}}{d\zeta} + \frac{\alpha - g}{2}A_{in} + \left(\frac{i\beta_2}{2} - \frac{g}{2\Omega_g^2}\right)\frac{d^2A_{in}}{d\zeta^2} - \frac{\beta_3}{6}\frac{d^3A_{in}}{d\zeta^3} - i\gamma|A_{in}|^2A_{in} - \frac{i\gamma}{\omega_0}\frac{d\left(|A_{in}|^2A_{in}\right)}{d\zeta} - i\gamma T_R\frac{d|A_{in}|^2}{d\zeta}A_{in}$$
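For concreteness, one symmetric split step of the SSFM scheme used here can be sketched in a few lines of NumPy. This is a minimal illustration covering only the GVD and SPM terms of the equation above; the gain, third-order dispersion, self-steepening, and Raman terms are omitted, and the function name and argument conventions are ours, not those of the code used for data generation.

```python
import numpy as np

def ssfm_step(A, dz, dt, beta2, gamma):
    """One symmetric split step for the basic NLSE (GVD + SPM only).

    A     : complex envelope sampled on a uniform time grid
    dz    : spatial step (m);  dt : time-grid spacing (s)
    beta2 : GVD coefficient (s^2/m);  gamma : nonlinearity (1/(W*m))
    """
    omega = 2.0 * np.pi * np.fft.fftfreq(A.size, d=dt)        # angular-frequency grid
    half_disp = np.exp(0.5j * beta2 * omega**2 * (dz / 2.0))  # dispersion, half step
    A = np.fft.ifft(half_disp * np.fft.fft(A))                # linear half step
    A = A * np.exp(1j * gamma * np.abs(A)**2 * dz)            # nonlinear full step
    return np.fft.ifft(half_disp * np.fft.fft(A))             # linear half step
```

Propagating over a fiber of length L requires L/dz such steps, which is why the SSFM cost discussed in Section 4.1 grows with fiber length while a single DeepONet forward pass does not.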

2.2. Model Architecture

Figure 1 illustrates the proposed deep learning model framework for simulating nonlinear fiber pulse propagation. The dataset is uniformly sampled at 1024 points for both input and output signals, with real and imaginary parts fed into the branch network. The input signal, determined by different fiber parameters and peak power, is normalized and processed in the trunk network.
BranchNet utilizes convolutional layers followed by residual blocks to extract hierarchical features from the input pulse, effectively capturing the complex dynamics of the signal. After feature extraction, an adaptive average pooling layer reduces the feature dimensions, and a fully connected layer maps these features into a compact representation. Specifically, BranchNet is designed with a consistent network width that is proportional to the input size, ensuring that the feature extraction process is both deep and wide enough to handle the complexity of the pulse dynamics.
TrunkNet incorporates a self-attention mechanism to process the normalized fiber parameters, enhancing the model’s ability to capture interactions between different parameters. TrunkNet’s width is aligned with that of BranchNet, with both networks maintaining a proportional relationship between their layers to ensure balanced learning and feature integration. The resulting features are further refined through fully connected layers, progressively capturing the intricate relationships between the input parameters.
Finally, the outputs from both BranchNet and TrunkNet are multiplied and passed through a multilayer fully connected network, which generates the final output pulse. This final network layer is also designed with a width proportional to the preceding layers, ensuring that the combined features are adequately processed. This architecture leverages the strengths of both convolutional feature extraction and attention mechanisms, along with carefully calibrated network widths, to achieve high accuracy in simulating fiber pulse dynamics, offering a significant improvement over traditional methods.
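The description above can be condensed into the following PyTorch sketch. Layer counts, kernel sizes, the pooling of attention tokens, and the dropout rate are illustrative assumptions consistent with Sections 2.2 and 3; this is not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(ch, ch, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(ch, ch, kernel_size=3, padding=1))
    def forward(self, x):
        return torch.relu(x + self.body(x))            # skip connection

class DeepONet(nn.Module):
    def __init__(self, n_params=5, width=512, n_points=1024):
        super().__init__()
        # BranchNet: conv + residual blocks over the (real, imag) pulse channels
        self.branch = nn.Sequential(
            nn.Conv1d(2, width, kernel_size=7, padding=3), nn.ReLU(),
            ResidualBlock(width), ResidualBlock(width), ResidualBlock(width),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(width, width))
        # TrunkNet: self-attention over the normalized fiber parameters
        self.embed = nn.Linear(1, width)                # one token per parameter
        self.attn = nn.MultiheadAttention(width, num_heads=8, batch_first=True)
        self.trunk = nn.Sequential(
            nn.Linear(width, width), nn.ReLU(), nn.Linear(width, width))
        # Fusion: element-wise product followed by a fully connected head
        self.head = nn.Sequential(
            nn.Linear(width, 2 * width), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(2 * width, 2 * n_points))

    def forward(self, pulse, params):
        # pulse: (B, 2, n_points); params: (B, n_params)
        b = self.branch(pulse)                          # (B, width)
        tok = self.embed(params.unsqueeze(-1))          # (B, n_params, width)
        t, _ = self.attn(tok, tok, tok)                 # parameter interactions
        t = self.trunk(t.mean(dim=1))                   # (B, width)
        out = self.head(b * t)                          # element-wise fusion
        return out.view(-1, 2, pulse.shape[-1])         # real/imag output pulse
```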
To ensure model accuracy, we employed a mean squared error (MSE) loss function, defined as
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2,$$
where N is the number of samples, y_i represents the actual values, and ŷ_i represents the predicted values. The model was trained using the RMSprop optimizer, with a cosine annealing scheduler to adjust the learning rate dynamically. An early stopping mechanism was also implemented to prevent overfitting. The dataset was split into training and validation sets in a 9:1 ratio, and model generalization was verified on an unseen dataset.
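A hedged sketch of this training loop, using the DeepONet class sketched above; the learning rate, epoch budget, patience value, and data loaders (train_loader and val_loader from the 9:1 split) are placeholders rather than reported values.

```python
import torch

model = DeepONet()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=500)

best_val, patience, bad_epochs = float("inf"), 30, 0
for epoch in range(500):
    model.train()
    for pulse, params, target in train_loader:        # 90% of the dataset
        optimizer.zero_grad()
        loss = criterion(model(pulse, params), target)
        loss.backward()
        optimizer.step()
    scheduler.step()                                  # cosine annealing

    model.eval()
    with torch.no_grad():                             # 10% validation split
        val = sum(criterion(model(p, q), t).item() for p, q, t in val_loader)
    if val < best_val:
        best_val, bad_epochs = val, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                    # early stopping
            break
```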
This model’s unique architecture, particularly its use of dynamically adjustable network components and efficient training strategies, demonstrates significant improvements in both accuracy and computational efficiency over conventional methods.
The structure of the input–output data for the model is outlined as follows:
Trunk Network
The trunk network encodes the location of the output function. The input to the trunk network is the position T, corresponding to the output location and its associated parameters, effectively representing the high-dimensional location information of the equation. The output of the trunk network is a set of feature representations related to the input location. In our tests, feeding the trunk network only the parameters that actually vary, rather than the full parameter set, effectively improves accuracy and training speed.
  • Input: T = {T₁, T₂, ..., T_m}
  • Output: Trunk(T) = {Trunk(T₁), Trunk(T₂), ..., Trunk(T_m)}
Branch Network
The branch network processes the values of the input function at fixed sensor points. The input to the branch network is the function value at the sensor points, A(T_i). In this specific task, the input corresponds to the state of the optical pulse at a specific time or location. The output of the branch network is a set of feature representations related to the input function values.
  • Input: A(T) = {A(T₁), A(T₂), ..., A(T_n)}
  • Output: Branch(A(T)) = {Branch(A(T₁)), Branch(A(T₂)), ..., Branch(A(T_n))}
Output Combination
The output feature representations from the branch network b_i and trunk network t_i are combined through a dot product or other nonlinear methods to obtain the final operator value G(u)(y). Given the complex nonlinear relationship between fiber parameters and outputs, we found that using a feedforward neural network (FNN) yields the best results:
$$G(A)(T) = \mathrm{FNN}\big(\mathrm{Trunk}(T) \odot \mathrm{Branch}(A(T))\big)$$
Finally, the Split-Step Fourier Method (SSFM) algorithm generates the dataset, structured as follows, with 1024 sampled points by default:
$$[u, y, G(u)(y)] = \begin{bmatrix} u^{(i)}(x_1), u^{(i)}(x_2), \ldots, u^{(i)}(x_m) & y_1 & G(u^{(i)})(y_1) \\ u^{(i)}(x_1), u^{(i)}(x_2), \ldots, u^{(i)}(x_m) & y_2 & G(u^{(i)})(y_2) \\ \vdots & \vdots & \vdots \\ u^{(i)}(x_1), u^{(i)}(x_2), \ldots, u^{(i)}(x_m) & y_P & G(u^{(i)})(y_P) \end{bmatrix}$$
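In code form, each simulated pulse therefore contributes P training rows, all sharing the same sensor samples of the input function; a minimal sketch of this assembly (array names are illustrative):

```python
import numpy as np

def make_rows(u_sensors, ys, g_values):
    """Pair one input function, sampled at m fixed sensor points, with each of
    its P query locations y and the operator value G(u)(y) computed by SSFM."""
    assert len(ys) == len(g_values)
    return [(np.asarray(u_sensors), y, g) for y, g in zip(ys, g_values)]
```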

3. Results

3.1. Prediction Results of High-Order Soliton Compression

To train and test the capability of the neural network model in predicting nonlinear dynamics, we first simulated high-order soliton (HOS) compression with three randomly varied parameters. In these simulations, the fixed numerical settings were a step size of 0.13 cm for the Nonlinear Schrödinger Equation (NLSE) and a time window of 10 ps. The variable parameters were as follows: pulse width (Full Width at Half Maximum, FWHM) ranging from 0.5 to 1.4 ps, input peak power ranging from 18 to 34 watts, and fiber length ranging from 0 to 20 m with a step size of 0.2 cm. The fixed fiber parameters were as follows: dispersion coefficients β₂ = −5.23 ps²/km and β₃ = 4.27 × 10⁻² ps³/km, and nonlinear parameter γ = 18.4 × 10⁻³ W⁻¹m⁻¹. The Branch Network processes the initial pulse input, which comprises 2 channels (representing real and imaginary parts) with 1024 points each. It employs a convolutional layer that expands the 2 input channels to 256 channels, followed by two residual blocks, each maintaining 256 channels. The output undergoes global average pooling, and a final fully connected layer produces a 256-unit representation of the input pulse features.
The Trunk Network is designed to process the single fiber parameter. It consists of a simple feedforward network with two fully connected layers, each containing 256 units and using ReLU activations. This network effectively captures the influence of the single parameter on pulse propagation.
The outputs from the Branch and Trunk Networks are combined through element-wise multiplication, resulting in a 256-dimensional vector. This combined representation is then passed through a multilayer perceptron with one hidden layer of 512 units, followed by an output layer that produces 2048 units. These output units are reshaped to represent the real and imaginary parts of the propagated pulse (2 × 1024).
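A quick forward pass through the sketch class from Section 2.2, instantiated at this 256-unit width with a single free parameter, confirms the tensor shapes described above (the generic sketch keeps the attention trunk, whereas the network used here is a plain two-layer MLP, so treat this as a shape check only):

```python
import torch

model = DeepONet(n_params=1, width=256)   # HOS configuration sketch
pulse = torch.randn(8, 2, 1024)           # batch of 8 pulses: real/imag x 1024 points
length = torch.rand(8, 1) * 20.0          # the single free parameter (fiber length, m)
print(model(pulse, length).shape)         # torch.Size([8, 2, 1024])
```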
For our dataset, we generated 2000 samples by varying the chosen fiber parameter within its physically relevant range while keeping other potential variables constant. This dataset was split into 1800 training sets and 200 test sets, providing a robust basis for model training and evaluation.
Figure 2a illustrates the temporal intensity evolution, and Figure 2b depicts the spectral intensity evolution of HOS propagation dynamics for a pulse width of 1.0 ps and an input peak power of 30 watts. As shown in Figure 3, the pulse propagation predicted by the neural network closely matches the pulse propagation simulated by the NLSE in both temporal and spectral intensity evolution. The narrowest pulse in the temporal intensity evolution corresponds to the broadest spectral width, indicating that the predicted maximum compression distance is accurate. Figure 2c provides a clearer depiction of the time-domain evolution as a function of propagation distance under these conditions.
To facilitate a clearer visual comparison, we randomly selected pulses with pulse widths (FWHM) between 0.5 and 1.4 ps and input peak powers ranging from 18 to 34 watts for plotting. As illustrated in Figure 3, the corresponding parameters are detailed in Table 1. The substantial overlap of the full-time intensity pulses indicates the model’s robust generalization ability and accuracy across various parameter variations.

3.2. Results of Simulating Three Types of Fibers Using a Single DeepONet Model

As previously mentioned, the DeepONet architecture exhibits superior generalization capabilities compared to other models. To demonstrate this, we employed a single model to simultaneously simulate three distinct types of fibers: Normal Dispersion Fiber (NDF), High Nonlinearity Fiber (HNLF), and Standard Single-Mode Fiber (SMF). The trained model encompasses the parameter ranges of these three different fibers, as shown in Table 2.
In the DeepONet model, both the Trunk and Branch Networks utilize a primary hidden size of 512 units. The Branch Network processes the initial pulse input, which has a size of 1024 points and 2 channels (representing the real and imaginary parts of the pulse). It starts with an initial convolutional layer that expands the 2 input channels to 512 channels using a kernel size of 7. This is followed by three residual blocks, each maintaining 512 channels, which help capture complex features from the input pulse. The output of these residual blocks undergoes global average pooling, followed by a final fully connected layer, which preserves the 512-unit representation. Similarly, the Trunk Network processes the fiber parameters, applying a self-attention mechanism and fully connected layers, also maintaining the 512-unit hidden size throughout the network.
The Trunk Network handles the 5-dimensional fiber parameter input using a self-attention mechanism with a hidden size of 512 units. This is followed by three fully connected layers, each maintaining 512 units, allowing the network to capture intricate relationships within the fiber parameters.
After processing through the Branch and Trunk Networks, the outputs are combined through element-wise multiplication. The resulting 512-dimensional vector is then passed through a multilayer perceptron. This perceptron consists of two hidden layers with 1024 units each, employing ReLU activations and dropout for regularization. The final output layer expands the representation to 2048 units, which is then reshaped to produce the 2 × 1024 output that represents the real and imaginary parts of the propagated pulse.
For dataset generation, we randomly selected various parameters within the entire possible range, including β₂, β₃, β₄, and γ, while maintaining the fiber length between 40 mm and 120 mm. The soliton order, being a derived quantity, varied as a result of the randomly selected parameters. This comprehensive approach ensured that our dataset covered a wide spectrum of possible fiber configurations and pulse propagation scenarios.
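For reference, the soliton order N listed in Table 2 follows from the standard relation (see Agrawal [19]); the FWHM-to-T₀ conversion assumes a sech-shaped pulse:
$$N = \sqrt{\frac{\gamma P_0 T_0^2}{|\beta_2|}}, \qquad T_0 \approx \frac{T_{\mathrm{FWHM}}}{1.763}$$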
We generated a total of 5000 datasets, divided into 4500 training sets and 500 test sets. The model was trained using the AdamW optimizer with an initial learning rate of 1 × 10 3 and weight decay of 1 × 10 5 . We implemented a learning rate scheduler to adjust the learning rate based on the validation loss. The training process included early stopping with a patience of 30 epochs to prevent overfitting.
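These settings map directly onto standard PyTorch calls; the scheduler shown (ReduceLROnPlateau) is one common choice for adjusting the rate based on validation loss and is assumed here, as the paper does not name the exact scheduler.

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)
# Reduce the learning rate when the validation loss plateaus (assumed scheduler).
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=10)
# ... inside the epoch loop, after computing val_loss:
# scheduler.step(val_loss)
```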
After training, we evaluated the model’s performance by setting the parameter values for different fiber types to their median values and fixing the fiber length at 80 mm to regenerate the test set. The comparison between the predicted pulses and the Split-Step Fourier Method (SSFM)-calculated pulses is presented in Figure 4 and Figure 5, demonstrating the model’s ability to accurately simulate pulse propagation across diverse fiber types with varying characteristics.
This unified model approach showcases the DeepONet architecture’s potential in handling complex, multi-parameter physical systems, providing a versatile tool for simulating optical pulse propagation in various fiber types with high accuracy and computational efficiency.

4. Discussion

4.1. Comparison of Neural Network Prediction Time with SSFM

We recorded the generation times as shown in Figure 6, which compares the times required for the Split-Step Fourier Method (SSFM) and DeepONet across different fiber lengths. The step size for SSFM was set to 1 mm. The vertical axis represents generation time (in seconds, on a linear scale), while the horizontal axis represents fiber length. The figure demonstrates that the generation time for SSFM increases linearly with fiber length (each additional 1 mm of fiber adds one more iteration), whereas the generation time for DeepONet remains significantly lower and nearly constant at approximately 10⁻³ s across different fiber lengths. This indicates that DeepONet is far more efficient than SSFM as the fiber length increases.
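When reproducing such GPU timings, CUDA's asynchronous execution has to be taken into account; a minimal measurement pattern is sketched below (the warm-up count and run count are arbitrary choices, not reported values).

```python
import time
import torch

def mean_inference_time(model, pulse, params, n_runs=100):
    """Average one-batch inference time on GPU, synchronizing around the timer."""
    model.eval()
    with torch.no_grad():
        for _ in range(10):               # warm-up: kernel launch and cache effects
            model(pulse, params)
        torch.cuda.synchronize()          # drain queued work before starting the clock
        t0 = time.perf_counter()
        for _ in range(n_runs):
            model(pulse, params)
        torch.cuda.synchronize()          # wait for all kernels before stopping it
    return (time.perf_counter() - t0) / n_runs
```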

4.2. Accuracy of Neural Network Predictions

To quantify the superior generalization ability and computational accuracy of our model compared to others, we compiled the loss functions for different configurations during HOS generation, as shown in Figure 7. Here, F represents the number of free fiber parameters. In scenarios where only the fiber length varies ( F = 1 ), the root mean square error (RMSE) is 0.01764, the lowest recorded value, indicating high accuracy in pulse prediction throughout the evolution process. We also tested scenarios where pulse width, input peak power, and fiber length varied randomly. When the model width parameter n = 1 , the accuracy significantly dropped, converging at 0.156. However, increasing the model size to n = 2 significantly improved the model’s learning ability, achieving a validation loss of 0.0248 within only 320 epochs, demonstrating excellent scalability and performance under complex conditions. The ability to handle three fiber parameter variables simultaneously, with changes in both fiber parameters and propagation distance, represents a broader range of fiber output variability and increased learning difficulty, in which our model performs exceptionally well.
As shown in Figure 8, the relationship between RMSE and propagation distance reveals how errors behave during transmission. Training was limited to 12 m so that generalization beyond this length could be tested. The results indicate that the model maintained consistent accuracy at any distance included in the training range, attributed to its ability to incorporate distance into the learning scope. Unlike other models that accumulate prediction errors over long distances due to progressively accumulated nonlinear effects, our model's prediction error does not grow with increasing propagation distance. However, in the untrained region beyond the 12 m training range, the model's error fluctuates and increases due to the enhanced nonlinear effects that come with longer propagation. These enhanced nonlinear effects can lead to complex pulse shape changes that are difficult for the model to capture accurately, resulting in larger errors. Nevertheless, the error remains reasonably low compared to the CNN and RNN [16]. Unlike the RNN and CNN, our model takes different combinations of input parameters into account, eliminating the need for separate optimization for each parameter set, which significantly reduces training difficulty and broadens applicability.

4.3. Ablation Study

In this ablation study, we conducted an in-depth exploration of the roles of various model components and analyzed their inter-relationships and impact on overall performance, as shown in Figure 9. These experiments reveal the critical importance and interdependence of each component in handling the complex problem of nonlinear fiber pulse propagation.
Firstly, residual blocks and the self-attention mechanism play pivotal roles in the entire model. Removing the residual blocks led to a significant increase in MSE, reaching 1.45 times the baseline model. This indicates that residual blocks are crucial for extracting and processing multi-level features. Similar to “skip connections” in deep neural networks, residual blocks effectively mitigate the vanishing gradient problem and enhance the model’s ability to represent complex signals. The self-attention mechanism further amplifies this effect by capturing the interdependencies among input features, significantly improving the model’s predictive accuracy. When the self-attention mechanism was removed, the MSE soared to 1.81 times the baseline, marking the most substantial impact among all configurations. This underscores the indispensability of the self-attention mechanism in handling the highly intricate task of nonlinear fiber transmission.
The FLOPs (floating-point operation counts) and model parameter count provide additional insight into the computational efficiency and complexity of the model. Removing the residual blocks drastically reduced the FLOPs from 4.859 G to 1.629 G, highlighting the computational load associated with these blocks. This reduction in computational cost, however, comes at the expense of model accuracy, as evidenced by the increase in MSE. Similarly, removing the self-attention mechanism has virtually no impact on the FLOPs and only slightly decreases the parameter count, from 9.473 M to 9.204 M. This indicates that the self-attention mechanism, although computationally inexpensive, is crucial to the model's performance because of its ability to efficiently manage dependencies within the data.
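Parameter counts like those quoted above can be read directly from the model, whereas FLOP counts generally require a profiling tool; a one-liner for the former:

```python
# Count trainable parameters of the (assumed) model instance.
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_params / 1e6:.3f} M trainable parameters")   # e.g. ~9.5 M for the full model
```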
Secondly, the dropout layer and the learning rate scheduler also demonstrate important roles in enhancing the model’s generalization ability and stability, although their impact is not as pronounced as that of the residual blocks and the self-attention mechanism. The MSE increased by 18% when the dropout layer was removed, indicating that dropout contributes to preventing overfitting and maintaining generalization. Dropout achieves this by randomly dropping out certain neurons during training, preventing the model from over-relying on specific pathways, thereby improving its performance on unseen data. Notably, neither the FLOPs nor the model parameter count were affected by the removal of dropout, confirming that its contribution lies primarily in regularization rather than computational efficiency.
The learning rate scheduler plays a critical role in dynamically adjusting the learning rate during training, ensuring the model converges smoothly to the optimal solution. When this mechanism was removed, the MSE increased by 52%, confirming the importance of the learning rate scheduler in the optimization process. Particularly during the later stages of training, the scheduler helps the model continue optimizing in the presence of small loss gradients by gradually reducing the learning rate, thus avoiding local minima. Similar to dropout, removing the learning rate scheduler did not impact the FLOPs or model parameter count, indicating that its primary function is to guide the optimization process rather than influence the model’s computational complexity.
The synergy between these model components is particularly important. The combined efforts of the residual blocks and self-attention mechanism in feature extraction and relationship modeling, along with the contributions of the dropout layer and learning rate scheduler in ensuring the model’s generalization ability and optimization stability, create a robust architecture. This combination allows the model to maintain a high accuracy while effectively generalizing to the complex task of fiber pulse propagation.
Through these ablation experiments, we not only confirmed the necessity of each component in the model design but also highlighted how their interactions critically influence the overall performance. These findings provide valuable insights for future model optimization and design, further validating the effectiveness of the current model architecture in addressing complex nonlinear problems.

4.4. Comparative Analysis of Hidden Layer Width N

The width of the hidden layer is crucial to the performance of the model. In this study, we conducted a detailed comparative analysis of different hidden layer widths N, as shown in Figure 10. By adjusting the hidden layer width, we evaluated its impact on model performance, computational complexity (FLOPs), and the number of parameters. The experimental results indicate that as the hidden layer width increases, the model’s loss value exhibits a nonlinear pattern, suggesting that the model may face risks of overfitting or underfitting under certain configurations.
From the perspective of computational complexity, a larger hidden layer width significantly increases FLOPs and the number of parameters. While a wider hidden layer may enhance the model’s representational capacity, it also comes with higher computational costs. Notably, when the hidden layer width was expanded to 1.2 times that of the original model, the FLOPs increased to 6.98 G, and the number of parameters rose to 13.11 M. However, this expansion did not result in a significant reduction in the loss value and, in some cases, even led to a decrease in model performance.
Conversely, reducing the hidden layer width (e.g., to 0.2 times that of the original model) significantly reduced both computational complexity and the number of parameters, but also resulted in a higher loss value (0.030561). This suggests that balancing complexity and performance is crucial in model design to avoid performance degradation due to insufficient model capacity.
Overall, this study provides important insights into model optimization through a systematic analysis of the hidden layer width, particularly in scenarios where computational resources are limited. We recommend that in practical applications, the hidden layer width should be carefully selected based on specific task requirements to achieve the best balance between performance and computational cost.

5. Conclusions

In this study, we have demonstrated the effectiveness and efficiency of the DeepONet framework in modeling and predicting the nonlinear dynamics of fiber pulse propagation. Our results indicate that DeepONet can accurately predict pulse propagation dynamics, yielding outcomes that align closely with traditional numerical solutions of the Nonlinear Schrödinger Equation (NLSE) obtained via SSFM.
Accelerated by CUDA, DeepONet generates output pulses of any length and transmission distance in an average time of approximately 0.0014 s, with prediction errors remaining consistent regardless of propagation distance. This efficiency makes it highly suitable for quickly and accurately modeling long-distance fiber pulse propagation. Furthermore, by scaling up the model’s size, its robustness is enhanced, suggesting significant potential for future applications in commercial production.
Importantly, the generalization capability of DeepONet lies in its ability to simultaneously simulate the propagation dynamics of multiple fibers, a feature that other models struggle to achieve. We anticipate that DeepONet could become a standard method for laser dynamic simulation and modeling, delivering notable advancements in both efficiency and accuracy for practical industry applications. The adaptability and computational speed of DeepONet make it a valuable tool for advancing fiber optics research and development, offering a reliable and efficient alternative to traditional simulation methods.

Author Contributions

Conceptualization, investigation, and writing, Y.Z.; review and editing, S.K. and N.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the findings of this study are publicly available on GitHub: https://github.com/usagimamahaha/fiber2.git (accessed on 12 August 2024). The dataset was obtained from fiber channel numerical simulations based on the Nonlinear Schrödinger Equation (NLSE), using the Python package PyNLO v0.1.2.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ma, Q.; Yu, H. Artificial intelligence-enabled mode-locked fiber laser: A review. Nanomanuf. Metrol. 2023, 6, 36. [Google Scholar] [CrossRef]
  2. Zhang, X.; Wang, D.; Song, Y.; Jiang, X.; Li, J.; Zhang, M. Neural Operator-based Fiber Channel Modeling for WDM Optical Transmission System. In Proceedings of the 2023 Opto-Electronics and Communications Conference (OECC), Shanghai, China, 2–6 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–4. [Google Scholar]
  3. Baumeister, T.; Brunton, S.L.; Kutz, J.N. Deep learning and model predictive control for self-tuning mode-locked lasers. J. Opt. Soc. Am. B 2018, 35, 617–626. [Google Scholar] [CrossRef]
  4. Michaeli, L.; Bahabad, A. Genetic algorithm driven spectral shaping of supercontinuum radiation in a photonic crystal fiber. J. Opt. 2018, 20, 055501. [Google Scholar] [CrossRef]
  5. Herrera, R.A. Evaluating a neural network and a convolutional neural network for predicting soliton properties in a quantum noise environment. J. Opt. Soc. Am. B 2020, 37, 3094–3098. [Google Scholar] [CrossRef]
  6. Hult, J. A Fourth-Order Runge–Kutta in the Interaction Picture Method for Simulating Supercontinuum Generation in Optical Fibers. J. Lightwave Technol. 2007, 25, 3770–3775. [Google Scholar] [CrossRef]
  7. Teğin, U.; Rahmani, B.; Kakkava, E.; Borhani, N.; Moser, C.; Psaltis, D. Controlling spatiotemporal nonlinearities in multimode fibers with deep neural networks. APL Photonics 2020, 5, 030804. [Google Scholar] [CrossRef]
  8. Woodward, R.; Kelleher, E.J. Towards ‘smart lasers’: Self-optimisation of an ultrafast pulse source using a genetic algorithm. Sci. Rep. 2016, 6, 37616. [Google Scholar] [CrossRef] [PubMed]
  9. Häger, C.; Pfister, H.D. Physics-based deep learning for fiber-optic communication systems. IEEE J. Sel. Areas Commun. 2020, 39, 280–294. [Google Scholar] [CrossRef]
  10. Wang, D.; Song, Y.; Zhang, M. Data-driven Modeling Technique for Optical Communications Based on Deep Learning. In Proceedings of the Asia Communications and Photonics Conference, Beijing, China, 24–27 October 2020; Optica Publishing Group: Washington, DC, USA, 2020; p. M3B.3. [Google Scholar]
  11. Lu, L.; Jin, P.; Pang, G.; Zhang, Z.; Karniadakis, G.E. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat. Mach. Intell. 2021, 3, 218–229. [Google Scholar] [CrossRef]
  12. Lu, L.; Jin, P.; Karniadakis, G.E. Deeponet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv 2019, arXiv:1910.03193. [Google Scholar]
  13. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [Google Scholar] [CrossRef]
  14. Koric, S.; Abueidda, D.W. Data-driven and physics-informed deep learning operators for solution of heat conduction equation with parametric heat source. Int. J. Heat Mass Transf. 2023, 203, 123809. [Google Scholar] [CrossRef]
  15. Salmela, L.; Tsipinakis, N.; Foi, A.; Billet, C.; Dudley, J.M.; Genty, G. Predicting ultrafast nonlinear dynamics in fibre optics with a recurrent neural network. Nat. Mach. Intell. 2021, 3, 344–354. [Google Scholar] [CrossRef]
  16. Yang, H.; Zhao, H.; Niu, Z.; Pu, G.; Xiao, S.; Hu, W.; Yi, L. Low-complexity full-field ultrafast nonlinear dynamics prediction by a convolutional feature separation modeling method. Opt. Express 2022, 30, 43691–43705. [Google Scholar] [CrossRef] [PubMed]
  17. Pu, G.; Liu, R.; Yang, H.; Xu, Y.; Hu, W.; Hu, M.; Yi, L. Fast predicting the complex nonlinear dynamics of mode-locked fiber laser by a recurrent neural network with prior information feeding. Laser Photonics Rev. 2023, 17, 2200363. [Google Scholar] [CrossRef]
  18. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  19. Agrawal, G.P. Nonlinear fiber optics. In Nonlinear Science at the Dawn of the 21st Century; Springer: Berlin/Heidelberg, Germany, 2000; pp. 195–211. [Google Scholar]
Figure 1. DeepONet model framework for simulating nonlinear dynamics of fiber pulse propagation.
Figure 2. Comparison of NLSE simulation and neural network prediction for high-order soliton (HOS) propagation dynamics with a pulse width of 1.0 ps and an input peak power of 30 watts. (a) Temporal intensity evolution as a function of time for NLSE and the neural network. (b) Spectral intensity evolution as a function of wavelength for NLSE and the neural network. The close match between NLSE and DeepONet results demonstrates the high accuracy of the neural network in predicting the temporal and spectral characteristics of pulse propagation. (c) Time-domain evolution of high-order soliton compression as a function of propagation distance.
Figure 3. Comparison of the time intensity distribution predictions for different pulse widths (FWHM), input peak powers, and fiber lengths between SSFM results and model predictions. This figure showcases the accuracy and generalization capability of the model under various parameter settings. The Raman effect and self-steepening effect are both taken into consideration.
Figure 4. Comparison of predicted pulses by DeepONet and SSFM calculations for different fiber types (NDF, HNLF, SMF) with median parameter values and a fiber length of 80 mm. (1,4,7) Normal Dispersion Fiber: β₂ = 15 ps²/km, β₃ = 0.55 ps³/km, β₄ = 0.005 ps⁴/km, γ = 0.55 (1/W·km). (2,5,8) High Nonlinearity Fiber: β₂ = −1.5 ps²/km, β₃ = 0.055 ps³/km, β₄ = 0 ps⁴/km, γ = 15 (1/W·km). (3,6,9) Standard Single-Mode Fiber: β₂ = −23 ps²/km, β₃ = 0.05 ps³/km, β₄ = 0 ps⁴/km, γ = 1.5 (1/W·km).
Figure 5. A 160 mm length time-domain heatmap for different fiber types. (1,4) Normal Dispersion Fiber: β₂ = 15 ps²/km, β₃ = 0.55 ps³/km, β₄ = 0.005 ps⁴/km, γ = 0.55 (1/W·km). (2,5) High Nonlinearity Fiber: β₂ = −1.5 ps²/km, β₃ = 0.055 ps³/km, β₄ = 0 ps⁴/km, γ = 15 (1/W·km). (3,6) Standard Single-Mode Fiber: β₂ = −23 ps²/km, β₃ = 0.05 ps³/km, β₄ = 0 ps⁴/km, γ = 1.5 (1/W·km).
Figure 6. Comparison of generation times: SSFM vs. DeepONet.
Figure 7. Training and validation loss rates over epochs for different model configurations of DeepONet. The figure shows the performance of models with different widths (n = 1 and n = 2) and training conditions (single variable vs. multiple variables). The loss rates demonstrate an improvement in model accuracy and generalization as the model width increases, with a significant reduction in validation loss observed for the n = 2 configuration, calculated in RMSE.
Figure 8. The variation in RMSE with distance within and beyond the training scope. It shows that within the training scope (indicated by the vertical black line at 12 m), RMSE remains very low and almost constant.
Figure 9. This chart compares the minimum training MSE, FLOPs (floating-point operation counts), and model parameter count (in millions) across different configurations in the ablation study. The left y-axis corresponds to the MSE, while the right y-axis corresponds to the FLOPs and parameter count. The blue bars represent the MSE relative to the original model, with the red dashed line indicating the baseline (1.00×). The orange segments highlight the increase in error above the baseline. The green and purple lines represent the FLOPs and parameter count for each configuration, respectively. The model is based on the one used in Section 3.2 “Results of Simulating Three Types of Fibers Using a Single Model”, with only the associated components adjusted and all other parameters kept unchanged. FLOPs (G): computational complexity in giga floating-point operations. Params (M): number of model parameters in millions.
Figure 10. This chart compares the training loss, computational complexity (FLOPs, floating-point operation counts), and model parameter count (in millions) across different hidden layer width configurations. The left y-axis corresponds to the relative loss values (based on the original model), while the right y-axis corresponds to the FLOPs and model parameter count. The model is based on the one used in Section 3.2 “Results of Simulating Three Types of Fibers Using a Single Model”, with only the hidden layer width adjusted and all other parameters kept unchanged. FLOPs (G): computational complexity in giga floating-point operations. Params (M): number of model parameters in millions.
Table 1. Pulse width, input peak power, and fiber length.

No.  Pulse Width (ps)  Input Pulse Energy (pJ)  Fiber Length (cm)
1    0.826419          2.622621                 380
2    0.662326          2.391764                 470
3    0.649601          2.647509                 740
4    1.069052          2.913348                 1103
5    1.174863          3.216344                 1204
6    0.689198          2.324762                 1606
Table 2. Parameter ranges and soliton order for different fiber types.

Parameter       Normal Passive Fiber   High Nonlinearity Fiber   Standard Single-Mode Fiber
β₂ (ps²/km)     10 to 20               −1 to −2                  −20 to −30
β₃ (ps³/km)     0.1 to 1               0.01 to 0.1               0 to 0.1
β₄ (ps⁴/km)     0.001 to 0.01          0                         0
γ (1/(W·km))    0.1 to 1               10 to 20                  1 to 2
Soliton Order   0.99 to 1.98           31.46 to 62.93            2.57 to 3.95
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
