Article

Deep Mutual Learning-Based Mode Recognition of Orbital Angular Momentum

1 School of Electronic Engineering, Xidian University, Xi’an 710071, China
2 School of Computer Science, Xi’an Shiyou University, Xi’an 710065, China
3 School of Physics, Xidian University, Xi’an 710071, China
* Author to whom correspondence should be addressed.
Photonics 2023, 10(12), 1357; https://doi.org/10.3390/photonics10121357
Submission received: 3 November 2023 / Revised: 5 December 2023 / Accepted: 7 December 2023 / Published: 8 December 2023

Abstract

Due to its orbital angular momentum (OAM), the optical vortex has been widely used in communications and LIDAR target detection. OAM mode recognition based on deep learning mostly relies on basic convolutional neural networks. To ensure high-precision OAM state detection, a deeper network structure is required to overcome the similar light intensity distributions of different superimposed vortex beams and the effect of atmospheric turbulence disturbance. However, the large number of parameters and the heavy computation of such a detection network conflict with the requirements of deploying optical communication system equipment. In this paper, an online knowledge distillation scheme is adopted to achieve end-to-end, single-stage training, and the inter-class dark knowledge of similar modes is fully utilized. An optical vortex OAM state detection technique based on deep mutual learning (DML) is proposed. The simulation results show that after mutual learning training, a small detection network with higher accuracy can be obtained, which is more suitable for terminal deployment. The scalability of the number of networks in the DML queue offers a new possibility for further improving the detection accuracy of optical communication systems.

1. Introduction

Vortex light carries orbital angular momentum (OAM) due to its spiral phase distribution. Theoretically, OAM has an infinite number of eigenstates, which can greatly improve communication capacity and security [1,2]. However, vortex light is affected by atmospheric turbulence distortion during transmission, and the OAM state disperses, which seriously degrades communication performance [3]. Traditional OAM state detection mainly relies on complex optical elements, in which the state is identified after preprocessing such as diffraction or interference [4]. However, expensive devices and complex optical systems with limited detection capabilities conflict with the requirements of high-speed, real-time communication.
With the application of deep learning in image processing, speech recognition, and natural language processing, optical vortex mode detection based on deep learning has also been studied extensively. Knutson et al. proposed detecting the OAM state with a Deep Neural Network (DNN) [5], using a 16-layer network to detect 100 types of single-mode vortex beams with an accuracy of more than 70%. In 2017, Doster et al. [6] studied OAM state detection based on the Convolutional Neural Network (CNN), using the AlexNet network to detect the OAM state from the high-resolution light intensity distribution of a Bessel–Gaussian beam at the focus. The scheme not only simplifies the structure of the receiver but is also robust to turbulence intensity, data volume, sensor noise, and pixel resolution. Zhang et al. used a CNN as the demodulator of the multiplexed vortex beam and compared the performance of the K-nearest neighbor classifier, Bayesian classifier, back-propagation (BP) artificial neural network, and CNN. Under different turbulence conditions, the detection performance of the CNN is higher than that of the other demodulators [7]. They improved the original LeNet-5 network and proposed a decoder scheme that can realize OAM state detection and turbulence intensity detection simultaneously [8]. To address the similar light intensity distributions of superimposed optical vortex modes, an optical vortex mode detection method based on an attention pyramid convolutional neural network (AP-CNN) was proposed [9].
Deep learning requires large datasets and long training times. Although model accuracy is very high, the number of parameters and the amount of computation are also large. Therefore, model compression is gradually gaining popularity in both academia and industry, especially for low-resource devices such as those in the mobile internet and the Internet of Things. How to obtain an efficient deep learning model that meets the real-time and low-power requirements of low-resource devices has attracted the interest of many scholars.
In order to obtain an efficient deep learning model, research generally proceeds in two directions. One is the construction of efficient network modules, including manually designed lightweight models (such as MobileNet [10] and ShuffleNet [11]) and automated network design based on neural architecture search (NAS) [12]. The other is model compression and acceleration techniques, including pruning, quantization, convolutional kernel compression, and knowledge distillation. In 2006, Bucilua et al. first proposed transferring the knowledge learned by large-scale models to small-scale models [13]. In 2015, Hinton et al. formally put forward the idea of knowledge distillation [14]. The FitNets algorithm proposed by Romero et al. introduced intermediate representations for the first time, directly matching the feature activations of the teacher and student models so that the student model can imitate the teacher's global intermediate feature extraction [15]. Li et al. extracted more discriminative features from teacher models through supervised learning so that student models pay more attention to these features, thus improving their performance [16]. Yim et al. proposed using the FSP matrix of the teacher model to guide the training of the student model, focusing only on the relational knowledge between different network layers for each sample [17]. Focusing on the relational knowledge between data samples, knowledge distillation based on sample angle and distance relationships was proposed in [18]. In 2020, Bajestani et al. used the relational knowledge of related tasks to imitate human vision and transferred the temporal dependence of the teacher model to the student model for target detection tasks [19].
By comparing objective functions and learning the structural knowledge of the teacher model, the relationship between the correlation of structural feature knowledge and higher-order outputs was obtained in [20]. Xu et al. used adversarial learning to optimize the global prediction when the teacher and student models share the same structure [21].
Deep Mutual Learning (DML) [22] is a form of online distillation in which training is carried out using response-based knowledge. Through peer-to-peer simultaneous learning between networks, the final performance of each network is better not only than that achieved by learning individually, but also than that achieved under the one-way guidance of a pre-trained large-scale teacher model in classical knowledge distillation. Since the DML algorithm only uses the logits distribution of each network during training and only attends to recognition accuracy, it is applicable to networks of any size and structure. Even heterogeneous queues composed of networks of different sizes can learn through mutual distillation.
The free-space optical vortex communication system requires terminal deployment and high-speed operation. To ensure the accuracy of OAM state detection, the detection network has high complexity, so there is a serious contradiction between the practical requirements and the algorithm. At the same time, the light intensity distributions of some different superposed vortex beams are very similar, and inter-class dark knowledge is of great significance for improving the OAM state detection accuracy. Therefore, an OAM state detection technique combined with knowledge distillation is of high research value.
When the classical knowledge distillation algorithm is combined with OAM state detection, a large-scale, high-precision OAM state detection network must be trained in advance as the teacher, which then transfers knowledge one way to guide the training of the student network. The scheme is divided into two stages, cannot be performed end-to-end, and is cumbersome in practice. An optical communication system must handle different transmission conditions, such as turbulence intensity and transmission distance, so an OAM state detection network that requires two-stage training is difficult to apply. Therefore, online distillation is selected in this paper, and an optical vortex OAM state detection technique based on DML is proposed. In addition, because the mutual learning queue can be expanded to include multiple networks, the more networks in the queue, the better the performance of each network after training, which also provides a new direction for further improving OAM state detection accuracy.
The remainder of this paper is organized as follows. In Section 2, the principle of DML and the framework of OAM state detection based on DML are presented. The experiments and result discussions are presented in Section 3. Section 4 is devoted to the conclusion.

2. DML-Based OAM Mode Recognition

2.1. Principle of Deep Mutual Learning

DML is a type of online distillation [14,22] that uses response-based knowledge for distillation training. DML consists of a group of untrained networks that are trained simultaneously. The final training effect of each network is not only better than that of individual learning based on traditional supervised learning but also better than the one-way guided training with an already trained large-scale teacher network in classical knowledge distillation.
The training set contains N samples drawn from M classes and can be represented as $X = \{x_i\}_{i=1}^{N}$, with the corresponding set of category labels $Y = \{y_i\}_{i=1}^{N}$. When a sample $x_i$ is input into the network $\Theta_1$, the output soft target (probability distribution) is
$$p_1^m(x_i) = \frac{\exp(z_1^m)}{\sum_{m=1}^{M} \exp(z_1^m)}$$
where $z_1^m$ denotes the logit of network $\Theta_1$ for class $m$ (the input to the softmax layer). The loss function of the DML algorithm consists of two parts: a supervised loss and an imitation loss. For the neural network $\Theta_1$ in the multi-category image classification task, the supervised loss is chosen as the cross-entropy between the predicted values and the true labels:
$$L_{C_1} = -\sum_{i=1}^{N} \sum_{m=1}^{M} I(y_i, m)\, \log\!\left(p_1^m(x_i)\right)$$
where $I(y_i, m)$ is defined as
$$I(y_i, m) = \begin{cases} 1, & y_i = m \\ 0, & y_i \neq m \end{cases}$$
The imitation loss uses the Kullback–Leibler (KL) divergence to quantify how well the network $\Theta_1$ matches the network $\Theta_2$, so that the output probability distributions of the two networks become as similar as possible. To improve the generalization performance of the neural network $\Theta_1$, the posterior probability of the neural network $\Theta_2$ is used to assist its training. The KL divergence $D_{KL}(p_2 \| p_1)$ is as follows:
$$D_{KL}(p_2 \| p_1) = \sum_{i=1}^{N} \sum_{m=1}^{M} p_2^m(x_i) \log \frac{p_2^m(x_i)}{p_1^m(x_i)}$$
In the KL divergence $D_{KL}(p_2 \| p_1)$, $p_1$ treats $p_2$ as the true probability distribution of the sample; in $D_{KL}(p_1 \| p_2)$, $p_2$ treats $p_1$ as the true distribution. Training makes the distributions $p_1$ and $p_2$ as similar as possible. The total loss function of the neural network $\Theta_1$ is
$$L_{\Theta_1} = L_{C_1} + D_{KL}(p_2 \| p_1)$$
The total loss function of the neural network $\Theta_2$ is as follows:
$$L_{\Theta_2} = L_{C_2} + D_{KL}(p_1 \| p_2)$$
The DML algorithm can be extended to a larger number of networks. When the number of networks is K, for each network the remaining K − 1 networks act as its teachers. For network $\Theta_k$, the total loss function is as follows:
$$L_{\Theta_k} = L_{C_k} + \frac{1}{K-1} \sum_{l=1,\, l \neq k}^{K} D_{KL}(p_l \| p_k)$$
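The combined objective above can be sketched in NumPy. This is a minimal illustration of the supervised loss plus the KL-based imitation loss; the function names and the toy logits are our own, not from the paper:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, labels):
    """Supervised loss L_C: mean cross-entropy against integer class labels."""
    n = len(labels)
    return -np.log(p[np.arange(n), labels]).mean()

def kl_divergence(p_teacher, p_student):
    """Imitation loss D_KL(p_teacher || p_student), averaged over samples."""
    return (p_teacher * np.log(p_teacher / p_student)).sum(axis=-1).mean()

def dml_loss(logits_k, peer_probs, labels):
    """Total DML loss for network k: supervised loss plus the mean KL
    divergence from the K-1 peer networks' probability distributions."""
    p_k = softmax(logits_k)
    imitation = np.mean([kl_divergence(p_l, p_k) for p_l in peer_probs])
    return cross_entropy(p_k, labels) + imitation
```

Since the KL divergence is non-negative, the total loss is never smaller than the supervised loss alone; the imitation term only pulls the network's distribution toward its peers'.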

2.2. OAM Mode Recognition Based on DML

In the OAM state detection task, AP-CNN [9] has good detection performance. The correct label bit in the output probability distribution (soft labels) is close to 1, and the other, erroneous label bits are very small. However, the probabilities corresponding to the different misclassification labels may vary greatly. In a trained mode recognition network, the generalization information is concentrated on those misclassification label positions that are close to zero. We select the simulation results of the Laguerre–Gaussian vortex beam as an illustration. For the optical vortex with OAM = {4, 4}, the probability of being wrongly classified as OAM = {2, 6} is $3.5 \times 10^{-2}$, and the probability of being wrongly classified as OAM = {6, 6} is $5 \times 10^{-3}$. Here, OAM = {4, 4} denotes the superimposed modes of optical vortices with topological charges 4 and 4, and the rest are defined analogously. After transmission at different turbulence intensities and distances, the misclassification probabilities change accordingly. These small misclassification probabilities indicate whether the light intensity distribution of OAM = {4, 4} is closer to that of OAM = {2, 6} or OAM = {6, 6}. Because the values are very small and their impact on the objective function is very small, this dark knowledge can easily be lost during training. Knowledge distillation can effectively retain this inter-class dark knowledge and ensure the OAM state detection accuracy of the network.
In classical knowledge distillation, because the teacher model has already completed training, the probabilities corresponding to the misclassification labels are very small, and the student model cannot effectively learn this knowledge from the probability distribution output by the teacher model. Classical knowledge distillation flattens the mapping of the softmax layer by introducing a temperature (T) parameter. While keeping the relative ordering unchanged, it reduces the probability corresponding to the correct label and increases the probabilities corresponding to the misclassified labels. The soft labels of the teacher model after raising the temperature are shown in Figure 1. After the temperature is raised, the label bit corresponding to the correct class OAM = {4, 4} is 0.6, the label bit corresponding to OAM = {6, 6} is 0.1, and the label bit corresponding to OAM = {2, 6} is 0.3. It can be seen that the light intensity distributions of OAM = {4, 4} and OAM = {2, 6} are relatively close, which easily causes confusion. Taking the soft labels as the training target provides greater information entropy for the student model and speeds up its learning.
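The flattening effect of the temperature parameter can be sketched with a temperature-scaled softmax. The logits below are hypothetical values chosen only to illustrate the behavior, not values from the paper:

```python
import numpy as np

def softmax_T(z, T=1.0):
    """Softmax with temperature T; larger T flattens the distribution
    while preserving the relative ordering of the classes."""
    e = np.exp((z - z.max()) / T)
    return e / e.sum()

# Hypothetical logits for the classes {4, 4}, {2, 6}, {6, 6}.
z = np.array([4.0, 2.6, 1.5])
p_hard = softmax_T(z, T=1.0)  # sharply peaked on the correct class
p_soft = softmax_T(z, T=4.0)  # flattened: misclassification bits grow
```

With T = 1 the correct-class probability dominates; raising T shrinks it and inflates the near-zero misclassification bits, exposing the inter-class dark knowledge to the student.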
In the DML algorithm, since none of the networks is trained in advance, they learn simultaneously. During training, the output probability distribution of each network remains relatively flat, so the other networks can effectively learn this inter-class dark knowledge through the KL-divergence-based imitation loss. The effect is the same as increasing the T parameter in classical knowledge distillation, so DML does not need to introduce a T parameter. The transfer of inter-class dark knowledge between OAM superposition states among the networks is shown in Figure 2.
The framework of OAM mode recognition based on DML is shown in Figure 3. As can be seen from Figure 3, two networks (AP-CNN and CNN) are selected for the DML queue. The selected CNN contains three convolutional network layers and two fully connected layers, as shown in Figure 4. Each convolutional network layer consists of a convolutional layer, a batch normalization layer, and a max-pooling layer, with rectified linear units (ReLU) as the activation between layers. Each layer uses dropout with the probability set to 0.5. The convolutional layer of the first convolutional network layer contains 16 convolutional kernels of size 5 × 5; the second contains 32 kernels of size 3 × 3; and the third contains 64 kernels of size 3 × 3. All three convolutional network layers use a 2 × 2 max-pooling layer with a stride of 2.
DML is an algorithm of mutual learning between networks: AP-CNN and CNN act as both teacher and student for each other. The ultimate goal of this paper is to obtain a small OAM state detection CNN with high detection accuracy. Therefore, we focus on the one-way process with AP-CNN as the teacher and CNN as the student. The loss function of the CNN consists of two parts: one is the difference between the OAM state detection probability distributions of the CNN and the AP-CNN, and the other is the difference between the OAM state detection result of the CNN and the true OAM state label. The reverse process unfolds analogously.
In the specific training process shown in Figure 4, the size of the light intensity distribution map of the OAM beam is set to 128 × 128 × 3. After the three convolutional network layers, the output feature map has a size of 16 × 16 × 64. The feature map is fed to the two subsequent fully connected layers; the first fully connected layer outputs 500 values, and the second outputs 8 values corresponding to the 8 OAM mode categories. The softmax function then activates the output and produces the final OAM mode recognition result. The initial learning rate is set to 0.01, the learning rate decreases by 10% every 10 epochs, and 50 epochs are trained. At the beginning of training, the parameters of each network in the queue are randomly initialized, and the output probability distributions are close to uniform. For the CNN, the supervised loss is large and the imitation loss is small, so training is mainly guided by the supervised loss, and the parameters are updated mainly with the true labels as the training target. The OAM mode recognition performance of the CNN and the AP-CNN improves continuously at this stage. However, because the parameters of the AP-CNN and the CNN are initialized differently and the representational knowledge learned during training may differ, the output probabilities are not necessarily the same, and the imitation loss between the AP-CNN and the CNN keeps increasing. At this point, the CNN learns knowledge from the logits distribution of the AP-CNN, combining the true labels and the AP-CNN's logits distribution as the training target.
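The layer-size arithmetic and the learning-rate schedule described above can be checked with a short sketch. It assumes 'same'-padded convolutions (so only the 2 × 2, stride-2 pooling shrinks the map, which is consistent with the stated 16 × 16 × 64 output) and that the 10% decay is applied per 10 epochs; both are our reading of the text, not explicit statements:

```python
# Spatial size through the CNN: each block applies a 'same'-padded
# convolution followed by 2x2 max pooling with stride 2.
def pool_out(size, pool=2, stride=2):
    return (size - pool) // stride + 1

size = 128                        # 128 x 128 x 3 input intensity map
for n_kernels in (16, 32, 64):    # the three convolutional blocks
    size = pool_out(size)         # 128 -> 64 -> 32 -> 16
flattened = size * size * 64      # fed to FC layers: 16384 -> 500 -> 8

# Learning-rate schedule: start at 0.01, decrease by 10% every 10 epochs.
def lr_at(epoch, lr0=0.01, decay=0.9, step=10):
    return lr0 * decay ** (epoch // step)
```

Over the 50 training epochs, the learning rate thus steps through five values, from 0.01 down to 0.01 × 0.9⁴.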

3. Numerical Results and Discussions

3.1. Simulation Data Set Construction

As shown in Figure 5, we select four pairs of OAM modes: {1, 2} and {1, 2, 5}; {1, 2, 3, 5} and {2, 3, 5}; {4, 4} and {2, 6}; {6, 6} and {9, 3}. In Figure 5, the light intensity distributions of the multi-mode OAM beams in each column are similar. The wavelength of the OAM communication system is 0.6328 μm. Comparison experiments are carried out under different turbulence conditions, i.e., different atmospheric refractive index structure constants ($C_n^2$) and transmission distances. $C_n^2$ is selected from $1.0 \times 10^{-14}$, $3.0 \times 10^{-14}$, $5.0 \times 10^{-14}$, $1.0 \times 10^{-13}$, $3.0 \times 10^{-13}$, and $5.0 \times 10^{-13}\ \mathrm{m}^{-2/3}$. Six transmission distances are chosen: 500 m, 1000 m, 1500 m, 2000 m, 2500 m, and 3000 m. When simulating the atmospheric turbulence channel, the power spectrum inversion method is used to divide the transmission distance into ten phase screens at fixed intervals. For each transmission condition, 2000 light intensity maps are generated for each OAM mode, and a total of 16,000 light intensity distribution maps of the OAM beams are included in the hybrid dataset. The dataset is divided into a training set and a test set at a ratio of 8:2 (12,800 images in the training set and 3200 images in the test set).
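The 8:2 split above can be sketched as a random index partition. The random seed and the use of NumPy's `default_rng` are our own choices for reproducibility; the paper does not specify how the split was drawn:

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seed is an arbitrary choice
n_modes, per_mode = 8, 2000
labels = np.repeat(np.arange(n_modes), per_mode)  # 16,000 samples total
perm = rng.permutation(labels.size)               # shuffle all indices
split = int(0.8 * labels.size)                    # 8:2 train/test ratio
train_idx, test_idx = perm[:split], perm[split:]  # 12,800 / 3,200 images
```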

3.2. Analysis of OAM Mode Recognition Results Based on DML

The CNN was trained directly under the different turbulence intensities mentioned above, and the CNN and AP-CNN were trained simultaneously by DML as mutual students and teachers. The variation in accuracy of the two networks over the training process in two turbulent environments ($C_n^2 = 3.0 \times 10^{-14}\ \mathrm{m}^{-2/3}$ and $C_n^2 = 3.0 \times 10^{-13}\ \mathrm{m}^{-2/3}$) is shown in Figure 6. The improvement in accuracy of the CNN in the DML queue compared with separate learning is shown in Table 1. DML-Ind in Table 1 refers to the difference in OAM mode recognition accuracy between DML and separate learning.
When $C_n^2 = 3.0 \times 10^{-14}\ \mathrm{m}^{-2/3}$, the OAM mode recognition accuracy reaches 95.3% when the CNN is trained alone, while the accuracy of the DML-based OAM mode recognition technique reaches 96.4%, an improvement of 1.1%. When $C_n^2 = 3.0 \times 10^{-13}\ \mathrm{m}^{-2/3}$, the recognition accuracy is only 90.4% when the CNN is trained alone, while the accuracy of DML-based OAM mode recognition improves to 92.9%, an improvement of 2.5%. The improvement is more pronounced than in the weak turbulence case. Compared with independent learning, the OAM state detection accuracy of the large-scale AP-CNN is also improved, by 0.2% and 0.7% in the two turbulent environments.
Network complexity comprises two parts, spatial complexity and temporal complexity, with the number of model parameters representing the spatial complexity and the amount of model computation representing the temporal complexity. As can be seen from Table 2, the small CNN trained by DML has only 8.22 M parameters and a computation volume of only 52.50 M, which is much smaller than that of the AP-CNN, so the network complexity is significantly reduced. Combined with the results in Table 1, the DML-trained CNN loses only 1.3% and 1.5% recognition accuracy compared with the AP-CNN under the two turbulent transmission conditions, respectively. The experimental results show that the proposed OAM mode recognition scheme effectively alleviates the contradiction between the complexity of the recognition network and mobile terminal deployment in optical communication while ensuring recognition accuracy.
To further improve the OAM mode recognition accuracy of the small CNN, the scalability of the number of networks in the DML queue can be fully exploited. The third network added is MobileNet, a lightweight network that is easy to train. The accuracy of the small CNN over the training process is compared in Figure 7, where DML_2 indicates that the queue contains two networks (CNN and AP-CNN) and DML_3 indicates that the queue contains three networks (CNN, AP-CNN, and MobileNet). DML-Ind represents the difference between DML_2 or DML_3 and the individually trained CNN. The results in Table 3 show that when MobileNet is added to the DML queue, the accuracy of the small CNN is further improved, by 0.5% when $C_n^2 = 3.0 \times 10^{-14}\ \mathrm{m}^{-2/3}$ and by 0.9% when $C_n^2 = 3.0 \times 10^{-13}\ \mathrm{m}^{-2/3}$. This reveals that as the number of networks in the DML queue increases, the small CNN can learn different knowledge from different networks and its accuracy is further improved, which provides a new idea for obtaining a high-precision OAM mode recognition network in the future.

4. Conclusions

This paper proposes an OAM state detection technique based on DML, which resolves the contradiction between the high complexity of the detection network and the terminal deployment requirements of optical communication systems while ensuring OAM detection accuracy. First, the principle of DML is introduced. Then, the importance of inter-class dark knowledge in the OAM state detection task is analyzed, and the framework of DML-based OAM state detection is presented, including the selection of the networks in the DML queue and the specific parameter settings and training process. Experiments compare the OAM state detection accuracy of the small CNN trained by DML and by independent learning, as well as the complexity of the large AP-CNN and the small CNN. The MobileNet network is added to the mutual learning queue, and the scalability of the number of networks in the queue is used to further improve the detection accuracy. The results show that the proposed DML-based OAM state detection can greatly reduce model complexity and is more suitable for terminal deployment while ensuring detection accuracy. This research is of great significance for wireless communication and LIDAR target detection.

Author Contributions

Conceptualization and methodology, T.Q.; software, T.Q. and Z.Z.; validation, T.Q. and Y.Z.; formal analysis, Y.Z. and J.W.; editing, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (62071359 and 62271381), in part by the Postdoctoral Science Foundation of Shaanxi Province, and by the Fundamental Research Funds for the Central Universities (ZYTS23138).

Data Availability Statement

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Djordjevic, I.B. Deep-space and near-Earth optical communications by coded orbital angular momentum (OAM) modulation. Opt. Express 2011, 19, 14277–14289. [Google Scholar] [CrossRef] [PubMed]
  2. Willner, A.E.; Ren, Y.; Xie, G.; Yan, Y.; Li, L.; Zhao, Z.; Wang, J.; Tur, M.; Molisch, A.; Ashrafi, S. Recent advances in high-capacity free-space optical and radio-frequency communications using orbital angular momentum multiplexing. Philos. Trans. R. Soc. A 2017, 375, 20150439. [Google Scholar] [CrossRef] [PubMed]
  3. Zhu, X.; Kahn, J.M. Free-space optical communication through atmospheric turbulence channels. IEEE Trans. Commun. 2002, 50, 1293–1300. [Google Scholar] [CrossRef]
  4. Guo, C.; Lu, L.; Wang, H. Characterizing topological charge of optical vortices by using an annular aperture. Opt. Lett. 2009, 34, 3686–3688. [Google Scholar] [CrossRef] [PubMed]
  5. Knutson, E.; Lohani, S.; Danaci, O.; Huver, S.; Glasser, R. Deep learning as a tool to distinguish between high orbital angular momentum optical modes. In Proceedings of the SPIE Optical Engineering + Applications. Optics and Photonics for Information Processing X, San Diego, CA, USA, 28 August–1 September 2016; p. 997013. [Google Scholar] [CrossRef]
  6. Doster, T.; Watnik, A. Machine learning approach to OAM beam demultiplexing via convolutional neural networks. Appl. Opt. 2017, 56, 3386–3396. [Google Scholar] [CrossRef] [PubMed]
  7. Li, J.; Zhang, M.; Wang, D. Adaptive demodulator using machine learning for orbital angular momentum Shift Keying. IEEE Photonic Technol. Lett. 2017, 29, 1455–1458. [Google Scholar] [CrossRef]
  8. Li, J.; Zhang, M.; Wang, D.; Wu, S.; Zhan, Y. Joint atmospheric turbulence detection and adaptive demodulation technique using the CNN for the OAM-FSO communication. Opt. Express 2018, 26, 10494–10508. [Google Scholar] [CrossRef] [PubMed]
  9. Qu, T.; Zhao, Z.; Zhang, Y.; Wu, J.; Wu, Z. Mode recognition of orbital angular momentum based on attention pyramid convolutional neural network. Remote Sens. 2022, 18, 4618. [Google Scholar] [CrossRef]
  10. Howard, A.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017. [Google Scholar] [CrossRef]
  11. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar] [CrossRef]
  12. Elsken, T.; Metzen, J.; Hutter, F. Neural architecture search: A survey. J. Mach. Learn Res. 2019, 20, 1997–2017. [Google Scholar]
  13. Bucila, C.; Caruana, R.; Niculescu-Mizil, A. Model compression. In Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006; pp. 535–541. [Google Scholar]
  14. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
  15. Romero, A.; Ballas, N.; Kahou, S.; Chassang, A.; Gatta, C.; Bengio, Y. Fitnets: Hints for thin deep nets. arXiv 2015, arXiv:1412.6550. [Google Scholar]
  16. Li, X.; Xiong, H.; Wang, H.; Huan, J. Delta: Deep learning transfer using feature map with attention for convolutional networks. arXiv 2019, arXiv:1901.09229. [Google Scholar]
  17. Yim, J.; Joo, D.; Bae, J.; Kim, J. A gift from knowledge distillation: Fast optimization, Network minimization and Transfer learning. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7130–7138. [Google Scholar] [CrossRef]
  18. Park, W.; Kim, D.; Lu, Y.; Cho, M. Relational knowledge distillation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3962–3971. [Google Scholar] [CrossRef]
  19. Farhadi, M.; Yang, Y. TKD: Temporal Knowledge Distillation for Active Perception. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 2–5 March 2020; pp. 942–951. [Google Scholar] [CrossRef]
  20. Tian, Y.; Krishnan, D.; Isola, P. Contrastive representation distillation. arXiv 2019, arXiv:1910.10699. [Google Scholar]
  21. Xu, X.; Zou, Q.; Lin, X.; Huang, Y.; Tian, Y. Integral knowledge distillation for multi-person pose estimation. IEEE Signal Proc. Lett. 2020, 27, 436–440. [Google Scholar] [CrossRef]
  22. Zhang, Y.; Xiang, T.; Hospedales, T.; Lu, H. Deep mutual learning. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4320–4328. [Google Scholar] [CrossRef]
Figure 1. Transfer of dark knowledge between classical knowledge distillation classes when detecting superimposed vortex light.
Figure 2. Transfer of dark knowledge between deep mutual learning classes when detecting superimposed vortex light.
Figure 3. Framework of OAM mode recognition technology based on DML.
Figure 4. Structure of CNN-based OAM mode recognition network.
Figure 5. Light intensity distribution of similar multi-mode OAM beams.
Figure 6. Variation in the recognition accuracy with the training process of two networks.
Figure 7. Variation in the recognition accuracy of small CNN networks with different DML network queues.
Table 1. Accuracy of OAM mode recognition by DML and individual learning.

C_n^2 (m^-2/3)   | Individual Learning |        DML         |     DML-Ind
                 |   CNN    | AP-CNN   |  CNN    | AP-CNN   |  CNN   | AP-CNN
3.0 × 10^-14     |  95.3%   | 97.7%    |  96.4%  | 97.9%    |  1.1%  | 0.2%
3.0 × 10^-13     |  90.4%   | 94.4%    |  92.9%  | 95.1%    |  2.5%  | 0.7%
Table 2. Network complexity of small network CNN compared with large network AP-CNN.

Network Complexity          | CNN      | AP-CNN
Spatial complexity (params) | 8.22 M   | 14.09 M
Time complexity (FLOPs)     | 52.50 M  | 3061.25 M
Table 3. Comparison of the recognition accuracy of small network CNN with different DML network queues.

C_n^2 (m^-2/3)   | DML_2 (2 networks) | DML-Ind | DML_3 (3 networks) | DML-Ind
3.0 × 10^-14     | 96.4%              | 1.1%    | 96.9%              | 1.6%
3.0 × 10^-13     | 92.9%              | 2.5%    | 93.8%              | 3.4%
Citation: Qu, T.; Zhao, Z.; Zhang, Y.; Wu, J.; Wu, Z. Deep Mutual Learning-Based Mode Recognition of Orbital Angular Momentum. Photonics 2023, 10, 1357. https://doi.org/10.3390/photonics10121357