Article

Method of Mobile Speed Measurement Using Semi-Supervised Masked Auxiliary Classifier Generative Adversarial Networks

by Eunchul Yoon and Sun-Yong Kim *
Department of Electrical and Electronics Engineering, Konkuk University, 120 Neungdong-ro, Seoul 05029, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2024, 13(24), 4896; https://doi.org/10.3390/electronics13244896
Submission received: 23 November 2024 / Revised: 8 December 2024 / Accepted: 10 December 2024 / Published: 12 December 2024

Abstract:
We propose a semi-supervised masked auxiliary classifier generative adversarial network (SM-ACGAN) that has good classification performance in situations where labeled training data are limited. To develop SM-ACGAN, we combine the strengths of SSGAN (semi-supervised GAN), ACGAN-SG (auxiliary classifier GAN based on spectral normalization and gradient penalty), and MaskedGAN. Additionally, we devise a novel masking technique that performs masking adaptively depending on the real/fake ratio of the input data and a novel regularization technique that stabilizes the generator training depending on the maximum ratio of the average power of the generated fake data to the average power of the noise latent variables. Finally, we devise a rule for selecting an appropriate quantity of unlabeled data and labeled fake data generated by the generator for effective data augmentation. Through simulations, we demonstrate that SM-ACGAN achieves lower root mean square error (RMSE) values and lower variance than ACGAN-SG, MaskedGAN, SSGAN, a CNN (convolutional neural network), and a DNN (deep neural network), indicating superior mobile speed measurement performance on Rician channels.

1. Introduction

Research has been actively conducted to improve the measurement performance of wireless communication systems using machine learning [1,2]. To achieve the desired performance improvement through machine learning, sufficient labeled training data are required [3,4]. However, in most wireless communication systems, obtaining sufficient labeled training data is a difficult, costly, and time-consuming task [5]. Ultimately, the lack of labeled training data has been a major obstacle to applying machine learning in practical wireless communications.
In this paper, we utilize a generative adversarial network (GAN) as a machine learning prototype to improve the data classification performance of wireless communication systems in situations where labeled training data are limited. The generator of a GAN takes random noise as input and tries to create fake images that look real, while the discriminator tries to distinguish the generator's fake images from real ones [6]. Through this adversarial learning process, the GAN can eventually generate high-quality fake images. GANs have been used in a variety of fields, including image generation, image-to-image translation, video synthesis, 3D generation, text-to-image generation, data augmentation, audio and speech synthesis, medical image analysis, and web security such as CAPTCHA [7,8,9,10,11,12,13,14,15].
However, in many fields that require image generation via GANs, it is often difficult or impossible to collect a sufficiently large dataset in a given application domain due to constraints such as imaging cost, privacy, and copyright status. The performance of GANs seriously deteriorates in situations where training data are limited [16]. Due to limited labeled training data, the discriminator of a GAN exhibits an overfitting behavior during training, which causes problems such as training instability [17], over-confidence [18], mode collapse [19], and a deterioration of the quality of generated images.
In recent years, research on few-shot GANs [20,21,22,23,24] and data-efficient GANs [25,26,27,28,29,30], which aim to generate high-quality images from a small quantity of data, has received considerable attention. As a result of these studies, data augmentation [25,28,30], modulation-based methods [22], regularization methods [23], masking techniques [28,29,30], and lottery ticket techniques [30] were developed. However, most of these methods and techniques focus on improving the quality and generalization of the images created by the generator in situations with limited labeled training data, rather than improving the classification performance of the discriminator.
Recently, an ACGAN (auxiliary classifier GAN) was used to improve the classification performance of the discriminator in situations with limited labeled training data. In [31], ACGAN-SG (ACGAN based on spectral normalization and gradient penalty), a new type of ACGAN based on the spectral normalization [32] and gradient penalty techniques [33], was introduced to improve the bearing fault diagnosis performance of the discriminator. In [34], another type of ACGAN with a confidence mechanism added to the discriminator loss function was introduced to improve the performance of analog circuit fault diagnosis. Inspired by [31,34], we developed an improved ACGAN that focused on improving the data classification ability of the discriminator for wireless communication systems with limited labeled training data.
In this paper, we propose a semi-supervised masked ACGAN (SM-ACGAN) that has good classification performance in situations where labeled training data are limited. To develop SM-ACGAN, we combine the strengths of SSGAN (semi-supervised GAN) [35], ACGAN-SG, and MaskedGAN [28]. For the architecture of SM-ACGAN, we adopt that of ACGAN-SG, where the generator receives noise latent variables and data classes as input to generate fake data, and the discriminator determines not only whether the data are real or fake but also the classes of the data. We design the loss function of SM-ACGAN by adopting the advantages of the gradient-penalty-based loss function [31,33], the confidence-based loss function [34], and the masking-based loss function [9]. Additionally, we devise a novel masking technique that performs masking adaptively depending on the real/fake ratio of the input data and a novel regularization technique that stabilizes the generator training depending on the maximum ratio of the average power of the generated fake data to the average power of the noise latent variables. Finally, we devise a rule for selecting an appropriate quantity of unlabeled data and labeled fake data generated by the generator for effective data augmentation. Through simulations, we demonstrate that SM-ACGAN has lower root-mean-square error (RMSE) values and lower variance, indicating superior mobile speed measurement performance on Rician channels compared to ACGAN-SG, MaskedGAN, SSGAN, a CNN (convolutional neural network) [36], and a DNN (deep neural network) [37].

2. System Model

We considered an OFDM system consisting of $N_F (=128)$ subcarriers. $N_M$ sets of channel frequency response (CFR) data are obtained from the signals received by the base station from $N_M$ test user equipment devices (TeUEs). The channel impulse response of the $m$th TeUE for $m = 1, 2, \ldots, N_M$ is defined by a tapped delay line model with $L (=4)$ taps,

$$\mathbf{h}^{(m)}[t] = \left[ h_0^{(m)}[t] \;\; h_1^{(m)}[t] \;\; \cdots \;\; h_{L-1}^{(m)}[t] \right] \tag{1}$$

for $t = 0, 1, \ldots, N_{Data}-1$, where $t$ denotes the OFDM symbol index, and $h_l^{(m)}[t]$ denotes the coefficient of the $l$th channel path. To account for the effects of the line-of-sight (LOS) channel, we modeled the channel coefficient of the $m$th TeUE as in [38] by
$$h_l^{(m)}[t] = \sqrt{\frac{\Omega_{h,l}}{10^{0.1 K_r^{(m)}} + 1}}\, h_{l,a}^{(m)}[t] + \sqrt{\frac{10^{0.1 K_r^{(m)}}\, \Omega_{h,l}}{10^{0.1 K_r^{(m)}} + 1}}\, h_{l,b}^{(m)}[t] \tag{2}$$

where $h_{l,a}^{(m)}[t]$ and $h_{l,b}^{(m)}[t]$ denote the non-LOS and LOS components of the channel coefficient, respectively, $K_r^{(m)}$ denotes the Rician K-factor defined on a log scale, and $\Omega_{h,l}$ denotes the power delay profile. $K_r^{(m)}$ can be negative in urban environments and can be as high as 10 dB or more in rural areas or on highways with fewer obstacles [39]. Based on this, we assumed that $K_r^{(m)}$ was uniformly distributed in $[-5\ \text{dB}, 5\ \text{dB}]$. We assumed an exponential power delay profile (PDP) defined by
$$\Omega_{h,l} = \frac{\left(\rho^{(m)}\right)^{l}}{\sum_{l'=0}^{L-1} \left(\rho^{(m)}\right)^{l'}} \tag{3}$$

where $\rho^{(m)}$ is a positive number less than or equal to one. To simulate the diverse channel environments of the $N_M$ TeUEs, we assumed that $\rho^{(m)}$ was uniformly distributed in $[0.1, 1]$. We generated the time-varying fading channel components, $\{h_{l,a}^{(m)}[t]\}_{t=0}^{N_{Data}-1}$, by using an autoregressive (AR) model [40]. The autocorrelation function of $h_{l,a}^{(m)}[t]$ was presented in [38] as
$$r_{h_{l,a}}^{(m)}[t] = \frac{I_0\!\left(\sqrt{\left(\kappa^{(m)}\right)^2 - \left(2\pi f_d^{(m)} T_s t\right)^2 + j 4\pi \kappa^{(m)} f_d^{(m)} T_s t \cos\alpha^{(m)}}\right)}{I_0\!\left(\kappa^{(m)}\right)} \tag{4}$$

where $f_d^{(m)}$ denotes the Doppler spread, $\kappa^{(m)}$ denotes the azimuth angle-of-arrival (AOA) width factor, which controls the width of the azimuth AOA, $\alpha^{(m)}$ denotes the mean direction of the azimuth AOA, $T_s$ denotes the data-sampling time period, and $I_0(\cdot)$ denotes the zeroth-order modified Bessel function of the first kind. If $\kappa^{(m)}$ is not too small, it can be approximated as $\kappa^{(m)} = \left(360/(\pi\,\theta^{(m)})\right)^2$, where $\theta^{(m)}$ denotes the angle spread [41]. For severe non-isotropic scattering, the angle spread, $\theta^{(m)}$, is mostly less than $30^\circ$, and for very severe non-isotropic scattering, it is less than $10^\circ$ [42]. Based on this, we assumed that $\theta^{(m)}$ had a uniform distribution in $[10^\circ, 30^\circ]$, which was equivalent to the assumption that $\kappa^{(m)}$ had a uniform distribution in $[1.31, 14.6]$. When scatterers are distributed randomly, scattered waves can be incident at any angle in space. Therefore, we assumed that $\alpha^{(m)}$ had a uniform distribution in $[-\pi, \pi]$. The $l$th Rician channel component, $h_{l,b}^{(m)}[t]$, can be written [38] as
$$h_{l,b}^{(m)}[t] = e^{\,j 2\pi f_d^{(m)} T_s t \cos\alpha_0^{(m)} + j\phi_0^{(m)}} \tag{5}$$

where $\alpha_0^{(m)}$ is the parameter controlling the azimuth AOA direction of the LOS component, and $\phi_0^{(m)}$ is the parameter representing the phase of the LOS component. We assumed that $\alpha^{(m)} = \alpha_0^{(m)}$ as in [42] and that $\phi_0^{(m)}$ had a uniform distribution in $[0, 2\pi]$. The CFR coefficient over the $k$th subcarrier is given by the discrete Fourier transform (DFT) of $\mathbf{h}^{(m)}[t]$,
$$H_k^{(m)}[t] = \sum_{l=0}^{L-1} h_l^{(m)}[t]\, e^{-j \frac{2\pi}{N_F} k l} \tag{6}$$

for $k = 0, 1, \ldots, N_F-1$. The OFDM demodulated signal over the $k$th subcarrier of the $t$th OFDM symbol for the $m$th TeUE can be written as
$$Y_k^{(m)}[t] = H_k^{(m)}[t]\, X_k^{(m)}[t] + W_k^{(m)}[t] \tag{7}$$

where $X_k^{(m)}[t]$ denotes the transmitted symbol, and $W_k^{(m)}[t]$ denotes zero-mean circularly symmetric complex Gaussian noise with variance $\sigma_{(m)}^2$. If the number of pilot symbols in the OFDM block is $N_p (=16)$, the CFR coefficient over the subcarrier of the $p$th pilot symbol for $p = 0, 1, \ldots, N_p-1$ can be estimated by the least squares detection method of [43] as
$$\hat{H}_{k_p}^{(m)}[t] = Y_{k_p}^{(m)}[t] \,/\, X_{k_p}^{(m)}[t] \tag{8}$$

where $X_{k_p}^{(m)}[t]$ denotes the pilot symbol, and $k_p$ denotes the subcarrier index of the $p$th pilot symbol. We used the minimum mean-square-error-based channel interpolation method of [44] to estimate the CFR coefficients, $\{\hat{H}_k^{(m)}[t]\}_{0 \le t \le N_{Data}-1,\; 0 \le k \le N_F-1}$. We assumed that the pilot symbol had unit average power. Then, the average signal-to-noise ratio (SNR) was defined as $1/\sigma_{(m)}^2$. The SNR encountered in real-world mobile communications can be negative or higher than 30 dB depending on the surrounding environment [45]. Considering that we targeted Rician channels in our simulations, we assumed that the SNR was uniformly distributed in the range $[-10\ \text{dB}, 20\ \text{dB}]$. The CFR data of the $m$th TeUE were defined as
$$\mathcal{H}^{(m)} = \left\{ \hat{H}_k^{(m)}[t] \;\middle|\; k = 0, 1, \ldots, N_F-1,\; t = 0, 1, \ldots, \tfrac{N_{Data}}{2}-1 \right\}. \tag{9}$$
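To make the simulation model concrete, the following NumPy sketch assembles one Rician channel tap per Equations (2), (3), and (5) and the CFR grid per Equation (6). This is a minimal sketch under our own reading of the model: the function names are ours, and the NLOS samples h_nlos are assumed to come from the AR model of [40], which is not reproduced here.

```python
import numpy as np

def exp_pdp(rho, L=4):
    """Exponential power delay profile of Equation (3)."""
    p = rho ** np.arange(L)
    return p / p.sum()

def rician_tap(h_nlos, K_r_dB, omega, f_d, T_s, alpha0, phi0):
    """Mix NLOS and LOS parts into one Rician tap per Equations (2) and (5).
    h_nlos: complex NLOS samples (e.g., from the AR model of [40]);
    K_r_dB: Rician K-factor in dB; omega: tap power from Equation (3)."""
    t = np.arange(len(h_nlos))
    K = 10.0 ** (0.1 * K_r_dB)                      # K-factor on a linear scale
    h_los = np.exp(1j * (2 * np.pi * f_d * T_s * t * np.cos(alpha0) + phi0))
    return (np.sqrt(omega / (K + 1)) * h_nlos
            + np.sqrt(K * omega / (K + 1)) * h_los)

def cfr(h_taps, N_F=128):
    """CFR over subcarriers via the DFT of Equation (6); h_taps: [L, N_Data]."""
    L = h_taps.shape[0]
    k = np.arange(N_F)[:, None]
    l = np.arange(L)[None, :]
    W = np.exp(-1j * 2 * np.pi * k * l / N_F)       # DFT over the L taps
    return W @ h_taps                                # shape [N_F, N_Data]
```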
By extracting key features from the original data to create processed data and using them in machine learning, the efficiency of machine learning can be improved [46,47]. As processed data that can effectively extract features from CFR data for Doppler spread measurement, we formed power spectral density (PSD) data. The PSD on the $k$th subcarrier at the $u$th Doppler frequency index can be calculated [48] as

$$S_k^{(m)}[u] = \frac{1}{N_{Data}} \left| \sum_{t=0}^{N_{Data}-1} \hat{H}_k^{(m)}[t]\, e^{-j \frac{2\pi}{N_{Data}} u t} \right|^2 \tag{10}$$
where $u = 0, 1, \ldots, N_{Data}/2 - 1$ denotes a Doppler frequency index. For each $u$ value, the average PSD was obtained by averaging the values of $S_k^{(m)}[u]$ over all $k$ values, and the result was defined as $S^{(m)}[u]$. The vector of $S^{(m)}[u]$ values for $u = 0, 1, \ldots, N_{Data}/2 - 1$ was defined as $\mathbf{s}^{(m)}$. The tensor obtained by centering the vector $\mathbf{s}^{(m)}$ in the zero vector of length 1024 and rearranging it into a tensor of size $[1, 1, 64, 64]$ was defined as $\mathbf{X}^{(m)}$. To practically obtain $\mathbf{X}^{(m)}$, the base station must collect $N_F \times N_{Data}$ CFRs of the $m$th TeUE in a 2D frequency–time space. Since $\mathbf{X}^{(m)}$ has a tensor size equal to a black-and-white image of size $64 \times 64$, it can be considered image data. By stacking all the $\mathbf{X}^{(m)}$ data for $m = 1, 2, \ldots, N_M$, a tensor of size $[N_M, 1, 64, 64]$ can be formed, which was defined as the training dataset and denoted as $\mathbf{X}$.
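As an illustration, a minimal NumPy sketch of the PSD-image construction described above follows. It is our own reading of Equation (10) and the centering step; the canvas is padded to 64 × 64 = 4096 entries so that the stated [1, 1, 64, 64] shape is reached.

```python
import numpy as np

def psd_image(H_hat, out_side=64):
    """Form the PSD image X^(m) from estimated CFRs; H_hat: complex [N_F, N_Data]."""
    N_F, N_Data = H_hat.shape
    S = np.abs(np.fft.fft(H_hat, axis=1)) ** 2 / N_Data  # periodogram, Eq. (10)
    s = S[:, : N_Data // 2].mean(axis=0)                 # average PSD over subcarriers
    canvas = np.zeros(out_side * out_side)               # zero canvas of 4096 entries
    start = (canvas.size - s.size) // 2                  # center s^(m) in the canvas
    canvas[start : start + s.size] = s
    return canvas.reshape(1, 1, out_side, out_side)      # tensor [1, 1, 64, 64]
```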
If the $m$th TeUE measures its speed $v^{(m)}$ moving toward the base station and supplies the result to the base station, the base station can find the Doppler spread of the $m$th TeUE channel by

$$f_d^{(m)} = \frac{f_0}{c_0} \times v^{(m)} \tag{11}$$

where $c_0$ denotes the speed of the radio wave, given by $c_0 = 3 \times 10^8$ m/s, and $f_0$ denotes the center frequency of the radio wave. The Doppler frequency index, $u$, and the Doppler frequency, $f$, are related by

$$u = f / \Delta f. \tag{12}$$

Herein, $\Delta f$ denotes the Doppler frequency sampling interval defined as $\Delta f = 1/T$, where $T$ denotes the time taken to collect channel information, defined as $T = N_{Data} T_s$. Based on Equations (11) and (12), the Doppler spread index $u_d^{(m)}$ can be written in terms of the mobile speed $v^{(m)}$ as

$$u_d^{(m)} = \frac{f_0 T}{c_0} \times v^{(m)}. \tag{13}$$
The values of $u_d^{(m)}$ for $m = 1, 2, \ldots, N_M$ can be arranged to form a vector of size $[N_M, 1]$, which was defined as $\mathbf{Y}$ and used as label data for machine learning. $[\mathbf{X}, \mathbf{Y}]$ is the labeled training dataset used to train and validate a neural network. For convenience in training, $\mathbf{X}$ was divided into $\mathbf{X}_{Train}$ and $\mathbf{X}_{Valid}$, each of size $[N_M/2, 1, 64, 64]$, which were used for the training and validation processes, respectively. $\mathbf{X}_{Train}$ was divided into $n$ sub-tensors of size $[N_M/2/n, 1, 64, 64]$, and each sub-tensor was used for training networks in a mini-batch loop. Assuming that $N_{Batch} = N_M/2/n$ is a positive integer, we can write $N_M = 2 n N_{Batch}$. We denote a sub-tensor of size $[N_{Batch}, 1, 64, 64]$ used as training data in a mini-batch loop by $\mathbf{x}$. Similarly, $\mathbf{Y}$ was divided into $\mathbf{Y}_{Train}$ and $\mathbf{Y}_{Valid}$, and $\mathbf{Y}_{Train}$ was divided into sub-tensors of size $[N_{Batch}, 1]$. We denote the sub-tensor of size $[N_{Batch}, 1]$ used as label data for a mini-batch loop by $\mathbf{y}$.
After training the neural network designed to measure mobile speed using $[\mathbf{X}, \mathbf{Y}]$, the base station can use the trained network to measure the speed of the target user equipment (TaUE). Just as the base station formed $\mathbf{X}^{(m)}$ from the channel information of the $m$th TeUE, it first collects the channel information of the TaUE and forms $\mathbf{X}^{(m')}$, where $m'$ is the index of the TaUE. The trained neural network performs a classification task with $\mathbf{X}^{(m')}$ and thereby finds the Doppler spread index of the TaUE. By applying the estimated Doppler spread index, $u_d^{(m')}$, to Equation (13), the speed of the TaUE, $v^{(m')}$, can be computed. Referring to the LTE standards [49,50], we assumed $f_0 = 2.4$ GHz in our simulations. We also assumed $T_s = 0.001$ s and $N_{Data} = 1000$ to limit the time for the base station to collect the channel information of the TaUE, $T$, to 1 s. Under these settings, the mobile speed can be estimated from the Doppler spread index produced by the trained neural network as

$$v^{(m')} = \frac{c_0}{f_0 T} \times u_d^{(m')} = 0.45 \times u_d^{(m')}\ \text{km/h}. \tag{14}$$
In machine learning, performance evaluation metrics such as accuracy and F1-score are widely used. When the number of target classes is large or the class distribution is imbalanced, the accuracy and F1-score results may become very small, or one or two classes may dominate the overall results, which can lead to the underestimation of model performance or inaccurate comparisons between models [51,52]. In the mobile speed estimation task, the number of classes is large (100 in our setting), and the class distribution is imbalanced. Therefore, we defined the root-mean-square error (RMSE) as an additional performance evaluation metric for mobile speed measurement,
$$\mathrm{RMSE} = \frac{c_0}{f_0} \left( \frac{1}{N_M} \sum_{m=1}^{N_M} \left( \hat{f}_d^{(m)} - f_d \right)^2 \right)^{1/2} \tag{15}$$

where $\hat{f}_d^{(m)}$ and $f_d$ denote the estimated Doppler spread of the $m$th TeUE and the actual Doppler spread, respectively.
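A one-function sketch of the metric in Equation (15), assuming the estimated Doppler spreads are available as an array and the actual Doppler spread as a scalar:

```python
import numpy as np

def speed_rmse(fd_hat, fd_true, f0=2.4e9, c0=3e8):
    """RMSE of Equation (15): Doppler-spread errors scaled by c0/f0 into speed units."""
    fd_hat = np.asarray(fd_hat, dtype=float)
    return (c0 / f0) * np.sqrt(np.mean((fd_hat - fd_true) ** 2))
```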

3. Proposed SM-ACGAN

3.1. Design of the SM-ACGAN Architecture

Figure 1 compares the structures of SM-ACGAN and ACGAN-SG. In Figure 1, $D$ and $G$ denote the discriminator and the generator, respectively, and $\mathbf{x}$, $\mathbf{y}$, $\mathbf{y}'$, $\hat{\mathbf{y}}$, $\hat{\mathbf{y}}'$, $\mathbf{v}$, $\mathbf{v}'$, and $\mathbf{z}$ denote the data tensor, the true label vector, the fake label vector, the estimated class probability tensor of $\mathbf{y}$ (the estimated one-hot encoded tensor of $\mathbf{y}$), the estimated class probability tensor of $\mathbf{y}'$ (the estimated one-hot encoded tensor of $\mathbf{y}'$), the validity flag vector of $\mathbf{x}$, the validity flag vector of the fake data tensor $G(\mathbf{z}, \mathbf{y}')$, and the noise vector, respectively. Note that the discriminator outputs $[\mathbf{v}, \hat{\mathbf{y}}] = D(\mathbf{x})$ when given real data $\mathbf{x}$ as its input and $[\mathbf{v}', \hat{\mathbf{y}}'] = D(G(\mathbf{z}, \mathbf{y}'))$ when given fake data $G(\mathbf{z}, \mathbf{y}')$ as its input. The architecture of SM-ACGAN is similar to that of ACGAN-SG, but unlike ACGAN-SG, the SM-ACGAN discriminator applies masking to the validity flag vector and the class probability tensor to stabilize training. We implemented the confidence mechanism of the discriminator using a confidence filter that passes only the elements of $\mathbf{v}$ and $\hat{\mathbf{y}}$ located at positions where the elements of $\mathbf{v}$ exceed a threshold.
The six types of layer blocks used to construct the architectures of the machine learning models considered in this paper are shown in Table 1.
Table 2 shows the detailed architecture of the SM-ACGAN discriminator, which consists of a type-A block, four type-B blocks, a reshape operator, two linear layers, a sigmoid activation function, and a mask layer. The type-A block consists of a 2D convolution layer, a spectral normalization layer, a rectified linear unit (ReLU) function layer, and a dropout function layer. SpectralNorm is a regularization function provided by PyTorch (https://pytorch.org/docs/stable/generated/torch.nn.utils.spectral_norm.html, accessed on 9 December 2024). As in ACGAN-SG [31], we adopted SpectralNorm in the SM-ACGAN discriminator because it can alleviate the mode collapse problem of the neural network model and stabilize learning by limiting the maximum singular value of the weight matrix. We also adopted InstanceNorm2d because it performs normalization independently for each channel of the input data, thereby preserving the characteristics of the data and enabling stable training even with a small batch size. Since we assumed a relatively small mini-batch size of $N_{Batch} = 8$ due to the lack of labeled training data, InstanceNorm2d was particularly useful in the design of SM-ACGAN. The type-B block consists of a 2D convolution layer, a spectral normalization layer, a 2D instance normalization layer, a ReLU layer, and a dropout layer.
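A sketch of how the type-A and type-B blocks could be realized in PyTorch follows; the kernel sizes, strides, and dropout rate here are placeholder assumptions, since the exact values are those of Table 2.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def type_a_block(c_in, c_out, p_drop=0.25):
    """Type-A block: spectrally normalized Conv2d, ReLU, and dropout."""
    return nn.Sequential(
        spectral_norm(nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1)),
        nn.ReLU(inplace=True),
        nn.Dropout2d(p_drop),
    )

def type_b_block(c_in, c_out, p_drop=0.25):
    """Type-B block: type A plus InstanceNorm2d for per-channel normalization."""
    return nn.Sequential(
        spectral_norm(nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1)),
        nn.InstanceNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.Dropout2d(p_drop),
    )
```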
Table 3 shows the detailed architecture of the SM-ACGAN generator, which consists of a noise latent variable generator, a type-C block, a one-hot encoder, a label expander, a tensor concatenator, three type-D blocks, and a type-E block. The type-C block consists of a 2D transposed convolution layer, a spectral normalization layer, a 2D instance normalization layer, a ReLU function layer, and a dropout function layer. The type-D block is identical to the type-C block except that the padding parameter of the 2D transposed convolution layer is set to 1. The type-E block consists of a 2D transposed convolution layer and a hyperbolic tangent function layer. We adopted ConvTranspose2d in the type-C, type-D, and type-E blocks because it can change low-resolution feature maps into high-resolution ones, gradually improving image resolution and class prediction. Tanh, shown in the type-E block, is a hyperbolic tangent function that maps input values to the range between −1 and 1.
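Analogously, a hedged PyTorch sketch of the type-C, type-D, and type-E generator blocks is given below; the kernel sizes, strides, and dropout rate are again placeholders for the Table 3 values.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def type_c_block(c_in, c_out, padding=0, p_drop=0.25):
    """Type-C block; the type-D block is identical except that padding=1."""
    return nn.Sequential(
        spectral_norm(nn.ConvTranspose2d(c_in, c_out, kernel_size=4,
                                         stride=2, padding=padding)),
        nn.InstanceNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.Dropout2d(p_drop),
    )

def type_e_block(c_in, c_out=1):
    """Type-E block: final upsampling stage mapped into [-1, 1] by Tanh."""
    return nn.Sequential(
        nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
        nn.Tanh(),
    )
```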

3.2. Design of the Loss Function of the SM-ACGAN Discriminator

The process of computing the loss function of the SM-ACGAN discriminator is shown in Figure 2a. If labeled training data $(\mathbf{x}, \mathbf{y})$ are given for a mini-batch loop, the discriminator loss function of ACGAN [53] can be written as

$$L_{D,label} = \frac{1}{4} \left( f_{BCE}\{\mathbf{v}, \mathbf{1}\} + f_{CE}\{\hat{\mathbf{y}}, \mathbf{y}\} \right) + \frac{1}{4} \left( f_{BCE}\{\mathbf{v}', \mathbf{0}\} + f_{CE}\{\hat{\mathbf{y}}', \mathbf{0}\} \right) \tag{16}$$

where $\mathbf{1}$ and $\mathbf{0}$ denote the vector consisting of 1's and that consisting of 0's, respectively, and $f_{BCE}\{\cdot\}$ and $f_{CE}\{\cdot\}$ denote the binary cross-entropy loss function and the cross-entropy loss function, respectively. Note that the 1's in $\mathbf{v}$ indicate that the input data at their positions are real, and the 0's in $\mathbf{v}'$ indicate that the input data at their positions are fake.
If we implement a confidence mechanism in the discriminator as in [34], the discriminator loss function for labeled training data $(\mathbf{x}, \mathbf{y})$ can be written as

$$L_{D,label}^{(confidence)} = \frac{1}{4} \left( f_{BCE}\{\mathbf{v}'', \mathbf{1}\} + f_{CE}\{\hat{\mathbf{y}}'', \mathbf{y}\} \right) + \frac{1}{4} \left( f_{BCE}\{\mathbf{v}', \mathbf{0}\} + f_{CE}\{\hat{\mathbf{y}}', \mathbf{0}\} \right) \tag{17}$$

where $\mathbf{v}''$ and $\hat{\mathbf{y}}''$ denote the outputs of a confidence filter that passes only the elements of $\mathbf{v}$ and $\hat{\mathbf{y}}$ located at positions where the elements of $\mathbf{v}$ exceed a threshold. We adopted 0.2 as the threshold value of the confidence filter, as in [34].
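A minimal sketch of such a confidence filter follows. The function name is ours, and we also filter the labels so that the filtered predictions and targets stay aligned, which Equation (17) leaves implicit.

```python
import torch

def confidence_filter(v, y_hat, y, threshold=0.2):
    """Keep only the batch positions whose validity flag exceeds the threshold,
    as in [34]; v: [N], y_hat: [N, N_class], y: [N] class indices."""
    keep = v > threshold
    return v[keep], y_hat[keep], y[keep]
```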
As in the SSGAN introduced in [35], we included the loss function term due to unlabeled training data as a component of the discriminator loss function. If unlabeled training data $\bar{\mathbf{x}}$ are given for a mini-batch loop, the discriminator loss function for $\bar{\mathbf{x}}$ is given by

$$L_{D,unlabel} = \frac{1}{2} f_{BCE}\{\bar{\mathbf{v}}, \mathbf{1}\} + \frac{1}{4} \left( f_{BCE}\{\mathbf{v}', \mathbf{0}\} + f_{CE}\{\hat{\mathbf{y}}', \mathbf{0}\} \right) \tag{18}$$

where $\bar{\mathbf{v}}$ denotes the validity flag vector of $\bar{\mathbf{x}}$.
In [28], it was proved that uniformly generated masking for the generator (or discriminator) reduces the gradient of the generator (or discriminator) loss function, thereby lowering the learning rate of the generator (or discriminator). In [54], it was shown, based on the theory of stochastic approximation, that a GAN can reach the Nash equilibrium when the learning rate of the generator is far smaller than that of the discriminator. When training data are limited, the GAN's discriminator tends to overfit: it memorizes the small set of training samples and focuses only on easily distinguishable image locations and spectra instead of developing a holistic understanding of the image. When masking is applied to the discriminator, the time for the GAN to reach the Nash equilibrium increases. Therefore, by applying masking to the discriminator, we can prevent the discriminator from learning too quickly, thereby alleviating the unstable learning, overconfidence, and mode collapse problems and improving the diversity and quality of the generated images. In [28,29,30], methods for applying masking to the discriminator were presented to reflect these principles. We applied masking to the validity flag vector and class probability tensor of the SM-ACGAN discriminator and added the resulting loss function term to the discriminator loss function. The loss function term due to masking can be written as
$$L_{D,masked} = \frac{1}{2} \left( f_{BCE}\!\left( f_M\{\mathbf{v}, R_{mask}\},\, f_M\{\mathbf{1}, R_{mask}\} \right) + f_{CE}\!\left( f_M\{\hat{\mathbf{y}}, R_{mask}\},\, f_M\{\mathbf{y}, R_{mask}\} \right) \right) \tag{19}$$

where $f_M$ denotes a function that randomly selects a number of elements of the input according to the ratio $R_{mask}$ and masks them to 0. While the masking ratio $R_{mask}$ was fixed to a constant such as 0.2 in [28,29,30], we introduced a novel technique that adaptively changes $R_{mask}$ in each mini-batch loop. If the discriminator judges real data to be fake, the discriminator needs to be trained more carefully, so its learning rate should be lowered. On the other hand, if the discriminator judges real data to be real, the discriminator is operating normally, so the learning rate can be increased to reduce the training time. Since learning becomes slower as the masking ratio increases, we set the masking ratio $R_{mask}$ to the rate at which the discriminator, given a batch of real data as input, classifies those data as fake, which can be written as
$$R_{mask} = \begin{cases} 1 - \dfrac{f_{sum}\{\mathbf{v} / f_{max}\{\mathbf{v}\}\}}{f_{length}\{\mathbf{v}\}} & \text{if } f_{max}\{\mathbf{v}\} \geq 0.75 \\[2ex] 1 - \dfrac{f_{sum}\{\mathbf{v}\}}{f_{length}\{\mathbf{v}\}} & \text{otherwise} \end{cases} \tag{20}$$

where $f_{sum}$, $f_{max}$, and $f_{length}$ denote functions that return the sum, the maximum component, and the length of the input vector, respectively. In Equation (20), when $f_{max}\{\mathbf{v}\} < 0.75$, $R_{mask}$ is set to a larger value than when $f_{max}\{\mathbf{v}\} \geq 0.75$. According to Equation (20), if there are many 0's in $\mathbf{v}$, the masking ratio increases and the learning rate of the discriminator decreases, so that learning progresses slowly, whereas if there are many 1's in $\mathbf{v}$, the masking ratio decreases and the learning rate of the discriminator increases, so that learning progresses quickly. Therefore, by adaptively changing the masking ratio of the discriminator according to Equation (20), which was designed based on the analysis of [54], the learning of the discriminator and generator can be made more stable, so that the discriminator attains more stable classification performance.
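The following sketch illustrates one way to realize Equation (20) and the masking function f_M of Equation (19) in PyTorch. Whether the prediction and target vectors share the same masked positions is left open by Equation (19); passing the same index set, as below, is one reasonable choice.

```python
import torch

def adaptive_mask_ratio(v):
    """Adaptive masking ratio of Equation (20): the rate at which the
    discriminator declares a batch of real input data to be fake."""
    if v.max() >= 0.75:
        return 1.0 - (v / v.max()).sum().item() / v.numel()
    return 1.0 - v.sum().item() / v.numel()

def f_mask(x, r_mask, idx=None):
    """f_M of Equation (19): zero a fraction r_mask of the elements; reusing
    idx masks predictions and targets at the same positions."""
    if idx is None:
        idx = torch.randperm(x.numel())[: int(r_mask * x.numel())]
    out = x.flatten().clone()
    out[idx] = 0.0
    return out.view_as(x), idx

# Usage: mask v and its all-ones target at shared positions
# v_m, idx = f_mask(v, adaptive_mask_ratio(v))
# ones_m, _ = f_mask(torch.ones_like(v), 0.0, idx)
```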
To further improve the stability of SM-ACGAN training, as in [31], we included the discriminator loss function term introduced by WGAN-GP (Wasserstein GAN with gradient penalty) [33] as part of the discriminator loss function of SM-ACGAN, which can be written as

$$\tilde{\mathbf{x}} = \alpha \mathbf{x} + (1 - \alpha)\, G(\mathbf{z}, \mathbf{y}') \tag{21}$$

$$L_{D,Wasserstein} = f_{grad}\{\tilde{\mathbf{v}}, \tilde{\mathbf{x}}\} \tag{22}$$

where $\alpha$ denotes a uniformly distributed random variable, $f_{grad}\{\mathbf{a}, \mathbf{b}\}$ is a function that computes the sum of the gradients of $\mathbf{a}$ with respect to $\mathbf{b}$, and $\tilde{\mathbf{v}}$ is the validity flag vector output when $\tilde{\mathbf{x}}$ is given to the discriminator as input.
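A sketch of Equations (21) and (22) follows, assuming the discriminator returns the pair (validity flags, class probabilities). Note that f_grad here is the sum of gradients, as defined above; standard WGAN-GP [33] would instead penalize the squared deviation of the gradient norm from one.

```python
import torch

def wasserstein_grad_term(D, x_real, x_fake):
    """Gradient term of Equations (21) and (22) on interpolated samples."""
    alpha = torch.rand(x_real.size(0), 1, 1, 1, device=x_real.device)
    x_tilde = (alpha * x_real + (1 - alpha) * x_fake).requires_grad_(True)
    v_tilde, _ = D(x_tilde)                    # validity flags of the interpolates
    grads = torch.autograd.grad(outputs=v_tilde.sum(), inputs=x_tilde,
                                create_graph=True)[0]
    return grads.sum()                         # f_grad: sum of gradients
```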
By combining the loss function terms described above, we determined the discriminator loss function of SM-ACGAN as
$$L_D = L_{D,label}^{(confidence)} + L_{D,unlabel} + \lambda_{masked} \cdot L_{D,masked} + \lambda_{Wasserstein} \cdot L_{D,Wasserstein}. \tag{23}$$

Herein, $\lambda_{masked}$ and $\lambda_{Wasserstein}$ are hyper-parameters, which we set to 3 and 10, respectively.

3.3. Design of the Loss Function of the SM-ACGAN Generator

The process of computing the loss function of the SM-ACGAN generator is shown in Figure 2b. The generator loss function of ACGAN [53] can be written as
$$L_{G,data} = \frac{1}{2} \left( f_{BCE}\{\mathbf{v}', \mathbf{1}\} + f_{CE}\{\hat{\mathbf{y}}', \mathbf{y}'\} \right). \tag{24}$$
In [55], it was shown that minimizing the maximum value of the mode-seeking term of the fake images generated by the generator could not only greatly improve the diversity of fake images but also improve the classification performance of the discriminator to some extent. Motivated by [55], we designed a new regularization method for generator learning that focuses on stabilizing the classification performance of the discriminator. Our novel generator loss function term is written as
$$L_{G,power} = f_{max}\!\left( f_{mean}\!\left( \left| G(\mathbf{z}, \mathbf{y}') \right|^2 \right) / f_{mean}\!\left( |\mathbf{z}|^2 \right) \right) \tag{25}$$

where $f_{max}$ is a function that returns the maximum of the components of the input vector, $f_{mean}$ is a function that returns a vector whose components are the average values of the data in the input tensor, and $|\cdot|^2$ is a function that returns a tensor whose components are the squared absolute values of the components of the input tensor. The term in Equation (25) corresponds to the maximum value of the ratio of the average power of the fake data generated by the generator to the average noise power. Assuming that the average power of the noise is constant, regularizing the generator learning with the term in Equation (25) can be thought of as limiting the average power of the fake images generated by the generator. This regularization prevents the discriminator from being trained to perform the classification task only for input data with high power and induces the discriminator to perform the classification task stably for all input data. In the simulation results section of this paper, we show the effectiveness of using the term in Equation (25) as a part of the generator loss function.
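A compact sketch of Equation (25), assuming real-valued generator outputs and noise batches:

```python
import torch

def generator_power_term(fake, z):
    """L_{G,power} of Equation (25): the largest per-sample ratio of the average
    power of the generated data to the average power of the noise input."""
    fake_pow = fake.pow(2).flatten(1).mean(dim=1)   # f_mean(|G(z, y')|^2)
    noise_pow = z.pow(2).flatten(1).mean(dim=1)     # f_mean(|z|^2)
    return (fake_pow / noise_pow).max()
```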
By combining the loss function terms described above, we determined the generator loss function of SM-ACGAN as
$$L_G = L_{G,data} + \lambda_{power} \cdot L_{G,power}. \tag{26}$$

Herein, $\lambda_{power}$ is a hyper-parameter, which we set to 1.

3.4. Effective Data Augmentation Using Unlabeled Data and Labeled Fake Data

The key to improving SM-ACGAN performance by using unlabeled data and labeled fake data generated by the SM-ACGAN generator is to appropriately balance the quantity of unlabeled data (i.e., $N_{Unlabel}$) and the quantity of labeled fake data generated by the SM-ACGAN generator (i.e., $N_{Gen}$) against the quantity of available labeled data (i.e., $N_{Train}$). Through numerous experimental simulations, we found that the following formulas for determining $N_{Unlabel}$ and $N_{Gen}$ led to effective data augmentation:
$$N_{Gen} = \begin{cases} 15\, N_{Batch} & \text{if } 1 \leq n \leq 10 \\ \max\!\left(1, \lfloor 15 - n/10 \rfloor\right) \times N_{Batch} & \text{otherwise,} \end{cases} \tag{27}$$

$$N_{Unlabel} = \begin{cases} N_{Gen} & \text{if } 1 \leq n < 100 \\ 2\, N_{Gen} & \text{otherwise.} \end{cases} \tag{28}$$

Herein, $\max(a, b)$ and $\lfloor c \rfloor$ are functions that return the maximum of $a$ and $b$ and the greatest integer less than or equal to $c$, respectively. The solutions in Equations (27) and (28) were determined heuristically based on simulation results of the mobile speed measurement task and therefore cannot be used as general optimal solutions for other classification tasks. However, the solutions in Equations (27) and (28) were not determined arbitrarily. They were formed based on the following design strategy: when the training data are small, a relatively large quantity of auxiliary data (i.e., unlabeled data and labeled fake data generated by the generator) should be used compared to the training data, and when the training data are large, a relatively small quantity of auxiliary data should be used. Through simulations, we showed that the performance of the mobile speed measurement task could be improved when using the solutions in Equations (27) and (28), which demonstrated that this design strategy was effective at determining the quantity of auxiliary data. We used Equations (27) and (28) to determine $N_{Unlabel}$ and $N_{Gen}$ in all subsequent simulations of SM-ACGAN.
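The rule can be written compactly as follows; the second branch of Equation (27) is read here as ⌊15 − n/10⌋, which matches the stated strategy of shrinking the auxiliary-to-training ratio as n grows.

```python
import math

def augmentation_sizes(n, n_batch=8):
    """N_Gen and N_Unlabel from Equations (27) and (28)."""
    if 1 <= n <= 10:
        n_gen = 15 * n_batch
    else:
        n_gen = max(1, math.floor(15 - n / 10)) * n_batch
    n_unlabel = n_gen if n < 100 else 2 * n_gen
    return n_gen, n_unlabel
```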

4. The Training and Validating Processes of SM-ACGAN

Figure 3 shows the training and validating processes of SM-ACGAN. In the training process, there is an epoch loop that repeats $N_{Epoch}$ times. In the epoch loop, a mini-batch loop is executed that evaluates the performance of SM-ACGAN. If $N_{Unlabel}$ unlabeled training data are given, they are divided by $N_{Batch}$ to form $n_{Unlabel} = N_{Unlabel}/N_{Batch}$ data groups of unlabeled training data. With those data groups, the mini-batch loop repeats $n_{Unlabel}$ times. Then, $N_{Gen}$ labeled fake training data are generated by the generator and mixed with the $N_{Train}$ real labeled training data to make $N_{Train} + N_{Gen}$ mixed labeled training data. Those data are divided by $N_{Batch}$ to form $n_{Label} = (N_{Train} + N_{Gen})/N_{Batch}$ data groups of labeled training data. With those data groups, the mini-batch loop repeats $n_{Label}$ times. We computed the losses of the neural networks in mini-batch units. In each mini-batch loop, we performed backward error propagation by calling the backward function of PyTorch. Then, we updated the weights of the neural network based on the stochastic gradient descent (SGD) algorithm [47] by calling the step function of the torch.optim.Adam optimizer object [56,57] in PyTorch. For the scheduler, we used the torch.optim.lr_scheduler.StepLR object of PyTorch, with the step size and gamma set to 1 and 0.5, respectively. After the training process, the validating process followed, which was performed using the $N_{Valid}$ labeled validation data. Although omitted in Figure 3, the validating process also included a mini-batch loop, which evaluated the classification performance of the trained discriminator and repeats $n_{Valid} = N_{Valid}/N_{Batch}$ times. After all iterations of the epoch loop are completed, the optimal discriminator that minimizes the speed measurement RMSE is stored for future use. The last part of Figure 3 shows the process of evaluating the optimal discriminator with labeled test data. The performance of the optimal discriminator evaluated on labeled test data corresponds to its actual performance when applied to real tasks.
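For concreteness, a minimal sketch of the optimizer and scheduler setup described above is given below, with a stand-in module and loss; the learning rate is a placeholder, since the initial learning rate is not stated here.

```python
import torch
import torch.nn as nn

D = nn.Linear(8, 1)  # stand-in for the Table 2 discriminator
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)  # Adam optimizer [56,57]
sched_D = torch.optim.lr_scheduler.StepLR(opt_D, step_size=1, gamma=0.5)

for x, y in [(torch.randn(4, 8), torch.randn(4, 1))]:  # one dummy mini-batch
    opt_D.zero_grad()
    loss_D = nn.functional.mse_loss(D(x), y)  # stand-in for the loss of Eq. (23)
    loss_D.backward()                         # backward error propagation
    opt_D.step()                              # weight update
sched_D.step()                                # halve the learning rate per epoch
```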

5. Simulation Results

In this section, we compare the mobile speed measurement performance of SM-ACGAN with the performance of machine learning methods presented below, assuming limited labeled training data.
  • DNN: This stands for deep neural network [37]. A DNN is essentially a discriminator made up of linear layers, which determines the Doppler spread index of the PSD image provided as input. Table 4 shows the detailed architecture of the DNN. The loss function of the DNN is defined by
    $$L = f_{CE}\{\hat{\mathbf{y}}, \mathbf{y}\}. \tag{29}$$
  • CNN: This stands for convolutional neural network [36]. A CNN is essentially a discriminator made up of convolutional and linear layers, which determines the Doppler spread index of the PSD image provided as input. Table 5 shows the detailed architecture of the CNN. The loss function of the CNN is defined by Equation (29).
  • SSGAN: This stands for a semi-supervised GAN consisting of a discriminator and a generator. Table 5 and Table 6 show the detailed structures of the SSGAN discriminator and the SSGAN generator, respectively. The loss function of the SSGAN discriminator is given by
    $$L_D = L_{D,label} + L_{D,unlabel} \tag{30}$$

    where $L_{D,label}$ and $L_{D,unlabel}$ are given by Equations (16) and (18), respectively. The loss function of the SSGAN generator is given by Equation (24).
  • ACGAN-SG: This stands for an auxiliary classifier GAN based on spectral normalization and gradient penalty [31]. Table 5 and Table 6 show the detailed architectures of the ACGAN-SG discriminator and the ACGAN-SG generator, respectively. The loss function of the ACGAN-SG discriminator is given by
    $$L_D = L_{D,label} + \lambda_{Wasserstein} \cdot L_{D,Wasserstein} \tag{31}$$

    where $L_{D,label}$ and $L_{D,Wasserstein}$ are given by Equations (16) and (22), respectively. We assumed $\lambda_{Wasserstein} = 10$. The loss function of the ACGAN-SG generator is defined by Equation (24).
  • MaskedGAN: This stands for a masked GAN consisting of a discriminator and a generator [28]. Table 2 and Table 7 show the detailed architectures of the MaskedGAN discriminator and the MaskedGAN generator, respectively. The loss function of the MaskedGAN discriminator is given by
    $$L_D = L_{D,label} + \lambda_{masked} \cdot L_{D,masked} \tag{32}$$

    where $L_{D,label}$ and $L_{D,masked}$ are given by Equations (16) and (19), respectively. We assumed $R_{mask} = 0.2$ and $\lambda_{masked} = 3$. The loss function of the MaskedGAN generator is defined by
    $$L_G = L_{G,data} + \lambda_{masked}^{(fake\,data)} \cdot \left( f_{BCE}\{\mathbf{v}', \mathbf{1}\} + f_{CE}\{\hat{\mathbf{y}}', \mathbf{y}'\} \right) \tag{33}$$

    where $L_{G,data}$ is given by Equation (24). Herein, $\mathbf{v}'$ and $\hat{\mathbf{y}}'$ are defined by

    $$(\mathbf{v}', \hat{\mathbf{y}}') = D\!\left( f_M^{(fake\,data)}\{ G(\mathbf{z}, \mathbf{y}') \} \right) \tag{34}$$

    where $f_M^{(fake\,data)}$ denotes a function that randomly selects a number of elements of the input tensor according to the ratio $R_{mask}^{(fake\,data)}$ and masks them to zero. We assumed $R_{mask}^{(fake\,data)} = 0.2$ and $\lambda_{masked}^{(fake\,data)} = 1.5$.
One of the representative non-machine-learning methods for measuring mobile speed is to use the Doppler frequency location with the largest PSD value [42]; once the Doppler spread index is estimated this way, the mobile speed can be found from Equation (14). We refer to this traditional method as TRAD and included it in our simulations for reference alongside the machine learning methods described above.
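A minimal sketch of TRAD, assuming the averaged PSD vector s^(m) of Section 2 is available as an array:

```python
import numpy as np

def trad_speed(s, f0=2.4e9, T=1.0, c0=3e8):
    """TRAD baseline: take the Doppler index with the largest average PSD [42]
    and convert it to speed via Equation (14); s is the vector s^(m)."""
    u_d = int(np.argmax(s))
    return (c0 / (f0 * T)) * u_d * 3.6   # m/s -> km/h (0.45 km/h per index)
```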
For the simulations, we assumed $N_{Class} = 100$, $N_z = 100$, $N_{Epoch} = 200$, $N_{Batch} = 8$, $N_{Train} = N_{Valid} = n \times N_{Batch}$, and $N_M = 2 N_{Train} = 2 N_{Valid}$. Note that $n$ is a positive integer that represents the number of mini-batch data groups. With these settings, the number of labeled data required for training and validating (i.e., $N_M$) can be expressed as $N_M = 16 n$.
Figure 4 compares the losses of the discriminator and generator in four SM-ACGAN cases when n = 20 . The first was the case using the proposed SM-ACGAN. The second was the case where SM-ACGAN was used when R m a s k of Equation (20) was set to a fixed value of 0.2. The third was the case where SM-ACGAN was used when the term in Equation (25) was set to zero. The fourth was the case of using SM-ACGAN without data augmentation, which meant that unlabeled data and labeled fake data generated by the generator were not used for the training. It can be seen from Figure 4 that the losses of the discriminator and generator in the first SM-ACGAN case converged far faster and more stably than the losses of the discriminator and generator in the second SM-ACGAN case. This result implies that varying the masking ratio according to Equation (20) can make training of the discriminator and generator more stable than when using a fixed masking ratio. The generator loss of the first SM-ACGAN case converged faster and more stably than the generator loss of the fourth SM-ACGAN case. This result implies that data augmentation with unlabeled data and labeled fake data generated by the generator can stabilize the generator training. For epoch numbers larger than 100, the loss variances of the discriminator and generator in the first case were 0.0043 and 0.0013, respectively, whereas the loss variances of the discriminator and generator in the third case were 0.0082 and 0.0064, respectively.
This implies that regularizing training the generator by using L G , p o w e r in Equation (25) can reduce the error fluctuation in the fake data generated by the generator and lead to more stable classification performance of the discriminator.
Figure 5 compares the mobile speed RMSEs of the four SM-ACGAN cases defined in Figure 4. From the simulation results in Figure 5, we can see the following. Firstly, the proposed SM-ACGAN outperformed the SM-ACGAN that used a fixed masking ratio. The mean and variance of the RMSEs from the proposed SM-ACGAN were 8.633 and 0.042, whereas the mean and variance of the RMSEs from the SM-ACGAN with a fixed masking ratio were 8.851 and 0.136. The fact that the proposed SM-ACGAN had a smaller mean and a smaller variance verified that using a masking ratio that varied according to Equation (20) enabled mobile speed measurement with higher accuracy and higher stability. Secondly, the proposed SM-ACGAN outperformed the SM-ACGAN that did not use the $L_{G,power}$ term in Equation (25). The mean and variance of the RMSEs from the proposed SM-ACGAN were 8.633 and 0.043, whereas the mean and variance of the RMSEs from the SM-ACGAN that did not use the $L_{G,power}$ term in Equation (25) were 8.832 and 0.162. This verified that regularizing the generator with the maximum ratio of the average power of the fake data generated by the generator to the average power of the noise latent variables could reduce the error fluctuation in the generated fake data and lead to improved classification performance of the discriminator. Thirdly, the proposed SM-ACGAN outperformed the SM-ACGAN with $N_{Unlabel} = N_{Gen} = 0$. The mean and variance of the RMSEs from the proposed SM-ACGAN were 8.633 and 0.042, whereas the mean and variance of the RMSEs from the SM-ACGAN with $N_{Unlabel} = N_{Gen} = 0$ were 9.223 and 0.069. This confirmed that applying data augmentation to SM-ACGAN according to Equations (27) and (28) improved the classification performance of the discriminator.
Figure 6 compares the losses of the discriminator and generator of SM-ACGAN with those of ACGAN-SG, MaskedGAN, SSGAN, the CNN, and the DNN when n = 20 . It can be seen that the discriminator loss of SM-ACGAN tended to converge much faster and more stably than those of the CNN, DNN, and MaskedGAN. It can also be seen that the generator loss of SM-ACGAN converged much faster and more stably than that of SSGAN and ACGAN-SG. When n = 20 and the epoch number was greater than 100, the loss variances of SM-ACGAN, ACGAN-SG, MaskedGAN, SSGAN, the CNN, and the DNN are summarized in Table 8. Evaluating the results given in Table 8, we can see that SM-ACGAN had a more stable learning process than ACGAN-SG, MaskedGAN, SSGAN, the CNN, and the DNN in situations where labeled training data were limited.
Figure 7 compares the mobile speed RMSEs of SM-ACGAN with those of ACGAN-SG, MaskedGAN, SSGAN, the CNN, the DNN, and TRAD. From the figure, we can see that SM-ACGAN had an overall lower RMSE along the $n$-axis compared to the other machine learning methods. TRAD showed an average RMSE of 9.3944 along the $n$-axis. It can be seen that SM-ACGAN had better mobile speed measurement performance than TRAD even in situations where labeled training data were limited. The means and variances of the RMSEs from SM-ACGAN, ACGAN-SG, MaskedGAN, SSGAN, the CNN, and the DNN are shown in Table 9. It can be seen that SM-ACGAN had the smallest RMSE mean and the smallest RMSE variance compared with ACGAN-SG, MaskedGAN, SSGAN, the CNN, and the DNN. This indicates that the proposed SM-ACGAN can achieve higher accuracy and greater robustness than the other machine learning methods when measuring mobile speed with limited labeled training data.
The computer system utilized was Windows 11, equipped with a 3.5 GHz Intel Core i9-11900K CPU. The GPU employed for accelerating model training was an NVIDIA GeForce RTX 3090. The learning (i.e., training and validating) times of SM-ACGAN, ACGAN-SG, MaskedGAN, SSGAN, the CNN, and the DNN for one epoch with $n = 20$ were 13.5 s, 13.1 s, 10.6 s, 8.6 s, 4.9 s, and 2.4 s, respectively. It can be seen that SM-ACGAN and ACGAN-SG took more learning time than the other machine learning methods because they need additional time to compute the $f_{grad}$ function in Equation (22). However, since only a machine learning model that has completed training is used for actual measurement tasks, the longer training time is not a serious problem for the mobile speed measurement task.

6. Conclusions

We developed SM-ACGAN by combining the advantages of SSGAN, ACGAN-SG, and MaskedGAN and applying novel techniques of our design such as masking, regularization, and data augmentation. The proposed SM-ACGAN showed a lower mean error and smaller variability than other machine learning-based methods in measuring mobile speed. This suggests that appropriately combining GAN techniques developed to overcome image quality degradation caused by limited training data and enhancing them with our proposed techniques can improve GAN measurement accuracy and robustness for wireless communication systems using limited training data. The proposed SM-ACGAN can be widely utilized for network management and system operation in 5G and 6G environments, such as channel measurement, signal strength and quality measurement, terminal location measurement, network performance measurement, etc.

Author Contributions

Conceptualization, E.Y. and S.-Y.K.; methodology, E.Y.; software, E.Y.; validation, E.Y. and S.-Y.K.; formal analysis, E.Y.; investigation, E.Y.; resources, E.Y.; data curation, E.Y.; writing—original draft preparation, E.Y.; writing—review and editing, E.Y. and S.-Y.K.; visualization, E.Y.; supervision, E.Y. and S.-Y.K.; project administration, E.Y. and S.-Y.K.; funding acquisition, E.Y. and S.-Y.K. All authors have read and agreed to the published version of the manuscript.

Funding

National Research Foundation of Korea: 2021R1F1A1047578; Institute for Information and Communications Technology Planning and Evaluation: IITP-2024-RS-2023-00258639.

Data Availability Statement

Data are contained within the article.

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2021R1F1A1047578). This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2024-RS-2023-00258639) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation). This paper was written as part of Konkuk University’s research support program for its faculty on sabbatical leave in 2023.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Saad, W.; Bennis, M.; Chen, M. A vision of 6G wireless systems: Applications, trends, technologies, and open research problems. IEEE Netw. 2020, 34, 134–142. [Google Scholar] [CrossRef]
  2. Chen, M.; Challita, U.; Saad, W.; Yin, C.; Debbah, M. Artificial neural networks-based machine learning for wireless networks: A tutorial. IEEE Commun. Surv. Tutor. 2019, 21, 3039–3071. [Google Scholar] [CrossRef]
  3. Yao, F. Machine learning with limited data. arXiv 2021, arXiv:2101.11461. [Google Scholar] [CrossRef]
  4. Baddour, K.E.; Beaulieu, N.C. Nonparametric doppler spread estimation for flat fading channels. In Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), New Orleans, LA, USA, 28 May 2003. [Google Scholar]
  5. Morocho-Cayamcela, M.E.; Lee, H.; Lim, W. Machine learning for 5G/B5G mobile and wireless communications: Potential, limitations, and future directions. IEEE Access 2019, 7, 137184–137206. [Google Scholar] [CrossRef]
  6. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. In Proceedings of the International Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 8–13 December 2014; Available online: https://arxiv.org/pdf/1406.2661 (accessed on 1 January 2021).
  7. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  8. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  9. Chang, H.; Zhang, H.; Jiang, L.; Liu, C.; Freeman, W.T. MAGVIT: Masked generative video transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
  10. Zhou, P.; Xie, L.; Ni, B.; Tian, Q. CIPS-3D++: End-to-end real-time high-resolution 3D-aware GANs for GAN inversion and stylization. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 11502–11520. [Google Scholar] [CrossRef] [PubMed]
  11. Tao, M.; Bao, B.K.; Tang, H.; Xu, C. GALIP: Generative adversarial CLIPs for text-to-image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023. [Google Scholar]
  12. Frid-Adar, M.; Klang, E.; Amitai, M.; Goldberger, J.; Greenspan, H. Synthetic data augmentation using GAN for improved liver lesion classification. In Proceedings of the IEEE 15th International Symposium on Biomedical Imaging (ISBI), Washington, DC, USA, 4–7 April 2018. [Google Scholar]
  13. Engel, J.; Agrawal, K.K.; Chen, S.; Gulrajani, I.; Donahue, C.; Roberts, A. GANSynth: Adversarial neural audio synthesis. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  14. Musalamadugu, T.S.; Kannan, H. Generative AI for medical imaging analysis and applications. Future Med. AI 2023, 1, 1–8. [Google Scholar]
  15. Kwon, H.; Kim, Y.; Yoon, H.; Choi, D. CAPTCHA image generation systems using generative adversarial networks. IEICE Trans. Inf. Syst. 2018, 2, 543–546. [Google Scholar] [CrossRef]
  16. Karras, T.; Aittala, M.; Hellsten, J.; Laine, S.; Lehtinen, J.; Aila, T. Training generative adversarial networks with limited data. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 6–12 December 2020. [Google Scholar]
  17. Lu, H.; Lu, Y.; Jiang, D.; Szabados, S.R.; Sun, S.; Yu, Y. CM-GAN: Stabilizing GAN Training with Consistency Models. In Proceedings of the ICML 2023 Workshop on Architectured Probabilistic Inference & Generative Modeling, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
  18. Fernandez, D.L. GAN Convergence and Stability: Eight Techniques Explained. ML Blog, 17 May 2022. Available online: https://davidleonfdez.github.io/gan/2022/05/17/gan-convergence-stability.html (accessed on 1 January 2023).
  19. Mangalam, K.; Garg, R. Overcoming mode collapse with adaptive multi adversarial training. In Proceedings of the British Machine Vision Conference (BMVC), Virtual, 22–25 November 2021. [Google Scholar]
  20. Liu, B.; Zhu, Y.; Song, K.; Elgammal, A. Towards faster and stabilized GAN training for high-fidelity few-shot image synthesis. In Proceedings of the 9th International Conference on Learning Representations (ICLR), Virtual Conference, 3–7 May 2021. [Google Scholar]
  21. Yang, M.; Wang, Z.; Chi, Z.; Feng, W. WaveGAN: Frequency-aware GAN for high-fidelity few-shot image generation. In Proceedings of the 9th International Conference on Learning Representations (ICLR), Tel Aviv, Israel, 23–27 October 2022. [Google Scholar]
  22. Zhao, Y.; Chandrasegaran, K.; Abdollahzadeh, M.; Cheung, M.N. Few-shot image generation via adaptation-aware kernel modulation. In Proceedings of the Advances in Neural Information Processing Systems 35 (NeurIPS 2022), New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar]
  23. Zhao, Y.; Ding, H.; Huang, H.; Cheung, N.M. A closer look at few-shot image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
  24. Seo, J.; Kang, J.S.; Park, G.M. LFS-GAN: Lifelong few-shot image generation. In Proceedings of the International Conference on Computer Vision Conference (ICCV), Paris, France, 2–6 October 2023. [Google Scholar]
  25. Hou, L.; Cao, Q.; Yuan, Y.; Zhao, S.; Ma, C.; Pan, S.; Wan, P.; Wang, Z.; Shen, H.; Cheng, X. Augmentation-aware self-supervision for data-efficient GAN training. In Proceedings of the The 37th Annual Conference on Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 10–16 December 2023. [Google Scholar]
  26. Cao, S.; Yin, Y.; Huang, L.; Liu, Y.; Zhao, X.; Zhao, D.; Huang, K. Towards high-resolution image generation with efficient vision transformers. In Proceedings of the International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023. [Google Scholar]
  27. Saxena, D.; Cao, J.; Xu, J.; Kulshrestha, T. Re-GAN: Data-efficient GANs training via architectural reconfiguration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
  28. Huang, J.; Cui, K.; Guan, D.; Xiao, A.; Zhan, F.; Lu, S.; Liao, S.; Xing, E. Masked generative adversarial networks are data-efficient generation learners. In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 16–19 May 2022. [Google Scholar]
  29. Ni, Y.; Koniusz, P. CHAIN: Enhancing generalization in data-efficient GANs via lipsCHitz continuity constrAIned normalization. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024. [Google Scholar]
  30. Chen, T.; Cheng, Y.; Gan, Z.; Liu, J.; Wang, Z. Data-efficient GAN training beyond (just) augmentations: A lottery ticket perspective. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS), Virtual Conference, 6–14 December 2021. [Google Scholar]
  31. Liu, S.; Dou, L.; Jin, Q. Improved generative adversarial network for bearing fault diagnosis with imbalanced data. In Proceedings of the Sixth International Conference on Information Communication and Signal Processing (ICICSP), Xi’an, China, 23–25 September 2023. [Google Scholar]
  32. Miyato, T.; Kataoka, T.; Koyama, M.; Yoshida, Y. Spectral normalization for generative adversarial networks. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  33. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved training of Wasserstein GANs. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 5–7 December 2017. [Google Scholar]
  34. Zheng, Y.; Wang, D. An auxiliary classifier generative adversarial network based fault diagnosis for analog circuit. IEEE Access 2023, 11, 86824–86833. [Google Scholar] [CrossRef]
  35. Odena, A. Semi-supervised learning with generative adversarial networks. In Proceedings of the ICML Workshop on Data-Efficient Machine Learning, Marriott Marquis, NY, USA, 24 June 2016. [Google Scholar] [CrossRef]
  36. Huang, G.; Liu, Z.; Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  37. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  38. Abdi, A.; Zhang, H.; Tepedelenlioglu, C. A unified approach to performance analysis of speed estimation techniques in mobile communication. IEEE Trans. Commun. 2008, 56, 126–135. [Google Scholar] [CrossRef]
  39. Sadowski, J. Estimation of Rician K-factor values in urban terrain. In Proceedings of the 10th European Conference on Antennas and Propagation (EuCAP), Davos, Switzerland, 10–15 April 2016. [Google Scholar]
  40. Baddour, K.E.; Beaulieu, C.B. Autoregressive modeling for fading channel simulation. IEEE Trans. Wirel. Commun. 2005, 4, 1650–1662. [Google Scholar] [CrossRef]
  41. Abdi, A.; Kaveh, M. Parametric modeling and estimation of the spatial characteristics of a source with local scattering. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA, 13–17 May 2002. [Google Scholar]
  42. Zhang, H.; Abdi, A. Nonparametric mobile speed estimation in fading channels: Performance analysis and experimental results. IEEE Trans. Wirel. Commun. 2009, 8, 1683–1692. [Google Scholar] [CrossRef]
  43. Choi, Y.; Voltz, P.J.; Cassara, F.A. On channel estimation and detection for multicarrier signals in fast and selective Rayleigh fading channels. IEEE Trans. Commun. 2001, 49, 1375–1387. [Google Scholar] [CrossRef]
  44. Song, S.; Singer, A.C. Pilot-aided OFDM channel estimation in the presence of the guard band. IEEE Trans. Commun. 2007, 55, 1459–1465. [Google Scholar] [CrossRef]
  45. Ramos, A.R.; Silva, B.C.; Lourenco, M.S.; Teixeira, E.B.; Velez, F.J. Mapping between average SINR and supported throughput in 5G new radio small cell networks. In Proceedings of the 22nd International Symposium on Wireless Personal Multimedia Communications (WPMC), Lisbon, Portugal, 24–27 November 2019. [Google Scholar]
  46. Bengio, Y.; Courville, A.; Vincent, P. Representation learning: A review and new perspective. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828. [Google Scholar] [CrossRef] [PubMed]
  47. Geron, A. Hands-On Machine Learning With Scikit-Learn and Tensorflow: Concepts, Tools, and Techniques to Build Intelligent Systems, 1st ed.; O’Reilly: Sebastopol, CA, USA, 2017. [Google Scholar]
  48. Yoon, E.; Kim, J.; Yun, U. Doppler spread estimation for an OFDM system with a rayleigh fading channel. IEICE Trans. Commun. 2018, E101-B, 1328–1335. [Google Scholar] [CrossRef]
  49. LTE; Evolved Universal Terrestrial Radio Access (E-UTRA) and Evolved Packet Core (EPC); ETSI TS 136 508 V14.2.0 (2017-07). Available online: https://www.etsi.org/deliver/etsi_ts/136500_136599/136508/14.02.00_60/ts_136508v140200p.pdf (accessed on 10 June 2020).
  50. LTE Release 18 Description; TR 21.918 V1.0.0 (2024-07). Available online: https://www.3gpp.org/ftp/Specs/archive/21_series/21.918 (accessed on 10 August 2024).
  51. Lipton, Z.C.; Elkan, C.; Narayanaswamy, B. Thresholding classifiers to maximize F1 score. arXiv 2014, arXiv:1402.1892. [Google Scholar]
  52. Farhadpour, S.; Warner, T.A.; Maxwell, A.E. Selecting and interpreting multiclass loss and accuracy assessment metrics for classifications with class imbalance: Guidance and best practices. Remote Sens. 2024, 16, 533. [Google Scholar] [CrossRef]
  53. Odena, A.; Olah, C.; Shlens, J. Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the 34 th International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017. [Google Scholar]
  54. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  55. Mao, Q.; Lee, H.Y.; Tseng, H.Y.; Ma, S.; Yang, M.H. Mode seeking generative adversarial networks for diverse image synthesis. In Proceedings of the IEEE CVF Conference on Computewr Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
  56. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference for Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  57. Zhang, Z. Improved Adam optimizer for deep neural networks. In Proceedings of the IEEE/ACM 26th International Symposium On Quality Of Service (IWQoS), Banff, AB, Canada, 4–6 June 2018. [Google Scholar]
Figure 1. The difference between (a) the SM-ACGAN architecture and (b) the ACGAN-SG architecture.
Figure 2. The processes of computing (a) the loss function of the SM-ACGAN discriminator and (b) the loss function of the SM-ACGAN generator.
Figure 3. The training and validating processes of SM-ACGAN.
Figure 4. Comparison of the losses of the discriminator and generator in four SM-ACGAN cases when n = 20.
Figure 5. Comparison of the mobile speed's RMSEs of the discriminator and generator in four SM-ACGAN cases.
Figure 6. Comparison of the losses of the proposed SM-ACGAN with those of ACGAN-SG, MaskedGAN, SSGAN, the CNN, and the DNN when n = 20.
Figure 7. Comparison of the mobile speed's RMSEs of the proposed SM-ACGAN with those of ACGAN-SG, MaskedGAN, SSGAN, CNN, DNN, and TRAD.
Table 1. Six types of layer blocks.

Type-A Block (N_in, N_out):
  Conv2d (in_channels = N_in, out_channels = N_out, kernel = 4, stride = 2, padding = 1)
  SpectralNorm
  LeakyReLU (negative_slope = 0.2, inplace = True)
  Dropout (p = 0.1)

Type-B Block (N_in, N_out):
  Conv2d (in_channels = N_in, out_channels = N_out, kernel = 4, stride = 2, padding = 1)
  SpectralNorm
  InstanceNorm2d
  LeakyReLU (negative_slope = 0.2, inplace = True)
  Dropout (p = 0.1)

Type-C Block (N_in, N_out):
  ConvTranspose2d (in_channels = N_in, out_channels = N_out, kernel = 4, stride = 2, padding = 0)
  SpectralNorm
  InstanceNorm2d
  ReLU (inplace = True)
  Dropout (p = 0.1)

Type-D Block (N_in, N_out):
  ConvTranspose2d (in_channels = N_in, out_channels = N_out, kernel = 4, stride = 2, padding = 1)
  SpectralNorm
  InstanceNorm2d
  ReLU (inplace = True)
  Dropout (p = 0.1)

Type-E Block (N_in, N_out):
  ConvTranspose2d (in_channels = N_in, out_channels = N_out, kernel = 4, stride = 2, padding = 1)
  Tanh

Type-F Block (N_in, N_out):
  Linear (in_features = N_in, out_features = N_out)
  BatchNorm1d
  ReLU (inplace = True)
  Dropout (p = 0.1)
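Read row by row, each block in Table 1 maps onto a small PyTorch module stack. The following is a minimal sketch of the six builders; the kernel, stride, padding, slope, and dropout settings follow the table, while realizing SpectralNorm as a weight-reparameterization wrapper around the preceding convolution (rather than a standalone layer) is an implementation assumption.

```python
# Minimal PyTorch builders for the Table 1 layer blocks (a sketch).
import torch.nn as nn
from torch.nn.utils import spectral_norm

def block_a(n_in, n_out):
    return nn.Sequential(
        spectral_norm(nn.Conv2d(n_in, n_out, kernel_size=4, stride=2, padding=1)),
        nn.LeakyReLU(negative_slope=0.2, inplace=True),
        nn.Dropout(p=0.1),
    )

def block_b(n_in, n_out):
    return nn.Sequential(
        spectral_norm(nn.Conv2d(n_in, n_out, kernel_size=4, stride=2, padding=1)),
        nn.InstanceNorm2d(n_out),
        nn.LeakyReLU(negative_slope=0.2, inplace=True),
        nn.Dropout(p=0.1),
    )

def block_c(n_in, n_out):  # padding = 0 upsamples a 1x1 input to 4x4
    return nn.Sequential(
        spectral_norm(nn.ConvTranspose2d(n_in, n_out, kernel_size=4, stride=2, padding=0)),
        nn.InstanceNorm2d(n_out),
        nn.ReLU(inplace=True),
        nn.Dropout(p=0.1),
    )

def block_d(n_in, n_out):  # padding = 1 doubles the spatial size
    return nn.Sequential(
        spectral_norm(nn.ConvTranspose2d(n_in, n_out, kernel_size=4, stride=2, padding=1)),
        nn.InstanceNorm2d(n_out),
        nn.ReLU(inplace=True),
        nn.Dropout(p=0.1),
    )

def block_e(n_in, n_out):  # output block; tanh maps values into [-1, 1]
    return nn.Sequential(
        nn.ConvTranspose2d(n_in, n_out, kernel_size=4, stride=2, padding=1),
        nn.Tanh(),
    )

def block_f(n_in, n_out):  # fully connected block used by the DNN baseline
    return nn.Sequential(
        nn.Linear(n_in, n_out),
        nn.BatchNorm1d(n_out),
        nn.ReLU(inplace=True),
        nn.Dropout(p=0.1),
    )
```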
Table 2. The discriminator architectures of SM-ACGAN and MaskedGAN.

Component Name | Input Size | Output Size
Input image | Not Available | [N_Batch, 1, 64, 64]
Type-A block (1, 64) | [N_Batch, 1, 64, 64] | [N_Batch, 64, 32, 32]
Type-B block (64, 128) | [N_Batch, 64, 32, 32] | [N_Batch, 128, 16, 16]
Type-B block (128, 256) | [N_Batch, 128, 16, 16] | [N_Batch, 256, 8, 8]
Type-B block (256, 512) | [N_Batch, 256, 8, 8] | [N_Batch, 512, 4, 4]
Type-B block (512, 1024) | [N_Batch, 512, 4, 4] | [N_Batch, 1024, 2, 2]
Reshape operator | [N_Batch, 1024, 2, 2] | [N_Batch, 4096]
Linear layer (4096, 1) | [N_Batch, 4096] | [N_Batch, 1]
Sigmoid | [N_Batch, 1] | [N_Batch, 1]
Mask layer | [N_Batch, 1] | [N_Batch, 1] (validity flag)
Linear layer (4096, 100) | [N_Batch, 4096] | [N_Batch, 100] (label class)
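Assembled from the Type-A/Type-B builders sketched after Table 1, the Table 2 discriminator is a shared convolutional trunk feeding a masked real/fake head and a 100-way speed-class head. A minimal sketch follows; the optional mask argument is a hypothetical stand-in for the adaptive mask layer, whose actual rule (masking that adapts to the real/fake ratio of the input data) is defined in the main text, not here.

```python
# Sketch of the Table 2 discriminator, reusing block_a / block_b from above.
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, n_classes=100):
        super().__init__()
        self.trunk = nn.Sequential(
            block_a(1, 64),      # [N, 1, 64, 64]  -> [N, 64, 32, 32]
            block_b(64, 128),    # -> [N, 128, 16, 16]
            block_b(128, 256),   # -> [N, 256, 8, 8]
            block_b(256, 512),   # -> [N, 512, 4, 4]
            block_b(512, 1024),  # -> [N, 1024, 2, 2]
        )
        self.validity_head = nn.Sequential(nn.Linear(4096, 1), nn.Sigmoid())
        self.class_head = nn.Linear(4096, n_classes)

    def forward(self, x, mask=None):
        h = self.trunk(x).reshape(x.size(0), -1)  # [N, 4096]
        validity = self.validity_head(h)          # [N, 1] real/fake score
        if mask is not None:                      # placeholder for the adaptive mask layer
            validity = validity * mask
        return validity, self.class_head(h)       # validity flag, label-class logits
```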
Table 3. The generator architecture of SM-ACGAN.

Component Name | Input Size | Output Size
Noise generator | Not Available | [N_Batch, 100, 1, 1]
Type-C block (100, 64) | [N_Batch, 100, 1, 1] | [N_Batch, 64, 4, 4]
Label one-hot encoder | [N_Batch, 1] | [N_Batch, 100, 1, 1]
Label expander | [N_Batch, 100, 1, 1] | [N_Batch, 100, 4, 4]
Tensor concatenator | [N_Batch, 64, 4, 4], [N_Batch, 100, 4, 4] | [N_Batch, 164, 4, 4]
Type-D block (164, 192) | [N_Batch, 164, 4, 4] | [N_Batch, 192, 8, 8]
Type-D block (192, 128) | [N_Batch, 192, 8, 8] | [N_Batch, 128, 16, 16]
Type-D block (128, 64) | [N_Batch, 128, 16, 16] | [N_Batch, 64, 32, 32]
Type-E block (64, 1) | [N_Batch, 64, 32, 32] | [N_Batch, 1, 64, 64] (fake image)
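The Table 3 generator upsamples the 100-dimensional noise tensor to a 4 × 4 feature map, concatenates it with the one-hot speed label broadcast to the same spatial size, and upsamples the result to a 1 × 64 × 64 fake image. A minimal sketch, reusing the Table 1 builders:

```python
# Sketch of the Table 3 generator, reusing block_c / block_d / block_e from above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    def __init__(self, z_dim=100, n_classes=100):
        super().__init__()
        self.n_classes = n_classes
        self.stem = block_c(z_dim, 64)  # [N, 100, 1, 1] -> [N, 64, 4, 4]
        self.body = nn.Sequential(
            block_d(164, 192),          # -> [N, 192, 8, 8]
            block_d(192, 128),          # -> [N, 128, 16, 16]
            block_d(128, 64),           # -> [N, 64, 32, 32]
            block_e(64, 1),             # -> [N, 1, 64, 64], tanh output
        )

    def forward(self, z, labels):
        h = self.stem(z)                                            # [N, 64, 4, 4]
        onehot = F.one_hot(labels.view(-1), self.n_classes).float() # [N, 100]
        onehot = onehot.view(-1, self.n_classes, 1, 1)              # one-hot encoder
        onehot = onehot.expand(-1, -1, 4, 4)                        # label expander
        return self.body(torch.cat([h, onehot], dim=1))             # [N, 1, 64, 64]

# Usage: fake = Generator()(torch.randn(8, 100, 1, 1), torch.randint(0, 100, (8,)))
```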
Table 4. The discriminator architecture of the DNN.

Component Name | Input Size | Output Size
Input image | Not Available | [N_Batch, 1, 64, 64]
Image flattener | [N_Batch, 1, 64, 64] | [N_Batch, 4096]
Type-F block (4096, 4096) | [N_Batch, 4096] | [N_Batch, 4096]
Type-F block (4096, 2048) | [N_Batch, 4096] | [N_Batch, 2048]
Type-F block (2048, 1024) | [N_Batch, 2048] | [N_Batch, 1024]
Type-F block (1024, 1024) | [N_Batch, 1024] | [N_Batch, 1024]
Linear layer (1024, 100) | [N_Batch, 1024] | [N_Batch, 100] (label class)
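The Table 4 DNN baseline is a plain fully connected classifier over the flattened 64 × 64 input. A minimal sketch using the Type-F builder from above:

```python
# Sketch of the Table 4 DNN baseline, reusing block_f from above.
import torch.nn as nn

dnn = nn.Sequential(
    nn.Flatten(),          # [N, 1, 64, 64] -> [N, 4096]
    block_f(4096, 4096),
    block_f(4096, 2048),
    block_f(2048, 1024),
    block_f(1024, 1024),
    nn.Linear(1024, 100),  # label-class logits over the 100 speed classes
)
```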
Table 5. The discriminator architectures of the CNN, SSGAN, and ACGAN-SG.

Component Name | Input Size | Output Size
Input image | Not Available | [N_Batch, 1, 64, 64]
Type-A block (1, 64) | [N_Batch, 1, 64, 64] | [N_Batch, 64, 32, 32]
Type-B block (64, 128) | [N_Batch, 64, 32, 32] | [N_Batch, 128, 16, 16]
Type-B block (128, 256) | [N_Batch, 128, 16, 16] | [N_Batch, 256, 8, 8]
Type-B block (256, 512) | [N_Batch, 256, 8, 8] | [N_Batch, 512, 4, 4]
Type-B block (512, 1024) | [N_Batch, 512, 4, 4] | [N_Batch, 1024, 2, 2]
Reshape operator | [N_Batch, 1024, 2, 2] | [N_Batch, 4096]
Linear layer (4096, 1) | [N_Batch, 4096] | [N_Batch, 1]
Sigmoid | [N_Batch, 1] | [N_Batch, 1] (validity flag)
Linear layer (4096, 100) | [N_Batch, 4096] | [N_Batch, 100] (label class)
Table 6. The generator architectures of SSGAN and ACGAN-SG.

Component Name | Input Size | Output Size
Noise generator | Not Available | [N_Batch, 100, 1, 1]
Type-C block (100, 64) | [N_Batch, 100, 1, 1] | [N_Batch, 64, 4, 4]
Label one-hot encoder | [N_Batch, 1] | [N_Batch, 100, 1, 1]
Label expander | [N_Batch, 100, 1, 1] | [N_Batch, 100, 4, 4]
Tensor concatenator | [N_Batch, 64, 4, 4], [N_Batch, 100, 4, 4] | [N_Batch, 164, 4, 4]
Type-D block (164, 192) | [N_Batch, 164, 4, 4] | [N_Batch, 192, 8, 8]
Type-D block (192, 128) | [N_Batch, 192, 8, 8] | [N_Batch, 128, 16, 16]
Type-D block (128, 64) | [N_Batch, 128, 16, 16] | [N_Batch, 64, 32, 32]
Type-E block (64, 1) | [N_Batch, 64, 32, 32] | [N_Batch, 1, 64, 64] (fake image)
Table 7. The generator architecture of MaskedGAN.

Component Name | Input Size | Output Size
Noise generator | Not Available | [N_Batch, 100, 1, 1]
Type-C block (100, 64) | [N_Batch, 100, 1, 1] | [N_Batch, 64, 4, 4]
Label one-hot encoder | [N_Batch, 1] | [N_Batch, 100, 1, 1]
Label expander | [N_Batch, 100, 1, 1] | [N_Batch, 100, 4, 4]
Tensor concatenator | [N_Batch, 64, 4, 4], [N_Batch, 100, 4, 4] | [N_Batch, 164, 4, 4]
Type-D block (164, 192) | [N_Batch, 164, 4, 4] | [N_Batch, 192, 8, 8]
Type-D block (192, 128) | [N_Batch, 192, 8, 8] | [N_Batch, 128, 16, 16]
Type-D block (128, 64) | [N_Batch, 128, 16, 16] | [N_Batch, 64, 32, 32]
Type-E block (64, 1) | [N_Batch, 64, 32, 32] | [N_Batch, 1, 64, 64]
Mask layer | [N_Batch, 1, 64, 64] | [N_Batch, 1, 64, 64] (fake image)
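Table 7 differs from Table 6 only in the final mask layer applied to the generated image. The sketch below illustrates the idea with a simple Bernoulli pixel mask; this is a placeholder assumption, since the actual masking scheme is the one proposed for MaskedGAN in [28].

```python
# Illustrative output mask (a sketch, not the MaskedGAN masking rule of [28]).
import torch

def mask_output(fake_imgs, keep_prob=0.9):
    """Zero out a random subset of pixels in the generated fake images."""
    mask = (torch.rand_like(fake_imgs) < keep_prob).float()
    return fake_imgs * mask
```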
Table 8. Comparison of the loss variances of SM-ACGAN, ACGAN-SG, MaskedGAN, SSGAN, the CNN, and the DNN when n = 20 and the epoch number was greater than 100.

Metric | SM-ACGAN | ACGAN-SG | MaskedGAN | SSGAN | CNN | DNN
Variance of D losses | 0.0043 | 0.0070 | 0.3074 | 0.0024 | 0.0879 | 2.0230
Variance of G losses | 0.0013 | 0.0175 | 0.0023 | 0.0974 | - | -
Table 9. Comparison of the means and variances of the mobile speed's RMSEs from SM-ACGAN, ACGAN-SG, MaskedGAN, SSGAN, CNN, and DNN.

Metric | SM-ACGAN | ACGAN-SG | MaskedGAN | SSGAN | CNN | DNN
Mean of RMSEs | 8.6331 | 9.0445 | 8.9815 | 8.8999 | 9.5086 | 79.5434
Variance of RMSEs | 0.0430 | 0.0813 | 0.1162 | 0.1034 | 0.0492 | 0.0614
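For reference, the statistics in Tables 8 and 9 reduce to simple sample moments over the collected training losses and per-trial RMSEs. A sketch, with illustrative stand-in data (the variable names and array contents here are assumptions, not the paper's recorded values):

```python
# Sketch of the Table 8 / Table 9 statistics from logged losses and RMSEs.
import numpy as np

d_losses = np.random.rand(200)    # stand-in: one discriminator loss per epoch
rmses = 8.5 + np.random.rand(20)  # stand-in: one mobile-speed RMSE per trial

loss_var = d_losses[100:].var()   # loss variance over epochs > 100 (Table 8)
rmse_mean = rmses.mean()          # mean of RMSEs (Table 9)
rmse_var = rmses.var()            # variance of RMSEs (Table 9)
print(loss_var, rmse_mean, rmse_var)
```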