1. Introduction
Current QKD network implementations rely on classical wired fiber channels, which makes it difficult to distribute quantum keys directly to mobile devices and imposes limitations in terms of distance, cost, and reliability [1]. QKD over wireless channels is therefore of great significance, as it can overcome these challenges and limitations. Moreover, QKD over wireless channels can protect wireless communication against future quantum attacks, since conventional encryption algorithms may be vulnerable to quantum computers. It can also enable secure communication between mobile users, enhancing the flexibility and coverage of the network [2]. Research on and development of QKD over wireless channels is therefore both necessary and urgent.
In wireless communication, spoofing and eavesdropping are two common types of attack. QKD itself can effectively prevent eavesdropping [3], but spoofing attacks pose a security risk to the wireless access of QKD networks, so new security solutions are needed. Physical Layer Authentication (PLA) is an effective authentication scheme against spoofing attacks; it is fast and lightweight, which makes it suitable for wireless communication. Recently, PLA has commonly been combined with machine learning (ML) algorithms for device authentication. PLA schemes can be categorized into several research directions according to the source of the authentication evidence [4], among which schemes based on hardware component characteristics and schemes based on channel characteristics are the most representative. Hardware-based methods use hardware features of devices to verify the identity of a User Equipment (UE) [5]; they are costly and sensitive to hardware variations. Channel-based methods use features such as the Channel Impulse Response (CIR), Carrier Frequency Offset (CFO), or Channel State Information (CSI) to authenticate devices [6], but they depend on stable and distinguishable channel features.
Multi-User Multiple-Input Multiple-Output (MU-MIMO) greatly improves spectral efficiency by means of techniques such as spatial multiplexing and beamforming. Spatial multiplexing allows multiple data streams, called layers, to be transmitted over the same time-frequency resource. In multi-user scenarios, different UEs can transmit over different layers, and the Base Station (BS) reuses the data streams and assigns layers to ports according to the precoding matrix [7]. This changes the CSI observed at the BS's receiving antennas, which affects traditional authentication schemes. However, it also enriches the features available to the proposed FPLA mechanism, which is designed as a low-complexity, low-overhead solution that can handle different CSI dimensions and achieve high authentication accuracy in MU-MIMO systems.
In this paper, a Flexible Physical Layer Authentication (FPLA) mechanism against spoofing attacks is proposed for wireless access to QKD networks, enabling authentication under different antenna diversity conditions. The mechanism adapts flexibly to different antenna diversity and is robust to the antenna-layer heterogeneity caused by changes in the number of UEs. It uses the CSI fed back in the Up-Link (UL) of Time Division Duplex (TDD) systems to train the authentication classifier. The motivation is that the CSI reported by UEs exhibits unique characteristics due to factors such as multipath effects and delay distortions, which vary with the UE's location. Because CSI is a physical-layer attribute, it is inherently difficult to tamper with or forge, which offers robust security properties. In 5G TDD systems, the same frequency band is used for both UL and Down-Link (DL) transmissions. The wireless channel is therefore reciprocal, which allows the DL beam to be determined from the Sounding Reference Signal (SRS) transmitted by UEs. Consequently, device authentication based on SRS is of significant engineering practicality. The contributions of this paper are as follows:
A Dimensional Transformation Residual Network (DTRN) model and a dimensional transformation block (D-block) are proposed to prevent loss of detection accuracy caused by the variation in CSI dimensions;
A DTRN-based FPLA mechanism is proposed, which dynamically adapts to the antenna diversity in 5G MU-MIMO systems and is compatible with wireless access to QKD networks;
Authentication performance is evaluated under a time-varying Clustered Delay Line (CDL) channel model in the 5G FR1 n78 band (3.5 GHz), which aligns with the frequency allocation of China Telecom's 5G QKD network, including the impact of antenna diversity, multi-user access, and Signal-to-Noise Ratio (SNR).
The rest of this paper is organized as follows:
Section 2 provides an overview of the related work;
Section 3 introduces the system model;
Section 4 presents the framework and workflow of FPLA;
Section 5 describes the structure and implementation of DTRN;
Section 6 presents the simulation results, and
Section 7 concludes this paper.
3. System Model
The system model proposed in this paper is based on the practical scenario requirements of China Telecom's QKD networks. As shown in Figure 1, the base station acts as the legitimate receiver (Alice) and runs the embedded FPLA mechanism to authenticate UEs attempting to access the network. The legitimate quantum UE (Bob), an unspecified number of malicious UEs (Eve), and an unspecified number of regular UEs attempt to connect to the base station.
Alice and Bob are assumed to be at fixed positions. Although the locations of the malicious UEs vary randomly, the distance between Eve and Bob is assumed to always be greater than half a wavelength, so that their wireless physical-layer characteristics are independent. The system model operates in a TDD scenario. Based on the SRSs transmitted by users, Alice estimates the channel of each user. These channel estimates are then used both to authenticate users and to calculate the Physical Down-link Shared Channel (PDSCH) parameters for legitimate users. This transmission scheme is commonly referred to as reciprocity-based transmission.
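To make the half-wavelength separation condition concrete at the 3.5 GHz n78 carrier, the short sketch below computes λ/2 and checks a pair of positions against it. The positions and the helper names `half_wavelength` and `is_separated` are hypothetical and serve only as an illustration.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def half_wavelength(carrier_hz: float) -> float:
    """Return lambda/2 in metres for a given carrier frequency."""
    return SPEED_OF_LIGHT / carrier_hz / 2.0


def is_separated(bob_xy, eve_xy, carrier_hz: float = 3.5e9) -> bool:
    """True if Eve is farther than lambda/2 from Bob (hypothetical 2-D positions in metres)."""
    distance = math.hypot(eve_xy[0] - bob_xy[0], eve_xy[1] - bob_xy[1])
    return distance > half_wavelength(carrier_hz)


print(f"lambda/2 at 3.5 GHz = {half_wavelength(3.5e9) * 100:.2f} cm")  # 4.28 cm
print(is_separated((0.0, 0.0), (0.05, 0.0)))  # True: 5 cm > 4.28 cm
```

At 3.5 GHz, therefore, the assumption only requires Eve to be a few centimetres away from Bob.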
Assume that Alice has a fixed number of antenna ports and that the maximum number of antenna ports of each UE is randomly drawn from {1, 2, 4}. The randomness reflects the inherent characteristics of different UEs, which may not all support the same number of antennas. To avoid confusion, in this paper CSI refers specifically to the Channel State Information obtained by channel estimation based on the SRS transmitted by UEs in reciprocity-based transmission; it contains both the channel estimate and the noise. Alice requires each UE to use, in subsequent transmissions, the same ports that it used to transmit the SRS.
The proposed scheme is applied during the UL transmission of the UE. Alice uses the channel estimates contained in the CSI of UE i as training samples or as the basis for authentication. The mathematical description of the system model is given below.
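First, following the forward-link model of a multi-user system [27], the transmission on each resource block can be expressed as

$$\mathbf{y} = \sqrt{P_i}\,\mathbf{H}_i\mathbf{x}_i + \mathbf{n}, \tag{1}$$

where $\mathbf{y} \in \mathbb{C}^{N_A \times 1}$ represents the received signal of Alice; $P_i$ denotes the power level of the SRS transmitted by UE i; $\mathbf{H}_i \in \mathbb{C}^{N_A \times N_{P_i}}$ is the channel response matrix; $\mathbf{x}_i \in \mathbb{C}^{N_{P_i} \times 1}$ represents the signal transmitted from the $N_{P_i}$ antenna ports of UE i; and $\mathbf{n} \in \mathbb{C}^{N_A \times 1}$ denotes the Additive White Gaussian Noise (AWGN), assumed to be independently and identically distributed with covariance matrix $\sigma^2\mathbf{I}_{N_A}$. In addition, $N_{RB}$ represents the number of resource blocks and $N_A$ represents the number of Alice's antennas.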
Let the vector $\mathbf{N}_P(t)$ represent the numbers of ports of all UEs in the system at time t. It can be expressed as

$$\mathbf{N}_P(t) = \left[N_{P_1}, N_{P_2}, \ldots, N_{P_K}\right], \tag{2}$$

where $N_{P_i}$ denotes the number of antenna ports of device i and K is the number of UEs in the system at time t. Note that zero-forcing (ZF) precoding is employed in this model, so the number of a UE's SRS ports equals the number of its physical antenna ports. The diversity in the number of antennas arises from the inherent variations among different UEs. Based on the scenario assumptions, $N_{P_i}$ is subject to the following constraint:

$$N_{P_i} \in \{1, 2, 4\}. \tag{3}$$
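The multi-port reception by Alice can lead to inter-port interference during the UE's transmission, which requires the UE to perform precoding during UL transmission. For simplicity, the precoding matrix is included in the channel response matrix $\mathbf{H}_i$ in Equation (1). To improve computational efficiency, Alice is assumed to employ a linear receiver, and the inter-port interference is mitigated by a Zero Forcing (ZF) receiver using the channel estimate [28]. The method is based on

$$\mathbf{W} = \left(\hat{\mathbf{H}}_i^{H}\hat{\mathbf{H}}_i\right)^{-1}\hat{\mathbf{H}}_i^{H}, \tag{4}$$

where $\mathbf{W} \in \mathbb{C}^{N_{P_i} \times N_A}$ ($N_A$ receive antennas, $N_{P_i}$ ports of UE i) is the weight matrix of Alice's linear receiver, which computes the receiver output by weighting the received signal $\mathbf{y}$. $\hat{\mathbf{H}}_i$ is the channel response matrix obtained by Alice through channel estimation based on the received SRS. The notation $\mathbf{A}^{H}$ denotes the Hermitian transpose of a complex matrix $\mathbf{A}$. Using Equations (1) and (4), the output $\hat{\mathbf{x}}_i$ of the ZF receiver can be calculated as

$$\hat{\mathbf{x}}_i = \mathbf{W}\mathbf{y} = \sqrt{P_i}\,\mathbf{W}\mathbf{H}_i\mathbf{x}_i + \mathbf{W}\mathbf{n}, \tag{5}$$

which reduces to $\sqrt{P_i}\,\mathbf{x}_i + \mathbf{W}\mathbf{n}$ when $\hat{\mathbf{H}}_i = \mathbf{H}_i$.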
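The ZF weighting can be checked numerically with a dependency-free sketch. It builds W = (Ĥ^H Ĥ)^{-1} Ĥ^H for a random 4 × 2 complex channel and recovers the transmitted port symbols. The sketch assumes a perfect channel estimate (Ĥ = H), no noise, and hypothetical sizes (4 receive antennas, 2 UE ports). The helper names are illustrative only.

```python
import random
from typing import List

Matrix = List[List[complex]]


def hermitian(a: Matrix) -> Matrix:
    """Conjugate transpose A^H."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product A B."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (small square matrices only)."""
    n = len(a)
    aug = [list(row) + [complex(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def zf_weights(h_hat: Matrix) -> Matrix:
    """ZF weight matrix W = (H^H H)^{-1} H^H, of size N_P x N_A."""
    h_herm = hermitian(h_hat)
    return matmul(inverse(matmul(h_herm, h_hat)), h_herm)


# Hypothetical example: N_A = 4 receive antennas at Alice, N_P = 2 SRS ports at UE i.
rng = random.Random(0)
N_A, N_P, P_I = 4, 2, 1.0
h = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N_P)] for _ in range(N_A)]
x = [[1 + 0j], [-1 + 0j]]                              # symbols sent on the two ports
y = [[P_I ** 0.5 * row[0]] for row in matmul(h, x)]    # noiseless y = sqrt(P_i) H x
w = zf_weights(h)                                      # perfect estimate: H_hat = H
x_hat = matmul(w, y)                                   # recovers sqrt(P_i) x
```

Because W H equals the identity, each port's symbol is recovered without inter-port interference. With noise present, the residual term W n remains and can be amplified when H is ill-conditioned.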
Alice can continuously collect the channel information reported by each UE, enabling both reciprocity-based TDD transmission and device authentication. Let $\mathbf{F}$ represent the feature vector composed of the features collected by Alice from the different UEs. Then, $\mathbf{F}$ can be represented as

$$\mathbf{F} = \left[\mathbf{F}_1, \mathbf{F}_2, \ldots, \mathbf{F}_{M+1}\right]. \tag{6}$$

According to the assumption, the system consists of one legitimate UE and M malicious UEs. The feature $\mathbf{F}_i$ of UE i can be expressed as

$$\mathbf{F}_i = \left[\hat{\mathbf{H}}_{i,1}, \hat{\mathbf{H}}_{i,2}, \ldots, \hat{\mathbf{H}}_{i,N_{RB}}\right], \tag{7}$$

where $\hat{\mathbf{H}}_{i,k}$ is the channel estimate of UE i on the k-th resource block. The vector $\mathbf{F}_i$ extracted by Alice is stored as a three-dimensional complex vector in this model, with $\mathbf{F}_i \in \mathbb{C}^{N_{RB} \times N_A \times N_{P_i}}$. After the training phase, Alice continues to collect feature vectors, which are fed into the trained DTRN to determine whether a UE is legitimate. If the UE is deemed legitimate, Alice grants network access; otherwise, the network rejects the access request directly. In addition, the physical environment between Alice and Bob remains stable, whereas Bob and Eve are at different locations. Owing to multiple antennas and multipath fading, Bob's physical-layer features therefore contain location-specific information that is independent of Eve's [29, 30] and can serve as authentication evidence.
5. Dimensional Transformation Residual Network (DTRN)
Before introducing the structure of DTRN, it is necessary to analyze the SRS and the feature vectors extracted from it. These vectors are the inputs of the DTRN network and are crucial for understanding how it operates. In the NR standard, the maximum number of SRS layers supported is 4, and the SRS is mapped to the antennas through spatial filters. In the resource grid, the SRS occupies a comb structure in the frequency domain. The base sequence of the SRS is derived from the extended Zadoff–Chu sequence [31]. The base sequence is used to generate the SRS signal, and this generation process runs continuously during the network access procedure. For multi-port SRS, the NR standard specifies cyclic shifts of the SRS signal. A cyclic shift in the time domain is equivalent to a phase rotation in the frequency domain, so the different logical ports can be effectively distinguished.
Table 1 shows the phase rotation method for SRS at
.
Owing to the properties of the extended Zadoff–Chu sequence, a continuous cyclic phase rotation does not change the orthogonality between two signals, although it introduces cyclic amplitude fluctuations in the time domain [32]. These fluctuations have no substantial impact on the processing of the sequence, which still maintains perfect autocorrelation properties in the frequency domain.
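The following stdlib-only sketch illustrates both properties described above: per-port phase rotations of a cyclically extended Zadoff–Chu sequence remain mutually orthogonal, and a frequency-domain phase ramp corresponds to a cyclic shift in the time domain. The values used here are illustrative assumptions rather than the paper's configuration. They are a sequence length of 48, root q = 1, and 8 available cyclic shifts with shifts 0, 2, 4, 6 for four ports, which resembles a comb-2 setup.

```python
import cmath
import math


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


def zc_base_sequence(length: int, q: int = 1) -> list:
    """Cyclically extended Zadoff-Chu sequence x_q(n mod N_ZC), N_ZC = largest prime < length."""
    n_zc = max(p for p in range(2, length) if is_prime(p))
    return [cmath.exp(-1j * math.pi * q * (n % n_zc) * (n % n_zc + 1) / n_zc)
            for n in range(length)]


def phase_rotate(seq, n_cs: int, n_cs_max: int = 8) -> list:
    """Per-port phase ramp exp(j*alpha*n) with alpha = 2*pi*n_cs/n_cs_max."""
    alpha = 2 * math.pi * n_cs / n_cs_max
    return [cmath.exp(1j * alpha * n) * s for n, s in enumerate(seq)]


def inner(a, b) -> complex:
    return sum(u * v.conjugate() for u, v in zip(a, b))


def idft(seq) -> list:
    """Naive inverse DFT, used only to inspect the time-domain version."""
    n = len(seq)
    return [sum(seq[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


M_SC, N_CS_MAX = 48, 8  # hypothetical sequence length and number of cyclic shifts
base = zc_base_sequence(M_SC)
# Four ports with n_cs_i = n_cs_max * p / 4 (base shift 0): shifts 0, 2, 4, 6
ports = [phase_rotate(base, N_CS_MAX * p // 4, N_CS_MAX) for p in range(4)]

# Frequency-domain phase ramp == time-domain cyclic shift (n_cs = 2 -> 12 samples here)
td_base, td_port1 = idft(base), idft(ports[1])
shift = 2 * M_SC // N_CS_MAX
print(all(abs(td_port1[t] - td_base[(t + shift) % M_SC]) < 1e-9 for t in range(M_SC)))  # True
```

Orthogonality follows because every element has unit modulus, so the inner product of two ports reduces to a sum of a phase ramp over a whole number of periods, which is zero.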
In addition, FPLA can adjust its processing policy according to the number of elements in the feature vector, denoted as E. FPLA operates in the 5G n78 band with a channel bandwidth of 20 MHz and a Sub-Carrier Spacing (SCS) of 15 kHz. According to the NR standard, the maximum number of resource blocks $N_{RB}$ supported by this configuration is 106. The number of elements $E_i$ of the feature $\mathbf{F}_i$ extracted from UE i can be obtained from Equation (7) as

$$E_i = N_{RB} \times N_A \times N_{P_i}, \tag{8}$$

where $N_A$ is the number of Alice's antennas and $N_{P_i}$ is the number of ports of UE i. Therefore, once the band allocation is completed, $E_i$ depends solely on $N_A$ and $N_{P_i}$. Because the UEs in an MU-MIMO system differ in their number of antenna ports, $E_i$ is not known in advance, and Alice must handle it specifically to accommodate the DTRN network.
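A quick arithmetic check of the feature sizes, assuming E_i = N_RB × N_A × N_Pi as implied by the three-dimensional shape of F_i, with N_RB = 106 taken from the text. The value N_A = 4 is hypothetical. The snippet also confirms that 106 resource blocks of 12 subcarriers at 15 kHz fit within the 20 MHz channel.

```python
N_RB = 106                        # 20 MHz channel, 15 kHz SCS (from the text)
SCS_HZ, SUBCARRIERS_PER_RB = 15e3, 12
N_A = 4                           # hypothetical number of Alice's antenna ports


def num_elements(n_rb: int, n_a: int, n_p: int) -> int:
    """Complex elements in F_i, assuming E_i = N_RB * N_A * N_Pi."""
    return n_rb * n_a * n_p


sizes = {n_p: num_elements(N_RB, N_A, n_p) for n_p in (1, 2, 4)}
print(sizes)  # {1: 424, 2: 848, 4: 1696}, i.e. the 1:2:4 ratio

occupied_mhz = N_RB * SUBCARRIERS_PER_RB * SCS_HZ / 1e6
print(occupied_mhz)  # 19.08 MHz occupied inside the 20 MHz channel
```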
Based on Equations (2), (3) and (8), once the number of antenna ports $N_A$ at Alice is determined, the features extracted from UEs take three distinct shapes. The third dimension is determined by the inherent attributes of the UEs and takes values from the set $\{1, 2, 4\}$. In other words, the vector $\mathbf{F}_i$ can take one of three forms: $\mathbb{C}^{N_{RB} \times N_A \times 1}$, $\mathbb{C}^{N_{RB} \times N_A \times 2}$, or $\mathbb{C}^{N_{RB} \times N_A \times 4}$.
The three forms differ significantly in their number of elements, with a ratio of 1:2:4.
Figure 3 shows the modulus of each form, illustrating the differences in their shapes.
A common approach to handling feature vectors is to construct a CSI image [24]. However, this method maps channel feature vectors of a fixed shape into a format suitable for CNN inputs, and therefore has limitations when the feature vectors extracted from an MU-MIMO system have different dimensions. Typical CNN architectures such as LeNet, AlexNet, VGG, and ResNet expect inputs in the form of grayscale images (each of shape $1 \times H \times W$) or RGB images (each of shape $3 \times H \times W$). To fit this input format, the method maps the feature vectors from the complex domain to the real domain: the real and imaginary parts are treated as two color channels, while the other two dimensions, H and W, correspond to the number of resource blocks and the number of antenna ports, respectively. After this processing, the shape of the CSI image becomes $2 \times H \times W$, which satisfies the input requirements of the CNN.
However, this way of handling feature vectors does not adapt well to MU-MIMO systems. According to Equation (8), a change in the number of device antennas causes a significant change in the number of feature elements $E_i$, which in turn changes the shape of the CSI image substantially. Since the output size of a convolutional layer changes with its input size, the output of the convolutional (or pooling) layer no longer matches the input of the fully connected layer, and the network cannot function properly. Hence, when the number of antennas in the system varies, the CSI image method cannot flexibly handle the variation. Moreover, modifying the CNN's own parameters to adapt to different inputs would significantly increase the system overhead, which runs counter to the original intention of physical layer authentication. The DTRN network proposed in this section addresses this issue of abrupt changes in the shape of the feature vectors.
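The size mismatch can be made concrete with the standard convolution output-size formula. This sketch makes several assumptions that are not taken from the paper. The CSI-image width is taken to span N_A × N_Pi antenna-port pairs, and N_A = 4. The front end is a single 3 × 3 convolution with 16 output channels and no padding, followed by 2 × 2 max pooling. Under these assumptions, the flattened vector passed to the fully connected layer has a different length for 1-, 2-, and 4-port UEs.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def csi_image_shape(n_rb: int, n_a: int, n_p: int) -> tuple:
    """2 channels (real, imaginary); width assumed to span N_A * N_P antenna-port pairs."""
    return (2, n_rb, n_a * n_p)


def flattened_size(shape, out_channels: int = 16, kernel: int = 3, pool: int = 2) -> int:
    """Length fed to the fully connected layer after one conv + one max-pool (hypothetical)."""
    _, h, w = shape
    h, w = conv_out(h, kernel), conv_out(w, kernel)          # 3x3 conv, no padding
    h, w = conv_out(h, pool, pool), conv_out(w, pool, pool)  # 2x2 max-pool, stride 2
    return out_channels * h * w


N_RB, N_A = 106, 4
sizes = {n_p: flattened_size(csi_image_shape(N_RB, N_A, n_p)) for n_p in (1, 2, 4)}
print(sizes)  # {1: 832, 2: 2496, 4: 5824}
```

A fully connected layer built for one of these lengths cannot accept the other two, which is the failure mode described above.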
The D-block is incorporated at the input of the DTRN network to handle the shape-variable input vectors $\mathbf{F}_i$. As shown in Figure 4, it first processes the feature vector $\mathbf{F}_i$ with a Transformation layer and a ports-mapping layer. The processed vector retains both the structural characteristics and all data elements, while also meeting the parameter requirements of the subsequent network. The Transformation layer first records the size of each dimension and unfolds the complex dimension of the three-dimensional complex vector as well as its two antenna dimensions. It then checks the number of elements $E_i$ to decide whether insufficient data must be augmented. Data augmentation replicates the channel estimates of existing ports, adds noise, and fills in the gaps.
After the Transformation layer, $\mathbf{F}_i$ is converted into a real vector of fixed shape. Based on the dimensional information recorded earlier, the ports-mapping layer then maps this real vector, antenna by antenna, to the input channels of the first convolutional layer. The real vector is mapped to a 4-channel convolutional input, and a convolutional layer then extracts its structural features. The convolutional layer is followed by a batch normalization layer (BatchNorm layer) and a max pooling layer (Maxpool layer), which help prevent overfitting and aggregate information for the subsequent layers of the network.
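One possible reading of the Transformation and ports-mapping layers can be sketched in plain Python. Complex values are unfolded into the antenna axis, with real parts first and then imaginary parts. Missing ports are filled by cyclically replicating existing ports and adding small Gaussian noise. The port axis then becomes the 4-channel input of the first convolution. The layout, noise level, replication order, function names, and N_A = 4 are all assumptions made for illustration, not the paper's implementation.

```python
import random

TARGET_PORTS = 4  # the text maps every feature to a 4-channel convolutional input


def transformation_layer(f, noise_std=0.01, rng=None):
    """Hypothetical Transformation layer.

    f: nested list [n_rb][n_a][n_p] of complex channel estimates, n_p in {1, 2, 4}.
    Returns a real tensor [n_rb][2*n_a][4] and the recorded dimensions."""
    if rng is None:
        rng = random.Random(0)
    n_rb, n_a, n_p = len(f), len(f[0]), len(f[0][0])
    out = []
    for r in range(n_rb):
        rows = []
        # Unfold the complex dimension into the antenna axis: real parts, then imaginary parts
        for a in range(2 * n_a):
            vals = [f[r][a % n_a][p].real if a < n_a else f[r][a % n_a][p].imag
                    for p in range(n_p)]
            # Augmentation: replicate existing ports cyclically and add small noise
            vals += [vals[p % n_p] + rng.gauss(0.0, noise_std)
                     for p in range(n_p, TARGET_PORTS)]
            rows.append(vals)
        out.append(rows)
    return out, (n_rb, n_a, n_p)


def ports_mapping_layer(real):
    """Move the port axis to the channel axis: [rb][ant][port] -> [port][rb][ant]."""
    n_rb, n_ant, n_ch = len(real), len(real[0]), len(real[0][0])
    return [[[real[r][a][c] for a in range(n_ant)] for r in range(n_rb)]
            for c in range(n_ch)]


rng = random.Random(1)
N_RB, N_A = 106, 4  # N_A is a hypothetical value
for n_p in (1, 2, 4):
    f = [[[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_p)]
          for _ in range(N_A)] for _ in range(N_RB)]
    z = ports_mapping_layer(transformation_layer(f)[0])
    print(n_p, (len(z), len(z[0]), len(z[0][0])))  # same (4, 106, 8) for every n_p
```

Whatever the number of UE ports, the convolutional input has the same shape, so the layers after the D-block never see a size change.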
The D-block exploits the spatial invariance of convolution to preserve structural features, while the Transformation layer modifies the dimensions to meet the requirements of the subsequent network. Let $\mathbf{Z}$ represent the input vector of the convolutional layer. It can be understood as a three-dimensional vector composed of resource grids, antennas, and SRS ports. The complex dimension of the original feature vector $\mathbf{F}_i$ is unfolded into the antenna dimension of $\mathbf{Z}$, and the third dimension of $\mathbf{Z}$ can be regarded abstractly as a multi-port representation of the time-frequency resources. The ports-mapping layer is thus similar to the SRS logical port generation process.
In addition, the D-block makes FPLA highly scalable, since adapting to a new configuration only requires generic modifications to the Transformation layer within the D-block. A single deep learning network can therefore be used for authentication across various channel bandwidths and network standards.
The structure of DTRN is shown in Figure 5. The D-block, four R-blocks, and one Output-block together form the main structure of the DTRN network. The design incorporates residual connections, proposed by He et al. [33], which effectively mitigate problems such as vanishing gradients in deep neural networks and are employed in the R-blocks.
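The standard residual-learning argument, which is generic and not specific to DTRN, shows why the shortcut counters vanishing gradients. For a block with input x, residual branch F, and loss ℒ, the Jacobian of the output always contains an identity term. The gradient therefore reaches x directly even when ∂F/∂x becomes small.

```latex
\mathbf{y} = \mathcal{F}(\mathbf{x}) + \mathbf{x}
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial \mathbf{x}}
  = \frac{\partial \mathcal{L}}{\partial \mathbf{y}}
    \left( \mathbf{I} + \frac{\partial \mathcal{F}(\mathbf{x})}{\partial \mathbf{x}} \right)
```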
There are two types of R-block in DTRN. The first (R-block-1) does not contain a $1 \times 1$ convolutional layer, whereas the others (R-block-2, R-block-3, and R-block-4) each contain a $1 \times 1$ convolutional layer that adjusts the number of channels, ensuring that the vector on the residual bypass and the vector transformed by the C-block belong to the same vector space. The Output-block contains a global pooling layer, a fully connected layer, and a Softmax layer, which aggregate information, perform classification, and map the output, respectively. Ultimately, DTRN outputs a probability distribution representing the legitimacy of the UE, which determines whether the UE corresponding to the input vector is legitimate. The four R-blocks use convolutional layers with different parameters, which are listed in Table 2.
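A toy pure-Python sketch illustrates two ideas from this paragraph. First, a 1 × 1 convolution on the shortcut changes the channel count without changing the spatial size, so that it can be added to the C-block output. Second, the Output-block turns the merged feature maps into a probability pair. Several elements are hypothetical: the channel counts, all weights, the stand-in C-block output, the use of global average rather than max pooling, and the class order [legitimate, illegitimate]. The real layer parameters are those given in Table 2.

```python
import math


def conv1x1(x, weights):
    """1x1 convolution: per-position linear map across channels.
    x: [C_in][H][W], weights: [C_out][C_in] -> [C_out][H][W]."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wc[ci] * x[ci][i][j] for ci in range(len(x))) for j in range(w)]
             for i in range(h)] for wc in weights]


def add_relu(a, b):
    """Residual merge: elementwise sum followed by ReLU."""
    return [[[max(0.0, u + v) for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def global_avg_pool(x):
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def output_block(x, fc_w, fc_b):
    """Global pooling -> fully connected -> softmax over two hypothetical classes."""
    pooled = global_avg_pool(x)
    logits = [sum(wi * pi for wi, pi in zip(row, pooled)) + b for row, b in zip(fc_w, fc_b)]
    return softmax(logits)


x = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]   # C_in = 2, H = W = 2
branch = [[[0.1, 0.1], [0.1, 0.1]] for _ in range(3)]      # stand-in C-block output, C_out = 3
proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                # hypothetical 1x1 weights (C_out x C_in)
y = add_relu(branch, conv1x1(x, proj))                      # shortcut now has 3 channels
probs = output_block(y, fc_w=[[0.2, -0.1, 0.3], [-0.2, 0.1, -0.3]], fc_b=[0.0, 0.0])
print(probs)  # two probabilities summing to 1
```

Without the projection, the 2-channel shortcut could not be added to the 3-channel branch. This is the role that the text assigns to the 1 × 1 layer in R-block-2 to R-block-4.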