Communication

Secure Transmission for RIS-Assisted Downlink Hybrid FSO/RF SAGIN: Sum Secrecy Rate Maximization

1
School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710129, China
2
College of Information Science and Engineering, Jiaxing University, Jiaxing 314001, China
*
Author to whom correspondence should be addressed.
Drones 2025, 9(3), 198; https://doi.org/10.3390/drones9030198
Submission received: 29 December 2024 / Revised: 19 February 2025 / Accepted: 6 March 2025 / Published: 10 March 2025
(This article belongs to the Special Issue Advances in UAV Networks Towards 6G)

Abstract

This paper proposes a novel reconfigurable intelligent surface (RIS)-assisted downlink hybrid free-space optics (FSO)/radio frequency (RF) space–air–ground integrated network (SAGIN) architecture, where the high altitude platform (HAP) converts the optical signal sent by the satellite into an electrical signal through optoelectronic conversion. The drone equipped with RIS dynamically adjusts the signal path to serve ground users, thereby addressing communication challenges caused by RF link blockages from clouds or buildings. To improve the security performance of SAGIN, this paper maximizes the sum secrecy rate (SSR) by optimizing the power allocation, RIS phase shift, and drone trajectory. Then, an alternating iterative framework is proposed for a joint solution using the simulated annealing algorithm, semi-definite programming, and the designed deep deterministic policy gradient (DDPG) algorithm. The simulation results show that the proposed scheme can significantly enhance security performance. Specifically, compared with the NOMA and SDMA schemes, the SSR of the proposed scheme is increased by 39.7% and 286.7%, respectively.

1. Introduction

1.1. Background

As the demand for high-bandwidth and high-speed communication continues to grow, space–air–ground integrated networks (SAGIN) have emerged as a new communication architecture that addresses the limitations of traditional terrestrial networks, such as coverage gaps, capacity bottlenecks, and communication challenges in remote areas [1]. SAGIN is a multi-level network architecture that integrates satellites, air platforms, and ground equipment to provide wider, more efficient, and more reliable communication coverage through multi-platform collaboration. The high-altitude platform (HAP) plays a crucial role in SAGIN by efficiently connecting satellites with ground users, enhancing network coverage, transmission efficiency, and communication robustness [2,3]. However, signal degradation and eavesdropping affect the communication links between HAP and ground nodes, resulting in unstable communication and significantly increasing the risk of information leakage.
To address these challenges, the hybrid free-space optics (FSO)/radio frequency (RF) mode has attracted widespread attention [4]. FSO links utilize the high bandwidth characteristics of optical signals to provide high-speed and secure communications, while RF links ensure stable transmission under harsh weather conditions [5]. By employing FSO links for data transmission between satellite and HAP, and using RF links to enhance communication reliability between HAP and ground users, the hybrid FSO/RF mode not only improves communication quality and reliability but also enhances security to some extent, reducing the risk of information leakage, especially in complex environments and under potential eavesdropping threats.

1.2. Related Works

The FSO/RF-based satellite-to-ground network combines the advantages of FSO and RF communications and can provide high data rates and strong anti-interference capabilities [6,7]. Research on HAP-based hybrid FSO/RF SAGIN has made significant progress. In [8], the authors verified the energy-efficiency advantages of a HAP-assisted hybrid RF/FSO satellite-to-ground multicast network by deriving expressions for the outage probability and bit error rate. For the HAP-assisted satellite-to-ground network, the authors of [9] proposed a strategy that dynamically selects between FSO and RF links according to channel quality to reduce the impact of channel fading and improve communication reliability. In addition, they analyzed the impact of weather attenuation, spatial loss, and beam divergence on FSO links in the uplink and downlink networks. However, the RF link between the HAP and ground users is susceptible to factors such as multipath fading, clouds, and terrain, which seriously restricts communication quality and overall performance. Therefore, effectively improving the communication quality between HAPs and ground users has become a key challenge for improving the performance of SAGIN.
To address the fading issue of RF links, reconfigurable intelligent surface (RIS), which has the advantages of high coverage and flexible deployment, is introduced into SAGIN as an emerging technology. RIS can adjust the phase of reflective elements to reconstruct the propagation path of signals, enhancing communication quality and effectively suppressing noise interference [10]. Initially, RIS is placed on building surfaces to enable the hybrid RF/FSO-assisted uplink and downlink communication transmission [11]. However, fixed-position deployment may limit RIS from obtaining the optimal LoS path and achieving wide-area coverage. Gradually, RIS-integrated drones have received more and more attention. Drones carrying RIS can dynamically adjust communication links to provide better signal coverage and communication performance in complex terrain and dynamic environments [12,13]. Leveraging the advantages of RIS, the authors derived closed-form expressions for the end-to-end signal-to-noise ratio to analyze the performance of RIS-assisted uplink FSO/RF space–ground communication under fixed relay gain and different decoding schemes. Additionally, the impacts of the number of RIS elements and imperfect channel state information were considered [14]. Moreover, existing studies show that integrating RIS with HAP-assisted hybrid FSO/RF SAGIN can improve resource utilization and communication reliability, and deep reinforcement learning algorithms can be used to optimize RIS phase shifts and transmit beams, thereby significantly improving communication energy efficiency [15,16].
With the increasing demand for multi-user access, the resource allocation problem in SAGIN has become increasingly complex. Traditional multiple access technologies, such as NOMA and SDMA, can effectively improve the overall communication performance of the system, but they ignore the security issues in the communication process [17,18]. Rate splitting multiple access (RSMA), a multiple access technology based on partial decoding, effectively improves spectrum efficiency through message splitting and is particularly suitable for scenarios with high interference or large-scale communications. Physical Layer Security (PLS) leverages the physical properties of the channel to ensure that legitimate receivers can accurately decode confidential information while effectively suppressing eavesdropping by unauthorized users, thereby enhancing system security and data confidentiality [19]. However, existing research mainly focuses on improving the communication capacity and energy efficiency of hybrid FSO/RF SAGIN, with limited studies addressing the security enhancement in multi-user communications, especially in scenarios involving large-scale access and complex security requirements. Therefore, while ensuring high-speed communication in SAGIN, combining hybrid FSO/RF with PLS technology to optimize resource allocation and enhance security performance has become a critical issue that needs to be addressed.

1.3. Motivations and Contributions

Although existing research has made some progress in hybrid FSO/RF-based SAGIN, several challenges to performance improvement remain. First, RIS-assisted hybrid FSO/RF and RSMA strategies have seen only limited application in SAGIN; as a result, the signal attenuation problem in long-distance communication is overlooked and the optimization potential of SAGIN in high-capacity and multi-user scenarios is restricted. Second, current studies mainly focus on performance analysis of SAGIN’s capacity and energy efficiency [15,16,17,18], while the security of multi-user communication in SAGIN remains largely unaddressed; especially in large-scale access and complex dynamic environments, the confidentiality of communication still faces challenges. In addition, traditional optimization algorithms exhibit limitations in dynamic SAGIN environments, making it difficult to adapt to time-varying network states and large-scale access demands. Deep learning algorithms should be employed to achieve real-time optimization based on dynamic environmental changes [20,21,22,23,24,25,26,27,28,29].
Inspired by the above, this paper combines RIS-assisted hybrid FSO/RF with RSMA technology and applies it to SAGIN. Leveraging signal path reconstruction and flexible access strategies significantly enhances the performance of the SAGIN system in large-scale user access and high-bandwidth demand scenarios. The proposed architecture has great application potential in civilian and military fields. For example, in the civilian field, it can be applied to fields that require reliable and secure communications, such as disaster recovery, remote communications, and smart transportation. In the military field, the architecture can enhance the security performance of battlefield management, surveillance, and reconnaissance, and achieve eavesdropping protection in hostile environments. In addition, this paper innovatively applies PLS technology to SAGIN, effectively preventing eavesdropping through the dynamic location deployment of drones, thereby enhancing the overall security performance of the system. The comparison between the model designed in this paper and the existing architecture is shown in Table 1. The main contributions of this paper are as follows.
  • We propose a novel architecture of RIS-assisted hybrid FSO/RF downlink SAGIN, in which drones dynamically adjust the RIS deployment positions to overcome communication issues caused by RF link blockages due to clouds or buildings. To accurately model the FSO and RF links under various weather conditions, the Málaga fading and the Nakagami-m model are employed. In addition, we adopt the RSMA strategy to achieve flexible access, thereby providing high-quality communication services for multiple users.
  • For the SSR maximization problem, we solve it by optimizing the power allocation coefficient, RIS phase shifts, and drone trajectory. Specifically, we employ the simulated annealing (SA) algorithm to optimize power allocation and combine semi-definite programming (SDP) and penalty algorithms to obtain the optimal RIS phase shifts. Then, the designed DDPG algorithm interacts with the dynamic environment to optimize the drone’s flight trajectory. Finally, an alternating iterative framework is proposed to achieve joint optimization.
  • Finally, extensive simulations are conducted to validate the superiority of the proposed scheme. The simulation results demonstrate that the proposed scheme effectively improves the SAGIN’s security performance. Compared with the NOMA and SDMA schemes, the SSR of the proposed scheme increases by 39.7% and 286.7%, respectively.

2. System Model and Problem Formulation

Figure 1 illustrates an RIS-assisted downlink hybrid FSO/RF space–air–ground integrated network, which utilizes a HAP and a drone equipped with RIS as air relays to forward information, thus achieving secure satellite-to-ground communication. Considering that the satellite-to-ground link is susceptible to the impact of dense buildings, atmospheric turbulence, and severe weather during transmission, the satellite first sends information to the HAP through the FSO link, which can provide high-speed and stable optical communication. Then, HAP uses the RF link to transmit information to ground users through the low-altitude drone, solving the problem that clouds may block the links from HAP to ground users. In addition, HAP adopts the RSMA strategy for information transmission to solve multi-user communication interference and improve spectrum efficiency. The drone carrying RIS acts as a mobile reflection relay, which can optimize the signal propagation path and provide high-quality communication services for multiple ground users through location deployment.
In the proposed network, the RIS is equipped with $M$ passive reflective elements, whose reflection characteristics can be controlled by adjusting the phase shift matrix containing all elements. The reflection phase shift matrix is denoted as $\Phi = \mathrm{diag}(e^{j\vartheta_1}, e^{j\vartheta_2}, \ldots, e^{j\vartheta_M})$, $\vartheta_m \in [0, 2\pi)$. The dynamic position deployment of the drone is optimized and adjusted based on pre-collected information and real-time transmission status feedback, such as the spatial distribution of ground users and the channel status. To obtain the optimal trajectory of the drone while enhancing signal transmission and reducing eavesdropping, the flight time $T$ is discretized into time slots. The position of the drone in the $t$th time slot is represented as $q_u[t] = [x[t], y[t], H_R]$, and the initial position is $q_0 = [x_0, y_0, H_R]$. The location coordinates of the satellite, HAP, and eavesdropper are represented as $Z_s = [x_s, y_s, H_S]$, $Z_a = [x_a, y_a, H_A]$, and $Z_e = [x_e, y_e, 0]$, respectively.

2.1. FSO Model

The satellite first uses the FSO link to send information to the HAP through optical communication. In the first-hop communication, the signal received by the HAP is
$$ y_H = \sqrt{P_s}\, G_{SH}\, x_s + n_H, $$
where $P_s$ is the satellite’s transmission power, and $x_s$ is the satellite’s transmission signal, which satisfies $\mathbb{E}[|x_s|^2] = 1$. $n_H$ represents the additive white Gaussian noise (AWGN) at the HAP receiver and obeys $n_H \sim \mathcal{CN}(0, \sigma_H^2)$. $G_{SH}$ is the channel fading coefficient of the FSO link, which is affected by path loss ($g_{FSO}$), atmospheric turbulence ($g_a$), and pointing error ($g_p$). The fading coefficient of the FSO channel is expressed as
$$ G_{SH} = g_{FSO}\, g_a\, g_p, $$
where the path loss of the FSO link is defined as
$$ g_{FSO} = \frac{L_T + L_R - L_S - L_E - L_M}{2}, $$
where $L_T$ and $L_R$ are the transmitting antenna gain and the receiving antenna gain, respectively, $L_S$ is the loss of free space transmission, and $L_E$ and $L_M$ are the lens loss and the system margin, respectively. The units of the above variables are dB.
To account for the influence of atmospheric turbulence during communication, we employ Málaga fading, which adapts to various turbulence conditions, rather than lognormal fading (suitable for weak turbulence) or the Gamma-Gamma distribution (suitable for moderate and strong turbulence). The Málaga fading includes three parts: the line-of-sight (LoS) transmission component, the component coupled to the LoS, and the independent scattering component. The probability density function of the Málaga fading can be denoted as
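$$ f_{g_a}(g) = A_1 \sum_{i=1}^{\beta} B_1\, g^{\frac{\alpha+i}{2}-1} K_{\alpha-i}\!\left( 2\sqrt{\frac{\alpha \beta g}{k_1 \beta + P_I}} \right), $$
where
$$ A_1 = \frac{2\alpha^{\frac{\alpha}{2}}}{k_1^{1+\frac{\alpha}{2}}\, \Gamma(\alpha)} \left( \frac{k_1 \beta}{k_1 \beta + P_I} \right)^{\frac{\alpha}{2}+\beta}, $$
$$ B_1 = \binom{\beta-1}{i-1} \frac{(k_1 \beta + P_I)^{1-\frac{i}{2}}}{(i-1)!} \left( \frac{P_I}{k_1} \right)^{i-1} \left( \frac{\alpha}{\beta} \right)^{\frac{i}{2}}, $$
where $\alpha$ represents the number of effective large-scale units in the scattering process, and $\beta$ is the fading parameter. $K_{\alpha-i}(x)$ is the modified Bessel function of the second kind of order $\alpha-i$, defined as $K_{\alpha-i}(x) = \int_0^{\infty} e^{-x\cosh t} \cosh\big((\alpha-i)t\big)\, dt$, where $\cosh t = \frac{e^{t} + e^{-t}}{2}$ is the hyperbolic cosine function and $x = 2\sqrt{\frac{\alpha \beta g}{k_1 \beta + P_I}}$. The average power $2b$ of the total scattered component and the fraction $\varepsilon$ of the scattered power coupled to the LoS component determine the average power of the scattered component $k_1$, that is, $k_1 = 2b(1-\varepsilon)$, $\varepsilon \in [0, 1]$. The average power coupled to the LoS component is $P_I = P_L + 2b\varepsilon + 2\sqrt{2b\varepsilon P_L}\cos(\theta_1 - \theta_2)$, where $P_L$ is the average power of the LoS component, and $\theta_1$ and $\theta_2$ are the deterministic angles of the LoS component and of the component coupled to the LoS, respectively.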
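As a numerical illustration of the expressions above, the following minimal sketch evaluates $K_{\alpha-i}(x)$ by applying the trapezoidal rule to its integral representation and then assembles the Málaga PDF from $A_1$ and $B_1$. The function names, the truncation point `t_max`, the step count, and the parameter values in the demo are illustrative assumptions and are not taken from this paper; $\beta$ is assumed to be a positive integer and $\varepsilon < 1$.

```python
import math

def bessel_k(order, x, t_max=20.0, steps=20000):
    """K_order(x) = int_0^inf exp(-x*cosh(t)) * cosh(order*t) dt (trapezoidal rule).

    t_max and steps are assumed numerical settings; the integrand decays
    double-exponentially, so truncating at t_max = 20 is harmless for x > 0.
    """
    h = t_max / steps
    total = 0.0
    for n in range(steps + 1):
        t = n * h
        weight = 0.5 if n in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(order * t)
    return total * h

def malaga_pdf(g, alpha, beta, b, eps, p_los, theta1=0.0, theta2=0.0):
    """Malaga PDF f_{g_a}(g) built from A_1, B_1, k_1 and P_I as defined in the text."""
    k1 = 2.0 * b * (1.0 - eps)
    p_i = (p_los + 2.0 * b * eps
           + 2.0 * math.sqrt(2.0 * b * eps * p_los) * math.cos(theta1 - theta2))
    denom = k1 * beta + p_i
    a1 = (2.0 * alpha ** (alpha / 2.0)
          / (k1 ** (1.0 + alpha / 2.0) * math.gamma(alpha))
          * (k1 * beta / denom) ** (alpha / 2.0 + beta))
    x = 2.0 * math.sqrt(alpha * beta * g / denom)
    total = 0.0
    for i in range(1, beta + 1):
        b1 = (math.comb(beta - 1, i - 1) * denom ** (1.0 - i / 2.0)
              / math.factorial(i - 1)
              * (p_i / k1) ** (i - 1) * (alpha / beta) ** (i / 2.0))
        total += b1 * g ** ((alpha + i) / 2.0 - 1.0) * bessel_k(alpha - i, x)
    return a1 * total

if __name__ == "__main__":
    # Assumed example parameters (not from the paper).
    for g in (0.2, 0.5, 1.0, 2.0):
        print(g, malaga_pdf(g, alpha=4.2, beta=2, b=0.25, eps=0.5, p_los=0.5))
```

A quick sanity check for the quadrature is the closed form $K_{1/2}(x) = \sqrt{\pi/(2x)}\, e^{-x}$, which the integral should reproduce closely.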
The pointing error of the FSO link is a non-zero axis pointing error caused by the misalignment of the satellite’s launch angle with the HAP’s direction. Considering the different jitters of beam width, detector size, elevation angle, and horizontal displacement, as well as the influence of non-zero boresight error, the signal attenuation coefficient is defined as
$$ g_p \approx A_p \exp\!\left( -\frac{2r^2}{s_p^2} \right), $$
where $r$ is the radial offset of the pointing error, which represents the offset distance between the center of the beam emitted by the satellite and the receiving center of the HAP. $A_p$ is the maximum signal gain when there is no pointing error, that is, when the beam is perfectly aligned ($r = 0$). $s_p$ is the equivalent beam radius at the HAP, which depends on the divergence angle and propagation distance of the beam.
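To make the role of the radial offset concrete, the short sketch below evaluates the pointing-error attenuation and the composite FSO coefficient $G_{SH} = g_{FSO}\, g_a\, g_p$. It assumes that all three factors are supplied on a linear scale; the function names and the numeric values in the demo are hypothetical and chosen only for illustration.

```python
import math

def pointing_gain(r, a_p, s_p):
    """g_p ~= A_p * exp(-2 r^2 / s_p^2): attenuation for radial offset r."""
    return a_p * math.exp(-2.0 * r ** 2 / s_p ** 2)

def fso_coefficient(g_fso, g_a, g_p):
    """Composite FSO fading coefficient G_SH = g_FSO * g_a * g_p (linear scale assumed)."""
    return g_fso * g_a * g_p

if __name__ == "__main__":
    a_p, s_p = 0.8, 2.0  # assumed maximum gain and equivalent beam radius
    for r in (0.0, 0.5, 1.0, 2.0):
        print(r, pointing_gain(r, a_p, s_p))
```

With perfect alignment ($r = 0$) the gain equals $A_p$, and an offset equal to the equivalent beam radius ($r = s_p$) scales it by $e^{-2}$.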

2.2. RF Model

For the second-hop communication, HAP uses its own receiving optical aperture to combine and separate the received optical signals, converts them into RF signals, and then uses the RF link to forward the RF signals to the ground users through the UAV-RIS relay. In addition, the HAP adopts the RSMA strategy to divide the information into public information and private information of K users, reducing communication interference between multiple users. The received signal of the kth ground user is
$$ y_{U,k} = \xi \sqrt{P_s}\, G_{SH}\, H_{A,k} \left( \sqrt{\alpha_0}\, x_0 + \sum_{i=1}^{K} \sqrt{\alpha_i}\, x_i \right) + n_D, $$
where $\xi$ is the photoelectric conversion coefficient, and $x_0$ and $x_k$, $k \in \mathcal{K}$, are the public information and the private information of the $K$ users, respectively. $\alpha_0$ and $\alpha_k$ are the power allocation coefficients of the public information and the $k$th private information, respectively. $n_D$ is the noise at the user receiver and obeys $n_D \sim \mathcal{CN}(0, \sigma_D^2)$. $H_{A,k} = h_l \zeta_m h_{U,k}^H \Phi h_{AU}$ represents the channel from the HAP to ground user $k$ through the mobile drone, where the free space transmission loss of the RF link is related to the carrier frequency $f_R$ and the distance $d$ between the transmitter and the receiver, that is, $h_l = 0.5 \cdot (92.45 + 20 \lg f_R + 20 \lg d)$. $\zeta_m$ is a random number that follows the Nakagami-m distribution. $h_{U,k} \in \mathbb{C}^{M \times 1}$ and $h_{AU} \in \mathbb{C}^{M \times 1}$ are the RIS-to-ground-user and HAP-to-RIS channel vectors, respectively, which can be modeled as
$$ h_{U,k} = \sqrt{\chi d_{Uk}^{-2}}\; a_R(\theta_k, \varphi_k), \quad h_{AU} = \sqrt{\chi d_{AU}^{-2}}\; a_R(\theta_{AU}, \varphi_{AU}), \quad k \in \mathcal{K}, $$
where $\chi$ is the path loss at $d = 1$ m, and $d_{Uk}$ and $d_{AU}$ are the UAV-to-$k$th-user and HAP-to-UAV distances, respectively. $a_R$ is the receive antenna array response of the RIS, and the number of RIS reflective elements is $M = m_x m_y$. The array response is defined as
$$ a_R(\theta, \varphi) = \left[ 1, \ldots, e^{j\pi (m_x - 1)\cos\theta \sin\varphi} \right]^T \otimes \left[ 1, \ldots, e^{j\pi (m_y - 1)\sin\theta \sin\varphi} \right]^T, $$
$$ \cos\theta \sin\varphi = \frac{\Delta x}{d}, \quad \sin\theta \sin\varphi = \frac{\Delta y}{d}, $$
where $\theta$ and $\varphi$ represent the arrival azimuth and departure angle, respectively. $\Delta x$ represents the x-axis coordinate difference between the transmitter and the receiver, and $\Delta y$ represents the y-axis coordinate difference. Then, we have
$$ \cos\theta_k \sin\varphi_k = \frac{x[t] - x_k}{d_{Uk}}, \quad \sin\theta_k \sin\varphi_k = \frac{y[t] - y_k}{d_{Uk}}, $$
$$ \cos\theta_{AU} \sin\varphi_{AU} = \frac{x[t] - x_a}{d_{AU}}, \quad \sin\theta_{AU} \sin\varphi_{AU} = \frac{y[t] - y_a}{d_{AU}}. $$
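The geometric channel model above can be sketched in a few lines. The example builds the array response as a Kronecker product of two steering vectors, scales it by $\sqrt{\chi d^{-2}}$, and evaluates the cascaded term $h_{U,k}^H \Phi h_{AU}$ for a given set of phase shifts. It is a simplified illustration under stated assumptions: the factors $h_l$ and $\zeta_m$ are omitted, the sign convention of the steering exponent is assumed, and all function names and coordinates are hypothetical.

```python
import cmath
import math

def steering(num, direction_cosine):
    """[1, e^{j*pi*c}, ..., e^{j*pi*(num-1)*c}] for direction cosine c."""
    return [cmath.exp(1j * math.pi * n * direction_cosine) for n in range(num)]

def array_response(m_x, m_y, dx, dy, d):
    """Kronecker product of the x- and y-axis steering vectors (length m_x*m_y)."""
    a_x = steering(m_x, dx / d)  # cos(theta) sin(phi) = dx / d
    a_y = steering(m_y, dy / d)  # sin(theta) sin(phi) = dy / d
    return [u * v for u in a_x for v in a_y]

def link_vector(uav, node, m_x, m_y, chi):
    """sqrt(chi * d^-2) * a_R for the link between the UAV-RIS and a node."""
    dx, dy, dz = (uav[i] - node[i] for i in range(3))
    d = math.sqrt(dx * dx + dy * dy + dz * dz)
    amplitude = math.sqrt(chi) / d
    return [amplitude * a for a in array_response(m_x, m_y, dx, dy, d)], d

def cascaded_channel(h_u, h_au, phases):
    """h_u^H Phi h_au with Phi = diag(exp(j*phase_m))."""
    return sum(hu.conjugate() * cmath.exp(1j * p) * ha
               for hu, p, ha in zip(h_u, phases, h_au))

def aligned_phases(h_u, h_au):
    """Phase shifts in [0, 2*pi) that make all M reflected terms add in phase."""
    return [(-cmath.phase(hu.conjugate() * ha)) % (2 * math.pi)
            for hu, ha in zip(h_u, h_au)]

if __name__ == "__main__":
    # Assumed coordinates in metres and an assumed reference path loss chi.
    uav, user, hap = (0.0, 0.0, 100.0), (30.0, 40.0, 0.0), (500.0, 200.0, 20000.0)
    h_u, d_u = link_vector(uav, user, 4, 4, chi=1e-3)
    h_au, d_au = link_vector(uav, hap, 4, 4, chi=1e-3)
    print(abs(cascaded_channel(h_u, h_au, [0.0] * 16)))
    print(abs(cascaded_channel(h_u, h_au, aligned_phases(h_u, h_au))))
```

For a single legitimate user, aligning the phases makes the $M$ reflected terms add coherently, so the magnitude reaches $M \chi / (d_{Uk} d_{AU})$. The secrecy problem treated later must additionally account for the eavesdropper channel, so this alignment is only a reference point, not the paper's solution.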

2.3. Communication Model and Problem Formulation

For the first-hop communication, the satellite transmits information to the HAP based on the FSO link, and the information rate of the HAP receiving the optical signal can be presented as
$$ R_A = \log_2\!\left( 1 + \frac{P_s |G_{SH}|^2}{\sigma_H^2} \right). $$
During the second-hop communication, HAP sends the information to the low-altitude drone through the RF link, and then RIS forwards it to the ground user. For ground multi-users, based on the RSMA decoding principle, all users first decode the public information and treat the private information stream as interference.
Therefore, the achievable rate of the kth user for decoding public information can be denoted as
$$ R_k^c = \log_2\!\left( 1 + \frac{\alpha_0 \xi^2 P_s |G_{SH}|^2 |H_{A,k}|^2}{\xi^2 P_s |G_{SH}|^2 |H_{A,k}|^2 \sum_{i=1}^{K} \alpha_i + \sigma_D^2} \right). $$
To ensure that all users can successfully decode the public information, the rate at which the $k$th user receives public information is denoted $c_k$. The sum of the public information receiving rates of all users should satisfy $\sum_{i=1}^{K} c_i \le \min_{k \in \mathcal{K}} R_k^c$; that is, the total public information rate delivered to the ground users cannot exceed the minimum decoding rate among them.
After successfully decoding the public information, the user uses interference cancellation technology to remove it and then decodes its private information. The rate at which the kth user decodes the private information can be expressed as
$$ R_k^p = \log_2\!\left( 1 + \frac{\alpha_k \xi^2 P_s |G_{SH}|^2 |H_{A,k}|^2}{\xi^2 P_s |G_{SH}|^2 |H_{A,k}|^2 \sum_{i=1, i \ne k}^{K} \alpha_i + \sigma_D^2} \right). $$
Eavesdroppers in the network attempt to intercept confidential information, posing a serious security threat to the communication system. In the proposed network model, the public information carries no confidential content; confidential content is carried only in each user’s private information. Therefore, the eavesdropper targets the private information stream of each user. The eavesdropping rate for user $k$ is
$$ R_k^e = \log_2\!\left( 1 + \frac{\alpha_k \xi^2 P_s |G_{SH}|^2 |H_{e,k}|^2}{\xi^2 P_s |G_{SH}|^2 |H_{e,k}|^2 \sum_{i=1, i \ne k}^{K} \alpha_i + \sigma_D^2} \right), $$
where $H_{e,k} = h_l \zeta_m h_{UE}^H \Phi h_{AU}$, and $h_{UE}$ is the channel vector from the UAV to the eavesdropper, which obeys $h_{UE} = \sqrt{\chi d_{UE}^{-2}}\; a_R(\theta_{UE}, \varphi_{UE}) + \Delta H$, where $\Delta H$ is the estimation error of the eavesdropper channel. The channel state information of the eavesdropper is thus assumed to be imperfect, which is more in line with practical scenarios.
Therefore, the secrecy rate of ground user k can be defined as
$$ R_k = c_k + \left[ R_k^p - R_k^e \right]^+, $$
where $[x]^+ = \max(x, 0)$.
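The rate expressions of this subsection can be evaluated with the following minimal sketch. It assumes that the effective gains $\xi^2 P_s |G_{SH}|^2 |H_{A,k}|^2$ and $\xi^2 P_s |G_{SH}|^2 |H_{e,k}|^2$ have already been computed and are passed in as plain numbers; the function and argument names are hypothetical.

```python
import math

def rsma_rates(alpha, gain_user, gain_eve, noise):
    """Per-user public, private, and eavesdropping rates (bit/s/Hz).

    alpha     -- [alpha_0, alpha_1, ..., alpha_K] power allocation coefficients
    gain_user -- xi^2 * P_s * |G_SH|^2 * |H_{A,k}|^2 for each user k (assumed precomputed)
    gain_eve  -- xi^2 * P_s * |G_SH|^2 * |H_{e,k}|^2 for each user k (assumed precomputed)
    noise     -- sigma_D^2
    """
    private_total = sum(alpha[1:])
    r_c, r_p, r_e = [], [], []
    for k, (g_u, g_e) in enumerate(zip(gain_user, gain_eve)):
        a_k = alpha[k + 1]
        others = private_total - a_k  # private power of all other users
        r_c.append(math.log2(1 + alpha[0] * g_u / (g_u * private_total + noise)))
        r_p.append(math.log2(1 + a_k * g_u / (g_u * others + noise)))
        r_e.append(math.log2(1 + a_k * g_e / (g_e * others + noise)))
    return r_c, r_p, r_e

def secrecy_rates(c, r_p, r_e):
    """R_k = c_k + [R_k^p - R_k^e]^+ for every user."""
    return [c_k + max(p - e, 0.0) for c_k, p, e in zip(c, r_p, r_e)]

if __name__ == "__main__":
    # Assumed example: two users, the eavesdropper sees much weaker channels.
    r_c, r_p, r_e = rsma_rates([0.5, 0.25, 0.25], [10.0, 8.0], [1.0, 0.5], 1.0)
    share = min(r_c) / 2  # split the decodable public rate equally (assumption)
    print(secrecy_rates([share, share], r_p, r_e))
```

The equal split of the public rate in the demo is only one feasible choice; any non-negative shares whose sum does not exceed the minimum public decoding rate would satisfy the requirement stated above.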
For the proposed RIS-assisted hybrid FSO/RF heterogeneous space–air–ground integrated network, this paper formulates a sum secrecy rate (SSR) maximization problem to meet the needs of green communication and full coverage. The secrecy performance of the communication network is improved by jointly optimizing the power allocation coefficients $A = \{\alpha_0, \alpha_1, \ldots, \alpha_K\}$, the RIS reflection phase shift $\Phi$, and the flight trajectory $Q = \{q_u[t], t \in \mathcal{T}\}$. The optimization problem is formulated as
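$$
\begin{aligned}
\mathrm{P1}: \max_{A, \Phi, Q} \quad & \sum_{k=1}^{K} R_k = \sum_{k=1}^{K} \left( c_k + \left[ R_k^p - R_k^e \right]^+ \right) && \text{(19a)} \\
\text{s.t.} \quad & \mathrm{C1}: \sum_{i=1}^{K} c_i \le \min_{k \in \mathcal{K}} R_k^c, && \text{(19b)} \\
& \mathrm{C2}: 0 < \alpha_i < 1, \ \forall i \in \{0, 1, \ldots, K\}, && \text{(19c)} \\
& \mathrm{C3}: \alpha_0 + \alpha_1 + \cdots + \alpha_K \le 1, && \text{(19d)} \\
& \mathrm{C4}: \vartheta_i \in [0, 2\pi), \ \forall i \in \mathcal{M}, && \text{(19e)} \\
& \mathrm{C5}: \left\| q_u[t] - q_u[t-1] \right\|^2 \le \Delta s^2, && \text{(19f)} \\
& \mathrm{C6}: q_u \in X \times Y, && \text{(19g)}
\end{aligned}
$$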
where constraint (19b) is the sum rate requirement for all ground users to receive public information; it ensures the reasonable allocation of system resources and provides stable communication services in the worst case (i.e., for the user with the weakest decoding ability). Constraints (19c) and (19d) specify the value range of the power allocation coefficients and require that their sum not exceed 1. Constraint (19e) is the phase shift constraint of the reflective elements. Constraint (19f) specifies the maximum displacement $\Delta s$ of the drone between adjacent time slots, which is determined by the speed of the drone and the duration of each time slot. Constraint (19g) gives the horizontal flight area $X \times Y$ in which the drone can move. The reflection phase shift of the RIS and the flight trajectory of the drone are closely coupled in the air-to-ground channel, and the three optimization variables of the SSR maximization problem are interrelated, so the problem is difficult to solve directly. Therefore, this paper maximizes the SSR by alternately solving three sub-problems. The specific solution is presented in the next section.
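The power and mobility constraints translate directly into simple feasibility checks. The sketch below is an illustration only: it tests C2 and C3 for a coefficient vector, tests C5 and C6 for a horizontal trajectory (the altitude $H_R$ is fixed, so only $x$ and $y$ are tracked), and shows one possible way to project a proposed move onto the maximum displacement. The function names and the projection helper `clip_step` are assumptions and are not part of the paper.

```python
import math

def power_feasible(alpha):
    """C2 and C3: every coefficient in (0, 1) and their sum at most 1."""
    return all(0.0 < a < 1.0 for a in alpha) and sum(alpha) <= 1.0

def trajectory_feasible(path, max_step, x_range, y_range):
    """C5 and C6: bounded displacement per slot, positions inside X x Y."""
    for t, (x, y) in enumerate(path):
        if not (x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]):
            return False
        if t > 0:
            px, py = path[t - 1]
            if (x - px) ** 2 + (y - py) ** 2 > max_step ** 2:
                return False
    return True

def clip_step(current, target, max_step):
    """Shorten a proposed move so that it respects C5 (hypothetical helper)."""
    dx, dy = target[0] - current[0], target[1] - current[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return target
    scale = max_step / dist
    return (current[0] + dx * scale, current[1] + dy * scale)

if __name__ == "__main__":
    print(power_feasible([0.5, 0.25, 0.25]))
    print(trajectory_feasible([(0, 0), (3, 4), (6, 8)], 5.0, (0, 10), (0, 10)))
    print(clip_step((0.0, 0.0), (3.0, 4.0), 2.5))
```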

3. Sum Secrecy Rate Maximization Scheme

For the SSR maximization problem consisting of three variables: power allocation coefficient, RIS phase shift, and drone trajectory, this paper alternately solves the three sub-problems based on global optimization strategy, convex optimization, and DDPG algorithm, and finally designs an iterative framework to achieve joint optimization.
The optimization problem (19) is non-convex, and the sum rate of the public information received by the ground users must satisfy $\sum_{i=1}^{K} c_i \le \min_{k \in \mathcal{K}} R_k^c$. According to references [1,2], at the optimal solution $(A^*, \Phi^*, Q^*)$ of problem P1, constraint (19b) holds with equality, and the sum of the rates at which ground users receive public information satisfies $\sum_{i=1}^{K} c_i = \min_{k \in \mathcal{K}} R_k^c$. The objective function of the optimization problem P1 is therefore re-expressed as $\min_{k \in \mathcal{K}} R_k^c + \sum_{k=1}^{K} \left[ R_k^p - R_k^e \right]^+$. Then, problem (19) can be reconstructed as
$$
\begin{aligned}
\mathrm{P1.1}: \max_{A, \Phi, Q} \quad & C_{th} + \sum_{k=1}^{K} \left[ R_k^p - R_k^e \right]^+ \\
\text{s.t.} \quad & \mathrm{C1}: R_k^c \ge C_{th}, \ \forall k \in \mathcal{K}, \\
& \mathrm{C2}, \mathrm{C3}, \mathrm{C4}, \mathrm{C5}, \mathrm{C6},
\end{aligned}
$$
where $C_{th}$ (non-negative) is the threshold of the public information rate that every ground user must be able to decode.

3.1. Power Allocation

Given the phase shift matrix Φ of the RIS and the flight trajectory Q of the drone, the power allocation subproblem can be expressed as
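$$
\begin{aligned}
\mathrm{P2}: \max_{\alpha_0, \alpha_1, \ldots, \alpha_K} \quad & C_{th} + \sum_{k=1}^{K} \left[ R_k^p - R_k^e \right]^+ \\
\text{s.t.} \quad & \mathrm{C1}: R_k^c \ge C_{th}, \ \forall k \in \mathcal{K}, \\
& \mathrm{C2}: 0 < \alpha_i < 1, \ \forall i \in \{0, 1, \ldots, K\}, \\
& \mathrm{C3}: \alpha_0 + \alpha_1 + \cdots + \alpha_K \le 1.
\end{aligned}
$$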
For the power allocation problem, the optimization target SSR is a nonlinear function of the $(K+1)$ power allocation coefficients. In addition, since the power allocation coefficients must be non-negative and are subject to coupled constraints such as the total power limit, traditional gradient descent methods and deterministic optimization algorithms have difficulty searching effectively for the global optimal solution. In view of this, we adopt a heuristic simulated annealing algorithm, which does not rely on the convexity or gradient information of the objective function, to search for the optimal power allocation.
The SA algorithm is a probabilistic search algorithm that obtains the global optimal solution by simulating the three stages of solid annealing: heating, constant temperature, and cooling. In the field of physics, atoms inside a solid perform irregular thermal motion when heated, and the energy of the system will gradually decrease during the cooling process. The atoms will arrange themselves according to certain rules as the temperature decreases, and eventually reach a stable state, at which time the energy of the system reaches the lowest point. The SA algorithm can transform the objective function of the optimization problem into the concept of “energy” by introducing temperature parameters similar to those in the physical annealing process, such as initial temperature and cooling rate, and complete the global search by gradually lowering the temperature.
The core of SA is the ability to escape from local optima. The SA algorithm starts from an initial solution, explores the adjacent solution space, and accepts a worse solution with a certain probability while driving the system energy toward its minimum. Assume that the system energies corresponding to two adjacent states $c_t$ and $c_{t+1}$ during the exploration process are $E_t$ and $E_{t+1}$. Following the Metropolis criterion, solution $c_{t+1}$ is accepted when the system energy decreases, i.e., $E_{t+1} < E_t$. If the energy increases, the solution is accepted only with a certain probability. The acceptance probability of the new solution can be expressed as
$$ P_e = \begin{cases} 1, & \text{if } E(t+1) < E(t), \\ \exp\!\left( -\dfrac{E(t+1) - E(t)}{T_c} \right), & \text{if } E(t+1) \ge E(t), \end{cases} \tag{22} $$
where $T_c$ is the current temperature.
To find the global optimal solution and reach convergence within a limited time, the SA algorithm divides the exploration process into an external loop and an internal loop, where the internal loop accepts a poor solution with a certain probability. The external loop achieves rapid convergence by controlling the algorithm convergence parameters, using a higher initial temperature in the early stage, and then gradually reducing the temperature at a certain cooling rate. This paper uses exponential descent, defined as
$$ T_{t+1} = ƛ\, T_t, $$
where the cooling rate ƛ is a positive number less than 1.
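The interplay between the Metropolis criterion and the exponential cooling schedule can be seen in a small numerical sketch. The function names and the numbers in the demo (initial temperature 10, cooling rate 0.9, energy increase 0.5) are assumptions chosen for illustration.

```python
import math

def acceptance_probability(e_old, e_new, temperature):
    """Metropolis criterion: always accept improvements, otherwise exp(-dE / T_c)."""
    if e_new < e_old:
        return 1.0
    return math.exp(-(e_new - e_old) / temperature)

def temperatures(t0, rate, n):
    """First n temperatures of the exponential schedule T_{t+1} = rate * T_t."""
    return [t0 * rate ** i for i in range(n)]

if __name__ == "__main__":
    # The same energy increase becomes less and less acceptable as T falls.
    for temp in temperatures(10.0, 0.9, 60)[::10]:
        print(round(temp, 4), acceptance_probability(1.0, 1.5, temp))
```

At high temperature almost every worse candidate is accepted, which favors exploration; as the temperature decays geometrically, the same energy increase is accepted less and less often and the search settles.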
For this problem, the SA algorithm aims to maximize the system’s SSR by optimizing the power allocation coefficient. The SSR includes the user’s public information rate and the secrecy rate of private information. Therefore, the objective function of the SA algorithm comprehensively considers the public information rate, the secrecy rate of private information, and the penalty term, where the penalty term is employed to limit the occurrence of solutions that do not meet the constraints so that the optimization result has better performance and stability. Specifically, the objective function of the SA algorithm includes the following four cases:
  • Case 1: The public information rate of user $k$ is less than the specified threshold, and the rate at which all users receive private information is greater than the eavesdropping rate. The objective function is set as
    $$ Ob_1 = C_{th} + \sum_{k=1}^{K} \left[ R_k^p - R_k^e \right]^+ - p_1 - p_b, \quad R_k^p \ge R_k^e, $$
    where $p_1 = p_0 \left( C_{th} - R_k^c \right)^2$ and $p_0$ is the penalty weight. $p_b = p_{b0}\, \upsilon(\alpha_0, \alpha_1, \ldots, \alpha_K)$ is the balance penalty, where $p_{b0}$ is the balance error weight and $\upsilon(\alpha_0, \alpha_1, \ldots, \alpha_K)$ is the variance of the power allocation coefficients.
  • Case 2: The rate at which user $k$ receives private information is less than the eavesdropping rate, and the public information rate of all users is greater than the specified threshold. The objective function is
    $$ Ob_2 = C_{th} + \sum_{k=1}^{K} \left[ R_k^p - R_k^e \right]^+ - p_2 - p_b, \quad R_k^c \ge C_{th}, $$
    where $p_2 = p_0 \left( R_k^p - R_k^e \right)^2$.
  • Case 3: The rate at which user $k$ receives private information is less than the eavesdropping rate, and the rate at which user $k$ receives public information is less than the specified threshold. The objective function is
    $$ Ob_3 = C_{th} + \sum_{k=1}^{K} \left[ R_k^p - R_k^e \right]^+ - p_3 - p_b, $$
    where $p_3 = p_1 + p_2$.
  • Case 4: The rate at which all users receive private information is greater than the eavesdropping rate, and the rate of public information exceeds the specified threshold. The objective function is defined as
    $$ Ob_4 = C_{th} + \sum_{k=1}^{K} \left[ R_k^p - R_k^e \right]^+ - p_b, \quad R_k^p \ge R_k^e, \quad R_k^c \ge C_{th}. $$
It is worth noting that the goal of the SA algorithm is to minimize the system energy, so the corresponding optimization objectives in the proposed algorithm framework should be $-Ob_1$, $-Ob_2$, $-Ob_3$, and $-Ob_4$.
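The four cases can be folded into a single penalized objective in which each penalty is active only when its constraint is violated. The sketch below is one possible reading of the case analysis, not the authors' implementation: it assumes that the penalties of all violating users are summed and that the variance is the population variance of the coefficients. The function names are hypothetical.

```python
def sa_objective(c_th, r_c, r_p, r_e, alpha, p0, pb0):
    """Penalized objective covering Cases 1-4.

    p1 is active for users with R_k^c < C_th, p2 for users with R_k^p < R_k^e;
    summing over violating users is an assumption of this sketch.
    """
    secrecy = sum(max(p - e, 0.0) for p, e in zip(r_p, r_e))
    p1 = sum(p0 * (c_th - rc) ** 2 for rc in r_c if rc < c_th)
    p2 = sum(p0 * (p - e) ** 2 for p, e in zip(r_p, r_e) if p < e)
    mean = sum(alpha) / len(alpha)
    variance = sum((a - mean) ** 2 for a in alpha) / len(alpha)  # population variance (assumed)
    return c_th + secrecy - p1 - p2 - pb0 * variance

def sa_energy(c_th, r_c, r_p, r_e, alpha, p0, pb0):
    """SA minimizes energy, so the energy is the negated objective."""
    return -sa_objective(c_th, r_c, r_p, r_e, alpha, p0, pb0)

if __name__ == "__main__":
    # Case 4 (no violation) versus Case 1 (public rate below threshold).
    print(sa_objective(0.5, [1.0], [2.0], [0.5], [0.5, 0.5], p0=10.0, pb0=1.0))
    print(sa_objective(0.5, [0.0], [2.0], [0.5], [0.5, 0.5], p0=10.0, pb0=1.0))
```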
For the power allocation optimization problem, the specific solution steps of the SA algorithm are summarized in Algorithm 1.
Algorithm 1 Power allocation based on SA algorithm
Input: initial power allocation coefficient $A_0$, initial temperature $T_0$, cooling rate ƛ, number of iterations $N_{SA}$, constraints.
Output: optimal power allocation coefficient $A^*$ and the corresponding objective function value.
Initialize the current temperature $T_c = T_0$ and the current solution $A_0$.
for $n = 1, 2, \ldots, N_{SA}$ do
    Generate a new candidate solution $A_n$ based on the current solution $A_{n-1}$.
    Calculate the objective function value and the difference between the objective functions of two adjacent iterations, $\Delta E = E(A_n) - E(A_{n-1})$.
    if $\Delta E < 0$ then
        Accept $A_n$ as the new solution and update $E(A_n)$.
    else
        Calculate the probability according to (22) and determine whether to accept the solution.
    end if
    Update the temperature based on $T_{t+1} = ƛ\, T_t$.
end for
Obtain the optimal power allocation coefficient $A^*$.
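The steps of Algorithm 1 can be sketched as a generic annealing loop. This is a minimal illustration under stated assumptions rather than the authors' implementation: the neighborhood move (perturb one coefficient, clip it to $(0, 1)$, rescale if the sum exceeds 1), the default parameters, and the toy quadratic energy in the demo are all hypothetical. In the setting of this section, the energy would be the negated penalized objective evaluated from the channel model.

```python
import math
import random

def propose(alpha, step, rng, eps=1e-6):
    """Perturb one coefficient, then restore C2 (range) and C3 (sum <= 1)."""
    candidate = list(alpha)
    i = rng.randrange(len(candidate))
    candidate[i] = min(max(candidate[i] + rng.uniform(-step, step), eps), 1.0 - eps)
    total = sum(candidate)
    if total > 1.0:
        candidate = [a / total for a in candidate]
    return candidate

def simulated_annealing(energy, alpha0, t0=1.0, rate=0.99, n_iter=1000,
                        step=0.05, seed=0):
    """Minimize energy(alpha) with Metropolis acceptance and exponential cooling."""
    rng = random.Random(seed)
    current, e_current = list(alpha0), energy(alpha0)
    best, e_best = list(current), e_current
    temperature = t0
    for _ in range(n_iter):
        candidate = propose(current, step, rng)
        delta = energy(candidate) - e_current
        # Accept improvements always, worse candidates with probability exp(-dE / T_c).
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = candidate, e_current + delta
            if e_current < e_best:
                best, e_best = list(current), e_current
        # Exponential cooling; the floor only guards against division by zero.
        temperature = max(temperature * rate, 1e-12)
    return best, e_best

if __name__ == "__main__":
    # Toy energy with a known minimizer, standing in for the negated objective.
    target = [0.2, 0.5, 0.3]
    toy_energy = lambda a: sum((x - t) ** 2 for x, t in zip(a, target))
    print(simulated_annealing(toy_energy, [1 / 3, 1 / 3, 1 / 3]))
```

The sketch additionally records the best solution visited so far, a common safeguard that is not listed in Algorithm 1; it guarantees that the returned energy is never worse than that of the initial solution.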

3.2. RIS Phase Shift Optimization

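For the given power allocation coefficients $\alpha_0, \alpha_1, \ldots, \alpha_K$ and drone trajectory $Q$, the RIS phase shift optimization subproblem can be reformulated as
$$
\begin{aligned}
\mathrm{P3}: \max_{\Phi} \quad & C_{th} + \sum_{k=1}^{K} \left[ R_k^p - R_k^e \right]^+ && \text{(28a)} \\
\text{s.t.} \quad & \mathrm{C1}: R_k^c \ge C_{th}, \ \forall k \in \mathcal{K}, && \text{(28b)} \\
& \mathrm{C4}: \vartheta_i \in [0, 2\pi), \ \forall i \in \mathcal{M}. && \text{(28c)}
\end{aligned}
$$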
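Since the objective function (28a) is still non-convex for the variable $\Phi$, we adopt semi-definite programming to deal with the objective function and constraints. By introducing the auxiliary variable $\theta = [e^{j\vartheta_1}, e^{j\vartheta_2}, \ldots, e^{j\vartheta_M}]^H \in \mathbb{C}^{M \times 1}$, the channel vectors of HAP-ground users and HAP-eavesdropper can be expressed as $h_{U,k}^H \Phi h_{AU} = \theta^H \mathrm{diag}(h_{U,k}^H) h_{AU}$ and $h_{UE}^H \Phi h_{AU} = \theta^H \mathrm{diag}(h_{UE}^H) h_{AU}$. Define $\Psi = \theta\theta^H \in \mathbb{C}^{M \times M}$. For any complex vector $D \in \mathbb{C}^{M \times 1}$, we have
$$\Psi^H = (\theta\theta^H)^H = (\theta^H)^H \theta^H = \theta\theta^H = \Psi,$$
$$D^H \Psi D = D^H \theta\theta^H D = (D^H\theta)(\theta^H D) = |D^H\theta|^2 \ge 0.$$
Then, $\Psi$ is a positive semidefinite Hermitian matrix, so we have
$$
\begin{aligned}
|H_{A,k}|^2 &= h_l^2 \zeta_m^2 \left| h_{U,k}^H \Phi h_{AU} \right|^2 = h_l^2 \zeta_m^2 \left| \theta^H \mathrm{diag}(h_{U,k}^H) h_{AU} \right|^2 \\
&= h_l^2 \zeta_m^2\, \theta^H \mathrm{diag}(h_{U,k}^H) h_{AU} h_{AU}^H \mathrm{diag}(h_{U,k}^H)^H \theta = \mathrm{Tr}(Y_{A,k}\Psi),
\end{aligned}
$$
where $Y_{A,k} = h_l^2 \zeta_m^2\, \mathrm{diag}(h_{U,k}^H) h_{AU} \left( \mathrm{diag}(h_{U,k}^H) h_{AU} \right)^H$. Similarly, we have $|H_{e,k}|^2 = \mathrm{Tr}(Y_{e,k}\Psi)$, where $Y_{e,k} = h_l^2 \zeta_m^2\, \mathrm{diag}(h_{UE}^H) h_{AU} \left( \mathrm{diag}(h_{UE}^H) h_{AU} \right)^H$. Letting $\delta = \xi^2 P_s |G_{SH}|^2$, constraint C1 can be rewritten as
$$
\begin{aligned}
R_k^c &= \log_2\left( 1 + \frac{\alpha_0 \xi^2 P_s |G_{SH}|^2 |H_{A,k}|^2}{\xi^2 P_s |G_{SH}|^2 |H_{A,k}|^2 \sum_{i=1}^{K} \alpha_i + \sigma_D^2} \right) = \log_2\left( 1 + \frac{\alpha_0 \delta\, \mathrm{Tr}(Y_{A,k}\Psi)}{\delta\, \mathrm{Tr}(Y_{A,k}\Psi) \sum_{i=1}^{K} \alpha_i + \sigma_D^2} \right) \ge C_{th} \\
&\Rightarrow \ \alpha_0 \delta\, \mathrm{Tr}(Y_{A,k}\Psi) \ge \left( 2^{C_{th}} - 1 \right) \left( \delta\, \mathrm{Tr}(Y_{A,k}\Psi) \sum_{i=1}^{K} \alpha_i + \sigma_D^2 \right).
\end{aligned}
$$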
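
The lifting step above can be checked numerically. The following sketch is illustrative only: the channel vectors are random placeholders, the scaling $h_l^2 \zeta_m^2$ is set to 1, and $\Phi$ is assumed to be the diagonal matrix of the phase terms $e^{j\vartheta_i}$.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 8  # number of reflective elements used in the simulations

# Hypothetical random channels; the factor h_l^2 * zeta_m^2 is set to 1 for brevity.
h_uk = rng.normal(size=M) + 1j * rng.normal(size=M)  # RIS-to-user channel h_{U,k}
h_au = rng.normal(size=M) + 1j * rng.normal(size=M)  # HAP-to-RIS channel h_{AU}
phases = rng.uniform(0.0, 2.0 * np.pi, size=M)       # phase shifts in [0, 2*pi)

Phi = np.diag(np.exp(1j * phases))   # assumed diagonal RIS phase shift matrix
theta = np.exp(-1j * phases)         # theta = [e^{j*phase_1}, ...]^H as a column vector
Psi = np.outer(theta, theta.conj())  # Psi = theta theta^H

v = h_uk.conj() * h_au               # diag(h_{U,k}^H) h_{AU}
Y = np.outer(v, v.conj())            # Y_{A,k} with unit scaling

direct = abs(h_uk.conj() @ Phi @ h_au) ** 2  # |h_{U,k}^H Phi h_{AU}|^2
lifted = np.trace(Y @ Psi).real              # Tr(Y_{A,k} Psi)
```

Both quantities agree up to floating-point error, and `Psi` is Hermitian, positive semidefinite, and has a unit diagonal, which is what constraints C4.1 and C4.2 encode.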
Then, the optimization problem (28) can be reformulated as
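$$
\begin{aligned}
\mathrm{P3.1}: \ \max_{\Psi} \quad & C_{th} + \sum_{k=1}^{K} \left[ R_k^p - R_k^e \right]^+ && \text{(33a)} \\
\text{s.t.} \quad & \mathrm{C1}: \alpha_0 \delta\, \mathrm{Tr}(Y_{A,k}\Psi) \ge \left( 2^{C_{th}} - 1 \right) \left( \delta\, \mathrm{Tr}(Y_{A,k}\Psi) \sum_{i=1}^{K} \alpha_i + \sigma_D^2 \right), \ \forall k \in \mathcal{K}, && \text{(33b)} \\
& \mathrm{C4.1}: \Psi_{ii} = 1, \ \forall i \in \mathcal{M}, && \text{(33c)} \\
& \mathrm{C4.2}: \Psi \succeq 0, && \text{(33d)} \\
& \mathrm{C4.3}: \mathrm{rank}(\Psi) = 1, && \text{(33e)}
\end{aligned}
$$
where
$$R_k^p = \log_2\left( 1 + \frac{\alpha_k \delta\, \mathrm{Tr}(Y_{A,k}\Psi)}{\delta\, \mathrm{Tr}(Y_{A,k}\Psi) \sum_{i=1, i \ne k}^{K} \alpha_i + \sigma_D^2} \right),$$
$$R_k^e = \log_2\left( 1 + \frac{\alpha_k \delta\, \mathrm{Tr}(Y_{e,k}\Psi)}{\delta\, \mathrm{Tr}(Y_{e,k}\Psi) \sum_{i=1, i \ne k}^{K} \alpha_i + \sigma_D^2} \right).$$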
Remark 1.
(44b) requires that all K ground users meet the rate constraints for receiving public information, so there are K constraints.
The objective function of (28a) is still non-convex, so we perform a first-order Taylor expansion on the negative terms to obtain its upper bound function, referring to
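$$f(\Psi) \le f(\Psi^{r}) + \mathrm{Tr}\left( \nabla_{\Psi} f(\Psi^{r}) \cdot (\Psi - \Psi^{r})^{H} \right). \tag{36}$$
Specifically, according to the properties of the matrix trace, i.e., $\mathrm{Tr}(A + B) = \mathrm{Tr}(A) + \mathrm{Tr}(B)$ and $\frac{\partial\, \mathrm{Tr}(AB)}{\partial A} = B^{T}$, we first rewrite the function (33a) as
$$
\begin{aligned}
R_k^p - R_k^e &= \log_2\left( \alpha_k \delta\, \mathrm{Tr}(Y_{A,k}\Psi) + \delta\, \mathrm{Tr}(Y_{A,k}\Psi) \sum_{i=1, i \ne k}^{K} \alpha_i + \sigma_D^2 \right) - \log_2\left( \delta\, \mathrm{Tr}(Y_{A,k}\Psi) \sum_{i=1, i \ne k}^{K} \alpha_i + \sigma_D^2 \right) \\
&\quad + \log_2\left( \delta\, \mathrm{Tr}(Y_{e,k}\Psi) \sum_{i=1, i \ne k}^{K} \alpha_i + \sigma_D^2 \right) - \log_2\left( \alpha_k \delta\, \mathrm{Tr}(Y_{e,k}\Psi) + \delta\, \mathrm{Tr}(Y_{e,k}\Psi) \sum_{i=1, i \ne k}^{K} \alpha_i + \sigma_D^2 \right) \\
&= \log_2\left( \mathrm{Tr}(X_{k1}\Psi) + \sigma_D^2 \right) - \log_2\left( \mathrm{Tr}(X_{k2}\Psi) + \sigma_D^2 \right) - \log_2\left( \mathrm{Tr}(X_{3}\Psi) + \sigma_D^2 \right) + \log_2\left( \mathrm{Tr}(X_{k4}\Psi) + \sigma_D^2 \right),
\end{aligned}
$$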
where
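$$X_{k1} = \alpha_k \delta\, Y_{A,k} + \delta \sum_{i=1, i \ne k}^{K} \alpha_i Y_{A,k} = \delta \sum_{i=1}^{K} \alpha_i Y_{A,k}, \quad X_{k2} = \delta \sum_{i=1, i \ne k}^{K} \alpha_i Y_{A,k},$$
$$X_{3} = \delta \sum_{i=1}^{K} \alpha_i Y_{e,k}, \quad X_{k4} = \delta \sum_{i=1, i \ne k}^{K} \alpha_i Y_{e,k}.$$
Since $\log_2 x$ is a concave function, its first-order Taylor expansion is an upper bound function. According to (36), we have
$$\log_2\left( \mathrm{Tr}(X_{k2}\Psi) + \sigma_D^2 \right) \le \log_2\left( \mathrm{Tr}(X_{k2}\Psi^{r}) + \sigma_D^2 \right) + \frac{\mathrm{Tr}\left( X_{k2} (\Psi - \Psi^{r}) \right)}{\ln 2 \left( \mathrm{Tr}(X_{k2}\Psi^{r}) + \sigma_D^2 \right)} = \Lambda_k(\Psi),$$
$$\log_2\left( \mathrm{Tr}(X_{3}\Psi) + \sigma_D^2 \right) \le \log_2\left( \mathrm{Tr}(X_{3}\Psi^{r}) + \sigma_D^2 \right) + \frac{\mathrm{Tr}\left( X_{3} (\Psi - \Psi^{r}) \right)}{\ln 2 \left( \mathrm{Tr}(X_{3}\Psi^{r}) + \sigma_D^2 \right)} = \Xi_k(\Psi).$$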
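
The upper-bound property of the first-order expansion can be verified numerically. In the sketch below, the matrix `X`, the noise level, and the random rank-one test points are hypothetical; only the form of the bound follows the two inequalities above.

```python
import numpy as np

def log_term(X, Psi, noise):
    """log2(Tr(X Psi) + sigma_D^2)."""
    return np.log2(np.trace(X @ Psi).real + noise)

def taylor_bound(X, Psi, Psi_r, noise):
    """First-order expansion of log_term around the point Psi_r."""
    base = np.trace(X @ Psi_r).real + noise
    return np.log2(base) + np.trace(X @ (Psi - Psi_r)).real / (np.log(2.0) * base)

def random_rank_one(rng, M):
    """Random Psi = theta theta^H with unit-modulus entries in theta."""
    theta = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=M))
    return np.outer(theta, theta.conj())

rng = np.random.default_rng(1)
M, noise = 8, 1e-3
A = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
X = A @ A.conj().T               # hypothetical positive semidefinite X
Psi_r = random_rank_one(rng, M)  # expansion point

# Gap between the linearised bound and the true value at 100 random points.
gaps = [taylor_bound(X, P, Psi_r, noise) - log_term(X, P, noise)
        for P in (random_rank_one(rng, M) for _ in range(100))]
```

Every gap is non-negative, and the bound is tight at the expansion point, which is why replacing the negative logarithmic terms by their linearisations yields a concave surrogate of the objective.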
The objective function (33a) can be denoted as
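$$R_s = C_{th} + \sum_{k=1}^{K} \left[ \log_2\left( \mathrm{Tr}(X_{k1}\Psi) + \sigma_D^2 \right) + \log_2\left( \mathrm{Tr}(X_{k4}\Psi) + \sigma_D^2 \right) - \Lambda_k(\Psi) - \Xi_k(\Psi) \right].$$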
The RIS phase shift optimization problem can be expressed as
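$$
\begin{aligned}
\mathrm{P3.2}: \ \max_{\Psi} \quad & R_s && \text{(44a)} \\
\text{s.t.} \quad & \mathrm{C1}: \alpha_0 \delta\, \mathrm{Tr}(Y_{A,k}\Psi) \ge \left( 2^{C_{th}} - 1 \right) \left( \delta\, \mathrm{Tr}(Y_{A,k}\Psi) \sum_{i=1}^{K} \alpha_i + \sigma_D^2 \right), \ \forall k \in \mathcal{K}, && \text{(44b)} \\
& \mathrm{C4.1}: \Psi_{ii} = 1, \ \forall i \in \mathcal{M}, && \text{(44c)} \\
& \mathrm{C4.2}: \Psi \succeq 0, && \text{(44d)} \\
& \mathrm{C4.3}: \mathrm{rank}(\Psi) = 1. && \text{(44e)}
\end{aligned}
$$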
We employ semidefinite relaxation (SDR): the rank-1 constraint C4.3 is dropped, leaving only the positive semidefinite constraint C4.2, and convex optimization tools are then used to obtain the optimal phase shift matrix of the relaxed problem.
Because the rank-1 constraint is relaxed, the rank of the optimized solution is generally greater than 1. Therefore, we need to post-process it to restore a solution that satisfies the rank-1 constraint. This paper adopts a penalty algorithm that introduces a penalty term in the objective function to restore the rank-1 matrix. The objective function constructed by the penalty algorithm is $\max_{\Theta_r} L(\Theta_r) = R_s - \|\Psi - \Theta_r\|_F^2$, where $\Psi$ is the solution of the relaxed problem and $\Theta_r$ is the rank-1 matrix to be recovered. The penalty algorithm process is detailed in Algorithm 2.
Algorithm 2 Overall algorithm framework
1: Initialize the power allocation coefficients $A_0$, RIS phase shift matrix $\Phi_0$, and the initial position of the drone $q_0$.
2: Set the total number of iterations $N_o$, number of reflective elements $M$, the coordinates of ground users, and the channel status information.
3: for $i = 1, 2, \ldots, N_o$ do
4:   Obtain the power allocation coefficients based on Algorithm 1.
5:   for $i = 1, 2, \ldots, N_{SDR}$ do
6:     Use convex optimization tools to generate a positive semidefinite matrix $\Phi$ based on (44) with a relaxed rank-one constraint.
7:     Define the objective function $L(\Theta_r) = R_s - \|\Psi - \Theta_r\|_F^2$.
8:     Generate a random phase vector and normalize: $\phi_0 = \phi / |\phi|$.
9:     Construct a rank-one matrix through outer product: $\Theta_r = \phi_0 \phi_0^H$.
10:     Minimize the objective function based on gradient descent $\nabla L(\Theta_r)$.
11:     Find the eigenvector corresponding to the maximum eigenvalue of the matrix to update vector: $\phi_{new} = \mathrm{eig}_{\max}\left( \Theta_r + \tau \nabla L(\Theta_r) \right)$.
12:     Normalize the magnitude of the updated vector: $\phi_1 = \phi_{new} / |\phi_{new}|$.
13:     if $\|\Psi - \Theta_r\|_F^2 \le \Delta_{SDR}$ then
14:       Obtain the optimal phase shift matrix with rank-one constraint.
15:     else
16:       repeat steps 6–12.
17:     end if
18:   end for
19:   Obtain the phase shift matrix $\Psi$.
20:   for $i = 1, 2, \ldots, N_{DDPG}$ do
21:     Input the initial coordinate, add random noise, and explore.
22:     Put the training samples into the experience buffer.
23:     Update the parameters of the actor and critic networks.
24:     Until the number of iterations $N_{DDPG}$ is reached.
25:   end for
26: end for
27: Obtain the optimal power allocation coefficient $A$, the RIS phase shift matrix $\Phi$, and drone trajectory $Q$.
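
For intuition about the rank-one recovery step, the sketch below uses a different and simpler technique than the penalty iteration of Algorithm 2: it takes the principal eigenvector of the relaxed solution and projects its entries onto the unit circle. The function name and the test matrix are hypothetical, and the sketch does not reproduce the authors' penalty algorithm.

```python
import numpy as np

def rank_one_phase_recovery(Psi):
    """Return a unit-modulus vector theta whose outer product approximates Psi."""
    # eigh returns eigenvalues of a Hermitian matrix in ascending order,
    # so the last column is the principal eigenvector.
    _, eigvecs = np.linalg.eigh(Psi)
    principal = eigvecs[:, -1]
    # Project every entry onto the unit circle so that diag(theta theta^H) = 1.
    return np.exp(1j * np.angle(principal))

# Sanity check on a matrix that is already rank one.
rng = np.random.default_rng(2)
theta0 = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=8))
Psi = np.outer(theta0, theta0.conj())

theta = rank_one_phase_recovery(Psi)
Theta_r = np.outer(theta, theta.conj())
gap = np.linalg.norm(Psi - Theta_r, "fro") ** 2  # the quantity tested in step 13
```

When the relaxed solution is exactly rank one, the recovered vector matches it up to a global phase and the Frobenius gap vanishes; for higher-rank solutions the gap indicates how much is lost by the projection.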

3.3. Trajectory Optimization

The trajectory optimization subproblem is defined as
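$$
\begin{aligned}
\mathrm{P4}: \ \max_{Q} \quad & C_{th} + \sum_{k=1}^{K} \left[ R_k^p - R_k^e \right]^+ \\
\text{s.t.} \quad & \mathrm{C1}: R_k^c \ge C_{th}, \ \forall k \in \mathcal{K}, \\
& \mathrm{C5}: \left\| q_u[t] - q_u[t-1] \right\|^2 \le \Delta_s^2, \\
& \mathrm{C6}: q_u \in X \times Y.
\end{aligned}
$$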
The problem aims to maximize the SSR by optimizing the trajectory of the drone during the flight time. However, the trajectory of the drone is coupled in the air-to-ground channel, which makes the SSR maximization problem nonlinear in the trajectory. In addition, the irregular constraints on the rate threshold of public information and on the drone's motion area also make the problem non-convex. Therefore, we design the DDPG algorithm to solve it. The problem could also be handled by traditional convex optimization or global optimization algorithms through mathematical transformation or convex optimization tools, but their computational complexity and cost overhead are large. The advantages and disadvantages of the three algorithms are compared in Figure 2.
DDPG is a policy optimization method based on reinforcement learning. Its actor–critic dual network architecture makes it suitable for continuous action optimization problems and allows it to respond to dynamic environmental changes in real time. Compared with Q-learning and deep Q-network algorithms, DDPG, which is based on deterministic strategies, can effectively avoid Q value fluctuations and converge faster. The two neural networks of DDPG are responsible for action generation and strategy evaluation, respectively. According to the state obtained at a certain moment, the actor network adds random noise to generate actions that meet the constraints. The critic network evaluates the action generation strategy by calculating the reward value corresponding to the current action. The DDPG algorithm is updated mainly by updating the parameters of the neural networks. Specifically, the target actor network obtains the current state s from the environment and then adds random noise to generate actions according to the action strategy. The above process is expressed as
$$a = \pi(s) + \sigma_t,$$
where $\sigma_t$ is random disturbance noise. Then the critic network estimates the Q value function corresponding to the next moment based on the current state and action. The corresponding target value in the current state can be expressed as
$$y_t = r_t + \rho\, Q\left( s', \pi(s') \right),$$
where $\rho$ is the discount factor and $s'$ denotes the state at the next moment.
The critic's online network calculates the Q value $Q(s, a)$ corresponding to the current state and the generated action, and then minimizes the difference $\Delta = \frac{1}{T} \sum_{t} \left( y_t - Q(s, a) \right)^2$ between the target value and the evaluated value through gradient descent to update the critic network. For the actor network, a new action $a_n = \pi(s) + \sigma_t$ is generated based on the current state, the critic evaluates the state and the new action through $Q(s, a_n)$, and the parameters are updated by gradient ascent to maximize the cumulative Q value. At each update, the target networks gradually approach the parameters of the online networks through soft updates. Because the target parameters change slowly, the target values computed from them are more stable, which in turn stabilizes the learning process of the critic network. The specific framework of the DDPG algorithm is shown in Figure 3.
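
The three update quantities described above (target value, critic loss, and soft update) can be written as small NumPy helpers. This is a minimal sketch, not the authors' implementation: the soft-update coefficient `tau` is an assumed hyperparameter that the text does not specify, and parameters are represented as plain arrays rather than network weights.

```python
import numpy as np

def td_target(reward, next_q, discount):
    """Target value y_t = r_t + rho * Q(next state, next action)."""
    return reward + discount * next_q

def critic_loss(targets, q_values):
    """Mean squared difference between target values and critic estimates."""
    targets = np.asarray(targets, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    return float(np.mean((targets - q_values) ** 2))

def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau towards its online counterpart."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

# Toy usage with hypothetical numbers.
y = td_target(reward=1.0, next_q=2.0, discount=0.5)
loss = critic_loss([2.0, 0.0], [1.0, 0.0])
new_target = soft_update([np.zeros(2)], [np.ones(2)], tau=0.1)
```

A small `tau` keeps the target networks close to their previous values, which is the mechanism behind the stabilizing effect described in the paragraph above.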
In the DDPG algorithm designed in this paper, the RIS-assisted downlink hybrid FSO/RF space–air–ground integrated network proposed in Section 2 is the environment of the DDPG algorithm, from which the location information of the satellite, HAP, ground users, and eavesdropper, the air-to-ground channel status, and the dynamic environment changes caused by drone movement can be obtained. The position of the drone in all time slots constitutes the state S. The action selection strategy of the actor network is related to the speed of the drone, which can be presented as
$$x_{t+1} = x_t + \nu \delta \cos\phi, \quad y_{t+1} = y_t + \nu \delta \sin\phi,$$
where $\phi$ is the azimuth of movement and $\nu$ is the speed of the drone. The reward $R = \sum_{n=1}^{T} r[n]$ of DDPG is the sum of the reward values of all time slots. When the drone moves within the given area and the rate of public information reaches the set threshold, the reward of this time slot is the objective function value, that is, $r[n] = \sum_{k=1}^{K} \left( \min R_k^c + \left[ R_k^p - R_k^e \right]^+ \right)$; when it flies out of the given area or the rate falls below the threshold, the reward in this time slot is $r[n] = -\sum_{k=1}^{K} \left( \min R_k^c + \left[ R_k^p - R_k^e \right]^+ \right)$, a negative value that acts as a penalty.
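
The state transition and the sign-flipped reward can be illustrated as follows. The sketch assumes a slot duration of 1 s and a positive objective value, and it takes the 3000 m × 3000 m flight area, the 10 m/s speed, and the initial position from Table 2; the function names and the rate values are hypothetical.

```python
import math

def move(x, y, speed, slot, azimuth):
    """One kinematic step: displacement speed * slot along the azimuth."""
    return (x + speed * slot * math.cos(azimuth),
            y + speed * slot * math.sin(azimuth))

def slot_reward(objective, x, y, common_rates, rate_threshold, area=(3000.0, 3000.0)):
    """Objective value if the slot is feasible, its negative otherwise."""
    inside = 0.0 <= x <= area[0] and 0.0 <= y <= area[1]
    feasible = all(r >= rate_threshold for r in common_rates)
    return objective if inside and feasible else -objective

# Start at the initial position and fly one slot along the x-axis.
x1, y1 = move(2600.0, 2200.0, speed=10.0, slot=1.0, azimuth=0.0)
r_ok = slot_reward(5.0, x1, y1, common_rates=[1.0, 1.2], rate_threshold=0.5)
r_out = slot_reward(5.0, 3100.0, y1, common_rates=[1.0, 1.2], rate_threshold=0.5)
r_low = slot_reward(5.0, x1, y1, common_rates=[0.2, 1.2], rate_threshold=0.5)
```

Leaving the area and violating the public-rate threshold are penalized identically in this sketch, matching the single penalty expression given in the text.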

3.4. Overall Algorithm

The three sub-problems of power allocation, RIS phase shift, and drone trajectory have been solved preliminarily through different optimization methods. However, since the solutions to these sub-problems are usually based on local optimization algorithms, the solutions obtained may only be local optimal solutions and cannot guarantee global optimality. In order to overcome this problem of locally optimal solutions, we design an alternating iterative framework, which gradually approaches the global optimal solution in each iteration by alternately optimizing power allocation, RIS phase shift, and drone trajectory. In this framework, power allocation, RIS phase shift, and drone trajectory are regarded as interdependent variables, so we divide their optimization process into multiple alternating iterative steps. In each step, one of the variables is optimized by different optimization algorithms, while the other variables are fixed, which effectively avoids premature convergence or local optimal solutions that may occur when optimizing a single variable, thereby promoting better global search. The overall algorithm framework is summarized in Algorithm 2.
Complexity Analysis: In this paper, we design three optimization algorithms with different computational characteristics. The computational complexity of the SA algorithm for optimizing the power allocation factor is proportional to the number of iterations and the dimension of the optimization variable, i.e., $O(K N_s)$, where $N_s$ is the number of temperature iterations and $K$ is the dimension of the optimization variable. For RIS phase shift optimization, the complexity of the SDP method solved by the interior point method is $O(I_M^{3.5} L)$, where $I_M$ is the number of RIS reflective elements and $L$ is the number of constraints. The computational complexity of the DDPG algorithm for UAV trajectory optimization is related to the dimension of the optimization variable, the number of hidden layer units $h$, the sampling size $I_s$, and the maximum capacity $C$ of the experience buffer. The number of neurons in the DDPG algorithm is defined as $2 \times (2 \times h \times 2 + 4 \times h \times 1) = 16h$, so the complexity of the algorithm is $O(16 h I_s C N_t E)$, where $N_t$ and $E$ are the number of time slots and the total number of iterations, respectively. Therefore, for the proposed overall algorithm framework, in which SA, SDP, and DDPG are jointly used to maximize the SSR, the complexity of the overall algorithm is $O\left( D_N \left( K N_s + I_M^{3.5} L + 16 h I_s C N_t E \right) \right)$, where $D_N$ is the number of iterations of the overall algorithm.
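
The operation counts above are easy to evaluate for concrete settings. The helper below is a hypothetical convenience, not part of the paper; it reads the neuron count as 2 × (2·h·2 + 4·h·1), which is the grouping that yields 16h.

```python
def ddpg_neuron_count(h):
    """Neuron count 2 * (2*h*2 + 4*h*1) = 16*h for h hidden units."""
    return 2 * (2 * h * 2 + 4 * h * 1)

def overall_cost(D_N, K, N_s, I_M, L, h, I_s, C, N_t, E):
    """Leading-order operation count D_N * (K*N_s + I_M^3.5 * L + 16*h*I_s*C*N_t*E)."""
    sa_cost = K * N_s
    sdp_cost = I_M ** 3.5 * L
    ddpg_cost = ddpg_neuron_count(h) * I_s * C * N_t * E
    return D_N * (sa_cost + sdp_cost + ddpg_cost)

# Toy evaluation with unit values for every parameter except K and N_s.
cost = overall_cost(D_N=1, K=4, N_s=10, I_M=1, L=1, h=1, I_s=1, C=1, N_t=1, E=1)
```

With realistic buffer capacities and time-slot counts, the DDPG term dominates the other two, so the trajectory subproblem is the main driver of the overall cost.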

4. Simulation Results

In this section, we evaluate the security performance of the proposed SSR maximization scheme through multiple simulations. For FSO links, the transmit antenna gain is $L_T = 10 \lg\left( \frac{\pi^2 D^2}{\lambda^2} \eta \right)$, where $D = 15$ cm is the transmit antenna diameter, the wavelength of light is 1550 nm, and the transmit antenna efficiency is $\eta = 0.8$. The receiving antenna gain is computed similarly, with a receiving antenna diameter of 20 cm and a receiving antenna efficiency of 0.85. The free space loss is $L_S = 20 \lg\left( \frac{4 \pi d}{\lambda} \right)$. The number of reflective elements is $M = 8$. The noise power when receiving the signal is $\sigma_D^2 = -60$ dBm. The parameters of Málaga fading are $\alpha = 2.296$ and $\beta = 2$, corresponding to strong turbulent weather conditions, and the remaining parameters are set to $P_L = 1.3265$ and $\theta_1 - \theta_2 = \frac{\pi}{2}$. This paper simulates the cell user communication scenario, where all ground users are distributed in a specified circular area at the coordinates [750.41, 1114.88], [782.66, 1526.13], [570.87, 1836.54], and [324.53, 1558.02]. The simulation parameter settings are summarized in Table 2.
The satellite's transmit power is set to 10 dB, ensuring reliable signal coverage in large-scale networks and multi-hop communication links while avoiding excessive interference and energy waste. The UAV's flight speed is 10 m/s, providing sufficient maneuverability for the communication relay mission while maintaining stable data transmission in a dynamic service environment.
To ensure fairness, the network models and parameter settings of the two comparison schemes (NOMA and SDMA) are exactly the same as those of the proposed scheme, including parameters such as power and user distribution. However, due to the different principles of multiple access technology, the objective functions of the NOMA and SDMA schemes are different from those of the proposed scheme. Specifically, the objective functions of the NOMA and SDMA schemes are defined as
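$$R^N = \left[ R_p^N - R_e^N \right]^+, \quad R^S = \left[ R_p^S - R_e^S \right]^+,$$
where
$$R_p^N = \log_2\left( 1 + \frac{\alpha_k \xi^2 P_s |G_{SH}|^2 |H_{A,k}|^2}{\xi^2 P_s |G_{SH}|^2 |H_{A,k}|^2 \sum_{i=k+1}^{K} \alpha_i + \sigma_D^2} \right),$$
$$R_e^N = \log_2\left( 1 + \frac{\alpha_k \xi^2 P_s |G_{SH}|^2 |H_{e,k}|^2}{\xi^2 P_s |G_{SH}|^2 |H_{e,k}|^2 \sum_{i=k+1}^{K} \alpha_i + \sigma_D^2} \right),$$
$$R_p^S = \log_2\left( 1 + \frac{\alpha_k \xi^2 P_s |G_{SH}|^2 |H_{A,k}|^2}{\xi^2 P_s |G_{SH}|^2 |H_{A,k}|^2 \sum_{i=1, i \ne k}^{K} \alpha_i + \sigma_D^2} \right),$$
$$R_e^S = \log_2\left( 1 + \frac{\alpha_k \xi^2 P_s |G_{SH}|^2 |H_{e,k}|^2}{\xi^2 P_s |G_{SH}|^2 |H_{e,k}|^2 \sum_{i=1, i \ne k}^{K} \alpha_i + \sigma_D^2} \right).$$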
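
The only difference between the two benchmark rate expressions is the set of interfering power coefficients, which the following sketch makes explicit. The lumped gains stand for $\xi^2 P_s |G_{SH}|^2 |H|^2$, users are indexed from 0, and all numerical values are hypothetical.

```python
import math

def rate(alpha_k, interference, gain, noise):
    """log2(1 + alpha_k * gain / (gain * interference + noise))."""
    return math.log2(1.0 + alpha_k * gain / (gain * interference + noise))

def secrecy_rate(alphas, k, gain_user, gain_eve, noise, scheme):
    """Secrecy rate [R_p - R_e]^+ of user k under the NOMA or SDMA benchmark."""
    if scheme == "NOMA":
        # After SIC, only users decoded later (i > k) remain as interference.
        interference = sum(alphas[k + 1:])
    elif scheme == "SDMA":
        # All other users' signals are treated as noise.
        interference = sum(alphas[:k] + alphas[k + 1:])
    else:
        raise ValueError("scheme must be 'NOMA' or 'SDMA'")
    legitimate = rate(alphas[k], interference, gain_user, noise)
    eavesdropper = rate(alphas[k], interference, gain_eve, noise)
    return max(legitimate - eavesdropper, 0.0)

# Last NOMA user sees no interference; the eavesdropper gain is zero here.
r_noma = secrecy_rate([0.5, 0.5], 1, gain_user=2.0, gain_eve=0.0, noise=1.0, scheme="NOMA")
# A stronger eavesdropper channel drives the secrecy rate to zero.
r_sdma = secrecy_rate([0.5, 0.5], 0, gain_user=1.0, gain_eve=3.0, noise=1.0, scheme="SDMA")
```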
Figure 4 shows how the SSR evolves with the number of iterations under the SDMA scheme, the NOMA scheme, and the proposed scheme. In the initial iteration stage, the secrecy performance of the three schemes increases rapidly, and the SSR tends to converge as the iteration proceeds. The secrecy performance of the proposed RSMA scheme is significantly better than that of the NOMA and SDMA schemes, with the SSR increasing by 39.7% and 286.7%, respectively. This advantage is attributed to the resource allocation strategy of RSMA, which balances full decoding of interference and treating interference as noise. Specifically, SDMA treats all interfering signals as noise during the decoding process, resulting in a low communication rate. Although NOMA can utilize SIC technology to achieve full decoding based on users' channel quality and improve communication efficiency, it is still limited by the channel status, and its resource scheduling space is limited in multi-user scenarios. In contrast, the RSMA scheme can flexibly allocate resources according to the quality of multi-user channels and effectively manage communication interference, maximizing the SSR while ensuring user communication quality, thereby effectively improving the security performance of the system.
Figure 5 illustrates the impact of the cooling rate of the SA algorithm on secrecy performance. The cooling rate determines how fast the temperature drops in each round of iteration, which affects the algorithm's search space range and the quality of the optimal solution. As can be seen from Figure 5, when the cooling rate is low (0.5 and 0.7), the algorithm shows a faster convergence speed. In particular, when the cooling rate is 0.5, the algorithm reaches a stable state after about 310 iterations, and the SSR value is about 16.451 bps/Hz. A lower cooling rate makes the temperature drop quickly, so the algorithm soon stops accepting solutions of poorer quality, which accelerates convergence but may cause it to fall into a local optimal solution too early. When the cooling rate is high, such as 0.9 and 0.99, the SSR values converge to 12.906 and 9.762 after 1000 iterations. A higher cooling rate causes the temperature to drop slowly, and the algorithm maintains randomness for a longer time during the search process. Although this effectively increases the diversity of the search and reduces the risk of falling into a local optimal solution, it also leads to a slower convergence speed and a higher overall computational cost. Therefore, the cooling rate of the SA algorithm should be reasonably selected to strike a balance between obtaining the global optimal solution and fast convergence.
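
The effect of the cooling rate on the temperature schedule can be quantified directly from the geometric update $T_{t+1} = ƛ\, T_t$. The sketch below counts the iterations needed for the temperature to fall below a threshold; the initial temperature of 1 and the threshold of $10^{-3}$ are hypothetical values chosen only for illustration.

```python
import math

def iterations_to_cool(t0, t_min, cooling):
    """Smallest n such that t0 * cooling**n <= t_min for 0 < cooling < 1."""
    return math.ceil(math.log(t_min / t0) / math.log(cooling))

# Number of iterations needed for each cooling rate considered in Figure 5.
schedule = {rate: iterations_to_cool(1.0, 1e-3, rate) for rate in (0.5, 0.7, 0.9, 0.99)}
```

Under these assumed values, a cooling rate of 0.5 reaches the threshold after 10 iterations, whereas 0.99 needs 688, which illustrates why the high cooling rates keep exploring for most of the 1000 iterations.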
Figure 6 shows the two-dimensional trajectory of the drone when the starting point is determined. The trajectory optimization design of the drone aims to maximize the secrecy performance. To fit the actual scenario, we evenly distribute the ground users in a circular area with a center of ( 700 , 1500 ) and a radius of 600 m. The black triangle and orange triangle in the figure represent the starting point and end point of the drone, respectively. By analyzing its trajectory, the drone tries to get close to the user area during flight while keeping a distance from the eavesdropper to enhance the quality of the legitimate link while reducing the quality of the eavesdropping channel to ensure information security. In particular, when passing through the area where the user group is located (dashed gray circle), the drone significantly adjusts its flight path to ensure a longer stay time, thereby improving the communication quality of the user. In addition, the drone’s trajectory avoids getting close to the red eavesdropper, which reduces the risk of the system being eavesdropped.
Figure 7 shows the drone’s trajectory when the starting and ending positions are determined. In Figure 7, the drone rises rapidly in the direction of the y-axis in the initial stage to obtain better signal propagation conditions and stay away from the eavesdropper, and then the drone adjusts its trajectory to cover all user nodes. Unlike Figure 6, since the starting and ending positions are determined, the drone must comprehensively consider the secrecy performance during the flight process, so it cannot fly directly to the user area. Therefore, the drone starts away from the eavesdropper (red square) and then gradually moves towards the direction close to the HAP and the ground user. This trajectory enables the drone to better act as a low-altitude relay, receiving the signal of the HAP and forwarding it to the ground user, thereby reducing eavesdropping and improving the user service quality. In addition, the sum of all time slot rewards of the optimization path is 714.37, corresponding to the maximum reward value in all iterations, which shows that the trajectory optimization of the drone balances the requirements of secrecy performance and service quality under the constraints of the given starting and ending points, and effectively improves the secrecy performance of the system.
Figure 8 shows how the reward value of the DDPG algorithm changes with the number of iterations in the proposed system model. The simulation results show that the reward value goes through initial fluctuation, exploration adjustment, and final convergence. Around the 20th episode, the reward fluctuates greatly, because the DDPG algorithm is still in the initial exploration period and tries different coordinate transformations to evaluate the SSR performance. After the 22nd iteration, the reward value drops significantly, indicating that the drone exceeded the boundary of the specified area during exploration and thus received a negative reward as a penalty. As the number of iterations increases, the reward gradually increases, indicating that the algorithm learns a more effective strategy. However, the reward value drops significantly again at the 41st iteration. This may be because the algorithm adjusts its strategy in a further exploration stage to seek a potentially better solution. After this strategy optimization, the reward value rises rapidly after the 80th iteration and eventually converges, indicating that the algorithm has found a better strategy.

5. Conclusions

This paper proposed a novel RIS-assisted downlink hybrid FSO/RF SAGIN, where the HAP and drone equipped with RIS dynamically adjust the signal propagation path, effectively reducing signal attenuation and enhancing RF link transmission. In addition, we adopted the Málaga fading and Nakagami-m models to model the transmission characteristics of FSO and RF channels under various weather conditions. To improve the security performance of SAGIN, this paper formulated a sum secrecy rate maximization problem involving three variables: power allocation, RIS phase shift, and drone trajectory. Specifically, the simulated annealing algorithm and semi-definite relaxation were employed to optimize the power allocation coefficients and RIS phase shift. Then, the designed DDPG algorithm was used to obtain the optimal drone trajectory, and finally, an alternating iterative framework was proposed for joint optimization. Simulation results demonstrated the superiority of the proposed scheme in enhancing security performance. To further enhance the system’s performance, future research could explore the integration of sensing and communication functionalities, enabling more efficient resource utilization and improved system adaptability. In addition, extending to multi-UAV networks could further optimize communication reliability and security by achieving collaborative decision-making and resource sharing among multiple UAVs.

Author Contributions

J.L.: conceptualization, methodology, software, and writing—original draft preparation. W.Y.: conceptualization, resources, writing—review and editing, supervision, and funding acquisition. T.L.: methodology, software, and writing—review and editing. L.L.: conceptualization, resources, and writing—review and editing. Y.J.: formal analysis and writing—review and editing. Y.H.: resources, funding acquisition, and writing—review and editing. D.W.: conceptualization, resources, writing—review and editing, supervision, project administration, and funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62271399, Grant 62031012, Grant 262222107, and Grant 62401230, in part by the Aviation scientific fund project under Grant 2024Z073051001, and in part by the National Key Research and Development Program of China under Grant 2024YFC2206804.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Mao, W.; Lu, Y.; Pan, G.; Ai, B. UAV-Assisted Communications in SAGIN-ISAC: Mobile User Tracking and Robust Beamforming. IEEE J. Sel. Areas Commun. 2025, 43, 186–200.
  2. Cao, X.; Yang, B.; Yuen, C.; Han, Z. HAP-Reserved Communications in Space-Air-Ground Integrated Networks. IEEE Trans. Veh. Technol. 2021, 70, 8286–8291.
  3. Kawamoto, Y.; Matsushita, A.; Verma, S.; Kato, N.; Kaneko, K.; Sata, A.; Hangai, M. HAPS-Based Interference Suppression Through Null Broadening with Directivity Control in Space-Air-Ground Integrated Networks. IEEE Trans. Veh. Technol. 2023, 72, 16098–16107.
  4. Qu, L.; Xu, G.; Zeng, Z.; Zhang, N.; Zhang, Q. UAV-Assisted RF/FSO Relay System for Space-Air-Ground Integrated Network: A Performance Analysis. IEEE Trans. Wirel. Commun. 2022, 21, 6211–6225.
  5. Lei, H.; Luo, H.; Park, K.H.; Ansari, I.S.; Lei, W.; Pan, G.; Alouini, M.S. On Secure Mixed RF-FSO Systems with TAS and Imperfect CSI. IEEE Trans. Commun. 2020, 68, 4461–4475.
  6. Li, M.; Hong, Y.; Zeng, C.; Song, Y.; Zhang, X. Investigation on the UAV-To-Satellite Optical Communication Systems. IEEE J. Sel. Areas Commun. 2018, 36, 2128–2138.
  7. Balti, E.; Guizani, M.; Hamdaoui, B.; Khalfi, B. Aggregate Hardware Impairments Over Mixed RF/FSO Relaying Systems with Outdated CSI. IEEE Trans. Commun. 2018, 66, 1110–1123.
  8. Wang, D.; Wang, Z.; Zhao, H.; Zhou, F.; Alfarraj, O.; Yang, W.; Mumtaz, S.; Leung, V.C.M. Secure Energy Efficiency for ARIS Networks with Deep Learning: Active Beamforming and Position Optimization. IEEE Trans. Wirel. Commun. 2025; to be published.
  9. Samy, R.; Yang, H.-C.; Rakia, T.; Alouini, M.-S. Hybrid SAG-FSO/SH-FSO/RF Transmission for Next-Generation Satellite Communication Systems. IEEE Trans. Veh. Technol. 2023, 72, 14255–14267.
  10. Nguyen, K.-T.; Vu, T.-H.; Shin, H.; Kim, S. Performance Analysis of Active RIS and Passive RIS-Aided MISO Systems Over Nakagami-m Fading Channel With Imperfect CSI. IEEE Trans. Veh. Technol. 2024, 74, 4334–4348.
  11. Wang, D.; Wu, M.; Wei, Z.; Yu, K.; Min, L.; Mumtaz, S. Uplink Secrecy Performance of RIS-Based RF/FSO Three-Dimension Heterogeneous Networks. IEEE Trans. Wirel. Commun. 2024, 23, 1798–1809.
  12. Tang, X.; Jiang, T.; Liu, J.; Li, B.; Zhai, D.; Yu, F.R.; Han, Z. Secure Communication with UAV-Enabled Aerial RIS: Learning Trajectory with Reflection Optimization. IEEE Trans. Intell. Veh. 2023; to be published.
  13. He, Y.; Huang, F.; Wang, D.; Zhang, R.; Gu, X.; Pan, J. NOMA-enhanced cooperative relaying systems in drone-enabled IoV: Capacity analysis and height optimization. IEEE Trans. Veh. Technol. 2024, 73, 19065–19079.
  14. Sun, Q.; Hu, Q.; Chen, X.; Yang, Y.; Dang, S.; Zhang, J. Performance Analysis of RIS-Aided FSO/RF Hybrid Satellite-Terrestrial Network with Imperfect CSI. IEEE Trans. Veh. Technol. 2024, 74, 2958–2972.
  15. Wu, M.; Guo, K.; Li, X.; Lin, Z.; Wu, Y.; Tsiftsis, T.A.; Song, H. Deep Reinforcement Learning-Based Energy Efficiency Optimization for RIS-Aided Integrated Satellite-Aerial-Terrestrial Relay Networks. IEEE Trans. Commun. 2024, 72, 4163–4178.
  16. Guo, K.; Wu, M.; Li, X.; Lin, Z.; Tsiftsis, T.A. Joint Trajectory and Beamforming Optimization for Federated DRL-Aided Space-Aerial-Terrestrial Relay Networks with RIS and RSMA. IEEE Trans. Wirel. Commun. 2024, 23, 18456–18471.
  17. Huang, Q.; Lin, M.; Zhu, W.-P.; Cheng, J.; Alouini, M.-S. Uplink Massive Access in Mixed RF/FSO Satellite-Aerial-Terrestrial Networks. IEEE Trans. Commun. 2021, 69, 2413–2426.
  18. Li, J.; Yang, L.; Wu, Q.; Lei, X.; Zhou, F.; Shu, F.; Mu, X.; Liu, Y.; Fan, P. Active RIS-Aided NOMA-Enabled Space-Air-Ground Integrated Networks with Cognitive Radio. IEEE J. Sel. Areas Commun. 2025, 43, 314–333.
  19. Zhang, W.; Chen, J.; Kuo, Y.; Zhou, Y. Artificial-Noise-Aided Optimal Beamforming in Layered Physical Layer Security. IEEE Commun. Lett. 2019, 23, 72–75.
  20. Wang, D.; Wang, Z.; Yu, K.; Wei, Z.; Zhao, H.; Al-Dhahir, N.; Guizani, M.; Leung, V.C. Active Aerial Reconfigurable Intelligent Surface Assisted Secure Communications: Integrating Sensing and Positioning. IEEE J. Sel. Areas Commun. 2024, 42, 2769–2785.
  21. He, Y.; Huang, F.; Wang, D.; Yang, L.; Zhang, R. Delay minimization for NOMA-MEC offloading in ABS-aided maritime communication networks. IEEE Trans. Veh. Technol. 2025; early access.
  22. Li, X.; Lu, Z.; Yuan, M.; Liu, W.; Wang, F.; Yu, Y.; Liu, P. Tradeoff of Code Estimation Error Rate and Terminal Gain in SCER Attack. IEEE Trans. Instrum. Meas. 2024, 73, 8504512.
  23. Zhang, M.; Zhang, Y.; Cen, Q.; Wu, S. Deep learning–based resource allocation for secure transmission in a non-orthogonal multiple access network. Int. J. Distrib. Sens. Netw. 2022, 18, 15501329221104330.
  24. Zhang, X.; Zhang, H.; Liu, L.; Han, Z.; Poor, H.V.; Di, B. Target Detection and Positioning Aided by Reconfigurable Surfaces: Reflective or Holographic? IEEE Trans. Wirel. Commun. 2024, 23, 19215–19230.
  25. Sun, G.; Sheng, L.; Luo, L.; Yu, H. Game Theoretic Approach for Multipriority Data Transmission in 5G Vehicular Networks. IEEE Trans. Intell. Transp. Syst. 2022, 23, 24672–24685.
  26. Wang, D.; Li, J.; Lv, Q.; He, Y.; Li, L.; Hua, Q.; Alfarraj, O.; Zhang, J. Integrating Reconfigurable Intelligent Surface and UAV for Enhanced Secure Transmissions in IoT-Enabled RSMA Networks. IEEE Internet Things J. 2024; to be published.
  27. Wang, D.; Yuan, L.; Zhao, H.; Min, L.; He, Y. Secure transmission of IRS-UAV buffer-aided relaying system with delay constraint. Chin. J. Aeronaut. 2024; in press.
  28. Lin, Z.; Xiao, Y.; Lu, X.; Wu, C.; Wu, W. RSMA-Assisted Distributed Computation Offloading in Vehicular Networks based on Stochastic Geometry. IEEE Trans. Veh. Technol. 2025; to be published.
  29. Nguyen, T.V.; Le, H.D.; Pham, A.T. On the Design of RIS–UAV Relay-Assisted Hybrid FSO/RF Satellite–Aerial–Ground Integrated Network. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 757–771.
Figure 1. The RIS-assisted downlink hybrid FSO/RF space–air–ground integrated network.
Figure 2. The comparison of three algorithms.
Figure 3. The framework of DDPG algorithm.
Figure 4. The comparison of SSR performance for three schemes.
Figure 5. The influence of different cooling rates on SSR in SA algorithm.
Figure 6. The two-dimensional trajectory of the drone (starting point is determined).
Figure 7. The flight trajectory of drone (starting point = ending point).
Figure 8. The trend of reward with the number of iterations in the DDPG algorithm.
Table 1. Comparison of the proposed system model with the existing architecture.
Satellite  HAP  UAV  RIS  Hybrid FSO/RF
[1] × ×
[2,3] ×××
[4]×× ×
[5,6,7]××××
[8] ××
[9,10]×××
[11]×
[12,13]×× ×
This paper
Table 2. Simulation Parameters.
| Parameter | Value |
| --- | --- |
| Number of ground users, K | 4 |
| Channel gain, χ | −30 dBm |
| Frequency | 2.4 GHz |
| Satellite’s transmit power, P_s | 10 dB |
| Transmit antenna gain, L_T | 38.5 dB |
| Receiving antenna gain, L_R | 42.7 dB |
| Path loss, L_S | 111.26 dB |
| Pointing error, g_p | 1 |
| Altitude of drone, H_R | 1000 m |
| Drone’s flight speed, ν | 10 m/s |
| Initial position of drone, q_0 | [2600, 2200] |
| Satellite’s location, Z_S | [1500, 1500, 600,000] |
| HAP’s location, Z_a | [1000, 2400, 14,000] |
| Eavesdropper’s location, Z_e | [2200, 1800, 0] |
| The flight area of drone, X Y | 3000 × 3000 m |
| Total time slots, T | 500 |
| Learning rate of actor network | 0.001 |
| Learning rate of critic network | 0.002 |
| Experience replay buffer capacity | 10,000 |
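For readers who want to reproduce the setup, the values in Table 2 can be collected into a single configuration object. The following is a minimal sketch rather than the authors' code: the class name, field names, and helper methods are hypothetical, all coordinates are assumed to be in metres, and units are kept exactly as listed in the table. As a sanity check, it derives the straight-line distances between the drone's starting point and the HAP and the eavesdropper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationParameters:
    """Values copied from Table 2; all names here are illustrative assumptions."""
    num_ground_users: int = 4                    # K
    channel_gain_dbm: float = -30.0              # chi, unit as listed in Table 2
    frequency_hz: float = 2.4e9                  # 2.4 GHz
    satellite_tx_power_db: float = 10.0          # P_s, unit as listed in Table 2
    tx_antenna_gain_db: float = 38.5             # L_T
    rx_antenna_gain_db: float = 42.7             # L_R
    path_loss_db: float = 111.26                 # L_S
    pointing_error: float = 1.0                  # g_p
    drone_altitude_m: float = 1000.0             # H_R
    drone_speed_mps: float = 10.0                # nu
    drone_start_xy: tuple = (2600.0, 2200.0)     # q_0
    satellite_location: tuple = (1500.0, 1500.0, 600_000.0)   # Z_S
    hap_location: tuple = (1000.0, 2400.0, 14_000.0)          # Z_a
    eavesdropper_location: tuple = (2200.0, 1800.0, 0.0)      # Z_e
    flight_area_m: tuple = (3000.0, 3000.0)      # drone flight area
    total_time_slots: int = 500                  # T
    actor_learning_rate: float = 0.001
    critic_learning_rate: float = 0.002
    replay_buffer_capacity: int = 10_000

    def drone_start_3d(self) -> tuple:
        """Initial drone position with the fixed altitude appended (assumed metres)."""
        return (*self.drone_start_xy, self.drone_altitude_m)

    def drone_start_in_area(self) -> bool:
        """Check that the starting point lies inside the flight area."""
        x, y = self.drone_start_xy
        return 0.0 <= x <= self.flight_area_m[0] and 0.0 <= y <= self.flight_area_m[1]


if __name__ == "__main__":
    params = SimulationParameters()
    start = params.drone_start_3d()
    # Euclidean distances derived purely from the coordinates in Table 2.
    print("drone to HAP [m]:", math.dist(start, params.hap_location))
    print("drone to eavesdropper [m]:", math.dist(start, params.eavesdropper_location))
    print("start inside flight area:", params.drone_start_in_area())
```

Freezing the dataclass keeps the tabulated values immutable during a run, so any parameter sweep (for example over the SA cooling rate studied in Figure 5) would create a new instance rather than mutate a shared one.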
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, J.; Yang, W.; Liu, T.; Li, L.; Jin, Y.; He, Y.; Wang, D. Secure Transmission for RIS-Assisted Downlink Hybrid FSO/RF SAGIN: Sum Secrecy Rate Maximization. Drones 2025, 9, 198. https://doi.org/10.3390/drones9030198

