Optimal Unmanned Combat System-of-Systems Reconstruction Strategy with Heterogeneous Cost via Deep Reinforcement Learning
Abstract
1. Introduction
- We treat the UCSoS as a heterogeneous combat network (HCN) and incorporate the cost of edge reconstruction into the UCSoS reconstruction problem. The HCN reconstruction problem with heterogeneous costs is formulated as a nonlinear optimization problem (a schematic form of this formulation is sketched after this list).
- A task-oriented UCSoS operational capability metric is proposed that jointly accounts for the coverage and balance of strikes against enemy targets and the operational capability of the force. The metric effectively describes the operational effectiveness of the UCSoS and provides the optimization objective for the reconstruction model.
- We propose an innovative approach to the UCSoS reconstruction problem, called SoS-Restorer, which uses a deep neural network and the actor-critic algorithm to generate the optimal solution in real time. The proposed method is well suited to dynamic combat scenarios that require real-time UCSoS reconstruction.
- Extensive experiments demonstrate that SoS-Restorer exhibits significant advantages over the benchmark algorithms in terms of solution quality and scalability, while finding optimal reconstruction strategies in a remarkably short time.
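The first bullet formulates reconstruction as a budget-constrained nonlinear optimization problem. The following is only a schematic sketch in illustrative notation (the symbols $G_f$, $E_c$, $\Phi$, $c_{ij}$, and $C_{\max}$ are assumptions, not the paper's exact formulation): the fragmented network $G_f$ is augmented with a subset $E'$ of candidate links $E_c$ so as to maximize the capability metric $\Phi$ while the heterogeneous link costs stay within the budget.

```latex
% Schematic budget-constrained reconstruction problem (illustrative notation)
\begin{aligned}
\max_{E' \subseteq E_c} \quad & \Phi\!\left(G_f \cup E'\right)
  && \text{(operational capability of the repaired HCN)} \\
\text{s.t.} \quad & \sum_{(i,j) \in E'} c_{ij} \le C_{\max}
  && \text{(heterogeneous link-reconstruction costs within the budget)}
\end{aligned}
```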
2. Related Work
2.1. UCSoS Reconstruction
2.2. Deep Reinforcement Learning in Combinatorial Optimization Problems
3. UCSoS Model
3.1. Heterogeneous Network Model of UCSoS
- Sensor entities (S): entities that carry out reconnaissance, detection, and early-warning assignments.
- Decider entities (D): entities that carry out command-and-control missions.
- Influence entities (I): entities that carry out precision strike, fire damage, and electronic interference functions.
- Target entities (T): enemy combat entities; enemy sensors, deciders, and influence entities can all be considered targets on the battlefield. (A minimal graph representation of these four entity classes is sketched after this list.)
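The sketch below illustrates one possible way to represent the four entity classes and their collaboration links as a typed graph. It is a minimal illustration under assumed names (`Node`, `HCN`, the `capability` field); it is not the data structure used in the paper.

```python
from dataclasses import dataclass, field

# Entity classes of the heterogeneous combat network (HCN):
# S = sensor, D = decider, I = influence, T = (enemy) target.
NODE_TYPES = {"S", "D", "I", "T"}

@dataclass
class Node:
    node_id: int
    node_type: str           # one of NODE_TYPES
    capability: float = 1.0  # operational capability of the entity

@dataclass
class HCN:
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: set = field(default_factory=set)     # {(u, v), ...} collaboration links

    def add_node(self, node: Node) -> None:
        assert node.node_type in NODE_TYPES
        self.nodes[node.node_id] = node

    def add_edge(self, u: int, v: int) -> None:
        # In a full model only links consistent with the operation loop
        # S -> D -> I -> T (plus intra-type coordination) would be allowed;
        # this check is deliberately permissive for illustration.
        assert u in self.nodes and v in self.nodes
        self.edges.add((u, v))
```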
3.2. Operational Capability Measurement of UCSoS
4. UCSoS Reconstruction Problem with Heterogeneous Costs
4.1. Problem Illustration
4.2. Problem Model
4.3. Complexity Analysis
5. The Design of SoS-Restorer
5.1. General Overview
5.1.1. Stage I: Preparation
5.1.2. Stage II: Selection
5.1.3. Stage III: Mapping
- Step 1: Initialize the trained parameters of the deep neural network, the fragmented UCSoS structure, and the total reconstruction budget.
- Step 2: Pair the nodes of the fragmented UCSoS pairwise to generate the candidate links.
- Step 3: Check the termination condition: the iteration stops when the sum of the costs of the selected links would exceed the total reconstruction budget.
- Step 4: Use the deep neural network to calculate the selection probability of each node.
- Step 5: Select the link with the highest probability using a greedy strategy.
- Step 6: If the termination condition is not met, return to Step 3. (A minimal code sketch of this selection loop is given below.)
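The following is a minimal sketch of the Stage III selection loop (Steps 1–6). The trained policy network is abstracted into a `score_links` callable that returns a selection probability for each remaining candidate link; `score_links`, `link_cost`, and the surrounding data structures are illustrative assumptions, not the paper's implementation.

```python
def reconstruct(fragmented_hcn, candidate_links, link_cost, budget, score_links):
    """Greedy, budget-constrained link selection (sketch of Steps 1-6).

    score_links(hcn, remaining) stands in for a forward pass of the trained
    policy network and returns a selection probability per remaining link.
    """
    selected, spent = [], 0.0
    remaining = list(candidate_links)                   # Step 2: pairwise candidate links
    while remaining:
        probs = score_links(fragmented_hcn, remaining)  # Step 4: selection probabilities
        best = max(range(len(remaining)), key=lambda k: probs[k])  # Step 5: greedy choice
        link = remaining.pop(best)
        if spent + link_cost[link] > budget:            # Step 3: budget termination
            break
        fragmented_hcn.add_edge(*link)
        selected.append(link)
        spent += link_cost[link]                        # Step 6: iterate until the budget is hit
    return selected
```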
5.2. The Neural Network Architecture Model
5.2.1. The Encoder
5.2.2. The Decoder
5.2.3. The Attention
Algorithm 1: Processing procedure of the SoS-Restorer.
5.3. Training Procedure
Algorithm 2: Training process in the AC framework.
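The sketch below shows a generic actor-critic update of the kind referenced by Algorithm 2: a policy-gradient step with a learned critic baseline, in the spirit of REINFORCE with a critic [36]. The module interfaces (`actor`, `critic`, `reward_fn`) and hyperparameter handling are placeholders and do not reproduce the paper's exact Algorithm 2.

```python
import torch

def ac_train_step(actor, critic, opt_actor, opt_critic, instances, reward_fn):
    """One actor-critic update (sketch). `actor(x)` is assumed to return
    (solutions, log_probs), `critic(x)` a per-instance baseline value, and
    `reward_fn` the post-reconstruction operational capability."""
    solutions, log_probs = actor(instances)          # sample reconstruction strategies
    with torch.no_grad():
        reward = reward_fn(instances, solutions)     # capability achieved by each strategy
    baseline = critic(instances).squeeze(-1)         # critic's value estimate

    advantage = reward - baseline.detach()
    actor_loss = -(advantage * log_probs).mean()     # policy-gradient loss
    critic_loss = torch.nn.functional.mse_loss(baseline, reward)

    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()
    return actor_loss.item(), critic_loss.item()
```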
6. Performance Evaluation
6.1. Simulation Setup
6.1.1. Dataset
6.1.2. Hyperparameter Setting
6.1.3. Device Configuration
6.2. Benchmarks
- High Capability First (HCF): Sort the nodes in descending order of capability and prioritize reconstructing the collaboration relationships between the nodes with the highest capability.
- High Degree First (HDF): Sort the nodes in descending order of degree and prioritize reconstructing the collaboration relationships between the nodes with the highest degree.
- High Degree Adaptive (HDA): Sort the nodes in descending order of degree, reconstruct the collaboration relationships between the nodes with the highest degrees, and then recalculate the degree of each node. Repeat these steps until the termination condition is met (a sketch of this adaptive procedure follows the list).
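A minimal sketch of the HDA baseline is given below. The graph representation, cost dictionary, and budget handling are illustrative assumptions; the essential point is that degrees are recomputed after every repaired link.

```python
def hda_reconstruct(nodes, edges, candidate_links, link_cost, budget):
    """High Degree Adaptive (HDA) baseline (sketch): repeatedly rebuild the
    candidate link joining the currently highest-degree endpoints, then
    recompute degrees, until the reconstruction budget is exhausted."""
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    remaining, spent, selected = set(candidate_links), 0.0, []
    while remaining:
        # Affordable candidates whose endpoints have the largest total degree.
        affordable = [l for l in remaining if spent + link_cost[l] <= budget]
        if not affordable:
            break
        u, v = max(affordable, key=lambda l: degree[l[0]] + degree[l[1]])
        remaining.discard((u, v))
        selected.append((u, v))
        spent += link_cost[(u, v)]
        degree[u] += 1                 # adaptive step: update degrees after each repair
        degree[v] += 1
    return selected
```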
6.3. Performance Results
6.3.1. Solution Quality and Speed
6.3.2. Generalization Ability
6.4. Summary of the Results
- Strong generalization ability: Once the network parameters of the proposed model have been trained, the model can be applied to new problem instances without retraining. In our experiments, our method accurately finds the optimal reconstruction strategy regardless of whether the reconstruction cost is homogeneous (p = 0) or heterogeneous (p = 1) and regardless of the size of the problem instances.
- Achieving a good balance between solution speed and solution quality: Another advantage of SoS-Restorer is that a solution can be obtained directly through a single forward pass of the trained deep neural network. Therefore, a reconstruction solution can always be found within a reasonable time while maintaining solution quality.
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Sapaty, P.S. Mosaic warfare: From philosophy to model to solutions. Int. J. Robot. Autom. 2019, 2019, 157–166.
- Clark, B.; Patt, D.; Schramm, H. Mosaic Warfare: Exploiting Artificial Intelligence and Autonomous Systems to Implement Decision-Centric Operations; Center for Strategic and Budgetary Assessments (CSBA): Washington, DC, USA, 2020.
- Zhang, Y.; Guo, Q.-S.; Fan, Y.-P. Research on Operational Effectiveness Test Evaluation Method of Ground Unmanned Combat System Based on Capability. Fire Control Command Control 2021, 1633, 182–187.
- Zhong, Y.; Yao, P.; Zhang, J.; Wan, L. Formation and adjustment of manned/unmanned combat aerial vehicle cooperative engagement system. J. Syst. Eng. Electron. 2018, 29, 756–767.
- Wang, Z.; Guo, Y.; Li, N.; Yuan, H.; Hu, S.; Lei, B.; Wei, J. Autonomous confrontation strategy learning evolution mechanism of unmanned system group under actual combat in the loop. Comput. Commun. 2023, 209, 283–301.
- Zhu, X.; Zhu, X.; Yan, R.; Peng, R. Optimal routing, aborting and hitting strategies of UAVs executing hitting the targets considering the defense range of targets. Reliab. Eng. Syst. Saf. 2021, 215, 107811.
- Madni, A.M.; Sievers, M.; Erwin, D. Formal and Probabilistic Modeling in Design of Resilient Systems and System-of-Systems. In Proceedings of the AIAA Scitech 2019 Forum, San Diego, CA, USA, 7–11 January 2019.
- Fan, D.; Sun, B.; Dui, H.; Zhong, J.; Wang, Z.; Ren, Y.; Wang, Z. A modified connectivity link addition strategy to improve the resilience of multiplex networks against attacks. Reliab. Eng. Syst. Saf. 2022, 221, 108294.
- Chen, Z.; Zhou, Z.; Zhang, L.; Cui, C.; Zhong, J. Mission reliability modeling and evaluation for reconfigurable unmanned weapon system-of-systems based on effective operation loop. J. Syst. Eng. Electron. 2023, 34, 588–597.
- Sun, Y.; Zhang, T. Research on Autonomous Reconstruction Method for Dependent Combat Networks. IEEE Syst. J. 2023, 17, 1–10.
- Sun, Q.; Li, H.; Zhong, Y.; Ren, K.; Zhang, Y. Deep reinforcement learning-based resilience enhancement strategy of unmanned weapon system-of-systems under inevitable interferences. Reliab. Eng. Syst. Saf. 2023, 242, 109749.
- Sun, Q.; Li, H.; Wang, Y.; Zhang, Y. Multi-swarm-based cooperative reconfiguration model for resilient unmanned weapon system-of-systems. Reliab. Eng. Syst. Saf. 2022, 222, 108426.
- Raman, R.; D’Souza, M. Decision learning framework for architecture design decisions of complex systems and system-of-systems. Syst. Eng. 2019, 538–560.
- Fang, Z. System-of-Systems Architecture Selection: A Survey of Issues, Methods, and Opportunities. IEEE Syst. J. 2022, 16, 4768–4779.
- Davendralingam, N.; DeLaurentis, D.A. A Robust Portfolio Optimization Approach to System of Systems Architectures. Syst. Eng. 2015, 18, 269–283.
- Lin, M.; Chen, T.; Chen, H.; Ren, B.; Zhang, M. When architecture meets AI: A deep reinforcement learning approach for system of systems design. Adv. Eng. Inform. 2023, 56, 101965.
- Wang, Q.; Lai, K.H.; Tang, C. Solving combinatorial optimization problems over graphs with BERT-based Deep Reinforcement Learning. Inf. Sci. 2023, 619, 930–946.
- Yu, J.J.Q.; Yu, W.; Gu, J. Online Vehicle Routing with Neural Combinatorial Optimization and Deep Reinforcement Learning. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3806–3817.
- Li, J.; Jiang, J.; Yang, K.; Chen, Y. Research on Functional Robustness of Heterogeneous Combat Networks. IEEE Syst. J. 2018, 13, 1487–1495.
- Li, J.; Zhao, D.; Ge, B.; Jiang, J.; Yang, K. Disintegration of Operational Capability of Heterogeneous Combat Networks Under Incomplete Information. IEEE Trans. Syst. Man Cybern. Syst. 2018, 50, 5172–5179.
- Li, J.; Zhao, D.; Jiang, J.; Yang, K.; Chen, Y. Capability Oriented Equipment Contribution Analysis in Temporal Combat Networks. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 696–704.
- Zhang, J.; Lv, H.; Hou, J. A novel general model for RAP and RRAP optimization of k-out-of-n:G systems with mixed redundancy strategy. Reliab. Eng. Syst. Saf. 2023, 229, 108843.
- Levitin, G.; Xing, L.; Dai, Y. Optimizing partial component activation policy in multi-attempt missions. Reliab. Eng. Syst. Saf. 2023, 235, 109251.
- Peiravi, A.; Nourelfath, M.; Zanjani, M.K. Universal redundancy strategy for system reliability optimization. Reliab. Eng. Syst. Saf. 2022, 225, 108576.
- Ordoukhanian, E.; Madni, A. Model-Based Approach to Engineering Resilience in Multi-UAV Systems. Systems 2019, 7, 11.
- Zhong, Y.; Li, H.; Sun, Q.; Huang, Z.; Zhang, Y. A kill chain optimization method for improving the resilience of unmanned combat system-of-systems. Chaos Solitons Fractals 2024, 181, 114685.
- Papadimitriou, C.H.; Steiglitz, K. Combinatorial Optimization: Algorithms and Complexity; Dover Publications, Inc.: Mineola, NY, USA, 1998.
- Dorigo, M.; Gambardella, L. Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Trans. Evol. Comput. 1997, 1, 53–66.
- Horowitz, E.; Sahni, S. Computing Partitions with Applications to the Knapsack Problem; Cornell University: Ithaca, NY, USA, 1972.
- Yuan, E.; Wang, L.; Cheng, S.; Song, S.; Fan, W.; Li, Y. Solving flexible job shop scheduling problems via deep reinforcement learning. Expert Syst. Appl. 2024, 245, 123019.
- Marinescu, R.; Dechter, R. AND/OR Branch-and-Bound search for combinatorial optimization in graphical models. Artif. Intell. 2009, 173, 1457–1491.
- Rabiner, L. Combinatorial optimization: Algorithms and complexity. IEEE Trans. Acoust. Speech Signal Process. 2003, 32, 1258–1259.
- Li, K.; Zhang, T.; Wang, R.; Wang, Y.; Han, Y.; Wang, L. Deep Reinforcement Learning for Combinatorial Optimization: Covering Salesman Problems. IEEE Trans. Cybern. 2022, 52, 13142–13155.
- Hopfield, J.J.; Tank, D.W. Neural computation of decisions in optimization problems. Biol. Cybern. 1985, 52, 141–152.
- Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer networks. In Proceedings of the International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015.
- Bello, I.; Pham, H.; Le, Q.V.; Norouzi, M.; Bengio, S. Neural combinatorial optimization with reinforcement learning. arXiv 2016, arXiv:1611.09940.
- Dai, H.; Khalil, E.B.; Zhang, Y.; Dilkina, B.; Song, L. Learning Combinatorial Optimization Algorithms over Graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6348–6358.
- Li, Z.; Chen, Q.; Koltun, V. Combinatorial optimization with graph convolutional networks and guided tree search. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS’18), Montréal, QC, Canada, 2–8 December 2018.
- Chen, W.; Li, J.; Jiang, J. Heterogeneous Combat Network Link Prediction Based on Representation Learning. IEEE Syst. J. 2021, 15, 4069–4077.
- Sun, Y.; Han, J.; Yan, X.; Yu, P.S. PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks. Proc. VLDB Endow. 2011, 4, 992–1003.
- Cares, J.R. An Information Age Combat Model; Technical Report; Produced for the United States Office of the Secretary of Defense: Arlington, VA, USA, 2004.
- Pan, X.; Wang, H.; Yang, Y.; Zhang, G. Resilience based importance measure analysis for SoS. J. Syst. Eng. Electron. 2019, 30, 920–930.
- Agnetis, A.; Mirchandani, P.B.; Pacifici, A. Scheduling Problems with Two Competing Agents. Oper. Res. 2004, 52, 229–242.
- Singh, R.; Gupta, A.; Shroff, N.B. Learning in Constrained Markov Decision Processes. IEEE Trans. Control Netw. Syst. 2023, 10, 441–453.
- Zhan, W.; Luo, C.; Wang, J. Deep-Reinforcement-Learning-Based Offloading Scheduling for Vehicular Edge Computing. IEEE Internet Things J. 2020, 7, 5449–5465.
- Bahdanau, D.; Brakel, P.; Xu, K.; Goyal, A.; Lowe, R.; Pineau, J.; Courville, A.; Bengio, Y. An Actor-Critic Algorithm for Sequence Prediction. arXiv 2016, arXiv:1607.07086.
- Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS’14), Montreal, QC, Canada, 8–13 December 2014.
- Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 1992, 8, 229–256.
- Hu, B.; Li, F.; Zhou, H. Robustness of Complex Networks under Attack and Repair. Chin. Phys. Lett. 2009, 26, 128901.
- Hu, B.; Li, F. Repair strategies of scale-free networks under multifold attack strategies. Syst. Eng. Electron. 2010, 32, 43–47.
Reference | Research Content | Research Deficiencies
---|---|---
References [19,20,21] | Evaluate and disintegrate the operational capabilities of the UCSoS | No mention of how to restore the operational capabilities of the UCSoS
References [22,23,24] | Adopt a redundancy strategy to restore the efficiency of the failed SoS | Building redundancy is too expensive for a UCSoS
References [9,10,26] | Reconstruct the collaborative relationships between surviving combat entities to restore the operational capabilities of the UCSoS | Ignore the heterogeneity of collaboration costs between different entities
Reference | Model | Training Method | Problems Solved
---|---|---|---
Reference [35] | Pointer network | Supervised training | TSP
Reference [36] | Pointer network | REINFORCE with critic baseline | TSP and knapsack
Reference [37] | Structure2vec | DQN | MVC
Reference [38] | GCN | Guided tree search | MVC and MIS
Algorithm | 20 Nodes | 40 Nodes | 60 Nodes | 80 Nodes
---|---|---|---|---
SoS-Restorer | 0.2988 | 0.1910 | 0.1814 | 0.2103
HCF | 0.1539 | 0.1483 | 0.1620 | 0.1793
HDF | 0.1579 | 0.1186 | 0.1183 | 0.1246
HDA | 0.1068 | 0.0737 | 0.1032 | 0.1141