Study on Consulting Air Combat Simulation of Cluster UAV Based on Mixed Parallel Computing Framework of Graphics Processing Unit
1. Introduction
2. Related Work
3. Best Solution to Consulting Air Combat of Clusters
- The grouping principle of converting large-scale air combat into fleet operations;
- Optimize the target of fleet attack using negotiation theory;
- Optimize in-team marshalling by negotiation theory;
- The role of individuals in the fleet;
- Individual air combat within the fleet, using game theory to find the best chase/escape strategy.
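As an illustration of the last step, the sketch below picks a chase/escape strategy by the maximin rule, a common game-theoretic choice. It is not taken from the paper: the payoff matrix, its contents, and the name `maximinStrategy` are assumptions made for the example. Entry `payoff[i][j]` is assumed to hold our superiority value when we fly strategy `i` and the opponent flies strategy `j`.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: returns the row (our strategy) whose worst-case
// payoff over all opponent strategies is largest (maximin rule).
std::size_t maximinStrategy(const std::vector<std::vector<double>>& payoff) {
    assert(!payoff.empty());
    std::size_t best = 0;
    double bestWorst = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < payoff.size(); ++i) {
        // Worst case for strategy i: the opponent picks the column that
        // minimizes our superiority value.
        double worst = std::numeric_limits<double>::infinity();
        for (double value : payoff[i])
            if (value < worst) worst = value;
        if (worst > bestWorst) {
            bestWorst = worst;
            best = i;
        }
    }
    return best;
}
```

With the seven maneuver strategies of the strategy table, the matrix would be 7 × 7; the function itself works for any non-empty matrix.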
3.1. Cluster Air Combat Turned into Multiple Fleet Operations
3.2. Fleet Negotiation
- (1) Offensive and assist roles in air combat: in a joint strike, each UAV plays a different role in the combat group. The UAV with the better (relatively dominant) superiority value becomes the offensive UAV, and the rest act as assistants. Because the main attacker has the better value, the enemy has less chance of winning against it. In contrast, the enemy has a greater chance of winning against an assisting UAV, which has a lower advantage. To improve its own superiority, the enemy should therefore engage the assisting UAV with its best strategy. From our point of view, the assisting UAV has at this point contained the threat of the enemy UAV. The main attacker can therefore ignore the threat posed to it and attack the enemy aircraft boldly with a single-player tactic. Such a tactic would be too daring or impossible in one-on-one air combat, and it increases the kill rate as a whole. The main attacker and assistant roles are determined by the value in (22): the UAV with the largest value is the main attacker, and the rest are assistants. The assisting UAVs use (25) to find their best decision, while the main attacker looks for the strategy that minimizes the value in (23). The flow chart for determining the main attacker and assistant roles is shown in Figure 4.
- (2) Situational assessment: this is determined mainly by the superiority value of the UAV. If the superiority value is favorable, the UAV takes the role of attacker; otherwise it is the evading side. If only the sum of the advantage values were required to be the largest, the UAVs would focus solely on attack. When a UAV is at a disadvantage and may be shot down, its advantage value is very small, and instead of focusing on the UAV that threatens it most, it would attack another UAV to increase its advantage value. This unreasonable behavior needs to be corrected by checking the superiority value. The flow chart for situational assessment is shown in Figure 4. If the superiority value of a UAV shows that it is at a disadvantage, it should escape directly. If it is in a very disadvantageous position, the other aircraft help it escape by pinning down the enemy, which increases its chances of escape.
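The two rules above can be sketched together as follows. This is a minimal illustration and not the paper's implementation: the superiority values stand in for the quantity in (22), and `escapeThreshold`, `Role`, and `assignRoles` are hypothetical names introduced for the example. A UAV whose value falls below the threshold evades; among the remaining UAVs, the one with the largest value is the main attacker and the others assist.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Role { MainAttacker, Assistant, Evader };

// superiority[i]: superiority value of friendly UAV i against the target.
// escapeThreshold: assumed cut-off below which a UAV should escape.
std::vector<Role> assignRoles(const std::vector<double>& superiority,
                              double escapeThreshold) {
    assert(!superiority.empty());

    // Find the UAV with the largest superiority value.
    std::size_t leader = 0;
    for (std::size_t i = 1; i < superiority.size(); ++i)
        if (superiority[i] > superiority[leader]) leader = i;

    // Situational assessment: disadvantaged UAVs evade, the rest assist.
    std::vector<Role> roles(superiority.size(), Role::Assistant);
    for (std::size_t i = 0; i < superiority.size(); ++i)
        if (superiority[i] < escapeThreshold) roles[i] = Role::Evader;

    // The leader attacks only if it is not itself disadvantaged.
    if (superiority[leader] >= escapeThreshold)
        roles[leader] = Role::MainAttacker;
    return roles;
}
```

For example, values `{0.7, 0.4, 0.1}` with a threshold of `0.2` give one main attacker, one assistant, and one evader, in that order.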
4. Using MATLAB/CUDA to Accelerate the Best Solution
```cuda
extern "C" void g_kel(float* input, float &output, int &ind, int num);

// Keep the larger of a and b in a; carry the matching index from d into c.
__device__ void cuMAX(float& a, float& b, int& c, int& d)
{
    if (b > a) { a = b; c = d; }
}

// Stage 1: every block reduces a strided slice of indata to one
// (maximum, index) pair. Must be launched with 512 threads per block.
__global__ void cuscanmax1(float* indata, float* outdata, int* index, int n)
{
    int t = threadIdx.x;
    int b = blockIdx.x;
    int bdim = blockDim.x;
    int gdim = gridDim.x;
    int m = bdim * gdim;           // total number of threads in the grid
    float stand = -99999999.0f;    // sentinel below any expected input

    __shared__ float w[512];       // per-thread running maximum
    __shared__ int s[512];         // index of that maximum
    w[t] = stand;
    s[t] = 0;
    __syncthreads();

    // Grid-stride scan: each thread keeps the maximum of its own elements.
    for (int k = bdim * b + t; k < n; k += m)
        cuMAX(w[t], indata[k], s[t], k);
    __syncthreads();

    // Tree reduction in shared memory (256, 128, ..., 1). A barrier follows
    // every step, so the last warp-sized steps do not rely on implicit
    // warp-synchronous execution.
    for (int stride = 256; stride > 0; stride >>= 1) {
        if (t < stride)
            cuMAX(w[t], w[t + stride], s[t], s[t + stride]);
        __syncthreads();
    }

    if (t == 0) { outdata[b] = w[0]; index[b] = s[0]; }
}

// Stage 2: one block of 512 threads reduces the 512 partial results in place.
__global__ void cuscanmax2(float* indata, float* outdata, int* index, int n)
{
    int t = threadIdx.x;
    for (int stride = 256; stride > 0; stride >>= 1) {
        if (t < stride)
            cuMAX(indata[t], indata[t + stride], index[t], index[t + stride]);
        __syncthreads();
    }
    if (t == 0) outdata[0] = indata[0];   // index[0] already holds the result
}

// Host entry point: returns the maximum of input[0..num-1] and its 0-based index.
void g_kel(float* input, float &output, int &ind, int num)
{
    float *cudainput, *cudaoutput;
    int *cudaindex;
    int sizedata = sizeof(float) * num;

    cudaMalloc((void**)&cudainput, sizedata);
    cudaMalloc((void**)&cudaoutput, 512 * sizeof(float));  // one slot per block
    cudaMalloc((void**)&cudaindex, 512 * sizeof(int));
    cudaMemcpy(cudainput, input, sizedata, cudaMemcpyHostToDevice);

    cuscanmax1<<< 512, 512 >>>(cudainput, cudaoutput, cudaindex, num);
    cuscanmax2<<< 1, 512 >>>(cudaoutput, cudaoutput, cudaindex, 512);

    cudaMemcpy(&output, cudaoutput, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(&ind, cudaindex, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(cudainput);
    cudaFree(cudaoutput);
    cudaFree(cudaindex);
}
```
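The two kernels `cuscanmax1` and `cuscanmax2` follow a two-stage pattern: independent workers first scan strided slices of the input, and the partial results are then merged pairwise. The sketch below emulates that pattern sequentially on the host, which may be useful as a reference when checking the GPU result. It is an illustrative assumption, not part of the paper's code; `twoStageArgmax` and the `workers` parameter are hypothetical names, and the sentinel value is copied from the kernel listing.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Host-side emulation of a two-stage maximum search with index tracking.
// workers must be a power of two; inputs are assumed to exceed the sentinel.
std::pair<float, int> twoStageArgmax(const std::vector<float>& data,
                                     std::size_t workers) {
    assert(!data.empty());
    assert(workers > 0 && (workers & (workers - 1)) == 0);

    const float sentinel = -99999999.0f;
    std::vector<float> best(workers, sentinel);
    std::vector<int> bestIdx(workers, 0);

    // Stage 1: worker w scans elements w, w + workers, w + 2 * workers, ...
    for (std::size_t w = 0; w < workers; ++w)
        for (std::size_t k = w; k < data.size(); k += workers)
            if (data[k] > best[w]) {
                best[w] = data[k];
                bestIdx[w] = static_cast<int>(k);
            }

    // Stage 2: pairwise tree reduction of the partial results.
    for (std::size_t stride = workers / 2; stride > 0; stride /= 2)
        for (std::size_t t = 0; t < stride; ++t)
            if (best[t + stride] > best[t]) {
                best[t] = best[t + stride];
                bestIdx[t] = bestIdx[t + stride];
            }

    return {best[0], bestIdx[0]};  // maximum and its 0-based index
}
```

On the GPU, each inner loop iteration of stage 2 runs in a separate thread, with a barrier between successive strides.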
```cpp
#include "mex.h"
#include <omp.h>

extern "C" void g_kel(float* input, float &output, int &ind, int num);

// Widen a float array to double (element-wise, OpenMP-parallel).
void FloatToDouble(double *data_D, float *data_F, int size_N)
{
    #pragma omp parallel for
    for (int k = 0; k < size_N; k++)
        data_D[k] = (double) data_F[k];
}

// Narrow a double array to float (element-wise, OpenMP-parallel).
void DoubleToFloat(float *data_F, double *data_D, int size_N)
{
    #pragma omp parallel for
    for (int k = 0; k < size_N; k++)
        data_F[k] = (float) data_D[k];
}

// MATLAB gateway: [maxValue, maxIndex] = f(rowVector)
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    int n1, datasize, index;
    double *mA_D, *mB_D, *mC_D;
    float *mA_F, oub;

    // Argument checks.
    if (nlhs > 2) mexErrMsgTxt("Only two return values.");
    if (nrhs != 1) mexErrMsgTxt("Require 1 input vector.");
    if (!mxIsDouble(prhs[0])) mexErrMsgTxt("Only for double input.");  // mxGetPr needs double data
    if (mxIsComplex(prhs[0])) mexErrMsgTxt("Not for complex value.");
    if (mxGetM(prhs[0]) != 1) mexErrMsgTxt("Only for row vector.");

    n1 = (int) mxGetN(prhs[0]);
    datasize = sizeof(float) * n1;

    plhs[0] = mxCreateDoubleMatrix(1, 1, mxREAL);   // maximum value
    plhs[1] = mxCreateDoubleMatrix(1, 1, mxREAL);   // index of the maximum
    mA_D = mxGetPr(prhs[0]);
    mB_D = mxGetPr(plhs[0]);
    mC_D = mxGetPr(plhs[1]);

    // MATLAB data is double precision; the GPU kernel works on floats.
    mA_F = (float *) mxMalloc(datasize);
    DoubleToFloat(mA_F, mA_D, n1);

    g_kel(mA_F, oub, index, n1);

    mB_D[0] = (float) oub;
    mC_D[0] = (int) index + 1;   // convert the 0-based index to MATLAB's 1-based index

    mxFree(mA_F);
}
```
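One consequence of the `DoubleToFloat` step is worth keeping in mind: two values that differ in double precision can become equal after narrowing to single precision, so the index returned for the maximum may differ from a pure double-precision search. The sketch below illustrates this under simple assumptions; `argmaxDouble` and `argmaxAsFloat` are hypothetical helper names, both return 0-based indices, and ties go to the first occurrence.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the maximum, compared in double precision.
std::size_t argmaxDouble(const std::vector<double>& v) {
    assert(!v.empty());
    std::size_t best = 0;
    for (std::size_t k = 1; k < v.size(); ++k)
        if (v[k] > v[best]) best = k;
    return best;
}

// Index of the maximum after narrowing every element to float first,
// as the MEX wrapper does before calling the GPU routine.
std::size_t argmaxAsFloat(const std::vector<double>& v) {
    assert(!v.empty());
    std::size_t best = 0;
    float bestValue = static_cast<float>(v[0]);
    for (std::size_t k = 1; k < v.size(); ++k) {
        float value = static_cast<float>(v[k]);
        if (value > bestValue) {
            bestValue = value;
            best = k;
        }
    }
    return best;
}
```

For the input `{1.0, 1.0 + 1e-9}`, the double-precision search selects the second element, while the narrowed search sees two equal values and keeps the first.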
5. Simulation Results
5.1. Consulting Air Combat Simulation of 2 × 2
5.2. Performance Evaluation Using Decentralized Calculations
5.3. Performance Evaluation Using GPGPU
5.4. Ultimate Performance Ratio of CPU/GPGPU
5.5. Performance Comparison of Single Core with Integrated Parallelization
- (1) The number of clusters is too small to make full use of the computing power of the GPGPU;
- (2) In every iteration of the simulation loop, data must be transferred from the memory on the motherboard to the memory on the GPGPU, which wastes a lot of time;
- (3) Current GPGPU hardware has more cores for single-precision floating-point operations than for double-precision ones. The GTX285 has 240 single-precision cores but only 30 double-precision cores, one-eighth as many. Data from the MATLAB environment, which is double precision, must therefore be converted to single precision, and this conversion takes considerable additional time;
- (4) Parallel computing is at a disadvantage when determining a maximum value. In the single-core algorithm, the zeroth element serves as the basis for comparison with the other elements, so each comparison only needs to read one other element and write the result to the zeroth element: one read and one write. In the parallel framework, however, comparing two data items takes two reads and one write. The parallel algorithm is therefore inherently more costly per comparison than the single-core one.
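The accounting in item (4) can be made concrete with a small counting sketch. It follows the cost model stated in the text and is only an illustration: `MaxCost`, `sequentialMax`, and `treeMax` are hypothetical names, the zeroth element is treated as the running maximum whose own read is not counted, and the tree version assumes a power-of-two input length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct MaxCost {
    float maxValue;
    std::size_t reads;   // elements read from the array
    std::size_t writes;  // elements written to the array
};

// Single-core scan: element 0 is the running maximum. Each step reads one
// other element and writes element 0 only when a larger value is found.
MaxCost sequentialMax(std::vector<float> v) {
    assert(!v.empty());
    MaxCost c{v[0], 0, 0};
    for (std::size_t k = 1; k < v.size(); ++k) {
        ++c.reads;
        if (v[k] > v[0]) {
            v[0] = v[k];
            ++c.writes;
        }
    }
    c.maxValue = v[0];
    return c;
}

// Pairwise tree reduction, executed sequentially here. Each comparison is
// counted as in the text: two reads and one write.
MaxCost treeMax(std::vector<float> v) {
    assert(!v.empty() && (v.size() & (v.size() - 1)) == 0);
    MaxCost c{0.0f, 0, 0};
    for (std::size_t stride = v.size() / 2; stride > 0; stride /= 2)
        for (std::size_t t = 0; t < stride; ++t) {
            c.reads += 2;
            v[t] = std::max(v[t], v[t + stride]);
            ++c.writes;
        }
    c.maxValue = v[0];
    return c;
}
```

For eight elements, the scan performs 7 reads, while the tree performs 7 comparisons and therefore 14 reads and 7 writes. The tree only pays off when its comparisons actually run concurrently on enough data.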
6. Conclusions
Strategy | Strategy 1 | Strategy 2 | Strategy 3 | Strategy 4 | Strategy 5 | Strategy 6 | Strategy 7 |
Command Values | Max Load Factor Left Turn | Max Long Acceleration | Steady Flight | Max Long Deceleration | Max Load Factor Right Turn | Max Load Factor Pull Up | Max Load Factor Push Over |
(g) | 0 | 1.5 | 0 | −1.5 | 0 | 0 | 0 |
(g) | 3 | 0 | 0 | 0 | 3 | 3 | −3 |
(rad) | 0 | 0 | 0 | 0 | 0 |
Component Type | Component |
CPU | Intel Core 2 Quad Q9450 |
Operating system | Windows 7 |
GPU | GTX285 |
GPU CUDA cores | 240 |
CPU memory | 12.0 GB |
GPU memory | 1 GB |
CPU compiler | VC++ 2010 |
GPU compiler | NVCC 4.0 |
Number of Clusters | CPU Calculation Time (ms) Q9450 | GPGPU Calculation Time (ms) GTX285 |
2 × 2 | 0.016168 | 5.41302 |
4 × 4 | 20.6626 | 282.696 |
Row Number | CPU Calculation Time (ms) Q9450 | GPGPU Calculation Time (ms) GTX285 | Performance Ratio (CPU Time/GPGPU Time) |
 | 0.0967 | 0.1088 | 0.8888 |
 | 0.6592 | 0.1087 | 6.064 |
 | 4.6257 | 0.1515 | 30.53 |
 | 33.3234 | 0.3058 | 108.97 |
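The performance ratio in the last column is the CPU time divided by the GPGPU time, as the column header states. A minimal sketch of that arithmetic follows; the function name `performanceRatio` is an assumption introduced for the example.

```cpp
#include <cassert>

// Ratio of CPU time to GPGPU time (same unit for both). A value above 1
// means the GPGPU was faster; a value below 1 means the CPU was faster.
double performanceRatio(double cpuTimeMs, double gpuTimeMs) {
    assert(gpuTimeMs > 0.0);
    return cpuTimeMs / gpuTimeMs;
}
```

For the last row, `performanceRatio(33.3234, 0.3058)` evaluates to about 108.97. For the first row, `performanceRatio(0.0967, 0.1088)` evaluates to about 0.889, so at the smallest problem size the CPU is still slightly faster than the GPGPU.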
Cluster Number | Single Core Calculation (s) | Decentralized Calculation and Join CUDA (s) |
2 vs. 2 | 25.058792 | 30.456212 |
4 vs. 4 | 198.9946 | 398.4786 |
