Article

Resource Optimization for 3D Video SoftCast with Joint Texture/Depth Power Allocation

by Saqr Khalil Saeed Thabet ¹, Emmanuel Osei-Mensah ¹, Omar Ahmed ², Abegaz Mohammed Seid ³ and Olusola Bamisile ⁴,*
¹ School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
² School of Computer Science and Technology, Hefei University of Technology, Hefei 230009, China
³ College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
⁴ College of Nuclear Technology and Automation Engineering, Chengdu University of Technology, Chengdu 610059, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(10), 5047; https://doi.org/10.3390/app12105047
Submission received: 29 March 2022 / Revised: 9 May 2022 / Accepted: 13 May 2022 / Published: 17 May 2022

Abstract

During wireless video transmission, channel conditions can vary drastically. When the channel cannot support the transmission bit rate, the video quality degrades sharply. A pseudo-analog transmission system such as SoftCast relies on linear operations to achieve a linear quality transition over a wide range of channel conditions. Transmitting 3D videos over SoftCast raises two issues: (1) how to divide the transmission power between texture and depth maps to obtain the best overall quality and (2) how to handle 3D video data traffic by dropping chunks and re-allocating resources. This paper solves the pseudo-analog transmission resource allocation problem and improves the results by applying optimal joint texture/depth power allocation. First, the minimum distortion and target distortion optimization problems are formulated in terms of a power–bandwidth pair versus distortion. Then, a minimum distortion optimization algorithm iteratively evaluates all possible resource allocations and selects the one with the minimum distortion. Next, the three-dimensional target distortion problem is divided into two subproblems. In the power–distortion subproblem, the algorithm reaches a target distortion by exhaustively evaluating the closed-form power solution under a predefined upper bound on bandwidth. In the bandwidth–distortion subproblem, reaching a target distortion requires iteratively solving the closed form for the bandwidth resource, given a predefined power. The proposed resource control scheme improves transmission efficiency and resource utilization. At low power usage, the proposed method achieves a PSNR gain of up to 1.5 dB over SoftCast and a 1.789 dB gain over a distortion-resource algorithm, using less than 1.4% of the bandwidth.

1. Introduction

Owing to the immense growth of industrial and entertainment applications and the rising demand for immersive visual experiences, three-dimensional video (3DV) has become a popular choice for many consumers. 3D video lets the viewer observe the actual scene from an arbitrary viewpoint. Multi-view video plus depth (MVD) [1] has become a common format for representing 3D video content efficiently. At the same time, 3D video applications on mobile terminals have gained consumer attention thanks to improvements in wireless communication (e.g., 5G and LTE [2]) and the availability of 3D-enabled laptops and naked-eye 3D mobile devices. However, some technical issues remain. Firstly, processing a 3D video signal requires large amounts of computing resources, as the amount of video data grows proportionally with the number of cameras and the frame depth, which poses serious challenges for resource-constrained mobile terminals. Secondly, the unpredictable channel conditions of wireless networks place stricter requirements on 3D video transmission than on traditional 2D video. Hence, any efficient wireless transmission method must address these specific issues.
In a 3D video transmission system, the transmitter uses several cameras to record the texture videos and depth maps of the 3D scene. After the receiver selects the preferred virtual viewpoint, the transmitter takes the frames of the cameras adjacent to the selected virtual viewpoint. These frames serve as the reference-viewpoint texture videos and depth maps for that virtual viewpoint. The captured data are source- and channel-encoded before transmission to the receiver. At the receiver, the requested virtual viewpoint is synthesized from the decoded texture video and depth map frames [3,4,5] via depth-image-based rendering (DIBR). Therefore, the virtual view quality is determined by both the received texture video quality and the received depth map quality.
Conventional video transmission over wireless networks relies on Shannon's separation theorem [6,7], in which source coding is separated from channel coding. Under this strategy, optimal performance can be obtained provided that the channel condition and capacity are known before source and channel coding, so that the proper transmission rate can be selected adaptively [8]. However, this strategy does not suit practical wireless communication, where the channel condition can vary drastically and even the receiver may not know the exact channel-state information. The video quality degrades abruptly because of channel capacity mismatch when the channel SNR falls below the acceptable threshold. Conversely, even when the channel SNR is high, the reconstruction quality does not improve, because the quantization step in the digital video compression already fixes its upper bound. This phenomenon is known as the cliff effect [9].
SoftCast, an uncoded video transmission technique, has been proposed to overcome these problems of conventional digital video transmission [10,11,12]. The video signal is transformed into a series of real-valued coefficients that are linearly related to the pixel values. By skipping quantization and entropy coding, such a transmission becomes lossy in nature. More importantly, the channel noise maps directly to the reconstruction error of the video signal, which makes the reconstruction quality commensurate with the receiver's channel SNR. In this way, SoftCast avoids the cliff effect. However, several challenges appear when SoftCast is used to transmit 3DV.
Firstly, transmitting a large-sized video (i.e., MVD) over a wireless medium consumes substantial amounts of limited resources (power and bandwidth), much of which is wasted on carrying reference viewpoints with very similar content. Secondly, unlike 2D video, every scene in 3DV consists of texture and depth map frames. Both must be transmitted to the receiver to synthesize the virtual view, whose quality depends nonlinearly on the received texture videos and depth maps. Intuitively, assigning too much power to texture videos relative to depth maps causes inevitable geometric distortion in virtual views, whereas assigning too little power to texture videos incurs additional distortion in reference views. Hence, under limited resources and joint texture/depth power allocation, the problem is to achieve the optimal reconstruction quality. Conversely, given a predefined quality at the receiver, the problem is to achieve the minimal resource (bandwidth and power) consumption.
This paper proposes a resource optimization scheme for 3D video transmission that incorporates joint texture–depth power allocation, as shown in Figure 1. The optimal power allocation ratio (PAR) between texture and depth is estimated, assuming equal power among all reference viewpoints. The intricate resource control problem of 3D video transmission is then treated as two optimization problems: minimum distortion and target distortion. Resource allocation algorithms are designed to find the optimal solutions to the formulated problems. The main contributions of this work are as follows:
  • The optimal solutions to the minimum and target distortion problems are obtained for texture and depth using the proposed resource allocation algorithms, and the trade-off between power and bandwidth usage is analyzed.
  • The proposed resource control scheme for 3D transmission integrates the optimal PAR with resource allocation algorithms, ensuring balanced power allocation between texture and depth to improve the transmission results.
The simulation results demonstrate a significant reduction in resource usage for both texture and depth, while achieving a satisfactory or desired video quality.
The remainder of the paper is organized as follows. Section 2 reviews SoftCast and related work. Section 3 presents the minimum distortion optimization, and Section 4 discusses the target distortion optimization. Section 5 presents the simulation results, and Section 6 concludes the paper.

2. Related Work

2.1. SoftCast Video Transmission

The energy efficiency of a network indicates how efficiently it can reduce the distortion cost. In [13,14], two examples of improving energy efficiency in two different networks are given. Video transmission over wireless channels is challenging because of time-varying channel characteristics and power constraints. In such systems, the objective of bit allocation between source and channel coding is to minimize power consumption while maintaining satisfactory video quality; this is also how the energy efficiency of the network is defined. The SoftCast transmission scheme is an end-to-end architecture for wireless video transmission that replaces the complicated bit allocation of digital video transmission with power allocation. It differs from digital video transmission in its video encoding mechanism, which provides error resilience. Unlike conventional digital video transmission, SoftCast relies on analog coding to achieve linearity and a compression–protection trade-off through optimal power allocation. In addition to a reception quality that scales linearly with the channel condition, SoftCast offers scalability by broadcasting only a single signal. To improve SoftCast efficiency, Fan et al. [15] first replaced 3D-DCT with motion-compensated temporal filtering (MCTF) and then proposed DCast [16], a distributed source coding scheme that exploits inter-frame redundancy. In [17,18], adaptive chunk division was incorporated to achieve optimal transmission power usage, and in [19], the bandwidth and computation requirements were reduced. Other studies proposed hybrid digital–analog frameworks that combine analog and digital transmission to take advantage of both, achieving a balance between robust adaptation capability and high coding efficiency [15,20,21,22,23,24].
These hybrid systems can be categorized by the type of information that is digitized and by how the analog and digital parts share the channel. As the traffic burden on wireless networks grows, efficient exploitation of network resources becomes necessary. Based on the Shannon capacity [6], network capacity can be enhanced via bandwidth increase and frequency reuse. In HDA-SIM [25], the bandwidth of digital traffic is treated as a hidden resource, and the analog-modulated data are superimposed on existing digital traffic. He et al. [26] proposed MCast, a linear video transmission system that retransmits data across multiple time slots and multiple channels to exploit time and frequency resources. The same authors also proposed MUcast [27] to solve the resource allocation problem for efficient multi-user video transmission.
To improve network energy efficiency in pseudo-analog 3D video transmission, Yang et al. [28] suggested an uncoded wireless depth map transmission scheme that uses mean-removed block-based DCT to improve decorrelation and reduce view-synthesis distortion, thereby enhancing energy allocation efficiency. Power allocation scaling is performed on rearranged chunks that contain inter-block DCT coefficients. Luo et al. [29] incorporated the view-synthesis distortion into the power–distortion optimization problem to achieve optimal reconstruction quality in both the reference and virtual views, and solved the problem in closed form for texture/depth power allocation.
In general, all the studies mentioned above follow the minimum distortion optimization approach for pseudo-analog systems; only a few address target distortion. In [30], Liu et al. proposed prediction models to solve the target distortion optimization problem for single-view video, with a curve-fitting-based resource control algorithm that allocates the constrained bandwidth and power resources to reach a predefined quality. Zhang et al. [31] developed resource allocation algorithms for 3D video. For distortion–resource optimization, their DR algorithm exhaustively searches all possible discrete power and channel resources. For resource–distortion optimization, the problem is divided into power and channel optimization subproblems; the RD algorithm fixes one resource at a time for a desired distortion and then finds the optimal solution by exhaustive search. Nonetheless, it combines texture and depth using a 5D-DCT without considering their distinct characteristics. Unlike 2D pseudo-analog transmission systems, which can discard the chunks with the smallest variance to meet a resource constraint without significantly degrading quality, 3D pseudo-analog transmission systems face additional complications. Because synthesis distortion is involved, the data discarded from texture and depth frames must not significantly degrade the virtual view quality, in addition to the reference view quality. Hence, this work performs resource allocation while jointly balancing the power assigned to texture and depth maps, avoiding geometric distortion in virtual views and additional distortion in reference views, to improve resource allocation in 3D video transmission.

2.2. SoftCast Power Allocation

In a conventional wireless video transmission scheme, the real-valued coefficients are converted into a bitstream for transmission, and channel coding adds parity bits to cope with severe channel noise. This strategy destroys the numerical properties of the video data and leads to the cliff effect. SoftCast avoids this by transmitting the real-valued coefficients directly, without converting them into a bitstream. In SoftCast, a novel error protection scheme scales the magnitude of each transmitted coefficient. For a conventional 2D video sequence, consider applying a 3D-DCT to every GOP. The resulting DCT coefficients are divided into $N$ chunks $x_1, x_2, \ldots, x_N$ of size $h \times w$, so that each chunk has entries $x_i[j]$, $j = 1, \ldots, h \times w$. The amount of information in each chunk is captured by its variance (i.e., average energy) $\lambda_i$. At the sender, each chunk $x_i$ is scaled to $y_i[j] = g_i \cdot x_i[j]$, with $g_i > 1$, and the values $y_i[j]$ are transmitted over the wireless channel. The channel noise is modeled as AWGN with zero mean and variance $\sigma_n^2$, so the received value is $\hat{y}_i[j] = y_i[j] + n_i[j]$. To estimate the reconstructed signal $\hat{x}_i[j]$ from $\hat{y}_i[j]$, the linear least squares estimator (LLSE) is used:
$$\hat{x}_i[j] = \frac{\lambda_i g_i}{\lambda_i g_i^2 + \sigma_n^2}\, \hat{y}_i[j] \tag{1}$$
Consider how the SNR affects the LLSE. At high SNR [11], the reconstructed signal is obtained simply by scaling down the received signal, $\hat{x}_i[j] = \hat{y}_i[j]/g_i$. With this scale-up/scale-down scheme, the reconstruction error is only $\sigma_n^2/g_i^2$, compared with $\sigma_n^2$ for direct transmission. Therefore, the scheme markedly improves error protection when $g_i$ is relatively large. Since the power budget of any transmission system is constrained, scaling up some signal samples with more power leaves other samples with less. Power allocation therefore determines the scaling factors that minimize the total reconstruction distortion of the entire transmitted sequence. In [11,18,32], the optimal scaling factor for chunk $i$ is:
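To make the LLSE in (1) concrete, the following minimal Python sketch (standard library only) scales one chunk, passes it through an AWGN channel, and decodes it. It assumes the chunk is zero-mean Gaussian with variance $\lambda_i$; the function names and parameter values are illustrative and are not taken from the paper.

```python
import random

def llse_coefficient(lam, g, noise_var):
    """Scalar gain of the LLSE in Eq. (1): lam*g / (lam*g^2 + sigma_n^2)."""
    return lam * g / (lam * g * g + noise_var)

def transmit_chunk(x, lam, g, noise_var, rng):
    """Scale a chunk by g, add zero-mean AWGN of variance noise_var, decode with the LLSE."""
    a = llse_coefficient(lam, g, noise_var)
    sigma = noise_var ** 0.5
    return [a * (g * v + rng.gauss(0.0, sigma)) for v in x]

def empirical_mse(x, x_hat):
    """Mean squared reconstruction error over the chunk."""
    return sum((u - v) ** 2 for u, v in zip(x, x_hat)) / len(x)

rng = random.Random(0)
lam, g, noise_var = 4.0, 2.0, 1.0                        # hypothetical values
x = [rng.gauss(0.0, lam ** 0.5) for _ in range(20000)]   # chunk with variance lam
x_hat = transmit_chunk(x, lam, g, noise_var, rng)
print(empirical_mse(x, x_hat))                           # empirical LLSE error
print(lam * noise_var / (lam * g * g + noise_var))       # theoretical LLSE error, 4/17
print(noise_var / g ** 2)                                # scale-down error sigma_n^2 / g^2
```

The theoretical LLSE error $\lambda_i \sigma_n^2 / (\lambda_i g_i^2 + \sigma_n^2)$ is slightly below the plain scale-down error $\sigma_n^2/g_i^2$, and the two coincide as $\sigma_n^2 \to 0$, where the LLSE gain tends to $1/g_i$.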
$$g_i = \lambda_i^{-1/4} \sqrt{\frac{P}{(h \times w) \sum_{j=1}^{N} \sqrt{\lambda_j}}} \tag{2}$$
where $P$ is the total power budget. The total reconstruction distortion under optimal power allocation is
$$D = (h \times w) \sum_{i=1}^{N} \frac{\sigma_n^2}{g_i^2} = \frac{(h \times w)^2 \sigma_n^2}{P} \left( \sum_{i=1}^{N} \sqrt{\lambda_i} \right)^2 \tag{3}$$
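A quick numerical check of (2) and (3) can be written in a few lines of Python. This sketch assumes hypothetical chunk variances and a hypothetical chunk size $h \times w$; it verifies that the scale factors spend exactly the budget $P$ and that the summed distortion matches the closed form.

```python
import math

def softcast_scales(lams, power, chunk_size):
    """Eq. (2): g_i = lam_i^(-1/4) * sqrt(P / ((h*w) * sum_j sqrt(lam_j)))."""
    s = sum(math.sqrt(lam) for lam in lams)
    return [lam ** -0.25 * math.sqrt(power / (chunk_size * s)) for lam in lams]

def closed_form_distortion(lams, power, chunk_size, noise_var):
    """Eq. (3): D = (h*w)^2 * sigma_n^2 * (sum_i sqrt(lam_i))^2 / P."""
    s = sum(math.sqrt(lam) for lam in lams)
    return chunk_size ** 2 * noise_var * s * s / power

lams = [100.0, 25.0, 4.0, 1.0]     # hypothetical chunk variances
chunk_size = 64 * 48               # hypothetical h * w
power, noise_var = 1e6, 1.0
g = softcast_scales(lams, power, chunk_size)
used = chunk_size * sum(gi * gi * lam for gi, lam in zip(g, lams))   # transmitted energy
D_sum = chunk_size * sum(noise_var / gi ** 2 for gi in g)             # left side of (3)
print(used, D_sum, closed_form_distortion(lams, power, chunk_size, noise_var))
```

Larger-variance chunks receive smaller gains ($g_i \propto \lambda_i^{-1/4}$), and doubling $P$ halves $D$, which is the inverse relation read off (3).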
From (3), it is clear that for traditional 2D video the reconstruction distortion $D$ is inversely proportional to the total power $P$, so $D$ can be computed once $P$ is known. For 3D video, however, the power budget $P$ is shared by the texture videos and depth maps, which together determine the overall quality of the received 3D video. Hence, to allocate the total power optimally between texture videos and depth maps, the texture/depth power allocation ratio is estimated by considering the texture/depth distortion trade-off [29].

2.3. Resource Allocation Optimization Problem

Optimization problems for the power–bandwidth pair versus distortion address resource constraint challenges in 3D video pseudo-analog transmission. For example, the objective of minimum distortion optimization is to minimize the overall distortion for a given resource usage:
$$\min D \quad \text{s.t.} \quad R < R_{\mathrm{available}} \tag{4}$$
where $D$ is the distortion and $R_{\mathrm{available}}$ represents the available power and bandwidth resources. The target distortion optimization minimizes the resource usage for a given distortion:
$$\min R \quad \text{s.t.} \quad D < D_{\mathrm{exp}} \tag{5}$$
where $D_{\mathrm{exp}}$ is the expected distortion. Note that pseudo-analog systems transform video data into independent chunks: each chunk contributes its own distortion to the overall video quality, and the distortion of a chunk caused by a transmission error does not affect any other chunk. Hence, discarding chunks to fit the available bandwidth still allows each receiver to obtain a video quality proportional to its channel condition. Given the heavy data traffic of 3D video transmission, it is therefore crucial to select the chunks to transmit optimally and to reuse the saved power efficiently.
In summary, the problem becomes: (a) the optimal allocation of the total power between texture videos and depth maps and (b) the joint optimal resource (power and bandwidth) allocation for the entire signal sequence to obtain satisfactory video quality.

3. Minimum Distortion Optimization

3.1. Investigation and Motivation

In pseudo-analog transmission, every scaled DCT chunk occupies its own slot in the transmission channel. Because of the energy-compaction property of the DCT, the high spatial frequencies are dominated by low-variance coefficients. Chunks with the lowest variance contribute least to the video quality and are therefore considered the least important, denoted LP (low-priority) chunks. Discarding these chunks to meet a resource constraint does not lead to significant quality degradation. The power saved by dropping chunks is re-allocated to the more important HP (high-priority) chunks for efficient resource utilization, particularly under stringent power constraints. We therefore face a joint problem of power allocation and chunk selection for 3D video transmission.

3.2. Problem Formulation

Given a 3D video sequence, a 3D-DCT is used to decorrelate the texture and depth frames of the reference viewpoints in one GOP, which results in $N$ chunks of DCT coefficients with variances $\lambda_1, \lambda_2, \ldots, \lambda_N$. Without loss of generality, assume $\lambda_i \ge \lambda_j$ for all $i < j$. Let $K$ be the available bandwidth resources (e.g., frequency or time slots), and let $\mathbf{k} = [k_1, k_2, \ldots, k_N]$ be a binary allocation vector for each GOP: chunk $i$ is discarded when $k_i = 0$ and transmitted over a bandwidth resource slot when $k_i = 1$. The expected MSE of the $i$th chunk is the squared Euclidean norm $\varepsilon_i = \lVert x_i[j] - \hat{x}_i[j] \rVert^2$. The optimal power allocation $\boldsymbol{\rho}^* = [\rho_1^*, \rho_2^*, \ldots, \rho_N^*]$ and bandwidth allocation are found by minimizing the transmission distortion under constrained power and bandwidth resources. The problem is formulated as follows [33]:
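$$\mathrm{MSE}(\mathbf{k}, \boldsymbol{\rho}) = \min_{(\mathbf{k}, \boldsymbol{\rho})} \mathbb{E}\left[\varepsilon_i \mid \sigma_i^2\right] = \frac{1}{N} \sum_{i=1}^{N} \frac{\lambda_i}{k_i \rho_i + 1} \tag{6}$$
$$\text{s.t.} \quad \sum_{i=1}^{N} k_i \le K \tag{7}$$
$$\frac{1}{N} \sum_{i=1}^{N} k_i \rho_i \le \frac{P}{N (h \times w) \sigma_n^2} := \mathrm{SNR}_{\mathrm{chk}} \tag{8}$$
$$k_i \in \{0, 1\}, \quad 1 \le i \le N \tag{9}$$
$$\rho_i \ge 0, \quad 1 \le i \le N \tag{10}$$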
where $\rho_i = g_i^2 \lambda_i / \sigma_n^2$ is the signal-to-noise ratio of the $i$th chunk under a given power allocation, $K$ lies in the range $[0, N]$, and $\mathrm{SNR}_{\mathrm{chk}}$ is the SNR budget per chunk. When the system runs short of bandwidth ($K < N$), it minimizes distortion by transmitting the $K$ HP chunks and discarding the remaining chunks:
$$k_i = \begin{cases} 1, & i \le K \\ 0, & i > K \end{cases} \tag{11}$$
By re-expressing MSE as
$$\mathrm{MSE} = \frac{1}{N} \left( \sum_{i=1}^{K} \frac{\lambda_i}{\rho_i + 1} + \sum_{i=K+1}^{N} \lambda_i \right) \tag{12}$$
the problem becomes one of finding the optimal power allocation $\rho_i^*$ and bandwidth $K^*$. Since the second-order derivative of the objective function in (12) is
$$\frac{\partial^2 \mathrm{MSE}}{\partial \rho_i^2} = \frac{2}{N} \cdot \frac{\lambda_i}{(\rho_i + 1)^3} > 0, \quad \rho_i \ge 0 \tag{13}$$
the Lagrange multiplier method is used to solve (12), after defining $\mu > 0$ and the Lagrange function $J$ as
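$$\min_{(\mathbf{k}, \boldsymbol{\rho})} J = \frac{1}{N} \left( \sum_{i=1}^{K} \frac{\lambda_i}{\rho_i + 1} + \sum_{i=K+1}^{N} \lambda_i \right) + \mu \left( \frac{1}{N} \sum_{i=1}^{K} \rho_i - \mathrm{SNR}_{\mathrm{chk}} \right) \tag{14}$$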
By setting $\partial J / \partial \rho_i = 0$, $i = 1, 2, \ldots, K$, and $\partial J / \partial \mu = 0$, we obtain
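$$\frac{\partial J}{\partial \rho_i} = -\frac{1}{N} \cdot \frac{\lambda_i}{(\rho_i + 1)^2} + \frac{\mu}{N} = 0, \qquad \frac{\partial J}{\partial \mu} = \frac{1}{N} \sum_{i=1}^{K} \rho_i - \mathrm{SNR}_{\mathrm{chk}} = 0 \tag{15}$$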
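The step from the stationarity conditions (15) to the closed-form allocation in (16) and (17) is short but not written out; spelled out, it is:

$$\begin{aligned}
\frac{\lambda_i}{(\rho_i+1)^2} = \mu \;&\Longrightarrow\; \rho_i + 1 = \sqrt{\frac{\lambda_i}{\mu}} \;\Longrightarrow\; \frac{\sqrt{\lambda_i}}{\rho_i + 1} = \sqrt{\mu} \quad \text{for every } i \le K,\\
\mathrm{SNR}_{\mathrm{chk}} = \frac{1}{N}\sum_{j=1}^{K}\left(\sqrt{\frac{\lambda_j}{\mu}} - 1\right) \;&\Longrightarrow\; \frac{1}{\sqrt{\mu}} = \frac{N\,\mathrm{SNR}_{\mathrm{chk}} + K}{\sum_{j=1}^{K}\sqrt{\lambda_j}}\\
&\Longrightarrow\; \rho_i^* = \frac{N\sqrt{\lambda_i}}{\sum_{j=1}^{K}\sqrt{\lambda_j}}\left(\mathrm{SNR}_{\mathrm{chk}} + \frac{K}{N}\right) - 1 .
\end{aligned}$$

In words, every transmitted chunk ends up with the same ratio $\sqrt{\lambda_i}/(\rho_i + 1)$, and the power constraint only fixes the value of that common ratio.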
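$$\mathrm{SNR}_{\mathrm{chk}} = \frac{1}{N} \left( \sum_{j=1}^{K} \frac{\sqrt{\lambda_j}}{\sqrt{\mu}} - K \right) = \frac{\sum_{j=1}^{K} \sqrt{\lambda_j}}{N} \cdot \frac{\rho_i + 1}{\sqrt{\lambda_i}} - \frac{K}{N} \tag{16}$$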
The optimal power allocation $\rho_i^*$ is derived as
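$$\rho_i^* = \frac{N \sqrt{\lambda_i}}{\sum_{j=1}^{K} \sqrt{\lambda_j}} \left( \mathrm{SNR}_{\mathrm{chk}} + \frac{K}{N} \right) - 1 \approx \frac{N \sqrt{\lambda_i}}{\sum_{j=1}^{K} \sqrt{\lambda_j}}\, \mathrm{SNR}_{\mathrm{chk}} \tag{17}$$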
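The approximation holds when $\rho_i^* \gg 1$, i.e., at high SNR. Substituting it into $g_i = \sqrt{\sigma_n^2 \rho_i / \lambda_i}$ confirms the chunk scaling in (2):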
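$$g_i^* = \sqrt{\frac{\sigma_n^2 \rho_i^*}{\lambda_i}} \approx \sqrt{\frac{\sigma_n^2 \cdot N \cdot \mathrm{SNR}_{\mathrm{chk}}}{\sqrt{\lambda_i} \sum_{j=1}^{K} \sqrt{\lambda_j}}} = \lambda_i^{-1/4} \sqrt{\frac{P}{(h \times w) \sum_{j=1}^{K} \sqrt{\lambda_j}}} \tag{18}$$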
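As a numerical check of (12) and the exact form of (17), the sketch below computes the Lagrangian allocation for the $K$ largest-variance chunks and the resulting MSE. The variances and budget are hypothetical, and the helper names are illustrative rather than from the paper.

```python
import math

def optimal_rho(lams, K, snr_chk):
    """Exact Lagrangian solution (17) for the K largest-variance chunks.

    lams must be sorted in descending order. For small snr_chk some entries can
    come out negative, violating rho_i >= 0 in (10); the closed form is then not
    a valid allocation for that K."""
    n = len(lams)
    s = sum(math.sqrt(lam) for lam in lams[:K])
    return [n * math.sqrt(lam) / s * (snr_chk + K / n) - 1 for lam in lams[:K]]

def mse(lams, rho):
    """Eq. (12): sent chunks contribute lam/(rho+1), dropped chunks contribute lam."""
    K = len(rho)
    sent = sum(lam / (r + 1) for lam, r in zip(lams, rho))
    return (sent + sum(lams[K:])) / len(lams)

lams = [16.0, 9.0, 4.0, 1.0]            # hypothetical variances, sorted descending
for K in range(1, len(lams) + 1):
    rho = optimal_rho(lams, K, snr_chk=10.0)
    print(K, round(sum(rho) / len(lams), 6), round(mse(lams, rho), 4))
```

For every $K$ the average of the $\rho_i^*$ equals $\mathrm{SNR}_{\mathrm{chk}}$ exactly, so the power constraint in (15) is met with equality; only the MSE changes with $K$, which is why $K^*$ has to be found by search.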
From (12) and (17), the reconstruction distortion decreases as the total power budget $P$ increases: a larger budget increases $\rho_i^*$ and therefore reduces the distortion. However, the bandwidth resource $K$ appears in (17) as the upper limit of the summation, and its discrete nature prevents the optimal bandwidth $K^*$ from being obtained in closed form.

3.3. Problem Solution

We propose a greedy search algorithm to find the optimal bandwidth $K^*$, assuming that the transmitter has complete knowledge of the chunk variances $\lambda_i$ and the available transmission power $P$. The transmitter exhaustively searches all possible discrete bandwidth resources to find the number of chunks that minimizes distortion. With optimal chunk selection, the amount of traffic is reduced without critical performance degradation, and user experience improves because latency is greatly shortened. Meanwhile, other users in the network can use the saved bandwidth resources.
In Algorithm 1, after computing the average power of the chunks in each GOP, we exhaustively calculate all possible power and bandwidth allocations for the GOP. Then, in line 11, the maximum PSNR is selected to determine the optimal bandwidth usage $K^*$, the corresponding optimal chunk selection $k_i^*$, and the optimal power allocation $\rho_i^*$. In some cases, the performance curve flattens beyond a certain range, as shown in Figure 2, meaning that an excessive number of channels is used to achieve only a marginal PSNR improvement, which is clearly inefficient. Therefore, a control parameter $\tau \in [0, 1]$ is introduced in line 14 to find a suboptimal solution: the bandwidth usage can be reduced significantly at the cost of a slight loss in PSNR. Note that because the 3D-DCT decorrelation transform is applied in this scheme, the greedy search in each GOP must be conducted separately on the texture and depth map of each reference viewpoint. When the minimum distortion optimization is conducted under the optimal PAR, the per-chunk SNR budget $\mathrm{SNR}_{\mathrm{chk}}$ in (17) differs depending on whether it is assigned to a texture or a depth map. Hence, a different optimal bandwidth usage $K^*$ is chosen for each, leading to a different set of optimal chunks $k_i^*$ and optimal power allocations $\rho_i^*$.
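Algorithm 1: Minimum Distortion Optimization.
   Input: $N$, $\tau$, $c \in \{\text{texture}, \text{depth}\}$
   Output: $k_i^{c*}$, $\rho_i^{c*}$, $K^{c*}$
   Initialization: $\lambda_i = 0$, $\mathrm{SNR}_{\mathrm{chk}}^{t} = \mathrm{SNR}_{\mathrm{chk}} \cdot \frac{\mathrm{PAR}}{1 + \mathrm{PAR}}$, $\mathrm{SNR}_{\mathrm{chk}}^{d} = \mathrm{SNR}_{\mathrm{chk}} - \mathrm{SNR}_{\mathrm{chk}}^{t}$
   1: for $i = 1$ to $N$
   2:   Compute chunk variances $\lambda_i^{t}$ and $\lambda_i^{d}$
   3: end
   4: for $c = 1$ to $2$
   5:   for $n = 1$ to $N$
   6:     $K \leftarrow n$;
   7:     Calculate optimal power allocation $\rho_{i,n}^{c} \leftarrow \frac{N \sqrt{\lambda_i^{c}}}{\sum_{j=1}^{K} \sqrt{\lambda_j^{c}}}\, \mathrm{SNR}_{\mathrm{chk}}^{c}$
   8:     Calculate $\mathrm{MSE}_n^{c}$ via (12)
   9:     Obtain $\mathrm{PSNR}_n^{c} = 10 \log_{10} \frac{(256 - 1)^2}{\mathrm{MSE}_n^{c}}$
   10:  end
   11:  $\mathrm{PSNR}_{\max}^{c} = \max_{n \in \{1, 2, \ldots, N\}} \mathrm{PSNR}_n^{c}$
   12:  $K_{\max}^{c*} \leftarrow \arg\max_{n} \mathrm{PSNR}_n^{c}$;
   13:  for $n = 1$ to $N$
   14:    if $\mathrm{PSNR}_n^{c} \ge \tau \cdot \mathrm{PSNR}_{\max}^{c}$
   15:      $K_{\tau}^{c*} \leftarrow n$;
   16:      break;
   17:    end
   18:  end
   19:  Calculate $k_i^{c*}$, $\rho_i^{c*}$ using $K^{c*}$ as in (11) and (17);
   20: end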
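The exhaustive search of Algorithm 1 can be sketched in Python for one component at a time. This is a minimal illustration under stated assumptions, not the authors' implementation: the chunk variances, budget, and PAR are hypothetical, chunks are assumed sorted by descending variance, line 7 uses the approximate allocation from (17), and the $\tau$ test uses "greater than or equal" so that $\tau = 1$ returns $K_{\max}^{c*}$.

```python
import math

def split_snr(snr_chk, par):
    """Initialization of Algorithm 1: texture gets PAR/(1+PAR) of the budget, depth the rest."""
    snr_t = snr_chk * par / (1 + par)
    return snr_t, snr_chk - snr_t

def min_distortion_search(lams, snr_chk, tau=1.0):
    """Lines 5-19 of Algorithm 1 for one component (texture or depth).

    lams: chunk variances sorted in descending order.
    Returns (K_tau, rho for the K_tau chunks, PSNR for K = 1..N)."""
    n = len(lams)
    psnr = []
    for K in range(1, n + 1):                                        # lines 5-6
        s = sum(math.sqrt(lam) for lam in lams[:K])
        rho = [n * math.sqrt(lam) / s * snr_chk for lam in lams[:K]]  # line 7
        m = (sum(lam / (r + 1) for lam, r in zip(lams, rho)) + sum(lams[K:])) / n  # line 8, (12)
        psnr.append(10 * math.log10((256 - 1) ** 2 / m))             # line 9
    best = max(psnr)                                                 # line 11
    # lines 13-16: smallest K whose PSNR reaches tau * PSNR_max
    k_tau = next(K for K, p in enumerate(psnr, start=1) if p >= tau * best)
    s = sum(math.sqrt(lam) for lam in lams[:k_tau])
    rho_star = [n * math.sqrt(lam) / s * snr_chk for lam in lams[:k_tau]]  # line 19
    return k_tau, rho_star, psnr

snr_t, snr_d = split_snr(snr_chk=2.0, par=3.0)                   # hypothetical budget and PAR
texture = [900.0, 400.0, 100.0, 25.0, 4.0, 1.0, 0.25, 0.01]     # hypothetical variances
depth = [400.0, 64.0, 9.0, 1.0, 0.1, 0.01, 0.001, 0.0001]
for name, lams, snr in (("texture", texture, snr_t), ("depth", depth, snr_d)):
    k, rho, psnr = min_distortion_search(lams, snr, tau=0.98)
    print(name, k, round(max(psnr), 2))
```

Because the texture and depth budgets differ after the PAR split, the two components generally stop at different $K$, as the text above notes.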

4. Target Distortion Optimization

4.1. Investigation and Motivation

Each GOP may have a different level of compressibility depending on the video content (e.g., the proportion of HP and LP chunks). Under minimum distortion optimization with limited power and bandwidth for a video sequence, some GOPs will therefore be more distorted than others. These distortion fluctuations across GOPs degrade the viewing experience, even when the overall PSNR is maximized. As the channel SNR increases, the distortion decreases under the same resource constraints. However, once the PSNR is already high, this change in quality is imperceptible to the human eye. These observations show that keeping the distortion relatively stable across GOPs is desirable. Thus, for high-SNR channels, instead of letting the PSNR increase continuously, we hold it at a certain high value, and other users can use the saved resources. The problem then becomes one of finding a balanced combination of power allocation and chunk selection for a target distortion.

4.2. Problem Formulation

Since the target distortion optimization involves bandwidth, power, and distortion, it is a three-dimensional problem. As will be demonstrated later, the bandwidth and power constraints are interchangeable. Consequently, the optimization problem (5) is decomposed into two two-dimensional subproblems, each with one constraint fixed. For instance, we formulate the minimization of power usage, given a distortion constraint $\overline{\mathrm{MSE}}$ and a bandwidth usage $K$, as a power–distortion optimization problem:
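$$\mathrm{SNR}_{\mathrm{chk}}(\boldsymbol{\rho}) = \min_{\boldsymbol{\rho},\, r} \frac{1}{N} \sum_{i=1}^{r} \rho_i \quad \text{s.t.} \quad \frac{1}{N} \left( \sum_{i=1}^{r} \frac{\lambda_i}{\rho_i + 1} + \sum_{i=r+1}^{N} \lambda_i \right) \le \overline{\mathrm{MSE}}, \quad 1 \le r \le K, \; r \in \mathbb{Z} \tag{19}$$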
On the other hand, we also formulate the minimization of bandwidth usage, given a target distortion $\overline{\mathrm{MSE}}$ and a power budget $\mathrm{SNR}_{\mathrm{chk}}$, as a bandwidth–distortion optimization problem:
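$$K(\boldsymbol{\rho}) = \min_{K,\, \boldsymbol{\rho}} K \quad \text{s.t.} \quad \frac{1}{N} \left( \sum_{i=1}^{K} \frac{\lambda_i}{\rho_i + 1} + \sum_{i=K+1}^{N} \lambda_i \right) \le \overline{\mathrm{MSE}}, \quad \frac{1}{N} \sum_{i=1}^{K} \rho_i \le \mathrm{SNR}_{\mathrm{chk}} \tag{20}$$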

4.3. Problem Solution

Since the power scaling factor $g_i$ is continuous while the bandwidth usage $K$ is discrete, both subproblems are mixed-integer nonlinear programming (MINLP) problems, which are NP-hard to solve optimally. Therefore, greedy search algorithms are proposed to approximate the resource allocation for the power–distortion and bandwidth–distortion problems.

4.3.1. The Power Distortion Optimization

The minimal power usage $\mathrm{SNR}_{\mathrm{chk}}$ in (19) is found by exhaustively searching all feasible $r \in [1, K]$. For each fixed bandwidth usage $r$, the subproblem is solved as
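$$\mathrm{SNR}_{\mathrm{chk}}(\boldsymbol{\rho}) = \min_{\boldsymbol{\rho}} \frac{1}{N} \sum_{i=1}^{r} \rho_i \tag{21}$$
$$\text{s.t.} \quad \frac{1}{N} \left( \sum_{i=1}^{r} \frac{\lambda_i}{\rho_i + 1} + \sum_{i=r+1}^{N} \lambda_i \right) \le \overline{\mathrm{MSE}} \tag{22}$$
$$\rho_i \ge 0 \tag{23}$$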
To solve this problem with the Lagrange multiplier method, we define a Lagrange multiplier $\mu > 0$; then
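$$\min_{\boldsymbol{\rho}} J = \sum_{i=1}^{r} \rho_i + \mu \left( \sum_{i=1}^{r} \frac{\lambda_i}{\rho_i + 1} + \sum_{i=r+1}^{N} \lambda_i - N \cdot \overline{\mathrm{MSE}} \right) \tag{24}$$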
After setting $\partial J / \partial \rho_i = 0$, $i = 1, 2, \ldots, r$, and $\partial J / \partial \mu = 0$, the optimal solution is obtained by solving
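$$\frac{\partial J}{\partial \rho_i} = 1 - \mu \frac{\lambda_i}{(\rho_i + 1)^2} = 0, \qquad \frac{\partial J}{\partial \mu} = \sum_{i=1}^{r} \frac{\lambda_i}{\rho_i + 1} + \sum_{i=r+1}^{N} \lambda_i - N \cdot \overline{\mathrm{MSE}} = 0 \tag{25}$$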
The optimal power allocation $\rho_i^*$ is derived as
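$$\rho_i^* = \begin{cases} \dfrac{\frac{1}{N} \sqrt{\lambda_i} \sum_{j=1}^{r} \sqrt{\lambda_j}}{\overline{\mathrm{MSE}} - \frac{1}{N} \sum_{j=r+1}^{N} \lambda_j} - 1, & i = 1, 2, \ldots, r \\[6pt] 0, & i = r + 1, \ldots, N \end{cases} \tag{26}$$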
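For completeness, the algebra from the conditions (25) to the allocation (26) is:

$$\begin{aligned}
1 - \mu\frac{\lambda_i}{(\rho_i+1)^2} = 0 \;&\Longrightarrow\; \rho_i + 1 = \sqrt{\mu\lambda_i}, \quad i \le r,\\
\sum_{i=1}^{r}\frac{\lambda_i}{\sqrt{\mu\lambda_i}} = N\,\overline{\mathrm{MSE}} - \sum_{j=r+1}^{N}\lambda_j \;&\Longrightarrow\; \sqrt{\mu} = \frac{\sum_{j=1}^{r}\sqrt{\lambda_j}}{N\,\overline{\mathrm{MSE}} - \sum_{j=r+1}^{N}\lambda_j},\\
\rho_i^* = \sqrt{\mu}\,\sqrt{\lambda_i} - 1 \;&= \frac{\frac{1}{N}\sqrt{\lambda_i}\sum_{j=1}^{r}\sqrt{\lambda_j}}{\overline{\mathrm{MSE}} - \frac{1}{N}\sum_{j=r+1}^{N}\lambda_j} - 1 .
\end{aligned}$$

The middle line also shows why a valid $\mu > 0$ exists only when $\overline{\mathrm{MSE}} - \frac{1}{N}\sum_{j=r+1}^{N}\lambda_j > 0$, i.e., when the discarded chunks alone do not already exceed the target distortion.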
Algorithm 2 performs the power–distortion optimization on a per-GOP basis. Throughout the exhaustive search, the optimal power values fall into three regions. At first, $\rho_i^* < 0$, because the number of channels is still insufficient. As $r$ grows, the distortion caused by the discarded chunks gradually decreases, and once $\overline{\mathrm{MSE}} - \frac{1}{N} \sum_{i=r+1}^{N} \lambda_i > 0$, the optimal power starts to satisfy constraint (23). This feasible region continues over a certain range of $r$. Finally, beyond this range and up to $r = K$, $\rho_i^* < 0$ again, so it is better to stop the exhaustive search, as in lines 9 to 11. In line 7, the objective function of problem (19) is calculated in closed form. Finally, the minimum among all computed $\mathrm{SNR}_{\mathrm{chk}}$ values is chosen, and the associated power allocation $\rho_i^*$ is adopted.
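Algorithm 2: Power Distortion Optimization.
   Input: $N$, $K$, $\overline{\mathrm{MSE}}$, $c \in \{\text{texture}, \text{depth}\}$
   Output: $\mathrm{SNR}_{\mathrm{chk}}^{c}$, $\rho_i^{c*}$
   Initialization: $\lambda_i = 0$
   1: for $i = 1$ to $N$
   2:   Compute chunk variances $\lambda_i^{t}$ and $\lambda_i^{d}$
   3: end
   4: for $c = 1$ to $2$
   5:   for $r = 1$ to $K^{c}$
   6:     Select the $r$ largest-variance chunks for transmission;
   7:     Calculate optimal power allocation $\rho_{i,r}^{c}$ via (26)
   8:     Calculate $\mathrm{SNR}_{\mathrm{chk},r}^{c}$ via (21)
   9:     if $\left( \overline{\mathrm{MSE}} - \frac{1}{N} \sum_{i=r+1}^{N} \lambda_i^{c} \right) > 0$ and $\rho_{i,r}^{c} < 0$
   10:      break;
   11:    end
   12:  end
   13:  $\mathrm{SNR}_{\mathrm{chk}}^{c*} = \min_{r \in \{1, 2, \ldots, K-1\}} \mathrm{SNR}_{\mathrm{chk},r}^{c}$
   14: end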
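A minimal Python sketch of Algorithm 2 for one component follows, assuming chunk variances sorted in descending order and using the closed form (26) and objective (21). The variance values are hypothetical, and the handling of the three regions follows the description in the text rather than any published code.

```python
import math

def power_distortion_search(lams, K, mse_target):
    """Sketch of Algorithm 2 for one component.

    lams: chunk variances sorted in descending order; K: bandwidth upper bound.
    Returns (SNR_chk, r, rho) with the smallest per-chunk SNR budget, or None."""
    n = len(lams)
    best = None
    for r in range(1, min(K, n) + 1):
        slack = mse_target - sum(lams[r:]) / n       # distortion budget left for the r sent chunks
        if slack <= 0:
            continue                                 # region 1: dropped chunks alone exceed the target
        s = sum(math.sqrt(lam) for lam in lams[:r])
        rho = [math.sqrt(lam) * s / (n * slack) - 1 for lam in lams[:r]]   # Eq. (26)
        if min(rho) < 0:
            break                                    # region 3: lines 9-11 stop the search
        snr = sum(rho) / n                           # objective (21)
        if best is None or snr < best[0]:
            best = (snr, r, rho)
    return best

lams = [16.0, 9.0, 4.0, 1.0]                         # hypothetical variances, sorted descending
print(power_distortion_search(lams, K=4, mse_target=2.0))
print(power_distortion_search(lams, K=3, mse_target=2.0))
```

With these numbers, $r = 1$ is infeasible, and the minimal budget is reached at the largest allowed $r$ (2.125 for $K = 4$, $15/7$ for $K = 3$), which illustrates how a tighter bandwidth bound costs more power.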

4.3.2. The Bandwidth Distortion Optimization

We exhaustively search the values of $K$ in ascending order to find the minimal bandwidth usage $K$. For each $K$, the subproblem is solved as follows:
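$$\mathrm{MSE}(\boldsymbol{\rho}) = \min_{\boldsymbol{\rho}} \frac{1}{N} \left( \sum_{i=1}^{K} \frac{\lambda_i}{\rho_i + 1} + \sum_{i=K+1}^{N} \lambda_i \right) \tag{27}$$
$$\text{s.t.} \quad \frac{1}{N} \sum_{i=1}^{K} \rho_i \le \mathrm{SNR}_{\mathrm{chk}} \tag{28}$$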
Using the Lagrange multiplier technique, the optimal power $\rho_i^*$ is derived as
$$\rho_i^* \approx \frac{N \sqrt{\lambda_i}}{\sum_{j=1}^{K} \sqrt{\lambda_j}}\, \mathrm{SNR}_{\mathrm{chk}} \tag{29}$$
In Algorithm 3, under a given distortion $\overline{\mathrm{MSE}}$ and power budget $\mathrm{SNR}_{\mathrm{chk}}$, a candidate $K$ is infeasible if the objective value of (27) exceeds the expected distortion. The search then increases $r$ by 1 and solves problem (27) again until it reaches a feasible $K$. An exhaustive search for a particular GOP may also fail to reach a feasible $K$, which means either that the expected distortion requirement is too strict for the given $\mathrm{SNR}_{\mathrm{chk}}$ or that the power budget $\mathrm{SNR}_{\mathrm{chk}}$ is insufficient to reach the required quality. In both cases, all $N$ channels are used, and more resources must be allocated. Here, we suggest increasing the power budget in steps of 0.01 until a feasible $K$ is reached.
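Algorithm 3: Bandwidth Distortion Optimization.
   Input: $N$, $\mathrm{SNR}_{\mathrm{chk}}^{t}$, $\mathrm{SNR}_{\mathrm{chk}}^{d}$, $\overline{\mathrm{MSE}}$, $c \in \{\text{texture}, \text{depth}\}$
   Output: $K^{c*}$, $\rho_i^{c*}$
   Initialization: $\lambda_i = 0$, $r = 0$, $\mathrm{MSE}_r = \infty$
   1: for $i = 1$ to $N$
   2:   Compute chunk variances $\lambda_i^{t}$ and $\lambda_i^{d}$
   3: end
   4: for $c = 1$ to $2$ (reset $r \leftarrow 0$ and $\mathrm{MSE}_r^{c} \leftarrow \infty$)
   5:   while $\mathrm{MSE}_r^{c} > \overline{\mathrm{MSE}}$ and $r < N$ do
   6:     $r \leftarrow r + 1$
   7:     Calculate optimal power allocation $\rho_{i,r}^{c}$ via (29)
   8:     Calculate $\mathrm{MSE}_r^{c}$ via (27)
   9:   end
   10:  $K^{c*} \leftarrow r$, $\rho_i^{c*} \leftarrow \rho_{i,r}^{c}$ for all $i$
   11: end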
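Algorithm 3, together with the budget-raising rule described above, can be sketched as follows. This is an illustrative Python version for one component under assumed inputs (hypothetical variances sorted in descending order); the `step` and `max_raises` parameters are assumptions of this sketch, with `step = 0.01` mirroring the suggestion in the text.

```python
import math

def bandwidth_distortion_search(lams, snr_chk, mse_target, step=0.01, max_raises=100000):
    """Sketch of Algorithm 3 for one component.

    Searches K = 1, 2, ..., N in ascending order and returns the first K whose
    distortion (27) meets the target under the allocation (29). If even K = N
    fails, the per-chunk SNR budget is raised by `step` and the search repeats."""
    n = len(lams)
    for _ in range(max_raises):
        for K in range(1, n + 1):
            s = sum(math.sqrt(lam) for lam in lams[:K])
            rho = [n * math.sqrt(lam) / s * snr_chk for lam in lams[:K]]              # Eq. (29)
            m = (sum(lam / (r + 1) for lam, r in zip(lams, rho)) + sum(lams[K:])) / n  # Eq. (27)
            if m <= mse_target:
                return K, rho, snr_chk
        snr_chk += step                       # no feasible K: allocate more power
    raise ValueError("target distortion not reached within max_raises budget increases")

lams = [16.0, 9.0, 4.0, 1.0]                  # hypothetical variances, sorted descending
K, rho, snr = bandwidth_distortion_search(lams, snr_chk=10.0, mse_target=2.0)
print(K, snr)                                 # feasible at the given budget
K, rho, snr = bandwidth_distortion_search(lams, snr_chk=0.0, mse_target=2.0)
print(K, round(snr, 2))                       # budget had to be raised from zero
```

With a generous budget only two chunks are needed, while starting from a zero budget forces the loop to raise $\mathrm{SNR}_{\mathrm{chk}}$ until some $K$ becomes feasible.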

4.3.3. Power and Bandwidth Trade-Off

To achieve a desired video quality, either the power consumption is optimized under a constrained bandwidth usage, or the bandwidth usage is optimized under a limited power budget. Each point on the resulting power–bandwidth trade-off curve represents a combination of power and bandwidth usage that attains the same video quality. In multi-user systems, each viewer has a different power and bandwidth budget, so a joint optimization that chooses a suitable power–bandwidth combination for each viewer can be used to conserve resources.
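The trade-off curve can be traced directly from the closed forms (21) and (26): for each bandwidth usage $r$, compute the smallest per-chunk SNR budget that still meets the target distortion. The Python sketch below does this for hypothetical variances; it is an illustration of the idea, not the procedure used to produce the paper's figures.

```python
import math

def min_power_at_bandwidth(lams, r, mse_target):
    """Per-chunk SNR budget (21) needed to hit mse_target with the r largest chunks, via (26).

    Returns None when r chunks cannot reach the target with non-negative powers."""
    n = len(lams)
    slack = mse_target - sum(lams[r:]) / n
    if slack <= 0:
        return None
    s = sum(math.sqrt(lam) for lam in lams[:r])
    rho = [math.sqrt(lam) * s / (n * slack) - 1 for lam in lams[:r]]
    if min(rho) < 0:
        return None
    return sum(rho) / n

def tradeoff_curve(lams, mse_target):
    """(bandwidth, power) pairs that all attain the same target distortion."""
    curve = []
    for r in range(1, len(lams) + 1):
        p = min_power_at_bandwidth(lams, r, mse_target)
        if p is not None:
            curve.append((r, p))
    return curve

curve = tradeoff_curve([16.0, 9.0, 4.0, 1.0], mse_target=2.0)   # hypothetical variances
for r, p in curve:
    print(f"K = {r}: SNR_chk = {p:.4f}")
```

Each printed pair reaches the same target MSE; moving to a larger $K$ lowers the required power, which is the exchange between the two resources described above.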

5. Simulation Results

Several simulation experiments were conducted to assess the performance of the proposed uncoded 3D video wireless transmission scheme.
Test sequences: Several standard multi-view plus depth 3D video sequences were considered: Kendo and Balloons [34]; Newspaper, provided by GIST, South Korea; PoznanStreet and PoznanHall2 [35]; Dancer and gtFly, provided by Nokia, Finland; and Shark, provided by NICT, Japan. Dancer, gtFly, and Shark are computer-generated 3D scenes with ground-truth depth maps, whereas the other sequences capture real 3D scenes. The configurations of the tested sequences are listed in Table 1.
Parameter settings: The GOP size was set to eight frames. Each frame was divided into 16 × 16 = 256 chunks, so the chunk size depends on the sequence resolution in Table 1. As mentioned previously, virtual viewpoint synthesis requires transmitting the texture and depth frames of the two cameras adjacent to the selected virtual viewpoint. Hence, one GOP consists of 2 × 2 × 8 × 256 = 8192 chunks (two views, texture and depth, eight frames, and 256 chunks per frame). The 3D-HEVC Test Model (HTM) v16.3 software [36] was used to synthesize the virtual viewpoints. The simulation experiments were conducted in MATLAB over AWGN channels, with the wireless transmission parameters based on the IEEE 802.11a/g standard.
Metrics: Although the proposed algorithms were designed around the objective metric PSNR, the perceptual metric SSIM was also reported because it reveals important additional aspects of performance.

5.1. Performance Evaluation for PAR

The optimal PAR was estimated based on the joint texture/depth power allocation method [29]:
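$$\mathrm{PAR} = \sqrt{\frac{\left(M + \sum_{i=1}^{L}\alpha_i\right)\sum_{m=1}^{M}\left(\sum_{j=1}^{N}\sqrt{\lambda_{m,j}^{t}}\right)^{2}}{\sum_{i=1}^{L}\beta_i\sum_{m=1}^{M}\left(\sum_{j=1}^{N}\sqrt{\lambda_{m,j}^{d}}\right)^{2}}}$$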
where $M$ is the number of reference viewpoints transmitted in a 3D video SoftCast transmission, $L$ is the number of virtual viewpoints to be synthesized at the receiver, the superscripts $t$ and $d$ denote the original texture video and depth map, respectively, $\lambda_{m,j}$ is the variance of the $j$-th chunk of reference view $m$, and $\alpha_i$ and $\beta_i$ are the parameters of the distortion model linking the transmitted reference views and the synthesized virtual views. To reach the global optimal PAR, a full search was conducted iteratively with a search step of 0.05 [29]. Although PAR is a continuous variable, simulations showed that a 0.05 step was fine enough to capture any changes. The estimated PAR and the global optimal PAR for the simulated video sequences are listed in Table 2.
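The PAR expression above can be evaluated directly from chunk variances. The sketch below assumes the reconstructed form of the expression, that PAR denotes the ratio $P_t/P_d$ of texture to depth power (so PAR 1:1 means an equal split), and that $\alpha_i$ and $\beta_i$ have already been fitted; the helper names and the sample variances are hypothetical.

```python
import numpy as np

def estimate_par(lam_t, lam_d, alpha, beta):
    """Texture/depth power allocation ratio PAR = P_t / P_d (sketch of the expression above).

    lam_t, lam_d -- arrays of shape (M, N): chunk variances of the texture video and
                    depth map of each of the M reference viewpoints
    alpha, beta  -- distortion-model parameters of the L virtual viewpoints (assumed fitted)
    """
    lam_t = np.asarray(lam_t, dtype=float)
    lam_d = np.asarray(lam_d, dtype=float)
    M = lam_t.shape[0]
    # sum over views m of (sum over chunks j of sqrt(lambda_{m,j}))**2
    texture_term = np.sum(np.sum(np.sqrt(lam_t), axis=1) ** 2)
    depth_term = np.sum(np.sum(np.sqrt(lam_d), axis=1) ** 2)
    ratio = (M + np.sum(alpha)) * texture_term / (np.sum(beta) * depth_term)
    return float(np.sqrt(ratio))

def split_power(total_power, par):
    """Split a power budget so that P_t / P_d = par (assumed meaning of PAR)."""
    p_depth = total_power / (1.0 + par)
    return total_power - p_depth, p_depth

lam_t = np.array([[9.0, 4.0, 1.0], [16.0, 1.0, 1.0]])   # hypothetical texture variances
lam_d = np.array([[1.0, 0.25, 0.0], [1.0, 0.0, 0.0]])   # hypothetical depth variances
par = estimate_par(lam_t, lam_d, alpha=[0.5, 0.4, 0.5], beta=[0.2, 0.3, 0.2])
p_t, p_d = split_power(10.0, par)
```

Texture receives the larger share whenever PAR exceeds 1, which matches the values reported in Table 2.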

5.2. Minimum Distortion Optimization Performance

We investigated the highest PSNR attainable with the available resources (i.e., power and bandwidth). As discussed earlier, the maximum number of available channel slots $N$ for one GOP is 8192, and each channel slot transmits only one chunk. Because each GOP carries a separate texture and depth map for each reference view, the $N$ channel slots are split equally as $N_{\mathrm{sub}} = N/(2 \times \text{number of reference views})$, so that $N_{\mathrm{sub}}$ is the number of channel slots available to each texture or depth map (2048 for two reference views). Furthermore, the noise variance was fixed at 1, and the transmission power budget $P$ was varied for each GOP. Since the noise level is fixed, in the following we no longer differentiate between $P$ and $\mathrm{SNR}_{\mathrm{chk}}$.
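The resource bookkeeping used in these experiments can be checked with a few lines of Python. This is a sketch with hypothetical helper names; `chunk_shape` assumes that each frame is cut into equal 16 × 16 spatial blocks, which works for the resolutions in Table 1 because they are divisible by 16.

```python
def chunks_per_gop(ref_views=2, maps_per_view=2, frames=8, chunks_per_frame=16 * 16):
    """Channel slots N for one GOP: views x (texture, depth) x frames x chunks."""
    return ref_views * maps_per_view * frames * chunks_per_frame

def slots_per_map(n_total, ref_views=2):
    """N_sub = N / (2 x number of reference views): one texture or depth map."""
    return n_total // (2 * ref_views)

def chunk_shape(width, height, grid=16):
    """Pixel width and height of a chunk (assumes dimensions divisible by grid)."""
    return width // grid, height // grid

N = chunks_per_gop()
print(N, slots_per_map(N))                              # 8192 2048
print(chunk_shape(1024, 768), chunk_shape(1920, 1088))  # (64, 48) (120, 68)
```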
Figure 2 shows the minimum distortion optimization results for the Kendo and PoznanStreet sequences, averaged over the two transmitted reference views. For a given bandwidth usage $N_{\mathrm{sub}}$, the quality generally increased with the power budget $\mathrm{SNR}_{\mathrm{chk}}$, as also shown by (12). However, for a given power budget $\mathrm{SNR}_{\mathrm{chk}}$, the quality did not necessarily increase with the bandwidth usage, particularly under low power budgets. For instance, for the texture frames at a transmission power $P$ of 5 dB, $\mathrm{PSNR}_{\max}$ was achieved at $N_{\mathrm{sub}} = 233$ for Kendo and at $N_{\mathrm{sub}} = 303$ for PoznanStreet. The highest PSNR does not always coincide with the maximum bandwidth usage because, in a pseudo-analog system, chunks are not equally important even though each chunk consumes one channel. Hence, under a limited power budget, allocating more power to HP chunks and less to LP chunks can improve PSNR performance.
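The observation that the best PSNR need not use the full bandwidth can be reproduced with a toy model. The sketch below assumes a high-SNR distortion model as a stand-in for the paper's expressions, unit noise variance, and made-up chunk variances; `min_distortion_search` is a hypothetical name, not the paper's Algorithm 1. It evaluates every $K$ in one pass and keeps the one with the highest PSNR.

```python
import numpy as np

def min_distortion_search(variances, snr_chk_db, peak=255.0):
    """Pick the number of transmitted chunks K that maximizes PSNR.

    Assumed high-SNR model (unit noise, rho_i ~ sqrt(lambda_i) on K chunks):
        MSE(K) = [(sum_{i<=K} sqrt(lam_i))**2 / (N * snr) + sum_{i>K} lam_i] / N
    Cumulative sums evaluate every K in one O(N) pass after sorting.
    Returns (K_best, psnr_per_K) with K counted from 1.
    """
    lam = np.sort(np.asarray(variances, dtype=float))[::-1]
    N = lam.size
    snr = 10.0 ** (snr_chk_db / 10.0)
    transmitted = np.cumsum(np.sqrt(lam)) ** 2 / (N * snr)
    dropped = np.maximum(lam.sum() - np.cumsum(lam), 0.0)   # guard against rounding
    psnr = 10.0 * np.log10(peak ** 2 / ((transmitted + dropped) / N))
    return int(np.argmax(psnr)) + 1, psnr

lam = [16.0, 4.0, 1.0, 1.0]
print(min_distortion_search(lam, 0.0)[0])    # low budget: best K is 1
print(min_distortion_search(lam, 20.0)[0])   # high budget: best K is N = 4
```

With only four chunks, the low budget favors sending a single chunk, whereas the high budget favors using all of them, which mirrors the trend in Figure 2.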
The minimum distortion optimization curves generally have a flat tail, which implies that above a certain PSNR level, increasing the power budget is more effective than increasing the bandwidth usage. For instance, for the texture frames of the Kendo sequence with a power budget $\mathrm{SNR}_{\mathrm{chk}} = 10$ dB and a bandwidth usage $N_{\mathrm{sub}} = 559$, the PSNR was 39 dB, and assigning more bandwidth did not improve it. In contrast, increasing the power from 10 dB to 15 dB improved the PSNR by 4.3 dB, and increasing it from 10 dB to 20 dB improved the PSNR by 7.8 dB. All the preceding results assume an equal texture/depth power allocation (PAR 1:1). Under the optimal PAR listed in Table 2, minimum distortion optimization redistributes the power between the texture and depth map, and Figure 2 shows the resulting change in behavior. As more power is assigned to the texture of the reference viewpoints, the received quality of the texture frames improves and their bandwidth allocation increases at the expense of the depth map. In many cases, the bandwidth decrease in the depth map exceeds the bandwidth increase in the texture. The overall 3D video quality is plotted between the bandwidth points 1848 and 2048 to show the improvement achieved by the proposed resource control scheme.
Figure 3 plots the average performance for the Kendo sequence to illustrate the bandwidth savings. Both the quality of the reference viewpoints and the overall quality of five viewpoints (two reference plus three virtual) are considered. As a baseline, the pioneering uncoded video transmission system SoftCast was applied to transmit the texture and depth map of each reference viewpoint separately, with an equal texture/depth power allocation (PAR 1:1). As shown, conventional SoftCast uses the entire bandwidth and power resources to obtain a good PSNR. Even before the PAR is applied, the minimum distortion optimization algorithm obtains a comparable PSNR with far less bandwidth. Applying the estimated PAR rescales the transmission budget $\mathrm{SNR}_{\mathrm{chk}}$ in (17) of the texture and depth map of each reference viewpoint according to the optimal ratio, which changes $\rho_i^*$ and $K^*$ and ultimately improves the PSNR. For instance, in Figure 3d, at $\mathrm{SNR}_{\mathrm{chk}} = 0$ dB, SoftCast used all 2048 channel slots for the texture and for the depth map of each reference viewpoint and achieved an overall 3D video PSNR of 32.379 dB. The minimum distortion algorithm achieved 33.428 dB using only 88/2048 = 4.3% of the texture bandwidth and 56/2048 = 2.7% of the depth bandwidth. After the optimal PAR was applied, the PSNR improved to 34.358 dB with bandwidth usages of 6% and 1.4%. Furthermore, because of the flat tail of the minimum distortion curve, sacrificing a small PSNR margin (e.g., accepting τ = 98% of the highest PSNR) reduced the bandwidth usage further, from 124 to 39 chunks for the texture and from 30 to 8 chunks for the depth map. These results show that the proposed resource control scheme saves more bandwidth at low $\mathrm{SNR}_{\mathrm{chk}}$, which makes it well suited to limited transmission power budgets and poor channel conditions.
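The τ rule above amounts to choosing the smallest bandwidth whose PSNR reaches $\tau \cdot \mathrm{PSNR}_{\max}$. A minimal sketch, assuming the PSNR for every candidate $K$ is already available (for example from a minimum distortion sweep); the function name and the sample curve are hypothetical.

```python
import numpy as np

def bandwidth_for_tolerance(psnr_per_k, tau=0.98):
    """Smallest K (1-based) whose PSNR reaches tau * PSNR_max.

    psnr_per_k[k - 1] is the PSNR obtained when k chunks are transmitted.
    """
    psnr = np.asarray(psnr_per_k, dtype=float)
    reaches = psnr >= tau * psnr.max()
    return int(np.argmax(reaches)) + 1   # argmax returns the first True

curve = [30.0, 36.0, 38.0, 39.0, 39.1, 39.1]    # made-up flat-tailed curve
print(bandwidth_for_tolerance(curve))            # 4 instead of 5 for PSNR_max
```

The flatter the tail, the larger the bandwidth saved for a given τ.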
Table 3 lists the bandwidth usage for different 3D video sequences. For a fair comparison, the distortion-resource and resource-distortion algorithms [31] were modified to transmit two reference viewpoints instead of three, and the number of frames per GOP was increased from four to eight. Because the proposed resource control scheme and the algorithms in [31] exploit correlation with different transforms (3D-DCT and 5D-DCT, respectively), they adopt different chunk sizes and therefore produce different numbers of chunks. Hence, bandwidth usage is compared as a percentage of the total bandwidth. The proposed method uses markedly less bandwidth than both benchmarks while achieving satisfactory quality; for example, at $\mathrm{SNR}_{\mathrm{chk}} = 20$ dB it uses 45.8862% of the bandwidth for Kendo, compared with 60.0586% for the distortion-resource algorithm.

5.3. Target Distortion Optimization Performance

5.3.1. Power Distortion Optimization Performance

Assume the sender delivers about 30 GOPs to users who demand a quality of 35 dB, under a bandwidth constraint of $N = 8192$ chunks per GOP. Figure 4 depicts the power allocation for consecutive GOPs and the total bandwidth usage for the reference viewpoints of the Kendo and PoznanStreet sequences. The power allocated to the depth map video is clearly lower than that allocated to the texture because of the depth map's distinct characteristics: it carries the geometric information of the 3D scene in the form of homogeneous regions separated by sharp edges. The same holds for bandwidth usage; in some GOPs, depth maps may occupy less than 0.005% of the available bandwidth, as in the PoznanStreet sequence. The power allocation curves for both texture and depth map frames are relatively smooth, and the total bandwidth usage remains comparatively low. This adaptive resource allocation keeps the 3D video quality at the predefined PSNR value, so the content is delivered with a favorable viewing experience at the user end and the saved resources can benefit other users.

5.3.2. Bandwidth Distortion Optimization Performance

Similarly, assume the sender delivers about 30 GOPs to users who demand a quality of 35 dB, under a transmission power constraint $\mathrm{SNR}_{\mathrm{chk}} = 10$ dB. Figure 5 shows the bandwidth allocation curves for the reference viewpoints of different video sequences. Compared with the $N = 8192$ chunks in one GOP, the total bandwidth usage stays low and steady. The bandwidth usage for the depth map can be fewer than three chunks in some GOPs. Therefore, in practice it is reasonable to allocate slightly more bandwidth (e.g., 4%) to avoid running Algorithm 3 several times, which reduces the computational cost.
Figure 6 presents the bandwidth savings obtained while achieving the target PSNRs. Conventional SoftCast exhausts all the available resources to achieve a good PSNR, regardless of the user's predefined distortion. In contrast, the target distortion algorithm approaches the user's predefined PSNR with a lower bandwidth usage. The performance improves slightly when the optimal PAR is applied. In Algorithms 2 and 3, the optimal PAR does not influence the power and bandwidth allocation. Nonetheless, because the optimal PAR adjusts the power assigned jointly to the texture and depth map frames in the pseudo-analog transmission, as shown in Figure 1, it still influences the overall quality of the 3D video at the receiver.
For example, in Figure 6c, with a power budget $\mathrm{SNR}_{\mathrm{chk}} = 10$ dB and a target PSNR of 40 dB, SoftCast uses all $N_{\mathrm{sub}}$ channel slots for both the texture and the depth map of a single reference viewpoint and achieves an overall 3D video quality of 42.26 dB. The proposed target distortion algorithm achieves a PSNR of 39.30 dB using only 167/2048 = 8.15% and 42/2048 = 2% of the original bandwidth for the texture and depth map, respectively. When the optimal PAR is applied, the proposed resource control scheme achieves 39.66 dB, close to the desired quality, with the same bandwidth usage.
Figure 7 shows the bandwidth savings while achieving a target PSNR under a power budget $\mathrm{SNR}_{\mathrm{chk}} = 20$ dB. As mentioned earlier, the percentage of bandwidth resource usage is used to compare the proposed scheme with the resource-distortion algorithm [31]. In this figure, the total bandwidth usage percentage of the resource-distortion algorithm is split equally between the texture and depth map. The proposed resource control scheme achieves PSNR values closer to the target quality than the resource-distortion algorithm, with less bandwidth. For example, in Figure 7b, the target PSNR is 45 dB. The resource-distortion algorithm achieved 43.379 dB using 149 chunks, about 7.25% of the total bandwidth, for the texture and the same amount for the depth map. In contrast, the proposed resource control scheme achieved 43.556 dB using only 558 and 350 chunks, about 6.8% and 4.25% of the total bandwidth, for the texture and depth map, respectively.

5.3.3. Power and Bandwidth Trade-Off

A trade-off exists between power and bandwidth usage for each expected video quality, because the minimal resource solution is generally not unique. However, the valid ranges of power and bandwidth for a given distortion are bounded. The trade-off curves in Figure 8 are averaged over the texture and depth maps, and their boundaries vary with the predefined distortion. In general, lower target PSNR values require comparatively little power and bandwidth. The required power is high near the lower bandwidth boundary of $N_{\mathrm{sub}}$, where a slight increase in bandwidth dramatically reduces the power. Beyond a certain threshold, however, allocating more bandwidth no longer reduces the power. Hence, we suggest adopting a power and bandwidth pair close to the curve's turning point, where a slight change in one factor noticeably affects the other.

5.4. The Resource Control Scheme Performance Comparison

The performance of the proposed scheme was evaluated in terms of the quality of the synthesized virtual views and compared with two benchmark schemes: SoftCast and the distortion-resource algorithm [31].
Figure 9 illustrates the synthesis quality at virtual viewpoint 2 of the 3D video for the three schemes, together with the synthesized frames. Under a channel bandwidth constraint of 8192 chunks, SoftCast uses all the available bandwidth and allocates power to every chunk. In contrast, the other two schemes retain HP chunks and discard LP chunks, which saves considerable bandwidth. For instance, at a power usage of 10 dB, the PSNR and SSIM of the proposed resource control scheme were close to those of SoftCast with only 7.28% bandwidth usage, nearly half of the bandwidth used by the distortion-resource algorithm. When the power usage decreased to SNR = 0 dB, the proposed scheme performed even better: with only 1.1535% bandwidth usage, it gained 1.21 dB in PSNR over SoftCast and improved the SSIM from 0.8244 to 0.9317.
The robustness of the proposed scheme was evaluated by comparing its received video quality with that of conventional SoftCast and the distortion-resource algorithm. The results of each method in Table 4 correspond to the bandwidth usage listed in Table 3. SoftCast harnesses all the bandwidth and assigns the available power to all chunks. The distortion-resource algorithm assigns the available power to an optimal number of chunks to minimize the distortion of the received viewpoints. In contrast, the proposed scheme allocates the available power to a limited number of chunks such that the overall quality of the transmitted 3D video is maximized. For instance, at a low power usage of 0 dB, the proposed method obtained a PSNR gain of up to 1.5 dB over SoftCast and 1.789 dB over the distortion-resource algorithm (gtFly sequence) using less than 1.4% of the bandwidth, and a maximum SSIM increase of 0.1588 over SoftCast (Shark sequence) using 2% of the bandwidth.
Figure 10 shows how constrained resources influence pseudo-analog transmission. Evaluating different bandwidth resources under different power budgets reveals the behavior of pseudo-analog transmission: the effect of discarding LP chunks and retaining HP chunks at low SNR differs from that at high SNR. For each power budget, several bandwidth constraints are considered. Under each constraint, SoftCast consumes all the available bandwidth. The proposed resource control scheme is also bounded by the available bandwidth, but within that bound it achieves $\mathrm{PSNR}_{\max}$; to save more bandwidth, it was set to achieve $\tau \cdot \mathrm{PSNR}_{\max}$. The dashed line represents the video quality of SoftCast using the whole bandwidth. Figure 10a shows the same result as Figure 9, but for only one virtual viewpoint.

5.5. Complexity Analysis

The proposed resource control scheme complexity is divided into two parts.
In the joint power allocation method, the complexity is dominated by the preprocessing step. The complexity of the full search method is directly proportional to the PAR search range. For a search range $R$ and a search step of 0.05, the full search preprocesses $20R$ times, comprising $20R \times M$ distortion computations for the reference viewpoints and $20R \times L$ virtual view syntheses. The estimation method preprocesses only 6 times, comprising $6 \times 2$ distortion computations for the leftmost and rightmost reference views and $6 \times L$ virtual view syntheses. Moreover, the estimation method preprocesses only the first frame of each sequence, whereas the full search method preprocesses the whole sequence, so the advantage grows with the number of frames. Thus, the complexity of the estimation method is 96.25% lower than that of the full search method, making its cost negligible.
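The 96.25% figure follows from counting preprocessing runs. The sketch below is a hypothetical helper that compares $20R$ full-search runs with 6 estimation runs per frame. The search range $R$ is not stated in this section; $R = 8$ in the usage line is an assumption, chosen because $1 - 6/(20 \times 8) = 0.9625$. Since the full search also processes every frame while the estimation method processes only the first, the actual saving grows with sequence length.

```python
def preprocessing_reduction(search_range, step=0.05, estimation_runs=6):
    """Fraction of per-frame PAR preprocessing runs saved by the estimation method.

    Full search: search_range / step runs (20 * R for a 0.05 step);
    estimation: a fixed number of runs (6 in the text).
    """
    full_search_runs = round(search_range / step)
    return 1.0 - estimation_runs / full_search_runs

print(preprocessing_reduction(8))   # about 0.9625 for the assumed R = 8
```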
In the greedy search algorithms (Algorithms 1–3), all possible chunk ($n$) or channel ($r$) values are traversed iteratively, and each step involves only simple linear operations. Thus, Algorithms 1–3 all have $O(N)$ complexity, which is negligible. The major source of complexity is sorting the $N$ chunks of each GOP by energy, which is $O(N \log N)$. However, this strict sorting can be avoided by ordering the chunks in a zigzag scan, as in JPEG image compression. The complexity can be reduced further by computing all the optimal values for the first GOP only and applying them to the remaining GOPs, which is possible when the video content of consecutive frames does not change greatly.
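A zigzag scan in the JPEG style visits positions on successive anti-diagonals in alternating directions, so low-frequency positions come first. The sketch below generates that order for an $n \times n$ grid; applying it to the 16 × 16 chunk grid of each 3D-DCT slice in place of an explicit energy sort is an assumption about how the idea would be used, and it only approximates the true energy order.

```python
def zigzag_order(n):
    """JPEG-style zigzag scan of an n x n grid as a list of (row, col) pairs.

    Positions on the same anti-diagonal (row + col = s) are visited in
    alternating directions, so low-frequency positions come first; used here
    as an assumed stand-in for an explicit energy sort of the chunks.
    """
    order = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                 # even diagonals: bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

print(zigzag_order(3))
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (1, 2), (2, 1), (2, 2)]
```

Because the order depends only on the grid size, it can be computed once and reused for every GOP, which removes the $O(N \log N)$ sort from the per-GOP cost.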

6. Conclusions

This paper presented a resource control scheme for pseudo-analog transmission of multi-view plus depth video that integrates joint texture/depth power allocation with resource optimization. The joint texture/depth power allocation maximized the transmission efficiency, while the resource allocation algorithms handled the data traffic burden caused by multi-view video content. The estimated optimal PAR between texture and depth performed similarly to the full search method but with negligible complexity. Following the SoftCast perspective, a minimum distortion algorithm achieved the best viewing quality under a resource constraint. In contrast, the target distortion algorithm minimized resource usage for a predefined distortion requirement, yielding received videos of consistent viewing quality. As the minimal resource solution is generally not unique, we analyzed the trade-off between power and bandwidth usage. The reported results verified the efficiency of the proposed scheme for both reference and virtual viewpoints. During 3D video streaming, the video content characteristics of subsequent frames may change when the 3D scene changes. The model parameters of the joint power allocation, estimated from the first frames, then become inaccurate for frames whose content has considerably different characteristics. Adding scene change detection would allow the system to update the model parameters automatically whenever the video content changes dramatically. In future work, inter-view correlation could be exploited to improve the transmission performance using 4D-DCT.

Author Contributions

Conceptualization, S.K.S.T., O.A., A.M.S., E.O.-M. and O.B.; methodology, S.K.S.T., O.B. and O.A.; software, S.K.S.T., O.A. and E.O.-M.; validation, S.K.S.T., O.A. and E.O.-M.; formal analysis, S.K.S.T. and O.B.; investigation, S.K.S.T.; resources, S.K.S.T. and O.A.; data curation, S.K.S.T. and O.A.; writing—original draft preparation, S.K.S.T. and E.O.-M.; writing—review and editing, S.K.S.T., E.O.-M. and O.B.; visualization, S.K.S.T., A.M.S. and E.O.-M.; supervision, S.K.S.T. and O.B.; project administration, S.K.S.T., O.B. and E.O.-M.; funding acquisition, S.K.S.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We acknowledge the support of our laboratory, the Advanced Visual Communication and Computing (AVC2) laboratory.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Merkle, P.; Smolic, A.; Muller, K.; Wiegand, T. Multi-view video plus depth representation and coding. In Proceedings of the 2007 IEEE International Conference on Image Processing, San Antonio, TX, USA, 16 September–19 October 2007; Volume 1, pp. 201–204.
  2. Cao, B.; Xia, S.; Han, J.; Li, Y. A distributed game methodology for crowdsensing in uncertain wireless scenario. IEEE Trans. Mob. Comput. 2020, 19, 15–28.
  3. Fehn, C. Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV. In Proceedings of the Stereoscopic Displays and Virtual Reality Systems XI, San Jose, CA, USA, 19–22 January 2004; Volume 5291, pp. 93–104.
  4. Zhu, C.; Li, S. Depth image based view synthesis: New insights and perspectives on hole generation and filling. IEEE Trans. Broadcast. 2016, 62, 82–93.
  5. Li, S.; Zhu, C.; Sun, M.T. Hole filling with multiple reference views in DIBR view synthesis. IEEE Trans. Multimed. 2018, 20, 1948–1959.
  6. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
  7. Vembu, S.; Verdu, S.; Steinberg, Y. The source-channel separation theorem revisited. IEEE Trans. Inf. Theory 1995, 41, 44–54.
  8. Yuan, H.; Fu, H.; Liu, J.; Xiao, J. End-to-end distortion-based multiuser bandwidth allocation for real-time video transmission over LTE network. IEEE Trans. Broadcast. 2017, 63, 338–349.
  9. Kratochvíl, T.; Štukavec, R. DVB-T digital terrestrial television transmission over fading channels. Radioengineering 2008, 17, 96–102.
  10. Katabi, D.; Rahul, H.; Jakubczak, S. Softcast: One Video to Serve All Wireless Receivers; Tech. Rep. MIT-CSAIL-TR-2009-005; Massachusetts Institute of Technology: Cambridge, MA, USA, 2009.
  11. Jakubczak, S.; Katabi, D. SoftCast: One-size-fits-all wireless video. In Proceedings of the ACM SIGCOMM 2010 Conference, New Delhi, India, 30 August–3 September 2010; pp. 449–450.
  12. Jakubczak, S.; Katabi, D. A cross-layer design for scalable mobile video. In Proceedings of the 17th Annual International Conference on Mobile Computing and Networking, Las Vegas, NV, USA, 19–23 September 2011; pp. 289–300.
  13. Giannopoulos, A.; Spantideas, S.; Kapsalis, N.; Karkazis, P.; Trakadas, P. Deep reinforcement learning for energy-efficient multi-channel transmissions in 5G cognitive hetnets: Centralized, decentralized and transfer learning based solutions. IEEE Access 2021, 9, 129358–129374.
  14. Cumino, P.; Lobato Junior, W.; Tavares, T.; Santos, H.; Rosário, D.; Cerqueira, E.; Villas, L.A.; Gerla, M. Cooperative UAV scheme for enhancing video transmission and global network energy efficiency. Sensors 2018, 18, 4155.
  15. Fan, X.; Xiong, R.; Wu, F.; Zhao, D. WaveCast: Wavelet based wireless video broadcast using lossy transmission. In Proceedings of the 2012 Visual Communications and Image Processing, San Diego, CA, USA, 27–30 November 2012; pp. 1–6.
  16. Fan, X.; Wu, F.; Zhao, D.; Au, O.C. Distributed wireless visual communication with power distortion optimization. IEEE Trans. Circuits Syst. Video Technol. 2013, 23, 1040–1053.
  17. Xiong, R.; Wu, F.; Fan, X.; Luo, C.; Ma, S.; Gao, W. Power-distortion optimization for wireless image/video SoftCast by transform coefficients energy modeling with adaptive chunk division. In Proceedings of the 2013 Visual Communications and Image Processing (VCIP), Kuching, Malaysia, 17–20 November 2013; pp. 1–6.
  18. Cui, H.; Xiong, R.; Luo, C.; Song, Z.; Wu, F. Denoising and resource allocation in uncoded video transmission. IEEE J. Sel. Top. Signal Process. 2015, 9, 102–112.
  19. Balsa, J.; Fresnedo, Ó.; García-Naya, J.A.; Domínguez-Bolaño, T.; Castedo, L. JSCC-Cast: A Joint Source Channel Coding Video Encoding and Transmission System with Limited Digital Metadata. Sensors 2021, 21, 6208.
  20. Cui, H.; Song, Z.; Yang, Z.; Luo, C.; Xiong, R.; Wu, F. Cactus: A hybrid digital-analog wireless video communication system. In Proceedings of the 16th ACM International Conference on Modeling, Analysis & Simulation of Wireless and Mobile Systems, Barcelona, Spain, 3–8 November 2013; pp. 273–278.
  21. Liu, X.L.; Hu, W.; Luo, C.; Pu, Q.; Wu, F.; Zhang, Y. ParCast+: Parallel video unicast in MIMO-OFDM WLANs. IEEE Trans. Multimed. 2014, 16, 2038–2051.
  22. He, D.; Luo, C.; Lan, C.; Wu, F.; Zeng, W. Structure-preserving hybrid digital-analog video delivery in wireless networks. IEEE Trans. Multimed. 2015, 17, 1658–1670.
  23. Yu, L.; Li, H.; Li, W. Wireless scalable video coding using a hybrid digital-analog scheme. IEEE Trans. Circuits Syst. Video Technol. 2014, 24, 331–345.
  24. Yu, L.; Li, H.; Li, W. Wireless cooperative video coding using a hybrid digital–analog scheme. IEEE Trans. Circuits Syst. Video Technol. 2015, 25, 436–450.
  25. Liang, F.; Luo, C.; Xiong, R.; Zeng, W.; Wu, F. Superimposed Modulation for Soft Video Delivery with Hidden Resources. IEEE Trans. Circuits Syst. Video Technol. 2018, 28, 2345–2358.
  26. He, C.; Wang, H.; Hu, Y.; Chen, Y.; Fan, X.; Li, H.; Zeng, B. MCast: High-quality linear video transmission with time and frequency diversities. IEEE Trans. Image Process. 2018, 27, 3599–3610.
  27. He, C.; Hu, Y.; Chen, Y.; Fan, X.; Li, H.; Zeng, B. Mucast: Linear uncoded multiuser video streaming with channel assignment and power allocation optimization. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 1136–1146.
  28. Yang, T.; Luo, L.; Zhu, C.; Tang, S. Block DCT based optimization for wireless SoftCast of depth map. IEEE Access 2019, 7, 29484–29494.
  29. Luo, L.; Yang, T.; Zhu, C.; Jin, Z.; Tang, S. Joint texture/depth power allocation for 3-D video SoftCast. IEEE Trans. Multimed. 2019, 21, 2973–2984.
  30. Liu, D.; Wu, J.; Cui, H.; Zhang, D.; Luo, C.; Wu, F. Cost-distortion optimization and resource control in pseudo-analog visual communications. IEEE Trans. Multimed. 2018, 20, 3097–3110.
  31. Zhang, T.; Mao, S. Joint power and channel resource optimization in soft multi-view video delivery. IEEE Access 2019, 7, 148084–148097.
  32. Xiong, R.; Wu, F.; Xu, J.; Fan, X.; Luo, C.; Gao, W. Analysis of decorrelation transform gain for uncoded wireless image and video communication. IEEE Trans. Image Process. 2016, 25, 1820–1833.
  33. Cui, H.; Luo, C.; Chen, C.W.; Wu, F. Robust uncoded video transmission over wireless fast fading channel. In Proceedings of the IEEE INFOCOM 2014-IEEE Conference on Computer Communications, Toronto, ON, Canada, 27 April–2 May 2014; pp. 73–81.
  34. Saito, T. Nagoya University Multi-View Sequences Download List. Nagoya University, Fujii Laboratory. 2015. Available online: http://www.fujii.nuee.nagoya-u.ac.jp/multiview-data/ (accessed on 1 May 2015).
  35. Domański, M.; Grajek, T.; Klimaszewski, K.; Kurc, M.; Stankiewicz, O.; Stankowski, J.; Wegner, K. Poznan multiview video test sequences and camera parameters. Paper presented at the MPEG/M17050, Xian, China, October 2009. ISO/IEC JTC1/SC29/WG11.
  36. Fraunhofer Heinrich Hertz Institute. 3D High Efficiency Video Coding (3D-HEVC)—JCT-VC. Available online: https://hevc.hhi.fraunhofer.de/3dhevc (accessed on 10 September 2021).
Figure 1. Diagram of the proposed 3D video SoftCast.
Figure 2. Minimum distortion optimization performance and PAR influence: (a) Kendo; (b) PoznanStreet.
Figure 3. Bandwidth usage savings and PAR influence: (a) $\mathrm{SNR}_{\mathrm{chk}}$ = 15 dB; (b) $\mathrm{SNR}_{\mathrm{chk}}$ = 10 dB; (c) $\mathrm{SNR}_{\mathrm{chk}}$ = 5 dB; (d) $\mathrm{SNR}_{\mathrm{chk}}$ = 0 dB.
Figure 4. Power allocation for texture and depth map of 3D video sequences: (a) Kendo; (b) PoznanStreet.
Figure 5. Bandwidth usage for texture and depth map of 3D video sequences: (a) Kendo; (b) PoznanStreet.
Figure 6. Bandwidth usage savings and PAR influence: (a) $\mathrm{SNR}_{\mathrm{chk}}$ = 20 dB (target PSNR = 50 dB); (b) $\mathrm{SNR}_{\mathrm{chk}}$ = 15 dB (target PSNR = 45 dB); (c) $\mathrm{SNR}_{\mathrm{chk}}$ = 10 dB (target PSNR = 40 dB); (d) $\mathrm{SNR}_{\mathrm{chk}}$ = 5 dB (target PSNR = 35 dB).
Figure 7. Bandwidth usage savings and PAR influence under $\mathrm{SNR}_{\mathrm{chk}}$ = 20 dB: (a) target PSNR = 50 dB; (b) target PSNR = 45 dB; (c) target PSNR = 40 dB; (d) target PSNR = 35 dB.
Figure 8. Power and bandwidth usage trade-off curves for different video sequences: (a) Kendo; (b) PoznanStreet.
Figure 9. Synthesis quality at virtual viewpoint 2 for different algorithms: (a) SNR = 20 dB; (b) SNR = 10 dB; (c) SNR = 0 dB.
Figure 10. Performance comparison of different schemes under multiple SNR chk and bandwidth values: (a) Kendo; (b) PoznanStreet.
Table 1. 3D video sequences used in the experiments [29].

| Sequence | 3D Scene Nature | Reference Viewpoints | Synthesized Viewpoints | Resolution |
|---|---|---|---|---|
| Balloons | real | 1 and 5 | 2, 3, 4 | 1024 × 768 |
| Kendo | real | 1 and 5 | 2, 3, 4 | 1024 × 768 |
| Newspaper | real | 2 and 5 | 2.5, 3, 3.5 | 1024 × 768 |
| Dancer | computer-generated | 1 and 5 | 2, 3, 4 | 1920 × 1088 |
| gtFly | computer-generated | 1 and 5 | 2, 3, 4 | 1920 × 1088 |
| PoznanHall2 | real | 5 and 7 | 5.5, 6, 6.5 | 1920 × 1088 |
| PoznanStreet | real | 3 and 5 | 3.5, 4, 4.5 | 1920 × 1088 |
| Shark | computer-generated | 1 and 5 | 2, 3, 4 | 1920 × 1088 |
Table 2. Power allocation ratio of the 3D video sequences [29].
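
| Sequence | Estimated PAR | Global Optimal PAR |
|---|---|---|
| Balloons | 2.23 | 2.05 |
| Kendo | 2.55 | 2.60 |
| Newspaper | 3.32 | 2.85 |
| Dancer | 3.08 | 3.65 |
| gtFly | 5.78 | 7.00 |
| PoznanHall2 | 2.84 | 3.45 |
| PoznanStreet | 2.24 | 2.25 |
| Shark | 4.50 | 4.25 |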
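Table 2 can be read in two ways: how far each estimated power allocation ratio (PAR) is from the global optimum, and what texture/depth power split a given PAR implies. The sketch below does both. It assumes, as a hypothesis not stated in this section, that PAR is the ratio of texture power to depth power under a fixed total budget; the helper names are hypothetical. With the tabulated values, the relative estimation error is largest for PoznanHall2 (about 18%) and smallest for PoznanStreet (about 0.4%).

```python
from __future__ import annotations

# Table 2: sequence -> (estimated PAR, global optimal PAR).
PAR_TABLE = {
    "Balloons": (2.23, 2.05),
    "Kendo": (2.55, 2.60),
    "Newspaper": (3.32, 2.85),
    "Dancer": (3.08, 3.65),
    "gtFly": (5.78, 7.00),
    "PoznanHall2": (2.84, 3.45),
    "PoznanStreet": (2.24, 2.25),
    "Shark": (4.50, 4.25),
}


def relative_error(estimated: float, optimal: float) -> float:
    """Relative deviation of the estimated PAR from the global optimum."""
    return abs(estimated - optimal) / optimal


def split_power(total_power: float, par: float) -> tuple[float, float]:
    """Split a power budget into (texture, depth) shares.

    ASSUMPTION: PAR is read here as P_texture / P_depth, with
    P_texture + P_depth = total_power.
    """
    p_depth = total_power / (1.0 + par)
    return total_power - p_depth, p_depth


if __name__ == "__main__":
    for seq, (est, opt) in PAR_TABLE.items():
        p_tex, p_dep = split_power(1.0, est)
        print(f"{seq:13s} error={relative_error(est, opt):6.1%} "
              f"texture={p_tex:.3f} depth={p_dep:.3f}")
```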
Table 3. Bandwidth usage comparison for different 3D video sequences.
(a) SNR chk = 20 dB

| 3D Video Sequences | SoftCast | Distortion Resource Algorithm | Proposed Resource Control Scheme |
|---|---|---|---|
| Balloons | 100% | 67.5292% | 51.1108% |
| Kendo | 100% | 60.0586% | 45.8862% |
| Newspaper | 100% | 65.1855% | 48.3398% |
| Dancer | 100% | 89.3554% | 56.4208% |
| gtFly | 100% | 91.5039% | 49.1577% |
| PoznanHall2 | 100% | 56.9824% | 34.9609% |
| PoznanStreet | 100% | 57.0801% | 49.9633% |
| Shark | 100% | 95.0195% | 83.1665% |
(b) SNR chk = 15 dB

| 3D Video Sequences | SoftCast | Distortion Resource Algorithm | Proposed Resource Control Scheme |
|---|---|---|---|
| Balloons | 100% | 36.9629% | 19.5922% |
| Kendo | 100% | 30.6641% | 18.1274% |
| Newspaper | 100% | 33.8379% | 17.6775% |
| Dancer | 100% | 73.3398% | 40.8203% |
| gtFly | 100% | 75.0977% | 41.6748% |
| PoznanHall2 | 100% | 37.5000% | 21.6919% |
| PoznanStreet | 100% | 34.6191% | 24.9146% |
| Shark | 100% | 84.9609% | 59.1309% |
(c) SNR chk = 10 dB

| 3D Video Sequences | SoftCast | Distortion Resource Algorithm | Proposed Resource Control Scheme |
|---|---|---|---|
| Balloons | 100% | 9.3262% | 4.5898% |
| Kendo | 100% | 12.0117% | 7.2875% |
| Newspaper | 100% | 10.4980% | 4.3212% |
| Dancer | 100% | 40.3809% | 22.4365% |
| gtFly | 100% | 39.2578% | 25.8056% |
| PoznanHall2 | 100% | 19.5801% | 11.7187% |
| PoznanStreet | 100% | 17.2852% | 10.3271% |
| Shark | 100% | 49.7558% | 19.2138% |
(d) SNR chk = 5 dB

| 3D Video Sequences | SoftCast | Distortion Resource Algorithm | Proposed Resource Control Scheme |
|---|---|---|---|
| Balloons | 100% | 2.7344% | 1.5380% |
| Kendo | 100% | 4.7852% | 2.8930% |
| Newspaper | 100% | 3.4180% | 1.5014% |
| Dancer | 100% | 11.9141% | 6.8603% |
| gtFly | 100% | 8.9356% | 8.0688% |
| PoznanHall2 | 100% | 5.8594% | 3.4058% |
| PoznanStreet | 100% | 6.3965% | 3.4790% |
| Shark | 100% | 12.3046% | 7.0313% |
(e) SNR chk = 0 dB

| 3D Video Sequences | SoftCast | Distortion Resource Algorithm | Proposed Resource Control Scheme |
|---|---|---|---|
| Balloons | 100% | 1.0254% | 0.6225% |
| Kendo | 100% | 2.1484% | 1.1352% |
| Newspaper | 100% | 1.1230% | 0.6225% |
| Dancer | 100% | 2.5391% | 1.5136% |
| gtFly | 100% | 1.2207% | 1.3672% |
| PoznanHall2 | 100% | 2.2461% | 0.9521% |
| PoznanStreet | 100% | 2.2539% | 1.2939% |
| Shark | 100% | 3.4179% | 2.0019% |
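Table 3 can be summarized by the fraction of the distortion resource algorithm's (DRA) bandwidth that the proposed scheme saves at each SNR chk. The sketch below transcribes the two non-SoftCast columns and computes that relative saving; "DRA" and the helper names are shorthand introduced here, not terms from the paper. Under this transcription, gtFly at SNR chk = 0 dB is the only entry in which the proposed scheme uses more bandwidth than the DRA (1.3672% vs. 1.2207%).

```python
from __future__ import annotations

# Table 3: SNR chk (dB) -> per-sequence (DRA %, proposed %) bandwidth usage,
# both relative to SoftCast (100%). Row order follows SEQUENCES.
SEQUENCES = ("Balloons", "Kendo", "Newspaper", "Dancer",
             "gtFly", "PoznanHall2", "PoznanStreet", "Shark")

BANDWIDTH = {
    20: [(67.5292, 51.1108), (60.0586, 45.8862), (65.1855, 48.3398), (89.3554, 56.4208),
         (91.5039, 49.1577), (56.9824, 34.9609), (57.0801, 49.9633), (95.0195, 83.1665)],
    15: [(36.9629, 19.5922), (30.6641, 18.1274), (33.8379, 17.6775), (73.3398, 40.8203),
         (75.0977, 41.6748), (37.5000, 21.6919), (34.6191, 24.9146), (84.9609, 59.1309)],
    10: [(9.3262, 4.5898), (12.0117, 7.2875), (10.4980, 4.3212), (40.3809, 22.4365),
         (39.2578, 25.8056), (19.5801, 11.7187), (17.2852, 10.3271), (49.7558, 19.2138)],
    5: [(2.7344, 1.5380), (4.7852, 2.8930), (3.4180, 1.5014), (11.9141, 6.8603),
        (8.9356, 8.0688), (5.8594, 3.4058), (6.3965, 3.4790), (12.3046, 7.0313)],
    0: [(1.0254, 0.6225), (2.1484, 1.1352), (1.1230, 0.6225), (2.5391, 1.5136),
        (1.2207, 1.3672), (2.2461, 0.9521), (2.2539, 1.2939), (3.4179, 2.0019)],
}


def relative_saving(dra_pct: float, proposed_pct: float) -> float:
    """Fraction of the DRA bandwidth saved by the proposed scheme (negative if it uses more)."""
    return 1.0 - proposed_pct / dra_pct


def summarize(snr_db: int) -> tuple[float, list[str]]:
    """Mean saving over all sequences, plus the sequences where the proposed scheme uses more."""
    savings = [relative_saving(d, p) for d, p in BANDWIDTH[snr_db]]
    worse = [seq for seq, s in zip(SEQUENCES, savings) if s < 0]
    return sum(savings) / len(savings), worse


if __name__ == "__main__":
    for snr in sorted(BANDWIDTH, reverse=True):
        mean, worse = summarize(snr)
        print(f"SNR chk = {snr:2d} dB: mean saving vs. DRA = {mean:.1%}, "
              f"uses more: {worse or 'none'}")
```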
Table 4. Performance comparison for different 3D video sequences.
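(a) SNR chk = 20 dB

| 3D Video Sequences | SoftCast PSNR (dB) | SoftCast SSIM | Distortion Resource Algorithm PSNR (dB) | Distortion Resource Algorithm SSIM | Proposed Resource Control Scheme PSNR (dB) | Proposed Resource Control Scheme SSIM |
|---|---|---|---|---|---|---|
| Balloons | 49.3196 | 0.9964 | 46.1196 | 0.9942 | 46.2761 | 0.9943 |
| Kendo | 49.7899 | 0.9957 | 47.6392 | 0.9936 | 48.0136 | 0.9939 |
| Newspaper | 49.2141 | 0.9963 | 46.6435 | 0.9943 | 46.9753 | 0.9946 |
| Dancer | 46.0253 | 0.9916 | 43.2292 | 0.9893 | 43.6010 | 0.9913 |
| gtFly | 48.9609 | 0.9930 | 47.6194 | 0.9916 | 47.1173 | 0.9927 |
| PoznanHall2 | 53.1829 | 0.9969 | 50.1385 | 0.9944 | 51.2116 | 0.9957 |
| PoznanStreet | 48.2154 | 0.9952 | 45.1048 | 0.9912 | 45.9560 | 0.9925 |
| Shark | 48.1204 | 0.9933 | 46.3205 | 0.9899 | 47.0311 | 0.9929 |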
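(b) SNR chk = 15 dB

| 3D Video Sequences | SoftCast PSNR (dB) | SoftCast SSIM | Distortion Resource Algorithm PSNR (dB) | Distortion Resource Algorithm SSIM | Proposed Resource Control Scheme PSNR (dB) | Proposed Resource Control Scheme SSIM |
|---|---|---|---|---|---|---|
| Balloons | 45.4115 | 0.9928 | 42.5971 | 0.9878 | 42.5716 | 0.9878 |
| Kendo | 45.8277 | 0.9911 | 43.9952 | 0.9862 | 44.3260 | 0.9875 |
| Newspaper | 45.1731 | 0.9921 | 42.8919 | 0.9877 | 43.0334 | 0.9884 |
| Dancer | 41.5428 | 0.9781 | 39.2760 | 0.9740 | 39.6065 | 0.9790 |
| gtFly | 45.1369 | 0.9845 | 43.6927 | 0.9814 | 43.8269 | 0.9852 |
| PoznanHall2 | 49.0760 | 0.9935 | 46.2265 | 0.9882 | 47.1799 | 0.9905 |
| PoznanStreet | 43.9320 | 0.9888 | 41.5915 | 0.9812 | 42.2585 | 0.9837 |
| Shark | 44.1887 | 0.9845 | 42.5072 | 0.9776 | 42.9480 | 0.9847 |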
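(c) SNR chk = 10 dB

| 3D Video Sequences | SoftCast PSNR (dB) | SoftCast SSIM | Distortion Resource Algorithm PSNR (dB) | Distortion Resource Algorithm SSIM | Proposed Resource Control Scheme PSNR (dB) | Proposed Resource Control Scheme SSIM |
|---|---|---|---|---|---|---|
| Balloons | 41.3268 | 0.9830 | 39.2838 | 0.9743 | 39.3090 | 0.9769 |
| Kendo | 41.5470 | 0.9780 | 39.8723 | 0.9696 | 40.6042 | 0.9760 |
| Newspaper | 40.7441 | 0.9801 | 39.0226 | 0.9728 | 38.7993 | 0.9733 |
| Dancer | 37.0933 | 0.9402 | 35.0093 | 0.9329 | 35.3295 | 0.9453 |
| gtFly | 40.6633 | 0.9588 | 39.0618 | 0.9545 | 40.3215 | 0.9655 |
| PoznanHall2 | 44.5969 | 0.9844 | 42.3917 | 0.9731 | 43.6213 | 0.9788 |
| PoznanStreet | 39.5928 | 0.9713 | 37.7700 | 0.9557 | 38.5942 | 0.9634 |
| Shark | 39.7686 | 0.9587 | 38.5368 | 0.9516 | 38.6494 | 0.9663 |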
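(d) SNR chk = 5 dB

| 3D Video Sequences | SoftCast PSNR (dB) | SoftCast SSIM | Distortion Resource Algorithm PSNR (dB) | Distortion Resource Algorithm SSIM | Proposed Resource Control Scheme PSNR (dB) | Proposed Resource Control Scheme SSIM |
|---|---|---|---|---|---|---|
| Balloons | 36.8079 | 0.9526 | 35.8347 | 0.9512 | 36.2920 | 0.9613 |
| Kendo | 36.8431 | 0.9389 | 35.5959 | 0.9398 | 36.6900 | 0.9568 |
| Newspaper | 36.1632 | 0.9460 | 35.0736 | 0.9391 | 35.3860 | 0.9479 |
| Dancer | 32.4094 | 0.8434 | 30.9076 | 0.8469 | 31.6366 | 0.8695 |
| gtFly | 36.0398 | 0.8898 | 34.4277 | 0.9028 | 36.0597 | 0.9268 |
| PoznanHall2 | 40.0081 | 0.9584 | 38.4855 | 0.9466 | 39.8148 | 0.9566 |
| PoznanStreet | 34.8735 | 0.9192 | 33.8732 | 0.9069 | 34.8194 | 0.9270 |
| Shark | 35.0505 | 0.8880 | 34.0345 | 0.9067 | 34.9220 | 0.9307 |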
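(e) SNR chk = 0 dB

| 3D Video Sequences | SoftCast PSNR (dB) | SoftCast SSIM | Distortion Resource Algorithm PSNR (dB) | Distortion Resource Algorithm SSIM | Proposed Resource Control Scheme PSNR (dB) | Proposed Resource Control Scheme SSIM |
|---|---|---|---|---|---|---|
| Balloons | 31.9165 | 0.8578 | 32.0593 | 0.9015 | 32.9264 | 0.9328 |
| Kendo | 31.6785 | 0.8227 | 31.3943 | 0.8887 | 32.7896 | 0.9280 |
| Newspaper | 31.1809 | 0.8513 | 30.9788 | 0.8707 | 32.0076 | 0.9064 |
| Dancer | 27.4676 | 0.6483 | 27.8362 | 0.7346 | 28.5260 | 0.7732 |
| gtFly | 31.1336 | 0.7310 | 30.8460 | 0.8469 | 32.6345 | 0.8842 |
| PoznanHall2 | 35.0796 | 0.8827 | 34.7832 | 0.9127 | 36.2844 | 0.9345 |
| PoznanStreet | 29.7831 | 0.7757 | 30.2480 | 0.8329 | 31.2267 | 0.8715 |
| Shark | 29.9380 | 0.7188 | 30.2311 | 0.8336 | 31.2955 | 0.8776 |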
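The low-SNR PSNR gains quoted in the abstract can be cross-checked against Table 4(e). The sketch below transcribes the SNR chk = 0 dB PSNR columns and computes the gain of the proposed scheme over each baseline; the helper names are hypothetical. Under this transcription, the largest gains both occur for gtFly (1.7885 dB over the distortion resource algorithm and 1.5009 dB over SoftCast), and Table 3(e) lists 1.3672% bandwidth for that case, which appears consistent with the figures quoted in the abstract.

```python
from __future__ import annotations

# Table 4(e), SNR chk = 0 dB: sequence -> PSNR in dB for
# (SoftCast, distortion resource algorithm, proposed scheme).
PSNR_0DB = {
    "Balloons": (31.9165, 32.0593, 32.9264),
    "Kendo": (31.6785, 31.3943, 32.7896),
    "Newspaper": (31.1809, 30.9788, 32.0076),
    "Dancer": (27.4676, 27.8362, 28.5260),
    "gtFly": (31.1336, 30.8460, 32.6345),
    "PoznanHall2": (35.0796, 34.7832, 36.2844),
    "PoznanStreet": (29.7831, 30.2480, 31.2267),
    "Shark": (29.9380, 30.2311, 31.2955),
}


def psnr_gains(table: dict[str, tuple[float, float, float]]) -> dict[str, tuple[float, float]]:
    """Per sequence: (gain over SoftCast, gain over DRA) of the proposed scheme, in dB."""
    return {seq: (prop - sc, prop - dra) for seq, (sc, dra, prop) in table.items()}


def largest_gain(gains: dict[str, tuple[float, float]], index: int) -> tuple[str, float]:
    """Sequence with the largest gain; index 0 = vs. SoftCast, 1 = vs. DRA."""
    seq = max(gains, key=lambda s: gains[s][index])
    return seq, gains[seq][index]


if __name__ == "__main__":
    gains = psnr_gains(PSNR_0DB)
    for seq, (vs_softcast, vs_dra) in gains.items():
        print(f"{seq:13s} +{vs_softcast:.4f} dB vs. SoftCast, +{vs_dra:.4f} dB vs. DRA")
    print("Largest gain over SoftCast:", largest_gain(gains, 0))
    print("Largest gain over DRA:", largest_gain(gains, 1))
```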
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Thabet, S.K.S.; Osei-Mensah, E.; Ahmed, O.; Seid, A.M.; Bamisile, O. Resource Optimization for 3D Video SoftCast with Joint Texture/Depth Power Allocation. Appl. Sci. 2022, 12, 5047. https://doi.org/10.3390/app12105047

AMA Style

Thabet SKS, Osei-Mensah E, Ahmed O, Seid AM, Bamisile O. Resource Optimization for 3D Video SoftCast with Joint Texture/Depth Power Allocation. Applied Sciences. 2022; 12(10):5047. https://doi.org/10.3390/app12105047

Chicago/Turabian Style

Thabet, Saqr Khalil Saeed, Emmanuel Osei-Mensah, Omar Ahmed, Abegaz Mohammed Seid, and Olusola Bamisile. 2022. "Resource Optimization for 3D Video SoftCast with Joint Texture/Depth Power Allocation" Applied Sciences 12, no. 10: 5047. https://doi.org/10.3390/app12105047

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
