Article

Adaptive Transmission Strategy for Non-Uniform Coding of 360 Videos

1 College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300380, China
2 School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(16), 3266; https://doi.org/10.3390/electronics13163266
Submission received: 16 July 2024 / Revised: 11 August 2024 / Accepted: 15 August 2024 / Published: 17 August 2024

Abstract:
360° video offers a more immersive experience and is therefore gaining popularity among users. However, enhancing the transmission efficiency of 360° videos under limited bandwidth conditions remains a significant challenge. This paper segments the video into three areas: the attention area, the edge area, and the viewpoint-switching transition area. Based on this segmentation, a novel non-uniform coding transmission method for 360° videos is presented, along with a mathematical model defining the optimization problem. A heuristic algorithm is then introduced that adaptively determines the optimal number of tiles and allocates the bitrate for each tile in real time to enhance the user’s quality of experience (QoE). Finally, a simulation platform is developed to validate the efficacy of the proposed algorithm through comparative analyses with existing algorithms.

1. Introduction

Due to their ability to provide an immersive experience, 360° videos are becoming increasingly popular among users. When watching 360° panoramic videos, users can switch viewpoints freely by wearing head-mounted display devices or by rotating handheld devices to access the surrounding 360° video content. However, 360° videos are characterized by their large encoding volume, necessitating substantial bandwidth for transmission. For instance, a 4K-resolution 360° video requires up to 4.2 Gbps of network bandwidth [1]. Consequently, the transmission of 360° videos within bandwidth limitations presents a substantial challenge. Focusing on transmitting only the video data relevant to the current viewpoint can significantly reduce the data load. Researchers frequently employ the tile-based encoding method for 360° videos, as it minimizes the chances of stuttering during viewpoint transitions [2].
The authors of Ref. [3] introduced a multimodal spatiotemporal attention transformer designed to generate multiple viewpoint trajectories and their associated probabilities based on a historical trajectory. The approach aims to better handle abrupt changes in the user’s perspective. By framing viewpoint prediction as a classification problem, the method leverages the attention mechanism to capture both spatial and temporal features from input video frames and viewpoint trajectories, thereby enabling accurate multi-viewpoint prediction [3].
Many scholars have conducted research on 360° transmission technology and have achieved significant results. Tile-based encoding is currently a common encoding scheme for 360° videos that can reduce the overall transmission load, thereby enhancing the user’s media experience. However, since each tile is encoded and decoded individually, a problem arises: when the tile size is small, the overall video encoding volume increases; when the tile size is too large, more stuttering and buffering may occur during viewpoint switching. Therefore, tile size directly affects the user’s quality of experience (QoE). Furthermore, when users watch videos, their range of attention is limited, and this range changes with the distance between their eyes and the viewing device. Determining the optimal number and size of tiles, along with the appropriate bitrate for each tile, based on the user’s eye-to-device distance, poses a significant challenge that requires careful consideration.
To address the aforementioned issues, this study first proposes a non-uniform coding method for 360° videos, as shown in Figure 1. The 360° video is divided into three different areas: the attention area (AA), the field-of-view (FoV) area, and the out-of-field-of-view (OFoV) area. The tile sizes in these three areas vary. The size of the attention area is calculated based on human physiological principles and the distance from the eyes of the user to the viewing device. The tile sizes for the field-of-view and out-of-field-of-view areas are calculated using the algorithm introduced in this paper. Subsequently, a 360° video adaptive transmission architecture with non-uniform coding, tailored to the user’s viewing distance, is presented. Finally, an optimization algorithm is proposed to choose the suitable number, size, and bitrate of tiles for the user.
The rest of this paper is organized as follows. Section 2 examines the current state of research in adaptive 360° video transmission. In Section 3, the architecture, model, and analysis of 360° non-uniform coding are presented. Section 4 proposes an intelligent algorithm. In Section 5, we establish a simulation environment and conduct a performance evaluation to validate our approach. Finally, in Section 6, we summarize our findings and highlight future research directions.
The primary contributions of this research are as follows:
  • Introduction of an adaptive transmission framework for 360° videos based on non-uniform coding, which, for the first time, calculates the user’s attention range through the user’s video viewing environment (distance between the eyes and the screen) and dynamically segments video fragments into tiles of varying numbers and sizes, selecting appropriate bitrates for each tile in real time.
  • Mathematical modeling of the non-uniform coding adaptive transmission method for 360° videos and identification of the optimization problems that need to be addressed. A complexity analysis of the problem to be solved is conducted, proving it to be NP-hard.
  • Proposal of a non-uniform coding algorithm for 360° panoramic video tiling and bitrate selection (NETBS), which selects the appropriate number and size of tiles and suitable bitrates for the user in real time based on the network environment. Finally, the performance of the proposed algorithm is confirmed through simulation experiments and subjective user experiments.

2. Related Works

As a popular media format today, 360° videos have become a hot research topic. Numerous scholars have conducted extensive research on this subject.
In recent research, the authors of [4] investigated a field-of-view (FoV) prediction technique based on spherical convolution. The method combines salient features extracted from 360° videos with FoV feedback, forming a multi-source prediction framework. The authors of [5] proposed a novel transcoding-supported VR video caching and delivery framework designed to optimize computational and storage resource utilization in edge-enhanced next-generation wireless networks. Additionally, the authors of [6] introduced a single-frame and multi-frame joint network, which can upscale low-resolution spherical videos to high-resolution outputs. In [7], the collaborative transcoding and caching process of 360° videos was modeled as a Markov decision process (MDP), and the authors employed a model-free deep reinforcement learning method to derive caching replacement and computational power allocation strategies. They also proposed a novel FoV-guided scheme to accelerate the training of the DDPG agent. Moreover, in a multi-access edge computing (MEC) caching system, a tile-based caching strategy for 360° videos has been proposed. The authors applied the Combinatorial Multi-Armed Bandit (CMAB) theory, which can optimize users’ quality of experience and reduce MEC energy consumption without prior knowledge of video content popularity [8]. A viewpoint reconstruction-based 360° video caching solution has been proposed, wherein the authors designed a QoE-driven reconstruction triggering scheme to maximize viewers’ QoE. The scheme determines whether reconstruction is needed based on current caching information and network conditions. To efficiently use cache space for viewpoint reconstruction, the authors proposed a heuristic aggregation-based cache replacement algorithm that carefully selects tiles to store, thereby enhancing the probability of successful viewpoint reconstruction [9]. 
To optimize end-to-end video streaming, a viewport-aware real-time 360-degree video streaming framework has been proposed. The framework decomposes the transmission optimization problem into two sub-problems, which are asynchronously optimized for upstream and downstream processes. The method prioritizes the uploading of more popular content at higher bitrates based on users’ real-time viewing interests [10]. Lastly, a practical neural-enhanced 360-degree video streaming framework called Masked360 has been proposed. The framework significantly reduces bandwidth consumption by transmitting only the low-resolution masked version of each video frame instead of the full frames, thereby conserving bandwidth [11].
Caching can significantly reduce video transmission latency. A novel intelligent edge caching framework has been proposed, which maximizes users’ video experience by considering fairness and long-term system costs based on a new edge caching architecture [12]. A multi-category 360° video streaming strategy based on multi-agent deep reinforcement learning (MADRL) for joint edge caching and bitrate selection has also been introduced. The approach enables more refined performance control by adopting different caching strategies and bitrate selection methods for various video categories [13]. Furthermore, an edge-client collaborative caching and super-resolution strategy based on MADRL has been proposed. The authors first constructed a video experience evaluation function that includes video quality, temporal smoothness, rebuffering time, and device energy consumption. They then modeled the edge-client collaborative caching and super-resolution problem as a multi-agent cooperative Markov decision process, summarizing it as an optimization problem. To address decision coupling among agents, a multi-agent dual-agent regularized critic algorithm based on adaptive learning has been proposed. The algorithm improves users’ video experience through appropriate collaborative caching and super-resolution methods [14].
A multi-control adaptive 360° video streaming algorithm has been proposed, where the authors integrated multiple auxiliary controls with the main control to adapt to changes in the user viewport [15]. In [16], the authors explored Non-Orthogonal Multiple Access (NOMA) and Scalable Video Coding (SVC) technologies and subsequently proposed a new multicast transmission scheme for 360° videos, aiming to maximize users’ video experience through multicast video data transmission. In another study, a transformer-based saliency model was proposed, first trained on 2D images for viewpoint prediction. Based on these results, the authors introduced a new bitrate allocation algorithm that can enhance users’ video experience even in the absence of head-movement data [17]. Additionally, a new spherical super-resolution method was introduced to enhance users’ viewing experience [18]. Guo et al. proposed an optimal wireless streaming scheme for transmitting multi-quality tiled 360° VR video from a multi-antenna server to multiple single-antenna users based on Multi-Input Multi-Output (MIMO) and Orthogonal Frequency-Division Multiple-Access (OFDMA) technologies. The method minimizes total transmission power by jointly optimizing beamforming, subcarrier allocation, transmission power, and rate allocation [19]. An optimal wireless streaming technology for transmitting multi-quality tiled 360° VR video from a server to multiple users has been introduced. This technology maximizes transmission efficiency by leveraging the characteristics of multi-quality tiled 360° VR video and user-side computational resources to increase multicast opportunities [20]. Guo et al. studied the optimal streaming method for tiled 360° VR video from a server (base station or access point) to multiple users, considering random viewing directions and stochastic channel conditions. 
They adopted Time-Division Multiple Access (TDMA) to optimize transmission time and power allocation, thereby reducing the average transmission energy [21]. Furthermore, Guo et al. investigated the adaptive streaming of tiled 360° videos in a multi-carrier wireless system from a multi-antenna base station (BS) to single-antenna users. Their goal was to enhance video quality while minimizing rebuffering time by adjusting the encoding bitrate for each Group of Pictures (GOP) and tuning transmission for each slot [22].
A novel joint utility-based two-tier 360° video streaming system has been proposed. The method finely estimates the contribution of each tile, allowing for longer buffer durations and bandwidth-sharing strategies. The optimal bitrate allocation strategy is determined by dynamically selecting tiles based on Model Predictive Control (MPC) [23]. Additionally, an online adaptive bitrate algorithm for 360° videos has been proposed. The authors aimed to improve users’ video experience by coordinating the download of video segments from the server. By downloading data within the user’s field of view, network bandwidth is conserved, and the real-time adjustment of tile bitrates reduces buffering occurrences [24]. A new method called DCRL360 has been proposed, where the authors incorporated automatic curriculum learning and deep reinforcement learning into the bitrate allocation and tile scheduling algorithms for 360° videos, thereby enhancing the algorithm’s efficiency [25]. For optimal adaptive 360° video streaming, a synergistic spatiotemporal user-perceived viewport prediction scheme, SPA360, has been proposed. This scheme employs a user-perceived viewpoint prediction model, providing a transparent solution for field-of-view prediction [26]. Furthermore, a tile-weighted rate-distortion (TWRD) packet scheduling optimization system has been proposed to reduce the amount of data needed for transmission and enhance users’ video experience. The authors then introduced an attention transformer to calculate discardable data packets during transmission and finally proposed a dynamic programming algorithm to solve the problem [27].
The aforementioned studies have made significant contributions to the transmission of 360° video. In these studies, 360° videos are segmented into tiles for transmission, with all tiles being the same size. However, encoding each tile separately can lead to a large amount of video encoding data if the tiles are too small, and if the tiles are too large, it can affect the user’s viewpoint-switching experience. Finding the optimal tile size to balance encoding volume and viewpoint switching remains a challenge.
To address this, this paper proposes a non-uniform coding method for 360° panoramic video tiling and bitrate selection. This method segments the 360° video into tiles of different sizes according to the human-eye attention mechanism. The central area, which is the focus of attention, is divided into a single tile, while the other areas are segmented using the proposed tile-partitioning algorithm. This algorithm determines the final number of tiles, their sizes, and bitrates, thereby enhancing users’ media experience. In previous studies, we also conducted related research, which was published in various journals. Building upon this research, we conducted the present study. The differences between this study and our previous research are shown in Appendix A.

3. Architecture and Model

In this section, we outline the concept and system architecture of the proposed non-uniform coding transmission method. Initially, we formulate a mathematical model of the transmission system and define the optimization problem. Subsequently, we introduce a heuristic algorithm designed to solve the optimization problem presented in this paper.

3.1. Non-Uniform Coding Transmission System

Figure 2 presents the 360° adaptive transmission architecture with non-uniform coding proposed in this paper. The transmission process is as follows: the user predicts the bandwidth of the network for the next moment using a bandwidth prediction algorithm. Based on the user’s predicted bandwidth and the eye-to-device distance, the algorithm proposed in this paper calculates the required number, size, and bitrate of video tiles for the next moment. The calculation results are then sent to the media server, which encodes and transmits the video tiles to the user based on the received information.

3.2. System Model

This paper divides a 360° video into four areas: an attention area (AA), an out-of-field-of-view (OFoV) area, a peripheral area (PA), and other areas (OAs). The AA represents the most important part of the video, the area where the user’s attention is concentrated, and it has the greatest impact on the user’s QoE, hence its highest weight. In our approach, this part is segmented into a single tile. The PA is slightly less important and has a lesser impact on the user’s QoE, but it is still within the user’s field of view and can be affected by the user’s viewpoint switching. Therefore, the number and size of tiles in this part need to be calculated by the algorithm. Although the OFoV area is outside the user’s field of view, it can become part of the FoV with a change in the user’s perspective. Thus, it has the least impact on the user’s QoE and the lowest weight.
Definition 1
(Attention Area (AA)). The central area of the field of view, representing the region where the user’s attention is concentrated.
Definition 2
(Out-of-Field-of-View (OFoV) Area). The area outside the user’s current field of view but surrounding it.
Definition 3
(Peripheral Area (PA)). The area within the user’s field of view, excluding the AA.
Definition 4
(Other Areas (OA)). Any regions of the 360° video that do not fall into the aforementioned categories.
The user’s video experience is quantified as the QoE, which is composed of three parts: the QoE gain from the AA, the QoE gain from the OFoV area, and the QoE gain from the PA. Due to the uneven distribution of user attention across different areas while watching a 360° video, the QoE gains from these areas vary. For instance, the AA, which captures the most user attention, contributes the most to the overall QoE.
The per-area QoE (quality of experience) defined in this paper combines three components: the objective quality of the video, the stability of playback, and the latency of video transmission. The influence of each component on the user’s QoE is not uniform, and the corresponding weighting parameters can be adjusted according to the user’s own preferences. The overall QoE of the user is then calculated using Equation (1):
QoE = α·QoE_A + β·QoE_O + γ·QoE_P,  (1)
where α, β, and γ are weighting parameters describing how the user’s attention is distributed across the three areas.
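As a minimal illustration, the weighted combination of Equation (1) can be sketched as follows; the default weight values and the per-area QoE inputs are purely illustrative assumptions, not values from the paper:

```python
def overall_qoe(qoe_aa, qoe_ofov, qoe_pa, alpha=0.6, beta=0.3, gamma=0.1):
    """Weighted combination of the per-area QoE gains, as in Equation (1).
    The default weights are illustrative assumptions only."""
    return alpha * qoe_aa + beta * qoe_ofov + gamma * qoe_pa

# the attention area dominates the overall score
print(round(overall_qoe(0.9, 0.7, 0.5), 3))  # → 0.8
```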
QoE_A = QoS_A − ξ·R(J_μ^i) − T_TP,  (2)
QoE_O = QoS_O − ξ·R(J_μ^i) − T_TP,  (3)
QoE_P = QoS_P − ξ·R(J_μ^i) − T_TP,  (4)
R(J_μ^i) = (|j_μ^i − j_{μ−1}^i| + 1)·N_μ^i,  1 ≤ i ≤ M,  (5)
where M represents the number of video tiles that need to be transmitted. R(J_μ^i) is a monotonically increasing function of N_μ^i, where N_μ^i denotes the number of bitrate switches the user has experienced in the previous segments. The parameter ξ controls the balance between bitrate and stability. The current bitrate index of the user is j_μ^i, and j_{μ−1}^i is the bitrate index of the previous segment. T_TP stands for the FoV switching delay of the video.
T_TP = ψ·(t_d + t_c),  (6)
where t_d represents the sum of the time taken to segment a 360-degree video segment into tiles and the time for encoding and decoding these tiles; t_c represents the time taken for the 360-degree video data to be transmitted from the server to the client; and ψ represents the weight of these two delays in the user’s QoE.
QoS = MSSIM(P_μ, Q_μ),  (7)
where P_μ and Q_μ represent the original video on the server and the reconstructed video on the client after transmission, respectively. Their resolutions are r_v^x × r_v^y and r_f^x × r_f^y, respectively.
The MSSIM can be computed using the following equations, in which the entire screen of the media player, with resolution D_L, is divided into s regions:
MSSIM(P_μ, Q_μ) = (1/s)·Σ_{j=1}^{s} SSIM(x_{μ,j}, y_{μ,j}),
s = D_L / min{(r_c^x × r_c^y)_1, (r_c^x × r_c^y)_2, …, (r_c^x × r_c^y)_M},
SSIM(x_{μ,j}, y_{μ,j}) = g(x_{μ,j}, y_{μ,j}),  (8)
where g is a function of x_{μ,j} and y_{μ,j}; x_{μ,j} is the original j-th tile video, and y_{μ,j} is the received j-th tile video. Moreover, there is a one-to-one mapping between y_{μ,j} and b_{μ,j}, where b_{μ,j} is the bitrate of the j-th tile.
MSSIM(P_μ, Q_μ) = f(b_μ^i),  (9)
where f is a mathematical function representing the one-to-one mapping between the MSSIM value and the video bitrate. The bitrate of the μ-th segment of the i-th tile is denoted by b_μ^i.
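For illustration, the region-averaged MSSIM described above reduces to a simple mean over per-region SSIM values once those values are available; the function name and inputs below are hypothetical:

```python
def mssim(ssim_per_region):
    """Mean SSIM over the s screen regions, as in the averaging step above.
    ssim_per_region holds one precomputed SSIM value per region."""
    return sum(ssim_per_region) / len(ssim_per_region)

print(round(mssim([0.95, 0.90, 0.85]), 3))  # → 0.9
```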
The predicted network bandwidth, denoted as B, is determined by the following equations:
B = Σ_{k=1}^{N} β_ν^k × τ_s/(τ_ν − τ_{ν−1}),  for the initial segment,
B = γ × Σ_{k=1}^{N} β_{ν−1}^k × τ_s/(τ_{ν−1} − τ_{ν−2}) + (1 − γ) × B*,  for subsequent segments.  (10)
In this context, B signifies the forecasted bandwidth. The term β_ν^k represents the bitrate of the ν-th segment of the k-th tile. The parameters τ_ν and τ_{ν−1} denote the reception and transmission times, respectively. The playback duration of the segment is denoted by τ_s. The term B* refers to the previously available bandwidth, and γ is the weighting factor for the current bandwidth sample. This framework remains effective even when alternative methods are utilized for estimating the available bandwidth.
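The bandwidth predictor described above can be sketched as an EWMA-style estimator; the function and parameter names are illustrative assumptions, not the authors' implementation:

```python
def predict_bandwidth(tile_bitrates, tau_s, tau_now, tau_prev,
                      b_star=None, gamma=0.7):
    """Sketch of the predictor above: the throughput sample of the last
    segment (total data delivered over its transfer time) is optionally
    smoothed against the previously available bandwidth b_star."""
    sample = sum(tile_bitrates) * tau_s / (tau_now - tau_prev)
    if b_star is None:               # initial segment: no history yet
        return sample
    return gamma * sample + (1.0 - gamma) * b_star

# initial segment: the raw throughput sample is used directly
print(predict_bandwidth([2.0, 3.0], tau_s=1.0, tau_now=2.0, tau_prev=1.0))  # → 5.0
```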
The aggregate bitrate of the segments is expressed as follows:
Σ_{k=1}^{N} β_ν^k = max{ Σ_{k=1}^{N} β_{ν,m}^k | m = 1, …, L },  subject to Σ_{k=1}^{N} β_ν^k ≤ B,  (11)
where L represents the number of quality layers for the video.
As shown in Figure 3, the user’s area of attention while watching the video is related to the distance between the user’s eyes and the viewing device. Assuming the user’s attention angle is θ, the relationship can be described by the equation tan(θ/2) = (p/2)/d, where d represents the distance between the user’s eyes and the viewing device. The side length p of the attention area can then be calculated using Equation (12):
p = 2·d·tan(θ/2).  (12)
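Equation (12) is straightforward to compute; the following sketch assumes the attention angle is given in degrees and the distance in meters:

```python
import math

def attention_side_length(theta_deg, d):
    """Equation (12): p = 2*d*tan(theta/2). theta_deg is the attention
    angle in degrees; d is the eye-to-screen distance (any length unit)."""
    return 2.0 * d * math.tan(math.radians(theta_deg) / 2.0)

# a 90-degree attention angle at a 0.5 m viewing distance
print(round(attention_side_length(90.0, 0.5), 3))  # → 1.0
```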
Consequently, the optimization objective of this paper is as follows:
maximize QoE
subject to Equations (1)–(12).  (13)

3.3. Complexity Analysis

Within the aforementioned problem, it is necessary to determine the transmission rate for each tile. Initially, we demonstrate that this problem is NP-hard.
Our approach involves proving the complexity of the optimization problem delineated by Equation (13) through the identification of a specific instance of this problem. Subsequently, we show that this particular instance is analogous to the classic NP-hard exact cover problem.
Proof. 
Consider a particular scenario of Equation (13), where the current bandwidth is sufficient to transmit the lowest bitrate video for all tiles. In this case, Equation (13) simplifies to
maximize Σ_{i=1}^{M} QoE_i
subject to Σ_{i=1}^{M} N_i ≤ B.  (14)
The variable N_i represents the bitrate of the i-th transmitted tile.
Equation (14) can be converted into a set problem. Consider a family P comprising d sets, each represented as a bit-vector; the vectors have length 3d. For example, the bit-vectors 110001 and 001110 correspond to the sets {x_1, x_2, x_6} and {x_3, x_4, x_5}, respectively. Each sequence can be represented as an integer in the base-(N + 1) system; the integer representing the set Q_i is determined by
U_i = Σ_{x_j ∈ Q_i} (N + 1)^{j−1}.  (15)
This encoding maps the family to the integers U_1, U_2, …, U_d, together with the target B. A subfamily of P that precisely covers T = {x_1, …, x_{3d}} exists if and only if a subset of the integers U_i sums to B. Here, B corresponds to the all-ones sequence 11…1 of length 3d:
B = Σ_{i=0}^{3d−1} (N + 1)^i.  (16)
Now, suppose there exists a set T ⊆ {1, 2, …, N} such that Σ_{i∈T} U_i = B. In base-(N + 1) arithmetic, each digit of the resulting sum is either 0 or 1. Because the number of summands is less than N + 1, no carry occurs in the addition. Hence, if Σ_{i∈T} U_i = B, each position is covered by exactly one 1, so the subfamily U = {Q_i : i ∈ T} precisely covers {x_1, …, x_{3d}}, demonstrating sufficiency.
Conversely, if U = {Q_i : i ∈ T} covers {x_1, …, x_{3d}} exactly, it is evident that Σ_{i∈T} U_i = B. Hence, the necessity is established.
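The base-(N + 1) encoding used in this reduction can be checked numerically. The following sketch uses a purely illustrative instance with d = 2 sets over a six-element universe and base N + 1 = 3:

```python
def encode(bits, base):
    """Interpret a bit-vector as an integer in the given base, one digit
    per universe element (the base-(N+1) encoding from the reduction)."""
    return sum(base ** i for i, b in enumerate(bits) if b)

# d = 2 sets over x1..x6, encoded in base 3
u1 = encode([1, 1, 0, 0, 0, 1], base=3)   # the set {x1, x2, x6}
u2 = encode([0, 0, 1, 1, 1, 0], base=3)   # the set {x3, x4, x5}
target = encode([1] * 6, base=3)          # the all-ones sequence 11...1
# the two sets partition the universe, so their encodings sum to the target
assert u1 + u2 == target
print(u1, u2, target)  # → 247 117 364
```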
This concludes the demonstration.    □
Thus, the problem outlined in Equation (13) is NP-hard, and such problems are notoriously difficult to solve optimally. We therefore employ a heuristic 360° video adaptive transmission algorithm with non-uniform coding, described in the next section.

4. Algorithm

This section primarily describes the 360° video adaptive transmission algorithm with non-uniform coding used to solve the problem. The algorithm is divided into two parts: the non-uniform tiling algorithm and the bitrate allocation algorithm. It addresses the issues of how to segment video fragments and how to allocate bandwidth resources to each video tile. The pseudocode for the specific algorithm is described below.
Algorithm 1 first initializes the user’s field-of-view (FoV) position and size, defines the different regions (AA, OFoV, PA, and OA), and sets the number of tiles and video quality levels, as described in lines 1–5. Next, in line 6, it initializes the utility function matrix E. Lines 7–14 use nested loops to iterate through all possible combinations of tile numbers for the PA and OFoV regions (x_1 and x_2). For each combination, lines 10–12 further loop to calculate the quality of experience (QoE) for each tile at different quality levels, and line 13 stores the maximum QoE value in the utility function matrix E. Lines 15–16 sort the values in matrix E in descending order and select the tile and quality level corresponding to the maximum value. Lines 17–24 iterate through the tiles, checking whether the remaining bandwidth B is sufficient. Lines 19–22 ensure that the selected tile’s video bitrate does not exceed the bandwidth limit. If the condition is met, the highest quality video is selected, and the bandwidth is updated; otherwise, the loop is exited. Finally, line 25 calculates the overall QoE for all video tiles, and lines 26–28 determine the optimal tiling scheme and bitrate configuration that maximize the overall QoE.
Algorithm 1: Non-uniform coding for 360° panoramic video tiling and bitrate selection algorithm (NETBS)
  • Initialize the user’s field-of-view (FoV) position and size.
  • Define the FoV attention area (AA), out-of-field-of-view (OFoV) area, peripheral area (PA), and other areas (OAs).
  • The AA area consists of 1 tile.
  • Initialize the number of tiles, where AA: 1 tile; PA: x_1 tiles; OFoV: x_2 tiles.
  • The video can be divided into x_3 quality levels.
  • Initialize the utility function matrix E, where E = QoE/bitrate.
  • for x_1 = 1 to x do
  •      for x_2 = 2 to y do
  •           for i = 1 to 1 + x_1 + x_2 do
  •                for j = 1 to x_3 do
  •                     Calculate QoE
  •                end for
  •                Store the value of max(QoE_1, QoE_2, …, QoE_{x_3}) in the utility function E(i, j)
  •           end for
  •           Sort the values of E ( i , j ) in descending order
  •           Select the tile i and quality level j corresponding to the highest value of E(i, j)
  •           for i = 1 to 1 + x_1 + x_2 do
  •                if B ≥ bitrate(i, j) then
  •                     Select the j-th quality level video for the i-th tile
  •                     B = B − bitrate(i, j)
  •                else
  •                     break
  •                end if
  •           end for
  •           Calculate the overall QoE for all video tiles
  •      end for
  • end for
  • Calculate the maximum QoE to determine the tiling scheme values x_1 and x_2, as well as the bitrate for each video tile.
  • The values of x and y represent, respectively, the maximum number of segments in the PA and the OFoV area, which can be defined by the user according to specific requirements.
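The greedy bitrate-selection step of Algorithm 1 can be sketched as follows; the data layout, function name, and greedy ordering are illustrative assumptions rather than the authors' implementation:

```python
def greedy_bitrate_allocation(qoe, bitrate, bandwidth):
    """Greedy allocation in the spirit of Algorithm 1: rank quality levels
    by utility QoE/bitrate and upgrade tiles in that order while the
    predicted bandwidth lasts. qoe[i][j] and bitrate[i][j] give the QoE
    and cost of tile i at quality level j (level 0 is the lowest)."""
    n_tiles = len(qoe)
    choice = [0] * n_tiles                      # every tile starts at level 0
    remaining = bandwidth - sum(b[0] for b in bitrate)
    # candidate upgrades, best utility per bit first
    upgrades = sorted(
        ((qoe[i][j] / bitrate[i][j], i, j)
         for i in range(n_tiles)
         for j in range(1, len(qoe[i]))),
        reverse=True)
    for _, i, j in upgrades:
        extra = bitrate[i][j] - bitrate[i][choice[i]]
        if j > choice[i] and extra <= remaining:
            remaining -= extra
            choice[i] = j
    return choice

# two tiles, two quality levels each, total bandwidth of 3 units
print(greedy_bitrate_allocation([[1, 3], [1, 2]], [[1, 2], [1, 2]], 3))  # → [1, 0]
```

With a budget of 3 units only the first tile (higher utility) is upgraded; raising the budget to 4 upgrades both.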

The Complexity Analysis of NETBS

The time complexity of this algorithm is O(n²·m log m), where n represents the maximum of x and y, which are the maximum numbers of segments in the peripheral area (PA) and out-of-field-of-view (OFoV) area, respectively. Assuming x and y are of the same order of magnitude, the outer double loop iterates over all combinations of x and y, resulting in O(n²) iterations. The middle loop processes (1 + x + y)·x_3 elements for each combination, which is of the order O(n·m). The sorting operation has a time complexity of O(m log m), where m is the number of quality levels. Hence, the overall time complexity is O(n²·m log m).
The total number of operations for this algorithm is calculated as follows: the outer double loop iterates x × y times, where x = 10 and y = 10, totaling 10 × 10 = 100 iterations. The maximum number of iterations for the middle loop is (1 + x + y) × x_3, where x_3 = 5, resulting in (1 + 10 + 10) × 5 = 21 × 5 = 105 iterations. The time complexity of the sorting operation is O((1 + x + y) × x_3 · log((1 + x + y) × x_3)), which equates to 105 × 6.72 ≈ 705.6. Therefore, the total number of operations is 100 × 105 + 100 × 705.6 = 10,500 + 70,560 = 81,060 operations.
Assuming each basic operation takes an average of 10 clock cycles and the Intel Core i9-14900HX CPU has a clock speed of 3.9 GHz, each operation takes 10/(3.9 × 10⁹) s. The total time T is therefore 81,060 × 10/(3.9 × 10⁹) s ≈ 2.079 × 10⁻⁴ s, i.e., approximately 0.21 ms. Based on the above analysis, it can be concluded that our proposed algorithm (NETBS) fully supports real-time operation.

5. Simulation

5.1. Experimental Method

In this section, we implement the proposed transmission approach using MATLAB and evaluate the performance of the allocation method in NS3. In our experiment, the position of the user was generated randomly. The distance between the two base stations was fixed at 2 km. The simulation parameters are summarized in Table 1. Based on these parameters, we used the model proposed by Zhao to calculate the bandwidth of the 5G network in the experiment [28]. We simulated background traffic using the ON/OFF model. The 4K video sequences “Crosswalk” and “Boat” [29] were encoded using the scalable H.264 reference software JSVM (version 9.19) [30]. Both video sequences consist of 300 frames, with a frame rate of 60 frames per second and a GOP size of eight frames. The videos were set to play repeatedly. We extracted six layers employing a layer extraction technique for DASH video sources. The resolutions of the layers are as follows: the first layer is 2160P, followed by layers with resolutions of 1440P, 1080P, 720P, 480P, and 360P. The “Crosswalk” video contains more dynamic content than the “Boat” video. Finally, a subjective quality assessment experiment was conducted. In this study, video data were re-encoded based on the transmission strategies derived from the simulation environment. The video was segmented and encoded into tiles according to the calculated strategies. These videos were then played for the participants to watch. A total of 21 participants were recruited for the experiment. Under the same conditions, they watched videos processed by different algorithms and rated them based on their subjective experiences. The scores ranged from 1 to 6, with higher values indicating a better subjective user experience.
Some important parameters need to be explained.
The PSNR (peak signal-to-noise ratio) is a metric utilized to evaluate the quality of reconstructed images or videos after compression by comparing the original and compressed versions. It is calculated using the mean squared error (MSE) between the original and compressed images, with higher PSNR values indicating better quality and lower levels of distortion or noise.
The SSIM (structural similarity index measure) is a perceptual metric that assesses the visual similarity between two images, focusing on their structural information, luminance, and contrast. It provides a value between 0 and 1, where 1 indicates identical images and 0 signifies no similarity. This makes the SSIM a measure of image quality that aligns more closely with human visual perception than traditional error-based metrics like the PSNR.
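As a reference for how the PSNR follows from the MSE, here is a minimal NumPy sketch (assuming 8-bit frames with a peak value of 255; this is a generic illustration, not code from the study):

```python
import numpy as np

def psnr(original, compressed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized frames."""
    mse = np.mean((original.astype(np.float64) -
                   compressed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames: no distortion
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((4, 4), 100, dtype=np.uint8)
deg = ref.copy()
deg[0, 0] = 110                      # introduce a single small error
print(round(psnr(ref, deg), 2))      # prints 40.17
```

In this study the received frame is first upsampled to the original resolution, so both arrays have the same shape before the comparison.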
The stability of playback represents how much the video quality varies during playback. In the transmission of 360° videos, the quality level at any moment is represented by the average quality level of all tiles within the user’s field of view, so playback stability depends on when quality switches occur and on the total number of switches. Its value ranges from 0 to 1, with a value closer to 1 indicating more stable video quality; the specific calculation method is shown in Equation (17).
The NETBS analysis includes the following performance metrics:
  • Efficiency: This metric is evaluated using objective quality indices like the PSNR and SSIM. It measures video quality and transmission effectiveness by comparing the received image, upsampled to the original resolution, with the original image.
  • Cumulative Distribution Function (CDF): This metric assesses the distribution of video quality levels that are transmitted.
  • Playback Stability: This metric evaluates the stability of video playback, using a specific calculation method:
    PS = \frac{1}{n} \sum_{e=1}^{n} \left( 1 - \frac{\sum_{d=0}^{I-1} \left| j_{i-d} - j_{i-d-1} \right| \omega(d)}{\sum_{d=1}^{I} j_{i-d}\, \omega(d)} \right). (17)
  • QoE: The QoE defined by Equation (1) is normalized to enhance its accuracy:
    QoE_{nor} = \frac{QoE - QoE_{min}}{QoE_{max} - QoE_{min}}.
For each window, the instability term is computed by taking the weighted sum of all bitrate-level changes over the previous I segments and dividing this by the weighted sum of the maximum received bitrate levels over the same duration; PS averages the complement of this term, so an unchanging stream scores 1. Here, j_i denotes the highest received bitrate level of segment i.
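A minimal sketch of this stability computation follows; the window length, the linearly decaying recency weight ω(d), and the complement that maps "no switches" to 1 are our reading of Equation (17), not code from the paper:

```python
def playback_stability(levels, window=5, w=None):
    """Stability PS over a sequence of per-segment quality levels.

    For each position i, the weighted sum of level changes over the last
    `window` segments is divided by the weighted sum of the levels
    themselves; PS averages the complement, so an unchanging stream
    scores exactly 1.0.
    """
    if w is None:
        w = lambda d: window - d  # recency weight: recent changes count more (assumption)
    ratios = []
    for i in range(window, len(levels)):
        changes = sum(abs(levels[i - d] - levels[i - d - 1]) * w(d)
                      for d in range(0, window))
        magnitude = sum(levels[i - d] * w(d) for d in range(1, window + 1))
        ratios.append(1.0 - changes / magnitude)
    return sum(ratios) / len(ratios)

print(playback_stability([3] * 20))  # constant quality -> 1.0
```

This matches the behavior reported in Table 2: Base-M, which never changes quality, scores exactly 1.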
To evaluate the performance of the proposed NETBS algorithm, we conducted simulation experiments and compared it with other algorithms. The first algorithm, referred to as Base-M, sends only the minimum-quality level video to ensure smooth video playback. The second comparison algorithm is NETBS-S, a variant of NETBS. This algorithm allocates the same bitrate to all video tiles within the same region instead of using the utility function to allocate bitrates to each tile individually. The third comparison algorithm is TBRA [31], which dynamically divides panoramic videos into different numbers of tiles. The fourth comparison algorithm is Option-2 [32], which differentiates a transition region in the user’s FoV movement direction to enhance viewpoint-switching sensitivity based on uniform tiling. The fifth comparison algorithm is Live360 [10], which optimizes 360° video transmission using dynamic programming. The sixth comparison algorithm is the full-frame technique (FFT), where the 360° video is encoded as a single block without tiling, and the basic DASH protocol is used for transmission.

5.2. Experimental Results

The PSNR and MSSIM values in this study were determined by comparing the received video with the original highest-quality video. If the resolution of the received video differed from that of the original, the received video was upsampled to match the original video’s resolution prior to calculating the PSNR and MSSIM values. Figure 4 illustrates a comparison of the PSNR values for different algorithms. Specifically, Figure 4a presents the PSNR values when users viewed the video “Boat”, and Figure 4b shows the PSNR values for the video “Crosswalk”. The results demonstrate that the proposed method, NETBS, achieved the highest PSNR values, whereas Base-M, which only transmits the lowest-quality video, achieved the lowest PSNR values.
Figure 5 shows a comparison of the MSSIM values for different algorithms. Figure 5a shows the MSSIM values when users watched the video “Boat”, and Figure 5b shows the MSSIM values when users watched the video “Crosswalk”. The MSSIM values ranged from 0 to 1, with higher values indicating that the received video was closer to the original video. The results show that the proposed NETBS algorithm achieved the best MSSIM values, indicating the highest video quality obtained by the proposed algorithm.
Figure 6 shows the CDF values of video quality levels received by clients over the entire duration of watching different videos. Figure 6a shows the CDF of video quality levels when users watched the video “Boat”, and Figure 6b shows the CDF of video quality levels when users watched the video “Crosswalk”. The figures demonstrate that the proposed NETBS method resulted in the highest proportion of high-quality videos received by clients. The Base-M method only transmitted the lowest-quality-level video, resulting in a 100% proportion of the lowest-quality-level video.
Figure 7 illustrates the QoE when users watched different videos. Figure 7a represents the QoE when users watched the video “Boat”, and Figure 7b represents the QoE when users watched the video “Crosswalk”. Latency, an important component of the QoE, occurred during tile transmission when users switched viewpoints, and the tile size also affected the transmission latency. From the figures, it can be seen that the performance of the proposed NETBS algorithm was superior to that of the other algorithms.
Figure 8 shows the subjective video experience scores of users for different algorithms. As depicted in the figure, the proposed algorithm NETBS achieved the highest scores, indicating that the proposed non-uniform tiling strategy and bitrate selection algorithm provide a better media experience for users.
Figure 9 shows the bitrates of the 360° video segments in the transmission strategies calculated by different algorithms. This study considers only the bitrate of the tiles within the field of view, and the bitrate for the FFT algorithm is calculated based on the proportion of the view area to the total video area. From the figure, it can be observed that the BASE-M and FFT algorithms achieved lower bitrates because BASE-M only transmitted video data at the lowest quality level, and the FFT algorithm allocated more network resources to out-of-view videos. The bitrates of the other algorithms were similar, while the proposed NETBS algorithm achieved a higher QoE because it allocated more network resources to the tiles that had the greatest impact on users’ QoE.
Table 2 shows the stability of playback when users watched different videos, where higher values indicate more stable video quality during transmission. The table indicates that Base-M had a constant value of 1, as it only sent the lowest-quality-level video, resulting in no quality jitter; the untiled FFT also scored highly because its quality does not vary across tiles. Among the tile-based adaptive methods, the proposed NETBS achieved the most stable playback.
Table 3 shows the average execution time for a single run of each algorithm. In this study, MATLAB was used as the simulator for the algorithms, running on an Intel Core i9-14900HX CPU. Each algorithm was run 500 times, and the average execution time was calculated. The results indicate that all algorithms met the real-time computation requirements for video. Additionally, the execution time could be further reduced if the algorithms employed parallel programming techniques.
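The timing methodology (averaging over 500 runs) can be reproduced with a pattern like the following; `run_algorithm` is a hypothetical placeholder workload, not the actual NETBS solver:

```python
import time

def average_runtime_us(fn, runs=500):
    """Average wall-clock execution time of fn() in microseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1e6

# Placeholder workload standing in for one optimization pass.
def run_algorithm():
    return sum(i * i for i in range(1000))

t = average_runtime_us(run_algorithm)
```

Averaging over many runs smooths out scheduler jitter, which matters when the quantity being measured is only a few hundred microseconds.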

6. Conclusions

This paper presents an innovative adaptive transmission strategy designed to optimize the transmission efficiency of 360° videos under bandwidth-constrained conditions. As 360° videos gain increasing popularity among users, their high bandwidth demands pose significant challenges. To address this issue, this paper first segments the 360° video into three distinct areas: the attention area, the out-of-field-of-view area, and the viewpoint-switching transition area. Based on this segmentation, a novel non-uniform coding transmission method is proposed, along with detailed mathematical modeling to identify the optimization problems inherent in the transmission process. To achieve real-time optimized transmission, this paper introduces a heuristic algorithm, termed the non-uniform coding for 360° panoramic video tiling and bitrate selection algorithm. This algorithm dynamically determines the optimal number and size of video tiles based on the user’s viewing environment, such as the distance between the user’s eyes and the screen, and allocates the appropriate bitrate to each tile. In this way, the proposed strategy effectively reduces unnecessary data transmission without compromising QoE, thereby enhancing overall transmission efficiency. To validate the effectiveness of the proposed algorithm, this paper constructs a simulation platform and conducts extensive experimental comparisons with various existing transmission algorithms. The experimental results demonstrate that the NETBS algorithm performs excellently across multiple key metrics, such as the PSNR, SSIM, and QoE. These findings indicate that, compared to traditional uniform coding transmission strategies, the proposed algorithm not only delivers high-quality video transmission but also significantly enhances the user’s viewing experience.
Furthermore, subjective user evaluation experiments reveal that videos transmitted using the NETBS algorithm exhibit better playback stability and image quality, aligning more closely with user expectations. Overall, the main contribution of this paper lies in the proposal of a non-uniform coding-based adaptive transmission strategy. By dynamically adjusting the number, size, and bitrate of video tiles, this strategy effectively improves the transmission efficiency of 360° videos under various network conditions while significantly enhancing user experience quality.

Author Contributions

Conceptualization, J.G. and J.Z.; methodology, J.G.; software, B.S.; validation, B.S. and X.L.; formal analysis, X.L.; investigation, J.G.; resources, J.G.; data curation, W.F.; writing—original draft preparation, J.G.; writing—review and editing, J.G.; visualization, W.F.; supervision, S.L.; project administration, S.L.; funding acquisition, J.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Tianjin Municipal Education Commission Research Program Project No. 2022KJ012.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to being related to our subsequent research.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Additional Information

This section mainly describes the preliminary work of this study and its relationship with the current research. The details are given below.
The work in [33] is a previously published article of ours. The main focus of that paper is the MMT (MPEG Media Transport) network transmission protocol, which supports multiple video streams to users, allowing them to play simultaneously on a single screen. The core problem addressed is how to select the appropriate bitrate for multiple video streams on a single screen, considering the user’s viewing environment, viewing device, and network bandwidth.
The similarities between the aforementioned paper and the current paper are as follows:
The optimization objective is the same: to maximize users’ QoE.
The simulation software used is the same: the open-source network simulation software NS3.
The evaluation methods for video PSNR and MSSIM values are the same: upsampling the received video data to the original video resolution, followed by PSNR and MSSIM comparison.
Both articles consider the physiological limitations of the human eye. However, the paper “EAAT: Environment-Aware Adaptive Transmission for Split-Screen Video Streaming” primarily focuses on the human eye’s ability to discern the resolution of physical display devices. This paper, on the other hand, places more emphasis on the range of human attention.
The differences between the aforementioned paper and the current paper are as follows:
The research content is different: “EAAT: Environment-Aware Adaptive Transmission for Split-Screen Video Streaming” focuses on the transmission of multiple video streams playing on the same screen, where each video stream has a fixed playback area on the screen. Since multiple media streams are played on one screen, the resolution of the playback device affects users’ QoE. That paper maximizes users’ QoE by dynamically selecting the appropriate bitrate for each video stream based on the user’s viewing environment.
The current paper studies the adaptive transmission of 360° videos, aiming to maximize users’ QoE by dynamically segmenting the video data into tiles of varying numbers, sizes, and bitrates in real time.
The work in [34] is another previously published article of ours. Its main focus is wireless cellular networks, which have two transmission modes: unicast, a point-to-point transmission mode, and multicast, a point-to-multipoint transmission mode. That paper mainly addresses three issues: how to select the transmission mode for users when a large number of users request video data, how to choose the appropriate bitrate for unicast users, and how to select the appropriate group and bitrate for multicast users.
The similarities between the aforementioned paper and the current paper are as follows:
The optimization objective is the same: to maximize users’ QoE.
The simulation software used is the same: the open-source network simulation software NS3.
The differences between the aforementioned paper and the current paper are as follows:
The research content is different: the types of video transmission, the number of users, and the problems being solved are different. “An Optimized Hybrid Unicast/Multicast Adaptive Video Streaming Scheme Over MBMS-Enabled Wireless Networks” focuses on the optimization of hybrid unicast/multicast transmission. This paper focuses on the issues of tile segmentation and bitrate allocation.
The methods used to solve the problems are different.

References

  1. Yang, S.; He, Y.; Zheng, X. Fovr: Attention-based vr streaming through bandwidth-limited wireless networks. In Proceedings of the 2019 16th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), Boston, MA, USA, 10–13 June 2019; pp. 1–9. [Google Scholar]
  2. Dziembowski, A.; Mieloch, D.; Jeong, J.Y.; Lee, G. Immersive Video Postprocessing for Efficient Video Coding. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 4349–4361. [Google Scholar] [CrossRef]
  3. Wang, H.; Long, Z.; Dong, H.; Saddik, A.E. MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction. IEEE Internet Things J. 2024, 1, 1. [Google Scholar] [CrossRef]
  4. Li, J.; Han, L.; Zhang, C.; Li, Q.; Liu, Z. Spherical Convolution Empowered Viewport Prediction in 360 Video Multicast with Limited FoV Feedback. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 19, 1551–6857. [Google Scholar] [CrossRef]
  5. Xiao, H.; Xu, C.; Feng, Z.; Ding, R.; Yang, S.; Zhong, L.; Liang, J.; Muntean, G.-M. A Transcoding-Enabled 360° VR Video Caching and Delivery Framework for Edge-Enhanced Next-Generation Wireless Networks. IEEE J. Sel. Areas Commun. 2022, 40, 1615–1631. [Google Scholar] [CrossRef]
  6. Liu, H.; Ma, W.; Ruan, Z.; Fang, C.; Shang, F.; Liu, Y.; Wang, L.; Wang, C.; Jiang, D. A single frame and multi-frame joint network for 360-degree panorama video super-resolution. Eng. Appl. Artif. Intell. 2024, 134, 108601. [Google Scholar] [CrossRef]
  7. Yang, T.; Tan, Z.; Xu, Y.; Cai, S. Collaborative Edge Caching and Transcoding for 360° Video Streaming Based on Deep Reinforcement Learning. IEEE Internet Things J. 2022, 9, 25551–25564. [Google Scholar] [CrossRef]
  8. Yu, Z.; Liu, J.; Liu, S.; Yang, Q. Co-Optimizing Latency and Energy with Learning Based 360° Video Edge Caching Policy. In Proceedings of the 2022 IEEE Wireless Communications and Networking Conference (WCNC), Austin, TX, USA, 10–13 April 2022; pp. 2262–2267. [Google Scholar]
  9. Ye, Z.; Li, Q.; Ma, X.; Zhao, D.; Jiang, Y.; Ma, L.; Yi, B.; Muntean, G.-M. VRCT: A Viewport Reconstruction-Based 360° Video Caching Solution for Tile-Adaptive Streaming. IEEE Trans. Broadcast. 2023, 69, 691–703. [Google Scholar] [CrossRef]
  10. Chen, J.; Luo, Z.; Wang, Z.; Hu, M.; Wu, D. Live360: Viewport-Aware Transmission Optimization in Live 360-Degree Video Streaming. IEEE Trans. Broadcast. 2023, 69, 85–96. [Google Scholar] [CrossRef]
  11. Luo, Z.; Chai, B.; Wang, Z.; Hu, M.; Wu, D. Masked360: Enabling Robust 360-degree Video Streaming with Ultra Low Bandwidth Consumption. IEEE Trans. Vis. Comput. Graph. 2023, 29, 2690–2699. [Google Scholar] [CrossRef]
  12. Jin, Y.; Liu, J.; Wang, F.; Cui, S. Ebublio: Edge-Assisted Multiuser 360° Video Streaming. IEEE Internet Things J. 2023, 10, 15408–15419. [Google Scholar] [CrossRef]
  13. Zeng, J.; Zhou, X.; Li, K. MADRL-Based Joint Edge Caching and Bitrate Selection for Multicategory 360° Video Streaming. IEEE Internet Things J. 2024, 11, 584–596. [Google Scholar] [CrossRef]
  14. Zeng, J.; Zhou, X.; Li, K. Towards High-Quality Low-Latency 360° Video Streaming with Edge-Client Collaborative Caching and Super-Resolution. IEEE Internet Things J. 2024. early access. [Google Scholar] [CrossRef]
  15. Xu, R.; Liu, C.; Hu, M.; Qian, S.; Zhang, Y.; Lin, T. OMMS: Multiple Control based Adaptive 360° Video Streaming. In Proceedings of the 15th ACM Multimedia Systems Conference, Bari, Italy, 15–18 April 2024; pp. 429–434. [Google Scholar] [CrossRef]
  16. Gao, N.; Liu, G.; Feng, M.; Hua, X.; Jiang, T. Non-orthogonal Multiple Access Enhanced Scalable 360-degree Video Multicast. IEEE Trans. Multimed. 2024. early access. [Google Scholar] [CrossRef]
  17. Ao, A.; Park, S. Applying Transformer-Based Computer Vision Models to Adaptive Bitrate Allocation for 360° Live Streaming. In Proceedings of the 2024 IEEE Wireless Communications and Networking Conference (WCNC), Dubai, United Arab Emirates, 21–24 April 2024; pp. 1–6. [Google Scholar] [CrossRef]
  18. Li, N.; Liu, Y. VertexShuffle-Based Spherical Super-Resolution for 360-Degree Videos. ACM Trans. Multimed. Comput. Commun. Appl. 2024. accepted. [Google Scholar] [CrossRef]
  19. Guo, C.; Zhao, L.; Cui, Y.; Liu, Z.; Ng, D.W.K. Power-Efficient Wireless Streaming of Multi-Quality Tiled 360 VR Video in MIMO-OFDMA Systems. IEEE Trans. Wirel. Commun. 2021, 20, 5408–5422. [Google Scholar] [CrossRef]
  20. Long, K.; Cui, Y.; Ye, C.; Liu, Z. Optimal Wireless Streaming of Multi-Quality 360 VR Video By Exploiting Natural, Relative Smoothness-Enabled, and Transcoding-Enabled Multicast Opportunities. IEEE Trans. Multimed. 2021, 23, 3670–3683. [Google Scholar] [CrossRef]
  21. Guo, C.; Cui, Y.; Liu, Z. Optimal Multicast of Tiled 360 VR Video. IEEE Wirel. Commun. Lett. 2019, 8, 145–148. [Google Scholar] [CrossRef]
  22. Zhao, L.; Cui, Y.; Liu, Z.; Zhang, Y.; Yang, S. Adaptive Streaming of 360 Videos with Perfect, Imperfect, and Unknown FoV Viewing Probabilities in Wireless Networks. IEEE Trans. Image Process. 2021, 30, 7744–7759. [Google Scholar] [CrossRef]
  23. Li, Z.; Wang, Y.; Liu, Y.; Li, J.; Zhu, P. JUST360: Optimizing 360-Degree Video Streaming Systems with Joint Utility. IEEE Trans. Broadcast. 2024, 70, 468–481. [Google Scholar] [CrossRef]
  24. Zeynali, A.; Hajiesmaili, M.H.; Sitaraman, R.K. BOLA360: Near-optimal View and Bitrate Adaptation for 360-degree Video Streaming. In Proceedings of the 15th ACM Multimedia Systems Conference, Bari, Italy, 15–18 April 2024; pp. 12–22. [Google Scholar] [CrossRef]
  25. Xie, Y.; Zhang, Y.; Lin, T. Deep Curriculum Reinforcement Learning for Adaptive 360° Video Streaming with Two-Stage Training. IEEE Trans. Broadcast. 2024, 70, 441–452. [Google Scholar] [CrossRef]
  26. Wang, Y.; Li, J.; Li, Z.; Shang, S.; Liu, Y. Synergistic Temporal-Spatial User-Aware Viewport Prediction for Optimal Adaptive 360-Degree Video Streaming. IEEE Trans. Broadcast. 2024, 70, 453–467. [Google Scholar] [CrossRef]
  27. Wang, H.; Dong, H.; Saddik, A.E. Tile-Weighted Rate-Distortion Optimized Packet Scheduling for 360° VR Video Streaming. IEEE Intell. Syst. 2024, 39, 60–72. [Google Scholar] [CrossRef]
  28. Zhao, Y.; Wang, X.; Wang, G.; He, R.; Zou, Y.; Zhao, Z. Channel estimation and throughput evaluation for 5G wireless communication systems in various scenarios on high speed railways. China Commun. 2018, 15, 86–97. [Google Scholar] [CrossRef]
  29. Xiph.Org. Xiph.Org Video Test Media [Derf’s Collection]. Available online: http://media.xiph.org/video/derf/ (accessed on 5 June 2024).
  30. Reichel, J.; Schwarz, H.; Wien, M. Joint Scalable Video Model 11 (JSVM 11); JVT-X202; Joint Video Team: Geneva, Switzerland, 2007; p. 23. [Google Scholar]
  31. Zhang, L.; Suo, Y.; Wu, X.; Wang, F.; Chen, Y.; Cui, L.; Liu, J.; Ming, Z. TBRA: Tiling and bitrate adaptation for mobile 360-degree video streaming. In Proceedings of the 29th ACM International Conference on Multimedia, Online, 20–24 October 2021; pp. 4007–4015. [Google Scholar]
  32. Nguyen, D.V.; Tran, H.T.; Pham, A.T.; Thang, T.C. An optimal tile-based approach for viewport-adaptive 360-degree video streaming. IEEE J. Emerg. Sel. Top. Circuits Syst. 2019, 9, 29–42. [Google Scholar] [CrossRef]
  33. Guo, X.; Gong, J.; Liang, W.; Wang, W.; Que, X. EAAT: Environment-Aware Adaptive Transmission for Split-Screen Video Streaming. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 4355–4367. [Google Scholar] [CrossRef]
  34. Guo, J.; Gong, X.; Liang, J.; Wang, W.; Que, X. An Optimized Hybrid Unicast/Multicast Adaptive Video Streaming Scheme Over MBMS-Enabled Wireless Networks. IEEE Trans. Broadcast. 2018, 64, 791–802. [Google Scholar] [CrossRef]
Figure 1. The proposed non-uniform coding method.
Figure 2. The 360° adaptive transmission architecture with non-uniform coding.
Figure 3. The relationship between the attention area and human eyes.
Figure 4. PSNR values when watching different videos. (a) PSNR values when watching the video “Boat”. (b) PSNR values when watching the video “Crosswalk”.
Figure 5. MSSIM values when watching different videos. (a) MSSIM values when watching the video “Boat”. (b) MSSIM values when watching the video “Crosswalk”.
Figure 6. CDF values when watching different videos. (a) CDF values when watching the video “Boat”. (b) CDF values when watching the video “Crosswalk”.
Figure 7. QoE values when watching different videos. (a) QoE values when watching the video “Boat”. (b) QoE values when watching the video “Crosswalk”.
Figure 8. MoS when watching different videos. (a) MoS when watching the video “Boat”. (b) MoS when watching the video “Crosswalk”.
Figure 9. Bitrates when watching different videos. (a) Bitrates when watching the video “Boat”. (b) Bitrates when watching the video “Crosswalk”.
Table 1. Simulation parameters.

System Bandwidth: 20 MHz
Number of RBs: 0–200
BS Tx Power: 30 dBm
Subcarriers per RB: 12
Subcarrier Spacing: 15 kHz
Bandwidth per RB: 180 kHz
End-to-End RTT: 100 ms
Pathloss Model: COST 231 Hata (urban)
Fading Model: Rayleigh fading
Antenna Type: Omnidirectional
Doppler Shift: 30 Hz
Thermal Noise Density: −174 dBm/Hz
Modulation/Coding Rate Settings: M-QAM
TCP Layer: TCP SACK
TCP Receive Window: 65,535 bytes
Distance From Base Station: 2 km
Table 2. Playback stability.

PS         Boat     Crosswalk
NETBS      0.8952   0.9475
BASE-M     1        1
NETBS-S    0.8752   0.9203
TBRA       0.8031   0.8262
Option-2   0.8149   0.8302
Live360    0.8912   0.9455
FFT        0.9856   0.9912
Table 3. Execution times.

Time (μs)  Boat     Crosswalk
NETBS      402.254  424.671
BASE-M     0.672    1.239
NETBS-S    356.241  402.576
TBRA       504.235  428.245
Option-2   321.245  421.487
Live360    312.125  264.588
FFT        1.875    3.698

Guo, J.; Li, S.; Zhu, J.; Li, X.; Sun, B.; Feng, W. Adaptive Transmission Strategy for Non-Uniform Coding of 360 Videos. Electronics 2024, 13, 3266. https://doi.org/10.3390/electronics13163266
