**3. Technical Background**

#### *H.265/HEVC*

The H.265/HEVC standard was released in 2013 by the JCT-VC and can reduce bit rates by about 50% compared with H.264. H.265/HEVC adopts hybrid video compression, and the typical structure of an H.265/HEVC encoder is shown in Figure 1. The main modules of the encoder are (1) intra-prediction and inter-prediction, (2) transform (T), (3) quantization (Q), and (4) context-adaptive binary arithmetic coding (CABAC) entropy coding. The intra-prediction and inter-prediction modules reduce spatial and temporal redundancy, respectively. The transform and quantization modules reduce visual redundancy, and the entropy coding module reduces information entropy redundancy. The inter-prediction module is the most critical tool and accounts for about 50% of the computational complexity. Therefore, to achieve real-time coding, the computational complexity of the H.265/HEVC encoder should be reduced by reducing spatiotemporal redundancy.

**Figure 1.** The structure of high efficiency video coding (HEVC, or H.265) encoder.

In the H.265/HEVC standard, each video frame is divided into coding tree units (CTUs). A CTU comprises a coding tree block (CTB) of luma samples, two CTBs of chroma samples, and the associated syntax elements. The CTU size can be set from 16 × 16 to 64 × 64. Each CTU can be divided into four square coding units (CUs), and a CU can be recursively divided into four smaller CUs. A CU consists of a coding block (CB) of luma samples, two CBs of chroma samples, and the associated syntax elements. The CU size can be 8 × 8, 16 × 16, 32 × 32, or 64 × 64. Figure 2 shows an example of the CTB structure for a given CTU: the CTU in Figure 2a is divided into CUs of different sizes, and the corresponding CTB structure is shown in Figure 2b. At each depth of the CTB, the rate-distortion (RD) cost of every node is evaluated, and the partition with the minimum RD cost is selected.
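
The recursive RD check described above can be sketched as an exhaustive quadtree search. This is a minimal illustration rather than the reference encoder's implementation: `rd_cost` is a hypothetical callback standing in for the real rate-distortion evaluation, and `toy_rd_cost` is an invented cost used only to make the example runnable.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square CU


def best_partition(x: int, y: int, size: int,
                   rd_cost: Callable[[int, int, int], float],
                   min_size: int = 8) -> Tuple[float, List[Block]]:
    """Return (cost, leaf CUs) of the cheapest quadtree partition of a block."""
    cost_whole = rd_cost(x, y, size)
    if size <= min_size:
        # Smallest CU size reached: no further split is possible.
        return cost_whole, [(x, y, size)]

    half = size // 2
    cost_split = 0.0
    leaves: List[Block] = []
    # Visit the four sub-CUs in raster order and solve each one recursively.
    for dy in (0, half):
        for dx in (0, half):
            sub_cost, sub_leaves = best_partition(
                x + dx, y + dy, half, rd_cost, min_size)
            cost_split += sub_cost
            leaves.extend(sub_leaves)

    # Keep the split only if it is strictly cheaper than coding the block whole.
    if cost_split < cost_whole:
        return cost_split, leaves
    return cost_whole, [(x, y, size)]


def toy_rd_cost(x: int, y: int, size: int) -> float:
    """Invented cost: large blocks touching the top-left corner are expensive."""
    if x == 0 and y == 0 and size > 16:
        return float(size * size)
    return float(size)


cost, cus = best_partition(0, 0, 64, toy_rd_cost)
print(cost, cus)
```

With this toy cost, the 64 × 64 CTU is split, its top-left 32 × 32 CU is split again into four 16 × 16 CUs, and the other three 32 × 32 CUs are kept whole. Every node at every depth is still visited, which is the redundant computation that fast depth-decision schemes try to avoid.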

**Figure 2.** Coding tree unit (CTU) partitioning and coding tree block (CTB) structure.

The prediction unit (PU) is transmitted in the bitstream and identifies the prediction mode of a CU. A PU consists of a prediction block (PB) of luma samples, two PBs of chroma samples, and the associated syntax elements. Figure 3 shows the eight partition modes that may be used to define the PUs of a CU in H.265/HEVC inter-prediction: four symmetric modes (2*N* × 2*N*, 2*N* × *N*, *N* × 2*N*, *N* × *N*) and four asymmetric modes (2*N* × *nU*, 2*N* × *nD*, *nL* × 2*N*, *nR* × 2*N*).
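
To make the eight inter partition modes concrete, the following sketch lists the PU dimensions that each mode produces for a CU of size 2*N* × 2*N*. The function name `pu_partitions` and the mode-name strings are illustrative choices, not identifiers from the standard's syntax. The sketch also ignores the restrictions the standard places on which modes are allowed at small CU sizes.

```python
from typing import Dict, List, Tuple


def pu_partitions(cu_size: int) -> Dict[str, List[Tuple[int, int]]]:
    """Map each inter partition mode to its list of PU (width, height) pairs."""
    full = cu_size          # 2N
    half = cu_size // 2     # N
    quarter = cu_size // 4  # N/2, the short side of an asymmetric split
    rest = full - quarter   # 3N/2, the long side of an asymmetric split
    return {
        # Symmetric modes
        "2Nx2N": [(full, full)],
        "2NxN": [(full, half), (full, half)],
        "Nx2N": [(half, full), (half, full)],
        "NxN": [(half, half)] * 4,
        # Asymmetric modes: the CU is cut at one quarter of its height/width
        "2NxnU": [(full, quarter), (full, rest)],   # small PU on top
        "2NxnD": [(full, rest), (full, quarter)],   # small PU at the bottom
        "nLx2N": [(quarter, full), (rest, full)],   # small PU on the left
        "nRx2N": [(rest, full), (quarter, full)],   # small PU on the right
    }


for mode, pus in pu_partitions(32).items():
    print(mode, pus)
```

For a 32 × 32 CU, for instance, the 2*N* × *nU* mode yields a 32 × 8 PU above a 32 × 24 PU. In every mode the PUs tile the CU exactly.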

A CU can be recursively divided into transform units (TUs) according to a quadtree structure whose root is the CU. The TU is the basic block that carries residual or transform coefficients. In a TU, a syntax element named the coded block flag (cbf) indicates whether at least one non-zero transform coefficient is transmitted for the whole CU. When there is at least one non-zero coefficient, cbf is equal to 1; when there are no non-zero coefficients, cbf is equal to 0. Moreover, cbf is an important factor in the CU size decision [14].

**Figure 3.** PU modes in H.265/HEVC inter-prediction.

The advantage of the block partitioning structure is that the configurable CTU size enables the codec to be readily optimized for various contents, applications, and devices. However, the recursive structure of the coding blocks causes a large amount of redundant computation. To support real-time video transmission over VANETs, the redundant computation of the H.265/HEVC encoder should be decreased significantly.

#### **4. The Proposed Low-Complexity and Hardware-Friendly H.265/HEVC Encoder for VANETs**

#### *4.1. The Novel Spatiotemporal Neighborhood Set*

Object motion in video sequences is regular, and the depths of adjacent CUs are correlated. If the depth range of the current CU can be inferred from the already encoded neighboring CUs, some levels of hierarchical partitioning can be skipped or terminated early, which significantly reduces the computational complexity.

To exploit this spatiotemporal correlation, the four-element neighborhood set *G* is defined as

$$G = \{ CU_{L}, CU_{TL}, CU_{TR}, CU_{CO} \}. \tag{1}$$

Set *G* is shown in Figure 4, where *CU<sub>L</sub>*, *CU<sub>TL</sub>*, *CU<sub>TR</sub>*, and *CU<sub>CO</sub>* denote the left, top-left, top-right, and collocated CUs of the current CU, respectively.

**Figure 4.** Spatiotemporal neighborhood set.

#### *4.2. CTU Depth Decision*

In video compression, a smooth coding block usually has a small CU depth, whereas a larger depth is suited to a complex area. Previous works show that object motion within the same frame remains directional, and that the motion and texture of neighboring CUs are similar. In this work, the depths of the neighboring CTUs in the set *G* are used to predict the depth range of the current CTU, and the predicted depth of the current CTU is calculated as

$$\widehat{Dep}_{CTU} = \sum_{k=0}^{3} \theta_k \times Dep_k, \tag{2}$$

where *k* is the index of a neighboring CTU in the set *G*, *Dep<sub>k</sub>* is the depth of that neighboring CTU, and *θ<sub>k</sub>* is the weight factor applied to its depth. In the H.265/HEVC standard, the CTU depth takes the values 0, 1, 2, and 3. Hence, the calculated depth of the current CTU, $\widehat{Dep}_{CTU}$, satisfies

$$
\widehat{Dep}_{CTU} \le 3. \tag{3}
$$

In Equation (2), each *Dep<sub>k</sub>* ≤ 3, so Equation (3) holds when the weight factors *θ<sub>k</sub>* satisfy

$$\sum_{k=0}^{3} \theta_k \le 1. \tag{4}$$

So that the predicted depth can span the full range of 0, 1, 2, and 3, the weight factors *θ<sub>k</sub>* sum to 1 in this work. Moreover, Zhang's work confirms that the calculated CTU depth is closer to the actual depth of the current CTU when the depths of the spatial neighboring CTUs are weighted more heavily than the depth of the temporal neighboring CTU [29]. In this work, the spatial neighboring CTUs are given equal weight factors, each larger than the weight factor of the temporal neighboring CTU. Thus, *θ<sub>k</sub>* is set to

$$
\theta_k = \begin{cases} 0.3, & \text{if } k = 0, 1, 2 \\ 0.1, & \text{if } k = 3 \end{cases}. \tag{5}
$$
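
Equations (2) and (5) amount to a weighted average of the four neighboring depths. The sketch below is a minimal illustration with hypothetical names (`WEIGHTS`, `predict_ctu_depth`). It assumes that the index *k* follows the order in which the neighbors are listed in Equation (1), so that *k* = 3 is the collocated (temporal) neighbor, and that all four neighbors are available.

```python
from typing import Sequence

# Weight factors from Equation (5). Assumed order: left, top-left, top-right
# (spatial, k = 0, 1, 2) followed by the collocated CTU (temporal, k = 3).
WEIGHTS = (0.3, 0.3, 0.3, 0.1)


def predict_ctu_depth(neighbor_depths: Sequence[int]) -> float:
    """Weighted sum of the neighboring CTU depths, as in Equation (2)."""
    if len(neighbor_depths) != len(WEIGHTS):
        raise ValueError("expected one depth per neighbor in the set G")
    if any(not 0 <= d <= 3 for d in neighbor_depths):
        raise ValueError("CTU depths must lie in the range 0..3")
    return sum(w * d for w, d in zip(WEIGHTS, neighbor_depths))


# Spatial neighbors at depths 1, 2, 2 and a collocated CTU at depth 0:
# 0.3*1 + 0.3*2 + 0.3*2 + 0.1*0, i.e. approximately 1.5 (not an integer).
print(predict_ctu_depth((1, 2, 2, 0)))
```

Because the weights sum to 1 and every input depth is at most 3, the result never exceeds 3 (up to floating-point rounding), in line with Equation (3).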

However, the calculated value of $\widehat{Dep}_{CTU}$ is usually not an integer, so it cannot be used directly as the depth of the current CTU. Therefore, a rule for the CTU depth range is formulated in Table 1, from which the depth range of the current CTU is obtained from the value of $\widehat{Dep}_{CTU}$.


**Table 1.** The CTU depth range.

According to the predicted depth of the current CTU, each CTU is classified into one of three types: *T*1, *T*2, or *T*3. The CTU depth range is then determined from Table 1. The relation between the CTU type, $\widehat{Dep}_{CTU}$, and the CTU depth range is as follows.


and is classified as type *T*2. In this case, the minimum depth of the current CTU, *Dep<sub>min</sub>*, is equal to 1, and the maximum depth of the current CTU, *Dep<sub>max</sub>*, is equal to 3.

(3) When the predicted depth of the current CTU satisfies $2.5 < \widehat{Dep}_{CTU} \le 3$, the motion of the neighboring CTUs is intense and their depths are high. The current CTU belongs to a fast-motion region and is classified as type *T*3. In this case, the minimum depth of the current CTU, *Dep<sub>min</sub>*, is equal to 2, and the maximum depth of the current CTU, *Dep<sub>max</sub>*, is equal to 3.
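
To illustrate what a restricted depth range saves, the sketch below maps a range [*Dep<sub>min</sub>*, *Dep<sub>max</sub>*] to the CU sizes that still have to be evaluated. It assumes a 64 × 64 CTU, in which depth *d* corresponds to a CU size of 64 / 2<sup>*d*</sup>. The function name `cu_sizes_in_range` is hypothetical.

```python
from typing import List


def cu_sizes_in_range(dep_min: int, dep_max: int,
                      ctu_size: int = 64) -> List[int]:
    """CU sizes (in luma samples) left to evaluate for a given depth range."""
    if not 0 <= dep_min <= dep_max <= 3:
        raise ValueError("depth range must satisfy 0 <= dep_min <= dep_max <= 3")
    # Each additional depth level halves the CU side length.
    return [ctu_size >> depth for depth in range(dep_min, dep_max + 1)]


print(cu_sizes_in_range(0, 3))  # full search: [64, 32, 16, 8]
print(cu_sizes_in_range(1, 3))  # type T2 range: [32, 16, 8]
print(cu_sizes_in_range(2, 3))  # type T3 range: [16, 8]
```

For a type *T*3 CTU, the 64 × 64 and 32 × 32 evaluations are skipped entirely, so the RD search starts at 16 × 16.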
