**Citation:** Uhrina, M.; Holesova, A.; Sevcik, L.; Bienik, J. Impact of Scene Content on High Resolution Video Quality. *Sensors* **2021**, *21*, 2872. https://doi.org/10.3390/s21082872

Academic Editor: Carlos Tavares Calafate

Received: 1 March 2021; Accepted: 13 April 2021; Published: 19 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**1. Introduction**

In recent years, the number of surveillance and data collection cameras, located both indoors and outdoors, has been constantly increasing. The popularity of home security cameras is also growing as even high-quality models become more affordable. Typical surveillance camera applications include public safety, protection of facilities against theft or vandalism, remote video monitoring, traffic surveillance, weather monitoring, and more specialized cases such as animal monitoring or data collection for statistical or marketing purposes. Today, due to the pandemic, face recognition with and without a protective mask is also becoming a point of interest for researchers in cooperation with technology companies [1–3]. It is important to realize that each such sensor produces a tremendous amount of data to be transmitted over the network or further processed, which calls for effective video compression. Furthermore, whether the image or video is presented to a live person or to a machine learning algorithm (most often for classification or segmentation), the best results are achieved when the image is of the highest achievable quality. This implies one common goal for distributors, communication service providers, and broadcasting companies: to set the compression parameters so that the perceived video quality is maximal while the bandwidth requirements are minimal. This challenge has led to increased interest in the analysis of video content, followed by content-specific tuning of the compression parameters for video sequences with different types of scene content.

Even though many studies deal with video quality assessment using subjective methods, demand exceeds supply: there is still a lack of video quality datasets, as well as of subjective tests recorded on them. Very popular and extensively used datasets, such as References [4–27], come from the University of Texas and were developed by the Laboratory for Image and Video Engineering. Another very popular option is the VQEG-HDTV database [28], a result of an international project of the VQEG (Video Quality Experts Group) consortium. Other well-known datasets are BVI-HD [29], BVI-Textures [30], and BVI-HFR [31], developed at the University of Bristol; the AVT-VQDB-UHD-1 database [32] from the Ilmenau University of Technology; the Ultra Video Group (UVG) dataset [33] composed at Tampere University; the SJTU 4K video quality database [34] from Shanghai Jiao Tong University; the Image and Video Processing Subjective Quality Video Database [35] developed at the Chinese University of Hong Kong; the collection of IRC-CyN/IVP databases from the Institut de Recherche en Communications et Cybernétique de Nantes [36]; the Konstanz Natural Video Database (KoNViD-1k) from the Universitaet Konstanz [37,38]; the MCL-V [39] and [40] databases from the University of Southern California; the Scalable Video Database [41,42] composed at EPFL; the ReTRiEVED Video Quality Database [43] made by the Università degli Studi; and the TUM databases [44,45] developed by the Technical University of Munich. Given the demand for and the importance of measuring the performance of video quality assessment techniques, a number of studies on perceptual evaluation have been presented. Most of them merely compare the quality of video sequences with various characteristics evaluated by different subjective methods and do not examine the content aspect. Rerabek et al. [46] presented a rate-distortion performance analysis and mutual comparison of one of the latest video coding standards, H.265/HEVC, with the VP9 codec.
Ramzan et al. [47] presented a subjective and objective performance evaluation of three coding standards: Advanced Video Coding (H.264/MPEG-AVC), High-Efficiency Video Coding (H.265/MPEG-HEVC), and VP9. Two different sequences at both resolutions (Full HD and Ultra HD) were tested using the DSIS method. Bienik et al. [48] measured the impact of the H.264, H.265, and VP9 compression formats on perceived video quality; the evaluation was performed on four Full HD sequences using the Absolute Category Rating (ACR) and DSCQS methods. Xu et al. [49] presented a subjective video quality assessment of 4K Ultra-High Definition (UHD) videos using the DSCQS method, with six different test sequences. Herrou et al. [50] focused on a performance comparison between HEVC and VP9 in the HDR context through both objective and subjective evaluations. Dumic et al. [51] offered findings on the subjective assessment of H.265 versus H.264 video coding for high-definition video systems; for the evaluation, a database of 120 degraded HD video sequences with 4 contents, encoded at various compression rates in the H.265/HEVC and H.264/AVC formats, was compiled. Milovanovic et al. [52] subjectively compared the coding efficiency of three video coding standards (MPEG-H HEVC, H.264/MPEG-4 AVC, and H.262/MPEG-2). Sotelo et al. [53] presented a subjective quality assessment of HEVC/H.265-compressed 4K UHD videos in a laboratory viewing environment. Kufa et al. [54] explored the coding efficiency of the High Efficiency Video Coding (HEVC) and VP9 compression formats on video content in Full HD and UHD resolutions. Deep et al. [55] compared HEVC and VP9 based on both subjective and objective evaluation of various (720p, 1080p, and 2160p) test videos. Akyazi et al. [56] examined the compression efficiency of the HEVC/H.265, VP9, and AV1 codecs based on subjective quality assessment.
Our survey of research papers shows that there is still a lack of databases of video sequences annotated with subjective evaluations. Therefore, this paper presents new subjective results and also explores the impact of video content on subjective assessment. We decided to compare today's most used compression standards, H.264/AVC and H.265/HEVC, on video sequences at Full HD and Ultra HD resolutions. Our publication follows Reference [57], in which a new 4K video dataset was compiled with full subjective scores (Mean Opinion Score (MOS)) of videos at different bitrates compressed with the HEVC/H.265 codec and evaluated by the Double Stimulus Impairment Scale (DSIS) method, variant II. For our measurements, we decided to use the Absolute Category Rating (ACR) method.

**2. Dataset Description and Preparation**

*2.1. Dataset Description*

For our measurements, we used the dataset from the Media Lab of Shanghai Jiao Tong University [34]. We selected eight sequences with various scene content, illustrated in Figure 1, classified according to their Temporal Information (TI) and Spatial Information (SI). SI quantifies the amount of spatial detail in an image and is higher for more spatially complex scenes, while TI quantifies the amount of temporal change in a video sequence and is higher for high-motion sequences [58]. The spatial perceptual information is based on the Sobel filter and is given by:

$$SI = \max\_{time}\left\{std\_{space}\left[Sobel(F\_n)\right]\right\},\tag{1}$$

where *Fn* stands for video frame, and *stdspace* for the standard deviation over the pixels in each Sobel-filtered frame. The temporal information is computed as:

$$TI = \max\_{time}\left\{std\_{space}\left[M\_n(i,j)\right]\right\},\tag{2}$$

where *Mn*(*i*, *j*) is the difference between pixels at the same position in the frame belonging to two consecutive frames, i.e.,

$$M\_n(i,j) = F\_n(i,j) - F\_{n-1}(i,j),\tag{3}$$

where *Fn*(*i*, *j*) is the pixel at the *i*-th row, and *j*-th column of *n*-th frame in time [58]. Both of these parameters were calculated for each sequence using the Mitsu tool [59] and plotted in Figure 2. The general specification of the dataset is given in Table 1, and the content of individual sequences is briefly described in Table 2.
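The SI/TI computation described by Equations (1)–(3) can be sketched as follows, assuming the frames are available as 2-D NumPy arrays of luma values. The function name is illustrative, and SciPy's Sobel operator is used here for convenience (the paper itself computed these values with the Mitsu tool [59]); since SciPy's Sobel kernel includes a smoothing factor, absolute values may differ slightly from other implementations:

```python
import numpy as np
from scipy import ndimage

def si_ti(frames):
    """Compute the SI and TI indicators (Equations (1)-(3), after ITU-T P.910)
    for a sequence of grayscale (luma) frames given as 2-D arrays."""
    si_per_frame, ti_per_frame = [], []
    prev = None
    for frame in frames:
        f = np.asarray(frame, dtype=np.float64)
        # Eq. (1): standard deviation over the pixels of the
        # Sobel-filtered frame; the maximum over time is taken below.
        gx = ndimage.sobel(f, axis=1)
        gy = ndimage.sobel(f, axis=0)
        si_per_frame.append(np.std(np.hypot(gx, gy)))
        # Eqs. (2)-(3): standard deviation of the pixel-wise difference
        # between two consecutive frames.
        if prev is not None:
            ti_per_frame.append(np.std(f - prev))
        prev = f
    si = max(si_per_frame)
    ti = max(ti_per_frame) if ti_per_frame else 0.0
    return si, ti
```

A static, uniform scene yields SI = TI = 0, while a sequence with spatial edges and inter-frame changes yields positive values of both indicators, matching the interpretation above.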
