The VIoT has already progressed with a set of global video coding standards such as MPEG and HEVC. However, because VIoT video streams differ fundamentally from entertainment video, adopting video compression in VIoT remains an open research issue. When compressing entertainment films, high-definition reproduction of virtually every image is essential for comfortable viewing. In contrast, the ultimate goal of VIoT applications is to preserve the relevant information rather than the exact pixel values. As a result, retaining the spatial content and context of VIoT data is the essential quality requirement for VIoT video compression and sensor data processing. Novel video compression algorithms must therefore differ fundamentally from the current MPEG and HEVC standards: they should maintain the best possible interpretation and context of the sensor data throughout the compression process. Recently, the development of a new video coding standard for machine communications has produced enormous amounts of information for various applications. Because of the unique features of visual sensor data, VIoT systems can reveal more insights than typical IoT systems. These unique features enable VIoT systems to reach a wide range of new application industries by adding new aspects to existing IoT applications. The obtained visual sensor data can be pooled, analyzed, and interpreted using modern data modeling, machine learning, and supervised learning techniques. The resulting knowledge, which includes the recognition of behaviors and trends, reveals new perspectives that can affect every aspect of our lives, from better congestion control to crime prevention and from primary prevention to environmental protection [11,12,13].
2.1. LVT Model
The Lossy Video Transmission (LVT) simulator is a framework for analyzing the impact of network congestion on the segmentation of video frames captured at WVSNs and reconstructed on the decoder side. The main goal of WSN system simulators is scalability, i.e., the ability to manage large groups of nodes. Because LVT focuses solely on image quality assessment, it can run enormous sets of simulations with multiple images, procedures, and loss patterns. Simply put, an LVT simulation consists of five major stages, plus an optional error-concealment step [14,15].
(1) Forward Feature Extraction: An image processing algorithm is applied to the input image, yielding a customized rendition of the original image [16].
(2) Packetization: A packetization scheme is applied, mapping the processed image data into packets.
(3) Packet Loss Modeling: This stage simulates data loss during network transmission. Losses are generated randomly or read from a loss-pattern file.
(4) Depacketization: The inverse of the packetization scheme is applied.
(5) Reverse Image Acquisition: The inverse image processing method is run, producing a version of the source image with some sequences missing.
(6) Error Hiding: An error concealment approach might be used to fill the gaps.
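The stages above can be sketched end to end. The following is a minimal illustration, assuming a grayscale frame, packets that carry a fixed number of pixels, independent random packet loss, and mean-of-neighbors concealment; all function names are hypothetical and not part of the LVT codebase:

```python
import numpy as np

def packetize(frame, m_pixels):
    """Split a flattened frame into packets of m_pixels values each."""
    flat = frame.flatten()
    return [flat[i:i + m_pixels] for i in range(0, flat.size, m_pixels)]

def simulate_loss(packets, loss_rate, rng):
    """Drop each packet independently with probability loss_rate."""
    return [None if rng.random() < loss_rate else p for p in packets]

def depacketize(packets, shape, m_pixels):
    """Rebuild the frame; lost packets become NaN gaps."""
    flat = np.full(shape[0] * shape[1], np.nan)
    for i, p in enumerate(packets):
        if p is not None:
            flat[i * m_pixels:i * m_pixels + p.size] = p
    return flat.reshape(shape)

def conceal(frame):
    """Fill NaN pixels with the mean of their well-received 4-neighbors."""
    out = frame.copy()
    rows, cols = frame.shape
    for r in range(rows):
        for c in range(cols):
            if np.isnan(out[r, c]):
                nbrs = [frame[rr, cc]
                        for rr, cc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1))
                        if 0 <= rr < rows and 0 <= cc < cols
                        and not np.isnan(frame[rr, cc])]
                out[r, c] = np.mean(nbrs) if nbrs else 0.0
    return out

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (8, 8)).astype(float)
received = simulate_loss(packetize(frame, 4), loss_rate=0.2, rng=rng)
restored = conceal(depacketize(received, frame.shape, 4))
```

A real LVT run would substitute the feature-extraction step for the raw frame and a measured loss-pattern file for the random drops; the pipeline shape stays the same.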
(A) LVT Simulation Model
(i) Proposed simulation model: The models used are fundamental. An incoming video frame I is an L × B matrix, I = {I_r,c}, with r, c ∈ ℕ, 0 ≤ r < L, and 0 ≤ c < B; each pixel I_r,c is encoded with b bits, where b ∈ ℝ+. The communication system Γ transmits I in ⌊(L × B × b)/m⌋ packets P, where m denotes the number of bits allocated for video data in each packet. During communication, each packet p_l has a chance of being lost; various loss models can be employed to simulate this. Because packet drops are expected to occur over a wireless channel with one or more intermediary nodes, the path characteristics are irrelevant to the simulation (clearly, the number of nodes and customized communication protocols may alter the loss rate). For error concealment, averaging the well-received neighboring pixels yields an estimate of the lost data.
(ii) Encoding of Frames: As previously noted, the earliest form of LVT included an error-resilient frame-coding technique. In a typical block-based transmission, the image is first divided into blocks F_i,j of length L_f and breadth B_f, with F_i,j = {I_rf,cf}, where i · L_f ≤ r_f < (i + 1) · L_f and j · B_f ≤ c_f < (j + 1) · B_f.
In a sequential transmission, each block is allocated and delivered to its packet in raster order. Interleaving disrupts this sequence: it is a bijective function on I that maps every original block F_i,j to a new position in a permuted bitmap, so that blocks adjacent in the image are not adjacent in the packet stream. An improved model performs these operations sequentially on a low-resource network (requiring less memory and fewer calculations): pixel intensities are produced incrementally during the packetization process, and the interleaving method selects which data to place into the packet under construction.
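The bijective interleaving can be illustrated with a simple pseudo-random permutation of block indices; the actual LVT interleaver is not specified here, so the seeded shuffle below is only a stand-in:

```python
import random

def interleave_blocks(num_blocks, seed=42):
    """Return a bijective mapping: transmission position -> original block index."""
    order = list(range(num_blocks))
    random.Random(seed).shuffle(order)   # deterministic permutation
    return order

def deinterleave(received, order):
    """Invert the permutation on the receiver side."""
    restored = [None] * len(received)
    for tx_pos, block_idx in enumerate(order):
        restored[block_idx] = received[tx_pos]
    return restored

order = interleave_blocks(6)
blocks = [f"F{i}" for i in range(6)]
sent = [blocks[i] for i in order]        # blocks in transmission order
```

Because the mapping is bijective, deinterleaving recovers the original block order exactly; the benefit is that a burst of consecutive packet losses now hits blocks scattered across the image, where neighbor-averaging concealment works best.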
(B) Video Quality Assessment
LVT aims to support the measurement of the quality of the produced image frames in WVSNs. Both subjective and objective assessment indicators are used for quality evaluation. Subjective judgment is provided by direct visualization of the reconstructed frames.
(i) Peak Signal-to-Noise Ratio
The peak signal-to-noise ratio (PSNR) is employed to evaluate the restoration quality of the suggested image restoration by SR, as shown in Equation (1), wherein MSE denotes the mean squared error and MAX the maximum possible pixel value:

PSNR = 10 · log10(MAX² / MSE)  (1)
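A minimal sketch of the PSNR computation for 8-bit images (the function name is hypothetical):

```python
import numpy as np

def psnr(reference, restored, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((reference.astype(float) - restored.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.zeros((4, 4))
noisy = ref.copy()
noisy[0, 0] = 16.0  # one corrupted pixel: MSE = 16^2 / 16 pixels = 16
```

Note the infinite-PSNR guard: when restored equals reference, MSE is zero and the ratio is undefined, which is why PSNR is usually reported only for lossy reconstructions.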
(ii) Mean absolute deviation (MAD)
The mean absolute deviation IQA model is formulated as the deviation of spatial regions from their average value, as shown in Equations (2) and (3):

MAD = (1/n) Σ_i |x_i − x̄|  (2)

x̄ = (1/n) Σ_i x_i  (3)

where
x_i = pixel values;
x̄ = mean pixel value;
n = number of pixels.
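The MAD computation is a few lines in practice; this sketch treats the whole image as one spatial region:

```python
import numpy as np

def mad(image):
    """Mean absolute deviation of pixel values from the image mean."""
    x = image.astype(float).ravel()
    x_bar = x.mean()                   # mean pixel value
    return np.mean(np.abs(x - x_bar))  # average deviation from the mean

constant = np.full((4, 4), 7.0)        # a flat image deviates nowhere
```

A region-based variant would apply the same computation per block and aggregate the per-block deviations.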
(iii) Structural similarity index (SSIM)
The structural similarity index (SSIM) is used to compare the resemblance of the low-resolution input image x and the high-resolution image y using quantitative measures of luminance (µ) and contrast (σ), as shown in Equations (4) and (5):

l(x, y) = (2µxµy + C1) / (µx² + µy² + C1)  (4)

c(x, y) = (2σxσy + C2) / (σx² + σy² + C2)  (5)

where C1 and C2 are constants. The picture structure is determined by normalizing each image by its standard deviation and correlating the results, as illustrated in Equation (6):

s(x, y) = (σxy + C3) / (σxσy + C3)  (6)

where σxy is the cross-covariance of x and y and C3 is a constant (commonly C3 = C2/2). The overall measure of structural similarity is evaluated from the product of these comparison terms.
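A single-window SSIM sketch, using the common simplified form in which the luminance, contrast, and structure terms are combined (with C3 = C2/2); standard implementations instead compute this over a sliding Gaussian window, and the constants follow the usual (k · MAX)² convention:

```python
import numpy as np

def ssim_global(x, y, max_val=255.0):
    """Global (single-window) SSIM between two images of the same shape."""
    x = x.astype(float).ravel()
    y = y.astype(float).ravel()
    c1 = (0.01 * max_val) ** 2            # stabilizes the luminance term
    c2 = (0.03 * max_val) ** 2            # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))

img = np.arange(64, dtype=float).reshape(8, 8)
```

SSIM equals 1 only for identical images and decreases as luminance, contrast, or structure diverge, which is what makes it better aligned with perceived quality than per-pixel measures such as MSE.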