Article

PC-SC: A Predictive Channel-Based Semantic Communication System for Sensing Scenarios

1 State Key Lab of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
2 Future Research Laboratory, China Mobile Research Institute, Beijing 100053, China
3 Hunan Xiangjiang Intelligent Science and Technology Innovation Center Co., Ltd., Changsha 410000, China
4 State Radio Monitoring Center Testing Center, Beijing 100037, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(14), 3129; https://doi.org/10.3390/electronics12143129
Submission received: 1 June 2023 / Revised: 7 July 2023 / Accepted: 17 July 2023 / Published: 19 July 2023

Abstract
Due to its significant efficiency, semantic communication emerges as a promising technique for sixth-generation (6G) networks. The wireless propagation channel plays a crucial role in system design, as it directly impacts transmission performance and capability. Given increasingly complex communication scenarios, the channel is highly dynamic and difficult to acquire. In such cases, sensing-based methods have drawn significant attention. To enhance system robustness, we propose a predictive channel-based semantic communication (PC-SC) system tailored for sensing scenarios. The PC-SC system is application-oriented, directly taking semantic targets into account. It comprises three modules: transmitter, predictive channel, and receiver. First, at the transmitter, instead of employing global semantic coding, the scheme preserves semantic information through target-based semantic extraction. Second, the channel prediction module predicts the dynamic wireless channel by utilizing the extracted target-based semantic information. Finally, at the receiver, the target-based semantic information can be used to meet specific application requirements; alternatively, a pre-captured background and the semantic targets can be composited when complete image reconstruction is needed. We evaluate the proposed approach using a sensing image transmission scenario as a case study. Experimental results demonstrate the superiority of the PC-SC system in terms of image reconstruction performance and bit cost savings. We employ beam prediction as the channel prediction task and find that the target-based method outperforms the complete image-based approach in efficiency and robustness, providing a 32% time saving.

1. Introduction

The sixth-generation (6G) mobile networks are expected to meet the increased demands of emerging applications and communication scenarios [1,2]. To improve efficiency, semantic communications are being explored for specific tasks such as text, speech, image, and video transmission. Instead of transmitting information at the bit or symbol level, semantic communications enable transmission at the semantic level [3,4,5,6]. Natural language processing (NLP), speech recognition, and computer vision (CV), which focus on semantics extraction, segmentation, and understanding, have developed rapidly [7,8]. These approaches enable communication from a semantic perspective and provide simultaneous access to intelligent applications. As an important module in communication systems, the wireless propagation channel faces major challenges and is also evolving toward semanticization [9]. In future networks, massive antenna arrays and high frequencies support increasingly high data rates but also bring high costs and weak penetration ability. In other words, the channel becomes more sensitive to the ever-changing propagation environment and introduces more significant uncertainty into transmission. Channel-related propagation environment semantics (PES) [10] can be used to establish a direct mapping between the propagation environment and the channel, enabling the development of an online predictive network.
Joint source-channel coding (JSCC) methods driven by deep learning (DL) have been proposed to account for the noisy channel during transmission [11]. Ref. [12] performs source coding to first compress the text and then channel coding to add robustness to transmission across the channel. For image transmission, Ref. [13] concludes that end-to-end distortion performance is better than that of separate source and channel coding systems. In [14], a JSCC system is presented that directly maps image pixel values to complex-valued channel input symbols. For transmission over multipath fading channels with non-linear signal clipping, a JSCC scheme for images is implemented using convolutional neural networks (CNN) [15]. Ref. [16] proposes an autoencoder-based JSCC scheme that exploits noisy or noiseless channel output feedback and achieves considerable performance gains compared with JSCC without feedback. Moreover, a deep JSCC scheme is designed in [17] to improve multitask learning efficiency. In these approaches, the channel is formulated under idealized assumptions. However, in realistic wireless propagation scenarios, the complex propagation environment makes accurate channel information much harder to obtain than under ideal conditions.
The variability and complexity of the wireless propagation channel arise from the complex and dynamic propagation environment. Physical environmental data are therefore an out-of-band information source for channel prediction, which can enhance prediction performance compared to traditional methods that ignore environmental information [18]. Due to the diverse arrangements of scatterers in the physical environment, electromagnetic waves can follow different paths during propagation, resulting in wireless propagation channels that vary with the environment. The spread of sensing devices and technologies has drawn increased attention to sensing-enhanced channel prediction methods that leverage environmental data [9]. These methods can be combined with machine learning (ML) techniques, enhancing efficiency and adaptability to environmental changes. In [10], PES is defined by representing the environment using features and graphs, which improves the efficiency of beam prediction tasks. Images are commonly used to address the prediction challenge due to their ease of collection. Sensing data can be used to predict beams [19], blockages [20], and channel parameters [21]. For instance, in [22], cameras are employed at millimeter-wave (mmWave) base stations to leverage visual data in selecting the optimal beam, thereby reducing significant training overhead. In summary, robust radio environment-based semantic communication design in such sensing scenarios is crucial for seamless system integration.
Semantics is closely tied to the objectives of specific tasks, making it challenging to establish a general representation of semantics. Encoding semantics based on specific preferences inevitably leads to the loss of other semantic information; for example, global semantic segmentation of images makes it hard to capture the details of individual objects. Therefore, relying solely on semantic encoding may not meet reliability requirements in sensing-based intelligent applications such as remote control, monitoring, and machine-to-machine communication. In such cases, it is essential to extract target-related semantics based on the task's requirements to preserve the integral semantic information. In this paper, we propose a predictive channel-based semantic communication (PC-SC) system tailored for sensing scenarios. This method takes into account sensing data, including images, videos, point clouds, and sensor information, and focuses on extracting semantic targets. To fulfill the needs of sensing-based applications, only objects relevant to the final intention are considered for transmission and channel prediction. This approach captures the perceived environment and enables dynamic channel prediction. We apply the proposed method to a sensing image transmission task and evaluate it in terms of semantic transmission and channel prediction performance.
The remainder of this paper is organized as follows. Section 2 introduces the problem formulation and the architecture. Section 3 gives a detailed description of the proposed method for image transmission, including deep learning-based target extraction, image composition, and target image-based beam prediction. Section 4 presents the simulations and results, covering the dataset, evaluation metrics, and performance analysis. Section 5 concludes this paper and outlines future work.

2. Problem Formulation and Architecture

The widespread adoption of sensing devices has significantly enhanced our ability to perceive and comprehend the physical world, and sensing has become a fundamental component of numerous intelligent applications. Common objects such as pedestrians, vehicles, and automated machinery can be captured through sensing and utilized for smart interaction. To enable efficient transmission, a coder-decoder-based semantic communication system is implemented for extracting and reconstructing semantic information. This architecture is widely employed in existing research, as shown in Figure 1. In addition to the conventional approach of global semantic coding and decoding, target-based semantic extraction can be developed to address specific applications. This process prioritizes the preservation of reliable semantic information by targeting specific areas of interest, whereas semantic coding, which aims to provide an abstract representation of semantics, may lose crucial details.
Moreover, the wireless propagation channel is influenced by the surrounding environment. Radio waves encounter different objects, giving rise to diverse paths formed by various propagation mechanisms, including line-of-sight (LOS) transmission, reflection, and diffraction [23,24]. As a result, the position, size, and material of scatterers directly impact the characteristics of the channel. Channel prediction encompasses various specific tasks, such as beam prediction, blockage prediction, parameter prediction, and channel state information (CSI) prediction. Depending on the prediction task, different propagation environment information matters: not all environmental information improves prediction precision, and irrelevant or redundant information can even reduce accuracy.
In sensing scenarios, the dynamic nature of the environment is evident through the changing objects within it. Thus, compared to complete environmental information, the target information inherently captures the variation pattern of the channel in the current scenario. For instance, the path loss is a large-scale parameter that is influenced by the global information of all scatterers and the LOS blockage state. Let $E$ represent the set of environmental semantics related to the targets. The relationship between path loss and the environment can be described as
$$ p_l = \mathcal{M}_1(E), \quad E \in \{ E_{\mathrm{global}}, E_{\mathrm{blockage}} \}, $$
where $p_l$ denotes the path loss, $E_{\mathrm{global}}$ is the global environmental semantics, and $E_{\mathrm{blockage}}$ is the blockage-related environmental semantics. $\mathcal{M}_1(\cdot)$ represents the mapping function between $p_l$ and $E$.
When it comes to beam prediction, the objective is to identify and select the optimal beam based on the angles and distances between the transmitter, receiver, and scatterers. Considering the physical environment, the layout $E_{\mathrm{layout}}$ and distance $E_{\mathrm{distance}}$ of the main objects have a substantial influence. Hence, the relationship between the optimal beam $o_b$ and the semantics of the targets can be expressed as follows:
$$ o_b = \mathcal{M}_2(E), \quad E \in \{ E_{\mathrm{layout}}, E_{\mathrm{distance}} \}, $$
where $\mathcal{M}_2(\cdot)$ is the mapping function between $o_b$ and the corresponding $E$.

3. PC-SC for Sensing Image Transmission

In typical system designs for multimedia sources, image generation is often regarded as an appropriate transformation method that involves complete segmentation of image semantics. However, synthetic methods have limitations in terms of scene generalization and require extensive training during the learning phase. In the context of transmitting sensing images, the method comprises three main modules: DL-based semantic target extraction in the transmission phase, target image-based beam prediction, and image composition in the receiving phase, as illustrated in Figure 2.
Considering specific intelligent applications, foreground targets play a crucial role in semantics processing and inference. However, the semantic information must be predefined as a foundation for the design of encoders and decoders, which inevitably results in the loss of certain other semantic information. For instance, with complete image encoding, detailed information about a vehicle may not be fully restored, as more emphasis is placed on global semantics. In such cases, region extraction can preserve semantic targets without loss.
In order to achieve optimal beam prediction, ML approaches are employed to learn the mapping between the extracted semantic images and the channel. Efficiency can be enhanced by eliminating redundant background information and focusing on images of objects or objects’ masks. Furthermore, this method allows for a deeper understanding of multimedia data at the object level. The primary focus of semantic target-based image transmission is to prioritize the relevance of images, minimizing reconstruction distortion while preserving semantic consistency. Additionally, this approach avoids the complexity and challenges associated with semantic coding by directly comprehending multimedia data at the object level, which is particularly beneficial for many intelligent applications.

3.1. Deep Learning-Based Target Extraction

An image can be defined by three essential components: the background, the foreground, and the mixing coefficients (alpha matte) [26]. Essentially, the original image can be seen as the superimposition of the foreground and background, weighted by the alpha matte. Unlike the segmentation task, which classifies pixels, the matting task predicts the foreground probability regressively. In practice, the method is deployed using the state-of-the-art U$^2$-Net proposed in [25]. U$^2$-Net is a two-level nested U-structure designed for salient object detection (SOD) that maintains high-resolution feature maps. Its main building block is a residual U-block, which extracts multi-scale features directly within each residual block. Compared to solutions based on prior information such as a trimap, rough mask, or pose, prior-free matting is more practical and user-friendly, as it directly predicts the alpha channel from the original image. Consequently, both the object image and the mask image can be obtained simultaneously.
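For concreteness, the extraction step can be sketched as follows. The snippet below is a minimal illustration, not the authors' released code: `matting_net` stands for any U$^2$-Net-style model that maps a normalized RGB tensor to a per-pixel foreground-probability map, and the 320 × 320 input size is an assumption borrowed from common U$^2$-Net configurations.

```python
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image

@torch.no_grad()
def extract_target(matting_net, image_path):
    """Predict an alpha matte and split an image into target and mask.

    `matting_net` is assumed to map a (1, 3, H, W) tensor in [0, 1] to a
    (1, 1, H, W) foreground-probability map, as a U^2-Net-style model does.
    """
    img = Image.open(image_path).convert("RGB")
    rgb = np.asarray(img, dtype=np.float32) / 255.0
    x = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)
    x = F.interpolate(x, size=(320, 320), mode="bilinear", align_corners=False)
    alpha = matting_net(x)                                 # (1, 1, 320, 320) in [0, 1]
    alpha = F.interpolate(alpha, size=rgb.shape[:2], mode="bilinear",
                          align_corners=False)[0, 0].cpu().numpy()
    target = rgb * alpha[..., None]                        # foreground weighted by matte
    return target, alpha                                   # target image and mask
```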

3.2. Image Composition

Instead of generation-based image reconstruction, the proposed method focuses on the separation between foreground and background, where the foreground can be considered the target of interest for semantic information transmission. In practice, based on the pre-acquired background and the live-transmitted target and mask images, the original image can be synthesized through the compositing process.
In this method, only the extracted foregrounds are transmitted to the receiver, represented as the set $F = \{F_1, F_2, F_3, \ldots, F_n\}$, where $F_n$ denotes the information of the $n$-th foreground target. The complete image $I$ can then be formed by combining the received target images with the pre-captured background image $B$. The mask information, which determines the weight used for superimposing the target and background images, can be expressed as the set $R = \{R_1, R_2, R_3, \ldots, R_n\}$. The image composition is performed at the receiver by merging the background $B$ with the received target information $F$ to obtain the reconstructed complete image $I$. Mathematically, this can be expressed as
$$ I = C\{\lambda R, F, B\}, $$
where $\lambda$ is the superimposed weight of the mask and $C$ represents the composition function. Given the simplicity of this operation, the received targets and the existing background image can be composited with almost no computation and in real time.
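A minimal sketch of this compositing step is given below; it assumes unpremultiplied foreground images and masks stored as floating-point arrays, with $\lambda$ exposed as a scalar weight.

```python
import numpy as np

def composite(background, targets, masks, lam=1.0):
    """Reconstruct I = C{lambda*R, F, B}: overlay each received target F_i
    onto the pre-captured background B using its mask R_i as the
    per-pixel superimposing weight."""
    image = background.astype(np.float32).copy()               # start from B
    for fg, mask in zip(targets, masks):
        w = np.clip(lam * mask, 0.0, 1.0)[..., None]           # weight from the matte
        image = w * fg.astype(np.float32) + (1.0 - w) * image  # alpha compositing
    return image
```

Since each pixel costs only a few multiply-adds, this is consistent with the real-time compositing behavior reported in Section 4.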

3.3. Target Image-Based Beam Prediction

The image-based beam prediction establishes a strong connection between images and a pre-defined beam-forming codebook. By learning from paired image and beam index data, the image can be mapped to a unique class, which corresponds to the beam index in the codebook. The cornerstone of this approach is the implementation of a residual network with 18 layers (ResNet-18) [27], which extensively utilizes convolution operations for image processing.
The residual network (ResNet) architecture incorporates residual blocks, which consist of batch normalization layers and rectified linear units, leading to significant improvements in feed-forward efficiency. Unlike the sequential flow of information in traditional networks, ResNet introduces shortcut or residual connections that directly add the input of a layer to the output of a subsequent layer. This creates a “skip connection” that enables the network to learn residual functions, representing the difference between the input and output of a layer. The ResNet architecture can be represented by the following formula:
$$ y = \mathcal{G}(x, \{w_i\}) + x, $$
where $x$ and $y$ are the input and output of the layer and $\mathcal{G}(x, \{w_i\})$ represents the residual function that the layer is trying to learn.
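The basic block can be sketched in PyTorch as follows; this is an illustrative stride-1, equal-width variant of the block, not the exact torchvision implementation.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A basic block realizing y = G(x, {w_i}) + x: two 3x3 convolutions
    with batch normalization and ReLU, plus the identity shortcut."""
    def __init__(self, planes):
        super().__init__()
        self.conv1 = nn.Conv2d(planes, planes, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        g = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # G(x, {w_i})
        return self.relu(g + x)                                        # skip connection
```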
To adapt the model for image-based beam prediction, the final fully connected layer is replaced with a layer containing as many neurons as there are beam classes. This modification enables the model to learn a classification function that directly maps an image to a beam index. The model is fed either complete images or target images, allowing both input types to be evaluated for the beam prediction task. By leveraging this approach, the model can generate accurate beam index predictions from the provided images.
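In code, this adaptation amounts to swapping the classifier head. The sketch below uses torchvision's stock ResNet-18 and assumes a 64-beam codebook, matching the dataset described in Section 4; training then proceeds as ordinary 64-class classification over (image, beam index) pairs.

```python
import torch.nn as nn
from torchvision.models import resnet18

def build_beam_predictor(num_beams: int = 64) -> nn.Module:
    """ResNet-18 with the final fully connected layer replaced by a
    `num_beams`-way classifier, trained from scratch on image/beam pairs."""
    model = resnet18(weights=None)   # torchvision >= 0.13 API
    model.fc = nn.Linear(model.fc.in_features, num_beams)
    return model
```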

4. Simulations and Results

The performance of the proposed method is assessed separately in terms of image transmission and beam prediction performance.

4.1. Dataset and Evaluation Metrics

The proposed PC-SC system is evaluated using the DeepSense 6G dataset [22]. This dataset is a large-scale real-world dataset that contains both sensing and communication data, including RGB scene images and corresponding beam information. For evaluation purposes, Scenario 9 is selected, which consists of 4199 training data samples and 593 testing data samples. In this scenario, a 60 GHz frequency band is used, and a receiver with an over-sampled codebook of 64 pre-defined beams is employed. The dataset provides RGB images with a resolution of 960 × 540 pixels and 64-dimensional received power vector data. Each beam index is associated with an image, resulting in 64 classes for labeling. The target-based semantic communication and target image-based beam prediction modules are tested using the samples from the testing set of Scenario 9. The main measurement settings of Scenario 9 are presented in Table 1.
Image distortion impacts the quality of images in terms of perception and understanding, and it can be defined as the disparity between the original and reconstructed images in terms of local structure. Traditional metrics are used to assess the distortion in transmitted testing images, including peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) [28]. PSNR is calculated from the mean squared error (MSE) between two images and the maximum pixel value, usually 255. The MSE between images $I_1$ and $I_2$, both of size $m \times n$, can be expressed as follows:
$$ \mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I_1(i,j) - I_2(i,j) \right]^2 . $$
Therefore, PSNR can be presented as
$$ \mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} . $$
Rather than using error summation, SSIM models any image distortion as a combination of three factors: loss of correlation, luminance distortion, and contrast distortion, which can be presented as
$$ \mathrm{SSIM}(I_1, I_2) = \frac{(2 \mu_{I_1} \mu_{I_2} + c_1)(2 \delta_{I_1 I_2} + c_2)}{(\mu_{I_1}^2 + \mu_{I_2}^2 + c_1)(\delta_{I_1}^2 + \delta_{I_2}^2 + c_2)} , $$
where $\mu_{I_1}$ and $\mu_{I_2}$ are the means of $I_1$ and $I_2$, $\delta_{I_1}^2$ and $\delta_{I_2}^2$ are their variances, and $\delta_{I_1 I_2}$ is their covariance. The constants are $c_1 = (0.01 L)^2$ and $c_2 = (0.03 L)^2$, where $L$ is the dynamic range of the pixel values.
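These metrics can be computed directly from the definitions above. The PSNR helper below follows the MSE and PSNR formulas; for SSIM, an off-the-shelf implementation with the same $c_1$, $c_2$ constants can be used, as noted in the comments.

```python
import numpy as np

def psnr(img1: np.ndarray, img2: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR computed from the MSE definition above (8-bit images)."""
    mse = np.mean((img1.astype(np.float64) - img2.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# SSIM with the c1 = (0.01 L)^2 and c2 = (0.03 L)^2 constants is available
# off the shelf, e.g. via scikit-image:
#   from skimage.metrics import structural_similarity
#   score = structural_similarity(img1, img2, data_range=255, channel_axis=-1)
```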
Moreover, the learned perceptual image patch similarity (LPIPS) is calculated as a weighted average over network channels to measure the perceptual similarity between two images [29]. Compared to traditional metrics, LPIPS aligns better with human perception. It is based on a deep neural network trained to assess the similarity of image patches using a large dataset of images with corresponding human perceptual judgments. During training, the network learns to encode perceptual information, such as color, texture, and structure, so that its similarity scores align with human perception. A lower LPIPS value indicates greater similarity between two images, whereas higher PSNR and SSIM values indicate better performance.
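As a usage sketch, the metric is available through the `lpips` package released by the authors of [29]; inputs are RGB tensors scaled to [−1, 1], and the random tensors below are placeholders for the original and received images.

```python
import lpips
import torch

loss_fn = lpips.LPIPS(net="alex")           # AlexNet-backbone perceptual metric
img0 = torch.rand(1, 3, 540, 960) * 2 - 1   # placeholder original image in [-1, 1]
img1 = torch.rand(1, 3, 540, 960) * 2 - 1   # placeholder received image in [-1, 1]
distance = loss_fn(img0, img1)              # lower LPIPS = more similar
```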
For evaluating beam prediction, commonly used classification accuracy metrics are employed. The main metric is the top-h accuracy with h = 1, 2, 3, which measures the proportion of validation samples whose ground-truth beam is included among the h best-predicted beam candidates, ranked by the model's output scores. Thus, the top-3 accuracy is the percentage of validation samples whose true beam appears among the top-3 predictions.
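Top-h accuracy can be computed in a few lines; the sketch below assumes raw classifier logits of shape (N, 64) and integer beam labels.

```python
import torch

def top_h_accuracy(logits: torch.Tensor, labels: torch.Tensor, h: int) -> float:
    """Fraction of samples whose ground-truth beam index appears among the
    h highest-scoring predicted beams."""
    candidates = logits.topk(h, dim=1).indices        # (N, h) best beam indices
    hits = (candidates == labels.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()
```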

4.2. Performance Evaluation of Sensing Image Transmission

We test the effectiveness of our image transmission method over the Additive White Gaussian Noise (AWGN) channel with 16-Quadrature Amplitude Modulation (16-QAM) and 64-Quadrature Amplitude Modulation (64-QAM) systems. The Signal-to-Noise Ratio (SNR) ranges from 3 dB to 11 dB. Our experiment consists of two steps. In the first step, we transmit the original images over the AWGN channel with 16-QAM and 64-QAM to obtain the corresponding received images. In the second step, we transmit the target and mask images over the AWGN channel and use the composition method to reconstruct the original image. An original testing image is approximately 210 kB, while the corresponding target and mask images require only around 20 kB in total, reducing the transmitted data by more than 90%.
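For reproducibility, the baseline transmission chain can be sketched as below. This is a simplified stand-in for the 16-QAM/64-QAM baseline: it assumes a square constellation with natural (non-Gray) bit mapping, per-symbol SNR, and hard-decision detection, so exact bit error rates will differ from a Gray-mapped implementation.

```python
import numpy as np

def qam_over_awgn(bits: np.ndarray, m: int, snr_db: float,
                  rng: np.random.Generator) -> np.ndarray:
    """Send a 0/1 bit stream through square M-QAM (m = 16 or 64) over an
    AWGN channel and return the hard-decision received bits."""
    k = int(np.log2(m))                   # bits per symbol (4 or 6)
    side = int(np.sqrt(m))                # constellation points per I/Q axis
    norm = np.sqrt(2.0 * (m - 1) / 3.0)   # scales symbols to unit average energy
    assert bits.size % k == 0, "pad the bit stream to a multiple of k"
    weights = 1 << np.arange(k - 1, -1, -1)
    idx = bits.reshape(-1, k) @ weights                  # bit group -> symbol index
    i, q = idx // side, idx % side
    s = ((2 * i - side + 1) + 1j * (2 * q - side + 1)) / norm
    sigma = np.sqrt(0.5 * 10.0 ** (-snr_db / 10.0))      # noise std per dimension
    r = s + sigma * (rng.standard_normal(s.shape) + 1j * rng.standard_normal(s.shape))
    # nearest-level hard decision on each axis, then back to bits (MSB first)
    i_hat = np.clip(np.round((r.real * norm + side - 1) / 2), 0, side - 1).astype(int)
    q_hat = np.clip(np.round((r.imag * norm + side - 1) / 2), 0, side - 1).astype(int)
    rx = ((i_hat * side + q_hat)[:, None] >> np.arange(k - 1, -1, -1)) & 1
    return rx.reshape(-1)

# e.g., bytes of the target/mask images -> bits -> channel -> bits -> image
rng = np.random.default_rng(0)
tx = rng.integers(0, 2, size=6000)
rx = qam_over_awgn(tx, m=64, snr_db=7.0, rng=rng)
```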
The PSNR, SSIM, and LPIPS scores are computed for each transmitted image in the test samples, and the average scores are calculated for the proposed method and the baseline AWGN channel with 16-QAM and 64-QAM under different SNR regimes. Figure 3 illustrates the results, showing that the proposed approach consistently outperforms the baseline system in terms of PSNR, especially in the low SNR regime. Figure 4 displays the average SSIM scores, which show a similar trend: the proposed method is more robust to changing SNR and obtains higher SSIM values than complete image transmission. Furthermore, Figure 5 presents the LPIPS comparison, revealing that the proposed method performs significantly better than the baselines, with consistently lower average LPIPS scores across all simulated SNR regions.
Figure 6 displays a selection of randomly chosen transmitted samples, comprising images transmitted with the 64-QAM system, the target and mask images, and the final composited images. The results clearly demonstrate the effectiveness of the proposed approach, as the reconstructions align more closely with human perception. One sample in Figure 6, which depicts a vehicle and a human together, is used to assess the approach's ability to produce diverse results; this specific sample is part of the training data in the DeepSense 6G dataset. Additionally, the image matting process at the transmitter takes 0.25 s, while the image composition at the receiver takes only 0.009 s.

4.3. Performance Evaluation of Beam Prediction

The prediction model is trained using the RGB image and labeled beam index class pairs from the training set of the DeepSense 6G dataset. To evaluate the performance with different input images, we employ the complete RGB images and their corresponding target images. We downscale the images to 32 × 32 pixels for input and use 64 neurons in the last fully connected layer to accommodate the beam classes. All training and testing are executed on an NVIDIA GeForce RTX 2080 GPU. The other significant hyper-parameters for network training are presented in Table 2.
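A training-loop sketch consistent with Table 2 is shown below. The optimizer is not specified in the paper, so Adam is assumed here, and `ReduceLROnPlateau` stands in for the learning-rate reduction factor of 0.1.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=15, device="cuda"):
    """Train the beam classifier with the Table 2 hyper-parameters."""
    model = model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        epoch_loss = 0.0
        for images, beam_idx in loader:           # 32x32 images, beam index labels
            images, beam_idx = images.to(device), beam_idx.to(device)
            opt.zero_grad()
            loss = loss_fn(model(images), beam_idx)
            loss.backward()
            opt.step()
            epoch_loss += loss.item()
        sched.step(epoch_loss)                    # reduce LR by 0.1 on plateau
    return model
```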
To evaluate the target image-based and complete image-based methods and analyze their robustness to different network configurations, the max pooling and main layers of ResNet-18 are set with different parameters, since these layers affect the network's expressive capacity and its ability to capture detailed information. The max pooling layer is configured with parameters of (4, 1) or (10, 8), where (4, 1) indicates a kernel size of 4 with a stride of 1 and (10, 8) a kernel size of 10 with a stride of 8. Additionally, the basic blocks in ResNet-18 consist of four layers each, with plane sizes of (64, 128, 256, 256) or (32, 64, 128, 128). Four configuration groups are thereby formed for comparison and analysis. We compare the top-1, top-2, and top-3 model accuracies for each network configuration, trained with the same batch size, epochs, and device. Table 3 presents the number and setting of each configuration group.
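The configuration groups can be instantiated as in the sketch below (see also Table 3). It leans on torchvision's internal `_make_layer` helper to rebuild the four stages with the group's plane widths; since that is a private API, a standalone ResNet definition would be the more robust choice in practice.

```python
import torch.nn as nn
from torchvision.models.resnet import BasicBlock, ResNet

def configured_resnet18(pool, planes, num_beams=64):
    """Build a ResNet-18 variant for one configuration group: `pool` is the
    (kernel, stride) of the max pooling layer and `planes` holds the four
    main-layer widths. Note: `_make_layer` is torchvision's private helper."""
    model = ResNet(BasicBlock, [2, 2, 2, 2], num_classes=num_beams)
    model.maxpool = nn.MaxPool2d(kernel_size=pool[0], stride=pool[1], padding=1)
    model.inplanes = 64                       # conv1 always outputs 64 channels
    model.layer1 = model._make_layer(BasicBlock, planes[0], 2)
    model.layer2 = model._make_layer(BasicBlock, planes[1], 2, stride=2)
    model.layer3 = model._make_layer(BasicBlock, planes[2], 2, stride=2)
    model.layer4 = model._make_layer(BasicBlock, planes[3], 2, stride=2)
    model.fc = nn.Linear(planes[3] * BasicBlock.expansion, num_beams)
    return model

# e.g., configuration group 3: max pooling (10, 8), planes (32, 64, 128, 128)
net = configured_resnet18(pool=(10, 8), planes=(32, 64, 128, 128))
```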
The results indicate that the models using complete and target images achieve similar precision in configuration group 1, with a top-2 beam prediction accuracy of 85% and only about 3% of test samples not correctly predicted within the top-3. However, the target image-based method performs better under the other three configurations, particularly in group 3, where its average accuracy is more than 10% higher than that of the complete image-based method. Figure 7 illustrates the top-1, top-2, and top-3 accuracies for the different network configurations, comparing the complete image-based and target image-based methods. It clearly demonstrates the network's robustness with the target image-based method, whereas the complete image-based method is more sensitive to the plane and max pooling parameters.
To evaluate the effect of plane size, an ablation study is conducted by comparing group 1 with group 2, and group 3 with group 4, as shown in Table 3. Increasing the number of planes enhances the expressive capacity of the model [30]. The results show that as the network's expressive capacity decreases, the complete image-based method exhibits a more significant decline in accuracy. We can therefore conclude that the model trained with target images is more robust to the plane size, whereas the complete image-based method, which relies on detailed information, is more sensitive.
Similarly, we compare group 2 with group 3, and group 1 with group 4, to examine the impact of the max pooling layer. Despite a decline in precision for both methods in group 3, which uses larger kernel and stride sizes for max pooling, the proposed method still significantly outperforms the model using complete images. Smaller kernels and strides are advantageous for capturing more spatial details and local variations. However, the results indicate that the additional background detail in the complete images does not translate into improved accuracy even with smaller kernels and strides; compared with group 2, group 3 shows only a slight decrease in accuracy.
Therefore, we can conclude that the backgrounds provide little information that improves the prediction results; on the contrary, the redundancy even increases sensitivity to the network setup. Insignificant targets in the propagation environment can cause distractions, while the notable dynamic targets already contain the essential information reflecting environmental changes. Furthermore, we compare the testing time: the model using complete images takes 0.055 s per sample, while the proposed method requires only 0.037 s. Hence, utilizing target images saves more than 32% of the testing time.

5. Conclusions and Future Works

This article proposes PC-SC, a predictive channel-based semantic communication system, in which a target-based semantic extraction method is employed for communication tasks in sensing scenarios. Instead of using global semantic coding, the system concentrates on semantic targets relevant to intelligent applications, preserving essential semantics and eliminating redundant information. Specifically, the article studies the transmission of sensing images as a case study. The semantic communication and beam prediction pipelines are implemented simultaneously by sharing the source sensing data. This approach enhances semantic preservation tailored to specific tasks and improves the channel's robustness for transmission. The targets in the scene act as carriers of information and support semantic exchange. The problem is addressed through image matting, which accurately segments the regions of interest in the foreground, including vehicles and pedestrians. Subsequently, beam prediction is conducted based on the processed target images at the transmitter.
Simulation results illustrate that target image-based semantic communication can improve efficiency and preserve perceptual quality, especially in low SNR regimes. Additionally, the target image-based beam prediction method requires less time than the original image-based model without compromising accuracy and exhibits stronger robustness to network configurations. Semantic target-oriented communication takes the requirements of intelligent applications into account, offering flexible interaction. However, several issues remain open, such as improving the semantic target extraction and image composition approaches to enhance perception and efficiency. Moreover, further exploration is needed to decrease image distortion by feeding the beam prediction results back in a closed loop for channel quality improvement. For practical deployment, distributed computing and storage technologies are also crucial, as they directly determine the real-time performance of the system.

Author Contributions

The contributions of Y.S. include conceptualization, methodology, data curation, and analysis; writing—original draft; and writing—review and editing. The contributions of J.Z. include investigation supervision, project administration, and funding acquisition. The contributions of J.W., L.Y., Y.Z., G.L., G.X. and J.L. include investigation. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Fund for Distinguished Young Scholars (Grant 61925102), National Natural Science Foundation of China (Grant 62201087), National Natural Science Foundation of China (Grant 92167202), National Natural Science Foundation of China (Grant 62101069), National Natural Science Foundation of China (Grant 62201086), and Beijing University of Posts and Telecommunications—China Mobile Research Institute Joint Innovation Center.

Data Availability Statement

No new data were created.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
6G: Sixth Generation
PC-SC: Predictive Channel-Based Semantic Communication
NLP: Natural Language Processing
CV: Computer Vision
JSCC: Joint Source-Channel Coding
DL: Deep Learning
CNN: Convolutional Neural Network
ML: Machine Learning
mmWave: Millimeter Wave
LOS: Line of Sight
CSI: Channel State Information
SOD: Salient Object Detection
ResNet-18: Residual Network with 18 Layers
ResNet: Residual Network
PSNR: Peak Signal-to-Noise Ratio
SSIM: Structural Similarity Index Measure
MSE: Mean Squared Error
LPIPS: Learned Perceptual Image Patch Similarity
AWGN: Additive White Gaussian Noise
16-QAM: 16-Quadrature Amplitude Modulation
64-QAM: 64-Quadrature Amplitude Modulation
SNR: Signal-to-Noise Ratio

References

1. Saad, W.; Bennis, M.; Chen, M. A vision of 6G wireless systems: Applications, trends, technologies, and open research problems. IEEE Netw. 2020, 34, 134–142.
2. Tataria, H.; Shafi, M.; Molisch, A.F.; Dohler, M.; Sjöland, H.; Tufvesson, F. 6G wireless systems: Vision, requirements, challenges, insights, and opportunities. Proc. IEEE 2021, 109, 1166–1199.
3. Zhang, P.; Xu, W.; Gao, H.; Niu, K.; Xu, X.; Qin, X.; Yuan, C.; Ain, Z.; Zhao, H.; Wei, J.; et al. Toward wisdom-evolutionary and primitive-concise 6G: A new paradigm of semantic communication networks. Engineering 2022, 8, 60–73.
4. Güler, B.; Yener, A.; Swami, A. The semantic communication game. IEEE Trans. Cogn. Commun. Netw. 2018, 4, 787–802.
5. Shi, G.; Xiao, Y.; Li, Y.; Xie, X. From semantic communication to semantic-aware networking: Model, architecture, and open problems. IEEE Commun. Mag. 2021, 59, 44–50.
6. Wang, T.; Chen, M.; Luo, T.; Saad, W.; Niyato, D.; Poor, H.V.; Cui, S. Performance optimization for semantic communications: An attention-based reinforcement learning approach. IEEE J. Sel. Areas Commun. 2022, 40, 2598–2613.
7. Yang, H.; Alphones, A.; Xiong, Z.; Niyato, D.; Zhao, J.; Wu, K. Artificial-intelligence-enabled intelligent 6G networks. IEEE Netw. 2020, 34, 272–280.
8. Dong, C.; Liang, H.; Xu, X.; Han, S.; Wang, B.; Zhang, P. Semantic communication system based on semantic slice models propagation. IEEE J. Sel. Areas Commun. 2023, 41, 202–213.
9. Nie, G.; Zhang, J.; Zhang, Y.; Yu, L.; Zhang, Z.; Sun, Y.; Tian, L.; Wang, Q.; Xia, L. A predictive 6G network with environment sensing enhancement: From radio wave propagation perspective. China Commun. 2022, 19, 105–122.
10. Sun, Y.; Zhang, J.; Yu, L.; Zhang, Z.; Zhang, P. How to define the propagation environment semantics and its application in scatterer-based beam prediction. IEEE Wirel. Commun. Lett. 2023, 12, 649–653.
11. Xie, H.; Qin, Z.; Li, Y.G.; Juang, B.-H. Deep learning enabled semantic communication systems. IEEE Trans. Signal Process. 2021, 69, 2663–2675.
12. Farsad, N.; Rao, M.; Goldsmith, A. Deep learning for joint source-channel coding of text. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 2326–2330.
13. Qin, Z.; Tao, X.; Lu, J.; Tong, W.; Li, Y.G. Semantic communications: Principles and challenges. arXiv 2021, arXiv:2201.01389.
14. Bourtsoulatze, E.; Kurka, D.B.; Gündüz, D. Deep joint source-channel coding for wireless image transmission. IEEE Trans. Cogn. Commun. Netw. 2019, 5, 567–579.
15. Yang, M.; Bian, C.; Kim, H.-S. Deep joint source channel coding for wireless image transmission with OFDM. In Proceedings of the 2021 IEEE International Conference on Communications, Xiamen, China, 28–30 July 2021; pp. 1–6.
16. Kurka, D.B.; Gündüz, D. DeepJSCC-f: Deep joint source-channel coding of images with feedback. IEEE J. Sel. Areas Inf. Theory 2020, 1, 178–193.
17. Wang, M.; Zhang, Z.; Li, J.; Ma, M.; Fan, X. Deep joint source-channel coding for multi-task network. IEEE Signal Process. Lett. 2021, 28, 1973–1977.
18. Zhang, J. The interdisciplinary research of big data and wireless channel: A cluster-nuclei based channel model. China Commun. 2016, 13, 39–61.
19. Alrabeiah, M.; Hredzak, A.; Alkhateeb, A. Millimeter wave base stations with cameras: Vision-aided beam and blockage prediction. In Proceedings of the 2020 IEEE 91st Vehicular Technology Conference, Antwerp, Belgium, 25–28 May 2020; pp. 1–5.
20. Wu, S.; Chakrabarti, C.; Alkhateeb, A. LiDAR-aided mobile blockage prediction in real-world millimeter wave systems. In Proceedings of the 2022 IEEE Wireless Communications and Networking Conference (WCNC), Austin, TX, USA, 10–13 April 2022; pp. 2631–2636.
21. Wu, L.; He, D.; Ai, B.; Wang, J.; Qi, H.; Zhong, Z. Artificial neural network based path loss prediction for wireless communication network. IEEE Access 2020, 8, 199523–199538.
22. Alkhateeb, A. DeepSense 6G: A Large-Scale Real-World Multimodal Sensing and Communication Dataset. 2022. Available online: https://www.DeepSense6G.net (accessed on 25 October 2022).
23. Zhang, J.; Tang, P.; Yu, L.; Jiang, T.; Tian, L. Channel measurements and models for 6G: Current status and future outlook. Front. Inf. Technol. Electron. Eng. 2020, 1, 39–61.
24. Sun, Y.; Zhang, J.; Zhang, Y.; Yu, L.; Yuan, Z.; Liu, G.; Wang, Q. Environment features-based model for path loss prediction. IEEE Wirel. Commun. Lett. 2022, 11, 2010–2014.
25. Qin, X.; Zhang, Z.; Huang, C.; Dehghan, M.; Zaiane, O.R.; Jagersand, M. U2-Net: Going deeper with nested U-structure for salient object detection. Pattern Recognit. 2020, 106, 107404.
26. Xu, N.; Price, B.; Cohen, S.; Huang, T. Deep image matting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2970–2979.
27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
28. Horé, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369.
29. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595.
30. Han, S.; Pool, J.; Tran, J.; Dally, W. Learning both weights and connections for efficient neural network. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015.
Figure 1. The semantic communication for sensing tasks under the dynamic wireless propagation channel.
Figure 2. The architecture of the proposed PC-SC system for sensing image transmission. The beam case is studied for dynamic wireless propagation channel prediction.
Figure 3. PSNR of the transmitted testing images with and without the proposed semantic communication method under the AWGN channel with 16-QAM and 64-QAM; higher values indicate better performance.
Figure 4. SSIM of the transmitted testing images with and without the proposed semantic communication method under the AWGN channel with 16-QAM and 64-QAM; higher values indicate better performance.
Figure 5. LPIPS of the transmitted testing images with and without the proposed semantic communication method under the AWGN channel with 16-QAM and 64-QAM; lower values indicate better performance.
Figure 6. Randomly selected received original images, extracted mask and target images, and reconstructed images with the proposed method. The sample in the third column from the left is tested with training data from the DeepSense 6G dataset for human sample visualization.
Figure 7. Top-1, top-2, and top-3 accuracies compared under the different network configuration groups defined in Table 3.
Table 1. Measurement settings of Scenario 9.

Setting | Value
Average data capture rate | 7.67 fps
Frame rate of camera used at receiver | 30 fps
Transmitter | mmWave omni-directional transmitter
Receiver | 16-element antenna array
Table 2. Hyper-parameters for network training.

Parameter | Value
Input image size | 32 × 32
Batch size | 100
Learning rate | 1 × 10^-4
Weight decay | 1 × 10^-4
Learning-rate reduction factor | 0.1
Output layer size | 1 × 64
Number of epochs | 15
Table 3. Precision and test time evaluation of complete image-based method and target image-based method.

Group | Network Configuration (Max Pooling; Planes) | Input Data | Top-1 | Top-2 | Top-3
1 | (4, 1); (64, 128, 256, 256) | Target images | 0.59 | 0.85 | 0.97
1 | (4, 1); (64, 128, 256, 256) | Complete images | 0.60 | 0.85 | 0.96
2 | (4, 1); (32, 64, 128, 128) | Target images | 0.59 | 0.85 | 0.96
2 | (4, 1); (32, 64, 128, 128) | Complete images | 0.59 | 0.80 | 0.91
3 | (10, 8); (32, 64, 128, 128) | Target images | 0.57 | 0.78 | 0.88
3 | (10, 8); (32, 64, 128, 128) | Complete images | 0.48 | 0.66 | 0.73
4 | (10, 8); (64, 128, 256, 256) | Target images | 0.58 | 0.85 | 0.96
4 | (10, 8); (64, 128, 256, 256) | Complete images | 0.57 | 0.83 | 0.93
