Article

Channel-Blind Joint Source–Channel Coding for Wireless Image Transmission

State Key Laboratory of Media Convergence & Communication, Communication University of China, Beijing 100024, China
*
Author to whom correspondence should be addressed.
Sensors 2024, 24(12), 4005; https://doi.org/10.3390/s24124005
Submission received: 24 April 2024 / Revised: 3 June 2024 / Accepted: 4 June 2024 / Published: 20 June 2024

Abstract

Joint source–channel coding (JSCC) based on deep learning has shown significant advancements in image transmission tasks. However, previous channel-adaptive JSCC methods often rely on the signal-to-noise ratio (SNR) of the current channel for encoding, which overlooks the neural network’s self-adaptive capability across varying SNRs. This paper investigates the self-adaptive capability of deep learning-based JSCC models to dynamically changing channels and introduces a novel method named Channel-Blind JSCC (CBJSCC). CBJSCC leverages the intrinsic learning capability of neural networks to self-adapt to dynamic channels and diverse SNRs without relying on external SNR information. This approach is advantageous, as it is not affected by channel estimation errors and can be applied to one-to-many wireless communication scenarios. To enhance the performance of JSCC tasks, the CBJSCC model employs a specially designed encoder–decoder. Experimental results show that CBJSCC outperforms existing channel-adaptive JSCC methods that depend on SNR estimation and feedback, both in additive white Gaussian noise environments and under slow Rayleigh fading channel conditions. A comprehensive analysis of the model’s performance further validates the robustness and adaptability of this strategy across different application scenarios.

1. Introduction

Shannon’s separation theorem holds that source and channel coding can be performed independently without loss of optimality, provided that arbitrarily large block lengths are allowed. However, practical constraints such as complexity and latency often make very large block lengths impractical. In this finite block length regime, joint source–channel coding (JSCC) schemes have been shown to outperform separate source–channel coding [1,2,3,4,5,6,7]. The advent of advanced deep learning (DL) technology has sparked considerable interest among researchers in data-driven DL-based JSCC approaches [8,9,10,11,12,13,14,15,16,17,18]. The intelligent and concise design paradigm of DL-based JSCC aligns seamlessly with the fundamental principles of communication. Figure 1 illustrates the DL-based JSCC architecture proposed in [13]. Such approaches use an implicit model, in which explicit source and channel encoding schemes are replaced by neural networks that map pixel values to the channel input signal. The mapping is realized by an encoder, which converts the source data into a channel input signal, and a decoder, which reconstructs the source data from the channel output signal. The encoder and decoder are trained jointly to optimize the system’s overall performance.
In recent years, many DL-based JSCC methods have been proposed. Bourtsoulatze et al. [13] were the first to apply a DL-based JSCC model to image transmission over wireless channels. Their study suggests that DL-based JSCC methods can effectively avoid the cliff effect compared to traditional separate source–channel coding (SSCC) methods. To further improve image transmission quality, Kurka et al. [14] proposed DeepJSCC-f, a DL-based JSCC model that incorporates a feedback mechanism inspired by the Schalkwijk–Kailath scheme [19]. Additionally, Wang et al. [11] introduced an iterative receiver for image reconstruction within the DL-based JSCC framework, building upon the concept of turbo codes; this iterative process greatly enhances the overall quality of image reconstruction. Other studies have also improved transmission performance by dynamically adjusting the resolution [20] and the transmission rate [21].
Despite the impressive performance of these DL-based JSCC approaches, a limitation arises from the strict requirement that the model precisely match the SNR of the current communication channel. To address this issue, Xu et al. [9] proposed the Attention DL-based JSCC (ADJSCC) approach, which aims to enable a single model to dynamically adapt to different channel conditions. Specifically, in ADJSCC, an attention feature (AF) module incorporates the channel SNR as prior information into both the channel encoder and decoder, allowing the network to adapt to varying SNR conditions. ADJSCC has inspired subsequent research on channel-adaptive techniques [22,23,24,25]. However, the ADJSCC approach still faces challenges in real-world scenarios. First, the ADJSCC model requires accurate SNR estimation to adapt to the wireless channel, which is difficult to achieve due to channel estimation mismatch [26]. Second, the ADJSCC encoding process relies on feedback channel information, making it suitable only for point-to-point communication scenarios. Ding et al. [16] proposed a channel-adaptive JSCC (CaJSCC) scheme that is feasible for multi-user scenarios, but it suffers from significantly reduced performance and still relies heavily on accurate channel estimation.
We observe that previous work adheres to traditional communication theory, which posits that while a feedback link cannot increase the channel capacity of the noisy forward link, it can significantly reduce the coding effort needed to attain a specific performance level [19]. However, in DL-based JSCC, or semantic communication, the objective is no longer the correct delivery of individual symbols. Numerous studies have shown that DL-based JSCC can overcome the cliff effect, maintaining the ability to transmit content even when channel capacity is severely limited. This suggests that DL-based JSCC methods can adapt to different SNR levels. Yin et al. introduced a DL-based channel estimation method that operates without the need for pilot data [27], showcasing the neural network’s strong ability to learn channel characteristics. However, current methods that explicitly encode the SNR into the model overlook the neural network’s inherent adaptability to noise at different SNRs in semantic transmission tasks.
In this paper, we propose a novel Channel-Blind JSCC (CBJSCC) method that can adapt to varying channel conditions without the need for channel estimation. CBJSCC adapts to varying SNRs intrinsically through the network, instead of explicitly encoding channel information into the model. Because it does not suffer performance degradation from SNR estimation mismatches, this approach promises higher robustness in practical applications.
We observe that the performance of some existing DL-based JSCC models benefits significantly from structural improvements to the model. Therefore, the CBJSCC model incorporates a carefully designed network that integrates convolution-based attention and self-attention mechanisms in both the encoder and decoder. We evaluate the performance of CBJSCC, and the results indicate that it performs strongly over additive white Gaussian noise (AWGN) and slow Rayleigh fading channels. In particular, over the AWGN channel, CBJSCC outperforms the feedback-based DL JSCC methods that currently represent the state of the art, while requiring lower computational cost and fewer parameters.
The main contributions of this study can be summarized as follows:
  • We introduce CBJSCC, a channel-adaptive JSCC method that does not rely on channel state information. CBJSCC leverages the inherent learning capability of neural networks to adapt to wireless channels with varying SNRs, placing less stringent requirements on the communication environment. With these characteristics, CBJSCC avoids the detrimental effects of channel mismatch on model performance and is suitable for a broader range of communication scenarios, such as broadcast and deep-space communications.
  • In CBJSCC, we propose encoders and decoders designed with local attention and self-attention mechanisms. This design ensures a wider receptive field for the network and enhances model performance. Under equivalent transmission quality, CBJSCC has advantages over ADJSCC in both computational and parameter efficiency.
  • We analyze the impact of SNR on CBJSCC. The results show that the benefits of including SNR information are very limited for CBJSCC. This provides valuable insights for the further exploration of channel adaptability issues in semantic communications.
  • To assess how well the model generalizes to data from different real-world domains, we train on the ImageNet dataset and evaluate CBJSCC on datasets from various fields. The results demonstrate that CBJSCC has robust domain adaptability, making it suitable for transmission tasks across different domains.
The subsequent sections of this paper are structured as follows: Section 2 provides an overview of research related to CBJSCC. Section 3 describes the proposed CBJSCC model. Section 4 presents our experimental results along with a comprehensive analysis. Finally, Section 5 offers our concluding remarks.

2. Related Works

2.1. Autoencoder

The Autoencoder (AE) [28,29] aims to discover a compressed representation of the input data while minimizing the discrepancy between the input and its reconstructed output. AEs have gained prominence in deep learning [30,31,32], proving highly effective for data compression and feature extraction, particularly in JSCC. The primary objective of an AE is to learn the mappings between the input space, denoted as $\mathcal{X}$, and the feature space, denoted as $\mathcal{Z}$. The class of functions that map elements from $\mathcal{X}$ to $\mathcal{Z}$ is denoted as $\mathcal{F}$, while the class of functions that remap elements from $\mathcal{Z}$ back to $\mathcal{X}$ is denoted as $\mathcal{G}$. The distortion function, represented by $\Delta$, can take discrete or continuous forms. The autoencoder problem can be formulated as follows:
$$
\mathcal{F}: \mathcal{X} \to \mathcal{Z}, \qquad \mathcal{G}: \mathcal{Z} \to \mathcal{X}, \qquad
f^*,\, g^* = \underset{f \in \mathcal{F},\; g \in \mathcal{G}}{\arg\min}\; \Delta\big(X,\, g[f(X)]\big) \tag{1}
$$
In this study, we specifically focus on the auto-associative case, where the input and output features are identical.
The Denoising Autoencoder (DAE) [33] is particularly relevant to JSCC, as it aims to capture stable structures, such as dependencies and regularities, inherent in the observed input distribution. The objective of DAE can be formulated as follows:
$$
f^*,\, g^* = \underset{f \in \mathcal{F},\; g \in \mathcal{G}}{\arg\min}\; \Delta\big(X,\, g[f(\hat{X})]\big) \tag{2}
$$
The key distinction between a DAE and a conventional AE lies in the data used for training. A DAE derives its input from a corrupted version of the data, denoted as $\hat{x} \in \hat{X}$, obtained by randomly corrupting the authentic data $x$ through a corruption process $\hat{x} \sim q_{\mathcal{D}}(\hat{x} \mid x)$. This corruption process enables the model to learn a more robust representation. In contrast, JSCC focuses on data reversibility, reconstruction quality, and the utilization of channel characteristics in compression processing. DAE, on the other hand, prioritizes the precision of reconstructing the original content over the stability or reconstructability of the signal.
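To make the objective in Equation (2) concrete, the following minimal PyTorch sketch illustrates the pieces; here the corruption process $q_{\mathcal{D}}$ is taken to be additive Gaussian noise and the distortion $\Delta$ to be the mean squared error, both illustrative assumptions rather than choices made in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDAE(nn.Module):
    """Minimal denoising autoencoder: f maps X -> Z, g maps Z back to X."""
    def __init__(self, in_dim=784, z_dim=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))
        self.g = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))

    def forward(self, x, noise_std=0.1):
        # Corruption process q_D(x_hat | x): additive Gaussian noise (illustrative choice).
        x_corrupt = x + noise_std * torch.randn_like(x)
        return self.g(self.f(x_corrupt))   # reconstruct from the corrupted input

# Distortion Delta(X, g[f(X_hat)]): MSE against the *clean* input, as in Equation (2).
x = torch.rand(8, 784)
loss = F.mse_loss(TinyDAE()(x), x)
```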

2.2. Attention Mechanism

The attention mechanism is a fundamental concept in deep learning, allowing for the efficient scanning of global information while focusing on specific target regions. Various attention mechanisms, such as channel attention, spatial attention, or a combination of both, enable the refinement of feature maps through independent reweighting [34,35,36,37,38]. By assigning weights to channels or spatial locations on the feature maps, these mechanisms enhance the receptive field of convolutional networks, enabling a more focused analysis of significant local areas within the input data.
Given an intermediate feature map $W_{in} \in \mathbb{R}^{C \times H \times W}$ as input, the final output $W_{out}$ is computed using the following equation:
$$
W_{out} = M(W_{in}) \otimes W_{in} \tag{3}
$$
In Equation (3), $A = M(W_{in})$ represents the attention map, and $\otimes$ denotes element-wise multiplication.
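As a concrete instance of Equation (3), the sketch below computes a squeeze-and-excitation-style channel attention map $M(W_{in})$ and reweights the input feature map with it; the reduction ratio and layer widths are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Implements W_out = M(W_in) * W_in with a channel attention map M."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: (B, C, H, W) -> (B, C, 1, 1)
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, w_in):                         # w_in: (B, C, H, W)
        b, c, _, _ = w_in.shape
        m = self.mlp(self.pool(w_in).view(b, c)).view(b, c, 1, 1)   # attention map M(W_in)
        return m * w_in                              # channel-wise reweighting

w_out = ChannelAttention(64)(torch.randn(2, 64, 32, 32))
```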
Self-attention represents a distinct version of the attention mechanism [39]. The issue with convolution-based attention is its inability to capture long-range dependencies. Conversely, self-attention provides a more flexible strategy, adept at recognizing both local and global dependencies. The computation process can be represented as:
$$
\mathrm{Attention}(QW^Q,\, KW^K) = \mathrm{softmax}\!\left(\frac{(QW^Q)(KW^K)^{T}}{\sqrt{d}}\right), \qquad
W_{out} = \mathrm{Attention}(QW^Q,\, KW^K)\,(VW^V) \tag{4}
$$
In the above equation, Q, K, and V are the query, key, and value vector sequences, respectively, obtained through linear transformations of the input feature map W, and d denotes the dimension of the query and key vectors, used as a scaling factor. The softmax function is applied to the scaled dot product of the query and key vectors, and the resulting attention weights are used to compute a weighted sum of the value vectors.
Multi-Head Self-Attention (MSA) is a more intricate form of the attention mechanism. It incorporates multiple attention heads in self-attention models to capture diverse points of interest, which is illustrated as follows:
$$
\mathrm{MultiHead}(Q,\, K,\, V) = \mathrm{Concat}(\mathrm{head}_1,\, \mathrm{head}_2,\, \ldots,\, \mathrm{head}_h)\, W^{O} \tag{5}
$$
In Equation (5), $W^{O}$ is the output projection matrix, and $W_i^Q$, $W_i^K$, and $W_i^V$ denote the projection matrices for the i-th of the h attention heads:
$$
\mathrm{head}_i = \mathrm{Attention}(QW_i^Q,\, KW_i^K)\, VW_i^V \tag{6}
$$
The integration of MSA can improve the model’s expressive capacity by allowing it to gather different information through separate attention heads, extracting more comprehensive and nuanced features.
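The computation in Equations (4)–(6) can be sketched as follows; for brevity, the per-head projections $W_i^Q$, $W_i^K$, and $W_i^V$ are fused into one linear layer, and the head count and dimensions are arbitrary example values.

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.heads, self.d = heads, dim // heads
        self.to_qkv = nn.Linear(dim, 3 * dim)   # fused per-head projections W_i^Q, W_i^K, W_i^V
        self.w_o = nn.Linear(dim, dim)          # output projection W^O from Equation (5)

    def forward(self, x):                       # x: (B, L, dim) sequence of feature vectors
        b, l, _ = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # Split into heads: (B, heads, L, d).
        q, k, v = (t.reshape(b, l, self.heads, self.d).transpose(1, 2) for t in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d ** 0.5, dim=-1)   # Equation (4)
        heads = (attn @ v).transpose(1, 2).reshape(b, l, -1)    # Equation (6), then concatenate
        return self.w_o(heads)                                  # Equation (5)

y = MultiHeadSelfAttention()(torch.randn(2, 49, 64))            # e.g., a 7x7 feature map flattened
```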
The incorporation of the self-attention mechanism in computer vision tasks has shown promising results, although it comes with certain drawbacks. Notably, the self-attention mechanism can increase both training and inference time, requiring substantial computational resources [40,41,42]. However, recent research has made remarkable strides by integrating the self-attention mechanism with convolutional networks. This integration aims to address these challenges by reducing computational complexity and introducing additional inductive biases [43,44,45]. By calculating dot-product attention scores and performing weighted summation, the self-attention mechanism generates a more concise sequence representation. This representation effectively minimizes redundancy while preserving the original features’ semantic information, making it more suitable for storage, processing, and transmission purposes.

3. Proposed Method

3.1. Overall Architecture

This section presents a comprehensive description of the CBJSCC method as illustrated in Figure 2a. CBJSCC follows the fundamental concept of an autoencoder (AE). We consider a wireless image transmission system in which the image $x \in \mathbb{R}^n$ has height h, width w, and c channels, so that $n = h \times w \times c$. The encoding process, guided by parameters $\theta$, deterministically maps the input x to complex symbols z through the function $f_\theta$, satisfying the following relationship:
$$
z = f_\theta(x) \in \mathbb{C}^k \tag{7}
$$
During encoding, the feature dimension of the image is first mapped to a value N, which is set to 192 by default in this study; at the encoder output, the channel dimension of the feature map is converted to T.
The encoder first employs an efficient local attention module, followed by the construction of global features using the ACMix network [46]. The encoded image is transformed into a complex channel signal $z \in \mathbb{C}^k$ via a conjugate transpose, where k represents the dimension of the encoded complex symbols. Here, n is termed the source bandwidth, k is termed the channel bandwidth, and the ratio $k/n$ is referred to as the bandwidth compression ratio (CBR).
The transmission rate of the CBJSCC method is adjusted by altering the dimension T of the final self-attention module. Since the encoder includes two downsampling operations with a stride of 2, the channel bandwidth is given by $k = \frac{T \cdot n}{2 \times 16 \times c}$. With c generally set to 3, the bandwidth compression ratio of CBJSCC is $T/96$. The model can be trained under AWGN or fading conditions, including slow Rayleigh fading channels, using suitably simulated channels. The received signal is modeled as:
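As a worked example of this relation: for a hypothetical $128 \times 128 \times 3$ input, $n = 49{,}152$; choosing $T = 16$ gives $k = 16 \times 49{,}152 / (2 \times 16 \times 3) = 8192$ complex symbols, i.e., a CBR of $k/n = 16/96 = 1/6$, while $T = 8$ yields a CBR of $1/12$.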
$$
\hat{z} = h z + \omega \tag{8}
$$
Here, h denotes the channel fading coefficient (channel gain), sampled from an independent and identically distributed complex Gaussian distribution, and $\omega \in \mathbb{C}^k$ denotes circularly symmetric complex Gaussian noise. For an AWGN channel, h equals 1. The received signal $\hat{z}$ is then passed to the decoder, which maps it back to an estimate of the source.
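The following sketch shows one way to implement the channel in Equation (8); the unit average power normalization of z and the per-block sampling of h and $\omega$ at a target SNR are common conventions in the DeepJSCC literature and are assumptions here, not necessarily the authors' exact implementation.

```python
import torch

def channel(z, snr_db, fading=False):
    """Apply z_hat = h * z + w (Equation (8)) to a complex signal z of shape (B, k)."""
    # Normalize to unit average power per complex symbol (assumed convention).
    z = z / (z.abs() ** 2).mean(dim=1, keepdim=True).sqrt()
    sigma2 = 10.0 ** (-snr_db / 10.0)                 # noise power for unit signal power
    w = (sigma2 / 2.0) ** 0.5 * torch.complex(torch.randn_like(z.real),
                                              torch.randn_like(z.real))
    if fading:
        # Slow Rayleigh fading: one complex gain h ~ CN(0, 1) per transmitted block.
        h = (0.5 ** 0.5) * torch.complex(torch.randn(z.shape[0], 1),
                                         torch.randn(z.shape[0], 1))
    else:
        h = torch.ones(z.shape[0], 1, dtype=z.dtype)  # AWGN channel: h = 1
    return h * z + w

z = torch.complex(torch.randn(4, 8192), torch.randn(4, 8192))
z_hat = channel(z, snr_db=10.0, fading=True)
```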
The encoder and decoder of CBJSCC are symmetrically designed, with the decoder serving as the mirror image of the encoder and parameterized by $\phi$:
$$
\hat{x} = g_\phi(\hat{z}) \in \mathbb{R}^n \tag{9}
$$
CBJSCC is explicitly designed for image transmission over dynamic wireless channels, circumventing the need for Channel State Information (CSI) estimation. Consequently, h and ω remain unknown to both the encoder and decoder. This process can be formulated as:
$$
(\theta^*,\, \phi^*) = \underset{\theta,\, \phi}{\arg\min}\; \frac{1}{n} \sum_{i=1}^{n} L\big(x^{(i)},\, \hat{x}^{(i)}\big)
= \underset{\theta,\, \phi}{\arg\min}\; \frac{1}{n} \sum_{i=1}^{n} L\Big(x^{(i)},\, g_\phi\big(h\, f_\theta(x^{(i)}) + \omega\big)\Big) \tag{10}
$$
Here, L denotes the loss function, while $\theta^*$ and $\phi^*$ represent the optimal parameters of the encoder and decoder, respectively.
When the value of h is set to 1 and ω is set to 0, CBJSCC aligns with the architecture of a traditional AE.
Prior works such as DeepJSCC and ADJSCC were designed for point-to-point communication scenarios. For example, DeepJSCC requires switching to a corresponding model based on the channel SNR, while ADJSCC can only encode and decode for a single channel SNR at a time; serious performance degradation occurs when the estimated SNR does not match the actual SNR. In broadcast and other one-to-many communication scenarios, the encoded signal must be transmitted over multiple wireless channels with different SNRs. CBJSCC does not require encoding and decoding for a specific SNR; instead, it adapts to a range of SNRs, which makes it particularly well suited for broadcast communication, as shown in Figure 2b.

3.2. The Inverted Residual Attention Bottleneck

Recent research indicates that ConvNeXt can deliver enhanced performance in smaller-scale networks [47]. Inspired by the inverted bottleneck residual structure of ConvNeXt and of some image super-resolution networks [48], we introduce the Inverted Residual Attention Bottleneck (IRAB) structure illustrated in Figure 3a. The input features first pass through a convolution layer with a kernel size of 3 × 3. Next, the feature dimension is expanded by a factor of X through a 1 × 1 expansion convolution layer. Finally, a convolution layer restores the tensor to its original dimensions. Excluding the bias term, the computational complexity of the IRAB corresponds to a FLOPs ratio of $(2X + 9)/27$ relative to a standard 3 × 3 residual block. In the design of the IRAB, we set the expansion factor X to 2 in the first downsampling module to balance the expansion of feature maps and computational efficiency. In the second downsampling module, we increase X to 4 to enhance the network’s representational capacity at a higher spatial resolution. This choice is supported by theoretical analysis and experimental observations.
Subsequently, the features pass through a 1 × 1 convolutional layer before being fed into the enhanced spatial attention (ESA) module described in [49,50]. As depicted in Figure 3b, to decrease the computational complexity, the ESA block starts with a 1 × 1 convolutional layer that reduces the channel dimension from N to 16. This is followed by a convolution layer with a stride of 2 and max-pooling. This downsampling of the features expands the receptive field of the convolution layers, enhancing the allocation of attention weights. To restore the spatial and channel dimensions of the input features, we employ interpolation and a 1 × 1 convolution layer. Finally, the attention weights of the input features are obtained by applying the sigmoid function, yielding the attention map M in Equation (3).
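The two blocks described above can be sketched as follows; the 3 × 3 convolution, the 1 × 1 expansion by X, the 1 × 1 restoration, and the 16-channel ESA bottleneck follow the text, while the activation functions, pooling sizes, and the exact placement of the residual connection are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ESA(nn.Module):
    """Enhanced spatial attention: downsample, estimate weights, upsample, gate the input."""
    def __init__(self, channels, reduced=16):
        super().__init__()
        self.reduce = nn.Conv2d(channels, reduced, 1)          # 1x1: N -> 16 channels
        self.down = nn.Conv2d(reduced, reduced, 3, stride=2, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=7, stride=3)
        self.conv = nn.Conv2d(reduced, reduced, 3, padding=1)
        self.expand = nn.Conv2d(reduced, channels, 1)          # 1x1: 16 -> N channels

    def forward(self, x):
        a = self.conv(self.pool(self.down(self.reduce(x))))
        a = F.interpolate(a, size=x.shape[-2:], mode='bilinear', align_corners=False)
        return torch.sigmoid(self.expand(a)) * x               # Equation (3): M(W_in) * W_in

class IRAB(nn.Module):
    """Inverted residual attention bottleneck with expansion factor X."""
    def __init__(self, channels, expansion=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.GELU(),
            nn.Conv2d(channels, expansion * channels, 1), nn.GELU(),   # 1x1 expansion by X
            nn.Conv2d(expansion * channels, channels, 1))              # 1x1 restore dimensions
        self.attn = nn.Sequential(nn.Conv2d(channels, channels, 1), ESA(channels))

    def forward(self, x):
        return x + self.attn(self.body(x))      # residual connection (assumed placement)

y = IRAB(192, expansion=2)(torch.randn(1, 192, 32, 32))
```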

3.3. Convolution and Self-Attention Mixed Module

The accuracy and effectiveness of encoding and decoding can suffer when correlations between encoded symbols over extended distances are overlooked or diminished. We therefore adapt a modified ACMix block [46], which combines self-attention and convolution operations to effectively capture long-range dependencies. This combination allows ACMix to capture both the overall outline and shape of the image and to fully exploit local features within it. Another advantage is ACMix’s ability to handle input images of different sizes.
Here, we replace the batch normalization layer with generalized divisive normalization (GDN) [51]. GDN better matches the statistical characteristics of local filter responses in natural image patches, resulting in improved performance.

4. Experiments and Analysis

4.1. Training and Implementation Details

We used the ImageNet dataset [52] for training. The training process used a single Nvidia RTX 4090 GPU with a batch size of 112. We randomly cropped the images into patches of size 128 × 128 during each iteration. The training SNR was uniformly sampled from −5 dB to 20 dB. The initial learning rate was set to 0.0001 and decayed exponentially based on the loss. Training stopped once the model converged. We evaluated image restoration quality using two metrics: peak signal-to-noise ratio (PSNR) and the Structural Similarity Index Measure (SSIM).
We used the L1 Charbonnier loss for training [53], which promotes robustness to noise and outliers compared to the standard L1 loss [54,55]. Let x and x̂ denote the original image and the reconstructed image, respectively. The L1 Charbonnier loss function is defined as follows:
$$
L(\hat{x},\, x) = \sqrt{\|x - \hat{x}\|^2 + \epsilon} \tag{11}
$$
Here, ϵ is a small value to ensure the Charbonnier penalty is never zero.
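In code, Equation (11) applied element-wise and averaged over the batch (a common implementation of the Charbonnier penalty) can be written as the following sketch; the value of ε is illustrative.

```python
import torch

def charbonnier_loss(x_hat, x, eps=1e-6):
    """Charbonnier loss: sqrt((x - x_hat)^2 + eps), applied element-wise and averaged."""
    return torch.sqrt((x - x_hat) ** 2 + eps).mean()

loss = charbonnier_loss(torch.rand(4, 3, 128, 128), torch.rand(4, 3, 128, 128))
```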
Our experiments use the recently proposed EvoLved Sign Momentum (Lion) optimizer [56]. In our experience, the Lion optimizer speeds up the model’s convergence.
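Putting these pieces together, one training iteration might look like the sketch below, which reuses the channel() and charbonnier_loss() functions sketched earlier. The toy encoder and decoder are stand-ins for the actual IRAB/ACMix-based networks, the SNR is drawn uniformly from [−5, 20] dB per step as described above, and torch.optim.AdamW is used as a placeholder because Lion is not part of core PyTorch.

```python
import random
import torch
import torch.nn as nn

# Toy stand-ins for the CBJSCC encoder/decoder (the real ones use IRAB and ACMix blocks).
encoder = nn.Sequential(nn.Conv2d(3, 32, 3, stride=4, padding=1), nn.GELU(),
                        nn.Conv2d(32, 16, 3, stride=4, padding=1))
decoder = nn.Sequential(nn.ConvTranspose2d(16, 32, 4, stride=4), nn.GELU(),
                        nn.ConvTranspose2d(32, 3, 4, stride=4))
optimizer = torch.optim.AdamW(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

def train_step(x):                                   # x: a batch of 128x128x3 crops
    snr_db = random.uniform(-5.0, 20.0)              # channel-blind: SNR is never fed to the model
    feat = encoder(x)                                # Equation (7), real-valued features
    b = feat.shape[0]
    z = torch.complex(*feat.reshape(b, 2, -1).unbind(1))   # pair real values into complex symbols
    z_hat = channel(z, snr_db)                       # Equation (8), sketch from Section 3.1
    feat_hat = torch.cat([z_hat.real, z_hat.imag], dim=1).reshape_as(feat)
    x_hat = decoder(feat_hat)                        # Equation (9)
    loss = charbonnier_loss(x_hat, x)                # Equation (11)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

train_step(torch.rand(4, 3, 128, 128))
```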

4.2. Experiment and Results

The results in Figure 4 highlight the superior performance of the CBJSCC method across various scenarios. We did not include DeepJSCC in the comparison, as ADJSCC has already surpassed DeepJSCC. We also did not compare with other recent models, as those works focus on other functionalities, not channel adaptivity.
Figure 4 presents a comparative analysis of image reconstruction performance on the Kodak dataset under an AWGN channel and slow Rayleigh fading channels. The panels of Figure 4a,b illustrate the results for transmission rate ratios of 1/6 and 1/12, respectively, highlighting the robustness of the proposed CBJSCC method. The solid red curve denotes the PSNR of CBJSCC across various SNR levels, which consistently exceeds that of ADJSCC.
However, real-world communication is often accompanied by SNR estimation mismatch. The dashed lines in the panels of Figure 4a,b depict the performance of ADJSCC under varying feedback SNR conditions, ranging from the ideal case (perfect feedback) to practical scenarios with estimation errors (fixed feedback SNRs of 0 dB, 5 dB, 10 dB, and 20 dB). These curves reveal a pronounced degradation in the image reconstruction quality of ADJSCC as the feedback SNR estimation error increases, particularly at lower SNR levels. The design of CBJSCC negates the need for explicit SNR encoding, leveraging the neural network’s inherent adaptability to channel variations. This characteristic provides a significant advantage in real-world communications, where SNR estimation mismatch is a common challenge, thereby enhancing the robustness of CBJSCC compared to ADJSCC.
Figure 4c extends the evaluation to slow Rayleigh fading channels. Here, CBJSCC demonstrates exceptional adaptability by modifying the transmission rate, achieving the highest PSNR values across all examined SNR levels compared to other DL-based methods. Notably, CBJSCC with a ratio of 1 / 6 leads the performance. The comparison also includes the performance of traditional image compression algorithms—JPG and BPG—encoded under 16QAM modulation at a transmission rate of 1 / 6 . The performance of CBJSCC in fading channels is on par with or superior to JPG-Capacity at low SNR values. However, its effectiveness decreases marginally when the SNR exceeds 1 dB. BPG, known for its high compression efficiency, is used as a reference point, but a significant performance decline in actual environments restricts its direct comparison. In summary, the empirical results corroborate the enhanced adaptability and robustness of CBJSCC in diverse communication environments, establishing its potential for real-world applications where channel conditions are dynamic and unpredictable.
Figure 5 provides a visual comparison of the two methods. Here, we only consider the case where the ADJSCC model uses perfect SNR feedback. Both CBJSCC and ADJSCC demonstrate impressive overall reconstruction capabilities. By observing the magnified box in Figure 5a, it is evident that CBJSCC can accurately recover the details and structure of the original image, while ADJSCC introduces blurring and distortion. In Figure 5b, the distant branches in the highlighted red box of the image reconstructed by ADJSCC appear blurry, losing many details. In contrast, CBJSCC preserves the clear texture of these branches, making the image more realistic. From these observations, we can infer that CBJSCC provides superior image reconstruction quality compared to ADJSCC, retaining more details in texture-rich areas.

4.3. Exploring Model Capacity

The previous discussion highlighted the CBJSCC model’s strong feature extraction and reconstruction capabilities, which are mainly due to its design. In this section, we further examine the connection between expressive capacity and effective model complexity within the CBJSCC model. Expressive capacity refers to the maximum knowledge a model architecture can encompass, i.e., its ability to approximate complex problems, while effective model complexity measures the usable capacity of a trained model [57]. Investigating model complexity helps us understand the depth of a deep model and explore fundamental questions related to JSCC tasks. The model’s capacity can be adjusted by changing the value of N in the IRAB blocks as shown in Figure 2, while keeping the transmission dimension T constant.
We assessed the performance of the CBJSCC model across various values of N while maintaining a constant transmission rate of 1/6, comparing it with ADJSCC. Table 1 presents our evaluation results, with the best PSNR value in each column shown in bold and the second-best in italics.
As the value of N increases, the CBJSCC model’s performance improves significantly. For example, the PSNR at an SNR of 1 dB increases from 28.53 dB for CBJSCC(64) to 31.90 dB for CBJSCC(192). This trend shows that higher model capacity allows for better reconstruction quality. Compared with ADJSCC, CBJSCC(128) achieves a PSNR of 34.71 dB at an SNR of 7 dB with 3.21 million parameters and 182.05 G MACs, whereas ADJSCC achieves a PSNR of 34.59 dB with 10.76 million parameters and 398.18 G MACs. CBJSCC(128) thus reduces the parameter count by 70.16% and the computational effort by 54.28% while achieving better or comparable PSNR across all SNR levels. WITT has 28.20 million parameters, far more than any CBJSCC model, with a computational cost of 99.00 G MACs. Despite its high parameter count, WITT’s image reconstruction quality is comparable to that of CBJSCC(128) but lower than that of CBJSCC(192) and CBJSCC(256) at high SNR levels. CBJSCC(192) and CBJSCC(256) offer superior performance and resource efficiency, particularly at higher SNR levels, showing better scalability and adaptability.
Our results show that increasing model capacity enhances performance without a proportional increase in computation and storage requirements. The adaptability of CBJSCC to channel variations without considering SNR underscores its potential for scalable applications in varying communication environments. CBJSCC models, particularly with higher N values, demonstrate that superior performance can be achieved with fewer resources, making them ideal for practical applications, where computational and storage efficiency are critical. Overall, the CBJSCC models exhibit a significant advantage in both performance and efficiency, affirming their potential for practical deployment in JSCC tasks.

4.4. Impact of Channel SNR Information

In previous experiments, we demonstrated that our proposed CBJSCC model performs better in an additive white Gaussian noise (AWGN) channel, even without SNR information. This suggests that neural networks can adapt to channel characteristics. To further analyze the impact of SNR information on the model, we compared several different schemes.
We introduced the Attention Feature (AF) module from ADJSCC, incorporating channel information into the encoder and decoder components as shown in Figure 6a. The AF module acts as a channel attention mechanism, integrating channel SNR information into the network. Figure 6b shows how we inserted the AF module into the framework. We discuss three scenarios next. In the first scenario, both the encoder and decoder can obtain the current SNR. In the second scenario, only the decoder can obtain the current SNR, with no channel feedback. In the third scenario, neither the encoder nor the decoder can obtain the current channel SNR, which is the situation in this study.
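For reference, the AF module borrowed from ADJSCC can be sketched as below: it pools the feature map globally, concatenates the SNR value, and predicts per-channel scaling factors. The layer widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AFModule(nn.Module):
    """Attention Feature module: rescales feature channels using the channel SNR as side information."""
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels + 1, channels), nn.ReLU(),
            nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x, snr_db):                    # x: (B, C, H, W), snr_db: (B, 1)
        context = x.mean(dim=(2, 3))                 # global average pooling over space
        scale = self.fc(torch.cat([context, snr_db], dim=1))
        return x * scale.unsqueeze(-1).unsqueeze(-1) # per-channel reweighting

x = torch.randn(2, 192, 16, 16)
y = AFModule(192)(x, torch.full((2, 1), 10.0))       # e.g., a feedback SNR of 10 dB
```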
According to the results in Figure 7, different model approaches show some variation in performance at low SNR with channel feedback. However, through repeated experimentation, we found that this variation may be due to the inherent unpredictability of deep learning models. Overall, in the AWGN channel, no method was significantly superior to others. Therefore, our research findings indicate that introducing SNR estimation and feedback does not significantly improve the performance of deep learning-based JSCC.
This study also suggests that learning-based JSCC methods can implicitly learn and adapt to various channels. They can automatically adjust transmission rates based on changes in channel quality. It is important to note that, as deep learning models are black boxes, we cannot directly observe their internal mechanisms. These findings may provide insights for exploring new joint encoding and decoding methods and emphasize that the design of channel feedback mechanisms needs to align with specific scenarios.

4.5. Adaptation Ability of Different SNR Ranges

The previous results indicate that the model can adapt to different SNR levels and outperforms ADJSCC. We now explore how the training SNR range affects this adaptability. In previous experiments, we trained the model with the SNR uniformly distributed between −5 dB and 20 dB. For comparison, we introduced three additional training strategies. The first involved training individual models, each using a single SNR value from the set {1, 4, 7, 10, 13} dB. The second used discrete SNR values randomly selected from the set {1, 4, 7, 10, 13} dB during training. The third used continuous uniform sampling within the range [1, 13] dB to determine the training SNR.
Figure 8 shows that the performance curves for the discrete and continuous SNR strategies largely overlap, indicating no substantial difference within the examined range. Restricting the training SNR to [1, 13] dB improves performance relative to the broader [−5, 20] dB range, which is understandable: the model can more easily reconstruct the original image when the SNR varies within a smaller, more consistent range. This advantage, however, comes at the cost of reduced image reconstruction quality for SNRs below 1 dB.
In Table 2, we compare models trained on individual SNR values with a model trained on SNRs randomly sampled from the same set. As expected, a model achieves the best image reconstruction quality when the training and testing SNRs are equal. When the SNR exceeds 10 dB, the second-best model is not the one trained with the discrete random sampling strategy, which may suggest that JSCC models can benefit from training at high SNRs. We also fine-tuned the model trained within [1, 13] dB under the [−5, 20] dB range, yielding the fine-tuned model shown in Figure 8. Within the 1 to 13 dB range, the fine-tuned model surpasses the model trained directly on [−5, 20] dB.

4.6. Domain Adaptation Ability

Our objective is to create a model that generalizes effectively across diverse domains. In deep learning, a model trained on one domain’s dataset often performs poorly when evaluated on a different domain’s dataset, which typically calls for domain adaptation or transfer. In this study, we assess the CBJSCC model’s performance across different data domains, a critical factor in determining its robustness and adaptability.
The CBJSCC model was initially trained using the ImageNet dataset and subsequently tested on a selection of diverse test datasets as illustrated in Figure 9. These datasets cover a wide range of scenarios. The iSAID dataset [55] consists of high-resolution aerial images used for instance segmentation. The People-Art dataset [54] is employed for object detection and contains 43 different styles of people, significantly differing from standard photographs. The Labeled Faces in the Wild (LFW) dataset [52] comprises over 13,000 face images downloaded from the internet, exhibiting substantial variations in pose, lighting, and facial expression, making it a prominent dataset for face recognition in computer vision. The Danbooru2021 dataset [53] is a vast collection of anime-style images, encompassing 4.9 million illustrations. The High-Resolution SAR Images Dataset (HRSID) [59] is utilized for ship detection and segmentation in high-resolution Synthetic Aperture Radar (SAR) images, which include SAR images with different resolutions, polarizations, and ports. The Unsplash Dataset [60] is an extensive collection of high-quality images from professional photographers worldwide, covering a broad range of camera brands, models, lenses, and focal lengths. The Chest X-ray8 dataset [61] is a medical dataset comprising a large number of X-ray images.
The results show that CBJSCC consistently performs well across different domains. Remarkably, on the CT Medical dataset, which represents an exceedingly rare image type, the model attained a reconstruction performance of 50 dB PSNR. This can be attributed to the smoother color and texture variations in CT images and their fewer high-frequency details. Similarly, the CBJSCC model performed exceptionally well on the iSAID dataset, which contains large water areas with minor frequency changes. The Danbooru2021 and People-Art datasets, which comprise art and cartoon images and typically exhibit strong artistic styles and visual effects, such as bright colors, unique compositions, and smooth lines, showed slightly lower reconstruction performance. Nevertheless, the model still achieved a reconstruction performance of over 30 dB PSNR at an SNR of 0 dB on these datasets.
The experimental results demonstrate that CBJSCC maintains excellent reconstruction capability across diverse datasets, including some rare data types. This suggests that similar DL-based JSCC methods possess robustness in wireless image transmission tasks.

5. Conclusions

In this study, we introduce the Channel-Blind Joint Source–Channel Coding (CBJSCC) method, a novel technique for wireless image transmission that leverages the adaptive capabilities of deep learning to accommodate various channel conditions without relying on explicit SNR knowledge.
The CBJSCC technique employs a carefully designed encoder–decoder structure enhanced with attention mechanisms, outperforming existing deep learning-based joint source–channel coding strategies without the need for channel feedback. The effectiveness of CBJSCC highlights the potential of deep learning-based JSCC methods to autonomously identify and adapt to the intrinsic characteristics of communication channels, thereby mitigating the adverse effects of channel noise and associated errors. Experimental results have confirmed the scalability and resilience of the CBJSCC method across multiple domain datasets, emphasizing its versatility and wide applicability. The contributions of this paper are multifaceted: (1) the introduction of an innovative deep learning-based JSCC technique that does not depend on specific SNR information, (2) demonstration of the technique’s superiority in image transmission performance, and (3) validation of its generalizability across various communication scenarios.
Future research directions include integrating CBJSCC with advanced signal processing techniques to enhance its performance in challenging channel scenarios, and investigating the adaptability of CBJSCC to dynamic channel environments, which requires continuous learning and channel model updates. Another direction is expanding the applicability of CBJSCC to other media types, including video and audio, thereby broadening its use in communication fields such as broadcasting, deep-space communication, and undersea communication. We expect this research not only to enrich the JSCC field but also to inspire future efforts to overcome the limitations of traditional communication methods, thereby promoting the development of more efficient and reliable wireless multimedia transmission systems.

Author Contributions

Conceptualization, H.Y. and W.X.; methodology, X.W.; software, H.Y.; validation, X.W., Y.W. and H.Y.; formal analysis, H.Y. and W.X.; investigation, X.W.; resources, W.X.; data curation, H.Y.; writing—original draft preparation, H.Y.; writing—review and editing, H.Y. and Y.W.; visualization, H.Y.; supervision, W.X.; project administration, W.X.; funding acquisition, W.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by “the Fundamental Research Funds for the Central Universities”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gastpar, M.; Rimoldi, B.; Vetterli, M. To code, or not to code: Lossy source-channel communication revisited. IEEE Trans. Inf. Theory 2003, 49, 1147–1158. [Google Scholar] [CrossRef]
  2. Kostina, V.; Polyanskiy, Y.; Verdú, S. Joint Source-Channel Coding With Feedback. IEEE Trans. Inf. Theory 2017, 63, 3502–3515. [Google Scholar] [CrossRef]
  3. Skoglund, M.; Phamdo, N.; Alajaji, F. Design and performance of VQ-based hybrid digital-analog joint source-channel codes. IEEE Trans. Inf. Theory 2002, 48, 708–720. [Google Scholar] [CrossRef]
  4. Mittal, U.; Phamdo, N. Hybrid digital-analog (HDA) joint source-channel codes for broadcasting and robust communications. IEEE Trans. Inf. Theory 2002, 48, 1082–1102. [Google Scholar] [CrossRef]
  5. Chatellier, C.; Boeglen, H.; Perrine, C.; Olivier, C.; Haeberlé, O. A robust joint source channel coding scheme for image transmission over the ionospheric channel. Signal Process. Image Commun. 2007, 22, 543–556. [Google Scholar] [CrossRef]
  6. Kozintsev, I.; Ramchandran, K. Robust image transmission over energy-constrained time-varying channels using multiresolution joint source-channel coding. IEEE Trans. Signal Process. 1998, 46, 1012–1026. [Google Scholar] [CrossRef]
  7. Cai, J.; Chen, C.W. Robust joint source-channel coding for image transmission over wireless channels. IEEE Trans. Circuits Syst. Video Technol. 2000, 10, 962–966. [Google Scholar] [CrossRef]
  8. Yang, Y.; Mandt, S.; Theis, L. An Introduction to Neural Data Compression. arXiv 2022, arXiv:2202.06533. [Google Scholar] [CrossRef]
  9. Xu, J.; Ai, B.; Chen, W.; Yang, A.; Sun, P.; Rodrigues, M. Wireless Image Transmission Using Deep Source Channel Coding With Attention Modules. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 2315–2328. [Google Scholar] [CrossRef]
  10. Yang, M.; Bian, C.; Kim, H.S. Deep Joint Source Channel Coding for Wireless Image Transmission with OFDM. In Proceedings of the ICC 2021—IEEE International Conference on Communications, Montreal, QC, Canada, 14–18 June 2021; pp. 1–6. [Google Scholar] [CrossRef]
  11. Wang, S.; Dai, J.; Yao, S.; Niu, K.; Zhang, P. A Novel Deep Learning Architecture for Wireless Image Transmission. In Proceedings of the 2021 IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 7–11 December 2021; pp. 1–6. [Google Scholar] [CrossRef]
  12. Burth Kurka, D.; Gündüz, D. Joint source-channel coding of images with (not very) deep learning. In Proceedings of the International Zurich Seminar on Information and Communication (IZS 2020), Zurich, Switzerland, 26–28 February 2020; ETH: Zurich, Switzerland, 2020; pp. 90–94. [Google Scholar]
  13. Bourtsoulatze, E.; Burth Kurka, D.; Gündüz, D. Deep Joint Source-Channel Coding for Wireless Image Transmission. IEEE Trans. Cogn. Commun. Netw. 2019, 5, 567–579. [Google Scholar] [CrossRef]
  14. Kurka, D.B.; Gündüz, D. DeepJSCC-f: Deep Joint Source-Channel Coding of Images With Feedback. IEEE J. Sel. Areas Inf. Theory 2020, 1, 178–193. [Google Scholar] [CrossRef]
  15. Sun, L.; Guo, C.; Yang, Y. Joint Source-Channel Coding for Efficient Image Transmission: An Information Bottleneck Based Scheme. In Proceedings of the 2022 IEEE Globecom Workshops (GC Wkshps), Rio de Janeiro, Brazil, 4–8 December 2022; pp. 1852–1857. [Google Scholar] [CrossRef]
  16. Ding, M.; Li, J.; Ma, M.; Fan, X. SNR-Adaptive Deep Joint Source-Channel Coding for Wireless Image Transmission. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 1555–1559. [Google Scholar] [CrossRef]
  17. Zhao, L.; Bai, H.; Wang, A.; Zhao, Y. Multiple Description Convolutional Neural Networks for Image Compression. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 2494–2508. [Google Scholar] [CrossRef]
  18. Liu, X.; Huang, Z.; Zhang, Y.; Jia, Y.; Wen, W. CNN and Attention-Based Joint Source Channel Coding for Semantic Communications in WSNs. Sensors 2024, 24, 957. [Google Scholar] [CrossRef] [PubMed]
  19. Schalkwijk, J.; Kailath, T. A coding scheme for additive noise channels with feedback–I: No bandwidth constraint. IEEE Trans. Inf. Theory 1966, 12, 172–182. [Google Scholar] [CrossRef]
  20. Yang, K.; Wang, S.; Tan, K.; Dai, J.; Zhou, D.; Niu, K. Resolution-Adaptive Source-Channel Coding for End-to-End Wireless Image Transmission. In Proceedings of the GLOBECOM 2022—2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022; IEEE: New York, NY, USA, 2022; pp. 1460–1465. [Google Scholar]
  21. Yang, M.; Kim, H.S. Deep Joint Source-Channel Coding for Wireless Image Transmission with Adaptive Rate Control. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22–27 May 2022; pp. 5193–5197. [Google Scholar] [CrossRef]
  22. Yang, K.; Wang, S.; Dai, J.; Qin, X.; Niu, K.; Zhang, P. SwinJSCC: Taming Swin Transformer for Deep Joint Source-Channel Coding. arXiv 2023, arXiv:2308.09361. [Google Scholar]
  23. Sun, L.; Yang, Y.; Chen, M.; Guo, C.; Saad, W.; Poor, H.V. Adaptive information bottleneck guided joint source and channel coding for image transmission. arXiv 2023, arXiv:2203.06492. [Google Scholar] [CrossRef]
  24. Zhang, W.; Zhang, H.; Ma, H.; Shao, H.; Wang, N.; Leung, V.C. Predictive and adaptive deep coding for wireless image transmission in semantic communication. IEEE Trans. Wirel. Commun. 2023, 22, 5486–5501. [Google Scholar] [CrossRef]
  25. Wu, H.; Shao, Y.; Mikolajczyk, K.; Gündüz, D. Channel-Adaptive Wireless Image Transmission With OFDM. IEEE Wirel. Commun. Lett. 2022, 11, 2400–2404. [Google Scholar] [CrossRef]
  26. Kim, H.; Jiang, Y.; Kannan, S.; Oh, S.; Viswanath, P. Deepcode: Feedback Codes via Deep Learning. IEEE J. Sel. Areas Inf. Theory 2020, 1, 194–206. [Google Scholar] [CrossRef]
  27. Yin, H.P.; Zhao, X.H.; Yao, J.L.; Ren, H.P. Deep-Learning-based Channel Estimation for Chaotic Wireless Communication. IEEE Wirel. Commun. Lett. 2023, 13, 143–147. [Google Scholar] [CrossRef]
  28. Rumelhart, D.; Hinton, G.; Williams, R. Learning Internal Representations by Error Propagation. In Readings in Cognitive Science; Elsevier: New York, NY, USA, 1988; pp. 399–421. [Google Scholar] [CrossRef]
  29. Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [PubMed]
  30. Baldi, P. Autoencoders, Unsupervised Learning, and Deep Architectures. In Proceedings of the ICML Workshop on Unsupervised and Transfer Learning, JMLR Workshop and Conference Proceedings, Bellevue, WA, USA, 2 July 2011; pp. 37–49. [Google Scholar]
  31. Ballé, J.; Laparra, V.; Simoncelli, E.P. End-to-end Optimized Image Compression. arXiv 2017, arXiv:1611.01704. [Google Scholar] [CrossRef]
  32. Liu, D.; Chen, Z.; Liu, S.; Wu, F. Deep Learning-Based Technology in Responses to the Joint Call for Proposals on Video Compression With Capability Beyond HEVC. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 1267–1280. [Google Scholar] [CrossRef]
  33. Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (ICML ’08), New York, NY, USA, 26–27 July 2008; pp. 1096–1103. [Google Scholar] [CrossRef]
  34. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  35. Park, J.; Woo, S.; Lee, J.Y.; Kweon, I.S. BAM: Bottleneck Attention Module. arXiv 2018, arXiv:1807.06514. [Google Scholar] [CrossRef]
  36. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Vedaldi, A. Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks. arXiv 2019, arXiv:1810.12348. [Google Scholar] [CrossRef]
  37. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521. [Google Scholar]
  38. Kim, J.H.; Choi, J.H.; Cheon, M.; Lee, J.S. MAMNet: Multi-path Adaptive Modulation Network for Image Super-Resolution. arXiv 2018, arXiv:1811.12043. [Google Scholar] [CrossRef]
  39. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  40. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929. [Google Scholar]
  41. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local Neural Networks. arXiv 2018, arXiv:1711.07971. [Google Scholar] [CrossRef]
  42. Hao, Y.; Wang, S.; Cao, P.; Gao, X.; Xu, T.; Wu, J.; He, X. Attention in attention: Modeling context correlation for efficient video classification. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 7120–7132. [Google Scholar] [CrossRef]
  43. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  44. Bello, I.; Zoph, B.; Le, Q.; Vaswani, A.; Shlens, J. Attention Augmented Convolutional Networks. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3285–3294. [Google Scholar] [CrossRef]
  45. Srinivas, A.; Lin, T.Y.; Parmar, N.; Shlens, J.; Abbeel, P.; Vaswani, A. Bottleneck transformers for visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 16519–16529. [Google Scholar]
  46. Pan, X.; Ge, C.; Lu, R.; Song, S.; Chen, G.; Huang, Z.; Huang, G. On the Integration of Self-Attention and Convolution. arXiv 2022, arXiv:2111.14556. [Google Scholar] [CrossRef]
  47. Goldblum, M.; Souri, H.; Ni, R.; Shu, M.; Prabhu, V.; Somepalli, G.; Chattopadhyay, P.; Ibrahim, M.; Bardes, A.; Hoffman, J.; et al. Battle of the backbones: A large-scale comparison of pretrained models across computer vision tasks. Adv. Neural Inf. Process. Syst. 2024, 36. [Google Scholar] [CrossRef]
  48. Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. arXiv 2022, arXiv:2201.03545. [Google Scholar] [CrossRef]
  49. Wang, F.; Jiang, M.; Qian, C.; Yang, S.; Li, C.; Zhang, H.; Wang, X.; Tang, X. Residual attention network for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3156–3164. [Google Scholar]
  50. Liu, J.; Zhang, W.; Tang, Y.; Tang, J.; Wu, G. Residual Feature Aggregation Network for Image Super-Resolution. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 2356–2365. [Google Scholar] [CrossRef]
  51. Ballé, J.; Laparra, V.; Simoncelli, E.P. Density Modeling of Images using a Generalized Normalization Transformation. arXiv 2016, arXiv:1511.06281. [Google Scholar] [CrossRef]
  52. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
  53. Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced Deep Residual Networks for Single Image Super-Resolution. arXiv 2017, arXiv:1707.02921. [Google Scholar] [CrossRef]
  54. Charbonnier, P.; Blanc-Feraud, L.; Aubert, G.; Barlaud, M. Two deterministic half-quadratic regularization algorithms for computed imaging. In Proceedings of the 1st International Conference on Image Processing, Austin, TX, USA, 13–16 November 1994; Volume 2, pp. 168–172. [Google Scholar] [CrossRef]
  55. Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5835–5843. [Google Scholar] [CrossRef]
  56. Chen, X.; Liang, C.; Huang, D.; Real, E.; Wang, K.; Liu, Y.; Pham, H.; Dong, X.; Luong, T.; Hsieh, C.J.; et al. Symbolic Discovery of Optimization Algorithms. arXiv 2023, arXiv:2302.06675. [Google Scholar] [CrossRef]
  57. Hu, X.; Chu, L.; Pei, J.; Liu, W.; Bian, J. Model complexity of deep learning: A survey. Knowl. Inf. Syst. 2021, 63, 2585–2619. [Google Scholar] [CrossRef]
  58. Yang, K.; Wang, S.; Dai, J.; Tan, K.; Niu, K.; Zhang, P. WITT: A wireless image transmission transformer for semantic communications. In Proceedings of the ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; IEEE: New York, NY, USA, 2023; pp. 1–5. [Google Scholar]
  59. Wei, S.; Zeng, X.; Qu, Q.; Wang, M.; Su, H.; Shi, J. HRSID: A High-Resolution SAR Images Dataset for Ship Detection and Instance Segmentation. IEEE Access 2020, 8, 120234–120254. [Google Scholar] [CrossRef]
  60. Unsplash Dataset|The World’s Largest Open Library Dataset. 2023. Available online: https://unsplash.com/data (accessed on 29 May 2023).
  61. Wang, X.; Peng, Y.; Lu, L.; Lu, Z.; Bagheri, M.; Summers, R.M. Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2097–2106. [Google Scholar]
Figure 1. Illustration of the DL-based JSCC.
Figure 2. (a) Architecture of the proposed CBJSCC method. The blue dashed box and the green dashed box represent the encoder and the decoder, respectively. The numbers under each IRAB block indicate the expansion factor. N and T represent the dimension of the feature map. (b) In real-world deployment of the CBJSCC model, the encoded signal can be transmitted directly to the client through various channels at different ratios.
Figure 3. (a) The IRAB module is characterized by the convolution kernel size, denoted by the numbers in the box. In IRAB, the variable X signifies the multiple of the current convolution kernel size in relation to the input. (b) The ESA block. (c) The ACMix block.
Figure 4. Image reconstruction performance comparison on the Kodak dataset under an AWGN channel. (a,b) display results for transmission rate ratios of 1 / 6 and 1 / 12 , respectively. The solid red curve represents the performance of the proposed CBJSCC method. The dashed lines represent the performance of ADJSCC under different SNR feedback conditions, including perfect feedback and feedback SNRs of 0 dB, 5 dB, 10 dB, and 20 dB. (c) extends the evaluation to slow Rayleigh fading channels, comparing CBJSCC with alternative methods, indicated within parentheses, at different transmission rates.
Figure 5. (a,b) demonstrate the image reconstruction performance of CBJSCC and ADJSCC on the Kodak dataset under an AWGN channel with a 1 dB SNR at different transmission rates: (a) 1 / 6 and (b) 1 / 12 . (c) shows the image reconstruction under a slow Rayleigh fading channel with a transmission rate of 1 / 6 .
Figure 6. (a) Architecture of the AF module in ADJSCC. (b) Modified CBJSCC with the AF module. The red dashed line represents the input of SNR information. The yellow block represents the inserted AF module.
Figure 7. PSNR curves for reconstructed images under three different scenarios.
Figure 8. Performance comparison with different training SNR ranges. The red solid line represents the model trained on a wide SNR range from [−5, 20] dB. The pink dashed line indicates the same range but with fine-tuning. The light blue dotted line shows the model trained on discrete SNR values {1, 4, 7, 10, 13} dB. The blue dash–dot line represents the model trained on the continuous range [1, 13] dB.
Figure 9. A clustered column chart showing the performance of CBJSCC across multiple dataset domains. The color of the line indicates the channel SNR.
Table 1. Effective model complexity (EMC) analysis across different model capacities. The last five columns report PSNR (dB) at the indicated test SNRs; the best value in each column is shown in bold and the second-best in italics.
| Model (N dim) | Params (M) | MACs (G) | SNR 1 dB | SNR 4 dB | SNR 7 dB | SNR 13 dB | SNR 19 dB |
|---|---|---|---|---|---|---|---|
| CBJSCC(64) | 0.87 | 48.10 | 28.53 | 29.36 | 29.90 | 30.04 | 30.54 |
| CBJSCC(96) | 1.85 | 104.26 | 30.55 | 32.39 | 34.06 | 36.55 | 37.72 |
| CBJSCC(128) | 3.21 | 182.05 | 30.95 | 32.88 | 34.71 | 37.85 | 39.72 |
| CBJSCC(160) | 4.96 | 281.49 | 31.30 | 33.14 | 34.90 | 37.96 | 39.76 |
| CBJSCC(192) | 7.09 | 402.57 | **31.90** | **33.71** | *35.41* | *38.58* | *40.81* |
| CBJSCC(256) | 12.49 | 709.66 | *31.81* | *33.66* | **35.45** | **38.70** | **40.97** |
| ADJSCC [9] | 10.76 | 398.18 | 30.53 | 32.60 | 34.59 | 38.00 | 40.02 |
| WITT [58] | 28.20 | 99.00 | 31.01 | 32.29 | 34.36 | 36.54 | 37.62 |
Table 2. Reconstructed image PSNR (dB) for different training SNR configurations. Each column corresponds to a training SNR (or training SNR set) and each row to the test channel SNR.
| Channel SNR | {1, 4, 7, 10, 13} dB | 1 dB | 4 dB | 7 dB | 10 dB | 13 dB |
|---|---|---|---|---|---|---|
| 1 dB | 31.86 | 32.08 | 30.31 | 26.29 | 23.77 | 22.27 |
| 4 dB | 34.24 | 33.37 | 34.28 | 33.27 | 30.88 | 28.70 |
| 7 dB | 36.20 | 33.85 | 35.61 | 36.46 | 36.07 | 34.66 |
| 10 dB | 37.97 | 34.09 | 36.25 | 37.92 | 38.53 | 38.22 |
| 13 dB | 39.49 | 34.20 | 36.57 | 38.73 | 39.99 | 40.33 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yuan, H.; Xu, W.; Wang, Y.; Wang, X. Channel-Blind Joint Source–Channel Coding for Wireless Image Transmission. Sensors 2024, 24, 4005. https://doi.org/10.3390/s24124005

