Sensors
  • Article
  • Open Access

6 August 2024

Lightweight Single Image Super-Resolution via Efficient Mixture of Transformers and Convolutional Networks

College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
* Author to whom correspondence should be addressed.

Abstract

In this paper, we propose the Local Global Union Network (LGUN), which effectively combines the strengths of Transformers and Convolutional Networks to build a lightweight, high-performance network for Single Image Super-Resolution (SISR). Specifically, we exploit the advantages of Transformers, namely input-adaptive weighting and global context interaction, together with those of Convolutional Networks, namely spatial inductive biases and local connectivity. In the shallow layers, local spatial information is encoded by Multi-order Local Hierarchical Aggregation (MLHA). In the deeper layers, we utilize Dynamic Global Sparse Attention (DGSA), built on a Multi-stage Token Selection (MTS) strategy, to model global context dependencies. Moreover, extensive experiments on both natural and satellite datasets, acquired through optical and satellite sensors, respectively, demonstrate that LGUN outperforms existing methods.

1. Introduction

Single Image Super-Resolution (SISR) is a prominent research field in computer vision that focuses on enhancing the visual details and overall appearance of low-resolution (LR) images by generating high-resolution (HR) versions. It has diverse applications across domains such as surveillance [1,2,3,4], medical imaging [5,6], satellite imagery [7,8], and monitoring [9,10]. Recent advancements in SISR techniques have leveraged advanced algorithms and deep learning models to effectively recover missing high-frequency details and textures from LR inputs, enabling significant improvements in resolution and visual quality.
Convolutional Networks are widely adopted for various visual tasks, including SISR [11,12]. The inherent properties of convolutional operations, such as the ability to aggregate information from adjacent pixels or regions, e.g., 3 × 3 windows, make them effective at capturing spatially local patterns. These properties, including translation invariance, local connectivity, and the sliding-window strategy, provide valuable inductive biases. However, Convolutional Networks suffer from two main limitations. Firstly, they have a local receptive field, restricting their ability to model global context. Secondly, the interaction between spatial locations is fixed through a static convolutional kernel during inference, limiting their flexibility to adapt to varying input content. Transformers, on the other hand, offer a solution to address these limitations. By introducing self-attention (SA) in Vision Transformers (ViTs), global interactions can be explicitly modeled, and the importance of each token can be dynamically adjusted through attention scores computed between all pairs of tokens during inference. However, the computational complexity of Transformers, which grows quadratically with the token length $N$ (or spatial resolution $H \times W$), poses challenges for real-world applications on resource-constrained hardware. This leads to the following natural question: How can we effectively combine the strengths of Convolutional Networks and ViTs to develop a lightweight and high-performance network suitable for resource-constrained devices?
In this work, we address the aforementioned question by focusing on the design of a lightweight and high-performance network for SISR tasks. Figure 1 compares the performance of our method with that of others. Our proposed approach, named LGUN, leverages the advantages of Convolutional Networks, such as spatial inductive biases and local connectivity, as well as Transformers, which offer input-adaptive weighting and global context interaction. Our core concept is illustrated in Figure 2. Compared to uni-dimensional information communication, e.g., spatial-only communication such as EIMN [13] or channel-only communication such as Restormer [14], our method can achieve local spatial-wise aggregation and global channel-wise interaction simultaneously, both of which are crucial for SISR tasks. In Convolutional Networks, the shallow layers employ convolutional filters with smaller receptive fields, capturing local patterns and features like edges, corners, and textures. These low-level features are extracted in the initial layers, providing local information about the input data. By stacking multiple building blocks, Convolutional Networks gradually enlarge their receptive fields, enabling the capture of large-range spatial context information. Based on this prior knowledge, as shown in Figure 3, we divide the core module, named Local Global Union (LGU), into two stages: Multi-order Local Hierarchical Aggregation (MLHA) and Dynamic Global Sparse Attention (DGSA). In the shallow layers, we employ MLHA to encode local spatial information efficiently. This approach feeds each sub-branch with only a subset of the entire feature, facilitating the explicit learning of distinct feature patterns through the Split–Transform–Fusion (STF) strategy. In the deep layers, we introduce DGSA to model long-range non-local dependencies while obtaining an effective receptive field of $H \times W$.
DGSA operates across the feature dimension, utilizing interactions based on the cross-covariance matrix between keys and queries. Considering the potential negative impact of irrelevant or confusing information in the attention matrix, which other methods [14] fail to consider, we incorporate the Multi-stage Token Selection (MTS) strategy into DGSA, which selects multiple top-k sparse attention matrices and masks out insignificant elements that are allocated lower weights. This reduces redundancy in attention maps and suppresses interference from cluttered backgrounds. The proposed design is robust to changes in the input token length and decreases the computational complexity to $O(NC^2)$, where $C \ll N$.
Figure 1. Trade-off between performance and model complexity on the Set5 ×4 dataset. Multi-Adds are calculated on 1280 × 720 HR images.
Figure 2. Compared to uni-dimensional information communication, e.g., spatial-only or channel-only, our method can achieve local spatial-wise aggregation and global channel-wise interaction simultaneously, both of which are crucial for SISR tasks.
Figure 3. The architecture of our proposed method, LGUN, consists of three main parts: feature extraction, nonlinear mapping, and image reconstruction. The core module, named LGU, includes two stages: MLHA and DGSA. In the shallow layers, MLHA efficiently encodes local spatial information by utilizing subsets of the entire feature, enabling explicit learning of distinct feature patterns through the STF strategy. In the deep layers, DGSA is employed to model long-range non-local dependencies while achieving a global effective receptive field. DGSA operates across the feature dimension and leverages interactions based on the cross-covariance matrix between keys and queries. Moreover, we incorporate the MTS strategy into DGSA, which selects multiple top-k similar attention matrices and masks out elements with lower weights. This reduces redundancy in attention maps and suppresses interference from cluttered backgrounds. LGUN exhibits robustness to changes in the input token length and significantly reduces the computational complexity to $O(NC^2)$, where $C \ll N$.
Our contributions can be summarized as follows:
(1)
We propose LGUN, a hybrid structure designed for resource-constrained devices. It combines the strengths of Convolutional Networks and ViTs, enabling effective local processing and global interaction throughout the network via the proposed LGU.
(2)
In the shallow layer, we employ MLHA to focus on encoding local spatial information. By using the STF strategy, MLHA promotes the learning of different patterns while also saving computational resources. In the deep layer, we utilize DGSA based on the MTS strategy to model global context dependencies. This enhances the network’s ability to model complex image patterns with high adaptability and representational power.
(3)
Experimental results on popular benchmark datasets demonstrate the superiority of our method compared to other recently advanced Transformer-based approaches. Our method achieves better results in both quantitative and qualitative evaluations, providing evidence for the effectiveness of the MLHA-with-STF strategy and the DGSA-with-MTS strategy.

3. Methods

3.1. Overall Architecture

The proposed network architecture consists of three primary components: (1) feature extraction $\mathrm{FE}(\cdot)$, (2) nonlinear mapping $\mathrm{NLM}(\cdot)$, and (3) reconstruction $\mathrm{REC}(\cdot)$. The input and output of the model are denoted as $I_{LR} \in \mathbb{R}^{H \times W \times 3}$ and $I_{SR} \in \mathbb{R}^{H \times W \times 3}$, respectively. In the initial stage, $I_{LR}$ undergoes an overlapped image patch embedding process, in which a 3 × 3 convolution layer is applied at the beginning of the network, yielding the feature maps $F_{embed} \in \mathbb{R}^{H \times W \times C}$. Subsequently, $F_{embed}$ passes through $N$ stacked blocks to facilitate the learning of local and global relationships. The final reconstructed result is obtained as follows: $I_{SR} = \mathrm{REC}(\mathrm{NLM}(F_{embed}) + F_{embed})$.
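As a minimal sketch of the three-part pipeline above (feature extraction via a 3 × 3 patch-embedding convolution, nonlinear mapping through stacked blocks with the global residual, and reconstruction by upsampling), the toy model below uses plain convolutions as stand-ins for the LGU blocks and pixel shuffle for reconstruction; module names and the upsampler choice are our illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class ToySISRNet(nn.Module):
    def __init__(self, channels=64, num_blocks=4, scale=2):
        super().__init__()
        # FE: overlapped patch embedding via a 3x3 convolution
        self.embed = nn.Conv2d(3, channels, 3, padding=1)
        # NLM: N stacked blocks (simple conv blocks stand in for LGU)
        self.body = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.GELU())
            for _ in range(num_blocks)
        ])
        # REC: reconstruct HR output with pixel shuffle (an assumed upsampler)
        self.rec = nn.Sequential(
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, x):
        f = self.embed(x)
        # global residual: REC(NLM(F_embed) + F_embed)
        return self.rec(self.body(f) + f)

lr = torch.randn(1, 3, 24, 24)
sr = ToySISRNet(scale=2)(lr)
print(tuple(sr.shape))  # (1, 3, 48, 48)
```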

3.2. LGU

The core modules of LGU, as depicted in Figure 3, include Multi-order Local Hierarchical Aggregation (MLHA) and Dynamic Global Sparse Attention (DGSA). The MLHA module efficiently encodes local spatial information by feeding each sub-branch with a subset of the entire feature, facilitating the explicit learning of distinct feature patterns. On the other hand, the DGSA module aims to model long-range non-local dependencies by leveraging interactions across feature dimensions, resulting in an effective global receptive field. This design ensures robustness to changes in the input token length while reducing computational complexity to $O(NC^2)$, where $C \ll N$. More specific details are provided below:
Shallow layer:
$$X = X + \mathrm{MLHA}(\mathrm{Norm}(X)), \qquad X = X + \mathrm{FFN}(\mathrm{Norm}(X))$$
Deep layer:
$$Z = Z + \mathrm{DGSA}(\mathrm{Norm}(Z)), \qquad Z = Z + \mathrm{FFN}(\mathrm{Norm}(Z))$$
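The shallow- and deep-layer update rules share the same pre-norm residual form: a token mixer (MLHA or DGSA) followed by an FFN, each wrapped in a normalization and a skip connection. A minimal sketch, with `mixer` and `ffn` as placeholder submodules:

```python
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Pre-norm residual block: x = x + mixer(Norm(x)); x = x + ffn(Norm(x))."""
    def __init__(self, dim, mixer, ffn):
        super().__init__()
        self.n1, self.mixer = nn.LayerNorm(dim), mixer
        self.n2, self.ffn = nn.LayerNorm(dim), ffn

    def forward(self, x):  # x: (B, N, dim) token sequence
        x = x + self.mixer(self.n1(x))  # MLHA in shallow layers, DGSA in deep layers
        x = x + self.ffn(self.n2(x))
        return x

# simple linear layers stand in for the real MLHA/DGSA and FFN
blk = PreNormBlock(8, nn.Linear(8, 8), nn.Linear(8, 8))
out = blk(torch.randn(2, 5, 8))
print(tuple(out.shape))  # (2, 5, 8)
```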

3.3. Multi-Order Local Hierarchical Aggregation (MLHA)

In the shallow layer of our method, we employ MLHA to focus on encoding local spatial information. By using the Split–Transform–Fusion (STF) strategy, MLHA promotes the learning of different patterns while also saving computational resources.
Given the input feature $X \in \mathbb{R}^{H \times W \times C}$, it passes through three consecutive units: Linear → MLHA → Linear. The specific details of MLHA are as follows:
Firstly, split. The input feature $F_{in} \in \mathbb{R}^{H \times W \times C}$ is divided into $s$ subparts denoted by $x_i$, where $i \in \{1, 2, \ldots, s\}$. Each subpart has the same spatial size of $H \times W$ and $\frac{C}{s}$ channels.
Secondly, transform. Each subpart feature $x_i$ is individually processed by a large kernel convolutional sequence (LKCS), denoted as $\mathrm{LKCS}_i(\cdot)$, which performs self-adaptive recalibration of the subpart features. Each $\mathrm{LKCS}_i(\cdot)$ has a similar structure: a depth-wise convolution (DW-Conv, $k_1 \times k_1$), a depth-wise dilated convolution (DW-D-Conv, $k_2 \times k_2$), and a standard convolution (Conv, $k_3 \times k_3$).
Finally, fusion. MLHA integrates the multiple re-weighting $\mathrm{LKCS}_i(\cdot)$ processes, enabling the modeling of spatial pixel relationships and the interaction of multi-order context information for input-content self-adaptation. Specifically, each subpart feature $x_i$ ($i > 1$) is added to the output of $\mathrm{LKCS}_{i-1}(\cdot)$ and then passed to the next branch $\mathrm{LKCS}_i(\cdot)$ for further processing. The output feature $y_i$ of $\mathrm{LKCS}_i(\cdot)$ corresponds to the input $x_i$ and is passed to the concatenation layer. The concatenation layer aggregates large-range spatial relationships and multi-order context information, treating them as weight matrices for self-adaptive modulation of the input feature $F_{in}$. By effectively mining the underlying relevance of $F_{in}$, positions with high scores receive adequate attention while insignificant positions are suppressed. This flexible and effective modulation of the feature representation promotes the modeling of complex image patterns with high adaptability and representational power. The process can be expressed as follows:
$$F_{\mathrm{MLHA}} = F_{in} \odot \mathrm{Concat}(y_1, \ldots, y_s)$$
$$y_i = \begin{cases} x_i, & i = 1 \\ \mathrm{LKCS}_i(x_i + y_{i-1}), & 1 < i \le s \end{cases}$$
where $\odot$ denotes element-wise modulation of the input feature.
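The Split–Transform–Fusion flow can be sketched as follows: the input is chunked into $s$ channel groups, each group after the first is added to the previous branch output before its transform, and the concatenated branch outputs modulate the input element-wise. Here each LKCS is simplified to a single depth-wise convolution, so kernel choices and names are illustrative only:

```python
import torch
import torch.nn as nn

class ToyMLHA(nn.Module):
    def __init__(self, channels=16, splits=4):
        super().__init__()
        self.s = splits
        g = channels // splits
        # one stand-in "LKCS" (a single depth-wise conv) per branch after the first
        self.lkcs = nn.ModuleList(
            nn.Conv2d(g, g, 5, padding=2, groups=g) for _ in range(splits - 1)
        )

    def forward(self, f_in):
        xs = torch.chunk(f_in, self.s, dim=1)   # split into s channel groups
        ys = [xs[0]]                            # y_1 = x_1
        for i in range(1, self.s):              # transform: y_i = LKCS_i(x_i + y_{i-1})
            ys.append(self.lkcs[i - 1](xs[i] + ys[-1]))
        return f_in * torch.cat(ys, dim=1)      # fusion: element-wise modulation of F_in

out = ToyMLHA()(torch.randn(1, 16, 8, 8))
print(tuple(out.shape))  # (1, 16, 8, 8)
```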

3.4. Dynamic Global Sparse Attention (DGSA)

The token-based SA mechanism calculates the weight matrix along the token dimension. However, the quadratic increase in computational complexity as the sequence length N grows makes it unsuitable for long sequences and high-resolution images. To address this, two compromise approaches have been proposed: (1) replacing global SA with local SA, which restricts the SA calculation to local windows, and (2) reducing the sequence length of the key and the value through pooling or strided convolution. However, the former can only capture dependencies within a limited local range, thus constraining the modeling capacity of the entire network to a local region. The latter may result in excessive downsampling, leading to information loss or the confusion of relationships, which contradicts the purpose of SISR. In this work, we present an efficient solution that enables global interactions in SA with linear complexity. Instead of considering global interactions between all tokens, we propose the use of Dynamic Global Sparse Attention (DGSA), which operates across feature channels rather than tokens. In DGSA, the interactions are based on the cross-covariance matrix computed over the key and query projections of the token features. The specific details are as follows:
Consider an input token sequence $X \in \mathbb{R}^{N \times D}$, where $N$ and $D$ denote the length and dimension of the input sequence, respectively. DGSA first generates the query $Q$, key $K$, and value $V$ from $X$ using linear projection layers,
$$Q = XW_q, \quad K = XW_k, \quad V = XW_v$$
where $W_q$, $W_k$, and $W_v \in \mathbb{R}^{D \times D_h}$ are learnable weight matrices and $D_h$ is the number of projection dimensions. Next, the output of DGSA is computed as a weighted sum over the $N$ value vectors,
$$A(Q, K, V) = V \cdot \mathrm{Softmax}\!\left(\frac{K^{\top} Q}{\sqrt{d_h}}\right)$$
Importantly, DGSA has linear complexity $O(N)$ in the token length, rather than the $O(N^2)$ of vanilla SA.
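The channel-wise ("cross-covariance") attention above can be written in a few lines: the attention matrix $K^{\top}Q$ is $D_h \times D_h$, so its size is independent of the token length $N$ and the cost grows only linearly with $N$. A plain-function sketch, not the authors' implementation:

```python
import torch

def channel_attention(q, k, v):
    """Cross-covariance attention: the D_h x D_h matrix K^T Q replaces
    the N x N token-attention matrix of vanilla self-attention."""
    d_h = q.shape[1]
    attn = (k.transpose(0, 1) @ q) / (d_h ** 0.5)  # (D_h, D_h), independent of N
    attn = torch.softmax(attn, dim=-1)
    return v @ attn                                 # (N, D_h) weighted output

# q, k, v: (N, D_h) token features; N = 100 tokens, D_h = 16 channels
out = channel_attention(torch.randn(100, 16), torch.randn(100, 16),
                        torch.randn(100, 16))
print(tuple(out.shape))  # (100, 16)
```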
As mentioned in the Introduction, to address the potential negative impact of irrelevant or confusing information in the SISR task, we introduce a Multi-stage Token Selection (MTS) strategy. As shown in Figure 4, this strategy selects the top-k most similar tokens from the keys for each query when computing the attention weight matrix. To achieve this, we employ multiple different k values in parallel, producing multiple attention matrices with varying degrees of sparsity. The final output is obtained by combining these matrices through a weighted sum. DGSA with MTS can be expressed as follows:
$$\mathrm{DGSA}(Q, K, V) = \sum_{n=1}^{3} w_n \cdot \mathrm{DGSA}_{k_n}(Q, K, V)$$
$$\mathrm{DGSA}_{k_n}(Q, K, V) = V \cdot \mathrm{Softmax}\!\left(T_{k_n}\!\left(\frac{K^{\top} Q}{\sqrt{d_h}}\right)\right)$$
where $w_1$, $w_2$, and $w_3$ denote the assigned weights, which are learned dynamically by the network from an initial value of 0.1, and $T_{k_n}(\cdot)$ is the dynamic learnable row-wise top-k selection operator:
$$[T_k(A)]_{ij} = \begin{cases} A_{ij}, & A_{ij} \in \text{top-}k(\text{row } j) \\ -\infty, & \text{otherwise} \end{cases}$$
We set the Multi-stage Token Selection thresholds $k_1$, $k_2$, and $k_3$ to $\frac{1}{2}$, $\frac{2}{3}$, and $\frac{3}{4}$, respectively.
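The top-k selection operator can be sketched as masking: entries outside each row's top-k are set to $-\infty$ so that the subsequent softmax assigns them zero weight. Here k is given as a fraction of the row length, mirroring the thresholds 1/2, 2/3, and 3/4 above; this is an illustrative utility, not the paper's code.

```python
import torch

def topk_mask(attn, frac):
    """Keep the top ceil(frac * width) entries per row; mask the rest to -inf."""
    k = max(1, int(attn.shape[-1] * frac))
    thresh = attn.topk(k, dim=-1).values[..., -1:]  # k-th largest value per row
    return attn.masked_fill(attn < thresh, float("-inf"))

a = torch.tensor([[4.0, 1.0, 3.0, 2.0]])
masked = topk_mask(a, 0.5)  # keep the top 2 of 4 entries in each row
print(masked)               # tensor([[4., -inf, 3., -inf]])
# after softmax, the masked entries contribute zero attention weight
print(torch.softmax(masked, dim=-1))
```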
Figure 4. Multiple attention matrices. Taking one head as an example ($D = D_h$), $w_1$, $w_2$, $w_3$, and $w_4$ denote the assigned weights, which are obtained by dynamic adaptation learning of the network. The Multi-stage Token Selection thresholds $k_1$, $k_2$, $k_3$, and $k_4$ are set to $\frac{1}{2}$, $\frac{2}{3}$, $\frac{3}{4}$, and $\frac{4}{5}$, respectively.
In conclusion, DGSA offers two significant advantages. Firstly, it enables the modeling of global correlations by selecting the most similar tokens from the entire attention matrix while effectively filtering out irrelevant ones. Secondly, by employing a weighted sum of multiple attention matrices with varying degrees of sparsity, the model can adequately capture the underlying relevance between all pairs of positions. This approach assigns higher weights to positions of greater importance while suppressing insignificant positions. Consequently, it facilitates the identification of crucial features and their effective utilization in subsequent processing steps. Through this mechanism, our method adaptively selects high-contributing scores from input elements, promoting the modeling of complex image patterns with enhanced adaptability and representational power.

3.5. Feed-Forward Network (FFN)

The original Feed-Forward Network (FFN) has limitations in modeling local patterns and spatial relationships, which are crucial for SISR. The inverted residual block (IRB) incorporates a depth-wise convolution between two linear transform layers, enabling the aggregation of local information among neighboring pixels within each channel. Building upon this idea, we adopt the IRB’s design paradigm and replace the point-wise convolutional layers in the vanilla FFN with a combination of depth-wise convolutions and squeeze-and-excitation modules. This modification captures local patterns and structures effectively. Further details are provided below.
$$\mathrm{FFN}(X) = \mathrm{Linear}(\sigma(\mathrm{SAL}(\mathrm{Linear}(X))))$$
where $\sigma$ denotes the GELU nonlinear activation function and $\mathrm{SAL}$ denotes the spatial awareness layer.
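A hedged sketch of this modified FFN, reading "SAL" as a depth-wise convolution combined with squeeze-and-excitation-style gating between the two point-wise (linear) layers; the exact structure and names are our interpretation of the description, not the released code.

```python
import torch
import torch.nn as nn

class ConvFFN(nn.Module):
    def __init__(self, dim=16, expand=2):
        super().__init__()
        hidden = dim * expand
        self.up = nn.Conv2d(dim, hidden, 1)        # first Linear (point-wise conv)
        # SAL: depth-wise conv for local spatial awareness (assumed structure)
        self.sal = nn.Conv2d(hidden, hidden, 3, 1, 1, groups=hidden)
        # squeeze-and-excitation-style channel gating (assumed structure)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(hidden, hidden, 1), nn.Sigmoid()
        )
        self.act = nn.GELU()                       # sigma = GELU
        self.down = nn.Conv2d(hidden, dim, 1)      # second Linear (point-wise conv)

    def forward(self, x):
        h = self.up(x)
        h = self.sal(h) * self.gate(h)
        return self.down(self.act(h))

out = ConvFFN()(torch.randn(1, 16, 8, 8))
print(tuple(out.shape))  # (1, 16, 8, 8)
```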

3.6. Discussion

As mentioned earlier, our method combines the strengths of Convolutional Networks, such as spatial inductive biases and local connectivity, with Transformers, which provide input-adaptive weighting and global context processing. This integration allows us to achieve a favorable balance between complexity and performance. The advantages of our approach can be summarized as follows:
(1) Fine-grained local modeling. The MLHA incorporates a re-weighting process into both the sub-branch and entire features. By utilizing the extracted convolutional features as weight matrices, we can self-adaptively re-calibrate the input representations, effectively capturing spatial relationships and enabling multi-order feature interactions. This approach ensures that important positions receive appropriate focus while suppressing insignificant positions. It is worth noting that each sub-branch feature $x_i$ can receive features from all preceding subparts $x_j$ ($j \le i$) and passes through large kernel convolutional sequences, resulting in a larger receptive field.
(2) Efficient global interaction. The DGSA is capable of modeling long-range non-local dependencies while obtaining an effective global receptive field. The interactions in DGSA operate across feature dimensions and are based on the cross-covariance matrix between keys and queries. To avoid interference with subsequent super-resolution tasks, our MTS strategy selects multiple top-k similarity scores between queries and keys for attention matrix calculation. This strategy masks out insignificant elements with lower weights, reducing redundancy in attention maps and suppressing clutter background interference, thereby facilitating better feature aggregation.
(3) Linear complexity. Our method remains robust to changes in the input token length while achieving linear computational complexity of $O(NC^2)$, where $C \ll N$. This enables flexible and effective modeling of feature representation, promoting the capture of complex image patterns with high representational power.

4. Experiments

4.1. Implementation Details

Our proposed method comprises 16 fundamental building blocks, with each block having 64 channels. Minor channel adjustments are made only in the image reconstruction part for the ×2, ×3, and ×4 scales. To evaluate the effectiveness of our proposed method, we tested it on five common benchmark datasets: Set5 [60], Set14 [61], BSD100 [62], Urban100 [63], and Manga109 [64]. We measured the average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) on the luminance (Y) channel of YCbCr space. Our method was implemented using PyTorch 1.12.0 and trained on a single NVIDIA RTX 3090 GPU. Further hyper-parameters of the training process are shown in Table 1.
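As a concrete illustration of this evaluation protocol, the snippet below computes PSNR on the Y channel of YCbCr, using the BT.601 conversion that is conventional in SISR benchmarks; it is an illustrative utility, not the paper's exact evaluation script.

```python
import numpy as np

def rgb_to_y(img):
    """BT.601 luminance (Y) from an RGB image in [0, 1], shape (H, W, 3)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 / 255.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0

def psnr_y(sr, hr):
    """PSNR in dB between two RGB images, measured on the Y channel."""
    mse = np.mean((rgb_to_y(sr) - rgb_to_y(hr)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(1.0 / mse)

hr = np.random.rand(32, 32, 3)
sr = np.clip(hr + np.random.normal(0.0, 0.01, hr.shape), 0.0, 1.0)
print(round(psnr_y(sr, hr), 1))  # PSNR in dB; higher is better
```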
Table 1. Hyper-parameters of the training process.

4.2. Comparison with State-of-the-Art (SOTA) Methods

To validate the effectiveness of our method, we present the reconstruction results obtained by various SR models on both natural and satellite remote sensing images. These images were captured using common optical sensors (e.g., CMOS) as well as satellite sensors (e.g., millimeter-wave sensors). First, we verify the effectiveness of our proposed method on natural images. In Section 4.2.3, we verify the effectiveness of the method on satellite remote sensing images.

4.2.1. Quantitative and Qualitative Results

In Table 2, we compare the proposed method with recent SOTA efficient SISR approaches for upscale factors of ×2, ×3, and ×4 on five benchmark datasets. The compared methods are SRCNN [15], VDSR [11], DRCN [65], LapSRN [66], MemNet [67], SRFBN-S [68], IDN [69], CARN [70], EDSR [12], FALSR-A [18], SMSR [71], A2N [72], LMAN [26], DRSDN [24], SwinIR [32], and NGswin [73]. Notably, SwinIR [32] and NGswin [73] are recently advanced Transformer-based methods. Specifically, on Set5, the average PSNR value at the ×2 scale is improved by 0.63 dB and the average SSIM value at the ×2 scale is improved by 0.0036 on average over the other methods; the average PSNR value at the ×4 scale is improved by 0.89 dB and the average SSIM value at the ×4 scale is improved by 0.0144 on average over the other methods. On Set14, the average PSNR value at the ×2 scale is improved by 0.64 dB and the average SSIM value at the ×2 scale is improved by 0.0079; the average PSNR value at the ×4 scale is improved by 0.64 dB and the average SSIM value at the ×4 scale is improved by 0.0165 on average over the other methods. Clearly, with lower complexity, our method (Parameters/Multi-Adds @ PSNR/SSIM: 542K/113G @ 38.24 dB/0.9618) obtains better PSNR/SSIM results than recently improved Transformer-based and Convolutional Network-based methods, such as SwinIR (878K/243.7G @ 38.14 dB/0.9611) and NGswin (998K/140.4G @ 38.05 dB/0.9610).
Table 2. Quantitative comparison with SOTA methods on five popular benchmark datasets. ‘Multi-Adds’ is calculated with a 1280 × 720 HR image. The bold font shows the best value in every group.
In Figure 5, we present the qualitative comparison results for different methods at an upscale factor of ×4. For the images “img 024”, “img 067”, “img 071”, “img 073”, and “img 076” in the Urban100 dataset, our method demonstrates superior reconstruction of lattice and text patterns with minimal blurriness and artifacts compared to other methods. This observation confirms the usefulness and effectiveness of our approach. Taking the image “img 024” as an example, our method accurately generates stripes with the correct direction and minimal blurring, while the other methods produce incorrect stripes and noticeable blur over a wide range.
Figure 5. Qualitative comparison of state-of-the-art methods on Urban100 [63]. Our method achieves better performance with fewer artifacts and less blur.

4.2.2. Visualization Analysis

LAM Results. In Figure 6, we analyze the local attribution map (LAM [76]) results for SwinIR [32], AAN [72], LMAN [26], and our method to investigate the utilization range of pixels in the input image during the reconstruction of the selected area. We employ the diffusion index (DI) as an evaluation metric to assess the model’s ability to extract features and utilize relevant information. As illustrated in Figure 6, our method exhibits the utilization of a larger range of pixel information in reconstructing the area outlined by a red box. This observation demonstrates that our approach achieves a larger receptive field through an efficient local and global interaction.
Figure 6. Results of local attribution maps. A more widely distributed red area and higher DI represent a larger range of pixel utilization.
To facilitate intuitive comparisons, we present a heat map, as shown in Figure 7, illustrating the differences in interest areas between the SR networks (referred to as “Diff”). An observation can be made that the proposed LGUN exhibits a more extensive diffusion region compared to CARN [70], EDSR [12], SwinIR [32], and AAN [72]. This observation indicates that our designs enable the exploitation of a greater amount of intra-frame information while maintaining limited network complexity. This is primarily attributed to the MLHA and DGSA employed in LGUN, which facilitate the learning of diverse information ranges and the selective retention of spatial textures deemed useful.
Figure 7. The heat maps exhibit the area of interest for different SR networks. The red regions are noticed by CARN [70], EDSR [12], SwinIR [32] and AAN [72], while the blue areas represent the additional LAM interest areas of the proposed LGUN. (LGUN has a higher diffusion index).

4.2.3. Remote Sensing Image Super-Resolution

Satellite sensors play a vital role in remote sensing by capturing images and data of the Earth’s surface from space. These sensors are mounted on Earth-orbiting satellites and are specifically designed to gather information across multiple wavelengths of the electromagnetic spectrum. Remote sensing images obtained from satellite sensors offer valuable insights for a wide range of applications, including environmental monitoring, land use classification, disaster management, and climate studies.
One crucial task in remote sensing is SISR, which aims to enhance the resolution of satellite images. Higher-resolution images provide more accurate and detailed information about the Earth’s surface, which is crucial for various applications. Therefore, SISR plays a pivotal role in maximizing the usefulness of remote sensing data. To demonstrate the effectiveness of our proposed method in enhancing remote sensing images obtained from satellite sensors, we present the SISR results of different networks in Figure 8. Our network exhibits clear advantages in recovering remotely sensed images, particularly in capturing texture details, lines, and repetitive structures. In contrast, the other comparison algorithms often introduce artifacts and blending issues when dealing with remote sensing images that have complex backgrounds. At the same time, our network effectively mitigates blurring artifacts and reconstructs edge details with higher fidelity.
Figure 8. Qualitative comparison of state-of-the-art methods on the AID dataset.

4.3. Ablation Study

In Table 3, we present the results of the ablation study for our method. Below, we discuss the ablation results based on the following aspects:
Table 3. Ablation experiments on the micro structure design. The bold font shows the best value in every group.
The influence of the structure configuration. The primary objective of this study was to efficiently encode local spatial information, model long-range non-local dependencies, and achieve a global receptive field by leveraging the strengths of Convolutional Networks, which provide spatial inductive biases and local connectivity, and Transformers, which offer input-adaptive weighting and global context interaction. In order to validate the effectiveness of the two core modules, namely MLHA and DGSA, we conducted experiments where one module was removed while the other was retained. The results, as presented in Table 3(a), demonstrate a significant decrease in model performance when either of the modules is removed. These findings indicate that the model benefits from both the global interaction introduced by the DGSA module and the fine-grained local modeling achieved by MLHA.
The influence of the MLHA part. In the initial layers of our model, we utilize MLHA to efficiently encode local spatial information. This is achieved by feeding each sub-branch with a specific subset of the complete feature. The effectiveness of the STF strategy is demonstrated in Table 3(b), where it is shown to enhance the explicit learning of distinct feature patterns within the network, leading to improved performance compared to models trained without the STF strategy.
The influence of the DGSA part. In the deeper layers of our model, we introduce DGSA to effectively model long-range non-local dependencies and achieve a global receptive field of H × W. To reduce redundancy in attention maps and mitigate interference from cluttered backgrounds, we employ the MTS strategy, which selects multiple top-k similar attention matrices and masks out elements with lower weights. In Table 3(c), we display the results of a series of experiments to assess the effectiveness of the DGSA module. These experiments include scenarios with no sparse attention (w/o top-k), sparse attention (w/top-k), and sparse attention with the MTS strategy (top-k with MTS). The results of these experiments indicate that employing sparse attention with the MTS strategy leads to improved performance.
The influence of the design of LKCS in the MLHA part. We conducted an experiment to verify the effectiveness of the three LKCS modules in our MLHA. Each LKCS module consists of three convolution layers: a DW-Conv layer, a DW-D-Conv layer, and a Conv layer; the three modules differ only in the kernel sizes of these layers. In the first LKCS module, the kernel sizes of the three convolution layers are 3, 5, and 1; in the second, 5, 7, and 1; and in the third, 7, 9, and 1. To verify the benefit of extracting features with different kernel sizes, we also ran a control experiment in which the three LKCS modules were identical, with the kernel sizes of the three convolution layers set to 5, 7, and 1 in all three. The results, shown in Table 3(d), demonstrate the effectiveness of our proposed LKCS design.
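One LKCS branch as described can be sketched as a depth-wise convolution, a dilated depth-wise convolution, and a final convolution. The (5, 7, 1) kernel sizes follow the second configuration above; the dilation rate and the reading of "DW-D-Conv" as a dilated depth-wise convolution are our assumptions.

```python
import torch
import torch.nn as nn

def lkcs(channels=16, k1=5, k2=7, dilation=3):
    """Large Kernel Convolutional Sequence sketch:
    DW-Conv (k1 x k1) -> DW-D-Conv (k2 x k2, dilated) -> Conv (1 x 1).
    Padding is chosen so the spatial size is preserved."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, k1, padding=k1 // 2, groups=channels),
        nn.Conv2d(channels, channels, k2, padding=(k2 // 2) * dilation,
                  dilation=dilation, groups=channels),
        nn.Conv2d(channels, channels, 1),
    )

out = lkcs()(torch.randn(1, 16, 32, 32))
print(tuple(out.shape))  # (1, 16, 32, 32)
```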

4.4. Application

There are many potential applications of the lightweight image super-resolution approach. For example, in surveillance, SR techniques can enhance video resolution, making images sharper and clearer so that details, such as facial features and license plate numbers, can be more easily identified, thus enhancing security. In medical imaging, SR technology can improve the clarity of medical images and help doctors diagnose conditions more accurately. In the field of satellite imagery, SR technology can improve image quality and make remote sensing data analysis more accurate, which is useful in environmental monitoring, urban planning, and other fields. The lightweight SR method is particularly suitable for resource-constrained devices and real-time processing scenarios due to its low computation and storage requirements.

5. Conclusions

The aim of this study is to develop a lightweight and high-performance network for SISR by effectively combining the strengths of Transformers and Convolutional Networks. To achieve this objective, we propose a novel lightweight SISR method called LGUN. LGUN focuses on encoding local spatial information within MLHA and utilizes the Split–Transform–Fusion (STF) strategy to facilitate the learning of diverse patterns. Additionally, it models global context dependencies through the core module: DGSA. DGSA selects multiple top-k similar attention matrices and masks out elements with lower weights, thereby reducing redundancy in attention maps and suppressing interference from cluttered backgrounds. The experimental results, evaluated on popular benchmarks, demonstrate the superior quantitative and qualitative performance of our method.

Author Contributions

Conceptualization, L.X.; methodology, L.X.; software, L.X.; validation, X.L.; formal analysis, X.L.; investigation, L.X. and X.L.; resources, C.R.; data curation, L.X.; writing—original draft preparation, L.X.; writing—review and editing, C.R.; visualization, L.X.; supervision, C.R.; project administration, C.R.; funding acquisition, C.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 62171304, the Natural Science Foundation of Sichuan Province under Grant 2024NSFSC1423, and the Cooperation Science and Technology Project of Sichuan University and Dazhou City under Grant 2022CDDZ-09.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The public data used in this work are listed here: Flickr2K [12], Set5 [60], Set14 [61], Urban100 [63], BSDS100 [62], Manga109 [64] and DIV2K [77].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Silva, N.P.; Amin, B.; Dunne, E.; Hynes, N.; O’Halloran, M.; Elahi, A. Implantable Pressure-Sensing Devices for Monitoring Abdominal Aortic Aneurysms in Post-Endovascular Aneurysm Repair. Sensors 2024, 24, 3526. [Google Scholar] [CrossRef]
  2. Silva, N.P.; Elahi, A.; Dunne, E.; O’Halloran, M.; Amin, B. Design and Characterisation of a Read-Out System for Wireless Monitoring of a Novel Implantable Sensor for Abdominal Aortic Aneurysm Monitoring. Sensors 2024, 24, 3195. [Google Scholar] [CrossRef] [PubMed]
  3. Negre, P.; Alonso, R.S.; González-Briones, A.; Prieto, J.; Rodríguez-González, S. Literature Review of Deep-Learning-Based Detection of Violence in Video. Sensors 2024, 24, 4016. [Google Scholar] [CrossRef] [PubMed]
  4. Liu, H.; Yang, L.; Zhang, L.; Shang, F.; Liu, Y.; Wang, L. Accelerated Stochastic Variance Reduction Gradient Algorithms for Robust Subspace Clustering. Sensors 2024, 24, 3659. [Google Scholar] [CrossRef] [PubMed]
  5. Chakraborty, D.; Boni, R.; Mills, B.N.; Cheng, J.; Komissarov, I.; Gerber, S.A.; Sobolewski, R. High-Density Polyethylene Custom Focusing Lenses for High-Resolution Transient Terahertz Biomedical Imaging Sensors. Sensors 2024, 24, 2066. [Google Scholar] [CrossRef] [PubMed]
  6. Wang, W.; He, J.; Liu, H.; Yuan, W. MDC-RHT: Multi-Modal Medical Image Fusion via Multi-Dimensional Dynamic Convolution and Residual Hybrid Transformer. Sensors 2024, 24, 4056. [Google Scholar] [CrossRef] [PubMed]
  7. Chang, H.K.; Chen, W.W.; Jhang, J.S.; Liou, J.C. Siamese Unet Network for Waterline Detection and Barrier Shape Change Analysis from Long-Term and Large Numbers of Satellite Imagery. Sensors 2023, 23, 9337. [Google Scholar] [CrossRef] [PubMed]
  8. Njimi, H.; Chehata, N.; Revers, F. Fusion of Dense Airborne LiDAR and Multispectral Sentinel-2 and Pleiades Satellite Imagery for Mapping Riparian Forest Species Biodiversity at Tree Level. Sensors 2024, 24, 1753. [Google Scholar] [CrossRef] [PubMed]
  9. Wan, S.; Guan, S.; Tang, Y. Advancing bridge structural health monitoring: Insights into knowledge-driven and data-driven approaches. J. Data Sci. Intell. Syst. 2023, 2, 129–140. [Google Scholar] [CrossRef]
  10. Wu, Z.; Tang, Y.; Hong, B.; Liang, B.; Liu, Y. Enhanced Precision in Dam Crack Width Measurement: Leveraging Advanced Lightweight Network Identification for Pixel-Level Accuracy. Int. J. Intell. Syst. 2023, 2023, 9940881. [Google Scholar] [CrossRef]
  11. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
  12. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
  13. Liu, X.; Liao, X.; Shi, X.; Qing, L.; Ren, C. Efficient Information Modulation Network for Image Super-Resolution. In ECAI 2023; IOS Press: Amsterdam, The Netherlands, 2023; pp. 1544–1551. [Google Scholar]
  14. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5728–5739. [Google Scholar]
  15. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 295–307. [Google Scholar] [CrossRef] [PubMed]
  16. Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986. [Google Scholar]
  17. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
  18. Chu, X.; Zhang, B.; Ma, H.; Xu, R.; Li, Q. Fast, accurate and lightweight super-resolution with neural architecture search. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 59–64. [Google Scholar]
  19. Gao, Q.; Zhao, Y.; Li, G.; Tong, T. Image super-resolution using knowledge distillation. In Proceedings of the Computer Vision–ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, 2–6 December 2018; Revised Selected Papers, Part II. Springer: Berlin/Heidelberg, Germany, 2019; pp. 527–541. [Google Scholar]
  20. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
  21. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  22. Zhang, Y.; Li, K.; Li, K.; Zhong, B.; Fu, Y. Residual non-local attention networks for image restoration. arXiv 2019, arXiv:1903.10082. [Google Scholar]
  23. Chen, H.; Wang, Y.; Guo, T.; Xu, C.; Deng, Y.; Liu, Z.; Ma, S.; Xu, C.; Xu, C.; Gao, W. Pre-trained image processing transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 12299–12310. [Google Scholar]
  24. Cheng, G.; Matsune, A.; Du, H.; Liu, X.; Zhan, S. Exploring more diverse network architectures for single image super-resolution. Knowl. Based Syst. 2022, 235, 107648. [Google Scholar] [CrossRef]
  25. Wang, X.; Dong, C.; Shan, Y. Repsr: Training efficient vgg-style super-resolution networks with structural re-parameterization and batch normalization. In Proceedings of the 30th ACM International Conference on Multimedia, Lisbon, Portugal, 10–14 October 2022; pp. 2556–2564. [Google Scholar]
  26. Wan, J.; Yin, H.; Liu, Z.; Chong, A.; Liu, Y. Lightweight image super-resolution by multi-scale aggregation. IEEE Trans. Broadcast. 2020, 67, 372–382. [Google Scholar] [CrossRef]
  27. Hui, Z.; Gao, X.; Yang, Y.; Wang, X. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 2024–2032. [Google Scholar]
  28. Fan, Q.; Huang, H.; Zhou, X.; He, R. Lightweight vision transformer with bidirectional interaction. Adv. Neural Inf. Process. Syst. 2024, 36. [Google Scholar]
  29. Zhou, X.; Huang, H.; Wang, Z.; He, R. Ristra: Recursive image super-resolution transformer with relativistic assessment. IEEE Trans. Multimed. 2024, 26, 6475–6487. [Google Scholar] [CrossRef]
  30. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  31. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  32. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. Swinir: Image restoration using swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 1833–1844. [Google Scholar]
  33. Huang, Z.; Ben, Y.; Luo, G.; Cheng, P.; Yu, G.; Fu, B. Shuffle transformer: Rethinking spatial shuffle for vision transformer. arXiv 2021, arXiv:2106.03650. [Google Scholar]
  34. Vaswani, A.; Ramachandran, P.; Srinivas, A.; Parmar, N.; Hechtman, B.; Shlens, J. Scaling local self-attention for parameter efficient visual backbones. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 12894–12904. [Google Scholar]
  35. Mehta, S.; Rastegari, M. Separable self-attention for mobile vision transformers. arXiv 2022, arXiv:2206.02680. [Google Scholar]
  36. Wang, S.; Li, B.Z.; Khabsa, M.; Fang, H.; Ma, H. Linformer: Self-attention with linear complexity. arXiv 2020, arXiv:2006.04768. [Google Scholar]
  37. Ho, J.; Kalchbrenner, N.; Weissenborn, D.; Salimans, T. Axial attention in multidimensional transformers. arXiv 2019, arXiv:1912.12180. [Google Scholar]
  38. Dong, X.; Bao, J.; Chen, D.; Zhang, W.; Yu, N.; Yuan, L.; Chen, D.; Guo, B. Cswin transformer: A general vision transformer backbone with cross-shaped windows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12124–12134. [Google Scholar]
  39. Wu, S.; Wu, T.; Tan, H.; Guo, G. Pale transformer: A general vision transformer backbone with pale-shaped attention. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 2731–2739. [Google Scholar]
  40. Child, R.; Gray, S.; Radford, A.; Sutskever, I. Generating long sequences with sparse transformers. arXiv 2019, arXiv:1904.10509. [Google Scholar]
  41. Zhao, G.; Lin, J.; Zhang, Z.; Ren, X.; Su, Q.; Sun, X. Explicit sparse transformer: Concentrated attention through explicit selection. arXiv 2019, arXiv:1912.11637. [Google Scholar]
  42. Cai, H.; Gan, C.; Han, S. Efficientvit: Enhanced linear attention for high-resolution low-computation visual recognition. arXiv 2022, arXiv:2205.14756. [Google Scholar]
  43. Yuan, K.; Guo, S.; Liu, Z.; Zhou, A.; Yu, F.; Wu, W. Incorporating convolution designs into visual transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 579–588. [Google Scholar]
  44. Guo, J.; Han, K.; Wu, H.; Tang, Y.; Chen, X.; Wang, Y.; Xu, C. Cmt: Convolutional neural networks meet vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12175–12185. [Google Scholar]
  45. Wu, H.; Xiao, B.; Codella, N.; Liu, M.; Dai, X.; Yuan, L.; Zhang, L. Cvt: Introducing convolutions to vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 22–31. [Google Scholar]
  46. Graham, B.; El-Nouby, A.; Touvron, H.; Stock, P.; Joulin, A.; Jégou, H.; Douze, M. Levit: A vision transformer in convnet’s clothing for faster inference. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 12259–12269. [Google Scholar]
  47. Li, Y.; Zhang, K.; Cao, J.; Timofte, R.; Van Gool, L. Localvit: Bringing locality to vision transformers. arXiv 2021, arXiv:2104.05707. [Google Scholar]
  48. Xiao, T.; Singh, M.; Mintun, E.; Darrell, T.; Dollár, P.; Girshick, R. Early convolutions help transformers see better. Adv. Neural Inf. Process. Syst. 2021, 34, 30392–30400. [Google Scholar]
  49. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. Transunet: Transformers make strong encoders for medical image segmentation. arXiv 2021, arXiv:2102.04306. [Google Scholar]
  50. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image segmentation. In Proceedings of the Computer Vision–ECCV 2022 Workshops, Tel Aviv, Israel, 23–27 October 2022; Proceedings, Part III. Springer: Berlin/Heidelberg, Germany, 2023; pp. 205–218. [Google Scholar]
  51. Song, Y.; He, Z.; Qian, H.; Du, X. Vision transformers for single image dehazing. arXiv 2022, arXiv:2204.03883. [Google Scholar] [CrossRef]
  52. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 568–578. [Google Scholar]
  53. Pan, Z.; Zhuang, B.; Liu, J.; He, H.; Cai, J. Scalable vision transformers with hierarchical pooling. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 377–386. [Google Scholar]
  54. Heo, B.; Yun, S.; Han, D.; Chun, S.; Choe, J.; Oh, S.J. Rethinking spatial dimensions of vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 11936–11945. [Google Scholar]
  55. Chen, C.F.R.; Fan, Q.; Panda, R. Crossvit: Cross-attention multi-scale vision transformer for image classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 357–366. [Google Scholar]
  56. Chen, Y.; Dai, X.; Chen, D.; Liu, M.; Dong, X.; Yuan, L.; Liu, Z. Mobile-former: Bridging mobilenet and transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5270–5279. [Google Scholar]
  57. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  58. Chen, X.; Wang, X.; Zhou, J.; Qiao, Y.; Dong, C. Activating more pixels in image super-resolution transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 22367–22377. [Google Scholar]
  59. Yoo, J.; Kim, T.; Lee, S.; Kim, S.H.; Lee, H.; Kim, T.H. Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution. arXiv 2022, arXiv:2203.07682. [Google Scholar]
  60. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi-Morel, M.L. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the 23rd British Machine Vision Conference (BMVC), Surrey, UK, 3–7 September 2012. [Google Scholar]
  61. Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the Curves and Surfaces: 7th International Conference, Avignon, France, 24–30 June 2010; Revised Selected Papers 7. Springer: Berlin/Heidelberg, Germany, 2012; pp. 711–730. [Google Scholar]
  62. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, ICCV 2001, Vancouver, BC, Canada, 7–14 July 2001; IEEE: Piscataway, NJ, USA, 2001; Volume 2, pp. 416–423. [Google Scholar]
  63. Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5197–5206. [Google Scholar]
  64. Matsui, Y.; Ito, K.; Aramaki, Y.; Fujimoto, A.; Ogawa, T.; Yamasaki, T.; Aizawa, K. Sketch-based manga retrieval using manga109 dataset. Multimed. Tools Appl. 2017, 76, 21811–21838. [Google Scholar] [CrossRef]
  65. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1637–1645. [Google Scholar]
  66. Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632. [Google Scholar]
  67. Tai, Y.; Yang, J.; Liu, X.; Xu, C. Memnet: A persistent memory network for image restoration. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4539–4547. [Google Scholar]
  68. Li, Z.; Yang, J.; Liu, Z.; Yang, X.; Jeon, G.; Wu, W. Feedback network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3867–3876. [Google Scholar]
  69. Hui, Z.; Wang, X.; Gao, X. Fast and accurate single image super-resolution via information distillation network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 723–731. [Google Scholar]
  70. Ahn, N.; Kang, B.; Sohn, K.A. Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 252–268. [Google Scholar]
  71. Wang, L.; Dong, X.; Wang, Y.; Ying, X.; Lin, Z.; An, W.; Guo, Y. Exploring sparsity in image super-resolution for efficient inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 4917–4926. [Google Scholar]
  72. Chen, H.; Gu, J.; Zhang, Z. Attention in attention network for image super-resolution. arXiv 2021, arXiv:2104.09497. [Google Scholar]
  73. Choi, H.; Lee, J.; Yang, J. N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution. arXiv 2022, arXiv:2211.11436. [Google Scholar]
  74. Liu, C.; Lei, P. An efficient group skip-connecting network for image super-resolution. Knowl. Based Syst. 2021, 222, 107017. [Google Scholar] [CrossRef]
  75. Esmaeilzehi, A.; Ahmad, M.O.; Swamy, M. FPNet: A Deep Light-Weight Interpretable Neural Network Using Forward Prediction Filtering for Efficient Single Image Super Resolution. IEEE Trans. Circuits Syst. II Express Briefs 2021, 69, 1937–1941. [Google Scholar] [CrossRef]
  76. Gu, J.; Dong, C. Interpreting super-resolution networks with local attribution maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 9199–9208. [Google Scholar]
  77. Agustsson, E.; Timofte, R. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 126–135. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
