Article

MambaSR: Arbitrary-Scale Super-Resolution Integrating Mamba with Fast Fourier Convolution Blocks

1 School of Computer Science and Engineering, Macau University of Science and Technology, Macao 999078, China
2 Computer Engineering Technical College (Artificial Intelligence College), Guangdong Polytechnic of Science and Technology, Zhuhai 519090, China
3 School of Mathematics and Statistics, Shaoguan University, Shaoguan 512005, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(15), 2370; https://doi.org/10.3390/math12152370
Submission received: 25 June 2024 / Revised: 25 July 2024 / Accepted: 29 July 2024 / Published: 30 July 2024

Abstract

Traditional single image super-resolution (SISR) methods, which focus on integer scale super-resolution, often require separate training for each scale factor, leading to increased computational resource consumption. In this paper, we propose MambaSR, a novel arbitrary-scale super-resolution approach integrating Mamba with Fast Fourier Convolution Blocks. MambaSR leverages the strengths of the Mamba state-space model to extract long-range dependencies. In addition, Fast Fourier Convolution Blocks are proposed to capture global information in the frequency domain. The experimental results demonstrate that MambaSR achieves superior performance compared to different methods across various benchmark datasets. Specifically, on the Urban100 dataset, MambaSR outperforms MetaSR by 0.93 dB in PSNR and 0.0203 in SSIM, and on the Manga109 dataset, it achieves an average PSNR improvement of 1.00 dB and an SSIM improvement of 0.0093. These results highlight the efficacy of MambaSR in enhancing image quality for arbitrary-scale super-resolution.

1. Introduction

Single image super-resolution (SISR) is a fundamental task in computer vision, aimed at reconstructing high-resolution (HR) images from low-resolution (LR) inputs [1]. This problem is inherently ill-posed in the sense of Hadamard, due to the loss of information when an image is downsampled [2]. According to the Hadamard definition, a problem is well posed if a solution exists, the solution is unique, and the solution’s behavior changes continuously with the initial conditions. Since SISR often does not meet these criteria, it is classified as an inverse ill-posed problem. Image super-resolution has a wide range of applications in various fields, such as face recognition [3], medical imaging [4,5], satellite imagery [6], and video surveillance [7]. According to Yang et al. [8], super-resolution methods can be roughly categorized into prediction methods [9], edge-based methods [10], statistical methods [11,12], patch-based methods [13,14,15], and deep learning methods [16]. Although traditional methods such as Kalman filters [12] have achieved some success in the field of image super-resolution, they have limitations when dealing with complex and large-scale data: they rely on manual feature extraction and predefined models, making it difficult to cope with diverse and complex image data. With the growth of computational power and big data, deep learning methods have rapidly come to dominate the field of image super-resolution, and they are the focus of this article. Recent advancements in deep learning have significantly improved the performance of SISR, primarily through the development of convolutional neural networks (CNNs) and various upsampling techniques [17]. However, traditional SISR deep learning methods often focus on integer-scale (e.g., ×2, ×3, or ×4) super-resolution, which limits their usefulness in practical applications where arbitrary-scale upsampling is required. In addition, traditional SISR deep learning methods require separate training for each integer scale, so each model must be trained at least three times (×2, ×3, ×4), which consumes substantial computational resources and time.
Arbitrary-scale super-resolution (ASSR) addresses this limitation by allowing for flexible and continuous scaling factors, thus providing a more general solution for real-world applications. For example, an ASSR model is typically trained with scale factors randomly sampled between one and four, so it only needs to be trained once. Furthermore, in display applications, ASSR makes it possible to generate an HR image at any target size from a given input. Additionally, the ability to zoom in on an image freely renders ASSR a valuable tool for discerning details in tasks such as face recognition. Several approaches have been proposed to tackle ASSR, including Meta-SR by Hu et al. [18], which employed meta-learning to dynamically predict the weights of upscaling filters based on the input scale factor, and the Local Implicit Image Function (LIIF) framework by Chen et al. [19], which represents images as continuous functions to allow for flexible upscaling.
However, there are still challenges in achieving high-quality ASSR. The most commonly employed ASSR methods are based on CNNs. Although CNNs can effectively extract local features, they have difficulty in capturing global context and long-range dependencies [20], so the reconstructed super-resolution image is limited by their local receptive fields.
Recently, structured state-space models (S4) [21,22], inspired by classical state-space models, have gained significant interest for their outstanding ability to model long-range dependencies. Fundamentally, these models can be understood as a hybrid of CNNs and recurrent neural networks (RNNs). Moreover, Mamba [23], a state-of-the-art selective structured state-space model, can better model long-range dependencies in natural language processing (NLP). This implies that Mamba-based ASSR networks can inherently capture global context and long-range dependencies, thereby enhancing the reconstruction quality. However, the potential of state-space models (SSMs) in ASSR networks has not been fully studied. Given the impressive efficiency and powerful long-range dependency modeling capabilities of SSMs, we employ an SSM in an ASSR network to explore the potential of Mamba for efficient long-range modeling. More specifically, we introduce MambaSR, a novel SSM-based framework for arbitrary-scale super-resolution that leverages Mamba, an innovative sequence modeling technique, in conjunction with Fast Fourier Convolution (FFC) blocks to capture frequency information. MambaSR is designed to efficiently handle arbitrary scaling factors while maintaining high-quality reconstruction. The core contributions of this work are as follows:
  • To the best of our knowledge, this is the pioneering research effort that seeks to apply an SSM to arbitrary-scale super-resolution and demonstrates its effectiveness;
  • We introduce the Residual Fast Fourier Transform State-Space Block (RFFTSSB), which combines the strengths of Vision State-Space Modules (VSSM) and Fast Fourier Transform Convolutional Blocks (FFTConv) to enhance features by leveraging both spatial and frequency domain information;
  • We conduct extensive experiments to evaluate the performance of MambaSR, demonstrating its superiority over existing methods in terms of both visual comparisons and quantitative metrics.

2. Related Work

2.1. Arbitrary-Scale Super-Resolution

Arbitrary-scale single image super-resolution (SISR) improves flexibility by supporting both integer and non-integer scale factors, addressing the limitations of traditional SISR methods. The challenge of arbitrary-scale super-resolution has garnered attention due to its practical importance in various real-world applications, such as security surveillance, medical imaging, and satellite imagery. Meta-SR, proposed by Hu et al. [18], introduced a novel approach to arbitrary-scale super-resolution by leveraging meta-learning principles. Unlike traditional SR methods that require separate models for different scaling factors, Meta-SR employs a single model capable of handling any scaling factor. This is achieved through a Meta-Upscale Module that dynamically predicts the weights of the upscaling filters based on the input scale factor. This method ensures efficient computation and practical scalability, as it eliminates the need for storing multiple models for different scales. The experimental results demonstrated the superiority of Meta-SR over traditional methods in both performance and computational efficiency. The Local Implicit Image Function (LIIF) framework by Chen et al. [19] extended the concept of arbitrary-scale SR by representing images as a continuous function. This method allows for flexible and continuous upscaling, addressing the limitations of discrete scaling factors in traditional approaches. LIIF utilizes implicit neural representations to infer pixel values at arbitrary coordinates, providing high-quality SR outputs across various scales. The integration of continuous image representation techniques makes LIIF a robust solution for tasks requiring flexible zooming capabilities. LTE, the Local Texture Estimator introduced by Lee and Jin [24], focuses on enhancing SR performance by explicitly learning texture information. The LTE framework incorporates a texture encoder–decoder structure that captures high-frequency details, which are crucial for high-quality SR. By learning texture priors, LTE effectively reconstructs fine details and textures, outperforming conventional SR methods that often struggle with texture synthesis. This approach highlights the importance of texture information in achieving superior SR results. The Super-Resolution Neural Operator (SRNO) by Wei and Zhang [25] leverages the concept of neural operators to address the arbitrary-scale SR problem. SRNO introduces a neural operator framework that learns mappings between function spaces, allowing for scalable and efficient SR. This method utilizes a hierarchical structure to process images at multiple scales, ensuring that both global structure and local details are well preserved. The neural operator framework provides a flexible and powerful tool for SR, capable of adapting to various scaling requirements with high fidelity.

2.2. State-Space Models

In recent advancements, state-space models (SSMs) [21,22] have demonstrated significant potential in diverse applications. For instance, a state-space model for a continuous reheating furnace was developed using the finite volume method, optimizing energy efficiency and heating quality [26]. A semi-complete data augmentation algorithm was introduced to enhance state-space model fitting efficiency by combining data augmentation with numerical integration [27]. Similarly, a state-space model for a micro-high-temperature gas-cooled reactor (Mi-HTR) with a helium Brayton cycle was developed, demonstrating accuracy under various disturbances [28]. Furthermore, SSMs were applied to monitor multistage healthcare processes, integrating machine learning techniques with statistical control charts to detect anomalies early in surgical outcomes [29].
In addition, state-space models have been extensively studied in the context of sequence modeling due to their ability to capture long-range dependencies effectively. Gu et al. [21] proposed the structured state-space (S4) model, which addressed the computational inefficiencies of traditional SSMs. By introducing a novel parameterization for the SSM, S4 achieved significant improvements in handling long sequences, as demonstrated by its performance on the Long Range Arena (LRA) benchmark. Building on the foundations laid by S4, Smith et al. [30] introduced the Simplified State-Space Layer (S5), which further streamlined the computational process by utilizing a single multi-input, multi-output (MIMO) SSM and efficient parallel scans. This resulted in a model that maintained the theoretical strengths of S4 while being more efficient and easier to implement. The Gated State-Space (GSS) model, presented by Mehta et al. [31], leverages the advantages of SSMs in language modeling. By recasting the model as a convolution with a large kernel, the GSS achieves significant performance gains in tasks requiring the integration of information from distant parts of the input. Finally, the Mamba model [23] explores hardware-aware state expansion techniques to optimize the execution of SSMs. By introducing a selective state-space model (S6) that adapts to input-dependent dynamics, Mamba effectively balances computational efficiency and performance, making it suitable for deployment in resource-constrained environments. Following the success of SSMs in modeling long sequences, researchers began exploring their application in computer vision tasks. For example, VMamba [32] and Vim [33] incorporate an innovative vision backbone based on Mamba. Owing to its remarkable performance on visual tasks, researchers have actively explored its applications across different fields including image classification [32,33], medical image segmentation [34,35,36,37], and others [38,39,40]. Therefore, this paper proposes a super-resolution Mamba model to explore the potential of Mamba for arbitrary-scale super-resolution.

3. Method

3.1. Preliminaries

Recent advancements in SSM-based frameworks, such as the structured state-space sequence model (S4) and Mamba, are founded on a classical continuous system that maps a one-dimensional input function or sequence $a(u) \in \mathbb{R}$, through an implicit latent state $b(u) \in \mathbb{R}^{M}$, to an output $c(u) \in \mathbb{R}$. This framework can be characterized using a linear Ordinary Differential Equation (ODE) [23]:

$$b'(u) = D\,b(u) + E\,a(u), \qquad c(u) = F\,b(u), \tag{1}$$

where $D \in \mathbb{R}^{M \times M}$ is the state matrix, and $E \in \mathbb{R}^{M \times 1}$ and $F \in \mathbb{R}^{1 \times M}$ are the projection parameters. More details can be found in Mamba [23].
Subsequently, a discretization procedure is applied for deep learning purposes by incorporating a timescale parameter $\Lambda$ to transform $D$ and $E$ into their discrete forms $\bar{D}$ and $\bar{E}$ using a predetermined discretization rule. The zero-order hold (ZOH) technique is typically utilized for this discretization, and it can be formulated as follows:

$$\bar{D} = \exp(\Lambda D), \qquad \bar{E} = (\Lambda D)^{-1}\left(\exp(\Lambda D) - I\right) \cdot \Lambda E. \tag{2}$$
After the discretization process, a discrete input $a_k$ is used instead of the continuous input signal $a(u)$. Equation (1) with a time step $\Lambda$ can be reformulated as:

$$b_k = \bar{D}\,b_{k-1} + \bar{E}\,a_k, \qquad c_k = F\,b_k. \tag{3}$$
As a result, Equation (3) can be mathematically interpreted as a convolution operation:

$$\bar{H} = \left(F\bar{E},\; F\bar{D}\bar{E},\; \ldots,\; F\bar{D}^{L-1}\bar{E}\right), \qquad c = a \circledast \bar{H}, \tag{4}$$

where $\bar{H} \in \mathbb{R}^{L}$ is a structured convolutional kernel, $L$ represents the length of the input sequence $a$, and $\circledast$ denotes the convolution operation.
Recent enhancements to the Mamba state-space model have improved its ability to support dynamic feature representation, making E ¯ , F , and Λ adaptive to input variations. Mamba’s methodology for image super-resolution leverages the strengths of the S4 model. Mamba employs the same recursive structure as outlined in Equation (3), facilitating the retention of extremely long sequences and the activation of additional pixels for reconstruction. Furthermore, Mamba benefits from a parallel scan algorithm, as described in Equation (4), which facilitates efficient parallel processing and training.
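To make Equations (2)–(4) concrete, the following NumPy sketch discretizes a toy one-dimensional SSM with the ZOH rule and checks that the recurrent scan of Equation (3) and the convolutional form of Equation (4) produce the same output; the diagonal state matrix, the matrix sizes, and the random input are illustrative assumptions rather than values used by MambaSR.

```python
import numpy as np

# Toy 1D SSM with state size M and sequence length L (illustrative values, not MambaSR settings).
M, L = 4, 16
rng = np.random.default_rng(0)
diag = -rng.uniform(0.5, 1.5, M)            # negative eigenvalues -> stable dynamics
D = np.diag(diag)                           # state matrix D (diagonal for simplicity)
E = rng.standard_normal((M, 1))             # input projection E
F = rng.standard_normal((1, M))             # output projection F
Lam = 0.1                                   # timescale parameter Lambda

# Zero-order hold discretization, Equation (2); exp of a diagonal matrix is elementwise.
D_bar = np.diag(np.exp(Lam * diag))
E_bar = np.linalg.inv(Lam * D) @ (D_bar - np.eye(M)) @ (Lam * E)

a = rng.standard_normal(L)                  # discrete input sequence a_k

# Recurrent scan, Equation (3).
b = np.zeros((M, 1))
c_scan = []
for k in range(L):
    b = D_bar @ b + E_bar * a[k]
    c_scan.append((F @ b).item())

# Convolutional view, Equation (4): kernel entries H_bar[k] = F D_bar^k E_bar.
H_bar = np.array([(F @ np.linalg.matrix_power(D_bar, k) @ E_bar).item() for k in range(L)])
c_conv = [float(np.dot(H_bar[:k + 1][::-1], a[:k + 1])) for k in range(L)]

print(np.allclose(c_scan, c_conv))          # True: scan and convolution give the same output
```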

3.2. MambaSR

The proposed MambaSR network mainly consists of three parts: a Feature Representor ($F_{representor}$), a Feature Enhancer ($F_{enhance}$), and a Feature Reconstructor ($F_{reconstruct}$).
These three parts are illustrated in Figure 1, which shows the process of scaling the input from $(3, h, w)$ to $(3, 2h, 2w)$. The Feature Representor obtains shallow features from the low-resolution (LR) input image and can be implemented with a feature extraction network such as the enhanced deep SR network (EDSR) [41] or the residual dense network (RDN) [42]. The second part is designed for feature enhancement; it includes several Residual Fast Fourier Transform State-Space Units (RFFTSSUs) that enhance the features extracted by the first part and facilitate the integration of contextual information from various sources and perspectives. Finally, the Feature Reconstructor reconstructs the high-resolution (HR) image from the enhanced features.
The LR input image $I_{LR}$ is first fed into an encoder to obtain the initial feature representation:

$$F_0 = F_{representor}(I_{LR}), \tag{5}$$

where $F_0$ represents the extracted shallow features and $F_{representor}$ corresponds to the feature extraction network, either EDSR or RDN.
The extracted features $F_0$ are then enhanced using the proposed RFFTSSUs. Each RFFTSSU consists of several Residual Fast Fourier Transform State-Space Blocks (RFFTSSBs). In the Feature Enhancer, there are two RFFTSSUs. The enhanced features are represented as:

$$F_{enhanced} = \mathrm{RFFTSSU}\left(\mathrm{RFFTSSU}(F_0)\right). \tag{6}$$

Within each RFFTSSU, after passing through the $i$-th RFFTSSB, the features are processed by a $3 \times 3$ convolutional layer $\mathrm{Conv}_{3\times 3}$ and then added element-wise to the original features $F_0$:

$$\mathrm{RFFTSSU}(F_0) = \mathrm{Conv}_{3\times 3}\left(\mathrm{RFFTSSB}_i\left(\cdots \mathrm{RFFTSSB}_2\left(\mathrm{RFFTSSB}_1(F_0)\right)\cdots\right)\right) + F_0, \tag{7}$$

where $i$ is equal to six because each RFFTSSU contains six RFFTSSBs.
The enhanced features $F_{enhanced}$ are then magnified [25] and passed through a series of convolutional operations and a multi-head attention mechanism to reconstruct the final HR image. First, the magnified features are passed through a $1 \times 1$ convolutional layer:

$$F_{magnified} = \mathrm{Magnification}\left(F_{enhanced}\right), \tag{8}$$

$$F_{conv1} = \mathrm{Conv}_{1\times 1}\left(F_{magnified}\right), \tag{9}$$

followed by a multi-head attention mechanism and a convolutional block consisting of two $1 \times 1$ convolutions:

$$F_{attn} = \mathrm{MultiHeadAttention}\left(F_{conv1}\right), \tag{10}$$

$$F_{conv2} = \mathrm{Conv}_{1\times 1}\left(F_{attn}\right), \tag{11}$$

$$F_{conv3} = \mathrm{Conv}_{1\times 1}\left(F_{conv2}\right). \tag{12}$$

The reconstructed features are then combined with the bilinearly interpolated LR input to produce the final output:

$$I_{SR} = F_{conv3} + I_{LR}^{up}, \tag{13}$$

where $I_{LR}^{up}$ denotes the bilinearly interpolated LR input.
In summary, the MambaSR network effectively enhances and reconstructs high-resolution images through its three-part architecture, leveraging the innovative RFFTSSU to improve feature representation and contextual information integration.
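As a minimal PyTorch sketch of the three-part pipeline in Equations (5)–(13), not the authors' implementation, the module below composes a generic Feature Representor, two RFFTSSUs, and a reconstructor built from the magnification step, $1 \times 1$ convolutions, multi-head attention, and the bilinear skip connection. The representor, rfftssu, and magnification arguments are placeholders standing in for EDSR/RDN, the unit of Section 3.3, and the feature magnification of [25]; the channel count, head count, and the choice to attend over all spatial positions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MambaSRSketch(nn.Module):
    """Sketch of the MambaSR pipeline: representor -> enhancer -> reconstructor."""

    def __init__(self, representor, rfftssu1, rfftssu2, magnification, channels=64, heads=8):
        super().__init__()
        self.representor = representor          # e.g., an EDSR/RDN backbone (placeholder)
        self.rfftssu1, self.rfftssu2 = rfftssu1, rfftssu2
        self.magnification = magnification      # arbitrary-scale feature magnification (placeholder)
        self.conv1 = nn.Conv2d(channels, channels, 1)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.conv2 = nn.Conv2d(channels, channels, 1)
        self.conv3 = nn.Conv2d(channels, 3, 1)  # back to RGB

    def forward(self, lr, out_size):
        f0 = self.representor(lr)                               # Eq. (5)
        fe = self.rfftssu2(self.rfftssu1(f0))                   # Eq. (6)
        fm = self.magnification(fe, out_size)                   # Eq. (8): features at target size
        x = self.conv1(fm)                                      # Eq. (9)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)                   # (B, H*W, C) for attention
        tokens, _ = self.attn(tokens, tokens, tokens)           # Eq. (10)
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        x = self.conv3(self.conv2(x))                           # Eqs. (11)-(12)
        lr_up = F.interpolate(lr, size=out_size, mode='bilinear', align_corners=False)
        return x + lr_up                                        # Eq. (13)
```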

3.3. Residual Fast Fourier Transform State-Space Block (RFFTSSB)

As shown in Figure 2, the Residual Fast Fourier Transform State-Space Block (RFFTSSB) is a critical component of the Feature Enhancer in our MambaSR network. It is designed to enhance features by leveraging both spatial and frequency domain information. The RFFTSSB consists of two main parts: the Vision State-Space Module (VSSM) and the Fast Fourier Transform Convolutional Block (FFTConv).
The input features first pass through a Layer Normalization layer, followed by the VSSM [32], which extracts spatial long-term dependencies. The output of the VSSM is added element-wise to the input features scaled by a learnable factor. The resulting features are then passed through another Layer Normalization layer and processed by the FFTConv. Finally, the output of the FFTConv is added element-wise to the scaled intermediate features to form the second residual connection. This process can be summarized as follows:
$$V^{n} = F_{vssm}\left(\mathrm{LayerNorm}\left(F_R^{n}\right)\right) + \alpha F_R^{n}, \tag{14}$$

$$F_R^{n+1} = F_{fftconv}\left(F_{fftconv}\left(\mathrm{LayerNorm}\left(V^{n}\right)\right)\right) + \beta V^{n}, \tag{15}$$

where $F_R^{n}, V^{n} \in \mathbb{R}^{H \times W \times C}$ are the input and intermediate features at the $n$-th layer, and $F_{vssm}$ and $F_{fftconv}$ denote the operations of the VSSM and the FFTConv, respectively. The terms $\alpha, \beta \in \mathbb{R}^{C}$ are learnable parameters that scale the features to modulate the importance of the residual connections, and $n$ ranges from 0 to 5.
The variables $H$, $W$, and $C$ represent the height, width, and number of channels of the feature map, respectively. The LayerNorm function denotes the Layer Normalization operation, which normalizes the input features across the channel dimension to stabilize and accelerate training. The element-wise additions implement the residual connections, which help preserve the original features while enhancing them with the extracted spatial and frequency domain information.
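A minimal PyTorch sketch of the residual structure in Equations (14) and (15) follows; the VSSM and FFTConv submodules are passed in as placeholders (see Sections 3.4 and 3.6), the channel-last layout and per-channel scales are assumptions, and the FFTConv is applied twice only because Equation (15) is written that way.

```python
import torch
import torch.nn as nn

class RFFTSSBSketch(nn.Module):
    """Residual Fast Fourier Transform State-Space Block, Eqs. (14)-(15)."""

    def __init__(self, vssm, fftconv, channels=64):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)
        self.vssm = vssm            # Vision State-Space Module (placeholder)
        self.fftconv = fftconv      # FFTConv block (placeholder)
        self.alpha = nn.Parameter(torch.ones(channels))   # per-channel residual scale alpha
        self.beta = nn.Parameter(torch.ones(channels))    # per-channel residual scale beta

    def forward(self, x):
        # x: (B, H, W, C), channel-last so LayerNorm normalizes over C.
        v = self.vssm(self.norm1(x)) + self.alpha * x                       # Eq. (14)
        out = self.fftconv(self.fftconv(self.norm2(v))) + self.beta * v     # Eq. (15)
        return out
```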

3.4. Vision State-Space Module (VSSM)

The Vision State-Space Module (VSSM) captures spatial dependencies and consists of several layers. It is designed to extract and enhance spatial long-term dependencies within the input features through a series of transformations and operations.
The input features $F_{input}$ are first passed through a Linear layer, transforming the feature dimensions. This is followed by a Depthwise Convolution (DWConv) layer, which performs spatial filtering to capture local spatial information. The output of the DWConv layer is then processed by a SiLU activation function, which introduces non-linearity and helps in better feature representation. Next, the features are passed through a 2D Selective Scan (2D-SSM) module, which selectively scans and aggregates spatial information from the feature maps.
After the 2D-SSM, the output is normalized using a Layer Normalization layer to ensure stable training and better convergence. The normalized output is then modulated, via the Hadamard product, by a gating branch obtained from a linear transformation of the original input features followed by a SiLU activation. Finally, the combined features are processed by another Linear layer to produce the output of the VSSM. This process can be summarized with the following equations:
$$F_{v1} = \mathrm{LayerNorm}\left(\text{2D-SSM}\left(\mathrm{SiLU}\left(\mathrm{DWConv}\left(\mathrm{Linear}_1(F_{input})\right)\right)\right)\right), \tag{16}$$

$$F_{v2} = \mathrm{SiLU}\left(\mathrm{Linear}_2(F_{input})\right), \tag{17}$$

$$F_{output} = \mathrm{Linear}_3\left(F_{v1} \odot F_{v2}\right), \tag{18}$$

where $F_{input} \in \mathbb{R}^{H \times W \times C}$ represents the input feature map, $\mathrm{Linear}_1$, $\mathrm{Linear}_2$, and $\mathrm{Linear}_3$ denote the linear transformations applied at different stages of the module, $\mathrm{DWConv}$ represents the Depthwise Convolution operation, and $\odot$ is the Hadamard product.
Equation (16) combines several transformations into a single operation that enhances the features by integrating spatial information, while Equation (17) produces a gating branch directly from the input. Their element-wise product helps preserve the original feature content while enhancing it, and Equation (18) applies the final linear transformation that refines the combined features to produce the output of the VSSM, ready for further processing in the network.
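The two-branch gated structure of Equations (16)–(18) can be sketched in PyTorch as below; the 2D-SSM is passed in as a placeholder (see Section 3.5), and the expansion ratio and depthwise kernel size are illustrative assumptions rather than the settings used in MambaSR.

```python
import torch
import torch.nn as nn

class VSSMSketch(nn.Module):
    """Vision State-Space Module, Eqs. (16)-(18): a gated two-branch block."""

    def __init__(self, ssm_2d, channels=64, expand=2, dw_kernel=3):
        super().__init__()
        hidden = channels * expand
        self.linear1 = nn.Linear(channels, hidden)
        self.linear2 = nn.Linear(channels, hidden)
        self.linear3 = nn.Linear(hidden, channels)
        self.dwconv = nn.Conv2d(hidden, hidden, dw_kernel, padding=dw_kernel // 2, groups=hidden)
        self.act = nn.SiLU()
        self.norm = nn.LayerNorm(hidden)
        self.ssm_2d = ssm_2d        # 2D Selective Scan module (placeholder)

    def forward(self, x):
        # x: (B, H, W, C), channel-last.
        main = self.linear1(x).permute(0, 3, 1, 2)          # to (B, hidden, H, W) for DWConv
        main = self.act(self.dwconv(main)).permute(0, 2, 3, 1)
        main = self.norm(self.ssm_2d(main))                 # Eq. (16)
        gate = self.act(self.linear2(x))                    # Eq. (17)
        return self.linear3(main * gate)                    # Eq. (18): Hadamard product, then Linear
```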

3.5. 2D Selective Scan Module

Mamba [23] (the selective scan state-space sequence model, S6) processes input data sequentially, so each position can only draw on information from the portion of the data that has already been scanned. Although this suits natural language processing tasks due to their inherent sequential structure, it faces significant challenges when dealing with non-sequential data such as images. To address this issue, we implement the 2D Selective Scan module (2D-SSM) as proposed in [32]. The 2D-SSM is based on the selective scan state-space sequence model (S6) and addresses the direction sensitivity that was identified as a limitation of the S6 model.
As illustrated in Figure 3, this module transforms 2D image features into a 1D sequence by scanning in four different directions: from top-left to bottom-right, bottom-right to top-left, top-right to bottom-left, and bottom-left to top-right. The S6 block is employed to extract features from all sequences, facilitating comprehensive scanning of information from diverse directions. These sequences are eventually merged through summation and reshaped to reconstruct the original 2D structure, allowing a comprehensive representation of the spatial data.
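The sketch below illustrates this cross-scan idea: it unfolds a feature map into the four directional sequences, runs a 1D sequence model over each (here a placeholder standing in for the S6 block), maps every output back to its original spatial layout, and merges them by summation. The exact traversal orders used in [32] may differ in detail, so this is an illustration under stated assumptions rather than a reference implementation.

```python
import torch

def cross_scan_merge(x, seq_model):
    """x: (B, C, H, W). seq_model: callable on (B, C, L) sequences (stand-in for the S6 block)."""
    B, C, H, W = x.shape
    row = x.flatten(2)                              # row-major traversal (e.g., top-left to bottom-right)
    col = x.transpose(2, 3).flatten(2)              # column-major traversal
    seqs = [row, col, row.flip(-1), col.flip(-1)]   # the two traversals and their reverses

    outs = [seq_model(s) for s in seqs]             # scan each 1D sequence

    # Undo each traversal so all outputs are back in (B, C, H, W) layout, then merge by summation.
    merged = (
        outs[0].reshape(B, C, H, W)
        + outs[1].reshape(B, C, W, H).transpose(2, 3)
        + outs[2].flip(-1).reshape(B, C, H, W)
        + outs[3].flip(-1).reshape(B, C, W, H).transpose(2, 3)
    )
    return merged

# Usage with an identity stand-in for the sequence model: each position is counted four times.
feat = torch.randn(1, 8, 4, 4)
print(torch.allclose(cross_scan_merge(feat, lambda s: s), 4 * feat))  # True
```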

3.6. FFTConv Block

In Figure 2, the FFTConv block can be divided into the frequency branch and the spatial branch. The FFTConv block enhances the features by transforming them into the frequency domain, applying convolutions, and then transforming them back into the spatial domain. The process begins with the input features being passed through a convolutional layer followed by a ReLU activation function:
$$F_{conv\_relu1} = \mathrm{ReLU}\left(\mathrm{Conv}_{1\times 1}\left(F_{input}\right)\right). \tag{19}$$
The activated features are then transformed into the frequency domain using the Real FFT2D:

$$F_{fft} = \mathrm{RealFFT2D}\left(F_{conv\_relu1}\right). \tag{20}$$

The Real 2D Fast Fourier Transform (Real FFT2D) converts spatial domain features into the frequency domain. For a 2D signal $x(m, n)$, the Real FFT2D is defined as:

$$X(k, l) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x(m, n)\, e^{-j 2\pi \left(\frac{km}{M} + \frac{ln}{N}\right)}, \tag{21}$$

where $X(k, l)$ represents the frequency domain representation of $x(m, n)$, and $M$ and $N$ are the dimensions of the input signal.
In the frequency domain, the features are processed through another convolution and then activated by a ReLU function:
$$F_{conv\_relu2} = \mathrm{ReLU}\left(\mathrm{Conv}_{1\times 1}\left(F_{fft}\right)\right). \tag{22}$$

The convolved features are then transformed back to the spatial domain using the Inverse Real FFT2D:

$$F_{ifft} = \mathrm{InvRealFFT2D}\left(F_{conv\_relu2}\right). \tag{23}$$

The Inverse Real 2D Fast Fourier Transform (Inv Real FFT2D) converts frequency domain features back into the spatial domain. The inverse transform is defined as:

$$x(m, n) = \frac{1}{MN} \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} X(k, l)\, e^{j 2\pi \left(\frac{km}{M} + \frac{ln}{N}\right)}, \tag{24}$$

where $x(m, n)$ represents the reconstructed spatial domain signal, and $X(k, l)$ is the frequency domain representation.
The spatial features are further processed by a convolutional layer:
$$F_{conv2} = \mathrm{Conv}_{1\times 1}\left(F_{ifft}\right). \tag{25}$$

Therefore, the output of the frequency branch can be formulated as:

$$F_{fre} = \mathrm{Conv}_{1\times 1}\left(F_{conv2} + F_{conv\_relu2}\right). \tag{26}$$

At the same time, the input data are fed into the spatial branch, which can be expressed as follows:

$$F_{conv\_spa1} = \mathrm{Conv}_{3\times 3}\left(\mathrm{ReLU}\left(\mathrm{Conv}_{3\times 3}\left(F_{input}\right)\right)\right). \tag{27}$$

These features are then refined using an ECA (Efficient Channel Attention) layer [43], which models local cross-channel interactions with a one-dimensional convolution to extract inter-channel dependencies:

$$F_{spa} = \mathrm{ECALayer}\left(F_{conv\_spa1}\right). \tag{28}$$

Eventually, the outputs of the frequency domain branch and the spatial domain branch are combined, and a convolutional layer is employed to integrate them:

$$F_{fftconv\_out} = \mathrm{Conv}_{1\times 1}\left(F_{spa} + F_{fre}\right). \tag{29}$$
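The two branches of Equations (19)–(29) can be sketched in PyTorch as follows. Following common FFC-style implementations, the complex spectrum from torch.fft.rfft2 is handled by stacking its real and imaginary parts along the channel dimension before the $1 \times 1$ convolution, the intra-branch skip connection of Equation (26) is placed after the inverse transform so that both operands live in the spatial domain, and the ECA layer is a minimal stand-in for [43]; all of these are implementation assumptions rather than details stated in the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ECASketch(nn.Module):
    """Minimal Efficient Channel Attention: 1D conv over the channel descriptor."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)

    def forward(self, x):                                   # x: (B, C, H, W)
        w = F.adaptive_avg_pool2d(x, 1).squeeze(-1).transpose(1, 2)     # (B, 1, C)
        w = torch.sigmoid(self.conv(w)).transpose(1, 2).unsqueeze(-1)   # (B, C, 1, 1)
        return x * w

class FFTConvSketch(nn.Module):
    """FFTConv block: frequency branch (Eqs. 19-26) + spatial branch (Eqs. 27-28), fused by Eq. (29)."""
    def __init__(self, channels=64):
        super().__init__()
        self.freq_in = nn.Conv2d(channels, channels, 1)
        self.freq_mid = nn.Conv2d(2 * channels, 2 * channels, 1)   # acts on stacked real/imag parts
        self.freq_post = nn.Conv2d(channels, channels, 1)
        self.freq_out = nn.Conv2d(channels, channels, 1)
        self.spa1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.spa2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.eca = ECASketch()
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                                   # x: (B, C, H, W)
        # Frequency branch.
        r1 = F.relu(self.freq_in(x))                        # Eq. (19)
        spec = torch.fft.rfft2(r1, norm='ortho')            # Eq. (20)
        z = torch.cat([spec.real, spec.imag], dim=1)        # complex spectrum -> 2C real channels
        z = F.relu(self.freq_mid(z))                        # Eq. (22)
        half = z.shape[1] // 2
        spec2 = torch.complex(z[:, :half], z[:, half:])
        back = torch.fft.irfft2(spec2, s=r1.shape[-2:], norm='ortho')   # Eq. (23)
        fre = self.freq_out(self.freq_post(back) + back)    # Eqs. (25)-(26), skip kept in spatial domain
        # Spatial branch.
        spa = self.eca(self.spa2(F.relu(self.spa1(x))))     # Eqs. (27)-(28)
        return self.fuse(spa + fre)                         # Eq. (29)
```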

4. Experiment

In this section, the performance of the model is evaluated through extensive experiments at continuous scales on four SR benchmark datasets, in comparison with other arbitrary-scale SR methods. The experimental setup, datasets, and evaluation metrics are described, and ablation experiments are conducted to verify the effectiveness of the proposed components.

4.1. Experimental Setup

Following the setup in previous works [19,24], a batch size of 64 and a low-resolution input size of 48 × 48 were employed during training. To augment the training dataset, random rotation and horizontal flipping were adopted. The Adam optimizer [44] was used with a learning rate of $4 \times 10^{-5}$ and the L1 loss function. All models were trained for 1000 epochs, with the learning rate decaying according to a cosine annealing schedule after a 50-epoch warm-up phase. For a fair comparison, we reproduced SRNO with the same 48 × 48 input size.
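For concreteness, a sketch of the described optimization setup (Adam with an L1 loss, a 50-epoch linear warm-up, then cosine annealing over the remaining 950 epochs) using standard PyTorch schedulers is shown below; the warm-up start factor and the model object are placeholders and assumptions.

```python
import torch
import torch.nn as nn

def build_training(model, epochs=1000, warmup_epochs=50, lr=4e-5):
    criterion = nn.L1Loss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # Linear warm-up for the first 50 epochs, then cosine annealing for the rest.
    warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.01, total_iters=warmup_epochs)
    cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs - warmup_epochs)
    scheduler = torch.optim.lr_scheduler.SequentialLR(
        optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])
    return criterion, optimizer, scheduler

# Typical usage: after each training epoch, call scheduler.step().
```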

4.2. Dataset and Evaluation Metrics

The images used for training were obtained from the DIV2K dataset [45], which includes 1000 images at 2K resolution, as described in [25]. Additionally, the performance of the model was evaluated on the validation sets Set5 [46], Set14 [47], Urban100 [48], and Manga109 [49] at continuous scales, using the peak signal-to-noise ratio (PSNR) and the structural similarity index measurement (SSIM) as metrics.
The PSNR was employed to quantify the quality between the super-resolution images and their original high-resolution images. It is commonly used in the field of image super-resolution and compression. PSNR is defined as:
$$\mathrm{PSNR} = 10 \cdot \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right), \tag{30}$$

where MAX represents the maximum possible pixel value of the image. For example, in an 8-bit image, MAX is 255. The term MSE stands for Mean Squared Error, which is calculated as:

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i, j) - K(i, j) \right]^2. \tag{31}$$

In this formula, $m$ and $n$ are the dimensions of the image, $I(i, j)$ is the pixel value at position $(i, j)$ in the original image, and $K(i, j)$ is the pixel value at position $(i, j)$ in the reconstructed image.
When the images are perfectly matched (i.e., $\mathrm{MSE} = 0$), the reconstructed image is exactly the same as the original image. In this case, the PSNR tends to infinity, because the ratio $\mathrm{MAX}^2/\mathrm{MSE}$, and hence its logarithm, grows without bound as the MSE tends to zero. Therefore, theoretically, the PSNR can reach infinity.
When the MSE is large, the difference between the images is significant, and the PSNR value becomes very low. When the image difference reaches its maximum (i.e., the reconstructed image and the original image are completely uncorrelated), the MSE reaches its maximum value (for an 8-bit image, the maximum possible value is $255^2$), and the PSNR value approaches 0.
Consequently, the MSE in Equation (31) ranges from 0 to $255^2$, and the PSNR in Equation (30) ranges from 0 to infinity. A higher PSNR value generally indicates better quality, as it implies that the reconstructed image is closer to the original.
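A direct NumPy implementation of Equations (30) and (31) for images with a known peak value is shown below; returning infinity when the MSE is zero follows the discussion above.

```python
import numpy as np

def psnr(original, reconstructed, max_val=255.0):
    """PSNR between two images of equal shape, Eqs. (30)-(31)."""
    original = original.astype(np.float64)
    reconstructed = reconstructed.astype(np.float64)
    mse = np.mean((original - reconstructed) ** 2)          # Eq. (31)
    if mse == 0:
        return float('inf')                                 # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)              # Eq. (30)
```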
The structural similarity index measurement (SSIM) is another important metric used to evaluate the quality of images. Unlike the PSNR, which primarily focuses on pixel differences, the SSIM considers changes in structural information and perceptual quality. The SSIM is computed by combining three comparison measurements between the images: luminance, contrast, and structure.
First, the luminance comparison function is defined as:
$$l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \tag{32}$$

where $\mu_x$ and $\mu_y$ are the average pixel values of images $x$ and $y$, respectively. The constant $C_1$ is introduced to avoid instability when the denominator is very close to zero. This ratio ranges between 0 and 1, and when $\mu_x = \mu_y$, the luminance similarity reaches its maximum value of 1.
Next, the contrast comparison function is given by:
$$c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \tag{33}$$

where $\sigma_x$ and $\sigma_y$ represent the standard deviations of $x$ and $y$. Similar to $C_1$, the constant $C_2$ stabilizes the division. This ratio also ranges between 0 and 1, and when $\sigma_x = \sigma_y$, the contrast similarity reaches its maximum value of 1.
Finally, the structure comparison function is expressed as:
$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}, \tag{34}$$

where $\sigma_{xy}$ denotes the covariance between $x$ and $y$. The constant $C_3$ is typically chosen as $C_2/2$ for simplicity. This ratio similarly ranges between 0 and 1, and when $\sigma_{xy} = \sigma_x \sigma_y$, the structure similarity reaches its maximum value of 1.
Combining these three components, the overall SSIM index is calculated as:
$$\mathrm{SSIM}(x, y) = \left[l(x, y)\right]^{\alpha} \cdot \left[c(x, y)\right]^{\beta} \cdot \left[s(x, y)\right]^{\gamma}, \tag{35}$$

where $\alpha$, $\beta$, and $\gamma$ are parameters used to adjust the relative importance of each component. Commonly, $\alpha = \beta = \gamma = 1$, which simplifies the SSIM index to:

$$\mathrm{SSIM}(x, y) = \frac{\left(2\mu_x \mu_y + C_1\right)\left(2\sigma_{xy} + C_2\right)}{\left(\mu_x^2 + \mu_y^2 + C_1\right)\left(\sigma_x^2 + \sigma_y^2 + C_2\right)}. \tag{36}$$

The constants $C_1$ and $C_2$ are defined as:

$$C_1 = (K_1 L)^2 \quad \text{and} \quad C_2 = (K_2 L)^2, \tag{37}$$

where $L$ is the dynamic range of the pixel values (for an 8-bit image, $L = 255$), and $K_1$ and $K_2$ are small constants (typically, $K_1 = 0.01$ and $K_2 = 0.03$). The SSIM index ranges from 0 to 1, where a value of 1 indicates perfect structural similarity.
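A global-statistics implementation of the simplified SSIM of Equation (36), with the constants of Equation (37), is sketched below; note that standard SSIM implementations compute these statistics in local (typically Gaussian-weighted) windows and average the resulting map, which this sketch omits for brevity.

```python
import numpy as np

def ssim_global(x, y, L=255.0, K1=0.01, K2=0.03):
    """Simplified SSIM with global image statistics, Eqs. (36)-(37)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2                   # Eq. (37)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / (
        (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))  # Eq. (36)
```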

4.3. Results

In this section, the performance of the proposed MambaSR model is compared with advanced arbitrary-scale super-resolution (SR) methods, such as MetaSR, LIIF, LTE, and SRNO. Each method was evaluated on the Urban100, Manga109, Set5, and Set14 datasets. During the training process, the MambaSR model was trained on low-resolution (LR) datasets with various scale factors. As shown in Table 1, Table 2, Table 3 and Table 4, the performance of each method was quantitatively evaluated using PSNR and SSIM metrics with various scale factors. The experiments were conducted using two different encoders: EDSR and RDN.
To further illustrate the effectiveness of our proposed MambaSR model, the visual results of the super-resolution images reconstructed by different methods are presented in Figure 4, Figure 5, Figure 6, Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11. The red square in the figure represents the position of the details that have been cropped from the super-resolution image. The best results have been marked in red and the upward arrow in the figure indicates that higher values correspond to better quality. Figure 4 and Figure 5 show the visual results of MambaSR and other models based on EDSR for the Urban100 dataset, where MambaSR demonstrates superior detail preservation and texture reconstruction in urban scenes, capturing intricate structural elements more effectively in images like “img004.png” and “img015.png”. Figure 6 presents the visual results for the Manga109 dataset using EDSR, where MambaSR excels in maintaining line clarity and edge sharpness in manga illustrations, as seen in “img011.png”, avoiding the blurring common with other methods.
Figure 7 displays results from the Set14 dataset using EDSR, with the “barbara.png” image highlighting MambaSR’s robustness in handling natural images by effectively reconstructing fine textures and minimizing artifacts. In Figure 8, showing the results for the Set5 dataset using EDSR, the “woman.png” image illustrates MambaSR’s ability to maintain high visual quality and sharpness across different visual contexts. Figure 9 provides visual results for the Urban100 dataset using RDN, where MambaSR continues to show its superiority in urban scenes, preserving fine details and structural elements better than other models in “img096.png”. Figure 10 demonstrates the model’s performance on the Manga109 dataset using RDN, with MambaSR maintaining line clarity and edge sharpness in manga illustrations like “img060.png”, outperforming other models. Finally, Figure 11 presents the results for the Set14 dataset using RDN, where MambaSR delivers superior visual quality in the “zebra.png” image, providing clear textures and minimizing artifacts more effectively than other methods. The presented visual results demonstrate MambaSR’s capacity to reconstruct high-quality super-resolution images across a range of datasets and contexts.

4.4. Ablation Study

To evaluate the impact of the FFTConv block in our proposed MambaSR model, we conducted ablation experiments, as shown in Table 5. The experiments were conducted on four benchmark datasets: Set5, Set14, Urban100, and Manga109. We compared the performance of the full MambaSR model with a variant where the FFTConv block was removed. The performance metrics used for evaluation were the PSNR and SSIM. The inclusion of the FFTConv block consistently improved the performance across all datasets. Specifically, the PSNR increased by 0.10 dB on Set5, 0.04 dB on Set14, 0.11 dB on Urban100, and 0.22 dB on Manga109. Similarly, the SSIM saw improvements of 0.0010, 0.0011, 0.0032, and 0.0019 respectively. These results underscore the importance of the FFTConv block in our architecture, validating its role in achieving superior super-resolution performance by effectively leveraging both spatial and frequency domain information.

5. Discussion

The results presented in this paper highlight the significant progress that has been made by MambaSR in the field of arbitrary-scale super-resolution (ASSR). The Mamba state-space model and Fast Fourier Convolution (FFC) blocks are utilized in MambaSR to effectively address several inherent limitations of traditional SISR methods. The integration of Mamba facilitates the extraction of long-range dependencies, which are important for preserving intricate details and textures across varying scales. Moreover, the FFC blocks adeptly handle global frequency domain information, enhancing the overall reconstruction quality.
One of the standout features of MambaSR is its ability to perform well across different datasets, including Urban100 and Manga109, where it demonstrates a clear superiority over existing methods such as MetaSR and LIIF. The performance gains, particularly the improvement in PSNR and SSIM values, highlight the model’s robustness. These improvements can be attributed to the innovative combination of spatial and frequency domain processing, which allows MambaSR to maintain a high quality in image reconstruction regardless of the scale factor.
Moreover, the proposed Residual Fast Fourier Transform State-Space Block (RFFTSSB) plays a pivotal role in enhancing feature representation by seamlessly integrating spatial and frequency domain information. This dual-domain approach ensures that the enhanced features retain critical contextual information, leading to superior visual and quantitative results. The ablation studies further validate the effectiveness of the RFFTSSB, confirming its contribution to the overall performance of MambaSR.
Additionally, a comparative analysis of Normalized Cross Correlation (NCC) and Normalized Absolute Error (NAE) across four datasets (Set5, Set14, Urban100, and Manga109) at scaling factors of 2, 3, and 4 revealed the superior performance of MambaSR. As highlighted in Table 6, MambaSR consistently achieved the best performance metrics. NCC values ranged from −1 to 1, where higher values indicate better similarity between the super-resolved and original images. NAE values ranged from 0 to infinity, where lower values indicate smaller differences between the super-resolved and original images. Notably, MambaSR’s ability to achieve higher NCC and lower NAE across all tested datasets and scaling factors underscores its robustness and efficacy. These results further substantiate the model’s advantage in maintaining a high reconstruction quality and accurate feature representation across various conditions.
Furthermore, in a comprehensive performance evaluation among different super-resolution models on the Urban100 dataset using identical hardware configurations, as shown in Table 7, the EDSR-MambaSR model demonstrated superior efficacy. The experiments were conducted on a server equipped with an NVIDIA V100 GPU with 32 GB memory, 640 GB RAM, and 80 CPU cores. In particular, the EDSR-MambaSR method demonstrated the highest PSNR at 26.90 dB, indicating a markedly superior reconstruction quality in comparison to the other models. Although its runtime of 40.57 s was not the shortest, the model effectively balanced high-quality output with reasonable computational demands, surpassing other models like EDSR-LIIF and EDSR-LTE, which exhibited longer runtimes and lower PSNR values.
In particular, the performance comparison of different super-resolution models on the Urban100 dataset, processed with Gaussian blur (kernel size 5 × 5, standard deviation 0.5), Gaussian noise (standard deviation 0.08), and bicubic downsampling at a scaling factor of 4, indicates that MambaSR consistently outperformed other models shown in Table 8. It achieved the highest PSNR and SSIM values across all scaling factors, demonstrating superior performance in image super-resolution under challenging conditions. This emphasizes the robustness and effectiveness of the MambaSR model in handling degraded image inputs, which more closely resemble real-world images.

6. Conclusions

In this paper, we introduce MambaSR, a pioneering approach to arbitrary-scale super-resolution leveraging the innovative Mamba state-space model combined with Fast Fourier Convolution Blocks (FFTConv). MambaSR addresses the challenges of traditional SISR methods by enabling flexible and continuous scaling factors, which provide a more versatile solution for real-world applications. The core innovation lies in Mamba’s ability to dynamically represent features and capture long-range dependencies through efficient parallel processing.
Our extensive experiments on benchmark datasets such as Set5, Set14, Urban100, and Manga109 validated the superior performance of MambaSR over existing advanced methods. Specifically, MambaSR demonstrated a notable PSNR improvement of 0.93 dB and an SSIM enhancement of 0.0203 on the Urban100 dataset compared to MetaSR. On the Manga109 dataset, MambaSR achieved an average PSNR increase of 1.00 dB and an SSIM improvement of 0.0093, underscoring its effectiveness in producing high-quality super-resolved images.
The integration of the FFTConv block further enhances MambaSR’s capability by effectively combining spatial and frequency domain information, resulting in improved feature representation and image reconstruction quality. This study not only showcases the potential of Mamba in advancing the field of arbitrary-scale super-resolution but also sets the stage for future research to optimize and extend the application of the MambaSR architecture across diverse domains.
Future work will focus on refining the MambaSR architecture to further improve its efficiency and exploring its application in other areas, such as medical imaging and video surveillance, where high-quality image reconstruction is critical.

Author Contributions

Conceptualization, J.Y. and Z.C.; methodology, J.Y. and Z.P.; writing—original draft: J.Y.; formal analysis, Z.P.; software: J.Y.; writing—review and editing: J.Y., Z.C., Z.P., X.L. and H.Z.; supervision, X.L.; funding acquisition, X.L. and H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Science and Technology Development Fund, Macau SAR (No. 0096/2022/A), Basic and Applied Basic Research Foundation of Guangdong (No. 2024A1515011822), Scientific Computing Research Innovation Team of Guangdong Province (No. 2021KCXTD052), Guangdong Key Construction Discipline Research Capacity Enhancement Project (No. 2022ZDJS049) and Technology Planning Project of Shaoguan (No. 230330108034184).

Data Availability Statement

The MambaSR model is available at https://github.com/ttys0001/mambasr.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hijji, M.; Khan, A.; Alwakeel, M.M.; Harrabi, R.; Aradah, F.; Cheikh, F.A.; Sajjad, M.; Muhammad, K. Intelligent Image Super-Resolution for Vehicle License Plate in Surveillance Applications. Mathematics 2023, 11, 892. [Google Scholar] [CrossRef]
  2. Kim, M.H.; Yoo, S.B. Memory-Efficient Discrete Cosine Transform Domain Weight Modulation Transformer for Arbitrary-Scale Super-Resolution. Mathematics 2023, 11, 3954. [Google Scholar] [CrossRef]
  3. Singh, N.; Rathore, S.S.; Kumar, S. Towards a super-resolution based approach for improved face recognition in low resolution environment. Multimed. Tools Appl. 2022, 81, 38887–38919. [Google Scholar] [CrossRef] [PubMed]
  4. Zhu, D.; Qiu, D. Residual dense network for medical magnetic resonance images super-resolution. Comput. Methods Programs Biomed. 2021, 209, 106330. [Google Scholar] [CrossRef] [PubMed]
  5. Zhao, X.; Zhang, Y.; Zhang, T.; Zou, X. Channel splitting network for single MR image super-resolution. IEEE Trans. Image Process. 2019, 28, 5649–5662. [Google Scholar] [CrossRef]
  6. Lu, T.; Wang, J.; Zhang, Y.; Wang, Z.; Jiang, J. Satellite image super-resolution via multi-scale residual deep neural network. Remote Sens. 2019, 11, 1588. [Google Scholar] [CrossRef]
  7. Lucas, A.; Lopez-Tapia, S.; Molina, R.; Katsaggelos, A.K. Generative adversarial networks and perceptual losses for video super-resolution. IEEE Trans. Image Process. 2019, 28, 3312–3327. [Google Scholar] [CrossRef] [PubMed]
  8. Yang, C.Y.; Ma, C.; Yang, M.H. Single-image super-resolution: A benchmark. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part IV 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 372–386. [Google Scholar]
  9. Irani, M.; Peleg, S. Improving resolution by image registration. CVGIP Graph. Model. Image Process. 1991, 53, 231–239. [Google Scholar] [CrossRef]
  10. Fattal, R. Image upsampling via imposed edge statistics. In ACM SIGGRAPH 2007 Papers; Association for Computing Machinery: New York, NY, USA, 2007; pp. 95–es. [Google Scholar]
  11. Huang, J.; Mumford, D. Statistics of natural images and models. In Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), IEEE, Fort Collins, CO, USA, 23–25 June 1999; Volume 1, pp. 541–547. [Google Scholar]
  12. Sirota, A.; Ivankov, A. Block algorithms of image processing based on kalman filter for superresolution reconstruction. Comput. Opt. 2014, 38, 118–126. [Google Scholar] [CrossRef]
  13. Freeman, W.T.; Jones, T.R.; Pasztor, E.C. Example-based super-resolution. IEEE Comput. Graph. Appl. 2002, 22, 56–65. [Google Scholar] [CrossRef]
  14. Chang, H.; Yeung, D.Y.; Xiong, Y. Super-resolution through neighbor embedding. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2004, IEEE, Washington, DC, USA, 27 June–2 July 2004; Volume 1, p. 1. [Google Scholar]
  15. Yang, J.; Lin, Z.; Cohen, S. Fast image super-resolution based on in-place example regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 1059–1066. [Google Scholar]
  16. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 295–307. [Google Scholar] [CrossRef] [PubMed]
  17. Zhang, Y.; Huang, Y.; Wang, K.; Qi, G.; Zhu, J. Single image super-resolution reconstruction with preservation of structure and texture details. Mathematics 2023, 11, 216. [Google Scholar] [CrossRef]
  18. Hu, X.; Mu, H.; Zhang, X.; Wang, Z.; Tan, T.; Sun, J. Meta-SR: A magnification-arbitrary network for super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1575–1584. [Google Scholar]
  19. Chen, Y.; Liu, S.; Wang, X. Learning continuous image representation with local implicit image function. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8628–8638. [Google Scholar]
  20. Yue, Y.; Li, Z. Medmamba: Vision mamba for medical image classification. arXiv 2024, arXiv:2403.03849. [Google Scholar]
  21. Gu, A.; Goel, K.; Ré, C. Efficiently modeling long sequences with structured state spaces. arXiv 2021, arXiv:2111.00396. [Google Scholar]
  22. Gu, A.; Johnson, I.; Goel, K.; Saab, K.; Dao, T.; Rudra, A.; Ré, C. Combining recurrent, convolutional, and continuous-time models with linear state space layers. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2021; Volume 34, pp. 572–585. [Google Scholar]
  23. Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752. [Google Scholar]
  24. Lee, J.; Jin, K.H. Local texture estimator for implicit representation function. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1929–1938. [Google Scholar]
  25. Wei, M.; Zhang, X. Super-resolution neural operator. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18247–18256. [Google Scholar]
  26. Skopec, P.; Vyhlídal, T.; Knobloch, J. Development of a continuous reheating furnace state-space model based on the finite volume method. Appl. Therm. Eng. 2024, 246, 122888. [Google Scholar] [CrossRef]
  27. Borowska, A.; King, R. Semi-complete data augmentation for efficient state space model fitting. J. Comput. Graph. Stat. 2023, 32, 19–35. [Google Scholar] [CrossRef]
  28. Qiu, L.; Fan, S.; Liao, S.; Sun, P.; Wei, X. State space modelling development of Micro-High-Temperature Gas-Cooled reactor with helium Brayton cycle. Ann. Nucl. Energy 2024, 197, 110284. [Google Scholar] [CrossRef]
  29. Yeganeh, A.; Johannssen, A.; Chukhrova, N.; Rasouli, M. Monitoring multistage healthcare processes using state space models and a machine learning based framework. Artif. Intell. Med. 2024, 151, 102826. [Google Scholar] [CrossRef]
  30. Smith, J.T.; Warrington, A.; Linderman, S.W. Simplified state space layers for sequence modeling. arXiv 2022, arXiv:2208.04933. [Google Scholar]
  31. Mehta, H.; Gupta, A.; Cutkosky, A.; Neyshabur, B. Long range language modeling via gated state spaces. arXiv 2022, arXiv:2206.13947. [Google Scholar]
  32. Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Liu, Y. Vmamba: Visual state space model. arXiv 2024, arXiv:2401.10166. [Google Scholar]
  33. Zhu, L.; Liao, B.; Zhang, Q.; Wang, X.; Liu, W.; Wang, X. Vision mamba: Efficient visual representation learning with bidirectional state space model. arXiv 2024, arXiv:2401.09417. [Google Scholar]
  34. Ma, J.; Li, F.; Wang, B. U-mamba: Enhancing long-range dependency for biomedical image segmentation. arXiv 2024, arXiv:2401.04722. [Google Scholar]
  35. Xing, Z.; Ye, T.; Yang, Y.; Liu, G.; Zhu, L. Segmamba: Long-range sequential modeling mamba for 3d medical image segmentation. arXiv 2024, arXiv:2401.13560. [Google Scholar]
  36. Ruan, J.; Xiang, S. Vm-unet: Vision mamba unet for medical image segmentation. arXiv 2024, arXiv:2402.02491. [Google Scholar]
  37. Liu, J.; Yang, H.; Zhou, H.Y.; Xi, Y.; Yu, L.; Yu, Y.; Liang, Y.; Shi, G.; Zhang, S.; Zheng, H.; et al. Swin-umamba: Mamba-based unet with imagenet-based pretraining. arXiv 2024, arXiv:2402.03302. [Google Scholar]
  38. Islam, M.M.; Hasan, M.; Athrey, K.S.; Braskich, T.; Bertasius, G. Efficient movie scene detection using state-space transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18749–18758. [Google Scholar]
  39. Nguyen, E.; Goel, K.; Gu, A.; Downs, G.; Shah, P.; Dao, T.; Baccus, S.; Ré, C. S4nd: Modeling images and videos as multidimensional signals with state spaces. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2022; Volume 35, pp. 2846–2861. [Google Scholar]
  40. Yamashita, S.; Ikehara, M. Image Deraining with Frequency-Enhanced State Space Model. arXiv 2024, arXiv:2405.16470. [Google Scholar]
  41. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
  42. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2472–2481. [Google Scholar]
  43. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
  44. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  45. Agustsson, E.; Timofte, R. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 126–135. [Google Scholar]
  46. Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi-Morel, M.L. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the 23rd British Machine Vision Conference (BMVC), London, UK, 3–7 September 2012. [Google Scholar]
  47. Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the Curves and Surfaces: 7th International Conference, Avignon, France, 24–30 June 2010; Revised Selected Papers 7. Springer: Berlin/Heidelberg, Germany, 2012; pp. 711–730. [Google Scholar]
  48. Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5197–5206. [Google Scholar]
  49. Matsui, Y.; Ito, K.; Aramaki, Y.; Fujimoto, A.; Ogawa, T.; Yamasaki, T.; Aizawa, K. Sketch-based manga retrieval using manga109 dataset. Multimed. Tools Appl. 2017, 76, 21811–21838. [Google Scholar] [CrossRef]
Figure 1. The framework of the proposed network: MambaSR.
Figure 2. The architecture of the Residual Fast Fourier Transform State-Space Block.
Figure 3. The architecture of the 2D Selective Scan module.
Figure 4. Comparisons with different methods on the Urban100 dataset using EDSR as the encoder.
Figure 5. Comparisons with different methods on the Urban100 dataset using EDSR as the encoder.
Figure 6. Comparisons with different methods on the Manga109 dataset using EDSR as the encoder.
Figure 7. Comparisons with different methods on the Set14 dataset using EDSR as the encoder.
Figure 8. Comparisons with different methods on the Set5 dataset using EDSR as the encoder.
Figure 9. Comparisons with different methods on the Urban100 dataset using RDN as the encoder.
Figure 10. Comparisons with different methods on the Manga109 dataset using RDN as the encoder.
Figure 11. Comparisons with different methods on the Set14 dataset using RDN as the encoder.
Table 1. PSNR/SSIM values achieved by different methods with EDSR and RDN encoders on the Urban100 dataset. The best results are in bold. The leftmost column gives the magnification scale, ranging from 2.1 to 4.9.

Dataset: Urban100

Scale | EDSR-MetaSR | EDSR-LIIF | EDSR-LTE | EDSR-SRNO | EDSR-MambaSR
2.1 | 31.53/0.9213 | 31.58/0.9222 | 31.71/0.9234 | 31.87/0.9244 | 32.58/0.9311
2.2 | 31.02/0.9134 | 31.08/0.9145 | 31.22/0.9159 | 31.36/0.9169 | 32.05/0.9242
2.3 | 30.55/0.9057 | 30.61/0.9068 | 30.75/0.9083 | 30.89/0.9094 | 31.55/0.9173
2.4 | 30.14/0.8982 | 30.20/0.8993 | 30.33/0.9010 | 30.47/0.9022 | 31.11/0.9104
2.5 | 29.73/0.8906 | 29.80/0.8917 | 29.93/0.8935 | 30.07/0.8948 | 30.71/0.9036
2.6 | 29.36/0.8830 | 29.44/0.8843 | 29.56/0.8862 | 29.70/0.8875 | 30.32/0.8966
2.7 | 29.02/0.8756 | 29.11/0.8772 | 29.23/0.8790 | 29.37/0.8806 | 29.97/0.8901
2.8 | 28.69/0.8682 | 28.79/0.8699 | 28.90/0.8717 | 29.05/0.8735 | 29.64/0.8833
2.9 | 28.40/0.8609 | 28.50/0.8627 | 28.61/0.8647 | 28.76/0.8665 | 29.34/0.8765
3.1 | 27.86/0.8467 | 27.97/0.8490 | 28.07/0.8510 | 28.23/0.8531 | 28.79/0.8638
3.2 | 27.61/0.8397 | 27.73/0.8424 | 27.83/0.8444 | 27.99/0.8466 | 28.54/0.8574
3.3 | 27.37/0.8327 | 27.50/0.8355 | 27.60/0.8377 | 27.76/0.8400 | 28.29/0.8510
3.4 | 27.15/0.8258 | 27.28/0.8290 | 27.38/0.8313 | 27.54/0.8337 | 28.06/0.8450
3.5 | 26.93/0.8190 | 27.07/0.8224 | 27.17/0.8248 | 27.33/0.8275 | 27.84/0.8390
3.6 | 26.72/0.8122 | 26.86/0.8160 | 26.97/0.8183 | 27.13/0.8211 | 27.64/0.8331
3.7 | 26.54/0.8057 | 26.68/0.8097 | 26.78/0.8121 | 26.94/0.8150 | 27.45/0.8272
3.8 | 26.34/0.7988 | 26.50/0.8033 | 26.59/0.8057 | 26.75/0.8087 | 27.25/0.8213
3.9 | 26.16/0.7922 | 26.32/0.7970 | 26.41/0.7994 | 26.58/0.8027 | 27.06/0.8154
4.1 | 25.82/0.7793 | 25.99/0.7849 | 26.08/0.7874 | 26.24/0.7909 | 26.71/0.8041
4.2 | 25.65/0.7730 | 25.83/0.7791 | 25.92/0.7816 | 26.08/0.7852 | 26.55/0.7987
4.3 | 25.51/0.7669 | 25.69/0.7733 | 25.78/0.7759 | 25.94/0.7796 | 26.41/0.7934
4.4 | 25.35/0.7605 | 25.54/0.7674 | 25.63/0.7701 | 25.79/0.7738 | 26.25/0.7879
4.5 | 25.22/0.7546 | 25.40/0.7617 | 25.49/0.7644 | 25.65/0.7684 | 26.10/0.7824
4.6 | 25.08/0.7486 | 25.27/0.7561 | 25.36/0.7588 | 25.53/0.7631 | 25.97/0.7773
4.7 | 24.95/0.7427 | 25.13/0.7504 | 25.23/0.7533 | 25.38/0.7574 | 25.83/0.7723
4.8 | 24.83/0.7371 | 25.02/0.7452 | 25.10/0.7479 | 25.27/0.7522 | 25.70/0.7671
4.9 | 24.70/0.7315 | 24.89/0.7398 | 24.97/0.7425 | 25.14/0.7472 | 25.57/0.7623

Scale | RDN-MetaSR | RDN-LIIF | RDN-LTE | RDN-SRNO | RDN-MambaSR
2.1 | 32.30/0.9293 | 32.30/0.9293 | 32.45/0.9306 | 32.42/0.9299 | 32.71/0.9321
2.2 | 31.80/0.9222 | 31.79/0.9223 | 31.95/0.9237 | 31.91/0.9230 | 32.21/0.9255
2.3 | 31.32/0.9152 | 31.30/0.9153 | 31.47/0.9168 | 31.43/0.9160 | 31.72/0.9187
2.4 | 30.89/0.9084 | 30.89/0.9086 | 31.05/0.9102 | 31.01/0.9092 | 31.27/0.9119
2.5 | 30.50/0.9015 | 30.49/0.9016 | 30.64/0.9034 | 30.61/0.9025 | 30.87/0.9052
2.6 | 30.11/0.8942 | 30.09/0.8945 | 30.26/0.8965 | 30.23/0.8954 | 30.47/0.8982
2.7 | 29.77/0.8876 | 29.75/0.8878 | 29.91/0.8899 | 29.89/0.8888 | 30.12/0.8918
2.8 | 29.44/0.8807 | 29.41/0.8809 | 29.57/0.8831 | 29.56/0.8821 | 29.79/0.8851
2.9 | 29.13/0.8739 | 29.12/0.8742 | 29.26/0.8764 | 29.25/0.8753 | 29.48/0.8786
3.1 | 28.57/0.8610 | 28.56/0.8613 | 28.71/0.8637 | 28.70/0.8625 | 28.92/0.8659
3.2 | 28.32/0.8546 | 28.32/0.8550 | 28.46/0.8575 | 28.46/0.8564 | 28.67/0.8598
3.3 | 28.06/0.8480 | 28.07/0.8484 | 28.22/0.8511 | 28.21/0.8500 | 28.42/0.8535
3.4 | 27.84/0.8419 | 27.85/0.8425 | 27.98/0.8450 | 27.98/0.8438 | 28.19/0.8476
3.5 | 27.61/0.8357 | 27.64/0.8364 | 27.77/0.8390 | 27.77/0.8380 | 27.97/0.8416
3.6 | 27.40/0.8294 | 27.42/0.8303 | 27.56/0.8331 | 27.56/0.8319 | 27.76/0.8357
3.7 | 27.19/0.8232 | 27.22/0.8242 | 27.37/0.8272 | 27.35/0.8259 | 27.56/0.8300
3.8 | 26.99/0.8168 | 27.04/0.8181 | 27.17/0.8211 | 27.18/0.8200 | 27.37/0.8242
3.9 | 26.80/0.8107 | 26.85/0.8120 | 26.99/0.8153 | 27.00/0.8142 | 27.18/0.8182
4.1 | 26.43/0.7984 | 26.50/0.8004 | 26.63/0.8036 | 26.65/0.8028 | 26.84/0.8072
4.2 | 26.26/0.7926 | 26.33/0.7948 | 26.48/0.7983 | 26.48/0.7973 | 26.68/0.8019
4.3 | 26.10/0.7867 | 26.18/0.7891 | 26.33/0.7928 | 26.34/0.7918 | 26.53/0.7966
4.4 | 25.95/0.7808 | 26.02/0.7836 | 26.17/0.7872 | 26.18/0.7865 | 26.37/0.7913
4.5 | 25.79/0.7750 | 25.87/0.7782 | 26.02/0.7819 | 26.03/0.7810 | 26.22/0.7859
4.6 | 25.66/0.7695 | 25.74/0.7729 | 25.89/0.7767 | 25.90/0.7759 | 26.09/0.7810
4.7 | 25.50/0.7636 | 25.60/0.7674 | 25.74/0.7713 | 25.75/0.7703 | 25.94/0.7757
4.8 | 25.38/0.7584 | 25.49/0.7625 | 25.63/0.7663 | 25.64/0.7657 | 25.82/0.7708
4.9 | 25.25/0.7527 | 25.35/0.7573 | 25.49/0.7611 | 25.50/0.7603 | 25.69/0.7659
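
Tables 1–4 report PSNR and SSIM at non-integer magnification scales. For readers who wish to reproduce the evaluation, the following minimal sketch shows how the two metrics can be computed for a super-resolved image against its ground truth; it assumes scikit-image (0.19 or later) and direct evaluation on RGB arrays, whereas the color-space handling and border cropping behind the reported numbers follow the experimental setup described in the paper.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def evaluate_pair(sr: np.ndarray, hr: np.ndarray):
    """Compute PSNR (dB) and SSIM between a super-resolved image `sr` and its
    ground truth `hr`. Both inputs are uint8 RGB arrays of identical shape."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, data_range=255, channel_axis=-1)
    return psnr, ssim


# Hypothetical usage with synthetic data standing in for real images.
hr = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
noise = np.random.randint(-2, 3, hr.shape)
sr = np.clip(hr.astype(int) + noise, 0, 255).astype(np.uint8)
print(evaluate_pair(sr, hr))
```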
Table 2. PSNR/SSIM values achieved by different methods with EDSR and RDN encoders on the Manga109 dataset. The best results are in bold. The leftmost column gives the magnification scale, ranging from 2.1 to 4.9.

Dataset: Manga109

Scale | EDSR-MetaSR | EDSR-LIIF | EDSR-LTE | EDSR-SRNO | EDSR-MambaSR
2.1 | 37.87/0.9745 | 37.98/0.9747 | 38.03/0.9748 | 38.28/0.9755 | 38.84/0.9765
2.2 | 37.25/0.9715 | 37.37/0.9717 | 37.41/0.9719 | 37.67/0.9726 | 38.27/0.9738
2.3 | 36.67/0.9684 | 36.78/0.9686 | 36.83/0.9689 | 37.11/0.9697 | 37.72/0.9711
2.4 | 36.13/0.9651 | 36.24/0.9655 | 36.30/0.9658 | 36.57/0.9666 | 37.20/0.9683
2.5 | 35.62/0.9618 | 35.72/0.9623 | 35.79/0.9626 | 36.06/0.9636 | 36.71/0.9655
2.6 | 35.14/0.9585 | 35.22/0.9590 | 35.30/0.9594 | 35.56/0.9605 | 36.23/0.9626
2.7 | 34.71/0.9551 | 34.76/0.9557 | 34.84/0.9562 | 35.10/0.9573 | 35.77/0.9596
2.8 | 34.30/0.9518 | 34.35/0.9525 | 34.42/0.9530 | 34.68/0.9541 | 35.34/0.9567
2.9 | 33.93/0.9484 | 33.96/0.9492 | 34.02/0.9497 | 34.28/0.9509 | 34.94/0.9537
3.1 | 33.20/0.9416 | 33.22/0.9426 | 33.29/0.9432 | 33.54/0.9446 | 34.18/0.9477
3.2 | 32.87/0.9383 | 32.89/0.9394 | 32.95/0.9400 | 33.20/0.9415 | 33.83/0.9447
3.3 | 32.52/0.9346 | 32.54/0.9359 | 32.60/0.9365 | 32.83/0.9381 | 33.48/0.9417
3.4 | 32.21/0.9310 | 32.24/0.9326 | 32.30/0.9332 | 32.52/0.9348 | 33.17/0.9387
3.5 | 31.88/0.9273 | 31.95/0.9292 | 32.00/0.9299 | 32.21/0.9315 | 32.87/0.9357
3.6 | 31.59/0.9237 | 31.64/0.9256 | 31.69/0.9264 | 31.89/0.9281 | 32.54/0.9326
3.7 | 31.30/0.9198 | 31.36/0.9221 | 31.39/0.9229 | 31.60/0.9246 | 32.24/0.9295
3.8 | 31.00/0.9159 | 31.08/0.9184 | 31.11/0.9193 | 31.31/0.9210 | 31.95/0.9262
3.9 | 30.70/0.9119 | 30.79/0.9147 | 30.84/0.9157 | 31.02/0.9174 | 31.65/0.9230
4.1 | 30.18/0.9039 | 30.29/0.9073 | 30.33/0.9086 | 30.48/0.9101 | 31.09/0.9162
4.2 | 29.88/0.8996 | 30.03/0.9035 | 30.10/0.9049 | 30.23/0.9063 | 30.84/0.9127
4.3 | 29.62/0.8955 | 29.78/0.8996 | 29.83/0.9012 | 29.97/0.9027 | 30.59/0.9093
4.4 | 29.36/0.8912 | 29.56/0.8960 | 29.60/0.8974 | 29.75/0.8992 | 30.35/0.9059
4.5 | 29.15/0.8872 | 29.37/0.8926 | 29.41/0.8939 | 29.54/0.8959 | 30.17/0.9029
4.6 | 28.90/0.8830 | 29.13/0.8888 | 29.17/0.8901 | 29.31/0.8924 | 29.91/0.8997
4.7 | 28.69/0.8786 | 28.92/0.8850 | 28.96/0.8863 | 29.09/0.8886 | 29.70/0.8963
4.8 | 28.47/0.8747 | 28.71/0.8816 | 28.77/0.8829 | 28.89/0.8854 | 29.52/0.8935
4.9 | 28.24/0.8701 | 28.51/0.8778 | 28.56/0.8790 | 28.66/0.8816 | 29.32/0.8902

Scale | RDN-MetaSR | RDN-LIIF | RDN-LTE | RDN-SRNO | RDN-MambaSR
2.1 | 38.70/0.9762 | 38.63/0.9761 | 38.68/0.9761 | 38.70/0.9763 | 38.92/0.9768
2.2 | 38.13/0.9735 | 38.04/0.9733 | 38.11/0.9734 | 38.12/0.9735 | 38.35/0.9741
2.3 | 37.56/0.9707 | 37.48/0.9705 | 37.55/0.9706 | 37.56/0.9707 | 37.81/0.9714
2.4 | 37.06/0.9678 | 36.96/0.9676 | 37.04/0.9678 | 37.05/0.9679 | 37.30/0.9687
2.5 | 36.55/0.9649 | 36.45/0.9647 | 36.53/0.9650 | 36.54/0.9651 | 36.81/0.9659
2.6 | 36.07/0.9620 | 35.96/0.9618 | 36.05/0.9621 | 36.06/0.9622 | 36.34/0.9632
2.7 | 35.62/0.9590 | 35.49/0.9588 | 35.60/0.9592 | 35.61/0.9593 | 35.89/0.9604
2.8 | 35.18/0.9560 | 35.06/0.9558 | 35.17/0.9563 | 35.17/0.9563 | 35.46/0.9575
2.9 | 34.79/0.9529 | 34.65/0.9527 | 34.77/0.9533 | 34.77/0.9532 | 35.06/0.9545
3.1 | 34.02/0.9468 | 33.88/0.9466 | 34.02/0.9473 | 34.00/0.9472 | 34.30/0.9487
3.2 | 33.70/0.9438 | 33.54/0.9436 | 33.68/0.9444 | 33.68/0.9443 | 33.96/0.9458
3.3 | 33.35/0.9406 | 33.20/0.9404 | 33.33/0.9412 | 33.32/0.9411 | 33.60/0.9427
3.4 | 33.04/0.9375 | 32.89/0.9374 | 33.02/0.9382 | 33.01/0.9381 | 33.29/0.9398
3.5 | 32.70/0.9343 | 32.60/0.9344 | 32.71/0.9352 | 32.69/0.9351 | 32.98/0.9369
3.6 | 32.40/0.9312 | 32.28/0.9312 | 32.40/0.9321 | 32.38/0.9319 | 32.66/0.9338
3.7 | 32.13/0.9280 | 32.00/0.9280 | 32.12/0.9290 | 32.11/0.9289 | 32.37/0.9308
3.8 | 31.83/0.9246 | 31.71/0.9247 | 31.83/0.9257 | 31.80/0.9256 | 32.09/0.9277
3.9 | 31.56/0.9213 | 31.45/0.9215 | 31.56/0.9226 | 31.53/0.9225 | 31.82/0.9246
4.1 | 31.01/0.9144 | 30.91/0.9148 | 31.03/0.9158 | 31.00/0.9157 | 31.26/0.9183
4.2 | 30.75/0.9108 | 30.65/0.9113 | 30.76/0.9124 | 30.75/0.9121 | 31.01/0.9151
4.3 | 30.48/0.9070 | 30.41/0.9078 | 30.51/0.9090 | 30.50/0.9085 | 30.76/0.9120
4.4 | 30.20/0.9032 | 30.19/0.9045 | 30.29/0.9058 | 30.25/0.9051 | 30.52/0.9087
4.5 | 29.98/0.8995 | 29.97/0.9012 | 30.09/0.9026 | 30.04/0.9018 | 30.30/0.9053
4.6 | 29.75/0.8959 | 29.76/0.8980 | 29.86/0.8993 | 29.82/0.8986 | 30.08/0.9022
4.7 | 29.51/0.8919 | 29.53/0.8945 | 29.64/0.8960 | 29.61/0.8954 | 29.88/0.8990
4.8 | 29.29/0.8882 | 29.35/0.8914 | 29.45/0.8930 | 29.41/0.8924 | 29.67/0.8959
4.9 | 29.06/0.8842 | 29.13/0.8879 | 29.23/0.8895 | 29.18/0.8888 | 29.44/0.8922
Table 3. PSNR/SSIM values achieved by different methods with EDSR and RDN encoders on the Set5 dataset. The best results are in bold. The leftmost column gives the magnification scale, ranging from 2.1 to 4.9.

Dataset: Set5

Scale | EDSR-MetaSR | EDSR-LIIF | EDSR-LTE | EDSR-SRNO | EDSR-MambaSR
2.1 | 37.46/0.9587 | 37.47/0.9588 | 37.52/0.9589 | 37.56/0.9591 | 37.70/0.9596
2.2 | 36.96/0.9553 | 36.96/0.9554 | 37.01/0.9555 | 37.07/0.9559 | 37.23/0.9564
2.3 | 36.65/0.9523 | 36.67/0.9524 | 36.72/0.9525 | 36.78/0.9528 | 36.93/0.9535
2.4 | 36.25/0.9491 | 36.26/0.9492 | 36.30/0.9492 | 36.36/0.9497 | 36.55/0.9505
2.5 | 35.87/0.9454 | 35.91/0.9457 | 35.93/0.9457 | 36.02/0.9461 | 36.20/0.9470
2.6 | 35.60/0.9428 | 35.64/0.9431 | 35.64/0.9431 | 35.73/0.9435 | 35.90/0.9443
2.7 | 35.34/0.9396 | 35.36/0.9399 | 35.38/0.9399 | 35.44/0.9403 | 35.63/0.9412
2.8 | 35.02/0.9363 | 35.03/0.9366 | 35.06/0.9367 | 35.13/0.9371 | 35.34/0.9383
2.9 | 34.72/0.9332 | 34.74/0.9337 | 34.74/0.9338 | 34.84/0.9342 | 35.08/0.9357
3.1 | 34.19/0.9269 | 34.21/0.9274 | 34.29/0.9278 | 34.33/0.9284 | 34.58/0.9300
3.2 | 33.94/0.9239 | 33.97/0.9245 | 34.02/0.9247 | 34.08/0.9252 | 34.37/0.9272
3.3 | 33.64/0.9202 | 33.69/0.9210 | 33.72/0.9211 | 33.76/0.9217 | 34.13/0.9242
3.4 | 33.44/0.9170 | 33.52/0.9180 | 33.58/0.9183 | 33.64/0.9189 | 33.96/0.9214
3.5 | 33.20/0.9141 | 33.29/0.9152 | 33.36/0.9158 | 33.40/0.9162 | 33.73/0.9186
3.6 | 32.93/0.9107 | 33.02/0.9116 | 33.03/0.9121 | 33.06/0.9124 | 33.43/0.9154
3.7 | 32.75/0.9075 | 32.89/0.9090 | 32.85/0.9091 | 32.91/0.9097 | 33.26/0.9128
3.8 | 32.54/0.9041 | 32.67/0.9057 | 32.66/0.9058 | 32.70/0.9063 | 33.01/0.9093
3.9 | 32.32/0.9008 | 32.43/0.9024 | 32.44/0.9026 | 32.53/0.9039 | 32.80/0.9064
4.1 | 31.84/0.8934 | 32.02/0.8958 | 32.03/0.8961 | 32.06/0.8970 | 32.42/0.9005
4.2 | 31.73/0.8910 | 31.89/0.8940 | 31.90/0.8940 | 31.95/0.8952 | 32.33/0.8987
4.3 | 31.46/0.8861 | 31.65/0.8894 | 31.68/0.8897 | 31.75/0.8908 | 32.10/0.8946
4.4 | 31.23/0.8824 | 31.46/0.8860 | 31.48/0.8862 | 31.59/0.8877 | 31.89/0.8913
4.5 | 31.11/0.8798 | 31.29/0.8832 | 31.30/0.8833 | 31.36/0.8847 | 31.76/0.8889
4.6 | 30.91/0.8763 | 31.11/0.8796 | 31.12/0.8797 | 31.19/0.8812 | 31.52/0.8853
4.7 | 30.76/0.8731 | 30.95/0.8770 | 30.95/0.8770 | 31.06/0.8790 | 31.41/0.8828
4.8 | 30.55/0.8695 | 30.79/0.8746 | 30.79/0.8742 | 30.88/0.8761 | 31.23/0.8804
4.9 | 30.41/0.8654 | 30.65/0.8704 | 30.62/0.8699 | 30.74/0.8723 | 31.09/0.8765

Scale | RDN-MetaSR | RDN-LIIF | RDN-LTE | RDN-SRNO | RDN-MambaSR
2.1 | 37.70/0.9596 | 37.66/0.9595 | 37.72/0.9597 | 37.73/0.9598 | 37.74/0.9599
2.2 | 37.19/0.9563 | 37.18/0.9563 | 37.23/0.9564 | 37.22/0.9564 | 37.28/0.9566
2.3 | 36.91/0.9534 | 36.88/0.9533 | 36.94/0.9536 | 36.94/0.9537 | 36.98/0.9537
2.4 | 36.48/0.9502 | 36.47/0.9502 | 36.50/0.9504 | 36.54/0.9505 | 36.56/0.9506
2.5 | 36.16/0.9469 | 36.15/0.9468 | 36.18/0.9470 | 36.18/0.9471 | 36.24/0.9472
2.6 | 35.81/0.9441 | 35.81/0.9441 | 35.86/0.9443 | 35.87/0.9444 | 35.94/0.9447
2.7 | 35.58/0.9411 | 35.56/0.9410 | 35.60/0.9412 | 35.61/0.9413 | 35.65/0.9415
2.8 | 35.28/0.9382 | 35.26/0.9379 | 35.29/0.9383 | 35.31/0.9383 | 35.37/0.9386
2.9 | 34.98/0.9353 | 34.98/0.9353 | 35.01/0.9356 | 34.99/0.9355 | 35.10/0.9359
3.1 | 34.46/0.9294 | 34.47/0.9294 | 34.54/0.9299 | 34.53/0.9299 | 34.63/0.9306
3.2 | 34.24/0.9266 | 34.24/0.9265 | 34.32/0.9271 | 34.32/0.9270 | 34.43/0.9279
3.3 | 33.92/0.9231 | 33.94/0.9232 | 34.03/0.9240 | 34.01/0.9237 | 34.13/0.9244
3.4 | 33.81/0.9206 | 33.84/0.9206 | 33.91/0.9213 | 33.92/0.9214 | 33.98/0.9219
3.5 | 33.56/0.9174 | 33.58/0.9177 | 33.66/0.9182 | 33.66/0.9183 | 33.76/0.9193
3.6 | 33.25/0.9140 | 33.27/0.9144 | 33.30/0.9148 | 33.29/0.9148 | 33.47/0.9160
3.7 | 33.06/0.9110 | 33.17/0.9121 | 33.21/0.9125 | 33.20/0.9127 | 33.30/0.9136
3.8 | 32.84/0.9076 | 32.93/0.9084 | 32.99/0.9090 | 32.94/0.9088 | 33.10/0.9103
3.9 | 32.61/0.9043 | 32.68/0.9052 | 32.76/0.9058 | 32.72/0.9056 | 32.87/0.9073
4.1 | 32.18/0.8977 | 32.21/0.8987 | 32.32/0.8996 | 32.24/0.8993 | 32.44/0.9013
4.2 | 32.04/0.8957 | 32.14/0.8970 | 32.23/0.8979 | 32.17/0.8976 | 32.33/0.8992
4.3 | 31.84/0.8913 | 31.95/0.8929 | 32.06/0.8941 | 31.99/0.8932 | 32.16/0.8958
4.4 | 31.64/0.8880 | 31.73/0.8895 | 31.84/0.8905 | 31.79/0.8899 | 31.99/0.8928
4.5 | 31.48/0.8853 | 31.61/0.8871 | 31.69/0.8882 | 31.64/0.8878 | 31.84/0.8903
4.6 | 31.31/0.8817 | 31.42/0.8842 | 31.51/0.8848 | 31.48/0.8848 | 31.72/0.8881
4.7 | 31.14/0.8789 | 31.22/0.8810 | 31.33/0.8819 | 31.29/0.8818 | 31.54/0.8853
4.8 | 30.95/0.8761 | 31.12/0.8793 | 31.16/0.8797 | 31.08/0.8792 | 31.33/0.8824
4.9 | 30.79/0.8717 | 30.95/0.8745 | 30.99/0.8751 | 30.94/0.8747 | 31.16/0.8782
Table 4. PSNR/SSIM values achieved by different methods with EDSR and RDN encoders on the Set14 dataset. The best results are in bold. The leftmost column gives the magnification scale, ranging from 2.1 to 4.9.

Dataset: Set14

Scale | EDSR-MetaSR | EDSR-LIIF | EDSR-LTE | EDSR-SRNO | EDSR-MambaSR
2.1 | 33.10/0.9126 | 33.17/0.9133 | 33.21/0.9138 | 33.27/0.9142 | 33.56/0.9166
2.2 | 32.65/0.9052 | 32.72/0.9059 | 32.76/0.9062 | 32.78/0.9066 | 33.09/0.9093
2.3 | 32.24/0.8968 | 32.31/0.8973 | 32.35/0.8978 | 32.37/0.8982 | 32.70/0.9012
2.4 | 31.89/0.8895 | 31.97/0.8902 | 32.01/0.8909 | 32.01/0.8911 | 32.33/0.8939
2.5 | 31.57/0.8817 | 31.62/0.8824 | 31.66/0.8830 | 31.69/0.8836 | 31.99/0.8862
2.6 | 31.34/0.8743 | 31.37/0.8754 | 31.41/0.8759 | 31.46/0.8764 | 31.70/0.8787
2.7 | 31.06/0.8672 | 31.10/0.8682 | 31.13/0.8685 | 31.19/0.8694 | 31.41/0.8718
2.8 | 30.83/0.8601 | 30.86/0.8612 | 30.90/0.8620 | 30.96/0.8627 | 31.14/0.8650
2.9 | 30.57/0.8527 | 30.60/0.8539 | 30.62/0.8546 | 30.71/0.8558 | 30.87/0.8581
3.1 | 30.13/0.8393 | 30.17/0.8409 | 30.21/0.8415 | 30.29/0.8427 | 30.46/0.8459
3.2 | 29.87/0.8325 | 29.92/0.8340 | 29.94/0.8345 | 30.02/0.8359 | 30.21/0.8394
3.3 | 29.70/0.8263 | 29.77/0.8282 | 29.76/0.8283 | 29.85/0.8297 | 30.04/0.8336
3.4 | 29.52/0.8201 | 29.56/0.8217 | 29.59/0.8223 | 29.67/0.8235 | 29.86/0.8271
3.5 | 29.35/0.8140 | 29.41/0.8159 | 29.42/0.8162 | 29.51/0.8175 | 29.71/0.8212
3.6 | 29.19/0.8085 | 29.25/0.8102 | 29.27/0.8109 | 29.36/0.8119 | 29.52/0.8154
3.7 | 29.03/0.8023 | 29.10/0.8046 | 29.13/0.8052 | 29.20/0.8061 | 29.35/0.8091
3.8 | 28.88/0.7977 | 28.95/0.7999 | 28.97/0.8006 | 29.05/0.8017 | 29.20/0.8048
3.9 | 28.70/0.7914 | 28.80/0.7939 | 28.82/0.7946 | 28.90/0.7956 | 29.07/0.7994
4.1 | 28.43/0.7812 | 28.50/0.7834 | 28.52/0.7841 | 28.60/0.7853 | 28.78/0.7892
4.2 | 28.28/0.7760 | 28.36/0.7783 | 28.39/0.7791 | 28.48/0.7806 | 28.64/0.7843
4.3 | 28.16/0.7708 | 28.24/0.7733 | 28.26/0.7740 | 28.36/0.7756 | 28.55/0.7800
4.4 | 28.04/0.7668 | 28.13/0.7693 | 28.14/0.7700 | 28.25/0.7715 | 28.46/0.7762
4.5 | 27.87/0.7615 | 27.96/0.7641 | 27.96/0.7645 | 28.07/0.7660 | 28.30/0.7713
4.6 | 27.75/0.7563 | 27.84/0.7588 | 27.87/0.7593 | 27.94/0.7607 | 28.18/0.7667
4.7 | 27.63/0.7514 | 27.72/0.7540 | 27.72/0.7543 | 27.81/0.7555 | 28.05/0.7618
4.8 | 27.54/0.7471 | 27.62/0.7494 | 27.63/0.7499 | 27.71/0.7511 | 27.94/0.7575
4.9 | 27.41/0.7430 | 27.49/0.7455 | 27.52/0.7461 | 27.61/0.7474 | 27.83/0.7535

Scale | RDN-MetaSR | RDN-LIIF | RDN-LTE | RDN-SRNO | RDN-MambaSR
2.1 | 33.49/0.9168 | 33.47/0.9163 | 33.58/0.9168 | 33.55/0.9176 | 33.67/0.9173
2.2 | 33.03/0.9101 | 33.03/0.9094 | 33.10/0.9100 | 33.04/0.9099 | 33.17/0.9099
2.3 | 32.60/0.9013 | 32.66/0.9012 | 32.70/0.9016 | 32.59/0.9013 | 32.74/0.9013
2.4 | 32.26/0.8940 | 32.25/0.8936 | 32.31/0.8944 | 32.26/0.8941 | 32.37/0.8940
2.5 | 31.87/0.8861 | 31.87/0.8856 | 31.91/0.8863 | 31.89/0.8862 | 32.01/0.8865
2.6 | 31.60/0.8784 | 31.64/0.8783 | 31.66/0.8789 | 31.64/0.8789 | 31.74/0.8793
2.7 | 31.35/0.8715 | 31.36/0.8713 | 31.39/0.8720 | 31.39/0.8720 | 31.42/0.8723
2.8 | 31.08/0.8647 | 31.06/0.8645 | 31.12/0.8651 | 31.12/0.8652 | 31.18/0.8656
2.9 | 30.80/0.8577 | 30.80/0.8575 | 30.84/0.8582 | 30.84/0.8583 | 30.94/0.8590
3.1 | 30.39/0.8451 | 30.37/0.8450 | 30.40/0.8458 | 30.45/0.8459 | 30.51/0.8466
3.2 | 30.10/0.8381 | 30.09/0.8380 | 30.14/0.8388 | 30.15/0.8389 | 30.26/0.8402
3.3 | 29.95/0.8322 | 29.92/0.8318 | 29.97/0.8328 | 29.99/0.8330 | 30.07/0.8342
3.4 | 29.73/0.8258 | 29.73/0.8256 | 29.77/0.8265 | 29.79/0.8266 | 29.92/0.8284
3.5 | 29.58/0.8199 | 29.58/0.8198 | 29.63/0.8209 | 29.64/0.8206 | 29.75/0.8224
3.6 | 29.43/0.8141 | 29.38/0.8139 | 29.44/0.8150 | 29.46/0.8149 | 29.58/0.8167
3.7 | 29.26/0.8084 | 29.23/0.8080 | 29.29/0.8091 | 29.29/0.8089 | 29.40/0.8107
3.8 | 29.11/0.8039 | 29.09/0.8039 | 29.14/0.8050 | 29.13/0.8047 | 29.25/0.8060
3.9 | 28.96/0.7981 | 28.96/0.7983 | 29.00/0.7993 | 29.01/0.7991 | 29.11/0.8005
4.1 | 28.69/0.7881 | 28.69/0.7882 | 28.75/0.7894 | 28.75/0.7891 | 28.84/0.7906
4.2 | 28.55/0.7832 | 28.57/0.7837 | 28.62/0.7847 | 28.62/0.7844 | 28.71/0.7859
4.3 | 28.45/0.7786 | 28.45/0.7790 | 28.51/0.7802 | 28.51/0.7799 | 28.62/0.7811
4.4 | 28.35/0.7747 | 28.36/0.7750 | 28.43/0.7763 | 28.40/0.7758 | 28.51/0.7776
4.5 | 28.16/0.7694 | 28.18/0.7703 | 28.12/0.7697 | 28.22/0.7708 | 28.34/0.7726
4.6 | 28.04/0.7645 | 28.07/0.7654 | 27.99/0.7651 | 28.12/0.7660 | 28.25/0.7683
4.7 | 27.91/0.7599 | 27.94/0.7608 | 27.87/0.7602 | 28.01/0.7617 | 28.11/0.7632
4.8 | 27.80/0.7551 | 27.85/0.7566 | 27.78/0.7557 | 27.89/0.7569 | 28.00/0.7588
4.9 | 27.65/0.7510 | 27.71/0.7525 | 27.72/0.7528 | 27.75/0.7522 | 27.89/0.7553
Table 5. Ablation experiments for FFTConv.

Method | Set5 | Set14 | Urban100 | Manga109
MambaSR w/o FFTConv | 32.23/0.8977 | 28.60/0.7832 | 26.44/0.7955 | 30.62/0.9108
MambaSR | 32.33/0.8987 | 28.64/0.7843 | 26.55/0.7987 | 30.84/0.9127
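
Table 5 isolates the contribution of the FFTConv branch by removing it from MambaSR. The block below sketches the generic frequency-domain convolution pattern that such a component typically follows: a real 2D FFT, a pointwise convolution over the stacked real and imaginary parts, an inverse FFT, and a residual connection. The class name, layer widths, and activation here are illustrative PyTorch assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn


class FFTConvBlock(nn.Module):
    """Illustrative frequency-domain convolution block (assumed structure)."""

    def __init__(self, channels: int):
        super().__init__()
        # Pointwise convolution applied to the concatenated real/imaginary parts.
        self.freq_conv = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")               # complex spectrum (B, C, H, W//2+1)
        freq = torch.cat([spec.real, spec.imag], dim=1)        # stack real and imaginary channels
        freq = self.freq_conv(freq)                            # mix global frequency information
        real, imag = torch.chunk(freq, 2, dim=1)
        spec = torch.complex(real, imag)
        out = torch.fft.irfft2(spec, s=(h, w), norm="ortho")   # back to the spatial domain
        return x + out                                         # residual connection


# Hypothetical usage
y = FFTConvBlock(64)(torch.randn(1, 64, 48, 48))
print(y.shape)  # torch.Size([1, 64, 48, 48])
```

Because every output frequency coefficient depends on all spatial positions, a pointwise operation in the spectral domain acts as a global receptive field, which is the motivation for pairing it with the Mamba branch.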
Table 6. Comparison of Normalized Cross Correlation (NCC) and Normalized Absolute Error (NAE) on four datasets (Set5, Set14, Urban100, and Manga109) at scaling factors of 2, 3, and 4. The best performance for each metric is highlighted in bold. The upward arrow indicates that a higher value is better; the downward arrow indicates that a lower value is better.

Dataset | Scale | RDN-MetaSR | RDN-LIIF | RDN-LTE | RDN-SRNO | RDN-MambaSR
Each cell reports NCC↑/NAE↓.
Set5 | 2 | 0.9981/0.0234 | 0.9981/0.0235 | 0.9981/0.0232 | 0.9981/0.0232 | 0.9982/0.0230
Set5 | 3 | 0.9959/0.0332 | 0.9959/0.0333 | 0.9959/0.0329 | 0.9959/0.0330 | 0.9960/0.0327
Set5 | 4 | 0.9929/0.0421 | 0.9931/0.0415 | 0.9933/0.0412 | 0.9932/0.0413 | 0.9935/0.0406
Set14 | 2 | 0.9900/0.0374 | 0.9898/0.0376 | 0.9898/0.0375 | 0.9901/0.0372 | 0.9902/0.0371
Set14 | 3 | 0.9793/0.0536 | 0.9792/0.0537 | 0.9794/0.0534 | 0.9796/0.0532 | 0.9798/0.0529
Set14 | 4 | 0.9705/0.0650 | 0.9704/0.0649 | 0.9708/0.0644 | 0.9707/0.0646 | 0.9713/0.0639
Urban100 | 2 | 0.9899/0.0407 | 0.9899/0.0407 | 0.9902/0.0400 | 0.9901/0.0401 | 0.9906/0.0393
Urban100 | 3 | 0.9768/0.0625 | 0.9766/0.0624 | 0.9773/0.0615 | 0.9772/0.0617 | 0.9781/0.0604
Urban100 | 4 | 0.9636/0.0804 | 0.9638/0.0792 | 0.9646/0.0780 | 0.9649/0.0780 | 0.9661/0.0765
Manga109 | 2 | 0.9981/0.0141 | 0.9981/0.0142 | 0.9981/0.0140 | 0.9981/0.0141 | 0.9982/0.0137
Manga109 | 3 | 0.9944/0.0228 | 0.9943/0.0231 | 0.9945/0.0227 | 0.9945/0.0227 | 0.9948/0.0221
Manga109 | 4 | 0.9890/0.0312 | 0.9890/0.0310 | 0.9892/0.0305 | 0.9892/0.0307 | 0.9898/0.0298
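
Table 6 reports Normalized Cross Correlation (NCC) and Normalized Absolute Error (NAE). The sketch below implements one common formulation of these two metrics (NCC as the reference-normalized correlation of intensities, NAE as the sum of absolute errors normalized by the sum of reference intensities); the exact definitions behind the table are those of the paper, so this code is only an indicative reference.

```python
import numpy as np


def ncc(reference: np.ndarray, test: np.ndarray) -> float:
    """Normalized Cross Correlation (one common IQA formulation):
    sum(ref * test) / sum(ref ** 2). Higher is better; 1.0 for identical images."""
    ref = reference.astype(np.float64)
    tst = test.astype(np.float64)
    return float((ref * tst).sum() / (ref ** 2).sum())


def nae(reference: np.ndarray, test: np.ndarray) -> float:
    """Normalized Absolute Error: sum(|ref - test|) / sum(ref). Lower is better."""
    ref = reference.astype(np.float64)
    tst = test.astype(np.float64)
    return float(np.abs(ref - tst).sum() / ref.sum())


# Hypothetical usage with synthetic data.
hr = np.random.randint(0, 256, (64, 64, 3))
sr = np.clip(hr + np.random.randint(-3, 4, hr.shape), 0, 255)
print(ncc(hr, sr), nae(hr, sr))
```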
Table 7. Performance comparison of different super-resolution models on the Urban100 dataset at a scaling factor of 4.

Metric | EDSR-MetaSR | EDSR-LIIF | EDSR-LTE | EDSR-SRNO | EDSR-MambaSR
PSNR on Urban100 (dB) | 25.95 | 26.15 | 26.24 | 26.41 | 26.90
Runtime (s) | 13.15 | 37.87 | 48.30 | 29.73 | 40.57
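
Table 7 pairs reconstruction quality with wall-clock runtime. The helper below is a hedged sketch of how average per-image inference time is commonly measured for GPU models in PyTorch, with CUDA synchronization so the timer covers the full device computation; the hardware, input handling, and exact timing protocol behind the reported numbers may differ.

```python
import time
import torch


@torch.no_grad()
def time_model(model: torch.nn.Module, lr_image: torch.Tensor,
               warmup: int = 3, runs: int = 10) -> float:
    """Average forward-pass time in seconds over `runs` repetitions,
    after `warmup` untimed passes. Synchronizes CUDA around the timed region."""
    device = next(model.parameters()).device
    lr_image = lr_image.to(device)
    for _ in range(warmup):
        model(lr_image)                      # warm up kernels / caches
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(lr_image)
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs


# Hypothetical usage: avg_sec = time_model(sr_model, torch.randn(1, 3, 64, 64))
```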
Table 8. Performance comparison of different super-resolution models on the Urban100 dataset (processed with Gaussian blur, Gaussian noise, and bicubic downsampling) at scaling factors of 2, 3, and 4. The best performance for each metric is highlighted in bold. The upward arrow indicates that a higher value is better.

Dataset | Scale | EDSR-MetaSR | EDSR-LIIF | EDSR-LTE | EDSR-SRNO | EDSR-MambaSR
Each cell reports PSNR↑/SSIM↑.
Urban100 | 2 | 22.19/0.4540 | 22.24/0.4602 | 22.27/0.4587 | 22.13/0.4519 | 22.39/0.4633
Urban100 | 3 | 21.15/0.4046 | 21.04/0.3986 | 21.16/0.4048 | 21.00/0.3943 | 21.27/0.4074
Urban100 | 4 | 20.40/0.3763 | 20.44/0.3785 | 20.42/0.3771 | 20.38/0.3730 | 20.52/0.3793
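
Table 8 evaluates robustness when the low-resolution inputs are produced by Gaussian blur, additive Gaussian noise, and bicubic downsampling. A minimal sketch of such a degradation pipeline is given below, assuming OpenCV; the kernel size, blur sigma, and noise standard deviation are illustrative placeholders rather than the parameters actually used to build the table.

```python
import cv2
import numpy as np


def degrade(hr: np.ndarray, scale: int,
            blur_sigma: float = 1.0, noise_std: float = 5.0) -> np.ndarray:
    """Gaussian blur -> additive Gaussian noise -> bicubic downsampling.
    `hr` is a uint8 HWC image; parameter values are illustrative only."""
    blurred = cv2.GaussianBlur(hr, (7, 7), blur_sigma)
    noisy = blurred.astype(np.float64) + np.random.normal(0.0, noise_std, blurred.shape)
    noisy = np.clip(noisy, 0, 255).astype(np.uint8)
    h, w = noisy.shape[:2]
    lr = cv2.resize(noisy, (w // scale, h // scale), interpolation=cv2.INTER_CUBIC)
    return lr
```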