
Progressive Multi-Scale Fusion Network for Light Field Super-Resolution

1
Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR 999078, China
2
State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(14), 7135; https://doi.org/10.3390/app12147135
Submission received: 26 May 2022 / Revised: 9 July 2022 / Accepted: 12 July 2022 / Published: 15 July 2022

Abstract

Light field (LF) cameras can record multi-view images of a single scene, and these images provide spatial and angular information that can improve the performance of image super-resolution (SR). However, it is challenging to incorporate distinctive information from different LF views. At the same time, because the resolution of the image sensor is fixed, spatial and angular resolution must be traded off against each other. In this paper, we propose a progressive multi-scale fusion network (PMFN) to improve LFSR performance. Specifically, a progressive feature fusion block (PFFB) based on an encoder-and-decoder structure is designed to implicitly align disparities and integrate complementary information across views. The core module of the PFFB is a dual-branch multi-scale fusion module (DMFM), which integrates information from a reference view and auxiliary views to produce a fusion feature. Each DMFM consists of two parallel branches with different receptive fields that fuse hierarchical features from complementary views. Three densely connected DMFMs are used in the PFFB, which can fully exploit multi-level features to improve SR performance. Experimental results on both synthetic and real-world datasets demonstrate that the proposed model achieves state-of-the-art performance among existing methods. Moreover, qualitative results show that our method can also generate faithful details.

1. Introduction

Light field (LF) cameras are able to record the spatial and angular information of a scene. This camera technology has been successfully used in many applications, such as VR [1,2], 3D reconstruction [3,4], and saliency detection [5,6]. However, the resolution limitation of camera sensors inhibits the development of LF imaging technology. The higher the angular resolution required of the LF, the lower the spatial resolution obtained for each view. Consequently, LF super-resolution (LFSR) algorithms are widely investigated to retrieve high-resolution (HR) information from low-resolution (LR) images.
Compared with single image SR (SISR) [7,8], LFSR reconstructs SR images by using angular and spatial information from different LF views. Based on the complex geometrical structure of LF images, some traditional disparity-based methods have been proposed [9,10]. However, the performance of these methods relies on the accuracy of disparity estimation, and their computational costs are very high. Although optimisation-based methods are constantly researched, obtaining accurate disparities from LR sub-aperture images (SAIs) is still challenging.
With the development of deep learning, a straightforward approach is to fine-tune the network parameters of SISR models. However, these methods can hardly preserve the complex 4D structure of LF. Recently, learning-based methods [11,12,13,14,15,16] have been utilised to effectively integrate spatial and angular information and improve SR reconstruction. The work in [11] reconstructed the final high-resolution (HR) SAIs by simply using stacked horizontal and vertical directional views. Following the structure of residual networks, Zhang et al. [12] proposed resLF, which extracts complementary information by stacking the SAIs in four directions. Recently, Jin et al. [13] proposed an all-to-one (ATO) model to generate SR LF images, which combines the reference view and surrounding auxiliary views via combinatorial geometry embedding. Moreover, Wang et al. [14] utilised separate convolutions to extract spatial and angular information for LFSR. More recently, Mo et al. [15] designed view and channel attention modules to fuse hierarchical features and distil valid information for reconstructing HR LF images. Wang et al. [16] proposed a deformable convolutional network to address the problem of disparities among LF images. Although most of these networks can incorporate spatial and angular information to achieve high accuracy for LF reconstruction, the disparities among different SAIs are still under-investigated. Furthermore, due to occlusions and non-Lambertian reflections in LF, an image from one view may contain distinctive details compared with images from other views, as shown in Figure 1. The structures of all-to-all networks do not make full use of the informative details from auxiliary views for further performance improvement. Consequently, two problems remain in LFSR methods: aligning the disparities among LF views and supplementing sufficient complementary information.
In order to handle these issues, we propose a progressive multi-scale fusion network (PMFN), which has an encoder-and-decoder structure with progressive multi-scale convolutions. Our method is designed to fully use the complementary information from all auxiliary images and to implicitly address the disparities between the reference view and the auxiliary views. Specifically, we propose a progressive feature fusion block (PFFB) with three dual-branch multi-scale fusion modules (DMFMs). Each DMFM has a dual-branch structure; its two branches extract features with different receptive fields, respectively. We then concatenate the output features of these two branches to obtain an informative fused feature. The PFFB is mainly constructed from three DMFMs, which adopt dense skip connections to strengthen the long-term information from previous DMFMs. Furthermore, these connections can fully exploit hierarchical features and fuse complementary information from auxiliary views. Among these DMFMs, a collect-and-distribute strategy is designed to aggregate informative features from prior DMFMs and distribute them to the next DMFM. Finally, a feature enhancement block (FEB) is designed to obtain robust SR features. The experimental results on real-world and synthetic datasets demonstrate that our PMFN achieves both higher quantitative and better qualitative performance compared with state-of-the-art methods. The main contributions of this paper are listed as follows:
  • We design the DMFM using a dual-branch structure to implicitly compensate for the influence of disparities and incorporate informative details from auxiliary views.
  • The core PFFB of our network is mainly constructed from three densely connected DMFMs. This block can fully exploit multi-level features, and the multi-scale fusion information is preserved among complementary views. We demonstrate that complementary features are effectively fused by this block to improve SR performance.
  • Our PMFN achieves consistent improvements over state-of-the-art methods developed in recent years.
The rest of this paper is organised as follows. Section 2 gives a brief overview of related work. Section 3 describes the architecture of our PMFN. In Section 4, we provide extensive analysis, comparative experiments and ablation studies using synthetic and real-world datasets. Finally, Section 5 concludes this paper.

2. Related Work

In this section, we first review the existing SISR algorithms. Then, some LFSR algorithms are briefly introduced.

2.1. Single Image Super-Resolution

SR is an ill-posed inverse problem that aims to reconstruct an HR image from its LR counterpart. Owing to the advantages of deep learning, we briefly review several significant works using deep learning for this task. Among them, Dong et al. [17,18] proposed a seminal network called SRCNN to achieve SISR by utilising the powerful representation capability of CNNs. Compared with the shallow architecture of SRCNN, Kim et al. [8] proposed a residual learning network named VDSR, which mainly learns the high-frequency residual information. Lim et al. [19] proposed an enhanced deep SR network (EDSR), which contains local and global residual connections. Recently, many SR networks with attention mechanisms have shown superior performance. Dai et al. [20] proposed a second-order attention network (SAN), which applies a trainable second-order attention module to capture spatial information. To improve the performance on remote-sensing images, Wang et al. [21] proposed a contextual transformation network (CTN), which uses lightweight convolution layers to extract and enrich features. These methods have achieved promising performance in SISR reconstruction. It is noted that SISR can be applied directly to each SAI. However, the resulting LFSR performance is hindered by the deficient use of complementary information from different views.

2.2. Light Field Super-Resolution

Learning-based methods have greatly improved the performance of LFSR compared to traditional methods. Yoon et al. [22] proposed LFCNN, the first application of deep learning to LFSR. Inspired by the recurrent convolutional neural network, Wang et al. [11] proposed LFNet and a stacked generalisation technique to synthesise the final SAIs. In this structure, only horizontal and vertical directional views were used to improve LFSR. Inspired by residual networks, Zhang et al. [12] proposed a residual network (resLF) to extract complementary details from four directions of auxiliary views; the sub-pixel information from auxiliary views was stacked to extract the internal geometric structure relations. In order to maximise the number of auxiliary views, Jin et al. [13] proposed an all-to-one model (ATO) to generate SR LF images via combinatorial geometry embedding. In this structure, the complementary information of every SAI can be used. Moreover, some effective methods operate directly on the angular and spatial dimensions to extract cross-view information. Wang et al. [14] proposed LF-InterNet to extract and incorporate spatial and angular information. Furthermore, Mo et al. [15] proposed a dense dual-attention network (DDAN) containing view and channel attention to fuse hierarchical features and distil valid information. The above methods do not fully address the disparity issue among different SAIs. Wang et al. [16] proposed a deformable convolution to incorporate angular information despite the disparities among LF images. To fully utilise all angular information, Zhang et al. [23] proposed a multiple epipolar geometry network (MEG-Net), which uses multi-direction epipolar images to reconstruct all view images. With the development of the Transformer, Wang et al. [24] proposed a detail-preserving Transformer (DPT) to recover the details of LF images by leveraging gradient maps of the light field to guide the sequence learning. However, these methods are all-to-all models, whose complementary information is not well utilised for further performance improvement.

3. Progressive Multi-Scale Fusion Network

Following most existing LFSR methods [13,14,15,16], we only use the Y-channel images as input, which are obtained by converting the RGB images into the YCbCr colour space and keeping the Y channel. Ignoring the channel dimension, the 4D LF can be denoted as $L_{LR} \in \mathbb{R}^{U \times V \times H \times W}$, where $U$, $V$ are the angular dimensions and $H$, $W$ are the spatial dimensions. LFSR can be described as generating an HR image from the LR image with spatial resolution $H \times W$ for each of the $U \times V$ angular views. The reconstructed LF is $L_{HR} \in \mathbb{R}^{U \times V \times \alpha H \times \alpha W}$, where $\alpha$ is the upsampling scale. In this section, we introduce our PMFN in detail.
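To make this notation concrete, the following minimal PyTorch sketch shows how a 4D LF can be held as a $U \times V \times H \times W$ tensor of Y-channel SAIs. The BT.601 luma conversion and the random tensor are illustrative stand-ins, not the authors' data pipeline.

```python
import torch

def rgb_to_y(rgb: torch.Tensor) -> torch.Tensor:
    """Convert RGB SAIs to the Y channel of YCbCr (BT.601), values in [0, 1].

    rgb: tensor of shape (..., 3, H, W)
    """
    r, g, b = rgb[..., 0, :, :], rgb[..., 1, :, :], rgb[..., 2, :, :]
    return 0.299 * r + 0.587 * g + 0.114 * b

# A low-resolution 4D light field: U x V angular views, each an H x W RGB image.
U, V, H, W = 5, 5, 64, 64
lf_rgb = torch.rand(U, V, 3, H, W)

# Keep only the Y channel, as in the paper: L_LR has shape U x V x H x W.
L_LR = rgb_to_y(lf_rgb)            # (U, V, H, W)

# The SR task: recover L_HR of shape U x V x (alpha*H) x (alpha*W).
alpha = 2
print(L_LR.shape, "->", (U, V, alpha * H, alpha * W))
```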

3.1. Overview

Our PMFN is shown in Figure 2 and is designed as an all-to-one structure. The network consists of three parts: feature extraction ($f_{FE}$), feature fusion ($f_{FF}$), and reconstruction ($f_{RC}$). Specifically, the residual receptive field block (ResRFB), which consists of two RFBs, is designed to extract the shallow features of each SAI. Given $L_{LR}$ as input, $L_{LR}$ is first fed into a $1 \times 1$ convolution to generate the initial features. These features are then processed by the ResRFB and residual block (ResBlock), respectively. In this part, the shallow features $G_1$ are extracted by $f_{FE}$, which can be expressed as
$G_1 = f_{FE}\left(L_{LR}\right),$
where $G_1 \in \mathbb{R}^{N \times C \times H \times W}$ represents the shallow features, $C$ is the feature depth, and $N = U \times V$ is the number of SAIs. We divide $G_1$ into two parts: a reference-view feature $G_1^{ref} \in \mathbb{R}^{C \times H \times W}$ and several auxiliary-view features $G_1^{aux,k} \in \mathbb{R}^{C \times H \times W}$. Specifically, $G_1^{ref}$ is arbitrarily selected from $G_1$, and the $G_1^{aux,k}$ are composed of the remaining features of $G_1$; there is one $G_1^{ref}$ and there are $N-1$ features $G_1^{aux,k}$. Then, $G_1^{ref}$ paired with each $G_1^{aux,k}$ is fed into the PFFB, respectively. Notably, the key component of our network is $f_{FF}$ (the PFFB). Through this component, the fusion feature $G_2$ is generated by connecting the informative features of the different auxiliary views, i.e.,
$G_2 = f_{FF}\left(\left[G_1^{ref}, G_1^{aux,1}\right], \ldots, \left[G_1^{ref}, G_1^{aux,k}\right]\right),$
where $[\cdot]$ denotes the concatenation operation and $G_2 \in \mathbb{R}^{C \times H \times W}$ represents the final fusion feature. After generating $G_2$, the feature enhancement block (FEB) is used to bridge the gap between the obtained reference view and the given HR reference view, and the residual map $G_3$ is fine-tuned. This block is very useful for distilling more valid information and promoting HR reconstruction. The architecture of the FEB is shown in Figure 2e. Finally, $G_3$ is added to a coarse HR image obtained by bicubic interpolation ($\mathrm{Up}$) to generate the HR image $L_{HR}^{ref}$. This process can be simply expressed as
$G_3 = f_{RC}\left(G_2\right), \quad L_{HR}^{ref} = G_3 + \mathrm{Up}\left(L_{LR}^{ref}\right),$
where $G_3$ denotes the residual map and $L_{LR}^{ref} \in \mathbb{R}^{1 \times H \times W}$ is the LR reference image. The other auxiliary views ($L_{HR}^{aux}$) share the same network weights and are combined with their corresponding bicubic-interpolated $L_{LR}^{aux}$. Eventually, all the HR SAIs are generated, hence $L_{HR}$.
In summary, our all-to-one structure can capture the information absent in the Ref image from the other Aux images, as shown in Figure 3. Moreover, an all-to-all structure cannot exploit unique information from individual views, because the average error over all views is used for optimisation during training [13]. Thus, each view in our structure can directly and efficiently incorporate the information from all views.
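The following PyTorch-style sketch summarises this all-to-one forward pass. Here `f_FE`, `f_FF`, and `f_RC` are placeholders for the blocks detailed in Sections 3.2, 3.3, and 3.4, and the class is a schematic reading of Figure 2 rather than the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AllToOneSR(nn.Module):
    """Schematic all-to-one LFSR pipeline: one reference SAI is super-resolved with
    the help of every auxiliary SAI; the same weights are reused for each choice of
    reference view."""

    def __init__(self, f_FE: nn.Module, f_FF: nn.Module, f_RC: nn.Module, scale: int = 2):
        super().__init__()
        self.f_FE, self.f_FF, self.f_RC = f_FE, f_FF, f_RC
        self.scale = scale

    def forward(self, L_LR: torch.Tensor, ref_idx: int) -> torch.Tensor:
        # L_LR: (N, 1, H, W) Y-channel SAIs with N = U * V; ref_idx picks the reference.
        G1 = self.f_FE(L_LR)                                    # (N, C, H, W) shallow features
        G1_ref = G1[ref_idx:ref_idx + 1]                        # (1, C, H, W)
        aux_ids = [k for k in range(G1.shape[0]) if k != ref_idx]

        # Pair the reference feature with each auxiliary feature and fuse them (PFFB).
        G2 = self.f_FF(G1_ref, [G1[k:k + 1] for k in aux_ids])  # (1, C, H, W)

        # Reconstruct the HR residual (FEB) and add the bicubic-upsampled reference.
        G3 = self.f_RC(G2)                                      # (1, 1, sH, sW)
        up = F.interpolate(L_LR[ref_idx:ref_idx + 1], scale_factor=self.scale,
                           mode="bicubic", align_corners=False)
        return G3 + up
```

Running this module once per reference index yields all $N$ HR SAIs with shared weights, mirroring the description above.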

3.2. Residual Receptive Field Block (ResRFB)

Extracting and using discriminative features with rich contextual information is meaningful for reconstructing HR images with more details. Inspired by [25], we propose the ResRFB to enlarge the receptive field and extract hierarchical features from each LF SAI. As shown in Figure 4, the RFB consists of convolutions with different kernel parameters and dilated convolutions with different dilation rates, which imitate the human receptive field and increase the diversity of the convolutions. In the feature extraction part, two RFBs are used with a residual connection. Compared with the atrous spatial pyramid pooling (ASPP) used in [15,16], this block has superior discriminative representation and robustness.
As shown in Figure 2a, we first feed the SAIs into a $1 \times 1$ convolution to extract the initial features. Then, these features are fed into cascaded ResRFBs and ResBlocks, whose structures are shown in Figure 2b,c. The ResRFB is constructed from two identical RFBs with a shortcut connection, whose residual scaling parameter is set to 0.1. Each RFB has two branches to obtain hierarchical features, as illustrated in Figure 4. A ReLU activation is applied after each ResRFB. Each ResBlock consists of two $3 \times 3$ convolutions and a ReLU activation. In summary, multi-scale features of $L_{LR}$ are extracted by using these blocks. The effectiveness of the ResRFB is discussed in Section 4.4.
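A minimal sketch of the RFB and ResRFB is given below. The specific kernel sizes and dilation rates of the two branches are illustrative assumptions; the two-RFB cascade, the 0.1-scaled shortcut, and the trailing ReLU follow the description above.

```python
import torch
import torch.nn as nn

class RFB(nn.Module):
    """Two-branch receptive-field block: each branch pairs an ordinary convolution
    with a dilated convolution, so the branches see different receptive fields.
    Kernel sizes and dilation rates here are illustrative choices."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.branch1 = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.Conv2d(channels, channels, 3, padding=1, dilation=1),
        )
        self.branch2 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([self.branch1(x), self.branch2(x)], dim=1))

class ResRFB(nn.Module):
    """Two cascaded RFBs with a residual shortcut scaled by 0.1, followed by a ReLU."""

    def __init__(self, channels: int = 64, res_scale: float = 0.1):
        super().__init__()
        self.body = nn.Sequential(RFB(channels), RFB(channels))
        self.res_scale = res_scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x + self.res_scale * self.body(x))
```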

3.3. Progressive Feature Fusion Block (PFFB)

After feature extraction, the purpose of the PFFB is to implicitly align the disparities between the reference-view and auxiliary-view features. Moreover, this block can effectively exploit the complementary information among LF views. Here, we propose an encoder-and-decoder structure that embeds a progressive receptive field. Figure 2d shows that the basic and core component is the feature-fusion block (FFB), which contains three DMFMs. With this structure, the encoder can map the pairs of features from the reference view and auxiliary views to higher dimensions for fusion, and the receptive fields with different scales are suitable for extracting distinctive information from feature maps of different sizes. In this paper, we use $t$ DMFMs to perform complementary information fusion. Without loss of generality, the determination of the number of DMFMs is discussed in Section 4.4.
As shown in Figure 2d, the reference-view feature is stacked with each auxiliary-view feature to construct the feature pairs, and these pairs are first concatenated and fused by a $1 \times 1$ convolution layer in succession. The output of this convolution is $G_1^{fu,k}$. Here, we use $t$ DMFMs to generate a fusion feature, which brings great benefits to implicit feature alignment and feature fusion. The details of the FFB are shown in Figure 5. Specifically, each DMFM has a dual-branch structure mainly consisting of encoder and decoder convolutions to execute upscaling and downscaling operations. In each branch, we insert two convolutions with different receptive fields in front of the encoder and decoder convolutions to capture multi-scale information. Then, we concatenate these two branches to obtain the informative high-level feature $G_1^{1,k}$. This feature can be expressed as
$G_1^{1,k} = f_{DMFM}^{1}\left(G_1^{fu,k}\right),$
where $f_{DMFM}^{1}$ represents the operation of our first DMFM, $k = 1, 2, \ldots, N-1$, and the output feature is $G_1^{1,k} \in \mathbb{R}^{C \times H \times W}$.
In our PMFN, three DMFMs are used with dense connections to fully exploit multi-level features and preserve important information. For the second DMFM, we concatenate $G_1^{fu,k}$ and $G_1^{1,k}$ as input and feed them into a $1 \times 1$ convolution, that is,
$G_1^{1*,k} = f_{Conv}^{1}\left(\left[G_1^{fu,k}, G_1^{1,k}\right]\right),$
where $f_{Conv}^{1}$ is a $1 \times 1$ convolution. Similarly, we can obtain the output of the second DMFM, which is
$G_1^{2,k} = f_{DMFM}^{2}\left(G_1^{1*,k}\right),$
where $f_{DMFM}^{2}$ denotes the second DMFM. Due to the dense connections, the input of the third DMFM consists of three parts: $G_1^{fu,k}$, $G_1^{1*,k}$, and $G_1^{2,k}$. We use a $1 \times 1$ convolution to process these features and feed the output into the third DMFM, which can be denoted by
$G_1^{2*,k} = f_{Conv}^{2}\left(\left[G_1^{fu,k}, G_1^{1*,k}, G_1^{2,k}\right]\right), \quad G_1^{3,k} = f_{DMFM}^{3}\left(G_1^{2*,k}\right),$
where $f_{Conv}^{2}$ is a $1 \times 1$ convolution and $f_{DMFM}^{3}$ denotes the third DMFM. Finally, we collect the features of different levels from the output of each DMFM and fuse them together by using a $1 \times 1$ convolution. The output of the FFB is $G_2^{k}$, defined as
$G_2^{k} = f_{Conv}^{3}\left(\left[G_1^{1*,k}, G_1^{2*,k}, G_1^{3,k}\right]\right),$
where $f_{Conv}^{3}$ is a $1 \times 1$ convolution. The remaining feature pairs, constructed from the reference view and each auxiliary view, are processed in the same way to produce their own $G_2^{k}$. To generate the reference feature $G_2$ containing the information of all auxiliary views, we cascade $3 \times 3$ convolutions with a ReLU and a ResBlock, i.e.,
$G_2 = f_{Conv}^{5}\left(f_{Res}\left(f_{Conv}^{4}\left(\left[G_2^{1}, G_2^{2}, \ldots, G_2^{k}\right]\right)\right)\right),$
where $f_{Conv}^{4}$ and $f_{Conv}^{5}$ denote the $3 \times 3$ convolutions with a ReLU, and $f_{Res}$ is the ResBlock consisting of two $3 \times 3$ convolutions and a ReLU.
As a result, the PFFB not only supplements informative details from complementary views but also implicitly handles the disparities among them. This block is beneficial for improving the fusion of complementary views. The effectiveness of the structure of the PFFB is demonstrated in Section 4.4.
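The sketch below instantiates one FFB for a single reference-auxiliary pair, following the dense-connection formulation above; the per-pair outputs $G_2^{k}$ would then be merged by the final $3 \times 3$ convolutions and ResBlock. Inside the DMFM, the choice of a transposed convolution as the encoder, a strided convolution as the decoder, and the particular dilation rates are assumptions made for illustration, since the exact parameters are given in Figure 5.

```python
import torch
import torch.nn as nn

class DMFM(nn.Module):
    """Dual-branch multi-scale fusion module (illustrative sketch). Each branch puts a
    convolution with a different receptive field in front of an encoder/decoder pair:
    here the encoder upscales with a transposed convolution and the decoder downscales
    with a strided convolution, which is one plausible reading of Figure 5."""

    def __init__(self, c: int = 64):
        super().__init__()
        def branch(k: int, d: int) -> nn.Sequential:
            pad = d * (k - 1) // 2
            return nn.Sequential(
                nn.Conv2d(c, c, k, padding=pad, dilation=d), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(c, c, 4, stride=2, padding=1),  # encoder: upscale
                nn.Conv2d(c, c, 4, stride=2, padding=1),           # decoder: downscale
            )
        self.b_small = branch(3, 1)   # smaller receptive field
        self.b_large = branch(3, 2)   # larger receptive field (dilated)
        self.fuse = nn.Conv2d(2 * c, c, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([self.b_small(x), self.b_large(x)], dim=1))

class FFB(nn.Module):
    """Three DMFMs with dense connections for one reference-auxiliary feature pair."""

    def __init__(self, c: int = 64):
        super().__init__()
        self.fuse_in = nn.Conv2d(2 * c, c, 1)   # fuse the concatenated [ref, aux] pair
        self.dmfm1, self.dmfm2, self.dmfm3 = DMFM(c), DMFM(c), DMFM(c)
        self.conv1 = nn.Conv2d(2 * c, c, 1)     # [G_fu, G_1]
        self.conv2 = nn.Conv2d(3 * c, c, 1)     # [G_fu, G_1*, G_2]
        self.conv3 = nn.Conv2d(3 * c, c, 1)     # [G_1*, G_2*, G_3]

    def forward(self, ref: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        g_fu = self.fuse_in(torch.cat([ref, aux], dim=1))
        g1 = self.dmfm1(g_fu)
        g1s = self.conv1(torch.cat([g_fu, g1], dim=1))
        g2 = self.dmfm2(g1s)
        g2s = self.conv2(torch.cat([g_fu, g1s, g2], dim=1))
        g3 = self.dmfm3(g2s)
        return self.conv3(torch.cat([g1s, g2s, g3], dim=1))
```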

3.4. Feature Enhancement Block (FEB)

After generating the fused feature $G_2$ containing the information of the different auxiliary views, the FEB is designed to bridge the gap between the obtained reference view and the given HR reference view. The residual map $G_3$ is fine-tuned by our FEB. As shown in Figure 2e, $G_2$ is first processed by the pixel shuffle block to generate a coarse feature, which has the same size as $G_3$. Then, we feed the coarse feature into three cascaded $3 \times 3$ convolutions with a ReLU to refine the HR residual map $G_3$. That is,
$G_3 = f_{Conv}^{8}\left(f_{Conv}^{7}\left(f_{Conv}^{6}\left(f_{PS}\left(G_2\right)\right)\right)\right),$
where $f_{Conv}^{6}$, $f_{Conv}^{7}$, and $f_{Conv}^{8}$ have the same structure and denote a $3 \times 3$ convolution closely followed by a ReLU. The role of the last convolution, $f_{Conv}^{8}$, is to squeeze the number of feature channels to 1. Then, the final SR reference image is generated by adding the output of the bicubic operation. This block is very useful for distilling more valid information and promoting HR reconstruction.
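A compact sketch of the FEB is shown below. The convolution placed before the pixel shuffle (to expand channels for sub-pixel upsampling) and the exact ReLU placement are assumptions; the three cascaded $3 \times 3$ convolutions and the final squeeze to one channel follow the text above.

```python
import torch
import torch.nn as nn

class FEB(nn.Module):
    """Feature enhancement block (sketch): sub-pixel upsampling followed by three 3x3
    convolutions; the last convolution squeezes the channels to 1 so the output is the
    HR residual map G_3."""

    def __init__(self, c: int = 64, scale: int = 2):
        super().__init__()
        self.up = nn.Sequential(
            nn.Conv2d(c, c * scale * scale, 3, padding=1),  # prepare channels for shuffle
            nn.PixelShuffle(scale),                          # (B, c, sH, sW)
        )
        self.refine = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, 1, 3, padding=1),                   # squeeze to a single channel
        )

    def forward(self, G2: torch.Tensor) -> torch.Tensor:
        return self.refine(self.up(G2))
```

The output of this block is added to the bicubic-upsampled LR reference image to form the final SR reference view, as in the overview equations.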

4. Experiments

In this section, we conduct a series of experiments in order to demonstrate the performance of the proposed PMFN. First, we introduce the details of the datasets and the experimental setup. Then, we compare our PMFN to several state-of-the-art SISR and LFSR methods. Finally, we go through the ablation study to investigate the significance of each component in our network.

4.1. Datasets

In this work, we conducted experiments on both synthetic and real-world datasets, since it is more meaningful for LF algorithms to adapt to different datasets with different scenes. As listed in Table 1, HCInew [26], HCIold [27], EPFL [28], INRIA [29], and STFgantry [30] are the five public LF datasets we use, each with different characteristics. STFgantry has the largest LF disparity and thus contains the most distinctive information between two adjacent views compared with the other datasets. EPFL and INRIA are captured by Lytro Illum cameras, and STFgantry is captured by a camera mounted on a gantry; these three consist of real-world scenes. Each dataset is divided into training and testing parts. Specifically, 144 LF images were used for training and 23 LF images for testing our method. The original SAIs of these datasets have an angular resolution of $9 \times 9$.

4.2. Settings and Implementation Details

To generate the LR LF images, bicubic downsampling is used with $2\times$ and $4\times$ downsampling factors. Our PMFN sets the feature channel number to 64. The patch size is set to $5 \times 5 \times 64 \times 64$, i.e., the angular and spatial resolutions are $5 \times 5$ and $64 \times 64$, respectively. The LF images are randomly augmented by flipping them horizontally or vertically and rotating them by 90 degrees. Our network adopts the L1 loss function and the Adam optimiser ($\beta_1 = 0.9$, $\beta_2 = 0.999$). The initial learning rate was set to $1 \times 10^{-4}$ and decreased by a factor of 0.5 every 250 epochs, and the network was trained for 400 epochs in total. We trained our network on an NVIDIA RTX 2080 Ti GPU, and the model is implemented in the PyTorch framework.
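The reported optimisation settings translate into the following runnable PyTorch sketch. The tiny stand-in model and toy data loader only make the loop executable; they do not perform real light-field super-resolution.

```python
import torch
import torch.nn as nn

# Stand-ins: a toy model mapping 25 stacked SAIs to one view, and two toy batches.
model = nn.Sequential(nn.Conv2d(25, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 1, 3, padding=1))
train_loader = [(torch.rand(4, 25, 64, 64), torch.rand(4, 1, 64, 64)) for _ in range(2)]

# Settings reported in the paper: L1 loss, Adam (beta1=0.9, beta2=0.999),
# initial lr 1e-4 halved every 250 epochs, 400 epochs in total.
criterion = nn.L1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=250, gamma=0.5)

for epoch in range(400):
    for lr_patches, hr_ref in train_loader:
        loss = criterion(model(lr_patches), hr_ref)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()   # learning rate is halved every 250 epochs
```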

4.3. Comparison to the State of the Art

We compare the results of the PMFN with recent state-of-the-art single image SR (SISR) and LFSR methods, namely VDSR [8], EDSR [19], GB [31], resLF [12], LFSSR [32], LF-ATO [13], LF-InterNet [14], LF-DFNet [16], MEG-Net [23], and DPT [24]. The bicubic method is the baseline for comparison. VDSR and EDSR are typical SISR methods, and GB is a traditional LFSR method. resLF, LFSSR, LF-ATO, LF-InterNet, LF-DFNet, MEG-Net, and DPT are deep-learning-based LFSR methods. The learning-based methods have been retrained on the same training datasets for consistency. We use the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) to evaluate the quantitative performance.
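For reference, PSNR can be computed on the Y channel as in the following sketch. The exact evaluation protocol of the paper (e.g., per-view averaging or boundary cropping) is not restated here, and SSIM would typically be computed with a standard implementation such as scikit-image.

```python
import torch

def psnr(sr: torch.Tensor, hr: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between a super-resolved and a ground-truth Y image."""
    mse = torch.mean((sr - hr) ** 2)
    return float(10 * torch.log10(max_val ** 2 / mse))

# Example: compare a slightly perturbed image against its ground truth.
hr = torch.rand(1, 1, 128, 128)
sr = (hr + 0.01 * torch.randn_like(hr)).clamp(0, 1)
print(f"PSNR: {psnr(sr, hr):.2f} dB")
```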

4.3.1. Quantitative Results

Table 2 reports the quantitative results on $5 \times 5$ LF views for $2\times$ and $4\times$ SR; the best results are shown in red and the second-best in blue. These results show that our approach is comparable with the state-of-the-art SR methods, and our network achieves the best performance on the EPFL, INRIA, and STFgantry datasets. In Table 2, the PSNR (SSIM) of our PMFN far exceeds that of EDSR on STFgantry, with a gap of 3.73 dB (0.011) for $2\times$ SR. That is because SISR methods ignore the complementary information existing in other views, which limits the SR performance. Compared with the traditional LFSR method (GB), our method is 3.97 dB (0.041) higher in terms of PSNR (SSIM) for $4\times$ SR on INRIA. Our method benefits from the representation learning capability of deep CNNs and thus achieves better performance than GB. The average PSNR (SSIM) over all testing scenes for LF-DFNet, MEG-Net, DPT, and our method is 31.32 dB (0.934), 31.72 dB (0.937), 31.56 dB (0.940), and 31.97 dB (0.940) for the $4\times$ task, respectively. It can be observed that our PMFN has the best generalisation across all datasets. The performance of our network is mediocre on HCInew and HCIold. That is because our network implicitly addresses the influence of disparities by using multiple receptive fields to capture the corresponding pixels, which is less suitable for the moderate disparities of these two datasets. In summary, our method can incorporate complementary information among different views, especially on small- and large-disparity datasets (EPFL, INRIA, and STFgantry).

4.3.2. Qualitative Results

Figure 6 shows the qualitative results for $2\times$ and $4\times$ SR. Compared with the state-of-the-art SISR and LFSR methods, our model generates not only more faithful details but also fewer artefacts. Specifically, our method reconstructs the stairway in the INRIA_Sculpture scene with better visual quality for $2\times$ SR. We also notice that our approach is able to recover challenging scenes with occlusions and complex structures, such as STFgantry_Cards for $4\times$ SR. In general, our method can effectively reconstruct LF images.

4.3.3. Parameters and FLOPs

In Table 3, we compare the number of parameters (Params.), FLOPs, and the average PSNR and SSIM (Avg. PSNR/SSIM) of our PMFN and other LFSR methods on the testing datasets for 4× upscaling. Our method incurs only a moderate computational cost yet achieves the best results compared with the SISR and LFSR methods, especially for the reference view. Furthermore, our PMFN has fewer parameters and better performance than DPT [24]. In summary, our method achieves a good balance between model efficiency and PSNR/SSIM scores.

4.4. Ablation Study

We conducted extensive comparative experiments to verify the contributions of the different components in our PMFN, including the ResRFB for feature extraction, the FFB for feature fusion, and the FEB for reconstruction. Figure 7 shows the visual results of the different variants; our designed network achieves the best SR performance. We also investigate the performance of different numbers of DMFMs and the influence of the connection mode in our FFB.
PMFN-FE w/o ResRFB: The ResRFB is used to extract multi-scale initial features in the feature extraction stage. We investigated the benefit of this block through two experiments (PMFN-FE + ASPP and PMFN-FE + ResBlock), in which the ResRFB is replaced by an ASPP block or a ResBlock so that the network parameters remain similar. As shown in Table 4, the average PSNR decreases by 0.18 dB and 0.32 dB, respectively, for $4\times$ SR over the EPFL, HCInew, HCIold, INRIA, and STFgantry datasets. Compared with the ResBlock and ASPP, our ResRFB has progressive receptive fields, which are beneficial for extracting hierarchical features.
Number of DMFMs: This component is the key to achieving feature fusion in our network. Table 5 shows how the computational cost and reconstruction quality are affected by different numbers of DMFMs. The accuracy consistently improves as the number of DMFMs increases; however, the computational cost also keeps growing while the performance gain becomes marginal. The PSNR values are very similar when the number of DMFMs is between three and five. In this paper, we set the number of DMFMs to three to achieve a meaningful trade-off between computational cost and reconstruction quality.
Structure of FFB: We compare the SR performance of different PFFB architectures. Figure 8 shows the different structures based on the DMFM; we adopt three DMFMs to investigate the effectiveness of our structure. Compared with the FFB, FFB1 simply cascades three DMFMs without any additional connections; this variant suffers a decrease of 0.48 dB compared with PMFN. FFB2 removes the branches of the input feature and suffers a decrease of 0.16 dB compared with PMFN. These two variants demonstrate the effectiveness of the dense connections in PMFN. Although FFB3 and FFB have the same structure, FFB3 removes the progressive multi-scale convolution; this variant suffers a decrease of 0.42 dB compared with PMFN. Moreover, we remove the FFB in the variant PMFN-FF - FFB to investigate its contribution; this variant suffers the largest decrease of 0.79 dB among all variants. Our three DMFMs with dense connections achieve better performance because both low-level and high-level features are beneficial to the SR performance, and the progressive multi-scale design improves feature representation and enlarges the receptive fields.
Structure of FEB: The FEB is used in our PMFN to reconstruct high-frequency details. To demonstrate its effectiveness, we remove the middle two cascaded $3 \times 3$ convolutions with a ReLU and only preserve the last convolution. The result (PMFN-FF - FEB) is 0.05 dB lower than PMFN, because the FEB further fine-tunes the high-frequency features of the HR images.
Angular resolution: We also analyse the performance of our PMFN with different numbers of SAIs. We used $3 \times 3$, $5 \times 5$, and $7 \times 7$ SAIs on both synthetic (HCInew) and real-world (STFgantry) datasets. As shown in Table 6, the PSNR and SSIM values improve as the angular resolution increases, because more complementary information is fused into the reference view. In this paper, all comparisons with the state of the art are obtained with $5 \times 5$ SAIs, because this angular resolution is sufficient to investigate the efficiency of our method.

5. Conclusions

In this paper, we propose an all-to-one network PMFN for LFSR. To make full use of complementary information between the reference view and auxiliary views, we design a PFFB with three densely connected DMFMs. This block can implicitly deal with disparities among different views. Additionally, in this block, low-level and high-level features are incorporated to improve the SR performance. Moreover, we design an FEB to further refine the HR residual map to improve the SR reference view. Experimental results have demonstrated the superiority of our PMFN over other state-of-the-art methods.
In future work, an adaptive receptive field network can be investigated to improve the accuracy of the generated texture details. Such a network should adapt to different disparities and generalise to both synthetic and real-world datasets.

Author Contributions

Conceptualization, W.Z.; methodology, W.Z., W.K., H.S. and Z.X.; writing—original draft preparation, W.Z.; writing—review and editing, W.Z., W.K., H.S. and Z.X.; Funding acquisition, W.K., H.S. and Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This study was partially supported by the National Key R&D Program of China (2018YFB2101100), the National Natural Science Foundation of China (61872025), the Science and Technology Development Fund, Macau SAR (0001/2018/AFJ) and the Open Fund of the State Key Laboratory of Software Development Environment (SKLSDE-2021ZX-03).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Huang, F.; Chen, K.; Wetzstein, G. The light field stereoscope. ACM Trans. Graph. 2015, 34, 1–12. [Google Scholar]
  2. Yu, J. A light-field journey to virtual reality. IEEE MultiMedia 2017, 24, 104–112. [Google Scholar] [CrossRef]
  3. Kim, C.; Zimmer, H.; Pritch, Y.; Sorkine-Hornung, A.; Gross, M.H. Scene reconstruction from high spatio-angular resolution light fields. ACM Trans. Graph. 2013, 32, 73. [Google Scholar] [CrossRef]
  4. Zhu, H.; Wang, Q.; Yu, J. Occlusion-model guided antiocclusion depth estimation in light field. IEEE J. Sel. Top. Signal Process. 2017, 11, 965–978. [Google Scholar] [CrossRef] [Green Version]
  5. Piao, Y.; Li, X.; Zhang, M.; Yu, J.; Lu, H. Saliency detection via depth-induced cellular automata on light field. IEEE Trans. Image Process. 2019, 29, 1879–1889. [Google Scholar] [CrossRef] [PubMed]
  6. Zhang, M.; Li, J.; Wei, J.; Piao, Y.; Lu, H.; Wallach, H.; Larochelle, H.; Beygelzimer, A.; d’Alche Buc, F.; Fox, E. Memory-oriented Decoder for Light Field Salient Object Detection. In Proceedings of the NeurIPS, Vancouver, BC, Canada, 8–14 December 2019; pp. 896–906. [Google Scholar]
  7. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
  8. Kim, J.; Kwon Lee, J.; Mu Lee, K. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
  9. Wanner, S.; Goldluecke, B. Variational light field analysis for disparity estimation and super-resolution. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 606–619. [Google Scholar] [CrossRef] [PubMed]
  10. Farrugia, R.A.; Galea, C.; Guillemot, C. Super resolution of light field images using linear subspace projection of patch-volumes. IEEE J. Sel. Top. Signal Process. 2017, 11, 1058–1071. [Google Scholar] [CrossRef] [Green Version]
  11. Wang, Y.; Liu, F.; Zhang, K.; Hou, G.; Sun, Z.; Tan, T. LFNet: A novel bidirectional recurrent convolutional neural network for light-field image super-resolution. IEEE Trans. Image Process. 2018, 27, 4274–4286. [Google Scholar] [CrossRef] [PubMed]
  12. Zhang, S.; Lin, Y.; Sheng, H. Residual networks for light field image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11046–11055. [Google Scholar]
  13. Jin, J.; Hou, J.; Chen, J.; Kwong, S. Light field spatial super-resolution via deep combinatorial geometry embedding and structural consistency regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2260–2269. [Google Scholar]
  14. Wang, Y.; Wang, L.; Yang, J.; An, W.; Yu, J.; Guo, Y. Spatial-angular interaction for light field image super-resolution. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 290–308. [Google Scholar]
  15. Mo, Y.; Wang, Y.; Xiao, C.; Yang, J.; An, W. Dense Dual-Attention Network for Light Field Image Super-Resolution. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 4431–4443. [Google Scholar] [CrossRef]
  16. Wang, Y.; Yang, J.; Wang, L.; Ying, X.; Wu, T.; An, W.; Guo, Y. Light field image super-resolution using deformable convolution. IEEE Trans. Image Process. 2020, 30, 1057–1071. [Google Scholar] [CrossRef] [PubMed]
  17. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 184–199. [Google Scholar]
  18. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 295–307. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  19. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
  20. Dai, T.; Cai, J.; Zhang, Y.; Xia, S.T.; Zhang, L. Second-order attention network for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11065–11074. [Google Scholar]
  21. Wang, S.; Zhou, T.; Lu, Y.; Di, H. Contextual Transformation Network for Lightweight Remote-Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–13. [Google Scholar] [CrossRef]
  22. Yoon, Y.; Jeon, H.G.; Yoo, D.; Lee, J.Y.; So Kweon, I. Learning a deep convolutional network for light-field image super-resolution. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Santiago, Chile, 7–13 December 2015; pp. 24–32. [Google Scholar]
  23. Zhang, S.; Chang, S.; Lin, Y. End-to-end light field spatial super-resolution network using multiple epipolar geometry. IEEE Trans. Image Process. 2021, 30, 5956–5968. [Google Scholar] [CrossRef] [PubMed]
  24. Wang, S.; Zhou, T.; Lu, Y.; Di, H. Detail preserving transformer for light field image super-resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022. [Google Scholar]
  25. Liu, S.; Huang, D. Receptive field block net for accurate and fast object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 385–400. [Google Scholar]
  26. Honauer, K.; Johannsen, O.; Kondermann, D.; Goldluecke, B. A dataset and evaluation methodology for depth estimation on 4d light fields. In Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 19–34. [Google Scholar]
  27. Wanner, S.; Meister, S.; Goldluecke, B. Datasets and benchmarks for densely sampled 4D light fields. In Proceedings of the Vision, Modeling and Visualization (VMV 2013), Lugano, Switzerland, 11–13 September 2013; Volume 13, pp. 225–226. [Google Scholar]
  28. Rerabek, M.; Ebrahimi, T. New light field image dataset. In Proceedings of the 8th International Conference on Quality of Multimedia Experience (QoMEX), Lisbon, Portugal, 6–8 June 2016. [Google Scholar]
  29. Le Pendu, M.; Jiang, X.; Guillemot, C. Light field inpainting propagation via low rank matrix completion. IEEE Trans. Image Process. 2018, 27, 1981–1993. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  30. Vaish, V.; Adams, A. The (New) Stanford Light Field Archive; Computer Graphics Laboratory, Stanford University: Stanford, CA, USA, 2008; Volume 6. [Google Scholar]
  31. Rossi, M.; Frossard, P. Geometry-consistent light field super-resolution via graph-based regularization. IEEE Trans. Image Process. 2018, 27, 4207–4218. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  32. Yeung, H.W.F.; Hou, J.; Chen, X.; Chen, J.; Chen, Z.; Chung, Y.Y. Light field spatial super-resolution using deep efficient spatial-angular separable convolution. IEEE Trans. Image Process. 2018, 28, 2319–2330. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Comparison of image content in different SAIs. (a) SAI1 and (b) SAI2 are located at $(1, 1)$ and $(9, 9)$ of the $9 \times 9$ HCInew scene "Origami". The area indicated by the black arrow shows that different views contain distinctive information near occlusion edges.
Figure 2. Architecture of the proposed PMFN. The overall network is composed of three parts: feature extraction, feature fusion, and reconstruction. The input of our network is the LR SAIs. The reference view is randomly selected from these SAIs, and the remaining images are the auxiliary-view images. The output is an SR reference-view image.
Figure 3. Illustration of supplementing complementary information. Here, an $L_{LR} \in \mathbb{R}^{3 \times 3 \times H \times W}$ is used as an example. We choose the reference view ($U = 2$, $V = 2$), and the remaining views are used as auxiliary views. For better comprehension, complementary information from different auxiliary views is visually represented as stars with different colours. Note that the information is added to the blue box in the Ref image. The results are shown in the zoomed-in region.
Figure 4. Architecture of RFB. This block is the basic component of ResRFB.
Figure 5. Architecture of the three DMFMs in the FFB. Specifically, the encoder-and-decoder structure consists of a convolution and a deconvolution. The yellow lines are the input of $G_1^{fu,k}$, the orange lines are the input of $G_1^{1*,k}$, and the blue line is the input of $G_1^{2*,k}$. The structural parameters for different upsampling factors ($2\times$ and $4\times$) are shown in the yellow dotted box.
Figure 6. Visual comparisons of different methods on × 2 and × 4 reconstruction.
Figure 7. Visual results of different variants of our network on "Stanford_Gantry_Lego Knights" for $4\times$ SR.
Figure 8. Different structures of the FFB. FFB1 uses a straightforward cascade connection, FFB2 uses a dense connection without introducing input features, FFB3 uses a dense connection without multi-scale features, and FFB is the structure used in our network.
Table 1. Public LF datasets used in our experiments.
Datasets       | Training | Test | LF Disparity | Type
HCInew [26]    | 20       | 4    | [−4, 4]      | Synthetic
HCIold [27]    | 10       | 2    | [−3, 3]      | Synthetic
EPFL [28]      | 70       | 10   | [−1, 1]      | Real-world
INRIA [29]     | 35       | 5    | [−1, 1]      | Real-world
STFgantry [30] | 9        | 2    | [−7, 7]      | Real-world
Total          | 144      | 23   |              |
Table 2. PSNR/SSIM values achieved by different methods for 2 × and 4 × SR, the best results are in red and the second best results are in blue.
Methods          | Scale | EPFL        | HCInew      | HCIold      | INRIA       | STFgantry
Bicubic          | 2×    | 29.50/0.935 | 31.69/0.934 | 37.46/0.978 | 31.10/0.956 | 30.82/0.947
VDSR [8]         | 2×    | 32.64/0.960 | 34.45/0.957 | 40.75/0.987 | 34.56/0.975 | 35.59/0.979
EDSR [19]        | 2×    | 33.05/0.963 | 34.83/0.959 | 41.00/0.987 | 34.88/0.976 | 36.26/0.982
GB [31]          | 2×    | 31.22/0.959 | 35.25/0.969 | 40.21/0.988 | 32.76/0.972 | 35.44/0.983
LFSSR [32]       | 2×    | 32.84/0.969 | 35.58/0.968 | 42.05/0.991 | 34.68/0.980 | 35.86/0.984
resLF [12]       | 2×    | 33.46/0.970 | 36.40/0.972 | 43.09/0.993 | 35.25/0.980 | 37.83/0.989
LF-ATO [13]      | 2×    | 34.05/0.975 | 37.11/0.976 | 44.15/0.994 | 35.96/0.984 | 39.36/0.992
LF-InterNet [14] | 2×    | 34.06/0.975 | 37.05/0.976 | 44.37/0.994 | 35.85/0.984 | 38.60/0.991
LF-DFNet [16]    | 2×    | 34.17/0.976 | 37.31/0.977 | 44.24/0.994 | 36.04/0.984 | 39.28/0.992
MEG-Net [23]     | 2×    | 33.30/0.971 | 35.98/0.970 | 42.60/0.992 | 35.19/0.981 | 36.53/0.986
DPT [24]         | 2×    | 33.77/0.975 | 36.93/0.975 | 43.88/0.994 | 35.68/0.984 | 38.66/0.991
Ours             | 2×    | 34.63/0.977 | 36.92/0.976 | 43.65/0.993 | 36.54/0.985 | 39.33/0.993
Bicubic          | 4×    | 25.14/0.831 | 27.61/0.851 | 32.42/0.934 | 26.82/0.886 | 25.93/0.843
VDSR [8]         | 4×    | 27.22/0.876 | 29.24/0.881 | 34.72/0.951 | 29.14/0.920 | 28.40/0.898
EDSR [19]        | 4×    | 27.86/0.885 | 29.56/0.886 | 35.09/0.953 | 29.69/0.925 | 28.72/0.907
GB [31]          | 4×    | 26.02/0.863 | 28.92/0.884 | 33.74/0.950 | 27.73/0.909 | 28.11/0.901
LFSSR [32]       | 4×    | 28.27/0.908 | 30.72/0.913 | 36.70/0.959 | 30.31/0.945 | 30.15/0.939
resLF [12]       | 4×    | 28.14/0.902 | 30.62/0.909 | 36.56/0.968 | 30.22/0.940 | 30.05/0.935
LF-ATO [13]      | 4×    | 28.74/0.913 | 30.16/0.910 | 37.01/0.970 | 30.88/0.949 | 30.85/0.945
LF-InterNet [14] | 4×    | 28.59/0.912 | 30.88/0.914 | 36.95/0.971 | 30.59/0.948 | 30.33/0.940
LF-DFNet [16]    | 4×    | 28.53/0.910 | 30.66/0.916 | 36.58/0.967 | 30.55/0.944 | 29.87/0.932
MEG-Net [23]     | 4×    | 28.75/0.916 | 31.10/0.918 | 37.29/0.972 | 30.67/0.949 | 30.77/0.945
DPT [24]         | 4×    | 28.54/0.911 | 30.92/0.914 | 37.00/0.970 | 30.66/0.948 | 30.65/0.943
Ours             | 4×    | 29.30/0.917 | 30.93/0.917 | 36.94/0.969 | 31.70/0.950 | 30.97/0.947
Table 3. The comparative results in terms of the number of parameters, FLOPs, and average PSNR/SSIM on the testing sets for 4× LF image SR methods. FLOPs are calculated on 5 × 5 × 32 × 32 input features.
Methods     | Params. (M) | FLOPs (G)  | Avg. PSNR/SSIM
EDSR [19]   | 38.89       | 40.66 × 25 | 30.18/0.911
LFSSR [32]  | 1.77        | 128.44     | 31.23/0.935
LF-ATO [13] | 1.36        | 28.08 × 25 | 31.69/0.938
DPT [24]    | 3.78        | 58.64      | 31.56/0.937
PMFN (Ours) | 3.11        | 49.61 × 25 | 31.97/0.940
Table 4. PSNR/SSIM values achieved by PMFN and its variants for 4 × SR.
Models             | Scale | EPFL        | HCInew      | HCIold      | INRIA       | STFgantry   | Average
PMFN-FE + ASPP     | 4×    | 29.15/0.910 | 30.76/0.911 | 36.82/0.968 | 31.47/0.947 | 30.71/0.942 | −0.18/−0.005
PMFN-FE + ResBlock | 4×    | 28.95/0.908 | 30.69/0.910 | 36.66/0.967 | 31.35/0.947 | 30.61/0.941 | −0.32/−0.006
PMFN-FF + FFB1     | 4×    | 29.07/0.902 | 30.47/0.905 | 36.51/0.966 | 31.08/0.945 | 30.32/0.931 | −0.48/−0.011
PMFN-FF + FFB2     | 4×    | 29.17/0.910 | 30.79/0.911 | 36.75/0.968 | 31.51/0.947 | 30.83/0.942 | −0.16/−0.005
PMFN-FF + FFB3     | 4×    | 28.92/0.908 | 30.62/0.909 | 36.57/0.967 | 31.21/0.946 | 30.44/0.938 | −0.42/−0.007
PMFN-FF - FFB      | 4×    | 28.71/0.904 | 30.32/0.903 | 36.28/0.965 | 30.88/0.943 | 29.70/0.928 | −0.79/−0.012
PMFN-FF - FEB      | 4×    | 29.28/0.912 | 30.88/0.913 | 36.91/0.969 | 31.62/0.948 | 30.88/0.944 | −0.05/−0.003
PMFN               | 4×    | 29.30/0.917 | 30.93/0.917 | 36.94/0.969 | 31.70/0.950 | 30.97/0.949 | 31.97/0.940
Table 5. The comparative results for different numbers of DMFMs on the EPFL dataset for 4× LFSR.
Ang   | Num | Params. (M) | FLOPs (G) | PSNR
5 × 5 | 1   | 1.37        | 19.35     | 29.07
5 × 5 | 2   | 2.23        | 34.46     | 29.15
5 × 5 | 3   | 3.11        | 49.61     | 29.30
5 × 5 | 4   | 4.00        | 64.81     | 29.31
5 × 5 | 5   | 4.90        | 80.03     | 29.33
Table 6. Comparative results of different angular resolutions on the STFgantry and HCInew using our PMFN for 2 × and 4 × LFSR.
Ang   | Dataset   | ×2 PSNR | ×2 SSIM | ×4 PSNR | ×4 SSIM
3 × 3 | STFgantry | 38.76   | 0.992   | 30.67   | 0.946
3 × 3 | HCInew    | 36.57   | 0.975   | 30.66   | 0.914
5 × 5 | STFgantry | 39.33   | 0.993   | 30.97   | 0.947
5 × 5 | HCInew    | 36.92   | 0.976   | 30.93   | 0.917
7 × 7 | STFgantry | 39.68   | 0.994   | 31.12   | 0.951
7 × 7 | HCInew    | 37.07   | 0.978   | 30.99   | 0.919
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

