Our LF-FANet mainly consists of three parts: a feature extraction module (FEM), a feature fusion and allocation module (FFAM) and a feature blending and upsampling module (FB & UP). These three parts are shown in Figure 2. The feature extraction module is used to obtain the shallow features from each SAI. Then, we design the second part for feature fusion and allocation, which contains two different operators (AFO, SFO) and an interaction information fusion block (IIFB). This part adopts a fusion and allocation strategy to interact with high-dimensional information (angular and spatial dimensions) and distribute the information for further angular-and-spatial deep representation. Finally, the feature blending and upsampling module is utilized to generate more compact representations and reconstruct residual maps.

Specifically, the LR SAIs $I_{LR}$ are fed as input into a convolution layer to generate initial features. These features are processed by two cascaded DASPPs, which are introduced in Section 3.2.1. Processed by our FEM, the hierarchical features $F_h$ of all SAIs are generated. That is,

$F_h = H_{FEM}(I_{LR}),$

where $F_h \in \mathbb{R}^{N \times C \times H \times W}$ represents the hierarchical features, $N$ denotes the number of SAIs, and the feature depth is $C$. Then, $F_h$ is directly fed into the SFO. Meanwhile, we reshape $F_h$ to $F'_h \in \mathbb{R}^{C \times N \times H \times W}$ and feed $F'_h$ into the AFO. That is,

$F_a = H_{AFO}(F'_h), \quad F_s = H_{SFO}(F_h),$

where $H_{AFO}$ and $H_{SFO}$ are two operators for angular and spatial information extraction, and $F_a$ and $F_s$ are the outputs of the AFO and SFO. Following our fusion strategy, these two features are fully fused by the IIFB, which can be expressed as

$F_{fuse} = H_{IIFB}(F_a, F_s),$

where $F_{fuse}$ represents the fusion feature with informative angular and spatial information. Then, the allocation operation is performed: $F_{fuse}$ is fed into the second AFO and SFO, respectively. In our network, we repeat this strategy twice, which yields two hierarchical fusion features $F^{(1)}_{fuse}$ and $F^{(2)}_{fuse}$. We concatenate these features as the output of the FFAM, which can be expressed as

$F_{FFAM} = [F^{(1)}_{fuse}, F^{(2)}_{fuse}],$

where $[\cdot]$ denotes the concatenation operation and $F_{FFAM}$ is the output feature of the FFAM. Finally, $F_{FFAM}$ is fed into a feature blending and upsampling module (FB & UP) to generate an HR residual map $I_{res}$, which is added to a bicubic interpolation of the input to generate the final SR images. The process of reconstructing all SR images can be simply expressed as

$I_{SR} = H_{FB\&UP}(F_{FFAM}) + H_{bic}(I_{LR}),$

where $H_{bic}$ is the bicubic interpolation operation and $I_{SR}$ is the HR SAIs.
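The fusion-and-allocation pipeline above can be sketched at the shape level. This is a minimal illustration, not the authors' implementation: the operators are replaced by stand-in functions, and all sizes (25 SAIs from a 5×5 grid, 32 channels, 32×32 patches) are hypothetical.

```python
import numpy as np

# Hypothetical sizes: N SAIs, C feature channels, HxW LR patches.
N, C, H, W = 25, 32, 32, 32

def fem(sais):                      # feature extraction: one feature map per SAI
    return np.zeros((N, C, H, W))   # stand-in for the real FEM

def afo(f):                         # angular branch (channel dim is N after reshape)
    return f                        # stand-in for the real operator

def sfo(f):                         # spatial branch
    return f

def iifb(f_a, f_s):                 # fuse the two branch outputs
    # bring the angular branch back to (N, C, H, W) before fusing
    return (f_a.transpose(1, 0, 2, 3) + f_s) / 2

sais = np.zeros((N, 1, H, W))       # LR SAIs, one channel each
f_h = fem(sais)                     # hierarchical features, (N, C, H, W)
f_a_in = f_h.transpose(1, 0, 2, 3)  # reshape to (C, N, H, W) for the AFO

fused = []
f_s_in = f_h
for _ in range(2):                  # fusion-and-allocation strategy, repeated twice
    f_fuse = iifb(afo(f_a_in), sfo(f_s_in))
    fused.append(f_fuse)
    f_s_in = f_fuse                             # allocate to the next SFO ...
    f_a_in = f_fuse.transpose(1, 0, 2, 3)       # ... and (reshaped) to the next AFO

f_ffam = np.concatenate(fused, axis=1)          # (N, 2C, H, W)
```

Only the reshapes and the concatenation carry real structure here; the point is that the angular branch sees the view axis as channels while the spatial branch sees the feature axis as channels.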
3.2.1. Feature Extraction Module (FEM)
Due to the intrinsic characteristics of LF, a large amount of redundant information exists among different views. Therefore, it is meaningful to deepen the network to effectively extract discriminative features in each view. Meanwhile, each view contains rich contextual information, which can be captured by enlarging the receptive field. This information is helpful for reconstructing HR LF images with more details. Our FEM mainly extracts features for the feature fusion and allocation part. Inspired by the work on DeepLab and deformable convolution [16,32], we designed a DASPP block to enlarge the receptive field and extract hierarchical features for each view, as shown in Figure 3. This block consists of two cascaded residual atrous spatial pyramid pooling (ResASPP) blocks with dense skip connections. Due to these connections, our DASPP can preserve hierarchical features while providing dense representative information to the utmost extent. Compared with extraction by a plain residual block, the superior effectiveness of our DASPP is demonstrated in Section 4.4.
The details of our FEM are shown in Figure 3. It is mainly composed of two components: the DASPP and the residual block (ResBlock). Each SAI is first fed into a convolution layer to extract initial features. These features are fed into two DASPPs with identical structures. Each DASPP has two ResASPP blocks, which are constructed from three dilated convolutions with different dilation rates (1, 2, 4). The outputs of these dilated convolutions, each followed by a Leaky ReLU, are concatenated to obtain hierarchical features, which are then concatenated with the initial features and fed into the second ResASPP block. The input and output features of this ResASPP are concatenated and fed into a convolution to adjust the channel depth. The ResBlock consists of two convolutions and a Leaky ReLU activation. In summary, the deep and hierarchical features of each SAI are extracted by using our FEM.
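The effect of the dilation rates (1, 2, 4) on the receptive field can be checked with the standard formula for dilated convolutions. The 3×3 kernel size below is an assumption for illustration, not taken from the paper:

```python
# Effective kernel size of a dilated convolution: k_eff = d * (k - 1) + 1.
def effective_kernel(k, d):
    return d * (k - 1) + 1

# With an assumed 3x3 kernel, the three parallel branches of one ResASPP
# cover 3x3, 5x5 and 9x9 neighborhoods, respectively.
branch_sizes = [effective_kernel(3, d) for d in (1, 2, 4)]

# Receptive field after stacking stride-1 layers: rf grows by k_eff - 1 per layer.
def stacked_rf(kernels):
    rf = 1
    for k_eff in kernels:
        rf += k_eff - 1
    return rf

# Two cascaded ResASPPs, following the widest (d = 4) branch each time:
widest_rf = stacked_rf([9, 9])
```

This is why cascading two DASPPs enlarges the receptive field far beyond what two plain 3×3 residual blocks (receptive field 5×5) would cover.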
3.2.2. Feature Fusion and Allocation Module (FFAM)
Owing to the high-dimensional characteristic of LF, it is challenging to effectively extract deep representations from the high-dimensional data. Meanwhile, how to fully fuse angular and spatial information among all SAIs, which is beneficial for preserving LF structural consistency, still needs to be explored. Many approaches simply concatenate the information from these two dimensions, which seriously limits the performance of the network. To reduce the impact of this problem and improve SR performance, we specially designed two generic operators for 4D LF to achieve angular-wise and spatial-wise feature fusion. These two operators are arranged in dual branches: in each branch, the AFO and SFO effectively integrate angular-wise and spatial-wise features, respectively. Then, we propose a fusion and allocation strategy to achieve angular and spatial information interaction while preserving the parallax structure of LF. The core component of this strategy is the IIFB. Following our strategy, the output features of the IIFB are individually fed into the next two operators, which further achieves the process of feature allocation.
In our FFAM, the hierarchical features $F_h \in \mathbb{R}^{N \times C \times H \times W}$ are fed into two branches to incorporate angular correlations in the angular subspace and supplement context information in the spatial subspace, as depicted in Figure 2. Note that, for the top branch, we perform a reshape operation to obtain the input feature $F'_h \in \mathbb{R}^{C \times N \times H \times W}$. Compared with $F_h$, we only swap the first two dimensions. Each slice along the $C$ dimension of $F'_h$ represents the angular information from all SAIs. For the bottom branch, each slice of $F_h$ contains the spatial information across all the channels of one SAI.
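The difference between the two branch inputs comes down to a single axis swap. As a small numpy illustration (sizes are hypothetical), a slice of the reshaped tensor gathers one channel from every view, while a slice of the original tensor keeps every channel of one view:

```python
import numpy as np

# Hypothetical sizes: N = 4 SAIs, C = 3 channels, 8x8 features.
N, C, H, W = 4, 3, 8, 8
f_h = np.random.rand(N, C, H, W)     # bottom (spatial) branch input

# Top branch: swap only the first two dimensions.
f_ang = f_h.transpose(1, 0, 2, 3)    # (C, N, H, W)

# One slice of f_ang: the same channel from ALL SAIs -> angular information.
angular_slice = f_ang[0]             # (N, H, W)

# One slice of f_h: all channels of ONE SAI -> spatial information.
spatial_slice = f_h[0]               # (C, H, W)
```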
3.2.3. Angular Fusion Operator (AFO)
The objective of this operator is to effectively exploit the angular correlation. Inspired by the encoder-decoder structure, the main structure of the AFO comprises an UpConv and a DownConv, which execute upscaling and downscaling operations, as shown in Figure 4. Specifically, this encoder-decoder structure can map the angular-wise features from all SAIs into a high-dimension space for interaction. Moreover, ResASPP blocks endow the high-dimension and low-dimension features with different receptive fields, which is beneficial for exploiting the angular correlation.
Taking the first AFO as an example, the reshaped feature $F'_h$ is first fed into a cascaded encoder and decoder structure with a skip connection, which consists of two ResASPP blocks, an UpConv and a DownConv. These two ResASPP blocks are respectively inserted in front of the UpConv and DownConv. The output feature $F_d$ of the DownConv is fed into a convolution and a Leaky ReLU. Then, the output is concatenated with $F'_h$ to keep hierarchical characteristics, and another convolution with a Leaky ReLU is used for angular depth reduction. This process can be specifically expressed as

$F_d = H_{Down}(H_{ResASPP}(H_{Up}(H_{ResASPP}(F'_h)))),$

$F_a = H_{c2}([H_{c1}(F_d), F'_h]),$

where $H_{ResASPP}$ represents the ResASPP block; $H_{Up}$ and $H_{Down}$ represent two types of convolutions, which have different parameters for different up-sampling scales; $H_{c1}$ and $H_{c2}$ represent the two convolutions, each followed by a Leaky ReLU; and $[\cdot]$ denotes the concatenation operation. Due to the input $F'_h \in \mathbb{R}^{C \times N \times H \times W}$, the number of channels of these modules and convolutions is $N$. As shown in Figure 2, there are two AFOs used in our network. The other AFO can be simply expressed as

$F^{(2)}_a = H^{(2)}_{AFO}(F^{(1)}_{fuse}),$

where $H^{(2)}_{AFO}$ is the second AFO, and $F^{(1)}_{fuse}$ and $F^{(2)}_a$ represent the input and output features, respectively.
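The UpConv/DownConv pair can be illustrated as 1×1 convolutions over the angular ("channel") dimension, which reduce to matrix products. This is a sketch under stated assumptions, not the paper's layers: the expansion factor of 4 and all tensor sizes are hypothetical, and the ResASPP blocks and nonlinearities are omitted.

```python
import numpy as np

# Hypothetical sizes: C = 8 slices, N = 25 views, 16x16 features.
C, N, H, W = 8, 25, 16, 16
expand = 4                                     # assumed channel-expansion factor

rng = np.random.default_rng(0)
w_up = rng.standard_normal((expand * N, N))    # UpConv:   N  -> 4N channels
w_down = rng.standard_normal((N, expand * N))  # DownConv: 4N -> N  channels

def conv1x1(x, w):
    # x: (C, ch_in, H, W); a 1x1 conv mixes only the channel dimension,
    # i.e. a matrix product applied at every spatial location.
    return np.einsum('oc,bchw->bohw', w, x)

f_in = rng.standard_normal((C, N, H, W))       # reshaped feature fed to the AFO
f_hi = conv1x1(f_in, w_up)                     # (C, 4N, H, W): high-dimension space
f_lo = conv1x1(f_hi, w_down)                   # (C, N, H, W): decoded back
f_out = np.concatenate([f_lo, f_in], axis=1)   # skip connection keeps hierarchy
```

The skip concatenation at the end mirrors how the AFO preserves the hierarchical input alongside the encoded-decoded feature before the final depth-reduction convolution.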
3.2.5. Interaction Information Fusion Block (IIFB)
Most existing methods of LF image SR lack consideration of the information fusion of the angular and spatial dimensions, which seriously destroys the parallax structure of LF. Although 4D convolution can be directly utilized to deal with this problem, it causes a significant increase in computation. Inspired by the methods of Yeung et al. [33] and Jin et al. [14], we designed our IIFB, which mainly consists of three alternating spatial-angular convolution (ASA) blocks. Compared with 4D convolution, this design not only reduces computational resources but also fully explores the angular and spatial information. In short, our IIFB can achieve complementary information fusion from all SAIs as well as discriminative information fusion among all SAIs.
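The computational saving of splitting a 4D convolution into a spatial-angular pair is easy to quantify. The channel width and kernel size below are illustrative assumptions, not the paper's values:

```python
# Rough parameter count: one full 4D convolution vs one alternating
# spatial-angular (ASA) pair. c channels in/out, kernel size k, both assumed.
c, k = 32, 3

params_4d = c * c * k**4        # 4D conv: a k x k x k x k kernel
params_asa = 2 * c * c * k**2   # one spatial k x k conv + one angular k x k conv

ratio = params_4d / params_asa  # = k^2 / 2, independent of the channel width
```

With a 3×3×3×3 kernel the ASA pair needs 4.5× fewer parameters (and proportionally fewer multiply-accumulates), and the gap widens quickly for larger kernels.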
As shown in Figure 6, we take the first IIFB as an example. The input of the IIFB consists of two parts: the output of the AFO ($F_a$) and the output of the SFO ($F_s$). We first concatenate $F_a$ and $F_s$ in the channel dimension. These features are fed into three cascaded ASA blocks; each ASA contains a Spa-Conv and an Ang-Conv. Then, a convolution and a ResBlock are used to further extract the deep feature representation. Following our fusion and allocation strategy, we distribute the output $F_{fuse}$ of the IIFB into the next AFO and SFO.
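The key mechanism of an ASA block is the pair of reshapes that lets a 2D convolution act alternately on the spatial and the angular axes. The sketch below shows only that reorganization (the convolutions themselves are omitted); the 5×5 view grid and 8×8 patches are hypothetical sizes:

```python
import numpy as np

# Hypothetical sizes: U x V view grid, C channels, H x W patches.
U, V, C, H, W = 5, 5, 16, 8, 8
N = U * V

x = np.random.rand(N, C, H, W)            # spatial view: one map per SAI
# (a Spa-Conv would act on the trailing H, W axes here)

# Reorganize so the trailing axes become the angular grid:
x_ang = (x.reshape(U, V, C, H, W)
          .transpose(3, 4, 2, 0, 1)       # (H, W, C, U, V)
          .reshape(H * W, C, U, V))       # angular view: one U x V map per pixel
# (an Ang-Conv would act on the trailing U, V axes here)

# The reverse reshape restores the spatial layout for the next ASA block:
x_back = (x_ang.reshape(H, W, C, U, V)
               .transpose(3, 4, 2, 0, 1)  # (U, V, C, H, W)
               .reshape(N, C, H, W))
```

Because the reorganization is lossless (the round trip returns the original tensor), alternating the two convolutions touches every spatial-angular combination without ever materializing a 4D kernel.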
In summary, many previous LF image SR methods treat the 4D LF as a whole, which limits their representational capacity. Due to the independent properties of angular and spatial information, we propose the AFO, SFO and IIFB to extract and fuse spatial-angular information. The main structure of the AFO and SFO follows the encoder-decoder paradigm, which can better mine the mapping relationship between LR images and HR images. Under supervised learning, the differences between the output features of the UpConv and the final HR features are gradually narrowed, while the DownConv reduces computation by working with low-dimension features. Moreover, our operators are not restricted to the local information determined by the convolution kernel size: the AFO can capture global channel information from different views, and the SFO can fuse multi-view spatial information. After this, we apply the IIFB to achieve information fusion. Meanwhile, compared with the similar mechanisms (the angular feature extractor and spatial feature extractor) in LF-InterNet [15], our IIFB does not concatenate spatial and angular information directly, which better preserves the geometric structure and thus benefits SR performance. Our LF-FANet can not only preserve the parallax structure of LF but also supplement complementary information from different SAIs.
3.2.6. Feature Blending and Upsampling Module (FB & UP)
Combining the hierarchical features is beneficial for constructing the HR residual map. However, directly concatenating these features cannot adaptively adjust their individual contributions. Following Wang et al. [16], we concatenate the outputs of the two IIFBs along the channel dimension and input the result into the FB & UP module to obtain the final residual map. Note that these outputs are generated gradually and contain hierarchical information of different importance for the residual map. Therefore, we introduce the channel attention module [34] to dynamically adapt the hierarchical features and distill the valid information.
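The channel attention mechanism referenced here can be sketched in squeeze-and-excitation style: squeeze each channel's spatial content into one number, pass it through a bottleneck, and rescale the channels. This is an illustration of the general mechanism, not the paper's exact layer; the sizes and the reduction ratio r = 4 are assumptions.

```python
import numpy as np

# Hypothetical sizes: C channels, H x W features, reduction ratio r.
C, H, W, r = 16, 8, 8, 4
rng = np.random.default_rng(1)

x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))   # first conv: compress channel number
w2 = rng.standard_normal((C, C // r))   # second conv: recover channel number

z = x.mean(axis=(1, 2))                 # global average pooling -> (C,)
a = w2 @ np.maximum(w1 @ z, 0)          # two 1x1 convs with a ReLU between
a = 1.0 / (1.0 + np.exp(-a))            # sigmoid: per-channel weight in (0, 1)

y = x * a[:, None, None]                # rescale each channel by its weight
```

Channels carrying more useful hierarchical information receive weights closer to 1, so their contribution to the residual map is emphasized.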
The architecture of our FB & UP module is illustrated in Figure 7. The input of the FB & UP module is the concatenated hierarchical features. In this module, we cascade four channel attention (CA) blocks and an Up block. Note that the CA blocks play a key role in feature blending; each one is formed by a ResBlock cascaded with a channel attention operation. Specifically, Average is a global average pooling operation, which is used to squeeze the spatial information of each channel. Then, we generate an attention map by utilizing two convolutions and a ReLU layer. The first convolution compresses the channel number by a reduction ratio, and the second convolution recovers the original channel number with the corresponding expansion ratio. To refine the features across all channels, this attention map is processed by a sigmoid function. After processing by the other three CA blocks, feature blending is achieved on the hierarchical features. The Up block is composed of a convolution, a Shuffle layer and another convolution. The output has only one channel, with the enlarged SAI size. The effectiveness of FB is demonstrated in Section 4.
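The Shuffle layer in the Up block is a standard pixel shuffle: it trades r² channels for an r-times larger spatial grid. A minimal numpy sketch (sizes chosen for illustration):

```python
import numpy as np

# Pixel shuffle: rearrange (c*r*r, h, w) -> (c, h*r, w*r).
def pixel_shuffle(x, r):
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    return (x.reshape(c, r, r, h, w)
             .transpose(0, 3, 1, 4, 2)   # (c, h, r, w, r)
             .reshape(c, h * r, w * r))

x = np.arange(4.0).reshape(4, 1, 1)      # 4 channels, 1x1 spatial
y = pixel_shuffle(x, 2)                  # 1 channel, 2x2 spatial
```

Each group of r² channel values fills one r×r patch of the upscaled output, which is why the Up block ends with a single-channel map at the enlarged SAI size.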