1. Introduction
Food is one of the basic necessities of human life and survival. To feed a growing population, large-scale and stable food production is necessary [1]. One of the key factors in food security and production is the frequent and accurate acquisition of data on croplands.
Owing to recent developments in cloud computing, robotics, and the internet, several approaches to data acquisition are available, such as Internet of Things (IoT) devices [2], unmanned aerial vehicles (UAVs) [3], and satellite data [4]. Among them, satellite remote sensing has the advantages of wide-area monitoring with a short return period and low cost, which enable various stakeholders to provide both cost-efficient and reliable methods for cropland monitoring [5,6].
High resolution, both temporal and spatial, is desirable for cropland monitoring. Higher spatial resolution enables a more precise examination of the land. Moreover, for applications such as land classification, higher spatial resolution can achieve higher accuracy because there are fewer mixed pixels [7]. Higher temporal resolution, on the other hand, leads to more precise detection of land cover changes. This is especially useful for croplands, since cropland cover changes frequently. However, satellite images generally suffer from a trade-off between spatial and temporal resolution. To acquire images with both high temporal and high spatial resolution, one of two approaches is generally used.
The first approach is to enhance the temporal resolution of high spatial, low temporal resolution images. Temporal interpolation methods [8], temporal replacement methods [9,10], and temporal filtering methods [11,12] are usually applied in this approach. In general, these methods assume that temporally adjacent images have the same vegetation type, and this assumption can yield good results over a short time interval. However, the land cover of croplands usually changes dynamically, so this assumption does not always hold for croplands. This approach is also unsuitable for areas with continuous cloud cover, since the interval for interpolation tends to become long under that condition. To overcome this problem, some studies have recently used synthetic aperture radar (SAR) data to support temporal interpolation. SAR has attracted attention due to its high revisit frequency and all-weather imaging capacity and is used in many applications, such as land cover mapping [13], disaster evaluation [14], and crop monitoring [15,16]. For enhancement of temporal resolution, some researchers have used a deep learning model to predict NDVI time series from pixel-based optical and SAR time series [17]. SAR images have also been used for temporal image interpolation and cloud removal [18]. However, these studies were conducted at a single location with flat ground and little cloud cover, which are ideal conditions for temporal interpolation with abundant data. In addition, the application of SAR images can be especially difficult in locations with continuous cloud cover, since optical images with relatively high temporal resolution still need to be collected to ensure a high level of accuracy.
The second approach is to enhance the spatial resolution of low spatial, high temporal resolution images, which is called downscaling. Super-resolution is a straightforward method to achieve this goal. General super-resolution, the task of converting low resolution RGB images into high resolution RGB images, is a popular task in computer vision, since datasets can be easily created by mosaicking the images. In general, deep learning models with many parameters have recently been used to achieve high accuracy, and the scale factor for super-resolution is typically around ×2 to ×8 [19]. However, the number of satellite images is limited compared to RGB images, especially for cloudy regions, and deep models with many parameters can be difficult to train sufficiently. Also, to achieve the higher spatial resolution that is useful for satellite data, it is necessary to downscale by a larger factor. For example, downscaling of MODIS (250-m, observed daily) to Sentinel-2 (10-m) requires a factor of ×25. Thus, simply applying single image super-resolution models to satellite images can be difficult in practical applications. One way to overcome this problem is to use additional supportive data. Some previous studies made use of the higher resolution bands of Sentinel-2 (10-m to 60-m) to enhance the resolution of lower resolution bands [20,21]. Some studies also achieved a relatively large downscaling factor by making use of higher resolution observations as support [22,23]. Also, some researchers have used Sentinel-1 (10-m) SAR images to downscale land surface temperature and soil moisture [24,25,26]. However, very few studies have investigated the use of SAR images to enhance the spatial resolution of NDVI, which is a widely used indicator of vegetation growth and coverage.
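The ×25 figure quoted above follows directly from the ratio of the two pixel sizes. The short sketch below reproduces that arithmetic and also shows how many fine pixels fall inside one coarse pixel; the function names are illustrative and do not come from the original work.

```python
def downscaling_factor(coarse_m: float, fine_m: float) -> float:
    """Linear scale factor between a coarse and a fine pixel size (both in metres)."""
    if coarse_m <= 0 or fine_m <= 0:
        raise ValueError("pixel sizes must be positive")
    return coarse_m / fine_m


def fine_pixels_per_coarse_pixel(coarse_m: float, fine_m: float) -> float:
    """Number of fine pixels covered by one coarse pixel (the factor squared)."""
    return downscaling_factor(coarse_m, fine_m) ** 2


# MODIS NDVI (250 m) to Sentinel-2 (10 m), as described in the text.
factor = downscaling_factor(250, 10)                 # 25.0
n_fine = fine_pixels_per_coarse_pixel(250, 10)       # 625.0
print(f"x{factor:g} downscaling, {n_fine:g} fine pixels per coarse pixel")
```

Each 250-m pixel therefore has to be resolved into 625 separate 10-m values, which is far more than the 4 to 64 values implied by the ×2 to ×8 factors typical of general super-resolution.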
Several previous studies have shown that there are strong relationships between SAR and vegetation [18,27,28]. One group concluded that SAR backscatter signals differ significantly across vegetation types [29], and another group showed that the backscatter value varies at different growth stages even for the same vegetation [30]. Such studies have generally used NDVI as an indicator of vegetation dynamics. NDVI is one of the most widely used indices to evaluate vegetation and is used for many applications, such as cropland monitoring [31], landcover monitoring [32], and climate impact modeling [33]. In this research, a model to downscale NDVI from 250-m to 10-m was developed. This model is a convolutional neural network (CNN)-based model which learns to downscale low spatial resolution NDVI images by using Sentinel-1 10-m SAR data as an additional supportive dataset.
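For readers unfamiliar with the index, NDVI is conventionally defined as (NIR − Red) / (NIR + Red), computed from near-infrared and red reflectance. The following is a minimal sketch of that standard definition using NumPy. The function name is illustrative, and the use of Sentinel-2 bands B8 (NIR) and B4 (red) in the comments is an assumption for the example, not a statement of the exact preprocessing used in this work.

```python
import numpy as np


def compute_ndvi(nir, red):
    """Standard NDVI = (NIR - Red) / (NIR + Red), element-wise.

    Pixels where NIR + Red is zero are returned as NaN rather than raising.
    """
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    ndvi = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels stay NaN.
    np.divide(nir - red, denom, out=ndvi, where=denom != 0)
    return ndvi


# Assumed example reflectances, e.g. Sentinel-2 B8 (NIR) and B4 (red).
nir = np.array([[0.50, 0.30], [0.10, 0.00]])
red = np.array([[0.10, 0.20], [0.10, 0.00]])
print(compute_ndvi(nir, red))
```

Values lie in the range −1 to 1, with dense green vegetation giving values closer to 1.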
NDVI prediction using a similar CNN-based model was also performed in previous work [27]. However, that work aimed to predict the NDVI under clouds using SAR data acquired on the same day. Since SAR data is strongly affected by daily conditions such as soil moisture, using only SAR data to predict NDVI for a different day results in low accuracy due to overfitting to the conditions of the training data (see Section 4.1.2). The proposed method addresses this issue by also using coarse NDVI as input and using SAR as supportive data for the downscaling.
The main contributions of this work are as follows:
A CNN-based NDVI downscaling model using higher spatial resolution SAR data was proposed.
The model was trained using an up-sampled and original Sentinel-2 image pair with Sentinel-1 image, and was evaluated for different seasons.
The trained model was applied to MODIS 250-m NDVI data, demonstrating its advantages in practical application.
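The training setup described in the contributions, pairing an up-sampled coarse image with the original Sentinel-2 image and a Sentinel-1 image, can be sketched as follows. This is only an illustration under stated assumptions: the coarse image is simulated here by 25×25 block averaging followed by nearest-neighbour up-sampling, and the SAR input is assumed to have two channels (for example VV and VH). Neither choice is confirmed by this section, and all function names are hypothetical.

```python
import numpy as np


def simulate_coarse_ndvi(ndvi_10m: np.ndarray, factor: int = 25) -> np.ndarray:
    """Degrade a 10-m NDVI image to a coarser grid and resample it back.

    Assumption: block averaging stands in for the coarse sensor, and
    nearest-neighbour replication is used for up-sampling.
    """
    h, w = ndvi_10m.shape
    if h % factor or w % factor:
        raise ValueError("image size must be a multiple of the factor")
    # Split the image into (factor x factor) blocks and average each block.
    blocks = ndvi_10m.reshape(h // factor, factor, w // factor, factor)
    coarse = blocks.mean(axis=(1, 3))
    # Replicate each coarse pixel back onto the 10-m grid.
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)


def build_training_sample(ndvi_10m: np.ndarray, sar_10m: np.ndarray, factor: int = 25):
    """Return (inputs, target) arrays in channel-first layout.

    inputs: up-sampled coarse NDVI stacked with the SAR channels.
    target: the original 10-m NDVI.
    """
    coarse_up = simulate_coarse_ndvi(ndvi_10m, factor)
    inputs = np.concatenate([coarse_up[None], sar_10m], axis=0)
    target = ndvi_10m[None]
    return inputs, target


# Synthetic stand-ins for a Sentinel-2 NDVI tile and a 2-channel SAR tile.
rng = np.random.default_rng(0)
ndvi = rng.uniform(-1.0, 1.0, size=(100, 100))
sar = rng.normal(size=(2, 100, 100))
x, y = build_training_sample(ndvi, sar)
print(x.shape, y.shape)
```

Because the coarse input is derived from the same Sentinel-2 image as the target, the network sees consistent image pairs during training; applying it to real MODIS data then introduces a sensor difference, which is discussed in the conclusions.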
6. Conclusions
This study proposed a method to downscale the spatial resolution of NDVI from 250-m MODIS resolution to 10-m Sentinel-2 resolution. The proposed model used 10-m resolution SAR together with 250-m resolution NDVI as input to learn the downscaling. The proposed model was first trained with a Sentinel-2 NDVI pair (an original 10-m image and an up-sampled 250-m image) acquired in 2019. The experimental results showed that the prediction was reasonably accurate (MAE = 0.090, ρ = 0.734 on average for the test data acquired in 2020). Then, the trained model was applied to downscale MODIS NDVI data in Tsumagoi. Even with the difference between MODIS NDVI and Sentinel-2 NDVI, the accuracy of the downscaled NDVI was acceptable (MAE = 0.108, ρ = 0.650 on average). Finally, the model was applied to enhance the temporal resolution (approximately ×2.5 the data) of NDVI with high spatial resolution (10 m), and it successfully captured the features of double cropping of cabbage in Tsumagoi.
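The two accuracy measures reported above can be computed as in the sketch below. It assumes that ρ denotes the Pearson correlation coefficient between predicted and reference NDVI, which this section does not state explicitly, and that images are flattened into sequences of pixel values. The function names are illustrative.

```python
import math


def mean_absolute_error(pred, truth):
    """Mean absolute difference between predicted and reference values."""
    if len(pred) != len(truth) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)


def pearson_correlation(pred, truth):
    """Pearson correlation coefficient (assumed meaning of rho here)."""
    if len(pred) != len(truth) or len(pred) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(truth) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, truth))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_t)


# Toy pixel values, not data from the study.
predicted = [0.20, 0.40, 0.60, 0.55]
reference = [0.30, 0.40, 0.50, 0.70]
print(mean_absolute_error(predicted, reference))
print(pearson_correlation(predicted, reference))
```

MAE is expressed in NDVI units, so a value of 0.090 means the prediction is off by about 0.09 NDVI on average, while the correlation indicates how well the spatial pattern is reproduced regardless of any constant offset.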
However, a method to reduce the difference between the original MODIS NDVI and Sentinel-2 NDVI will be needed for further improvement. A method for identifying locations where the approach is effective is also needed. In summary, this study provides a new approach to produce 10-m resolution NDVI data with acceptable accuracy when cloudless MODIS NDVI and Sentinel-1 SAR data is available. This can be a step toward improving both the temporal and the spatial resolution of vegetation monitoring, and we hope that it will contribute to the improvement of food security and production.