Peer-Review Record

Temporal Convolutional Neural Network for the Classification of Satellite Image Time Series

Remote Sens. 2019, 11(5), 523; https://doi.org/10.3390/rs11050523
by Charlotte Pelletier *, Geoffrey I. Webb and François Petitjean
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 30 January 2019 / Revised: 25 February 2019 / Accepted: 26 February 2019 / Published: 4 March 2019

Round 1

Reviewer 1 Report

This paper proposes a study of different neural network architectures for the pixel-wise classification of multi-spectral satellite image time series. In particular, the authors propose to exploit the temporal structure of the data with convolutions. They validate their approach on 1419 parcels.


Strengths: 

- the paper is overall well-written and clear

- the related work section is complete and accessible. While a significant portion of the paper presents well-known architectures and algorithms, it makes for a nice self-contained state-of-the-art with good didactic value.

- the numerical experiments are thorough


Weaknesses:

- The main problem is that most experiments are inconclusive because all results are within the error bars. It is impossible to draw a clear conclusion from such experiments, as appears clearly in Figure 8a. See the suggestions below for ideas on how to improve their significance.

- The paper is called TempCNN and says it uses 1D convolutions to learn the temporal structure, but it seems that the best-performing model is the 2D convolution mixing the spectral and temporal dimensions.

- The evaluation metrics are insufficient. With such high variance in the class counts, one should use an averaged per-class metric such as the average F-score or mIoU (a short sketch of what I mean follows this list).

- The writing can be unclear or imprecise at times (see detailed comments)

- Missing an important reference to convLSTM: Convolutional LSTMs for Cloud-Robust Segmentation of Remote Sensing Imagery, Russwurm et al.

- I think the different network architectures in Part 4 are rather unclear. See my questions below.

- the dataset is rather small (only 1419 parcels). Why work on pixels and not parcels? Also, this should be stated clearly and earlier.
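
On the per-class metric point above, a minimal sketch of what I have in mind. The label arrays are hypothetical and scikit-learn is only one possible way to compute it:

```python
# Minimal sketch of a macro-averaged per-class metric, as suggested above.
# y_true and y_pred are hypothetical per-pixel label arrays; scikit-learn is
# assumed to be available, but any equivalent implementation would do.
import numpy as np
from sklearn.metrics import f1_score, jaccard_score

y_true = np.array([0, 0, 0, 0, 0, 1, 1, 2])   # imbalanced class counts
y_pred = np.array([0, 0, 0, 0, 1, 1, 2, 2])   # errors on the minority classes

print(f1_score(y_true, y_pred, average="macro"))       # per-class F1, then averaged
print(jaccard_score(y_true, y_pred, average="macro"))  # mIoU analogue
```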


Overall:

This is overall a nice paper. The writing is mostly clear, the experiments rigorous, and the takeaways interesting for both practitioners and researchers. There are some flaws, mostly in the writing but also, more importantly, in the experimental section; nothing unfixable at all.


Questions and suggested improvements:

- the main problem with the experiments is that the differences between the performances of the methods are within each other's variance. This high variance can be explained in part by your experimental protocol. Indeed, if I understood correctly, for each experiment you sample some test parcels. It therefore does not really make sense to compare one run to another, as the test set is always different; this adds sampling noise unrelated to the stability of the algorithms. To fix this, you should perform cross-validation at each run so that the score is computed on the entire set each time (a cross-validation sketch follows this list of questions). I believe this will significantly decrease the score variance.

- I am having a hard time understanding the exact configuration of your models in Part 4. Please make sure that the answers to the following questions appear clearly: in the "unguided" case, do you flatten the input across the spectral and temporal dimensions? In the temporal (resp. spectral) guidance network, how is each spectral (resp. temporal) channel pooled? Do the spectro-temporal convolutions operate on a 149x3 (or 6) array? If so, it is a bit misleading, as a 149x3 2D convolution is exactly the same thing as a 1D convolution of depth 3 (a small equivalence check also follows this list). By the way, have you tried the case where the spectral dimension is taken as the depth of the 1D temporal signal?

- I find Section 3.3.3 interesting but not sufficiently justified. In particular, I fail to see how scaling the different spectral bands independently leads to a "loss of the significance of the magnitude". Shouldn't the network learn this significance?

- You should justify why it is appropriate to use linear interpolation for temporal gap filling (sometimes over a month) for a process which is certainly not linear (growth, harvest, etc.)

- the outputs of your "unguided" methods are certainly not invariant to temporal or spectral shuffling: the RF and the FC will learn temporal or spectral features. However, you are correct that shuffling and then retraining from scratch should give the same results.
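
Regarding the cross-validation question above, a minimal sketch of what I mean by scoring on the entire set at each run, splitting by parcel so that pixels of one parcel never straddle train and test. The arrays and the classifier are hypothetical stand-ins, not the authors' setup:

```python
# Minimal sketch of the cross-validation suggestion above: split by parcel so
# that every pixel is tested exactly once per run, removing the test-sampling
# noise. X, y and parcel_ids are hypothetical per-pixel arrays; the random
# forest is a stand-in for any of the compared models.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                 # per-pixel features
y = rng.integers(0, 3, size=1000)               # per-pixel labels
parcel_ids = rng.integers(0, 100, size=1000)    # parcel each pixel belongs to

y_pred = np.empty_like(y)
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=parcel_ids):
    clf = RandomForestClassifier(n_estimators=50).fit(X[train_idx], y[train_idx])
    y_pred[test_idx] = clf.predict(X[test_idx])

print((y_pred == y).mean())                     # score computed on the full set
```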
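And on the 2D-convolution question above, a small check of the equivalence I am pointing out. PyTorch is assumed, shapes follow the 149x3 example, and the layer sizes are illustrative only (no padding, no bias):

```python
# Minimal check of the equivalence claimed above: a 2D convolution whose kernel
# spans the full spectral axis computes the same thing as a 1D temporal
# convolution whose input depth is the spectral dimension.
import torch
import torch.nn as nn

T, D, F, k = 149, 3, 8, 5                    # dates, bands, filters, kernel length
x = torch.randn(1, D, T)                     # (batch, bands, time)

conv1d = nn.Conv1d(D, F, kernel_size=k, bias=False)
conv2d = nn.Conv2d(1, F, kernel_size=(k, D), bias=False)
with torch.no_grad():                        # reuse the 1D weights in the 2D layer
    conv2d.weight.copy_(conv1d.weight.permute(0, 2, 1).unsqueeze(1))

y1 = conv1d(x)                                    # (1, F, T-k+1)
y2 = conv2d(x.permute(0, 2, 1).unsqueeze(1))      # (1, F, T-k+1, 1)
print(torch.allclose(y1, y2.squeeze(-1), atol=1e-5))  # True
```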

Details

L9: exhaustive -> comprehensive

L73 those are very much structured data

L101 are

L116 palm oil

L128 ??

1.3 mention that you work pixel-wise, and do not restrict to 1D convolutions

L367 use wide instead of big

L430 could be illustrated by per-class metrics

L483 unclear

L489 neurons

L519 weights

L560 mention augmentation such as jittering (a short sketch follows this list)
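
A minimal sketch of the jittering augmentation mentioned in the last point; the noise level and array shape are illustrative only:

```python
# Minimal sketch of jittering: add small Gaussian noise to each reflectance
# profile. x is a hypothetical (T, D) pixel time series; sigma is illustrative.
import numpy as np

def jitter(x, sigma=0.01, rng=np.random.default_rng()):
    return x + rng.normal(0.0, sigma, size=x.shape)

augmented = jitter(np.random.rand(149, 3))
```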


Author Response

Please see the attached file.

Author Response File: Author Response.pdf

Reviewer 2 Report


The authors evaluate the contribution of TempCNNs for SITS classification quantitatively and qualitatively. TempCNNs extract temporal and spectral features from SITS data using 1D convolutions and fully connected layers. The authors show this contribution by comparing TempCNNs with random forests and recurrent neural networks.

 

The main concern is that the authors do not mention the influence of the length of the spectral time series on the input vector of TempCNNs. This point is important in terms of pattern recognition for time-series data. The behavior of TempCNNs may differ when the length changes. In addition, the authors need to change the TempCNN architecture, such as the number of units in the dense layers and the convolutional layers, when the length changes (a small sketch of this point follows below). Note that RNNs such as LSTMs can handle time series of different lengths; this is one of the advantages of RNNs. The authors should discuss the influence and limitations of TempCNNs when the length changes. As the authors say in Lines 54-56, how do you handle seasonal changes?
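
The concern above can be made concrete with a small sketch. PyTorch is assumed and the layer sizes are illustrative, not the authors' exact TempCNN:

```python
# Minimal sketch of the concern above: the flattened feature vector feeding the
# dense layers depends on the series length T, so the architecture must change
# with T, whereas an LSTM's final hidden state does not.
import torch
import torch.nn as nn

D, F = 3, 16                                       # spectral bands, filters
conv = nn.Conv1d(D, F, kernel_size=5, padding=2)   # length-preserving convolution
lstm = nn.LSTM(input_size=D, hidden_size=F, batch_first=True)

for T in (37, 73, 149):                            # three hypothetical lengths
    x = torch.randn(1, D, T)
    flat_size = conv(x).flatten(1).shape[-1]       # F * T: dense layer must change
    _, (h, _) = lstm(x.permute(0, 2, 1))           # final state: always size F
    print(T, flat_size, h.shape[-1])
```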

 

Some major comments are as follows:

 

(1) The authors state, “This paper does not propose a unique architecture that should be adopted by practitioners for all SITS classification problems.” Isn’t TempCNN the authors’ proposed method? If TempCNN is not the proposed method, the authors should cite the original method.

 

(2) How many dimensions do the authors use as the input length in all methods? Is it the number of features in Table 2?

 

(3) Please clarify D in each combination of the features such as SB and SB-SF.

 

(4) What are the inputs in this paper? Do you use the time series at each pixel? Please clarify the experimental settings.

 

(5) Is the input dimension of the RF T × D?

 

(6) Please add the ground truth to Figure 9. We can see the difference between the RF and TempCNN results; however, we cannot tell which result is correct.

 

Some minor comments are as follows:

 

(a) Line 154. What is a “deep learning network”? Is it a deep neural network?

 

(b) Lines 173-174. The authors say that A0 is X. Is xi the correct value for A0? In my understanding, X is a set.

 

(c) Line 193: “… the output of a convolutional layer is therefore a SET of activations.”

What does this sentence mean? Is the output of a convolutional layer an activation map?

 

(d) Line 199: “… compared to dense layers that apply different weights to the DIFFERENT inputs…”

What does this sentence mean? Dense layers use different weights for computing the activations of different output units in the next layer. When the inputs change, do the weights also change in this paper?



Author Response

Please see the attached file.

Author Response File: Author Response.pdf

Reviewer 3 Report

Temporal convolutional neural networks (TempCNNs) are proposed to classify satellite image time series (SITS). Compared to conventional methods such as random forests and recurrent neural networks, the TempCNNs showed higher classification accuracy.

Improvements:

1) Legends (explanatory notes) of colors need to be added in figures such as Fig. 1, Fig. 5, and Fig. 9.

2) Page 3, line 97:

"this paper proposed an extensive study of TempCNNs (see Section 1.3),"

->"this paper proposed an extensive study of TempCNNs (see Section 2),

3) Page 4, line 128:

"demonstrating the potential of TempCNNs against TempCNNs and RNNS,"?

Shouldn't it be "demonstrating the potential of TempCNNs against RF and RNNs,"?!

4) Page 4, line 142, 143:

"This paper presents the results obtained over 2,000 deep learning models."

-> why 2000 models? how to obtain them?



Author Response

Please see the attached file.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The manuscript has been improved, and I think that it will be acceptable.
