*3.3. Auxiliary Data*

The RPG (Graphic Parcel Register) was used as the ground truth data in our study for creating the training and test data. RPG 2019 is the latest version of this precise, geo-referenced agricultural land database published by IGN, which covers the entire territory of France except Mayotte. The database records the precise crop type (e.g., wheat, corn, vegetables, sunflower) or temporary and permanent grassland status of each registered agricultural parcel for each year [39].

#### **4. Methods**

The methodology of this paper is detailed in two parts, which relate to the two research subjects: mapping winter crop types using Sentinel-2 data, and monitoring crop phenology with Sentinel-1 backscatter time series. The data were processed in QGIS with Orfeo Toolbox, eCognition 10.0, and GEE (Google Earth Engine).

#### *4.1. Winter Crop Types Mapping*

A flow chart of the proposed global methodology is displayed below (Figure 2).

**Figure 2.** Hierarchical classification methodology used in the study for crop mapping.

#### 4.1.1. Image Preprocessing

After the study area selection and satellite image acquisition, the boundary of the northern region of Finistère was used to subset the raw images to our area of interest, in order to reduce the image size and shorten the processing time.

In remote sensing, vegetation indices (VIs) provide qualitative and quantitative evaluations of vegetation cover and its growth dynamics, using different combinations of spectral measurements. The index values differ because of the different chemical and morphological characteristics of the surfaces of plant organs and leaves [40]; however, the spectral responses are also affected by other factors, such as environmental effects, soil reflectance and its components, shadows, and atmospheric and topographic effects [41].

For this reason, over a hundred VIs have been developed and refined for various applications over the past three decades, in order to enhance spectral responses, increase sensitivity for identifying vegetated areas, distinguish different vegetation types, evaluate vegetation density, provide information on vegetation health [42], and minimize the effects of the other factors described above [41]. In this study, six VIs are used for mapping winter crop types with Sentinel-2 data.

• The Normalized Difference Vegetation Index (NDVI), proposed in 1973 by [43], is the best-known and most widely used index in research related to vegetation monitoring. NDVI is the normalized difference between the visible red and near-infrared spectral reflectance of vegetation, as follows:

$$NDVI = \frac{NIR - RED}{NIR + RED} \tag{1}$$

Even though the index is often affected by atmospheric conditions and by soil reflectance and its components, it remains a widely used index in agriculture-related fields for measuring the rate of vegetation cover and evaluating crop health.
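As an illustration, NDVI can be computed per pixel with NumPy; a minimal sketch, assuming the red and NIR bands (Sentinel-2 B4 and B8) have already been read as floating-point reflectance arrays:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """NDVI (Equation 1): normalized difference of NIR and red reflectance.

    A small epsilon guards against division by zero over water or
    no-data pixels."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + 1e-10)

# Dense, healthy vegetation reflects strongly in the NIR and absorbs
# red light, pushing NDVI toward +1; bare soil stays near 0.
```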

• The Normalized Difference Water Index (NDWI), proposed by Gao in 1996 [44], is a vegetation index used to highlight changes in the liquid water content of vegetation canopies, with weak atmospheric aerosol scattering effects, while remaining independent of the soil background [44]. The index is the normalized difference between the near-infrared and short-wave infrared spectral bands, and its expression is displayed below:

$$NDWI = \frac{NIR - SWIR}{NIR + SWIR} \tag{2}$$

With its high sensitivity to water stress, NDWI is frequently used for the agricultural monitoring of drought and for irrigation management. Moreover, some studies have revealed the possibility of distinguishing crop types, especially winter crops, with NDWI [20,45–47].

• The Green Normalized Difference Vegetation Index (GNDVI) was proposed in 1996 by Gitelson et al. [48]. Like NDVI, it is an index for evaluating the photosynthetic activity of vegetation, except that the visible red band is replaced by the green band, ranging from 0.54 to 0.57 microns, and its expression is as follows:

$$GNDVI = \frac{NIR - Green}{NIR + Green} \tag{3}$$

Compared with NDVI, the "green" NDVI is more sensitive to the chlorophyll concentration at the canopy level and enables a more precise estimation of the pigment concentration [48].

• The Enhanced Vegetation Index (EVI), developed by the MODIS Land Discipline Group [49], aims at optimizing the vegetation signal and correcting the shortcomings of NDVI, with improved sensitivity in high-biomass regions, by incorporating additional spectral bands [50]. It is calculated by the following equation:

$$EVI = \frac{G \ast (NIR - Red)}{(NIR + C1 \ast Red - C2 \ast Blue + L)} \tag{4}$$

EVI reduces atmospheric effects and canopy background noise, retains high sensitivity in densely vegetated areas, and is more responsive to canopy structural variations than NDVI [50].

• The Soil-Adjusted Vegetation Index (SAVI) was proposed by Huete in 1988 [51] as an attempt to improve NDVI, which is frequently affected by soil background conditions. The index aims at minimizing the influence of soil brightness and eliminating the need for additional calibration for different soils by using a soil-brightness correction factor [51]. It includes an adjustment factor (*L*) in its calculation:

$$SAVI = \frac{(NIR - Red)}{(NIR + Red + L)}(1 + L) \tag{5}$$

SAVI was found to be helpful in separating different crop types, especially spring crops from winter ones [52].

• The Modified Soil-Adjusted Vegetation Index (MSAVI) was developed by [53] in 1994 as an improved version of SAVI, with the constant soil adjustment factor *L* replaced by a self-adjusting function. It has been shown to increase the dynamic range of the vegetation signal while further minimizing the spatial and temporal variations of the soil background, therefore resulting in greater vegetation sensitivity:

$$MSAVI = \frac{\left(2 \ast NIR + 1 - \sqrt{\left(2 \ast NIR + 1\right)^2 - 8 \ast \left(NIR - RED\right)}\right)}{2} \tag{6}$$

Studies have proven that MSAVI can be used in the agriculture field [54,55], and even more in winter crop monitoring [56].

After calculating the vegetation indices, an image stack with the ten original spectral bands and all of the indices was created for further image processing.
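The six indices and the stacking step can be sketched together; a minimal NumPy example, assuming reflectance arrays for the blue, green, red, NIR, and SWIR bands, and using the commonly published default coefficients for EVI (G = 2.5, C1 = 6, C2 = 7.5, L = 1) and SAVI (L = 0.5), which are assumptions rather than values stated in the text:

```python
import numpy as np

def index_stack(blue, green, red, nir, swir):
    """Compute the six vegetation indices (Equations 1-6) and stack them.

    Inputs are float reflectance arrays; the EVI coefficients and the
    SAVI factor L = 0.5 are commonly used defaults, assumed here."""
    eps = 1e-10                      # avoid division by zero
    ndvi  = (nir - red) / (nir + red + eps)
    ndwi  = (nir - swir) / (nir + swir + eps)
    gndvi = (nir - green) / (nir + green + eps)
    evi   = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
    L = 0.5
    savi  = (nir - red) / (nir + red + L) * (1.0 + L)
    msavi = (2.0 * nir + 1.0
             - np.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))) / 2.0
    # One layer per index; in the study these layers are stacked with
    # the ten original spectral bands before classification.
    return np.stack([ndvi, ndwi, gndvi, evi, savi, msavi])
```

In practice the same expressions would be evaluated band-wise over the full subset image and concatenated with the original spectral bands.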

#### 4.1.2. Image Processing

In this study, in order to better extract winter wheat and winter barley from the satellite imagery, supervised image classification was performed with different approaches, so that they could be compared and the classification best adapted to this study could be identified (Figure 3).

**Figure 3.** Detailed chart of the proposed image-processing methodology.

In contrast to the direct extraction of winter crops with a pixel-based RF algorithm, the hierarchical classification method proceeds in three progressive levels, each with a different objective: first, extracting vegetation (including croplands) from the raw images; second, extracting croplands from the other vegetated areas (trees, shrubs, and grassland); and third, retaining exclusively winter wheat and winter barley among all the crop types detected in the previous stages. Finally, the results of the two classification approaches were evaluated with accuracy indices, in order to determine which one agreed better with the ground truth data.

Within the hierarchical classification structure, the first step, separating vegetation from non-vegetation, used exclusively the pixel-based RF algorithm, which already reached very close agreement with the ground truth data. Each subsequent step was performed with both methods, pixel-based and object-based RF classification, in order to determine which result had the better accuracy for further processing and analysis.

#### Pixel-Based Classification (PBC)

The traditional PBC is the most used method in remote sensing, especially for land use classification; therefore, this method was also employed in this study. PBC operates at the pixel level, the smallest unit in an image, where only the spectral information of each single pixel is used. During the classification, each pixel was assigned to one of the predefined classes by a model trained with the training data and a classification algorithm. In this paper, PBC was performed in both classification approaches.

#### Object-Based Classification (OBC)

OBC starts with an additional processing step before classification, which involves segmenting the image into numerous non-overlapping, homogeneous objects [23]; hence, OBC operates at the object level instead of the pixel level. In addition to the spectral information, the texture, color, shape, and size of the objects are taken into account. The individual objects generated by the segmentation algorithm were then used for classification.

A Multiresolution Segmentation (MRS) algorithm was applied as the first step of the object-based image analysis (OBIA) in the current study. This relatively complex and user-dependent algorithm has proven to be one of the most successful image segmentation algorithms in the OBIA field [57]. At the beginning of the process, each pixel is considered a segment; afterwards, pairs of adjacent image objects are merged to form larger segments [58]. The three main parameters of the algorithm, scale, shape, and compactness, can be adjusted by the user. The scale parameter defines the maximum standard deviation of heterogeneity, in order to control the amount of spectral variation within objects and the size of the resulting objects [57,59]. There are also two homogeneity criteria: the shape criterion, which defines the weight between the shape and the spectral information of the objects, and the compactness criterion, which represents the compactness of the objects during segmentation [60].

In this study, several combinations of parameters were used, and the optimal ones were found on a trial-and-error basis. The scale, compactness, and shape parameters were assigned as follows: 15, 0.5, and 0.3, respectively, for cropland extraction, and 20, 0.5, and 0.1, respectively, for winter crops extraction.

#### Supervised RF Classification

Supervised RF classification was performed on the Sentinel-2 data. Supervised classification is the most common procedure for the quantitative analysis of remote sensing imagery [61], and it relies on training data for machine-learning classification. The training data were selected based on the reference data and on the knowledge and experience of the user, and the selected samples were considered representative of each class.

The training and test data in our research were selected in this step, using the RPG 2019 map. For the purpose of comparing the classification methods, the training data selected for PBC and OBC were kept as similar as possible, covering the same areas with very similar surface extents, in order to improve the overall comparability of the two methods.

RF classification has been one of the best-known and most widely used classification algorithms, especially in the land cover classification field, over the past two decades. This powerful machine learning classifier has numerous key advantages, such as low sensitivity to noise and overtraining, the ability to handle high-dimensional data, high classification accuracy, a non-parametric nature, and the capacity to determine variable importance [19].

RF is also an ensemble learning algorithm: it builds numerous classifiers, which has been proven to improve classification accuracy considerably. For classification, RF forms an ensemble of tree-like classifiers, where each tree in the forest contributes a single vote for its most popular class, and the majority vote determines the final prediction of the RF model [62].
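The majority-vote step described above can be illustrated in a few lines of Python; the class labels and vote counts here are hypothetical:

```python
from collections import Counter

def rf_predict(tree_votes):
    """Return the majority class among the per-tree votes for one pixel:
    each tree in the forest casts one vote, and the most popular class
    becomes the RF prediction."""
    return Counter(tree_votes).most_common(1)[0][0]

# Hypothetical forest of 100 trees voting on a single pixel:
votes = ["winter wheat"] * 62 + ["winter barley"] * 38
prediction = rf_predict(votes)  # "winter wheat"
```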

The training and classification of the RF model were applied using the Orfeo Toolbox, with two user-defined parameters set on a trial-and-error basis: the number of decision trees grown in the forest and the maximum tree depth, i.e., the maximum length of each tree in the forest. In this study, the two parameters were set to 100 and 25, respectively.

#### 4.1.3. Image Post Processing

After the classification, an accuracy assessment was performed with the test data in order to evaluate the classification's degree of agreement with reality and therefore assess the reliability of the classified results. In this study, in order to evaluate the classification quality and compare it amongst the different classification methods, five well-known and widely used accuracy indices were calculated for each classification method and each class. Among them, overall accuracy (OA) and kappa were employed for the global accuracy assessment, while precision, recall, and F-score were computed to assess the classification results of each class.

The OA is one of the traditional measures of classification accuracy derived from the confusion matrix [63]. It indicates the probability that an individual pixel will be correctly classified by the classifier [64]; hence, it is computed by dividing the number of correctly classified pixels by the total number of pixels in the confusion matrix. The popular kappa coefficient was first applied in the remote sensing community to express classification accuracy in the early 1980s [65]; it is considered an assessment of the agreement between the classification and the reference data corrected for chance agreement, and it usually gives a statistically more sophisticated measure of interclass agreement and better interclass discrimination than OA does [66].

Among the per-class accuracy assessments, precision and recall were computed from the confusion matrix as well. Precision, also called the positive predictive value, is the rate of correct positive predictions among all predictions classified as positive, while recall, or sensitivity, represents the rate of actual positive pixels that are correctly detected by the classifier. Furthermore, the F-score was generated from the precision and recall; it provides a single measure as the harmonic mean of the model's precision and recall.
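All five indices can be computed directly from a confusion matrix; a minimal NumPy sketch, assuming the rows hold the reference classes and the columns the predicted classes:

```python
import numpy as np

def accuracy_indices(cm):
    """Accuracy indices from a confusion matrix (rows = reference,
    columns = predicted; this layout is an assumption).

    Returns overall accuracy, the kappa coefficient, and per-class
    precision, recall, and F-score arrays."""
    cm = np.asarray(cm, dtype=np.float64)
    total = cm.sum()
    diag = np.diag(cm)
    oa = diag.sum() / total                  # overall accuracy
    # Expected chance agreement p_e, from the row and column marginals
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1.0 - pe)
    precision = diag / cm.sum(axis=0)        # correct / predicted positives
    recall = diag / cm.sum(axis=1)           # correct / actual positives
    fscore = 2.0 * precision * recall / (precision + recall)
    return oa, kappa, precision, recall, fscore
```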

In the hierarchical classification process, the test data used for evaluating each classification step were generated as random points from the image on which that step's classification was performed, i.e., the result of the previous step. Afterwards, the random points were labelled manually with the Graphic Parcel Register map as ground truth. However, in order to evaluate the performance of the proposed hierarchical classification approach against traditional direct extraction, a completely new test dataset was produced from the original, unclassified Sentinel-2 image.

#### *4.2. Crops Phenology Monitoring*

This second part of the study was based on the winter crops mapped in the previous step. Due to the limiting climate conditions in the study area, winter crop phenology monitoring was performed with Sentinel-1 C-band SAR data using the Google Earth Engine (GEE) platform. GEE is a freely accessible cloud computing platform with a fast, high-performance computation and visualization system and a large data catalog hosting a large repository of publicly available geospatial datasets, including a variety of satellite imaging systems [67]. The platform is designed for global-scale geospatial big data storage, processing, and analysis [68]. The utility of GEE has been examined in different fields, such as vegetation mapping and monitoring, land cover change mapping, and agricultural applications [68].

The platform offers a complete data processing chain, from a single image or a collection of analysis-ready images to library functions or user-defined algorithms applied to generate and visualize results. One of its main advantages is that it allows long-term monitoring over a user-defined period, with free access to preprocessed time series data. Hence, in this study, the backscatter coefficient (σ°) in decibels (dB) of both polarizations (VV and VH), and their ratio, of a Sentinel-1 image time series covering a complete growing period of winter crops (from October to September) on a few chosen croplands was automatically generated as a line chart on the GEE platform. In order to study the scattering behavior of our target croplands, each image was preprocessed, and the backscatter coefficient was converted to dB by GEE using the Sentinel-1 Toolbox. A flow chart of the Sentinel-1 image time series processing in GEE is displayed below (Figure 4):

**Figure 4.** Sentinel-1 image process in the GEE platform.

The first step, applying the orbit file, updates the orbit metadata with a restituted orbit file; then, Ground Range Detected (GRD) border noise removal is performed to remove the low-intensity noise and invalid data on the scene edges. Afterward, a thermal noise removal function removes the additive noise in sub-swaths, reducing the discontinuities between sub-swaths for scenes acquired in multi-swath modes. Thereafter, radiometric calibration is applied to compute the backscatter intensity, and subsequently a terrain correction, also called orthorectification, is performed to convert the data from ground range geometry to σ° [69]. Lastly, the dB values were calculated from σ° with the equation dB = 10 × log10(σ°). A series of line charts in dB was plotted for each polarization and ratio of each winter crop.
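The final conversion from the linear backscatter coefficient to decibels is a one-line operation; a minimal NumPy sketch:

```python
import numpy as np

def to_db(sigma0):
    """Convert the linear backscatter coefficient sigma-naught to
    decibels: dB = 10 * log10(sigma0)."""
    return 10.0 * np.log10(np.asarray(sigma0, dtype=np.float64))

# A linear sigma-naught of 0.1, typical of cropland, is -10 dB.
```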
