*Article* **Optimizing Wheat Yield Prediction Integrating Data from Sentinel-1 and Sentinel-2 with CatBoost Algorithm**

**Asier Uribeetxebarria \*, Ander Castellón and Ana Aizpurua**

NEIKER—Basque Institute for Agricultural Research and Development, Basque Research and Technology Alliance (BRTA), Parque Científico y Tecnológico de Bizkaia, P812, Berreaga 1, 48160 Derio, Spain **\*** Correspondence: auribeetxebarria@neiker.eus; Tel.: +34-607-142-018

**Abstract:** Accurately estimating wheat yield is crucial for informed decision making in precision agriculture (PA) and improving crop management. In recent years, optical satellite-derived vegetation indices (Vis), such as Sentinel-2 (S2), have become widely used, but the availability of images depends on the weather conditions. For its part, Sentinel-1 (S1) backscatter data are less used in agriculture due to its complicated interpretation and processing, but is not impacted by weather. This study investigates the potential benefits of combining S1 and S2 data and evaluates the performance of the categorical boosting (CatBoost) algorithm in crop yield estimation. The study was conducted utilizing dense yield data from a yield monitor, obtained from 39 wheat (*Triticum* spp. L.) fields. The study analyzed three S2 images corresponding to different crop growth stages (GS) GS30, GS39-49, and GS69-75, and 13 Vis commonly used for wheat yield estimation were calculated for each image. In addition, three S1 images that were temporally close to the S2 images were acquired, and the vertical-vertical (VV) and vertical-horizontal (VH) backscatter were calculated. The performance of the CatBoost algorithm was compared to that of multiple linear regression (MLR), support vector machine (SVM), and random forest (RF) algorithms in crop yield estimation. The results showed that the combination of S1 and S2 data with the CatBoost algorithm produced a yield prediction with a root mean squared error (RMSE) of 0.24 t ha<sup>−</sup>1, a relative RMSE (rRMSE) 3.46% and an R2 of 0.95. The result indicates a decrease of 30% in RMSE when compared to using S2 alone. However, when this algorithm was used to estimate the yield of a whole plot, leveraging information from the surrounding plots, the mean absolute error (MAE) was 0.31 t ha−<sup>1</sup> which means a mean error of 4.38%. Accurate wheat yield estimation with a spatial resolution of 10 m becomes feasible when utilizing satellite data combined with CatBoost.

**Keywords:** backscatter; gradient boosting; machine learning; NDVI; precision agriculture
