**1. Introduction**

The world's population is projected to reach 9.8 billion in 2050 [1], and food production needs to increase by 60% to meet the demand [2,3]. One reason is that developing countries, which have been growing much more rapidly than industrial countries, are reshaping world food demand, mainly for animal-based products, fruits, and vegetables [4]. However, declining rates of growth in crop yields, slowing investment in agricultural research, and rising commodity prices have raised concerns of a general slowdown in global agricultural harvest area, yield, and production [5].

The rapid per capita income growth in countries like China and India (40% of the world population) pressures food supply chains to shift towards animal-based products, which require disproportionately more agricultural resources in production [4,6], such as land, water, and vegetable protein [7]. Moreover, there is concern that large agricultural producers such as Brazil and the US are using their agricultural areas to produce biofuels [6].

The relationship between food demand and income is not studied only in economics. The literature also shows a connection between technology and agricultural production. Crop yield and production, for instance, have been studied in the light of artificial intelligence. Khan et al. [8] predicted fruit production using deep neural networks. García-Martínez et al. [9] estimated corn grain yield with a neural network using multispectral and RGB images acquired with unmanned aerial vehicles. Maimaitijiang et al. [10] predicted soybean yield using multimodal data fusion and deep learning. These applications are a clear attempt to improve knowledge about food production and to provide decision-makers with valuable information to face the challenges of food demand.

Another possible solution discussed is the use of areas in Latin America and the Caribbean to expand agricultural production [11]. Brazil, for instance, has more than 8 million km<sup>2</sup> of area and uses only 15% of its arable land, approximately 60 of 400 million hectares [12]. The country is an important global food supplier, and it is estimated that one out of four agribusiness products in circulation around the world comes from Brazil [13]. Despite the concern about biofuel production, sugarcane occupies only 8.9 million hectares of arable land [14], and the majority is used for sugar production rather than ethanol.

Brazil has more than 300 different crops and exports 350 types of products to 180 countries. The main export products are sugar, coffee, maize, orange juice, cotton, and soybean. Among these products, soybean is the main global source of protein, and the country is the major exporter. Soybean accounted for approximately 29.9% of agribusiness external sales in 2016 (USD 25.4 billion) [13]. According to the United States Department of Agriculture (USDA) [15], Brazilian exports of the soybean complex are 81% grains, 15.7% meal, and 3.3% refined oil.

Soybean production has spread within the country from the south, through the center-west, to the northeast. These movements are motivated by low land cost and by investments in agricultural inputs, mechanization, and infrastructure [16–18]. Other factors contributing to soybean growth in Brazil include the genetic improvement of seeds, increasingly productive planting systems [19], favourable climatic conditions, predictable precipitation patterns, and public financing policies for soybean plantations [20].

Soybean production is evaluated considering three categories: harvest area, yield, and production. The two main players are Brazil and the US. The former planted a harvest area of 36.9 million hectares that produced 120.9 million tons with a yield of 3.3 tons per hectare [21], and the latter planted a harvest area of around 30.8 million hectares with a production of 96.8 million tons and a yield of 3.1 tons per hectare [22]. These values are constantly predicted using classical methods and presented to stakeholders by government agencies. However, the respective literature is sparse and relates to agronomic aspects of soybean yield [10,23,24]. In this paper, we focus on the prediction of these soybean indicators based on previous crop data.

Therefore, our study aims to estimate Brazilian soybean harvest area, yield, and production by adopting Artificial Neural Networks (ANN) and comparing them with classical methods of Time Series Analysis. To this end, we collected the values of harvest area, production, and yield over a period of 56 years (1961–2016). We established the trend lines for five functions (Linear, Exponential, Logarithmic, Polynomial, and Power) and compared these results with an ANN model with 10 neurons and six delays, computed using a Nonlinear Autoregressive Network with External Input (NARX) with Levenberg–Marquardt backpropagation for training the network.
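As an illustration of the trend-line step, the sketch below fits the five function families with NumPy on a synthetic series. The quadratic degree of the polynomial, the log-transform fitting of the exponential and power curves, the function names, and the data are all assumptions made for this example and are not taken from the study.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination between observations and fitted values."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_trends(t, y, poly_degree=2):
    """Fit five trend functions; t and y must be strictly positive.

    Returns a dict mapping each trend name to (fitted values, R^2).
    """
    fits = {}

    # Linear: y = a*t + b
    a, b = np.polyfit(t, y, 1)
    fits["linear"] = a * t + b

    # Exponential: y = A*exp(k*t), fitted as ln(y) = ln(A) + k*t
    k, ln_a = np.polyfit(t, np.log(y), 1)
    fits["exponential"] = np.exp(ln_a) * np.exp(k * t)

    # Logarithmic: y = a*ln(t) + b
    a_log, b_log = np.polyfit(np.log(t), y, 1)
    fits["logarithmic"] = a_log * np.log(t) + b_log

    # Polynomial of assumed degree (2 by default)
    fits["polynomial"] = np.polyval(np.polyfit(t, y, poly_degree), t)

    # Power: y = A*t**p, fitted as ln(y) = ln(A) + p*ln(t)
    p, ln_a_pow = np.polyfit(np.log(t), np.log(y), 1)
    fits["power"] = np.exp(ln_a_pow) * t ** p

    return {name: (y_hat, r_squared(y, y_hat)) for name, y_hat in fits.items()}


# Synthetic, noiseless exponential series over 56 time steps (not real crop data).
t = np.arange(1, 57, dtype=float)
y = 2.0 * np.exp(0.05 * t)
results = fit_trends(t, y)
```

Because the exponential and power curves are fitted after a log transform, they minimize the error in log space rather than in the original units, which can differ from a direct nonlinear least-squares fit.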

The results show that the ANN model is the most efficient method to predict soybean harvest area and production. The novelty of this paper is to obtain a reliable prediction for soybean production measures using an ANN model while dealing with a short time series (50 years) [25]. The period of 1961–1966 was used only to fill the ANN model delays.
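The role of the six delays can be sketched as follows: each prediction uses the six preceding values as inputs, so the first six years of a 56-year series only serve as inputs and 50 target years remain. The helper name and the placeholder values are assumptions for illustration, and the sketch covers only the autoregressive inputs, not the external input of a NARX model.

```python
import numpy as np


def make_lagged(series, n_delays):
    """Build (inputs, targets) where each target is predicted from the
    n_delays values that immediately precede it."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_delays
    # Column i holds the value observed (n_delays - i) steps before the target.
    inputs = np.column_stack(
        [series[i : i + n_samples] for i in range(n_delays)]
    )
    targets = series[n_delays:]
    return inputs, targets


years = np.arange(1961, 2017)          # 56 annual observations
values = np.arange(56, dtype=float)    # placeholder values, not real crop data
inputs, targets = make_lagged(values, n_delays=6)
first_target_year = int(years[6])      # first year that can be predicted
```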

This paper is divided into four sections: Section 1 presents this introduction and literature review, Section 2 shows the methodological procedures, Section 3 deals with results and discussion, and Section 4 presents the conclusions of the study.

### *1.1. Artificial Neural Networks*

Artificial Neural Networks, as the name suggests, use artificial neurons connected in layers to simulate human synapses (Figure 1). A mathematical model mimics the neural structure to learn and to acquire knowledge via experience (Equations (1) and (2)). This technology is effective for solving dynamic and nonlinear problems such as pattern recognition and prediction [25–30].

$$ne = \sum_{i=1}^{n} x_i w_i + b \tag{1}$$

$$
\mu = f(ne) \tag{2}
$$

where *x*<sub>1</sub>, …, *x*<sub>*n*</sub> are the input values (data set), *w*<sub>1</sub>, …, *w*<sub>*n*</sub> are the weights, and *b* is the activation threshold (bias) in the neuron potential *ne* [25,26,31].

**Figure 1.** Artificial neuron (**left**) and ANN multilayer (**right**).

Among several types of neuron activation functions, the most common are: hyperbolic tangent (Equation (3)), hidden layer, and linear. The last one always assumes values identical to the activation potential *ne* [25,26,31]:

$$f(ne) = \frac{1 - e^{-\beta \, ne}}{1 + e^{-\beta \, ne}} \tag{3}$$

where *β* is the constant associated with the slope of the hyperbolic tangent function and the output values assume numbers between −1 and 1.
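A minimal numerical sketch of Equations (1) to (3) follows, assuming the activation potential *ne* is the argument of the activation function. The input values, weights, and bias are arbitrary numbers chosen for illustration. The sketch also uses the identity that the expression in Equation (3) equals tanh(*β*·*ne*/2), which is why its output stays between −1 and 1.

```python
import numpy as np


def activation(ne, beta=1.0):
    """Hyperbolic tangent activation in the form of Equation (3)."""
    return (1.0 - np.exp(-beta * ne)) / (1.0 + np.exp(-beta * ne))


def neuron_output(x, w, b, beta=1.0):
    """Single artificial neuron: weighted sum plus bias, then activation."""
    ne = np.dot(x, w) + b          # Equation (1): activation potential
    return activation(ne, beta)    # Equation (2): neuron output


# Arbitrary illustrative values (not from the study).
x = np.array([0.5, -1.0, 2.0])     # inputs
w = np.array([0.4, 0.3, -0.2])     # weights
b = 0.1                            # bias

ne = float(np.dot(x, w) + b)       # 0.2 - 0.3 - 0.4 + 0.1 = -0.4
out = float(neuron_output(x, w, b))
```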

An ANN uses previous data to train the network and minimizes the error between the observed values and the network's estimates. This process adjusts the weights and the bias of each neuron at each iteration. The training usually stops once the optimal learning rate is found [25–30].

There are various ANN techniques such as General Regression Neural Network (GRNN), Backpropagation Neural Network (BNN), Radial Basis Function Neural Network (RBFNN), and Adaptive Neuro-Fuzzy Inference System (ANFIS) [32]. Backpropagation (BP) is a learning algorithm widely used in forecasting problems with ANN [30]. The weights between the different layers may be updated using the BP algorithm, with momentum and learning rate, as the error is propagated backward from the output layer to the input layer [33].
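The weight update described above can be sketched with a small one-hidden-layer network trained by plain gradient-descent backpropagation with momentum. This is an illustrative assumption, not the Levenberg–Marquardt procedure adopted in this study; the network size, learning rate, momentum, and the synthetic target are hypothetical choices.

```python
import numpy as np


def train_bp(X, y, n_hidden=10, lr=0.02, momentum=0.9, epochs=1000, seed=0):
    """Train a tanh-hidden, linear-output network with backpropagation
    and classical momentum. Returns the parameters and the loss history."""
    rng = np.random.default_rng(seed)
    y = y.reshape(-1, 1)
    n_samples, n_in = X.shape

    W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
    b2 = np.zeros(1)
    params = [W1, b1, W2, b2]
    velocity = [np.zeros_like(p) for p in params]
    losses = []

    for _ in range(epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)
        y_hat = h @ W2 + b2
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))

        # Backward pass: propagate the error from the output to the input layer
        d_out = 2.0 * err / n_samples
        grad_W2 = h.T @ d_out
        grad_b2 = d_out.sum(axis=0)
        d_hidden = (d_out @ W2.T) * (1.0 - h ** 2)   # derivative of tanh
        grad_W1 = X.T @ d_hidden
        grad_b1 = d_hidden.sum(axis=0)
        grads = [grad_W1, grad_b1, grad_W2, grad_b2]

        # Momentum update: velocity accumulates past gradients (in place)
        for p, v, g in zip(params, velocity, grads):
            v *= momentum
            v -= lr * g
            p += v

    return params, losses


# Synthetic smooth target on [-1, 1] (not real crop data).
X = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
y = np.sin(2.0 * X[:, 0])
params, losses = train_bp(X, y)
```

The momentum term reuses a fraction of the previous update, which tends to smooth the descent direction; the learning rate scales the contribution of the current gradient.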

Some studies have used ANN to study the agricultural environment. Garg et al. [30] compared the performance of different training methods using an ANN model to forecast wheat production in India. The data contain 95 years of wheat production (1919–2013), and the results revealed that the most effective training algorithms are Bayesian regularization and Levenberg–Marquardt.

Almomani [34] adopted artificial neural networks to predict biofuel production from agricultural wastes and cow manure with high accuracy. The training and testing of the ANN used to predict the cumulative methane production were assessed by using the root mean square method. The study confirms the capacity of the ANN model to predict the behavior of biofuel production and to identify the optimum conditions in a short time.

Sankhadeep et al. [35] use an ANN model for soil moisture quantity prediction for sustainable agricultural applications. They study soil moisture prediction in terms of soil temperature, air temperature, and relative humidity. The nonlinear relation between soil moisture and the features is realized using a hybrid modified flower pollination algorithm supported by the ANN model. They conclude that for sustainable agricultural application the model is highly suitable.

Khan et al. [8] use deep neural networks for fruit production prediction. They considered different types of fruit production such as apples, bananas, citrus, pears, and grapes with data from the National Bureau of Statistics of Pakistan. They adopted Levenberg–Marquardt optimization, backpropagation, and Bayesian regularization backpropagation. The results reveal that the government of Pakistan needs to further increase fruit production and create better policies for farmers to improve their production.

Wang and Xiao [36] studied recycling agriculture in West China to predict its comprehensive development status, applying a backpropagation neural network model implemented in MATLAB. They conclude that China needs to take measures to reduce resource inputs and improve resource reuse efficiency, protect forest resources, and reinforce the control of water loss and soil erosion.

Liu et al. [37] create an artificial neural network model for crop yield response to soil parameters. The model was established by training a backpropagation neural network with 58 samples and tested with another 14 samples. They conclude that the model can precisely describe the response of crop yield to soil parameters.

Fegade and Pawar [38] describe that, in India, farmers have difficulty selecting the proper crop for farming due to factors such as rainfall, temperature, humidity, and soil. Therefore, they used a support vector machine and artificial neural networks to predict the crop with 86.80% accuracy.

Regarding grains, Maimaitijiang et al. [10] evaluate the power of an unmanned aerial vehicle (UAV) to estimate soybean grain yield within the framework of deep neural networks (DNN). Thermal images were collected using a low-cost multi-sensory UAV. The results suggest that multimodal data fusion improves the yield prediction accuracy and is more adaptable to spatial variations, and that DNN-based models improve yield prediction accuracy and are less prone to saturation effects.

Zhang et al. [39] establish a model for forecasting soybean price in China using quantile regression models to describe the distribution of the soybean price range, and using regression-radial basis function neural networks to approximate the nonlinear component of the soybean price. They collected the monthly domestic soybean price in China, and the results of the model indicate that the proposed model is effective.

García-Martínez et al. [9] analyze different multispectral and red-green-blue vegetation indices, canopy cover, and plant density in order to estimate corn grain yield using a neural network model. The neural network model provided a high correlation coefficient between the estimated and the observed corn grain yield with acceptable errors in the yield estimation.

Abraham et al. [40] propose to design, train, and simulate an ANN to forecast the demand for soybean production from Mato Grosso state, Brazil, that is exported through the port of Santos. A nonlinear autoregressive solution was adopted, using 80% of the data for training, 5% for validation, and 15% for testing the network. The forecast for 2017 was 9.0 million tons, an increase of about 26.5% compared with 2016.

Finally, Abraham et al. [41] also analyze the relationship between soybean supply (production) and soybean demand (export) using artificial intelligence in a hybrid neuro-fuzzy model. Data from 20 years of soybean production and exportation were used, and the results indicate that the supply tends to be low when the demands of the ports are overloaded.

Specifically, in the present article, we raised two questions regarding ANN in soybean production:


To answer these two questions, we develop an ANN model using NARX with the Levenberg–Marquardt algorithm for backpropagation and data of Brazilian soybean production.
