1. Introduction
Efforts to maintain a sound food system have focused on reducing the food insecurity that affects millions of people around the world [1]. Aquaculture has expanded accordingly, mainly in Asian countries, with growth of around 90% [2].
One drawback of shrimp farming is that it is practiced with high water replacement rates, ranging from less than 2% to more than 100% of the pond volume per day. For this reason, greater emphasis has been placed on developing culture systems with no or limited water exchange; significantly decreasing the rate of water exchange does not reduce shrimp growth, survival, or yields [3,4]. If the supply water were the only source of oxygen in ponds, high turnover rates would be necessary, which is not feasible given the investment, maintenance, and operating costs of hydraulic systems [5,6].
In intensive shrimp and fish aquaculture, the main mechanical aeration systems are paddlewheels, vertical pumps, and diffused-air systems, whose efficiency has been evaluated with indicators such as the standard oxygen transfer rate (SOTR) and the standard aeration efficiency (SAE).
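The two indicators are linked: SAE is conventionally the SOTR divided by the power input of the aerator. The minimal sketch below assumes SOTR in kg O2/h and power in kW, so SAE comes out in kg O2/kWh; the example aerator values are hypothetical.

```python
def standard_aeration_efficiency(sotr_kg_o2_per_h: float, power_kw: float) -> float:
    """Return SAE in kg O2/kWh as SOTR (kg O2/h) divided by power input (kW)."""
    if power_kw <= 0:
        raise ValueError("power input must be positive")
    return sotr_kg_o2_per_h / power_kw

# Hypothetical paddlewheel: 3.0 kg O2/h delivered for 1.5 kW of input power.
print(standard_aeration_efficiency(3.0, 1.5))  # 2.0
```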
Aeration systems represent a considerable share of the capital cost of a farm's basic infrastructure, and their energy use is also high; for example, the energy used in shrimp farming is currently 19.8 GJ/shrimp per day on average [7]. These systems consume most of the on-farm energy, about 90–95% [8]. In one aquaculture plant, 35.06% of energy consumption was linked to the aeration system, a much higher share than that of any other equipment [9].
In one investigation, the recirculation pumps represented 22.6% of the total energy demand over the entire operating cycle; together with the oxygen cone pumps, they were responsible for 45.48% of the farm's total energy consumption, corresponding to a specific energy of 4359 kWh/kg [10].
On a farm in the state of Tabasco, Mexico, pumps for water replacement accounted for 47% of energy consumption [11]. In Cuba, the study in [12] found that 62% of energy consumption was related to water replacement systems. In research conducted in Kenya over 155 days [13], pumps for water replacement accounted for about 50.80% of a total consumption of 8776.3 kWh.
In this context, [14] notes that aquaculture water systems have non-linear dynamics and may be affected by external physical, chemical, and biological factors, which makes water replacement alone a less than ideal solution. According to [15], mathematical models developed for industrial process design should be accurate and easy to use and should consider the critical parameters of the process.
In the case study, the volume of water replaced is significant. It represents a total of , approximately 6% of the maximum volume to be replaced, and therefore a considerable energy expense, since the pumping equipment operates for about 11 h. Analyzing this and other variables that affect the quality of oxygenation in these ponds calls for regression techniques.
The author of [16] argued that model construction should be fully automated, so that both the selection of variable subsets for multiple linear regression and model validation are based on mathematical programming. In [17], regression analysis is described as one of the most popular forms of statistical modeling for analyzing relationships among multiple variables, owing to its interpretability and simplicity, and as the most widely used approach for prediction tasks.
In [18], the authors predicted the dissolved oxygen (DO) content in aquaculture ponds through correlation analysis between independent and dependent variables, taking key water-quality parameters such as pH, temperature, conductivity, salinity, density, and total organic salt content as independent variables and oxygen demand as the dependent variable. The proposed nonlinear DE-GWO-SVR model performed well in terms of R², MSE, MAE, and RMSE, achieving values of 0.94, 0.108, 0.2629, and 0.3293, respectively.
In [19], the authors reviewed state-of-the-art modeling techniques that consider basic parameters of population dynamics, growth, waste production, and filtration rate, with the aim of maximizing production in aquaculture ponds while reducing environmental impacts and economic losses.
The authors of [20] developed a linear mathematical model for aquatic species-rearing systems that improves water-quality management and overcomes the difficulties that may arise with daily water exchange. Implemented in a MATLAB® environment, the model predicted the concentrations of total ammonia nitrogen, nitrifying biofrit, suspended solids, and DO in the culture tank.
In [21], a linear mathematical model was developed and validated in MATLAB® and Aspen HYSYS to control water-quality parameters and energy demand over 15 weeks. The concentrations of were the most representative, on the order of 2.64 mg/(kg min), and the energy cost of the system was about 663.8 MWh. The pumping system used for water recirculation accounted for 45.48% of consumption. The authors of [10] regarded energy management as the main task because it is economical, environmentally friendly, and operational.
In another context, [22] notes that DO is one of the most critical components of healthy aquaculture ponds, since shrimp are delicate creatures that are susceptible to stress under adverse environmental conditions. When stressed, they eat poorly, tend to fall sick, and grow slowly. The environment of a shrimp pond is assessed by the quality of its soil and water.
The authors of [23] classified pond water with a high DO concentration as high quality, which is necessary for the success and development of shrimp aquaculture. According to [23,24,25,26,27], the main sources of oxygen in aquaculture systems are atmospheric oxygen (diffusion), oxygen in the incoming water (water renewal), oxygen from photosynthesis, and oxygen from mechanical aerators.
According to [26,27], the variables with the greatest influence on the variation of DO are biomass, solar radiation, phytoplankton abundance, zooplankton abundance, pond water temperature, and wind speed.
DO is the most critical water-quality variable in a shrimp pond, and producers must pay special attention to it and understand the factors that affect its variation [28,29,30].
The most effective water replacement consists of first draining the desired amount of water from the bottom of the pond, which removes the poorest-quality water and the detritus accumulated on the bottom. Outlet gates should therefore be able to release water from the bottom.
Attaining high profit levels is currently a challenge for Cuban companies, including the case-study company. Reducing energy consumption is therefore vital, especially for the pumping equipment, which accounts for 35% of total electrical energy consumption.
In this context, this research addresses the lack of accurate predictive models for optimizing energy use in aquaculture oxygenation systems, especially in scenarios of limited water replacement. Many aquaculture systems still rely on high water replacement rates, which significantly increase energy costs.
The main objective of this research was to propose a multiple linear regression model that considers the variables influencing water quality and pond oxygenation, such as zooplankton level, phytoplankton level, solar radiation, and wind speed, in order to reduce the energy consumption of the pumping systems used for constant water replacement to maintain oxygen levels in the parent-bank (broodstock) ponds of the white shrimp Litopenaeus vannamei.
The main contributions of this work are as follows:
- The pumping equipment used in the case study represents 35% of the total electrical energy consumption.
- A mathematical model was developed that takes into account the turnover volume and other variables associated with oxygenation.
- The model explains 94.02% of the variation in dissolved oxygen and does not identify the replacement volume as an influential variable.
This work is structured as follows. Section 1 presents studies on the variables that affect the oxygenation process in aquaculture ponds, together with the mathematical models applied to these processes, mainly those focused on water replacement. Section 2, Materials and Methods, describes the hydraulic system used for water replacement in the case study, the multiple linear regression model, and the instrumentation used to measure each variable. In Section 3, the flow required by oxygen demand is determined as a function of DO, the variables associated with oxygenation are measured, and the multiple linear regression model and its validation are presented; a designed experiment that analyzes how effective water replacement is is also reported. Section 4 compares the results obtained with those of other studies reviewed in the literature. Finally, Section 5 presents the conclusions drawn from the study.
3. Results
3.1. Flow Required for Oxygen Demand as a Function of DO
For the case under study, with an average animal biomass of 80.12 kg/pond, a respiration rate taken from [41], a total feed consumption of 2430 kg, and a residence time in the pond of around 240 days, the following is obtained:
- mg of DO/kg of feed

The data used to calculate the specific growth rate (SGR) and feed conversion ratio (FCR) of shrimp in four arbitrarily selected ponds (ponds 1, 3, 4, and 5) are presented in Table 3.
The formula used to calculate SGR (%/day) is as follows:

$$\mathrm{SGR} = \frac{\ln W_f - \ln W_i}{t} \times 100$$

where
- W_f: Final weight (g);
- W_i: Initial weight (g);
- t: Days of the cycle.
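The formula used to calculate FCR is as follows:

$$\mathrm{FCR} = \frac{\text{Feed consumed}}{\text{Weight gain}}$$

where
- Feed consumed: total feed consumed during the cycle;
- Weight gain: W_f − W_i (g).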
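Both indicators are simple enough to compute directly. The sketch below assumes the usual log-weight definition of SGR (in %/day) and expresses feed and weight gain in the same mass unit for FCR; the pond values are hypothetical and are not the data of Table 3.

```python
import math

def specific_growth_rate(w_initial_g: float, w_final_g: float, days: float) -> float:
    """SGR in %/day, assuming the usual log-weight definition."""
    return (math.log(w_final_g) - math.log(w_initial_g)) / days * 100.0

def feed_conversion_ratio(feed_g: float, w_initial_g: float, w_final_g: float) -> float:
    """FCR = feed consumed / weight gain (both in the same mass unit)."""
    gain = w_final_g - w_initial_g
    if gain <= 0:
        raise ValueError("weight gain must be positive")
    return feed_g / gain

# Hypothetical pond: shrimp grow from 10 g to 20 g in 100 days on 150 g of feed.
print(round(specific_growth_rate(10, 20, 100), 4))  # 0.6931
print(feed_conversion_ratio(150, 10, 20))           # 15.0
```

A higher SGR means faster growth, while a lower FCR means less feed per gram of gain, which is how Table 4 is read below.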
The results obtained for each pond are displayed in Table 4.
Pond 4 shows the highest SGR (1.11%), indicating that the shrimp in this pond grew more rapidly compared to the others. Pond 3 has the lowest SGR (0.32%), suggesting that growth was slower. Ponds 1 and 5 have moderate growth rates, with SGRs of 0.50% and 0.48%, respectively.
Pond 3 has the highest FCR (530.53): more feed was needed per gram of growth than in the other ponds, indicating a lower feed conversion efficiency. Pond 1 has a more efficient FCR (347.59), meaning that its shrimp grew more in relation to the amount of feed they consumed.
Figure 5a shows the result of evaluating Equation (6) for different oxygen variations: the lowest variability in DO is obtained when the highest water inflows are introduced. In the specific case where the oxygen drop in one hour is to be limited to between 1 and 2 mg/L, continuous flows between 20.03 and /h should be applied, respectively.
Figure 5b shows the flow required under critical DO conditions from 09:00 p.m. to 06:00 a.m.; to avoid drops to lethal levels below 2 mg/L, at least 5.9% of the pond water volume should be replaced. This replacement volume is approximately 2 and 3 times higher than those recommended in [3] and [42], respectively.
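Equation (6) is not reproduced in this section, so the following is only a hedged illustration of the kind of calculation behind Figure 5, not the authors' equation. It assumes a completely mixed pond with the oxygen balance V dC/dt = Q(C_in − C) − R (mg/L taken as g/m³), holds the pond concentration fixed over the window, and solves for the smallest inflow Q that limits the drop. All function names and numbers are hypothetical.

```python
def required_inflow(demand_g_per_h: float, volume_m3: float, allowed_drop_mg_l: float,
                    hours: float, c_inflow_mg_l: float, c_pond_mg_l: float) -> float:
    """Minimum continuous inflow (m3/h) so that DO falls by at most allowed_drop over `hours`.

    Assumes a completely mixed pond, V dC/dt = Q (C_in - C) - R, with mg/L == g/m3
    and the pond concentration treated as constant over the window.
    """
    if c_inflow_mg_l <= c_pond_mg_l:
        raise ValueError("inflow must be richer in oxygen than the pond water")
    tolerated_loss = volume_m3 * allowed_drop_mg_l / hours   # g O2/h the pond may lose
    deficit = demand_g_per_h - tolerated_loss                # g O2/h the inflow must supply
    return max(0.0, deficit / (c_inflow_mg_l - c_pond_mg_l))

def replaced_fraction(flow_m3_per_h: float, hours: float, volume_m3: float) -> float:
    """Share of the pond volume exchanged by a continuous flow over `hours`."""
    return flow_m3_per_h * hours / volume_m3

# Hypothetical pond: 1000 m3, 520 g O2/h demand, inflow at 7 mg/L, pond at 3 mg/L.
q = required_inflow(520, 1000, 0.5, 1, 7.0, 3.0)
print(q)                              # 5.0 (m3/h)
print(replaced_fraction(q, 9, 1000))  # 0.045, i.e. about 4.5% over a 9 h night
```

Tightening the allowed drop raises the required flow, which matches the qualitative reading of Figure 5a that the smallest DO variability needs the largest inflows.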
3.2. Measurement of Variables Associated with the Demand in the Parent Bank
Eight variables associated with DO levels in a shrimp culture pond were considered: volume of water replacement (x1), biomass (x2, kg/ha), days in cycle (x3, days/cycle), phytoplankton level (x4, cell/mL), zooplankton level (x5, org/L), solar radiation (x6), water temperature (x7, °C), and wind speed (x8, m/s). The measurements were performed in three time groups, 09:00 p.m.–01:00 a.m., 01:00 a.m.–05:00 a.m., and 05:00 a.m.–09:00 a.m., which coincide with the operating hours of the pumping system. The analyses do not use DO itself but its change, ΔDO.
An example of the DO measurements is shown in Figure 6.
Figure 6a shows oxygen dropping to lethal levels in ponds 3 and 10, the only measurements in the whole sample that reach these levels. The largest differences are found in ponds 5 and 6, with values of 4.50 and 4.15 mg/L, respectively.
The differences observed in the graphs may indicate widely varying conditions between ponds, generally due to both internal and external factors. If some ponds contain shrimp at more advanced stages of their life cycle, their metabolic rate, and therefore their oxygen consumption, will be higher. This could explain the steep drop in oxygen in certain ponds (such as ponds 3, 7, and 14).
A higher stocking density (more shrimp per cubic meter) in the ponds with lower oxygen levels would also explain the faster drop. Stress or disease can increase the metabolism of shrimp and hence their oxygen consumption, so ponds harboring shrimp with disease or health problems could have a higher oxygen demand.
Some ponds appear to have more effective water inflow than others (e.g., pond 2 in Figure 6b). Certain ponds may have leaks that make the inflow operate intermittently or less efficiently, reducing the ability to replenish oxygen during the night.
During the day, algae produce oxygen through photosynthesis, but at night they consume it. In ponds with high algal density, this could explain a more pronounced decrease in dissolved oxygen in the early morning: the ponds with a larger oxygen drop may have a higher algal biomass, which consumes oxygen at night when it is not photosynthesizing.
The amount of food supplied and feeding cycles may also play a role. If shrimp are being overfed in some ponds, there will be more decomposing food debris, which will increase biological oxygen demand. Also, if the shrimp are more active because they were fed later in the day, they are likely to consume more oxygen.
DO and water temperature in the pond were both measured with a YSI Model 55™ portable oximeter. Biomass and days in cycle were provided by specialists in the area. Phytoplankton and zooplankton levels were measured at laboratory scale with a Euromex Bio-Blue.lab 1153-Pli trinocular microscope. Wind speed was measured with a testo 410-1 anemometer (model 05604101). Data on incident solar radiation at the Earth's surface were provided by the three-hourly observations (obs trihorarias) of the Cienfuegos station of the Meteorological Service.
3.3. Descriptive Analysis of Variables
The ΔDO variable had a sample size of 199 observations. The largest change occurs in the 09:00 p.m.–01:00 a.m. time group, with a mean of −1.84 mg/L, as shown in Figure 7.
The standardized skewness (−1.22993) and standardized kurtosis (1.64142) of the total sample lie within the range from −2 to 2 expected for data from a normal distribution. The graphs in Figure 8 describe the variables associated with oxygenation.
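The ±2 rule above divides the sample skewness and excess kurtosis by their standard errors. The software used in the study is not named here, so this sketch uses moment estimators and the common large-sample approximations sqrt(6/n) and sqrt(24/n) for those standard errors; statistical packages may apply exact small-sample formulas and give slightly different values. The input list is hypothetical.

```python
import math

def standardized_skewness_kurtosis(values):
    """Return (standardized skewness, standardized excess kurtosis).

    Moment estimators divided by the approximate standard errors
    sqrt(6/n) and sqrt(24/n).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew / math.sqrt(6.0 / n), excess_kurt / math.sqrt(24.0 / n)

z_skew, z_kurt = standardized_skewness_kurtosis([1, 2, 3, 4, 5])
print(abs(z_skew) <= 2 and abs(z_kurt) <= 2)  # True: consistent with normality
```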
The most noticeable extreme value in Figure 8a corresponds to biomass, at 1044.69 kg/ha. The replacement volume variable has an extreme value of , that is, a continuous flow of /h, reached in the 09:00 p.m.–01:00 a.m. time group on 24 October 2018, as shown in Figure 8b. Solar radiation has the smallest range, since only the daylight hours of the 05:00 a.m.–09:00 a.m. time group are considered.
Table 5 presents the ANOVA results for the variation of DO by time group. The F-value is high (250.38), indicating that the variability between the means of the hourly groups is much larger than the variability within the groups. The p-value is very low (2.51 × 10⁻³), confirming that this difference is statistically significant; it is highly unlikely that the observed differences between the groups are due to chance. In summary, DO changes significantly across the time intervals, which could have implications for oxygenation cycles and pond management, depending on how these variations affect shrimp growth.
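The F-value in Table 5 is the ratio of the between-group mean square to the within-group mean square. This minimal sketch computes it for hypothetical ΔDO values in three time groups; the p-value would then come from the F distribution with the returned degrees of freedom (for example, with `scipy.stats.f.sf`).

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)

# Hypothetical ΔDO values (mg/L) for three time groups.
f, dof = one_way_anova_f([[-1.0, -2.0, -3.0], [-4.0, -5.0, -6.0], [-1.0, -2.0, -3.0]])
print(f, dof)  # 9.0 (2, 6)
```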
3.4. Multiple Linear Regression Models
The criteria provided for water replacement in a shrimp pond do not consider the influence of other variables on oxygenation [3,41,42]. As a result, replacement may not have the desired effect, leading to excessive energy consumption and unnecessary water expenses. The regression analyses therefore seek to explain and predict the behavior of the dependent variable ΔDO as a function of the independent variables, which are denoted x1 to x8. The initial multiple linear regression model explains 69.93% of the change in ΔDO. For x1, x2, x5, and x8, the p-value is greater than or equal to 0.05, so these terms are not statistically significant at the 95.0% confidence level, as shown in Table 6. Consequently, x1, x2, x5, and x8 are removed from the model, and Equation (11) presents the adjusted model.
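The screening step can be sketched in a few lines. The study used MATLAB and removed the four non-significant terms together; this sketch instead applies one-at-a-time backward elimination, a common variant, using NumPy least squares and a normal approximation to the t-test p-value (a reasonable shortcut for samples of about 200 observations). The data are synthetic.

```python
import math
import numpy as np

def ols_fit(X, y):
    """OLS with intercept; returns coefficients, standard errors and R^2."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])          # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, se, r2

def two_sided_p(t_stat):
    """Normal approximation to the two-sided t-test p-value."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))

def backward_elimination(X, y, names, alpha=0.05):
    """Repeatedly drop the least significant term until every remaining p-value is below alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        beta, se, _ = ols_fit(X[:, keep], y)
        pvals = [two_sided_p(b / s) for b, s in zip(beta[1:], se[1:])]
        worst = int(np.argmax(pvals))
        if pvals[worst] < alpha:
            break
        keep.pop(worst)
    beta, _, r2 = ols_fit(X[:, keep], y)
    return [names[i] for i in keep], beta, r2

# Synthetic data: only x1 matters; x2 is pure noise.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = 2.0 + 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
kept, beta, r2 = backward_elimination(X, y, ["x1", "x2"])
print(kept, round(r2, 3))
```

`beta[0]` is the intercept and `beta[1 + i]` is the coefficient of `kept[i]`.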
In this model, three of the four retained terms (x3, x4, x6, and x7) negatively influence ΔDO, while the remaining one has a positive influence. In addition to this model, other multiple linear regression models were fitted with the fitlm function of MATLAB® R2017a, which allows the type of fit to be changed: linear in the terms ('linear'), with interactions between the terms ('interactions'), and quadratic ('quadratic').
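The three fit types differ only in which regressors enter the model. The Python sketch below mirrors, as an assumption about their meaning, MATLAB's model specifications: 'linear' uses each x_i, 'interactions' adds every pairwise product x_i·x_j, and 'quadratic' also adds each x_i². The intercept is left to the fitting routine, and the helper name is hypothetical.

```python
from itertools import combinations

def expand_terms(row, spec="linear"):
    """Build regressors for one observation for a 'linear', 'interactions' or 'quadratic' fit."""
    if spec not in ("linear", "interactions", "quadratic"):
        raise ValueError(f"unknown spec: {spec}")
    terms = list(row)                                       # main effects x_i
    if spec in ("interactions", "quadratic"):
        terms += [a * b for a, b in combinations(row, 2)]   # x_i * x_j, i < j
    if spec == "quadratic":
        terms += [a * a for a in row]                       # x_i ** 2
    return terms

print(expand_terms([2.0, 3.0], "quadratic"))  # [2.0, 3.0, 6.0, 4.0, 9.0]
```

With the four retained variables, the quadratic fit has 4 + 6 + 4 = 14 regressors plus the intercept, which is why it can reach a higher R² than the purely linear model.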
3.5. Selection of the Best Model
To determine the best of the proposed models, R² and MAPE are used as precision measures. Table 7 shows that the best of the proposed fits is the quadratic one, since it has the highest R² and the lowest MAPE; hence, it is the selected model.
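For reference, the two precision measures can be computed as follows; the observed and predicted values are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def mape(actual, predicted):
    """Mean absolute percentage error (%); undefined when an actual value is zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined for zero actual values")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

y, y_hat = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]
print(r_squared(y, y_hat), mape(y, y_hat))  # 0.8 6.25
```

Because MAPE divides by each observed value, it grows quickly when observations lie close to zero, which is worth keeping in mind for a difference variable such as ΔDO.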
The above analyses show that in none of the proposed models does x8 (wind speed) have a statistically significant influence on ΔDO, either by itself or through interaction with other variables. This contradicts the view held by several specialists at the case-study company regarding its role.
The Monte Carlo method was applied to model the variability in the ΔDO predictions arising from the inherent uncertainty in each independent variable.
The histogram in Figure 9 shows the distribution of the simulated values obtained from the Monte Carlo method. The 95% confidence interval in this case is approximately [−0.332, 0.326], which reflects the variability in the predictions caused by the uncertainties in the independent variables.
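The input distributions used for Figure 9 are not given in the text, so the sketch below only illustrates the procedure under stated assumptions: each input is perturbed independently with Gaussian noise around a nominal value, the perturbed inputs are pushed through the regression model, and the 2.5th and 97.5th percentiles of the outputs form the 95% interval. The one-input `toy_model` and its coefficient are hypothetical stand-ins for the fitted ΔDO model.

```python
import numpy as np

def monte_carlo_interval(model, nominal, std_devs, n_samples=100_000, seed=0):
    """Propagate Gaussian input uncertainty through `model`; return mean and 95% percentile interval."""
    rng = np.random.default_rng(seed)
    nominal = np.asarray(nominal, dtype=float)
    std_devs = np.asarray(std_devs, dtype=float)
    # Each row is one simulated set of inputs drawn around the nominal values.
    samples = rng.normal(nominal, std_devs, size=(n_samples, nominal.size))
    outputs = model(samples)
    lo, hi = np.percentile(outputs, [2.5, 97.5])
    return outputs.mean(), (lo, hi)

def toy_model(x):
    """Hypothetical linear predictor of ΔDO in a single input."""
    return 2.0 * x[:, 0]

mean, (lo, hi) = monte_carlo_interval(toy_model, [1.0], [0.1])
print(round(mean, 2), round(lo, 2), round(hi, 2))
```

Inputs with larger uncertainty or larger coefficients widen the interval most, which is the sense in which the model is described below as sensitive to x7.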
The regression model provides a reasonably accurate prediction but is markedly sensitive to x7 (water temperature), which widens the confidence interval and reduces overall robustness. Measuring the input variables, and x7 in particular, more precisely could narrow the confidence interval and improve the robustness of the model, making ΔDO predictions more reliable for high-precision scientific applications.
3.6. Model Validation
The assumptions of a statistical model are conditions that must be met for the model to be valid; otherwise, it cannot reliably predict the data. They are summarized below:
- Linearity: in the plot of residuals vs. predicted values, the slope of the trend line is close to zero, so the linearity assumption is met.
- Normality of the residuals: in the normal probability plot, the values lie on the diagonal line, which indicates that the assumption is met.
- Homoscedasticity: White's test does not reject the null hypothesis, so the model can be considered homoscedastic.
- Absence of multicollinearity: since all values of VIFi < 5, there is no multicollinearity in the model.
- Absence of influential values: since all Cook's distances Di < 1, there are no influential observations in the model, and the assumption is met.
Linearity means that there is a linear relationship between the independent variables and the dependent variable. If it is not met, the model may include variables that do not contribute to it, or it may misrepresent relationships that are not linear. Linearity can be checked in a plot of predicted values vs. residuals, where the trend line should have a slope of zero.
As seen in Figure 10, the residuals plotted against the predicted values show no obvious pattern, and the trend line has a slope close to zero, which indicates that the linearity assumption of the regression model is adequately met.
Normality of the residuals means that the model residuals follow a normal distribution. If it is not met, the global validation tests of the model that rely on the standard deviation cannot be applied.
In the normal probability plot in Figure 11, the values lie on the diagonal line, indicating that the assumption is met. In addition, the frequency histogram shows an approximately symmetric distribution that is neither too flat nor too peaked and has no extreme values, which further indicates normality of the residuals.
Normality can also be checked by verifying that the standardized skewness and standardized kurtosis lie in the range from −2 to 2. As shown in Figure 12, both values are within this range, so the normality assumption is met.
The homoscedasticity assumption requires the variance of the regression errors to be constant across the range of the estimates. It can be checked graphically, since the error terms should be spread evenly around the regression line, or with a hypothesis test such as White's test.
In this study, White's test does not reject the null hypothesis, so the model can be considered homoscedastic.
Multicollinearity occurs when there is a strong or total correlation between the independent variables, and the model assumes its absence. It is a problem because high collinearity produces very unstable coefficients, i.e., the effects attributed to the independent variables may be erroneous.
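Multicollinearity is detected with the statistic VIFi, which is determined according to Equation (12):

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2} \qquad (12)$$

where R_i^2 is the coefficient of determination of the regression of x_i on the remaining independent variables.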
Table 8 shows correlation coefficients greater than 0.5 between x1 and x3, x1 and x4, x3 and x4, and x5 and x6, which points to possible multicollinearity. For this reason, the analysis is completed with the variance inflation factor.
Applying Equation (12) to the regression of each independent variable on the others gives the results shown in Table 9. Since all values of VIFi are below 5, there is no multicollinearity in the model.
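Equation (12) can be applied column by column: each variable is regressed on all the others, and its R² is turned into a VIF. The NumPy sketch below does this for synthetic data in which the third column is almost the sum of the first two, so all three VIFs come out far above the threshold of 5.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_i = 1 / (1 - R_i^2), regressing column i on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for i in range(k):
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
        resid = X[:, i] - A @ beta
        r2 = 1.0 - resid @ resid / np.sum((X[:, i] - X[:, i].mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return vifs

rng = np.random.default_rng(1)
a, b = rng.normal(size=1000), rng.normal(size=1000)
X = np.column_stack([a, b, a + b + rng.normal(scale=0.01, size=1000)])
print([round(v, 1) for v in variance_inflation_factors(X)])  # all three far above 5
```

Dropping the redundant third column brings both remaining VIFs back close to 1.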
An influential observation is one that differs markedly from the rest of the data set and has a large influence on the model output. It can be a problem because it affects the coefficients of the equation and generates prediction errors. Three measures can be used to identify such observations: distance values, studentized residuals, and Cook's distance.
Cook's distance indicates whether an observation influences the vector of beta coefficients: if Di > 1, observation i influences the vector of beta coefficients of the model.
Figure 13 shows the Cook's distances. The largest correspond to observations 27, 97, and 195, all with Di < 1, so there are no influential observations in the model.
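Cook's distance combines the size of each residual with its leverage, D_i = e_i² / (p·s²) · h_ii / (1 − h_ii)², where p is the number of coefficients including the intercept and s² is the residual mean square. The sketch below computes it with NumPy on synthetic straight-line data with one planted high-leverage outlier, which is the only point that exceeds the Di > 1 threshold.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    n, p = A.shape                                 # p counts the intercept
    H = A @ np.linalg.pinv(A.T @ A) @ A.T          # hat matrix
    h = np.diag(H)                                 # leverages
    e = y - H @ y                                  # residuals
    s2 = e @ e / (n - p)                           # residual mean square
    return e ** 2 / (p * s2) * h / (1.0 - h) ** 2

rng = np.random.default_rng(2)
x = np.arange(20.0)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=20)
x_out, y_out = np.append(x, 60.0), np.append(y, 0.0)  # one planted high-leverage outlier
d = cooks_distance(x_out, y_out)
print(int(np.argmax(d)), d.max() > 1)  # 20 True
```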
3.7. Experimental Design
Since the models fitted to the measurements show that the replacement volume does not influence ΔDO, the following questions arise:
- What minimum water volumes are needed to move a pond from critical DO conditions to favorable states that prevent crop mortality?
- Is there another way to make water replacement in the pond more effective?
- Is Arescurenaga's theory, which holds that high turnover rates are needed to ensure oxygenation through this route, confirmed?
To answer these questions, the influence of the experimental factors (replacement volume, type of system, and hourly group) on the response variable is analyzed; note that in this case the hourly group is included as a factor. The remaining disturbing factors, such as biomass, days in cycle, zooplankton level, phytoplankton level, water temperature, wind speed, and solar radiation, are not considered because they cannot be controlled.
Ponds No. 1 and 4 were used as experimental units because they have similar characteristics in terms of size and the number and weight of shrimp. The levels of the turnover volume factor are 80, 160, 320, and ; the system type factor has two levels, normal replacement and replacement with dispersion; and the hourly group factor comprises the three time groups indicated above.
A completely randomized, orthogonal multilevel factorial design was generated, with two replicates for a total of 48 runs. The adjusted model explains only 60.30% of the variability of ΔDO, which could be due to other influential factors not being considered.
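The run count follows directly from the factor levels: 4 volume levels × 2 system types × 3 hourly groups give 24 combinations, and two replicates give 48 runs. This sketch builds such a completely randomized full factorial with the Python standard library. The fourth volume level is not recoverable from this section, so it appears as a placeholder label, and the seed is arbitrary.

```python
import itertools
import random

# Factor levels from the text; "level 4" is a placeholder for the unrecoverable fourth volume.
VOLUME_LEVELS = ["80", "160", "320", "level 4"]
SYSTEM_TYPES = ["normal replacement", "with dispersion"]
HOURLY_GROUPS = ["09:00 p.m.-01:00 a.m.", "01:00 a.m.-05:00 a.m.", "05:00 a.m.-09:00 a.m."]

def randomized_full_factorial(replicates=2, seed=42):
    """Every factor-level combination, repeated `replicates` times, in random run order."""
    base = list(itertools.product(VOLUME_LEVELS, SYSTEM_TYPES, HOURLY_GROUPS))
    runs = base * replicates
    random.Random(seed).shuffle(runs)  # complete randomization of the run order
    return runs

design = randomized_full_factorial()
print(len(design))  # 48 = 4 x 2 x 3 combinations x 2 replicates
```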
Figure 14a shows the estimated effects on the response variable in decreasing order of magnitude. The statistically significant effects at the 95.0% confidence level are the type of system, the replacement volume, and the interaction between them. In Figure 14b, the hourly group effect shows that the smallest ranges of ΔDO occur in the 05:00 a.m.–09:00 a.m. time group, while the highest consumption occurs in the early morning, in the 01:00 a.m.–05:00 a.m. time group. When the system type changes from a normal system to a system with dispersion, the amplitude of ΔDO decreases.
As the replacement volume increases, ΔDO tends to zero, i.e., the difference between the final and initial DO decreases. At low volumes (levels 1 and 2), a change in the response variable is not guaranteed. For DO in the ponds to change, the volumes defined for levels 3 and 4 must be supplied within four hours; these represent 17.47% and 26.20% of the pond volume daily, respectively.
These volumes are higher than those suggested by [3]; therefore, they should be used only in ponds with critical DO levels and not as a daily practice. In the regression model, water replacement does not influence the response during the hours of operation of the system, since such volumes are not reached in normal operation. Even so, replacement ensures the removal of sediment from the bottom.
The experimental design also shows that, to guarantee the smallest ranges of ΔDO, the replacement volume should be high, which prevents ponds with low DO levels from reaching the 2 mg/L limit. Therefore, daily water replacement in the system is not considered justifiable. The proposed strategy is to replace water on alternate days, releasing it from the bottom, and to intervene with high volumes only in ponds with low DO levels. The system would then operate only half as often as currently expected over a year, saving 106,397.5 kWh/year of energy.
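The saving follows from simple arithmetic: running the pumps on alternate days removes half of the operating days in a year. The pump load is not stated in this section, so the sketch below uses a hypothetical 53 kW, chosen only because, combined with the roughly 11 h/day of operation mentioned earlier, it reproduces the reported figure; it should not be read as the actual rating of the equipment.

```python
def annual_saving_kwh(pump_power_kw: float, hours_per_day: float, days_per_year: float = 365.0) -> float:
    """Energy saved when pumping runs on alternate days instead of daily (half the days)."""
    skipped_days = days_per_year / 2.0
    return pump_power_kw * hours_per_day * skipped_days

# 53 kW is a hypothetical load, not stated in the text; 11 h/day is the operating time given earlier.
print(annual_saving_kwh(53.0, 11.0))  # 106397.5
```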
According to the Ministry of Finance and Prices, the electricity tariff applied in the case study is medium voltage with continuous activity (M1-A). The equation proposed for the early morning hours, Equation (13), is used to calculate the income from savings in the pumping system of the parent bank.
In this equation, the early-morning consumption term is replaced by the energy saved in one year under the proposed operation, so that the result corresponds to the revenue from savings in one year. The K factor takes a value of 1.039 in this case.
where
- Cm: consumption in the early morning hours.
5. Conclusions
A total of 199 measurements of the variables associated with oxygenation were distributed among three hourly groups. The highest oxygen consumption occurs from 01:00 a.m. to 05:00 a.m.; only two measurements showed critical DO levels, and the mean ΔDO was −2 mg/L.
The proposed model does not identify the turnover volume as influential in the variation of DO, whereas the experimental design highlights the need for large turnover volumes to ensure oxygenation in the culture ponds. Ideally, therefore, the system should operate on alternate days, renewing water from the bottom and taking the existing oxygen level into account.
The multiple linear regression model proposed in this study, with an R² of 92.90% and a MAPE of 44.29%, compares favorably with models previously reported in the literature, which indicates its high precision and reliability in predicting the change in DO in shrimp farming systems.
The operating strategy derived from the proposed model reduces energy consumption by about 106,397.5 kWh per year, a significant saving with an economic impact equivalent to 208,639.52 CUP (Cuban pesos) per year. This improves the profitability and efficiency of the production system and supports the feasibility and sustainability of the project.
The research gap addressed by this study is the lack of accurate predictive models that optimize energy use in aquaculture oxygenation systems, especially in scenarios of limited water replacement. Many aquaculture systems have so far relied on high water replacement rates, which significantly increase energy costs. This article proposes a solution that minimizes this dependence, making aquaculture systems more cost-effective and sustainable in the long term.
The study introduces a novel approach by using a mathematical model that optimizes oxygenation without relying excessively on water exchange, which is traditionally a costly and less efficient approach. The key contribution lies in the model’s ability to reduce water turnover without negatively affecting shrimp growth and health, which represents an improvement in the sustainability of the system.