1. Introduction
Predictive food microbiology is a theoretical field within food microbiology that focuses on developing statistical models to forecast microbial behaviour in food environments by combining traditional microbiological knowledge with mathematical and statistical principles [1]. Although predictive models date back to the early 20th century, advances in computer technology have significantly accelerated the progress of predictive microbiology in the 21st century. These models are used to identify conditions within food environments that prevent or delay the adverse effects of microbial contamination. In traditional predictive microbiology, mathematical models are generally categorized into two types: primary and secondary models [2]. Primary models describe the behaviour of microorganisms over time under static environmental conditions, capturing how microbial populations grow, survive or die when external factors remain constant. Secondary models, in contrast, describe how environmental variables (such as temperature, pH and water activity) and food matrices affect the parameters of the primary models. Although this conventional modelling framework is often effective in predicting microbial behaviour, it has certain limitations. A significant concern is the potential accumulation and amplification of errors, which can occur because nonlinear regression is performed twice: once when developing the primary model and again when integrating environmental factors in the secondary model [3,4,5].
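To make the two-step scheme concrete, the sketch below shows only its second step, assuming the Ratkowsky square-root model as the secondary model (a common choice, not necessarily one considered in this study). The μmax values passed in would come from first fitting a primary model to each isothermal growth curve, so any error in those estimates propagates into the fitted b and Tmin. For brevity the model is fitted on the square-root scale with linear least squares; the function names and parameter values are illustrative.

```python
import numpy as np


def ratkowsky_sqrt(temp_c, b, t_min):
    """Square-root secondary model: sqrt(mu_max) = b * (T - T_min)."""
    return b * (np.asarray(temp_c, dtype=float) - t_min)


def fit_ratkowsky(temps_c, mu_max_estimates):
    """Second step of a two-step fit (illustrative).

    mu_max_estimates are growth rates (1/h) obtained by first fitting a primary
    model to each isothermal curve; their errors carry over into (b, T_min).
    The model is linear on the square-root scale, so ordinary least squares is used.
    """
    sqrt_mu = np.sqrt(np.asarray(mu_max_estimates, dtype=float))
    slope, intercept = np.polyfit(np.asarray(temps_c, dtype=float), sqrt_mu, deg=1)
    b = slope
    t_min = -intercept / slope  # because sqrt(mu) = b*T - b*T_min
    return float(b), float(t_min)


if __name__ == "__main__":
    temps = np.array([5.0, 10.0, 15.0, 20.0, 25.0])
    # Hypothetical, noise-free "first-step" rates generated from b = 0.03, T_min = -5 °C
    mu = ratkowsky_sqrt(temps, b=0.03, t_min=-5.0) ** 2
    b_hat, t_min_hat = fit_ratkowsky(temps, mu)
    print(f"b = {b_hat:.4f}, T_min = {t_min_hat:.2f} °C")
```

With real data the first-step μmax values are noisy, and that noise is carried into the secondary-model parameters rather than being handled in a single fit.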
In recent years, machine learning (ML) algorithms have gained significant momentum across various research fields. This surge is largely driven by three key technological advances: first, the ability to capture vast amounts of digital data quickly; second, the exponential growth in affordable computing power and data storage; and third, the development of a global network that enables rapid data transfer. Numerous studies have explored the use of ML techniques in food safety and modelling [6,7,8,9]. ML methods are particularly effective at identifying underlying relationships between explanatory and response variables, which makes ML-based regression approaches capable of predicting population behaviour and improving the accuracy of bacterial growth predictions. Despite these promising advances, ML algorithms are still rarely applied to predict microbial behaviour in food systems. Furthermore, to the best of our knowledge, no studies have yet compared traditional modelling approaches with ML models within the field of predictive microbiology.
Both traditional modelling techniques and machine learning approaches can be used to predict microbial behaviour and estimate the shelf life of food products [10]. Traditional models rely on predefined mathematical equations and structured computational methods, whereas machine learning techniques use algorithms to uncover patterns and generate predictions directly from data [11]. A significant advantage of machine learning is its ability to capture complex, nonlinear relationships, which makes it particularly useful for analyzing large, diverse datasets. However, it often requires substantial amounts of training data and can pose interpretability challenges. Traditional models, on the other hand, are generally grounded in established biological and chemical principles, which makes them easier to interpret. These models may be preferable when data availability is limited or when a straightforward, transparent explanation of results is needed [12].
The primary objective of this work is to develop software that utilizes machine learning-based regression methods—specifically Support Vector Regression (SVR), Random Forest Regression (RFR) and Gaussian Process Regression (GPR)—to predict and quantify the behaviour of Pseudomonas spp. in culture media. Temperature, water activity, and pH were the key predictor variables used to estimate microbial growth. The performance of these machine learning models was assessed by comparing them to traditional models, such as the modified Gompertz, Logistic, Baranyi, and Huang models, using statistical metrics like the adjusted coefficient of determination (R2adj) and root mean square error (RMSE).
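The two comparison statistics can be computed as follows. This is a minimal sketch assuming the usual definitions: R2adj = 1 - (1 - R2)(n - 1)/(n - p - 1) for n observations and p predictors, and RMSE with n in the denominator. Some predictive microbiology studies divide by the residual degrees of freedom instead, so the exact RMSE convention behind the reported values is an assumption here.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean square error, using n in the denominator (assumed convention)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def r2_adjusted(observed, predicted, n_predictors):
    """Adjusted coefficient of determination for a model with p predictors."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    n = obs.size
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1))


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0, 5.0]
    pred = [1.1, 1.9, 3.2, 3.8, 5.1]  # illustrative values only
    print(rmse(obs, pred), r2_adjusted(obs, pred, n_predictors=1))
```

The adjustment penalises extra predictors, which matters when comparing models that use different numbers of inputs.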
3. Results and Discussion
The growth data points of Pseudomonas spp. in culture media collected from the ComBase database were stored with the following information: record ID, temperature (°C), water activity, pH, initial microbial population (yes/no) and time (h). The frequency distribution of the collected data for each feature is shown in Figure 2.
The maximum specific growth rate (μmax) and lag phase duration (λ), the key growth kinetic parameters, can be modelled as functions of environmental factors such as temperature, water activity and pH. Among these, temperature plays a crucial role in microbial growth in food products, as noted in [27]. In this study, the temperature range considered was 5 to 25 °C, reflecting typical conditions encountered by food products during storage, transport and retail. This range includes refrigeration temperatures (around 5–10 °C), which slow microbial growth, as well as warmer conditions up to 25 °C, where microbial activity accelerates, potentially affecting shelf life and safety. Water activity, another essential factor in microbial growth, is the ratio of the vapour pressure of the food to the vapour pressure of pure water under identical conditions. Most foods have a water activity above 0.95, which is sufficient to support microbial growth because free water is available for cellular processes. In this study, water activity ranged from 0.954 to 0.997, providing ample moisture to promote microbial growth, as in fresh and perishable foods. The pH of food also directly affects microbial growth by influencing enzyme activity and cellular function. In this study, pH values in the culture media ranged from 4.01 to 7.40, encompassing both acidic and near-neutral conditions. Acidic environments (pH around 4) inhibit many spoilage organisms, whereas near-neutral conditions (pH closer to 7) support a broader range of bacterial growth, potentially accelerating spoilage. Together, temperature, water activity and pH influence μmax and λ, making them critical for predicting microbial behaviour and developing effective storage and preservation strategies in the food industry.
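A small helper of the kind below can flag inputs that fall outside the experimental domain described above, where any fitted model would be extrapolating. The ranges are the ones reported in this section; the function names and the inclusive-bounds check are illustrative assumptions, and water activity is computed directly from its definition as a vapour-pressure ratio.

```python
# Experimental ranges reported for the Pseudomonas spp. culture-medium data
TEMP_RANGE_C = (5.0, 25.0)
AW_RANGE = (0.954, 0.997)
PH_RANGE = (4.01, 7.40)


def water_activity(p_food, p_pure_water):
    """a_w = p / p0, both vapour pressures measured at the same temperature."""
    if p_pure_water <= 0:
        raise ValueError("vapour pressure of pure water must be positive")
    return p_food / p_pure_water


def in_study_domain(temp_c, aw, ph):
    """True if all three inputs lie inside the ranges covered by the data."""
    return (
        TEMP_RANGE_C[0] <= temp_c <= TEMP_RANGE_C[1]
        and AW_RANGE[0] <= aw <= AW_RANGE[1]
        and PH_RANGE[0] <= ph <= PH_RANGE[1]
    )


if __name__ == "__main__":
    aw = water_activity(3.10, 3.17)  # illustrative vapour pressures in kPa
    print(round(aw, 3), in_study_domain(10.0, aw, 6.5), in_study_domain(30.0, aw, 6.5))
```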
For model comparison, 80% of the data were allocated for training and 20% for testing.
Table 2 presents the performance differences between traditional and machine learning models in microbial growth modelling during the training process. Traditional models, such as Gompertz, Logistic, Baranyi and Huang, are frequently used due to their interpretability and effectiveness in capturing standard S-shaped microbial growth patterns. These traditional models, however, rely on fixed growth structures, which restricts their flexibility in capturing complex, nonlinear growth dynamics. Although both two-step and one-step modelling approaches were initially applied, the two-step approach did not successfully fit the data across any of the traditional models. Consequently, only the results from the one-step modelling approach are presented here as the traditional modelling outcome.
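The one-step approach fits a single global model to all growth curves at once instead of fitting each curve first. The sketch below illustrates the idea with an 80/20 random split of synthetic data points. It assumes a Gompertz primary model whose μmax follows the Ratkowsky square-root model and whose λ equals h0/μmax with a constant h0; these secondary-model choices, the parameter values and the names are hypothetical and are not the study's model specification.

```python
import numpy as np
from scipy.optimize import curve_fit


def one_step_model(x, y0, a, b, t_min, h0):
    """Gompertz primary model with temperature-dependent parameters.

    Hypothetical secondary structure: mu_max = (b * (T - T_min))**2 and
    lambda = h0 / mu_max. x is a 2 x n array of [time (h), temperature (°C)].
    """
    t, temp = x
    mu_max = (b * (temp - t_min)) ** 2
    lag = h0 / mu_max
    return y0 + a * np.exp(-np.exp(mu_max * np.e / a * (lag - t) + 1.0))


rng = np.random.default_rng(0)
temps = np.repeat([5.0, 10.0, 15.0, 20.0, 25.0], 25)
times = np.tile(np.linspace(0.0, 150.0, 25), 5)
true_params = (3.0, 5.0, 0.03, -5.0, 2.0)  # synthetic values, not study results
x_all = np.vstack([times, temps])
y_all = one_step_model(x_all, *true_params) + rng.normal(0.0, 0.05, times.size)

# 80/20 random split of the pooled data points
order = rng.permutation(times.size)
n_train = int(0.8 * times.size)
train, test = order[:n_train], order[n_train:]

# One nonlinear regression over all temperatures at once (the "one-step" fit)
popt, _ = curve_fit(
    one_step_model,
    x_all[:, train],
    y_all[train],
    p0=(2.5, 4.5, 0.025, -3.0, 1.5),
    bounds=([0.0, 0.1, 1e-4, -20.0, 0.0], [10.0, 20.0, 1.0, 4.9, 20.0]),
)
pred_test = one_step_model(x_all[:, test], *popt)
rmse_test = float(np.sqrt(np.mean((y_all[test] - pred_test) ** 2)))
print("fitted parameters:", np.round(popt, 4))
print("test RMSE:", round(rmse_test, 4))
```

The upper bound on T_min keeps it below the lowest temperature in the data, so μmax never reaches zero during the fit.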
The Gompertz model achieved an R2adj of 0.813 and an RMSE of 0.022, indicating moderate predictive accuracy but limited flexibility. The Logistic model performed slightly better, with an R2adj of 0.844 and an RMSE of 0.020, capturing microbial growth dynamics more effectively. Despite incorporating a lag phase, the Baranyi model had the lowest R2adj among the traditional models (0.790) and an RMSE of 0.023, reflecting its difficulty in handling complex growth behaviours. The Huang model was the most accurate traditional model, achieving an R2adj of 0.850 and an RMSE of 0.020, although it was still surpassed by the machine learning models (Figure 3).
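For reference, the four primary models compared above can be written as follows. This sketch uses commonly cited forms: the Zwietering reparameterisations of the modified Gompertz and Logistic models (where a is the asymptotic increase), the Baranyi and Roberts model with h0 = μmax·λ, and the Huang model with α = 4. The Baranyi and Huang expressions assume counts on a natural-log scale. The exact parameterisations used in this study are not stated, so treat these as illustrative.

```python
import numpy as np


def gompertz(t, y0, a, mu_max, lag):
    """Modified Gompertz (Zwietering form); a = asymptotic increase y_max - y0."""
    t = np.asarray(t, dtype=float)
    return y0 + a * np.exp(-np.exp(mu_max * np.e / a * (lag - t) + 1.0))


def logistic(t, y0, a, mu_max, lag):
    """Logistic model (Zwietering form) with the same parameter meanings."""
    t = np.asarray(t, dtype=float)
    return y0 + a / (1.0 + np.exp(4.0 * mu_max / a * (lag - t) + 2.0))


def baranyi(t, y0, y_max, mu_max, lag):
    """Baranyi-Roberts model, natural-log counts, h0 = mu_max * lag."""
    t = np.asarray(t, dtype=float)
    h0 = mu_max * lag
    a_t = t + np.log(np.exp(-mu_max * t) + np.exp(-h0) - np.exp(-mu_max * t - h0)) / mu_max
    return y0 + mu_max * a_t - np.log(1.0 + (np.exp(mu_max * a_t) - 1.0) / np.exp(y_max - y0))


def huang(t, y0, y_max, mu_max, lag, alpha=4.0):
    """Huang model, natural-log counts, with the commonly used alpha = 4."""
    t = np.asarray(t, dtype=float)
    b_t = t + np.log((1.0 + np.exp(-alpha * (t - lag))) / (1.0 + np.exp(alpha * lag))) / alpha
    return y0 + y_max - np.log(np.exp(y0) + (np.exp(y_max) - np.exp(y0)) * np.exp(-mu_max * b_t))


if __name__ == "__main__":
    t = np.linspace(0.0, 48.0, 7)
    # Illustrative parameters: ln N0 = 5, ln Nmax = 20, mu_max = 0.5 1/h, lag = 4 h
    print(np.round(baranyi(t, 5.0, 20.0, 0.5, 4.0), 2))
    print(np.round(huang(t, 5.0, 20.0, 0.5, 4.0), 2))
```

In the Gompertz and Logistic forms the maximum slope equals μmax at the inflection point, whereas Baranyi and Huang start exactly at y0 and level off at y_max.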
Machine learning models, which do not rely on predefined relationships, demonstrated greater adaptability. SVR yielded an R2adj of 0.854 and an RMSE of 0.019, indicating solid predictive performance, although it was slightly less effective than GPR and RFR, possibly because of its sensitivity to parameter tuning. Random Forest Regression achieved an R2adj of 0.893 and an RMSE of 0.017, benefiting from its ensemble approach, which captures complex interactions between variables. Gaussian Process Regression provided the highest R2adj (0.959) and the lowest RMSE (0.010), showing high accuracy and robustness in modelling nonlinear growth patterns. These results show that while traditional models such as Huang offer reasonable accuracy, machine learning models, particularly GPR, deliver superior predictive performance during training and are better suited for modelling complex microbial growth dynamics (Figure 4).
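A minimal scikit-learn sketch of training the three regressors on time, temperature, water activity and pH is shown below. The data are synthetic (a Gompertz-shaped response whose rate depends on the three environmental factors), and the kernels and hyperparameters (C, epsilon, number of trees, an anisotropic RBF kernel with a white-noise term for GPR) are assumptions rather than the settings used in this study, so the scores it prints say nothing about the reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def synthetic_growth_data(n=400, seed=0):
    """Synthetic (not ComBase) data: normalised growth response vs four predictors."""
    rng = np.random.default_rng(seed)
    time_h = rng.uniform(0.0, 48.0, n)
    temp_c = rng.uniform(5.0, 25.0, n)
    aw = rng.uniform(0.954, 0.997, n)
    ph = rng.uniform(4.0, 7.4, n)
    # Hypothetical rate surface: faster growth at higher T, higher a_w and near-neutral pH
    mu_max = 0.002 * (temp_c + 5.0) ** 2 * (aw - 0.94) / 0.06 * (1.0 - ((ph - 7.0) / 4.0) ** 2)
    lag = 1.0 / mu_max
    y = np.exp(-np.exp(mu_max * np.e * (lag - time_h) + 1.0))  # Gompertz curve scaled to [0, 1]
    return np.column_stack([time_h, temp_c, aw, ph]), y


def r2_adjusted(y_true, y_pred, n_predictors):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    n = y_true.size
    return 1.0 - (ss_res / ss_tot) * (n - 1) / (n - n_predictors - 1)


X, y = synthetic_growth_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    # Assumed hyperparameters; in practice they would be tuned, e.g. by cross-validation
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01)),
    "RFR": RandomForestRegressor(n_estimators=200, random_state=0),
    "GPR": make_pipeline(
        StandardScaler(),
        GaussianProcessRegressor(
            kernel=ConstantKernel(1.0) * RBF(length_scale=np.ones(4)) + WhiteKernel(1e-3),
            normalize_y=True,
            random_state=0,
        ),
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_hat = model.predict(X_test)
    results[name] = {
        "r2_adj": float(r2_adjusted(y_test, y_hat, n_predictors=X.shape[1])),
        "rmse": float(np.sqrt(np.mean((y_test - y_hat) ** 2))),
    }
    print(f"{name}: R2adj = {results[name]['r2_adj']:.3f}, RMSE = {results[name]['rmse']:.3f}")
```

Scaling matters for SVR and GPR because their RBF kernels measure distances between inputs whose raw ranges differ by orders of magnitude (hours versus water activity); Random Forest splits are scale-invariant, so it is used without a scaler.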
The bar chart in Figure 5 shows the relative importance of four predictors (time, temperature, water activity and pH) in modelling the growth of Pseudomonas spp. Time is by far the most influential factor in predicting Pseudomonas growth, indicating that growth depends strongly on the duration of exposure under given conditions. This aligns with biological expectations, as microbial populations typically increase exponentially over time (after the lag phase and before the stationary phase) when other growth conditions remain constant. Temperature is the second most important factor, reflecting the sensitivity of Pseudomonas growth rates to temperature, which plays a critical role in enzymatic activity and cellular processes. Growth generally accelerates as temperature rises towards the optimum, beyond which growth rates decline. Temperature control is therefore essential for limiting Pseudomonas growth, especially during food storage and handling. Water activity has a smaller but noticeable effect on growth. Water activity measures the availability of free water for microbial activity, and since Pseudomonas spp. require moisture to thrive, maintaining low water activity can help inhibit their growth. Most foods have water activities high enough to support microbial growth; however, controlling this variable can still be an effective way to reduce growth rates. Finally, pH has the least impact of the four factors. Although pH affects microbial growth by influencing enzyme stability and nutrient availability, Pseudomonas spp. can tolerate a range of pH levels, especially near neutrality, which may explain the lower importance of pH relative to the other factors. Nevertheless, keeping pH outside this range can still contribute to controlling growth, although it is less effective than controlling time, temperature or water activity.
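The section does not state how the importances in Figure 5 were computed. One common option, sketched here under that assumption, is permutation importance: shuffle one predictor in held-out data and measure how much the model's score drops. The synthetic response and the Random Forest settings are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

FEATURES = ["time", "temperature", "water_activity", "pH"]

# Synthetic data with a hypothetical Gompertz-type response (illustration only)
rng = np.random.default_rng(1)
n = 500
time_h = rng.uniform(0.0, 48.0, n)
temp_c = rng.uniform(5.0, 25.0, n)
aw = rng.uniform(0.954, 0.997, n)
ph = rng.uniform(4.0, 7.4, n)
mu_max = 0.002 * (temp_c + 5.0) ** 2 * (aw - 0.94) / 0.06 * (1.0 - ((ph - 7.0) / 4.0) ** 2)
y = np.exp(-np.exp(mu_max * np.e * (1.0 / mu_max - time_h) + 1.0))
X = np.column_stack([time_h, temp_c, aw, ph])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Mean drop in R^2 on held-out data when each predictor is shuffled (10 shuffles each)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
importance = dict(zip(FEATURES, result.importances_mean))
for name, value in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<15s}{value:.3f}")
```

Impurity-based importances (the fitted forest's `feature_importances_` attribute) are a cheaper alternative, but they are computed on training data and can favour features with many distinct values.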
On the test data, the traditional modelling approaches (Gompertz, Logistic, Baranyi and Huang models) show varying levels of predictive accuracy. The Gompertz model captures the general trend but deviates noticeably from the ideal line, indicating limited precision. The Logistic model aligns slightly better with the ideal line, suggesting improved accuracy in capturing growth dynamics, though some inconsistencies remain. The Baranyi model shows the widest spread around the ideal line, reflecting lower predictive accuracy despite accounting for a lag phase, which suggests that it struggles with the complexity of the growth data. Among the traditional models, the Huang model fits the ideal line most closely, indicating the highest predictive accuracy and flexibility in modelling nonlinear growth trends (Figure 6).
On the test data, the machine learning models (Support Vector Regression, Random Forest Regression and Gaussian Process Regression) demonstrate superior predictive accuracy and adaptability in capturing microbial growth dynamics. The SVR plot aligns relatively closely with the ideal line, though minor deviations indicate that it may be sensitive to tuning parameters, especially in nonlinear regions. The RFR plot aligns more closely with the ideal line than SVR, illustrating the effectiveness of its ensemble approach in capturing complex interactions within the data, though some scatter remains. GPR, however, fits the ideal line most closely among the machine learning models, with minimal deviations. This indicates that GPR provides the highest predictive accuracy, capturing intricate patterns more robustly than SVR and RFR. These results underscore that machine learning models, particularly GPR, are more effective than traditional models in modelling complex microbial growth patterns (Figure 7).
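The visual comparison against the ideal line can also be summarised numerically. A minimal sketch, assuming a least-squares fit of predicted on observed values: a perfect model gives slope 1 and intercept 0, and the largest absolute deviation shows the worst-case miss. The function name and the example values are hypothetical.

```python
import numpy as np


def parity_summary(observed, predicted):
    """Compare predictions with the ideal 1:1 line.

    Returns the slope and intercept of a least-squares fit of predicted on observed
    values (1 and 0 for a perfect model) and the largest absolute deviation.
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(obs, pred, deg=1)
    max_abs_dev = float(np.max(np.abs(pred - obs)))
    return float(slope), float(intercept), max_abs_dev


if __name__ == "__main__":
    obs = [0.10, 0.25, 0.40, 0.55, 0.70]  # illustrative values only
    pred = [0.12, 0.24, 0.43, 0.52, 0.71]
    print(parity_summary(obs, pred))
```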
A comparison of traditional and machine learning approaches for microbial growth prediction shows that machine learning models generally provide greater accuracy and flexibility in handling complex growth dynamics. Traditional models, including Gompertz, Logistic, Baranyi and Huang, are well established and relatively easy to interpret because of their defined parametric structures, which suit standard microbial growth patterns. Among these, the Gompertz model performs best, achieving an R2adj of 0.861 and an RMSE of 0.007, closely followed by the Logistic model (Table 3). The traditional models maintain reasonable accuracy (highest for the Huang model, at 69.8%) and good bias factor (Bf) and accuracy factor (Af) values. However, their limited adaptability to nonlinear and non-standard growth patterns restricts their performance in more complex scenarios, as indicated by their lower accuracy values compared with the machine learning models.
Machine learning models, including Support Vector Regression, Random Forest Regression and Gaussian Process Regression, do not assume a predefined functional form, which allows them to capture intricate, nonlinear growth behaviour effectively. GPR in particular stands out, with the highest R2adj (0.923) and the lowest RMSE (0.005), demonstrating its robustness and reliability in handling complex data patterns. Its accuracy (84.3%) exceeds that of all other models, traditional and machine learning alike. RFR also performs notably well, with an R2adj of 0.884 and an RMSE of 0.006, benefiting from its ensemble approach to account for variable interactions. SVR, while effective with an R2adj of 0.834, falls short of GPR and RFR, potentially because of its sensitivity in high-dimensional spaces and the need for careful parameter tuning. Notably, the R2adj of SVR is slightly lower than that of the traditional models, except the Baranyi model. In terms of the bias factor (Bf) and accuracy factor (Af), the machine learning models generally lie closer to the ideal value of 1, with RFR having a nearly perfect Bf of 0.998 and GPR achieving the lowest Af (1.100), indicating greater consistency and reliability. Overall, these results confirm that while traditional models offer interpretable and moderately accurate predictions suitable for simpler growth dynamics, RFR and GPR provide higher predictive power, accuracy and flexibility, making them better suited for complex, nonlinear microbial growth modelling.
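The bias and accuracy factors are conventionally computed from the ratio of predicted to observed values: Bf = 10^mean(log10(pred/obs)) and Af = 10^mean(|log10(pred/obs)|). The sketch below assumes those definitions, which require strictly positive values; which quantity (counts, rates or times) the reported factors were computed on is not stated here, so that choice is left to the caller. Both factors equal 1 for a perfect model, and an Af of 1.10, for example, means predictions differ from observations by about 10% on average.

```python
import numpy as np


def _log_ratios(observed, predicted):
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    if np.any(obs <= 0) or np.any(pred <= 0):
        raise ValueError("Bf and Af need strictly positive observed and predicted values")
    return np.log10(pred / obs)


def bias_factor(observed, predicted):
    """Bf > 1: predictions exceed observations on average; Bf < 1: they fall short."""
    return float(10.0 ** np.mean(_log_ratios(observed, predicted)))


def accuracy_factor(observed, predicted):
    """Af >= 1: average multiplicative spread between predictions and observations."""
    return float(10.0 ** np.mean(np.abs(_log_ratios(observed, predicted))))


if __name__ == "__main__":
    obs = [1.0, 1.0]
    pred = [2.0, 0.5]  # one 2-fold over-prediction and one 2-fold under-prediction
    # The errors cancel in Bf (about 1) but not in Af (about 2)
    print(bias_factor(obs, pred), accuracy_factor(obs, pred))
```

This is why a model can have a near-perfect Bf while still showing a noticeable Af: over- and under-predictions offset each other in the bias factor but accumulate in the accuracy factor.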
The machine learning models developed for predicting microorganism growth were integrated into a user-friendly software interface that allows users to input parameters easily and visualize predicted microbial counts. This interface, illustrated in Figure 8, has a streamlined design that simplifies the prediction process, making it accessible even to users without extensive technical knowledge. Key components of the interface include input fields for essential parameters such as temperature, pH, water activity and other relevant environmental factors. Upon entering these values, the software instantly generates predictions using the trained machine learning models (GPR, RFR and SVR), providing outputs on microbial growth rates and expected counts. In addition, the software has been made accessible to a broader audience via the GitHub platform. The repository, “ftarlak/Pseu_Calculator”, includes not only the code but also a brief video tutorial that guides users through installation and usage, explaining the functionality of each interface component, from data entry to interpreting prediction outputs.