5.1. Model Fitting and Predictive Measures
In this section, we compare the predictive performance of the proposed DNN-I and DNN-L survival models with that of the existing AFT and Cox PH models and their corresponding DNN models. All DNN models, including the proposed models, were implemented in Python using TensorFlow-Keras, while the Cox PH model was fitted using the lifelines package in Python.
The total dataset was divided into three separate sets: within each farm, the last three observations were assigned to the test set, the middle four observations to the validation set, and the remaining observations to the training set.
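The per-farm chronological split described above can be sketched as follows. This is a minimal illustration, assuming each farm's rows are already ordered in time and interpreting the "middle four" observations as the four immediately preceding the test block; the column name `farm` is illustrative, not from the paper.

```python
import pandas as pd

def split_by_farm(df, farm_col="farm"):
    """Per-farm split: last 3 rows -> test, preceding 4 -> validation,
    rest -> training (assumes rows within a farm are time-ordered)."""
    train, val, test = [], [], []
    for _, g in df.groupby(farm_col, sort=False):
        test.append(g.iloc[-3:])     # last three observations
        val.append(g.iloc[-7:-3])    # the four before them
        train.append(g.iloc[:-7])    # everything earlier
    return pd.concat(train), pd.concat(val), pd.concat(test)
```

Splitting chronologically within each farm, rather than at random, keeps the evaluation honest for a forecasting task: the model is always tested on weeks later than those it was trained on.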
The optimal hyperparameters used in the DNN models are summarized in Table 3; early stopping was employed to prevent overfitting [20]. In the Cox-type DNN models, the negative Efron log-likelihood [22] was used as the loss function, while the RMSE loss was used in the AFT-based DNNs because it penalizes larger errors more heavily, especially when there is no censoring.
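The two training losses mentioned above can be sketched in a framework-agnostic way (in the paper they are implemented as TensorFlow-Keras loss functions). This is a minimal NumPy sketch under the assumption of no tied event times, in which case the Efron partial likelihood coincides with the Breslow version; function names are illustrative.

```python
import numpy as np

def rmse_loss(t_obs, t_pred):
    # RMSE loss for the AFT-type DNNs: squaring penalizes larger errors more.
    return float(np.sqrt(np.mean((t_obs - t_pred) ** 2)))

def neg_cox_log_lik(t_obs, delta, eta):
    # Negative Cox partial log-likelihood for the Cox-type DNNs, assuming no
    # tied event times (equal to the Efron version in that case):
    #   -sum_{i: delta_i = 1} [ eta_i - log sum_{j: T_j >= T_i} exp(eta_j) ]
    loglik = 0.0
    for i in range(len(t_obs)):
        if delta[i] == 1:                       # event (uncensored) only
            risk_set = t_obs >= t_obs[i]        # subjects still at risk at T_i
            loglik += eta[i] - np.log(np.sum(np.exp(eta[risk_set])))
    return -loglik
```

In a Keras setting, either function would be rewritten with `tf` operations and passed as the `loss` argument to `model.compile`.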
Note that $\delta_i = 1$ for all $i$ in Section 4.1 due to no censoring. The predictive performance of the survival models based on the $T_i$'s without censoring was evaluated using the following measures. For the AFT-type models, we used the RMSE and the mean absolute error (MAE), defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(T_i - \hat{T}_i\right)^2} \quad \text{and} \quad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|T_i - \hat{T}_i\right|,$$
respectively, where $T_i$ is the $i$th observed harvest time and $\hat{T}_i$ is the $i$th harvest time predicted by the fitted model. The MAE is more robust against outliers than the RMSE. Notably, AFT-type models are particularly useful for predicting the harvest time because they directly model the survival times.
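The two error measures above are straightforward to compute; a minimal NumPy sketch applied to vectors of observed and predicted harvest times:

```python
import numpy as np

def rmse(t_obs, t_pred):
    # Root mean squared error: penalizes large errors more heavily.
    return float(np.sqrt(np.mean((t_obs - t_pred) ** 2)))

def mae(t_obs, t_pred):
    # Mean absolute error: more robust to outlying predictions.
    return float(np.mean(np.abs(t_obs - t_pred)))
```

A single prediction that is off by several weeks inflates the RMSE much more than the MAE, which is why the two measures are reported together.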
For the Cox-type hazard models, we used the concordance index (C-index), defined as
$$C = \Pr\left(\eta(\mathbf{x}_i) > \eta(\mathbf{x}_j) \mid T_i < T_j\right),$$
where $\eta(\mathbf{x})$ is the risk function of $\mathbf{x}$ in (2), and it is estimated as follows [23]:
$$\hat{C} = \frac{\sum_{i \neq j} \delta_i \, I(T_i < T_j)\, I\left(\hat{\eta}_i > \hat{\eta}_j\right)}{\sum_{i \neq j} \delta_i \, I(T_i < T_j)},$$
where $\delta_i$ is the censoring indicator of the $i$th observation, $T_i$ is the $i$th observed harvest time, and $\hat{\eta}_i$ is the corresponding predicted risk score obtained from the fitted model. The C-index takes a value between 0 and 1, with values closer to 1 indicating better predictive performance. Note that Cox-type models are particularly useful for predicting the survival probability, i.e., the survival function; the survival probability can be easily computed from the Cox hazard model, as shown in (3) and (4). We used the time-dependent Brier score (BS) to evaluate the accuracy of the predicted survival function at a given time point $t$. The BS represents the average squared distance between the observed survival status and the predicted survival probability at time $t$, and is defined for a given time point $t$ as
$$\mathrm{BS}(t) = E\left[\left(I(T > t) - \hat{S}(t \mid \mathbf{x})\right)^2\right].$$
Without censoring, the BS is estimated as follows [24]:
$$\widehat{\mathrm{BS}}(t) = \frac{1}{n}\sum_{i=1}^{n}\left(I(T_i > t) - \hat{S}(t \mid \mathbf{x}_i)\right)^2,$$
where $\hat{S}(t \mid \mathbf{x}_i)$ represents the predicted survival function obtained from the fitted model. Note that a lower BS indicates better prediction performance, similar to the RMSE. The integrated BS (IBS) provides an overall evaluation of model performance across all available times $t \in (0, \tau]$. The IBS over the interval $(0, \tau]$ for $\tau = \max_i T_i$ is defined as
$$\mathrm{IBS} = \frac{1}{\tau}\int_{0}^{\tau} \widehat{\mathrm{BS}}(t)\, dt.$$
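The estimators above (C-index, time-dependent BS, and IBS) can be sketched directly from their definitions. This is a minimal NumPy illustration assuming no censoring and approximating the IBS integral by the trapezoidal rule; function names are illustrative, not from a particular package.

```python
import numpy as np

def c_index(t_obs, risk, delta=None):
    # C-index: fraction of comparable pairs (T_i < T_j with delta_i = 1)
    # in which the earlier event has the higher predicted risk score.
    n = len(t_obs)
    if delta is None:
        delta = np.ones(n)  # no censoring
    conc, comp = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i != j and delta[i] == 1 and t_obs[i] < t_obs[j]:
                comp += 1
                conc += float(risk[i] > risk[j])
    return conc / comp

def brier_score(t, t_obs, surv_prob):
    # BS(t) without censoring: mean squared distance between the survival
    # status I(T_i > t) and the predicted survival probability S(t | x_i).
    status = (t_obs > t).astype(float)
    return float(np.mean((status - surv_prob) ** 2))

def integrated_brier(times, bs_values):
    # IBS: average of BS(t) over (0, tau], via the trapezoidal rule.
    area = 0.0
    for k in range(len(times) - 1):
        area += 0.5 * (bs_values[k] + bs_values[k + 1]) * (times[k + 1] - times[k])
    return area / times[-1]
```

In practice, packages such as lifelines provide built-in concordance routines; the explicit pairwise loop here is only meant to mirror the estimator's definition.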
5.2. Prediction Results for AFT-Type DNN Models
We first consider the four AFT-type models: AFT, AFT-DNN, AFT-DNN-I, and AFT-DNN-L.
Figure 6 illustrates the predicted values of T against the observed values of T on the test set. The results suggest that the AFT-DNN-L model effectively predicts the output variable, with a Pearson's sample correlation coefficient of 0.624. Furthermore, Table 4 shows that the AFT-DNN-L model achieves the lowest RMSE (0.8067) and MAE (0.6090), indicating superior performance.
Table 4 compares the prediction performance of the AFT-type methods with that of three popular machine learning (ML) methods: random forest (RF; [25]), XGBoost (XGB; [26]), and support vector regression (SVR; [27]). The hyperparameters for the three ML methods were tuned using ten-fold cross-validation through the Python implementations RandomForestRegressor (scikit-learn), xgboost, and svm (scikit-learn), and the resulting optimal settings were as follows: (i) for the RF model, the number of trees was 500, the number of features randomly selected as candidates for splitting a node was , and the maximum depth of trees was ten; (ii) for the XGB model, the number of trees was 300, the learning rate was 0.1, and the maximum depth of trees was one; (iii) for the SVR model, the trade-off parameter between maximizing the margin and minimizing the error was 0.1, the width of the margin was 0.01, and the kernel function was linear. The resulting ML predictive results are summarized as follows: for RF, the RMSE was 1.1929 and the MAE was 0.9384; for XGB, the RMSE was 1.2256 and the MAE was 0.9443; and for SVR, the RMSE was 1.1973 and the MAE was 0.9314. It is worth noting that none of these three ML methods can directly account for the farm effect. Consequently, all three ML methods yield inferior predictive results compared to the AFT-DNN methods presented in Table 4.
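The cross-validated tuning of the ML baselines can be sketched with scikit-learn's grid search. This is a hedged illustration on synthetic data, not the paper's dataset: the paper used ten-fold cross-validation and larger grids, whereas the example below uses 3-fold CV and tiny grids to stay small; the grid values echo some of the optimal settings reported above (500 trees and depth 10 are reduced to 50 trees for speed, C = 0.1 and epsilon = 0.01 for SVR).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))                                  # synthetic covariates
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=60)   # synthetic harvest times

# RF: tune maximum tree depth (paper's optimum: 500 trees, depth 10).
rf_grid = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    {"max_depth": [5, 10]},
    cv=3, scoring="neg_root_mean_squared_error",
)
rf_grid.fit(X, y)

# SVR: tune the margin trade-off C (paper's optimum: C = 0.1, epsilon = 0.01,
# linear kernel).
svr_grid = GridSearchCV(
    SVR(kernel="linear", epsilon=0.01),
    {"C": [0.1, 1.0]},
    cv=3, scoring="neg_root_mean_squared_error",
)
svr_grid.fit(X, y)
```

XGBoost tuning follows the same `GridSearchCV` pattern with an `xgboost.XGBRegressor` estimator. None of these estimators has a term for the farm-specific random effect, which is the structural limitation noted above.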
5.3. Prediction Results for Cox-Type DNN Hazard Models
Next, we consider the four Cox-type hazard models: Cox, Cox-DNN, Cox-DNN-I, and Cox-DNN-L.
Table 5 presents the C-index and IBS results for the four Cox-type models on the test set. Among these models, the Cox-DNN-L model demonstrates the highest performance in terms of the C-index and IBS.
Figure 7 displays the time-dependent BS results on the test set for the four Cox-type models. The BS values of the proposed Cox-DNN-L model and the Cox-DNN model are very similar at each time point (week) and are consistently lower than those of the other two models (Cox and Cox-DNN-I) across almost all time points. Notably, the base Cox model exhibits exceptionally high BS values at weeks 6 and 8. These results indicate that the proposed Cox-DNN-L model outperforms the other three Cox-type models (Cox, Cox-DNN, and Cox-DNN-I) in terms of overall prediction performance.