**3. Results**

Indexes characterizing the IVC were extracted from long- and short-axis views, either by semi-automated processing or by manual estimation (performed in M-mode), and were then used to classify patients. Because the indexes extracted by automated processing gave better performance, the figures and tables below refer to those data; the text also reports some performance indexes of the best BTM developed from the set of indexes obtained by standard manual measurements.

Figure 3 shows the BTM selected as the best-performing classifier on our dataset. This BTM was trained on the entire dataset using the input features selected by the cross-validation test (described in Section 2.3), for which the minimum loss was obtained (0.26; the loss of the best classifiers using either ECOC or Naive Bayes models was 0.28).

Two pulsatility indexes are included: CCI in long axis and CI in short axis. Four other BTMs obtained the same loss; the one with the minimum number of input features was selected. The CCI in long axis was included in 4 of these BTMs with minimum loss, and the CI in short axis in 2 of them. Another frequently included feature was the RCI in short axis, which was used in 3 of the 5 BTMs with minimum loss. When standard manual measurements were employed, the best BTM was unique: it had a loss of 0.28 and included two indexes, the IVC diameter estimated in long axis and the CI in short axis.

Distributions of the indexes are shown in Figure 4. Among the indexes estimated by semi-automated processing, those selected by the best BTM have mean Fisher ratios (FR, averaged over all 3 binary comparisons) that are among the highest. However, they do not have the highest FRs: the best discrimination in terms of average FR is provided by the mean diameter estimated from the long axis view. This indicates that the selected indexes are both informative and not very redundant, allowing a peak in performance of the classifier that uses them as inputs. Notice also that the FR is an index of linear discrimination, whereas the adopted classifier allows for nonlinear separation.
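As an illustration of how the mean FR can be computed, the following minimal Python sketch applies the definition given in the caption of Figure 4 (squared difference of means divided by the sum of variances, averaged over all binary comparisons). The function names, the use of the sample variance, and the numerical values are assumptions made for illustration only; they are not taken from the study.

```python
from itertools import combinations
from statistics import mean, variance


def fisher_ratio(a, b):
    """Squared difference of the group means divided by the sum of the variances.

    Assumption: the sample variance (n - 1 denominator) is used; the text
    does not state which variance estimator was adopted.
    """
    return (mean(a) - mean(b)) ** 2 / (variance(a) + variance(b))


def mean_fisher_ratio(groups):
    """Average the FR over all pairwise (binary) comparisons between groups.

    `groups` maps a class name to the list of values of one index.
    With three classes this averages over 3 binary comparisons.
    """
    pairs = list(combinations(groups.values(), 2))
    return sum(fisher_ratio(a, b) for a, b in pairs) / len(pairs)


# Hypothetical values of a single index for the three volemic classes
# (placeholders for illustration, not measured data).
example_index = {
    "hypo": [0.62, 0.55, 0.70, 0.66],
    "eu": [0.41, 0.38, 0.47, 0.44],
    "hyper": [0.20, 0.25, 0.18, 0.27],
}
example_fr = mean_fisher_ratio(example_index)
```

Because the FR only compares means and variances, a high value signals good linear separability of one index taken alone; it does not account for redundancy between indexes.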

It is interesting to see that the indexes estimated manually have even higher FRs, indicating a better linear discrimination of the patients. The two indexes with the highest FRs are those selected by the best BTM using only manually measured indexes. However, the semi-automated processing makes it possible to extract additional information: specifically, the two pulsatility indexes RCI and CCI reflect the effect of different stimulations (respiration and heartbeat, respectively). This further information (specifically that coming from the CCI) allows the BTM based on automated processing to achieve better performance than the one developed from the set of manually estimated indexes.

The confusion matrix of the best BTM shown in Figure 3 is given in Table 2. Notice that all hypo-volemic patients were correctly identified. A few eu-volemic and hyper-volemic subjects were misclassified. No hyper-volemic patient was classified as hypo-volemic or vice versa. Common performance indexes are as follows: mean sensitivity 90.0% (86.0% for the BTM built using the manually estimated indexes); mean specificity 95.0% (91.9% with manual indexes); positive predictive value 90.0% (86.2% with manual indexes); negative predictive value 94.2% (91.8% with manual indexes); mean accuracy 92.9% (89.8% with manual indexes).
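Performance indexes of this kind can be derived from a three-class confusion matrix by treating each class in turn as "positive" against the other two and averaging the results. The sketch below assumes this one-vs-rest averaging, which the text does not state explicitly; the function name and the example counts are hypothetical and do not reproduce Table 2.

```python
def one_vs_rest_metrics(cm):
    """Average one-vs-rest performance indexes over the classes.

    `cm[i][j]` is the number of samples of true class i predicted as class j.
    Assumption: every class occurs and is predicted at least once, so that
    no denominator is zero.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    per_class = {"sensitivity": [], "specificity": [], "ppv": [], "npv": [], "accuracy": []}
    for k in range(n_classes):
        tp = cm[k][k]                                      # class k correctly identified
        fn = sum(cm[k]) - tp                               # class k assigned elsewhere
        fp = sum(cm[r][k] for r in range(n_classes)) - tp  # other classes assigned to k
        tn = total - tp - fn - fp                          # everything else
        per_class["sensitivity"].append(tp / (tp + fn))
        per_class["specificity"].append(tn / (tn + fp))
        per_class["ppv"].append(tp / (tp + fp))
        per_class["npv"].append(tn / (tn + fn))
        per_class["accuracy"].append((tp + tn) / total)
    return {name: sum(values) / n_classes for name, values in per_class.items()}


# Hypothetical counts (rows: true hypo/eu/hyper; columns: predicted class).
# These are placeholders for illustration, not the values of Table 2.
example_cm = [
    [10, 0, 0],
    [1, 8, 1],
    [0, 2, 8],
]
example_metrics = one_vs_rest_metrics(example_cm)
```

Under this convention the mean accuracy is higher than the fraction of correctly classified samples, because each misclassified sample still counts as a true negative for the one class that is not involved in the error.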

**Figure 3.** BTM with the best performance in fitting our data. The list of tested indexes (all estimated by automated processing) is also provided, with those selected by the BTM indicated in bold.

**Figure 4.** Distribution of the considered IVC indexes from patients with different volemic conditions. The FR (ratio between the squared difference of means and the sum of variances, computed for all 3 binary comparisons and averaged) is indicated as an index of linear discrimination. The indexes selected by the best BTMs (those using either the semi-automated or the manual estimation approach) are emphasized.

Notice that this performance was obtained using the entire dataset to train our model. As some misclassifications occurred, we deduce that some information is still missing and/or the features extracted by our processing contain some residual noise. To get a more faithful indication of performance, a leave-one-out test was performed (i.e., the best features selected before were kept, but each sample was excluded in turn from the training set and used for testing). The confusion matrix in Table 3 was obtained. Some degradation of the performance can be observed, especially in the discrimination of the control and hyper-volemic groups. The following performance indexes were achieved: mean sensitivity 70.0% (66.0% for the BTM built using the manually estimated indexes and tested by a leave-one-out approach); mean specificity 83.2% (80.4% with manual indexes); positive predictive value 70.0% (65.1% with manual indexes); negative predictive value 82.1% (80.5% with manual indexes); mean accuracy 78.0% (75.3% with manual indexes).
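The leave-one-out procedure described above can be sketched as follows: the feature set is kept fixed, each sample is removed in turn, the model is refit on the remaining samples, and the held-out sample is classified. In this minimal Python sketch a nearest-centroid rule is used as a simple stand-in for the BTM, and all function names and data values are hypothetical; it illustrates the testing loop only, not the classifier used in the study.

```python
def fit_centroids(X, y):
    """Stand-in classifier (not the BTM): store the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


def leave_one_out_confusion(X, y, fit, predict, labels):
    """Confusion matrix (rows: true class, columns: predicted class) from a leave-one-out test."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for i in range(len(X)):
        # Exclude sample i from the training set and refit the model.
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        # Classify the held-out sample and update the confusion matrix.
        cm[index[y[i]]][index[predict(model, X[i])]] += 1
    return cm


# Hypothetical two-feature samples (placeholders, not patient data).
X_demo = [[0.60, 0.30], [0.65, 0.28], [0.58, 0.33],
          [0.40, 0.20], [0.42, 0.22], [0.38, 0.18],
          [0.20, 0.10], [0.22, 0.12], [0.18, 0.09]]
y_demo = ["hypo"] * 3 + ["eu"] * 3 + ["hyper"] * 3
cm_demo = leave_one_out_confusion(
    X_demo, y_demo, fit_centroids, predict_centroid, ["hypo", "eu", "hyper"]
)
```

Because the input features are selected beforehand on the whole dataset and only the model fitting is repeated, an estimate obtained this way may still be somewhat optimistic compared with a test in which feature selection is also repeated within each fold.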


**Table 2.** Confusion matrix of the best binary tree model classifying the volemic status, shown in Figure 3 (for comparison, the best error-correcting output codes and Naive Bayes classifiers trained on the entire dataset show a predictive value of 78% and 86%, respectively).


**Table 3.** Confusion matrix obtained by testing the best binary tree model with a leave-one-out approach.
