Author Contributions
Conceptualization, C.-Y.C. and L.-K.M.; methodology, C.-Y.C. and L.-K.M.; software, L.-K.M.; validation, C.-Y.C. and L.-K.M.; formal analysis, L.-K.M.; investigation, C.-Y.C., M.-H.H., C.-H.L. and L.-K.M.; resources, C.-Y.C., M.-H.H., C.-H.L. and Y.-N.S.; data curation, C.-Y.C., M.-H.H., C.-H.L. and L.-K.M.; writing—original draft preparation, L.-K.M.; writing—review and editing, C.-Y.C., M.-H.H., C.-H.L., Y.-N.S. and L.-K.M.; visualization, L.-K.M.; supervision, C.-Y.C. and Y.-N.S. All authors have read and agreed to the published version of the manuscript.
Figure 1.
An overview of the proposed architecture for detecting malposition. It consists of a ResNet-based backbone, coarse-to-fine attention (CTFA), an FPN-based neck, an FCOS-based detection head, and a segmentation head. The legend at the bottom explains the operations used in the diagram.
Figure 2.
An illustration of coarse-to-fine attention (CTFA). CTFA consists of a global-modelling attention (GA) module and a scale attention (SA) module. GA captures long-range relationships, and SA reweights features using local relationships.
Figure 3.
An illustration of global-modelling attention (GA). GA models long-range relationships through two branches: the upper branch captures long-range context information and the lower branch captures local context information. The two are then integrated by a series of operations.
Figure 4.
An illustration of scale attention (SA). SA addresses the shortcomings of the convolutional block attention module (CBAM) by using adaptive channel pooling and a squeeze-and-excitation (SE) block.
Figure 5.
An illustration of the post-processing step.
Figure 6.
Ground Truth. (a) Original ground truth. (b) Pre-processed ground truth.
Figure 7.
Ensuring that at most one ETT tip/Carina remains. (a) Without post-processing. (b) With post-processing.
Figure 8.
Refining the feature point of the ETT tip/Carina using the bounding box of the ETT/bifurcation. (a) Without post-processing. (b) With post-processing.
Table 1.
The performance in ETT–Carina distance error.
Test Folder | Acc. (%) | Mean (mm) | Std. (mm) |
---|---|---|---|
Folder 1 | 90.37 | 5.130 | 5.609 |
Folder 2 | 87.70 | 5.969 | 8.325 |
Folder 3 | 88.24 | 5.256 | 5.491 |
Folder 4 | 86.63 | 5.437 | 6.663 |
Folder 5 | 91.18 | 4.874 | 5.111 |
Average | 88.82 | 5.333 | 6.240 |
External val. | 90.67 | 5.015 | 5.147 |
Table 2.
The distance error distribution in ETT–Carina.
Test Folder | ≤5 mm (%) | ≤10 mm (%) | ≤15 mm (%) | ≤20 mm (%) |
---|---|---|---|---|
Folder 1 | 62.57 | 85.29 | 93.58 | 96.79 |
Folder 2 | 63.10 | 84.22 | 92.25 | 95.72 |
Folder 3 | 63.90 | 83.96 | 92.25 | 95.45 |
Folder 4 | 63.90 | 87.17 | 92.78 | 97.06 |
Folder 5 | 64.44 | 88.50 | 93.85 | 97.06 |
Average | 63.58 | 85.83 | 92.94 | 96.42 |
External val. | 66.00 | 84.00 | 92.67 | 97.33 |
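The mean, standard deviation, and threshold columns in Tables 1 and 2 are all summaries of the same list of per-image distance errors. The following is a minimal sketch of how such a summary could be computed. The function name `summarize_errors`, the sample values, and the choice of the population standard deviation are assumptions for illustration, not details taken from the paper.

```python
from statistics import mean, pstdev


def summarize_errors(errors_mm, thresholds=(5, 10, 15, 20)):
    """Summarize a list of absolute distance errors given in millimetres.

    Returns the mean, the population standard deviation (an assumption;
    the paper does not say which estimator it uses), and the percentage
    of cases whose error is at or below each threshold.
    """
    n = len(errors_mm)
    within = {
        t: 100.0 * sum(e <= t for e in errors_mm) / n
        for t in thresholds
    }
    return {"mean": mean(errors_mm), "std": pstdev(errors_mm), "within": within}


# Hypothetical errors for five images, not data from the study.
sample_errors = [1.0, 4.0, 6.0, 12.0, 22.0]
summary = summarize_errors(sample_errors)
print(summary["mean"])    # 9.0
print(summary["within"])  # {5: 40.0, 10: 60.0, 15: 80.0, 20: 80.0}
```

Because the thresholds are cumulative, each column in a distribution table of this kind can only stay equal or grow from left to right.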
Table 3.
The confusion matrix of diagnosis.
Predicted \ GT | Suitable | Unsuitable |
---|---|---|
Suitable | 1350 | 126 |
Unsuitable | 66 | 311 |
Undetection | 12 | 5 |
Table 4.
The confusion matrix of diagnosis (external val.).
Predicted \ GT | Suitable | Unsuitable |
---|---|---|
Suitable | 110 | 8 |
Unsuitable | 5 | 26 |
Undetection | 1 | 0 |
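The confusion matrices in Tables 3 and 4 can be related to the accuracy figures in Table 1 with a short calculation. The sketch below assumes that accuracy is the share of correctly diagnosed images among all images, with undetected cases counted as errors. Under that assumption the counts reproduce the reported average (88.82%) and external-validation (90.67%) accuracies. The function name and the dictionary layout are illustrative only.

```python
def diagnosis_accuracy(matrix):
    """Accuracy (%) from a confusion matrix.

    `matrix` maps each predicted label to a dict of ground-truth counts.
    A prediction is correct when the predicted label equals the
    ground-truth label; rows such as "Undetection" have no matching
    ground-truth label, so they only add to the total.
    """
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(row.get(pred, 0) for pred, row in matrix.items())
    return 100.0 * correct / total


# Counts copied from Table 3 (cross-validation) and Table 4 (external val.).
internal = {
    "Suitable": {"Suitable": 1350, "Unsuitable": 126},
    "Unsuitable": {"Suitable": 66, "Unsuitable": 311},
    "Undetection": {"Suitable": 12, "Unsuitable": 5},
}
external = {
    "Suitable": {"Suitable": 110, "Unsuitable": 8},
    "Unsuitable": {"Suitable": 5, "Unsuitable": 26},
    "Undetection": {"Suitable": 1, "Unsuitable": 0},
}

print(round(diagnosis_accuracy(internal), 2))  # 88.82
print(round(diagnosis_accuracy(external), 2))  # 90.67
```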
Table 5.
The performance in recall and precision.
Test Folder | ETT Tip Recall (%) | ETT Tip Precision (%) | Carina Recall (%) | Carina Precision (%) |
---|---|---|---|---|
Folder 1 | 90.64 | 91.37 | 94.65 | 94.91 |
Folder 2 | 89.30 | 89.54 | 93.58 | 93.58 |
Folder 3 | 90.91 | 92.14 | 92.25 | 92.49 |
Folder 4 | 91.18 | 91.42 | 94.92 | 95.17 |
Folder 5 | 92.78 | 93.53 | 94.12 | 94.37 |
Average | 90.96 | 91.60 | 93.90 | 94.10 |
External val. | 92.67 | 93.29 | 88.00 | 88.59 |
Table 6.
The performance in object error.
Test Folder | ETT Tip Mean (mm) | ETT Tip Std. (mm) | Carina Mean (mm) | Carina Std. (mm) |
---|---|---|---|---|
Folder 1 | 4.415 | 5.281 | 3.952 | 3.345 |
Folder 2 | 4.858 | 7.869 | 4.236 | 3.663 |
Folder 3 | 3.974 | 4.405 | 4.322 | 3.947 |
Folder 4 | 4.584 | 6.273 | 3.895 | 3.527 |
Folder 5 | 3.690 | 3.800 | 4.185 | 3.793 |
Average | 4.304 | 5.526 | 4.118 | 3.655 |
External val. | 3.733 | 4.613 | 4.688 | 4.043 |
Table 7.
The object error distribution in ETT tip.
Test Folder | ≤5 mm (%) | ≤10 mm (%) | ≤15 mm (%) | ≤20 mm (%) |
---|---|---|---|---|
Folder 1 | 75.94 | 90.64 | 94.39 | 97.06 |
Folder 2 | 75.40 | 89.30 | 94.65 | 96.79 |
Folder 3 | 78.61 | 90.91 | 95.19 | 97.06 |
Folder 4 | 73.26 | 91.18 | 94.92 | 97.86 |
Folder 5 | 81.02 | 92.78 | 97.06 | 97.59 |
Average | 76.85 | 90.96 | 95.24 | 97.27 |
External val. | 83.33 | 92.67 | 94.67 | 96.67 |
Table 8.
The object error distribution in Carina.
Test Folder | ≤5 mm (%) | ≤10 mm (%) | ≤15 mm (%) | ≤20 mm (%) |
---|---|---|---|---|
Folder 1 | 74.60 | 94.65 | 98.13 | 99.20 |
Folder 2 | 74.06 | 93.58 | 97.59 | 99.20 |
Folder 3 | 73.53 | 92.25 | 96.52 | 98.40 |
Folder 4 | 78.34 | 94.92 | 97.86 | 98.93 |
Folder 5 | 74.06 | 94.12 | 98.13 | 98.40 |
Average | 74.92 | 93.90 | 97.65 | 98.83 |
External val. | 68.67 | 88.00 | 96.67 | 98.00 |
Table 9.
The comparison results of accuracy and ETT–Carina distance error.
Method | Malposition Accuracy (%) | ETT–Carina Distance Error Mean (mm) | ETT–Carina Distance Error Std. (mm) |
---|---|---|---|
SOTA average [11] | 88.11 | 5.543 | 6.310 |
Ours average | 88.82 (+0.81%) | 5.333 (−3.79%) | 6.240 (−1.11%) |
SOTA external val. [11] | 87.33 | 5.668 | 6.651 |
Ours external val. | 90.67 (+3.82%) | 5.015 (−11.52%) | 5.147 (−22.61%) |
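The percentages in parentheses in the comparison tables are consistent with a relative change with respect to the SOTA baseline, that is, (ours − baseline) / baseline × 100. The sketch below checks this reading against the values in Table 9. The function name `relative_change` is an illustrative choice and the formula is inferred from the reported numbers rather than stated in the text.

```python
def relative_change(ours, baseline):
    """Relative change of `ours` with respect to `baseline`, in percent."""
    return 100.0 * (ours - baseline) / baseline


# Values copied from Table 9 (average over the five test folders).
print(round(relative_change(88.82, 88.11), 2))  # 0.81   (accuracy)
print(round(relative_change(5.333, 5.543), 2))  # -3.79  (mean error)
print(round(relative_change(6.240, 6.310), 2))  # -1.11  (error std.)

# Values copied from Table 9 (external validation).
print(round(relative_change(90.67, 87.33), 2))  # 3.82
print(round(relative_change(5.015, 5.668), 2))  # -11.52
print(round(relative_change(5.147, 6.651), 2))  # -22.61
```

For accuracy, recall, and precision a positive change is an improvement, whereas for the error mean and standard deviation a negative change is the improvement.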
Table 10.
The comparison results of error distribution on the ETT–Carina distance.
Method | ETT–Carina Error ≤5 mm (%) | ETT–Carina Error ≤10 mm (%) | ETT–Carina Error ≤15 mm (%) | ETT–Carina Error ≤20 mm (%) |
---|---|---|---|---|
SOTA average [11] | 60.37 | 84.20 | 92.78 | 95.39 |
Ours average | 63.58 (+5.32%) | 85.83 (+1.94%) | 92.94 (+0.17%) | 96.42 (+1.08%) |
SOTA external val. [11] | 64.00 | 82.00 | 90.67 | 94.67 |
Ours external val. | 66.00 (+3.13%) | 84.00 (+2.44%) | 92.67 (+2.21%) | 97.33 (+2.81%) |
Table 11.
The comparison results of recall, precision, and object error on the ETT tip.
Method | ETT Tip Recall (%) | ETT Tip Precision (%) | ETT Tip Error Mean (mm) | ETT Tip Error Std. (mm) |
---|---|---|---|---|
SOTA average [11] | 93.31 | 93.49 | 4.122 | 4.402 |
Ours average | 90.96 (−2.52%) | 91.60 (−2.02%) | 4.304 (+4.42%) | 5.526 (+25.53%) |
SOTA external val. [11] | 90.27 | 90.27 | 4.286 | 5.943 |
Ours external val. | 92.67 (+2.66%) | 93.29 (+3.35%) | 3.733 (−12.90%) | 4.613 (−22.38%) |
Table 12.
The comparison results of error distribution on the ETT tip.
Method | ETT Tip Error ≤5 mm (%) | ETT Tip Error ≤10 mm (%) | ETT Tip Error ≤15 mm (%) | ETT Tip Error ≤20 mm (%) |
---|---|---|---|---|
SOTA average [11] | 75.08 | 93.31 | 96.36 | 98.21 |
Ours average | 76.85 (+2.36%) | 90.96 (−2.52%) | 95.24 (−1.16%) | 97.29 (−0.94%) |
SOTA external val. [11] | 79.33 | 90.27 | 95.33 | 96.97 |
Ours external val. | 83.33 (+5.04%) | 92.67 (+2.66%) | 94.67 (−0.69%) | 96.67 (−0.31%) |
Table 13.
The comparison results of recall, precision, and object error on the Carina.
Method | Carina Recall (%) | Carina Precision (%) | Carina Error Mean (mm) | Carina Error Std. (mm) |
---|---|---|---|---|
SOTA average [11] | 94.70 | 95.23 | 4.775 | 5.342 |
Ours average | 93.90 (−0.84%) | 94.10 (−1.19%) | 4.118 (−13.76%) | 3.655 (−31.58%) |
SOTA external val. [11] | 91.64 | 91.96 | 4.567 | 4.513 |
Ours external val. | 88.00 (−3.97%) | 88.59 (−3.66%) | 4.688 (+2.65%) | 4.043 (−10.41%) |
Table 14.
The comparison results of error distribution on the Carina.
Method | Carina Error ≤5 mm (%) | Carina Error ≤10 mm (%) | Carina Error ≤15 mm (%) | Carina Error ≤20 mm (%) |
---|---|---|---|---|
SOTA average [11] | 68.84 | 94.70 | 95.55 | 97.12 |
Ours average | 74.92 (+8.83%) | 93.90 (−0.84%) | 97.65 (+2.20%) | 98.83 (+1.76%) |
SOTA external val. [11] | 73.33 | 91.64 | 95.33 | 96.54 |
Ours external val. | 68.67 (−6.35%) | 88.00 (−3.97%) | 96.67 (+1.41%) | 98.00 (+1.51%) |
Table 15.
The effect of softmax in GA.
Method | Malposition Accuracy (%) | ETT–Carina Mean Err. (mm) | ETT–Carina Err. Std. (mm) | ETT Tip Mean Err. (mm) | ETT Tip Err. Std. (mm) | Carina Mean Err. (mm) | Carina Err. Std. (mm) |
---|---|---|---|---|---|---|---|
w/softmax | 90.11 | 5.209 | 6.628 | 3.968 | 5.800 | 4.203 | 4.097 |
w/o softmax | 91.18 | 4.911 | 5.114 | 3.689 | 3.802 | 4.238 | 3.862 |
Table 16.
The effect of the channel (c) and kernel (k) settings and of the SE block in SA.
Method | Malposition Accuracy (%) | ETT–Carina Mean Err. (mm) | ETT–Carina Err. Std. (mm) | ETT Tip Mean Err. (mm) | ETT Tip Err. Std. (mm) | Carina Mean Err. (mm) | Carina Err. Std. (mm) |
---|---|---|---|---|---|---|---|
SA (c1 + k7) | 83.69 | 4.904 | 4.813 | 3.998 | 3.625 | 4.386 | 3.750 |
SA (c1 + k1) | 85.83 | 5.648 | 7.628 | 4.911 | 8.605 | 4.185 | 3.674 |
SA (c4 + k1) | 85.83 | 5.182 | 6.245 | 4.188 | 4.067 | 4.611 | 5.759 |
SA (c8 + k1) | 87.70 | 5.067 | 5.248 | 4.273 | 4.418 | 4.305 | 4.016 |
SA (c8 + k7) | 83.69 | 4.644 | 4.401 | 4.007 | 3.615 | 4.028 | 3.372 |
SA (c16 + k1) | 85.56 | 4.883 | 4.778 | 3.985 | 3.696 | 4.351 | 3.969 |
SA (w/o SE) | 86.36 | 5.491 | 9.697 | 4.619 | 11.997 | 4.391 | 3.956 |
Table 17.
The comparison results of attention modules.
Method | Malposition Accuracy (%) | ETT–Carina Mean Err. (mm) | ETT–Carina Err. Std. (mm) | ETT Tip Mean Err. (mm) | ETT Tip Err. Std. (mm) | Carina Mean Err. (mm) | Carina Err. Std. (mm) |
---|---|---|---|---|---|---|---|
FCOS [13] | 86.10 | 5.335 | 7.831 | 4.254 | 5.427 | 4.659 | 7.497 |
FCOS + SE [36] | 85.03 | 5.424 | 5.854 | 4.284 | 4.156 | 4.543 | 4.943 |
FCOS + CSPnonlocal [45] | 86.10 | 5.404 | 5.817 | 3.980 | 3.708 | 4.332 | 4.416 |
FCOS + nonlocal [33] | 86.10 | 5.422 | 6.139 | 4.521 | 10.059 | 4.411 | 4.423 |
FCOS + CBAM [15] | 86.10 | 5.303 | 5.654 | 4.381 | 4.870 | 4.380 | 4.260 |
FCOS + CCAM [14] | 86.90 | 4.632 | 4.491 | 4.025 | 3.641 | 4.035 | 3.517 |
FCOS + SA | 87.70 | 5.067 | 5.248 | 4.273 | 4.418 | 4.305 | 4.016 |
Table 18.
The comparison results of attention modules in parameters and GFLOPs.
Method | Parameters (M) | GFLOPs |
---|---|---|
FCOS [13] | 32.118 | 19.764 |
FCOS + SE [36] | 32.126 (+0.02%) | 19.764 (+0%) |
FCOS + CSPnonlocal [45] | 32.284 (+0.52%) | 19.782 (+0.09%) |
FCOS + nonlocal [33] | 33.302 (+3.69%) | 19.882 (+0.60%) |
FCOS + CBAM [15] | 32.127 (+0.03%) | 19.764 (+0%) |
FCOS + CCAM [14] | 34.154 (+6.34%) | 19.964 (+1.01%) |
FCOS + SA | 32.253 (+0.42%) | 19.778 (+0.07%) |
Table 19.
The results of different GA and SA fusion methods.
Method | Malposition Accuracy (%) | ETT–Carina Mean Err. (mm) | ETT–Carina Err. Std. (mm) | ETT Tip Mean Err. (mm) | ETT Tip Err. Std. (mm) | Carina Mean Err. (mm) | Carina Err. Std. (mm) |
---|---|---|---|---|---|---|---|
FCOS | 86.10 | 5.335 | 7.831 | 4.254 | 5.427 | 4.659 | 7.497 |
FCOS + SA + GA | 83.96 | 5.225 | 5.306 | 4.304 | 4.376 | 4.425 | 4.032 |
FCOS + GA || SA | 87.17 | 5.492 | 6.583 | 4.956 | 9.549 | 4.164 | 4.158 |
FCOS + GA + SA | 87.97 | 4.868 | 4.953 | 4.143 | 4.157 | 4.016 | 3.350 |
Table 20.
The effect of fusing global modelling attention and scale attention.
Method | Malposition Accuracy (%) | ETT–Carina Mean Err. (mm) | ETT–Carina Err. Std. (mm) | ETT Tip Mean Err. (mm) | ETT Tip Err. Std. (mm) | Carina Mean Err. (mm) | Carina Err. Std. (mm) |
---|---|---|---|---|---|---|---|
FCOS | 86.10 | 5.335 | 7.831 | 4.254 | 5.427 | 4.659 | 7.497 |
FCOS + nonlocal*2 | OOM | OOM | OOM | OOM | OOM | OOM | OOM |
FCOS + CSPnonlocal*2 | 85.56 | 5.800 | 8.991 | 4.391 | 5.543 | 4.703 | 7.512 |
FCOS + CCAM*2 | 86.10 | 4.855 | 4.988 | 4.020 | 4.037 | 4.301 | 3.835 |
FCOS + SA*2 | 87.17 | 5.643 | 6.820 | 4.432 | 5.021 | 4.422 | 5.387 |
FCOS + CSPnonlocal + SA | 86.90 | 5.727 | 6.518 | 4.646 | 5.267 | 4.685 | 4.732 |
FCOS + CCAM + SA | 87.97 | 4.868 | 4.953 | 4.143 | 4.157 | 4.016 | 3.350 |
Table 21.
The results of adding a mask branch to FCOS.
Method | Malposition Accuracy (%) | ETT–Carina Mean Err. (mm) | ETT–Carina Err. Std. (mm) | ETT Tip Mean Err. (mm) | ETT Tip Err. Std. (mm) | Carina Mean Err. (mm) | Carina Err. Std. (mm) |
---|---|---|---|---|---|---|---|
FCOS [13] | 86.10 | 5.335 | 7.831 | 4.254 | 5.427 | 4.659 | 7.497 |
CTFA | 87.97 | 4.868 | 4.953 | 4.143 | 4.157 | 4.016 | 3.350 |
Seg (All) | 87.97 | 4.909 | 5.179 | 3.939 | 4.468 | 4.043 | 3.121 |
Seg (ETT) | 89.04 | 5.486 | 7.682 | 4.398 | 6.754 | 4.521 | 4.244 |
Seg (ETT + Carina) | 90.11 | 5.334 | 6.752 | 4.088 | 5.989 | 4.329 | 4.253 |
Seg (ETT + Carina) + Fusion | 91.18 | 4.911 | 5.114 | 3.689 | 3.802 | 4.238 | 3.862 |
Table 22.
The results of the post-processing algorithm with and without mask prediction.
Method | Malposition Accuracy (%) | ETT–Carina Mean Err. (mm) | ETT–Carina Err. Std. (mm) | ETT Tip Mean Err. (mm) | ETT Tip Err. Std. (mm) | Carina Mean Err. (mm) | Carina Err. Std. (mm) |
---|---|---|---|---|---|---|---|
w/mask | 85.56 | 7.438 | 10.552 | 6.389 | 10.672 | 4.329 | 4.253 |
w/o mask | 90.11 | 5.334 | 6.752 | 4.088 | 5.989 | 4.329 | 4.253 |
Table 23.
The visualization results.