2.2.3. Statistical Analysis

The intraclass correlation coefficient (ICC) was used to assess the inter-rater reliability of the experts' subjective ratings [41]. Good consistency and agreement among the experts are prerequisites for treating their ratings as the gold standard against which the developed DTW-based algorithm is validated. ICC is a widely used reliability index, and it is generally interpreted as follows: ICC < 0.5, poor reliability; 0.5 < ICC < 0.75, moderate reliability; 0.75 < ICC < 0.9, good reliability; and ICC > 0.9, excellent reliability [42].
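As an illustration of how such a reliability index can be computed and interpreted, the following Python sketch derives an ICC from a ratings matrix and maps it onto the guideline above. It is not the procedure used in this study, which relied on SPSS. The text does not state which ICC form was used, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single rater). The function names `icc_2_1` and `interpret_icc` and the example ratings are hypothetical, and the assignment of boundary values (e.g., exactly 0.75) to a category is also an assumption.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per rated performance and one column per expert.
    The choice of this ICC form is an assumption made for illustration.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape  # n subjects (rows), k raters (columns)
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    return float((ms_rows - ms_err) / denom)


def interpret_icc(icc):
    """Map an ICC value onto the guideline categories.

    Assumption: a value exactly on a boundary goes to the higher category.
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"


# Hypothetical ratings: 5 performances (rows) scored by 3 experts (columns).
example_ratings = np.array([
    [7.0, 8.0, 7.5],
    [4.0, 5.0, 4.5],
    [9.0, 9.0, 8.5],
    [6.0, 6.5, 6.0],
    [3.0, 4.0, 3.5],
])
example_icc = icc_2_1(example_ratings)
print(f"ICC(2,1) = {example_icc:.3f} ({interpret_icc(example_icc)} reliability)")
```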

More importantly, the final performance scores from the developed algorithm were compared with the experts' ratings, which served as the gold standard. The Pearson correlation coefficient (r) between the algorithm's final performance scores and the experts' ratings was calculated to assess the strength of the linear relationship between the two evaluation methods. In addition, linear regression was used to calibrate the algorithm's performance scores so that the scores from the two evaluation methods would be consistent. The differences between the calibrated algorithm scores and the experts' ratings were then analyzed. The SPSS statistical package version 20 (IBM Corp., Armonk, NY, USA) was used for the statistical analysis.
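The comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the SPSS procedure used in the study. It assumes that calibration means regressing the experts' ratings on the algorithm scores and applying the fitted line to the algorithm scores; the direction of the regression is not specified in the text. The function names (`pearson_r`, `fit_calibration`, `calibrate`) and all score values are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two score vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def fit_calibration(algo_scores, expert_scores):
    """Least-squares line expert ~ slope * algo + intercept.

    Assumption: expert ratings are the dependent variable.
    """
    algo = np.asarray(algo_scores, dtype=float)
    expert = np.asarray(expert_scores, dtype=float)
    slope, intercept = np.polyfit(algo, expert, deg=1)
    return float(slope), float(intercept)


def calibrate(algo_scores, slope, intercept):
    """Map algorithm scores onto the experts' rating scale."""
    return slope * np.asarray(algo_scores, dtype=float) + intercept


# Hypothetical data: algorithm scores and mean expert ratings
# for the same six performances.
algo = np.array([52.0, 61.0, 70.0, 78.0, 85.0, 93.0])
expert = np.array([5.5, 6.0, 7.0, 7.5, 8.5, 9.0])

r = pearson_r(algo, expert)
slope, intercept = fit_calibration(algo, expert)
calibrated = calibrate(algo, slope, intercept)
differences = calibrated - expert  # per-performance disagreement

print(f"r = {r:.3f}")
print(f"calibrated = {slope:.4f} * algo + {intercept:.4f}")
print(f"mean absolute difference = {np.mean(np.abs(differences)):.3f}")
```

Because an ordinary least-squares fit with an intercept forces the mean difference to zero on the data used for fitting, the spread of the differences (for example, their standard deviation or mean absolute value) is the more informative summary in a sketch like this one.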
