*4.8. System Reliability and Accuracy Evaluation*

A commonly accepted measure of reliability in clinical assessments is the intraclass correlation coefficient (ICC). Accordingly, the reliability of the system assessments with respect to the neurologists' assessments was evaluated by the coefficient ICCN12-SY (two-way random effects model with absolute agreement) [3]. The inter-rater agreement ICCN12 between the two neurologists was also evaluated and used as a baseline for comparison with the inter-rater agreement ICCN12-SY among the neurologists and the system, with the system treated as a third "virtual" neurologist.
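
As an illustration of the kind of computation involved, the following minimal sketch evaluates a two-way random effects ICC with absolute agreement from a subjects-by-raters score table. It assumes the single-rater form, often written ICC(2,1), which is not stated explicitly in the text; the function name `icc_absolute_agreement` and the example scores are hypothetical and are not taken from the study.

```python
def icc_absolute_agreement(scores):
    """Single-rater ICC, two-way random effects, absolute agreement.

    `scores` is a list of rows, one row per subject, each row holding
    one score per rater (assumed complete, with no missing values).
    """
    n = len(scores)       # number of subjects (rows)
    k = len(scores[0])    # number of raters (columns)

    grand = sum(v for row in scores for v in row) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores for five subjects (NOT data from the study).
# Columns: neurologist 1, neurologist 2, system.
scores_n12_sy = [
    [1, 1, 1],
    [2, 3, 2],
    [0, 0, 1],
    [3, 3, 3],
    [2, 2, 2],
]
scores_n12 = [row[:2] for row in scores_n12_sy]  # neurologists only

print("ICC N12   :", round(icc_absolute_agreement(scores_n12), 3))
print("ICC N12-SY:", round(icc_absolute_agreement(scores_n12_sy), 3))
```

Under these assumptions, adding the system as a third column and recomputing the coefficient mirrors the comparison between the baseline ICCN12 and ICCN12-SY: a value of ICCN12-SY close to ICCN12 would suggest that the system agrees with the neurologists about as well as they agree with each other.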

In the evaluation of ICCN12, the neurologists' scores for the LA, AC, and Po tasks and for the subscale PSPIGD were considered, while for ICCN12-SY both the neurologists' scores and the corresponding system scores were used. Concerning the reliability of remote video-based assessments, motor examination of video-recorded UPDRS tasks has already been shown to be a sufficiently accurate alternative to in-person examination [61]. In the machine learning context, it is more common to assess the reliability of classifiers by their accuracy. We therefore also evaluated this measure of system performance, considering the mean accuracy of each classifier both in discriminating PD from HC subjects (binary classification problem) and in classifying PD subjects into different severity classes (multi-class classification problem) [62].
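
The accuracy measure can be sketched in the same spirit. The snippet below assumes that the mean accuracy is an average over several evaluation runs (for example, cross-validation folds), which the text does not specify; the helper names `accuracy` and `mean_accuracy`, the labels, and the severity class encoding are all hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_accuracy(runs):
    """Average accuracy over a list of (y_true, y_pred) evaluation runs."""
    return sum(accuracy(t, p) for t, p in runs) / len(runs)


# Hypothetical binary problem: PD versus HC subjects (two runs).
binary_runs = [
    (["PD", "HC", "PD", "HC"], ["PD", "HC", "HC", "HC"]),
    (["PD", "PD", "HC", "HC"], ["PD", "PD", "HC", "HC"]),
]

# Hypothetical multi-class problem: PD severity classes coded 0 to 3.
severity_runs = [
    ([0, 1, 2, 3], [0, 1, 1, 3]),
    ([1, 2, 2, 0], [1, 2, 3, 0]),
]

print("Binary mean accuracy     :", mean_accuracy(binary_runs))
print("Multi-class mean accuracy:", mean_accuracy(severity_runs))
```

The same two functions serve both problems because accuracy depends only on whether each predicted label matches the true one, not on the number of classes.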
