2. Literature
In the existing literature, a multitude of disturbances have been identified that can negatively impact the accuracy and precision of an RGB-D camera. In the context of this work, disturbances refer to all external factors present in the camera’s environment. The recording process itself, by contrast, is not analyzed here, as it is in Büker et al. [3]. Nor does the overview concentrate on work that examines the fundamental accuracy of the system, such as Kurillo et al. [4]. The focus is on environmental variables in particular. The following literature review presents the relevant scientific works (Table 1).
Initially, the reviewed works are categorized by research area into Medicine, Technology, and Sports. Furthermore, the recording devices are differentiated into Kinect v1 (Kv1), Kinect v2 (Kv2), Azure Kinect (AK), and Others. The type of external influencing factor is classified into six categories: Camera Position, Human Position, Light Influence, Technical Progress, Velocity, and Others. Camera and human position refer to the precise placement and orientation of the camera and the person, respectively. Light Influence summarizes all works investigating the impact of light on the performance of an RGB-D camera. Technical Progress indicates whether newer technologies were compared. Velocity covers the movement speeds analyzed in each work. Finally, the “Others” category encompasses aspects that do not fit into the aforementioned categories.
The works of Pfister et al. [
5], Ťupa et al. [
6], Xu et al. [
7], Albert et al. [
2], and Yeung et al. [
15] examine the gait of participants with the objective of identifying gait characteristics and investigating the suitability of RGB-D cameras for use in the medical field. Pfister et al. [
5] studied 20 participants walking and jogging on a treadmill at speeds of 1.34 m/s, 2.01 m/s, and 2.45 m/s. The best results were achieved at the lowest speed. Ťupa et al. [
6] examined the gait of 18 individuals with Parkinson’s disease, 18 healthy controls, and 15 students while they walked forward and backward. The highest accuracy was achieved at a speed of 0.73 m/s. As demonstrated by Xu et al. [
7], the measurement error depends on both the speed and the variable being measured. For example, the investigated speeds had no notable influence on the accuracy of step duration and step length (defined as the interval between two consecutive heel strikes of the same foot), but they had a notable impact on the accuracy of step width recognition. The researchers examined 20 test subjects on a treadmill at speeds of 0.85 m/s, 1.07 m/s, and 1.30 m/s, with the RGB-D cameras positioned in front of the subjects. In a further development of the work presented in Xu et al. [
7], Albert et al. [
2] evaluated the performance of Kv2 and AK. The experiment was conducted with five participants at treadmill speeds of 0.85 m/s, 1.07 m/s, and 1.30 m/s in a frontal position. The results demonstrated that the enhanced hardware and motion tracking algorithm of the AK resulted in markedly higher accuracy than the Kv2. However, both versions exhibited a greater measurement error for the extremities in comparison to the center of the body, due to the higher movement speeds and greater range of motion of the extremities. Yeung et al. [
15] conducted an analysis of the performance of treadmill gait pattern recognition from five camera positions (0°, 22.5°, 45°, 67°, 90°) and three walking speeds (0.83 m/s, 1.22 m/s, 1.64 m/s). The camera was positioned at a height of 1.2 m and a distance of 2.4 m from the subject. No significant difference in accuracy was observed at the speeds tested. In general, the AK demonstrated superior tracking accuracy in comparison to Kv2, with the exception of the 0° camera position.
The works of Otte et al. [
8], Abbondanza et al. [
9], Mobini et al. [
10], Shanyu et al. [
13], Wasenmüller and Stricker [
12], Faity et al. [
16], and Novo et al. [
17] investigate the performance of RGB-D cameras under different external conditions. In their study, Otte et al. [
8] recorded and analyzed the performance of 19 participants undertaking six different movement tasks. The findings indicated that head tracking exhibited superior accuracy compared to foot tracking. Furthermore, the accuracy of the system was found to vary according to the type of movement performed. Specifically, vertical movements of the head, shoulder, hand, and wrist exhibited higher accuracy. It was observed that an increase in the range of motion resulted in a corresponding increase in inaccuracy for the same movement. In order to ascertain the accuracy of measurements taken from a mannequin, Abbondanza et al. [
9] conducted experiments under a variety of conditions, including different positions, lighting (on/off), and the presence/absence of clothing and covered limbs. The findings indicated that the RGB-D camera should be positioned so that no body parts are obscured. Lighting conditions had no significant impact on the performance of the RGB-D camera, whereas a clothed mannequin yielded higher accuracy than an unclothed one. The greatest measurement inaccuracy was observed at an angle of 95° to 135° between the person and the camera. In their study, Mobini et al. [
10] examined the performance of ten vertical and ten diagonal hand movements at varying speeds. The movements were classified as either <3 m/s or >3 m/s for vertical movements and either <4 m/s or >4 m/s for diagonal movements. The optimal results for RGB-D cameras were observed for movements at the lowest speed. Furthermore, superior performance was noted at higher speeds for diagonal motion in comparison to vertical motion. In a separate study, Kawaguchi et al. [
11] recorded the body and hand movements of ten participants for eight seconds and compared the data with a reference system. The experiment demonstrated that the measurement error increased in conjunction with the velocity of movement, even at relatively low speeds when the hand was in close proximity to the body. In their study, Wasenmüller and Stricker [
12] conducted a comprehensive comparison of Kv1 and Kv2. Their findings revealed that while the accuracy of Kv2 remained constant with increasing camera distance, its precision declined. It was also demonstrated that surfaces with lower reflectivity, such as dark clothing, yield less reliable depth estimates than those with higher reflectivity. In the study conducted by Faity et al. [
16], 26 participants were instructed to perform seated reaching tasks while holding a dumbbell, in order to simulate the movement pattern of a partially paralyzed individual. The findings indicated that Kv2 exhibited moderate accuracy in the assessment of hand movement range, movement time, and average speed. However, the detection of elbow and shoulder range of motion, time to peak velocity, and distance traveled exhibited low to moderate reliability. In their study, Novo et al. [
17] examined the tracking of a mannequin in a two-armed lifting and lowering motion in both standing and sitting positions. The study demonstrated that the optimal positioning of Kv1 was directly in front of and above the mannequin at a distance of 1.30 m. Furthermore, an enhancement in accuracy was noted as the depth contrast between the mannequin and the surrounding environment increased. No significant difference was observed between the slow (0.2 m/s) and fast (0.4 m/s) motion trials. In their study, Shanyu et al. [
13] used the Kv1 to construct a three-dimensional body model, with the objective of identifying the optimal parameters for posture screening. The findings indicated that the accuracy of Kv1 diminished with increasing light intensity. Accordingly, an optimal light intensity of 80 lx was proposed. Additionally, the authors advised that the RGB-D camera should be positioned at chest height and at a distance of 1.30 m from the subject. The work of Tölgyessy et al. [
14] evaluates the skeleton tracking accuracy and precision of Kv1, Kv2, and AK across different distances and sensor modes. In their study, a human-sized figurine was placed on a robotic manipulator, and joint detection accuracy was measured at distances of up to 3.6 m. The results indicated that the AK outperformed both Kv2 and Kv1, especially in tracking body joints closer to the center of the body. Tracking of the extremities showed higher error rates due to their increased range of motion. Bertram et al. [
18] evaluated the accuracy and repeatability of the AK for clinical measurement of motor function in 30 healthy adults. The study compared the AK’s performance with that of its predecessor, the Kv2, and with a marker-based motion capture system (Qualisys). In summary, the AK showed the best performance in dynamic tasks and upper body movements, but weaknesses in stationary tasks and in tracking movements of the feet and ankles. Büker et al. [
19] investigated the impact of varying illumination conditions on the body tracking performance of the Azure Kinect DK, with the aim of ensuring the reliability of results in research involving human subjects. Two experiments were conducted: the first employed four distinct lighting conditions, while the second involved repeated measurements under similar lighting conditions. Consistent light conditions produced comparable outcomes, with a maximum difference of 0.06 mm. Varying light conditions, however, introduced inconsistencies, with an error range of up to 0.35 mm. Supplementary infrared light was identified as a particularly detrimental factor for tracking accuracy.
In conclusion, the majority of relevant literature is of a medical nature. Nevertheless, the existing literature provides valuable insights into factors that may subsequently influence performance. In general, the literature demonstrates that the performance of RGB-D cameras improves with technological advancement [
2,
12,
15]. The positioning of RGB-D cameras should be chosen to avoid obscuring or overlapping body parts [
9,
11]. It has also been shown that the angle between the subject and the camera can have a significant effect on the result, emphasizing the need to position the RGB-D camera directly in front of the subject [
9,
15,
17]. The distance between the subject and the RGB-D camera should not be too great, as accuracy decreases with distance [
12]. For Kv1, the optimal distance is 1.30 m [
12,
13]. The performance of RGB-D cameras is also affected by clothing, clothing color, and contrast with the environment [
9,
12,
17]. Abbondanza et al. [
9] found that Kv2 results were not affected by lighting conditions, whether on or off. However, Shanyu et al. [
13] found that the accuracy of Kv1 decreased with increasing light intensity and recommended an optimal light intensity of 80 lx. Regarding the speed-related factor, the scientific community has reached different conclusions. While Pfister et al. [
5], Mobini et al. [
10], Kawaguchi et al. [
11], and Albert et al. [
2] found the best performance of RGB-D cameras at the lowest tested speed, Xu et al. [
7], Yeung et al. [
15], and Novo et al. [
17] found no significant difference at the tested speeds. The different results may be due to the choice of measurement parameter. Otte et al. [
8] and Albert et al. [
2] showed that the measurement error is higher for the lower and upper extremities. This is due to the greater range of motion and faster movement of the extremities compared to the rest of the body. Mobini et al. [
10] also showed that at higher speeds the accuracy of Kv1 is higher for diagonal hand movements than for vertical hand movements. However, Otte et al. [
8] found that Kv2 had the highest accuracy for vertical movements.
5. Results
The results show that joint (15) (right hand) has the lowest variance in all the test series carried out. If the linear guide is aligned in the X direction, the Y and Z positions should ideally remain stable, as the observed joints are firmly connected to the guide: movement is only possible in the X direction, while the other axes should show no deviation from their initial values. The same principle applies if the guide is aligned along the Y or Z axis; movement should take place only along the aligned axis, while the values for the other axes remain constant. Based on the average of the
s and the corresponding standard deviations (SD) for the axes that should remain constant, the average error is smallest when the movement is in the X direction, indicating better detection. Conversely, the average error is highest for movements in the Z direction, which indicates lower detection accuracy. This trend is visually confirmed by the box plots, which clearly show the differences in accuracy between the directions of movement. These results are also supported by the subsequent analyses with different parameters. Overall, joint (15) exhibits the highest precision and accuracy of the joints tested. The highest inaccuracy is found at joint (16) (right fingertip) (see
Figure 4).
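The off-axis check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the error is taken here as the absolute deviation from the first recorded sample, positions are (x, y, z) tuples in mm, and the name `off_axis_deviation` is hypothetical rather than taken from the study.

```python
from statistics import mean, stdev

AXES = ("x", "y", "z")


def off_axis_deviation(samples, moving_axis):
    """Return {axis: (mean, sd)} of the absolute deviation from the first
    sample, for the two axes that are not driven by the linear guide.

    samples: list of (x, y, z) joint positions in mm, at least two entries.
    moving_axis: "x", "y" or "z", the axis the guide is aligned with.
    """
    if moving_axis not in AXES:
        raise ValueError(f"unknown axis: {moving_axis!r}")
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a deviation")

    result = {}
    for index, axis in enumerate(AXES):
        if axis == moving_axis:
            continue  # motion along this axis is intended, not an error
        # Assumption: the first sample serves as the reference ("initial") value.
        initial = samples[0][index]
        deviations = [abs(point[index] - initial) for point in samples]
        result[axis] = (mean(deviations), stdev(deviations))
    return result


# Made-up positions (mm) for a guide aligned with the x axis.
demo = [(0.0, 100.0, 900.0), (50.0, 101.0, 898.0), (100.0, 99.0, 903.0)]
print(off_axis_deviation(demo, "x"))
```

For a guide aligned with the x axis, only the y and z entries are returned; a larger mean on one of them would correspond to lower detection accuracy on that axis.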
In the following, the influence of the independent variables is evaluated using joint (15). For this purpose, boxplots are created to visualize the distribution of the values of
_1 and
_2 in relation to the hand movement direction as a function of light intensity, hand movement range and hand movement velocity. The present results show that the highest measured accuracy and precision of 2.64 mm ± 0.84 mm is achieved at a light intensity of 500 lx. Furthermore, it can be seen that the results for the light intensities of 200 lx and 300 lx are almost identical overall in terms of accuracy. However, the precision varies to different degrees depending on the direction of hand movement, as can be seen by comparing
Figure 5, Figure 6, and Table 5.
Table 5 summarizes the results of the average values of the
s (mean) and standard deviation (sd) for
_1 and
_2 as a function of the hand movement direction and light intensity.
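A summary of this kind (mean and standard deviation per movement direction and condition level) could be produced along the following lines. This is a minimal sketch, assuming each trial has already been reduced to a single error value in mm; the function name `summarize_trials` and the input layout are hypothetical, and the numbers in the usage note are made up.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_trials(trials):
    """Group per-trial errors by (direction, level) and return
    {(direction, level): (mean, sd)}.

    trials: iterable of (direction, level, error_mm) tuples, e.g. the
    movement direction "x", the light intensity in lx, and the error in mm.
    """
    groups = defaultdict(list)
    for direction, level, error_mm in trials:
        groups[(direction, level)].append(error_mm)

    summary = {}
    for key in sorted(groups):
        values = groups[key]
        # The sample SD needs at least two trials; report 0.0 otherwise.
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[key] = (mean(values), sd)
    return summary
```

Calling `summarize_trials([("x", 200, 2.0), ("x", 200, 4.0), ("z", 500, 10.0), ("z", 500, 14.0)])` yields one (mean, sd) pair per direction and light level, which maps directly onto the rows and columns of such a table.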
The results for the hand movement range show that in most cases the average
s are highest at 500 mm. This means that the deviation of the values on the axes not in motion, which should remain constant, is greatest on average across the 10 trials for each direction and range. Conversely, the highest accuracy and precision, 2.24 mm ± 0.30 mm, is measured for a movement range of 100 mm. It can also be seen that the results for the x-axis direction of movement are the most precise. As previously stated, the greatest deviation is recorded in the z-axis movement direction, at 16.20 mm ± 0.40 mm for a movement range of 500 mm (see
Figure 7, Figure 8, and Table 6). The calculated average values and the corresponding standard deviations for
_1 and
_2 of the hand movement directions and hand movement ranges investigated are summarized in
Table 6.
With regard to hand movement velocity, there is no clear pattern of speed-dependent precision or accuracy. However, the results confirm the finding mentioned above that hand movements in the x-axis direction have the highest precision and accuracy, and those in the z-axis direction the lowest (see
Figure 9 and
Figure 10). The significantly lower scatter of the data for hand movements in the x- and y-axis directions is more apparent here. The average values and standard deviations for
_1 and
_2 as a function of hand movement direction and hand movement velocity are summarized in
Table 7.