*2.2. Skeleton-Based Representation*

Human movements can be effectively captured by the positional dynamics of each skeletal joint [12–15], by the relationships between joint pairs [4,16], or by a combination of the two [17–19]. In [8], a tool was developed for tracking the human skeleton (3D pose) in real time from a single depth image. Existing skeleton-based human action recognition methods can be broadly grouped into two categories: joint-based approaches and body part-based approaches. Joint-based approaches treat the human skeleton as a set of points, whereas body part-based approaches treat it as a connected set of rigid segments. In [14], human skeletons were represented using the 3D joint locations, and a temporal hierarchy of covariance descriptors was proposed to model joint trajectories. Lv and Nevatia [15] proposed using hidden Markov models (HMMs) to represent the positions of the joints. Devanne et al. [20] represented the evolution of the 3D joint positions as a motion trajectory; action recognition was then formulated as the problem of computing the similarity between the shapes of trajectories on a Riemannian manifold. Along similar lines, the works in [21,22] presented a Riemannian analysis of distance trajectories for real-time action recognition. In [23], the pairwise relative positions of the joints were used to represent the human skeleton, and the temporal evolution of this representation was modeled using a hierarchy of Fourier coefficients. Yang et al. [16] proposed an effective method that uses the relative joint positions, the temporal displacement of the joints, and the offset of the joints with respect to the initial frame.
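
To make the joint-based representation concrete, the following minimal sketch computes the three kinds of per-frame cues mentioned for [16]: pairwise relative joint positions within a frame, the displacement of each joint from the previous frame, and the offset of each joint from the initial frame. It is an illustration only, not the implementation of [16]: the function names `sub` and `frame_features`, the data layout (a sequence of frames, each a list of 3D joint coordinates), and the absence of any normalization or dimensionality reduction are all assumptions.

```python
from itertools import combinations
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) of one joint
Frame = List[Joint]                  # all joints of one skeleton frame


def sub(a: Joint, b: Joint) -> Joint:
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def frame_features(sequence: List[Frame], t: int) -> List[float]:
    """Hypothetical joint-based feature vector for frame t.

    Concatenates (1) pairwise joint differences within frame t,
    (2) joint displacements with respect to frame t - 1, and
    (3) joint offsets with respect to the initial frame.
    """
    current = sequence[t]
    # Assumption: the first frame has no predecessor, so its motion is zero.
    previous = sequence[t - 1] if t > 0 else sequence[0]
    initial = sequence[0]

    pairwise = [sub(current[i], current[j])
                for i, j in combinations(range(len(current)), 2)]
    motion = [sub(c, p) for c, p in zip(current, previous)]
    offset = [sub(c, i) for c, i in zip(current, initial)]

    # Flatten the three groups of 3D vectors into one feature vector.
    return [v for vec in pairwise + motion + offset for v in vec]
```

For a skeleton with J joints, this yields 3·J(J−1)/2 + 3J + 3J values per frame; a classifier or temporal model would then operate on the resulting sequence of feature vectors.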

The second category of skeleton-based approaches focuses on body parts. In [2], the human skeleton was represented as a point in the Lie group *SE*(3) × ... × *SE*(3) by explicitly modeling the 3D geometric relationships between the various body parts within a frame using rotations and translations. A human action was then modeled as a curve in this Lie group, and the temporal evolution was handled by dynamic time warping (DTW). In [24], by contrast, the human skeleton was hierarchically divided into smaller parts, each part was represented using bio-inspired features, and linear dynamical systems were used to model the temporal evolution of these parts. Generally speaking, joint-based methods are faster to compute, while body part-based methods achieve higher accuracy [25].
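
As a reminder of how DTW aligns two action sequences of different lengths, the sketch below implements the classic dynamic-programming recurrence on sequences of plain Euclidean feature vectors. It is a generic illustration under that assumption: the function name `dtw_distance` is hypothetical, and the sketch does not reproduce the Lie-group curve representation or the specific warping procedure of [2].

```python
import math
from typing import List, Sequence


def dtw_distance(a: List[Sequence[float]], b: List[Sequence[float]]) -> float:
    """Classic DTW cost between two sequences of feature vectors.

    cost[i][j] holds the minimal accumulated Euclidean distance for
    aligning the first i frames of `a` with the first j frames of `b`.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local frame-to-frame distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of `a` repeats the match with b[j-1]
                cost[i][j - 1],      # frame of `b` repeats the match with a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

Two executions of the same action at different speeds should produce a small DTW cost, because repeated or stretched frames are absorbed by the warping path rather than penalized frame by frame.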
