*2.1. Depth-Based Representation*

In [8,9], descriptors originally designed for RGB data were generalized to describe shape geometry and build a depth-based representation. A limitation of the approach proposed in [8] is its sensitivity to viewpoint, because the sampling scheme depends on the view. Along similar lines, Oreifej and Liu [3] used a histogram of oriented 4D normals (HON4D) to capture the distribution of surface normal orientations in 4D space. Yang et al. [10] proposed concatenating the normal vectors within a spatio-temporal depth sub-volume to capture more informative geometric cues. Lu et al. [11] used discriminative local patterns to represent complex human activities involving human–object interactions without relying on holistic human postures, and also studied the relationship between pixels sampled in the actor and background regions. Common limitations of depth-based approaches are their view sensitivity and their computational cost, since their descriptors are heavier than those of skeleton-based approaches.
