*3.1. Data Preprocessing*

Since the Weizmann database [30] is used, it is assumed that each video sequence is accompanied by a set of binary images, namely foreground masks extracted from the video frames. One sequence represents one action type and each image contains a single silhouette. Frames in which the object is occluded or lies too close to the edge of the video frame are removed. The direction of motion is checked and, if necessary, the frames are flipped horizontally so that in every sequence the object moves from left to right. Each silhouette is then replaced with its convex hull, which reduces the impact of artefacts (e.g., spurious pixels) introduced during background subtraction (see Figure 1 for examples). As indicated in [29], the use of convex hulls improves classification accuracy.

**Figure 1.** Sample silhouettes from the Weizmann database [30] and the corresponding convex hulls.
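
A minimal sketch of these preprocessing steps is given below. It assumes OpenCV-style binary masks (0 = background, 255 = silhouette); the `edge_margin` parameter is an illustrative assumption, and the sketch is not the exact implementation used in this work.

```python
import cv2
import numpy as np


def preprocess_sequence(masks, edge_margin=5):
    """Preprocess one sequence of binary silhouette masks (one action type).

    masks: list of 2-D uint8 arrays, 0 = background, 255 = silhouette.
    Returns the retained frames, flipped so that the actor moves from left
    to right, with every silhouette replaced by its filled convex hull.
    The edge_margin value is an illustrative assumption.
    """
    height, width = masks[0].shape

    # Remove frames where the silhouette lies too close to the frame edge
    # (partially visible or occluded actor).
    kept = []
    for m in masks:
        ys, xs = np.nonzero(m)
        if xs.size == 0:
            continue
        if xs.min() < edge_margin or xs.max() > width - 1 - edge_margin:
            continue
        kept.append(m)

    # Check the direction of motion via the horizontal centroid coordinate
    # and mirror the whole sequence if the actor moves right to left.
    cx = [np.nonzero(m)[1].mean() for m in kept]
    if cx[-1] < cx[0]:
        kept = [cv2.flip(m, 1) for m in kept]

    # Replace each silhouette with its filled convex hull to suppress
    # background-subtraction artefacts (spurious pixels, small holes).
    hulls = []
    for m in kept:
        contours, _ = cv2.findContours(m, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        points = np.vstack([c.reshape(-1, 2) for c in contours])
        hull = cv2.convexHull(points)
        filled = np.zeros_like(m)
        cv2.fillConvexPoly(filled, hull, 255)
        hulls.append(filled)
    return hulls
```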

Before the actual classification, the dataset is divided into two subsets based on the centroid locations in consecutive frames. This division reflects the characteristics of the actions: some are performed by a person standing in place (short trajectory), while the rest involve a person who changes location in every frame (long trajectory). Examples are given in Figure 2. This procedure can be regarded as a coarse classification. It influences the subsequent steps of the approach, which are performed separately within each subgroup; consequently, different features and parameters, better suited to the specific action types, can be selected for each subgroup.

**Figure 2.** Exemplary trajectories for ten different actions of one actor: actions performed in place are in the top row (bending, jumping-jack, jumping in place, one-hand waving and two-hand waving) and actions with a changing silhouette location are depicted in the bottom row (jumping forward, running, galloping sideways, skipping and walking). The centroid trajectory is displayed over a sample frame from the corresponding video sequence.
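
The coarse classification can be sketched as follows. The split criterion shown here (net horizontal centroid displacement relative to the frame width) and the `threshold` value are assumptions made for illustration; the paper only states that the division is based on the centroid locations in consecutive frames.

```python
import numpy as np


def centroid_trajectory(masks):
    """Return an (N, 2) array with the (x, y) silhouette centroid per frame."""
    return np.array([[np.nonzero(m)[1].mean(), np.nonzero(m)[0].mean()]
                     for m in masks])


def coarse_classify(masks, threshold=0.25):
    """Assign a sequence to the 'in place' or 'changing location' subgroup.

    The criterion (net horizontal centroid displacement as a fraction of the
    frame width) and the threshold value are illustrative assumptions.
    """
    traj = centroid_trajectory(masks)
    width = masks[0].shape[1]
    net_dx = abs(traj[-1, 0] - traj[0, 0])
    return "in_place" if net_dx < threshold * width else "changing_location"
```

Feature selection and classification are then carried out separately on the two resulting subgroups.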
