*4.1. Datasets and Settings*

#### 4.1.1. Datasets

In this section, we conduct our experiments on three publicly available video datasets for video-based person re-id: the iLIDS-VID dataset [30], the PRID 2011 dataset [79] and MARS [49], as shown in Figure 8.

The iLIDS-VID dataset consists of 600 image sequences of 300 people captured by two non-overlapping camera views. Each image sequence has a variable length, ranging from 23 to 192 frames, with an average of 73. The dataset is very challenging due to clothing similarities, cluttered backgrounds, occlusions, and viewpoint variations across camera views (Figure 8a).

The PRID 2011 dataset includes 400 image sequences of 200 people captured by two adjacent camera views. Each image sequence has a variable length, ranging from 5 to 675 frames, with an average of 100. In our experiments, only the sequence pairs with more than 21 frames are used, to satisfy the minimum sequence length required for extracting walking cycles. The main challenges of this dataset are lighting and viewpoint variations across camera views (Figure 8b).

The MARS dataset consists of 1261 identities captured by 2 to 6 cameras; the train and test sets contain 631 and 630 identities, respectively. Its 20,175 tracklets are obtained automatically by the DPM detector [80] and the GMMCP tracker [81], and 3248 of them are distractors caused by false detection or tracking. A large proportion of tracklets contain 25–50 frames, and most pedestrians have 5–20 tracklets. Besides the challenges mentioned above, the MARS dataset is more difficult due to the distractors and the imperfect automatically detected and tracked bounding boxes (Figure 8c).
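The 21-frame selection rule applied to PRID 2011 can be sketched as follows. This is a minimal illustration, not the authors' code; the data layout (a mapping from person ID to the frame counts of the camera-A and camera-B sequences) and all names and numbers in it are hypothetical.

```python
MIN_FRAMES = 21  # sequences must exceed this length for walking-cycle extraction

def select_pairs(pairs):
    """Keep only person IDs whose sequences in both views exceed MIN_FRAMES."""
    return [pid for pid, (len_a, len_b) in pairs.items()
            if len_a > MIN_FRAMES and len_b > MIN_FRAMES]

# Illustrative frame counts (PRID 2011 sequences range from 5 to 675 frames).
example = {
    "person_001": (5, 120),    # camera-A sequence too short: excluded
    "person_002": (100, 675),  # both views long enough: kept
    "person_003": (22, 30),    # both views exceed 21 frames: kept
}
print(select_pairs(example))  # → ['person_002', 'person_003']
```

Filtering on both views ensures every retained identity contributes a usable cross-camera sequence pair.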

**Figure 8.** Example pairs of image sequences of the same person in different camera views from the three datasets. (**a**,**b**) Each row shows images of one person from two different cameras, in the iLIDS-VID and PRID 2011 datasets, respectively; (**c**) shows images of one person from six different cameras in MARS.
