*5.3. Results*

We compare the human segmentation results of Mask R-CNN, PointRend, TridentNet, TensorMask, and CenterMask on the MADS dataset, which has been split according to the ratios described above. The results based on the box (*b*) are shown in Table 9. In Tables 9 and 10, CenterMask achieves the highest human segmentation results on both the box (*b*) and the mask (*m*) (*AP<sup>b</sup>* = 69.47%, *AP<sup>m</sup>* = 61.28%). Examples of human segmentation results on the MADS dataset are shown in Figure 10, and examples of false human segmentation results are shown in Figure 11.
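The box and mask APs reported in Tables 9 and 10 follow the standard COCO-style protocol: a prediction counts as a true positive when its IoU with an unmatched ground truth exceeds a threshold, and AP is the area under the resulting precision–recall curve. A minimal sketch of this matching, simplified to a single IoU threshold and boxes only (the function names and the rectangle-rule integration are illustrative, not taken from the released code):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, iou_thr=0.5):
    """AP at one IoU threshold for a single image.

    preds: list of (score, box); gts: list of ground-truth boxes.
    Greedy matching in descending score order, as in COCO evaluation;
    the interpolation of the PR curve is omitted for brevity.
    """
    preds = sorted(preds, key=lambda p: -p[0])
    matched = [False] * len(gts)
    tp = fp = 0
    pr_points = []
    for score, box in preds:
        # match the prediction to the best still-unmatched ground truth
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            iou = box_iou(box, gt)
            if iou > best_iou and not matched[j]:
                best_iou, best_j = iou, j
        if best_iou >= iou_thr:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1
        pr_points.append((tp / (tp + fp), tp / len(gts)))
    # accumulate precision over recall steps (rectangle rule)
    ap, prev_recall = 0.0, 0.0
    for prec, rec in pr_points:
        ap += prec * (rec - prev_recall)
        prev_recall = rec
    return ap

ap = average_precision([(0.9, (0, 0, 10, 10))], [(0, 0, 10, 10)])  # -> 1.0
```

The COCO metric used in the tables additionally averages this quantity over IoU thresholds from 0.5 to 0.95 and over all images.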

**Table 9.** The results (%) of human segmentation (box-*b*) on the MADS dataset evaluated on the CNNs.




The results based on the mask (*m*) are shown in Table 10.

**Table 10.** The results (%) of human segmentation (mask-*m*) on the MADS dataset evaluated on the CNNs.


Figure 11 shows that many pixels are wrongly segmented (background pixels segmented as human), while some human regions are missed or only partially segmented. These errors result from failures in the person-detection step. We have also shared the complete revised source code of the CNNs at (https://github.com/duonglong289/detectron2.git) and (https://github.com/duonglong289/centermask2.git), and the retrained models of the CNNs at (https://drive.google.com/drive/folders/16YHR8MxOn4l8fMdNCJZv56AcLKfP\_K4-?usp=sharing (accessed on 16 June 2021)). Although each image of the MADS dataset (captured by a stereo sensor) contains only one person, it still poses many challenges. Because of the low quality of the images obtained from the stereo sensor, the images are blurred and the lighting is imperfect, and because the activities performed (martial arts, dancing, and sports) are fast, the gestures of the legs and arms are blurred.
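The false segmentations discussed above come in two kinds: background pixels labeled as human (per-pixel false positives) and human pixels that are missed (false negatives); both lower the mask IoU and hence *AP<sup>m</sup>*. A minimal, self-contained sketch that quantifies both for a pair of binary masks (the function name and the toy masks are illustrative, not from the released code):

```python
def mask_error_breakdown(pred, gt):
    """pred, gt: equal-sized 2D lists of 0/1 values (1 = human pixel).

    Returns (false_pos, false_neg, iou): background pixels wrongly
    segmented as human, human pixels missed, and the mask IoU.
    """
    fp = fn = inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += p and g      # pixel is human in both masks
            union += p or g       # pixel is human in either mask
            fp += p and not g     # predicted human, actually background
            fn += g and not p     # actual human pixel that was missed
    return fp, fn, (inter / union if union else 0.0)

# Toy example: the predicted mask is shifted one column to the left.
gt = [[0, 1, 1], [0, 1, 1]]
pred = [[1, 1, 0], [1, 1, 0]]
fp, fn, iou = mask_error_breakdown(pred, gt)  # fp=2, fn=2, iou=1/3
```

A detection failure shifts or shrinks the region in which the mask head operates, so both error counts grow at once, which is consistent with the error pattern seen in Figure 11.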

**Figure 10.** Examples of human segmentation results on the MADS dataset by CNNs.

**Figure 11.** Examples of false human segmentation results on the MADS dataset.
