*3.1. Problem*

A number of datasets for pedestrian detection have been proposed to date. However, as shown in Table 1, their scale is small compared to that of datasets used for general object detection. Minoguchi et al. proposed a weakly supervised learning method that uses existing pre-trained models and eliminates false positives by referring to bounding boxes and an SVM, and they used it to construct a labeled dataset called the Weakly Supervised Person Dataset (WSPD) [2], which far exceeds the scale of previous pedestrian detection datasets. To the best of our knowledge, the WSPD is the largest existing pedestrian dataset. Minoguchi et al. reported the detection performance of the pre-trained model on that dataset but did not examine the disparity in the miss rate across age attributes. Table 2 shows the attribute distribution of a sample of bounding boxes in the WSPD. We obtained this distribution by randomly selecting 5000 bounding boxes from the WSPD and classifying each by attribute. The "Noise" label indicates that there is no person in the bounding box, while the "Multiple" label indicates that the bounding box contains multiple people. The table shows that this pedestrian dataset is heavily imbalanced in the quantity of data per attribute; in particular, data for children are scarce. Therefore, it is necessary to check whether this bias in the quantity of data contributes to the disparity in detection performance.
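The sampling-and-tallying step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for Table 2: the function name `sample_attribute_distribution`, the attribute names, and the toy label counts are all hypothetical, and the sketch assumes each bounding box already carries one manually assigned attribute label.

```python
import random
from collections import Counter


def sample_attribute_distribution(labels, sample_size=5000, seed=0):
    """Randomly sample labeled boxes and tally them per attribute.

    `labels` holds one attribute string per bounding box (assumed to be
    assigned by hand beforehand). Returns (counts, shares).
    """
    rng = random.Random(seed)            # fixed seed for a reproducible sample
    k = min(sample_size, len(labels))    # cannot draw more boxes than exist
    sample = rng.sample(labels, k)       # sampling without replacement
    counts = Counter(sample)
    shares = {attr: n / k for attr, n in counts.items()}
    return counts, shares


# Hypothetical toy labels; these are NOT the WSPD statistics.
toy_labels = (["adult"] * 9000 + ["child"] * 300
              + ["noise"] * 500 + ["multiple"] * 200)

counts, shares = sample_attribute_distribution(toy_labels)
for attr, n in counts.most_common():
    print(f"{attr:10s} {n:5d} {shares[attr]:.1%}")
```

Reporting the share alongside the raw count makes an imbalance such as a small "child" class visible at a glance, and fixing the seed lets the same sample be drawn again for later checks.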

**Table 1.** Comparison of object detection and person detection datasets.


**Table 2.** Age attribute statistics for people in bounding boxes in 5000 randomly sampled images from the WSPD. The "Noise" label indicates that there is no person in the bounding box, whereas the "Multiple" label indicates that one bounding box contains multiple people. In this paper, images labeled "Multiple" are not considered.

