**5. Dataset**

This paper uses the Penn-Fudan Database for pedestrian detection and segmentation (see Figure 3), which is available online (https://www.kaggle.com/jiweiliu/pennfudanpe, accessed on 1 February 2021). It contains 170 images with 345 pedestrian objects, and it is compatible with both the COCO [55–57] and Pascal VOC [54] formats. We used the dataset in COCO format during our research.

**Figure 3.** Pedestrian dataset. Developed from [30].

The database consists of three subfolders, namely Annotation, PedMasks, and PNGImages, where the annotation files are in text format, and both PNGImages and PedMasks are in PNG format. Before applying a Mask R-CNN model, the dataset is pre-processed: each image is normalized and resized to a uniform size, as shown in Tables 2 and 3 below, where the normalization step transforms the pixel values of the images into the range of 0 to 1.
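As an illustration, the normalization and resizing steps described above could be sketched as follows. This is a minimal example using Pillow and NumPy; the target size of 224 × 224 and the function name `preprocess` are assumptions for illustration, not values taken from the paper.

```python
import numpy as np
from PIL import Image

def preprocess(image: Image.Image, size=(224, 224)) -> np.ndarray:
    """Resize an image to a uniform size and normalize pixels to [0, 1]."""
    resized = image.resize(size)                 # uniform spatial size
    array = np.asarray(resized, dtype=np.float32)
    return array / 255.0                         # map 0..255 -> 0..1

# Synthetic RGB image standing in for a Penn-Fudan photograph.
img = Image.fromarray(np.random.randint(0, 256, (300, 400, 3), dtype=np.uint8))
out = preprocess(img)
print(out.shape)  # → (224, 224, 3)
```

Normalizing to [0, 1] before training keeps the gradient magnitudes comparable across channels, which is the usual motivation for this step.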


**Table 2.** The data shown in tables (**a**) and (**b**) are used to modify the images before importing them into the models. (**a**) Normalization of the dataset before importing into the model; (**b**) resizing of all the images in the dataset. Developed from [58].

**Table 3.** Mask R-CNN with different backbones trained on the pedestrian dataset for 10 epochs. In Mask R-CNN, one extra loss, the mask loss, is added to the losses of the Faster R-CNN model, where *λc* = classifier loss, *λb* = box-regression loss, *λm* = mask loss, *λ*0 = objectness loss, *λr* = RPN box loss, and *λT* = overall loss (the minimum values of the columns are denoted by \*).


Table 3 below presents the results, where the overall loss *λT* [24] is the sum of all losses.

$$
\lambda_T = \lambda_c + \lambda_b + \lambda_m + \lambda_0 + \lambda_r \tag{1}
$$

Equation (1): Total loss (*λT*) is equal to the sum of all losses.
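In code, the summation of Equation (1) corresponds to reducing the per-term loss dictionary produced by a Mask R-CNN training step to a single scalar. The key names below follow torchvision's convention for its Mask R-CNN implementation, and the numeric values are illustrative placeholders only:

```python
# Loss dictionary as produced by one Mask R-CNN training step; the values are
# placeholders, and the key names follow torchvision's convention.
losses = {
    "loss_classifier": 0.31,   # λc: classifier loss
    "loss_box_reg": 0.12,      # λb: box-regression loss
    "loss_mask": 0.25,         # λm: mask loss
    "loss_objectness": 0.04,   # λ0: objectness loss
    "loss_rpn_box_reg": 0.03,  # λr: RPN box loss
}

total_loss = sum(losses.values())  # λT = λc + λb + λm + λ0 + λr
print(round(total_loss, 2))        # → 0.75
```

During training, this single scalar *λT* is the quantity backpropagated through the network.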

#### **6. Inverse Gamma Correction**

Changes in the luminance characteristics can reduce the visibility of an object and decrease the detection capability of the system [59]. However, the effect of the lighting conditions depends on many other factors, such as the distance of the given object. Beyond this, the lighting contrast between the object and the background can also significantly influence detection efficiency. Accordingly, the system can capture darker or brighter images depending on these factors.

Many different algorithms can be used to adjust the contrast and increase or decrease the brightness of an image. For instance, Histogram Equalization (HE) [60] or Brightness-Preserving Bi-Histogram Equalization (BBHE) [61] can be applied to modify the lighting-related characteristics of the investigated images.

This paper uses the inverse gamma correction method to modify the brightness and darkness of the images. Inverse gamma correction transforms the lighting characteristics of the input signal by applying a nonlinear power function, whose exponent (gamma) reflects the nonlinear nature of human perception of lighting conditions. Accordingly, the inverse gamma correction transformation is given by Equation (2) below.

$$I_0 = I_1^{1/\gamma} \tag{2}$$

Equation (2): Equation of the inverse gamma correction method, where *I*0 is the output intensity and *I*1 is the input intensity.

The value of *I*1 is normalized to the range 0 to 1, following the introduced model, and *I*0 is the transformed intensity. This formula is applied when the value of gamma is known, which is commonly determined experimentally.
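A minimal NumPy sketch of the transform in Equation (2), applied to intensities already normalized to [0, 1] (the function name `inverse_gamma` is an illustrative choice):

```python
import numpy as np

def inverse_gamma(intensity: np.ndarray, gamma: float) -> np.ndarray:
    """Apply inverse gamma correction I0 = I1**(1/gamma) to normalized intensities."""
    return np.power(intensity, 1.0 / gamma)

i1 = np.array([0.1, 0.5, 0.9])   # input intensities in [0, 1]
print(inverse_gamma(i1, 1.0))    # gamma = 1 leaves the intensities unchanged
print(inverse_gamma(i1, 2.0))    # gamma > 1 brightens: values move toward 1
```

Because the input lies in [0, 1], an exponent 1/γ below one raises every value toward 1 (brightening), while an exponent above one pushes values toward 0 (darkening).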

In accordance with the blind inverse gamma correction techniques [61–63], gamma is varied between 0.1 and 1.5 with a step size of 0.1, as shown in Figure 4 below. A gamma value of one leaves the image unchanged; the brightness of the image increases as the gamma value becomes larger, and the image becomes darker as the gamma value decreases.
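The sweep over gamma described above can be reproduced as follows; under Equation (2), the mean intensity of the transformed image grows monotonically with gamma, matching the brightening behavior shown in Figure 4. The image data here is synthetic, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))                    # synthetic normalized image in [0, 1)

gammas = np.round(np.arange(0.1, 1.6, 0.1), 1)  # 0.1, 0.2, ..., 1.5
means = [np.power(image, 1.0 / g).mean() for g in gammas]

# Brightness (mean intensity) increases with gamma under I0 = I1**(1/gamma).
print(all(a < b for a, b in zip(means, means[1:])))  # → True
```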

**Figure 4.** The brightness of the image increases when increasing the gamma value (*γ*), and it decreases when decreasing the gamma value. Here, the inverse gamma correction method is applied to change the luminance intensity of the image. Developed from [30].

#### **7. Instance Segmentation**

The instance segmentation [35,58] process involves two main steps. First, objects are detected and localized with bounding boxes within defined categories; second, segmentation is predicted pixel-wise. Instance segmentation (see Figure 5) differs from semantic segmentation: beyond the object detection phase, instance segmentation labels each individual object instance within a category, whereas semantic segmentation assigns a class label to every pixel without distinguishing between instances of the same class. This paper uses instance segmentation with Mask R-CNN.
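To illustrate how instance labels differ from a single semantic class map, the sketch below splits a PedMasks-style label image (where, as in the Penn-Fudan masks, 0 is background and each pedestrian carries a distinct integer id) into per-instance binary masks with bounding boxes. The array is synthetic, for illustration only:

```python
import numpy as np

# Synthetic PedMasks-style label image: 0 = background, 1 and 2 = two pedestrians.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[1:4, 1:3] = 1
mask[5:8, 4:7] = 2

instances = []
for obj_id in np.unique(mask):
    if obj_id == 0:                  # skip background
        continue
    binary = mask == obj_id          # per-instance binary mask
    ys, xs = np.where(binary)
    box = tuple(int(v) for v in (xs.min(), ys.min(), xs.max(), ys.max()))
    instances.append((int(obj_id), box))

print(instances)  # → [(1, (1, 1, 2, 3)), (2, (4, 5, 6, 7))]
```

A semantic mask would collapse both pedestrians into one "person" region; keeping a separate binary mask and box per id is exactly the per-instance labeling that Mask R-CNN predicts.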

**Figure 5.** Instance segmentation of the images, where the GIMP tool is used to segment the images.
