**3. Experiments**

#### *3.1. Infrared and Visible Image Dataset*

The infrared and visible image dataset used in this paper is collected from [49,50]. Each pair of infrared and visible images was captured by aligned infrared and visible cameras, and each pair was already registered. Both the infrared and visible images are single-channel gray images, and the infrared images are far-infrared images. Table 1 lists the composition of the infrared and visible image dataset. The dataset contains a total of 3318 pairs of infrared and visible images, of which 1641 pairs were taken in the daytime and 1677 at night. We randomly select 668 pairs of infrared and visible images as the testing images, and use the remaining 2650 pairs as the training images. Among the 668 testing pairs, 352 were taken in the daytime and 316 at night; among the 2650 training pairs, 1289 were taken in the daytime and 1361 at night. The image sizes include 640 × 471 (width × height) and 640 × 480, and we resize all infrared and visible images to 640 × 480. Data augmentation is introduced to avoid over-fitting of the detection network. We use two augmentation strategies, horizontal flip and Gaussian blur with a standard deviation of 2, to increase the number of training images. Through data augmentation, we get 7950 pairs of infrared and visible images for training the detection network.
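The preprocessing described above (resizing to 640 × 480, then horizontal flip and Gaussian blur with standard deviation 2) can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline; the function names are hypothetical, and it assumes NumPy/SciPy with images as single-channel arrays. Note that the same flip must be applied to both images of a registered infrared–visible pair so that they stay aligned.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

TARGET_W, TARGET_H = 640, 480  # target size from the paper (width x height)

def resize_to_target(img):
    """Bilinearly resize a single-channel gray image (H x W) to 480 x 640."""
    h, w = img.shape
    return zoom(img, (TARGET_H / h, TARGET_W / w), order=1)

def augment(img):
    """Return the two augmented variants used in the paper:
    horizontal flip and Gaussian blur with standard deviation 2."""
    return [np.fliplr(img), gaussian_filter(img, sigma=2)]

# Toy example with one 640 x 471 image: resize, then augment.
# Each original image yields itself plus 2 variants, so 2650 training
# pairs become 2650 * 3 = 7950, as stated in the text.
original = np.random.rand(471, 640)
resized = resize_to_target(original)
variants = [resized] + augment(resized)
```

For a registered pair, `np.fliplr` would be applied to the infrared and visible images together, while the blur can be applied to each channel independently.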


**Table 1.** The composition of the infrared and visible image dataset.

| Subset   | Daytime | Night | Total |
|----------|---------|-------|-------|
| Training | 1289    | 1361  | 2650  |
| Testing  | 352     | 316   | 668   |
| Total    | 1641    | 1677  | 3318  |

In the infrared and visible image dataset, there are several different object categories, including person, car, tree, building, and so on. The person category includes people who are still, walking, running, and carrying various things. In this paper, we detect only one category, person. Figure 6 shows some examples from the infrared and visible image dataset. The images in the first row are infrared images, and the images in the second row are visible images. The first two columns show images taken in the daytime, and the last three columns show images taken at night. Although the objects in the visible images contain more detailed information, the visible images are easily affected by low brightness (see Figure 6h,j), smoke (see Figure 6i), and noise (see Figure 6h). On the other hand, the contrast of the infrared objects is relatively high (see Figure 6c–e), while detail features of infrared objects are missing (see Figure 6c–e). In addition, the first two columns show that in the daytime, visible images may have better visual effects than infrared images.


**Figure 6.** (**a**–**j**) Some examples of infrared and visible images from the used image dataset. The images in the first row are infrared images, and the images in the second row are visible images.
