2.3.3. Network Parameters

Following SiamRPN++ [42], the strides of Stage3 and Stage4 of the ResNet50 backbone are halved and combined with dilated (atrous) convolution. The two feature maps fed into the RPN head for interaction are both 25 × 25, as illustrated in Figure 2. The *IoU* threshold is set to 0.75 because the final output is a location rather than a region proposal.
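The effect of trading stride for dilation can be checked with the standard convolution output-size formula. The sketch below is illustrative only: the helper `conv_out_size` and the 31 × 31 input size are assumptions introduced here, not values taken from the framework.

```python
def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Hypothetical input resolution, chosen only for illustration.
size_in = 31

# A 3x3 convolution with stride 2 roughly halves the resolution.
strided = conv_out_size(size_in, kernel=3, stride=2, padding=1)

# With the stride reduced to 1 and dilation 2 (padding 2), the resolution
# is preserved while the kernel still spans a 5x5 window of the input.
dilated = conv_out_size(size_in, kernel=3, stride=1, padding=2, dilation=2)

print(strided, dilated)
```

Under these assumptions the strided layer shrinks the map while the dilated layer keeps it at the input size, which is why the deeper stages can retain a resolution suitable for localization.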

The proposed framework is implemented in PyTorch and trained for 50 epochs with a batch size of 128. It is optimized by SGD with a momentum of 0.9 and a weight decay of 0.0001. The learning rate is increased from 0.005 to 0.01 by warmup over the first 5 epochs, after which it decays exponentially from 0.01 to 0.005. The backbone is frozen for the first 10 epochs, and the whole framework is trained from the 10th epoch onward. The experiments are run on an Intel Core i7-8700K CPU @ 3.70 GHz and an NVIDIA GeForce RTX 3090 GPU.
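The training schedule described above can be sketched as a per-epoch function. This is a minimal illustration under stated assumptions, not the authors' implementation: the warmup is assumed to be linear, epochs are 0-indexed, the rate is assumed to change once per epoch, and the names `learning_rate` and `backbone_trainable` are hypothetical.

```python
WARMUP_EPOCHS = 5
TOTAL_EPOCHS = 50
FREEZE_EPOCHS = 10
LR_START, LR_PEAK, LR_END = 0.005, 0.01, 0.005


def learning_rate(epoch):
    """Learning rate for a 0-indexed epoch."""
    if epoch < WARMUP_EPOCHS:
        # Assumed linear warmup; the peak is reached at epoch WARMUP_EPOCHS.
        return LR_START + (LR_PEAK - LR_START) * epoch / WARMUP_EPOCHS
    # Exponential (geometric) decay from LR_PEAK to LR_END over the rest
    # of training, reaching LR_END at the final epoch.
    decay_epochs = TOTAL_EPOCHS - 1 - WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / decay_epochs
    return LR_PEAK * (LR_END / LR_PEAK) ** progress


def backbone_trainable(epoch):
    """The backbone stays frozen for the first FREEZE_EPOCHS epochs."""
    return epoch >= FREEZE_EPOCHS


for epoch in (0, 5, 10, 49):
    print(epoch, round(learning_rate(epoch), 5), backbone_trainable(epoch))
```

In a PyTorch training loop, a function of this kind could be divided by the optimizer's base rate and passed to `torch.optim.lr_scheduler.LambdaLR`, with the backbone parameters' `requires_grad` flags switched on once `backbone_trainable` returns true.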
