**1. Introduction**

Object detection refers to a class of approaches for localizing objects in digital images and classifying their categories. It is one of the most challenging branches of computer vision.

Early approaches were based on handcrafted image features. In [1], a pedestrian detection system was proposed that combines histogram of oriented gradients (HOG) features with a support vector machine (SVM). In [2], the deformable part-based model (DPM) was proposed, which improved detection accuracy by using HOG features of both the whole object and its parts. Although DPM represents the peak of handcrafted-feature-based detection, its detection performance is still not ideal because the features lack effective representational power. Moreover, since handcrafted feature extractors are usually designed for specific object types, they often show low robustness when dealing with different object categories.

In recent years, several object detection approaches based on convolutional neural networks (CNNs) have been proposed [3–5]. Image features are learned in a supervised manner by measuring the error between predictions and annotated ground truth on large-scale object detection datasets. Compared with handcrafted features, CNN features offer significantly better representational ability and robustness across object types, and detection performance has improved considerably as a result. Although deep-learning-based object detection approaches have shown state-of-the-art performance for general object detection, they are still limited in detecting small objects, and their performance in detecting objects of various scales within a single input image is also not ideal. The reasons for this low detection performance are as follows:

**<sup>\*</sup>** Correspondence: yuh\_0111@bit.edu.cn; Tel.: +86-010-88521997


Overall, to further improve detection performance, especially for the objects mentioned above, both the number of small objects and the proportion of objects at various scales in the training samples are important. In this paper, an end-to-end object detection approach with multi-scale balanced sampling is proposed to improve the matching mechanism and ensure scale diversity in the training samples. The key contributions of the approach are summarized as follows:


The remainder of this paper is organized as follows. Section 2 introduces the background and related work. Section 3 describes the framework and implementation details of the proposed approach. Section 4 presents the experimental results and comparisons with similar approaches. Finally, Section 5 concludes the paper.
