*2.2. Holistic Approaches*

Unlike keypoint-based approaches, holistic approaches use an end-to-end architecture and can therefore be faster. Kendall et al. [36] proposed PoseNet, which was the first to apply a CNN architecture to 6D pose estimation, and reported that it adapted well to the environment. Liu et al. [37] proposed SSD, which detects objects in images using a single deep neural network and was the first method to associate bounding box priors with feature maps of different spatial resolutions in the network. This method improved accuracy while retaining a low time cost. Kehl et al. [38] proposed SSD-6D, which extended SSD to 6D pose estimation and allowed for easy training and handling of symmetries. Do et al. [39] proposed the Deep-6DPose network, which builds on Mask R-CNN [41] and uses the Region Proposal Network (RPN) [40] for detection and segmentation. For pose estimation, it decouples the pose parameters into translation and rotation so that the rotation can be regressed via a Lie algebra representation. However, because the network takes the ROIs from the RPN as inputs and predicts the 6D pose of the object within each ROI, it does not work well on small or symmetrical objects. To overcome this problem, Xiang et al. [42] proposed a new network, PoseCNN. This method computes the translation vector by localizing the center of the object in the image and estimating the distance between that center and the camera. It then computes the rotation matrix by regressing to a quaternion representation. Additionally, it introduces a novel loss function for symmetric objects. The method can handle occlusion and symmetric objects in cluttered scenes with RGB or RGB-D images as input.
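The two-step pose recovery described for PoseCNN can be illustrated with a minimal sketch. It assumes a pinhole camera model with known intrinsics (focal lengths `fx`, `fy` and principal point `px`, `py`) and a quaternion ordered as (w, x, y, z). The function names and this ordering are illustrative assumptions and are not taken from [42].

```python
import math


def backproject_center(cx, cy, tz, fx, fy, px, py):
    """Recover a translation (Tx, Ty, Tz) from a 2D object center and a depth.

    cx, cy : object center in the image, in pixels
    tz     : estimated distance of the center along the optical axis
    fx, fy : focal lengths in pixels (assumed pinhole intrinsics)
    px, py : principal point in pixels (assumed pinhole intrinsics)
    """
    tx = (cx - px) * tz / fx
    ty = (cy - py) * tz / fy
    return (tx, ty, tz)


def quaternion_to_rotation(q):
    """Convert a quaternion (w, x, y, z) into a 3x3 rotation matrix.

    The quaternion is normalized first, since a regressed quaternion
    is generally not of unit length.
    """
    w, x, y, z = q
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero-norm quaternion does not define a rotation")
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


# Hypothetical values: a center 100 px right of the principal point, 2 m away.
translation = backproject_center(420.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0)
# A regressed quaternion for a 90-degree rotation about the z-axis, unnormalized.
rotation = quaternion_to_rotation((1.0, 0.0, 0.0, 1.0))
```

Splitting the pose this way means the network only has to predict image-space quantities (a 2D center and a depth) plus four quaternion components, and the full 6D pose follows from the camera geometry.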
