*1.2. Classification*

In this paper, 6D pose estimation approaches are divided into two categories: (1) learning-based approaches and (2) non-learning-based approaches. Non-learning-based approaches are further divided into (1) 2D-information-based approaches and (2) 3D-information-based approaches. The classification is mainly based on the core principle and the input information of each method. Learning-based approaches mainly use CNNs, regression, or other deep-learning-based methods to train a model on adequate training data and then obtain the 6D pose estimate from that model. Approaches that do not use deep learning fall into the remaining two categories. 2D-information-based approaches mainly use the 2D information of the scene, such as RGB images. 3D-information-based approaches mainly use the 3D information of the scene, such as point clouds and RGB-D images. Both 2D-information-based and 3D-information-based approaches convert 6D pose estimation into image retrieval: they compute key points or key features and use them to match the input image with the most similar image in the dataset. However, the two types of approach also have some obvious differences, which are covered in the following sections.

The main purpose of learning-based approaches is to train a model that estimates the 6D pose in an unseen situation on the basis of the training data. Many kinds of model can be used for this, such as regression models and CNN models. Learning-based approaches can be classified in many ways, and the widely accepted classifications are introduced in this paper. Keypoints-based approaches adopt a two-step strategy to estimate the 6D pose, which is easier to implement than other approaches. Holistic approaches, in contrast, aim to train an end-to-end network that estimates the 6D pose of an object. Such a network treats the image as a whole and tries to predict the location and orientation of the object in a single step; it discretizes the 6D space, converting the pose estimation task into a classification task. However, holistic approaches are more complex and time-consuming than keypoints-based approaches.
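The discretization used by holistic approaches can be illustrated with a minimal sketch. It covers only the rotational part of the pose and assumes a viewpoint parametrization by azimuth, elevation, and in-plane rotation. The bin counts `N_AZ`, `N_EL`, and `N_IP` and the function names are hypothetical and are not taken from any particular method; translation would have to be discretized or regressed separately.

```python
# Hypothetical bin counts; no specific method prescribes these values.
N_AZ, N_EL, N_IP = 12, 6, 8


def pose_to_class(azimuth, elevation, inplane):
    """Map a rotation given as three angles (degrees) to one class index.

    azimuth and inplane are wrapped to [0, 360); elevation is expected
    in [-90, 90]. Each angle is quantized into equally sized bins.
    """
    az_bin = min(int((azimuth % 360.0) // (360.0 / N_AZ)), N_AZ - 1)
    el_bin = min(int((elevation + 90.0) // (180.0 / N_EL)), N_EL - 1)
    ip_bin = min(int((inplane % 360.0) // (360.0 / N_IP)), N_IP - 1)
    return (az_bin * N_EL + el_bin) * N_IP + ip_bin


def class_to_pose(index):
    """Return the bin-center angles (degrees) for a class index."""
    az_bin, rem = divmod(index, N_EL * N_IP)
    el_bin, ip_bin = divmod(rem, N_IP)
    azimuth = (az_bin + 0.5) * 360.0 / N_AZ
    elevation = (el_bin + 0.5) * 180.0 / N_EL - 90.0
    inplane = (ip_bin + 0.5) * 360.0 / N_IP
    return azimuth, elevation, inplane
```

Under these assumptions, a classifier with `N_AZ * N_EL * N_IP` output classes predicts an index, and `class_to_pose` converts it back to an approximate orientation. The angular resolution is bounded by the bin size, which is one reason finer discretizations increase model complexity.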

The main purpose of 2D-information-based approaches is to find the correlation between the input image and one of the template images through the 2D information contained in the image. In effect, 2D-information-based approaches convert pose estimation into an image-matching problem, so the matching results have a great influence on the pose estimation results. According to the kind of template used, 2D-information-based approaches can be divided into real-image-based approaches and CAD-image-based approaches. An approach that uses real images as templates is a real-image-based approach; an approach that uses images generated from a CAD model is a CAD-image-based approach. In general, CAD-image-based approaches are more accurate than real-image-based approaches because the images generated from CAD models contain little noise. However, CAD models cannot always be obtained easily, so real images are used as templates in such situations.
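The image-matching view described above can be sketched as a retrieval step. The example below assumes equally sized grayscale images and uses zero-mean normalized cross-correlation as the similarity measure; this measure is an illustrative choice, not the one used by any specific approach surveyed here, and the names `zncc` and `retrieve_pose` are hypothetical.

```python
import numpy as np


def zncc(a, b):
    """Zero-mean normalized cross-correlation of two same-sized images."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # A constant image has zero variance; treat it as uncorrelated.
    return float(a @ b / denom) if denom > 0 else 0.0


def retrieve_pose(query, templates, poses):
    """Return the pose label of the best-matching template and its score.

    templates: sequence of images (real or rendered from a CAD model).
    poses: sequence of pose labels, one per template (assumed known).
    """
    scores = [zncc(query, t) for t in templates]
    best = int(np.argmax(scores))
    return poses[best], scores[best]
```

Because the returned pose is simply that of the best-matching template, any mismatch propagates directly into the pose estimate, which is why the matching quality dominates the final accuracy.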

3D-information-based approaches also focus on the matching between the input and the dataset; however, they use the 3D information of the object, such as point clouds and RGB-D images. They can be divided into two categories: matching-based approaches and local descriptor-based approaches. The main idea of matching-based approaches is to match the input image with the templates directly and to take the 6D pose of the matched template as the pose estimation result for the input image. Local descriptor-based approaches estimate the 6D pose using the correspondences between the descriptors of the input image and those of the templates. Matching-based approaches require large storage to hold enough templates to ensure the accuracy of pose estimation: the more templates they have, the more accurate the pose estimation result will be.
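For local descriptor-based approaches, one common way to turn point correspondences into a 6D pose is a least-squares rigid alignment. The sketch below uses the SVD-based Kabsch algorithm and assumes that descriptor matching has already produced paired 3D points; the choice of this solver and the function name are assumptions made for illustration and are not prescribed by the approaches discussed here.

```python
import numpy as np


def rigid_transform_from_correspondences(model_pts, scene_pts):
    """Estimate R (3x3) and t (3,) such that scene ~= R @ model + t.

    model_pts, scene_pts: arrays of shape (N, 3) whose rows are matched
    point pairs (assumed to come from a prior descriptor-matching step).
    """
    model_pts = np.asarray(model_pts, dtype=np.float64)
    scene_pts = np.asarray(scene_pts, dtype=np.float64)
    mu_m = model_pts.mean(axis=0)
    mu_s = scene_pts.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (model_pts - mu_m).T @ (scene_pts - mu_s)
    U, _, Vt = np.linalg.svd(H)
    # Correct a possible reflection so that det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_s - R @ mu_m
    return R, t
```

In practice, descriptor matches contain outliers, so a solver of this kind is usually wrapped in a robust scheme such as RANSAC; that step is omitted here for brevity.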
