3.2.1. Object Recognition

Sentiment is a complex logical response, and the relations among objects in an image contribute substantially to it. To capture these interactions, we build a graph over the objects in the image and derive interaction features from it. We take the category of each object as a node and a hand-crafted feature as the representation of that object. However, existing image sentiment datasets, such as Flickr and Instagram (FI) [3] and EmotionROI [25], do not contain object annotations. Inspired by previous work [9], we therefore employ a panoptic segmentation algorithm to detect objects.

We choose the R101-FPN model of Detectron2, which covers 131 common object categories such as "person", "cat", "bird", and "tree", to recognize objects automatically. As shown in Figure 3, the panoptic segmentation model processes the original image (Figure 3a) and produces the segmented image (Figure 3b), which contains the category and location of each object.
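The step from a segmentation result to per-object category and location information can be sketched as follows. This is a minimal illustration rather than our exact pipeline: the helper `extract_objects`, the `DetectedObject` record, and the toy inputs are hypothetical, and we assume a panoptic output in the form of a 2D map of segment ids plus a list of per-segment records with `id` and `category_id` fields, with id 0 marking unlabeled pixels.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class DetectedObject:
    category: str                   # class name, later usable as a graph node
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixels
    area: int                       # number of pixels covered by the segment


def extract_objects(
    panoptic_map: Sequence[Sequence[int]],
    segments_info: List[Dict[str, int]],
    class_names: Dict[int, str],
) -> List[DetectedObject]:
    """Turn a panoptic segmentation result into category + location records.

    Assumed input layout (hypothetical, for illustration only):
      panoptic_map  - 2D grid of segment ids, 0 meaning "unlabeled"
      segments_info - one dict per segment with "id" and "category_id"
      class_names   - mapping from category id to a readable class name
    """
    # First pass: accumulate the bounding box and pixel count of every segment id.
    extents: Dict[int, List[int]] = {}
    for y, row in enumerate(panoptic_map):
        for x, seg_id in enumerate(row):
            if seg_id == 0:  # assumption: 0 marks unlabeled pixels
                continue
            ext = extents.setdefault(seg_id, [x, y, x, y, 0])
            ext[0] = min(ext[0], x)
            ext[1] = min(ext[1], y)
            ext[2] = max(ext[2], x)
            ext[3] = max(ext[3], y)
            ext[4] += 1

    # Second pass: attach the category name to each segment that has pixels.
    objects: List[DetectedObject] = []
    for seg in segments_info:
        ext = extents.get(seg["id"])
        if ext is None:
            continue  # segment listed but absent from the map
        objects.append(
            DetectedObject(
                category=class_names[seg["category_id"]],
                box=(ext[0], ext[1], ext[2], ext[3]),
                area=ext[4],
            )
        )
    return objects


# Toy 3x4 "image": three segments and one unlabeled column.
panoptic_map = [
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [3, 3, 3, 3],
]
segments_info = [
    {"id": 1, "category_id": 0},
    {"id": 2, "category_id": 1},
    {"id": 3, "category_id": 2},
]
class_names = {0: "person", 1: "bird", 2: "tree"}

objects = extract_objects(panoptic_map, segments_info, class_names)
for obj in objects:
    print(obj.category, obj.box, obj.area)
```

In practice, the segment map and per-segment records would come from the segmentation model rather than from hand-written lists; the resulting records hold the category and location information that the graph is built from.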

**Figure 3.** Example of building the graph model. Given the input image (**a**), Detectron2 detects the regions and categories of objects; (**b**) is the segmentation result. Based on the detection information, we build a graph (**c**) over the corresponding image.
