### *4.1. Cutaneous Cues (Tactile Imprints)*

In piezoresistive tactile sensors, which are widely regarded as a promising solution for tactile object recognition [21], the deformation of the sensor surface when it is pressed against an object modifies the resistance of a bridge circuit, producing a proportional differential voltage. This differential voltage is then further processed to generate a tactile image. Inspired by the working principle of piezoresistive sensors, we have developed a virtual tactile sensing module in which the deformation of the tactile array surface is simulated as the distance between the object surface and each cell of a plane tangent to the object, once the distance between the center of the plane and the object reaches zero. Figure 2 illustrates the simulated tactile sensor as a blue plane, together with an example of a locally captured deformation profile (Figure 2b) and the resulting tactile image (Figure 2c).
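
As a concrete illustration, the following minimal Python sketch simulates one such imprint under stated assumptions: `mesh.signed_distance` is a hypothetical stand-in for whatever point-to-surface query the simulation environment provides, and the cell pitch is an arbitrary placeholder.

```python
import numpy as np

def simulate_tactile_imprint(center, normal, mesh, size=32, pitch=1.0):
    """Sample a size x size grid of cells on the plane tangent to the
    object at `center` and record the cell-to-surface distances that
    play the role of the sensor 'deformation'. `mesh.signed_distance`
    is a hypothetical point-to-surface query."""
    normal = normal / np.linalg.norm(normal)
    # Build an orthonormal basis (u, v) spanning the tangent plane.
    u = np.cross(normal, [0.0, 0.0, 1.0])
    if np.linalg.norm(u) < 1e-8:              # normal parallel to z-axis
        u = np.cross(normal, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(normal, u)
    # Regular grid of sensing cells centered on the probing location.
    offsets = (np.arange(size) - (size - 1) / 2) * pitch
    imprint = np.zeros((size, size))
    for i, a in enumerate(offsets):
        for j, b in enumerate(offsets):
            cell = center + a * u + b * v
            imprint[i, j] = mesh.signed_distance(cell, -normal)
    return imprint
```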

**Figure 2.** (**a**) The simulated tactile sensor shown as a blue plane; (**b**) an example of a locally captured deformation profile; (**c**) the resulting tactile image.

### *4.2. Adaptive Probing*

In this work, the locations on the object surface from which tactile data are captured are determined in advance using object contours (either blindly or based on a model of visual attention, as discussed in Section 5), and the center of the sensor is positioned at these locations. Consequently, on concave surfaces, such as the example in Figure 3, the sensor surface intersects the object, resulting in negative distance values between the object and the sensor. Since a real rigid-backing FSR sensor cannot acquire tactile data in such probing cases, we follow the haptic exploration strategy of humans, where tactile information from larger surfaces is obtained by the palm while the fingertips are used for finer details and concave surfaces. Accordingly, in this work, we adaptively adjust the sensor size to capture the local tactile data. A real robotic-hand counterpart could be built using multiple FSR arrays of different sizes placed on the palm and the phalanges. Alternatively, the Barrett robotic hand [22] can be purchased equipped with tactile sensing pads on the fingers (smaller pads) and the palm (larger pad), which matches the way sensors are used in this work. In order to keep the size of the tactile image consistent throughout the experiments (i.e., 32 × 32 in the current work), the distance between the sensing points is reduced; a higher local precision is therefore achieved, since the same number of sensing elements is assigned to a smaller surface of the object. As such, a tactile imprint of size 32 × 32 is obtained for finer details of objects as well.
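
A minimal sketch of this adaptive loop, reusing `simulate_tactile_imprint` from the sketch in Section 4.1, is given below; the shrink factor and minimum pitch are illustrative choices, not values from the paper.

```python
def adaptive_imprint(center, normal, mesh, size=32, pitch=1.0,
                     shrink=0.5, min_pitch=0.05):
    """If the full-size sensor plane intersects the object (negative
    distances on a concave surface), reduce the spacing between the
    sensing points and re-probe, keeping the 32 x 32 resolution."""
    imprint = simulate_tactile_imprint(center, normal, mesh, size, pitch)
    while imprint.min() < 0 and pitch > min_pitch:
        pitch *= shrink          # finer, 'fingertip-like' probing
        imprint = simulate_tactile_imprint(center, normal, mesh, size, pitch)
    return imprint
```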

**Figure 3.** Adaptive modification of tactile sensor size.

### *4.3. Kinesthetic Cues*

As previously mentioned, kinesthetic cues can supply crucial information about the shape and size of explored objects that is not perceivable through the skin. Drawing inspiration from the kinesthetic cues contributing to the human sense of touch, such as the angles between finger phalanges and the trajectory of finger motions when exploring an object, we compute and use the normal vectors to the object surface in the object recognition process. When probing an object with a real tactile sensor, the normal vectors to the surface are similarly computed and used to bring the sensor into contact with the object.
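
For a triangulated model, one plausible way to obtain these normals is from the probed face itself, as in the short sketch below (looking up which face is being probed is assumed to be provided by the simulation environment).

```python
import numpy as np

def face_normal(v0, v1, v2):
    """Unit normal of the triangle (v0, v1, v2) via the cross product
    of two edges; it serves both as a kinesthetic cue and to orient
    the simulated sensor plane at the probing location."""
    n = np.cross(v1 - v0, v2 - v0)
    return n / np.linalg.norm(n)
```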

### *4.4. Sequential Tactile Data Collection*

According to psychological research, when exploring an object with the hands in order to recognize it, humans tend to follow the object's contours to understand its global shape, which leads to recognition [2,3]. Relying on this biological observation, we move the tactile sensor along a complete contour of the object, simulating both cutaneous and kinesthetic cues. As a result, each contour following produces a video of tactile imprints encoding the cutaneous cues; the video is subsampled to 25 consecutive frames to reduce the high computational cost of data processing. Similarly, a trajectory of normal vectors and the 3D coordinates of the probing locations are recorded.
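
Putting the pieces together, one probing sequence can be sketched as below; `mesh.surface_normal` is a hypothetical query, and `adaptive_imprint` is the sketch from Section 4.2.

```python
import numpy as np

def collect_sequence(contour_points, mesh):
    """For each of the 25 probing locations, record a tactile imprint
    (cutaneous cue) together with the surface normal and 3D position
    (kinesthetic cues)."""
    frames, normals, positions = [], [], []
    for p in contour_points:
        n = mesh.surface_normal(p)            # hypothetical mesh query
        frames.append(adaptive_imprint(p, n, mesh))
        normals.append(n)
        positions.append(p)
    return np.stack(frames), np.stack(normals), np.stack(positions)
```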

### **5. Contour Following**

As previously mentioned, this work relies on the classification of sequential tactile data collected along object contours. Blind contour following and contour following guided by visually interesting points are the two strategies explored here, chosen to investigate the idea that the contour along which tactile probing takes place can play a decisive role in the recognition rate.

Object contours are determined using 3D planes intersecting the object. Once the equation of each plane is found, the set of points belonging to both the object and the plane forms a contour around the object. Finding the equation of a plane in 3D space requires three distinct, noncollinear points. In this work, all the planes used to determine probing paths are constrained to pass through the center of the model, to avoid the selection of local contours around object extremities. As such, the center of each model is chosen as one of the three points required to define every plane. It is worth mentioning that this implementation does not necessarily require visual data, since supplementary tactile exploration, such as the grasp stabilization method used by Regoli et al. [23] or the reinforcement learning approach described by Pape et al. [24], can assist in determining such contours. However, the acquisition of tactile information by exploration is expensive in both time and robot programming effort, and it can lead to the acquisition of unnecessary data. These considerations, together with the possible advantage of visual cues in selecting more informative contours, motivated us to consider the two data acquisition strategies described below.

In the case of blind contour following, besides the central point of the model, the two other points are randomly selected from the vertices of the object model; when contours are guided by the model of visual attention [19], the two other points are selected randomly from the set of visually interesting points.
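
A minimal sketch of both variants is given below: the only difference is the candidate set from which the two extra points are drawn (all mesh vertices for blind following, the salient points for guided following). The tolerance used to test plane membership is an illustrative value.

```python
import numpy as np

rng = np.random.default_rng()

def probing_plane(center, candidates):
    """Draw two points at random; together with the model center they
    define the cutting plane n . x = d (the draw would be retried if
    the three points happened to be collinear)."""
    p1, p2 = candidates[rng.choice(len(candidates), 2, replace=False)]
    n = np.cross(p1 - center, p2 - center)
    n /= np.linalg.norm(n)
    return n, float(n @ center)

def contour_vertices(vertices, n, d, tol=1e-2):
    """Vertices lying within `tol` of the plane form the probing contour."""
    return vertices[np.abs(vertices @ n - d) < tol]
```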

The computational model of visual attention presented in [19] is adopted in this work to determine the visually interesting points. The model uses the Matlab virtual camera to collect a series of images of each object such that complete coverage of the object surface is ensured. The obtained images are then decomposed into nine channels believed to contribute to the guidance of attention in humans: color opponency, DKL color space, intensity, contrast, orientation, curvature, edges, entropy, and symmetry. The contribution weight of each channel is then learned from a set of ground-truth points identified by a group of users. The extracted visual features are finally integrated according to the computed weights as described by Equation (1), where *Smap* is the computed saliency map, *wcol*, ..., *wsym* are the contribution weights of the respective features, and *Ccol*, ..., *Csym* are the feature maps illustrated in Figure 4.

$$S_{map} = \frac{w_{col} \cdot C_{col} + w_{DKL} \cdot C_{DKL} + w_{int} \cdot C_{int} + w_{con} \cdot C_{con} + w_{ori} \cdot C_{ori} + w_{cur} \cdot C_{cur} + w_{edg} \cdot C_{edg} + w_{ent} \cdot C_{ent} + w_{sym} \cdot C_{sym}}{\sum_{f} w_{f}} \tag{1}$$
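
In code, Equation (1) is a straightforward weighted average of the feature maps; the sketch below assumes the maps and learned weights are stored in dictionaries keyed by channel name.

```python
def saliency_map(feature_maps, weights):
    """Weighted combination of the nine channel maps (Equation (1));
    `feature_maps` and `weights` share keys such as 'col', 'DKL',
    'int', ..., 'sym'."""
    num = sum(weights[c] * feature_maps[c] for c in weights)
    return num / sum(weights.values())
```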

Subsequently, after setting the intensity of the image background to zero, the brightest regions on the resulting saliency map are identified using a non-maximum suppression paradigm, leading to the determination of visually interesting points in the images. A 2D-to-3D projection algorithm then recovers the 3D coordinates of each salient point, which are used in the current study to guide the determination of object contours. A detailed description of the computation of each channel, as well as further details on the computational model of visual attention, is available in Rouhafzay et al. [19] for interested readers.
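
A greedy non-maximum suppression of this kind can be sketched as follows; the number of points and the suppression radius are illustrative parameters, not the values used in [19].

```python
import numpy as np

def salient_points(smap, k=10, radius=15):
    """Repeatedly take the brightest pixel of the saliency map and
    zero out its neighbourhood, yielding k visually interesting points."""
    smap = smap.copy()
    points = []
    for _ in range(k):
        y, x = np.unravel_index(np.argmax(smap), smap.shape)
        points.append((x, y))
        smap[max(0, y - radius):y + radius + 1,
             max(0, x - radius):x + radius + 1] = 0.0
    return points
```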

**Figure 4.** The nine channels contributing to the model of visual attention.

Figure 5a illustrates three examples of probing paths formed by random points (blind contour following), while examples of paths guided by visually interesting points are depicted in Figure 5b.

**Figure 5.** (**a**) Examples of blind contour following paths for the model of Plane. (**b**) Examples of contour following paths guided by visually interesting points for the model of Plane.

Since the number of vertices on a contour is very large, and collecting tactile data from all of them is neither efficient nor necessary, the obtained set of vertices is first subsampled to 25 equally spaced points, and then the cutaneous and kinesthetic data described in Section 4 are captured. Six of the twenty-five consecutive frames of the tactile video captured from the model of Plane are depicted in Figure 6a, while Figure 6b illustrates an example of a set of normal vectors to the surface.
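
Assuming the contour vertices are ordered along the path, the equal-spacing subsampling can be sketched by resampling at uniform arc-length intervals:

```python
import numpy as np

def subsample_contour(points, n=25):
    """Resample an ordered contour to n points at (approximately)
    equal arc-length spacing."""
    pts = np.asarray(points)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])       # cumulative length
    targets = np.linspace(0.0, s[-1], n, endpoint=False)
    idx = np.searchsorted(s, targets)
    return pts[np.minimum(idx, len(pts) - 1)]
```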

**Figure 6.** (**a**) Example of six consecutive frames of the tactile video captured from the model of Plane. (**b**) Example of a sequence of normal vectors.

### **6. Sequential Tactile Data Classification**

Once both cutaneous and kinesthetic cues have been acquired for the objects, we implement two different approaches for object recognition by classifying the sequences of tactile data, in order to determine whether the results obtained with all techniques confirm the superiority of the model of visual attention. Convolutional neural networks allow us to feed in the acquired tactile images directly: they automatically perform both feature extraction and classification of the tactile data. In order to use the two other classifiers, i.e., support vector machines and K-nearest neighbors, we need to extract the relevant features from the tactile imprints ourselves and verify how these features are altered as the sensor moves around the object. This is the main distinction between the two approaches used in this paper, which are discussed in the next subsections.
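
To make the distinction concrete, the sketch below contrasts the two input layouts for one probing sequence; the hand-crafted features shown (mean deformation and contact area) are purely illustrative placeholders, not the feature set used in this paper.

```python
import numpy as np

video = np.random.rand(25, 32, 32)       # one tactile video (placeholder)

# CNN route: feed the raw imprints directly, e.g., as a
# frames x height x width x channels tensor.
cnn_input = video[..., np.newaxis]       # shape (25, 32, 32, 1)

# SVM / KNN route: extract features per frame and track how they
# evolve as the sensor moves along the contour.
features = np.array([[f.mean(), (f > 0).mean()] for f in video])
svm_input = features.ravel()             # shape (50,)
```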
