### **2. State of the Art**

Biologically inspired cognitive architectures are a challenging research area that aims to enhance machine intelligence. Many researchers in robotics target the visuo-haptic interaction found in humans in order to design more intelligent robots capable of sensing and exploring their environment the way humans do. Despite the vast advances in processing and learning from visual data, and the considerable research interest in developing an artificial sense of touch, an optimal integration of visual and haptic information has not yet been achieved.

On the psychophysics and neuroscience side, many researchers are trying to explain how tactile and visual information contribute to humans' interpretation of their environment. Klatzky et al. suggest that both vision and touch rely on shape information for object recognition [3]. Other researchers study the different exploratory procedures that humans apply for tactile object recognition [2] and reveal the superiority of tactile perception in the presence of vision [4]. Demarais et al. [6] studied the performance of visual, tactile, and bimodal object exploration during both the learning and the testing phases of object identification.

On the cognitive computation and robotics side, several researchers are aiming to achieve an optimal integration of visual and tactile data. Magosso [7] trained a neural network that reproduces a variety of visuo-haptic interactions, including improving tactile spatial resolution with visual data, resolving conflict situations, and compensating for poor unisensory information with cross-modal data. Gao et al. [8] trained a deep neural network on both visual and haptic features, confirming that integrating visual and haptic data outperforms using the two sensory modalities separately. Burka et al. [9] designed and constructed a multimodal data acquisition system emulating the human senses of vision and touch; their sensor suite includes an RGB-D vision sensor, an ego-motion estimator, and contact force and contact motion detectors. Kroemer et al. [10] trained a robot on both visual and tactile data to discriminate different surfaces by touch. Calandra et al. [11] trained a deep convolutional neural network to learn regrasping policies from visuo-tactile data; the network was then used to predict the probability of grasp success for a given set of grasping configurations. Van Hoof et al. [12] trained a robot by reinforcement learning, using an autoencoder, to perform tactile manipulation based on visual and tactile data considered separately. Fukuda et al. [13] designed and produced a biocompatible tactile sensor for laparoscopic surgery that employs both visual and tactile feedback.

The technology, processing, and interpretation of tactile data themselves also attract considerable research interest. The latest advances in tactile sensor technology are reviewed by Chi et al. [1]. Liu et al. [14] used joint sparse coding to classify tactile sequences from their dissimilarities, computed by dynamic time warping. Gorges et al. [15] used sequences of tactile data acquired by palpating a set of seven objects with a five-fingered robotic hand. Song et al. [16] designed a tactile sensor built from a thin polyvinylidene fluoride film and used it to classify different textures. In another study [17], they used a similar sensor to evaluate fabric surfaces, training a support vector machine on features extracted by a fast Fourier transform followed by principal component analysis for dimensionality reduction.

In our previous work [18], we exploited a computational model of visual attention to guide tactile probing, collecting imprints sequentially at a series of eye fixations. In this work, inspired by the way humans haptically explore objects for recognition [3], we follow object contours to capture sequences of tactile data. Visual information, in the form of a set of visually interesting points determined by the enhanced model of visual attention presented in our previous work [19], is employed to select the contours that can enhance the recognition rate. The tactile exploration of objects is also brought closer to human exploration strategies by adaptively changing the size of the tactile sensor according to the geometrical features of the object.
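As an illustration only, the following Python sketch shows one way a saliency-guided contour selection could be realized: candidate contours are ranked by how many visually interesting points lie near them. The scoring heuristic, the distance threshold, and the function names are assumptions made for this sketch and do not reproduce the attention model of [19].

```python
# Hypothetical sketch: ranking candidate contours by visual saliency.
# Assumes each contour is an array of 2-D points and that the visual
# attention model provides a set of salient fixation points.
import numpy as np

def score_contour(contour, fixations, radius=5.0):
    """Count how many salient fixations lie within `radius` of the contour."""
    # Pairwise distances between every fixation and every contour point.
    d = np.linalg.norm(fixations[:, None, :] - contour[None, :, :], axis=-1)
    return int(np.sum(d.min(axis=1) <= radius))

def select_contours(contours, fixations, k=1):
    """Return the k contours closest to the visually interesting points."""
    scores = [score_contour(c, fixations) for c in contours]
    order = np.argsort(scores)[::-1]
    return [contours[i] for i in order[:k]]

# Toy usage with random data.
rng = np.random.default_rng(0)
contours = [rng.uniform(0, 100, size=(200, 2)) for _ in range(3)]
fixations = rng.uniform(0, 100, size=(10, 2))
best = select_contours(contours, fixations, k=1)
```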

### **3. Framework**

The proposed tactile object recognition framework is summarized in Figure 1. Starting from 24 object models belonging to 8 classes, we first constructed a dataset of sequential tactile data. Relying on the contour-following exploratory procedure employed by humans to perceive the general shape of objects and to identify them, the sequence of tactile data is generated by following a complete contour of each model, where the contour following is performed either blindly or guided by a computational model of visual attention. The main objective is to show that the recognition rate can be improved by incorporating visual data. Two different scenarios were then implemented to classify the sequential data. In the first scenario, two convolutional neural networks were used to learn features from sequences of tactile images (videos of tactile imprints) and from sequences of normal vectors to the object surface (cutaneous cues). In the second scenario, a set of features is extracted, using wavelet decomposition, from time series derived from the tactile videos, and conventional learning algorithms, such as support vector machines and K-nearest neighbors, are then trained and tested for object classification. Each time series tracks the variation of a specific tactile feature as the tactile sensor moves along the object contour; these features are themselves extracted using the directional contourlet transform [20]. The rest of the paper details the tactile data acquisition process as well as the classification.
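As a rough, non-authoritative sketch of the second scenario, the snippet below extracts simple wavelet-decomposition statistics from one-dimensional tactile time series and feeds them to an SVM and a K-nearest-neighbors classifier. The wavelet family, decomposition level, summary statistics, and toy data are all assumptions for illustration; in this work the underlying tactile features are derived from the tactile videos via the directional contourlet transform [20].

```python
# Minimal sketch, assuming each tactile time series is a 1-D signal per
# sample; 'db4' wavelet, level 3, and the per-band statistics are
# illustrative choices, not the authors' exact configuration.
import numpy as np
import pywt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def wavelet_features(signal, wavelet="db4", level=3):
    """Summarize each wavelet sub-band by its energy and standard deviation."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([f(c) for c in coeffs for f in (lambda x: np.sum(x**2), np.std)])

# Toy data: 40 tactile time series of length 256, 8 object classes.
rng = np.random.default_rng(1)
signals = rng.normal(size=(40, 256))
labels = rng.integers(0, 8, size=40)

X = np.vstack([wavelet_features(s) for s in signals])

svm = SVC(kernel="rbf").fit(X, labels)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
print(svm.predict(X[:2]), knn.predict(X[:2]))
```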

**Figure 1.** Framework for sequential tactile object recognition.
