*6.1. Feature Extraction from Tactile Data and Classification by Convolutional Neural Networks*

In the first approach we train two separate convolutional neural networks (CNN) to learn the features from the video of tactile images (cutaneous cues) and the sequence of normal vectors to the surface (kinesthetic cues). All tactile images captured by the virtual FSR sensor, are 32 × 32 grayscale images and 25 frames are considered for each exploration (contour).

The first CNN (dedicated to cutaneous cues) takes benefit from two convolution layers with 3D kernels followed by a batch normalization layer speeding up the learning process. No pooling layer is added as the size of tactile images is not so large to require down sampling. Two fully connected layers are then exploited to learn the relationship between the extracted features through the filters in the convolution layers. A SoftMax layer finally outputs the probability distribution values over predicted output classes. The network is trained for 50 epochs.

The second CNN (dedicated to cutaneous cues) has 25 × 3 × 1 data in its input layer. Thus, it can be implemented with 2D kernels in convolution layers. We set up a similar architecture for the second CNN, i.e., two convolution layers with a batch normalization in between followed by two fully connected layers, and a SoftMax layer generating the probability distribution results predicted for the output layer.

Each of these CNNs are applied after training to tactile data captured over identical contours as a test sample and output the probabilities that the sample belongs to each of the eight object classes. The winning class has the highest probability among all. In order to integrate the results obtained by cutaneous and kinesthetic cues, for each probing sequence from the test data, we sum up the obtained probability values computed by the two CNNs for each class and choose the class with the highest probability as the winning class, as illustrated in Figure 7.

**Figure 7.** The two convolutional neural network (CNN) structures and the decision on output class.
