Proceeding Paper

A Visuo-Haptic Framework for Object Recognition Inspired by Human Tactile Perception †

Department of Systems and Computer Engineering, Carleton University, Ottawa, ON K1S 5B6, Canada
* Author to whom correspondence should be addressed.
Presented at the 5th International Electronic Conference on Sensors and Applications, 15–30 November 2018; Available online: https://ecsa-5.sciforum.net.
Proceedings 2019, 4(1), 47; https://doi.org/10.3390/ecsa-5-05754
Published: 14 November 2018
(This article belongs to the Proceedings of 5th International Electronic Conference on Sensors and Applications)

Abstract

This paper addresses the problem of robotic haptic exploration of 3D objects using an enhanced model of visual attention, where the latter is applied to obtain a sequence of eye fixations on the surface of objects that guides the haptic exploratory procedure. According to psychological studies, somatosensory data arising in response to surface changes sensed by the human skin are used in combination with kinesthetic cues from muscles and tendons to recognize objects. Drawing inspiration from these findings, a series of five sequential tactile images is obtained for each object, from various viewpoints, during an exploration process in which the size of the sensor surface is adaptively changed according to the object geometry. We take advantage of the contourlet transform to extract several features from each tactile image. In addition to these somatosensory features, other kinesthetic inputs, including the probing locations and the angle of the sensor surface with respect to the object in consecutive contacts, are added as features. The dimensionality of the large feature vector is then reduced using a self-organizing map. Overall, 12 features from each sequence are concatenated and used for classification. The proposed framework is applied to a set of four virtual objects, and a virtual force-sensing resistor (FSR) array is used to capture tactile (haptic) imprints. Trained classifiers are tested to recognize data from new objects belonging to the same categories. Support vector machines yield the highest accuracy of 93.45%.

1. Introduction

Many psychological studies over the past three decades have focused on the haptic perception and exploratory procedures employed by humans to identify objects and their characteristics. Lederman and Klatzky [1] identified six manual exploratory procedures exploited by humans when interacting with an object, among which “enclosure” and “contour following” provide information about the global and exact shape of objects, respectively. The authors also referred to two different sources of information used during these exploratory procedures: mechanoreceptors in the skin (cutaneous cues), which capture fine textural details, and mechanoreceptors in joints and tendons (kinesthetic cues), which support geometrical shape identification. Reproducing such haptic exploration techniques for humanoid robots has recently attracted wide research interest. On the other hand, tactile perception has been shown to be more reliable in the presence of vision [2], and these two senses cooperate closely in the human sensorial loop. Inspired by the visuo-haptic interaction in the human sensorial loop and by the sequential nature of haptic exploration, which integrates several tactile features of objects, in this work we developed a framework for robotic object recognition based on tactile probing at a sequence of eye fixations on an object’s surface. Accordingly, an enhanced model of visual attention was employed to determine a sequence of eye fixations on the surface of objects from different viewpoints. Subsequently, tactile data were collected by adaptively changing the sensor surface size according to the local object geometry. These cutaneous cues, together with the three-dimensional (3D) coordinates of the probing locations and the normal vector of the object’s surface at the probing locations as two kinesthetic cues, were used for object recognition. To confirm the efficiency of our framework before implementation, we performed experiments using a virtual tactile sensor and virtual 3D models. The paper is organized as follows: a brief literature review of related work on tactile object recognition is provided in Section 2. Section 3 presents the proposed framework. The determination of the sequence of eye fixations and the adaptive collection of tactile data are discussed in Section 3.1 and Section 3.2, respectively. Section 3.3 discusses feature extraction, while Section 3.4 presents the kinesthetic cues employed in object recognition. The data processing strategy is introduced in Section 3.5. Finally, Section 4 presents the obtained results and Section 5 concludes the work.

2. State of the Art

Endowing humanoid robots with a capability to sense their environment similar to that of humans is a challenging subject in recent robotics research. An artificial sense of touch, allowing the identification of a wide range of object characteristics including deformability, elasticity, textural features, temperature, and approximate weight, is a beneficial technology attracting considerable research interest. Accordingly, a variety of tactile sensors have been designed and manufactured. In a recent publication, Chi et al. [3] discussed the latest advancements in tactile sensor technology. Tactile arrays developed using force-sensing resistors have demonstrated high reliability for object recognition. Liu et al. [4] used a three-finger robot to collect tactile sequences. Dynamic time warping between sequences was then computed to measure dissimilarities for classification based on joint sparse coding. The authors of Reference [5] captured tactile data as the displacement of the finger joints of a robot when grasping an object. They trained a self-organizing map (SOM) for classification purposes. Luo et al. [6] employed the 3D coordinates of probing locations together with the indices of tactile data clustered by a k-means algorithm for object classification. Gorges et al. [7] took advantage of a robotic hand with five fingers to recognize a set of seven objects based on haptic data acquired from a sequence of palpations. Several attempts at integrating haptic and visual data for object recognition have also been reported in the literature. Gao et al. [8] trained a deep neural network architecture for object classification by learning both haptic and visual features. They demonstrated that the integration of visual and haptic features outperformed the cases where visual and tactile characteristics were employed separately. In our previous work [9], we developed a framework for object recognition in which tactile data were gathered from visually salient regions with the aim of overcoming the high computational cost of probing the whole surface of objects. In the present work, we take advantage of a model of visual attention to provide a sequence of eye fixations that matches the sequential nature of haptic exploration. To reproduce the exploration strategies performed by humans, in which the global shape of an object is generally perceived by the palm and finer details are captured by the fingertips, the size and precision of the tactile sensor change adaptively based on the geometrical characteristics of the studied object.

3. Framework

Figure 1 illustrates our framework for object recognition based on guided haptic exploration. The first two stages of the work correspond to the development of a model of visual attention and the determination of the sequential strategy to move the tactile sensor. For this purpose, the virtual camera of Matlab turns around each object to capture images from 16 viewpoints, and a computational model of visual attention [10] is used to determine the sequence of eye fixations for each viewpoint. The tactile sensor then follows the sequences of eye fixations to collect tactile imprints at their locations. In the next stage, a vector of 16 features is computed for each tactile image. The 3D normal vector and the 3D coordinates of the probing locations add six further tactile features. Consequently, five vectors (as we used a series of five sequential tactile images) of size 1 × 22 are computed for each sequence of eye fixations. To reduce the dimensionality of the feature vectors, a self-organizing map (SOM) is trained, resulting in five-dimensional feature vectors. The standard deviation, root mean square (rms) value, and skewness of each sequence, together with the same measures extracted from the wavelet coefficients of a three-level decomposition of the sequences by Daubechies 2 wavelets, are concatenated and fed for classification to five classifiers: k-nearest neighbors (kNN), support vector machine (SVM), decision trees, quadratic discrimination, and Naïve Bayes. In this work, we conducted experiments over four classes of objects, each containing three objects, two of which were used for training the classifiers while the third was used for testing. Further details are provided in the following sections.
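To make the data flow concrete, the following Python sketch outlines the stages of the framework for a single object (the original implementation relies on Matlab). The helper names used here (render_viewpoints, compute_saliency, select_fixations, backproject, simulate_tactile_imprint, contourlet_features, som_reduce, sequence_features) are hypothetical placeholders for the steps detailed in Sections 3.1 to 3.5, not functions from the paper.

```python
import numpy as np

def recognize(mesh, som, classifier, n_views=16, n_fixations=5):
    """Hypothetical end-to-end sketch: returns one predicted label per viewpoint sequence."""
    sequence_vectors = []
    for view in render_viewpoints(mesh, n_views=n_views):            # 16 viewpoints around the object
        saliency = compute_saliency(view)                             # visual-attention model [10]
        fixations = select_fixations(saliency, k=n_fixations)         # Section 3.1
        contacts = []
        for pixel in fixations:
            location, normal = backproject(mesh, view, pixel)         # 2D fixation -> 3D point and normal
            imprint = simulate_tactile_imprint(mesh, location, normal)  # Section 3.2, 32 x 32 image
            cutaneous = contourlet_features(imprint)                    # Section 3.3, 16 values
            kinesthetic = np.concatenate([location, normal])            # Section 3.4, 6 values
            contacts.append(np.concatenate([cutaneous, kinesthetic]))   # 1 x 22 per contact
        reduced = som_reduce(som, np.asarray(contacts))               # Section 3.5, 5 contacts x 5 codes
        sequence_vectors.append(sequence_features(reduced))           # statistical/wavelet features
    return classifier.predict(np.asarray(sequence_vectors))
```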

3.1. Sequences of Eye Fixations

In the human visual system, the region of the scene to which the narrow, high-resolution part of the retina (the fovea) permitting full visual perception is directed is referred to as the focus of attention. Research in the field of neuroscience has confirmed the contribution of a series of features extracted over the field of view, such as color opponency, contrast, curvature, intensity, and orientation, to the allocation of attentional resources. Accordingly, researchers have tried to reproduce this process as computational models of visual attention for the rapid automatic analysis of scenes. In this work, we adopted the enhanced model of visual attention presented in Reference [10] to compute saliency maps for 3D objects. These saliency maps assign higher intensities to regions attracting attention, based on which a sequence of eye fixations can be retrieved for each viewpoint of the object in order of importance. Only the first five elements of the sequence of eye fixations for each object are taken into consideration in the remainder of the paper. The tactile sensor follows this sequence of eye fixations to collect tactile data for object classification. Figure 2 summarizes the procedure of obtaining the sequence of eye fixations.
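As an illustration of how an ordered set of fixations can be read off a saliency map, the sketch below greedily selects the five most salient pixels while suppressing a small neighbourhood around each selection so that successive fixations fall on distinct regions. The suppression radius is an illustrative assumption; the paper only states that fixations are retrieved in order of importance from the saliency map produced by the model of Reference [10].

```python
import numpy as np

def select_fixations(saliency, k=5, suppress_radius=15):
    """Pick the k most salient pixels of a 2D saliency map, in decreasing order of
    saliency, suppressing a disk around each selected fixation (inhibition of return)."""
    s = saliency.astype(float).copy()
    h, w = s.shape
    ys, xs = np.mgrid[0:h, 0:w]
    fixations = []
    for _ in range(k):
        y, x = np.unravel_index(np.argmax(s), s.shape)
        fixations.append((y, x))
        # Remove the selected region from consideration for the next fixation.
        s[(ys - y) ** 2 + (xs - x) ** 2 <= suppress_radius ** 2] = -np.inf
    return fixations
```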

3.2. Tactile Data Collection Using Adaptive Sensor Size

In force-sensing resistor (FSR)-based tactile sensors, the deformation of the elastic layer covering the FSR array, when subjected to an external force and in contact with the surface of an object, is transduced to produce a tactile image. In this work, a virtual tactile sensor was simulated such that the deformation of the sensor’s surface was measured as the distance between points on a tangential plane (representing the surface of the sensor) and the object, when the center of the plane was in direct contact with the object. The obtained values were then normalized between zero and one to yield a tactile image. The center of the sensor array was positioned at the probing locations corresponding to the eye fixations determined in Section 3.1 and simulated to touch the object at those points. As a result, in certain situations, such as on concave surfaces, negative distances between the object and the sensor surface can occur, indicating an intersection between the sensor and the object. Figure 3 illustrates an example of such a case. Since in such probing cases data cannot be acquired in reality (i.e., using real FSRs with a rigid backing), and in compliance with the human haptic exploration strategy in which the overall shape of objects is probed by the palm (large tactile surface) and finer details are captured by the fingertips (small tactile surface), in the current research the sensor’s surface was adaptively reduced to capture the local tactile data. To keep the size of the tactile image consistent (i.e., 32 × 32 during experimentation), the distance between the sensing points was reduced accordingly, thus resulting in a higher local precision.
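A minimal sketch of the simulated sensor is given below, assuming a hypothetical callback gap_along_normal(points, normal) that returns, for each sampled point of the tangential sensor plane, the signed distance to the object surface measured along the sensor normal (negative values mean the rigid sensor would intersect the object). The shrink factor, initial half-width, and retry count are illustrative choices, not values from the paper.

```python
import numpy as np

def tactile_imprint(gap_along_normal, center, normal, size=32,
                    half_width=10.0, shrink=0.5, max_tries=5):
    """Simulate one size x size tactile image at a probing point, shrinking the
    sensor patch (while keeping the same pixel count, hence finer sampling)
    whenever the rigid sensor plane would intersect the object."""
    # Orthonormal basis (u, v) of the tangential plane at the contact point.
    n = normal / np.linalg.norm(normal)
    u = np.cross(n, [0.0, 0.0, 1.0])
    if np.linalg.norm(u) < 1e-8:          # normal (anti)parallel to the z-axis
        u = np.cross(n, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    for _ in range(max_tries):
        g = np.linspace(-half_width, half_width, size)
        gu, gv = np.meshgrid(g, g)
        points = center + gu[..., None] * u + gv[..., None] * v    # size x size x 3 sample grid
        d = gap_along_normal(points.reshape(-1, 3), n).reshape(size, size)
        if d.min() >= 0:                   # no intersection: valid imprint
            rng = d.max() - d.min()
            return (d - d.min()) / rng if rng > 0 else np.zeros_like(d)
        half_width *= shrink               # concave region: use a smaller, finer patch
    raise RuntimeError("could not find a non-intersecting sensor size")
```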

3.3. Feature Extraction from Tactile Imprints

Feature extraction from tactile data is a determining factor in object recognition. The authors of Reference [11] demonstrated the efficiency of wavelet decomposition for feature extraction from tactile data. On the other hand, the contourlet transform is believed to outperform wavelet decomposition for feature extraction from images [12]. Consequently, in this paper we used the contourlet transform to extract features from tactile imprints. A 16-directional contourlet transform [12] was first applied to each tactile image, and then the standard deviation of the obtained coefficients in each directional sub-band was computed to produce a feature vector of size 16.
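The feature computation itself is simple once a contourlet decomposition is available. The sketch below assumes a decompose callable wrapping any contourlet implementation (for instance, a port of the toolbox accompanying Reference [12]) that returns the coefficient arrays of the 16 directional sub-bands; it is not a library call named in the paper.

```python
import numpy as np

def contourlet_features(tactile_image, decompose):
    """Cutaneous feature vector of Section 3.3: one standard deviation per
    directional sub-band of a 16-directional contourlet decomposition.
    `decompose` is an assumed wrapper returning the list of sub-band arrays."""
    subbands = decompose(tactile_image)          # expected: 16 coefficient arrays
    return np.array([np.std(band) for band in subbands])
```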

3.4. Kinesthetic Cues

Human skin is not the only source of tactile information. When exploring an object with the hand, the angles between finger phalanges and the positions of the fingers in contact with the object surface supply crucial information about the size and shape of the explored object, information that is not available from skin mechanoreceptors. In neuroscience research, such data from joints, bones, and muscles are referred to as kinesthetic cues. In this paper, to employ kinesthetic cues for object recognition, we added the 3D coordinates of the probing locations as well as the local normal vectors at the probing points as features for classification.
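For a virtual triangle-mesh model, both kinesthetic cues are directly available from the geometry. The sketch below shows one common way of estimating the local surface normal at a probing vertex by averaging the normals of its incident faces, together with the concatenation of the six kinesthetic values; the paper only states that the local normal and the 3D coordinates are used, so the normal-estimation step is an assumption for illustration.

```python
import numpy as np

def vertex_normal(vertices, faces, idx):
    """Estimate the surface normal at vertex `idx` of a triangle mesh by summing the
    (area-weighted) normals of its incident faces. Assumes consistent face winding."""
    n = np.zeros(3)
    for a, b, c in faces:
        if idx in (a, b, c):
            n += np.cross(vertices[b] - vertices[a], vertices[c] - vertices[a])
    return n / np.linalg.norm(n)

def kinesthetic_features(position, normal):
    """The six kinesthetic values of one contact: probing coordinates and local normal."""
    return np.concatenate([np.asarray(position, float), np.asarray(normal, float)])
```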

3.5. Data Processing

As previously explained, we use 16 features extracted from each tactile imprint in addition to the 3D coordinates and the normal vector of the probing points, thus resulting in a final feature vector of size 1 × 22. Five consecutive imprints captured over the sequence of eye fixations are used for classification. As high-dimensional data have a negative impact on classification accuracy, we train a self-organizing map to reduce the 22-dimensional feature vectors to only five dimensions, thus resulting in five sequences (one per reduced feature) of length five (one value for each of the five tactile imprints in the sequence of eye fixations). To train a classifier, we need to characterize how each feature varies over the sequence as the tactile sensor moves. For this purpose, we take advantage of Daubechies 2 wavelets to decompose each sequence into three levels. The standard deviation, rms value, and skewness of the wavelet coefficients for each level, as well as those of the sequence itself, are concatenated to produce the final feature vector for classification.
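A sketch of this processing stage is given below using the MiniSom and PyWavelets packages (the original implementation is in Matlab). The paper does not specify how the trained SOM produces a five-dimensional code; here a map of five neurons is assumed and the reduced representation of a contact is its distance to each neuron weight vector. Likewise, the exact set of wavelet sub-bands retained and the SOM size, learning parameters, and iteration count are assumptions, and the three-level Daubechies 2 decomposition is capped at the maximum level PyWavelets allows for very short sequences.

```python
import numpy as np
import pywt
from scipy.stats import skew
from minisom import MiniSom

def stats(x):
    """Standard deviation, rms value, and skewness of a 1D array."""
    x = np.asarray(x, dtype=float)
    return [np.std(x), np.sqrt(np.mean(x ** 2)), skew(x)]

def train_som(contact_vectors, n_codes=5, iterations=5000):
    """Train a SOM on the 1 x 22 per-contact vectors (rows of contact_vectors)."""
    som = MiniSom(1, n_codes, contact_vectors.shape[1],
                  sigma=0.8, learning_rate=0.5, random_seed=0)
    som.train_random(contact_vectors, iterations)
    return som

def som_reduce(som, contact_vectors):
    """Map each 22-dimensional contact vector to a 5-dimensional code given by its
    distance to each of the five SOM neurons (one plausible reading of the reduction)."""
    w = som.get_weights().reshape(-1, contact_vectors.shape[1])                 # 5 x 22
    return np.linalg.norm(contact_vectors[:, None, :] - w[None, :, :], axis=2)  # contacts x 5

def sequence_features(reduced, wavelet="db2", levels=3):
    """Statistical and wavelet features describing how each reduced feature varies
    over the sequence of five contacts."""
    max_level = pywt.dwt_max_level(reduced.shape[0], pywt.Wavelet(wavelet).dec_len)
    level = min(levels, max(1, max_level))   # cap the three-level decomposition for short sequences
    features = []
    for seq in reduced.T:                    # one length-5 sequence per SOM code
        features += stats(seq)
        for coeffs in pywt.wavedec(seq, wavelet, level=level):
            features += stats(coeffs)
    return np.array(features)
```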

4. Experiments and Results

Figure 4 illustrates the 3D models used for experimentation. The objects in the first two columns were used to train classifiers, and the tactile data from objects in the third column were used for testing. As briefly mentioned before, five classifiers, namely kNN, SVM, decision trees, quadratic discrimination, and Naïve Bayes, were trained using the generated data set according to the previously explained data processing strategy.
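Training and evaluation of the five classifiers can be reproduced along the lines of the scikit-learn sketch below (the paper's experiments were run in Matlab). The random arrays only stand in for the real feature matrices and labels of Section 3.5, their shapes are illustrative (60 = 5 SOM codes × 12 features per sequence, as stated in the abstract), and the small QDA regularization is added solely to keep the placeholder data well-conditioned; none of the hyperparameters are values reported in the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB

# Placeholders for the real data: one row per sequence of five fixations, one class label per row.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(128, 60)), rng.integers(0, 4, 128)
X_test, y_test = rng.normal(size=(64, 60)), rng.integers(0, 4, 64)

classifiers = {
    "k-nearest neighbors (kNN)": KNeighborsClassifier(),
    "Support vector machine (SVM)": SVC(),
    "Decision trees": DecisionTreeClassifier(),
    "Quadratic discrimination": QuadraticDiscriminantAnalysis(reg_param=1e-3),
    "Naive Bayes": GaussianNB(),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f"{name}: {100 * clf.score(X_test, y_test):.2f}%")
```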
The results obtained using the proposed framework are compared in Table 1 with the case in which sequential data were captured by randomly moving the sensor over the object surface. When collecting data at random positions over the surface of the object, we also used a sequence of five tactile imprints for each object. As one can notice in this table, the results confirm that, in most cases, guiding the probing with the sequence of eye fixations outperformed acquiring tactile data at random positions over the surface of the objects. SVM demonstrated a superior performance for sequence classification compared to the other classifiers. Quadratic discrimination closely competed with SVM, with a maximum difference of 1.86% between the two.

5. Conclusions

In this work we proposed a framework for tactile object recognition in which a model of visual attention is used to guide sequential tactile exploration. A virtual tactile sensor was simulated to collect tactile data. Inspired by human tactile exploration and object recognition, the size of the tactile sensor was adaptively modified to capture tactile images. We employed the contourlet transform to extract features from tactile imprints. Two kinesthetic cues, the surface normal vectors and the 3D coordinates of the probing locations, were added to provide information about the size and general shape of objects. A self-organizing map was trained to reduce the dimensionality of the feature vectors. A series of features describing each sequence was then extracted to construct the final data set. Five different classifiers were trained and tested using data from a new object. Support vector machines and quadratic discrimination achieved the highest accuracies, of 93.45% and 91.89%, respectively. We thus demonstrated that employing a sequence of eye fixations to guide the tactile probing operation enhances the classification accuracy.

Funding

This work is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery grant program and by the Ontario Graduate Scholarship (OGS) program.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lederman, S.J.; Klatzky, R. Haptic perception: A tutorial. Atten. Percept. Psychophys. 2009, 71, 1439–1459.
  2. Klatzky, R.L.; Lederman, S.J.; Matula, D.E. Haptic Exploration in the Presence of Vision. Hum. Percept. Perform. 1993, 19, 726–743.
  3. Chi, C.; Sun, X.; Xue, N.; Li, T.; Liu, C. Recent Progress in Technologies for Tactile Sensors. Sensors 2018, 18, 948.
  4. Liu, H.; Guo, D.; Sun, F. Object Recognition Using Tactile Measurements: Kernel Sparse Coding Methods. IEEE Trans. Instrum. Meas. 2016, 65, 656–665.
  5. Ratnasingam, S.; McGinnity, T. Object recognition based on tactile form perception. In Proceedings of the IEEE Workshop on Robotic Intelligence in Informationally Structured Space, Paris, France, 11–15 April 2011.
  6. Luo, S.; Mou, W.; Althoefer, K.; Liu, H. iCLAP: Shape recognition by combining proprioception and touch sensing. In Autonomous Robots; Springer: Berlin/Heidelberg, Germany, 2018; pp. 1–12.
  7. Gorges, N.; Navarro, S.E.; Goger, D.; Worn, H. Haptic Object Recognition using Passive Joints and Haptic Key Features. In Proceedings of the IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–7 May 2010.
  8. Gao, Y.; Hendricks, L.; Kuchenbecker, K.J. Deep learning for tactile understanding from visual and haptic data. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–20 May 2016.
  9. Rouhafzay, G.; Pedneault, N.; Cretu, A.-M. A 3D Visual Attention Model to Guide Tactile Data Acquisition for Object Recognition. In Proceedings of the 4th International Electronic Conference on Sensors and Applications, Wilmington, DE, USA, 15–30 November 2017.
  10. Rouhafzay, G.; Cretu, A.-M. Perceptually Improved 3D Object Representation Based on Guided Adaptive Weighting of Feature Channels of a Visual-Attention Model. In 3D Research; Springer: Berlin/Heidelberg, Germany, 2018; Volume 9.
  11. Adi, W.; Sulaiman, S. Using Wavelet Extraction for Haptic Texture Classification. In Visual Informatics: Bridging Research and Practice; Lect. Notes Comput. Sci. 2009, 5857, 314–325.
  12. Do, M.N.; Vetterli, M. The Contourlet Transform: An Efficient Directional Multiresolution Image Representation. IEEE Trans. Image Process. 2005, 14, 2091–2106.
Figure 1. Framework of the introduced tactile object recognition strategy.
Figure 2. Determination of the sequence of eye fixations for each viewpoint.
Figure 3. Adaptive modification of tactile sensor size.
Figure 4. Three-dimensional (3D) models used for training and testing.
Table 1. Object classification results.
Classifier | Guiding the Tactile Sensor by a Sequence of Five Eye Fixations | Using a Random Movement of the Tactile Sensor
K-nearest neighbors (kNN) | 85.58% | 84.94%
Support vector machine (SVM) | 93.45% | 86.14%
Decision trees | 73.56% | 71.63%
Quadratic discrimination | 91.89% | 84.28%
Naïve Bayes | 56.68% | 57.16%
