Article

Vision-Based Tactile Sensor Mechanism for the Estimation of Contact Position and Force Distribution Using Deep Learning

1 Information and Communication Engineering, Inha University, 100 Inharo, Nam-gu, Incheon 22212, Korea
2 VisionIn Inc. Global R&D Center, 704 Ace Gasan Tower, 121 Digital-ro, Geumcheon-gu, Seoul 08505, Korea
* Author to whom correspondence should be addressed.
Sensors 2021, 21(5), 1920; https://doi.org/10.3390/s21051920
Submission received: 29 January 2021 / Revised: 4 March 2021 / Accepted: 5 March 2021 / Published: 9 March 2021

Abstract: This work describes the development of a vision-based tactile sensor system that utilizes the image-based information of the tactile sensor in conjunction with input loads at various motions to train a neural network for the estimation of tactile contact position, area, and force distribution. The current study also addresses pragmatic aspects, such as the choice of thickness and materials for the tactile fingertips, surface tendency, etc. The overall vision-based tactile sensor equipment interacts with an actuating motion controller, force gauge, and control PC (personal computer) running LabVIEW software. The image acquisition was carried out using a compact stereo camera setup mounted inside the elastic body to observe and measure the amount of deformation caused by the motion and input load. The vision-based tactile sensor test bench was employed to collect the output contact position, angle, and force distribution caused by various randomly considered input loads for motion in the X, Y, Z directions and Rx, Ry rotational motion. The retrieved image information, contact position, area, and force distribution from different input loads with specified 3D position and angle are utilized for deep learning. A convolutional neural network VGG-16 classification model has been modified into a regression network model, and transfer learning was applied to suit the regression task of estimating contact position and force distribution. Several experiments were carried out using thick and thin tactile sensors with various shapes, such as circle, square, and hexagon, for better validation of the predicted contact position, contact area, and force distribution.

1. Introduction

Vision-based processing has been a part of inference in many interdisciplinary fields of research [1,2,3]. The usage of vision-based tactile sensors in industrial applications has grown over the past two decades with the rise in the standard of imaging sensors [4,5,6]. Usually, tactile sensors perceive the physical aspects of an object, which in turn guides the handling of the object in terms of the strength applied when interacting with it [7]. On the contrary, visual sensors, such as cameras, do not interact with objects physically. Instead, they retrieve visual cues from the imaging patterns of the objects in various modes [8]. The perceiving capability can be improved by using information, such as visual patterns, adapted force, and contact location, retrieved from the visual sensors without having to interact with the object in a physical manner [9]. This can be made possible using deep learning, which utilizes the data collected from vision sensors, along with parameters such as contact position and force distribution, and trains on it to predict the output parameters in the future [10].

1.1. Background

The vision-based tactile sensing mechanism is developed using the same scheme, where the camera is mounted inside the elastic tactile sensing fingertip. Whenever an object touches the fingertip, the camera captures the transformed grid pattern, which is used to estimate the contact position and force distribution. The correlation between the input load force, contact position, and transformed image captured by the camera sensor can be learned across various scenarios [11]. In this case, the vision-based tactile sensor technology eliminates the need for separate traditional array-type tactile sensor strips, which are usually less durable, impose a large signal processing burden, and are prone to breakage [12]. Furthermore, this type of vision-based tactile sensor is a single-element type with no physical interaction with the elastic body. In the worst-case scenario, the elastic part can be replaced if damaged, but the visual sensor always stays safe [13]. Additionally, indirect contact with the elastic body means the signal processing burden reduces by tenfold even if the detection area increases. The image acquisition process in the context of a vision-based tactile sensor can be observed in Figure 1. The industrial vision-based tactile sensor equipment used in this study is depicted, along with the transformed stereo image pair caused by deformations of the elastic body.

1.2. Problem Statement

  • The problem statement of this study is to estimate the force distribution and contact position parameters with a trained deep learning network, using training data acquired from the visual tactile sensor setup.
The common inference problems that deep learning models are usually trained on are classification and detection problems, which are straightforward: class labels and corresponding training samples are used to predict/detect the target class objects. In this study, the models must be tailored to the problem of estimating continuously varying quantities, such as contact location and force distribution. Therefore, the problem statement for this study focuses on implementing a customized problem-specific regression model through transfer learning on top of a pre-trained deep learning network architecture. This means the training data has to be collected under diverse conditions, such as various input loads with different object shapes, tactile sensor thicknesses, etc. This collected data then has to be paired with the stereo camera samples (which capture the deformation of the elastic body) in terms of right and left images. This collective data has to be properly handled and pre-processed to train the regression network for better prediction of contact position and force distribution, as shown in Figure 2.

1.3. Purpose of Study

The primary purpose of this study is to develop a learned vision-based tactile sensor mechanism that uses indirect contact with the object to estimate the force and contact position of the impact when the object interacts with the elastic body. In this study, deep learning has been utilized as a tool for training the tactile sensing mechanism w.r.t. various parameters, such as images, input loads, contact positions, etc. As an underlying study, aspects such as the development of the tactile fingertips and the optimal setup of the compact stereo system were detailed for practical purposes. Accordingly, issues such as the materials used in the manufacturing of tactile fingertips and their relative thickness were discussed to enable the readers to understand the employed test bench equipment in detail. The usage of deep learning as a training and testing tool has been clearly described, and the implementation details were explained to show how to customize a domain-specific classification network model into a problem-specific use-case network model. In other words, this work focuses on detailing the transfer learning of a domain-specific classification pre-trained network model, such as VGG16 [14], to deal with the regression problem of estimating the contact position and force distribution. In addition, this work illustrates simple yet effective data pre-processing techniques that can enhance visual-tactile activity detection by a significant degree. The main contributions are as follows:
  • employing deep learning for the transfer learning of VGG16 classification pre-trained network model; and
  • validating the vision-based tactile sensor system to examine the estimation of contact position, contact area, and force distribution using thick and thin tactile sensors with various shapes.
The paper is organized as follows. Section 2 thoroughly discusses the previous works and their characteristics regarding the usage of computer vision/deep learning in vision-based tactile sensor technology. Section 3 explains the overall materials and methodologies utilized in this study. All the aspects, such as overall system installation, stereo camera setup, manufacturing, and practical issues, related to the tactile fingertips, deep learning network architecture, and transfer learning methodology are detailed in this section. Section 4 describes the tactile sensor experiments and related evaluation metrics. Section 5 reports the results and related discussions based on the applied deep learning methodology to estimate the tactile contact position and force distribution. Finally, Section 6 concludes the paper with a summary.

2. Literature Review

2.1. Vision-Based Tactile Sensor Technology

The practice of employing camera sensors to estimate the contact position and force distribution has been actively researched over the past decade [15]. The vision sensors are compactly embedded in the tactile sensing mechanism such that the deformations in the elastic body are transformed into tactile force and contact position-based information [16]. With the increase in the pixel resolution of visual sensors, the vision-based tactile sensitivity has also improved. Researchers have employed image processing and computer vision techniques to measure the force and displacement of markers [17]. The patterns on the deformed materials are analyzed using low-level image processing algorithms and support vector machines [18], and some studies even approached the problem of determining the contact force and tactile location from a machine learning perspective [19]. Some other studies adapted the usage of dynamic vision sensors and depth sensors for tactile sensing [20]. With the accessibility of compact circuit technologies and high spatial resolution vision systems, some studies were able to report 3D displacement in tactile skins [21]. A few other works tried to embed multiple camera sensors inside the tactile sensor to retrieve the best possible internal tactile force fields [22]. On the other hand, there has been an appeal and enthusiasm towards learning-based approaches inculcating deep learning for the estimation of tactile information [23]. The vision-based tactile sensing mechanism can be typically classified into two approaches: traditional image processing/computer vision-based methods and learning-based methods. In traditional image processing/computer vision methods, various low-level image manipulating techniques are employed to enhance the images retrieved from the deformation source [24]. Often, the traditional methods work directly on the images retrieved from the input sensor. This enables devising a pipeline that does not require any training data before the inference. On the contrary, the learning-based techniques heavily rely on training data for the enhancement of performance [25].

2.2. Previous Works

In the past decade, a few studies were proposed in the context of using vision-based techniques in the tactile sensing mechanism. Begej et al. [26] pioneered the usage of the vision-based tactile sensor for measuring the contact force and internal reflection. Lepora et al. [27] reported their studies on implementing a super-resolution optical tactile sensor which can localize the contact location, as well as measure the contact force. Ito et al. [28] proposed a method to estimate the slippage degree using a vision-based tactile mechanism with extensive experiments. Yang et al. [29] focused on analyzing the texture of the material using a micro RGB camera in the context of tactile finger instrumentation. A few studies, such as Corradi et al. [30] and Luo et al. [31], used the vision-based tactile mechanism to recognize various objects. There were also a few remarkable studies by Piacenza et al. [32] which accurately estimated the contact position with indentation depth prediction using visual-tactile sensors.
The work from Johnson et al. [33] demonstrating the measurement of surface texture and shape using their photometric stereo technology has gained prominence in the field. Later, these studies were further modified to measure the normal and shear force, as reported in Johnson et al. [34] and Yuan et al. [35]. Learning-based methods were employed by a few researchers, like Kroemer et al. [36] and Meier et al. [37], for the estimation of force exhibited in the tactile behavior. In particular, Meier et al. [37] used convolutional neural networks to detect online slip and rotations. Similarly, Chuah et al. [38] used artificial neural networks (ANN) to improve the accuracy of normal and shear force estimation. They employed an automatic data collection procedure to acquire footpad data while moving through various trajectories. The concept of transfer learning helps speed up the process of adapting learning-based mechanisms to vision-based tactile sensing tasks. There are many studies, such as References [39,40,41], that adapted transfer learning in the context of Convolutional Neural Networks (CNN) to attain better results in terms of determining the force and other tactile aspects. The details of the summarized vision-based tactile sensing techniques are stated in Table 1.

3. Materials and Methods

3.1. System Installation and Flow Schematic

The system installation employed for the vision-based tactile sensor is a combination of multiple systems, such as a motion actuator with a tactile sensor test bench, motion controller, and control personal computer (PC), as depicted in Figure 3a.
  • Motion actuator with vision tactile sensor bench: The motion actuators are used in the test bench to facilitate the motion along the linear (X, Y, Z) and rotational (Rx, Ry) axes. The contact-shaped tool is activated through actuators in order to make contact with the elastic tactile tip which has a camera fixed inside it.
  • Motion controllers: The motors are controlled using the motion controllers which indeed act as a bridge between the motion actuators and control PC. This motion controller considers all the parameters, such as force, contact position, and angle, so that the motion exhibits the desired outcome as expected.
  • Control PC: The control PC is a general personal computer with a LabVIEW GUI which acts as an activity log of the motions, controls, and data acquisition/processing center for the whole system installation. The training/testing data is collected from the test bench stereo camera setup via a USB port. Then, the LabVIEW software is used to accumulate the data with corresponding tactile control parameters for network training/testing.
All these subsystems communicate with one another to form the overall system flow schematic. The force gauge and the tactile sensor come into contact to produce a deformation on the elastic tactile tip, which is thereby recorded as a pattern by the stereo optical system. This mechanism is then controlled and regulated by the motion controller and the control PC with its processing software as a whole. This flow schematic of the visual-tactile sensor mechanism is shown in Figure 3b.

3.2. Development of Tactile Fingertips

3.2.1. Process of Making Tactile Fingertips

Although there are many ways to make tactile fingertips, this study proceeded with an injection mold technique with a defoaming process, as shown in Figure 4a. Before the injection-molded tips, several 3D printing processes were employed to produce tactile fingertips; however, they did not withstand the stress and were torn apart, as shown in Figure 4b. The defoaming process with the injection mold structures (upper, lower) helped the tips withstand the elastic stress imposed by repeated force gauging. However, there are some practical issues involved in the process of making the tactile fingertips. One such issue is the problem of surface light reflection on the inside of the tactile fingertip. The injection mold process opted for by this study posed this issue of light reflection, as shown in Figure 4d. The major concern is that these light reflections will overshadow the deformation patterns inside the tactile fingertips. This could lead to inappropriate optical imagery captured by the stereo camera inside the fingertip. Therefore, the process of sanding was sequentially carried out on the mold surface after the injection process to reduce the light reflections, as shown in Figure 4e.
The reliability of the tactile fingertips is crucial in this study as they are often exposed to a repetitively pressing process to collect the force, contact position, and other tactile-based sensor data. Accordingly, the reliability of the tactile tips can be categorized into physical and visual terms.
  • Physical: The tactile tip must sustain the repetitive stress and must exhibit the same tactility throughout the sensor data acquisition. However, the inside portion of the tactile tip often suffers severely from air bubbles. This problem was encountered in this study, and it was successfully resolved by vacuum degassing the tactile tip during manufacturing. This process is shown in Figure 4f, and it efficiently reduced the air bubbles and offered better endurance to the tactile tips.
  • Visual: The visual reliability of the tactile tip was improved by the marker painting process, as shown in Figure 4g, which helped in the visual recognition of deformation patterns. Initially, white paint was used to draw the markers on the surface of the sensor; however, during the durability test, this paint was not compatible with the tactile sensor rubber material. Therefore, the marker painting is done using the same rubber material but in white color for easy recognition.

3.2.2. Tactile Fingertip Sensor Design Aspects

The tactile sensor fingertip specifications considered in this study are stated below in Table 2.
In the process of making the tactile fingertips, an ablation study was put forward to analyze certain practical aspects, such as which material should be used to make the tips, what the thickness of the tactile tip should be, etc. These questions were investigated using a proper ablation study in terms of tactile touch sensitivity and tactile stability. The force-displacement characteristic plot was constructed to analyze the effect of shore hardness, i.e., the surface hardness of the material, and the thickness of the material. The shore hardness is often measured using the durometer shore hardness scale, denoted as the “Shore 00” hardness scale [44]. For example, if a material is very soft, such as gel, then the shore hardness will be shore 05; and, if it is a hard rubber, such as a shoe heel, then the shore hardness will be shore 100.
  • Shore hardness (surface hardness): Tactile materials with a standard thickness t = 1 mm were considered with different shore hardness scales of 40, 60, 70, and 80. The force-displacement characteristic plots can be observed in Figure 5: with the increase in force, the tactile tip with shore hardness 40 is easily displaced and loses its linearity in terms of elasticity, i.e., the tip with shore hardness 40 is too weak to be used as an elastic body at a force of 1 N. Similarly, with the increase in force, the tactile tip material with shore hardness 60 shows displacement characteristics similar to the shore hardness 40 material, but slightly more linear. In contrast, the comparison between shore hardness 70 and 80 resulted in choosing the optimal shore hardness of 70 for the study experiments, because shore hardness 80 is too insensitive to serve as an elastic material with linearity at the various force steps.
  • Thickness (elastic stability): Materials with the optimal shore hardness of 70 were chosen. Then, thicknesses of t = 1 mm, 1.15 mm, 1.25 mm, and 1.50 mm were investigated with an applied force of 1 N, as shown in Figure 6a, and thicknesses of t = 2.0 mm and 2.5 mm were investigated with an applied force of 10 N, as shown in Figure 6b. At an applied force of 1 N, the material with a thickness of t = 1 mm is suitable for a deformation of 4 mm, and all the rest (t = 1.15 mm, 1.25 mm, 1.50 mm) cannot be used if the expected deformation is 4 mm or higher. For an applied force of 10 N, the material with thickness t = 2 mm collapsed when the force reached 7 N, but the material with thickness t = 2.5 mm is stable at 10 N. This ablation study facilitated the choice of better tactile fingertips for the experiments.

3.3. Stereo Camera System

The stereo camera system is fixed at the bottom of the tactile elastic tip to capture the deformations caused by the tactile contact. A stereo system was chosen in order to acquire better image data from the tactile mechanism. The stereo camera captures both the right image and the left image of the deformation and transfers the image data to the control PC for training/testing purposes. The design setup of the stereo camera system used in this study is shown in Figure 7 below.
The visual-tactile sensor system heavily relies on this stereo camera system for inference in real-time. Therefore, the system must be compact, memory-friendly, and power-efficient. The stereo setup used in this study is compact: the baseline between the right and left camera lenses is a mere 10 mm, with an industry-standard image size of 640 × 480, which is efficient in terms of memory and power consumption. Nevertheless, other image resolutions, such as 1280 × 720 at 30 fps, 640 × 360 at 30 fps, and 320 × 240 at 30 fps, were also examined. The design aspects of the stereo camera system employed in the experiments are stated in Table 3.
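For developers who wish to reproduce a similar acquisition step outside the LabVIEW-based equipment, the sketch below grabs a left/right frame pair at 640 × 480 with OpenCV. The device indices and the assumption that the stereo module enumerates as two standard UVC cameras are hypothetical and not taken from this study.

```python
import cv2

# Hypothetical device indices; the actual stereo module may enumerate differently.
LEFT_CAM_ID, RIGHT_CAM_ID = 0, 1
WIDTH, HEIGHT, FPS = 640, 480, 30

def open_camera(dev_id):
    """Open a camera and request the 640 x 480 @ 30 fps format used in the study."""
    cap = cv2.VideoCapture(dev_id)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, WIDTH)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, HEIGHT)
    cap.set(cv2.CAP_PROP_FPS, FPS)
    return cap

left_cap, right_cap = open_camera(LEFT_CAM_ID), open_camera(RIGHT_CAM_ID)
ok_left, left_img = left_cap.read()
ok_right, right_img = right_cap.read()
if ok_left and ok_right:
    # left_img / right_img are (480, 640, 3) uint8 BGR arrays, ready for ROI cropping.
    print(left_img.shape, right_img.shape)
left_cap.release()
right_cap.release()
```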

3.4. Deep Learning Methodology

The deep learning-based contact position and force measurement algorithm is divided into six steps, which are shown in Figure 8 and described in detail below. The stereo image pair containing the deformation pattern of the elastic tactile tip serves as the input to the algorithm. Both the right and left images show the same deformation pattern but from different perspectives, with a baseline of 10 mm between the two images. Data handling, pre-processing, and transfer learning are the crucial steps involved in the learning algorithm.

3.4.1. Region-of-Interest (ROI) and Mode Selection

The ROI setting was carried out to enable memory management and save the processing power of the GPU. Considering a single input image of 3 channels (RGB) with dimensions 640 × 480, the video input from the left and right cameras via the acquisition equipment in terms of height, width, and channels is 480, 640, 3. Therefore, it is essential to design a region of interest that suits both the left and right images. Accordingly, a manual ROI area is calculated as per the video input specifications to be the same for the whole stereo pair data. The ROI setting for the rows is 24–216 pixels, the ROI setting for the columns is 47–271 pixels, and the cropped area size is (192, 224), which maximizes the GPU memory savings. The ROI design is shown in Figure 9a, which is the same for both the right and left images.
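The fixed ROI crop amounts to plain array slicing; the sketch below only encodes the row/column bounds quoted above and is not taken from the authors' code.

```python
import numpy as np

# ROI bounds from the text: rows 24-216 and columns 47-271 of a 480 x 640 frame.
ROW_SLICE = slice(24, 216)   # 216 - 24 = 192 rows
COL_SLICE = slice(47, 271)   # 271 - 47 = 224 columns

def crop_roi(frame: np.ndarray) -> np.ndarray:
    """Crop the fixed ROI from a (480, 640, 3) frame, giving a (192, 224, 3) patch."""
    return frame[ROW_SLICE, COL_SLICE]

left_roi = crop_roi(np.zeros((480, 640, 3), dtype=np.uint8))
right_roi = crop_roi(np.zeros((480, 640, 3), dtype=np.uint8))
assert left_roi.shape == (192, 224, 3) and right_roi.shape == (192, 224, 3)
```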
The mode selection is a customized procedure designed to test the best possible input feed to the neural network for better results. This procedure involves the selection of the data as per the different modes, as shown in Figure 9b, which are then fed into the neural network as input. Although four modes were put forward, only the mode that performs best during training (Mode 1) is considered for inference. The four modes are listed below, followed by a preprocessing sketch.
  • Mode-0: This mode will only consider the left image from the stereo pair as an input to the neural network.
  • Mode-1: This mode will concatenate left and right gray images per channel and input them to the neural network.
  • Mode-2: This mode will consider the left image binarized to enhance lighting and feed it to the neural network as input.
  • Mode-3: This mode will concatenate the left and right images binarized for each channel to enhance lighting and feed it to the neural network as input.
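The sketch below illustrates one plausible way of assembling the four input modes from an ROI-cropped stereo pair; the binarization threshold and the exact channel layout are assumptions, since the text does not specify them.

```python
import cv2
import numpy as np

def to_gray(img):
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

def binarize(img, thresh=128):
    """Binarize a colour image; the threshold value of 128 is an assumption."""
    _, binary = cv2.threshold(to_gray(img), thresh, 255, cv2.THRESH_BINARY)
    return binary

def build_input(left, right, mode=1):
    """Assemble the network input for one ROI-cropped stereo pair of uint8 images."""
    if mode == 0:                                   # Mode-0: left colour image only
        return left
    if mode == 1:                                   # Mode-1: left/right grey images as channels
        return np.dstack([to_gray(left), to_gray(right)])
    if mode == 2:                                   # Mode-2: binarized left image
        return binarize(left)[..., np.newaxis]
    if mode == 3:                                   # Mode-3: binarized left/right as channels
        return np.dstack([binarize(left), binarize(right)])
    raise ValueError("mode must be 0, 1, 2, or 3")
```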

3.4.2. Zero Centering and Scaling

The image data must be properly fused with the coupled tactile sensing data and analyzed so that the network can train on the insights of the data. Although the video input stereo images received from the equipment are already pre-processed by cropping to a specific ROI to optimize memory and power, the image data needs further processing so that it can be fused with the tactile data, which is expressed in terms of force, contact location, contact angle, etc. In other words, for the deep learning network to converge well during the learning process, each uint8 (8-bit unsigned integer) [0–255] image is scaled to [0, 1] and normalized to [−1, 1] by zero-centering. The reason for performing zero centering and scaling is that the attributes to be predicted differ in terms of units and ranges, such as displacements along X, Y, Z (in mm), force (in N), and rotation angle Ra (in degrees). Therefore, zero centering and scaling are essential for the network to learn the insights of the image data in correspondence with the tactile parametric data. The zero-centered and scaled data x̃_ij is a function of the original data x_ij normalized between the minimum (min_j) and maximum (max_j) points, as shown in:
$\tilde{x}_{ij} = \dfrac{x_{ij} - \mathrm{min}_{j}}{\mathrm{max}_{j} - \mathrm{min}_{j}},$
where i is the data index, j is the attribute index, x_ij is the jth attribute of the ith data sample, max_j is the maximum value of the jth attribute over the training data, and min_j is the minimum value of the jth attribute over the training data.
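A minimal sketch of this scaling step is given below; the function names are illustrative, and the image normalization simply mirrors the [0, 1] scaling followed by zero-centering to [−1, 1] described above.

```python
import numpy as np

def fit_minmax(train_targets: np.ndarray):
    """Per-attribute min/max computed from the training targets (n_samples, n_attributes)."""
    return train_targets.min(axis=0), train_targets.max(axis=0)

def scale_targets(targets: np.ndarray, att_min: np.ndarray, att_max: np.ndarray):
    """Min-max scale each attribute (X, Y, Z, F, Ra, ...) to [0, 1], as in the equation."""
    return (targets - att_min) / (att_max - att_min)

def scale_image(img_u8: np.ndarray) -> np.ndarray:
    """Scale a uint8 image to [0, 1] and zero-centre it to [-1, 1]."""
    return img_u8.astype(np.float32) / 255.0 * 2.0 - 1.0
```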

3.4.3. Network Architecture

The convolutional neural network model used in this study was adopted from the well-known VGG16 structure. Often, the VGG16 model structure is exploited to acquire better accuracy for object classification tasks in computer vision and AI domains. However, the task that this study has to accomplish is to predict continuously varying parameters, such as force, contact position, angle, etc. These parameters are continuous values that cannot be modeled as a classification task. The customized convolutional neural network model consists of a total of 16 deep layers, including the input layer. The input is fed to the neural model and passes through the 16 deep layers, along with 5 max pooling layers. The first 2 layers of the network consist of 64-channel convolution filters of size 3 × 3 with stride 1, each followed by batch normalization and a Rectified Linear Unit (ReLU) activation function. Max pooling of size 2 × 2 with stride 2 is used after the second convolution layer; the max pooling used throughout the model has this standard configuration of size 2 × 2 with stride 2. The next 2 convolution layers use 128-channel convolution filters of size 3 × 3 with stride 1, followed by batch normalization and a ReLU activation function. A max pooling layer is used after the fourth convolution layer. The next 3 convolution layers consist of 256-channel convolution filters of size 3 × 3 with stride 1, followed by batch normalization and a ReLU activation function. A max pooling layer is used after the seventh convolution layer. The next 6 convolution layers contain 512-channel convolution filters of size 3 × 3 with stride 1, followed by batch normalization and a ReLU activation function. Max pooling layers are used after the tenth and thirteenth convolution layers. The last two layers are dense fully connected layers with 4096 units each. To prevent overfitting, a dropout of 0.9 was used. The output of the fully connected dense layers is used for the regression, as shown in Figure 10.
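As an illustration of this transfer-learning setup, the sketch below builds a VGG16-backed regression head with seven outputs in TensorFlow/Keras. It is not the authors' exact model: the stock keras.applications backbone lacks the batch normalization layers described above, a 3-channel input is assumed (whereas Mode 1 feeds two grayscale channels), and the optimizer and loss are illustrative choices.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_OUTPUTS = 7  # F, D, X, Y, Z, Rx, Ry

# Stock VGG16 backbone pre-trained on ImageNet, frozen for the first stage of transfer learning.
backbone = tf.keras.applications.VGG16(
    include_top=False, weights="imagenet", input_shape=(192, 224, 3))
backbone.trainable = False

model = models.Sequential([
    backbone,
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.9),                              # dropout of 0.9 as reported in the text
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.9),
    layers.Dense(NUM_OUTPUTS, activation="linear"),   # regression outputs instead of class scores
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
```

Unfreezing some of the deeper backbone blocks for fine-tuning after an initial training stage is a common follow-up step in this kind of transfer learning.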

3.5. Contact Area Estimation

The contact area estimation is designed to use the images acquired from the stereo camera to estimate the 2D contact area using a naive computer vision methodology, as depicted in Figure 11. The contact area estimation was put forth to analyze the effect of sensor shapes on the contact area. Similarly, the ground truth of the known sensor tips was employed to investigate the errors in the estimated area.
The input frame is used to identify the deformations on the elastic tip, and the keypoints are detected using image processing techniques, such as image segmentation and blob analysis [45]. These keypoints are then used to calculate the radii (r) depending upon the shape of the contact tool (l) used. The features are then used as a dataset to apply Gaussian regression to get the contact area, as shown in:
$x = [r_1, r_2, r_3, l],$
where r_1, r_2, r_3 are the radii from the center to the keypoints; l is the shape of the contact tool, such as circle, square, or hexagon; and x is the feature vector.
$y = h(x)^{T}\beta + f(x),$
where f(x) is a function drawn from a zero-mean Gaussian process, h(x) is the transform (basis) function, β is the hyperparameter vector, and f, h are learned in the training process.
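A sketch of this Gaussian-process regression step using scikit-learn is shown below; the kernel choice, the numeric feature values, and the integer encoding of the tool shape l are assumptions made for illustration only.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Hypothetical training features x = [r1, r2, r3, shape_code], with shape_code
# encoding the contact tool (0 = circle, 1 = square, 2 = hexagon).
X_train = np.array([
    [4.9, 5.0, 5.1, 0],
    [5.6, 5.7, 5.5, 1],
    [4.6, 4.7, 4.5, 2],
])
y_train = np.array([78.5, 100.0, 65.0])   # contact areas in mm^2 (illustrative values)

gpr = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gpr.fit(X_train, y_train)

# Predict the contact area for radii measured from newly detected keypoints.
x_new = np.array([[5.0, 5.0, 5.0, 0]])
area_mean, area_std = gpr.predict(x_new, return_std=True)
```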

4. Experiments and Evaluations

4.1. Dataset Used

The tactile contact force gauge equipment used for the collection of data is shown in Figure 12. The data retrieved from the equipment is used to construct the training, validation, and testing datasets. The collected data is transferred to the control PC via a USB port and then processed using the LabVIEW GUI on the PC. Figure 12b shows the log of all the sensor data (X, Y, Z, Rx, Ry) recorded simultaneously with the stereo images. This GUI records the timestamp of the data, which is used to fuse the tactile data with the stereo images. Variously shaped contact tools were employed in the experiments to obtain the force and contact location.
The dataset used in the network training is divided into training, validation, and testing sets, as shown in Table 4. Data01 and Data02 are two splits of the data, separated according to sensor size (thin, thick). Each split of the data is internally divided into training, validation, and testing. In Table 4, the training, validation, and testing counts are given per point, because the data is acquired by applying diverse force levels starting from 0.1 N to 1 N with an interval of 0.1 N; the totals are therefore the per-point counts multiplied by the 10 force levels. For each force-applied point, Data01 contains (2 × 3380) training samples, (2 × 1680) validation samples, and (2 × 1690) testing samples, where the factor 2 accounts for the stereo pair. Similarly, Data02 contains (2 × 2730) training samples, (2 × 910) validation samples, and (2 × 910) testing samples. On the whole, the total images used are 122,200 for training, 51,800 for validation, and 52,000 for testing.
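One plausible way to pair the stereo frames with the logged tactile attributes by timestamp is sketched below; the file names, directory layout, and column labels are hypothetical and only illustrate the pairing step.

```python
import pandas as pd

# Hypothetical export of the LabVIEW log: one row per recorded point with a timestamp
# and the tactile attributes; column names are assumptions.
log = pd.read_csv("tactile_log.csv")   # columns: timestamp, F, X, Y, Z, Rx, Ry

def make_samples(log_df: pd.DataFrame, image_dir: str = "frames"):
    """Pair each log row with its left/right frame files and a target attribute vector."""
    samples = []
    for _, row in log_df.iterrows():
        ts = row["timestamp"]
        samples.append({
            "left": f"{image_dir}/left_{ts}.png",
            "right": f"{image_dir}/right_{ts}.png",
            "target": row[["F", "X", "Y", "Z", "Rx", "Ry"]].to_numpy(dtype=float),
        })
    return samples

samples = make_samples(log)
```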

4.2. Training Details

The training is carried out with several aspects incorporated into the data, such as considering different data splits with various modes under diverse sensor sizes (thin and thick).
The training sessions were carried out on Data01 and Data02 and evaluated using the validation data at each iteration. This validation approach is used to prevent the network from overfitting, and the best model is then saved as the final trained network. The models were also trained under various sensor sizes, such as the thin sensor and the thick sensor, with induced forces of 1 N and 10 N, respectively. The model trained on Data01 with Mode 1 (the stereo pair, i.e., both right and left images), acquired with the thin sensor, exhibited the best accuracy. The graphs in Figure 13 represent various training aspects, such as validation over Force (F), Displacement (D), Position (X, Y, Z), and Rotations (Rx and Ry). The seven charts in the figure are the results of experiments on validation data for the 7 attributes [F, D, X, Y, Z, Rx, Ry]. Avg err is the average error over all 7 attributes, which should be as low as possible for a well-trained model. The three graphs in the bottom row of Figure 13 represent the data loss, the regularization term, and the total loss in the learning process. Similarly, the analysis of the training process using the Data02 split with Mode 1 samples is shown in Figure 14.
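A minimal sketch of this validation-driven model selection is shown below, assuming a compiled Keras model and tf.data pipelines train_ds and val_ds of (image, 7-attribute target) pairs; the epoch count and patience are illustrative, not the authors' settings.

```python
import tensorflow as tf

# Keep only the weights that achieve the lowest validation loss.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.h5", monitor="val_loss", save_best_only=True)
# Stop early if the validation loss stops improving, restoring the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=100,
    callbacks=[checkpoint, early_stop],
)
```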

4.3. Testing Evaluations

The performance evaluations were carried out for all the testing scenarios (contact force, contact position displacement, contact position rotation, and contact area estimation) using several metrics, such as error rates, full scale output, and average error. The tests were carried out exhaustively using variously shaped tools, force levels, sensor sizes, and displacements, as shown in Figure 15.

4.3.1. Testing Scenario-1: Force Distribution Estimation

The testing scenario for the force distribution estimation is carried out 10 times, each time with 10 force steps ranging from 0.1 N to 1 N. The testing performance of the trained system in predicting the force (in N) is evaluated by calculating the error between the applied force and the estimated force value. The evaluation metric named Full Scale Output (FSO), in %, is calculated to quantify the performance of the predicted force, as shown in:
$FSO\,[\%] = \dfrac{|F_{in} - F_{pred}|_{max}}{|F_{in}|_{max}} \times 100,$
where F_in is the applied input force in Newtons (N), F_pred is the force predicted by the trained neural network in Newtons (N), |F_in − F_pred|_max is the maximum value of the difference between the actual and predicted force, and |F_in|_max is the maximum value of the applied force.
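The FSO metric can be computed directly from the arrays of applied and predicted forces; the sample values below are hypothetical.

```python
import numpy as np

def fso_percent(f_in: np.ndarray, f_pred: np.ndarray) -> float:
    """Full Scale Output error in %, following the FSO definition above."""
    max_abs_error = np.max(np.abs(f_in - f_pred))
    full_scale = np.max(np.abs(f_in))
    return max_abs_error / full_scale * 100.0

# Hypothetical example: worst-case error of 0.08 N over a 1 N full scale gives 8 %.
applied = np.array([0.1, 0.2, 0.5, 1.0])
predicted = np.array([0.18, 0.22, 0.51, 0.99])
print(fso_percent(applied, predicted))   # 8.0
```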

4.3.2. Testing Scenario-2: Contact Point (Displacement) Estimation along the Linear X-Axis, Y-Axis, and Z-Axis

The contact point position (displacement) along the X-axis, Y-axis, and Z-axis is estimated by the trained neural network, and the testing accuracy is calculated from the error between the original displacement along X, Y, Z and the estimated displacement along X, Y, Z. The testing evaluations were carried out as follows:
  • Along Z-axis: The force is applied in the Z-direction from 0.1 N to 1 N with a 0.1 N interval, such that a total of 10 tests were conducted. The difference between the original position along the Z-axis and the estimated one is recorded as the error, and the average error over the 10 tests is calculated to evaluate the performance of the prediction.
  • Along X-axis: For evaluating the displacement along the X-axis, the force is applied in intervals of 0.1 N from 0.1 N to 1 N with a 1-mm displacement step along the X-axis, keeping the Y-axis displacement at 0. Therefore, the testing is done for (X = −6 mm to +6 mm, with a 1 mm step interval, 13 points in total, constant Y = 0). The difference between the original position along the X-axis and the estimated one is recorded as the error, and the average error over the 13 points is calculated to evaluate the performance of the prediction.
  • Along Y-axis: For evaluating the displacement along the Y-axis, the force is applied in intervals of 0.1 N from 0.1 N to 1 N with a 1-mm displacement step along the Y-axis, keeping the X-axis displacement at 0. Therefore, the testing is done for (Y = −6 mm to +6 mm, with a 1 mm step interval, 13 points in total, constant X = 0). The difference between the original position along the Y-axis and the estimated one is recorded as the error, and the average error over the 13 points is calculated to evaluate the performance of the prediction.
The evaluation metric used to evaluate these displacements along the X, Y, and Z axes quantifies the performance of the predicted displacement using the mean absolute error (MAE), as shown in:
$MAE = \dfrac{1}{n}\sum_{i=1}^{n} |d_{orig} - d_{est}|,$
where n is the number of tests/points performed, d_orig is the original displacement value, and d_est is the displacement value estimated by the neural network.
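The MAE computation is equally direct; the example values below are hypothetical and only loosely echo the magnitudes reported in Appendix A.

```python
import numpy as np

def mean_absolute_error(d_orig: np.ndarray, d_est: np.ndarray) -> float:
    """MAE between original and estimated displacements (or angles), in the same unit."""
    return float(np.mean(np.abs(d_orig - d_est)))

# Hypothetical X-axis test: a 3 mm ground-truth contact point at several force levels.
x_orig = np.full(4, 3.0)
x_est = np.array([0.94, 1.57, 3.06, 3.79])
print(mean_absolute_error(x_orig, x_est))   # 1.085 mm
```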

4.3.3. Testing Scenario-3: Contact Angle Estimation along the Rotational Rx, Ry Axes

The contact angle estimation along the rotational axes Rx, Ry is evaluated using 10 tests in which the force is applied from 0.1 N to 1 N with a 0.1 N interval. The tests were performed such that the original angle along the rotational axes Rx, Ry is set to 45°. The mean absolute error (MAE) is calculated between the estimated and original angle, as shown in:
$MAE = \dfrac{1}{n}\sum_{i=1}^{n} |Ra_{orig} - Ra_{est}|,$
where n is the number of tests/points performed, Ra_orig is the original angle of 45°, and Ra_est is the angle value estimated by the neural network. The sensor is rotated along the X-axis and Y-axis to a calibrated value of 45°, which is considered to be the ground-truth for the rotational test scenarios. The system installation heavily influences the performance under rotational motions. Therefore, a constructive ground-truth of 45° is calibrated so as to prevent system installation issues.

4.3.4. Testing Scenario-4: 2D Contact Area Estimation

The testing for the 2D contact area estimation was carried out using the variously shaped contact tools that are used to contact the elastic tactile tip. The contact area estimates are derived from the Gaussian regression process described earlier. The ground truth (GT) of the contact area is fixed by the tool used to make contact, and it is used to calculate the error between the estimated area and the GT. The performance is evaluated by the error rate in (%), as shown below:
$Error\ rate\,[\%] = \dfrac{CA_{GT} - CA_{est}}{CA_{GT}} \times 100,$
where CA_GT is the ground truth contact area, and CA_est is the estimated contact area from the Gaussian regression.

5. Results and Discussions

5.1. Force Distribution Estimation

The force estimation carried out using the trained network was validated using 10 different tests, each of which was recorded within a force range of 0.1 N∼1 N. The estimation errors were recorded in N and were used to calculate the FSO (%) scores. The force estimation errors recorded in all 10 tests are depicted in Table 5. The overall average error across the 10 tests is around 0.022 N, which is accurate enough for the system to rely on these estimations for future predictions.
The FSO (%) scores of all the 10 tests are plotted in Figure 16, and the average FSO (%) score seemed promising, within the force range of (0.1 N∼1 N). The entire data samples, and their corresponding estimation errors for each iteration (test) and their averages, FSO (%) scores, etc., are presented in Table A1 in Appendix A.

5.2. Contact Position Estimation w.r.t X,Y,Z Axes

The contact position displacement errors were calculated for each force measure ranging from 0.1 N∼1 N with respect to each ground truth value in X and in Y, spanning from −6 mm∼+6 mm. The displacement errors covering all the possible test ranges are clearly depicted in Table A2 and Table A3 in Appendix A. Table A2 in Appendix A represents the test results in terms of the displacement error in the contact position along the X-axis. Similarly, Table A3 in Appendix A represents the test results in terms of the displacement error in the contact position along the Y-axis. The average displacement error readings corresponding to the 13-point ground truth values (−6 mm∼+6 mm) over the force range (0.1 N∼1 N) are shown in Figure 17.
The contact position displacement error along the Z-axis was calculated by evaluating the estimation values of Z-axis displacement for a given force value ranging from 0.1 N∼1 N. The overall estimation error w.r.t force values are depicted in Figure 18.
The results of the contact position displacement estimation in the X, Y, Z axes revealed the performance of the network in predicting the position estimates. The estimation error along the X and Y axes is greater than that along the Z-axis. The reason is that motion in X and Y inherently requires motion in Z as well; therefore, even while acquiring X and Y data, the underlying Z data keeps feeding into the system.

5.3. Contact Angle Estimation w.r.t the Rotational Rx, Ry Axes

The contact position estimation in terms of angular displacement was calculated through a series of tests within the force range of 0.1 N∼1 N. The sensor is rotated to a fixed angle of 45°, and the tests were performed with the contact tool touching the inclined sensor. The reason for the calibrated ground-truth fixed angle of 45° is discussed in Appendix A and is depicted clearly in Appendix A, Figure A1a. The trained neural network was able to predict/estimate the angular displacement in the contact position. The results of the estimated displacement w.r.t. each force value are depicted in Figure 19.

5.4. Contact Area Estimation

The contact area estimation is carried out using the image processing algorithms and Gaussian regression. The estimated contact area is cross-checked with the ground truth corresponding to the variously shaped contact tools. The corresponding results are reported in Table 6. Figure 20 illustrates the estimation errors w.r.t. the circular tool with ground truth (GT = 78.54 mm²), the square tool with (GT = 100.00 mm²), and the hexagonal tool with (GT = 64.95 mm²).
Different numbers of samples were considered for each tool shape for the testing: circular (n = 20), square (n = 18), and hexagonal (n = 18). The results suggest that the estimation of the contact area in the case of the hexagonal tool is more prone to errors. However, on the whole, the total average error is 1.429% over all the contact tool shapes.

6. Conclusions

This work reports the usage of deep learning-based visual-tactile sensor technology for the estimation of force distribution, contact position displacement along the X, Y, Z directions, angular displacement along the Rx, Ry directions, and contact area. The current study also reports the design aspects, such as the choice of thickness and materials used for the tactile fingertips, encountered during the development of the tactile sensor. The image acquisition was carried out using a compact stereo camera setup mounted inside the elastic body to observe and measure the amount of deformation caused by the motion and input force. Transfer learning has been employed using the VGG16 model as a backbone network. Several tests were conducted to validate the performance of the network in estimating the force, contact position, angle, and area using calibrated ground-truth values within a force range of 0.1 N∼10 N, a position range of −6 mm∼+6 mm, and a fixed angular value of 45°. The tests were also carried out using thick and thin tactile sensors with various shapes, such as circle, square, and hexagon, along with their ground truth areas. The results show that the average estimation errors for force, contact position in X, Y, Z, contact angle, and contact area are 0.022 N, 1.396 mm, 0.973 mm, 0.109 mm, 2.235°, and 1.429%, respectively. However, future work should include improvements in system stability in terms of the tactile sensor sensitivity w.r.t. the reference axes and movements in the vicinity. Nevertheless, the results reported in this study demonstrate the significance of the vision-based tactile sensor using deep learning as an inference tool.

Author Contributions

Conceptualization, V.K., X.C.; Methodology, V.K., X.C.; Validation, V.K., X.C., M.M., H.K.; Formal Analysis, V.K., X.C. and H.K.; Writing—Original Draft Preparation, V.K.; Writing—Review & Editing, V.K., X.C., and H.K.; Visualization, V.K., X.C., and M.M.; Supervision, X.C., H.K.; Project Administration, X.C., H.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Trade, Industry, and Energy grant funded by the Korean government (No. 10080686, Development of fundamental technology of tactile sensor using image information).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank Nam-Kyu Cho and Kwang-Beom Park from Smart Sensor Research Center at Korea Electronics Technology Institute, Seongnam, Korea for their resources and technical support in performing Image-based tactile sensor repetitive reliability testing.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Supplementary Test Results

The trained neural network was employed on the series of ten tests, and the force distribution estimation errors were recorded, along with the FSO (%) score. The test results are recorded and presented in Table A1.
Table A1. Force distribution estimation w.r.t. 10 different tests under force ranges (0.1 N∼1 N). Each entry lists the original force (N), the estimated force (N), and the absolute error (N).

Test-1 | Test-2 | Test-3
0.12, 0.2009, 0.0809 | 0.12, 0.2014, 0.0814 | 0.12, 0.2039, 0.0839
0.22, 0.2776, 0.0576 | 0.21, 0.2610, 0.0510 | 0.22, 0.2857, 0.0657
0.32, 0.3423, 0.0223 | 0.32, 0.3336, 0.0036 | 0.33, 0.3447, 0.0147
0.42, 0.3983, 0.0217 | 0.42, 0.3931, 0.0169 | 0.42, 0.4113, 0.0087
0.51, 0.4772, 0.0328 | 0.51, 0.4828, 0.0272 | 0.50, 0.4751, 0.0249
0.60, 0.5726, 0.0274 | 0.60, 0.5838, 0.0162 | 0.60, 0.5844, 0.0156
0.70, 0.6846, 0.0154 | 0.70, 0.6929, 0.0071 | 0.70, 0.6926, 0.0074
0.78, 0.7696, 0.0104 | 0.78, 0.7685, 0.0115 | 0.78, 0.7736, 0.0064
0.87, 0.8648, 0.0052 | 0.87, 0.8715, 0.0085 | 0.88, 0.8745, 0.0055
0.97, 0.9748, 0.0048 | 0.97, 0.9819, 0.0119 | 0.97, 0.9769, 0.0069
FSO (%): 8.34 | 8.39 | 8.65

Test-4 | Test-5 | Test-6
0.13, 0.2120, 0.0820 | 0.12, 0.2057, 0.0857 | 0.13, 0.2241, 0.0941
0.22, 0.2742, 0.0542 | 0.22, 0.2812, 0.0612 | 0.21, 0.2655, 0.0555
0.33, 0.3468, 0.0168 | 0.32, 0.3473, 0.0273 | 0.33, 0.3508, 0.0208
0.42, 0.3987, 0.0213 | 0.41, 0.3995, 0.0105 | 0.41, 0.4014, 0.0086
0.50, 0.4721, 0.0279 | 0.50, 0.4776, 0.0224 | 0.50, 0.4889, 0.0111
0.59, 0.5803, 0.0097 | 0.60, 0.5962, 0.0038 | 0.60, 0.5917, 0.0083
0.69, 0.6903, 0.0003 | 0.70, 0.6963, 0.0037 | 0.70, 0.7020, 0.0020
0.78, 0.7720, 0.0080 | 0.78, 0.7780, 0.0020 | 0.78, 0.7724, 0.0076
0.87, 0.8693, 0.0007 | 0.88, 0.8770, 0.0030 | 0.88, 0.8778, 0.0022
0.97, 0.9803, 0.0103 | 0.97, 0.9745, 0.0045 | 0.97, 0.9799, 0.0099
FSO (%): 8.45 | 8.84 | 9.70

Test-7 | Test-8 | Test-9
0.12, 0.2054, 0.0854 | 0.11, 0.2065, 0.0965 | 0.12, 0.2074, 0.0874
0.22, 0.2707, 0.0507 | 0.23, 0.2686, 0.0386 | 0.23, 0.2871, 0.0571
0.31, 0.3271, 0.0171 | 0.33, 0.3456, 0.0156 | 0.32, 0.3438, 0.0238
0.42, 0.4177, 0.0023 | 0.41, 0.4052, 0.0048 | 0.40, 0.3959, 0.0041
0.49, 0.4761, 0.0139 | 0.50, 0.4820, 0.0180 | 0.51, 0.4784, 0.0316
0.60, 0.5941, 0.0059 | 0.60, 0.5985, 0.0015 | 0.60, 0.5926, 0.0074
0.71, 0.7154, 0.0054 | 0.69, 0.6919, 0.0019 | 0.69, 0.6958, 0.0058
0.78, 0.7774, 0.0026 | 0.78, 0.7737, 0.0063 | 0.78, 0.7799, 0.0001
0.88, 0.8811, 0.0011 | 0.87, 0.8693, 0.0007 | 0.88, 0.8801, 0.0001
0.97, 0.9734, 0.034 | 0.98, 0.9825, 0.0025 | 0.98, 0.9831, 0.0031
FSO (%): 8.80 | 9.85 | 8.92

Test-10
0.12, 0.2114, 0.0914
0.21, 0.2643, 0.0543
0.32, 0.3553, 0.0353
0.40, 0.4039, 0.0039
0.50, 0.4868, 0.0132
0.59, 0.5997, 0.0097
0.70, 0.7088, 0.0088
0.78, 0.7789, 0.0011
0.88, 0.8777, 0.0023
0.97, 0.9804, 0.0104
FSO (%): 9.42

Average FSO (%): 8.936
The contact position displacement errors were estimated using a simple mean absolute error calculation. However, extensive tests were carried out to retrieve these results. The tests involve specific ground truth (GT) values for X and Y in a range of −6 mm∼+6 mm w.r.t. a force range within 0.1 N∼1 N. For the X-axis, the displacement along the X-axis is incremented in steps of 1 mm while keeping the Y value at 0 mm, and vice versa in the context of the Y displacement estimation. The neural network’s performance in terms of the estimation of contact position displacement along the linear X and Y axes is recorded and reported in Table A2 and Table A3.
Table A2. Contact position estimation w.r.t. linear X displacement under force ranges (0.1 N∼1 N). Each entry lists the applied force F (N), the estimated position Xest (mm), and the absolute error (mm).

Xorig = −6 mm | Xorig = −5 mm | Xorig = −4 mm | Xorig = −3 mm
0.12, −1.828, 4.171 | 0.13, −2.036, 2.963 | 0.12, −1.530, 2.469 | 0.13, −1.094, 1.9056
0.21, −2.875, 3.124 | 0.21, −2.593, 2.407 | 0.22, −2.065, 1.934 | 0.22, −1.588, 1.411
0.31, −3.388, 2.611 | 0.31, −2.986, 2.013 | 0.33, −2.435, 1.564 | 0.31, −1.828, 1.171
0.40, −3.884, 2.115 | 0.40, −3.352, 1.647 | 0.42, −2.566, 1.433 | 0.40, −1.767, 1.232
0.48, −3.997, 2.002 | 0.49, −3.549, 1.451 | 0.50, −2.801, 1.198 | 0.49, −1.798, 1.201
0.57, −4.260, 1.740 | 0.59, −3.742, 1.257 | 0.60, −2.817, 1.182 | 0.59, −1.903, 1.096
0.66, −4.943, 1.056 | 0.68, −3.643, 1.357 | 0.69, −2.857, 1.142 | 0.69, −1.900, 1.099
0.75, −5.610, 0.389 | 0.76, −4.070, 0.929 | 0.77, −3.074, 0.926 | 0.78, −2.115, 0.884
0.85, −5.738, 0.261 | 0.85, −4.576, 0.423 | 0.86, −3.288, 0.716 | 0.87, −2.491, 0.508
0.95, −5.747, 0.252 | 0.95, −4.818, 0.818 | 0.97, −3.567, 0.433 | 0.96, −2.512, 0.487

Xorig = −2 mm | Xorig = −1 mm | Xorig = 1 mm | Xorig = 2 mm
0.12, −1.088, 0.911 | 0.14, −1.095, 0.095 | 0.13, −1.026, 2.026 | 0.12, −0.490, 2.490
0.22, −1.478, 0.521 | 0.22, −0.949, 0.050 | 0.21, −0.952, 1.952 | 0.22, −0.208, 2.208
0.33, −1.409, 0.591 | 0.33, −1.018, 0.018 | 0.33, −0.764, 1.764 | 0.33, −0.608, 2.608
0.41, −1.228, 0.771 | 0.41, 0.997, 0.002 | 0.41, −0.817, 1.817 | 0.40, −0.356, 2.356
0.50, −1.431, 0.568 | 0.50, −0.903, 0.096 | 0.51, −0.275, 1.275 | 0.50, 0.166, 1.834
0.59, −1.210, 0.789 | 0.60, −0.590, 0.409 | 0.59, 0.288, 0.711 | 0.60, 0.959, 1.040
0.69, −1.297, 0.702 | 0.69, −0.523, 0.476 | 0.69, 0.950, 0.049 | 0.70, 2.079, 0.079
0.78, −1.255, 0.744 | 0.78, −0.334, 0.665 | 0.78, 1.421, 0.421 | 0.78, 2.567, 0.567
0.88, −1.455, 0.544 | 0.88, −0.238, 0.761 | 0.88, 1.529, 0.592 | 0.87, 2.567, 0.567
0.98, −1.519, 0.480 | 0.97, −0.399, 0.601 | 0.97, 1.607, 0.607 | 0.98, 2.553, 0.553

Xorig = 3 mm | Xorig = 4 mm | Xorig = 5 mm | Xorig = 6 mm
0.12, −0.461, 3.461 | 0.11, −0.246, 4.246 | 0.12, 0.020, 4.979 | 0.13, 0.166, 5.833
0.23, −0.279, 3.279 | 0.21, 0.125, 3.874 | 0.22, 0.633, 4.366 | 0.21, 0.692, 5.308
0.32, −0.401, 3.401 | 0.31, 0.958, 3.042 | 0.32, 1.588, 3.411 | 0.33, 2.673, 3.326
0.41, 0.093, 2.906 | 0.39, 0.983, 3.874 | 0.40, 2.317, 2.652 | 0.40, 3.060, 2.939
0.50, 0.942, 2.0571 | 0.50, 1.681, 2.318 | 0.50, 2.931, 2.068 | 0.49, 3.990, 2.009
0.60, 1.571, 1.428 | 0.59, 2.570, 1.429 | 0.58, 3.744, 1.256 | 0.57, 4.933, 1.066
0.69, 3.057, 0.075 | 0.69, 4.133, 0.133 | 0.67, 4.917, 0.082 | 0.67, 5.453, 0.546
0.78, 3.649, 0.649 | 0.77, 4.709, 0.709 | 0.76, 5.745, 0.745 | 0.75, 6.600, 0.600
0.88, 3.790, 0.790 | 0.87, 4.807, 0.807 | 0.86, 5.787, 0.787 | 0.84, 6.639, 0.639
0.97, 3.670, 0.679 | 0.97, 4.800, 0.800 | 0.96, 5.884, 0.884 | 0.94, 6.896, 0.896
Table A3. Contact position estimation w.r.t. Y displacement under force ranges (0.1 N∼1 N). Each entry lists the applied force F (N), the estimated position Yest (mm), and the absolute error (mm).

Yorig = −6 mm | Yorig = −5 mm | Yorig = −4 mm | Yorig = −3 mm
0.12, 0.021, 6.021 | 0.12, 0.457, 5.457 | 0.13, −0.345, 3.654 | 0.13, −0.367, 2.632
0.20, −0.631, 5.3681 | 0.23, −1.619, 3.380 | 0.21, −0.466, 3.533 | 0.21, −0.781, 2.218
0.32, −3.050, 2.949 | 0.32, −2.425, 2.574 | 0.32, −1.793, 2.206 | 0.32, −1.498, 1.501
0.40, −4.548, 1.452 | 0.41, −3.545, 1.454 | 0.41, −2.652, 1.347 | 0.41, −1.503, 1.496
0.49, −5.476, 0.523 | 0.50, −4.028, 0.971 | 0.51, −3.037, 0.962 | 0.50, −1.936, 1.063
0.56, −5.925, 0.074 | 0.58, −4.429, 0.570 | 0.59, −3.173, 0.826 | 0.60, −2.190, 0.809
0.66, −5.698, 0.301 | 0.67, −4.453, 0.546 | 0.69, −3.565, 0.434 | 0.69, −2.523, 0.476
0.75, −5.811, 0.188 | 0.77, −4.704, 0.296 | 0.76, −3.847, 0.152 | 0.78, −2.960, 0.040
0.85, −6.022, 0.022 | 0.85, −4.419, 0.419 | 0.87, −4.485, 0.458 | 0.86, −3.435, 0.435
0.95, −6.154, 0.154 | 0.95, −4.479, 0.479 | 0.96, −4.464, 0.464 | 0.97, −3.352, 0.352

Yorig = −2 mm | Yorig = −1 mm | Yorig = 1 mm | Yorig = 2 mm
0.13, −0.194, 1.805 | 0.12, 0.271, 1.271 | 0.13, 0.693, 0.306 | 0.12, 1.029, 0.970
0.22, −0.718, 1.281 | 0.22, 0.268, 1.268 | 0.21, 1.234, 0.234 | 0.22, 1.736, 0.263
0.31, −0.817, 1.182 | 0.32, −0.588, 0.411 | 0.33, 0.622, 0.377 | 0.33, 1.124, 0.875
0.41, −1.444, 0.555 | 0.42, −0.639, 0.360 | 0.41, 0.644, 0.355 | 0.41, 1.426, 0.573
0.50, −1.483, 0.516 | 0.50, −0.929, 0.070 | 0.51, 0.757, 0.242 | 0.50, 1.585, 0.414
0.60, −1.742, 0.257 | 0.60, −0.973, 0.026 | 0.58, 0.534, 0.465 | 0.58, 1.480, 0.519
0.70, −1.807, 0.192 | 0.70, −1.247, 0.247 | 0.70, 0.424, 0.575 | 0.69, 1.546, 0.453
0.78, −2.083, 0.083 | 0.78, −1.283, 0.283 | 0.79, 0.691, 0.308 | 0.77, 1.576, 0.423
0.87, −2.226, 0.226 | 0.87, −1.155, 0.155 | 0.88, 0.709, 0.290 | 0.88, 1.822, 0.177
0.98, −2.271, 0.271 | 0.98, −1.156, 0.156 | 0.98, 0.555, 0.444 | 0.98, 1.754, 0.245

Yorig = 3 mm | Yorig = 4 mm | Yorig = 5 mm | Yorig = 6 mm
0.11, 1.375, 1.624 | 0.11, 1.920, 2.079 | 0.14, 2.799, 2.200 | 0.11, 2.409, 3.590
0.22, 1.897, 1.102 | 0.21, 2.390, 1.609 | 0.22, 3.072, 1.927 | 0.22, 3.681, 2.318
0.33, 2.241, 0.758 | 0.31, 0.958, 1.484 | 0.33, 3.273, 1.726 | 0.32, 3.838, 2.162
0.41, 2.169, 0.831 | 0.39, 2.515, 1.031 | 0.42, 3.479, 1.520 | 0.40, 4.237, 1.762
0.50, 2.300, 0.699 | 0.50, 2.968, 1.077 | 0.50, 3.654, 1.345 | 0.49, 4.354, 1.645
0.59, 2.378, 0.621 | 0.59, 2.922, 0.846 | 0.59, 3.923, 1.076 | 0.58, 4.748, 1.241
0.69, 2.441, 0.558 | 0.69, 3.153, 0.775 | 0.68, 4.215, 0.784 | 0.68, 5.106, 0.893
0.78, 2.470, 0.529 | 0.77, 3.244, 0.581 | 0.76, 4.414, 0.585 | 0.76, 5.188, 0.811
0.88, 2.698, 0.302 | 0.87, 3.596, 0.403 | 0.86, 4.363, 0.363 | 0.85, 5.239, 0.760
0.97, 2.771, 0.228 | 0.97, 3.692, 0.307 | 0.96, 4.622, 0.377 | 0.96, 5.358, 0.642
The contact position estimation errors mentioned in the above two tables reach nearly 4 mm∼6 mm in a few cases. The main reason behind this is the sensitivity of the tactile sensor setup w.r.t. the workbench. It is slightly dependent on the vibrations in the vicinity of the sensor. For instance, when a certain activity, such as walking or jumping, happens around the sensor setup and the applied force is in the lower magnitudes around 0.1 N, vibrations are induced into the system. This eventually causes errors of around 4 mm∼6 mm in the contact position estimates. However, the overall average error of the contact position still remains less than 1.4 mm in X and less than 1 mm in Y, as discussed earlier in Table 5. In addition, the stability of the installation setup heavily influences the performance under rotational motions. Therefore, a constructive ground-truth of 45° is calibrated so as to prevent system installation issues, as shown in Figure A1a. Figure A1 gives a glimpse of various aspects, such as dimensions, camera, and use-cases, that might interest developers.
Figure A1. Tactile sensor-related aspects: (a) Stable calibration of the tactile sensor at an angle of 45°. (b) Measuring the height of the tactile sensor (72 mm) using a vernier caliper. (c) Width of the tactile sensor (44 mm). (d) Tactile sensor camera. (e) Vision-based tactile sensor used in a robotic arm grasping a raw egg. (f) Lifting and placing a bottle. (g) Grasping a tennis ball.

References

1. Umbaugh, S.E. Digital Image Processing and Analysis: Human and Computer Vision Applications with CVIPtools; CRC Press: Boca Raton, FL, USA, 2010.
2. Kakani, V.; Nguyen, V.H.; Kumar, B.P.; Kim, H.; Pasupuleti, V.R. A critical review on computer vision and artificial intelligence in food industry. J. Agric. Food Res. 2020, 2, 100033.
3. Kakani, V.; Kim, H.; Basivi, P.K.; Pasupuleti, V.R. Surface Thermo-Dynamic Characterization of Poly (Vinylidene Chloride-Co-Acrylonitrile) (P(VDC-co-AN)) Using Inverse-Gas Chromatography and Investigation of Visual Traits Using Computer Vision Image Processing Algorithms. Polymers 2020, 12, 1631.
4. Shimonomura, K. Tactile image sensors employing camera: A review. Sensors 2019, 19, 3933.
5. Kakani, V.; Kim, H.; Lee, J.; Ryu, C.; Kumbham, M. Automatic Distortion Rectification of Wide-Angle Images Using Outlier Refinement for Streamlining Vision Tasks. Sensors 2020, 20, 894.
6. Kakani, V.; Kim, H.; Kumbham, M.; Park, D.; Jin, C.B.; Nguyen, V.H. Feasible Self-Calibration of Larger Field-of-View (FOV) Camera Sensors for the Advanced Driver-Assistance System (ADAS). Sensors 2019, 19, 3369.
7. Luo, S.; Bimbo, J.; Dahiya, R.; Liu, H. Robotic tactile perception of object properties: A review. Mechatronics 2017, 48, 54–67.
8. Li, W.; Konstantinova, J.; Noh, Y.; Alomainy, A.; Althoefer, K. Camera-based force and tactile sensor. In Proceedings of the Annual Conference Towards Autonomous Robotic Systems, Bristol, UK, 25–27 July 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 438–450.
9. Sferrazza, C.; D’Andrea, R. Design, motivation and evaluation of a full-resolution optical tactile sensor. Sensors 2019, 19, 928.
10. Yuan, W.; Mo, Y.; Wang, S.; Adelson, E.H. Active clothing material perception using tactile sensing and deep learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 1–8.
11. Yuan, W.; Li, R.; Srinivasan, M.A.; Adelson, E.H. Measurement of shear and slip with a GelSight tactile sensor. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 304–311.
12. Fearing, R.S. Tactile sensing mechanisms. Int. J. Robot. Res. 1990, 9, 3–23.
13. Chitta, S.; Sturm, J.; Piccoli, M.; Burgard, W. Tactile sensing for mobile manipulation. IEEE Trans. Robot. 2011, 27, 558–568.
14. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
15. Yamaguchi, A.; Atkeson, C.G. Recent progress in tactile sensing and sensors for robotic manipulation: Can we turn tactile sensing into vision? Adv. Robot. 2019, 33, 661–673.
16. Hosoda, K.; Tada, Y.; Asada, M. Internal representation of slip for a soft finger with vision and tactile sensors. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Lausanne, Switzerland, 30 September–4 October 2002; Volume 1, pp. 111–115.
17. Kolker, A.; Jokesch, M.; Thomas, U. An optical tactile sensor for measuring force values and directions for several soft and rigid contacts. In Proceedings of the ISR 2016: 47th International Symposium on Robotics, VDE, Munich, Germany, 21–22 June 2016; pp. 1–6.
18. James, J.W.; Pestell, N.; Lepora, N.F. Slip detection with a biomimetic tactile sensor. IEEE Robot. Autom. Lett. 2018, 3, 3340–3346.
19. Johnsson, M.; Balkenius, C. Neural network models of haptic shape perception. Robot. Auton. Syst. 2007, 55, 720–727.
20. Naeini, F.B.; AlAli, A.M.; Al-Husari, R.; Rigi, A.; Al-Sharman, M.K.; Makris, D.; Zweiri, Y. A novel dynamic-vision-based approach for tactile sensing applications. IEEE Trans. Instrum. Meas. 2019, 69, 1881–1893.
21. Ma, D.; Donlon, E.; Dong, S.; Rodriguez, A. Dense tactile force estimation using GelSlim and inverse FEM. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 5418–5424.
22. Wilson, A.; Wang, S.; Romero, B.; Adelson, E. Design of a Fully Actuated Robotic Hand With Multiple Gelsight Tactile Sensors. arXiv 2020, arXiv:2002.02474.
23. Taunyazov, T.; Sng, W.; See, H.H.; Lim, B.; Kuan, J.; Ansari, A.F.; Tee, B.C.; Soh, H. Event-driven visual-tactile sensing and learning for robots. Perception 2020, 4, 5.
24. Pezzementi, Z.; Plaku, E.; Reyda, C.; Hager, G.D. Tactile-object recognition from appearance information. IEEE Trans. Robot. 2011, 27, 473–487.
25. Zhang, Y.; Yuan, W.; Kan, Z.; Wang, M.Y. Towards Learning to Detect and Predict Contact Events on Vision-based Tactile Sensors. In Proceedings of the Conference on Robot Learning, Boston, MA, USA, 16–18 November 2020; pp. 1395–1404.
26. Begej, S. Planar and finger-shaped optical tactile sensors for robotic applications. IEEE J. Robot. Autom. 1988, 4, 472–484.
27. Lepora, N.F.; Ward-Cherrier, B. Superresolution with an optical tactile sensor. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 2686–2691.
28. Ito, Y.; Kim, Y.; Obinata, G. Robust slippage degree estimation based on reference update of vision-based tactile sensor. IEEE Sens. J. 2011, 11, 2037–2047.
29. Yang, X.D.; Grossman, T.; Wigdor, D.; Fitzmaurice, G. Magic finger: Always-available input through finger instrumentation. In Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology, Cambridge, MA, USA, 7–10 October 2012; pp. 147–156.
30. Corradi, T.; Hall, P.; Iravani, P. Object recognition combining vision and touch. Robot. Biomim. 2017, 4, 1–10.
31. Luo, S.; Mou, W.; Althoefer, K.; Liu, H. iCLAP: Shape recognition by combining proprioception and touch sensing. Auton. Robot. 2019, 43, 993–1004.
32. Piacenza, P.; Dang, W.; Hannigan, E.; Espinal, J.; Hussain, I.; Kymissis, I.; Ciocarlie, M. Accurate contact localization and indentation depth prediction with an optics-based tactile sensor. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 959–965.
33. Johnson, M.K.; Adelson, E.H. Retrographic sensing for the measurement of surface texture and shape. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 1070–1077.
34. Johnson, M.K.; Cole, F.; Raj, A.; Adelson, E.H. Microgeometry capture using an elastomeric sensor. ACM Trans. Graph. (TOG) 2011, 30, 1–8.
35. Yuan, W.; Srinivasan, M.A.; Adelson, E.H. Estimating object hardness with a GelSight touch sensor. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, 9–14 October 2016; pp. 208–215.
36. Kroemer, O.; Lampert, C.H.; Peters, J. Learning dynamic tactile sensing with robust vision-based training. IEEE Trans. Robot. 2011, 27, 545–557.
37. Meier, M.; Patzelt, F.; Haschke, R.; Ritter, H.J. Tactile convolutional networks for online slip and rotation detection. In Proceedings of the International Conference on Artificial Neural Networks, Barcelona, Spain, 6–9 September 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 12–19.
38. Chuah, M.Y.; Kim, S. Improved normal and shear tactile force sensor performance via least squares artificial neural network (LSANN). In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 116–122.
39. Kaboli, M.; Feng, D.; Cheng, G. Active tactile transfer learning for object discrimination in an unstructured environment using multimodal robotic skin. Int. J. Humanoid Robot. 2018, 15, 1850001.
40. Gandarias, J.M.; Garcia-Cerezo, A.J.; Gomez-de Gabriel, J.M. CNN-based methods for object recognition with high-resolution tactile sensors. IEEE Sens. J. 2019, 19, 6872–6882.
41. Sferrazza, C.; D’Andrea, R. Transfer learning for vision-based tactile sensing. arXiv 2018, arXiv:1812.03163.
42. Sato, K.; Kamiyama, K.; Kawakami, N.; Tachi, S. Finger-shaped GelForce: Sensor for measuring surface traction fields for robotic hand. IEEE Trans. Haptics 2009, 3, 37–47.
43. Sferrazza, C.; Wahlsten, A.; Trueeb, C.; D’Andrea, R. Ground truth force distribution for learning-based tactile sensing: A finite element approach. IEEE Access 2019, 7, 173438–173449.
44. Qi, H.; Joyce, K.; Boyce, M. Durometer hardness and the stress–strain behavior of elastomeric materials. Rubber Chem. Technol. 2003, 76, 419–435.
45. Moeslund, T.B. BLOB analysis. In Introduction to Video and Image Processing; Springer: Berlin/Heidelberg, Germany, 2012; pp. 103–115.
Figure 1. Principle of detection in vision-based tactile sensor technology.
Figure 2. Problem statement of vision-based tactile sensor mechanism for the estimation of contact location and force distribution using deep learning: Data acquisition and training and inference stage.
Figure 3. Equipment setup and schematic: (a) Overall system installation. (b) Flow schematic of visual tactile sensor mechanism.
Figure 4. Making of tactile fingertips: (a) Defoaming process with upper mold and lower mold structures. (b) Fingertips produced from 3D printing process. (c) Fingertips produced from defoaming injection mold process. (d) Mold injection causes surface light reflection. (e) Sanding the mold surface reduced the light reflection. (f) Vacuum degassing process. (g) Marker painting process.
Figure 5. Selection of tactile fingertip material based on shore hardness (surface hardness): Force displacement characteristic when shore hardness is 40, 60, 70, 80 for t = 1 mm.
Figure 6. Choosing the thickness of the tactile fingertip: (a) Force displacement characteristics by thickness at 1 N for shore 70. (b) Force displacement characteristics by thickness at 10 N for shore 70.
Figure 7. Stereo camera system: (a) Stereo camera with baseline of 10 mm. (b) Compact stereo camera attached to tactile fingertip.
Figure 8. Flowchart schematic of transfer learning applied on the images acquired from tactile stereo camera setup.
Figure 9. Pre-processing: (a) Cropping the input data through Region-of-Interest (ROI) setting for the stereo image pair. (b) Types of modes.
Figure 10. Network architecture of VGG16 regression model employed in the study.
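To make the caption above more concrete, the following sketch shows one way a pretrained VGG-16 classifier can be converted into a regression network via transfer learning, broadly in the spirit of Figure 10. The output dimensionality (six values: force plus X, Y, Z, Rx, Ry), layer sizes, frozen layers, and optimizer settings are assumptions for illustration and may differ from the configuration used in the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# Transfer learning: start from ImageNet-pretrained VGG-16 weights.
vgg = models.vgg16(pretrained=True)

# Freeze the convolutional feature extractor so only the new head is trained.
for p in vgg.features.parameters():
    p.requires_grad = False

# Replace the 1000-way classification head with a small regression head
# (assumed 6 continuous outputs; no softmax at the end).
vgg.classifier = nn.Sequential(
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
    nn.Linear(4096, 256),         nn.ReLU(inplace=True),
    nn.Linear(256, 6),
)

criterion = nn.MSELoss()   # regression loss instead of cross-entropy
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, vgg.parameters()), lr=1e-4)

# One illustrative training step on a dummy batch of tactile images.
images  = torch.randn(8, 3, 224, 224)
targets = torch.randn(8, 6)
loss = criterion(vgg(images), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```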
Figure 11. Flow schematic of 2D contact area estimation process.
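As a rough companion to the contact-area flow of Figure 11, the sketch below segments the contact blob in a tactile image and converts its pixel area to mm² using OpenCV contour (BLOB) analysis. The threshold value, the mm-per-pixel scale, and the single-frame input are assumptions for illustration; the authors' pipeline may differ.

```python
import cv2
import numpy as np

def estimate_contact_area(image_bgr, mm_per_px=0.1, threshold=40):
    """Rough contact-area estimate (mm^2) from one tactile image (illustrative only)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Pixels that respond strongly under contact are segmented as a blob.
    _, mask = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0.0
    blob = max(contours, key=cv2.contourArea)      # largest blob = contact patch
    return cv2.contourArea(blob) * (mm_per_px ** 2)  # px^2 -> mm^2

# Example call on a synthetic frame; in practice a difference image between
# the unloaded and loaded fingertip would typically be used.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
cv2.circle(frame, (320, 240), 60, (255, 255, 255), -1)
print("estimated area: %.1f mm^2" % estimate_contact_area(frame))
```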
Figure 12. Data acquisition procedure for training and testing scenarios: (a) Instrument to conduct experiments. (b) LabVIEW GUI for collecting data under various motions (X, Y, Z, Rx, Ry).
Figure 13. Successful case scenario of network training on thin sensor data (Data01) in the form of Mode1.
Figure 14. Successful case scenario of network training on thick sensor data (Data02) in the form of Mode1.
Figure 15. Testing scenarios and outcomes.
Figure 16. Full Scale Output (FSO) (%) output scores for force estimation tests.
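For clarity, the Full Scale Output percentage in Figure 16 expresses the force estimation error relative to the sensor's full-scale range. The snippet below is a hedged illustration that assumes the 0.1 N∼1 N test range corresponds to a 1 N full scale; the actual reference range used by the authors may differ.

```python
FULL_SCALE_N = 1.0  # assumed full-scale force for this illustration

def fso_percent(error_n, full_scale_n=FULL_SCALE_N):
    # FSO(%) = estimation error divided by the full-scale range, in percent.
    return 100.0 * error_n / full_scale_n

# e.g., an average error of 0.022 N (Table 5) would correspond to 2.2% FSO.
print(fso_percent(0.022))
```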
Figure 17. Average displacement error in X, Y contact position w.r.t. corresponding ground-truth over 13 points (−6 mm∼+6 mm): (a) Average X-displacement errors (in mm). (b) Average Y-displacement errors (in mm).
Figure 18. Displacement error in Z contact position w.r.t. diverse force ranges 0.1 N∼1 N.
Figure 19. Angular displacement error in Rxy axis w.r.t. diverse forces (0.1 N∼0.9 N).
Figure 20. Estimation of 2D contact area: (a) Contact area estimation w.r.t. circular tool. (b) Contact area estimation w.r.t. square tool. (c) Contact area estimation w.r.t. hexagonal tool.
Table 1. Insights of traditional and learning-based visual tactile sensing methods.

| Research Study | Methodology | Tactile Properties | Key Aspects/Limitations |
|---|---|---|---|
| Lepora et al. [27] | Bayesian perception | Localization (internal displacement) | 40-fold accuracy compared to traditional tactile sensor |
| Ito et al. [28] | Adaptive selection and compensation of dot positions | Slippage degree, multidimensional force, object contact | Depends on position measurements of dots (tuning is easy) |
| Yang et al. [29] | Magic finger optical touch sensor | Contact location, force, and texture | Can sense the touch finger XY-footprint (like an optical mouse) |
| Corradi et al. [30] | Object recognition (vision + touch) | Object shape/texture | Vector concatenation, object label posterior |
| Piacenza et al. [32] | Elastomer light transport mechanism | Contact localization and indentation depth prediction | Exhibits submillimeter accuracy; 20 mm by 20 mm active sensing area |
| Johnson et al. [33] | Surface reconstruction (photometric stereo) | Texture and shape | Also called a 2.5D texture scanner |
| Johnson et al. [34] | Microgeometry using elastomeric sensor | Surface geometry | Can only handle shallow relief geometry |
| Yuan et al. [35] | Object hardness with GelSight touch sensor | Fine texture, contact force, and slip conditions | Infers object hardness without prior knowledge |
| Kroemer et al. [36] | Dynamic tactile sensing using weak pairing (vision + tactile samples) | Visual shape and surface texture | Machine learning with lower-dimensional representation of tactile data |
| Meier et al. [37] | Tactile deep CNN for online slip and rotation detection | Classifies contact state; distinguishes rotational and translational slippage | Final classification rate > 97%; feasible for adaptive grasp control |
| Chuah et al. [38] | Least-squares ANN improving shear force with better optimization | Normal and shear tactile force | Better convergence with multi-input, multi-output function approximator |
| Kaboli et al. [39] | Probabilistic active tactile transfer learning | Surface texture, stiffness, and thermal conductivity | 72% discrimination accuracy with only one training sample (one-shot tactile learning) |
| Gandarias et al. [40] | Custom CNN (TactNet) for object recognition with RGB pressure images | Contact objects; identifies tactile pressure | Uses 8 transfer learning networks and 3 TactNet networks trained from scratch |
| Sato et al. [42] | Compact finger-shaped GelForce sensor for surface traction fields | Measuring distribution of force vectors or surface traction fields | Small size with linearity of force < 4 N and refresh rate of 67 Hz |
| Sferrazza et al. [43] | Commercial force tactile sensor images matched to ground-truth data for DNN training | Measuring contact force and contact center of the sensor's surface | Refresh rate of 40 Hz; performance depends on reference axes alignment |
| Current study | Transfer learning-based CNN training using tactile sensor images matched with ground truth | Measuring contact force, contact position in X, Y, Z, Rx, Ry, and contact size in mm² | Refresh rate of 30 Hz with spatial resolution of 2.5 mm; sensor size is larger than that of References [42,43] because of the stereo camera |
Table 2. Design aspects and specifications of the tactile fingertip sensor.

| Design Aspects | Specifications |
|---|---|
| Sensor surface material (including markers) | Rubber |
| Sensor size (width × height) | 44 mm × 72 mm |
| Spatial resolution | 2.5 mm |
| Refresh rate (sampling frequency) | 30 Hz |
| No. of protrusions | 292 |
Table 3. Design aspects and specifications of the stereo camera system.

| Design Aspects | Specifications |
|---|---|
| Camera resolution | 640 × 480 at 30 fps |
| Pixel size | 3 µm × 3 µm |
| Image sensor size | 1/4″ |
| Image active area | 3888 µm × 2430 µm |
| Signal-to-noise ratio | 39 dB |
| Scan mode | Progressive |
| Lens module | 3.4 mm/F2.8 |
| Power | DC 5 V/150 mA |
| Interface | USB 2.0 |
Table 4. Dataset employed for training, validation, and testing.

| Category | Training (Points) | Validation (Points) | Testing (Points) |
|---|---|---|---|
| Data01 (Left + Right) | 3380 + 3380 | 1680 + 1680 | 1690 + 1690 |
| Data02 (Left + Right) | 2730 + 2730 | 910 + 910 | 910 + 910 |
| Total (Data01 + Data02) | 12,220 | 5180 | 5200 |
| Number of images for total 10 points (0.1 N interval from 0.1 N to 1 N) | 122,200 | 51,800 | 52,000 |
Table 5. Force estimation errors; each error reading is an average of the force estimation error recorded under the force range (0.1 N∼1 N) with a 0.1 N interval for each test.

| Test No. (1∼5) | Force Estimation Error (N) | Test No. (6∼10) | Force Estimation Error (N) |
|---|---|---|---|
| 1 | 0.027 | 6 | 0.022 |
| 2 | 0.026 | 7 | 0.021 |
| 3 | 0.023 | 8 | 0.018 |
| 4 | 0.023 | 9 | 0.022 |
| 5 | 0.022 | 10 | 0.023 |

Average error over all 10 tests: 0.022 N.
Table 6. Contact area estimation w.r.t. different shaped contact tools (circular, hexagonal, and square).

| Shape (Contact Tool) | Number of Images (Testing) | Error Rate (%) |
|---|---|---|
| Circle | 20 | 0.597 |
| Square | 18 | 0.926 |
| Hexagon | 18 | 2.857 |
| Total / Average Error | 56 | 1.429 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
