1. Introduction
Haptic texture is a crucial cue that informs users of the interaction state with a surface texture, i.e., the fine geometric surface features of an object. Haptic textures have received substantial attention in various fields [1,2,3]. Previous researchers have shown that adding haptic texture cues can greatly improve the realism of a virtual environment [4]. The haptic texture rendering technique therefore has several potential applications, such as virtual surgical simulation, haptic-feedback teleoperation, online e-commerce, and aiding the visually impaired [5,6,7].
In general, there are three basic methods of rendering haptic textures. The first is the sample-based approach. S. Andrews introduced a system based on a tactile probe and a visual tracker for scanning and synthesizing tactile textures [8]. H. Vasudevan estimated the surface texture from the frequency spectrum of vertical perturbations obtained by dragging the tip of the tactile device across the object surface [9]. A. Song developed a PVDF-based haptic texture sensor that imitates human active texture perception to measure real object surface textures for haptic texture rendering in virtual reality [10]. V. Bove used holograms to record the surfaces and textures of objects in a holo-haptic system; the produced haptic images were felt and shaped by a handheld device [11]. The second is the procedural texture approach, which uses mathematical functions to synthesize the surface texture of objects; several related studies have been summarized in detail [12,13,14]. The third is the image-based approach. This method constructs a texture force field from 2D image data, so this kind of haptic texturing can also be considered a type of VR [15]. L. M. Benjamin computed an elevation map from the luminance coefficients of texel images using four different techniques, all based on the assumption that the height value of a pixel is proportional to its luminance value. The elevation map is then used to generate 3D bumps on the surface of the detected object and to calculate the corresponding tactile force for haptic rendering [16]. J. Wu and A. Song processed 2D images with a Gaussian filter to obtain the low-frequency components and then subtracted the filtered image from the original image; the remaining components represent the texture information. The forces simulated from the textures were applied to the user through a Delta haptic device [17]. S. Xu proposed an image-based haptic texture generation approach that replaces the Gaussian filter with an improved switching vector median filter for modeling the texture force and simulating the haptic stimuli [18]. Vasudevan used conventional edge detection algorithms and proposed a haptic mask design method that allows the user to feel the contours and textures of an image with haptic devices [19]. J. Lu and A. Song presented a haptic texture rendering method based on color temperature and luminance to construct 3D texture force fields from 2D color images [20]. E. R. Vimina and Divya proposed a fixed-size descriptor based on local strength for texture calculation, further expanding the texture information with multi-channel color data [21].
Because the image-based haptic texture rendering approach has the potential advantage of cost-effective realization, it has attracted substantial attention from researchers. However, existing haptic texture rendering methods have problems dealing with images that contain fine geometric texture features.
In this paper, we propose a novel fractional differential method for image-based haptic texture rendering. Fractional differentiation is a relatively new tool for image signal processing. Complex texture details exhibit highly self-similar fractal information in image signals, and the mathematical basis of fractal theory includes fractional differentiation [22,23]. Therefore, the features of complex textures in an image can be extracted by fractional differentiation and applied to haptic texture reconstruction.
This paper introduces the Grünwald–Letnikov (G–L) definition of the fractional differential in Euclidean space [24,25]. Based on the G–L definition, an isotropic m × n fractional differential mask is deduced. We then propose a novel method to adaptively select the order of the fractional differential operator using the composite sub-band gradient vector (CSGV), which is related to the wavelet decomposition [26] and human visual characteristics [27,28]. Thirdly, we apply the approach to haptic texture rendering and give a quantitative analysis using image information entropy and multi-scale structural similarity (MS-SSIM). We apply the extraction results to the haptic display system to reconstruct a three-dimensional texture force field that renders the texture surface of 2D images. Finally, experiments were carried out on different types of texture images. The results show that the proposed haptic texture rendering method based on the adaptive fractional differential extracts texture features well and obtains an excellent texture force field for 2D images.
2. The Advantage of Fractional Differential
Fractional differential processing of a signal not only enhances its high-frequency components nonlinearly, but also enhances its intermediate-frequency components nonlinearly to a certain extent, while largely retaining its low-frequency components [29,30]. Using this property of fractional differentiation, we can preserve the low-frequency contour information of digital images while nonlinearly augmenting the high-frequency detailed texture patterns with wider gray distributions. Finally, the enhanced image is subtracted from the original image to obtain the texture extraction result.
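To make this frequency argument concrete, the following sketch (ours, not from the paper) evaluates the amplitude response |1 − e^(−jω)|^v of the one-dimensional G–L fractional difference at a low and a high digital frequency: low frequencies are attenuated far less than by the integer-order difference (v = 1), while high frequencies are still boosted.

```python
# Amplitude response of the 1-D G-L fractional difference, |1 - exp(-jw)|**v.
# Smaller fractional orders keep much more of the low band (w = 0.1) while
# still amplifying the high band (w = pi), which is why fractional
# differentiation preserves contours but enhances texture.
import numpy as np

w = np.array([0.1, np.pi])                 # a low and a high digital frequency
for v in (0.3, 0.5, 0.7, 1.0):             # fractional orders vs. integer order
    lo, hi = np.abs(1.0 - np.exp(-1j * w)) ** v
    print(f"v={v:.1f}: |H(0.1)|={lo:.3f}  |H(pi)|={hi:.3f}")
```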
The Grünwald–Letnikov (G–L) definition, the Caputo definition, and the Riemann–Liouville (R–L) definition are the three commonly used definitions of the fractional differential in Euclidean space [29]. Recent research indicates that implementations of fractional differentials in digital image processing are almost always based on the G–L definition. The Tiansi operator mask constructed from the G–L definition is inherently inexact in image processing, because digital images are discrete and the mask is an approximate expression of the underlying function, so its image texture extraction results are often unsatisfactory. Therefore, we add adaptive augmentation to the G–L fractional differentiation definition and find it well suited to image texture acquisition.
Under fractional differential operator mask filtering, the low-frequency components of the image are well preserved, while the output gray value is dramatically enhanced for pixels whose gray values fluctuate rapidly within their neighborhood (including image borders and texture regions). This shows that the fractional differential operator mask can significantly strengthen the high-frequency component of the original image.
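The paper's own m × n mask is deduced from the G–L definition; as an illustration only, the sketch below builds the widely used 5 × 5 Tiansi-style mask from the first three G–L coefficients (1, −v, (v² − v)/2) applied along eight directions, filters an image with it, and takes the difference against the original as the texture component. The mask layout and the stand-in image are assumptions, not the paper's exact operator.

```python
# A hedged sketch of an isotropic fractional differential mask: the 5x5
# "Tiansi" construction applies the first three G-L coefficients along
# eight directions sharing one center cell.
import numpy as np
from scipy.signal import convolve2d

def gl_coeffs(v, n=3):
    """First n Grunwald-Letnikov coefficients (-1)^k * C(v, k)."""
    c = [1.0]
    for k in range(1, n):
        c.append(c[-1] * (k - 1 - v) / k)    # recurrence gives 1, -v, (v^2-v)/2, ...
    return c

def tiansi_mask(v):
    c0, c1, c2 = gl_coeffs(v, 3)
    m = np.zeros((5, 5))
    m[2, 2] = 8 * c0                          # center cell, shared by 8 directions
    for dy, dx in [(-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)]:
        m[2 + dy, 2 + dx] = c1                # distance-1 ring
        m[2 + 2*dy, 2 + 2*dx] = c2            # distance-2 ring
    return m

img = np.random.rand(64, 64) * 255            # stand-in for a gray image
enhanced = convolve2d(img, tiansi_mask(0.5), mode="same", boundary="symm")
texture = enhanced - img                      # the difference keeps the texture part
```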
3. Differential Order for Adaptive Selection Algorithm
Texture is one of the most essential properties in image processing and analysis. Texture gives intuitive assessments of qualities such as regularity, coarseness, and smoothness. The majority of texture analysis methods analyze the image at a single scale. As revealed by J. Beck et al. [31], the visual cortex may be modeled as multiple channels, each of which perceives a specific direction and frequency tuning. Multi-scale texture analysis techniques are motivated by such multichannel processing, and several multichannel texture analysis systems have been proposed [32,33]. In the past 10 years, the rapid development of wavelet theory has also brought new theories and methods to the field of image processing. I. Daubechies suggested a discretization approach for the wavelet transform [34], and the relationship between multiresolution theory and wavelet transforms was further developed by S. G. Mallat [35]. Since then, wavelet theory has developed into a multi-scale (multi-resolution) mathematical tool for image analysis. The use of multi-scale methods in texture image analysis rests on the premise that lower-resolution channels better capture "large" textures, while higher-resolution channels better capture "small" textures.
The decomposition proceeds as follows. Apply the wavelet and scaling filters to the image both horizontally and vertically, then down-sample each output image 2:1. This produces a coarse (approximation) image C_j and three direction-selective detail images D_{j,k}, where k = 1, 2, 3 and j denotes the decomposition level. The same procedure is applied to the approximation image to construct the next level of the resolution hierarchy. The hierarchical wavelet decomposition of the image is therefore expressed as:

C_{j+1} = [H_y * [H_x * C_j]_{↓2,1}]_{↓1,2}
D_{j+1,1} = [G_y * [H_x * C_j]_{↓2,1}]_{↓1,2}
D_{j+1,2} = [H_y * [G_x * C_j]_{↓2,1}]_{↓1,2}
D_{j+1,3} = [G_y * [G_x * C_j]_{↓2,1}]_{↓1,2}    (1)

where C_0 = I is the original image, ↓1,2 means down-sampling every other pixel in the y direction, ↓2,1 means down-sampling every other pixel in the x direction, and * is the convolution operator. G_x and H_x, and G_y and H_y, represent the high-pass and low-pass filters in the x and y directions, respectively. The original image can therefore be represented by a series of sub-images at multiple scales: {C_j, D_{j,k}} (j = 1, …, J; k = 1, 2, 3) is the multi-scale representation of image I at depth J. DAUB4 is used as the wavelet basis here because of its superior average performance.
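As a sketch of this decomposition, PyWavelets' dwt2 performs the row/column filtering and 2:1 down-sampling of one level in a single call; 'db2' is PyWavelets' name for the four-tap DAUB4 filter. The input image and the number of levels J below are placeholders.

```python
# One call to pywt.dwt2 computes C_{j+1} and the three detail images
# D_{j+1,k} of Equation (1); looping on the approximation builds the
# hierarchy down to depth J.
import numpy as np
import pywt

image = np.random.rand(256, 256)             # stand-in for the input image I
C = image                                    # C_0 = I
pyramid = []
for j in range(3):                           # J = 3 decomposition levels
    C, (D1, D2, D3) = pywt.dwt2(C, 'db2')    # approximation + LH/HL/HH details
    pyramid.append((C, (D1, D2, D3)))
```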
Gradient direction images can provide useful properties for texture analysis [36]. After the image is processed by a gradient operator, the magnitude and direction of the change in pixel gray values are obtained; the image gradient describes the change trend of the image in different directions. A combination of a series of low-pass (H) and high-pass (G) filters enables wavelet decomposition, using multiple sets of filters for sampling, each set at half the sampling frequency of the previous set. Therefore, the original image can be processed to obtain four sub-images, namely:
LL sub-image: low frequencies in both x and y directions.
LH sub-image: low frequencies in the x direction and high frequencies in the y direction.
HL sub-image: high frequencies in the x direction and low frequencies in the y direction.
HH sub-image: high frequencies in both x and y directions.
LL, LH, HL, and HH are the four sub-images obtained by wavelet decomposition. A gradient vector is constructed for each sub-image, denoted by G_LL, G_LH, G_HL, and G_HH, respectively. We define CSGV = G_LL ⊕ G_LH ⊕ G_HL ⊕ G_HH, where ⊕ is a "superimpose" operation. The resulting CSGV describes the image texture better than the gradient vector of the original image alone.
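One possible reading of this construction is sketched below: a gradient magnitude is computed for each sub-image, each is brought back to the original resolution, and the four results are superimposed by addition. The paper defines the "superimpose" operator by an equation not reproduced here, so the additive combination is an assumption.

```python
# A hedged sketch of the composite sub-band gradient vector (CSGV):
# per-sub-image gradient magnitudes, upsampled to the original size by
# pixel repetition, then superimposed (assumed here to mean summed).
import numpy as np
import pywt

def sub_gradient(band, shape):
    gy, gx = np.gradient(band)                   # gradient of one sub-image
    mag = np.hypot(gx, gy)
    # bring the sub-band back to the original resolution by repetition
    reps = (shape[0] // mag.shape[0] + 1, shape[1] // mag.shape[1] + 1)
    return np.kron(mag, np.ones(reps))[:shape[0], :shape[1]]

image = np.random.rand(256, 256) * 255           # stand-in gray image
LL, (LH, HL, HH) = pywt.dwt2(image, 'db2')
csgv = sum(sub_gradient(b, image.shape) for b in (LL, LH, HL, HH))
```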
Studying human visual characteristics reveals that the eye's sensitivity to gray values in the range 0~255 of a gray image is not constant. When the gray value is particularly high or low, it is difficult for the human eye to perceive changes in intensity: near gray level 0 the eye can only detect a change of 8 gray levels, near gray level 255 only a change of 3 gray levels, while around gray level 128 it can perceive a change of as little as 2 gray levels [37]. In digital image research, the gradient magnitude at a point is calculated from the rate of change of the gray value at that point, and the composite gradient vector CSGV described above represents the image's gray-value change rate at multiple scales. Therefore, we regard pixels with CSGV less than 2 as regions of constant grayscale, for which the differential order is 0; pixels with CSGV in the range 2~128 as regions with small grayscale changes, for which appropriately increasing the differential order enhances the eye's perception of fine textures; and pixels with CSGV greater than 128 as edge contour areas, which must have a properly limited gradient interval, so the differential order should be appropriately reduced. From the above analysis, we established the differential order γ as a function of CSGV (Equation (2)).
In this function, one coefficient is an arbitrary positive number, and another term is the maximum CSGV value over all pixels in the image. A further constant is set artificially, its purpose being to enhance the effect of the center pixel on the neighboring pixels; it must satisfy the condition of Equation (3) to ensure that the differential order γ does not exceed 1. For CSGV in the range 2~128, Equation (3) leads us to take this constant as 0.499; for CSGV greater than 128, it is taken as 0.666. The resulting piecewise relationship between the differential order γ and CSGV is expressed by Equation (4).
According to these equations, the gray value varies drastically along the image's edge contours and the CSGV is larger there, so γ should be reduced accordingly. For densely textured areas, the grayscale variation and CSGV are small, so the fractional order γ is appropriately increased. For areas where the gray value does not change or changes very little, γ is 0 and no processing is performed, preserving the gray value.
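Since Equations (2)–(4) are described above only qualitatively, the following sketch is an illustrative piecewise mapping, not the paper's formula: γ stays 0 below CSGV = 2, grows with CSGV in the 2~128 band, shrinks again above 128, and never exceeds 1; the 0.499 and 0.666 scale factors are placed per the text.

```python
# Illustrative adaptive order selection (an assumption mirroring the
# qualitative rules in the text, not the paper's Equation (4)).
import numpy as np

def adaptive_order(csgv, csgv_max):
    gamma = np.zeros_like(csgv, dtype=float)
    fine = (csgv >= 2) & (csgv <= 128)            # fine-texture band
    edge = csgv > 128                             # edge-contour band
    gamma[fine] = 0.499 * (1 + csgv[fine] / 128.0)              # grows with CSGV
    gamma[edge] = 0.666 * (1 - csgv[edge] / (csgv_max + 1e-9))  # shrinks with CSGV
    return np.clip(gamma, 0.0, 1.0)               # keep the order below 1
```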
4. Texture Extraction Performance Evaluation
This section demonstrates that the proposed adaptive algorithm based on CSGV performs better in texture extraction.
For the G–L fractional differential, we set 0.3, 0.55, and 0.7 as fixed fractional differential orders. To analyze and compare the texture-extraction ability of the adaptive-order differential against specified fixed-order differentials, five groups of comparative experiments were conducted; their results are shown in Figure 1. The five sets of comparative results show the advantages of the fractional differential in extracting complex texture information. Although the 0.3-order differential preserves the detailed texture well, its extraction effect is too weak to be displayed clearly in the picture (as shown in Figure 1(a1,b1,c1,d1,e1)). As the differential order increases (between 0 and 1), image enhancement sharpens the "large" textures, but the "small" textures are lost. The adaptive differential order selection algorithm adopted in this paper retains almost all texture details and achieves the best texture extraction among the five groups (as shown in Figure 1(a4,b4,c4,d4,e4)).
Multi-scale structural similarity (MS-SSIM) and information entropy are used as evaluation criteria to assess the texture information extracted from the images [38,39]. Information entropy indirectly reflects the amount of information contained in a grayscale image and is very sensitive to images containing textures. Therefore, after extracting the image texture features, we use the information entropy to quantify the amount of information in the extracted texture. The information entropy is calculated as:

H = −∑_{i=0}^{255} p_i log2 p_i    (5)

where p_i is the probability of gray level i appearing in the image.
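A direct implementation of Equation (5) on an 8-bit gray image might look as follows; p_i is taken from the normalized 256-bin histogram.

```python
# Information entropy of a gray image per Equation (5).
import numpy as np

def image_entropy(img):
    hist, _ = np.histogram(img.astype(np.uint8), bins=256, range=(0, 256))
    p = hist / hist.sum()                 # gray-level probabilities p_i
    p = p[p > 0]                          # skip empty bins (0 * log 0 = 0)
    return -np.sum(p * np.log2(p))
```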
Because the human visual system is highly adapted to extracting structural information from scenes, structural similarity measurements provide a good reference for perceived image quality. The MS-SSIM value is commonly used to compare the quality of two images and is an objective evaluation method that substitutes for subjective human perception [39]. We use it to compare the differences between the extraction results of the adaptive method and those of the specified-order differential method, further verifying that the texture extraction effect of the adaptive method has obvious advantages.
Comparison functions for luminance, contrast, and structure are given in Ref. [39]:

l(x, y) = (2 μ_x μ_y + C_1) / (μ_x² + μ_y² + C_1)
c(x, y) = (2 σ_x σ_y + C_2) / (σ_x² + σ_y² + C_2)
s(x, y) = (σ_xy + C_3) / (σ_x σ_y + C_3)    (6)

where x and y are two image patches extracted from the same spatial location of the two images, and μ_x, σ_x², and σ_xy are the mean of x, the variance of x, and the covariance of x and y, respectively. C_1, C_2, and C_3 are small constants given by

C_1 = (K_1 L)², C_2 = (K_2 L)², C_3 = C_2 / 2    (7)

where L is the dynamic range of the pixel values (L = 255 for 8 bits/pixel grayscale images), and K_1 << 1 and K_2 << 1 are two scalar constants.
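Equations (6) and (7) translate directly into code; the values K_1 = 0.01 and K_2 = 0.03 used below are the conventional choices from Ref. [39], not parameters stated in this paper.

```python
# Luminance, contrast, and structure comparisons of Equations (6)-(7)
# for two same-size patches x and y.
import numpy as np

def ssim_components(x, y, L=255, K1=0.01, K2=0.03):
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    l = (2 * mx * my + C1) / (mx**2 + my**2 + C1)                 # luminance
    c = (2 * np.sqrt(vx) * np.sqrt(vy) + C2) / (vx + vy + C2)     # contrast
    s = (cov + C3) / (np.sqrt(vx) * np.sqrt(vy) + C3)             # structure
    return l, c, s
```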
The procedure of the MS-SSIM method for image structural similarity assessment is illustrated in Figure 2. The two images to be compared are taken as input signals; a low-pass filter is applied iteratively, and the filtered image is down-sampled by a factor of 2. The original image is indexed as scale 1 and the coarsest as scale M. At the j-th scale, the contrast comparison c_j(x, y) and the structure comparison s_j(x, y) are calculated; the luminance comparison l_M(x, y) is computed only at scale M. The MS-SSIM evaluation is then obtained by combining the measurements at the different scales:

MS-SSIM(x, y) = [l_M(x, y)]^{α_M} ∏_{j=1}^{M} [c_j(x, y)]^{β_j} [s_j(x, y)]^{γ_j}    (8)

where α_M, β_j, and γ_j are parameters that define the relative importance of the three components. To simplify parameter selection, we set α_M = β_j = γ_j = 1.
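The Figure 2 pipeline with α_M = β_j = γ_j = 1 can be sketched as follows, reusing ssim_components from the previous sketch; the 2 × 2 box average used here for low-pass filtering is a simplification (Ref. [39] uses a longer filter).

```python
# MS-SSIM per Equation (8) with unit exponents: contrast and structure at
# every scale, luminance only at the coarsest scale M.
import numpy as np

def downsample2(img):
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0   # 2x2 box average

def ms_ssim(x, y, M=5):
    score = 1.0
    for j in range(1, M + 1):
        l, c, s = ssim_components(x, y)       # from the previous sketch
        score *= c * s                        # every scale contributes c_j, s_j
        if j == M:
            score *= l                        # luminance only at scale M
        else:
            x, y = downsample2(x), downsample2(y)
    return score
```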
5. Texture Extraction Results Analysis
The analysis and comparison results are shown in Figure 3. The information entropy obtained by the adaptive method is marked in blue, and the red curve shows the information entropy obtained with fixed orders from 0.05 to 0.95 (in steps of 0.05). The result of the adaptive method is close to those of orders 0.5~0.7, which is the fractional order interval with the best extraction effect. The grey curve shows the structural similarity between each fixed-order result and the adaptive result: the closer the MS-SSIM value of a fixed-order extraction is to 1, the closer that result is to the adaptive one. From Figure 3, the extraction results of orders 0.5~0.7 are the most similar to those of the adaptive method, which agrees with the comparison in Figure 1. This quantitative analysis shows that the adaptive method improves texture extraction with less loss of texture detail.
We use statistics constructed from the gray-level co-occurrence matrix (GLCM) to characterize the physical information of the texture images, selecting four commonly used statistics (with the offset set to 1 and the directions set to [0°, 45°, 90°, 135°]), as shown in Table 1, where the rows correspond to the pictures in Figure 1 and the columns correspond to the directions.
ASM (angular second moment) measures the uniformity of the image grayscale distribution and the texture coarseness. Entropy measures the randomness contained in an image and expresses the complexity of its texture. Contrast reflects the clarity of the image and the depth of the texture: the more obvious the "large" texture, the greater the contrast. Correlation reflects the consistency of the image texture.
After the texture image is processed by the adaptive fractional differential method, the "small" textures are significantly enhanced, so the ASM and entropy values increase. The gray values of the "small" textures become closer to those of the "large" textures than before, so the contrast value decreases.
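The Table 1 statistics can be reproduced with scikit-image as sketched below; graycoprops exposes ASM, contrast, and correlation directly, while GLCM entropy is computed by hand because graycoprops has no entropy property. The random stand-in image is a placeholder for the Figure 1 textures.

```python
# GLCM statistics with offset 1 at 0/45/90/135 degrees, as in Table 1.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

img = (np.random.rand(128, 128) * 255).astype(np.uint8)   # stand-in texture
angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]          # 0/45/90/135 degrees
glcm = graycomatrix(img, distances=[1], angles=angles,
                    levels=256, symmetric=True, normed=True)

asm = graycoprops(glcm, 'ASM')               # uniformity of the gray field
contrast = graycoprops(glcm, 'contrast')     # clarity / texture depth
corr = graycoprops(glcm, 'correlation')      # texture consistency
entropy = [-np.sum(m[m > 0] * np.log2(m[m > 0]))           # randomness, per angle
           for m in np.moveaxis(glcm[:, :, 0, :], -1, 0)]
```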
6. Haptic Texture Rendering Model
Haptic texture rendering is a method of reconstructing the surface attributes of virtual objects from force fields or force vectors, so that users can haptically feel the surface texture of virtual objects through haptic devices (such as the Phantom or the Force Dimension Delta hand controller). In this section, we propose a new haptic texture model that builds on the image texture extraction results of the adaptive fractional differentiation method.
The texture force vector F(P_i) at each pixel of an image can be modeled as the combination of a normal force vector F_n(P_i) and a tangential force vector F_t(P_i):

F(P_i) = F_n(P_i) + F_t(P_i)    (9)
The tangential force vector of the image is calculated based on the following assumption: there exists an interaction force between any two pixels of the gray image after it has been processed by the proposed adaptive fractional differential method. The interaction force between any two pixels P_i and P_j is proportional to the absolute value of the difference of the two pixels' gray values, and inversely proportional to the distance between them. The direction of the interaction force vector is defined as pointing from the pixel with the higher luminance value to the pixel with the lower luminance value:

f_ij = [|g(P_i) − g(P_j)| / d(P_i, P_j)] e_ij    (10)

where d(P_i, P_j) is the distance between the two pixels P_i and P_j, g(P_i) and g(P_j) are the gray values of pixels P_i and P_j, respectively, and e_ij denotes the unit vector pointing from the higher-luminance pixel to the lower-luminance pixel.
If the distance between two pixels P_i and P_j is small, a given difference in their gray values produces a large interaction force, and vice versa. Thus, the tangential force vector of the pixel P_i can be defined as the vector sum of the interaction force vectors from all pixels within an n × n neighborhood N to the center pixel P_i:

F_t(P_i) = ∑_{P_j ∈ N} f_ij    (11)
The force vector of each pixel is a two-dimensional vector associated with the direction of gray-value change in the image, as illustrated in Figure 4. Figure 4b is a magnified view of the small red square area in Figure 4a, showing the tangential force vectors of some pixels computed by Equation (11) with a 3 × 3 neighborhood N. Here, the arrow direction represents the direction of the force vector and the arrow length represents its amplitude. Where the gray values of neighboring pixels change more strongly, the pixel force vector is larger, and vice versa. Since this pixel force vector is consistent with the texture information, it can be regarded as the tangential component of the texture force.
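A brute-force sketch of Equations (10) and (11) over an n × n neighborhood follows; the wrap-around behavior of np.roll at image borders is a simplification of whatever border handling the paper uses.

```python
# Tangential texture force field: per Equations (10)-(11), each neighbor
# within the n x n window pulls from the brighter pixel toward the darker
# one, with magnitude |g(Pi) - g(Pj)| / d(Pi, Pj).
import numpy as np

def tangential_force(tex, n=3):
    h, w = tex.shape
    r = n // 2
    F = np.zeros((h, w, 2))                      # per-pixel (fy, fx)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue                         # no self-interaction
            d = np.hypot(dy, dx)                 # d(Pi, Pj)
            neighbor = np.roll(np.roll(tex, -dy, axis=0), -dx, axis=1)
            mag = np.abs(tex - neighbor) / d     # |g(Pi) - g(Pj)| / d
            sign = np.sign(tex - neighbor)       # + if the center is brighter
            # bright-to-dark direction: toward the neighbor when center is brighter
            F[..., 0] += mag * sign * (dy / d)
            F[..., 1] += mag * sign * (dx / d)
    return F
```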
According to the psychological principles of the human sense of color and space, when a person observes the environment, brighter objects always appear closer than darker objects [40]. Therefore, we can define the normal force vector F_n(P_i) to be proportional to the image pixel gray value:

F_n(P_i) = c · g(P_i) · f_wall    (12)

where c is a proportionality factor and f_wall denotes the constraint force of the object surface. This equation implies that a brighter portion of the image produces a larger rendered normal force, giving the user the feeling of a bump when the virtual object is touched, while a darker portion produces a smaller normal force, conveying a feeling of shallowness.
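Equations (9) and (12) then combine as sketched below; the values of the proportionality factor c and the constraint force f_wall are illustrative placeholders, not the paper's settings.

```python
# Per-pixel texture force: normal component per Equation (12), tangential
# component from the previous sketch, combined per Equation (9).
import numpy as np

def haptic_force(tex, f_wall=1.0, c=0.004):
    Fn = c * tex * f_wall                  # normal component, Equation (12)
    Ft = tangential_force(tex)             # tangential field, Equation (11)
    return Fn, Ft                          # combined per Equation (9)
```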
7. Experiment
The experimental system for haptic texture rendering of 2D images consists of a Phantom Omni haptic device and a computer, as shown in Figure 5. The Phantom Omni provides six-degree-of-freedom position/attitude sensing and three-degree-of-freedom force feedback with a maximum force of 3.3 N. Its workspace is 160 mm (width) × 120 mm (height) × 70 mm (depth), with a positional resolution of 0.055 mm.
In this experimental system, we selected five images of 500 × 500 pixels from the Brodatz texture image database [41] and used the proposed adaptive fractional differential method to extract their texture features, as shown in Table 2. We then used the proposed haptic texture rendering model to render the object surfaces based on the extracted textures. To verify the effect of our method, 20 volunteer subjects (10 male and 10 female, aged 21 to 31) were randomly selected to perform texture perception experiments. Each subject haptically explored the surfaces of 25 2D images, presented one by one in random order by the computer, and classified them into the 5 image types using the Phantom Omni haptic device without any visual information about the images.
The appearance of the original image and the texture image was hidden, and only the calculated texture force was mapped onto a smooth virtual plane of 500 × 500 pixels. The constraint force of the virtual plane was calculated according to Hooke's law. When a volunteer explored the "blank" virtual plane, the hand controller fed back the texture force of the image together with the constraint force of the plane. Volunteers then had to select, from the five original images, the one whose texture they had perceived through force/haptic feedback, and the rate of correct perception was counted for each image.
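A minimal sketch of this force composition, assuming a spring constant k and a pixel-indexed lookup into the pre-computed force fields (both our assumptions, not the paper's implementation values):

```python
# Force fed back at one probe contact: a Hooke's-law constraint from the
# virtual plane plus the pre-computed texture forces at the probed pixel.
def render_force(depth, px, py, Fn, Ft, k=0.5):
    f_plane = k * max(depth, 0.0)          # spring-like constraint force
    fy, fx = Ft[py, px]                    # tangential texture components
    fz = f_plane + Fn[py, px]              # constraint + normal texture force
    return fx, fy, fz
```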
The experimental results show that the average classification accuracies for the 5 image types based on haptic perception alone were 87%, 72%, 81%, 91%, and 83%, as shown in Figure 6. The proposed haptic texture rendering method evidently helps users understand the texture content of an image, so it is an effective approach to image-based haptic texture rendering.
We then conducted another group of experiments: we selected four 2D images and extracted their texture features with the proposed adaptive algorithm, as shown in Table 2. The extraction results show that the image textures extracted by the adaptive fractional differentiation algorithm are clear and their details are retained completely, which indicates the excellent performance of the algorithm in extracting fine textures.
The TV-Gabor model was used for comparison (the TV-Gabor model decomposes the image using prior conditions on frequency and texture direction, so as to separate the contour shape of the image body from the texture part), and four real object images with textures were selected, as shown in Table 3. As before, we used the proposed haptic texture rendering model to render the object surfaces based on the extracted textures. With the volunteers blindfolded, we calculated the correct recognition rate for each image, as shown in Figure 7. This verifies again that the texture features output by the haptic texture rendering model conform to volunteers' realistic perception.