1. Introduction
With the rapid development of information technology, artificial intelligence (AI) is attracting increasing attention from industry. Because cell phones are widely used, many applications can be downloaded and installed from online stores, and many image processing applications are available on the market. Users select an original image and a desired style, such as hand painting or impressionist painting, and the application automatically transforms the original image into a new image in that style. An ideal style transfer technique preserves the scene of the original image and combines it with the color and style of the target image. How to take both the color and the style of the target image into account has therefore become an important issue. P. P. Galanter et al. [1] proposed a definition of generative art in which a machine performs an automatic computation through an algorithm or a set of rules and eventually generates a work of art. Image transfer techniques can be divided into two parts: image color transfer and image style transfer. Jimenez-Arredondo et al. [2] summarized three main approaches to image color transfer: geometry-based methods, user-assisted schemes, and statistical methods. First, when similar features appear in two images, the geometry-based method [3] automatically searches for corresponding features, transfers color between them, and thus gives similar features similar colors. This method is widely applicable to real-world images. However, if the structures of the content image and the target image are very different, the geometry-based method cannot find corresponding features, so such image pairs are not suitable for this kind of color transfer. Second, the user-assisted scheme [4] requires manual intervention: the user marks corresponding feature positions in the content image and the target image before the color transfer is performed. This approach gives better color transfer between features and is less prone to transferring colors between mismatched features. Its disadvantage is that if the color distribution of the content image is more detailed than that of the target image, the user may need to mark many more corresponding features. Thus, although the user-assisted scheme gives better color transfer results, it is not efficient. Third, the statistical method [5,6,7] is applicable when the features of the content image and the target image do not correspond directly. Reinhard et al. [5] used a simple statistical analysis to impose one image’s color characteristics on another; color correction is achieved by choosing an appropriate source image and applying its characteristics to another image. Papadakis et al. [6] proposed a variational formulation for the intermediate color histogram equalization of two or more images. Cepeda-Negrete et al. [7] used a classical color transfer method to obtain first-order statistics from a target image and transfer them to a dark input image, modifying its hue and brightness. The statistical method computes statistics relating the pixel values of the content image and the target image and then performs the color transfer. Its results are better when the two images have similar structures.
By scale, image color transfer can be divided into two types: global color transfer and local color transfer. Reinhard et al. [5] proposed a statistically based method for image color transfer, which is a global color transfer method. It first converts the images into a suitable color space and then adjusts the colors of the content image by matching the mean and standard deviation of each channel to those of the target image. This statistical color transfer is very fast, but it has a limitation: if the colors of the target image are diverse, the color transfer result for the content image looks unnatural. Liu et al. [8] proposed a method that lets users actively select corresponding regions in two images; through this interaction, the regions of interest are selected for color transfer between the target image and the content image. However, when the images contain many colors, considerable human effort is required for manual marking. Khan et al. [9] proposed using multiple target images for image color transfer. Their method is a local color transfer algorithm between multiple images based on simple statistics and locally linear embedding for edit propagation. The color features of the multiple target images are fused into the content image through user labeling and statistical methods applied to the corresponding images. In this approach, the closer the structures of the content image and the target image (sky and sky, sea and sea, or flower and flower), the more effective the color transfer. A general limitation of image color transfer, however, is that it cannot simulate the texture pattern of the target image. Liu et al. [10] proposed an emotional color transfer method that considers the texture perception of images and is guided by target images or emotion words. It is a local color transfer method: the content image is segmented, and the primary colors of the textures are extracted. Because it considers both object segmentation and primary color extraction, it avoids assigning different target colors to the same block.
However, the image color transfer methods above simulate only the colors of the target image, not the artist’s brushstrokes or the style of the target image. Jimenez-Arredondo et al. [2] proposed an image style transfer method that considers both color transfer and style transfer. Color transfer is first performed between the content image and the target image, and the result is then merged with a canvas image that resembles the style of the target image to express Fauvism. Because an arbitrary image is used to mimic the texture of the target image, the texture of the generated image differs from that of the target image. Gatys et al. [11] proposed a deep learning-based image style transfer method in which the Gram matrix characterizes the style of an image, and the features of the content image and the target image are combined to produce a new stylized image. During this style transfer, the colors and textures of the target image are transferred together. The method proposed by NVIDIA and X. Li et al. [12] applies a linear style transformation with two small convolutional neural networks. It is relatively computationally efficient and saves computational cost compared with algorithms that require GPU computation. With the method of X. Li et al., the stylized image keeps the edges of the content image, but the style of the target image is less apparent. Since deep networks extract image features [13], the feature maps produced by the filters of each convolutional layer can be visualized. Besides edge detectors and color detectors, there are many other types of filters.
Liu et al. [14] used a pre-trained VGG19 as the encoder to capture the features of the style image and the content image and then fused the features with AdaIN. Because the encoder focuses on global features and ignores details, the output of the transform module has blurred edges; to recover edge features from the content image, they therefore designed a refinement network that enhances image edges using a modified version of the HED edge detection network. A disadvantage of this approach is that max pooling selects features from each feature map and passes them to the next layer; this filters features and reduces the number of parameters but may also discard other important features. In addition, the VGG19 architecture focuses on global features, and as the network deepens, the texture features of earlier layers may not be passed on to later layers. Because style transfer relies on texture, that is, on the lower-level features of the earlier layers, this may yield under-stylized images. In the method proposed by Hung et al. [15], the content image and the reference image are fed to VGG19 in a semantic matching stage to extract image features, and a cosine similarity matrix is computed to generate a semantic-assisted image. The generator of the translation network uses an autoencoder structure with one encoder and two decoders (Task 1 and Task 2). Task 1 generates the image in the target domain, and its decoder uses a U-Net design to enhance image features and avoid vanishing gradients. Task 2 outputs an image segmentation map as an attention mask, which lets the model learn information about the attention regions. Finally, a discriminator evaluates the generated image. The drawback of Hung et al.’s method is that, although the semantic matching stage computes the feature similarity between the content image and the reference image to preserve and match their spatial and semantic similarities, VGG19 lacks multi-scale feature fusion and the feature maps at each scale are not manipulated, which may lead to a lack of detailed features in the semantic information. Lower-level features have high resolution and thus contain more location and detail information but less semantic information, whereas higher-level features carry stronger semantic information but have lower resolution and capture details poorly. Combining the two kinds of features can match the color and style of the target image better. The proposed image color transfer method is a local color transfer method, so it can improve on the results of global color transfer. Because effective features are selected from the convolutional layers automatically, no manual adjustment is needed to achieve a good visual effect.
Because the proposed method automatically simulates the distribution of the color and style of the target image and sets its parameters without human intervention, the transfer results can balance the color and the style of the target image.
In this paper, we propose a local color transfer method for the content image and the target image. First, adaptive multilevel thresholding is performed based on the luminance distributions of the pixels of the two images, and color transfer is then performed region by region. Next, deep learning is used to select the effective features of the target image: the degree to which each convolutional layer extracts effective features is judged by the structural similarity index (SSIM) and by the number of black blocks. Selecting the convolutional layers with more effective features reduces a limitation of deep learning-based transfer, namely its need for manual parameter control. The experimental results show that the proposed method improves image quality by automatically simulating the color and style of the target image, without manual parameter control.
This paper is organized as follows. Section 2 reviews related works. Section 3 introduces the proposed method. Section 4 shows the experimental results of the proposed method. Finally, Section 5 presents conclusions.
3. The Proposed Method
An ideal image transfer technique transfers the color and style of the target image to the content image. The proposed method takes two images: the original image (the content image) and the image whose color and style are to be transferred (the target image, such as an artwork by a famous painter). The target image contains two important components, color and style: color can express the emotion of the painter, and style shows the painter’s technique. As shown in Figure 3, the proposed framework has two main steps. In the first step, the color transfer phase (red line in Figure 3) produces the color-transferred content image, Figure 3c. In the second step, the color-transferred content image (Figure 3c) and the target image (Figure 3b) go through the style transfer phase (green line in Figure 3), which produces the resulting image, Figure 3d.
3.1. Adaptive Multilevel Coloring Scheme
This section describes the adaptive multilevel color transfer. The first phase explains how the proposed method sets adaptive multilevel thresholds for each image and how the number of levels is limited so that the content image and the target image match. The second phase describes how the color transfer is performed once the multilevel settings of both images are complete.
3.1.1. Adaptive Multilevel Setting
In addition to the color of an image, its luminance distribution is an important factor. In this phase, the proposed method converts both the target image and the content image into the CIELAB color space [21,22,23]. The CIELAB color space was defined by the International Commission on Illumination (CIE). CIELAB is known for its perceptual uniformity, and its L component matches human perception of lightness more closely than the components of other color spaces do. Otsu’s thresholding [24] divides an image into a foreground and a background. In the proposed method, the image is instead divided into multiple foreground and background regions before the color transfer is performed: according to the luminance of the target image, several thresholds are used to split the target image and the content image into multiple regions. As Figure 4 shows, after the content image and the target image are used as inputs to the adaptive multilevel color transfer, the color of the content image becomes similar to that of the target image.
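The following Python sketch illustrates the two ingredients named above: computing the CIELAB L component of an sRGB image and applying Otsu's single-threshold method to it. It is a minimal illustration, not the authors' implementation. The sRGB-to-L* conversion assumes a D65 white point, the rescaling of L* from 0–100 to 0–255 follows the common 8-bit convention, and the function names are hypothetical.

```python
import numpy as np

def srgb_to_lightness(rgb):
    """CIE L* (0-100) of an sRGB uint8 image of shape (H, W, 3), assuming a D65 white point."""
    c = rgb.astype(np.float64) / 255.0
    # Undo the sRGB gamma curve to obtain linear RGB.
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Relative luminance Y (white has Y = 1).
    y = lin @ np.array([0.2126, 0.7152, 0.0722])
    delta = 6.0 / 29.0
    f = np.where(y > delta ** 3, np.cbrt(y), y / (3 * delta ** 2) + 4.0 / 29.0)
    return 116.0 * f - 16.0

def lightness_to_8bit(lightness):
    """Rescale L* from [0, 100] to integer levels [0, 255]."""
    return np.clip(np.rint(lightness * 255.0 / 100.0), 0, 255).astype(np.uint8)

def otsu_threshold(levels):
    """Otsu's threshold t for uint8 data; class 0 is levels <= t."""
    hist = np.bincount(levels.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    omega = np.cumsum(p)                      # probability of class 0 for each t
    mu = np.cumsum(p * np.arange(256))        # cumulative mean of class 0
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        between_var = (mu_total * omega - mu) ** 2 / denom
    between_var[denom <= 0] = 0.0             # undefined where one class is empty
    return int(np.argmax(between_var))

# Usage: a synthetic image whose left half is black and right half is light gray.
img = np.zeros((4, 8, 3), dtype=np.uint8)
img[:, 4:] = 200
l8 = lightness_to_8bit(srgb_to_lightness(img))
t = otsu_threshold(l8)
```

Otsu's method yields a single foreground/background split; the multilevel scheme described next replaces it with several luminance thresholds.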
With the adaptive multilevel coloring scheme, the image is split into several regions. Let the range of the L component of the image be $[a, b]$, with $a < b$; initially, $a$ and $b$ are set to 0 and 255, respectively. For the pixels whose values lie between $a$ and $b$, the mean $\mu$ and the standard deviation $\sigma$ of this region are computed. In this region, the luminance levels of the image are $i = a, a+1, \dots, b$, and $f(i)$ denotes the number of pixels at luminance level $i$. The mean $\mu$ and the standard deviation $\sigma$ of this region are given by Equations (2)–(4).
Then, Equations (5) and (6) are used to obtain two new thresholds, $T_1$ and $T_2$, from the mean $\mu$ and the standard deviation $\sigma$ above.
With the two luminance level thresholds $T_1$ and $T_2$, there are three new regions: from $a$ to $T_1$, from $T_1$ to $T_2$, and from $T_2$ to $b$. In the region from $a$ to $T_1$ and the region from $T_2$ to $b$, two new luminance level thresholds are given by Equation (2). The range of the L component in the middle region, from $T_1$ to $T_2$, becomes the new range; that is, $T_1$ and $T_2$ take the place of $a$ and $b$, and the next two thresholds are computed by Equations (2)–(6).
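A minimal Python sketch of this recursive splitting may help. Because the exact forms of Equations (5) and (6) are not reproduced here, the sketch assumes $T_1 = \mu - k\sigma$ and $T_2 = \mu + k\sigma$ with a hypothetical parameter `k`. It computes $\mu$ and $\sigma$ from the luminance histogram $f(i)$ of the current range and then, as an assumption, recurses on the middle range. The function name, `k`, and `n_iter` are illustrative.

```python
import numpy as np

def multilevel_thresholds(l_channel, n_iter=2, k=1.0):
    """Hypothetical recursive mean/std thresholding of an 8-bit L channel.

    Each iteration computes the histogram mean and standard deviation of the
    current range [a, b], places T1 = mu - k*sigma and T2 = mu + k*sigma
    (assumed forms of Equations (5) and (6)), and recurses on the middle range.
    """
    hist = np.bincount(l_channel.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    a, b = 0, 255                        # initial range of the L component
    thresholds = []
    for _ in range(n_iter):
        f = hist[a:b + 1]                # f(i): pixel count at luminance level i
        total = f.sum()
        if total == 0:
            break
        i = levels[a:b + 1]
        mu = (i * f).sum() / total
        sigma = np.sqrt(((i - mu) ** 2 * f).sum() / total)
        t1, t2 = int(round(mu - k * sigma)), int(round(mu + k * sigma))
        if not (a < t1 < t2 < b):        # thresholds no longer split the range
            break
        thresholds += [t1, t2]
        a, b = t1 + 1, t2 - 1            # the middle region becomes the new range
    return sorted(thresholds)

levels_img = np.arange(256, dtype=np.uint8).reshape(16, 16)  # uniform histogram
print(multilevel_thresholds(levels_img))  # → [54, 85, 170, 201]
```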
If the two luminance level thresholds are too close, they cannot effectively separate the images. The following is the limit on the selection of the final luminance level thresholds. The differences between thresholds are calculated and sorted from small to large. If the threshold difference between the previous threshold and the next threshold is less than , then the threshold will be deleted. represents the number of thresholds. After adjusting the number of thresholds, if the total number of remaining thresholds is less than, then the number of thresholds will be .
The total number of regions in the target image is used to control the total number of regions in the content image. For example, if the target image has fewer regions than the content image, the number of thresholds of the content image is reduced until the two images have the same number of thresholds.
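The pruning rule and the matching of region counts can be sketched as follows. The minimum spacing `min_gap` is a hypothetical parameter. The choice of which member of the closest pair is removed when the content image has too many thresholds (here, the lower one) is also an assumption. With `min_gap = 30` and these choices, the sketch reproduces the thresholds reported in Section 4.2 (86, 139, and 207 for the target image; 46, 86, and 206 for the content image), but it illustrates the idea rather than giving the authors' exact procedure.

```python
def prune_close_thresholds(thresholds, min_gap):
    """Drop any threshold closer than min_gap to the previously kept one (hypothetical rule)."""
    kept = []
    for t in sorted(thresholds):
        if kept and t - kept[-1] < min_gap:
            continue  # too close to the previous threshold to separate two regions
        kept.append(t)
    return kept

def match_threshold_count(thresholds, count):
    """Remove thresholds, closest pair first, until `count` remain (lower member dropped)."""
    ts = sorted(thresholds)
    while len(ts) > max(count, 0):
        if len(ts) == 1:
            ts.pop()
            continue
        gaps = [ts[j + 1] - ts[j] for j in range(len(ts) - 1)]
        del ts[gaps.index(min(gaps))]  # hypothetical choice: drop the lower threshold
    return ts

target = prune_close_thresholds([86, 139, 207, 228], min_gap=30)
content = match_threshold_count(
    prune_close_thresholds([46, 86, 167, 206], min_gap=30), len(target))
```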
Figure 5 shows the results of the adaptive multilevel color transfer between images.
3.1.2. Adaptive Multilevel Color Transfer
Color transfer recolors the content image with the colors of the target image. In this paper, the color transfer method proposed by Reinhard et al. [5] is used because this statistically based method is simpler and faster than other methods. Each pair of corresponding regions defined by the thresholds of the previous step is color-transferred separately. For example, the regions in Figure 5c,g are color-transferred together.
Suppose the luminance distribution of the target image yields a total of $t$ thresholds $\{threshold_{T1}, threshold_{T2}, \dots, threshold_{Tt}\}$ and the content image has the same number of thresholds $\{threshold_{C1}, threshold_{C2}, \dots, threshold_{Ct}\}$, where $T$ and $C$ denote the target image and the content image, respectively. Color transfer is performed pixel by pixel within each pair of corresponding regions of the two images. For example, the pixels in the first region of the target image (luminance from 0 to $threshold_{T1}$) are color-transferred with the pixels in the same region of the content image (luminance from 0 to $threshold_{C1}$). The next regions of the two images are then the ranges from $threshold_{T1}$ to $threshold_{T2}$ and from $threshold_{C1}$ to $threshold_{C2}$, and the process continues until all regions have been color-transferred. Equations (8) and (9) calculate the mean of each region of the target image and the mean of each region of the content image, where the sums run over the length and width of the image and $S$ denotes the total number of pixels in the region. Equations (10) and (11) calculate the standard deviation of each region of the target image and that of the content image. The means and standard deviations are calculated separately for each of the three channels of the RGB color space.
Each region of the content image is mapped to the corresponding region of the target image, and the color-transferred content region is obtained with Equation (12). The regions are then combined, and the final output image is obtained with Equation (13). That is, the resulting image is produced by the adaptive multilevel color transfer between the content image and the target image.
Figure 6 shows the result of the adaptive multilevel color transfer.
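The per-region transfer can be sketched in Python as below. It follows the description above: Reinhard-style mean and standard-deviation matching is applied separately to each RGB channel inside each pair of corresponding luminance regions, and the disjoint regions are then combined into one image. The helper names and the small `eps` that guards against zero variance are assumptions of this sketch, as is the convention that region k covers luminance values above threshold k−1 up to threshold k.

```python
import numpy as np

def region_labels(l_channel, thresholds):
    """Region index per pixel: 0 for L <= thresholds[0], ..., len(thresholds) above the last."""
    return np.digitize(l_channel, bins=np.asarray(thresholds), right=True)

def multilevel_color_transfer(content_rgb, content_l, content_t,
                              target_rgb, target_l, target_t, eps=1e-6):
    """Reinhard-style mean/std matching per RGB channel inside each luminance region."""
    if len(content_t) != len(target_t):
        raise ValueError("both images must have the same number of thresholds")
    src = content_rgb.astype(np.float64)
    tgt = target_rgb.astype(np.float64)
    out = src.copy()
    lab_c = region_labels(content_l, content_t)
    lab_t = region_labels(target_l, target_t)
    for k in range(len(target_t) + 1):
        mask_c, mask_t = lab_c == k, lab_t == k
        if not mask_c.any() or not mask_t.any():
            continue  # leave the region unchanged if either image has no pixels there
        c_pix, t_pix = src[mask_c], tgt[mask_t]            # (S, 3) pixel arrays
        mu_c, sd_c = c_pix.mean(axis=0), c_pix.std(axis=0)
        mu_t, sd_t = t_pix.mean(axis=0), t_pix.std(axis=0)
        # Remove the content mean, rescale the spread, then add the target mean.
        out[mask_c] = (c_pix - mu_c) * (sd_t / (sd_c + eps)) + mu_t
    # Regions are disjoint, so writing each one into `out` merges them into one image.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# Usage: a 2x2 example with one threshold (two regions) in each image.
content_rgb = np.array([[[10] * 3, [30] * 3], [[200] * 3, [220] * 3]], dtype=np.uint8)
target_rgb = np.array([[[50, 0, 0], [70, 0, 0]], [[0, 0, 150], [0, 0, 170]]], dtype=np.uint8)
lum = np.array([[10, 30], [200, 220]])
result = multilevel_color_transfer(content_rgb, lum, [100], target_rgb, lum, [100])
```

In this toy case, each content region has the same spread as the matching target region, so the output reproduces the target colors exactly.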
3.2. Style Transfer Phase
This section first describes how the effective features of each convolutional layer are judged and then demonstrates the style transfer between images.
3.2.1. Feature Visualization and Layer Selection
In recent years, deep learning has been used to extract image features. This paper adopts the first 16 layers of VGG19, that is, its 16 convolutional layers, and removes the last three fully connected layers. Because the convolutional kernels of VGG19 are small, relatively few parameters are required. In addition, stacked Conv+ReLU layers learn better features than a single Conv+ReLU layer. The target image is used as the input image, and the features extracted by each convolutional layer of the VGG19 model can be visualized. As shown in Figure 7, VGG19 has five major layers, shown as red, orange, yellow, green, and blue blocks. Each major layer has two or four sublayers: each of the first two major layers (red and orange blocks) has two sublayers, and each of the last three major layers (yellow, green, and blue blocks) has four sublayers. For example, the red block contains two sublayers: conv1_1 and conv1_2.
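For reference, the layer naming used in this section can be generated from the standard VGG19 configuration. The short Python sketch below lists the 16 convolutional sublayers with their filter counts and groups them the way the following paragraphs compare them: SSIM for the first two major layers, black blocks for the third, and the fourth and fifth judged together. The grouping function is an illustrative assumption that mirrors the text.

```python
# Standard VGG19 convolutional configuration: (filters, sublayers) per major layer.
VGG19_MAJOR_LAYERS = [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)]

def vgg19_conv_sublayers():
    """Return [(name, filters), ...] for the 16 convolutional sublayers of VGG19."""
    layers = []
    for major, (filters, n_sub) in enumerate(VGG19_MAJOR_LAYERS, start=1):
        for sub in range(1, n_sub + 1):
            layers.append((f"conv{major}_{sub}", filters))
    return layers

def selection_groups():
    """Hypothetical grouping: which sublayers compete, and by which criterion."""
    names = [name for name, _ in vgg19_conv_sublayers()]

    def in_major(m):
        return [n for n in names if n.startswith(f"conv{m}_")]

    return {
        "ssim": [in_major(1), in_major(2)],
        # The fourth and fifth major layers both have 512 filters, so they are pooled.
        "black_blocks": [in_major(3), in_major(4) + in_major(5)],
    }
```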
In the visualized feature maps, although many green cells represent zero, the green grid still contains some yellow parts. Here, the effective features of each layer are calculated in order to select a layer for the image, and all sublayers of each major layer are compared with one another. For the first major layer (the red block), one of its two sublayers is selected. The first and second major layers mainly use the structural similarity index (SSIM), proposed by Wang et al. [25], to measure the effective feature extraction of each sublayer. SSIM measures the similarity of two images by comparing their luminance, contrast, and structure.
When an image is processed by a filter, the resulting feature map may be a black block, which indicates that the filter does not recognize any valid feature. Therefore, the third, fourth, and fifth major layers mainly use the number of black blocks in their feature maps to measure the effective feature extraction of each sublayer.
Figure 7 displays the architecture of the selection of a convolutional layer for the target image.
Figure 7 also shows how the first two major layers use SSIM to measure how effectively each convolutional layer extracts features. The proposed method analyzes the features that each convolutional layer extracts from the target image. Taking conv1_1 as an example, its 64 filters produce 64 feature maps. First, all the feature maps of the sublayer are fused, at the same 1:1 size, into a new feature map that represents the entire sublayer. Then, the SSIM value between each feature map $x$ in the sublayer and the new feature map $y$ is calculated with Equation (14). Because SSIM compares the luminance, contrast, and structure of two images, a feature map that is well represented in the new feature map is regarded as capturing the structure of the image.
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{14}$$
where $\mu_x$ and $\mu_y$ represent the means of images $x$ (the feature map in the sublayer) and $y$ (the new feature map), respectively; $\sigma_x$ and $\sigma_y$ represent the standard deviations of images $x$ and $y$, respectively; $\sigma_{xy}$ represents the covariance of images $x$ and $y$; and $C_1$ and $C_2$ are constants that stabilize the division. SSIM takes values in $[-1, 1]$, and the larger the value, the more similar the two images. If the SSIM value of a feature map is greater than 0, the feature map is judged to be a valid feature, and the number of valid features is counted within each sublayer. Taking the first major layer as an example, the numbers of valid features of its two sublayers are calculated and compared, and the sublayer with more valid features is judged to be the most capable convolutional layer in this major layer.
Figure 8 displays a process describing how to use SSIM to determine the effective features in a convolutional layer of the sunflower image.
The last three major layers use the number of black blocks to choose layers. The total number of black blocks is counted for each sublayer, and the convolutional layer with the smallest total is selected. For the third major layer, its sublayers are compared and the one with the fewest black blocks is chosen. Because the fourth and fifth major layers have the same number of filters, their sublayers are judged together, and only the single sublayer with the fewest black blocks is selected from these two major layers. Selecting the convolutional layers from different major layers increases the diversity of the features.
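The black-block criterion can be sketched similarly. The text does not define exactly when a feature map counts as a black block, so the `zero_fraction` cutoff below (a map is black if at least 99% of its activations are zero after ReLU) is a hypothetical choice, as are the function names. The usage lines show the fourth and fifth major layers pooled into one comparison, as described above, using synthetic activations.

```python
import numpy as np

def count_black_blocks(feature_maps, zero_fraction=0.99):
    """Count maps in a (C, H, W) array whose share of zero activations is >= zero_fraction."""
    c = feature_maps.shape[0]
    share_zero = (feature_maps <= 0).reshape(c, -1).mean(axis=1)
    return int((share_zero >= zero_fraction).sum())

def select_by_black_blocks(sublayers, zero_fraction=0.99):
    """Name of the sublayer with the fewest black blocks (first one wins ties)."""
    return min(sublayers, key=lambda n: count_black_blocks(sublayers[n], zero_fraction))

rng = np.random.default_rng(0)
active = rng.random((4, 8, 8)) + 0.1      # strictly positive: no black blocks
mostly_dead = active.copy()
mostly_dead[:3] = 0.0                     # three of the four maps are entirely zero
group_4_and_5 = {"conv4_2": mostly_dead, "conv5_2": active}
chosen = select_by_black_blocks(group_4_and_5)
```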
3.2.2. Image Style Transfer
This step is based on the deep learning style transfer method; specifically, the VGG19-based method of Gatys et al. [11] is applied. Lower-level convolutional layers capture color patches, simple lines, or colors, whereas higher-level convolutional layers capture entire objects of the image, such as its overall structure. Therefore, a convolutional layer $l$ is selected for the content image, and the content loss $\mathcal{L}_{content}$ between the color-transferred content image $\vec{p}$ obtained from the color transfer step and the generated image $\vec{x}$ is computed on that layer. The style loss $\mathcal{L}_{style}$ between the target image $\vec{a}$ and the generated image $\vec{x}$ is calculated over the selected convolutional layers. The final loss function $\mathcal{L}_{total}$ is the weighted sum in Equation (15):
$$\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha\,\mathcal{L}_{content}(\vec{p}, \vec{x}) + \beta\,\mathcal{L}_{style}(\vec{a}, \vec{x}) \tag{15}$$
where the ratio $\alpha/\beta$ is set to 1/1000, and different values of $\alpha$ and $\beta$ can be employed to obtain the final loss.
Figure 9 shows the image style transfer process.
Next, the calculations of the loss functions $\mathcal{L}_{content}$ and $\mathcal{L}_{style}$ are introduced.
1. Between the color-transferred content image $\vec{p}$ and the generated image $\vec{x}$: the content loss is given by Equation (16):
$$\mathcal{L}_{content}(\vec{p}, \vec{x}, l) = \frac{1}{2}\sum_{i,j}\left(F^l_{ij} - P^l_{ij}\right)^2 \tag{16}$$
where $P^l_{ij}$ and $F^l_{ij}$ are the activations of the color-transferred content image and the generated image, respectively, at position $j$ of filter $i$ in convolutional layer $l$. The loss is the squared error between the two sets of activations; the smaller $\mathcal{L}_{content}$, the more similar the generated image is to the color-transferred content image at the same locations.
2. Between the target image $\vec{a}$ and the generated image $\vec{x}$: within the same convolutional layer, the Gram matrix takes the inner product of every pair of feature maps to find the correlations between features. These correlations represent part of the style information, so the correlations collected over multiple convolutional layers describe the style of the entire image. For the target image and the generated image in the VGG19 model, the Gram matrices of convolutional layer $l$ are given by Equations (17) and (18):
$$A^l_{ij} = \sum_k S^l_{ik}\,S^l_{jk} \tag{17}$$
$$G^l_{ij} = \sum_k F^l_{ik}\,F^l_{jk} \tag{18}$$
where $S^l_{ik}$ and $S^l_{jk}$ are two activation values of the target image, and $F^l_{ik}$ and $F^l_{jk}$ are two activation values of the generated image. Let $H_l$, $W_l$, and $C_l$ denote the length, width, and height (number of channels) of convolutional layer $l$. The squared error $E_l$ between the Gram matrices of the target image and the generated image is calculated for each layer, and with the weight $w_l$ of each layer, the style loss is given by Equation (19):
$$E_l = \frac{1}{4C_l^2(H_lW_l)^2}\sum_{i,j}\left(G^l_{ij} - A^l_{ij}\right)^2, \qquad \mathcal{L}_{style}(\vec{a}, \vec{x}) = \sum_l w_l E_l \tag{19}$$
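To make Equations (16)–(19) concrete, the following NumPy sketch evaluates the content loss, the Gram matrices, the per-layer style error, and the weighted total loss on activation arrays shaped (channels, height, width). It assumes the activations have already been extracted from VGG19; feature extraction and the optimization of the generated image are omitted. It uses the $1/(4C_l^2(H_lW_l)^2)$ normalization of Gatys et al., and the function names are illustrative. The default weights $\alpha = 1$ and $\beta = 1000$ give the ratio $\alpha/\beta = 1/1000$ stated above.

```python
import numpy as np

def content_loss(gen, content):
    """Equation (16): half the squared error between the activations of one layer."""
    return 0.5 * float(np.sum((gen - content) ** 2))

def gram(features):
    """Equations (17)/(18): inner products between all pairs of the C feature maps."""
    c = features.shape[0]
    flat = features.reshape(c, -1)        # (C, H*W)
    return flat @ flat.T                  # (C, C)

def layer_style_error(gen, target):
    """E_l of Equation (19) with the 1 / (4 C^2 (H W)^2) normalization."""
    c, h, w = gen.shape
    diff = gram(gen) - gram(target)
    return float(np.sum(diff ** 2)) / (4.0 * c ** 2 * (h * w) ** 2)

def style_loss(gen_layers, target_layers, weights):
    """Weighted sum of the per-layer errors over the selected layers."""
    return sum(w * layer_style_error(g, t)
               for g, t, w in zip(gen_layers, target_layers, weights))

def total_loss(l_content, l_style, alpha=1.0, beta=1000.0):
    """Equation (15) with alpha / beta = 1/1000 by default."""
    return alpha * l_content + beta * l_style
```

In a full implementation, the generated image would be updated by gradient descent on `total_loss`, with the activations recomputed at each step. That machinery is outside the scope of this sketch.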
4. Experimental Results
In this section, Section 4.1 introduces the experimental image settings and the evaluation method, Section 4.2 presents the experimental results of the proposed method, and Section 4.3 compares the results of the proposed method with related works on color transfer and on style transfer.
4.1. Performance Evaluation
In the experiment, the input images contain the content image and the target image in a size of
. The color transfer result is evaluated by comparing the histogram of the color-transferred image with that of the target image, using the Euclidean distance between the two histograms. The color histogram similarity is given by Equation (20), where the two histograms are those of the color-transferred image and the target image, respectively, and the summation runs over the range of the histogram. The larger the similarity, the more similar the color distributions of the two images: images with identical histograms have a similarity of 1, and completely dissimilar images have a similarity of 0.
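Because Equation (20) is not reproduced here, the sketch below shows one hypothetical way to turn a Euclidean histogram distance into a similarity in [0, 1]. Both histograms are normalized to sum to 1, and the distance is divided by √2, the largest possible Euclidean distance between two such histograms. This normalization is an illustrative assumption, not necessarily the one used in the paper, and pixel values from all channels are pooled into one 256-bin histogram.

```python
import numpy as np

def histogram_similarity(img_a, img_b, bins=256):
    """Similarity in [0, 1]: 1 for identical histograms, 0 for non-overlapping ones."""
    ha = np.bincount(img_a.ravel(), minlength=bins).astype(np.float64)
    hb = np.bincount(img_b.ravel(), minlength=bins).astype(np.float64)
    ha /= ha.sum()
    hb /= hb.sum()
    distance = np.linalg.norm(ha - hb)         # Euclidean distance between histograms
    return float(1.0 - distance / np.sqrt(2.0))
```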
The style transfer result is evaluated by Equations (21)–(23), which compare the variances of the images. A variance is computed from the pixels of each image and serves as the feature value of that image, and the closeness of the variances of two images determines their similarity.
The image size can be scaled flexibly according to the user’s needs. Because this measure does not depend strongly on color, each image is converted into a grayscale image to reduce the computational complexity. Let $M$ and $N$ denote the length and width of the image. The average of each row of pixels and the average of all pixels of the whole image are calculated, so that each row average represents the feature of one row. The deviation of each row average from the average of the whole image is then computed and accumulated, and the resulting variance of the row averages is the characteristic value of the image. The similarity of two images is determined by the difference between their variances.
The smaller the difference between the variances of the two images, the more similar the images; if the difference is 0, the features of the two images are identical.
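The row-average variance described above can be sketched as follows. The grayscale weights (ITU-R BT.601 luma: 0.299, 0.587, and 0.114) and the use of the population variance (dividing by the number of rows $M$) are assumptions, since the text does not fix either choice. The function names are illustrative.

```python
import numpy as np

def to_gray(rgb):
    """Grayscale conversion with BT.601 luma weights (an assumption)."""
    return rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])

def row_mean_variance(gray):
    """Variance of the M row averages around the average of the whole image."""
    row_means = gray.mean(axis=1)    # one average per row
    overall = gray.mean()            # average of the whole image
    return float(np.mean((row_means - overall) ** 2))

def variance_difference(gray_a, gray_b):
    """Smaller means more similar; 0 means identical characteristic values."""
    return abs(row_mean_variance(gray_a) - row_mean_variance(gray_b))
```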
4.2. Experimental Results of the Proposed Method
According to the proposed method in this paper, the content image and target image were converted into the CIELAB color space. The threshold for the L channel of the two images was set. The thresholds for the target image were set to 86, 139, 207, and 228. According to the limit of threshold in this proposed method, the final thresholds of the target image were set at 86, 139, and 207. The total number of thresholds must be adjusted according to the target image. The thresholds for the content image were set to 46, 86, 167, and 206. It could be seen that the difference between the thresholds of the content image was less than . However, in order to reach the same threshold number of the target image, the proposed scheme would delete the threshold with the smallest difference. Given the threshold limit, the final thresholds of the content image were set to 46, 86, and 206. Then, the color transfer was performed in the same region of the two images. For example, the region between 0 and 86 in the target image underwent the color transfer with the region between 0 and 46 in the content image.
After the image color transfer, SSIM and the total number of black blocks were used to select the convolutional layers with valid features. For the first and second major layers, SSIM selected conv1_2 and conv2_2. For the third, fourth, and fifth major layers, however, the fused maps of the sublayers within a major layer differed little from one another, so SSIM could not identify the effective features, and judgments could be based only on black blocks. Figure 10 shows the black blocks, using conv3_1 as an example. Accordingly, conv3_4 and conv5_2 were selected according to the number of black blocks.
4.3. Image Analysis
Figure 11 shows the fused image of all the feature maps of each convolutional layer. Figure 11a,b shows the fused feature maps of the sublayers of the first major convolutional layer, Figure 11c,d those of the second, Figure 11e–h those of the third, Figure 11i–l those of the fourth, and Figure 11m–p those of the fifth.
In Figure 12, two images are used as an example of image color transfer and style transfer. Figure 12c shows the image color-transferred by the adaptive multilevel color transfer, and Figure 12d shows the style-transferred image obtained from Figure 12c. The style losses are computed at conv1_2, conv2_2, conv3_4, and conv5_3. With an impressionist painting as the target image, the results show that the structure and the light and dark areas of the content image are preserved while the colors of the target image are also retained.
In Figure 13, an abstract target image is used to transform the style of the image. The resulting image maintains the structure of the dog in the content image while incorporating the texture of the target image. In addition, Figure 14 shows the results of the proposed method on different types of images.
Figure 15 shows examples of results from the proposed method and from related works. Our result preserves the brightness of the content image, and the difference is most obvious in the lower left corner. The proposed method highlights the distinction between the lake and the building in the content image, giving a stronger sense of depth. The result of Reinhard et al.'s method, though visually soft, does not convey the context of the image. The result of Jimenez-Arredondo et al.'s method faintly shows the foreground and background of the image, but mottling appears in the lower left, so the visual effect is not smooth.
Figure 16 compares color similarity, computed from pixel histograms, between each color-transferred image in Figure 15c–e and the target image in Figure 15b. Although Jimenez-Arredondo et al.'s method achieves relatively high color similarity, its visual appearance is rather unharmonious.
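This section does not state which histogram comparison underlies Figure 16. As an illustration only, the sketch below assumes 8-bit RGB images, per-channel 256-bin histograms normalized to sum to 1, and histogram intersection averaged over the three channels, which gives 1.0 for identical color distributions and 0.0 for disjoint ones. The metric choice and function names are assumptions.

```python
import numpy as np


def channel_histograms(img, bins=256):
    """Per-channel normalized histograms of an (H, W, 3) uint8 image."""
    img = np.asarray(img)
    hists = []
    for ch in range(img.shape[2]):
        h, _ = np.histogram(img[..., ch], bins=bins, range=(0, 256))
        hists.append(h / h.sum())  # each channel's histogram sums to 1
    return np.stack(hists)


def histogram_similarity(img_a, img_b, bins=256):
    """Histogram intersection averaged over channels (assumed metric)."""
    ha = channel_histograms(img_a, bins)
    hb = channel_histograms(img_b, bins)
    return float(np.minimum(ha, hb).sum(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)
    result = np.clip(target.astype(int) + 10, 0, 255).astype(np.uint8)  # shifted colors
    print(histogram_similarity(result, target))
```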
Figure 17 compares the image similarity between the content image in Figure 15a and the color-transferred images in Figure 15c–e. As Figure 16 and Figure 17 show, the proposed method retains the structure of the content image more completely. Taking both the color and the structure analysis into account, our color-transferred image provides a better result.
In this paper, SSIM and black blocks were used, through feature visualization, to identify the effective features. In the neural network, some effective features capture texture and details, while others capture contours or shapes. The feature map obtained from each convolutional layer is the output of the ReLU activation function, so any non-positive filter response is set to 0. A black block is a feature map in which most values are zero after ReLU, meaning that the filter did not recognize its feature in the image. When fewer effective features are available, the result of image style transfer may be degraded.
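A minimal sketch of the black-block test follows, assuming a feature map counts as a black block when at least 90% of its post-ReLU values are zero (this threshold is a hypothetical choice, not a value from the paper). The helper `select_sublayer`, also hypothetical, picks the sublayer with the fewest black blocks, mirroring how conv3_4 and conv5_2 were chosen.

```python
import numpy as np


def relu(x):
    """ReLU: non-positive responses become exactly 0."""
    return np.maximum(x, 0.0)


def black_block_mask(activations, zero_ratio=0.9):
    """(C, H, W) post-ReLU maps -> bool per channel, True if the map is a black block."""
    c = activations.shape[0]
    zero_frac = (activations.reshape(c, -1) == 0).mean(axis=1)
    return zero_frac >= zero_ratio


def count_black_blocks(activations, zero_ratio=0.9):
    return int(black_block_mask(activations, zero_ratio).sum())


def select_sublayer(layer_acts, zero_ratio=0.9):
    """Name of the sublayer with the fewest black blocks (first one wins ties)."""
    return min(layer_acts, key=lambda name: count_black_blocks(layer_acts[name], zero_ratio))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in pre-activations; the negative shift makes many maps mostly zero after ReLU.
    sublayers = {
        "conv3_1": relu(rng.standard_normal((16, 8, 8)) - 1.5),
        "conv3_4": relu(rng.standard_normal((16, 8, 8))),
    }
    for name, acts in sublayers.items():
        print(name, count_black_blocks(acts))
    print(select_sublayer(sublayers))
```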
In Figure 18, the proposed image style transfer method shows better detail than the method of Gatys et al. For example, the windows on the building are more clearly visible, the structure of the content image is more consistent, and the color of the lake matches the target image.
Figure 19 compares the variance of each resulting image with that of the target image; the closer the two variances, the more similar the images. The data show that the image transfer result of this paper is closer to the target image. In addition, the images were scaled to two different sizes to compare their similarity at each scale.
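The variance comparison can be sketched as follows, assuming grayscale images stored as 2-D arrays and treating the two scales as integer block-average downscaling factors (1 and 2 here, both hypothetical, since the exact sizes are not given in this section). A smaller gap between the variances indicates a result closer to the target image.

```python
import numpy as np


def downscale(img, factor):
    """Block-average a 2-D array by an integer factor (edge rows/cols cropped)."""
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    cropped = img[: h2 * factor, : w2 * factor]
    return cropped.reshape(h2, factor, w2, factor).mean(axis=(1, 3))


def variance_gap(result, target):
    """Absolute difference between the two images' pixel variances."""
    return abs(float(np.var(result)) - float(np.var(target)))


def compare_at_scales(result, target, factors=(1, 2)):
    """Variance gap at each (hypothetical) downscaling factor."""
    return {f: variance_gap(downscale(result, f), downscale(target, f)) for f in factors}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    target = rng.random((64, 64))
    result = 0.8 * target + 0.1  # stand-in for a transferred image
    print(compare_at_scales(result, target))
```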
Figure 20 compares the results of our proposed method, Gatys et al.'s method, and X. Li et al.'s method. The content structure in the result of Gatys et al.'s method is not distinct, although the style of the target image is retained. Conversely, the result of X. Li et al.'s method carries little of the style, but the contour of the dog is clearly prominent. Our result balances the structure of the content image and the style of the target image: the structure of the dog remains distinguishable while the style of the target image is preserved.
Figure 21 gives another example comparing the result of our method with the results of Gatys et al.'s method and X. Li et al.'s method, in which the three methods are applied to a portrait image and a hand-drawn portrait image. Among the three results, the proposed method expresses the contour of the content image most accurately.