Article

Emulating Artistic Expressions in Robot Painting: A Stroke-Based Approach

1
Sino-German College of Intelligent Manufacturing, Shenzhen Technology University, 3002 Lantian Road, Shenzhen 518000, China
2
Visual Computing Center, Shenzhen University, 3688 Nanhai Avenue, Shenzhen 518000, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2024, 14(12), 5265; https://doi.org/10.3390/app14125265
Submission received: 18 April 2024 / Revised: 31 May 2024 / Accepted: 11 June 2024 / Published: 18 June 2024
(This article belongs to the Section Robotics and Automation)

Abstract

Creating art with a robotic system is one way artificial intelligence enters our lives, especially in the realm of emotional expression. Developing a painting robot involves enabling it to emulate human artistic processes, which often include imprecise techniques or errors akin to those made by human artists. This paper describes our development of a painting robot that uses a sim-to-real learning approach. Specifically, the pipeline operates under a deep reinforcement learning (DRL) framework designed to learn drawing strategies from training data derived from real-world settings, so that the robot can emulate human artistic expression. Given a target image, the framework comprises two modules: the first module is trained in a simulated environment to decompose the target image into individual strokes; the second module then learns how to execute these strokes in a real environment. Our experiments show that the system meets these objectives effectively.

1. Introduction

Digital technologies, such as image processing, artificial intelligence (AI), and robotics, have ever-growing applications in our lives, including digital media [1] and arts such as sculpting [2]. This paper concerns robotic painting, i.e., the production of a painting by a robot. One idea is to construct a robot (or agent, as it is called hereafter in this paper) that mimics human artists, in particular by mathematically modeling how artists operate their tools (e.g., brushes) on a canvas. Implementing this idea involves high costs, and such a robot has a limited level of intelligence. Another idea is robotic teaching, i.e., human artists teaching robots to paint, including sketching [3], doodling [4], and character writing [5]. However, most of the existing painting robots developed with this idea use traditional printing and reproduction techniques [6] and can only reproduce stylized lines, which does not sufficiently reflect what human artists actually do, e.g., their creativity and style. Indeed, real-world objects are imprecise [7].
There are only a few works on ink paintings, including Chinese brush paintings, which are difficult to capture through precise, stylized strokes. Indeed, a drawing such as a sketch conveys the artist’s perception of the subject’s meaning after a certain degree of abstraction, and it usually consists of relatively few strokes. Equipping painting robots with such skills is very difficult; even novice human painters find them hard to learn.
The pre-condition for such a robot or agent is that there is a target image. The agent, as designed, has two steps. Step 1: decompose the given target image into strokes. Step 2: draw the strokes on the canvas in a certain sequence. A previous study [8] used learning-based techniques to decompose texture-rich images into digital strokes rather than physical strokes. It is noted that digital strokes are strokes in a simulated environment, which do not capture the imprecise nature of strokes drawn by human painting artists.
To handle such a complex task, we use a reinforcement learning (RL) method. RL aims to generate an optimal policy based on interactions between an AI system and its environment. Compared with supervised learning methods, RL does not require a large amount of labeled training data. The main idea in our system for overcoming the aforementioned difficulty is to integrate a neural system and an RL agent. The RL agent is responsible for the task in Step 1, and the neural system is responsible for the task in Step 2. Further, the neural system is developed in a simulated environment, without involving human artists and thus without the labeling burden that challenges many other machine learning systems [9].
To make the painting agent mimic real painting situations, i.e., imprecise strokes, more realistically, the neural system uses ink-based strokes rather than the relatively clean strokes of sketching. Another idea in our system is that the training data include failure cases. These failure cases are treated as an initial action of the agent, and the agent gradually modifies the initial action into the final action. In summary, our system or agent has three capabilities: visually analyzing a target image on a canvas, decomposing the image into strokes, and drawing the strokes on the canvas. The purpose of our work is to learn drawing strategies from stroke data that capture this human imprecision and to use a sim-to-real approach to execute the strokes in a real environment.
The novel features of our system are summarized as follows. First, the drawing policy in RL adapts to the current state of the canvas. At times, the canvas may contain unanticipated strokes, such as ink drops. To accommodate such faults, we make the robot learn from failures. In this way, our system essentially reduces the gap between simulated and real environments. Second, a stroke-rendering-based method is used to generate realistic strokes. The renderer is trained in a supervised way on a synthetic dataset created by a physical robot; it maps stroke parameters to stroke images. Our renderer works with any stroke-shape design and is, in this sense, fully parametric. This renderer is a contributing factor to the higher performance of our system.
The remainder of our study is presented in four parts: a reinforcement learning (RL) framework for the problem, the methodology of the RL module, the training process, and the results of our experiments.

2. Related Work

The art of drawing involves image computing, the act of drawing, and the interaction between the two; the robot’s creativity can enter through this interaction while it paints. Stroke-based rendering creates artwork automatically by placing discrete elements, such as strokes or points.

2.1. Stroke-Based Painting

While stroke-based rendering techniques are valuable for professional applications, the final aesthetic outcome remains the greater concern [10]; traditional painting methods are therefore still favored when beautification is the goal. Examples of brush-based techniques include stylized digital rendering and watercolor painting, which are mainly achieved with a brush. Other examples include E-David [11], which creates paintings by controlling industrial robots that hold paintbrushes and are guided by visual sensors. The painting system Busker Robot [12] uses a UR10 robotic arm and processes digital images with non-photorealistic rendering (NPR) techniques to decompose them into strokes. These studies [11,12] have mainly focused on painting techniques and improvements in visual feedback.

2.2. Learning-Based Painting

Machine learning techniques have been successfully used in the fields of robot control [13] and painting rendering [14], making robot painting possible. For painting robots, image computing includes global image abstraction and local image structure. Reinforcement learning has achieved good results on many problems, including the game of Go with AlphaGo [15]. Several robotic painting systems have been developed. One study [16] used the Sigma-Lognormal model to reproduce graffiti-stylized drawing movements, enabling a Baxter robot to replicate the drawings with a thick felt pen. In addition, a stylized portrait generation method [17] used the offsets of a triangulation of the face as a metric to detect essential features, enabling a SCARA robot to replicate the drawing with a pencil. However, these works did not decompose the image into strokes, so they are limited to certain types of images with particular features, such as handwriting and portraits.
A further goal in this line of work is to improve the time and data efficiency of the learning algorithms. Some researchers [18] introduced a reinforcement learning-based approach to training painting agents. They use proximal policy optimization, evaluate the visual quality of the generated paintings, and learn their policy network without human supervision. Moreover, the AI-assisted art creation system proposed in [19] can automatically generate brushstroke effects in a specific art style. That system uses inverse reinforcement learning to model artistic-style behavior and includes a digital protection method and a regularization strategy learning method based on policy gradients, thus improving the stability of the style learning process.
These previous works have demonstrated the possibility of enabling an agent to generate images. However, few of these methods transfer directly to painting policies for a robot in the real world, because such a transfer must account for brush dynamics and visual feedback from physical systems. Style transfer [20] has shown great achievements. One approach utilized a line clustering and waypoint generation algorithm to extract the necessary information from a sketched image, effectively generating portrait sketches with both precision and speed [21]. However, they did not consider robotic painting, as they focused on converting images into digital art. In this paper, we aim to develop new RL-based techniques that efficiently convert images into brush strokes.

3. Problem Description

An overview of the proposed painting robot is shown in Figure 1. Our system includes an online generation stage and an offline implementation stage.
In the online generation stage, the agent converts the target image into a sequence of strokes and then paints the strokes on a canvas to generate a painting. The agent predicts the next stroke based on the current canvas. Figure 2 illustrates that the vision module creates a representation of the current canvas state using the renderer. The DRL exploration module then uses the current state and the target image to determine the next action, which is represented by a set of stroke parameters and incorporated into the canvas state in the exploration module. Training is performed entirely in the randomized simulated environment, so that the policy generalizes effectively to various images.
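As a rough illustration of this online loop, the minimal sketch below assumes hypothetical `actor`, `renderer`, and state-layout placeholders; it is not our exact interface.

```python
import numpy as np

def paint(target, actor, renderer, max_steps=10):
    """Minimal online generation loop: predict the next stroke from the
    current canvas and the target image, render it, and repeat."""
    canvas = np.ones_like(target)          # start from a blank (white) canvas
    for t in range(max_steps):
        state = (canvas, target, t)        # canvas, target image, step number
        action = actor(state)              # parameters of the next stroke
        canvas = renderer(canvas, action)  # draw the predicted stroke
    return canvas
```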
In the offline implementation stage, the goal is to train the robot to generate strokes in its own drawing style. During the painting process, the agent compares the feedback image with the original image and adjusts the painting strategy accordingly.

3.1. The Framework for the Painting Robot

The overall framework of our proposed algorithm is shown in Figure 2. This framework consists of a renderer, an actor, and a critic. The actor’s role is to determine the next action by considering both the target image and the canvas’s current state. Meanwhile, the critic evaluates these actions by scoring the decision plan in relation to the canvas’s present condition. The renderer is responsible for simulating the robot’s painting on the canvas and executing the decision-making scheme suggested by the actor.
In addition, the robot is responsible for painting on the canvas and transmits the painting information to the neural network. The renderer generates 2D images from the action parameters. The robot responds to the painting and continues through feedback and iterative cycles until the painting result is similar to the target image.

3.2. Reinforcement Learning

The exploration problem is modeled as a Markov decision process (MDP). An MDP can be represented by S, A, P, R, and γ, where S and A denote the states and actions, respectively. P is the transition model, which represents the probability of state transitions. Further, R represents the reward obtained when action a is taken in state s, and γ is the discount factor.
The policy gradient method is used to obtain the optimal policy. In this paper, we used a soft actor–critic (SAC) framework [22]. The SAC framework involves two models: the actor controls the agent’s behavior, while the critic evaluates the actor’s actions. The actor policy, π(s, a), maximizes the rewards, and the critic function, Q(s, a), minimizes the approximation error. The two models were trained in parallel. Here, s represents the input state vector and a the next action.
The components of the reinforcement learning setup are action space, state space, and reward design and are described as follows:
  • Action Space: An action, at, is a set of parameters that determine the shape and position of the stroke at step t. The agent’s behavior, which maps states to actions, is a policy function, π. Given the state, st, the agent generates the action for the next stroke, at, in the form of stroke parameters; the environment then moves to the next state according to the transition function st+1 = trans(st, at).
  • State Space: It consists of all possible information observed in the environment. The state of the painting system includes three parts, namely the target image, the canvas, and the number of steps.
  • Reward Design: We define a reward function that allows the agent to evaluate the actions determined by the policy. The reward is defined as the decrease in loss from one step to the next, i.e., the loss between the current canvas and the target image minus the loss between the next canvas and the target image:
r(st, at) = Lt − Lt+1
where r(st, at) is the reward at step t, Lt represents the loss between the target image and the current canvas, Ct, and Lt+1 is the loss between the target image and the canvas Ct+1. The Wasserstein generative adversarial network (WGAN) was employed to measure the difference between the generated data and the target image [23]. GANs have been used as loss functions in transfer learning and text modeling because of their strong ability to measure the distance between target data and generated data. The WGAN was developed from the original GAN by introducing the Wasserstein distance as the loss between the generator and the discriminator. Unlike the loss function of the original GAN, which cannot reliably reflect the similarity between two distributions, the Wasserstein distance provides a meaningful measure of that similarity; this improvement greatly reduces the difficulty of training a GAN. The WGAN minimizes the Wasserstein-1 distance to stabilize training. The objective of the discriminator in the WGAN is defined as follows:
Ldiscriminator(w) = maxw∈W Ey∼Y[fw(y)] − Ex∼X[fw(x)]
where E denotes the expectation, fw denotes the discriminator, and Y and X are the distributions of the painted images and the target images, respectively. Previous research employed GAN-based style transfer to reduce the number of strokes required for drawing sketches, resulting in shorter drawing times [24].
This study aimed to reduce the difference between the current canvas and the target image. To achieve this, the difference in discriminator scores was used as a reward to guide the training of the actor.
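As a minimal sketch of this reward (assuming PyTorch and a trained discriminator that scores canvas–target pairs; the concatenation layout and names are illustrative rather than our exact implementation):

```python
import torch

def stroke_reward(discriminator, canvas_t, canvas_t1, target):
    """Reward r(s_t, a_t) = L_t - L_{t+1}, computed as the increase in the
    discriminator score after the stroke is drawn."""
    with torch.no_grad():
        score_t = discriminator(torch.cat([canvas_t, target], dim=1))
        score_t1 = discriminator(torch.cat([canvas_t1, target], dim=1))
    # A higher score means the canvas is closer to the target distribution,
    # so the reward is positive when the new stroke improves the canvas.
    return (score_t1 - score_t).mean()
```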

4. Method

4.1. Vision Module

Our model was trained using a renderer. The RL framework issued the next action based on the current canvas and then sent the current state to the agent. The renderer is advantageous because it is highly scalable and can be executed far more conveniently than the real environment. First, the image was processed by the renderer. The stroke renderer generated a stroke image for each stroke and plotted the strokes onto the canvas, producing the resulting image. The neural renderer was trained as a convolutional neural network whose input was the parameters of a stroke and whose output was the rendered stroke image. The training samples were randomly generated using a graphics renderer program.
The neural renderer network has several fully connected and convolutional layers. Strokes are represented by curves or geometric primitives defined by several parameters; in general, these parameters include position and shape. The shape of each curve stroke is determined by the thickness of the stroke and the coordinates of its control points. Because Bézier curves can represent arbitrary smooth shapes with precision, the curve stroke was designed based on a quadratic Bézier curve. The curve is represented using eight parameters (Figure 3) as follows:
C = (x0, y0, x1, y1, x2, y2, t0, t1)
where (x0, y0), (x1, y1), and (x2, y2) are the control points of the Bézier curve, and t0 and t1 are the stroke thicknesses at the two endpoints of the curve, respectively.
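To make the parameterization concrete, the following sketch samples points and thicknesses along such a curve; the actual renderer is a learned network, so this only illustrates the geometry encoded by the eight parameters.

```python
import numpy as np

def sample_stroke(params, n=100):
    """Sample points and thicknesses along a quadratic Bezier stroke.
    params = (x0, y0, x1, y1, x2, y2, t0, t1), coordinates normalized to [0, 1]."""
    x0, y0, x1, y1, x2, y2, t0, t1 = params
    s = np.linspace(0.0, 1.0, n)
    # Quadratic Bezier: B(s) = (1 - s)^2 * P0 + 2 * (1 - s) * s * P1 + s^2 * P2
    x = (1 - s) ** 2 * x0 + 2 * (1 - s) * s * x1 + s ** 2 * x2
    y = (1 - s) ** 2 * y0 + 2 * (1 - s) * s * y1 + s ** 2 * y2
    w = (1 - s) * t0 + s * t1  # thickness interpolated between the two endpoints
    return x, y, w
```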

4.2. Exploration Module

The SAC framework was adopted for the exploration module. SAC [22] is an off-policy reinforcement learning algorithm that optimizes a stochastic policy over continuous actions. The SAC model has three neural networks: a state value network, a critic network, and an actor network. The critic network estimates the Q-value of the actions produced by the actor network. The actor network generates the policy and is also called the policy network. The state value network estimates the value of the next state.
Previous work has proposed methods for training the robotic sketching agent from scratch without hand-crafted features, drawing sequences or trajectories, and using inverse kinematics [25]. We designed the network below for our approach. As shown in Figure 4, the input information to the state value network is the current state and the target image. The model has several convolutional layers, each of which is followed by a ReLU activation function. The network outputs the next drawing action.
The key feature of the SAC framework is entropy regularization. Entropy is a quantity that describes the randomness of a variable. The policy was trained to maximize a trade-off between the expected return and the entropy. In entropy-regularized RL, the robot obtains a reward proportional to the entropy of the policy at each time step. We used a soft Q-function, Qθ, a state value function, Vψ(st), and actions sampled from the actor network. The output of the critic network is the estimated Q-value of at. The critic network was optimized to better evaluate the Q-value by minimizing the error between the current Q-value, Q(st, at), and the target Q-value, rt + γV(st+1); note that the target Q-value uses the state value obtained from the state value network. Thus, the critic objective, JQ(θ), was minimized as the soft Bellman residual over st, at, and st+1, and the value objective, JV(ψ), was minimized as a squared residual error, using a sample-based approximation of the connection between Qπ and Vπ.
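Following the standard SAC formulation [22] (with α denoting the entropy temperature and D the replay buffer), these two objectives can be written as:

JQ(θ) = E(st, at)∼D [ ½ (Qθ(st, at) − (rt + γVψ(st+1)))² ]

JV(ψ) = Est∼D [ ½ (Vψ(st) − Eat∼πφ [Qθ(st, at) − α log πφ(at|st)])² ]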
The input to the actor network, πφ(st), was the current observation. The actor network generated a mean and covariance for each action dimension and then sampled actions from the resulting Gaussian distribution; thus, the output of the actor network is both a policy and sampled actions. The parameters, φ, of the actor network were trained via the objective Jπ(φ), which maximizes the expected accumulated reward together with the expected entropy of the policy. The output of the critic drives learning in all three networks: the critic network sends the Q-value to the actor network and to the state value network, whose parameters are softly updated.
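A minimal sketch of such a Gaussian actor head is shown below (assuming PyTorch and a feature vector already extracted from the canvas and the target image; the names and layer sizes are illustrative, not our exact architecture):

```python
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    """Outputs a mean and log-std per action dimension, then samples a
    tanh-squashed action, as in standard SAC."""
    def __init__(self, feat_dim, action_dim=8, hidden=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, features):
        h = self.body(features)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        raw = dist.rsample()              # reparameterized sample
        action = torch.tanh(raw)          # squash stroke parameters into [-1, 1]
        # Log-probability with the tanh change-of-variables correction
        log_prob = dist.log_prob(raw) - torch.log(1 - action.pow(2) + 1e-6)
        return action, log_prob.sum(dim=-1)
```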

4.3. Training

The reward was calculated when the agent completed an episode; the agent obtained a higher reward when the generated canvas more closely matched the target image. Thus, the weights of the network were tuned to generate actions that maximized the reward using the minimum number of steps.
The SAC model is an off-policy algorithm. During training, the objective function, JQ(θ), was optimized to obtain the parameters, θi, of two Q-functions. The Q-functions were then used to obtain the policy gradient and the value gradient, respectively. In the algorithm, all tuples of action, state transition, and reward experience were saved in the replay buffer, which provided training data in minibatches.
The buffer stored the experienced tuple (s, a, r, s′) at each time step and thereby accumulated a collection of episodic trajectories. The policy was conditioned on the entire history. Using off-policy data from the replay buffer is feasible because the networks can read the stored training data and compute the updated parameter gradients for policy iteration entirely from off-policy data.
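A generic replay buffer of this kind might look as follows (a sketch only, not our exact implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (s, a, r, s') transitions and samples uniform minibatches
    for off-policy updates."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=64):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states = zip(*batch)
        return states, actions, rewards, next_states

    def __len__(self):
        return len(self.buffer)
```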

5. Experiments

Three sets of experiments were conducted to test the feasibility of the proposed methods for the painting robot: (1) a test of the training and performance of our method on diversified drawings, (2) a comparison of our method with other methods, and (3) physical experiments in which our approach was implemented on a real robotic system to complete drawings.

5.1. Experiment Setup

Data collection: A desktop robot, Dobot, procured from Shenzhen Yuejiang Technology Co., Ltd. in Shenzhen, China, was used to collect its brush strokes (Figure 5). Traditional Asian calligraphy paper was placed on the table, and the robot drew strokes given trajectories specified by eight parameters. The robot drew about 500 different strokes on the paper. We then scanned each piece of paper and saved each stroke as an image. Using the same parameters, a condition stroke was generated and fed, together with the ground-truth stroke, to the GAN to produce the brush stroke renderer. We used the Dobot Magician Lite robotic arm for renderer generation and the painting tasks. It should be noted that this robot arm has five links and four joints. The models were trained on an NVIDIA GeForce GTX Titan X GPU.
Training efficiency: The SAC method was compared with DQN and DDPG to test the training efficiency and the benefits of the reward function. The second drawing in Figure 6 was used as an example. Figure 7 displays the training process and shows that the SAC method outperforms the other baselines in training performance. The SAC network obtains policies with a reward of approximately 1, while DDPG reaches a reward of only about 0.2, because its overestimation of the Q-value cannot be resolved simply through more exploration.

5.2. Testing in the Simulated World

Overall, the actor model was set to draw the image with 10 strokes. Figure 6 depicts the simulated robot painting on the current canvas state. Between Step 1 and Step 10, the model acquires parameters that guide the actor’s actions based on input from both the canvas and the target image. By Step 10, the drawing closely resembles the target image, which demonstrates that the robot possesses a degree of proficiency in learning to create artwork in an ink painting style.

5.3. Testing in the Real World

The experiments were performed using the Dobot Magician Lite robotic arm. To validate the performance of our model in different real-world settings, the trained model was used to paint four different images: an apple, a portrait, an orchid, and a Chinese character. The video frames in Figure 8 show the Dobot robot painting these pictures with brushes in the real world; the left image in Figure 8 illustrates the target image. The link to the video is https://youtu.be/-nl3rAQmM5k (accessed on 15 March 2024). The experiment demonstrates the effectiveness of the end-to-end painting robot and its generalization from the simulated world to the real world. The video also shows the improvement in drawing quality; taking similarity to the target and time to completion as the main indicators, the method is qualitatively more effective than the other methods.

6. Conclusions and Future Work

In this study, a painting agent was introduced that converts a target image into strokes and paints them on a canvas in a scheduled sequence to form a painting. The agent’s learning process was based on a DRL architecture, which enabled the agent to generate a sequence of strokes that maximizes the cumulative reward. The neural renderer built from the robot’s own strokes also contributed to the better performance of the DRL algorithm. The learned agent could predict several strokes to generate a painting. Because the strokes were collected by a physical robot, transferring the drawing policy from the simulated world to the real world becomes straightforward. The proposed method outperforms the baseline methods in our comparison and achieved good performance in sim-to-real stroke painting. The experimental results demonstrate the effectiveness of the proposed system, which converts images into brush strokes.

Author Contributions

Z.W. (Zihe Wang) and L.L. contributed equally. Conceptualization, T.Z.; methodology, Z.W. (Zihe Wang) and L.L.; software, M.L.; validation, Z.L.; investigation, L.L., Z.W. (Zihe Wang), T.L. and Z.W. (Zifan Wang); data curation, L.L., Z.W. (Zihe Wang), Z.W. (Zifan Wang) and T.L.; writing—original draft preparation, Z.W. (Zihe Wang); writing—review and editing, L.L., Z.W. (Zihe Wang), Z.W. (Zifan Wang) and T.Z.; supervision, T.Z.; project administration, T.Z.; funding acquisition, T.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shenzhen Foundation for International Exchange and Cooperation (grant number: GJHZ20220913143204009) and the Industry–University–Research Fund (grant number: 2021JQR003; 2022010802011).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wands, B. Art of the digital age. In Art of The Digital Age; Thames and Hudson: London, UK, 2007; p. 52. [Google Scholar]
  2. Zhang, A.T.; Backstrom, K.; Prince, R.; Liu, C.; Qian, Z.; Zhang, D.; Zhang, W.J. Robotic dynamic sculpture. IEEE Robot. Autom. Mag. 2014, 21, 96–104. [Google Scholar] [CrossRef]
  3. Song, J.; Pang, K.; Song, Y.-Z.; Xiang, T.; Hospedales, T.M. Learning to Sketch with Shortcut Cycle Consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 801–810. [Google Scholar]
  4. Zhou, T.; Fang, C.; Wang, Z.; Yang, Z.; Kim, B.; Chen, Z.; Brandt, J.; Terzopoulos, D. Learning to doodle with deep q networks and demonstrated strokes. arXiv 2018, arXiv:1810.05977. [Google Scholar]
  5. Zheng, N.; Jiang, Y.; Huang, D. StrokeNet: A Neural Painting Environment. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  6. Scalera, L.; Seriani, S.; Gasparetto, A.; Gallina, P. Busker robot: A robotic painting system for rendering images into watercolour artworks. In Proceedings of the Mechanism Design for Robotics: 4th IFToMM Symposium on Mechanism Design for Robotics, Udine, Italy, 11–13 August 2018; pp. 1–8. [Google Scholar]
  7. Cai, M.; Lin, Y.; Han, B.; Liu, C.; Zhang, W. On a Simple and Efficient Approach to Probability Distribution Function Aggregation. IEEE Trans. Syst. Man Cybern. Syst. 2017, 47, 2444–2453. [Google Scholar] [CrossRef]
  8. Huang, Z.; Heng, W.; Zhou, S. Learning to paint with model-based deep reinforcement learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8709–8718. [Google Scholar]
  9. Zhang, W.J.; Yang, G.; Lin, Y.; Gupta, M.M.; Ji, C. On Definition of Deep Learning, WAC; Skamania Lodge: Stevenson, WA, USA, 2018. [Google Scholar]
  10. Kalogerakis, E.; Nowrouzezahrai, D.; Breslav, S.; Hertzmann, A. Learning hatching for pen-and-ink illustration of surfaces. ACM Trans. Graph. (TOG) 2012, 31, 1–17. [Google Scholar] [CrossRef]
  11. Gülzow, J.M.; Paetzold, P.; Deussen, O. Recent developments regarding painting robots for research in automatic painting, artificial creativity, and machine learning. Appl. Sci. 2020, 10, 3396. [Google Scholar] [CrossRef]
  12. Scalera, L.; Seriani, S.; Gasparetto, A.; Gallina, P. Non-photorealistic rendering techniques for artistic robotic painting. Robotics 2019, 8, 10. [Google Scholar] [CrossRef]
  13. Levine, S.; Pastor, P.; Krizhevsky, A.; Ibarz, J.; Quillen, D. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. Int. J. Robot. Res. 2018, 37, 421–436. [Google Scholar] [CrossRef]
  14. Xie, N.; Hachiya, H.; Sugiyama, M. Artist agent: A reinforcement learning approach to automatic stroke generation in oriental ink painting. IEICE Trans. Inf. Syst. 2013, 96, 1134–1144. [Google Scholar] [CrossRef]
  15. Chen, J.X. The evolution of computing: AlphaGo. Comput. Sci. Eng. 2016, 18, 4–7. [Google Scholar] [CrossRef]
  16. Berio, D.; Calinon, S.; Leymarie, F.F. Learning dynamic graffiti strokes with a compliant robot. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; pp. 3981–3986. [Google Scholar]
  17. Dong, X.; Li, W.; Xin, N.; Zhang, L.; Lu, Y. Stylized Portrait Generation and Intelligent Drawing of Portrait Rendering Robot. DEStech Trans. Eng. Technol. Res. 2018. [Google Scholar] [CrossRef] [PubMed]
  18. Jia, B.; Fang, C.; Brandt, J.; Kim, B.; Manocha, D. Paintbot: A reinforcement learning approach for natural media painting. arXiv 2019, arXiv:1904.02201. [Google Scholar]
  19. Xie, N.; Zhao, T.; Tian, F.; Zhang, X.H.; Sugiyama, M. Stroke-based stylization learning and rendering with inverse reinforcement learning. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015. [Google Scholar]
  20. Liu, S.; Lin, T.; He, D.; Li, F.; Deng, R.; Li, X.; Ding, E.; Wang, H. Paint transformer: Feed forward neural painting with stroke prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 6598–6607. [Google Scholar]
  21. Nasrat, S.; Kang, T.; Park, J.; Kim, J.; Yi, S.-J. Artistic Robotic Arm: Drawing Portraits on Physical Canvas under 80 Seconds. Sensors 2023, 23, 5589. [Google Scholar] [CrossRef] [PubMed]
  22. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870. [Google Scholar]
  23. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. arXiv 2017, arXiv:1701.07875. [Google Scholar]
  24. Gao, F.; Zhu, J.; Yu, Z.; Li, P.; Wang, T. Making robots draw a vivid portrait in two minutes. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 9585–9591. [Google Scholar]
  25. Lee, G.; Kim, M.; Lee, M.; Zhang, B.-T. From Scratch to Sketch: Deep Decoupled Hierarchical Reinforcement Learning for Robotic Sketching Agent. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 5553–5559. [Google Scholar]
Figure 1. An overview of our painting robot system (Two stages and two modules).
Figure 2. The robotic agent network with style learning ability (several iterative cycles).
Figure 3. Parameters of curve stroke (left: couple of point coordinates on Bezier curve).
Figure 4. Network architecture.
Figure 5. Renderer collection: (a) the experimental setup and (b) a sample of data collected by the robot in step (a).
Figure 6. The painting results using the Dobot stroke data.
Figure 7. The training performance (reward convergence for each algorithm).
Figure 8. Video frames showing the painting processes of four images using a Dobot robot (an apple, a portrait, an orchid, and a Chinese character).