4.3.1. Qualitative Comparison
The comparative experimental results for the regular script, clerical script, and running script are shown in Figure 8, Figure 9, and Figure 10, respectively.
- (1)
Yan Zhenqing’s regular script
The generation results for Yan Zhenqing’s regular script are shown in Figure 8. In the comparative experiment, the calligraphy characters generated using AGGAN suffer from overly thin strokes, missing strokes, and incomplete character structures. Those generated using StarGAN-v2 show low stylistic similarity to the authentic ones, with significant deviations in detail and style. pix2pix produces characters with missing and distorted strokes, and CycleGAN produces characters with redundant strokes. In contrast, the calligraphy characters generated using the proposed model have clear strokes and complete structures, and are closer to the authentic ones in style and detail, including the rigorous structure and varied stroke details.
Figure 8.
Comparison of regular script generation (the Chinese characters in the rows, from top to bottom, are Shang, Bu, Wu, Yun, Yi, Ying, Biao, and Gui; the red squares and circles mark details for comparison). (a) Printed characters. (b) AGGAN. (c) StarGAN-v2. (d) pix2pix. (e) CycleGAN. (f) Ours. (g) Authentic ones.
- (2)
Deng Shiru’s clerical script
Clerical script, as a unique style of Chinese characters, is characterized by a slightly flattened writing effect and a structure in which horizontal strokes are long and vertical strokes are short. Generating clerical script is challenging due to the differences between clerical script and modern character forms [43].
The generation results for clerical script are shown in Figure 9. In the comparative experiment, the calligraphy characters generated using AGGAN suffer from noticeably thin strokes, lacking the solidity characteristic of clerical script. Those generated using StarGAN-v2 show low stylistic similarity to the authentic ones. pix2pix produces characters with missing and distorted strokes, and CycleGAN produces characters with redundant strokes. In contrast, the calligraphy characters generated using the proposed model have clear strokes and complete structures; they exhibit the slightly flattened shape, long horizontal strokes, and short vertical strokes typical of clerical script, and are closer in style and detail to the authentic characters.
Figure 9.
Comparison of clerical script generation (the Chinese characters in the rows, from top to bottom, are You, Xin, Jian, Mu, Jiao, Yin, Shi, and Niao; the red squares mark details for comparison). (a) Printed characters. (b) AGGAN. (c) StarGAN-v2. (d) pix2pix. (e) CycleGAN. (f) Ours. (g) Authentic ones.
- (3)
Wang Xizhi’s running script
Wang Xizhi’s “Orchid Pavilion Preface” is celebrated as the most important running script work of ancient China, and its unique personal style adds artistic charm to each calligraphy character. Generating these calligraphy characters is challenging [47].
The results of running script generation are shown in Figure 10. In the comparative experiment, the calligraphy characters generated using AGGAN have missing strokes. Those generated using StarGAN-v2 show lower stylistic similarity to the authentic ones and contain some blurred strokes. pix2pix produces characters with missing and distorted strokes, and CycleGAN also produces characters with missing strokes. In comparison, the proposed model achieves the highest generation quality on running script: the characters it generates are closer to the authentic ones in style and detail, including their fluid strokes, natural rhythm, and distinctive personal style.
Figure 10.
Comparison of running script generation (the Chinese characters in the rows, from top to bottom, are Zhi, Chu, Yu, Kuai, Ji, Shan, Lan, and Ting; the red squares mark details for comparison). (a) Printed characters. (b) AGGAN. (c) StarGAN-v2. (d) pix2pix. (e) CycleGAN. (f) Ours. (g) Authentic ones.
Table 2 summarizes the qualitative analysis of the proposed and comparative models. The proposed model shows clear advantages in generating regular script, clerical script, and running script. First, introducing dense blocks into the generator improves the extraction of stroke features and effectively reduces broken strokes; the dense blocks capture richer detail information, ensuring that the generated characters are structurally more complete. In addition, self-attention mechanisms in the generator further enhance the model’s perception of calligraphy strokes, allowing it to focus on key strokes and thereby reducing redundant ones. Second, the proposed model employs a CapsNet in the discriminator, which effectively extracts the positional information of strokes and reduces stroke distortion; CapsNet’s strength in handling spatial information yields a better understanding of character structure. Finally, the proposed model introduces a perceptual loss function, which improves the model’s ability to recognize calligraphy style and makes the generated characters stylistically closer to the authentic ones; through the perceptual loss, the model better captures the unique charm and artistic characteristics of calligraphy characters.
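To illustrate the idea behind a perceptual loss, the following toy sketch compares images in a feature space rather than pixel space. Here simple image gradients stand in for the feature maps of a pretrained network (which an actual implementation would use); the function names and feature choice are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

def toy_features(img):
    """Stand-in 'feature extractor': horizontal and vertical gradients.
    A real perceptual loss would use feature maps from a pretrained
    network; gradients are used here only for illustration."""
    gx = img[:, 1:] - img[:, :-1]   # responds to vertical stroke edges
    gy = img[1:, :] - img[:-1, :]   # responds to horizontal stroke edges
    return gx, gy

def perceptual_loss(generated, authentic):
    """Mean squared distance between feature maps, not raw pixels."""
    fg = toy_features(generated.astype(np.float64))
    fa = toy_features(authentic.astype(np.float64))
    return sum(np.mean((g - a) ** 2) for g, a in zip(fg, fa))
```

Note the qualitative difference from a pixel loss: a constant brightness shift leaves the gradient features, and hence this toy perceptual loss, unchanged, even though the pixel-wise MSE would be nonzero.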
In summary, the proposed model significantly improves the quality of generated calligraphy characters through its modular design and optimization strategies. It not only effectively mitigates stroke discontinuity, redundant strokes, and distortion, but also captures the style of authentic calligraphy more accurately, imbuing the generated characters with the essence of the originals. However, because the dataset of authentic calligraphy characters used for training is limited in size, the generated characters still differ slightly from the corresponding authentic ones.
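The self-attention mechanism mentioned above can be sketched, in a minimal form, as scaled dot-product attention over a flattened feature map. The shapes, weights, and function names below are illustrative assumptions rather than the model’s exact configuration.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over n positions with d-dim features.
    x: (n, d) flattened feature map; wq, wk, wv: (d, d) projections."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / np.sqrt(k.shape[1])      # (n, n) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # each row sums to 1
    return attn @ v, attn                         # weighted mix over all positions
```

Because each output position is a weighted mixture over all positions, the generator can relate distant strokes of the same character, which is what helps it suppress redundant strokes.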
4.3.2. Quantitative Comparison
From a quantitative perspective, we analyze the generation results of three types of calligraphy fonts: regular script, clerical script, and running script. To objectively evaluate the quality of the generated results, we use three quantitative evaluation metrics: Structural Similarity Index (SSIM), Mean Square Error (MSE), and Peak Signal-to-Noise Ratio (PSNR).
SSIM is a widely used metric for measuring image quality [33]; a higher SSIM indicates that the generated calligraphy characters are structurally closer to the authentic ones. MSE measures similarity at the pixel level [34]; a lower MSE means the generated characters are closer to the authentic ones pixel by pixel. PSNR is another important measure of image quality [35]; a higher PSNR indicates that the generated characters are visually closer to the authentic ones.
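The three metrics can be computed in a few lines of NumPy. The SSIM below is a simplified single-window variant for illustration only; practical evaluations typically use the sliding-window formulation, e.g., as implemented in scikit-image.

```python
import numpy as np

def mse(a, b):
    """Mean Square Error between two grayscale images in [0, 255]."""
    return np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)

def psnr(a, b, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB; higher means closer images."""
    err = mse(a, b)
    return float("inf") if err == 0 else 10.0 * np.log10(max_val ** 2 / err)

def ssim_global(a, b, max_val=255.0):
    """Simplified single-window SSIM (no sliding window), for illustration."""
    a, b = a.astype(np.float64), b.astype(np.float64)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2))
```

For identical images, MSE is 0, PSNR is infinite, and SSIM is 1; as generated characters drift from the authentic ones, MSE rises while PSNR and SSIM fall.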
Table 3, Table 4, and Table 5, corresponding to Figure 8, Figure 9, and Figure 10, respectively, show the quantitative metrics of the models on regular script, clerical script, and running script. pix2pix generates characters with missing and distorted strokes that differ significantly from the authentic ones, so its three metrics are the worst of all models. The characters generated using AGGAN also differ from the authentic ones, with thin and occasionally missing strokes, but they are better than those of pix2pix; accordingly, AGGAN’s metrics surpass pix2pix’s while remaining worse than those of the other models. CycleGAN produces characters without missing strokes but with redundant ones, so its metrics exceed those of pix2pix and AGGAN but fall short of the proposed model’s. StarGAN-v2 generates characters without missing or redundant strokes, yet its overall calligraphy style differs significantly from the authentic ones; thus its three metrics are better than those of pix2pix, AGGAN, and CycleGAN, but worse than those of the proposed model.
Targeting the characteristics of calligraphy characters, the proposed model is designed with self-attention, dense blocks, a CapsNet, and a perceptual loss. As a result, the calligraphy characters it generates are superior to those of pix2pix, AGGAN, CycleGAN, and StarGAN-v2 in terms of the SSIM, MSE, and PSNR metrics.
Finally, we discuss the computational cost of the proposed model. The weight parameters of the generator occupy 160 MB and those of the discriminator 20 MB. A single training run takes about 12 h, and a single test takes about 10 s.