*Article* **The Design of a 2D Graphics Accelerator for Embedded Systems**

**Hyun Woo Oh, Ji Kwang Kim, Gwan Beom Hwang and Seung Eun Lee \***

Department of Electronic Engineering, Seoul National University of Science and Technology, Seoul 01811, Korea; ohhyunwoo@seoultech.ac.kr (H.W.O.); jikwang.kim@seoultech.ac.kr (J.K.K.); hwanggwanbeom@seoultech.ac.kr (G.B.H.)

**\*** Correspondence: seung.lee@seoultech.ac.kr; Tel.: +82-2-970-9021

**Abstract:** Recently, advances in technology have enabled embedded systems to be adopted in a variety of applications. Some of these applications require real-time 2D graphics processing under restrictive design specifications such as low power consumption and a small area. Including a dedicated 2D graphics accelerator in the embedded system is an effective way to satisfy such conditions. The accelerator reduces the workload of the embedded processor and helps the system perform 2D graphics processing in real time. Therefore, a variety of applications that require 2D graphics processing can be implemented with an embedded processor. In this paper, we present a 2D graphics accelerator for tiny embedded systems. The accelerator includes an optimized line-drawing operation based on Bresenham's algorithm. The optimized operation enables the accelerator to handle various kinds of 2D graphics processing and to perform line-drawing instead of the system processor. Moreover, the accelerator further offloads the processor core by removing the need for the core to access the frame buffer memory. We measure the performance of the accelerator by implementing the processor, including the accelerator, on a field-programmable gate array (FPGA), and we ascertain the feasibility of its realization by synthesizing it with a 180 nm CMOS process.

**Keywords:** 2D graphics accelerator; embedded system; line-drawing; Bresenham's algorithm; alpha-blending; anti-aliasing

## **1. Introduction**

Recently, as advances in computer and semiconductor process technology have led to processors with high performance and high integration density, the overall performance of embedded systems, such as computing performance and energy efficiency, has improved [1,2]. With this progress, the demand for adopting embedded systems in a variety of applications is also increasing [3–9]. Some of these applications, such as user-centric applications, require communication with users through 2D graphics [10]. An embedded system used in such applications therefore needs functions to process graphics data and write the data to a display device. To perform these functions, an embedded system that includes a general-purpose processor (GPP) generally uses the GPP or an additional graphics processing unit (GPU) with a graphics library [3]. However, performing graphics processing in real time with these methods requires a high-performance GPP or GPU, because a large number of instructions must be executed in a limited time. For this reason, these methods are not appropriate for applications with restrictive design specifications such as low power consumption or a small area [10–12].

To solve these issues, 2D graphics accelerators, which implement 2D graphics processing in hardware, have been proposed for embedded systems [13,14]. These accelerators are connected to the processor of the embedded system through various interfaces, such as PCI Express and the memory bus. Unlike the core of a GPP, which needs a long execution time because each instruction performs only a simple operation, a hardware accelerator can perform complex operations relatively fast [15–19]. Moreover, accelerators occupy a relatively small area because their execution logic is limited and optimized [20–24]. Therefore, including and exploiting a 2D graphics accelerator allows a variety of applications that require 2D graphics operations to be implemented with low power and a small size. As adopting an architecture that contains a dedicated accelerator is an efficient way to satisfy the design specifications of embedded systems, accelerators for image processing have been studied [25].

**Citation:** Oh, H.W.; Kim, J.K.; Hwang, G.B.; Lee, S.E. The Design of a 2D Graphics Accelerator for Embedded Systems. *Electronics* **2021**, *10*, 469. https://doi.org/10.3390/electronics10040469

Academic Editor: Jorge Portilla

Received: 21 December 2020; Accepted: 10 February 2021; Published: 15 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Line-drawing is one of the methods for visualizing graphics. As any image can be represented as a collection of lines, line-drawing is a basic means of drawing an image [26,27]. Accordingly, the line-drawing operation can handle various kinds of graphics processing [28,29]. Although this approach is not the most efficient in all situations, it is highly efficient when the data to be displayed are in the form of points and lines. From this point of view, some studies have used line-drawing for image processing [27]. Nevertheless, little research has used line-drawing as the core algorithm of a graphics accelerator. Our research is motivated by the idea of applying line-drawing to a graphics accelerator.

In this paper, we present a 2D graphics accelerator for embedded systems. The accelerator performs 2D graphics processing with a line-drawing operation based on Bresenham's algorithm, and it also provides anti-aliasing and alpha-blending features. The accelerator is directly connected to the memory bus to communicate with the core of the processor in the embedded system, so it can be controlled by reading from or writing to certain memory addresses. Moreover, the accelerator is directly connected to the frame buffer, the memory that holds the 2D graphics data sent to the display device. This architecture reduces the processor's workload by relieving it of frame buffer accesses. We analyzed the performance of the accelerator by simulating and implementing the processor, including the 2D graphics accelerator, on a field-programmable gate array (FPGA). In addition, we ascertained the feasibility of the accelerator by synthesizing it with Synopsys Design Compiler using a 180 nm CMOS process.

The rest of this paper is organized as follows. Section 2 describes the preliminaries essential to implementing the features of the accelerator: Bresenham's algorithm, alpha-blending, and anti-aliasing. Section 3 explains the architecture of the 2D graphics accelerator and the reasons for adopting it. Section 4 describes the hardware implementation results, the analysis of the accelerator with a sample application running on the implemented hardware, and the synthesis results from Synopsys Design Compiler. Section 5 summarizes our work and presents future work.

## **2. Preliminaries**

A line-drawing algorithm is an essential element of the presented 2D graphics accelerator. As line-drawing algorithms differ in how well they suit a given hardware architecture and resource budget, choosing an appropriate algorithm is important. We chose Bresenham's algorithm and optimized it for the hardware accelerator [30]. Moreover, to provide advanced visualization, additional features such as alpha-blending and anti-aliasing need to be supported.

#### *2.1. Bresenham's Line Algorithm*

Bresenham's line algorithm is a line-drawing algorithm typically used in raster graphics systems [31,32]. It calculates the positions of the pixels that form a line. As it uses only integer arithmetic, it has low complexity and a high calculation speed [33]. In raster graphics, a line is drawn by painting the pixels between its start point and end point. Figure 1 shows the types of lines drawn by Bresenham's algorithm. For the two lines in Figure 1a, the x coordinate of the painted pixel always increments by one while the line is drawn; for the two lines in Figure 1b, the y coordinate always increments. The type of a line depends on its slope *m*, the change in the y coordinate, *dy*, divided by the change in the x coordinate, *dx*. A line with slope *m* that passes through the start point (*x<sub>1</sub>*, *y<sub>1</sub>*) is expressed by Equation (1).

$$m = \frac{dy}{dx}, \qquad y = m(x - x_1) + y_1 \tag{1}$$

*Electronics* **2021**, *10*, 469 3 of 13

**Figure 1.** Various lines by Bresenham's algorithm. (**a**) Lines when dx > dy; (**b**) Lines when dx < dy.

Figure 2 presents the fundamentals of the algorithm for each type of line. The algorithm proceeds by selecting the next point to paint based on the current point (*x<sub>i</sub>*, *y<sub>i</sub>*). Figure 2a shows the case in which the x coordinate of the points always increments while the line is drawn. In this case, the algorithm must decide whether the y coordinate of the next point changes. To do so, it calculates the real value *y* of the line at *x<sub>i</sub>* + 1 and checks whether *y* is closer to *y<sub>i</sub>* or to *y<sub>i</sub>* + 1; the y coordinate is changed only when *y* is closer to *y<sub>i</sub>* + 1. The algorithm repeats these operations until the current point reaches the end point. When the y coordinate always increments, the algorithm proceeds with similar operations, as shown in Figure 2b.
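To make the decision in Figure 2a concrete, the following Python sketch is a floating-point reference model, not the accelerator's implementation. It evaluates Equation (1) at every x and keeps *y<sub>i</sub>* or moves to *y<sub>i</sub>* + 1, whichever is closer to the real value of *y*. It assumes the Figure 2a case (x1 ≤ x2 and 0 ≤ dy ≤ dx); the function name is illustrative. Note that it needs the division dy/dx and real-valued arithmetic, which is what a hardware-oriented version avoids.

```python
def line_from_equation(x1, y1, x2, y2):
    """Rasterize a line by evaluating Equation (1) at every x (reference model).

    Assumes x1 <= x2 and 0 <= y2 - y1 <= x2 - x1 (the Figure 2a case),
    so x always increments by one and y increments by at most one per step.
    """
    dx, dy = x2 - x1, y2 - y1
    m = dy / dx if dx else 0.0       # slope from Equation (1); needs a division
    points = [(x1, y1)]
    y_i = y1
    for x in range(x1 + 1, x2 + 1):
        y = m * (x - x1) + y1        # real-valued y on the ideal line
        if y - y_i > 0.5:            # closer to y_i + 1 than to y_i
            y_i += 1
        points.append((x, y_i))
    return points


print(line_from_equation(0, 0, 5, 2))
# [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
```

When the ideal line passes exactly halfway between two rows, this sketch keeps the current row; that tie-breaking rule is a choice made here, not taken from the paper.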

**Figure 2.** Bresenham's line algorithm according to the line type. (**a**) Lines when dx > dy; (**b**) Lines when dx < dy.

Although the algorithm can be implemented in hardware as it is, optimizing it for hardware reduces resource usage. Accordingly, we transformed the pseudo-code of the algorithm for hardware implementation; Algorithm 1 shows the transformed process. The transformation entirely eliminates binary division, which is costly in hardware. This optimization allows the hardware implementation of the algorithm to meet the design specifications of embedded systems, such as low power consumption and a small area.
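Algorithm 1 is the authors' own transformed pseudo-code. As a hedged illustration of the same division-free idea, the Python sketch below shows the textbook integer form of Bresenham's algorithm instead. Multiplying the closeness test of Figure 2 by 2·dx turns the comparison against one half into an integer error term, updated only with additions, subtractions, and doublings (a shift in hardware). The function name and the tie-breaking rule (stay on the current row when the ideal line passes exactly through the midpoint) are choices made for this sketch.

```python
def bresenham_line(x1, y1, x2, y2):
    """Integer-only Bresenham line rasterization for all slopes and directions.

    Uses only addition, subtraction, comparison and doubling; no division.
    """
    dx, dy = abs(x2 - x1), abs(y2 - y1)
    sx = 1 if x2 >= x1 else -1           # step direction along x
    sy = 1 if y2 >= y1 else -1           # step direction along y
    steep = dy > dx                      # Figure 2b case: y is the driving axis
    if steep:
        dx, dy = dy, dx                  # treat the driving axis as "x" below
    err = 2 * dy - dx                    # decision value for the first step
    x, y = x1, y1
    points = [(x, y)]
    for _ in range(dx):
        if err > 0:                      # ideal line is closer to the next minor-axis value
            if steep:
                x += sx
            else:
                y += sy
            err -= 2 * dx
        if steep:                        # always advance along the driving axis
            y += sy
        else:
            x += sx
        err += 2 * dy
        points.append((x, y))
    return points


print(bresenham_line(0, 0, 5, 2))
# [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
```

Swapping the roles of the axes when dy > dx covers the Figure 2b case with the same loop, which keeps the sketch to a single datapath.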


#### *2.2. Bresenham's Circle Algorithm*

When the line to draw is wider than one pixel, giving the ends of the line a specific shape improves the quality of the visualization, and a circular shape is one suitable choice. To draw circular shapes, we adopt Bresenham's circle algorithm, which proceeds similarly to Bresenham's line algorithm. Figure 3 shows the basic principle of Bresenham's circle algorithm. Based on the current point (*x<sub>i</sub>*, *y<sub>i</sub>*), the algorithm selects the next point to paint between *p<sub>1</sub>*(*x<sub>i</sub>* + 1, *y<sub>i</sub>*) and *p<sub>2</sub>*(*x<sub>i</sub>* + 1, *y<sub>i</sub>* − 1). To select the point, it evaluates Equation (2), where *r* is the radius of the circle, at the midpoint (*x<sub>i</sub>* + 1, *y<sub>i</sub>* − 0.5). When the result is lower than 0, the midpoint lies inside the circle, so the next point is *p<sub>1</sub>*; otherwise, the next point is *p<sub>2</sub>*.

$$f = x^2 + y^2 - r^2 \tag{2}$$

**Figure 3.** Bresenham's circle algorithm.
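The following Python sketch is the common integer form of the midpoint circle algorithm, written to follow Equation (2) and the selection rule above; it is an illustration, not the accelerator's circle submodule. It traces one octant starting at (0, *r*), keeps 4*f* so that the half-pixel midpoint needs no fractions, and mirrors the octant eightfold. Evaluating *f* directly with squares is a simplification for readability; hardware versions typically update *f* incrementally instead. The function names are illustrative.

```python
def circle_octant(r):
    """Midpoint circle algorithm for one octant, starting at (0, r).

    Uses 4 * f(x + 1, y - 0.5) with f = x^2 + y^2 - r^2 (Equation (2)),
    which stays an integer. The last point may step one pixel past the
    diagonal; its mirrored copy simply overlaps another octant.
    """
    x, y = 0, r
    points = [(x, y)]
    while x < y:
        # 4 * f(x + 1, y - 0.5) = 4(x + 1)^2 + (2y - 1)^2 - 4r^2
        f4 = 4 * (x + 1) ** 2 + (2 * y - 1) ** 2 - 4 * r * r
        x += 1
        if f4 >= 0:        # midpoint on or outside the circle: choose p2 (y - 1)
            y -= 1
        # otherwise the midpoint is inside the circle: choose p1 (y unchanged)
        points.append((x, y))
    return points


def circle_points(cx, cy, r):
    """Full circle centred at (cx, cy) by eight-way symmetry of one octant."""
    pts = set()
    for x, y in circle_octant(r):
        for px, py in ((x, y), (y, x), (-x, y), (-y, x),
                       (x, -y), (y, -x), (-x, -y), (-y, -x)):
            pts.add((cx + px, cy + py))
    return pts


print(circle_octant(5))
# [(0, 5), (1, 5), (2, 5), (3, 4), (4, 3)]
```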

#### *2.3. Alpha-Blending*

Alpha-blending is needed to draw graphics with transparency, blended with the original image. Figure 4 illustrates alpha-blending. Each pixel of the image to draw has an alpha value *α* that expresses its transparency. Alpha-blending combines the graphics to draw with the original image by reading the color value of each pixel of both and calculating the new pixel value of the image frame with Equation (3). As the color of a digital image is composed of three color components (red, green, and blue), calculating the new color of pixel *p* requires evaluating Equation (3) for each of the three color channels.


**Figure 4.** Description of alpha-blending.

$$p_{new} = \alpha p_{draw} + (1 - \alpha) p_{original} \tag{3}$$
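As a minimal sketch of Equation (3) applied to each color channel, the Python code below assumes 8-bit R, G, and B channels and an integer alpha *a* in [0, 7] interpreted as *α* = *a*/7, borrowing the three-bit alpha quantization described in Section 2.4. The paper does not specify the blender's exact arithmetic, so the constant `ALPHA_MAX`, the rounding, and the function names are assumptions of this sketch.

```python
ALPHA_MAX = 7  # assumption: 3-bit alpha, so alpha = a / ALPHA_MAX


def blend_channel(draw, original, a):
    """Equation (3) on one 8-bit channel with integer alpha a in [0, ALPHA_MAX]."""
    # (a * draw + (ALPHA_MAX - a) * original) / ALPHA_MAX, rounded to nearest
    return (a * draw + (ALPHA_MAX - a) * original + ALPHA_MAX // 2) // ALPHA_MAX


def blend_pixel(draw_rgb, original_rgb, a):
    """Apply Equation (3) to each of the R, G, and B channels independently."""
    if not 0 <= a <= ALPHA_MAX:
        raise ValueError("alpha out of range")
    return tuple(blend_channel(d, o, a) for d, o in zip(draw_rgb, original_rgb))


print(blend_pixel((255, 0, 0), (0, 0, 255), 4))  # (146, 0, 109)
```

With *a* = 7 the drawn color fully replaces the original pixel, and with *a* = 0 the original pixel is kept unchanged, matching the limits of Equation (3).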


#### *2.4. Anti-Aliasing*

Because a raster graphics system has a limited pixel density, aliasing can occur when it displays a graphical object with a higher pixel density than the system supports. As an ideal line has unlimited pixel density, aliasing occurs very frequently when lines are drawn. Anti-aliasing is a technique to deal with this problem. Figure 5 illustrates anti-aliasing. Anti-aliasing improves the visualization of aliased lines, such as the line shown in Figure 5a, by blurring the jagged edges at the borders of the line. Blurring is achieved by sequentially decrementing the alpha value of the pixels along the jagged edges, as shown in Figure 5b.

**Figure 5.** Description of anti-aliasing. (**a**) Line without anti-aliasing; (**b**) Line with anti-aliasing.

The anti-aliasing process starts by detecting the borders of the line. As in Bresenham's line algorithm, anti-aliasing handles two types of lines, distinguished by the slope. Figure 6 shows the progression of the anti-aliasing process. The process first detects the start point and end point of each border segment; the detection is performed while the line is drawn with Bresenham's line algorithm, by checking the generated coordinates. Once the start point and end point of a border segment are determined, the process applies a decreasing alpha value to each point of the segment. The following pseudo-code presents the process of applying the alpha value when the slope is less than or equal to one. The alpha value of a pixel is quantized to three bits, with a maximum of seven, to reduce the circuit area by minimizing arithmetic calculation.

**Figure 6.** Progression of the anti-aliasing process.
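The following Python sketch is a hypothetical illustration, not the authors' pseudo-code, for lines with slope at most one. It splits the pixels of a rasterized line into runs that share the same y coordinate (one possible reading of a "border segment") and gives each run a linearly decreasing three-bit alpha that starts at the maximum of seven. The segment definition, the ramp formula, and all function names are assumptions.

```python
ALPHA_MAX = 7  # three-bit alpha, maximum of seven (as stated in Section 2.4)


def border_alpha_ramp(length):
    """Hypothetical decreasing alpha values for a segment of `length` pixels."""
    return [ALPHA_MAX * (length - k) // length for k in range(length)]


def border_segments(points):
    """Split a shallow line's pixels (|slope| <= 1) into runs sharing one y."""
    if not points:
        return []
    segments, current = [], [points[0]]
    for p in points[1:]:
        if p[1] == current[-1][1]:
            current.append(p)          # same row: extend the current segment
        else:
            segments.append(current)   # row changed: the segment ends here
            current = [p]
    segments.append(current)
    return segments


def antialias_alphas(points):
    """Map each pixel to a decreasing alpha within its segment."""
    alphas = {}
    for seg in border_segments(points):
        for p, a in zip(seg, border_alpha_ramp(len(seg))):
            alphas[p] = a
    return alphas


line = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1)]
print(antialias_alphas(line))
# {(0, 0): 7, (1, 0): 4, (2, 0): 2, (3, 1): 7, (4, 1): 3}
```

The ramp uses only the three-bit range, so each pixel's alpha fits directly into an integer blending step such as Equation (3) with *α* = *a*/7.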

## **3. 2D Graphics Accelerator**

The 2D graphics accelerator provides 2D graphics processing features including line-drawing, alpha-blending, and anti-aliasing. To execute these features, the accelerator receives setup data from the core of the processor, such as the start point, end point, line width, bits per pixel (BPP), other configurations, and a start flag. After the setup data are received and the start flag is set, the accelerator operates independently of the core. When the line-drawing process is completed, the accelerator sends an interrupt signal to the interrupt handler of the processor, notifying the core of the completion. Consequently, the workload of the processor is reduced because it does not need to check continuously whether the accelerator has finished.
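The control protocol above can be illustrated with a toy Python model. The register offsets, the packing of two coordinates into one word, and the configuration bit are all hypothetical; the paper states only that the core writes the setup data and a start flag through the memory bus and is notified by an interrupt. In this model, drawing completes immediately inside `write`, so the interrupt callback fires at once.

```python
# Hypothetical register map: offsets, field packing, and bit meanings are
# illustrative assumptions, not the accelerator's documented interface.
REG_START_XY = 0x00
REG_END_XY = 0x04
REG_WIDTH = 0x08
REG_CONFIG = 0x0C
REG_START = 0x10

CONFIG_ANTI_ALIAS = 0x1  # hypothetical option bit


def pack_xy(x, y):
    """Pack two 16-bit coordinates into one 32-bit register value."""
    return ((y & 0xFFFF) << 16) | (x & 0xFFFF)


class AcceleratorModel:
    """Toy model of a memory-mapped accelerator that signals completion by interrupt."""

    def __init__(self, on_interrupt):
        self.regs = {}
        self.on_interrupt = on_interrupt  # stands in for the interrupt line to the core
        self.lines_drawn = 0

    def write(self, offset, value):
        """Model a store instruction from the core to the accelerator's address range."""
        self.regs[offset] = value
        if offset == REG_START and value == 1:
            # The line is "drawn" instantly here; real hardware would run its
            # pipeline while the core continues with other work.
            self.lines_drawn += 1
            self.regs[REG_START] = 0
            self.on_interrupt()


def draw_line(acc, start, end, width, config=0):
    """Core-side routine: write the setup data, then set the start flag."""
    acc.write(REG_START_XY, pack_xy(*start))
    acc.write(REG_END_XY, pack_xy(*end))
    acc.write(REG_WIDTH, width)
    acc.write(REG_CONFIG, config)
    acc.write(REG_START, 1)


completed = []
acc = AcceleratorModel(on_interrupt=lambda: completed.append("line done"))
draw_line(acc, (10, 20), (200, 80), width=5, config=CONFIG_ANTI_ALIAS)
print(completed)  # ['line done']
```

Because completion is reported through the callback, the core-side code never polls a status register, which is the workload reduction the text describes.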

#### *3.1. Line-Drawing Process*

Figure 7 presents the progression of the line-drawing process. The setup module first receives the line configuration from the core, such as the start point, end point, and line width. It generates the aligned coordinates, slope, line width, and edge points from the line configuration and transfers them to the edge builder. The edge builder sets up the borders of the line by generating their coordinates. The accelerator has three cap modes for drawing line caps, called perpendicular, vertical, and circle; the caps are created by submodules of the edge builder. The submodules transfer the minimum and maximum values of the x and y coordinates to the line detector module. The line detector processes line-drawing by determining which coordinates are borders. The painter generates the coordinates to paint, which lie inside the borders, and executes the anti-aliasing process when the anti-aliasing option is set. Finally, the blender paints the pixels with alpha-blending by writing the color to the frame buffer, using the options transferred from the setup module and the coordinates from the painter.
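As a hypothetical sketch of the painter's role, the following Python function takes the border pixels reported for a line and generates every pixel between the leftmost and rightmost border pixel on each row. It assumes the outline is convex, so each row is a single horizontal span; this roughly holds for a straight thick line with rectangular, parallelogram, or circular caps. The function name and data layout are illustrative, not the accelerator's interface.

```python
from collections import defaultdict


def fill_between_borders(border_points):
    """Return every pixel on or between the leftmost and rightmost border of each row.

    Assumes a convex outline, so each row contains exactly one horizontal span.
    """
    rows = defaultdict(list)
    for x, y in border_points:
        rows[y].append(x)               # group border x coordinates by row
    filled = []
    for y in sorted(rows):
        for x in range(min(rows[y]), max(rows[y]) + 1):
            filled.append((x, y))       # paint the whole span of this row
    return filled


outline = [(0, 0), (1, 0), (2, 0), (0, 1), (2, 1), (0, 2), (1, 2), (2, 2)]
print(len(fill_between_borders(outline)))  # 9
```

Working row by row with only the minimum and maximum x keeps the per-row state small, which is in line with the line detector passing minimum and maximum coordinate values to the painter.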

#### *3.2. Optimized Architecture*

Figure 8 shows the architecture of the processor including the proposed 2D graphics accelerator. As shown in Figure 8a, the accelerator is connected to the core through the memory bus of the processor, so the core controls the accelerator through memory access instructions. Moreover, the frame buffer is connected both directly to the accelerator and to the memory bus. With this architecture, the core itself can handle cases in which line-drawing is an inefficient way to process 2D graphics, such as loading a bitmap image into the frame buffer. This characteristic enables the processor to respond flexibly and efficiently to various conditions. Figure 8b presents the architecture of the 2D graphics accelerator.


**Figure 7.** Progression of the line-drawing process.

The accelerator contains six modules: config register, setup, edge builder, line detector, painter, and blender. The config register saves the line configuration and options received from the memory bus, such as anti-aliasing and cap mode. The other five modules perform the line-drawing process with the options saved in the config register and operate as a pipeline, which gives the accelerator high throughput.
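The throughput benefit of the five-stage pipeline can be estimated with idealized cycle counts. This Python sketch assumes that each stage handles one item (for example, one coordinate) per clock cycle with no stalls, which the paper does not claim; it only shows why pipelining raises throughput. The function names are illustrative.

```python
def cycles_sequential(stages, items):
    """Each item passes through every stage before the next item starts."""
    return stages * items


def cycles_pipelined(stages, items):
    """Ideal pipeline: `stages` cycles to fill, then one finished item per cycle."""
    return stages + items - 1 if items > 0 else 0


print(cycles_sequential(5, 1000), cycles_pipelined(5, 1000))  # 5000 1004
```

Under these assumptions, the pipeline approaches one item per cycle for long lines, roughly a fivefold gain over processing each item through all five stages in turn.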

**Figure 8.** Architecture of the processor and 2D graphics accelerator. (**a**) Architecture of the processor with 2D graphics accelerator; (**b**) Architecture of the 2D graphics accelerator.

**Figure 9.** Block diagram of the edge builder.


In the setup module, the coordinates of the four edges of the line are generated from the line width and the distance between the start point and the end point. These coordinates are passed to the edge builder module, which is the next pipeline stage. Figure 9 shows a block diagram of the edge builder. The edge builder receives the following signals: the minimum and maximum (x, y) coordinates of the points, the distances between the start point and the end point (dx, dy), the width of the circle to paint when the cap mode is circle, the line width, and the cap mode. From these signals, its submodules generate the coordinates of the line borders. As shown in Figure 10, the edge builder supports three selectable cap modes for painting the line caps: perpendicular, vertical, and circle. The circle submodule generates the coordinates for circular caps at the ends of the line. The cap submodule generates the coordinates for parallelogram-shaped and rectangle-shaped caps. The line submodule generates the borders of the line, excluding the caps. All submodules operate in parallel for fast execution, and the generated coordinates are sent to the line detector module.
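The setup computation described above can be illustrated with a short C sketch. It is not the accelerator's RTL: it assumes the perpendicular cap mode, interprets the four "edges" as the corners of the rectangle that encloses the line body, and uses a software integer square root and rounded division where the hardware may use its own optimized arithmetic. The names `point_t`, `isqrt_u32`, `round_div`, and `line_corners` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical point type for this sketch. */
typedef struct { int32_t x, y; } point_t;

/* Floor of sqrt(n), digit by digit: shifts, additions, subtractions, compares. */
static uint32_t isqrt_u32(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = 1u << 30;            /* largest power of four in 32 bits */
    while (bit > n) bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

/* Signed division rounded to nearest, halves away from zero (den > 0). */
static int32_t round_div(int32_t num, int32_t den)
{
    assert(den > 0);
    return num >= 0 ? (num + den / 2) / den : -((-num + den / 2) / den);
}

/* Corners of a line of the given width from p0 to p1 with perpendicular caps.
 * out[] = { p0 + n, p1 + n, p1 - n, p0 - n }, where n is the normal (-dy, dx)
 * scaled to width / 2. Returns -1 for a zero-length line, 0 otherwise. */
static int line_corners(point_t p0, point_t p1, int32_t width, point_t out[4])
{
    int32_t dx = p1.x - p0.x;
    int32_t dy = p1.y - p0.y;
    uint32_t len = isqrt_u32((uint32_t)(dx * dx + dy * dy));
    if (len == 0) return -1;            /* no direction to offset along */

    int32_t nx = round_div(-dy * width, 2 * (int32_t)len);
    int32_t ny = round_div(dx * width, 2 * (int32_t)len);

    out[0] = (point_t){ p0.x + nx, p0.y + ny };
    out[1] = (point_t){ p1.x + nx, p1.y + ny };
    out[2] = (point_t){ p1.x - nx, p1.y - ny };
    out[3] = (point_t){ p0.x - nx, p0.y - ny };
    return 0;
}
```

For example, a 4-pixel-wide line from (0, 0) to (10, 0) yields the corners (0, 2), (10, 2), (10, -2), and (0, -2). The vertical and circle cap modes would extend these coordinates differently and are not modeled here.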

*Electronics* **2021**, *10*, 469 8 of 13

**Figure 8.** Architecture of the processor and 2D graphics accelerator. (**a**) Architecture of the processor with 2D graphics accelerator; (**b**) Architecture of the 2D graphics accelerator.

**Figure 9.** Block diagram of the edge builder.


Because the circle submodule generates the entire circular outline, the coordinates that fall inside the line borders must be removed. This is done by the line detector module, which receives the coordinates from the edge builder and determines which of them are valid borders. It then transfers the valid borders, together with the minimum and maximum coordinate values, to the painter module. The painter module generates the coordinates inside the borders and paints the corresponding pixels by writing RGBA data to memory at a configurable address; the address is set by writing it to the config register through the memory bus. In addition, when the anti-aliasing mode is set in the config register, the painter smooths the border pixels through anti-aliasing. The written RGBA data are used by the blender module, which draws the line to the display device. Because the frame buffer already holds the previously drawn image, the line being drawn must be blended with that image. Therefore, the blender performs alpha-blending between the previous image and the pixels of the line to draw. Finally, the blender writes the updated image to the frame buffer, which provides the images shown on the display device.
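The painter and blender steps above can be sketched in C. The sketch assumes a 32-bit frame buffer with pixels packed as 0xRRGGBBAA, a straight (non-premultiplied) source alpha, and an 8-bit anti-aliasing coverage value per border pixel; none of these formats are specified in this section, and `blend_channel`, `blend_pixel`, and `paint_span` are hypothetical names. The blend is the standard "source over" rule, out = alpha * src + (1 - alpha) * dst, computed per color channel in integer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 8-bit channel: (s * a + d * (255 - a)) / 255, rounded to nearest. */
static uint8_t blend_channel(uint8_t s, uint8_t d, uint8_t a)
{
    uint32_t v = (uint32_t)s * a + (uint32_t)d * (255u - a);
    return (uint8_t)((v + 127u) / 255u);
}

/* Blend a 0xRRGGBBAA source colour over a frame-buffer pixel.
 * `coverage` scales the source alpha: 255 for interior pixels, smaller
 * for anti-aliased border pixels that the line only partly covers. */
static uint32_t blend_pixel(uint32_t src, uint32_t dst, uint8_t coverage)
{
    uint8_t a = (uint8_t)(((src & 0xFFu) * coverage + 127u) / 255u);
    uint32_t out = 0;
    for (int shift = 24; shift >= 8; shift -= 8) {   /* R, G, B bytes */
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t d = (uint8_t)(dst >> shift);
        out |= (uint32_t)blend_channel(s, d, a) << shift;
    }
    return out | (dst & 0xFFu);          /* keep the frame buffer's alpha byte */
}

/* Painter step for one row: fill the pixels between the left and right
 * borders (inclusive) reported for row y, blending each over the old image. */
static void paint_span(uint32_t *fb, size_t fb_width, size_t y,
                       size_t x_left, size_t x_right, uint32_t rgba)
{
    assert(fb != NULL && x_left <= x_right && x_right < fb_width);
    for (size_t x = x_left; x <= x_right; ++x) {
        uint32_t *p = &fb[y * fb_width + x];
        *p = blend_pixel(rgba, *p, 255);
    }
}
```

In this sketch, partial coverage at the borders simply lowers the effective alpha, so anti-aliasing and alpha-blending share one datapath; whether the accelerator combines them this way is not stated in the text.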

#### **4. Implementation and Analysis**

To implement and verify the 2D graphics accelerator, we first verified the required algorithms in software. We wrote MATLAB scripts to verify the algorithms, namely line-drawing, anti-aliasing, alpha-blending, and drawing various line caps. Once the algorithms were verified, we transformed them to the register-transfer level (RTL) and designed the accelerator in Verilog HDL.
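The software-first flow above can be illustrated with a reference model of the line-drawing step. The following C sketch is the textbook all-octant integer form of Bresenham's algorithm, on which the accelerator's line-drawing operation is based; it is not the authors' optimized variant or their MATLAB scripts, and `pixel_t`, `pixel_buf_t`, `collect_pixel`, and `bresenham` are hypothetical names. A model like this produces the pixel sequence against which an RTL implementation can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { int x, y; } pixel_t;
typedef struct { pixel_t *px; size_t n, cap; } pixel_buf_t;
typedef void (*plot_fn)(int x, int y, void *ctx);

/* Example plot callback: append each rasterized pixel to a buffer. */
static void collect_pixel(int x, int y, void *ctx)
{
    pixel_buf_t *b = ctx;
    assert(b->n < b->cap);              /* caller must size the buffer */
    b->px[b->n].x = x;
    b->px[b->n].y = y;
    b->n++;
}

/* Integer Bresenham for all octants: the loop uses only additions,
 * comparisons and a doubling (2 * err); there is no division.
 * Returns the number of pixels plotted; `plot` may be NULL. */
static int bresenham(int x0, int y0, int x1, int y1, plot_fn plot, void *ctx)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;                  /* combined error term */
    int n = 0;
    for (;;) {
        if (plot) plot(x0, y0, ctx);
        ++n;
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
    return n;
}
```

For the test line from (50, 50) to (700, 900), this model plots 851 pixels, one per step along the major (y) axis.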

To evaluate the 2D graphics accelerator, we integrated it into a processor with a Cortex-M0 core, connecting the accelerator and the core through an AHB-Lite bus. We also added a function that generates an interrupt request signal when the drawing of one line is complete. Before synthesizing the processor to hardware, we simulated it in Vivado 2020.1 to verify the functionality of the accelerator, running a customized testbench with a sample program stored in the internal ROM of the processor. The embedded program performs the same work as the MATLAB scripts. When the accelerator completes the drawing of one line, it raises the interrupt request signal, and the program then configures the next line.
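The interrupt-driven flow described above, in which the core configures a line, the accelerator draws it, and the line-done interrupt triggers the next configuration, can be sketched as C firmware. The register map (`gfx_regs_t`), field packing, control bits, and function names below are hypothetical, since the accelerator's register layout is not given here. On the real system, `regs` would point at the accelerator's address on the AHB-Lite bus and `gfx_line_done_isr` would be called from the handler registered for the accelerator's interrupt; for host testing, the register block can be an ordinary struct in RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block; offsets and bit fields are illustrative only. */
typedef struct {
    volatile uint32_t start_xy;    /* x in bits [15:0], y in bits [31:16] */
    volatile uint32_t end_xy;      /* same packing as start_xy */
    volatile uint32_t width_cap;   /* width in [15:0], cap mode in [17:16] */
    volatile uint32_t rgba;        /* line colour, 0xRRGGBBAA */
    volatile uint32_t fb_addr;     /* frame-buffer address to draw into */
    volatile uint32_t control;     /* bit 0: start, bit 1: anti-aliasing */
} gfx_regs_t;

#define GFX_CTRL_START (1u << 0)
#define GFX_CTRL_AA    (1u << 1)

typedef enum { CAP_PERPENDICULAR = 0, CAP_VERTICAL = 1, CAP_CIRCLE = 2 } cap_mode_t;

typedef struct {
    uint16_t x0, y0, x1, y1, width;
    cap_mode_t cap;
    uint32_t rgba;
} line_cmd_t;

/* Driver state: the list of lines and the index of the next one to submit. */
typedef struct {
    gfx_regs_t *regs;
    const line_cmd_t *lines;
    size_t count;
    size_t next;
    uint32_t fb_addr;
    int anti_alias;
} gfx_driver_t;

/* Write one line configuration, then set the start bit last. */
static void gfx_submit(gfx_driver_t *d, const line_cmd_t *c)
{
    d->regs->start_xy  = (uint32_t)c->x0 | ((uint32_t)c->y0 << 16);
    d->regs->end_xy    = (uint32_t)c->x1 | ((uint32_t)c->y1 << 16);
    d->regs->width_cap = (uint32_t)c->width | ((uint32_t)c->cap << 16);
    d->regs->rgba      = c->rgba;
    d->regs->fb_addr   = d->fb_addr;
    d->regs->control   = GFX_CTRL_START | (d->anti_alias ? GFX_CTRL_AA : 0u);
}

/* Kick off the first line; the remaining lines are chained from the interrupt. */
static void gfx_start(gfx_driver_t *d)
{
    assert(d != NULL && d->regs != NULL);
    d->next = 0;
    if (d->count > 0) gfx_submit(d, &d->lines[d->next++]);
}

/* Body of the "line done" interrupt: submit the next line if any.
 * Returns 1 if another line was started, 0 when the list is finished. */
static int gfx_line_done_isr(gfx_driver_t *d)
{
    if (d->next >= d->count) return 0;
    gfx_submit(d, &d->lines[d->next++]);
    return 1;
}
```

Because the core only touches the accelerator between interrupts, it is free to run other work while a line is drawn. In real firmware, driver state shared between the interrupt handler and the main loop should also be declared `volatile` or protected by masking the interrupt while it is modified.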

Synthesis and implementation were performed with the same Vivado tool, targeting a Xilinx xc7z010clg400 FPGA. Table 1 shows the resource utilization of the 2D graphics accelerator and the processor. The results indicate that the resource usage of the 2D graphics accelerator is suitable for embedded systems, as the utilization of the processor containing the accelerator does not exceed 80% of the programmable logic.


**Table 1.** Resource utilization of the processor including 2D graphics accelerator.

| Module | Resource | Synthesis | Utilization (%) |
|---|---|---|---|
| 2D graphics accelerator | LUTs <sup>1</sup> | 5050 | 28.69 |
| | Flip-Flops <sup>2</sup> | 3087 | 8.77 |
| | DSP <sup>3</sup> | 3 | 3.75 |
| Processor | LUTs | 13,923 | 79.10 |
| | Flip-Flops | 4501 | 12.79 |
| | DSP | 4 | 5 |

<sup>1</sup> Total of 17,600; <sup>2</sup> total of 35,200; <sup>3</sup> total of 80.

Table 2 presents the performance of the accelerator at a resolution of 1024 × 768 and 30 frames per second. To evaluate the line-drawing performance, we set the start point and end point to (50, 50) and (700, 900), which are close to the top-left and bottom-right corners of the display, and tested various conditions, such as the operating frequency and line width. The results show that even with a line width of 50 pixels, the accelerator draws more than one line per frame when the operating frequency exceeds 50 MHz. These results suggest that the accelerator is suitable for a wide range of resource-limited applications whose features are based on line-drawing, such as a real-time scope. However, because Table 2 indicates that the drawing efficiency decreases when the line width is small, applying the accelerator to complex graphics applications that are not based on line-drawing may be challenging.

**Table 2.** Performance of the 2D graphics accelerator.
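The frame budget behind these measurements can be made concrete with a short calculation, assuming that the whole frame period is available for drawing and that lines are drawn one after another. At 30 frames per second and the 50 MHz threshold quoted above, the cycles available per frame and the length of the test line are:

$$
\begin{aligned}
N_{\text{frame}} &= \frac{f_{\text{clk}}}{\text{fps}} = \frac{50 \times 10^{6}\ \text{cycles/s}}{30\ \text{frames/s}} \approx 1.67 \times 10^{6}\ \text{cycles/frame},\\
L &= \sqrt{(700-50)^2 + (900-50)^2} = \sqrt{650^2 + 850^2} = \sqrt{1{,}145{,}000} \approx 1070\ \text{pixels}.
\end{aligned}
$$

Under these assumptions, drawing at least one such line per frame means that a 50-pixel-wide line of about 1070 pixels, roughly $1070 \times 50 \approx 5.35 \times 10^{4}$ covered pixels before the caps, completes within about $1.67 \times 10^{6}$ cycles, that is, on the order of 31 cycles or fewer per covered pixel on average.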

To test the features of the accelerator, namely line-drawing with various cap modes, anti-aliasing, and alpha-blending, we ran test firmware on the processor that draws various kinds of lines by controlling the 2D graphics accelerator through memory access. The processor contains a video graphics array (VGA) controller that outputs the image in the frame buffer to a display device through the VGA protocol. As a result, the 2D graphics features, namely line-drawing, alpha-blending, and anti-aliasing, can be visually confirmed on the display device, as shown in Figure 11.

**Figure 11.** Experimental environment of the field-programmable gate array (FPGA) implementation.

One of the essential steps in verifying the feasibility of the 2D graphics accelerator is identifying the area of the actual synthesized circuit. To identify the area, we synthesized the accelerator with Synopsys Design Compiler (version N-2017.09-SP2) using a 180 nm CMOS process. Table 3 summarizes the synthesis result. The total area of the accelerator is 742,494 μm<sup>2</sup>, which corresponds to approximately 75K gates. The results in Tables 2 and 3 show that the accelerator can be realized as a chip with acceptable performance, drawing more than one line per frame. Therefore, attaching the 2D graphics accelerator to an embedded processor can be a suitable way to meet tight design specifications when the graphics of the application can be composed effectively of line-drawing features.

**Table 3.** Synthesis result of the 2D graphics accelerator.

#### **5. Conclusions**

In this paper, we proposed a 2D graphics accelerator, based on line-drawing, for embedded systems. As line-drawing can be a basic element of image drawing in specific applications, defining the required 2D graphics as a set of lines can be a more effective way to implement graphics features than other methods. The accelerator provides basic line-drawing features as well as user-centric features that improve visualization, such as alpha-blending and anti-aliasing. To implement these 2D graphics features, we analyzed the line-drawing algorithm and the required functions, and optimized them for hardware realization. By transforming the binary division and reducing the size of the arithmetic calculations in the algorithm, the algorithm can be implemented with fewer arithmetic units, enabling the hardware to operate with low power and few resources. We also constructed a system-on-a-chip for embedded systems by including the designed accelerator in the processor. The accelerator is connected to the core through the memory bus of the processor to receive line configurations and start signals from the core. As the accelerator is directly connected to the frame buffer, it works independently of the core while performing the line-drawing process. Because of this architecture, the core can execute other jobs while the accelerator performs graphics processing. As a result, the overall performance of the processor in applications using 2D graphics can be improved. In addition, the results of the FPGA implementation and the synthesis using the 180 nm CMOS process show that the accelerator is feasible to realize.

In future work, we will apply our 2D graphics accelerator to a variety of applications implemented on embedded systems and compare its performance with other methods, such as implementation with a GPP or GPU. As the drawing performance of the accelerator is not suitable for complex, fine-detailed graphics processing, it is necessary to classify and find the applications whose conditions suit the accelerator. We expect that adding the line-drawing-based 2D graphics accelerator to the processor can be effective in a variety of embedded systems.

**Author Contributions:** Conceptualization, H.W.O., J.K.K., and G.B.H.; methodology, J.K.K.; software, G.B.H.; validation, H.W.O., J.K.K., and G.B.H.; investigation, H.W.O. and J.K.K.; writing—original draft preparation, H.W.O.; writing—review and editing, H.W.O. and S.E.L.; visualization, H.W.O. and G.B.H.; supervision, S.E.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2019R1F1A1060044, 'Multi-core Hardware Accelerator for High-Performance Computing (HPC)'). This research was also funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under the Industrial Technology Innovation Program (No. 10076314, 'Development of lightweight SW-SoC solution for respiratory medical device').

**Conflicts of Interest:** The authors declare no conflict of interest.



MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Electronics* Editorial Office E-mail: electronics@mdpi.com www.mdpi.com/journal/electronics


www.mdpi.com ISBN 978-3-0365-4245-4