#### **2. Robot Batting System**

We built a robot batting system as shown in Figure 1. The hardware consists of a stereo vision sensor for recognizing the red ball and obtaining its position in three-dimensional space, and a six-DOF robotic arm for batting the ball to the target position (Net in Figure 1). The stereo vision sensor (Bumblebee2, Point Grey Research, Inc.) provides 640 × 480 pixel color images at up to 48 frames per second (fps). The Triclops library supplied with the Bumblebee2 provides 3D position coordinates. The Bumblebee2 is precision-calibrated at the factory, and because its two cameras are rigidly fixed, no additional calibration between them is required. For batting, we used a slightly modified version of the arm of the humanoid robot Hubo [23], designed by the HUBO LAB of the Korea Advanced Institute of Science and Technology: the link lengths were changed to extend the workspace, and the robotic hand was replaced with a 0.09 m diameter round aluminum plate to perform the batting task.

**Figure 1.** (**a**) Robotic batting system. The hardware consists of a stereo vision sensor and a six-degree-of-freedom robotic arm. (**b**) Robot arm configuration. The base coordinate system of the robot arm is the first coordinate system.

#### **3. Method**

We applied a color segmentation method [24] to recognize the red ball. To improve recognition accuracy under varying illumination, a circle fitting method [25] that exploits the geometric characteristics of the ball is also employed.

#### *3.1. Ball Recognition*

#### 3.1.1. Color Segmentation

Color segmentation is used to find pixels with specific color values in the image. Since a red ball is used in the batting experiment, only red is segmented by comparing the red component $I_c$ of each pixel with the threshold value $I_{threshold}$, as shown in Equation (1).

$$I_b(u, v) = \begin{cases} 1, & \text{if } I_c(u, v) \ge I_{threshold} \\ 0, & \text{if } I_c(u, v) < I_{threshold} \end{cases} \tag{1}$$

For improved visualization, the color image is converted to a binary image whose pixels $I_b$ take only two values: pixels at or above the threshold are set to "1" (white) and all others to "0" (black), so the red ball appears white. However, since RGB values change with illuminance, noise is binarized along with the ball. To remove this noise, the morphology method [26] is applied, which is effective at removing salt-and-pepper (impulse) noise between objects and the background. Figure 2 shows the result of locating the ball on a desk in the lab after applying the color segmentation and morphology methods; the left side of the figure is the color image, the right side is the binary image, and only the red ball appears white in the binary image.
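As a minimal sketch of this step, the code below thresholds the red channel per Equation (1) and cleans the result with a morphological opening using OpenCV. The threshold value and kernel size are illustrative assumptions; the paper does not report the parameters it used.

```python
import cv2
import numpy as np

# Illustrative values; the paper does not report its exact threshold or
# structuring-element size.
I_THRESHOLD = 150
KERNEL_SIZE = 5

def segment_red_ball(image_bgr: np.ndarray) -> np.ndarray:
    """Binarize the red component (Equation (1)) and remove noise with a
    morphological opening, as described in Section 3.1.1."""
    red = image_bgr[:, :, 2]  # OpenCV stores images in BGR order

    # Equation (1): I_b = 1 (white) where I_c >= I_threshold, else 0 (black).
    binary = np.where(red >= I_THRESHOLD, 255, 0).astype(np.uint8)

    # Opening (erosion followed by dilation) removes isolated
    # salt-and-pepper/impulse noise while keeping the ball blob intact.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (KERNEL_SIZE, KERNEL_SIZE))
    return cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
```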

**Figure 2.** Color image (**left**) and binary image (**right**). The green circle in the color image marks the position of the red ball.

#### 3.1.2. Circle Fitting

Figure 3a shows the batting experiment environment captured using a stereo vision camera. Because the ceiling light casts a shadow on the lower part of the ball, the RGB values of that region change. Figure 3b shows the binarized image after color segmentation is applied to the image in Figure 3a. In the binary image of Figure 3b, the ball appears as a semicircle rather than a circle because the lower part, whose color has shifted, is not properly segmented. Incorrect color segmentation causes errors in the measured center position of the ball; for example, if only the upper part of the ball is segmented, the position error in the vertical direction increases. Adjusting the threshold is of limited help because the difference between the color values of the lower and upper parts of the ball is large.

To overcome the limitations of color segmentation, we additionally applied a circle fitting method that uses the geometric characteristics of the circular ball. Figure 3c shows only the edge of the semicircle shown in Figure 3b. By applying circle fitting to this edge, we can estimate the original circular shape of the ball, as shown in Figure 3d. Figure 3e shows the center position of the ball before circle fitting (lime green point) and after (blue point). The positions of the lime green and blue points are (X, Y, Z) = (−0.0416, 1.3299, −0.0888) and (X, Y, Z) = (−0.0413, 1.3207, −0.0955), respectively, while the true position of the ball is (X, Y, Z) = (−0.0414, 1.3187, −0.0985). Before circle fitting, the error in the Z direction was 0.0097 m, approximately 20% of the ball's 0.05 m diameter. After circle fitting, the center position of the ball moved about 0.007 m further down the Z-axis, and the position error in the Z direction was reduced from 0.0097 m to 0.0030 m.
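The paper cites a circle fitting method [25] without giving details; one common formulation consistent with the description is the algebraic (Kåsa) least-squares fit sketched below, applied to the edge pixels of the semicircle. The function name and the least-squares formulation are our assumptions, not necessarily the authors' implementation.

```python
import numpy as np

def fit_circle_kasa(points: np.ndarray):
    """Algebraic (Kasa) least-squares circle fit.

    points: (N, 2) array of edge-pixel coordinates.
    Returns (center_x, center_y, radius).
    """
    x, y = points[:, 0], points[:, 1]
    # Model x^2 + y^2 = 2a*x + 2b*y + c, which is linear in the unknowns
    # (a, b, c); the center is (a, b) and the radius is sqrt(c + a^2 + b^2).
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    d = x**2 + y**2
    (a, b, c), *_ = np.linalg.lstsq(A, d, rcond=None)
    return float(a), float(b), float(np.sqrt(c + a * a + b * b))
```

Because the fit uses only the visible edge, even a semicircular arc constrains the full circle, so the estimated center recovers the true ball center when the lower half is missing from the binary image, which is exactly the failure mode described above.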

**Figure 3.** Circle fitting process. (**a**) Batting experiment environment captured using a stereo vision camera, (**b**) A binary image with color segmentation, (**c**) The edge image of the semicircle, (**d**) An image with circle fitting applied to the edge of a semicircle, (**e**) Center position before (green) and after (blue) circle fitting.

#### 3.1.3. Calibration

Since the position of the ball measured in the vision coordinate system needs to be converted to the robot coordinate system, a coordinate transformation between the two systems is required. To obtain it, we attached the red marker shown in Figure 4 to the end-effector of the robot arm and measured its position in both the vision and robot coordinate systems. The transformation is given by the homogeneous transformation matrix relating the camera coordinate system to the robot base coordinate system, as shown in the following equation:

$$
\begin{bmatrix} {}^{B}P \\ 1 \end{bmatrix} = H \begin{bmatrix} {}^{C}P \\ 1 \end{bmatrix}, \quad \text{where } H = \begin{bmatrix} {}^{B}_{C}R & {}^{B}P_{C} \\ 0\;\;0\;\;0 & 1 \end{bmatrix} \tag{2}
$$

Subscripts $B$ and $C$ denote the robot arm base and camera coordinate systems, ${}^{B}P$ and ${}^{C}P$ are the positions of the marker (Figure 4) measured in the robot and vision coordinate systems, respectively, $H$ is the homogeneous transformation matrix, and ${}^{B}_{C}R$ and ${}^{B}P_{C}$ are the rotation matrix and the translation vector between the robot and camera coordinate systems, respectively. The homogeneous transformation matrix can be calculated by the least squares method as

$$H = EV^{T}\left(VV^{T}\right)^{-1}, \tag{3}$$

$$\text{where } E = \begin{bmatrix} \begin{bmatrix} {}^{B}P_{1} \\ 1 \end{bmatrix} & \begin{bmatrix} {}^{B}P_{2} \\ 1 \end{bmatrix} & \cdots & \begin{bmatrix} {}^{B}P_{N} \\ 1 \end{bmatrix} \end{bmatrix}, \quad V = \begin{bmatrix} \begin{bmatrix} {}^{C}P_{1} \\ 1 \end{bmatrix} & \begin{bmatrix} {}^{C}P_{2} \\ 1 \end{bmatrix} & \cdots & \begin{bmatrix} {}^{C}P_{N} \\ 1 \end{bmatrix} \end{bmatrix}.$$

$E$ and $V$ consist of the position vectors of the markers measured in the robot and vision coordinate systems, respectively, and $N$ is the number of measured position vectors. The position data set ${}^{B}P_{1}, \ldots, {}^{B}P_{N}$ is computed by forward kinematics from the base coordinate system of the robot, and the data set ${}^{C}P_{1}, \ldots, {}^{C}P_{N}$ is measured using the stereo vision sensor.
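A minimal NumPy sketch of Equation (3) follows; it assumes at least four well-spread marker positions so that $VV^{T}$ is invertible. The function name is ours, and the final comment shows how a measured ball position would then be mapped through Equation (2).

```python
import numpy as np

def calibrate_homogeneous(robot_pts: np.ndarray, camera_pts: np.ndarray) -> np.ndarray:
    """Estimate H in Equation (2) by least squares, per Equation (3).

    robot_pts:  (N, 3) marker positions in the robot base frame (forward kinematics).
    camera_pts: (N, 3) marker positions in the camera frame (stereo vision).
    """
    n = len(robot_pts)
    assert n >= 4, "need enough well-spread markers so that V V^T is invertible"
    E = np.vstack([robot_pts.T, np.ones((1, n))])    # 4 x N, as in Equation (3)
    V = np.vstack([camera_pts.T, np.ones((1, n))])   # 4 x N, as in Equation (3)
    return E @ V.T @ np.linalg.inv(V @ V.T)          # H = E V^T (V V^T)^-1

# A ball position p_c measured in the camera frame then maps into the robot
# base frame via Equation (2): p_b = (H @ np.append(p_c, 1.0))[:3]
```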

The accuracy of the coordinate transformation is improved by computing the homogeneous matrix from data measured at various locations, as shown in Figure 5. From the measured data, the homogeneous transformation matrix is calculated as

$$H = \begin{bmatrix} -0.9981 & -0.0066 & 0.0054 & 0.1935 \\ 0.0184 & -0.9754 & 0.0026 & 0.8037 \\ -0.0114 & 0.0055 & 0.9906 & 0.3792 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{4}$$

From ${}^{B}_{C}R$ and ${}^{B}P_{C}$ of the homogeneous transformation matrix, the camera coordinate system is located 0.1935 m along the X-axis, 0.8037 m along the Y-axis, and 0.3792 m along the Z-axis from the robot coordinate system, and is rotated by 0.3188° about the X-axis, 0.6539° about the Y-axis, and 178.9466° about the Z-axis. To evaluate the accuracy of the calculated homogeneous transformation matrix, the marker positions obtained by transforming the positions measured in the vision coordinate system are compared with the positions measured in the robot coordinate system, as shown in Figure 6. The error is expressed as the norm over the X, Y, and Z axes. The mean error is 0.0024 m and the standard deviation is 0.0012 m. Considering that the error is less than 0.005 m and the diameter of the marker is 0.01 m, the calculated homogeneous matrix is sufficiently accurate.
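For illustration, the translation, rotation angles, and per-marker errors quoted above can be recovered from $H$ as sketched below. The fixed-axis X-Y-Z Euler convention is our assumption, since the paper does not state which convention it used.

```python
import numpy as np

def calibration_report(H, robot_pts, camera_pts):
    """Read translation/rotation off H and compute per-marker norm errors."""
    R, t = H[:3, :3], H[:3, 3]  # ^B_C R and ^B P_C

    # Euler angles about the fixed X, Y, Z axes (convention assumed here).
    rx = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    ry = np.degrees(np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2])))
    rz = np.degrees(np.arctan2(R[1, 0], R[0, 0]))

    # Per-marker error: norm between the robot-frame measurement and the
    # transformed camera-frame measurement (the quantity shown in Figure 6).
    V = np.vstack([camera_pts.T, np.ones((1, len(camera_pts)))])
    mapped = (H @ V)[:3].T
    errors = np.linalg.norm(robot_pts - mapped, axis=1)
    return t, (rx, ry, rz), errors.mean(), errors.std()
```

Applied to the matrix in Equation (4), this extraction reproduces the translation (0.1935, 0.8037, 0.3792) m directly from the last column and rotation angles close to those reported above.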

**Figure 4.** Red marker attached to the end-effector of the robotic arm to calculate the homogeneous matrix.

**Figure 5.** Red marker positions measured from the vision sensor. The positions of the markers measured at various positions are used to accurately calculate a homogeneous matrix.

**Figure 6.** Histogram showing the calibration error for 15 marker positions.
