**Featured Application: Mobile robot navigation, automatic object identification and tracking.**

**Abstract:** QR (Quick Response) codes are one of the most popular types of two-dimensional (2D) matrix barcodes, the descendants of the well-known 1D barcodes. Mobile robots that move in a given operational space can use information and landmarks from the environment for navigation, and such information may be provided by QR Codes. We propose an algorithm that localizes a QR Code in an image in a few sequential steps. We start with image binarization, continue with QR Code localization, where we utilize the characteristic Finder Patterns located in three corners of a QR Code, and finally we identify perspective distortion. The presented algorithm is able to deal with a damaged Finder Pattern, works well for low-resolution images, and is computationally efficient.

**Keywords:** QR code detection; adaptive thresholding; finder pattern; perspective transformation

### **1. Introduction**

Some of the requirements placed on autonomous mobile robotic systems include navigation in real-world environments and the recognition and identification of objects of interest with which the robotic system has to interact. Computer vision allows machines to obtain a large amount of information from the environment that has a major impact on their behavior. The surrounding environment often contains numerous static objects (walls, columns, doors, and production machines) as well as moving objects (people, cars, and handling trucks).

Landmarks, the objects used to refine navigation, can be artificial (usually added by a human) or natural (occurring naturally in the environment) [1,2].

Robotic systems have to work in a real environment and must be able to recognize, for example, people [3], cars [4], product parameters for the purpose of quality control, or objects which are to be handled.

The use of QR (Quick Response) codes (two-dimensional matrix codes) in interaction with robots can be seen in the following areas:

• in the field of navigation, as artificial landmarks analogous to traffic signs that control the movement of the robot in a given area (no entry, driving direction, alternate route, and permitted hours of operation), or as information boards providing context-specific information or instructions (such as identification of a floor, room, pallet, or working place);

• in the area of object identification, where 2D codes are often used to mark products and goods, so their recognition provides information about the type of goods (warehouses), the destination of a shipment (sorting lines), or control and tracking in track-and-trace applications.

#### *QR Codes*

QR Codes are classified among 2D matrix codes (similar to Data Matrix codes). QR Codes (Model 1 and Model 2) are square-shaped 2D matrices of dark and light squares, so-called modules. Each module represents a binary 1 or 0. Each QR Code has fixed parts (such as Finder Patterns and Timing Patterns) that are common to all QR Codes, and variable parts that differ according to the data encoded by the QR Code. Finder Patterns, located in three corners of a QR Code, are important for determining the position and rotation of the QR Code. The size of a QR Code is determined by the number of modules and can vary from 21 × 21 modules (Version 1) to 177 × 177 (Version 40); each higher version number adds four modules per side. Figure 1 shows a sample Version 1 QR Code.

**Figure 1.** Version 1: 21 × 21 QR Code.

A QR Code has an error correction capability to restore data if the code is partially damaged. Four error correction levels are available (L–Low, M–Medium, Q–Quartile, and H–High). The error correction level determines how much of the QR Code can be corrupted while the data remain recoverable (L–7%, M–15%, Q–25%, and H–30%) [5]. The QR Code error correction feature is implemented by adding a Reed–Solomon code to the original data. The higher the error correction level, the less storage capacity is available for data.

Each QR Code symbol version has a maximum data capacity that depends on the character type and the error correction level. The data capacity ranges from 10 alphanumeric (or 17 numeric) characters for the smallest QR Code up to 1852 alphanumeric (or 3057 numeric) characters for the largest QR Code at the highest error correction level [5]. QR Codes support four encoding modes (numeric, alphanumeric, Kanji, and binary) to store data efficiently.

The QR Code was designed in 1994 in Japan for the automotive industry, but it currently has much wider use. QR Codes are used to mark a variety of objects (goods, posters, monuments, locations, and business cards) and allow additional information to be attached to them, often in the form of a URL of a web page. The QR Code is an ISO standard (ISO/IEC 18004:2015) and is freely available without license fees.

In addition to the traditional QR Code Models 1 and 2, there are also variants such as the Micro QR Code (a smaller version of the QR Code standard for applications where the symbol size is limited) or the iQR Code (which can hold a greater amount of information than a traditional QR Code and also supports rectangular shapes) (Figure 2).

**Figure 2.** Other types of a QR Code: (**a**) Micro QR code, (**b**) iQR code.

### **2. Related Work**

Prior published approaches for recognizing QR Codes in images can be divided into Finder Pattern-based location methods [6–12] and QR Code region-based location methods [13–18]. The first group locates a QR Code based on the location of its typical Finder Patterns, present in its three corners. The second group locates the area of a QR Code in the image based on its irregular checkerboard-like structure (a QR Code consists of many small light and dark squares which alternate irregularly and are relatively close to each other).

The shape of the Finder Pattern (Figure 3) was deliberately chosen by the authors of the QR Code because "it was the pattern least likely to appear on various business forms and the like" [6]. They found that black and white areas alternating in a 1:1:3:1:1 ratio are the least common on printed materials.

**Figure 3.** Finder Pattern.

In [7] (Lin and Fuh), all points matching the 1:1:3:1:1 ratio, horizontally and vertically, are collected. Collected points belonging to one Finder Pattern are merged, and inappropriate points are filtered out according to the angle between triplets of points.

In [8] (Li et al.), a minimal containing region is first established by analyzing five runs in labeled connected components, which are compacted using run-length coding. Second, the coordinates of the central Finder Pattern in a QR Code are calculated from the run-length coding using a modified Knuth–Morris–Pratt algorithm.

In [9] (Belussi and Hirata), a two-stage detection approach is proposed. In the first stage, Finder Patterns (located at three corners of a QR Code) are detected using a cascaded classifier trained according to the rapid object detection method (the Viola–Jones framework). In the second stage, geometrical restrictions among the detected components are verified to decide whether subsets of three of them correspond to a QR Code.

In [10] (Bodnár and Nyúl), Finder Pattern candidate localization is based on a cascade of boosted weak classifiers using Haar-like features, while each Finder Pattern candidate is kept or dropped according to geometrical constraints on distances and angles with respect to other probable Finder Patterns. In addition to Haar-like features, classifiers based on local binary patterns (LBP) and histograms of oriented gradients (HOG) are trained on Finder Patterns as well as on whole code areas.

In [11] (Tribak and Zaz), successive horizontal and vertical scans are launched to obtain segments whose structure complies with the ratio 1:1:3:1:1. The intersection of a horizontal and a vertical segment gives the central pixel of the extracted pattern. All extracted patterns are passed to a filtering process based on principal component analysis, which is used as a pattern feature.

In [12] (Tribak and Zaz), the seven Hu invariant moments are computed for the Finder Pattern candidates obtained by an initial scan of the image and compared, using the Euclidean distance, with the Hu moments of reference samples. If the distance is less than an experimentally determined threshold, the candidate is accepted.

In [13] (Sun et al.), the authors introduce an algorithm that locates the QR Code area by detecting the four corners of the 2D barcode. They combine the Canny edge detector with an algorithm for finding external contours.

In [14] (Ciążyński and Fabijańska), histogram correlation between a reference image of a QR Code and an input image divided into blocks of size 30 × 30 is used. Candidate blocks are then joined into regions, and morphological erosion and dilation are applied to remove small regions.

In [15] (Gaur and Tiwari), the proposed approach uses Canny edge detection followed by morphological dilation and erosion to connect broken edges in a QR Code into a larger connected component. The QR Code is expected to be the biggest connected component in the image.

In [16,17] (Szentandrási et al.), the property of 2D barcodes of having a regular distribution of edge gradients is exploited. A high-resolution image is split into tiles, and for each tile a histogram of oriented gradients (HOG) is constructed from the orientations of edge points. Then two dominant peaks, roughly 90° apart, are selected in the histogram. For each tile, a feature vector is computed, containing the normalized histogram, the angles of the two main gradient directions, the number of edge pixels, and an estimated probability score of a chessboard-like structure.

In [18] (Sörös and Flörkemeier), areas with a high concentration of edge structures and areas with a high concentration of corner structures are combined to obtain QR Code regions.

### **3. The Proposed Method**

Our method is primarily based on searching for Finder Patterns and utilizes their characteristic feature: the 1:1:3:1:1 ratio of black and white points in any scanning direction. The basic steps are indicated in the flowchart in Figure 4.

**Figure 4.** The flowchart of the proposed algorithm.

Before searching for a QR Code, the original (possibly color) image is converted to a grayscale image using Equation (1), because the color information does not carry any significant additional information that would help in QR Code recognition.

$$I = \frac{77R + 151G + 28B}{256} \tag{1}$$

where *I* stands for the gray level and *R*, *G*, *B* for the red, green, and blue color intensities of individual pixels in the RGB model, respectively. This RGB to grayscale conversion is an integer approximation of the widely used luminance calculation defined in Recommendation ITU-R BT.601-7:

$$I = 0.299R + 0.587G + 0.114B \tag{2}$$
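As an illustration, a minimal Python sketch of this integer conversion, assuming an 8-bit RGB image stored as a NumPy array (the function name and array layout are our own):

```python
import numpy as np

def rgb_to_gray(rgb: np.ndarray) -> np.ndarray:
    """Integer approximation of ITU-R BT.601 luminance, Equation (1).

    rgb: H x W x 3 uint8 array, channels in R, G, B order (assumed layout).
    Returns an H x W uint8 grayscale image.
    """
    r = rgb[..., 0].astype(np.uint32)
    g = rgb[..., 1].astype(np.uint32)
    b = rgb[..., 2].astype(np.uint32)
    # (77R + 151G + 28B) / 256; the weights sum to 256, so the result fits in uint8.
    return ((77 * r + 151 * g + 28 * b) >> 8).astype(np.uint8)
```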

Next, the grayscale image is converted to a binary image using modified adaptive thresholding (with a window size of 35; we choose the window size to be at least five times the expected size of a QR Code module) [19]. We expect that black points belonging to the QR Code will become foreground points.

We use a modification of the well-known adaptive thresholding technique (Equation (3)), which calculates an individual threshold for every point in the image. This threshold is calculated using the average intensity of the points under a sliding window. To speed up the thresholding, we pre-calculate the integral sum image, and we also use a global threshold value (points with intensity above 180 are always considered background points). Adaptive thresholding can successfully threshold even unevenly illuminated images.

$$B(x, y) = \begin{cases} 0 & \leftarrow & I(x, y) > 180 \\ 0 & \leftarrow & I(x, y) \ge T(x, y) \\ 1 & \leftarrow & I(x, y) < T(x, y) \end{cases} \tag{3}$$

$$T(x, y) = m(x, y) - \frac{I(x, y)}{10} - 10$$

$$m(x, y) = \frac{1}{35 \times 35} \sum_{i=-17}^{17} \sum_{j=-17}^{17} I(x + i, y + j)$$

where *I* is the grayscale (input) image, *B* is the binary (output) image, *T* is the threshold value (individual for each pixel at coordinates *x*, *y*), and *m* is the average of the pixel intensities under a sliding window of size 35 × 35 pixels.
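A sketch of how Equation (3) can be implemented with a pre-calculated integral image is given below. Border handling (here, clamping the window to the image edge) is not specified in the text, so it is an assumption, as are the function and variable names:

```python
import numpy as np

def adaptive_threshold(gray: np.ndarray, win: int = 35) -> np.ndarray:
    """Binarize per Equation (3): 1 = foreground (black), 0 = background."""
    h, w = gray.shape
    half = win // 2
    # Integral image with an extra zero row/column, for O(1) window sums.
    integral = np.zeros((h + 1, w + 1), dtype=np.int64)
    integral[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            # Mean intensity m(x, y) under the (clamped) sliding window.
            area = (y1 - y0) * (x1 - x0)
            m = (integral[y1, x1] - integral[y0, x1]
                 - integral[y1, x0] + integral[y0, x0]) / area
            i = int(gray[y, x])
            t = m - i / 10 - 10           # T(x, y) from Equation (3)
            if i <= 180 and i < t:        # global cut-off at 180, then local test
                out[y, x] = 1
    return out
```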

In order to improve the adaptive thresholding results, some image pre-processing techniques, such as histogram equalization, contrast stretching, or deblurring, are worth considering.

#### *3.1. Searching for Finder Patterns*

First, the binary image is scanned from top to bottom and from left to right, looking for successive sequences of black and white points in a row matching the ratio 1:1:3:1:1 (*W*<sub>1</sub>:*W*<sub>2</sub>:*W*<sub>3</sub>:*W*<sub>4</sub>:*W*<sub>5</sub>, where *W*<sub>1</sub>, *W*<sub>3</sub>, *W*<sub>5</sub> denote the numbers of consecutive black points, alternating with *W*<sub>2</sub>, *W*<sub>4</sub> white points) with a small tolerance (the tolerance is necessary because, due to imperfect thresholding and noise in the Finder Pattern area, the black and white points in a line do not alternate in the ideal ratio 1:1:3:1:1):

$$\begin{aligned} W_1, W_2, W_4, W_5 &\in \langle w - 1.5,\, w + 2.0 \rangle \\ W_3 &\in \langle 3w - 2,\, 3w + 2 \rangle \\ W_3 &\ge \max(W_1 + W_2,\, W_4 + W_5) \\ w &= \frac{W_1 + W_2 + W_3 + W_4 + W_5}{7} = \frac{W}{7} \end{aligned} \tag{4}$$

For each match in a row, the coordinates of the centroid (*C*) and the width (*W* = *W*<sub>1</sub> + *W*<sub>2</sub> + *W*<sub>3</sub> + *W*<sub>4</sub> + *W*<sub>5</sub>) of the sequence of black and white points are stored in a list of Finder Pattern candidates (Figure 5).

**Figure 5.** A Finder Pattern candidate matching 1:1:3:1:1 horizontally.
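The tolerance test of Equation (4), applied to one sequence of five run lengths, can be sketched as follows (the function name is illustrative):

```python
def matches_finder_ratio(w1: int, w2: int, w3: int, w4: int, w5: int) -> bool:
    """Check five consecutive run lengths (black, white, black, white, black)
    against the 1:1:3:1:1 ratio within the tolerances of Equation (4)."""
    w = (w1 + w2 + w3 + w4 + w5) / 7.0    # estimated module width
    if not all(w - 1.5 <= x <= w + 2.0 for x in (w1, w2, w4, w5)):
        return False
    if not (3 * w - 2 <= w3 <= 3 * w + 2):
        return False
    # The central run must dominate both flanking pairs.
    return w3 >= max(w1 + w2, w4 + w5)
```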

Then, Finder Pattern candidates (from the list of candidates) that satisfy the following criteria are grouped:


We expect that the Finder Pattern candidates in one group belong to the same Finder Pattern, and therefore we set the new centroid *C* and width *W* of the group as the average of the *x*, *y* coordinates and widths of the nearby Finder Pattern candidates (Figure 6a).

**Figure 6.** (**a**) Group of 8 Finder Pattern candidates matching 1:1:3:1:1 in rows, (**b**) Finder Pattern candidate matching 1:1:3:1:1 vertically.

After grouping the Finder Pattern candidates, it must be verified whether there are also sequences of black and white points in the vertical direction that alternate in the ratio 1:1:3:1:1 (Figure 6b). The bounding box around a Finder Pattern candidate, in which vertical sequences are searched for, is defined as

$$\forall x \in \langle C_x \pm 1.3/7\,W \rangle,\; y \in \langle C_y \pm 5.5/7\,W \rangle \tag{5}$$

where *C* (*C*<sub>x</sub>, *C*<sub>y</sub>) is the centroid and *W* is the width of the Finder Pattern candidate. We work with a slightly larger bounding box in case the Finder Pattern is stretched vertically. Candidates for which no vertical match is found, or for which the height-to-width ratio *H*/*W* < 0.7, are rejected. For candidates where a vertical match is found, the *y* coordinate of the centroid *C* (*C*<sub>y</sub>) is updated as the average of the *y* coordinates of the centers of the vertical sequences.
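A sketch of this vertical verification, reusing `matches_finder_ratio` from the sketch above; the column-wise run-length extraction is an assumed implementation detail:

```python
import numpy as np

def verify_vertical(binary: np.ndarray, cx: float, cy: float, w: float):
    """Scan the columns of the Equation (5) bounding box for vertical
    1:1:3:1:1 matches. Returns the updated centroid y coordinate,
    or None if the candidate is rejected."""
    h_img, w_img = binary.shape
    x0, x1 = max(0, int(cx - 1.3 / 7 * w)), min(w_img, int(cx + 1.3 / 7 * w) + 1)
    y0, y1 = max(0, int(cy - 5.5 / 7 * w)), min(h_img, int(cy + 5.5 / 7 * w) + 1)

    centers, heights = [], []
    for x in range(x0, x1):
        col = binary[y0:y1, x]
        # Run-length encode the column: run start/end indices.
        change = np.flatnonzero(np.diff(col)) + 1
        starts = np.concatenate(([0], change))
        ends = np.concatenate((change, [len(col)]))
        for k in range(len(starts) - 4):
            runs = ends[k:k + 5] - starts[k:k + 5]
            # The 5-run window must start with a black (foreground) run.
            if col[starts[k]] == 1 and matches_finder_ratio(*runs):
                centers.append(y0 + (starts[k] + ends[k + 4]) / 2.0)
                heights.append(float(runs.sum()))
    if not centers:
        return None                       # no vertical match: reject
    if np.mean(heights) / w < 0.7:        # H / W < 0.7: reject
        return None
    return float(np.mean(centers))        # updated C_y
```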

#### *3.2. Verification of Finder Patterns*

Each Finder Pattern consists of a central black square with a side of 3 modules (*R*<sub>1</sub>), surrounded by a white frame with a width of 1 module (*R*<sub>2</sub>), which is in turn surrounded by a black frame with a width of 1 module (*R*<sub>3</sub>). In Figure 7, the regions *R*<sub>1</sub>, *R*<sub>2</sub>, and *R*<sub>3</sub> are colored red, blue, and green, respectively. For each Finder Pattern candidate, the Flood Fill algorithm is applied, starting from the centroid *C* (which lies in region *R*<sub>1</sub>) and continuing through the white frame (region *R*<sub>2</sub>) to the black frame (region *R*<sub>3</sub>). As the continuous black and white regions are filled, the following region descriptors are incrementally computed:


**Figure 7.** Regions of the Finder Pattern candidate.
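The text does not enumerate the descriptors explicitly; from Equation (6) and the bounding-box tests used below for damaged patterns, they include at least the raw moments M00, M10, M01 and a bounding box. A stack-based Flood Fill sketch accumulating these, under that assumption (all names are illustrative):

```python
from collections import deque

def flood_fill_descriptors(binary, seed_x: int, seed_y: int, value: int):
    """Flood-fill the 4-connected region of `value` (1 = black, 0 = white)
    containing the seed, incrementally accumulating the raw moments
    M00, M10, M01 and the bounding box of the region."""
    h, w = binary.shape
    visited = set()
    stack = deque([(seed_x, seed_y)])
    m00 = m10 = m01 = 0
    min_x = max_x = seed_x
    min_y = max_y = seed_y
    while stack:
        x, y = stack.pop()
        if (x, y) in visited or not (0 <= x < w and 0 <= y < h):
            continue
        if binary[y, x] != value:
            continue
        visited.add((x, y))
        m00 += 1; m10 += x; m01 += y      # raw moments
        min_x, max_x = min(min_x, x), max(max_x, x)
        min_y, max_y = min(min_y, y), max(max_y, y)
        stack.extend([(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)])
    return {"M00": m00, "M10": m10, "M01": m01,
            "bbox": (min_x, min_y, max_x, max_y)}
```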

A Finder Pattern candidate that does not meet all of the following conditions is rejected.


Note: the criteria were set to be invariant to the rotation of the Finder Pattern, and the acceptance ranges were determined experimentally. In an ideal undistorted Finder Pattern, the criteria are met as follows:


In real environments, Finder Patterns can be damaged. The inner black region *R*<sub>1</sub> can be joined with the outer black region *R*<sub>3</sub> (Figure 8a), or the outer black region can be interrupted or incomplete (Figure 8b). In the first case, the bounding box of region *R*<sub>2</sub> is completely contained in the bounding box of region *R*<sub>1</sub>; in the second case, the bounding box of region *R*<sub>3</sub> is contained in the bounding box of region *R*<sub>2</sub>. These cases are handled individually, as sketched below: if the first case is detected, region *R*<sub>1</sub> is proportionally divided into *R*<sub>1</sub> and *R*<sub>3</sub>, and if the second case is detected, region *R*<sub>2</sub> is reconstructed from regions *R*<sub>1</sub> and *R*<sub>3</sub>.

**Figure 8.** Various damages of Finder Patterns: (**a**) merged inner and outer black regions; (**b**) interrupted outer black region.
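Detecting the two damage cases then reduces to bounding-box containment tests on the descriptors gathered during the Flood Fill; a sketch (helper names are illustrative):

```python
def bbox_contains(outer, inner) -> bool:
    """True if box `inner` lies completely inside box `outer`.
    Boxes are (min_x, min_y, max_x, max_y) tuples."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def classify_damage(r1, r2, r3):
    """Return 'merged' (Figure 8a), 'interrupted' (Figure 8b), or None.
    r1, r2, r3 are descriptor dicts from flood_fill_descriptors."""
    if bbox_contains(r1["bbox"], r2["bbox"]):
        return "merged"        # split R1 proportionally into R1 and R3
    if bbox_contains(r2["bbox"], r3["bbox"]):
        return "interrupted"   # rebuild R2 from R1 and R3
    return None
```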

The Centroid (*C*) and Module Width (*MW*) of the Finder Pattern candidate are updated using the region descriptors as follows:

$$C = \left(\frac{M_{10}(R_1) + M_{10}(R_2) + M_{10}(R_3)}{M_{00}(R_1) + M_{00}(R_2) + M_{00}(R_3)},\; \frac{M_{01}(R_1) + M_{01}(R_2) + M_{01}(R_3)}{M_{00}(R_1) + M_{00}(R_2) + M_{00}(R_3)}\right) \tag{6}$$

$$MW = \frac{\sqrt{M_{00}(R_1) + M_{00}(R_2) + M_{00}(R_3)}}{7}$$
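With the descriptor layout assumed above, Equation (6) is a direct computation. Note that an undistorted Finder Pattern covers 7 × 7 = 49 modules (9 in *R*<sub>1</sub>, 16 in *R*<sub>2</sub>, 24 in *R*<sub>3</sub>), which is why the square root of the total area divided by 7 estimates the width of one module:

```python
import math

def centroid_and_module_width(r1, r2, r3):
    """Equation (6): centroid C of the combined regions and module width MW."""
    m00 = r1["M00"] + r2["M00"] + r3["M00"]
    m10 = r1["M10"] + r2["M10"] + r3["M10"]
    m01 = r1["M01"] + r2["M01"] + r3["M01"]
    c = (m10 / m00, m01 / m00)            # center of mass of R1 + R2 + R3
    mw = math.sqrt(m00) / 7.0             # sqrt(total area) / 7
    return c, mw
```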
