1. Introduction
A charge-coupled device (CCD) is a key component of digital imaging equipment. With the rapid development of CCD sensors and computer hardware, high-precision dimensional measurement systems based on machine vision have gradually been adopted by industries such as automobile, iron and steel, and electronics manufacturing [1]. Figure 1 shows examples of industries that apply dimensional measurement to their products. In such systems, edge detection is the core algorithm. Because precision requirements are high and high-resolution CCD sensors are costly, pixel-level edge detection is not accurate enough, and sub-pixel edge detection becomes an effective way to further improve performance.
In this work, we study the steel plate production line of the Taiyuan iron and steel industry. Figure 1b shows the actual scene of the production line. Sheared steel plates are conveyed along the rollers, and the task is to design a system that measures their length. Figure 2 sketches the designed dimensional measurement system. Steel plate images are collected by two industrial-grade array CCD sensors, and the length of the steel plate is calculated as the sum of the field interval between the two sensors and the lengths of the plate within the two fields of view. The core issue is therefore the high-precision measurement of edge positions. Here, we use the images from one camera to test sub-pixel edge detection methods.
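The length computation described above can be illustrated with a minimal sketch. The function name, the per-camera calibration factors (millimeters per pixel), and all numeric values are hypothetical and serve only to show how an edge-position error propagates into the measured length; they are not taken from the actual system.

```python
def plate_length_mm(len_a_px, len_b_px, mm_per_px_a, mm_per_px_b, field_gap_mm):
    """Total plate length from two camera views plus the gap between them.

    len_a_px, len_b_px : plate length visible in each camera, in pixels
                         (hypothetical inputs derived from the edge positions)
    mm_per_px_a/b      : assumed calibration factors of the two cameras
    field_gap_mm       : assumed known interval between the two fields of view
    """
    visible_a = len_a_px * mm_per_px_a
    visible_b = len_b_px * mm_per_px_b
    return visible_a + field_gap_mm + visible_b


# Illustrative numbers only: at 2 mm per pixel, an edge error of 0.1 pixel
# already shifts the measured length by 0.2 mm.
length = plate_length_mm(300.25, 410.5, 2.0, 2.0, 5000.0)
shifted = plate_length_mm(300.35, 410.5, 2.0, 2.0, 5000.0)
print(length, shifted - length)
```

Because the two visible segments are scaled by the calibration factor, any error in the detected edge position is multiplied by the physical size of a pixel, which is why sub-pixel accuracy matters for a wide field of view.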
The concept of sub-pixel edge detection was first introduced by Hueckel [2]. Existing approaches fall into three main categories: fitting methods, interpolation methods, and moment methods. Fitting methods locate the edge by fitting the grey values to a hypothetical edge model. The method proposed by Ye [3] adopts a Gaussian edge function obtained by convolving the ideal edge model. Interpolation methods [4,5,6] interpolate the grey values of the pixels to increase the available information and locate sub-pixel edge positions. Among moment methods, Ghosal first applied the Zernike orthogonal moment to edge detection, which requires only three masks to be calculated [7,8,9,10]. The method proposed by Xie [11] improves the Zernike orthogonal moment with the Roberts operator and Otsu's method. However, given the wide field of view needed to cover steel plates, these traditional methods cannot achieve satisfactory accuracy under the limited image resolution. Furthermore, some images captured by the CCD sensor contain noise such as scars and retro-reflective targets, which may affect the prediction results.
In recent years, the convolutional neural network (CNN) has achieved great success in the computer vision field [12,13,14,15,16,17,18,19], including feature extraction, classification, and regression. CNN-based models learn their parameters directly from data using backpropagation, and their larger number of hidden layers gives them a more powerful nonlinear fitting ability. Early CNNs mainly focused on classification tasks. LeNet, introduced by LeCun in 1998 [20], was designed to recognize handwritten characters. After that, AlexNet [21], ZF-Net [22], and GoogLeNet [23] were proposed at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), all of which achieved strong results. VGG, proposed by the Visual Geometry Group of Oxford University in 2015 [24,25], has more than 10 hidden layers, a smaller filter size, and a more robust feature extraction ability. However, image noise may still affect the prediction results. Long short-term memory (LSTM) [26] can be used to handle this problem owing to its excellent performance in analyzing sequence information.
LSTM is a widely used deep learning algorithm for processing and analyzing sequence data. It has been used in applications such as natural language processing and stock market prediction [27,28,29,30,31,32,33,34]. LSTM is developed from the recurrent neural network (RNN) [35,36,37]. The traditional RNN suffers from the long-term dependency problem: it cannot connect pieces of relevant information when the gap between them grows. LSTM avoids this problem through gate structures that keep or discard information. Moreover, the bi-directional LSTM model fuses a forward LSTM and a backward LSTM to connect both past and future information [38,39,40]. In our case, the position of each edge point in an image is related to the edge points both before and after it. Therefore, bi-directional LSTM is a more appropriate option for further refining the edge positions. When image noise affects the extraction of edge points, bi-directional LSTM can learn edge information from adjacent unaffected edge points to rectify incorrect predictions. Recent studies have combined the advantages of CNN and bi-directional LSTM in practical applications [41,42].
To further improve the accuracy of dimensional measurement under limited image resolution, and inspired by the analysis above, we propose a novel sub-pixel edge detection model based on CNN and bi-directional LSTM that offers both high precision and anti-noise ability. Our model adopts a one-dimensional visual geometry group-16 (VGG-16) network to extract edge point features from the images. A transformation module then generates sequence information, and a bi-directional LSTM follows to equip the model with anti-noise ability. Finally, a fully connected layer outputs the prediction results. Experiments on our steel plate dataset demonstrate that the proposed model outperforms traditional methods and achieves an overall mean absolute error (MAE) of only 0.112 at the low image resolution of 512 × 612 pixels.
The main contributions of this work are listed as follows:
We propose a sub-pixel edge detection method based on deep learning for high precision dimensional measurements.
We adopt CNN to extract features from images and introduce the anti-noise ability by adding bi-directional LSTM.
We offer a sub-pixel edge detection dataset of steel plate used in training and testing sub-pixel edge detection methods.
The remainder of the paper is organized as follows:
Section 2 describes different components of the proposed sub-pixel edge detection system.
Section 3 introduces the dataset, preprocessing methods, training protocol, and results.
Section 4 is the discussion.
Section 5 is the conclusion.
3. Results
3.1. Dataset
We evaluate our sub-pixel edge-detection system on the steel plate images collected at the Taiyuan Iron and Steel industry. At the end of the production line, all the finished steel plates need to be measured by the system. The dataset is captured by the industrial-grade array CCD sensor in the stainless-steel cold rolling production line with roller and steel plate in the image. The type of CCD sensor is Point Grey FL3-U3-120S3C-C and the resolution is
. All the images have corresponding ground truth; 241 images are used for training and 62 images for testing.
Figure 7 shows the samples of the dataset.
3.2. Preprocessing the Dataset
To verify the superiority of the proposed method under limited image resolution, all the images in the dataset are preprocessed. Length measurement does not require every edge position; a fixed number of edge positions taken at equal intervals is sufficient. Therefore, the preprocessing includes two steps: downsampling and collecting one-dimensional horizontal vectors from each image.
First, we downsample the original images by a factor of four to obtain low-resolution images with the size of 512 × 612 pixels. Then, we select the region of interest (ROI) that covers the steel plate and pick 90 one-dimensional horizontal vectors at equal intervals as the input data. The resolution of each vector is .
Moreover, unlike pixel-level edge detection, it is impossible to obtain an absolute ground truth for a sub-pixel edge position. Thus, the edge position of each selected vector is determined manually on the original image and divided by four. In this way, the error of the ground truth is within one-fourth of a pixel, and the accuracy is guaranteed to the greatest possible extent. Finally, the 90 selected vectors from one image, together with the corresponding ground truth, are treated as one set of input data for the proposed model.
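The preprocessing steps above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the downsampling filter is not specified in the text, so block averaging is assumed here, and all function names and ROI bounds are hypothetical.

```python
def downsample(image, factor=4):
    """Downsample a 2D grey image (list of rows) by block averaging.

    Block averaging is an assumption; the text only states that the
    images are downsampled by a factor of four.
    """
    height = len(image) - len(image) % factor
    width = len(image[0]) - len(image[0]) % factor
    result = []
    for r in range(0, height, factor):
        row = []
        for c in range(0, width, factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def equally_spaced_rows(top, bottom, count=90):
    """Return `count` row indices spread evenly over [top, bottom]."""
    if count < 2:
        return [top]
    step = (bottom - top) / (count - 1)
    return [round(top + k * step) for k in range(count)]


def extract_vectors(image, top, bottom, left, right, count=90):
    """Cut `count` horizontal vectors out of a rectangular ROI."""
    return [image[r][left:right] for r in equally_spaced_rows(top, bottom, count)]


def scale_ground_truth(edge_x_fullres, factor=4):
    """Map an edge position annotated at full resolution to the low-resolution grid."""
    return edge_x_fullres / factor


# Tiny demonstration on an 8 x 8 ramp image.
image = [[float(c) for c in range(8)] for _ in range(8)]
small = downsample(image)
vectors = extract_vectors(small, top=0, bottom=1, left=0, right=2, count=2)
print(small, vectors, scale_ground_truth(1001.0))
```

Dividing an integer full-resolution position by four yields labels on a quarter-pixel grid, which matches the stated ground-truth error bound of one-fourth of a pixel.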
3.3. Training Protocol and Metrics
The proposed sub-pixel edge detection model is implemented on the Google TensorFlow deep learning platform and trained on one NVIDIA GTX 1080 Ti GPU (11 GB memory). During training, the learning rate starts at 0.0001 and decays by 5% every 241 iterations. The total number of iterations is 80,000.
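The learning rate schedule can be written out as a short sketch. It assumes a staircase decay, in which the rate drops in discrete steps rather than continuously; the text does not state which variant is used, and the function name is hypothetical.

```python
def learning_rate(iteration, base_lr=1e-4, decay=0.05, decay_every=241):
    """Staircase schedule: multiply the rate by (1 - decay) every `decay_every` iterations.

    The staircase form is an assumption; only the starting rate, the 5% decay,
    and the 241-iteration interval come from the text.
    """
    completed_steps = iteration // decay_every
    return base_lr * (1.0 - decay) ** completed_steps


# The rate is constant within each 241-iteration window and drops at its end.
for it in (0, 240, 241, 482, 80_000):
    print(it, learning_rate(it))
```

Over 80,000 iterations the rate is reduced 331 times under this assumption, so the final updates are very small compared with the initial ones.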
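The proposed sub-pixel edge detection model is evaluated with three criteria: mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE). The formulas are as follows:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|$$
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$
where $N$ is the total number of edge points, $\hat{y}_i$ is the prediction, and $y_i$ is the ground truth.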
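For reference, the three metrics can be computed with the following minimal sketch, which uses their standard definitions. The function names and the sample values are illustrative only and are not results from the paper.

```python
import math


def mae(predictions, ground_truth):
    """Mean absolute error over all edge points."""
    errors = [abs(p - t) for p, t in zip(predictions, ground_truth)]
    return sum(errors) / len(errors)


def mse(predictions, ground_truth):
    """Mean square error over all edge points."""
    errors = [(p - t) ** 2 for p, t in zip(predictions, ground_truth)]
    return sum(errors) / len(errors)


def rmse(predictions, ground_truth):
    """Root mean square error, i.e., the square root of the MSE."""
    return math.sqrt(mse(predictions, ground_truth))


# Illustrative edge positions in pixels (made-up values).
pred = [1.0, 2.0, 4.0]
truth = [1.5, 2.0, 3.0]
print(mae(pred, truth), mse(pred, truth), rmse(pred, truth))
```

MAE weights all errors equally, whereas MSE and RMSE penalize large deviations more strongly, so the latter two are more sensitive to edge points corrupted by noise.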
3.4. Experimental Results
To better evaluate the proposed sub-pixel edge detection model, two traditional methods, an interpolation method and a moment method, are adopted as baselines for comparison. For the interpolation method, we use the quadratic interpolation algorithm; for the moment method, we use the Sobel-Zernike operator.
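To make the interpolation baseline concrete, the sketch below shows a generic three-point quadratic (parabolic) interpolation on a one-dimensional intensity vector. It is an illustration of the general technique under simple assumptions (central-difference gradient, a single dominant edge), not necessarily the exact variant used in the experiments; the function name is hypothetical.

```python
def quadratic_subpixel_edge(row):
    """Estimate a sub-pixel edge position in a 1D intensity vector.

    Steps: compute the gradient magnitude with central differences, find the
    pixel with the strongest gradient, then fit a parabola through that value
    and its two neighbors and return the position of the parabola's vertex.
    """
    n = len(row)
    if n < 3:
        raise ValueError("need at least three samples")
    grad = [0.0] * n
    for i in range(1, n - 1):
        grad[i] = abs(row[i + 1] - row[i - 1]) / 2.0
    peak = max(range(1, n - 1), key=lambda i: grad[i])
    left, center, right = grad[peak - 1], grad[peak], grad[peak + 1]
    curvature = left - 2.0 * center + right
    if curvature == 0.0:
        return float(peak)  # flat neighborhood: fall back to the pixel position
    offset = 0.5 * (left - right) / curvature
    return peak + offset


# A symmetric edge is located at the pixel center; an asymmetric one is
# shifted toward the neighbor with the larger gradient.
print(quadratic_subpixel_edge([0, 0, 0, 5, 10, 10, 10]))
print(quadratic_subpixel_edge([0, 0, 0, 2, 10, 10, 10]))
```

Because this estimate relies on only three gradient samples around the peak, a scar or reflective target near the edge can shift it considerably, which is consistent with the noise sensitivity of traditional methods discussed in the introduction.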
Table 2 shows the detailed results of the proposed model on our steel plate dataset. We evaluate the model on the four-times downsampled images, and Figure 8 shows examples of the prediction results. The first row shows the steel plate images; the second row shows the predictions and the corresponding ground truth, where the vertical axis represents the serial number of the edge points and the horizontal axis represents the coordinate of the edge point.