1. Introduction
Mass finishing is an important surface treatment process in which components are placed in a container filled with granular media, grinding fluid, and water. Through some form of relative motion, the process produces collision, rolling pressure, micro-grinding, and certain chemical actions on the workpiece surface, thereby achieving polishing, brightening, deburring, cleaning, and degreasing effects and enhancing surface quality [1]. Owing to its advantages, such as strong adaptability to components, good processing results, economic efficiency, and environmentally friendly operation, mass finishing is widely used for the surface treatment of machined parts across many industries [1,2].
Many studies have shown that the contact force and impact velocity between the granular media and the workpiece directly affect the processing effect in mass finishing [3,4,5,6]. However, the granular media form a collective of particulate matter governed by contact forces, damping, and gravity, and exhibit the dynamic characteristics of a forced particle group. These characteristics of granular flow call for velocity measurement methods that differ from the traditional measurement methods used for solids, liquids, and gases [7].
Existing velocity measurement methods for granular flow can be roughly classified into five categories.
The first category involves obtaining the contact force through in situ measurement and deriving velocity information from the contact force data. Ciampini’s team has made significant contributions in this area: they developed a method to extract the normal impact velocity from measured impact force signals [3] and creatively utilized the deformation of Almen strips under impact to characterize the effect of impact velocity [8]. The methods in this category have certain limitations, such as the need for expensive high-sampling-rate equipment to record the contact force and the fact that the measurement area is restricted to a single point in the flow field.
The second category uses various fiber-optical probes (FOPs) and laser-Doppler velocimetry (LDV) to detect the flow field velocity. For example, Hashemnia et al. [9] developed a laser displacement probe for measuring the velocity of mass-finishing flow fields, while Liu et al. [10] developed a novel optical-fiber probe for detecting local instantaneous solid-volume concentration, velocity, and flux information in the flow field. The limitations of this category include the complex setup, expensive equipment, and the high-performance processor required for real-time signal processing. Additionally, FOPs and LDV can only measure a very small volume of the flow field.
The third category is modeling of the flow field. The most commonly used approaches are discrete-element method (DEM)-based and computational fluid dynamics (CFD)-based modeling of the granular flow field. Ciampini’s team [8], Hashimoto’s team [11,12], and Uhlmann’s team [13] have conducted research in this area. The limitations of this category are also evident. First, building a specific model is complex and difficult. Second, perfecting a model consumes a large amount of time, yet a model built for one scenario cannot be reused in different scenarios. Additionally, such models, which are based on traditional algorithms, impose a significant computational load.
The fourth category consists of special methods. For example, Domblesky et al. [14] passed a knotted nylon rope through a small hole in the workpiece and determined the actual velocity field of individual objects by calculating the motion time and length of the nylon rope. Hashimoto et al. [11] attached the end of a fishing line to a granule and allowed the fishing line to move with the medium; by measuring the elongation of the fishing line during motion, they obtained the average velocity of the medium’s movement. These methods interfere with the flow field and can only obtain velocity data for a few granules, so they are used only in specific scenarios.
The fifth category comprises methods based on optical techniques. The measurement of fluid velocity fields using optical techniques is very mature and has been pursued by many researchers; however, studies using optical techniques to measure the velocity field of clustered solid granular media flow are scarce. Some researchers have applied particle-image velocimetry (PIV) and particle-tracking velocimetry (PTV) techniques from experimental fluid dynamics to measure the velocity of granular flow [15,16,17], but few substantive results have been published. Among these researchers, two teams have achieved substantial results. Hagemeier et al. compared the four techniques of FOPs, LDV, PIV, and PTV for measuring the particle flow velocity field in a fluidized bed and summarized the capabilities, limitations, and development potential of these techniques [18]. Duan et al. used a deep learning network to measure the two-dimensional velocity field of granules in rotary drums [19]. Both teams used very advanced optical methods, but their work was limited to single-point or two-dimensional measurements of the flow field.
In actual mass finishing, granular flow is a complex three-dimensional flow, and the flow parameters differ significantly between regions. Therefore, developing a global three-dimensional velocity field measurement method for granular flow is very important for studying the processing effect of mass finishing. As no such method currently exists, we propose a single-camera-based three-dimensional velocity field measurement method for granular media in mass finishing with deep learning. This method uses optical flow [20,21] to process images obtained with the single-camera multi-view method proposed by Lee’s team [22] and reconstructs the three-dimensional flow and velocity field. In addition, we innovatively introduce an optical flow deep learning network [23,24] to replace the traditional optical flow algorithm, which significantly reduces both the computing time and the demand on computing hardware resources.
The proposed method analyzes images across temporal and spatial transformations to obtain the overall velocity of the granular flow. We constructed the corresponding measurement system and demonstrated the accuracy and performance of our method by comparing its results with those of the traditional DIC algorithm. Through the measurement of granular media in a bowl-type vibratory finishing machine, we showed that our method can efficiently and accurately measure the three-dimensional velocity field of granular flow in mass finishing. Furthermore, our research not only provides a novel measurement method for three-dimensional granular flow using a single color camera, but also highlights the great research potential of integrating deep learning with traditional optical techniques.
2. Three-Dimensional Velocity Field Measurement System
2.1. Composition of the Measurement System
As illustrated in Figure 1, the three-dimensional velocity field measurement system for granular media comprises two main components: the hardware and software systems. The hardware system consists of a high-speed camera (the parameter settings of the high-speed camera are shown in Table 1), a lens (Zeiss Milvus 2/100M ZF.2-mount), a trichromatic mask, a coaxial light source, a long-stroke gear rack translation stage (Zolix AD160C-40), and a high-performance processor. The software system includes a proprietary algorithm platform designed specifically for measuring the three-dimensional velocity field of granular media.
The composition and function of the self-designed trichromatic mask deserve a brief introduction. The trichromatic mask consists of three 1/3-circular filters (as shown in Figure 2a) and three circular apertures (as shown in Figure 2b; the center distance between every two apertures is 19 mm). The red, green, and blue 1/3-circular filters pass only light with wavelengths of 650 nm, 532 nm, and 450 nm, respectively. The trichromatic mask converts the three RGB channels into three perspectives of a single camera: a dedicated algorithm separates the RGB channels of the captured images, yielding images from three perspectives of a single camera. The flow field is then reconstructed using multi-perspective three-dimensional reconstruction technology (more details are introduced in the following section). This approach reduces the hardware investment in the experimental system and eliminates the requirement for high-precision synchronization when using multiple cameras [25].
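The channel-separation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; in particular, the 3 × 3 crosstalk matrix `M` is a hypothetical placeholder for a matrix that would be calibrated for the actual filters and sensor.

```python
import numpy as np

# Hypothetical crosstalk matrix: measured_i = sum_j M[i, j] * true_j.
# In practice, M would be calibrated for the actual filters and sensor.
M = np.array([[1.00, 0.08, 0.02],
              [0.05, 1.00, 0.06],
              [0.01, 0.07, 1.00]])

def split_perspectives(rgb, crosstalk=M):
    """Undo channel crosstalk, then split an H x W x 3 frame into the
    three single-perspective images encoded by the trichromatic mask."""
    h, w, _ = rgb.shape
    flat = rgb.reshape(-1, 3).astype(np.float64)
    true = flat @ np.linalg.inv(crosstalk).T  # invert the per-pixel channel mixing
    true = true.reshape(h, w, 3)
    return true[..., 0], true[..., 1], true[..., 2]  # R, G, B perspectives

# Synthetic check: mix a known image with M, then recover it
truth = np.random.rand(4, 4, 3)
mixed = (truth.reshape(-1, 3) @ M.T).reshape(4, 4, 3)
view_r, view_g, view_b = split_perspectives(mixed)
```

Because the mixing is linear per pixel, inverting the calibrated matrix recovers the three perspective images exactly up to sensor noise.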
2.2. Principle of Multi-Perspective Three-Dimensional Reconstruction
Multi-perspective three-dimensional reconstruction is an image-based technique used to reconstruct the three-dimensional shape and structure of an object from multiple two-dimensional images taken from different perspectives. This technology has developed greatly over the past few decades, and there are now many widely used methods, such as EPI [26], PMVS [27], and depth estimation CNNs [28].
Figure 3 is a schematic diagram showing the principle of the traditional multi-view stereo (MVS) method, which involves the following key steps:
Firstly, multiple cameras are utilized to capture two-dimensional images of the target object from different perspectives. In this paper, multiple cameras are replaced by the aforementioned trichromatic mask and a single-color camera. These images need to be preprocessed, including image registration and noise reduction, to ensure image quality and accurate alignment.
Secondly, feature point extraction and matching are performed. Using image processing algorithms such as SIFT or SURF, significant feature points are extracted from each image. Subsequently, matching algorithms such as nearest-neighbor search are employed to identify corresponding feature point pairs across the different perspective images. For this feature extraction and matching step, this paper uses an optical flow estimation convolutional neural network (CNN) named LiteFlowNet in place of the traditional method (details in Section 2.5), which not only greatly reduces the calculation time, but also does not sacrifice the resolution of the original images.
Thirdly, triangulation is applied. Given the known intrinsic and extrinsic parameters of the cameras, as well as the pixel coordinates of corresponding feature points in different images, the three-dimensional positions of these feature points can be computed. Specifically, triangulation uses geometric relationships to solve for the intersection points of rays in three-dimensional space, thereby determining the three-dimensional coordinates of the feature points. Normally, triangulation is used to recover the depth information $z$ of a three-dimensional point $X$ by observing the point from different positions and exploiting the triangular relationship between the two-dimensional projection points $x_1$ and $x_2$ observed at those positions. For example, suppose there are two cameras with projection matrices $P_1$ and $P_2$ and feature points in the images with coordinates $x_1$ and $x_2$, respectively. The coordinates of the three-dimensional point $X$ can be determined using the following equation:

$$X = \tau(x_1, x_2, P_1, P_2),$$

where $\tau(\cdot)$ denotes the triangulation method [29].
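As a concrete illustration of this step, a linear (DLT) triangulation routine can be sketched as follows. This is a generic textbook construction under idealized noise-free projections, not the specific solver used in this work.

```python
import numpy as np

def triangulate(x1, x2, P1, P2):
    """Linear (DLT) triangulation: recover the 3-D point X from its
    2-D projections x1, x2 under 3x4 projection matrices P1, P2."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)   # homogeneous solution: null space of A
    X = vt[-1]
    return X[:3] / X[3]           # dehomogenize

# Two toy cameras: identity pose and a 1-unit baseline along X
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
X_est = triangulate(x1, x2, P1, P2)
```

With exact correspondences the DLT solution recovers the original point; with noisy observations it gives the algebraic least-squares estimate that BA then refines.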
Finally, global optimization methods such as bundle adjustment (BA) are used to further refine the reconstructed three-dimensional point cloud by minimizing the reprojection error, as determined by the following equation [30]:

$$\min_{X_i, P_j} \sum_{i=1}^{n} \sum_{j=1}^{m} \rho\left( \left\| x_{ij} - P_j X_i \right\|^2 \right),$$

where $X_i$ represents the three-dimensional coordinates of the $i$-th point, $P_j$ represents the projection matrix of the $j$-th camera, $x_{ij}$ represents the observed two-dimensional projection point of the $i$-th three-dimensional point in the $j$-th camera image, $n$ is the total number of three-dimensional points, $m$ is the total number of cameras, and $\rho$ represents the loss function. This step significantly enhances the accuracy and consistency of the reconstruction results.
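A minimal sketch of evaluating this reprojection-error cost is given below; an identity loss stands in for a robust $\rho$, and all names are illustrative rather than taken from the authors' implementation.

```python
import numpy as np

def reprojection_error(points3d, proj_mats, observations, loss=lambda e: e):
    """Total BA cost: sum over points i and cameras j of
    loss(||x_ij - project(P_j, X_i)||^2)."""
    total = 0.0
    for i, X in enumerate(points3d):
        Xh = np.append(X, 1.0)                 # homogeneous coordinates
        for j, P in enumerate(proj_mats):
            x = P @ Xh
            x = x[:2] / x[2]                   # perspective division
            total += loss(np.sum((observations[i, j] - x) ** 2))
    return total

# Perfect observations give zero cost; a perturbed one gives a positive cost
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
pts = np.array([[0.5, 0.2, 4.0], [-0.3, 0.1, 5.0]])
obs = np.empty((2, 2, 2))
for i, X in enumerate(pts):
    for j, P in enumerate([P1, P2]):
        x = P @ np.append(X, 1.0)
        obs[i, j] = x[:2] / x[2]
err0 = reprojection_error(pts, [P1, P2], obs)
obs[0, 0] += 0.01
err1 = reprojection_error(pts, [P1, P2], obs)
```

A BA solver would then minimize this cost over all $X_i$ and $P_j$ jointly, typically with a Levenberg–Marquardt-type optimizer.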
Through these steps, multi-perspective three-dimensional reconstruction generates a three-dimensional point cloud, which can be used for further three-dimensional model reconstruction, measurement, and analysis.
2.3. Principle of Operation for the Measurement System
The operation principle of the hardware system is illustrated in Figure 1. The assembled camera is mounted on the long-stroke gear rack translation stage, powered on, and connected to the high-performance processor. The long-stroke gear rack translation stage and the camera lens are adjusted so that the camera’s capture area covers the desired range. The image capture interval ∆T is set in the camera control software, and the camera is used to capture images of the granular flow field in the measurement area.
The operational principle of the software system is shown in Figure 4; it relies on a self-designed algorithm platform for measuring the three-dimensional velocity field of the granular media. Camera calibration is performed using pre-captured images of a checkerboard calibration board to obtain the camera’s intrinsic and extrinsic parameters. Based on this algorithm, the three-dimensional positions $P(t_1)$ and $P(t_2)$ of the granular flow field in space at time instants $t_1$ and $t_2$ are calculated. Using the positions of the granular flow field at the different time instants, the displacement ∆L in space is calculated, and then the three-dimensional velocity V of the granular flow field is determined. The formula for calculating the three-dimensional velocity of the granular flow field is as follows:

$$V = \frac{\Delta L}{\Delta T} = \frac{P(t_2) - P(t_1)}{t_2 - t_1}$$
Finally, visualization processing is performed on the spatial positions of the granular flow field, as well as the three-dimensional velocity field of the granular media.
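The velocity computation above amounts to a per-point finite difference between two reconstructed position sets; a minimal sketch (array names illustrative):

```python
import numpy as np

def velocity_field(pos_t1, pos_t2, dT):
    """Three-dimensional velocity V = dL / dT for two N x 3 arrays of
    reconstructed point positions captured dT seconds apart."""
    dL = pos_t2 - pos_t1            # per-point spatial displacement
    return dL / dT

# Two toy frames of 3 reconstructed points, captured 0.01 s apart
p1 = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 1.0]])
p2 = p1 + np.array([0.001, 0.0, 0.002])   # uniform motion between frames
V = velocity_field(p1, p2, dT=0.01)
speeds = np.linalg.norm(V, axis=1)        # per-point speed magnitude
```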
2.4. Algorithm Workflow of the Measurement System
By summarizing and supplementing the operation principle of the software system shown in Figure 4, the algorithm flow of the measurement system is obtained, as shown in Figure 5. Prior to obtaining motion images of the granular flow field, the parameters of the camera’s spatial position and viewpoint need to be acquired for the subsequent triangulation algorithm. Calibration is conducted using the Stereo Camera Calibrator toolbox in MATLAB (version: R2021a). During calibration with the Stereo Camera Calibrator toolbox, the RGB channels of the checkerboard calibration board image are first separated using an algorithm to obtain images from the three perspectives. Subsequently, calibration is performed using two of the three perspectives (the specific details are presented in Section 3.2).
Prior to calculating the planar position, several factors affecting the calculation results need to be addressed by preprocessing the images. This includes the following steps:
Demosaicing helps restore the details and clarity of the image, reducing errors caused by capturing moving objects [31,32].
Eliminating color crosstalk restores the true colors of the image more effectively, ensuring color accuracy and quality. The process of eliminating color crosstalk also involves separating the RGB channels of the motion images of the granular flow field [33,34].
Grayscale processing is employed to address the issue of slow computation speed due to the large amount of data involved in the calculation of the granular media three-dimensional velocity field.
Gaussian filtering is applied to remove noise introduced during the image acquisition process by the imaging equipment, transmission devices, and shooting environment.
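The grayscale conversion and Gaussian filtering steps can be sketched in plain NumPy (BT.601 luminance weights and a separable Gaussian kernel); this is an illustrative stand-in for the MATLAB routines used in the paper.

```python
import numpy as np

def to_grayscale(rgb):
    """Luminance-weighted grayscale conversion (ITU-R BT.601 weights)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def gaussian_kernel1d(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()                       # normalize to unit sum

def gaussian_filter(img, sigma=1.0):
    """Separable Gaussian smoothing along rows, then columns."""
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    pad = np.pad(img, radius, mode="reflect")
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, tmp)

gray = to_grayscale(np.ones((5, 5, 3)))          # constant image stays constant
smooth = gaussian_filter(np.full((6, 6), 2.0), sigma=1.0)
```

Separable filtering keeps the cost linear in the kernel radius, which matters given the large number of frames processed.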
The planar position calculation is a crucial step in the granular media three-dimensional velocity field measurement algorithm; it employs an optical flow estimation convolutional neural network (CNN), LiteFlowNet.
The planar position calculation involves two distinct optical flow estimations. Different-perspective images (DPI) optical flow estimation is performed on images separated into different perspectives and captured by the camera at the same moment. Different-time images (DTI) optical flow estimation is performed on images separated into the same perspective and captured by the camera at different moments.
After obtaining the results of the planar position calculation, the three-dimensional position of the granular media in space is computed using the triangulation method combined with camera calibration parameters.
A target area containing granular media is selected, the measured displacements within it are averaged, and the mean errors of the measurement results are obtained by subtracting the actual applied displacement values. This step evaluates the accuracy of the measurement results. Additionally, standard deviation (SD) detection is applied to the measurement results. The SD measures the spread of the data distribution, quantifying the extent to which the data values deviate from the arithmetic mean; a smaller SD indicates less deviation of the data values from the mean.
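The accuracy evaluation described here reduces to a mean error and an SD over the pixels of the target area; a minimal sketch with illustrative values:

```python
import numpy as np

def accuracy_stats(measured, applied):
    """Mean error (mean measured displacement minus the applied value)
    and standard deviation over the pixels of the target area."""
    measured = np.asarray(measured, dtype=np.float64)
    mean_error = measured.mean() - applied
    sd = measured.std()
    return mean_error, sd

# Per-pixel displacements (mm) measured in the target area for a
# 0.5 mm applied translation (values are illustrative, not measured data)
pixels = np.array([0.49, 0.51, 0.50, 0.52])
mean_err, sd = accuracy_stats(pixels, applied=0.5)
```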
Finally, the data results are exported, and CloudCompare (version: v2.12.4) and Tecplot 360 (version: 2022 R1) software are employed for visualizing the granular flow field, facilitating a more intuitive presentation of the research findings.
2.5. LiteFlowNet
LiteFlowNet is a lightweight optical flow estimation neural network model in deep learning, which provides efficient and accurate optical flow estimation. “Optical flow” is a technique in computer vision for computing pixel motion in a scene: it calculates the motion information of each pixel point by analyzing the correlation between pixels in two consecutive images [19,20]. The model is designed with lightweight operation and practicality in mind, enabling it to run in resource-constrained environments.
The LiteFlowNet used in this paper is an alternative network proposed by Tak-Wai Hui et al. [35], which serves as a replacement for FlowNet2. The structure of this network is shown in Figure 6.
LiteFlowNet consists of two compact subnetworks: the encoder network (NetC) and the decoder network (NetE). NetC is used to extract pyramid features of images, converting the given image pairs into pyramids of two multi-scale high-dimensional features. NetE consists of cascaded flow inference and regularization modules, used for estimating coarse-to-fine optical flow fields. Below are the main steps for optical flow estimation implemented in LiteFlowNet.
2.5.1. Pyramid Feature Extraction
Pyramid feature extraction is a multi-scale feature extraction method used to capture image information at different scales in image processing and computer vision tasks. This approach is based on the concept of pyramids, where images are constructed into a series of images according to different resolution levels.
In pyramid feature extraction, Gaussian pyramids are commonly used to generate multi-scale representations of images. Gaussian pyramids are a series of images obtained by repeatedly smoothing and downsampling the original image, with each image layer having a lower resolution than the previous one. This multi-scale representation allows the system to analyze images at different scales, thereby improving the detection and recognition capabilities of objects or features at different scales.
Once the Gaussian pyramid is constructed, feature extraction algorithms can be applied at each scale. For example, local feature descriptors can be used at each scale to detect and describe feature points or regions in the image. These features can then be used for tasks such as object detection, object tracking, image registration, etc.
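A pyramid of this kind can be sketched in a few lines; here 2 × 2 block averaging stands in for Gaussian smoothing followed by subsampling, to keep the example dependency-free:

```python
import numpy as np

def downsample2(img):
    """Halve the resolution by 2 x 2 block averaging (a simple stand-in
    for Gaussian smoothing followed by subsampling)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]                       # crop to even dimensions
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def image_pyramid(img, levels=4):
    """Return [full-res, 1/2, 1/4, ...], one image per pyramid level."""
    pyramid = [np.asarray(img, dtype=np.float64)]
    for _ in range(levels - 1):
        pyramid.append(downsample2(pyramid[-1]))
    return pyramid

levels = image_pyramid(np.random.rand(16, 16), levels=4)
```

NetC replaces this hand-crafted construction with learned strided convolutions, so each level carries multi-channel features rather than smoothed intensities.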
The NetC of LiteFlowNet has two streams, each acting as a feature descriptor, transforming the given image pair ($I_1$ and $I_2$) into pyramids of multi-scale high-dimensional features $\{F_k(I_1)\}$ and $\{F_k(I_2)\}$. The pyramid features in NetC are generated by stride-$s$ convolutions, which reduce the spatial resolution by a factor of $s$ at each pyramid level.
2.5.2. Feature Warping
Feature warping is a method of feature transformation, and in the LiteFlowNet model, feature warping aims to reduce the feature space distance, thereby improving the accuracy and stability of optical flow estimation.
Specifically, by warping the high-level feature $F_2$ towards the low-level feature $F_1$ via feature warping (f-warp), the feature-space distance between $F_1$ and the warped feature $\tilde{F}_2$ is reduced. The formula for this feature warping can be expressed as follows:

$$\tilde{F}_2(x) = F_2(x + \dot{x}),$$

where $\dot{x}$ represents the optical flow estimate. In general, for any sub-pixel displacement $\dot{x}$, $F$ is warped towards $\tilde{F}$ by f-warp through bilinear interpolation, and the formula is as follows:

$$\tilde{F}(x) = \sum_{x_s^i \in N(x_s)} F(x_s^i)\left(1 - \left|x_s^x - x_s^{i,x}\right|\right)\left(1 - \left|x_s^y - x_s^{i,y}\right|\right),$$

where $x_s = x + \dot{x}$ denotes the source coordinates in the input feature map $F$ that define the sample point, $x$ denotes the target coordinates of the regular grid in the interpolated feature map $\tilde{F}$, and $N(x_s)$ denotes the four pixel neighbors of $x_s$.
Different from traditional warping, the f-warp of LiteFlowNet is performed on high-level CNN features rather than directly on images, making LiteFlowNet faster and more efficient in solving optical flow problems.
F-warp allows the LiteFlowNet model to infer the residual flow between $F_1$ and the warped $\tilde{F}_2$, whose magnitude is smaller, which enables accurate inference of the complete flow field. The f-warp operation is applied at each layer of the pyramid feature extraction and in each module (M:S) of the cascaded flow inference, effectively improving the accuracy and stability of optical flow estimation.
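The bilinear warp described in this section can be sketched as a dense backward warp; this is a NumPy illustration of the interpolation formula applied to a single-channel map, not LiteFlowNet's GPU implementation:

```python
import numpy as np

def f_warp(F, flow):
    """Backward-warp a feature map F (H x W) by a per-pixel flow field
    (H x W x 2): output(x) = bilinear sample of F at x_s = x + flow(x)."""
    H, W = F.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sx = xs + flow[..., 0]                      # source x-coordinates
    sy = ys + flow[..., 1]                      # source y-coordinates
    x0 = np.clip(np.floor(sx).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, H - 2)
    wx = np.clip(sx - x0, 0.0, 1.0)             # bilinear weights
    wy = np.clip(sy - y0, 0.0, 1.0)
    return ((1 - wy) * (1 - wx) * F[y0, x0] +
            (1 - wy) * wx       * F[y0, x0 + 1] +
            wy       * (1 - wx) * F[y0 + 1, x0] +
            wy       * wx       * F[y0 + 1, x0 + 1])

F = np.arange(25, dtype=np.float64).reshape(5, 5)
identity = f_warp(F, np.zeros((5, 5, 2)))       # zero flow: no change
shift = np.zeros((5, 5, 2)); shift[..., 0] = 1.0
shifted = f_warp(F, shift)                      # one-pixel shift along x
```

In LiteFlowNet the same interpolation is applied channel-wise to the multi-channel pyramid features at every level.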
2.5.3. Cascaded Flow Inference
Cascaded flow inference consists of the first flow inference (descriptor matching) and the second flow inference (sub-pixel refinement).
The first flow inference is conducted within the descriptor matching unit M, where the correlation between high-level feature vectors in the individual pyramid features $F_1$ and $F_2$ is computed to establish point correspondences between the images $I_1$ and $I_2$. This is represented as follows:

$$c(x, d) = F_1(x) \cdot F_2(x + d) / N,$$

where $c$ is the matching cost between point $x$ in $F_1$ and point $x + d$ in $F_2$, $d$ is the displacement vector starting from point $x$, and $N$ is the length of the feature vector. The cost volume $C$ is the sum of all matching costs. Then, the residual flow $\Delta\dot{x}_m$ is inferred by filtering the cost volume $C$, and the complete flow field $\dot{x}_m$ is computed as follows:

$$\dot{x}_m = \Delta\dot{x}_m + s \uparrow\!(\dot{x}),$$

where $\uparrow$ represents upsampling in spatial resolution, applied here to the flow estimate $\dot{x}$ from the preceding (coarser) pyramid level, and $s$ is a scalar magnification factor. In the first flow inference, NetE reduces the computational burden and improves the network speed by performing short-range matching, f-warp, and matching at sampled positions only in the pyramid levels of high spatial resolution.
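The matching cost computation can be illustrated with a brute-force cost volume over a small integer search range, mirroring the short-range matching noted above; this NumPy sketch is illustrative, not the network's correlation layer:

```python
import numpy as np

def matching_cost(F1, F2, max_disp=1):
    """Brute-force correlation cost volume: c(x, d) = F1(x) . F2(x + d) / N
    for all integer displacements d with |dx|, |dy| <= max_disp.
    F1, F2 are H x W x N feature maps; returns an H x W x D volume."""
    H, W, N = F1.shape
    costs = []
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            shifted = np.zeros_like(F2)         # out-of-range samples cost 0
            ty = slice(max(0, -dy), min(H, H - dy))
            tx = slice(max(0, -dx), min(W, W - dx))
            sy = slice(max(0, dy), min(H, H + dy))
            sx = slice(max(0, dx), min(W, W + dx))
            shifted[ty, tx] = F2[sy, sx]        # shifted(x) = F2(x + d)
            costs.append((F1 * shifted).sum(axis=2) / N)
    return np.stack(costs, axis=-1)

F1 = np.random.rand(6, 6, 8)
cost = matching_cost(F1, F1, max_disp=1)        # 9 candidate displacements
center = cost.shape[-1] // 2                    # index of d = (0, 0)
```

In the network, this cost volume is then filtered by learned convolutions to produce the residual flow, rather than taken as a hard argmax.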
Following the descriptor matching, the second flow inference is introduced to refine the pixel-level flow estimate $\dot{x}_m$ from the descriptor matching unit M to sub-pixel accuracy. After warping $F_2$ towards $F_1$ by f-warp, the sub-pixel refinement unit S minimizes the feature-space distance between $F_1$ and $\tilde{F}_2$ by computing the residual flow $\Delta\dot{x}_s$, resulting in a more accurate flow field $\dot{x}_s$, as follows:

$$\dot{x}_s = \dot{x}_m + \Delta\dot{x}_s$$
2.5.4. Flow Regularization
In machine learning, regularization is primarily used to control the complexity of a model and prevent overfitting. Deep learning models typically have a large number of parameters, and when the model becomes too complex, it may perform well on the training data but poorly on the testing data, resulting in overfitting.
The flow regularization in LiteFlowNet is achieved through a feature-driven local convolution (f-lcon) layer at each pyramid level. This layer helps to smooth the flow field and maintain clear flow boundaries by using adaptive kernels based on pyramidal features, flow estimates, and occlusion probability maps. The f-lcon filters are specialized to handle smooth variations in flow vectors while avoiding oversmoothing across flow boundaries.
3. Measurement Accuracy Analysis
3.1. Displacement Measurement System
As shown in Figure 7, the displacement measurement system is a modification of the existing measurement system, incorporating a high-precision XYZ three-axis translation stage, spherical granular media (as shown in Figure 8), and a container for the granular media.
3.2. Displacement Measurement Process
Before starting the displacement measurement, adjust the camera parameters and lens to ensure clear capture of both the granular media and the checkerboard pattern. Utilize the Stereo Camera Calibrator toolbox in MATLAB to calibrate the camera, obtaining its intrinsic and extrinsic parameters as well as its relative spatial position.
Figure 9 shows the three perspectives obtained by separating the RGB channels of a checkerboard pattern image. As a warm-colored light source is used in this measurement, the R-channel perspective appears relatively dark overall, hence the G-channel and B-channel should be selected for the measurement.
After completing the above preparations, start the displacement measurement. Place the granular media inside the container and position it on the high-precision XYZ three-axis translation stage, which has an accuracy of 0.01 mm. The measurement is performed in two phases: first, apply displacement only in the X-axis direction; second, apply displacement only in the Z-axis direction. Since the optical flow estimation network requires consecutive images or sufficiently small motion between two images, the pixel displacement between two images must be kept within the computable range of the model when measuring the actual flow field motion. Optical flow estimation on consecutive-displacement images indicates that LiteFlowNet can compute pixel displacements ranging from the sub-pixel level to approximately 70 pixels. Considering the actual displacement represented by a single pixel, capture one picture for every 0.5 mm of applied displacement. Each set of images totals 5 mm of displacement, resulting in two sets of images to be captured. Subsequently, preprocess these images in MATLAB: first perform demosaicing, eliminate color crosstalk, and separate the RGB channels; then select the G-channel and B-channel for the subsequent steps, and apply grayscale conversion and Gaussian filtering to the images.
Use LiteFlowNet to estimate the optical flow between images from different perspectives and at different times to obtain the planar displacements of pixels caused by target movement. Then, apply the triangulation method to compute the pixel positions in space and further calculate the three-dimensional displacements.
To analyze the accuracy of the measurement, select a pixel area containing the granular media from the images (as shown in Figure 10). The displacements of all pixels in this area are averaged and compared with the actual displacement. Throughout the measurement process, no external loads or artificial disturbances are applied to the granular media; thus, the displacement measured for each translation should remain consistent.
This analysis experiment was set up to validate the accuracy and stability of the system during the translation processes.
3.3. Result Analysis
It can be observed from Figure 11a,b that the measured displacement values in the X and Z directions match the actual applied displacement values closely, while the displacements in the other two directions are close to zero, indicating no applied displacement in those directions.
To assess the accuracy of the measurement, the mean error for each displacement was calculated by subtracting the applied displacement values from the measured displacement values. The resulting mean errors are plotted in Figure 11c,d. It can be seen that the mean errors of the X-direction displacement were all less than 0.02 mm, and the mean errors of the Z-direction displacement were all less than 0.07 mm. In Figure 11e,f, the standard deviation (SD) of the pixel displacement values in each direction within the selected pixel area is presented. It can be observed that the SDs of the displacements in the Z direction are significantly larger than those in the X and Y directions in both sets of measurements. These differences in SD are not due to random errors but are highly correlated with the hardware settings and structural parameters of the measurement system. Firstly, since the observation points in this measurement are located above the target points, larger Z-direction errors may be introduced by the triangulation method. Secondly, lighting conditions associated with the warm-colored light source used in this measurement, as well as potential optical distortions, lens distortions, and sensor noise in the camera's imaging process, could contribute to the larger SDs in the Z direction.
Minor systematic errors introduced during the installation of the displacement measurement system and random errors encountered during the measurement process may lead to imperfect translation of the granular media in the X and Z directions. Considering all these factors, we can conclude that the established measurement system is capable of accurately measuring the spatial displacement of the granular flow field.