3. Results and Discussion
To support rapid analysis, a new machine vision processing algorithm was developed to provide a highly parallel approach to the cotton trash identification problem. Because the cotton trash system is, in many instances, retrofitted into existing machinery, the lighting was found to be less controlled than is optimal for alternative algorithms, such as a pre-computed lookup table based on a Bayesian Classifier (Pelletier 1999a, b, 2003). The main problem with the Bayesian Classifier approach is that the Bayesian statistics must be pre-computed, typically from presorted classes provided by an expert or human classer. Because the expert is not available for periodic recalculation, it is not practical to adjust the Bayesian statistics dynamically. The Bayesian approach therefore requires a stable environment in which the image statistics do not change. In trial deployments at several commercial installations, however, this static-image-statistics criterion did not hold. In practice, changing lighting conditions and retrofit placement onto various types of machines typically create wide variation in the image statistics for each member of the feature set {trash, background, lint}, primarily because each member moves in and out of fully or partially lit areas, or is repeatedly immersed in light and then in shadow. Compensating for the widely changing lighting environments encountered in typical commercial installations required an alternative image processing algorithm that could cope with the varying statistics. This research also had the additional goal of increasing the processing speed, to achieve a robust system capable of real-time trash feedback control. To obtain higher processing speeds, the algorithm was designed to be highly parallel and suitable for use on highly parallel vector processors.
An overview of the image processing algorithm (figure 3) shows the steps required to process the image from raw color pixels into a set of statistics that inform the mechanical cleaning system of the quantity and type of trash, which is the basic information required by an optimal imaging/mechanical control system. The algorithm begins by analyzing each target pixel against its local neighboring pixels in order to classify the target pixel as either lint or trash (figure 4). Because the bulk of the processing time is spent in this first step of pixel identification, the new development focused on optimizing this stage of the algorithm.
The new algorithm was developed around a rapid single-pass Gaussian band-pass convolution kernel (GBPCK) that partitions the color space so that a simple threshold operation following the GBPCK generates a binary image in which each pixel is classed as either trash or lint. In practice, the GBPCK proved remarkably robust across a wide variety of lighting situations. The GBPCK is implemented as a 7×7 two-dimensional finite impulse response (FIR) convolution kernel, or filter. Because an FIR filter operates solely on the existing data and uses no feedback, the filtering reads only from the incoming image and does not modify it during processing. All results from the FIR filtering process are stored in a second, processed image. This absence of feedback is the key enabling feature that allows the algorithm to be applied to every pixel simultaneously, in parallel.
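The filter-then-threshold step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the 3×3 zero-sum kernel is a hypothetical stand-in for the 7×7 GBPCK, the threshold value is arbitrary, border pixels are handled by clamping, and a single grayscale channel is used instead of color. The point it shows is that the output is written to a second image while the input is only read, so each output pixel is independent of every other.

```python
def fir_filter_and_threshold(image, kernel, threshold):
    """Apply a square FIR kernel to a grayscale image, then threshold.

    Only `image` is read and only new output lists are written, so every
    output pixel could be computed independently (and hence in parallel).
    The kernel is applied without flipping, which equals convolution for
    the symmetric kernels used here.
    """
    height, width = len(image), len(image[0])
    half = len(kernel) // 2
    response = [[0.0] * width for _ in range(height)]
    binary = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(-half, half + 1):
                for kx in range(-half, half + 1):
                    # Clamp at the border (an assumed edge policy).
                    sy = min(max(y + ky, 0), height - 1)
                    sx = min(max(x + kx, 0), width - 1)
                    acc += kernel[ky + half][kx + half] * image[sy][sx]
            response[y][x] = acc
            binary[y][x] = 1 if acc > threshold else 0  # 1 = trash, 0 = lint
    return response, binary


# Hypothetical zero-sum kernel: mean of the 8 neighbours minus the centre.
demo_kernel = [[0.125, 0.125, 0.125],
               [0.125, -1.0, 0.125],
               [0.125, 0.125, 0.125]]

# Bright lint (200) with one dark trash pixel (50) in the middle.
demo_image = [[200.0] * 5 for _ in range(5)]
demo_image[2][2] = 50.0

# The same scene at half the illumination, standing in for a shadowed area.
shadow_image = [[value * 0.5 for value in row] for row in demo_image]

demo_response, demo_binary = fir_filter_and_threshold(demo_image, demo_kernel, 50.0)
shadow_response, shadow_binary = fir_filter_and_threshold(shadow_image, demo_kernel, 50.0)
```

Because the stand-in kernel sums to zero, uniformly lit lint produces a zero response at any brightness, and in this toy scene the dark pixel is the only one flagged in both the lit and the half-lit image.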
The Gaussian band-pass filter is derived from the Gaussian low-pass filter, which is given in equation 1:
Where:
For illustration, we present a simplified form of the Gaussian band-pass filter, constructed from the difference of two Gaussian low-pass filters with differing extents, as shown in equation 2:
Where:
r := distance from center of non-causal filter
r = √(x² + y²)
ρ := spread of the Gaussian filter 1
σ := spread of the Gaussian filter 2
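As a minimal sketch of the difference-of-Gaussians construction, the following Python builds a small discrete kernel from two Gaussian low-pass filters with spreads ρ and σ. The Gaussian form exp(-r²/(2·spread²)), the unit-sum normalisation of each component, and the example spread values are assumptions made for illustration; they are not taken from the paper's equations.

```python
import math


def gaussian(r, spread):
    """Unnormalised Gaussian low-pass profile at radius r (assumed form)."""
    return math.exp(-(r * r) / (2.0 * spread * spread))


def unit_sum_gaussian(size, spread):
    """size x size Gaussian low-pass kernel scaled so its entries sum to 1."""
    half = size // 2
    raw = [[gaussian(math.hypot(x, y), spread)
            for x in range(-half, half + 1)]
           for y in range(-half, half + 1)]
    total = sum(map(sum, raw))
    return [[value / total for value in row] for row in raw]


def dog_kernel(size, rho, sigma):
    """Band-pass kernel: narrow Gaussian (rho) minus wide Gaussian (sigma)."""
    narrow = unit_sum_gaussian(size, rho)
    wide = unit_sum_gaussian(size, sigma)
    return [[n - w for n, w in zip(n_row, w_row)]
            for n_row, w_row in zip(narrow, wide)]


# Hypothetical spreads; the values used in the original work are not known here.
demo_kernel = dog_kernel(7, rho=1.0, sigma=2.0)
```

With both components normalised to unit sum, the coefficients of the difference sum to zero, so a uniformly bright or uniformly dark region gives no response; the kernel is positive at its centre and negative toward its corners.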
In practice, the Gaussian band-pass filter was composed of the sum of several Gaussian filters. By combining multiple cascaded Gaussian filters, the shape of the Gaussian curve can be finely tuned for both extent and fall-off, allowing the filter to be optimized for the specific application. To speed up the real-time calculation, the filter coefficients were pre-calculated. For the research under investigation, the discrete two-dimensional Gaussian band-pass filter was implemented as the consolidated cascade of multiple Gaussian-normal filters, as detailed in equation 3:
Where:
To gain insight into how the Gaussian band-pass filter (the GBPCK) affects the image, the frequency response of the discrete two-dimensional filter of equation 3 was calculated using the discrete Fourier transform (DFT), which maps the discrete spatial domain to the discrete frequency domain (Strum and Kirk, 1988; Jain, 1989; Porat, 1998). The two-dimensional DFT is given in equation 4:
Where:
The discrete frequency response of the filter, calculated from equation 4 via the fast Fourier transform (FFT) algorithm, is shown in figure 5. For clarity, a one-dimensional cross-section of the filter response is shown in figure 6.
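The way such a frequency response is obtained can be sketched with a direct two-dimensional DFT in plain Python. The kernel below is a hypothetical 7×7 weighted sum of three unit-sum Gaussians whose weights add to zero; the weights, spreads, and the 16-point transform size are illustrative assumptions, not the coefficients of equation 3. A direct DFT is used instead of an FFT only to keep the example dependency-free; both give the same values.

```python
import cmath
import math


def sum_of_gaussians_kernel(size, components):
    """Pre-compute a kernel as a weighted sum of unit-sum Gaussians.

    components: list of (weight, spread) pairs (hypothetical values).
    """
    half = size // 2
    kernel = [[0.0] * size for _ in range(size)]
    for weight, spread in components:
        raw = [[math.exp(-(x * x + y * y) / (2.0 * spread * spread))
                for x in range(-half, half + 1)]
               for y in range(-half, half + 1)]
        total = sum(map(sum, raw))
        for i in range(size):
            for j in range(size):
                kernel[i][j] += weight * raw[i][j] / total
    return kernel


def dft2_magnitude(kernel, n):
    """|H(u, v)| of the kernel zero-padded to n x n, by direct summation."""
    size = len(kernel)
    magnitude = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0j
            for y in range(size):
                for x in range(size):
                    phase = -2j * math.pi * (u * x + v * y) / n
                    acc += kernel[y][x] * cmath.exp(phase)
            magnitude[u][v] = abs(acc)
    return magnitude


n = 16
kernel = sum_of_gaussians_kernel(7, [(1.0, 0.8), (-0.5, 1.6), (-0.5, 2.4)])
magnitude = dft2_magnitude(kernel, n)

# One-dimensional cross-section from DC (index 0) up to the Nyquist frequency.
cross_section = [magnitude[u][0] for u in range(n // 2 + 1)]
peak_index = max(range(len(cross_section)), key=cross_section.__getitem__)
```

For this stand-in kernel the response is essentially zero at DC and peaks at an intermediate frequency before falling off toward Nyquist, which is the band-pass shape the cross-section plot is meant to convey.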
We note that one of the key criteria for online classing is the ability of the algorithm to separate cotton from trash in both lit and shadowed areas. An example of the GBPCK's suitability for this task is shown in figure 7.
Once the single-pass filter had been developed, the next task was to fine-tune its implementation for the fastest processing on the given hardware. Because a meaningful speedup was one of the primary goals of this research, it was also crucial to establish a baseline performance against which to judge the GPU approach. The algorithm was initially optimized for a Pentium 4 processor using its “Single Instruction, Multiple Data” (SIMD) extended instruction set. The SIMD extension on the Pentium 4 provides a single vector processor capable of multiplying four single-precision floating-point numbers in parallel. After the algorithm was adjusted to take advantage of the Pentium's SIMD extensions, together with inline-expanded and optimized C code, it achieved a processing rate of 7.5 frames per second. This represents a significant speedup over the previous implementation, which ran at 2.5 frames per second. The next step was to compare the optimized SIMD performance to the same algorithm running on an NVIDIA GeForce 8800 Ultra graphics processing unit (GPU), housed on a PCI Express card, where the code could take advantage of the GPU's 132 vector processors. We note that while the GPU has 132 vector processors, each capable of multiplying four single-precision floating-point numbers in parallel, its core runs at only 500 MHz versus 3.0 GHz for the Pentium. Given this clock-speed disparity, one cannot expect a speedup of 132 times for the GPU over the CPU. Other potential problem areas for implementation on the GPU lie in the bottlenecks created by pushing large amounts of image data across the PCI Express bus into the GPU's video RAM (figure 8). In short, because of these other hardware constraints, one may not even reach the clock-adjusted gain of 132 × (500 MHz / 3000 MHz) = 22 times from running on the GPU rather than the CPU.
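The clock-adjusted bound can be checked with a few lines of arithmetic. The sketch below uses only figures stated in this section (132 processors, 500 MHz and 3.0 GHz clocks, 7.5 frames per second on the optimized CPU code, and the 60 frames per second reported for the GPU in the findings); the variable names and the "fraction of bound" ratio are introduced here for illustration only.

```python
# Figures quoted in the text.
gpu_processors = 132
gpu_clock_mhz = 500.0
cpu_clock_mhz = 3000.0
cpu_fps = 7.5    # optimized SIMD code on the Pentium 4
gpu_fps = 60.0   # GBPCK-TM on the GeForce 8800, as reported in the findings

# Idealised bound: processor count scaled by the clock ratio.
clock_adjusted_bound = gpu_processors * gpu_clock_mhz / cpu_clock_mhz

# Speedup actually observed for the GPU over the optimized CPU code.
observed_speedup = gpu_fps / cpu_fps

# Share of the idealised bound that was realised (illustrative ratio).
fraction_of_bound = observed_speedup / clock_adjusted_bound

print(f"clock-adjusted bound: {clock_adjusted_bound:.0f}x")
print(f"observed speedup:     {observed_speedup:.0f}x")
print(f"fraction of bound:    {fraction_of_bound:.2f}")
```

The observed factor of eight sits well below the idealised factor of 22, which is consistent with the data-transfer and other hardware constraints noted above.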
To move the algorithm from the PC's traditional computing platform, the CPU, onto the GPU, one relies on the way modern GPUs are designed: the CPU passes both an algorithm and the image and geometry data to the GPU. To compute efficiently on the GPU, the algorithm must be transformed to fit the highly specialized stream processing that the GPU was designed to perform (figure 8). Only algorithms that are inherently parallel can be converted; for those algorithms, however, the transformation makes the massively parallel architecture of modern GPUs available, enabling dramatic performance increases over traditional CPU-style computation. To ease the transformation, the graphics card developers have provided an augmented C programming language in which one specifies how the image is to be broken into numerous sub-images that are computed concurrently, as well as the algorithm each vector processor should execute. Combining the GBPCK with the threshold and the non-linear median-filter shot-noise reduction operation, to form a combined GBPCK-TM analyzer, allowed the algorithm to be moved completely off the CPU and onto the GPU. The new cotton trash analyzer can break a large image into numerous sub-images that are all analyzed concurrently, as detailed in figure 9. Once the GPU has analyzed the data according to the GBPCK-TM, it transfers the fully analyzed binary image back to the CPU along with trash statistics such as trash content. In testing the GBPCK-TM algorithm for cotton trash identification, we found the following:
Transitioning from the Bayesian Classifier to a single-pass GBPCK was an enabling technology, because commercial use dictated retrofitting the system into on-line environments. This resulted in wide fluctuations in the positioning of the lint batt, which in turn created large fluctuations in light intensity as well as large variations in the depth and quality of the shadows on the lint. These lighting inconsistencies severely degraded the performance of the Bayesian Classifier. When tested against the same lighting fluctuations, the GBPCK performed significantly better than the Bayesian Classifier.
In practice, the Bayesian Classifier required extensive on-site training. In contrast, the GBPCK provided a robust, rapid startup because no on-site training was required, and it eliminated the need for ambient light shielding, thereby improving the system's ability to be retrofitted into a wider variety of machine designs and reducing manufacturing and installation costs.
We note, for those not familiar with cotton classification, that the standard by which all cotton lint is graded is the human visual system. An elaborate process has been developed over the last century that provides a stable transition of the cotton classing grade from year to year. Each year in Memphis, Tennessee, a standard set of boxes is prepared, holding lint samples that are representative of the color and trash grades. Each classing office throughout the US uses a set of these boxes as the de facto standard for any lint sample whose correct grade is in question. As such, the human visual system is the standard, and it is judged against a continuously changing historical standard. Given this, the best test of the system's performance is a visual analysis of the quality of the recognition, which can best be judged by examining figure 7. We also note that when the system was run against the official photographic prints of the cotton classing standards, used daily by USDA-AMS cotton classers, it predicted the grade with a coefficient of determination of r² = 0.99. While it is recognized that shadowing will degrade performance, field tests indicated a level of performance comparable to the correlations achieved between human classers during the periodic re-grading of lint that occurs over the course of the season at the request of customers (typically r² = 0.80-0.85). In-house testing suggests that performance with shadows is within ½ leaf grade, which is typically within 1 standard deviation of the natural variability of the lint in the ginning process.
By transitioning from the Bayesian Classifier to a single-pass GBPCK, along with optimizations of the algorithm, improvements in processing speed were gained:
- Optimization of the algorithm effected a speedup of 2.5 times (7.5 frames/sec).
By moving the code from the CPU to the GPU and utilizing the combined GBPCK-TM algorithm, further improvements were gained:
- With a single NVIDIA GeForce 8800 GPU with 132 vector processors, a speedup of 20 times (60 frames/sec) was gained over the Bayesian Classifier, and a speedup of eight times was gained over highly optimized code running on the CPU.
At 60 frames/second:
- 71 sq-m of cotton can be imaged per bale with the new GBPCK-TM algorithm running on the GPU.
Versus 2.5 frames/second, utilizing the previous Bayesian Classifier approach:
4. Conclusions
As the cotton ginning industry moves toward machines that can dynamically adjust the amount of cleaning they perform, a great deal of valuable lint can be saved and fiber damage significantly reduced. The missing element at this time is the ability of the sensors to determine the required amount of cleaning for the cotton as it is fed into the machine. This research has demonstrated that, through the massively parallel processing now possible on today's programmable GPUs, a machine vision algorithm suitable for real-time classing of cotton can run fast enough to open the door to measuring the trash content of the incoming lint in time to set the machine so that it cleans the very cotton that was just analyzed. This just-in-time analysis is the enabling technology for a system that cleans the cotton being fed into the machine at precisely the optimal cleaning level. Once the industry moves from the current system, which examines a sample of cotton taken either well before processing or after cleaning has already taken place, to one in which the machine cleans the cotton that was analyzed as it was being fed in, lint loss is expected to be reduced by upwards of 30%. This level of improvement can be expected because today's systems use a very large dead-band to protect users against both the inherently wide variability in the trash distribution of the cotton lint and the potential for changes to take place before the machine can react to changing trash levels.
It is also expected that this technology will drive new machine designs that move away from optimizing the cleaning across the entire width of the machine, which effectively cleans 100% of the cotton to remove the trash from the 4% of the cotton that actually has trash particles, toward a machine that cleans only the 4% of the cotton that actually contains trash. Once this level of automation is reached, significant reductions in lint loss and in fiber damage will also become possible, as there should be an additional 96% reduction in lint loss and fiber damage. As typical first-stage lint cleaners generate upwards of 4.5-7 kg of lint loss, saving 4.5 kg of lint per bale would represent US$100 million of added annual revenue to US cotton growers, as well as a similar amount for international growers.