1. Introduction
Defects such as porosity frequently occur in metal additively manufactured (AM) parts, adding the cost of reprinting or reworking them. Such defects are typically identified only after the part is completed, leading at best to scrapped material and wasted production time and, if undetected, to product failure in service. Identifying defect formation early in the AM printing process and taking preventative measures could be hugely beneficial in reducing costs and increasing quality in AM. This study focuses on a specific application in this area: applying machine learning (ML) to process data to identify regions of porosity in parts produced by laser powder bed fusion (L-PBF) metal AM.
Porosity is one of the most prevalent and damaging defects for AM processes, with pores acting as initiation sites for crack formation and growth and contributing to drastically reduced fatigue life, among other issues [1,2,3]. There has therefore been significant research aimed at measuring and classifying porosity and other defects in AM parts and determining the raw material properties and process parameters relating to porosity formation. Various approaches using machine learning with data from in-process sensors, cameras, process settings, or post-process analysis have been investigated for their potential to detect or predict problematic porosity [4,5,6,7,8,9,10,11,12,13,14,15]. Much of the focus has been on camera-based systems [16] and some studies have combined several different sensors; for example, Petrich et al. [17] combined signals from several different sensor types in one neural network and achieved an overall prediction accuracy of 98%, albeit from only a single part. Relatively few studies have focused on photodiode signals [18,19,20,21], like the present study. Photodiodes have some advantages in that they are relatively cheap and their data are relatively easy to process compared to cameras, for example. They are also included in some commercial printers, such as the Renishaw RenAM 500M (Wotton-under-Edge, UK) used here.
Distinct classes of pores have been identified based on their morphology and process of formation. Three important classes are lack of fusion (LOF), keyhole (KH), and gas porosity; each has different characteristics that allow it to be identified and results predominantly from different material or process-related factors [2,22,23,24,25,26,27,28,29,30]. Example 3D renderings of KH and LOF pores from [31] are shown in Figure 1. Shrestha and Chou compared the characteristics of KH and LOF pores by looking at pore formation in cuboid parts printed with various energy densities. KH pores were found to have a near-spherical shape, while LOF pores were found to be elongated and to vary more in size [26,31].
Snell et al. concluded that LOF porosity is caused by a lack of input energy to the powder bed during the melting process [27]. This lower energy input then fails to fully melt the powder, which in turn leaves voids in the resultant structure. They attempted two different pore classification methods: an unsupervised ML model and a defined limits method for 2D and 3D pore models. A K-clustering ML method was found to be most effective for 3D pore data, while the defined limits method was found to be more suitable for 2D data.
KH porosity is caused by an excess of input energy, causing excessive penetration into the metal powder and leaving a pore near the bottom of the melt pool [27]. Bayat et al. [24] carried out a combined numerical and experimental investigation of the formation of KH-induced porosity in L-PBF processes. Their study found that these pores result from a chain of multiple physical phenomena arising from local cold zones with higher surface tension and insignificant recoil pressure [24].
While LOF and KH pores are mainly described as process-induced, the same cannot be said for gas porosity [27]. Gas porosity consists of spherical pores that form when gas entrapped in the raw metal powder particles, or environmental inert gas, becomes trapped during the melting process [22]. It is most commonly caused by poor powder quality or the presence of excessive gas density, making it quite difficult to reduce [32].
Applying ML to AM requires the association of data from different stages of the production process, e.g., data from the part design model, process settings, in-process sensor readings, and post-production CT scanning. Voxelisation is one technique that has been widely used for this purpose. It is a discretisation process that divides data into hexahedral block elements called voxels. It is characterised as converting a three-dimensional model, described initially by its surface as a mesh or by volume information in the form of a solid object, into 3D elements (voxels) that are used to fit the shape of the model [33]. Bacciaglia et al. conducted an in-depth study into the application of voxelisation in AM, finding that the voxel-based approach for storing localised model data was an efficient and discrete method [34]. Each voxel then represents a data point that can be used to train an ML model.
Many studies have used X-ray computed tomography (CT) scanning of finished parts to identify and characterise pores [35]. CT scanning allows a detailed 3D model of the part to be created, including internal defects. This allows one to have a complete picture of the porosity in the part, with information on the pore size, shape, volume, and distribution in the part readily available [29]. This non-destructive technique allows one to view the pores in three dimensions and is able to uncover potential contamination in powder feedstock and in the finished part, thus revealing local defects that can hardly be detected by conventional means [26]. CT scanning can only be applied after the part is finished so it cannot be used for in-process monitoring; however, it is excellent for providing the analysis of pores required to train ML models [17].
The existing literature shows the clear potential for ML applied to AM quality improvements and also that there is much research still to be carried out. This paper presents an ML tool to identify porosity in metal laser-based AM, specifically L-PBF, taking input data from the part design, process parameters, toolpath, and in-process photodiode sensors. The discretisation of these photodiode signals using the voxelisation method has not previously been reported to our knowledge. CT scanning of manufactured parts was used to identify and classify porosity, to train and test the ML models. Two-step classification was applied based on the presence of pores and whether they were LOF or KH pores. Three different classifiers were evaluated and compared along with two approaches to voxelisation and two approaches to class-balancing.
2. Materials and Methods
The experimental work was divided into two main sections. The first was the manufacture of AM parts and the characterisation of their porosity, consisting of:
Laser powder bed fusion (L-PBF) printing of titanium pillars under three different conditions of inert gas flow rate, namely control (31 L/h), low (26 L/h) and high (36 L/h);
Micro-CT scanning to create 3D models of the parts and porosity;
Extraction of pore characteristics;
Classifications of pores by type (keyhole or lack-of-fusion) and size.
The second combined data from in-process settings and measurements and post-process characterisation to create a single aligned dataset for the application of machine learning, consisting of:
Spatial alignment of in-process parameters and sensor readings with post-process porosity data from CT scans;
Spatial discretisation of the aligned dataset by voxelisation at three different voxel sizes, namely 0.5 mm, 1 mm, and 2 mm, using either a defect-centric (biased) or uniform discretisation (unbiased) approach;
Application of ML to classify voxels by presence, type, and size of porosity.
These steps were all carried out using MATLAB (release 2023a, The MathWorks, Inc., Natick, MA, USA).
2.1. Part Manufacturing
L-PBF part manufacturing was carried out using a Renishaw RenAM 500M equipped with a 500 W Yb:YAG (
Gaussian profile laser and InfiniAM Spectral in situ process monitoring system. The RenAM 500M is a point-based system, in that the laser fires for a given amount of time after which it switches off and moves a defined distance, known as the point distance, to the next point location before firing again for a given time, known as the exposure time. Titanium powder (Ti6Al4V extra low interstitials-0406 grade 23, from AP and C) was used for all parts. A schematic of the process is shown in
Figure 2.
Cylindrical pillars of a nominal diameter of 10 mm and height of 25 mm were produced (
Figure 3). Overall, 3 sets of 17 pillars each were produced. Each set was produced in a single print run with the pillars distributed across the build plate. Each print run used a different inert gas flow rate, namely control (31 L/h), low (26 L/h) and high (36 L/h). The control flow rate represents the manufacturer’s recommended settings and the low and high flow rates were chosen due to the expectation that they would produce variations in porosity of the parts. Other print settings were left at the manufacturer’s recommended settings and values recorded for laser power, point distance, hatch distance, and duration can be found in
Section 3. It was found that control and high flow rates produced very low levels of porosity, so in order to create a greater number of porous points in the dataset, more parts were printed at the low gas flow rate.
2.2. Process Data Collection
In-process data collection included process settings and sensor readings. The structure of the collected data can be seen in
Table 1. After aligning the machine data and porosity data from CT scans, the voxelisation was carried out and each voxel was labelled by porosity (Yes/No) and porosity type (LOF/KH). The positional data were used only for alignment and not as inputs to the machine learning models.
Sensor values were obtained from the Renishaw RenAM 500M machine sensors, namely the LaserVIEW and MeltVIEW modules, at a sampling frequency of 100 kHz [36]. The LaserVIEW module is designed to monitor the emissions from the 500 W ytterbium fibre laser and measures wavelengths from 1050 nm to 1080 nm using a photodiode sensor. The MeltVIEW module is designed to measure the emissions from the melt pool using two photodiode sensors: the MeltVIEWMeltPool sensor measures wavelengths from 1090 nm to 1700 nm in the infrared spectrum, and the MeltVIEWPlasma sensor measures wavelengths from 700 nm to 1040 nm in the visible to near-infrared range.
The sensor data were then augmented with process settings initially set by the process engineers operating the AM machine to manufacture the part. These settings were extracted from the computer-aided manufacturing (CAM) tool used to create the machine toolpath instructions for each layer. These values include the pre-set laser power, pulse duration, point distance, and hatch distance (
Table 1).
2.3. µCT Scanning
µCT scanning was carried out using the General Electric (GE) Phoenix V|tome|x M240 industrial scanner (Waygate Technologies, Lewistown, PA, USA). A schematic is shown in Figure 4. The printed parts were scanned at a resolution of 28 µm, with the voltage set at 130 kV and the current at 200 µA, giving a power of 26 W. The scan time was set at 10 min for a fast scan, generating around 3000 image slices per scan. The scan data acquisition was completed using the Phoenix datos|x software (version 2.4.0) [37]. This software package is used to combine the image slices to form the volumetric model of the object.
2.4. Porosity Characteristics and Classification
The volumetric models produced by CT scanning were analysed for porosity characteristics using the metrology software package Volume Graphics StudioMax version 3.3 [
38]. This software package has a porosity-analysis software module specifically used to analyse and visualise internal defects from scans of manufactured objects. The pore analysis module presents the results in a tabulated spreadsheet with the pore characteristics including the location, volume, sphericity, compactness, and many more details of the pores. These traits were then utilised for pore classification.
The identification and classification of pores using CT and other post-manufacturing analytical methods has been studied extensively by other authors and has been covered in the literature review of this paper. This literature provides the basis for classifying porosity in the CT analysis to produce a dataset to train and test ML algorithms.
The procedure for classification was as follows:
A voxel was first classified as porous or non-porous depending on whether there were any pores whose centre of volume lay within it. The procedures for data alignment and voxelisation are described in
Section 2.5 and
Section 2.6;
Porous voxels were then classified as KH or LOF based on the average sphericity of the pores. Sphericity greater than 0.6 was considered KH and less than this was considered LOF. The sphericity was calculated according to Equation (2). This method has been corroborated in many research papers that were reviewed for this purpose [
25,
27,
39,
40].
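The two-step labelling rule above can be sketched in a few lines. This is a minimal illustration, not the MATLAB code used in the study: the function name `label_voxel` is hypothetical, and the handling of a mean sphericity exactly equal to 0.6 is an assumption, since the text only defines the cases above and below the threshold.

```python
from statistics import mean

SPHERICITY_THRESHOLD = 0.6  # threshold stated in the text


def label_voxel(pore_sphericities):
    """Label one voxel as 'non-porous', 'KH' or 'LOF'.

    pore_sphericities holds the sphericity of every pore whose centre of
    volume lies inside the voxel (an empty list if there are none).
    """
    # Step 1: porous or non-porous, depending on whether any pore is present.
    if not pore_sphericities:
        return "non-porous"
    # Step 2: pore type from the average sphericity of the pores in the voxel.
    # Assumption: a mean exactly equal to the threshold is treated as LOF.
    if mean(pore_sphericities) > SPHERICITY_THRESHOLD:
        return "KH"
    return "LOF"
```

For example, a voxel containing pores with sphericities 0.9 and 0.7 would be labelled KH, while one containing pores with sphericities 0.3 and 0.5 would be labelled LOF.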
The compactness (C) and sphericity (S), as calculated in Equations (1) and (2), have been previously used to define the difference between different types of pores (keyhole or LOF) [27]. Equation (2) provides the sphericity formula by utilising the knowledge of the volume (V) and surface area (A) of the pore being analysed. These values are obtained from the StudioMax software package.
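For reference, one widely used definition of sphericity compares the surface area of a sphere having the same volume as the pore with the actual surface area of the pore. The form below is that general definition written in terms of V and A; it is assumed, not confirmed, that Equation (2) takes exactly this form.

```latex
S = \frac{\pi^{1/3}\,(6V)^{2/3}}{A}
% Check for a sphere of radius r: V = \tfrac{4}{3}\pi r^{3}, \quad A = 4\pi r^{2}
% \Rightarrow (6V)^{2/3} = 4\pi^{2/3} r^{2} \quad\Rightarrow\quad S = 1
```

Under this definition, S = 1 for a perfect sphere and decreases for elongated or irregular pores, which is consistent with near-spherical KH pores lying above the 0.6 threshold and elongated LOF pores lying below it.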
2.5. Data Alignment
To relate the porosity data from the CT volumetric models to the in-process data, the two datasets must be spatially aligned. Positional data are recorded in each dataset but they do not share a common reference frame. A bounding box alignment method was applied in this case. This approach applies rotations to the part and identifies the smallest bounding box around the rotated part, thereby placing the part in the correct orientation. Translation is then used to complete the alignment of the parts.
One limitation of this approach is that it is only applicable to asymmetric objects. If the object has symmetry, it is impossible to distinguish between orientations that are equivalent due to symmetry. In the case of the cylinders prepared here, it would not be possible to distinguish the top and bottom ends or the correct rotation of the cylinder axis. To overcome this, labels were printed on the top surface of the cylindrical pillars indicating the sample number and the printer x direction.
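The idea of a bounding-box alignment, and the symmetry limitation just described, can be illustrated with a simplified two-dimensional sketch. It is an assumption-laden stand-in for the actual procedure: it searches rotations about a single axis by brute force, and the function names (`rotate`, `bbox_area`, `best_rotation`, `align`) and the 0.5 degree step are hypothetical choices made for illustration.

```python
import math


def rotate(points, angle_deg):
    """Rotate 2D points anticlockwise about the origin."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def bbox_area(points):
    """Area of the axis-aligned bounding box of the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def best_rotation(points, step_deg=0.5):
    """Brute-force search over [0, 90) degrees for the smallest bounding box."""
    n_steps = int(round(90 / step_deg))
    angles = [i * step_deg for i in range(n_steps)]
    return min(angles, key=lambda ang: bbox_area(rotate(points, ang)))


def align(points, step_deg=0.5):
    """Rotate to the smallest bounding box, then translate its corner to (0, 0)."""
    angle = best_rotation(points, step_deg)
    rotated = rotate(points, angle)
    min_x = min(x for x, _ in rotated)
    min_y = min(y for _, y in rotated)
    return [(x - min_x, y - min_y) for x, y in rotated], angle
```

Because a bounding box looks the same after a further 90 or 180 degree turn, a search of this kind cannot tell equivalent orientations apart, which is why an external reference such as the printed labels is needed.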
2.6. Discretisation through Voxelisation
Once the CT pore data and the part processing data were aligned, the dataset volume was discretised by voxelisation. Cubic voxels with sizes of 0.5 mm, 1.0 mm, and 2.0 mm were used. Three sizes were used to investigate the effect of voxel size on the accuracy of porosity prediction and to investigate the trade-off between computational processing time and spatial resolution. Smaller voxel sizes require more computational processing but provide a higher spatial resolution. Pores were included in a voxel if their centre of volume fell within the region, while a single value for each process parameter was calculated for each voxel as the arithmetic mean of all data points falling within the voxel.
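A minimal sketch of the per-voxel averaging is given below, assuming that each process sample is a position paired with a list of feature values and that the grid origin is known. The names `voxel_index` and `voxel_means` are hypothetical and the study itself used MATLAB.

```python
import math
from collections import defaultdict


def voxel_index(point, origin, size):
    """Integer (i, j, k) index of the cubic voxel containing a point."""
    return tuple(math.floor((p - o) / size) for p, o in zip(point, origin))


def voxel_means(samples, origin, size):
    """Arithmetic mean of every feature over the samples in each voxel.

    samples is a list of ((x, y, z), [feature values]) pairs.
    Returns a dict mapping voxel index to the list of feature means.
    """
    sums = {}
    counts = defaultdict(int)
    for position, features in samples:
        key = voxel_index(position, origin, size)
        if key not in sums:
            sums[key] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[key][i] += value
        counts[key] += 1
    return {key: [s / counts[key] for s in sums[key]] for key in sums}
```

Calling `voxel_means` with a larger `size` merges neighbouring samples into fewer voxels, which is the source of the trade-off between data volume and spatial resolution.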
Two variants of discretisation by voxelisation were applied. One approach carried out the discretisation centred around the pore centres and this approach is termed the defect-centric approach (DCA). In the other approach, discretisation was carried out uniformly on the part and this approach was termed the uniform-discretisation approach (UDA).
2.6.1. Defect-Centric Approach (DCA or Biased)
In this approach, after aligning the process monitoring data with the porosity data, cubic voxels were created with centres aligned with pore centres. These voxels were labelled as porous and the classification for type of porosity depended on the majority classification of pores contained within the voxel. The process is illustrated in
Figure 5.
Once all porous voxels had been identified in this way, the remaining data were discretised using the uniform approach described in the following
Section 2.6.2.
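The defect-centric step might be sketched as follows. Several details are assumptions made only for illustration: samples may be shared between overlapping pore-centred voxels, pores with no process data nearby are skipped, and ties in the majority vote go to the type encountered first. The function names are hypothetical.

```python
from collections import Counter


def in_cube(point, centre, size):
    """True if the point lies inside the cube of edge length size around centre."""
    half = size / 2.0
    return all(abs(p - c) <= half for p, c in zip(point, centre))


def defect_centric_voxels(pores, samples, size):
    """Create one porous voxel per pore, centred on the pore centre.

    pores is a list of ((x, y, z), pore_type) pairs.
    samples is a list of ((x, y, z), [feature values]) pairs.
    Returns the porous voxels and the samples left for uniform discretisation.
    """
    voxels, used = [], set()
    for centre, _ in pores:
        # Pore type by majority vote over all pores inside this voxel.
        types = [t for c, t in pores if in_cube(c, centre, size)]
        idx = [i for i, (pos, _) in enumerate(samples) if in_cube(pos, centre, size)]
        if not idx:
            continue  # assumption: no process data near this pore, so skip it
        n_features = len(samples[idx[0]][1])
        means = [sum(samples[i][1][k] for i in idx) / len(idx)
                 for k in range(n_features)]
        voxels.append({"centre": centre, "porous": True,
                       "type": Counter(types).most_common(1)[0][0],
                       "features": means})
        used.update(idx)
    remaining = [s for i, s in enumerate(samples) if i not in used]
    return voxels, remaining
```

The `remaining` samples returned here are the ones that would then be passed to the uniform discretisation to generate the non-porous voxels.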
2.6.2. Uniform Discretisation Approach (UDA or Unbiased)
In this approach, voxels were created based on a uniform grid of cubic voxels aligned with the extremities of the process data point cloud. Pores were then associated with the voxel in which their centre of volume lay. Any voxel with at least one pore was labelled as porous and all remaining voxels were labelled as non-porous. Pore type and size for the voxels were again based on the majority classification of pores in the voxel, and process monitoring values were again based on the arithmetic mean of all data points in the voxel. The process is illustrated in
Figure 6.
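The labelling step of the uniform approach can be sketched as below, assuming the grid is indexed by integer voxel coordinates measured from a known origin. The name `label_uniform_voxels` is hypothetical, and ties in the majority vote are resolved in favour of the pore type encountered first, which is an assumption.

```python
import math
from collections import Counter, defaultdict


def voxel_index(point, origin, size):
    """Integer (i, j, k) index of the cubic voxel containing a point."""
    return tuple(math.floor((p - o) / size) for p, o in zip(point, origin))


def label_uniform_voxels(voxel_keys, pores, origin, size):
    """Label every voxel of a uniform grid from the pores it contains.

    voxel_keys is an iterable of voxel indices in the grid.
    pores is a list of ((x, y, z) centre of volume, pore_type) pairs.
    Returns a dict mapping voxel index to (porosity label, pore type or None).
    """
    types_per_voxel = defaultdict(list)
    for centre, pore_type in pores:
        types_per_voxel[voxel_index(centre, origin, size)].append(pore_type)
    labels = {}
    for key in voxel_keys:
        types = types_per_voxel.get(key, [])
        if types:
            majority = Counter(types).most_common(1)[0][0]
            labels[key] = ("porous", majority)
        else:
            labels[key] = ("non-porous", None)
    return labels
```

Unlike the defect-centric variant, nothing here depends on where the pores are when the grid is laid out, so the same function could in principle be applied to parts for which no CT data exist.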
2.7. Machine Learning
Machine learning models were trained and tested in MATLAB using models available in the Classification Learner app. Three different models were included: k-Nearest Neighbours, Bagged Decision Trees, and Neural Network. Default 'medium' settings for the parameters, as defined in MATLAB (release R2023a), were used and are provided in
Table 2.
Two methods for class-balancing the training data were tested: random undersampling (RUS) and the synthetic minority oversampling technique (SMOTE). RUS involves randomly removing data points from the majority class, and SMOTE involves synthesising new data points of the minority class(es) using a weighted combination of neighbouring data points. In each case, the classes were fully balanced, e.g., 50:50 porous:non-porous data points.
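Both balancing strategies can be sketched with the Python standard library. This is a simplified illustration and not the implementation used in the study: the function names, the choice of k, and the use of Euclidean distance are assumptions, and the SMOTE variant shown is the basic form that interpolates between a minority point and one of its k nearest minority neighbours.

```python
import math
import random


def random_undersample(majority, minority, rng):
    """RUS: keep all minority points and an equally sized random majority subset."""
    return rng.sample(majority, len(minority)), list(minority)


def smote(minority, n_new, k, rng):
    """Basic SMOTE: synthesise n_new minority points by interpolation.

    Each new point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points.
    """
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic
```

With 100 majority and 4 minority points, RUS would return 4 of each, whereas SMOTE would need to generate 96 synthetic minority points to reach a 50:50 balance. This difference in resulting dataset size is what drives the differences in training time and model size between the two approaches.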
The complete dataset consisted of three separate builds of 17 cylindrical rods as described in
Section 2.1. A test dataset was withheld by selecting 3 out of 17 of the rods in each build, equating to approximately 18%. Only the uniform discretisation approach was used for test data; since the defect-centric approach depends on prior knowledge of pore locations that would not be available in practice, it was considered unsuitable for test data and likely to overestimate model accuracy. The remainder of the rods were used as the training dataset. Five-fold cross-validation was used to estimate validation accuracy.
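Holding out whole rods, not individual voxels, keeps every voxel of a test rod unseen during training. The sketch below illustrates such a split; how the three rods per build were actually chosen is not stated, so the random selection and the name `split_rods` are assumptions.

```python
import random


def split_rods(n_builds=3, rods_per_build=17, n_test=3, seed=0):
    """Hold out n_test whole rods from each build as the test set.

    Returns two lists of (build, rod) identifiers: training and test.
    Assumption: held-out rods are picked at random within each build.
    """
    rng = random.Random(seed)
    train, test = [], []
    for build in range(n_builds):
        rods = list(range(rods_per_build))
        held_out = set(rng.sample(rods, n_test))
        for rod in rods:
            (test if rod in held_out else train).append((build, rod))
    return train, test
```

With the default arguments this yields 9 test rods and 42 training rods, i.e., 9/51 or about 17.6% of the rods, in line with the approximately 18% quoted above.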
Two feature sets were tested as inputs to the machine learning models: one using only the data from the three machine sensors and one using both the three sensors and the four process settings. The classifications of porosity and porosity type were carried out independently. The overall classification accuracy was used as the measure to compare performance. While this measure has limitations, especially for imbalanced datasets such as this one, class balancing was used to mitigate this.
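The limitation of overall accuracy on imbalanced data can be seen in a small example. The class counts below are hypothetical and chosen only for illustration; they are not results from this study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, cls):
    """Fraction of the true members of cls that were predicted as cls."""
    predictions_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in predictions_for_cls) / len(predictions_for_cls)


# Hypothetical imbalanced set: 95 non-porous voxels and 5 porous voxels.
y_true = ["non-porous"] * 95 + ["porous"] * 5
# A trivial classifier that always predicts the majority class.
y_pred = ["non-porous"] * 100

overall = accuracy(y_true, y_pred)
porous_recall = recall(y_true, y_pred, "porous")
```

Here the trivial classifier scores 95% overall accuracy while detecting none of the porous voxels, which is why accuracy is only meaningful in this work once the classes have been balanced.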
4. Discussion
Considering all of the variables together, the best combination was the uniform discretisation approach with 7 input features, synthetic minority oversampling, and a voxel size of 2.0 mm, which gave consistently high accuracy relative to the other conditions across the classifier types. The choice of classifier is less clear and might be made based on considerations other than accuracy or after further testing. While accuracy should be the key criterion due to the high-value nature of L-PBF parts, computational efficiency should also be considered since the ultimate aim is to achieve in situ monitoring and the adjustment of process parameters to avoid defective parts.
The maximum achieved test accuracy of around 76% is unlikely to be sufficient for practical application but suggests that the voxelisation-based framework has potential with further refinement of the parameters and expansion of the training dataset. In the histograms of Figures 7–12, it is clear that there is significant overlap between the distributions of porous and non-porous data points for each of the features, indicating that no feature was a very strong predictor of porosity. Given that each data point represents the mean of the signal throughout a single voxel, it is possible that many voxels actually contain a combination of both porous and non-porous signals, which blurs the distinction between the two groups. However, this should then lead to higher accuracy at smaller voxel sizes, which is not found here.
Another likely contributing factor is the inherent difficulty of predicting porosity from photodiode signals alone. The photodiodes measure the total intensity of certain wavelengths emitted from the region of the melt pool, so they may not detect smaller-scale variations within this region that would better indicate porosity.
4.1. Voxel Size
Voxel size has little impact on accuracy but this somewhat favours the 2.0 mm voxel size over 0.5 or 1.0 mm for a number of reasons. The largest voxel size correspondingly produced the smallest number of data points and therefore the smallest and fastest classifiers with the shortest training times. Metal additive manufacturing is a high-value application where the computing power required for any of these models could be justified, so none of the models should be excluded based on these criteria. However, this still represents a significant practical advantage of the larger voxel size. The downside of the larger voxel size is the reduced spatial resolution of porosity predictions; however, 2.0 mm should be sufficiently fine for many applications.
The choice of voxel size has a significant impact on the balance between porous and non-porous voxels in the dataset, which also has interactions with the discretisation approach and class-balancing approach. Pores represent a small percentage of the volume of a specimen and their distribution is fixed regardless of voxel size, so increasing the voxel size increases the probability that any given voxel will include a pore. This leads to the very biased non-porous to porous ratio seen in
Figure 7 and
Figure 8 compared to the more balanced ratio in
Figure 9. The skew towards non-porous regions is expected and relevant to real-world applications, since manufacturer-recommended process parameters were used, which are naturally chosen to minimise porosity. However, it presents a challenge to training classifiers and some approaches such as class-balancing must be used to avoid the classifier becoming very biased towards the majority class in the training data.
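The dependence of class balance on voxel size under uniform discretisation can be reproduced with a toy simulation. Everything in it is hypothetical: pore centres are drawn uniformly at random in a 10 mm x 10 mm x 24 mm box, and the pore count of 200 is arbitrary and unrelated to the measured data.

```python
import math
import random


def porous_fraction(pore_centres, extent, size):
    """Fraction of uniform-grid voxels that contain at least one pore centre."""
    n_voxels = 1
    for e in extent:
        n_voxels *= math.ceil(e / size)
    occupied = {tuple(math.floor(c / size) for c in centre)
                for centre in pore_centres}
    return len(occupied) / n_voxels


rng = random.Random(42)
extent = (10.0, 10.0, 24.0)  # hypothetical box dimensions in mm
pores = [tuple(rng.uniform(0.0, e) for e in extent) for _ in range(200)]

# Same pores, three voxel sizes: the porous share grows with voxel size.
fractions = {size: porous_fraction(pores, extent, size)
             for size in (0.5, 1.0, 2.0)}
```

Since the number of pores is fixed while the number of voxels falls by a factor of eight each time the edge length doubles, the porous fraction can only stay the same or rise as the voxels get larger.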
The relationship between class balance and voxel size for the defect-centric discretisation approach is very different since in this approach, there is one porous data point created for each pore, regardless of voxel size. In contrast, the number of non-porous data points increases significantly as voxel size decreases, so the dataset goes from non-porous-biased at 0.5 mm voxel size to porous-biased at 2.0 mm voxel size and is quite balanced at 1.0 mm voxel size.
Another advantage of the 2.0 mm voxel size is that it achieved higher accuracy than the smaller voxel sizes. While the magnitude of the advantage varied by classifier type and other factors, the high accuracy combined with the computational advantages at this voxel size is clearly desirable. A possible reason for the improved accuracy at this voxel size is that the mean values per voxel for each of the features are calculated using more individual data points, helping to filter out noise in the sensor signals. The significant overlap between porous and non-porous classes in the histograms of Figures 7–12 supports this.
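The noise-averaging argument can be made concrete under two simplifying assumptions that are not verified in this study: the sensor samples are spread uniformly through the part, and the noise on each sample is independent with standard deviation σ. The number of samples n per voxel then scales with voxel volume, and the standard deviation of the voxel mean scales as follows:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}, \qquad
\frac{n_{2.0}}{n_{0.5}} = \left(\frac{2.0\ \mathrm{mm}}{0.5\ \mathrm{mm}}\right)^{3} = 64
\quad\Rightarrow\quad
\frac{\sigma_{\bar{x},\,2.0}}{\sigma_{\bar{x},\,0.5}} = \frac{1}{\sqrt{64}} = \frac{1}{8}
```

Under these assumptions, the mean over a 2.0 mm voxel would carry one eighth of the noise of the mean over a 0.5 mm voxel. Correlated noise between neighbouring samples would reduce this benefit.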
4.2. Number of Features
The best results were obtained using both the process settings (laser power, point distance, hatch distance, and duration) and in-process photodiode features (LaserVIEW, MeltVIEWPlasma, and MeltVIEWMeltPool) as described in
Section 2.2 but in most cases, equivalent accuracy was obtained using only the three in-process sensor features. This is not surprising given that the process settings show very little variation compared to the sensor readings. The number of features had a relatively small impact on the computational characteristics of the models and all seven features would be readily available in-process, so using all seven features is preferred.
4.3. Discretisation Approach
The relative performance of the uniform and defect-centric discretisation approaches varied by voxel size and class-balancing approach but showed similar trends between classifiers. In many cases, the defect-centric approach performed better on validation data but worse on test data. One explanation is that relevant test data could only be generated using the uniform discretisation approach since the defect-centric approach requires prior knowledge of the locations of pores gathered from the post-manufacturing CT scans that would not be available in real-world applications. On the other hand, the uniform discretisation approach creates training and validation data in the same way that test data are created or that in-service data are collected.
However, at 0.5 mm voxel size, the defect-centric approach performed relatively well (
Figure 13), achieving higher accuracy on test data and smaller differences between test and validation accuracy. Typically, a bigger difference between validation and test accuracy suggests more overfitting of the model to the training data and therefore a reduction in the generalisation ability of the model.
At 2.0 mm voxel size, the uniform discretisation approach appears better (
Figure 15 and
Figure 18), with higher test accuracy and lower test/validation difference.
4.4. Class-Balancing Approach
The significance of the class-balancing approach depends on the degree of bias in the raw dataset. This, in turn, depends on the voxel size and discretisation approach as discussed in
Section 4.1 and
Section 4.3. Where the raw dataset is relatively balanced between classes, the class-balancing approach has less impact. In some cases, class balancing was not necessary but was applied for consistency.
At the best performing conditions, synthetic minority oversampling appears to produce small accuracy improvements compared to random undersampling. This was likely due to the much larger number of training data points available by this method; random undersampling discarded more than 90% of data points in some cases of highly biased data. However, this also led to much smaller models and faster training times for random undersampling in these cases. Given the typically high-value nature of metal additive manufacturing, it is suggested that the increased accuracy would be worth the additional computational expense of synthetic minority oversampling.
4.5. Classifier
There is no clear winner between the three classifier types tested here. While the neural network performs relatively poorly in most tests and requires very long training times, it is only 2–3 percentage points behind on test accuracy at the best performing conditions and it is easily best in terms of model size and prediction speed. Further optimisation of the model or expansion of the dataset could therefore alter the balance between classifiers.
The bagged trees model produces some very high results for validation accuracy using defect-centric discretisation but the test accuracy drops significantly, indicating overfitting on the training data and likely poor real-world performance. This model type also had the largest model size of up to 2 GB. The other two classifier types were only slightly lower on test accuracy, had smaller model sizes, and, in the case of the nearest neighbours, also had shorter training times. With further optimisation, the performance of any of the classifiers could be improved.
The three classifiers used here do not represent all possible options. However, they represent three fundamentally different approaches that are widely used and have significantly different strengths and weaknesses in terms of training requirements, speed, and model size. Given the relatively similar accuracies between these models, it is suggested that testing more model types is not the priority at this stage.
4.6. Porosity Type
Predicting the pore type in addition to the presence of porosity would be useful since different pore types should require different parameter adjustments to be eliminated. The bagged trees classifier performs better relative to the other classifiers when predicting pore type than when predicting pore presence, achieving a test accuracy of 73%. This is still too low to be useful in practice.
The pore types in the training dataset were labelled based only on sphericity, so it is possible that the labelling of the training data could be improved by considering more pore characteristics.
4.7. Limitations and Future Work
The purpose of this paper was to introduce a voxelisation-based framework for porosity prediction in metal additive manufacturing. It was not practical to attempt to conclusively cover all of the variables involved in the machine learning classification and future work could further optimise the classifiers or introduce more complex classifier architectures. Future work could also experiment with combinations of undersampling and oversampling as well as other preprocessing techniques.
It is likely that improving the input features produced during the voxelisation process is more important than the machine learning step and would be more likely to produce significant improvements. The results support this, in that none of the variables tested here showed a strong impact on accuracy; with improved input features, all of the tested models would likely be more accurate. The voxelisation approach only involved calculating the mean of each variable for each voxel and should be expanded in the future to include other features, such as standard deviation, range, etc., to produce a richer set of input features to train the machine learning models. The current work is a valuable step towards this, having developed a framework within which these tests can be carried out and having shown the viability and computational efficiency of the 2.0 mm voxel size.
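A richer per-voxel feature set of the kind proposed here could be computed along the following lines. This is a sketch of possible future work, not of anything implemented in the study; the function name `voxel_features` and the choice of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def voxel_features(values):
    """Summary statistics of one sensor signal within a single voxel.

    values is the list of raw readings that fall inside the voxel.
    Returns the mean (the only feature used in the present work) together
    with the standard deviation and range suggested as extensions.
    """
    return {
        "mean": mean(values),
        "std": pstdev(values),  # assumption: population standard deviation
        "range": max(values) - min(values),
    }
```

Two voxels with the same mean reading but different spreads would be indistinguishable with the current features, whereas the `std` and `range` entries would separate them.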
The dataset was from a single production machine operating at manufacturer-recommended settings and from a small number of separate print runs with one fixed geometry. Models trained on this dataset may not generalise well to other geometries without further training; however, this was mitigated by choosing input features that were independent of the geometry or location within the print volume.
This geometric independence has further advantages for expanding the technique to other geometries. Only cylindrical shapes were tested here; however, AM is well known for its suitability for very complex shapes, and the voxel-based approach used here should be well suited to them.
The present work considers the binary presence or absence of porosity in a voxel and the classification of either lack-of-fusion or keyhole pore type. This is useful, but predictions of porosity volume, number of pores, and more in-depth consideration of pore type will be the focus of future work. As well as being more useful, this may also help to increase classifier accuracy by creating better defined classes. For example, keyhole and lack-of-fusion pores are known to form under different process conditions, as discussed in the introductory section of this paper, so they might be better classified individually instead of under the single porous/non-porous classification used here. This was not tested here as the feature set should first be improved.