Preprocessing
Image pre-processing was carried out to remove noise, i.e., unwanted interference or distortions introduced during the image acquisition stage by factors such as failures in the acquisition equipment, adverse environmental conditions, or imperfections in the transmission or storage processes. These steps consisted of converting the original image to the HSV (hue, saturation, and value) color space, which separates an image's hue, saturation, and brightness into independent channels. The H-channel (hue) was isolated to preserve the information related to color, and filters were applied to smooth the image and remove noise, improving the quality and accuracy of the information.
For analysis purposes, the median and bilateral filters from the open-source Open Source Computer Vision (OpenCV) library, developed by Intel Corporation (Santa Clara, CA, USA), were applied. The performance of the filters was evaluated using the Structural Similarity Index Measure (SSIM), a metric used to assess image quality, including the preservation of detail, contrast, and sharpness.
The SSIM ranges from −1 to 1. A value of +1 indicates that the two compared images are highly similar, while a value of −1 indicates that they are completely distinct [18]. Thus, the higher the SSIM value, the more similar the filtered image is to the reference (original) image. This procedure followed the methodology adopted by Kalaiyarasi et al. [28], who analyzed the results of the median filter on medical images using SSIM. After the SSIM analysis, the filter type with the highest value was selected.
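A minimal sketch of this filter-selection step, using the SSIM implementation from scikit-image on synthetic arrays (illustrative data, not the study's images):

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

rng = np.random.default_rng(1)
original = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# Two hypothetical filtered versions of the same channel
filtered_a = original.copy()   # identical to the reference -> SSIM = 1.0
filtered_b = 255 - original    # inverted                   -> very low SSIM

score_a = ssim(original, filtered_a)
score_b = ssim(original, filtered_b)

# The filter whose output scores the highest SSIM against the reference is kept
best = "A" if score_a >= score_b else "B"
```

In the study, `filtered_a` and `filtered_b` would be the median- and bilateral-filtered images, each compared against the unfiltered original.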
To improve contrast, histogram equalization was applied. A histogram represents the distribution of pixel intensities in an image, and equalization redistributes these intensities to achieve a more uniform distribution. This process is particularly useful for enhancing details in shadow areas and highlighting important features. By computing the cumulative distribution function, histogram equalization transforms the image, increasing sharpness and improving overall contrast.
Segmentation
The segmentation of the region of interest was performed by two methods: the thresholding method proposed by Otsu [29] and a color-based segmentation method. In Otsu's method, the filtered images were converted to grayscale, with intensities ranging from 0 to 255, and the thresholding technique was applied: a threshold was automatically determined from the image histogram to eliminate the background of the images.
Erosion and dilation techniques were used to refine the result of the image segmentation, employing a morphological closing operator with a structuring element sized to the area of interest [30]. The adjustments were implemented in a Colab notebook by first applying erosion to shrink the segmented areas and remove small noise, followed by dilation to expand the segmented regions, filling gaps and connecting adjacent areas.
These procedures followed the proposal of Bareli [31]. The resulting thresholded image was inverted so that the region of interest appeared white (255) and the background black (0). This ensured that only the pixels associated with the segmented animals remained visible, while the background pixels were suppressed by assigning them a value of zero.
Then, the outlines of the binarized animals were identified using OpenCV's findContours algorithm and drawn over the grayscale image.
After these steps, a blank mask was generated and a binarization with a threshold of 25 was applied, onto which the filtered contours were added, highlighting the areas of interest.
The color segmentation technique allowed the pigs, represented by the warmest pixels in the image, to be separated by color, using the HSV (hue, saturation, value) color space [31]. The HSV space concentrates the color (hue) information in a single channel, which was used to generate vectors containing the values of the lower limit (darker colors) and the upper limit (white) of each color range.
For the color segmentation, the proposal of Bareli [31], which considers segmentation based on the HSV space, was used.
Table 1 shows the intervals for the colors yellow, blue, green, and red. For example, the yellow range spans hue (H) values from 10 to 50, the characteristic range of that color. The S values limit the saturation, ranging from 100 to 255, while V defines the brightness (luminance) range, also from 100 to 255.
After conversion to the HSV color space, masks were extracted, each designed to encompass a specific color range, with the exception of the cool shades, according to the scale of the images. This scale included the manual annotation of the minimum and maximum values of each color, providing the basis for generating the temperature range and the color palette of the ThermaCam software (Version 2.10) (Table 2).
This process aimed to identify the colors of the warmest pixels in the image. During segmentation, all pixels that did not belong to the region of interest were excluded, so that all cold pixels in the image were zeroed.
Subsequently, all individual masks were unified, highlighting the areas of interest in the image. In addition, a supplementary mask was generated for the animals' contours and for the treatment of possible segmentation failures, according to the procedure described previously.
To validate an image segmentation method, automatically segmented images must be compared with manually segmented (reference) images. Accordingly, reference segmentations were produced specifically for this study and adopted as the standard. To build them, the images were manually annotated on the online platform VGG Image Annotator (Version 2.0.11), delimiting the contour of the region of interest freehand, which yielded a file with the coordinates of the animals' edges. Using the OpenCV library and the Python language in the PyCharm IDE (Version 2023.2.2), the masks were generated and converted into binary images. The region of interest and the background were identified by the colors white and black, corresponding to pixel values of 255 and 0, respectively. Finally, the image was cropped using the contour mask and OpenCV binary operators (cv2.bitwise_and).
To evaluate the results of pig segmentation in the thermographic images, evaluation metrics based on the Jaccard similarity index (Equation (1)), the Dice coefficient (Equation (2)), and precision (Equation (3)) were applied, as proposed by Zhang et al. [32].
The Jaccard Index is a metric that quantifies the similarity between two sets; in image segmentation, it is used to evaluate how well the automatically segmented image overlaps with the manually segmented one. The closer the Jaccard Index is to 1, the greater the overlap between the two images and the more accurate the segmentation.
The Dice coefficient is also a similarity metric used to assess the overlap or agreement between two sets. It ranges from 0 to 1, where 0 indicates no overlap and 1 indicates complete overlap. The closer to 1, the better the algorithm's performance against the reference segmentation.
Precision was calculated as the proportion of pixels identified by the algorithm relative to the number of pixels found in the manual segmentation. Values closer to 1 indicate the effectiveness of the segmentation method in identifying these elements, while values farther from 1 signal lower segmentation performance.
The metrics were evaluated considering true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). TPs correspond to the animal pixels that were correctly segmented by the method, in agreement with the manual segmentation. TNs are the pixels that represent the background of the image.
FPs are background pixels that were mistakenly included in the segmentation as part of the pig, contrary to the manual segmentation. Finally, FNs are the pixels of the pig's contour and area that the segmentation failed to identify, in contrast to the manual segmentation; they represent parts of the pig that the method could not correctly identify.
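Assuming the standard definitions of these metrics (the exact forms of Equations (1)–(3) follow Zhang et al. [32]), they can be computed directly from a pair of binary masks; the toy masks below are illustrative:

```python
import numpy as np

# Toy binary masks: True = pig pixel, False = background
auto = np.zeros((8, 8), dtype=bool)
manual = np.zeros((8, 8), dtype=bool)
auto[2:6, 2:6] = True     # 16 pixels predicted by the algorithm
manual[3:7, 3:7] = True   # 16 pixels in the manual reference

tp = np.logical_and(auto, manual).sum()    # true positives: 3x3 overlap = 9
fp = np.logical_and(auto, ~manual).sum()   # predicted but not in the reference = 7
fn = np.logical_and(~auto, manual).sum()   # in the reference but missed = 7

jaccard = tp / (tp + fp + fn)        # Jaccard index (Equation (1), standard form)
dice = 2 * tp / (2 * tp + fp + fn)   # Dice coefficient (Equation (2), standard form)
precision = tp / (tp + fp)           # precision (Equation (3), standard form)
```

All three metrics approach 1 as the automatic mask converges on the manual reference.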
Recognition and Interpretation—Classifier
For the classification of the animals as being in thermal comfort or discomfort, a machine learning method, the Support Vector Machine (SVM), was used. According to Rodriguez et al. [34], this method is used for pattern recognition, finding decision boundaries that optimally separate classes and reduce classification errors. The SVM seeks a hyperplane (or line) that separates the data into the comfort and discomfort classes while keeping the classes as far from the boundary as possible. According to Alfarzaeai et al. [35], this model demonstrates good performance on small and medium-sized datasets.
Skin surface temperature data, automatically extracted from the thermal images in the previous step, were used as input. The images were separated by treatment (air-conditioned and non-air-conditioned environment). From each thermal image, 30 points were extracted, totaling 6780, and averages were computed. Each image was labeled according to the situation of the animals: those in air-conditioned environments were labeled "comfort" (coded as 0), while those in non-air-conditioned environments were labeled "discomfort" (coded as 1), and these labels were recorded in an Excel spreadsheet.
The training was conducted on Google Colab, starting with the importation of the necessary libraries, such as pandas, scikit-learn, and matplotlib, which were used for data manipulation, model training, and performance evaluation.
To prepare the training data before modeling, the data were normalized so that the values fell within the range 0 to 1 and variables on different scales had the same relevance to the algorithm. Normalization was performed according to Equation (8), in which Xmax and Xmin are the highest and lowest values of the variable, respectively:
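Assuming Equation (8) is the standard min–max normalization, (X − Xmin)/(Xmax − Xmin), the step can be sketched as follows (the temperature values are hypothetical):

```python
import numpy as np

# Skin surface temperatures (hypothetical values, degrees Celsius)
x = np.array([30.5, 32.0, 34.5, 36.0, 38.5])

# Min-max normalization: (X - Xmin) / (Xmax - Xmin), mapping values into [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())
```

After this transformation, the smallest observation maps to 0 and the largest to 1, so no variable dominates the SVM by scale alone.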
To find the best combination of hyperparameters, a random search algorithm was implemented, which randomly samples values for the parameters of the SVM algorithm, such as the kernel function, the kernel scale, and the C value, which regulates the maximum penalty applied to observations that violate the margin. After the search was executed, the most effective hyperparameters were identified and selected for model training. Using the radial basis function (RBF) kernel, the training was conducted with the optimized model and predictions were performed on the test set.
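A minimal sketch of this random hyperparameter search with scikit-learn (the synthetic one-feature dataset, the search ranges, and the number of iterations are illustrative assumptions, not the study's settings):

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Tiny synthetic stand-in for the temperature feature and comfort labels
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(34, 1, (30, 1)),   # "comfort" temperatures
               rng.normal(39, 1, (30, 1))])  # "discomfort" temperatures
y = np.array([0] * 30 + [1] * 30)            # 0 = comfort, 1 = discomfort

# Random search over kernel, C (margin penalty) and gamma (kernel scale)
param_distributions = {
    "kernel": ["rbf", "linear"],
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-3, 1e1),
}
search = RandomizedSearchCV(SVC(), param_distributions,
                            n_iter=10, cv=5, random_state=0)
search.fit(X, y)
best_model = search.best_estimator_   # trained with the best sampled parameters
```

`RandomizedSearchCV` samples parameter combinations at random rather than exhaustively, which keeps the search cheap on small datasets like this one.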
In order to obtain a more robust evaluation of classifier performance, 5-fold cross-validation was used. The dataset was randomly partitioned into two subsets, with 70% (42 samples) for training and 30% (18 samples) for prediction. This process was repeated over five iterations, each with a different random split of the data between training and prediction, in order to mitigate possible biases in the evaluation of the model.
The performance evaluation included metrics such as precision (Equation (3)) and accuracy (Equation (9)).
In addition, the AUC (Area Under the Curve) index was evaluated. The curve in question is the ROC (Receiver Operating Characteristic), which plots the true-positive rate against the false-positive rate for different classification threshold values. The closer the AUC is to 1, the better the model distinguishes between the classes; an AUC of 0.5 indicates performance similar to chance.
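The AUC computation can be sketched with scikit-learn (the labels and decision scores below are hypothetical, standing in for the test-set outputs of the classifier):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical true labels (0 = comfort, 1 = discomfort) and classifier scores
y_true = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.1, 0.3, 0.4, 0.35, 0.8, 0.9])

# AUC summarizes the ROC curve: 1.0 = perfect separation, 0.5 = chance level
auc = roc_auc_score(y_true, scores)
```

Here one positive example (score 0.35) ranks below one negative example (score 0.4), so the AUC falls just short of a perfect 1.0.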