1. Introduction
Prostate cancer (PCa) is the most frequently diagnosed cancer in men worldwide. In England, it ranks as the most common cancer among men and is the second leading cause of cancer-related deaths [1]. In 2024, prostate cancer is projected to remain the most common cancer diagnosed in men and the second most frequent cause of cancer mortality [2]. PCa diagnosis typically involves a combination of screening, histopathology, and medical imaging techniques.
The advancement of medical imaging techniques over the years has significantly enhanced the quality of PCa diagnosis. Ultrasound (US) and magnetic resonance imaging (MRI) are the primary imaging modalities used in PCa detection. Although MRI offers greater sensitivity than US, it comes at a considerably higher cost and is not suitable for all patients, particularly those with pacemakers, ferromagnetic implants, or claustrophobia. B-mode ultrasound is the fundamental modality for assessing a lesion’s location, size, and shape. However, the identification of lesions in B-mode images depends on echogenicity, meaning that some PCa lesions may appear isoechoic, displaying a brightness similar to that of the surrounding prostate gland [
3]. Another ultrasound technique frequently employed in PCa diagnosis is shear-wave elastography (SWE). This quantitative method evaluates shear-wave velocity through the application of an acoustic radiation force impulse (ARFI) to the tissue, which estimates Young’s modulus of the tissue [
4]. The results are presented as a superimposed color map overlaid on each pixel of the grayscale ultrasound image [
5]. SWE demonstrates higher sensitivity and specificity in detecting PCa compared to other ultrasound modalities [
6]. However, it does face limitations in detection depth, which restricts its ability to measure deeper regions of the prostate [
7]. Additionally, the accuracy of SWE results can be adversely affected by the presence of prostate stones or calcifications [
8,
9].
Recently, artificial intelligence (AI) has been increasingly utilized for texture analysis and the development of machine learning (ML) techniques to enhance diagnostic accuracy. Machine learning is a subset of AI that enables computer systems to learn patterns from data and make predictions or decisions without explicit programming. ML algorithms are trained to differentiate between normal and malignant conditions based on provided data [
10]. In the realm of ultrasound imaging, the primary applications of machine learning include classification and computer-aided diagnosis, regression analysis, and tissue segmentation. Furthermore, ML is also employed in image registration and content retrieval [
11]. By leveraging mathematical models, ML enhances the ability to analyze complex imaging features, improving diagnostic precision and reducing human observer variability. Texture analysis serves as a classification and segmentation tool within machine learning, providing a quantitative assessment of pixel metrics that surpass human visual capabilities [
12]. The process of texture analysis involves several key steps: image acquisition, image segmentation, and feature extraction [
13].
Image acquisition is a critical stage in texture analysis, as it involves selecting the appropriate imaging modality and choosing images based on specific criteria that affect the quality and relevance of the extracted texture features. Image segmentation is the process of identifying the region of interest (ROI) in medical imaging. Selecting the ROI is a vital step that influences the quantitative collection of texture data and the results of machine learning predictions. Therefore, adhering to specific criteria when selecting the ROI for segmentation ensures targeted and meaningful analysis, thereby enhancing the robustness and reliability of the texture features extracted for machine learning applications.
Texture features can be categorized as either semantic or agnostic. Semantic features are linked to morphological aspects such as shape and size, while agnostic features pertain to intensity values, including minimum, maximum, and mean. Agnostic features are further divided into first-order features, such as mean, variance, skewness, kurtosis, and entropy, and second-order features, which include the gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), gray-level size zone matrix (GLSZM), and gray-level distance zone matrix (GLDZM) [
12,
13,
14]. In the field of ultrasound and SWE, the application of texture analysis and machine learning has demonstrated promising results across various examinations. Morphological and first-order texture analysis features have been extracted from B-mode breast images to differentiate between different types of breast lesions effectively [
15].
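To make the distinction concrete, the first-order statistics and two of the GLCM features discussed above can be sketched in plain NumPy. This is a minimal illustration, not the extraction pipeline used in any of the cited studies: the gray-level quantization, the single 0° offset at distance 1, and the biased (population) moment estimators are all simplifying assumptions.

```python
import numpy as np

def first_order_features(roi):
    """First-order (histogram) statistics of an integer grayscale ROI."""
    x = roi.astype(float).ravel()
    mean = x.mean()
    var = x.var()               # population variance
    sd = np.sqrt(var)
    skew = np.mean(((x - mean) / sd) ** 3) if sd > 0 else 0.0
    kurt = np.mean(((x - mean) / sd) ** 4) if sd > 0 else 0.0
    # Shannon entropy of the normalized gray-level histogram
    p = np.bincount(roi.ravel()) / x.size
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))
    return {"mean": mean, "variance": var, "skewness": skew,
            "kurtosis": kurt, "entropy": entropy}

def glcm_contrast_homogeneity(roi, levels=8):
    """Second-order GLCM contrast and homogeneity (0 degrees, distance 1)."""
    q = (roi.astype(float) * levels / (roi.max() + 1)).astype(int)  # quantize
    glcm = np.zeros((levels, levels))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[i, j] += 1         # count horizontal neighbor pairs
    glcm /= glcm.sum()
    r, c = np.indices(glcm.shape)
    contrast = np.sum(glcm * (r - c) ** 2)
    homogeneity = np.sum(glcm / (1.0 + np.abs(r - c)))
    return contrast, homogeneity
```

In practice, radiomics toolkits compute these features over multiple offsets and angles and aggregate them; the sketch keeps a single direction for clarity.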
Quantitative ultrasound spectral analysis of B-mode breast ultrasound images utilized GLCM texture features to distinguish between malignant and normal lesions, yielding statistically significant differences across several spectral parameters [
16]. Xiao et al. (2014) [
17] developed a reconstruction process for SWE ultrasound images of the breast and assessed the texture features extracted from the reconstructed images. Their findings indicated high performance in differentiating between malignant and normal conditions.
Additionally, first-order and second-order texture features were extracted from breast B-mode and SWE ultrasound images, and no statistically significant differences were found among the features [
18]. For thyroid gland assessments, GLCM texture features derived from B-mode ultrasound images were compared with real-time elastography results [
19]. In this context, purified SWE ultrasound ROIs were generated by subtracting shear-wave pixels from the B-mode thyroid gland images, facilitating enhanced extraction of GLCM texture features. The results showed pronounced efficacy of the purified SWE images in distinguishing malignant from normal lesions [
20]. Machine learning models, including logistic regression, naive Bayes, quadratic discriminant analysis, and support vector machines (SVM), have also been employed to differentiate between renal cell carcinoma and angiomyolipoma based on ultrasound shear-wave velocity [
21]. GLCM texture features from ultrasound images of salivary glands were evaluated using machine learning models such as K-nearest neighbors (KNN), naive Bayes, artificial neural networks (ANN), and SVM to categorize malignant and normal conditions [
22]. Prostate cancer prediction with machine learning models in ultrasound and SWE has been performed using elasticity values measured in kilopascals (kPa) as the extracted features [
23]. Wildeboer et al. (2020) [
24] harnessed machine learning models utilizing radiomics features from ultrasound B-mode, SWE, and dynamic contrast-enhanced ultrasound to assess machine learning’s potential in this domain. In the study of Wang et al. (2022) [
25], machine learning models were evaluated based on radiomics features extracted from transrectal ultrasound video clips of prostate cancer.
B-mode ultrasound and SWE are commonly used imaging modalities for PCa detection, but they have notable limitations. SWE, for instance, is influenced by factors such as prostate gland enlargement, lesion depth, and machine dependency, which can affect its diagnostic performance. Additionally, conventional imaging may not fully capture the textural characteristics that differentiate malignant from normal prostate tissue. To address these challenges, this study evaluates quantitative texture features of normal and malignant prostate tissue identified on B-mode ultrasound and SWE imaging, together with reconstructed images. By extracting these texture features, we develop and assess machine learning models to predict and classify normal versus malignant prostate tissue, with the goal of enhancing non-invasive diagnostic accuracy.
3. Results
This study involved 62 patients diagnosed with prostate cancer, from whom six ROIs were extracted from both normal and malignant prostate tissue. This yielded 50 images representing normal tissue and another 50 representing malignant tissue. The general characteristics of the patient cohort are detailed in
Table 3. In the analysis of texture features, statistical evaluation via the t-test revealed no significant differences between normal and malignant tissues in the Gray ROI. However, significant differences were observed in the SWE ROIs, with 17, 27, 41, 26, and 37 features demonstrating statistically significant differences between normal and malignant cases for the original SWE ROI, PSWE ROI, GPSWE ROI, RI ROI, and GRRI ROI, respectively.
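A per-feature comparison of this kind can be sketched as follows. The feature matrices, feature names, and the choice of Welch’s two-sample t-test via SciPy are illustrative assumptions, not a description of the statistical software actually used in the study.

```python
import numpy as np
from scipy import stats

def significant_features(normal, malignant, names, alpha=0.05):
    """Two-sample t-test per feature column.

    normal, malignant: (n_samples, n_features) arrays of texture features.
    Returns (name, p-value) pairs for features with p below alpha.
    """
    hits = []
    for k, name in enumerate(names):
        # Welch's t-test: does not assume equal variances between groups
        t_stat, p_value = stats.ttest_ind(normal[:, k], malignant[:, k],
                                          equal_var=False)
        if p_value < alpha:
            hits.append((name, p_value))
    return hits
```

With 94 features per ROI, a multiple-comparison correction (e.g., Bonferroni or false discovery rate) would normally also be considered.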
We also explored the correlation between prostate-specific antigen (PSA) levels and texture features by employing one-way ANOVA. PSA levels were classified into three distinct clinical categories: normal (≤4 ng/mL), gray zone (4–8 ng/mL), and high risk (≥8 ng/mL) [
38]. The objective of the statistical analysis was to assess whether the distribution of texture feature values displayed significant differences across these PSA classifications when utilizing various imaging modalities.
The results demonstrated that in Gray images, only one feature, minimum intensity, was statistically significant (p = 0.0408). In contrast, original SWE images presented no significant texture features out of the 94 analyzed. Within the PSWE images, two features were statistically significant: Entropy (p = 0.0320) and Low Gray Level Run Emphasis 90 (p = 0.0448). For GPSWE images, one feature, the standard deviation of intensity, exhibited statistical significance (p = 0.0204). Further analysis of RI images revealed five significant features related to PSA levels, including Energy 0 and High Gray Level Zone Emphasis across four angles (0°, 45°, 90°, and 135°), with p-values ranging from 0.0149 to 0.0430. Lastly, GRRI images yielded the most significant findings, particularly the standard deviation of intensity, features 50–53 (High Gray Level Run Emphasis), and features 81–84 (High Gray Level Zone Emphasis), all with p-values below 0.03. These results indicate that specific texture features, especially those associated with gray-level emphasis and run-length patterns, vary significantly with PSA level, and that their relevance depends strongly on the specific image reconstruction technique used.
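The grouping and test described above can be sketched as follows. The `psa_group` helper, its handling of values exactly at the 4 and 8 ng/mL boundaries, and the use of SciPy’s `f_oneway` are hypothetical illustrations, not the study’s actual analysis code.

```python
import numpy as np
from scipy import stats

def psa_group(psa):
    """Assign a PSA value (ng/mL) to one of three clinical categories.

    Boundary values (exactly 4 or 8 ng/mL) are assigned to the lower
    category here; the source does not specify this convention.
    """
    if psa <= 4.0:
        return "normal"
    if psa <= 8.0:
        return "gray_zone"
    return "high_risk"

def anova_by_psa(feature_values, psa_levels):
    """One-way ANOVA of a single texture feature across the PSA groups."""
    groups = {}
    for value, psa in zip(feature_values, psa_levels):
        groups.setdefault(psa_group(psa), []).append(value)
    f_stat, p_value = stats.f_oneway(*groups.values())
    return f_stat, p_value
```

Running this once per extracted feature and per reconstruction yields the per-modality counts of significant features reported above.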
Simultaneously, the relationship between the Gleason Score (GS) and radiomic features was assessed using one-way ANOVA, aiming to identify which features vary significantly across GS categories, stratified into grades 6 to 10. The findings revealed variability in the number and types of significant features contingent on the image reconstruction method employed. For Gray images, no statistically significant features were identified. Conversely, original SWE images showed nine significant features, primarily linked to contrast, entropy, and zone-based metrics. PSWE images exhibited the highest number of significant features, totaling 23, while GPSWE images revealed 21. Both RI and GRRI images identified 19 significant features each (Table S1). Overall, these findings highlight a strong association between GS and a diverse range of radiomic features, particularly those associated with texture complexity and gray-level distribution, with notable disparities across the different imaging modalities.
Tables S2–S4 summarize the features that exhibited significant differences between normal and malignant tissues for the original SWE ROI, pure SWE ROI, and GPSWE ROI, respectively. Each table groups the features by classification category.
Tables S5 and S6 focus on the features that showed a notably high level of significance between normal and malignant tissues in the RI ROI and GRRI ROI, respectively, highlighting in particular the features excluded from the GLSZM classification.
In the context of machine learning model development, the evaluation of model performance is crucial for ensuring reliability and accuracy. The cross-validation error and the results from LASSO regression are illustrated in
Figure 6. This figure depicts the LASSO regression coefficient paths corresponding to the selected features, with feature names associated with non-zero coefficients displayed alongside each coefficient path. The features included in the model were meticulously chosen based on cross-validation techniques aimed at minimizing the mean squared error, thereby identifying the most significant predictors for the model. The specific features selected from each ROI are detailed in
Table 4. Additionally, the evaluation results of the machine learning model are summarized in
Figure 7.
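The selection step described here can be sketched with scikit-learn’s `LassoCV`, which chooses the regularization strength that minimizes the cross-validated mean squared error and zeroes out uninformative coefficients. The synthetic feature matrix and feature names below are illustrative assumptions, not the study’s data or code.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def lasso_select(X, y, feature_names, cv=5):
    """Fit LASSO with a cross-validated penalty and keep the features
    whose coefficients remain non-zero at the optimal alpha."""
    model = LassoCV(cv=cv, random_state=0).fit(X, y)
    selected = [name for name, coef in zip(feature_names, model.coef_)
                if coef != 0.0]
    return model.alpha_, selected
```

The surviving feature names correspond to the non-zero coefficient paths displayed alongside the LASSO plot; any features driven to exactly zero are dropped before model training.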
Table 5 includes key metrics such as sensitivity, specificity, and accuracy. In comparing the various models, confusion matrices illustrate how accurately each model predicts both positive and negative cases of prostate tissue. Furthermore, receiver operating characteristic (ROC) curves provide a visual representation of model performance, where higher AUC values indicate superior discriminatory capability, as shown in
Figure 7.
Original SWE: the original SWE image; PSWE: the purified image of the original SWE; GPSWE: the gray image of the PSWE; RI: the reconstructed image of the original SWE; GRRI: the gray image of the RI.
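The sensitivity, specificity, accuracy, and AUC values reported in these tables can be derived from the confusion matrix and from the rank-sum (Mann-Whitney) formulation of the AUC. The NumPy sketch below is an illustrative assumption about how such values could be computed, not the study’s actual evaluation code.

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels
    (1 = malignant, 0 = normal)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / y_true.size
    return sensitivity, specificity, accuracy

def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) score pairs
    ranked correctly, with ties counted as half."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance-level discrimination, while 1.0 corresponds to perfect separation of malignant from normal cases.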
The predictions were applied to the same data used to extract quantitative features for true positives and true negatives. The performance metrics of the machine learning model in predicting normal and malignant prostate cancer cases for all images are presented in
Table 6, along with the ROC curve shown in
Figure 8.
Table 7 and
Table 8 show the performance metrics of the machine learning model in predicting normal and malignant prostate cancer cases for the true positive/true negative images and the false positive/false negative images, respectively.
4. Discussion
Our primary objective was to create machine learning models utilizing several reconstructed images of SWE from both prostate cancer and normal tissues. These models were designed to accurately predict the classification of normal and malignant tissues within SWE prostate imaging.
In this study, we assessed 94 features extracted from both normal and malignant prostate lesions using B-mode ultrasound and SWE. We successfully reconstructed ROIs from SWE images. The reconstruction of PSWE and RI ROIs was accomplished effectively, as evidenced by the distinct quantitative features obtained from each ROI. The differences in feature values among the original SWE, PSWE, and RI images further confirm the integrity and effectiveness of the reconstruction process. Moreover, the grayscale representations of the GPSWE and GRRI images exhibited clear variations compared to the original B-mode image, validating the successful transformation and extraction of unique quantitative and textural information. These findings underscore the potential of reconstructed ROIs to deliver complementary diagnostic insights that extend beyond traditional B-mode and SWE imaging techniques.
Despite the advanced capabilities of modern B-mode imaging, none of the features extracted from this method showed statistical significance in differentiating normal from malignant prostate lesions in our analysis. This outcome is consistent with [
39], which underscores the inherent limitations of B-mode ultrasound in distinguishing between normal and malignant prostate tissues. B-mode imaging primarily offers anatomical and structural insights, failing to capture the subtle tissue characteristic differences associated with malignancy. Prostate cancer is widely recognized for its heterogeneity, and the overlapping echotexture and grayscale intensity between normal and malignant lesions render differentiation particularly challenging [
40]. Additionally, factors such as glandular distortion, calcifications, and benign prostatic hyperplasia (BPH) further complicate the interpretation of B-mode ultrasound.
The differentiation between normal and malignant lesions is informed by several significant features identified in ultrasound shear-wave imaging. This imaging technique captures variations in tissue stiffness and spatial patterns, proving valuable for the differential diagnosis of lesions. Notably, GLCM features such as “Contrast” and “Homogeneity” are instrumental in illustrating the heterogeneity and uniformity of tissue stiffness. Higher contrast values often indicate malignant regions, thereby enhancing the diagnostic potential of this imaging modality [
18]. Features such as “High Gray Level Zone Emphasis” and “Long Run Emphasis” play a crucial role in identifying extensive zones of high stiffness, which reflect pathological changes associated with malignancy. Intensity-based metrics, such as “Mean Intensity” and “Minimum Intensity”, further contribute to the assessment by quantifying overall stiffness; malignancies generally present with higher mean stiffness than normal tissues. Together, these features leverage the color-coded shear-wave elasticity data to characterize the mechanical properties of prostate tissue, providing robust differentiation between normal and malignant lesions.
In contrast, features extracted from grayscale images provide a significantly larger dataset for analysis, presenting both advantages and challenges in distinguishing between normal and malignant lesions. Grayscale images typically capture more detailed variations in texture, highlighting finer nuances in tissue heterogeneity and intensity distribution. This results in a wider array of features, such as those obtained from GLCM, GLRLM, GLSZM, and pixel intensity metrics, which enhance the ability to differentiate subtle variations between tissue types [
41]. These additional features can bolster the model’s discriminatory power by offering a more comprehensive characterization of tissue stiffness and structure. However, the increased feature set introduces challenges, particularly in the development of machine learning models. With a larger number of features, there is a heightened risk of overfitting, especially when the training data are limited [
42].
It has been observed that the GLSZM features, which are sensitive to heterogeneity, show a lack of significance with the RI and GRRI. This indicates that the texture information essential for distinguishing between normal and malignant tissues may have been lost or diminished during the reconstruction process. Specifically, GLSZM features, which depend on identifying variations in the size and distribution of homogeneous regions, may not effectively capture subtle heterogeneities when the image has been excessively smoothed or homogenized. This absence of significant differentiation suggests that the reconstructed ROI may have become overly uniform or noisy, thereby obscuring the intricate textural patterns often characteristic of malignant tissues [
43]. These patterns, including irregular zone sizes and varying intensities, are vital for differentiating malignant lesions from normal ones. Consequently, the smoothing effects during reconstruction may have impaired the GLSZM’s ability to identify key pathological features, potentially explaining the non-significant findings in the analysis.
The machine learning models generally demonstrate strong performance across the five ROIs, with SVM, KNN, and NB achieving perfect results in the original SWE and PSWE ROIs. This indicates that the features within these regions are linearly separable, enabling the models to fit the data completely. However, this success may also raise concerns regarding overfitting, which should be assessed using an independent test set. This observation can be compared with studies [
23,
25], which typically indicate a more cautious picture when analyzing larger and more diverse patient populations. It is crucial to acknowledge that the number of patient data points is vital for achieving reliable and generalizable model effectiveness. Insufficient sample sizes can result in overly optimistic outcomes that may not be applicable in real-world clinical settings. For example, in our study, the sensitivity and specificity of logistic regression for the original SWE were 0% and 100%, respectively, whereas they were 61.1% and 91.1% in [23]. Conversely, LR struggled in the original SWE, PSWE, and GPSWE ROIs, likely because its linear nature fails to capture the non-linear relationships present in these regions.
The results of the machine learning models reveal interesting performance trends across different image preprocessing techniques and categories of cases (true positive, true negative, false positive, and false negative). Gray Pure SWE and Gray Reconstructed images consistently outperformed other methods, achieving sensitivities, specificities, and accuracies of 71.6–98%, 73.1–96%, and 72.4–97%, respectively. These findings suggest that converting images to grayscale enhances texture analysis by capturing more discriminative features, leading to improved classification of normal and malignant tissues. Conversely, SWE images and reconstructed images in their raw forms demonstrated poor performance, with sensitivities, specificities, and accuracies below 40% in most cases. This discrepancy highlights the importance of preprocessing techniques in enhancing model performance.
When comparing ML and deep learning (DL), the key difference lies in how each approach handles feature extraction and learning from data. ML algorithms, such as SVM, KNN, and RF, depend on manually selected features. This reliance on feature engineering makes ML methods more interpretable and effective for smaller datasets.
In contrast, deep learning, particularly through convolutional neural networks (CNNs), automatically extracts hierarchical features from raw data. This capability often results in superior performance on complex image analysis tasks. However, deep learning requires large datasets and significant computational resources, making it more susceptible to overfitting when working with limited data [
44].
While machine learning remains a viable option in scenarios with constrained datasets, future research may explore hybrid approaches that combine ML feature extraction with DL architectures to enhance both performance and reliability.
In a recent application, deep learning was used to distinguish between prostate cancer and benign prostatic hyperplasia, utilizing a vast number of transrectal ultrasound (TRUS) images. The performance of CNNs in differentiating between benign and malignant prostate cancer was notably high [
45].
When analyzing subsets of the data, such as only true positive/true negative cases or false positive/false negative cases, we observed further disparities. SWE and Pure SWE excelled in identifying false positive and false negative cases, achieving sensitivities and specificities as high as 100%. However, these models failed in the general classification task, with accuracies below 10% for true positive and true negative cases. Gray images, in contrast, performed exceptionally well for true positive/true negative cases but struggled with false positive/false negative cases, where sensitivities and specificities dropped to as low as 18%.
The observed trends might stem from the small dataset size, particularly the limited number of false positive and false negative cases. A small sample size can lead to biased learning and limited generalization capacity for the models, particularly in imbalanced or borderline scenarios. Future studies with larger datasets are needed to validate these findings and improve robustness.
The machine learning classifiers developed in this study show promising results; however, several limitations may affect their overall performance. Firstly, the limited number of cases could impede the models’ ability to generalize effectively. Smaller datasets often result in overfitting, biased outcomes, and reduced stability, highlighting the need for larger and more diverse datasets to strengthen the robustness and reliability of the classifiers. Additionally, the data were obtained from the Aixplorer machine (SuperSonic Imagine, Aix-en-Provence, France), which has limitations concerning heterogeneous lesions [
46]. Furthermore, the dataset is outdated, and newer machines offer superior image quality. Another limitation of this study is that the texture analysis and machine learning models were built from a high number of true positive and true negative images compared with false positive and false negative images. This selective method introduces bias, as the models are trained exclusively on instances with clear classifications. Consequently, they may encounter difficulties in accurately classifying cases that involve false positives or false negatives. This challenge is further intensified by the inherent limitations of SWE ultrasound, where image quality and feature representation may not always be adequate to address ambiguous or borderline cases. As a result, this may hinder the models’ ability to generalize to more complex real-world scenarios. Additionally, the performance of the developed classifiers was not compared with existing state-of-the-art methods, which limits the contextual evaluation of their effectiveness. Furthermore, the models were only tested on a single dataset and were not externally validated on independent datasets. This limits the generalizability of the findings and highlights the need for future studies to evaluate the models on diverse data sources.