MDPI - Publisher of Open Access Journals

14 pages, 566 KB

Open Access Article

Compositional and Bioactive Differentiation of Opuntia spp. Fruit Varieties by PCA and LDA

by Liliana Espírito Santo, Cláudia S. G. P. Pereira, Anabela S. G. Costa, Agostinho Almeida, João C. M. Barreira, Maria Beatriz P. P. Oliveira and Ana F. Vinha

Foods 2025, 14(18), 3170; https://doi.org/10.3390/foods14183170 - 11 Sep 2025

Abstract

The nutritional, mineral, and bioactive profiles of four Opuntia fruit varieties, namely Opuntia robusta red variety (OR-RV) and three Opuntia ficus-indica varieties (red, yellow, and green: OFI-RV, OFI-YV, and OFI-GV, respectively), were characterized to assess their compositional diversity and potential discriminant markers. Standard analytical procedures were applied to determine proximate composition, individual sugars, fibre content, mineral concentration, and bioactive compounds, followed by antioxidant activity assays. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) were used to explore multivariate patterns and identify variables with the greatest discriminatory power. Results revealed significant inter-varietal differences across all measured parameters (p < 0.05). OR-RV displayed the highest non-fibre carbohydrate, protein, copper, and ascorbic acid contents, as well as superior antioxidant activity. OFI-GV stood out for its high soluble and insoluble fibre, magnesium, and strontium levels, while OFI-YV was characterized by elevated sodium and calcium, and OFI-RV by increased protein and glucose contents. LDA identified ascorbic acid, protein, and five mineral elements (Sr, Zn, Cu, Mn, B) as key discriminant variables, achieving 100% classification accuracy. These findings highlight compositional diversity among Opuntia varieties and support their differentiated use in food and health applications.
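
The abstract reports significant inter-varietal differences (p < 0.05) without naming the test in this summary. Assuming a one-way ANOVA across varieties (a common choice, not confirmed by the abstract), the F statistic compares between-variety variation with within-variety variation. The sketch below uses made-up replicate values; the p-value would then come from the F distribution with (k - 1, n - k) degrees of freedom, for example via `scipy.stats.f_oneway`.

```python
from statistics import mean

def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (e.g. one list per variety)."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical triplicate measurements for four varieties (not values from the study).
f_stat = one_way_anova_f([[1.0, 1.2, 1.1], [2.0, 2.1, 1.9], [3.0, 3.2, 3.1], [1.5, 1.4, 1.6]])
```

A large F only says that at least one variety differs; a post hoc comparison is needed to say which.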

(This article belongs to the Special Issue Advances in Fruit and Vegetable Quality, Bioactive Compounds, and Nutritional Value: 3rd Edition)

26 pages, 1607 KB

Open Access Feature Paper Article

Analyzing Performance of Data Preprocessing Techniques on CPUs vs. GPUs with and Without the MapReduce Environment

by Sikha S. Bagui, Colin Eller, Rianna Armour, Shivani Singh, Subhash C. Bagui and Dustin Mink

Electronics 2025, 14(18), 3597; https://doi.org/10.3390/electronics14183597 - 10 Sep 2025

Abstract

Data preprocessing is usually necessary before running most machine learning classifiers. This work compares three preprocessing techniques: minimal preprocessing, Principal Components Analysis (PCA), and Linear Discriminant Analysis (LDA). The effectiveness of these three preprocessing techniques is evaluated with a Support Vector Machine (SVM) classifier, using statistical metrics such as accuracy, precision, recall, the F1 measure, and AUROC. The preprocessing times and the classifier run times are also compared across the three differently preprocessed datasets. Finally, performance timings on CPUs vs. GPUs, with and without the MapReduce environment, are compared. Two newly created Zeek Connection Log datasets, UWF-ZeekData22 and UWF-ZeekDataFall22, collected using the Security Onion 2 network security monitor and labeled using the MITRE ATT&CK framework, are used for this work. Results from this work show that binomial LDA, on average, performs the best in terms of statistical measures as well as timings using GPUs or MapReduce GPUs.
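
The abstract compares timings with and without MapReduce but does not describe how PCA is distributed. One common pattern, shown below as a pure-Python sketch rather than the authors' Hadoop or Spark code, has each mapper emit additive partial sums for its chunk of rows; a reducer adds them, and the covariance matrix that PCA eigen-decomposes is formed once at the end.

```python
from functools import reduce

def map_chunk(rows):
    """Map step: partial sums for one chunk of rows (each row is a list of floats)."""
    d = len(rows[0])
    count = len(rows)
    sums = [sum(r[j] for r in rows) for j in range(d)]
    cross = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    return count, sums, cross

def reduce_stats(a, b):
    """Reduce step: the partial sums are additive, so two chunks combine by addition."""
    count = a[0] + b[0]
    sums = [x + y for x, y in zip(a[1], b[1])]
    cross = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return count, sums, cross

def covariance(stats):
    """Sample covariance matrix from the combined partial sums."""
    count, sums, cross = stats
    mu = [s / count for s in sums]
    d = len(mu)
    return [[(cross[i][j] - count * mu[i] * mu[j]) / (count - 1) for j in range(d)]
            for i in range(d)]

# Hypothetical two-feature data split across two "workers".
chunks = [[[1.0, 2.0], [2.0, 4.0]], [[3.0, 6.0], [4.0, 8.0]]]
cov = covariance(reduce(reduce_stats, map(map_chunk, chunks)))
```

The sum-of-products form is compact but can lose precision when feature means are large relative to their spread; centring the data first, or merging per-chunk means and variances pairwise, is more robust.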

(This article belongs to the Special Issue Hardware Acceleration for Machine Learning)

14 pages, 2649 KB

Open Access Article

The Classification of Synthetic- and Petroleum-Based Hydrocarbon Fluids Using Handheld Raman Spectroscopy

by Javier E. Hodges, Kailee Marchand, Geraldine Monjardez and Jorn Chi-Chung Yu

Chemosensors 2025, 13(9), 327; https://doi.org/10.3390/chemosensors13090327 - 2 Sep 2025

Abstract

Hydrocarbon fluids have a widespread presence in modern society due to their role in the global energy and fuel supply. The ability to distinguish between hydrocarbon fluids from different manufacturing processes is essential in industrial and government settings. Currently, performing such analyses is expensive and time-consuming, as standard practice involves sending samples to a laboratory for gas chromatography-mass spectrometry (GC-MS) analysis. The inherent limitations of traditional separation techniques often make them unsuitable for the demands of real-time process monitoring and control. This work proposes the use of handheld Raman spectroscopy for rapid classification of petroleum- and synthetic-based hydrocarbon fluids. A total of 600 Raman spectra were collected from six different hydraulic fluids and analyzed. Preliminary visual observations revealed reproducible spectral differences between various types of hydraulic fluids. Principal component analysis (PCA) and linear discriminant analysis (LDA) were used to investigate the data further. The findings indicate that handheld Raman spectrometers are capable of detecting chemical features of hydrocarbon fluids, supporting the classification of their formulations.
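
A PCA-LDA pipeline of the kind named in the abstract can be sketched in NumPy as follows. The synthetic spectra, the five retained components, and the 60/30 train/test split are assumptions for illustration only; the abstract does not give the authors' settings. PCA is fitted on the training spectra alone, so test spectra are projected onto axes they did not influence.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and the leading principal axes (as rows) of X."""
    mu = X.mean(axis=0)
    # SVD of the mean-centred data: the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components]

def lda_fit(Z, y):
    """Fit a linear discriminant (shared, pooled covariance) on PCA scores Z."""
    classes = np.unique(y)
    means = np.array([Z[y == c].mean(axis=0) for c in classes])
    resid = np.vstack([Z[y == c] - m for c, m in zip(classes, means)])
    pooled_cov = resid.T @ resid / (len(Z) - len(classes))
    log_priors = np.log([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.inv(pooled_cov), log_priors

def lda_predict(model, Z):
    """Assign each row of Z to the class with the largest linear discriminant score."""
    classes, means, cov_inv, log_priors = model
    # delta_c(z) = z' S^-1 m_c - 0.5 m_c' S^-1 m_c + log(prior_c)
    scores = (Z @ cov_inv @ means.T
              - 0.5 * np.sum((means @ cov_inv) * means, axis=1)
              + log_priors)
    return classes[np.argmax(scores, axis=1)]

# Hypothetical "spectra": three fluid types, each a Gaussian band at a different position.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 200)
centres = [0.3, 0.5, 0.7]
X = np.vstack([np.exp(-((grid - c) / 0.05) ** 2) + 0.05 * rng.standard_normal((30, 200))
               for c in centres])
y = np.repeat([0, 1, 2], 30)

idx = rng.permutation(len(y))
train, test = idx[:60], idx[60:]
mu, axes = pca_fit(X[train], n_components=5)       # PCA fitted on training spectra only
model = lda_fit((X[train] - mu) @ axes.T, y[train])
pred = lda_predict(model, (X[test] - mu) @ axes.T)
accuracy = np.mean(pred == y[test])
```

In scikit-learn the same structure is usually written as `make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis())`, which also keeps the PCA fit inside each cross-validation fold.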

(This article belongs to the Special Issue Chemical Sensing and Analytical Methods for Forensic Applications)

28 pages, 40313 KB

Open Access Article

Colorectal Cancer Detection Through Sweat Volatilome Using an Electronic Nose System and GC-MS Analysis

by Cristhian Manuel Durán Acevedo, Jeniffer Katerine Carrillo Gómez, Gustavo Adolfo Bautista Gómez, José Luis Carrero Carrero and Rogelio Flores Ramírez

Cancers 2025, 17(17), 2742; https://doi.org/10.3390/cancers17172742 - 23 Aug 2025

Abstract

Background: Colorectal cancer (CRC) remains one of the leading causes of cancer-related mortality worldwide, emphasizing the urgent need for early, non-invasive, and accessible diagnostic tools. This study aimed to evaluate the effectiveness of a microelectromechanical systems (MEMS)-based electronic nose (E-nose) in combination with gas chromatography–mass spectrometry (GC-MS) for CRC detection through sweat volatile organic compounds (VOCs). Methods: A total of 136 sweat samples were collected from 68 volunteer participants. Samples were processed using solid-phase microextraction (SPME) and analyzed by GC-MS, while a custom-designed E-nose system comprising 14 gas sensors captured real-time VOC profiles. Data were analyzed using multivariate statistical techniques, including PCA and PLS-DA, and classified with machine learning algorithms (LDA, LR, SVM, k-NN). Results: GC-MS analysis revealed statistically significant differences between CRC patients and healthy controls (COs). Cross-validation showed that the highest classification accuracy for GC-MS data was 81% with the k-NN classifier, whereas E-nose data achieved up to 97% accuracy using the LDA classifier. Conclusions: Sweat volatilome analysis, supported by advanced data processing and complementary use of E-nose technology and GC-MS, demonstrates strong potential as a reliable, non-invasive approach for early CRC detection.
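
The abstract reports cross-validated accuracies for k-NN and LDA. The sketch below shows k-NN with leave-one-out cross-validation on two hypothetical sensor features; k = 3 and Euclidean distance are assumptions. Because 136 samples came from 68 participants, a split that keeps each participant's samples together avoids scoring a sample against a near-duplicate from the same person; the abstract does not say which scheme was used.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training samples nearest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loocv_accuracy(X, y, k=3):
    """Leave-one-out CV: predict each sample from all the others, return the hit rate."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)

# Two hypothetical E-nose features per sample; labels: control (CO) vs cancer (CRC).
X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15], [0.90, 0.80], [0.80, 0.90], [0.85, 0.85]]
y = ["CO", "CO", "CO", "CRC", "CRC", "CRC"]
acc = loocv_accuracy(X, y, k=3)
```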

(This article belongs to the Section Methods and Technologies Development)

20 pages, 3657 KB

Open Access Article

Bioaccumulation and Tolerance of Metals in Floristic Species of the High Andean Wetlands of the Ichubamba Yasepan Protected Area: Identification of Groups and Discriminant Markers

by Diego Francisco Cushquicullma-Colcha, María Verónica González-Cabrera, Cristian Santiago Tapia-Ramírez, Marcela Yolanda Brito-Mancero, Edmundo Danilo Guilcapi-Pacheco, Guicela Margoth Ati-Cutiupala, Pedro Vicente Vaca-Cárdenas, Eduardo Antonio Muñoz-Jácome and Maritza Lucía Vaca-Cárdenas

Sustainability 2025, 17(15), 6805; https://doi.org/10.3390/su17156805 - 26 Jul 2025

Abstract

The Ichubamba Yasepan wetlands, in the Andean páramos of Ecuador, suffer heavy metal contamination due to anthropogenic activities and volcanic ash from Sangay, impacting biodiversity and ecosystem services. This quasi-experimental study evaluated the bioaccumulation and tolerance of metals in high Andean species through stratified random sampling and linear transects in two altitudinal ranges. Concentrations of Cr, Pb, Hg, As, and Fe in water and the tissues of eight dominant plant species were analyzed using atomic absorption spectrophotometry, calculating bioaccumulation indices (BAIs) and applying principal component analysis (PCA), clustering, and linear discriminant analysis (LDA). Twenty-five species from 14 families were identified, predominantly Poaceae and Cyperaceae, with Calamagrostis intermedia as the most relevant (IVI = 12.74). The water exceeded regulatory limits for As, Cr, Fe, and Pb, indicating severe contamination. Carex bonplandii showed a high BAI for Cr (47.8), Taraxacum officinale and Plantago australis for Pb, and Lachemilla orbiculata for Hg, while Fe was widely accumulated. The LDA highlighted differences based on As and Pb, suggesting physiological adaptations. Pollution threatens biodiversity and human health, but C. bonplandii and L. orbiculata have phytoremediation potential.
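
The abstract does not define its bioaccumulation index (BAI) in this summary. A common form, assumed here, is the ratio of a metal's concentration in plant tissue to its concentration in the surrounding medium (water in this sketch), with values above 1 indicating accumulation. Species names and concentrations below are placeholders, not data from the study.

```python
def bioaccumulation_index(c_tissue, c_medium):
    """Assumed BAI: metal concentration in plant tissue divided by that in the medium."""
    if c_medium <= 0:
        raise ValueError("medium concentration must be positive")
    return c_tissue / c_medium

def rank_accumulators(tissue, medium, metal):
    """Rank species by BAI for one metal, highest first.

    tissue: {species: {metal: concentration}}, medium: {metal: concentration}
    """
    scores = {sp: bioaccumulation_index(conc[metal], medium[metal])
              for sp, conc in tissue.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up concentrations for two unnamed species (hypothetical values).
tissue = {"species_A": {"Cr": 1.2, "Pb": 0.3}, "species_B": {"Cr": 0.6, "Pb": 0.9}}
water = {"Cr": 0.05, "Pb": 0.1}
ranking = rank_accumulators(tissue, water, "Cr")
```

Because tissue concentrations are usually reported per kg and water concentrations per L, the ratio carries units of L/kg unless both are expressed on the same basis.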

(This article belongs to the Special Issue Environmental Protection, Biodiversity Conservation and Sustainable Development)

16 pages, 1808 KB

Open Access Article

Chemometric Classification of Feta Cheese Authenticity via ATR-FTIR Spectroscopy

by Lamprini Dimitriou, Michalis Koureas, Christos S. Pappas, Athanasios Manouras, Dimitrios Kantas and Eleni Malissiova

Appl. Sci. 2025, 15(15), 8272; https://doi.org/10.3390/app15158272 - 25 Jul 2025

Abstract

The authenticity of Protected Designation of Origin (PDO) Feta cheese is critical for consumer confidence and market integrity, particularly in light of widespread concerns over economically motivated adulteration. This study evaluated the potential of Attenuated Total Reflectance–Fourier Transform Infrared (ATR-FTIR) spectroscopy combined with chemometric modeling to differentiate authentic Feta from non-Feta white brined cheeses. A total of 90 cheese samples, consisting of verified Feta and cow milk cheeses, were analyzed in both freeze-dried and fresh forms. Spectral data from raw, first derivative, and second derivative spectra were analyzed using principal component analysis–linear discriminant analysis (PCA-LDA) and Partial Least Squares Discriminant Analysis (PLS-DA) to distinguish authentic Feta from non-Feta cheese samples. Derivative processing significantly improved classification accuracy. All classification models performed relatively well, but the PLS-DA model applied to second derivative spectra of freeze-dried samples achieved the best results, with 95.8% accuracy, 100% sensitivity, and 90.9% specificity. The most consistently highlighted discriminatory regions across models included ~2920 cm⁻¹ (C–H stretching in lipids), ~1650 cm⁻¹ (Amide I band, corresponding to C=O stretching in proteins), and the 1300–900 cm⁻¹ range, which is associated with carbohydrate-related bands. These findings support ATR-FTIR spectroscopy as a rapid, non-destructive tool for routine Feta authentication. The approach offers promise for enhancing traceability and quality assurance in high-value dairy products.
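
Accuracy, sensitivity, and specificity follow directly from a binary confusion matrix. Treating Feta as the positive class (an assumption), the hypothetical counts below, with 13 of 13 Feta and 10 of 11 non-Feta test samples classified correctly, are one combination that reproduces the reported 95.8%, 100%, and 90.9%; the abstract does not give the actual counts.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)    # true positive rate: Feta recognised as Feta
    specificity = tn / (tn + fp)    # true negative rate: non-Feta rejected
    return accuracy, sensitivity, specificity

# Hypothetical test-set counts (not reported in the abstract).
acc, sens, spec = binary_metrics(tp=13, fn=0, tn=10, fp=1)
```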

18 pages, 2273 KB

Open Access Article

Integrating Near-Infrared Spectroscopy and Proteomics for Semen Quality Biosensing

by Notsile H. Dlamini, Mariana Santos-Rivera, Carrie K. Vance-Kouba, Olga Pechanova, Tibor Pechan and Jean M. Feugang

Biosensors 2025, 15(7), 456; https://doi.org/10.3390/bios15070456 - 15 Jul 2025

Abstract

Artificial insemination (AI) is a key breeding technique in the swine industry; however, the lack of reliable biomarkers for semen quality limits its effectiveness. Seminal plasma (SP) contains extracellular vesicles (EVs) that present a promising, non-invasive biomarker for semen quality. This study explores the biochemical profiles of boar SP to assess semen quality through near-infrared spectroscopy (NIRS) and proteomics of SP-EVs. Fresh semen from mature Duroc boars was evaluated based on sperm motility, classifying samples as Passed (≥70%) or Failed (<70%). NIRS analysis identified distinct variations in water structures at specific wavelengths (C1, C5, C12 nm), achieving high accuracy (92.2%), sensitivity (94.2%), and specificity (90.3%) through PCA-LDA. Proteomic analysis of SP-EVs revealed 218 proteins in Passed and 238 in Failed samples. Nexin-1 and seminal plasma protein pB1 were upregulated in Passed samples, while LGALS3BP was downregulated. The functional analysis highlighted pathways associated with single fertilization, filament organization, and glutathione metabolism in Passed samples. Integrating NIRS with SP-EV proteomics provides a robust approach to non-invasive assessment of semen quality. These findings suggest that SP-EVs could serve as effective biosensors for rapid semen quality assessment, enabling better boar semen selection and enhancing AI practices in swine breeding.

(This article belongs to the Section Optical and Photonic Biosensors)

20 pages, 5652 KB

Open Access Article

Capacitive Sensing of Solid Debris in Used Lubricant of Transmission System: Multivariate Statistics Classification Approach

by Surapol Raadnui and Sontinan Intasonti

Lubricants 2025, 13(7), 304; https://doi.org/10.3390/lubricants13070304 - 14 Jul 2025

Abstract

The quantification of solid debris in used lubricating oil is essential for assessing transmission system wear and optimizing maintenance strategies. This study introduces a low-cost capacitive proximity sensor for monitoring total solid particle contamination in lubricants, with a focus on ferrous (Fe), non-ferrous (Al), and non-metallic (SiO₂) debris. Controlled tests were performed using five mixing ratios of large-to-small particles (100:0, 75:25, 50:50, 25:75, and 0:100) at a fixed debris mass of 0.5 g per 25 mL of SAE 85W-140 automotive gear oil. Cubic regression analysis yielded high predictive accuracy, with average R² values of 0.994 for Fe, 0.943 for Al, and 0.992 for SiO₂. Dimensionality reduction by Principal Component Analysis (PCA), combined with Linear Discriminant Analysis (LDA), effectively classifies debris types and enhances interpretability. These results confirm the potential of capacitive sensing, supported by statistical modeling, as a non-invasive, cost-effective alternative to traditional techniques for the offline classification and monitoring of wear debris in transmission systems.
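
A cubic calibration and its R² can be computed with NumPy as below. The x values are the five large-particle percentages listed in the abstract; the sensor readings are invented. With only a few distinct x values, a four-coefficient cubic leaves few residual degrees of freedom, so R² is best read alongside replicate scatter or cross-validation.

```python
import numpy as np

def cubic_fit_r2(x, y):
    """Least-squares cubic fit y ~ c3*x**3 + c2*x**2 + c1*x + c0; returns (coeffs, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, deg=3)          # highest power first
    y_hat = np.polyval(coeffs, x)
    ss_res = np.sum((y - y_hat) ** 2)         # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares about the mean
    return coeffs, 1.0 - ss_res / ss_tot

# Large-particle share (%) for the five mixing ratios; readings are hypothetical.
large_pct = [100, 75, 50, 25, 0]
readings = [12.0, 10.9, 10.1, 9.6, 9.4]
coeffs, r2 = cubic_fit_r2(large_pct, readings)
```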

(This article belongs to the Special Issue Tribological Research on Transmission Systems)

14 pages, 2434 KB

Open Access Article

Surface-Enhanced Raman Spectroscopy (SERS) Method for Rapid Detection of Neomycin and Chloramphenicol Residues in Chicken Meat

by Yan Wu, Junshi Huang, Ni Tong, Qi Chen, Fang Peng, Muhua Liu, Jinhui Zhao and Shuanggen Huang

Sensors 2025, 25(13), 3920; https://doi.org/10.3390/s25133920 - 24 Jun 2025

Abstract

In the process of chicken breeding, antibiotics have been widely abused. Antibiotics can enter the human body through chicken meat, posing a possible risk to human health. In this paper, principal component analysis (PCA)–linear discriminant analysis (LDA) was chosen to classify neomycin (NEO) and chloramphenicol (CAP) residues in chicken meat. A total of 400 chicken meat samples were used for the classification, of which 268 and 132 samples were used as the training and test sets, respectively. The experimental conditions for SERS spectrum collection were optimized, including the choice of gold colloid and surface-active agent and the adsorption time. The optimal measurement conditions for the SERS spectra were an adsorption time of 4 min and the use of a 14th-generation gold colloid as the enhancing substrate without a surfactant. For three different spectral preprocessing methods, the classification accuracies of the PCA-LDA models on the test sets were 78.79% for baseline correction, 84.85% for the second derivative, and 100% for the second derivative combined with baseline correction. LDA was used to establish a classification model for the rapid determination of NEO and CAP residues in chicken meat by SERS. The results showed that the characteristic peaks at 546 and 666 cm⁻¹ could be used to distinguish NEO and CAP residues in chicken meat. The PCA-LDA classification model achieved higher classification accuracy, sensitivity, and specificity when the second derivative combined with baseline correction was used as the spectral preprocessing method, showing that the SERS method based on PCA-LDA could classify NEO and CAP residues in chicken meat quickly and effectively. It also verified the feasibility of PCA-LDA for classifying chicken meat samples into four types. This research method could provide a reference for the measurement of such antibiotic residues in chicken meat in the future.
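
The preprocessing the abstract found most effective, second derivative combined with baseline correction, can be sketched as follows. The baseline model here is an iterative polynomial fit that repeatedly clips the spectrum down to the current fit so that peaks stop pulling the baseline upward (the "modified polyfit" idea); the polynomial degree, the iteration count, and the order of the two steps are assumptions, since the abstract does not specify the authors' baseline method.

```python
import numpy as np

def polynomial_baseline(x, y, degree=3, n_iter=50):
    """Iterative polynomial baseline: fit, clip the signal down to the fit, refit."""
    y_work = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        fit = np.polyval(np.polyfit(x, y_work, degree), x)
        y_work = np.minimum(y_work, fit)      # points above the fit (peaks) are flattened
    return np.polyval(np.polyfit(x, y_work, degree), x)

def second_derivative(y, step):
    """Central finite-difference second derivative; output is two points shorter."""
    return np.diff(y, n=2) / step ** 2

# Hypothetical spectrum: curved background plus one Raman band at x = 0.5.
x = np.linspace(0.0, 1.0, 201)
spectrum = 2.0 + 3.0 * x + 1.5 * x ** 2 + np.exp(-((x - 0.5) / 0.03) ** 2)
corrected = spectrum - polynomial_baseline(x, spectrum, degree=2)
d2 = second_derivative(corrected, step=x[1] - x[0])
```

The second derivative turns each band into a sharp negative minimum at the band centre, which is one reason it helps separate overlapping peaks.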

(This article belongs to the Special Issue Infrared and Raman Spectral Sensing for Food and Industrial Applications)

17 pages, 1610 KB

Open Access Article

Enhancing Coffee Quality and Traceability: Chemometric Modeling for Post-Harvest Processing Classification Using Near-Infrared Spectroscopy

by Mariana Santos-Rivera, Lakshmanan Viswanathan and Faris Sheibani

Spectrosc. J. 2025, 3(2), 20; https://doi.org/10.3390/spectroscj3020020 - 19 Jun 2025

Abstract

Post-harvest processing (PHP) is a key determinant of coffee quality, flavor profile, and market classification, yet verifying PHP claims remains a significant challenge in the specialty coffee industry. This study introduces near-infrared spectroscopy (NIRS) coupled with chemometrics as a rapid, non-destructive approach to classify green coffee beans based on PHP. For the first time, seven distinct PHP categories—Alchemy, Anaerobic Processing (Deep Fermentation), Dry-Hulled, Honey, Natural, Washed, and Wet-Hulled—were discriminated using NIRS, encompassing 20 different processing protocols under varying environmental and fermentation conditions. The NIR spectra (350–2500 nm) of 524 green Arabica coffee samples were analyzed using PCA-LDA models (750–2450 nm), achieving classification accuracies up to 100% for underrepresented categories and strong performance (91–95%) for dominant PHP groups in an independent test set. These results demonstrate that NIRS can detect subtle chemical signatures associated with diverse PHP techniques, offering a scalable tool for quality assurance, fraud prevention, and traceability in global coffee supply chains. While limited sample sizes for some PHP categories may influence model generalization, this study lays the foundation for future work involving broader datasets and integration with digital traceability systems. The approach has direct implications for producers, traders, and certifying bodies seeking reliable, real-time PHP verification.
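
The abstract reports accuracy per PHP category and notes that some categories are small. Per-class accuracy (the recall of each class) can be computed as below; the labels and predictions are hypothetical. The toy example also shows why small categories call for caution: a category with a single test sample can only score 0% or 100%.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred):
    """Fraction of each true class that was predicted correctly (per-class recall)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in totals.items()}

# Hypothetical test labels and predictions for three processing categories.
y_true = ["Washed"] * 4 + ["Natural"] * 2 + ["Honey"]
y_pred = ["Washed", "Washed", "Washed", "Natural", "Natural", "Natural", "Honey"]
scores = per_class_accuracy(y_true, y_pred)
```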

(This article belongs to the Special Issue Feature Papers in Spectroscopy Journal)

39 pages, 1478 KB

Open Access Article

Chemical Profiles of the Volatilome and Fatty Acids of “Suero Costeño” (Fermented Cream)/Raw Milk from Colombia: Promising Criteria for the Autochthonous-Regional Product Identity Designation

by Amner Muñoz-Acevedo, Osnaider J. Castillo, Clara Gutiérrez-Castañeda, Mónica Simanca-Sotelo, Beatriz Álvarez-Badel, Alba Durango-Villadiego, Margarita Arteaga-Márquez, Claudia De Paula, Yenis Pastrana-Puche, Ricardo Andrade-Pizarro, Ilba Burbano-Caicedo and Rubén Godoy

Molecules 2025, 30(12), 2524; https://doi.org/10.3390/molecules30122524 - 9 Jun 2025

Abstract

A traditional dairy product from northern Colombia is suero costeño (SC), typically handmade through artisanal processes involving the natural fermentation of raw cow’s milk (RM); it is characterized by a creamy texture and a distinctive sensory profile, with a sour/salty taste and rancid odor. This study aimed to determine the chemical identity (using GC-FID/MSD) of SC and RM samples (from eight locations in the department of Córdoba-Colombia) by analyzing volatile components (trapped by HS-SPME and SDE) and fatty acid content. Consequently, the most notable results were as follows: (a) myristic (7–12%), stearic (12–17%), oleic (13–23%), and palmitic (21–29%) acids were the most abundant constituents [without significant differences among them (p > 0.05)] in both RM and SC fats; these were also expressed as polyunsaturated (2–5%), monounsaturated (26–36%), saturated (59–69%), omega-9 (19–30%), omega-6 (0.5–1.6%), and omega-3 (0.2–1.2%) fatty acids; (b) differences in the composition (p < 0.05) of the volatile fractions were distinguished between RM and SC samples; likewise, the SC samples differed (from each other) in their volatile composition due to the preparation processes applied (processes with raw milk and natural fermentation had less variability); nonetheless, it was possible to determine the volatilome for the artisanal product; and (c) the major components responsible for the chemical identity of SC were ethyl esters (of linear saturated and unsaturated acids, short/medium chains), aliphatic alcohols (linear/branched, short/long chains), aliphatic aldehydes (long chains, >C₁₄), alkyl methyl ketones (long chains, >C₁₁), sesquiterpenes (caryophyllane/humulane types), monoterpenes (mono/bi-cyclics), short-chain fatty acids, and aromatic alcohol/acid, among others.
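
The grouped totals in the abstract (saturated, monounsaturated, polyunsaturated, and omega-3/6/9) are sums over individual fatty acids. A minimal sketch, assuming profiles keyed by shorthand such as "C18:1n9" (carbon count, number of double bonds, omega position) and ignoring cis/trans distinctions; the percentages are placeholders, not the study's data.

```python
import re

def classify_fatty_acids(profile):
    """Sum a fatty-acid profile (% of total) into SFA/MUFA/PUFA and omega families."""
    totals = {"SFA": 0.0, "MUFA": 0.0, "PUFA": 0.0,
              "omega-3": 0.0, "omega-6": 0.0, "omega-9": 0.0}
    for name, pct in profile.items():
        m = re.fullmatch(r"C(\d+):(\d+)(?:n(\d+))?", name)
        if m is None:
            raise ValueError(f"unrecognised shorthand: {name}")
        double_bonds = int(m.group(2))
        if double_bonds == 0:
            totals["SFA"] += pct       # saturated: no double bonds
        elif double_bonds == 1:
            totals["MUFA"] += pct      # monounsaturated: one double bond
        else:
            totals["PUFA"] += pct      # polyunsaturated: two or more
        if m.group(3) in ("3", "6", "9"):
            totals[f"omega-{m.group(3)}"] += pct
    return totals

# Hypothetical profile: myristic, palmitic, stearic, oleic, linoleic, alpha-linolenic.
profile = {"C14:0": 10.0, "C16:0": 25.0, "C18:0": 15.0,
           "C18:1n9": 20.0, "C18:2n6": 1.0, "C18:3n3": 0.5}
totals = classify_fatty_acids(profile)
```

Note that oleic acid (C18:1n9) counts towards both MUFA and omega-9, so the family totals overlap with the saturation classes rather than adding to them.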

(This article belongs to the Special Issue Research on Bioactive Compounds in Milk)

19 pages, 3372 KB

Open Access Article

iDNS3IP: Identification and Characterization of HCV NS3 Protease Inhibitory Peptides

by Hui-Ju Kao, Tzu-Hsiang Weng, Chia-Hung Chen, Chen-Lin Yu, Yu-Chi Chen, Chen-Chen Huang, Kai-Yao Huang and Shun-Long Weng

Int. J. Mol. Sci. 2025, 26(11), 5356; https://doi.org/10.3390/ijms26115356 - 3 Jun 2025

Abstract

Hepatitis C virus (HCV) infection remains a significant global health burden, driven by the emergence of drug-resistant strains and the limited efficacy of current antiviral therapies. A promising strategy for therapeutic intervention involves targeting the NS3 protease, a viral enzyme essential for replication. In this study, we present the first computational model specifically designed to identify NS3 protease inhibitory peptides (NS3IPs). Using amino acid composition (AAC) and K-spaced amino acid pair composition (CKSAAP) features, we developed machine learning classifiers based on support vector machine (SVM) and random forest (RF), achieving accuracies of 98.85% and 97.83%, respectively, validated through 5-fold cross-validation and independent testing. To support the accessibility of the strategy, we implemented a web-based tool, iDNS3IP, which enables real-time prediction of NS3IPs. In addition, we performed feature space analyses using PCA, t-SNE, and LDA based on AAindex descriptors. The resulting visualizations showed a distinguishable clustering between NS3IPs and non-inhibitory peptides, suggesting that inhibitory activity may correlate with characteristic physicochemical patterns. This study provides a reliable and interpretable platform to assist in the discovery of therapeutic peptides and supports continued research into peptide-based antiviral strategies for drug-resistant HCV. To enhance its flexibility, the iDNS3IP web tool also incorporates a BLAST-based similarity search function, enabling users to evaluate inhibitory candidates from both predictive and homology-based perspectives.
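
AAC and CKSAAP are simple counting features. The sketch below follows their usual definitions: 20 residue frequencies for AAC and, for each gap k, the 400 frequencies of residue pairs separated by k positions, normalised by the number of such pairs. The maximum gap `k_max` used by iDNS3IP is not stated in the abstract, so the default here is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aac(seq):
    """Amino acid composition: frequency of each of the 20 residues (20 features)."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def cksaap(seq, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max (400 features per k).

    A k-spaced pair is (seq[i], seq[i + k + 1]); counts are normalised by the
    number of such pairs in the sequence, len(seq) - k - 1.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        n_pairs = len(seq) - k - 1
        counts = dict.fromkeys(pairs, 0)
        for i in range(n_pairs):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:          # skip pairs containing non-standard residues
                counts[pair] += 1
        features.extend(counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in pairs)
    return features
```

For example, `cksaap("ACAC", k_max=1)` gives "AC" a frequency of 2/3 and "CA" 1/3 at k = 0, and "AA" and "CC" 1/2 each at k = 1.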

(This article belongs to the Section Molecular Informatics)

18 pages, 4882 KB

Open Access Article

Identifying the Geographical Origin of Wolfberry Using Near-Infrared Spectroscopy and Stacking-Orthogonal Linear Discriminant Analysis

by Shijie Song, Xiaohong Wu, Mingyu Li and Bin Wu

Foods 2025, 14(10), 1684; https://doi.org/10.3390/foods14101684 - 9 May 2025

Abstract

The geographical origin identification of wolfberry is key to ensuring its medicinal and edible quality. To accurately identify the geographical origin, the Stacking-Orthogonal Linear Discriminant Analysis (OLDA) algorithm was proposed by combining OLDA with the Stacking ensemble learning framework. In this study, Savitzky–Golay (SG) + Multiplicative Scatter Correction (MSC) served as the optimal preprocessing method. Four classifiers—K-Nearest Neighbors (KNN), Decision Tree, Support Vector Machine (SVM), and Naive Bayes—were used to explore 12 stacked combinations on 400 samples from five regions in Gansu: Zhangye, Yumen, Wuwei, Baiyin, and Dunhuang. When Principal Component Analysis (PCA), PCA + Linear Discriminant Analysis (LDA), and OLDA were used for feature extraction, Stacking-OLDA achieved the highest average identification accuracy of 99%. The overall accuracy of stacked combinations was generally higher than that of single-classifier models. This study also assessed the role of different classifiers in different combinations, finding that Stacking-OLDA combined with KNN as the meta-classifier achieved the highest accuracy. Experimental results demonstrate that Stacking-OLDA has excellent classification performance, providing an effective approach for the accurate classification of wolfberry origins and offering an innovative solution for quality control in the food industry.
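
Of the SG + MSC preprocessing named in the abstract, the MSC step is sketched below (Savitzky–Golay smoothing itself is available as `scipy.signal.savgol_filter`). Each spectrum is regressed on a reference, by default the mean spectrum, and the fitted offset and scale are removed. Passing the training-set mean as `reference` when correcting test spectra keeps both sets on the same basis; this is an assumption about good practice, not a detail from the abstract.

```python
from statistics import mean

def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum x is regressed on the reference r (default: the mean spectrum),
    x ~ a + b*r, and replaced by (x - a) / b.
    """
    n_points = len(spectra[0])
    if reference is None:
        reference = [mean(s[j] for s in spectra) for j in range(n_points)]
    r_mean = mean(reference)
    r_var = sum((v - r_mean) ** 2 for v in reference)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        # Ordinary least-squares slope and intercept of x against the reference.
        b = sum((rv - r_mean) * (xv - x_mean) for rv, xv in zip(reference, x)) / r_var
        a = x_mean - b * r_mean
        corrected.append([(xv - a) / b for xv in x])
    return corrected
```

A spectrum that is only an offset and rescaled copy of the reference, such as `[2 * v + 1 for v in ref]`, is mapped back onto `ref` exactly, which is the scatter effect MSC is designed to remove.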

(This article belongs to the Section Food Quality and Safety)

12 pages, 2523 KB

Open Access Article

Classifying Raman Spectra of Colon Cells by Principal Component Analysis—Linear Discriminant Analysis and Partial Least Squares—Linear Discriminant Analysis Methods

by Maria Lasalvia, Vito Capozzi and Giuseppe Perna

Appl. Sci. 2025, 15(8), 4193; https://doi.org/10.3390/app15084193 - 10 Apr 2025

Abstract

Colorectal cancer is one of the most commonly diagnosed cancers in developed countries. Although the gold-standard diagnosis technique is the histological analysis of colon biopsies, it is important to investigate different diagnostic tools because the microscope examination of stained tissues provides indications partially depending on the experience of the pathologist. This study reports a Raman-spectroscopy-based analysis of healthy and cancerous colon cells to detect biochemical differences at the subcellular level and discriminate the former from the latter. FHC and CaCo-2 cell lines were used to model healthy and cancerous cells, respectively. The comparison of the Raman spectra measured inside subcellular volumes including the nucleus (nucleus spectra) and excluding it (cytoplasm spectra), as well as principal component analysis and partial least squares analysis of these spectra, suggest that the differences between the spectra of healthy and cancerous cells are very small, and they mainly involve the different relative content of lipids and nucleic acid components. The relative intensity of lipid peaks is higher in the Raman spectra of healthy samples, while nucleic acid peaks show higher relative intensity in the spectra of cancer cells. Linear discriminant analysis of a few principal components and partial least squares components was used to estimate the classification accuracy of a set of Raman spectra measured inside nucleus and cytoplasm. Both methods are able to classify unknown cells with excellent accuracy (100% and 96%, respectively). The findings of this study confirm the general applicability of subcellular Raman analysis in clinical practice for diagnosis of cytological samples.

(This article belongs to the Section Biomedical Engineering)
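The abstract describes classifying spectra by applying linear discriminant analysis to a few principal components. As a minimal sketch of that final step only, the Python code below fits a two-class Fisher LDA on two-dimensional feature vectors. It assumes the PCA scores have already been computed and that the two classes have equal priors; the data points, the two-feature layout, and all function names are hypothetical illustrations, not the study's data or implementation.

```python
# Minimal two-class Fisher LDA on 2-D feature vectors (e.g. scores on the
# first two principal components). Equal class priors are assumed.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(2)]

def scatter(vectors, m):
    # Class scatter matrix: sum of (x - m)(x - m)^T over the class samples.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in vectors:
        d = [x[0] - m[0], x[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def inv2(a):
    # Closed-form inverse of a 2x2 matrix (assumes it is non-singular).
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def fit_fisher_lda(class0, class1):
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Within-class scatter is proportional to the pooled covariance, so it
    # gives the same discriminant direction and the same decision boundary.
    sw_inv = inv2([[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)])
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [dot(sw_inv[0], diff), dot(sw_inv[1], diff)]
    # Threshold halfway between the projected class means.
    c = 0.5 * (dot(w, m0) + dot(w, m1))
    return w, c

def predict(w, c, x):
    # 1 = second class (here: cancer), 0 = first class (here: healthy).
    return 1 if dot(w, x) > c else 0

# Hypothetical PC scores, invented for illustration (not data from the study).
healthy = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1]]
cancer = [[2.0, 1.0], [2.2, 0.8], [1.8, 1.2], [2.1, 0.9]]
w, c = fit_fisher_lda(healthy, cancer)
print(predict(w, c, [0.9, 1.9]), predict(w, c, [2.3, 0.7]))  # prints: 0 1
```

With more than two components, the closed-form 2×2 inverse would be replaced by a general linear solve; the decision rule (project onto the discriminant direction, compare with the midpoint threshold) stays the same.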

20 pages, 3921 KB

Open Access Article

Quinary Classification of Human Gait Phases Using Machine Learning: Investigating the Potential of Different Training Methods and Scaling Techniques

by Amal Mekni, Jyotindra Narayan and Hassène Gritli

Big Data Cogn. Comput. 2025, 9(4), 89; https://doi.org/10.3390/bdcc9040089 - 7 Apr 2025

Cited by 3 | Viewed by 765

Abstract

Walking is a fundamental human activity, and analyzing its complexities is essential for understanding gait abnormalities and musculoskeletal disorders. This article examines the classification of gait phases using machine learning techniques, specifically focusing on dividing these phases into five distinct subphases. The study uses data from 100 individuals obtained from an open-access platform and employs two distinct training methodologies. The first approach adopts stratified random sampling, in which 80% of the data from each subphase are allocated for training and 20% for testing. The second approach involves participant-based splitting: models are trained on data from 80% of the individuals and tested on the remaining 20%. Preprocessing methods such as Min–Max Scaling (MMS), Standard Scaling (SS), and Principal Component Analysis (PCA) were applied to the dataset to optimize the performance of the machine learning models. Several algorithms were implemented, including k-Nearest Neighbors (k-NNs), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB; Gaussian, Bernoulli, and Multinomial), Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis (QDA). The models were evaluated using performance metrics such as the cross-validation score, Mean Squared Error (MSE), Root Mean Squared Error (RMSE), accuracy, and $R^{2}$ score, offering a comprehensive assessment of their effectiveness in classifying gait phases. In the five-subphase analysis, RF again performed strongly, with 94.95% accuracy, an RMSE of 0.4461, and an $R^{2}$ score of 90.09%, demonstrating robust performance across all scaling methods.

(This article belongs to the Special Issue Deep Learning-Based Pose Estimation: Applications in Vision, Robotics, and Beyond)
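The two training methodologies, the two scalers, and the regression-style metrics named in this abstract can be illustrated with a minimal standard-library Python sketch. The field names (`subject`, `phase`, `x`), the toy data, and the helper functions are hypothetical. The scalers follow the usual Min–Max ([0, 1]) and standard (population standard deviation) definitions, and computing RMSE and $R^{2}$ on integer-coded subphase labels is an assumption about how such values can be obtained for a classifier, not a description of the authors' code.

```python
import math
import random
from collections import defaultdict

def stratified_split(samples, train_frac=0.8, seed=0):
    # Approach 1: shuffle within each subphase and send train_frac of it to training.
    rng = random.Random(seed)
    by_phase = defaultdict(list)
    for s in samples:
        by_phase[s["phase"]].append(s)
    train, test = [], []
    for phase in sorted(by_phase):
        group = by_phase[phase][:]
        rng.shuffle(group)
        k = round(train_frac * len(group))
        train.extend(group[:k])
        test.extend(group[k:])
    return train, test

def participant_split(samples, train_frac=0.8, seed=0):
    # Approach 2: split participant IDs, so nobody appears in both sets.
    rng = random.Random(seed)
    ids = sorted({s["subject"] for s in samples})
    rng.shuffle(ids)
    train_ids = set(ids[:round(train_frac * len(ids))])
    train = [s for s in samples if s["subject"] in train_ids]
    test = [s for s in samples if s["subject"] not in train_ids]
    return train, test

def fit_min_max(values):
    # Min-Max scaling to [0, 1]; assumes the values are not all equal.
    lo, hi = min(values), max(values)
    return lambda v: (v - lo) / (hi - lo)

def fit_standard(values):
    # Standard scaling: zero mean, unit population standard deviation.
    mu = sum(values) / len(values)
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return lambda v: (v - mu) / sd

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    mu = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data: 10 participants x 5 subphases x 4 frames, one feature "x".
samples = [{"subject": s, "phase": p, "x": float(s + p) + 0.1 * r}
           for s in range(10) for p in range(5) for r in range(4)]
strat_train, strat_test = stratified_split(samples)
part_train, part_test = participant_split(samples)

# Fit each scaler on the training set only, then apply it to both sets.
to_unit = fit_min_max([s["x"] for s in part_train])
to_z = fit_standard([s["x"] for s in part_train])
train_unit = [to_unit(s["x"]) for s in part_train]
test_z = [to_z(s["x"]) for s in part_test]

print(len(strat_train), len(strat_test), len(part_train), len(part_test))  # 160 40 160 40
```

The participant-based split is generally the stricter test: because no participant contributes frames to both sets, the score reflects generalization to unseen individuals rather than to unseen frames from already-seen individuals.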

Search Results (293)