Open Access Article
Machine Learning-Based Classification of Albanian Wines by Grape Variety, Using Phenolic Compound Dataset
by Ardiana Topi, Agim Kasaj, Daniel Hudhra, Hasim Kelebek, Gamze Guclu, Serkan Selli and Dritan Topi
Analytica 2025, 6(4), 43; https://doi.org/10.3390/analytica6040043 (registering DOI) - 24 Oct 2025
Abstract
Wine phenolics serve as robust chemical signatures correlated with grape variety, processing, and regional identity. This study explores the potential of machine learning algorithms, combined with the phenolic profiles of Albanian wines, to classify the wines by grape variety. Geographic origin analysis was conducted as a preliminary exploration. The phenolic compound dataset included white and red wines from the 2017 to 2021 vintages. Five supervised algorithms were applied (Support Vector Machine (SVM), Random Forest, XGBoost, Logistic Regression, and K-Nearest Neighbors), and high classification accuracy was achieved, with SVM reaching 100% under Leave-One-Out Cross-Validation (LOOCV). To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) and stratified cross-validation were applied. Random Forest feature importance consistently highlighted trans-fertaric acid and procyanidin B3 as the dominant discriminants. Parallel coordinates plots showed clear varietal patterns driven by phenolic differences, while principal component analysis (PCA) and hierarchical clustering confirmed unsupervised groupings consistent with wine type and maceration level. A permutation test (1000 iterations) confirmed that model performance was not due to chance. These findings show that a small set of phenolic markers can provide high classification accuracy, supporting chemically based wine authentication. Although the dataset is relatively small, thorough cross-validation, non-redundant modeling, and chemical interpretability provide a solid foundation for scalable methods. Future work will expand the dataset and explore sensor-based phenolic measurement to enable rapid wine authentication.
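The two validation checks named in the abstract, LOOCV accuracy and a label-permutation test, can be illustrated with a minimal, dependency-free sketch. It assumes a plain k-nearest-neighbours classifier (one of the five algorithms listed, not the SVM that reached 100%), Euclidean distance on unscaled features, and a small hypothetical two-variety dataset. The function names, the toy data, and the add-one p-value convention are illustrative choices, not the authors' code.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training samples."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def loocv_accuracy(X, y, k=3):
    """Leave-one-out CV: each sample is predicted by a model built on all the others."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)

def permutation_test(X, y, n_permutations=1000, k=3, seed=0):
    """Compare the observed LOOCV accuracy with accuracies obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = loocv_accuracy(X, y, k)
    count = 0
    for _ in range(n_permutations):
        shuffled = y[:]
        rng.shuffle(shuffled)
        if loocv_accuracy(X, shuffled, k) >= observed:
            count += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical toy data: two made-up phenolic concentrations per wine.
X = [
    [12.0, 3.1], [11.5, 2.9], [12.4, 3.3], [11.8, 3.0], [12.1, 3.2], [11.9, 2.8],
    [4.2, 8.7], [4.5, 9.1], [3.9, 8.5], [4.1, 8.9], [4.4, 9.0], [4.0, 8.6],
]
y = ["Variety_A"] * 6 + ["Variety_B"] * 6

acc, p = permutation_test(X, y, n_permutations=1000)
print(f"LOOCV accuracy = {acc:.2f}, permutation p-value = {p:.4f}")
```

A small p-value means that shuffled labels almost never reach the accuracy obtained with the true labels, which is the sense in which a permutation test shows that performance is non-random.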
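SMOTE balances classes by synthesising new minority-class points instead of duplicating existing ones. The sketch below is a simplified, dependency-free illustration that assumes a random choice of base sample and a uniform interpolation factor; the `smote` function and the toy data are hypothetical. In practice the `SMOTE` class from the imbalanced-learn package is the usual tool, and oversampling is normally fitted on the training folds only, so that synthetic points never leak into the validation samples.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples from a list of minority-class feature vectors.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (math.dist(base, other), j) for j, other in enumerate(minority) if j != i
        )[:k]
        _, j = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, minority[j])])
    return synthetic

# Hypothetical minority class (e.g. an under-represented grape variety), two made-up features.
minority = [[4.2, 8.7], [4.5, 9.1], [3.9, 8.5]]
new_samples = smote(minority, n_new=4, k=2)
print(new_samples)
```

Because each synthetic point is interpolated between two real minority samples, it stays inside the region already occupied by that class rather than introducing arbitrary new values.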