Article

Applying Neural Networks in Wineinformatics with the New Computational Wine Wheel

1
Department of Mathematics, University of Central Arkansas, Conway, AR 72034, USA
2
Department of Computer Science and Engineering, University of Central Arkansas, Conway, AR 72034, USA
*
Authors to whom correspondence should be addressed.
Fermentation 2023, 9(7), 629; https://doi.org/10.3390/fermentation9070629
Submission received: 26 May 2023 / Revised: 27 June 2023 / Accepted: 27 June 2023 / Published: 1 July 2023
(This article belongs to the Section Fermentation Process Design)

Abstract

Wineinformatics involves the application of data science techniques to wine-related datasets generated during the grape growing, wine production, and wine evaluation processes. Its aim is to extract valuable insights that can benefit wine producers, distributors, and consumers. This study highlights the potential of neural networks as the most effective black-box classification algorithm in wineinformatics for analyzing wine reviews processed by the Computational Wine Wheel (CWW). Additionally, the paper provides a detailed overview of the enhancements made to the CWW and presents a thorough comparison between the latest version and its predecessors. In comparison to the highest accuracy results obtained in the latest research work utilizing an elite Bordeaux dataset, which achieved approximately 75% accuracy for Robert Parker’s reviews and 78% accuracy for the Wine Spectator’s reviews, the combination of neural networks and CWW3.0 consistently yields improved performance. Specifically, this combination achieves an accuracy of 82% for Robert Parker’s reviews and 86% for the Wine Spectator’s reviews on the elite Bordeaux dataset as well as a newly created dataset that contains more than 10,000 wines. The adoption of machine learning algorithms for wine reviews helps researchers understand more about quality wines by analyzing the end product and deconstructing the sensory attributes of the wine; this process is similar to reverse engineering in the context of wine to study and improve the winemaking techniques employed.

1. Introduction

Data science is a field that uses interdisciplinary study, including computer science, mathematics, business, and more, to discover or extract knowledge in a specific domain with large amounts of structured and/or unstructured data. Winemaking is an ancient technology; over the years, it has continued to thrive and be refined, motivated by the passion of humans. Picking the grapes, processing, fermentation, and aging are four important traditional steps in the winemaking process. Among all the steps, fermentation, which converts sugar to alcohol, is the key step in changing grape juice into wine. Small variations at each stage of the winemaking process can impact the flavor, aroma, and other characteristics of the wine. In the present era, enhancing the fragrance and taste of wine products requires conducting and documenting experiments involving diverse recipes and industrial quality control parameters. These records encompass a wide range of formats, such as statistical, categorical, and human language, in order to capture the comprehensive nature of the research. Machine learning algorithms can use this extensive experimental data, in its different formats, as input to suggest optimal quality-control setups and to provide insights and recommendations for refining quality control processes. Computational models can also be employed to analyze wine reviews of the final products, allowing researchers to comprehend the fundamental components that contribute to the production of high-quality wines. By leveraging these models, scientists can gain insights into the key factors that shape the overall quality of the wine products.
Wineinformatics incorporates data science and wine-related datasets produced from the process of grape growing, winemaking, and wine evaluation to discover useful information for wine producers, distributors, and consumers [1,2]. Professionals generate reports to assess the final product of wines, which may include either physicochemical laboratory data or wine review evaluations. Physicochemical laboratory data typically pertain to the analysis of the wine’s physicochemical composition, including acidity, residual sugar, alcohol content, and other relevant parameters [3], in order to characterize the wine. This type of information is more relevant for the winemaking process and usually involves a smaller dataset (fewer than 200 wines), due to the expense of the experiments involved [3,4,5]. On the other hand, sommeliers, who are experts in wine, create wine reviews that typically cover aspects such as aroma, flavor, tannins, weight, finish, appearance, and the interplay between these sensory experiences [6]. In addition to the detailed sensory analysis, wine reviews typically include a score that summarizes the wine’s overall quality. This score is given on a 100-point scale and reflects the reviewer’s opinion of how the wine compares with others in the same category in terms of potential quality.
Structured data, such as physicochemical laboratory data, can be easily read and analyzed by computers. On the other hand, unstructured data such as wine reviews require natural language processing techniques to enable computers to comprehend the reviews written in the human language format. There are millions of wine reviews available to researchers from various sources and accessing them typically involves minimal expense. These reviews have the potential to uncover valuable insights that can benefit broader audiences. To accurately capture keywords in wine reviews, including non-flavor notes, the Computational Wine Wheel (CWW) was developed. This tool utilizes a dictionary-based approach, which is often referred to as the bag-of-words (BoW) technique in natural language processing [7,8]. With the help of the Computational Wine Wheel, unstructured wine reviews collected for certain wineinformatics research topics can be converted into a clean dataset for answering questions such as “What are the wine characters aged in French oaks versus American oaks in different wine regions and which one is more suitable?”, “Among all wine reviewers, who is the more reliable one?”, “What characteristics of classic (95+pts) or outstanding (90-94pts) wines show in 21st century Bordeaux?”, “What common characteristics do certain groups of wine share, and what food pairings complement them?”, “What are the wine characteristic differences between great (2018, 2019), good (2017) and not-ideal vintages (2011) in Napa, California?”
To address the aforementioned questions, two crucial elements must be combined effectively: appropriate machine learning algorithms and high-quality data. Naïve Bayes and SVM were tested and proved to be suitable white-box and black-box classification algorithms, respectively, in several wineinformatics studies [1,7,9,10]. A neural network is a computational model inspired by the structure and function of the human brain. It consists of interconnected artificial neurons organized into layers, which process and transmit information through weighted connections. By adjusting these weights during a training process, neural networks can learn to perform various tasks, such as pattern recognition, classification, regression, and decision making. They are widely used in machine learning and artificial intelligence applications. This research trains neural networks on wine reviews processed by the Computational Wine Wheel and compares their performance directly with Naïve Bayes and SVM classification results.
High-quality data with minimum noise are crucial to successful data science research; therefore, quality wine reviews are the foundation of this research work. According to the Wine School of Philadelphia, there are five top wine review sites providing large volumes of professional wine reviews in their databases. Wine Spectator, the main source of reviews for previous wineinformatics research, has been reviewing wines since 1979 and covers more than 15,000 wines from around the world; it is also considered one of the most prominent and well-known wine magazines. Wine Enthusiast began in 1988 and reviews 24,000 wines a year; the reviews are free as long as the readers provide an email address. The magazine covers not only wine reviews but also wine accessories, wine storage hardware, education, food pairing, and lifestyle. Antonio Galloni’s Vinous, founded in 2012, employs several top wine critics for publication of online wine reviews. Vinous is well respected and widely used among wine traders; it is also considered a new leader in wine writing. Robert Parker is probably the most famous wine critic in the wine review business; Robert Parker’s Wine Advocate magazine, started in 1978, has had a fundamental impact on the wine industry. Although Wine Advocate is not the first wine magazine, nor the first magazine using a numerical scale as a verdict, it was the first magazine to widely adopt the 100-point scale in its review system. Since then, the wine trade business has adopted the 100-point system. Decanter was founded in London in 1975; the magazine focuses on not only wine reviews but also wine producers and regions. The wine reviews produced by Decanter are usually detailed and focused; however, their reviews also tend to contradict the other magazines mentioned above.
The first two versions of the Computational Wine Wheel were built from Wine Spectator’s reviews and applied to the data sets collected and developed from Wine Spectator’s reviews for various clustering, classification, and regression model creation processes [1,2,7,8,9,11]. Not until early 2022 was the Computational Wine Wheel applied to the data sets collected and developed from Robert Parker’s reviews and compared directly with the data sets collected and developed from Wine Spectator reviews [9]. Robert Parker’s reviews are renowned for their remarkable and powerful contemporary rhetoric, which has had an unparalleled influence on the world of fine wine for over two decades [10]. In contrast to the shorter and more concise reviews offered by Wine Spectator, Parker’s reviews are more elaborate, colorful, and descriptive. As a result, reviews from Robert Parker pose greater challenges for natural language processing techniques [11]. In this research, the Computational Wine Wheel proved its adaptability on the Robert Parker dataset. The performance on classic wine predictions is comparable with that on the Wine Spectator data set, even though some words used in Robert Parker’s reviews cannot be converted using the previous versions of the Computational Wine Wheel, which were built purely from Wine Spectator reviews. This research work also proposes a new Computational Wine Wheel that includes a new bag of keywords from Robert Parker’s reviews and shows the improvements through various experiments.
The major contributions of this research are: (1) Adopting neural networks on wine reviews processed by the Computational Wine Wheel and comparing the performances directly with Naïve Bayes and SVM classification results. (2) Proposing a new, third-generation Computational Wine Wheel that includes Robert Parker’s reviews. (3) Testing a new data set that consists of more than 10,000 wines for a comprehensive comparison between two different review sites (Wine Spectator and Robert Parker Wine Advocate) using all three algorithms and two Computational Wine Wheels.

2. Materials and Methods

Every year, tens of thousands of wines are made with different techniques, industrial quality controls, and compositions of grapes. Reviews of these wines give the final verdict on the fermented end product and provide a precise description of every element that contributes to its character. This includes not only flavors and aromas but also characteristics such as acidity, tannins, weight, finish, and structure [8]. By understanding wine reviews, researchers may find the optimal combination of techniques, industrial quality control, and composition of grapes for a specific region and grape type.
With tens of thousands of wine reviews released each year, it is impractical for researchers to manually read and analyze them all. As such, the primary objective of this research is to find the best combination of machine learning algorithms and natural language processing techniques that can automatically and accurately extract essential terms from each review. In order to find the most appropriate machine learning algorithms and obtain top-notch quality data, neural networks, SVM, and naïve Bayes, together with the new Computational Wine Wheel are introduced in this section.

2.1. Classification Algorithms

To initiate the classification process, the first step involves building a classification model. This involves using training data and applying classification algorithms, which can be broadly categorized into two approaches: black-box models and white-box models. Black-box models are difficult to interpret and understand in practical applications as they involve complicated mathematical and distance functions, as well as representation spaces [12]. Commonly used examples of black-box models are hyperplane-based models, such as support vector machines [13], which use subspaces to separate the problem’s classes. A neural network, which simulates brain neurons, is another popular black-box classification algorithm; it has been used in many different wine-related studies, including taste sensors [14], electronic nose [15], and wine review analysis [16]. In contrast, white-box models, also known as explainable models, are based on patterns or rules that can be easily understood and explained in practical applications. Common white-box classification algorithms include decision trees [17], linear regression [18], Bayesian networks [19], naïve Bayes [20], etc. Since SVM and naïve Bayes have produced the best results in the latest wineinformatics research [21], the current research applies both algorithms and directly compares their performances with a neural network.

2.1.1. SVM

The support vector machine (SVM) is a well-known machine learning algorithm that was introduced by Boser, Guyon, and Vapnik in COLT-92 [22]. SVMs are a set of supervised learning methods used for classification and regression tasks. The mapping function used in SVM predicts which independent variables map to which dependent variables. This mapping function is a hyperplane or a set of hyperplanes in a high-dimensional space that is used to separate data points into classes. To systematically find the support vector classifier in higher dimensions, a kernel function is used. The polynomial kernel function can be represented by the equation $K(x, y) = (x^{T} y + c)^{d}$ and is one of the most commonly used kernel functions. Other popular kernel functions include the Gaussian radial basis function, linear kernel, and multilayer perceptron, among others [22,23].
In this project, SVM-light [24] was utilized for implementing SVM classification. The process involves two sets of input data, namely the training dataset, which is used to find the support vector classifier, and the testing dataset, which is used to predict the classes by mapping the data points based on the support vector classifier. In this case, SVM-light was used with the linear kernel. Finding the support vector classifier is a crucial part of SVM algorithms.
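As a minimal illustration of the kernel functions mentioned above, the sketch below evaluates the polynomial kernel on plain Python lists and treats the linear kernel as the special case c = 0, d = 1. The function names and default parameter values are illustrative assumptions; this is not how SVM-light is invoked.

```python
def polynomial_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x^T y + c)^d for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    dot = sum(a * b for a, b in zip(x, y))  # x^T y
    return (dot + c) ** d


def linear_kernel(x, y):
    """Linear kernel, i.e. the polynomial kernel with c = 0 and d = 1."""
    return polynomial_kernel(x, y, c=0.0, d=1)


# Two hypothetical binary attribute vectors (1 = attribute present in the review).
wine_a = [1, 0, 1, 1]
wine_b = [1, 1, 0, 1]
print(linear_kernel(wine_a, wine_b))         # number of shared attributes
print(polynomial_kernel(wine_a, wine_b))     # (shared + 1) squared
```

With binary attribute vectors, the linear kernel simply counts the attributes two wines have in common.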

2.1.2. Neural Networks

Neural networks (NNs) have become a prominent machine learning approach for binary classification tasks, offering a powerful alternative to traditional methods such as support vector machines (SVMs). At the core of these networks lies the concept of artificial neurons that, when organized into layers, can be trained to perform complex mappings between input and output data. Binary classification, a specific application of neural networks, involves the categorization of input data into one of two distinct classes, often represented by the values 0 and 1.
A typical neural network for binary classification consists of an input layer, one or more hidden layers, and an output layer. The input layer receives the feature vectors and passes them through the hidden layers, which apply non-linear activation functions, such as ReLU (rectified linear unit), sigmoid, or tanh, to capture complex relationships between input and output data. The output layer, using an activation function such as the sigmoid function, provides a probability score that assigns the input data to one of the two classes. The network learns the optimal weights and biases through a process called backpropagation, which is an iterative optimization technique that minimizes a predefined loss function, such as cross-entropy or mean-squared error.
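For reference, the sigmoid output and the binary cross-entropy loss named above can be written as follows. This is the standard textbook form; the symbols $z$ (the output neuron's pre-activation), $\hat{y}$ (predicted probability), and $y \in \{0, 1\}$ (true class) are notation introduced here, and the text does not state which of the two mentioned losses was actually used.

$$\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \mathcal{L}(y, \hat{y}) = -\left[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\right]$$

For example, a wine with $y = 1$ and predicted $\hat{y} = 0.9$ incurs a loss of $-\ln 0.9 \approx 0.105$, whereas $\hat{y} = 0.1$ would incur $-\ln 0.1 \approx 2.303$.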
This research employed a feedforward neural network implemented using the popular deep learning library TensorFlow [25]. The training process involved dividing the data set into training and testing subsets, with the former being used to adjust the network’s weights and biases, whereas the latter was used to assess the model’s performance on previously unseen data. The selection of the appropriate architecture, activation functions, and optimization algorithm played a pivotal role in achieving optimal classification accuracy for the given problem.
In light of the problem at hand and the potential benefits of employing neural networks for binary classification, this work opted for a deep learning model with an input layer, three hidden layers utilizing the ReLU activation function, and a final sigmoid layer. The decision to use ReLU activation functions was informed by their ability to mitigate the vanishing gradient problem, which can arise during the training of deep neural networks. ReLU activation functions can be represented mathematically as f(x) = max(0, x) and have been proven to be effective in various classification tasks.
To maintain consistency and computational efficiency, the experiment designed the hidden ReLU layers to have the same number of neurons as the input layer. This configuration facilitates the preservation of the feature dimensionality throughout the hidden layers, allowing the network to capture complex patterns within the input data effectively. The final sigmoid layer consists of a single neuron that outputs a probability score, which is then thresholded to assign the input to one of the two binary classes. To reduce the risk of overfitting, a dropout rate of 20% was incorporated between the hidden layers. Dropout is a regularization technique that involves the random deactivation of a portion of neurons during training, which encourages the remaining neurons to learn more robust and generalized representations of the input data. By including a 20% dropout rate, our model is less prone to rely on specific features, resulting in better generalization and improved performance on previously unseen data. Figure 1 provides the architecture of the chosen neural network, consisting of an input layer, three hidden ReLU layers, and a final sigmoid layer, which, coupled with a 20% dropout rate, offers a powerful and robust solution for binary classification tasks.
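The architecture described above can be sketched as a forward pass in plain Python. This is only an illustrative sketch, not the TensorFlow model used in the study: the weight initialization, the placement of dropout after the first and second hidden layers, the 0.5 decision threshold, and all function names are assumptions, and training by backpropagation is omitted.

```python
import math
import random


def relu(values):
    """ReLU activation f(x) = max(0, x), applied element-wise."""
    return [max(0.0, v) for v in values]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(values, weights, bias):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, bias)]


def dropout(values, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def init_layer(n_in, n_out, rng):
    """Random weights (He-style scaling, an assumption) and zero biases."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_features, rng, n_hidden_layers=3):
    """Hidden layers are as wide as the input; the output layer has one neuron."""
    hidden = [init_layer(n_features, n_features, rng) for _ in range(n_hidden_layers)]
    output = init_layer(n_features, 1, rng)
    return hidden, output


def forward(network, features, training=False, rate=0.2, rng=None):
    """Return the sigmoid probability; dropout is active only while training."""
    rng = rng or random.Random()
    hidden, (out_weights, out_bias) = network
    activations = list(features)
    for index, (weights, bias) in enumerate(hidden):
        activations = relu(dense(activations, weights, bias))
        if training and index < len(hidden) - 1:  # between hidden layers only
            activations = dropout(activations, rate, rng)
    return sigmoid(dense(activations, out_weights, out_bias)[0])


def predict(network, features, threshold=0.5):
    """Threshold the probability to obtain one of the two classes."""
    return 1 if forward(network, features) >= threshold else 0


network = build_network(n_features=4, rng=random.Random(0))
print(predict(network, [1.0, 0.0, 1.0, 0.0]))
```

Because the network here is untrained, the printed class is arbitrary; the point is the layer layout and where dropout and thresholding sit.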

2.1.3. Naïve Bayes

In the context of classification, Bayes’ theorem can be used to calculate the probability of a class (A) given some input features (B), based on the prior probability of the class and the conditional probability of the features given the class. Naïve Bayes classifiers assume that the input features are independent of each other, which simplifies the calculation of the conditional probability. The formula of Bayes’ theorem is as follows.
$$P(A \mid B) = \frac{P(A)\, P(B \mid A)}{P(B)}$$
  • P(A|B) is the posterior probability of A given B
  • P(B|A) is the likelihood probability of B given A
  • P(A) is the prior probability of A
  • P(B) is the prior probability of B
Real-world datasets often contain multiple attributes for each data point and more than one class. To handle such datasets, the Naïve Bayes classifier is commonly used. Let B represent the n attributes of a data point as a vector $B = (b_1, b_2, \ldots, b_n)$, and let A take one of the m classes in the dataset, $(a_1, a_2, \ldots, a_m)$. The formula of the Naïve Bayes is as follows:
$$\text{Class } a_i: \quad P(a_i \mid B) = \frac{P(a_i)\, P(b_1 \mid a_i)\, P(b_2 \mid a_i) \cdots P(b_n \mid a_i)}{P(b_1, b_2, \ldots, b_n)}$$
for each $i = 1, 2, \ldots, m$.
The Naïve Bayes classification aims to derive the maximum posterior probability, which represents the maximal P(A|B) when there are m classes in the data set. In this research, the data sets consist of both binary and continuous attributes. To implement the Naïve Bayes algorithm on the data sets, the continuous attributes are first converted into binary attributes. The conversion process involves treating the count of the frequency in the “CATEGORY” and “SUBCATEGORY” attributes as binary values. For instance, if the count for “fruity” under the “CATEGORY” attribute is 3 for a wine, it will be assigned a value of “1” under the binary attribute “fruity3”. Conversely, if the count is 0 for another wine, it will be assigned a value of “1” under the binary attribute “fruity0”. Once the continuous attributes are converted into binary attributes, the Bernoulli Naïve Bayes classifier is implemented for binary attributes.
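The count-to-binary conversion and the Bernoulli Naïve Bayes step can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the Laplace smoothing parameter `alpha`, the lower-cased attribute names, and the tiny training set are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def binarize_counts(counts):
    """Turn count attributes into binary ones, e.g. {"fruity": 3} -> {"fruity3"}."""
    return {f"{name.lower()}{count}" for name, count in counts.items()}


def train_bernoulli_nb(rows, labels, alpha=1.0):
    """Estimate priors and per-class Bernoulli likelihoods with Laplace smoothing."""
    vocabulary = sorted(set().union(*rows))
    class_totals = Counter(labels)
    feature_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        feature_counts[label].update(row)
    model = {}
    for label, total in class_totals.items():
        prior = total / len(labels)
        likelihood = {
            feature: (feature_counts[label][feature] + alpha) / (total + 2 * alpha)
            for feature in vocabulary
        }
        model[label] = (prior, likelihood)
    return model


def predict(model, row):
    """Return the class with the maximum (log) posterior for a set of active attributes."""
    best_label, best_score = None, -math.inf
    for label, (prior, likelihood) in model.items():
        score = math.log(prior)
        for feature, p in likelihood.items():
            # Bernoulli model: absent attributes contribute (1 - p).
            score += math.log(p if feature in row else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical wines: label 1 = 95+ points, label 0 = below 95.
rows = [binarize_counts({"fruity": 3}), binarize_counts({"fruity": 3}),
        binarize_counts({"fruity": 0}), binarize_counts({"fruity": 0})]
labels = [1, 1, 0, 0]
model = train_bernoulli_nb(rows, labels)
print(predict(model, binarize_counts({"fruity": 3})))
```

The denominator $P(b_1, b_2, \ldots, b_n)$ is the same for every class, so the sketch compares only the numerators, in log space to avoid underflow.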

2.2. Wine Review Data from Wine Spectator and Robert Parker Wine Advocate

Because its reviews typically focus on specific tasting notes and observations while excluding extraneous anecdotes and unrelated details, Wine Spectator has been selected as the primary data source for the majority of wineinformatics research [1,2,7,8,9,27]. In [1], the work of all reviewers in Wine Spectator was studied and evaluated, which is considered an intra-review-site evaluation. In this research, as in [11], inter-review-site assessment is investigated. Two criteria guided the choice of the second review site: a similar history of publication records, for fair comparison, and fewer contradicting reviews, for the purpose of studying merged reviews. On this basis, Robert Parker’s Wine Advocate was chosen as the second data source. The reviews in Robert Parker are known to be more descriptive and colorful, in contrast to the Wine Spectator’s shorter and more precise reviews. Table 1 provides two examples of wine reviews for the 2017 E. Pira e Figli Barolo Mosconi, where the left column contains Robert Parker’s review and the right column includes Wine Spectator’s review. Robert Parker’s review contains more information than Wine Spectator’s review, not only focusing on the wine’s taste quality but also mentioning wine-related information, such as the fact that it is from a “hot vintage”. In contrast, Wine Spectator’s review is much shorter, with straightforward descriptions of the wine’s taste. Examining these two distinct review styles is a significant aspect of this research project.
Both Robert Parker and Wine Spectator use a 100-point rating scale to evaluate each wine reviewed. This scale reflects how highly their reviewers regard each wine relative to others in its category and potential quality. The score given summarizes the wine’s overall quality, whereas the tasting note provides information on the wine’s style and character [9]. The meaning of the different overall rating ranges is very similar between Robert Parker and Wine Spectator, as seen in Table 2.

2.3. Computational Wine Wheel

Every year, Wine Spectator releases a list of its top 100 wines based on factors such as quality, value, availability, and excitement to wine judges. The Computational Wine Wheel 2.0 was created using a small but representative sample of wines obtained from Wine Spectator’s top 100 lists between 2003 and 2013 [8]; in other words, the CWW 2.0 was developed using 1100 wine reviews from Wine Spectator.
In the latest research [11], the CWW2.0 was applied to Robert Parker’s reviews and then compared with reviews from Wine Spectator through computational models predicting whether a wine would receive a score of 95+ on a carefully selected elite Bordeaux, France, data set, which is based on the famous Bordeaux Wine Official Classification of 1855. The elite Bordeaux data set was gathered from Wine.com, comprising all 21st century (2000–2020) wines listed in the 1855 Bordeaux Wine Official Classification that had reviews from both Robert Parker and Wine Spectator. This data set consisted of 513 wines with a total of 1026 wine reviews and included the wine name, vintage, score, and reviews.
Since the data set contains both Robert Parker and Wine Spectator reviews, the computational models were built to predict whether a wine may receive a score of 95+. The best accuracy result achieved from Wine Spectator (513 reviews) was 76.02%, whereas the best result achieved from Robert Parker (513 reviews) was 75.63%. These results suggest that the CWW2.0 was also very effective on Robert Parker reviews. The research also attempted to build a classification model using both Robert Parker and Wine Spectator reviews by creating an elite Bordeaux RP+WS data set that contained 1026 wine reviews so that the computational models were trained by both the Robert Parker and Wine Spectator inputs. Although the results did not surpass either the Robert Parker or Wine Spectator’s results, the experiment was the first attempt at searching for a way to build computational models from multiple review sources in wineinformatics research; the results can also serve as a baseline for this study and future research.
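The 95+ prediction target and the merged RP+WS data set can be illustrated with a short sketch. The record layout (`source`, `text`, `score`) and the two sample records are hypothetical, introduced only for this example; the actual data set format is not described in the text.

```python
def label_classic(score, threshold=95):
    """Binary target: 1 if the review score is 95 or above, otherwise 0."""
    return 1 if score >= threshold else 0


def build_dataset(reviews):
    """Pair each review with its binary label; `reviews` is a list of dicts."""
    return [(r["source"], r["text"], label_classic(r["score"])) for r in reviews]


# Hypothetical records for the same wine from the two review sites.
rp_reviews = [{"source": "RP", "text": "blackcurrant, camphor, fine-grained", "score": 100}]
ws_reviews = [{"source": "WS", "text": "black currant, iron, tobacco", "score": 94}]

# The merged RP+WS data set simply concatenates both sources.
merged = build_dataset(rp_reviews + ws_reviews)
print([label for _, _, label in merged])
```

Note that the same wine can receive different labels from the two sources, which is one reason a merged model is harder to train than a single-source one.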

2.4. Computational Wine Wheel 3.0

In this paper, not only will the Computational Wine Wheel 2.0 be applied to more than 10,000 wine reviews from both Robert Parker and Wine Spectator for comparison, but it will also evolve into the Computational Wine Wheel 3.0 by including 513 of Robert Parker’s elite Bordeaux reviews, so that the CWW3.0 is developed based on 1613 (1100 + 513) wine reviews from two different sources.

2.4.1. New Attribute Extraction from Robert Parker’s Elite Bordeaux Data Set Process

The attribute extraction process for the CWW3.0 is a combination of manual and automated processes consisting of several steps. To illustrate, consider the following review of the 2016 Mouton Rothschild by Robert Parker:
Review of 2016 Mouton Rothschild by Robert Parker’s Wine Advocate 100 pts
Composed of 83% Cabernet Sauvignon, 15% Merlot, 1% Cabernet Franc and 1% Petit Verdot, the 2016 Mouton Rothschild has an opaque garnet-purple color. WOW—the nose explodes from the glass with powerful blackcurrant cordial, black raspberries, blueberry pie and melted chocolate notions, plus suggestions of aniseed, camphor, lifted kirsch and the faintest waft of a subtle floral perfume in the background. Full-bodied, concentrated, bold and totally seductive in the mouth, it has very fine-grained, silt-like tannins, while jam-packed with tightly wound fruit layers, finishing in this wonderful array of mineral sparks. Magic.
The initial step involves manually extracting the essential attributes that are crucial in evaluating the wine’s quality, and these attributes are listed in Table 3 under the column “Hand-extracted attributes”. The second step involves the programmatic extraction of attributes from the Robert Parker reviews using the CWW2.0 to determine the number of attributes contained within them. The extracted attributes are shown in the second column of Table 3, titled “Program-extracted attributes”.
The third step involves comparing the attributes extracted manually with those extracted programmatically using CWW2.0. The “Common Attributes” column in Table 3 shows the attributes that were identified by both the manual and programmatic extraction methods. To assess the effectiveness of the CWW2.0 attribute extraction, the number of important attributes extracted by the program was determined. The extraction rate was calculated by dividing the count of the attributes extracted by both hand and program (shown in the “Common Attributes” column in Table 3) by the count of hand-extracted attributes (shown in the “Hand-extracted attributes” column in Table 3). Table 3 displays that out of 26 hand-extracted attributes, 20 attributes were commonly extracted by both hand and program (column “Common Attributes”). Therefore, the extraction rate is calculated as 20/26 = 77%. This rate indicates the percentage of attributes extracted by the CWW2.0 after applying it to Robert Parker’s reviews.
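The extraction rate defined above is the size of the intersection of the two attribute sets divided by the number of hand-extracted attributes. A minimal sketch follows; the placeholder attribute names stand in for the contents of Table 3, which are not reproduced here.

```python
def extraction_rate(hand_extracted, program_extracted):
    """Share of hand-extracted attributes that the program also extracted."""
    hand = set(hand_extracted)
    if not hand:
        raise ValueError("hand_extracted must not be empty")
    common = hand & set(program_extracted)
    return len(common) / len(hand)


# Placeholder names: 26 hand-extracted attributes, 20 of which the program also finds.
hand = {f"attr_{i}" for i in range(26)}
program = {f"attr_{i}" for i in range(20)} | {"program_only_attr"}
print(f"{extraction_rate(hand, program):.0%}")
```

Attributes found only by the program do not affect the rate, since the denominator is the hand-extracted set.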
The last step is to add the new attributes to the new Computational Wine Wheel, identifying their category and subcategory. Using Table 3 as the example, the extracted attributes to be included in the new CWW are those not among the common attributes. As a result, six additional attributes (twenty-six minus twenty) were included in the updated CWW. The attribute “BLACK CURRANT” was already present in the CWW2.0, but the review uses the spelling “BLACKCURRANT”, which the program was unable to extract. Thus, “BLACK-CURRANT” was added to the CWW with its normalized name being “BLACK CURRANT.” Moreover, new descriptive attributes such as “FINE-GRAINED”, “SILT-LIKE TANNINS”, “ANISEED”, “CAMPHOR” and “MAGIC” were also added to the CWW.
In general, three main types of attributes were included. The first type was the new descriptive and savory attributes. Reviews from Wine Spectator and Robert Parker have different styles. Many of the descriptive words that Parker frequently uses are rarely seen in Wine Spectator’s reviews. Examples of such attributes include “CUMIN SEED”, “ROSE HIP TEA”, “TILLED SOIL”, “VEGETABLE”, and “BLACK FOREST CAKE”. However, to avoid redundancy and considering their actual frequency of usage, some descriptive attributes that appear rarely, such as “ACACIA FLOWER”, “TART ACID”, and “SMORGASBORD”, were not added.
The second type of attribute added to the CWW3.0 was related to n-grams, which are long groups of words that are not extracted as a single attribute. For instance, the phrase “INTENSE AROMATICS” was being extracted as “INTENSE” and “AROMATICS”, which was not an accurate description and also caused redundancy. Hence, additional n-gram words were incorporated into the CWW3.0. In the aforementioned example, “INTENSE AROMATICS” was normalized as “INTENSE” to avoid redundancy and improve accuracy.
The third type of attribute added to the CWW3.0 involves adding different formats of existing attributes in the CWW2.0. This can be a complex process as it requires identifying subtle differences in the way attributes are expressed by the reviewers. For example, the attribute “FRESH FIGS” was extracted by the CWW2.0 as “FRESH” and “FIGS”, and then normalized as “FRESH” and “FIGS”. However, there are significant differences between “FRESH FIGS” and “DRIED FIGS”, which should be extracted as separate attributes and normalized accordingly. In this case, two new attributes were added: “FRESH FIGS” (normalized as “FIGS”) and “DRIED FIGS” (normalized as “DRIED FIGS”). This step was critical in improving the accuracy and reducing the redundancy of the CWW.
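The effect of adding multi-word terms can be illustrated with a dictionary lookup that tries the longest phrase first. This is a sketch under stated assumptions: the greedy longest-match strategy, the tokenizer, and the small dictionary below are illustrative and are not claimed to be the CWW program's actual algorithm or vocabulary.

```python
import re


def extract_attributes(review, dictionary):
    """Greedy longest-match lookup of specific terms, returning normalized attributes."""
    # Lower-case word tokens; hyphenated words such as "fine-grained" stay whole.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", review.lower())
    max_len = max(len(term.split()) for term in dictionary)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + size])
            if phrase in dictionary:
                found.append(dictionary[phrase])
                i += size
                break
        else:
            i += 1  # no term starts at this token
    return found


# Hypothetical excerpt: specific term -> normalized attribute.
cww_excerpt = {
    "intense aromatics": "INTENSE",
    "intense": "INTENSE",
    "aromatics": "AROMATICS",
    "fresh figs": "FIGS",
    "dried figs": "DRIED FIGS",
    "fresh": "FRESH",
    "figs": "FIGS",
}
print(extract_attributes("Intense aromatics with dried figs and fresh acidity.", cww_excerpt))
```

Without the two-word entries, the same review would yield “INTENSE”, “AROMATICS”, and “FIGS”, losing the distinction between dried and fresh figs.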
In the next step of the process, the focus was on removing unnecessary attributes from the CWW3.0 in order to reduce redundancy and improve the efficiency of the program. Three types of attributes were considered for deletion. The first type was redundant plurals. Plural forms matter for programmatic extraction: one-word attributes always need both singular and plural forms in the CWW, whereas multi-word attributes only require the singular form. Thus, redundant multi-word attribute plurals were removed, such as “RED CURRANTS”, “ASIAN SPICES”, “SMOKED MEATS”, and so on.
The second type of attribute that was removed was needless word groups. For instance, attributes such as “HINTS OF ORANGE”, “TOUCH OF SMOKE”, and “NOTES OF BEEF” were normalized as “ORANGE”, “SMOKE”, and “BEEF”, respectively. Since “ORANGE”, “SMOKE”, and “BEEF” already exist in the CWW vocabulary, these attributes will always be extracted and normalized regardless. Therefore, removing redundant word groups helps to improve the efficiency of the program. Word groups are included in the CWW only when they capture a different or more accurate description; otherwise, they add nothing.
The third type of attribute that was deleted was broad attributes that showed up frequently in reviews without clear direction, such as “WEIGHT” and “STRUCTURE”. In the CWW2.0, only “high”, “low”, and “medium” applied to “BODY”, “WEIGHT”, “STRUCTURE”, and “ACIDITY”. In the updated version of the CWW, additional descriptors such as “modest” and “decent” were added. For example, in a review that stated: “the palate has fantastic intensity with a very elegant, modest weight, featuring super-ripe…”, the program would extract “WEIGHT”, which gives no clear indication of what the wine is like. Extracting “MODEST WEIGHT” instead of “WEIGHT” is more meaningful. In the end, a total of 111 attributes were removed from the Computational Wine Wheel 2.0.

2.4.2. CWW2.0 vs. CWW3.0

After the process of adding and deleting attributes, the CWW3.0 final version ended up with 14 categories, 34 subcategories, 1191 normalized attributes, and 2589 specific terms (as shown in Table 4). A total of 657 new specific terms and 205 new normalized attributes were added; in other words, the CWW3.0 had 34% more specific terms and 20.8% more normalized attributes than the CWW2.0.
The CWW3.0 is an updated version of the Computational Wine Wheel that includes additional attributes extracted from wine reviews by Robert Parker, specifically those for wines in the 1855 Bordeaux Wine Official Classification made between 2000 and 2020. This version has 657 more specific-term attributes and 205 more normalized attributes than the CWW2.0, which was based on wine reviews from Wine Spectator. Table 5 is provided to show the differences between the two versions in terms of attribute changes in each subcategory.
Table 5 displays the changes in the number of attributes for each subcategory between the CWW2.0 and the CWW3.0. Two additional columns are included, “SPECIFIC_NAME added” and “NORMALIZED_NAME added”, which indicate the number of newly added attributes. Negative numbers in yellow indicate the deleted attributes, whereas red numbers highlight the largest increases in new attributes. The addition of specific and normalized attributes has also contributed to a growing number of category and subcategory names.

2.5. Data Sets

The data for this research comes from Wine.com, the leading online wine retailer in the United States. Wine.com provides customers with access to the world’s largest wine store and offers wine reviews from various professional critics, including Wine Spectator, Decanter, Wine Enthusiast, and wine experts such as James Suckling and Robert Parker. This platform was chosen for its convenience and reliability in collecting wine reviews. Two data sets were collected and analyzed for this project, which allowed for comparisons between reviews from Robert Parker and Wine Spectator, among others. Additionally, Wine.com (https://www.wine.com/, accessed on 20 May 2023) displays all reviews of a wine on one page, which made it easier to collect wines with multiple reviews.

2.5.1. Dataset1: Elite Bordeaux Data Set (513 Wines)

The wine review data collected in this project is based on the Bordeaux Wine Official Classification of 1855, which was established by Emperor Napoleon III for the Exposition Universelle de Paris in 1855. This classification system ranked France’s best Bordeaux wines from first to fifth growths, based on their quality. The list has remained influential to this day, and France continues to be a major player in the global wine industry. According to the statistics for 2020 from the International Organisation of Vine and Wine (OIV), France produced 46.6 million hectoliters of wine, placing it in second place worldwide [28]. French winemaking has a long and rich history dating back to the 6th century BC, and it has become an integral part of French civilization.
Dataset1, which consisted of 513 wines (1026 wine reviews in total), was collected from Wine.com and included all wines in the 1855 Bordeaux Wine Official Classification list that were produced between 2000 and 2020, along with reviews from both Robert Parker and Wine Spectator. This was also one of the data sets used in the previous research [9], as well as the data sets used to develop the new Computational Wine Wheel in this work. A list of wines included in dataset1 is available in Table S1.

2.5.2. Dataset2: Big Data Set

Dataset2 is the new data set created to comprehensively evaluate the effectiveness of different algorithms and the Computational Wine Wheels. The big data set contains wines from Bordeaux (2341 wines), Italy (3198 wines), and California (4180 wines). Bordeaux, located on the west coast of central France, is one of the most renowned winemaking regions in the world due to its ideal climate and soil conditions for high-quality viticulture. Italy is renowned for being one of the oldest wine-producing regions in the world, with wine produced in every region of the country. According to the OIV, Italy is the largest wine producer in the world, not only in 2021 but also in the past five years [28]. The Californian wine industry is experiencing growth and transformation in the midst of a global evolution in grape growing, wine production, wine marketing, and consumer preferences [29]. Dataset2 comprises 10,232 wines with a total of 20,464 wine reviews. It includes the name, vintage, score, and wine reviews for each wine, providing a comprehensive overview of all the wine reviews collected. A list of the wines included in dataset2 is available in Table S2.

2.6. Data Preprocessing

The data sets described in Section 2.5 were processed by both versions 2.0 and 3.0 of the Computational Wine Wheel. The Computational Wine Wheel is designed with numerous levels and branches to effectively partition the general categories of wine attributes into more detailed and specific subcategories. As mentioned in Section 2.3, both CWWs contain “CATEGORY_NAME”, “SUBCATEGORY_NAME”, “SPECIFIC_NAME”, and “NORMALIZED_NAME” attributes. A “SPECIFIC_NAME” attribute is a word or phrase as it appears in a review; each one is mapped to a “NORMALIZED_NAME” attribute so that equivalent expressions are treated as the same attribute by the computer. For example, fresh apple, red apple, and apple are all mapped to apple as the normalized attribute. The program begins the attribute search by matching each wine review against the list of “SPECIFIC_NAME” attributes. In this operation, every review is treated as a word list and compared with the attributes in the “SPECIFIC_NAME” list, which is divided into two separate lists: a list of individual words (1-gram) and a list of word groups (N-gram), sorted by descending length. The program first iterates through the list of word groups and removes each matched word group from the review so that its words cannot be matched again as single words. For instance, if the review includes “blueberry cream”, the program will first scan through the list of word groups, extract the word group “blueberry cream”, and remove it from the review. This way, when the program subsequently scans through the list of individual words, it will not add the redundant attribute “blueberry” to the extracted words.
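The word-group-first matching described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the miniature `CWW` dictionary, the function name `extract_normalized`, and the choice to measure length in characters are all assumptions made for the example.

```python
import re

# Hypothetical miniature of the CWW: SPECIFIC_NAME -> NORMALIZED_NAME.
# The entries are illustrative only and are not copied from the real wheel.
CWW = {
    "blueberry cream": "BLUEBERRY CREAM",
    "fresh apple": "APPLE",
    "red apple": "APPLE",
    "apple": "APPLE",
    "blueberry": "BLUEBERRY",
}


def extract_normalized(review, cww):
    """Return the set of NORMALIZED_NAME attributes found in a review."""
    # Lowercase the review and keep only word-like tokens.
    text = " ".join(re.findall(r"[a-z0-9'-]+", review.lower()))
    # Word groups (N-grams) come first, longest first, then single words.
    ngrams = sorted((s for s in cww if " " in s), key=len, reverse=True)
    unigrams = [s for s in cww if " " not in s]
    found = set()
    for term in ngrams + unigrams:
        # Match the term only on whole-word boundaries.
        pattern = r"(?<![a-z0-9'-])" + re.escape(term) + r"(?![a-z0-9'-])"
        if re.search(pattern, text):
            found.add(cww[term])
            # Blank out the match so its words cannot match again later.
            text = re.sub(pattern, " ", text)
    return found


print(extract_normalized("Blueberry cream and fresh apple notes.", CWW))
```

Because “blueberry cream” is removed from the review before the single-word pass, “blueberry” is not extracted a second time, which is the behavior the paragraph above describes.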
Once the program finds an attribute that matches a “SPECIFIC_NAME” attribute, it assigns a value of 1 to the corresponding “NORMALIZED_NAME” attribute. Conversely, if no match is found, it assigns a value of 0. The corresponding “SUBCATEGORY” and “CATEGORY” attributes are continuously incremented for each wine review. Although the “NORMALIZED_NAME” attribute is binary, the “SUBCATEGORY” and “CATEGORY” attributes are continuous and require normalization to avoid any potential weight imbalances in features. Normalization not only speeds up the computational process but also makes it easier to understand the data. For this project, the min–max normalization technique was used to rescale the continuous data sets to a range of 0–1. The normalized value (z) is calculated using the original value (x), the minimum value (min(x)) among all x, and the maximum value (max(x)) among all x, using the following equation:
$$z = \frac{x - \min(x)}{\max(x) - \min(x)}$$
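As a small illustration of min–max normalization, in which each value has the minimum subtracted and is then divided by the range (maximum minus minimum), the following sketch rescales one continuous column such as a “CATEGORY” count. The function name and the handling of a constant column are assumptions for this example, since the paper does not discuss that edge case.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range 0-1 with min-max normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column maps to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Example: category counts of 2, 4, and 6 across three reviews.
print(min_max_normalize([2, 4, 6]))  # [0.0, 0.5, 1.0]
```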
Most existing wineinformatics classification studies [1,2,9,11,27] focus on predicting whether a wine scores 90 points or higher: if a wine receives a score equal to or above 90 out of 100, it is labeled as a positive (+) class. Conversely, if it scores below 90, it is labeled as a negative (−) class. However, since the wines analyzed in this research were sourced from a high-class e-commerce website, 99% of them received a score above 90 points. Hence, to create a more balanced dataset, the classification threshold was raised to 95 points. Thus, if a wine receives a score equal to or above 95 points, it is labeled as a positive (+) class, whereas scores below 95 points are labeled as a negative (−) class. Nonetheless, despite setting the threshold at 95, some data sets remain imbalanced. The data conversion process is identical to that in [11], which includes a figure describing the complete conversion process; therefore, the results can be compared directly in the Results section. Table S2 provides the full processed data (including category, subcategory and normalized attributes) from Robert Parker’s reviews using the CWW3.0. Table S3 provides the full processed data (including category, subcategory and normalized attributes) from Wine Spectator’s reviews using the CWW3.0 to show the name and number of all attributes (descriptors).
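The effect of the threshold on class balance can be illustrated with a short sketch. The scores below are made up for the example and the function names are hypothetical; only the 90-point and 95-point thresholds come from the text.

```python
from collections import Counter


def label_wine(score, threshold=95):
    """Label a wine '+' if its score reaches the threshold, otherwise '-'."""
    return "+" if score >= threshold else "-"


def class_balance(scores, threshold=95):
    """Return the fraction of wines in each class for a given threshold."""
    counts = Counter(label_wine(s, threshold) for s in scores)
    total = len(scores)
    return {label: counts[label] / total for label in ("+", "-")}


# Made-up scores for illustration only.
scores = [96, 95, 94, 92, 91]
print(class_balance(scores, threshold=90))  # every wine is positive
print(class_balance(scores, threshold=95))  # classes are more balanced
```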

2.7. Classification Evaluations

To assess the performance of the classification models, five-fold cross-validation was employed. Here, the data were divided into five equal parts, and each part was used once as the testing set, whereas the remaining four parts were used for training. This process was repeated five times and the average accuracy, precision, recall, and F1 scores were calculated to evaluate the performance of the model. Four statistical measures were used to evaluate the performance of the classification models: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). These measures were used to calculate the accuracy, precision, recall, and F1 scores for the models.
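The five-fold procedure can be sketched as follows. This is an illustrative split using only the Python standard library, not the authors' code; the shuffling seed and the `evaluate` callback (which would train a model on the training indices and return a score on the test indices) are assumptions.

```python
import random


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train, test) index lists so that each fold is the test set once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds parts of (nearly) equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def cross_validated_score(n_samples, evaluate, n_folds=5, seed=0):
    """Average the score returned by a hypothetical evaluate(train, test)."""
    scores = [evaluate(train, test)
              for train, test in five_fold_indices(n_samples, n_folds, seed)]
    return sum(scores) / len(scores)


# With the 513 wines of dataset1, the test folds hold 102 or 103 wines each.
print([len(test) for _, test in five_fold_indices(513)])
```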
As shown in Table 6, a true positive prediction occurs when the model predicts a positive result (a 95+ wine) and the wine actually belongs to the positive class. A false positive prediction occurs when the model predicts a positive result (a 95+ wine) but the wine actually belongs to the negative class (a 94− wine). A true negative prediction occurs when the model predicts a negative result (a 94− wine) and the wine actually belongs to the negative class. Finally, a false negative prediction occurs when the model predicts a negative result (a 94− wine) but the wine actually belongs to the positive class (a 95+ wine).
To assess the classification performance, four metrics were utilized: accuracy, precision, recall, and F-score. However, due to space constraints, only the accuracy results were reported in this paper. Accuracy represents the percentage of correctly classified wines over all wines in the dataset, indicating how many wines were accurately predicted as either positive or negative.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
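For reference, all four metrics named above can be computed from the confusion-matrix counts. The sketch below uses the standard definitions of precision, recall, and F1, which the paper names but does not write out; the counts in the example are invented.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented counts: 40 TP, 10 FP, 30 TN, 20 FN out of 100 wines.
print(classification_metrics(tp=40, fp=10, tn=30, fn=20))
```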

3. Experimental Results

This section aims to evaluate and compare several aspects of wine reviews from Wine Spectator and Robert Parker, focusing on the accuracy of the classification of wines between different methods. More specifically, accuracy results from the SVM, Naïve Bayes, and neural network techniques using the same Computational Wine Wheel will be carefully studied. This paper will also compare the results derived from the Computational Wine Wheel 2.0 and those from the Computational Wine Wheel 3.0. Both data sets discussed in the previous section with different choices of attributes, including category attributes, normalized attributes, and all attributes (normalized, categorical, and subcategory attributes) will be tested in this section.
The accuracy from the SVM and Naïve Bayes models on dataset1 is extracted from the work of [11] to directly compare with the results using neural networks. Since dataset2 is a new data set, all experimental results reported in this paper have the same setup as [11]. To obtain accuracy for neural network models, different setups were tested and evaluated; this research work adopts the following setups:
  • Three inner (hidden) layers, each with a number of nodes equal to the number of attributes of the data set;
  • Each inner layer is followed by a dropout layer with a 20% dropout ratio;
  • Five-fold cross-validation is used;
  • Since neural networks use random initial weights for each data set, the reported accuracy is the average of five independent runs.
Note that, because neural network training is stochastic, a different setting can occasionally produce better accuracy, although most of the alternative settings tested produced worse accuracy. The optimal setting, including the values of the hyperparameters, is not our main focus in this paper. Once the Computational Wine Wheel was applied to the datasets, the CWW3.0 produced 1239 attributes (14 CATEGORY + 34 SUBCATEGORY + 1191 NORMALIZED), whereas the CWW2.0 generated 1034 attributes (14 CATEGORY + 34 SUBCATEGORY + 986 NORMALIZED) for comparison purposes. Both data sets were applied and compared through all three classification algorithms. Detailed performances are provided in Section 3.1 and Section 3.2 corresponding to dataset1 and dataset2, whereas Section 3.3 describes the performance differences using the CWW2.0 and the CWW3.0.
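To give a sense of the size of the network implied by this setup, the sketch below lists the layers and counts the trainable weights for a given number of input attributes. It assumes fully connected layers with bias terms and a single output node for the binary 95+ decision; the paper does not state the layer type, the activation functions, or the output layer, so these are assumptions, and the function name is hypothetical.

```python
def describe_network(n_attributes, n_hidden=3, dropout=0.2):
    """List the layers and count the trainable parameters of the assumed network."""
    layers = []
    params = 0
    width_in = n_attributes
    for _ in range(n_hidden):
        # Each hidden layer has as many nodes as there are input attributes.
        layers.append(("dense", n_attributes))
        layers.append(("dropout", dropout))
        params += width_in * n_attributes + n_attributes  # weights + biases
        width_in = n_attributes
    # Assumption: one output node for the positive/negative decision.
    layers.append(("dense", 1))
    params += width_in + 1
    return layers, params


layers, params = describe_network(1239)  # all CWW3.0 attributes
print(layers)
print(params)  # 4610320
```

Under these assumptions, using all 1239 CWW3.0 attributes gives about 4.6 million trainable parameters, which is consistent with training taking longer as the attribute count grows.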

3.1. Performances on Dataset1: Elite Bordeaux

3.1.1. Results of Robert Parker’s Reviews in Dataset1

For the dataset1, the elite Bordeaux data set, Figure 2 shows the accuracy comparison between the naïve Bayes, SVM, and neural network techniques applied on Robert Parker’s review. On the left is the result from Robert Parker’s review processed by the CWW2.0, and on the right is the result from Robert Parker’s review processed by the CWW3.0. On the x-axis, C, N, and CSN indicate the attribute selection of category attributes, normalized attributes, and all attributes (normalized, categorical and subcategory attributes), respectively.
On the left side of Figure 2 are the category attributes, which consist of 14 continuous values processed by the CWW2.0. If these were selected, the accuracy generated from the three algorithms was around 70%. Although 70% may not seem ideal, the results actually suggest that the classification models do pick up a certain pattern from only 14 attributes. It is worth noting that the SVM (73.10%) gave the best result and the neural network (68.24%) did not perform as well as the other classification algorithms in this setting. If the normalized attributes, which consist of 1932 binary values processed by the CWW2.0, were selected, the accuracy generated from the three algorithms improved to around 73%. In this setting, naïve Bayes (74.85%) outperformed the other two algorithms, whereas the neural network performed worst (72.55%). If the category attributes, subcategory attributes, and normalized attributes processed by the CWW2.0, which consist of 14 continuous values, 34 continuous values, and 1932 binary values, respectively, were selected, the neural network (81.62%) outperformed both the SVM (75.63%) and naïve Bayes (74.86%). The satisfactory result suggests that the neural network algorithm is more suitable for processing a data set that contains different data types in wineinformatics by directly comparing the classification results reported in [11].
On the right side of Figure 2 are the category attributes, which consist of 14 continuous values processed by CWW3.0. If these were selected, the accuracy generated from the three algorithms was around 73%, which is slightly higher than the data processed by the CWW2.0. It is worth noting that the SVM (74.07%) gave the best result and the neural network (73.48%) was only slightly behind the SVM. If the normalized attributes, which consist of 2589 binary values processed by CWW3.0, were selected, the accuracy generated from the three algorithms improved to around 74%. In this setting, naïve Bayes (75.83%) again outperformed the other two algorithms, whereas the neural network performed worst (72.89%). If the category attributes, subcategory attributes, and normalized attributes processed by the CWW3.0, which consist of 14 continuous values, 34 continuous values, and 2589 binary values, respectively, were selected, the neural network (81.76%) outperformed both the SVM (74.27%) and naïve Bayes (75.24%). The consistent result again suggests that the neural network algorithm is suitable for processing data sets that contain different wineinformatics data types.

3.1.2. Results from Wine Spectator

For dataset1, the elite Bordeaux data set, Figure 3 shows the accuracy comparison between the naïve Bayes, SVM, and neural network techniques applied to the Wine Spectator reviews. On the left is the result from Wine Spectator’s reviews processed by the CWW2.0, and on the right is the result from Wine Spectator’s reviews processed by the CWW3.0. On the x-axis, C, N, and CSN indicate the attribute selection of category attributes, normalized attributes, and all attributes (normalized, categorical, and subcategory attributes), respectively.
On the left side of Figure 3 are the category attributes, which consist of 14 continuous values processed by the CWW2.0. If these were selected, the accuracy generated from the three algorithms was around 70%. The SVM (71.35%) and naïve Bayes (71.93%) gave almost identical results, whereas the neural network (66.70%) did not perform well. If the normalized attributes, which consist of 1932 binary values processed by the CWW2.0, were selected, the accuracy generated from the three algorithms improved to around 74%. In this setting, the SVM (75.44%) outperformed the other two algorithms, whereas the neural network performed worst (73.15%). If the category attributes, subcategory attributes, and normalized attributes processed by the CWW2.0, which consist of 14 continuous values, 34 continuous values, and 1932 binary values, respectively, were selected, the neural network (86.23%) outperformed both the SVM (76.02%) and naïve Bayes (74.46%). Compared directly with the classification results reported in [11], this satisfactory result confirms that the neural network algorithm captures the patterns hidden in the data well.
On the right side of Figure 3 are the category attributes, which consist of 14 continuous values processed by the CWW3.0. If these were selected, the accuracy generated from the three algorithms was around 71%; all three algorithms had almost identical results. If the normalized attributes, which consist of 2589 binary values processed by the CWW3.0, were selected, the accuracy generated from three algorithms improved to around 75%. In this setting, the SVM (77.01%) outperformed the other two algorithms, whereas the neural network performed worst (75.28%). If the category attributes, subcategory attributes, and normalized attributes processed by the CWW3.0, which consist of 14 continuous values, 34 continuous values, and 2589 binary values, respectively, were selected, neural networks (86.69%) achieved the best results compared with the SVM (78.36%) and naïve Bayes (77.13%). These results confirmed that the neural network algorithm is the best choice for dataset1 by selecting all possible attributes.
In summary, the neural network outperformed SVM and Naïve Bayes in both the Robert Parker and Wine Spectator reviews processed by both the CWW versions 2.0 and 3.0 with all attributes.

3.2. Performances on Dataset2: Big Dataset

3.2.1. Results of Robert Parker’s Review in Dataset2

For the dataset2, the big data set, Figure 4 shows the accuracy comparison between the naïve Bayes, SVM, and neural networks techniques applied on Robert Parker’s review. On the left is the result from Robert Parker’s reviews processed by the CWW2.0, and on the right is the result from Robert Parker’s reviews processed by the CWW3.0. On the x-axis, C, N, and CSN indicate the attribute selection of category attributes, normalized attributes, and all attributes (normalized, categorical, and subcategory attributes), respectively.
On the left side of Figure 4 are the category attributes, which consist of 14 continuous values processed by the CWW2.0. If these were selected, the accuracy generated from the three algorithms was around 67%. Unlike the results shown in Figure 2 and Figure 3, the neural network (68.25%) generated the best results compared with the SVM (67.76%) and naïve Bayes (66.33%). If the normalized attributes, which consist of 1932 binary values processed by the CWW2.0, were selected, the accuracy generated from the three algorithms improved to around 78%. In this setting, the neural network (81.70%) clearly outperformed the other two algorithms, SVM (77.71%) and naïve Bayes (75.12%). If the category attributes, subcategory attributes, and normalized attributes processed by the CWW2.0, which consist of 14 continuous values, 34 continuous values, and 1932 binary values, respectively, were selected, again the neural network (81.45%) outperformed both the SVM (77.49%) and naïve Bayes (73.22%). This satisfactory result suggests that the neural network algorithm is even more suitable for the larger data set.
On the right side of Figure 4 are the category attributes, which consist of 14 continuous values processed by the CWW3.0. If these were selected, the accuracy generated from the three algorithms was around 69%, which is slightly higher than the data processed by the CWW2.0. The neural network (69.61%) and SVM (69.28%) were almost identical and naïve Bayes (67.80%) was slightly lower. If the normalized attributes, which consist of 2589 binary values processed by the CWW3.0, were selected, the accuracy generated from the three algorithms improved to around 78%. In this setting, the neural network (82.03%) clearly outperformed the other two algorithms, the SVM (78.76%) and naïve Bayes (75.88%). If the category attributes, subcategory attributes, and normalized attributes processed by the CWW3.0, which consist of 14 continuous values, 34 continuous values, and 2589 binary values, respectively, were selected, the neural network (82.74%) outperformed both the SVM (78.48%) and naïve Bayes (74.78%). The consistent result again suggests that the neural network algorithm is the classification model of choice for data sets that contain different wineinformatics data types.

3.2.2. Results of Wine Spectator’s Reviews in Dataset2

For dataset2, the big data set, Figure 5 shows the accuracy comparison between the naïve Bayes, SVM, and neural networks techniques applied to Wine Spectator’s reviews. On the left is the result from Wine Spectator’s reviews processed by the CWW2.0, and on the right is the result from Wine Spectator’s reviews processed by the CWW3.0. On the x-axis, C, N, and CSN indicate the attribute selection of category attributes, normalized attributes, and all attributes (normalized, categorical, and subcategory attributes), respectively.
On the left side of Figure 5 are the category attributes, which consist of 14 continuous values processed by the CWW2.0. If these were selected, the accuracy generated from the three algorithms was around 77%, which is almost 10 percentage points higher than for Robert Parker’s reviews. The neural network (77.83%) and naïve Bayes (77.32%) outperformed the SVM (76.25%), a satisfactory result considering that the data contain only 14 attributes. If the normalized attributes, which consist of 1932 binary values processed by the CWW2.0, were selected, the accuracy generated from the three algorithms improved to around 84%. In this setting, the neural network (86.17%) clearly outperformed the other two algorithms, the SVM (83.09%) and naïve Bayes (81.07%). If the category attributes, subcategory attributes, and normalized attributes processed by the CWW2.0, which consist of 14 continuous values, 34 continuous values, and 1932 binary values, respectively, were selected, again the neural network (85.93%) outperformed both the SVM (83.33%) and naïve Bayes (79.20%). This result suggests that the neural network algorithm is even more suitable for the larger data set. With the CWW2.0 on dataset2, the neural network worked slightly better with the normalized attributes alone than with all attributes, both here (86.17% vs. 85.93%) and for Robert Parker’s reviews (81.70% vs. 81.45%). These are the only cases in this research where adding the category and subcategory attributes did not improve the neural network’s accuracy.
On the right side of Figure 5 are the category attributes, which consist of 14 continuous values processed by CWW3.0. If these were selected, the accuracy generated from the three algorithms was around 77%. The neural network (78.44%) and naïve Bayes (77.85%) almost tied, whereas the SVM (72.08%) underperformed. If the normalized attributes, which consist of 2589 binary values processed by CWW3.0, were selected, the accuracy generated from the three algorithms improved to around 82%. In this setting, the neural network (86.59%) clearly outperformed the other two algorithms, the SVM (79.76%) and naïve Bayes (81.20%). If the category attributes, subcategory attributes, and normalized attributes processed by CWW3.0, which consist of 14 continuous values, 34 continuous values, and 2589 binary values, respectively, were selected, the neural network (86.67%) outperformed both the SVM (79.60%) and naïve Bayes (79.88%). The consistent result again suggests that the neural network algorithm is the classification model of choice for data sets that contain different wineinformatics data types.
To summarize, the neural network outperformed SVM and Naïve Bayes in both Robert Parker’s and Wine Spectator’s reviews processed by both the CWW versions 2.0 and 3.0 with all of the different choices of attributes.

3.3. Performances of the Computational Wine Wheel 2.0 vs. 3.0

To compare the new and old Computational Wine Wheels, since the best results all came from the neural network, accuracy performances from both dataset1 and dataset2 using all attributes were gathered and analyzed for Figure 6.
According to the figure, wine reviews from both Robert Parker and Wine Spectator (in both dataset1 and dataset2) benefit from the new Computational Wine Wheel. This result suggests that CWW 3.0 is a better choice when the neural network is used to classify whether a wine receives a 95+ score, regardless of whether Robert Parker’s reviews or Wine Spectator’s reviews are used. Compared with the best accuracy results reported in [9] using dataset1, which are close to 75% for Robert Parker’s reviews and 78% for Wine Spectator’s reviews, neural networks plus the CWW3.0 consistently produce about 82% for Robert Parker’s reviews and 86% for Wine Spectator’s reviews.
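The size of the improvement can be read directly from the neural network accuracies with all attributes reported in Sections 3.1 and 3.2. The short sketch below tabulates them and computes the gain in percentage points; the variable and function names are chosen for this example only.

```python
# Neural network accuracies (%) with all attributes, as reported in
# Sections 3.1 and 3.2: (CWW2.0, CWW3.0).
REPORTED = {
    ("dataset1", "Robert Parker"): (81.62, 81.76),
    ("dataset1", "Wine Spectator"): (86.23, 86.69),
    ("dataset2", "Robert Parker"): (81.45, 82.74),
    ("dataset2", "Wine Spectator"): (85.93, 86.67),
}


def gains(reported):
    """Return the CWW3.0 minus CWW2.0 gain in percentage points per setting."""
    return {key: round(new - old, 2) for key, (old, new) in reported.items()}


for (dataset, reviewer), gain in gains(REPORTED).items():
    print(f"{dataset}, {reviewer}: +{gain:.2f} points")
```

The gains are positive in all four settings but small, ranging from 0.14 to 1.29 percentage points.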

4. Discussion

In this research, the neural network classification method is implemented and the accuracy results are compared with those of the Naïve Bayes and SVM classification methods. Two data sets were tested by all three classification algorithms: the elite Bordeaux data set studied in the previous research and a newly created data set consisting of more than 10,000 wines. All wines studied in this research work have reviews from both Robert Parker and Wine Spectator. These reviews are processed to extract attributes using the Computational Wine Wheel 2.0, as well as the newly proposed Computational Wine Wheel 3.0. Three different types of attributes were extracted: category, subcategory, and normalized. In this research, classification algorithms were trained on the data sets with category attributes, normalized attributes, and all attributes (category, normalized, and subcategory combined).
The performance results of the neural network classification performed on the reviews processed by the CWW 2.0 and the CWW 3.0 were also analyzed. The results in Section 3 show that the CWW 3.0 is better able to process wine reviews than the CWW 2.0. Although the CWW 3.0 yields better results, the training time is longer because more attributes are considered. Whether the CWW 2.0 or CWW 3.0 should be used depends on specific use cases, resource availability, and the importance of obtaining the best possible prediction.
The results of Section 3 show that the neural network classification outperforms the other two classification methods. Neural networks, as well as SVMs, are considered black-box classifications that are difficult to explain and understand but offer high performance. On the other hand, Naïve Bayes classification is a white-box classification that can be easily understood and explained. Therefore, using neural network or SVM classification to achieve high results and using Naïve Bayes classification to study the inner logic of the data is the ideal approach for this research work. To provide an example, let us consider the following table.
Table 7 presents key attributes extracted from instances predicted as positive and negative classes for the elite Bordeaux wine dataset using the Naïve Bayes classification. The attribute “FULL-BODIED” from the positive class and “MEDIUM-BODIED” from the negative class show a significant contrast in scoring. A Bordeaux wine is more likely to score 95 or higher when it has a “FULL-BODIED” mouthfeel, whereas it is more likely to score 94 or lower when it has a “MEDIUM-BODIED” mouthfeel. Additionally, the attributes “FLORAL” and “AROMA” form another interesting contrast. Although both describe scents, “FLORAL” specifically refers to flower scents, implying that wines scoring 95+ are more likely to have a floral scent.
Analyzing the key attributes of different classes is a critical aspect in the field of wineinformatics. These attributes offer valuable insights into the characteristics that contribute to positive and negative instances, which are particularly relevant for the wine industry, especially during the fermentation process, as most of the wine’s features are determined during this stage. Recently, more and more studies have analyzed the relationship between wine reviews and quality [16,30,31,32]. It is crucial to note that evaluation metrics such as accuracy only provide numbers for preliminary results; the final goal is to discover useful knowledge that humans can understand, which may be more important.

5. Conclusions

This research paper presents three major contributions:
  • Adopting neural networks on wine reviews processed by the Computational Wine Wheel and comparing performances directly with Naïve Bayes and SVM classification results;
  • Proposing a new Computational Wine Wheel, evolved into its third generation by including Robert Parker’s reviews;
  • Collecting and testing a new data set, which consists of more than 10,000 wines, for a comprehensive comparison between two different review sources (Wine Spectator and Robert Parker Wine Advocate) using all three algorithms and two Computational Wine Wheels.
This paper demonstrates that neural networks might be the best black-box classification algorithm in wineinformatics to analyze wine reviews processed by the Computational Wine Wheel. This paper also offers a comprehensive account of the steps taken to improve the Computational Wine Wheel, as well as comparisons between the new and previous CWW versions. In comparison with the highest accuracy results obtained in the latest research work [11] utilizing the elite Bordeaux data set, which achieved approximately 75% accuracy for Robert Parker’s reviews and 78% accuracy for Wine Spectator’s reviews, the combination of neural networks and the CWW3.0 consistently yields improved performance. Specifically, this combination achieves an accuracy of 82% for Robert Parker’s reviews and 86% for Wine Spectator’s reviews. Similar trends are also observed in the second data set, which consists of more than 10,000 wines.
In summary, this paper advances natural language processing of wine reviews by utilizing neural networks and the new Computational Wine Wheel. The adoption of machine learning algorithms on wine reviews helps researchers understand more about quality wines by analyzing the end product and deconstructing the sensory attributes of the wine; this process is similar to reverse engineering, applied to wine in order to study and improve the winemaking techniques employed.

6. Future Works

This paper provides many new research directions in wineinformatics. First of all, since neural networks work very well in the data set with all attributes, many other advanced neural networks, such as convolutional neural networks (CNN) [33], recurrent neural networks (RNN) [26], and long short-term memory networks (LSTM) [34], as well as deep learning [35], can be implemented and tested in similar settings. Next, since the new Computational Wine Wheel provides a stronger capability in processing wine reviews, the positive impact may also extend to all existing wineinformatics data mining research, such as research using association rules [7], clustering [9], granular computing [36], etc. Unlike this paper, which mainly focuses on classification, different data mining techniques and algorithms should be able to fine-tune the parameters and retrieve useful domain knowledge. Third, development of the Computational Wine Wheel is a never-ending project. This research covered two well-known wine magazines; many others could be included to broaden the scope of the CWW. Meanwhile, each magazine/expert produces tens of thousands of reviews each year that could be included to extend the depth and scope of the CWW. Determining the best strategy to expand the CWW itself could be an important research topic. Fourth, although neural networks did a great job building the model to understand wine reviews, researchers cannot easily translate those models into knowledge, since the neural network is considered a black-box classification algorithm. Interpreting neural network models [37,38] can be a key component in translating hidden models into winemaking knowledge. Last but not least, all wines used in this research have reviews from both Wine Spectator and Robert Parker; since they review the same wine, do their wine scores and reviews match each other? Can the reviews of the same wine be merged, and machine learning models built on them, to provide better insight about wine? All of these questions are very likely to be answered in the near future through wineinformatics research.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/fermentation9070629/s1, Table S1: Wine lists for dataset 1 and 2. Table S2: allRP_nor+cat+sub. Table S3: allWS_nor+cat+sub.

Author Contributions

Conceptualization, B.C.; Methodology, B.C.; Investigation, L.L., P.N.H., I.L., Q.T. and B.C.; Data curation, Q.T.; Writing—original draft, B.C.; Writing—review & editing, L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chen, B.; Velchev, V.; Palmer, J.; Atkison, T. Wineinformatics: A Quantitative Analysis of Wine Reviewers. Fermentation 2018, 4, 82.
  2. Palmer, J.; Chen, B. Wineinformatics: Regression on the Grade and Price of Wines through Their Sensory Attributes. Fermentation 2018, 4, 84.
  3. Cortez, P.; Cerdeira, A.; Almeida, F.; Matos, T.; Reis, J. Modeling wine preferences by data mining from physicochemical properties. Decis. Support Syst. 2009, 47, 547–553.
  4. Chen, M.-C.; Chen, L.-S.; Hsu, C.-C.; Zeng, W.-R. An information granulation based data mining approach for classifying imbalanced data. Inf. Sci. 2008, 178, 3214–3227.
  5. Capece, A.; Romaniello, R.; Siesto, G.; Pietrafesa, R.; Massari, C.; Poeta, C.; Romano, P. Selection of indigenous Saccharomyces cerevisiae strains for Nero d’Avola wine and evaluation of selected starter implantation in pilot fermentation. Int. J. Food Microbiol. 2010, 144, 187–192.
  6. Edelmann, A.; Diewok, J.; Schuster, K.C.; Lendl, B. Rapid Method for the Discrimination of Red Wine Cultivars Based on Mid-Infrared Spectroscopy of Phenolic Wine Extracts. J. Agric. Food Chem. 2001, 49, 1139–1145.
  7. Chen, B.; Rhodes, C.; Crawford, A.; Hambuchen, L. Wineinformatics: Applying data mining on wine sensory reviews processed by the computational wine wheel. In Proceedings of the 2014 IEEE International Conference on Data Mining Workshop, Shenzhen, China, 14 December 2014; pp. 142–149.
  8. Chen, B.; Rhodes, C.; Yu, A.; Velchev, V. The computational wine wheel 2.0 and the TriMax triclustering in wineinformatics. In Proceedings of the Industrial Conference on Data Mining, Barcelona, Spain, 12–15 December 2016; Springer: Cham, Switzerland, 2016; pp. 223–238.
  9. McCune, J.; Riley, A.; Chen, B. Clustering in Wineinformatics with Attribute Selection to Increase Uniqueness of Clusters. Fermentation 2021, 7, 27.
  10. Hommerberg, C. Persuasiveness in the Discourse of Wine: The Rhetoric of Robert Parker. Ph.D. Thesis, Linnaeus University Press, Kalmar, Sweden, 2011.
  11. Tian, Q.; Whiting, B.; Chen, B. Wineinformatics: Comparing and Combining SVM Models Built by Wine Reviews from Robert Parker and Wine Spectator for 95+ Point Wine Prediction. Fermentation 2022, 8, 164.
  12. Loyola-Gonzalez, O. Black-Box vs. White-Box: Understanding Their Advantages and Weaknesses from a Practical Point of View. IEEE Access 2019, 7, 154096–154113.
  13. Cherkassky, V.; Dhar, S. Interpretation of Black-Box Predictive Models; Springer: Cham, Switzerland, 2015; pp. 267–286.
  14. Riul, A., Jr.; de Sousa, H.C.; Malmegrim, R.R.; Santos, D.S.D., Jr.; Carvalho, A.C.P.F.; Fonseca, F.J.; Oliveira, O.N., Jr.; Mattoso, L.H.C. Wine classification by taste sensors made from ultra-thin films and using neural networks. Sens. Actuators B Chem. 2004, 98, 77–82.
  15. Aguilera, T.; Lozano, J.; Paredes, J.A.; Alvarez, F.J.; Suárez, J.I. Electronic nose based on independent component analysis combined with partial least squares and artificial neural networks for wine prediction. Sensors 2012, 12, 8055–8072.
  16. Katumullage, D.; Yang, C.; Barth, J.; Cao, J. Using Neural Network Models for Wine Review Classification. J. Wine Econ. 2022, 17, 27–41.
  17. Chen, Y.-L.; Hung, L.T.-H. Using decision trees to summarize associative classification rules. Expert Syst. Appl. 2009, 36, 2338–2351.
  18. Weisberg, S. Applied Linear Regression; John Wiley & Sons: Hoboken, NJ, USA, 2005; Volume 528.
  19. Ben-Gal, I. Bayesian networks. In Encyclopedia of Statistics in Quality and Reliability; Wiley Online Library: New York, NY, USA, 2008; Volume 1.
  20. Webb, G.I.; Keogh, E.; Miikkulainen, R. Naïve Bayes. Encycl. Mach. Learn. 2010, 15, 713–714.
  21. Chen, B.; Le, H.; Atkison, T.; Che, D. A Wineinformatics Study for White-box Classification Algorithms to Understand and Evaluate Wine Judges. Trans. Mach. Learn. Data Min. 2017, 10, 3–24.
  22. Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992.
  23. Karimi, N.; Kazem, S.; Ahmadian, D.; Adibi, H.; Ballestra, L. On a generalized Gaussian radial basis function: Analysis and applications. Eng. Anal. Bound. Elem. 2020, 112, 46–57.
  24. Joachims, T. Making Large-Scale SVM Learning Practical. In Advances in Kernel Methods: Support Vector Learning; Schölkopf, B., Burges, C., Smola, A., Eds.; MIT Press: Cambridge, MA, USA, 1999.
  25. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
  26. Medsker, L.R.; Jain, L.C. Recurrent neural networks. Des. Appl. 2001, 5, 64–67.
  27. Palmer, J.; Sheng, V.S.; Atkison, T.; Chen, B. Classification on grade, price, and region with multi-label and multi-target methods in wineinformatics. Big Data Min. Anal. 2019, 3, 1–12.
  28. Roca, P. State of the Vitiviniculture World in 2020; International Organization of Vine and Wine: Dijon, France, 2021.
  29. Goodhue, R.; Green, R.; Heien, D.; Martin, P. California wine industry evolving to compete in 21st century. Calif. Agric. 2008, 62, 12–18.
  30. Buccafusco, C.; Masur, J.S.; Whalen, R. How Many Latours Is Too Many? Measuring Brand Name Congestion in Bordeaux Wine. J. Wine Econ. 2021, 16, 419–428.
  31. Capehart, K.W. Expensive and Cheap Wine Words Revisited. J. Wine Econ. 2021, 16, 411–418.
  32. Yang, C.; Barth, J.; Katumullage, D.; Cao, J. Wine Review Descriptors as Quality Predictors: Evidence from Language Processing Techniques. J. Wine Econ. 2022, 17, 64–80.
  33. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377.
  34. Cheng, J.; Dong, L.; Lapata, M. Long Short-Term Memory-Networks for Machine Reading. arXiv 2016, arXiv:1601.06733.
  35. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  36. Chen, B.; Buck, K.H.; Lawrence, C.; Moore, C.; Yeatts, J.; Atkison, T. Granular computing in wineinformatics. In Proceedings of the 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), Guilin, China, 29–31 July 2017; pp. 1228–1232.
  37. Mehta, S.; Azarnoush, B.; Chen, B.; Saluja, A.; Misra, V.; Bihani, B.; Kumar, R. Simplify-then-translate: Automatic preprocessing for black-box translation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 8488–8495.
  38. Miyata, R.; Fujita, A. Understanding Pre-Editing for Black-Box Neural Machine Translation. arXiv 2021, arXiv:2102.02955.
Figure 1. Neural network architecture used in this research: one input layer with three hidden layers and one output node to determine positive or negative classes. The number of nodes in the input layer, which is determined by the number of attributes used in the dataset, equals the number of nodes in each of the hidden layers. For example, if the input dataset has 1000 attributes, the input layer naturally has 1000 nodes and each of the hidden layers also has 1000 nodes. This research employed a feedforward neural network implemented using the popular deep learning library TensorFlow [25]. The training process involved dividing the data set into training and testing subsets using five-fold cross validation. To reduce the risk of overfitting, a dropout rate of 20% was incorporated between the hidden layers.
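The layer layout and the five-fold split described in the Figure 1 caption can be illustrated with a minimal, library-free sketch. The function names below are hypothetical, the parameter count assumes fully connected layers with one bias per node, and the interleaved fold assignment (no shuffling) is an assumption made only for illustration; the caption does not specify how the folds were drawn.

```python
def layer_sizes(n_attributes, n_hidden=3):
    """Input layer, n_hidden hidden layers of the same width, one output node."""
    return [n_attributes] + [n_attributes] * n_hidden + [1]


def count_parameters(sizes):
    """Weights plus biases of a fully connected feedforward network (assumed)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def five_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    # Interleaved assignment: sample i goes to fold i % k (illustrative choice).
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


sizes = layer_sizes(1000)
print(sizes)                    # [1000, 1000, 1000, 1000, 1]
print(count_parameters(sizes))  # 3004001
```

Dropout adds no trainable parameters, so the 20% dropout mentioned in the caption does not change this count; it only affects which activations are used during training.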
Figure 2. The accuracy comparison between the naïve Bayes, SVM, and neural networks applied on Robert Parker’s reviews in dataset1, processed by the CWW2.0 and the CWW3.0.
Figure 3. The accuracy comparison between the naïve Bayes, SVM, and neural network applied on the Wine Spectator’s reviews in dataset1, processed by the CWW2.0 and the CWW3.0.
Figure 4. The accuracy comparison between the naïve Bayes, SVM, and neural network applied on Robert Parker’s reviews in dataset2, processed by the CWW2.0 and the CWW3.0.
Figure 5. The accuracy comparison between the naïve Bayes, SVM and neural network applied on Wine Spectator’s reviews in dataset2, processed by the CWW2.0 and the CWW3.0.
Figure 6. The accuracy comparison between the data sets processed by CWW2.0 and CWW3.0 using all of the attributes with the neural network as the classifier.
Table 1. Wine critiques for E. Pira e Figli Barolo Mosconi 2017 by Robert Parker and Wine Spectator.
| Robert Parker’s Review | Wine Spectator’s Review |
| --- | --- |
| Made with organically grown clones of Michel and Lampia, the E Pira-Chiara Boschis 2017 Barolo Mosconi softly presents bright berry aromas, raspberry, wild cherry, crushed limestone and delicate floral tones of lilacs and violets. Like other wines from this hot vintage, this expression from the Mosconi cru in Monforte d’Alba has that unique floral signature that is precious and unexpected. The wine shows great depth and balance with a pretty intensity that spreads over the palate. The tannins are dry with some dustiness, but the mouthfeel is spot-on in terms of length and polish. | Cherry and plum fruit flavors are accented by vanilla, toast, hay, white pepper and tar notes in this expressive, solidly built Barolo, which is fluid, with a dense matrix of tannins shoring up the long finish, showing fine complexity, balance and length. Best from 2025 through 2048. |
Table 2. The meaning of the different score ranges from Robert Parker and Wine Spectator.
| Robert Parker’s Score Meaning | Wine Spectator’s Score Meaning |
| --- | --- |
| 96–100 Extraordinary: worth a special effort to find, purchase, and consume. | 95–100 Classic: a great wine |
| 90–95 Outstanding: terrific wines | 90–94 Outstanding: a wine of superior character and style |
| 80–89 Barely above average to very good | 85–89 Very good: a wine with special qualities |
| 70–79 Average: a straightforward, innocuous wine. | 80–84 Good: a solid, well-made wine |
| 60–69 Below average | 75–79 Mediocre: a drinkable wine that may have minor flaws |
| 50–59 Unacceptable | 50–74 Not recommended |
Table 3. Example of Extraction Rate Progress.
| Hand-Extracted Attributes | Program-Extracted Attributes | Common Attributes |
| --- | --- | --- |
| powerful, blackcurrant, black raspberries, blueberry, pie, melted chocolate, aniseed, camphor, kirsch, subtle, floral, full-bodied, concentrated, bold, seductive, fine-grained, silt-like tannins, jam-packed, tightly wound, fruit, layers, finishing, wonderful, mineral, sparks, magic | powerful, black raspberries, blueberry, pie, melted chocolate, kirsch, subtle, floral, full-bodied, concentrated, bold, seductive, jam-packed, tightly wound, fruit, layers, finishing, wonderful, mineral, sparks, purple color, tannins, explodes | powerful, black raspberries, blueberry, pie, melted chocolate, kirsch, subtle, floral, full-bodied, concentrated, bold, seductive, jam-packed, tightly wound, fruit, layers, finishing, wonderful, mineral, sparks |
| Total count: 26 | Total count: 23 | Total count: 20 |
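The three counts in Table 3 follow from simple set operations on the two attribute lists. The sketch below reproduces them; the final ratio (common attributes divided by hand-extracted attributes) is only one plausible reading of "extraction rate" and is an assumption here, not a definition taken from the table.

```python
# Attribute lists copied from Table 3.
hand = {
    "powerful", "blackcurrant", "black raspberries", "blueberry", "pie",
    "melted chocolate", "aniseed", "camphor", "kirsch", "subtle", "floral",
    "full-bodied", "concentrated", "bold", "seductive", "fine-grained",
    "silt-like tannins", "jam-packed", "tightly wound", "fruit", "layers",
    "finishing", "wonderful", "mineral", "sparks", "magic",
}
program = {
    "powerful", "black raspberries", "blueberry", "pie", "melted chocolate",
    "kirsch", "subtle", "floral", "full-bodied", "concentrated", "bold",
    "seductive", "jam-packed", "tightly wound", "fruit", "layers",
    "finishing", "wonderful", "mineral", "sparks", "purple color",
    "tannins", "explodes",
}

common = hand & program   # found by both the human and the program
missed = hand - program   # hand-extracted terms the program did not find
extra = program - hand    # program-extracted terms the human did not list

print(len(hand), len(program), len(common))  # 26 23 20
print(sorted(missed))
print(sorted(extra))

# Assumed definition: share of hand-extracted attributes recovered by the program.
rate = len(common) / len(hand)
print(f"{rate:.1%}")  # 76.9%
```

Listing the missed and extra terms separately shows which vocabulary would need to be added to the wheel and which program matches a human annotator did not consider attributes.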
Table 4. Improvements in the CWW3.0 from the CWW2.0.
| | CWW2.0 | CWW3.0 |
| --- | --- | --- |
| Data Source | Wine Spectator | Wine Spectator + Robert Parker |
| Categories | 14 | 14 |
| Subcategories | 34 | 34 |
| Specific Terms | 1932 | 2589 |
| Normalized Attributes | 986 | 1191 |
Table 5. The CWW 3.0 Statistical Record compared with the CWW2.0.
CATEGORY_NAMESUBCATEGORY_NAMESPECIFIC_NAMESPECIFIC_NAME
Added
NORMALIZED_NAMENORMALIZED_NAME
Added
CARAMELCARAMEL97265616
CHEMICALPETROLEUM11261
SULFUR110100
PUNGENT4041
EARTHYEARTHY128564716
MOLDY2020
FLORALFLORAL8726456
FRUITYBERRY84353911
CITRUS5619352
DRIED FRUIT769655
FRUIT4220167
OTHER22-39-9
TREE FRUIT5516409
TROPICAL FRUIT6719369
FRESHFRESH75344415
DRIED50253918
CANNED/COOKED182172
MEATMEAT3611218
MICROBIOLOGICALYEASTY5040
LACTIC14060
NUTTYNUTTY272205
OVERALLTANNINS1243462
BODY611117-6
STRUCTURE511120
ACIDITY612141
FINISH23349138
FLAVOR/DESCRIPTORS89924046735
OXIDIZEDOXIDIZED2121
PUNGENTHOT3020
COLD1010
SPICYSPICE852539
WOODYRESINOUS317123
PHENOLIC6051
BURNED514282
Table 6. Evaluation Matrix.
| Evaluation Matrix | Predicted (Positive) | Predicted (Negative) |
| --- | --- | --- |
| Actual (Positive) | TP | FN |
| Actual (Negative) | FP | TN |
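The four cells of Table 6 are enough to compute the usual classification metrics. The sketch below uses the standard textbook definitions of accuracy, precision, recall, and F-score; the function name and the example counts are hypothetical and are not results from this study.

```python
def evaluation_metrics(tp, fn, fp, tn):
    """Standard metrics derived from the four cells of a confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (e.g., no positive predictions at all).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = evaluation_metrics(tp=40, fn=10, fp=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced classes, accuracy can stay high while recall on the positive class is poor, which is why precision, recall, and F-score are worth reporting alongside it.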
Table 7. Ten Key Attributes from Instances with Positive Labels and Negative Labels.
| Key Attributes from Positive Label (95+) | Key Attributes from Negative Label (94-) |
| --- | --- |
| GREAT, BLACK CURRANT, PURPLE, FULL-BODIED, LAYER, RIPE, FRUIT, FLORAL, FIRM, BLACK FRUIT | MEDIUM-BODIED, WELL-BALANCED, VELVET, YOUNG, PURE, AROMA, DENSE, STYLE, TANNINS_DECENT, TOBACCO |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Le, L.; Hurtado, P.N.; Lawrence, I.; Tian, Q.; Chen, B. Applying Neural Networks in Wineinformatics with the New Computational Wine Wheel. Fermentation 2023, 9, 629. https://doi.org/10.3390/fermentation9070629
