A Machine Learning Approach to Evaluating the Impact of Natural Oils on Alzheimer’s Disease Progression

Amawi, Rema M.; Al-Hussaeni, Khalil; Keeriath, Joyce James; Ashmawy, Naglaa S.

doi:10.3390/app14156395

Open Access Article


¹ Mathematics and Sciences Department, Rochester Institute of Technology, Dubai 341055, United Arab Emirates

² Computing Sciences Department, Rochester Institute of Technology, Dubai 341055, United Arab Emirates

³ Electrical Engineering Department, Rochester Institute of Technology, Dubai 341055, United Arab Emirates

⁴ Pharmaceutical Sciences Department, College of Pharmacy, Gulf Medical University, Ajman 4184, United Arab Emirates

⁵ Pharmacognosy Department, Faculty of Pharmacy, Ain Shams University, Cairo 11566, Egypt

* Authors to whom correspondence should be addressed.

Appl. Sci. 2024, 14(15), 6395; https://doi.org/10.3390/app14156395

Submission received: 1 June 2024 / Revised: 16 July 2024 / Accepted: 19 July 2024 / Published: 23 July 2024


Abstract

Alzheimer’s Disease is among the major chronic neurodegenerative diseases, affecting more than 50 million people worldwide. The disease, which occurs mainly among the elderly, irreversibly destroys memory, cognition, and the ability to carry out daily activities. Few drugs are approved for Alzheimer’s Disease management despite its high prevalence. To date, the drugs available on the market cannot reverse the neuronal damage caused by the disease, which leads to the exacerbation of symptoms and possibly death. Medicinal plants are considered a rich source of chemical constituents and have contributed to modern drug discovery in many therapeutic areas, including cancer, infectious, cardiovascular, neurodegenerative, and Central Nervous System (CNS) diseases. Moreover, essential oils extracted from plant organs have been reported to possess a wide array of biological activities, including antioxidant, antiaging, cytotoxic, anti-inflammatory, antimicrobial, and enzyme inhibitory activities. This article highlights the promising potential of plant essential oils in the discovery of novel therapeutic options for Alzheimer’s Disease and in halting its progression. In this article, 428 compounds are reported from the essential oils isolated from 21 plants. A comparative study is carried out by employing a variety of machine learning techniques, validation methods, and evaluation metrics to predict the efficacy of essential oils against Alzheimer’s Disease progression. Extensive experiments on essential oil data suggest that a prediction accuracy of up to 82% can be achieved given the proper data preprocessing, feature selection, and model configuration steps. This study underscores the potential of integrating machine learning with natural product research to prioritize and expedite the identification of bioactive essential oils that could lead to effective therapeutic interventions for Alzheimer’s Disease. Further exploration and optimization of machine learning techniques could provide a robust platform for drug discovery and development, facilitating faster and more efficient screening of potential treatments.

Keywords: Alzheimer’s disease treatments; essential oils; natural oils; data mining; prediction

1. Introduction

Traditionally, natural products have been used for the treatment of many diseases [1]. Medicinal plants are among the major sources of traditional medicines. Moreover, several modern medicines are produced indirectly from medicinal plants [2]. Natural products have played an important role in drug discovery in many therapeutic areas, including cancer [3], infectious diseases [4], cardiovascular [5] and CNS diseases [6].

Due to the unique chemical diversity of natural products, their biological activities and drug-like properties are very diverse in comparison to synthetic drugs. Natural products have served as a structural pool for small molecular libraries to discover biologically active drugs [6]. Nowadays, natural products including medicinal plants have been considered as main targets in drug-discovery programs [7].

Plant essential oils are well known in traditional medicine as well as aromatherapy for the management of numerous diseases [8]. Essential oils are mixtures of various volatile chemical compounds and are extracted from different plant organs, such as roots, leaves, and fruits. Their biological activities have been studied and reported in the literature for decades and their popularity has only increased over time. Examples of such activities include antioxidants, antiaging, cytotoxic, anti-inflammatory, antimicrobial, and enzyme inhibitory activities [9,10,11]. Essential oils have also been reported as promising agents for the treatment of neurodegenerative diseases due to possessing strong free radical scavenging activity, and hence inhibition of oxidative stress in the body [12].

Alzheimer’s Disease is considered to be the number one chronic neurodegenerative disease, affecting more than 50 million people worldwide [13]. Being a neurodegenerative disease, it slowly and irreversibly destroys memory, cognition, and the ability to perform daily activities, resulting in the need for full-time care; it occurs mainly among individuals over 65 [14]. Being the most common cause of dementia, Alzheimer’s Disease is considered the third leading cause of death among the elderly, after cardiovascular diseases and cancer [15].

Despite the high prevalence of Alzheimer’s Disease, only five drugs have been approved by the Food and Drug Administration (FDA) for its management, namely galantamine, rivastigmine, donepezil, memantine, and the combination of memantine and donepezil. Moreover, none of the available drugs can reverse or stop the damage of neurons which causes Alzheimer’s Disease symptoms leading to disease mortality [16].

Among the treatment approaches for Alzheimer’s Disease is the use of cholinesterase inhibitors. One of the FDA-approved drugs, galantamine, which inhibits the cholinesterase enzyme, is a natural product [17]. In this work, we biologically screen the targeted and isolated pure compounds, in addition to the selected essential oils, in order to identify potential active compounds against Alzheimer’s Disease progression.

Some essential oils have been documented to improve memory and learning abilities. Using essential oils as a potential source of treatment for Alzheimer’s could improve patients’ quality of life in terms of cognitive abilities, mental health, and social interactions [18,19]. We study the components of 21 natural oils commercially available in the United Arab Emirates and examine their effectiveness in preventing Alzheimer’s Disease progression. The oils are Amla, Anjeer, Apricot, Avocado, Chamomile, Costus, Ginger, Ginseng, Grapefruit, Gum Myrrh, Hazelnut, Henna, Juniper, Mint, Mustard, Onion, Rosemary, Sadab, Sandal, Sweet Violet, and Turmeric. They were chosen as the focus of this study based on the amount of research interest in them, which enables comparison with previous results.

Machine learning (ML) has become increasingly prevalent in the scientific literature due to its ability to analyze vast amounts of data and produce accurate results. Using ML techniques, we develop a model trained on the composition of the 21 oils and their activity against Alzheimer’s progression. The purpose of this model is to predict whether a new and previously untested oil could also have potential activity against Alzheimer’s, which may save time in terms of laboratory testing. The potential treatments could then be provided to Alzheimer’s patients, thus offering a cost-effective solution for this age-old disease.

The paper is organized as follows. Section 2 surveys the literature for related work. Section 3 details our proposed methodology, including the employed dataset, data preparation and preprocessing, and building machine learning models. The experimental settings and results are detailed in Section 4. A discussion of the results and the application of our work is presented in Section 5. Finally, the paper concludes in Section 6.

2. Literature Review

We conduct a thorough literature review on the natural oils relevant to this study. We survey the body of existing research on the impact of natural oils in the medical domain while shedding light on effective machine learning techniques used in such studies.

An interesting study by Abdel-Hady et al. in 2022 highlighted the benefits of Amla in the treatment of nausea, asthma, bronchitis, leucorrhoea and vomiting, in addition to having antipyretic and anti-inflammatory properties [20]. Anjeer, commonly known as Fig, was studied by a group of researchers from India in 2014 who reported antioxidant and antibacterial activities and highlighted the potential of nutritional and therapeutic benefits [21,22].

Nafis et al. in 2020 stated that apricots exhibited antimicrobial activity and can be considered for their potential in combatting multidrug-resistant strains [23]. Costus displayed a similar activity according to Shafti et al. in 2015 [24]. Avocado is one of the most researched fruits, as evident in the hundreds of articles published over the last two decades. In addition to its popular use in the cosmetics and culinary industries, it has shown potential in medical applications as well [25,26,27].

Chamomile, which has several applications in the pharmaceutical and cosmetics industries, is well known for its relaxant properties and has exhibited antioxidant activity according to a team of researchers in the Republic of Srpska, Bosnia and Herzegovina [28]. Similarly, Grapefruit, known for its use in fragrances, has also shown antioxidant activity in a study by researchers in Asia in 2010 [29].

For centuries, Ginger has been a well-known herb, recognized by many cultures for its medicinal uses in treating digestive, respiratory, and other infections. A study in 2019 has also shown that Ginger has significant potential as an antifungal agent [30]. Gum Myrrh was also reported to have antifungal biological activity in a study by Perveen et al. in 2018 [31]. Ginseng has many therapeutic properties and applications as an antioxidant, antibacterial, and anticancer agent [32].

Hazelnut has medicinal traits and is recommended for mental fatigue and anemia, in addition to exhibiting antibacterial and antiparasitic activities [33,34]. Elaguel et al. showed that henna has substantial antioxidant activity and has a significant potential in combatting cancer [35]. Mint, a natural antioxidant, has also shown its effectiveness in the treatment of mental fatigue as reported in a study by a team of researchers from Korea in 2010 [29].

Mustard, a natural food preservative, has exhibited effective antimicrobial activities in a study by researchers in Iran in 2019 [36], while Egyptian researchers in 2015 have shown that onion, in addition to its many uses in the food, cosmetics, and medicinal industries, has exhibited antimicrobial and antioxidant activities [37].

Rosemary, a popular herb in the Mediterranean region especially in the food industry, has medicinal uses as well. It has been known to alleviate symptoms of respiratory and anxiety-related disorders, as well as other infectious diseases. Jiang et al. reported that Rosemary has significant antimicrobial activities [38]. In 2022, Shahrajabian examined the medicinal advantages of Sadab (Rue) and reported its many benefits as an anti-inflammatory, anti-hyperglycemic, and anti-hyperlipidemic among others [39,40]. In 2012, researchers from China and Japan examined Sandalwood for its biologically active components and reported its antioxidant and antitumor properties [41].

Sweet Violet is known for its therapeutic properties and for its use in the production of perfumes. It was also shown to have antioxidant and antibacterial activities as per a study conducted by Akhbari et al. in 2011 [42]. More recently, researchers in 2023 examined Turmeric given its high nutritional, industrial, and medicinal values. Specifically, due to its significant use in the food industry, it is no surprise that it has exhibited high antioxidant activities [43]. Boukhaloua et al. reported that Juniper, which is widely used in medicine, has strong antimicrobial activities [44].

The work in [45] used quantitative composition-activity relationships (QCAR) machine learning-based models to identify the chemical compounds across 61 assayed essential oils exhibiting inhibitory potency against Microsporum spp. Five different machine learning algorithms were used, namely, logistic regression (LR), support vector machines (SVM), gradient boosting (GB), k-nearest neighbor (kNN), and random forest (RF). Random Forest was found to be the best-performing model. The study also implements data augmentation for the biological data, which are characterized by high dimensionality and scarcity. This was conducted to dynamically alter the essential oil composition mixtures, addressing the challenge of standardizing essential oil composition due to plant and extraction method variations. Data augmentation was employed to reshape unbalanced datasets, enabling statistical analysis on larger datasets, reducing overfitting, and constructing reliable models. Our study follows a similar methodology.

3. Methodology

This section discusses the employed dataset and proposed methodology. First, a detailed analysis of the curated data is provided. After that, a discussion on the various data preparation and preprocessing steps is presented. Then, our proposed methodology for using machine learning to predict the impact of the 21 essential oils on Alzheimer’s Disease progression is detailed. The overall process entails dataset feature selection, dataset preprocessing, machine learning model evaluation, and model selection.

3.1. Dataset

The collected data (made publicly available at https://github.com/researchrepo1/EssentialOilsDataset (accessed on 30 May 2024)) used in this research work comprise 21 essential oil samples and 428 chemical compounds, which represent the chemical composition percentages of these essential oils. These oil samples, available commercially in the United Arab Emirates market, were chosen based on the reported literature that indicates their potential activity against Alzheimer’s Disease progression [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,46,47,48,49,50]. The chemical composition data were gathered from the literature. Each essential oil sample was classified as “HIGH” or “LOW” in activity, with eight samples classified as “HIGH” and 13 as “LOW”. This classification was based directly on the literature available about each oil. Oils classified as “HIGH” reported significant activity relevant to Alzheimer’s Disease, such as neuroprotection or acetylcholinesterase inhibition. Oils classified as “LOW” either showed low activity, or there was insufficient literature to conclusively determine their efficacy.

The 21 essential oils examined in this study contain 428 chemical compounds in total, as reported in the literature. Of these, 12 compounds were present in four or more essential oils. These compounds, along with their concentrations, are shown in Figure 1, Figure 2 and Figure 3, which provide a visual representation of how each compound is concentrated in each oil.

As per Figure 1, the compound 1,8-Cineole is concentrated the highest in Rosemary oil at 26.54%, and is found with much lower concentrations in the remaining oils: Sweet Violet at 1.92%, Costus at 1.73%, Hazelnut at 1%, and in negligible concentrations with the remaining oils. The compound Camphene is concentrated the highest in Ginger oil at 32.79%, followed by Rosemary oil at 11.38% and Costus oil at 4.96%. It was found in less than 1% concentrations in the remaining oils. The compound Camphor is mostly concentrated in Rosemary oil at 12.88%, in Costus oil at 2.11%, in Sweet Violet Oil at 0.92% and in Henna at 0.27%. It is noticeably clear that Limonene is highly concentrated in Grapefruit oil at 94.20%, in Juniper oil at 12.10%, and at much lower concentrations in the remaining oils. The compound Linalool is concentrated the highest in Apricot oil at 6.38%, in Sweet Violet oil at 3.06%, in Mint oil at 2.22%, in Henna oil at 1.58% and in less than 1% concentrations in the remaining oils. The compound Spathulenol is concentrated the highest in Sweet Violet oil at 2.54% and in Hazelnut oil at 1.80%. It is also found in Chamomile and Anjeer oils at 0.20% and 0.10% concentrations, respectively.

In Figure 2, it is noticed that the compound α-Pinene is concentrated the highest in Juniper oil at 29.10%, followed closely by Rosemary oil at 20.14% and in Ginger oil at 18.05%. It is also found in less than 2% concentrations in the remaining oils. The compound α-Terpineol is concentrated the highest in Hazelnut oil at 2.30%, in Rosemary oil at 1.95%, in Mint oil at 0.23% and in Costus oil at 0.11%, while the compound α-Thujene is concentrated the highest in Juniper oil at 2.30%, and in smaller concentrations in Rosemary, Grapefruit, and Chamomile oils at 0.27%, 0.24% and 0.20%, respectively. The compound β-Elemene is concentrated the highest in Gum Myrrh oil at 2.20%, followed by Ginseng oil at 1.50%, at 0.40% concentration in Chamomile oil, and at a negligible 0.04% concentration in Costus oil. The compound β-Pinene is primarily concentrated in Juniper oil at 17.60% and at a much lower concentration of 6.59% in Rosemary oil. It is also found in several other oils but at less than 3% concentration. Finally, the compound γ-Cadinene is concentrated the highest in Gum Myrrh at 2.30%, in Apricot oil at 1.62%, and in both Chamomile and Juniper oils at 0.10%.

By examining the heatmap in Figure 3, it is evident that the concentration of the compound Limonene in Grapefruit oil at 94.2% is the highest among the concentrations of all the chemical compounds in the 21 oils. This is followed by the concentration of the compound Camphene in Ginger oil at 32.79%. It is worth noting that among the highest concentrations is the concentration of the compound 1,8-Cineole in Rosemary oil at 26.54%, and the concentration of the compound α-Pinene in Juniper oil at 29.1%, in Rosemary oil at 20.14% and in Ginger oil at 18.05%. Furthermore, Chamomile, Costus and Rosemary oils contain nine out of the twelve compounds examined in the study. On the other hand, these 12 compounds have no concentrations in the following oils: Amla, Mustard, Onion, Sadab, Sandal, and Turmeric.

Since our dataset contains 428 chemical compounds in total, Table 1 provides a detailed overview of the concentrations of only some chemical compounds found in the 21 essential oils at hand, along with their activity label (“HIGH” or “LOW”) against Alzheimer’s Disease progression. Each row represents an essential oil sample, and the columns list the percentage concentrations of various compounds, including Camphene, Limonene, Linalool, α-Pinene, β-Pinene, and 1,8-Cineole. The last column labels the activity of each essential oil as either “HIGH” or “LOW”, where “HIGH” indicates reported significant activity of the oil and “LOW” indicates low activity or insufficient literature about the essential oil.

3.2. Feature Selection

As reported in Table 1, our dataset has more than 400 attributes. This data phenomenon is known as the curse of high-dimensionality, whereby data points are sparse in the high-dimensional space, resulting in a negative impact on the predictive accuracy of machine learning models [51].

Due to the extremely high dimensionality of our dataset, a rigorous feature selection process is implemented to enhance the predictive accuracy of the machine learning models. This process entails creating various versions of the dataset by systematically excluding chemical compound features that exhibit relatively low cumulative percentages across all essential oil samples.

Each chemical compound could theoretically achieve a maximum percentage of 100% within a single essential oil, yielding a total possible sum of 2100% across all 21 essential oils. In practice, the highest observed cumulative percentage was for Limonene, with a value of 114.97%. To determine which features to retain, percentage thresholds were established based on this maximum value. Eleven distinct threshold values, namely 0.1, 0.2, 0.5, 1, 2, 5, 10, 20, 30, 40, and 50, were enforced, resulting in 11 dataset variations (subsets). Additionally, the original dataset, incorporating all chemical compound features (where the sum is greater than 0), is also included for comparison. This approach results in twelve unique datasets, each progressively refining the feature set to enhance the model’s efficacy in predicting the activity of essential oils against Alzheimer’s Disease progression, from a computational perspective.

Table 2 specifies the 11 threshold values enforced in the feature selection process, the corresponding number of resulting chemical compound features, and the designated names for each dataset. The experiments in Section 4 are carried out on all the 11 + 1 (original) datasets.
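The thresholding step can be illustrated with a minimal sketch. It assumes that a compound is retained when its cumulative percentage across all oils exceeds the threshold, which is one reading of the description in this subsection. The small matrix and the function name `select_by_cumulative_sum` are hypothetical and are not taken from the published dataset.

```python
import numpy as np

def select_by_cumulative_sum(X, threshold):
    """Return indices of columns whose sum over all samples exceeds threshold."""
    column_sums = X.sum(axis=0)
    return np.flatnonzero(column_sums > threshold)

# Toy composition matrix (rows: oils, columns: compounds), values in percent.
# These numbers are illustrative only.
X_toy = np.array([
    [94.2, 0.0, 0.05, 0.0],
    [12.1, 0.3, 0.00, 0.0],
    [1.5, 0.4, 0.00, 0.0],
])

# Threshold 0 corresponds to the "original" dataset (sum greater than 0).
thresholds = [0, 0.1, 0.5, 10]
subsets = {t: select_by_cumulative_sum(X_toy, t) for t in thresholds}

for t, cols in subsets.items():
    print(f"threshold {t}: keeps columns {cols.tolist()}")
```

Raising the threshold can only shrink the retained feature set, which is why the twelve datasets form a nested sequence from the full feature set down to the most abundant compounds.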

3.3. Preprocessing

Machine learning algorithms may require numerical data in specific formats for optimal performance. Initially, our dataset contains textual labels for activity and unformatted numerical data for features. To facilitate data processing, the binary classes “HIGH” and “LOW” are converted to numerical values using Label Encoding. Label Encoding is a technique that transforms categorical text data into numerical values. In this case, the labels “HIGH” and “LOW” are encoded as 1 and 0, respectively.

Furthermore, machine learning algorithms generally perform better when the numerical values are normalized within a specific range, typically between 0 and 1. However, the values in our datasets ranged from 0 to 100, representing percentage compositions. To address this issue, the Min-Max Scaling technique is employed. Min-Max Scaling transforms the data by rescaling each feature to a specified range, usually 0 to 1. This is achieved by subtracting the minimum value of the feature from each data point and then dividing by the range of the feature (the difference between the maximum and minimum values), as per Equation (1).

$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (1)$$

In Equation (1), $X$ is the original value, $X_{\min}$ is the minimum value of the feature, $X_{\max}$ is the maximum value of the feature, and $X'$ is the scaled value. This normalization process ensures that the selected machine learning models are effectively trained, as the dataset now contains standardized feature values to a common scale.

Further preprocessing beyond these steps was not necessary, as the dataset does not contain any outliers. All numerical values are between 0 and 100. Moreover, the dataset does not contain any missing values as all chemical compounds were accounted for during the process of collecting their data from the literature.
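The two preprocessing steps in this subsection can be sketched as follows. This is a minimal NumPy illustration rather than the exact pipeline used in the study: the helper names `encode_labels` and `min_max_scale`, the toy values, and the handling of constant columns are assumptions made for the example.

```python
import numpy as np

# Encoding stated in the text: "HIGH" -> 1, "LOW" -> 0.
LABEL_TO_INT = {"HIGH": 1, "LOW": 0}

def encode_labels(labels):
    """Map the textual activity labels to integers."""
    return np.array([LABEL_TO_INT[label] for label in labels])

def min_max_scale(X):
    """Rescale each column to [0, 1] following Equation (1)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Assumption: constant columns (range 0) are mapped to 0
    # to avoid division by zero.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (X - col_min) / safe_range

labels = ["HIGH", "LOW", "LOW"]                   # illustrative labels
X_toy = [[94.2, 0.0], [12.1, 0.0], [1.5, 0.0]]    # illustrative percentages

y = encode_labels(labels)
X_scaled = min_max_scale(X_toy)
print(y)
print(X_scaled)
```

An explicit mapping is used here because scikit-learn's `LabelEncoder` assigns integers in sorted label order, which would give "HIGH" the value 0 rather than 1.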

3.4. Model Training

The subsequent step in developing a machine learning model involves splitting the dataset into training and testing sets. Traditionally, the training set comprises 70% of the samples, with the remaining 30% allocated to the testing set [52]. This approach is generally more effective for larger datasets containing at least 100 samples. Given that the current dataset consists of only 21 samples, a 50%–50% split with stratification is employed.

Stratification is particularly important due to the imbalanced nature of the original dataset, which has a ratio of 8:13 for “HIGH” to “LOW” activity labels. Stratification ensures that each subset (training and testing) maintains a distribution of class labels close to the original dataset. By employing stratified sampling, the ratio of “HIGH” to “LOW” labels is preserved within both the training and testing sets. This ensures that the model is exposed to a representative distribution of the classes, enhancing prediction accuracy in classification tasks.
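A stratified 50%–50% split of this kind can be sketched with scikit-learn's `train_test_split`. The feature matrix below is random placeholder data with the same class ratio as the dataset (8 "HIGH" to 13 "LOW"), and the `random_state` value is an arbitrary assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((21, 5))             # placeholder features, not the real data
y = np.array([1] * 8 + [0] * 13)    # 8 "HIGH" (1) and 13 "LOW" (0) labels

# stratify=y keeps the class ratio approximately equal in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

print("train HIGH/LOW:", int(y_train.sum()), int((y_train == 0).sum()))
print("test  HIGH/LOW:", int(y_test.sum()), int((y_test == 0).sum()))
```

With an odd number of samples the two subsets cannot be exactly equal in size, so one of them receives the extra sample.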

3.5. Model Evaluation

The effectiveness of a machine learning model is evaluated based on its accuracy in predicting the correct label of an essential oil sample in the underlying dataset (see Table 1 and Table 2). Particularly, two accuracy metrics are employed: testing accuracy and Leave-one-out cross-validation (LOOCV) accuracy.

Cross-validation (CV) is a model evaluation technique where the sample set is split into k subsets, and model fitting and prediction are performed k times. The testing set will be formed of one particular subset, out of the k subsets, and the remaining subsets will be the training set. This method is useful for avoiding model overfitting and also for effectively evaluating the performance of a model on a smaller dataset.

Leave-one-out cross-validation (LOOCV) is a form of k-fold cross-validation, where k = n, and n is the number of samples in the dataset. In the ith of the n iterations, the testing set is the ith sample and the remaining n − 1 samples form the training set. To evaluate model performance using LOOCV, the accuracies of the model in predicting the activity of the testing sample in the n iterations are averaged to give the overall LOOCV accuracy of the model.
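The LOOCV procedure can be sketched with scikit-learn's `LeaveOneOut` splitter. The data are random placeholders with the same shape and class ratio as the dataset, and the choice of a 3-nearest-neighbor classifier is an assumption made only to have a model to evaluate.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((21, 5))             # placeholder features, not the real data
y = np.array([1] * 8 + [0] * 13)    # 8 "HIGH" (1) and 13 "LOW" (0) labels

model = KNeighborsClassifier(n_neighbors=3)

# One score per iteration: 1.0 if the held-out sample is predicted
# correctly, 0.0 otherwise.
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
loocv_accuracy = scores.mean()

print(f"LOOCV accuracy over {len(scores)} iterations: {loocv_accuracy:.2f}")
```

Because each iteration tests a single sample, the per-iteration accuracy is either 0 or 1, and the LOOCV accuracy is the fraction of samples predicted correctly.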

Due to the extremely small size of our dataset in terms of the number of samples, several other evaluation methods were experimented with to identify the best ones. These evaluation methods include 50:25:25 and 70:15:15 ratios of training-validation-testing sets, and mean cross-validation using k = 3, 5, 7, and LOOCV. Out of these methods, evaluating a model using 50% training and 50% testing sets and using LOOCV reported the highest accuracies. Table 3 summarizes these experiments. Henceforth, all our experiments will be evaluated using the 50% testing set and LOOCV. The accuracies of a machine learning model in predicting the activity of essential oils are hereby referred to as testing accuracy and LOOCV accuracy, respectively.

3.6. Model Selection

The problem of predicting the class label of an essential oil sample is a classification problem. This section discusses building three widely-used classification models: k-Nearest Neighbours, Logistic Regression, and Random Forest. For each one of these models, this section discusses the selection of the various model parameters. All these algorithms were implemented using Python’s scikit-learn library [45,53,54].

3.6.1. k-Nearest Neighbours

k-Nearest Neighbors (kNN) is a supervised learning classification algorithm that operates by plotting samples in a multidimensional space, where the number of features corresponds to the number of dimensions. The class label of a sample is predicted by identifying the k nearest neighbors in this multidimensional space and taking a majority vote of these neighbors’ labels [55].

It is crucial to select an appropriate value for k to avoid ties in the voting process. For binary classification tasks, k should be an odd number to prevent ties. In cases where the classification involves three classes, k should be chosen such that it is not a multiple of three, reducing the likelihood of tie votes. Careful selection of k thus helps avoid ambiguous classification outcomes.

To explore the impact of different parameters on the performance of the kNN algorithm, the scikit-learn implementation was utilized [53]. Specifically, an odd number of neighbors (the n_neighbors parameter) is chosen between 1 and 9, and two distinct settings for the weights parameter, uniform and distance, are used. The uniform setting assigns equal weights to each neighbor, while the distance setting assigns larger weights to closer samples [53]. This configuration yields 10 distinct kNN models, each characterized by a combination of parameters. These models are listed and named in Table 4.
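The parameter grid described above can be enumerated as in the following sketch. The dictionary keys are hypothetical names chosen for the example and do not necessarily match the model names used in Table 4.

```python
from itertools import product
from sklearn.neighbors import KNeighborsClassifier

neighbor_counts = [1, 3, 5, 7, 9]          # odd values of k between 1 and 9
weightings = ["uniform", "distance"]       # the two weights settings

# Build one unfitted classifier per (k, weights) combination.
knn_models = {
    f"kNN_k{k}_{w}": KNeighborsClassifier(n_neighbors=k, weights=w)
    for k, w in product(neighbor_counts, weightings)
}

for name in knn_models:
    print(name)
```

Five values of k combined with two weighting schemes give the 10 configurations mentioned in the text.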

3.6.2. Logistic Regression

Logistic Regression (LR) is a supervised learning binary classification algorithm that predicts the probability of a sample belonging to a certain class. Equation (2) formulates this process, where the equation takes all input features as weighted variables, and the output would be either 0 or 1 with each extremity representing the two classes [56].

$$y = w_{0} + w_{1} x_{1} + w_{2} x_{2} + w_{3} x_{3} + \dots + w_{n} x_{n} \qquad (2)$$

In Equation (2), $y$ is either 0 or 1, the input features are $x_{1}, x_{2}, x_{3}, \dots, x_{n}$, and the weights of the features are given by $w_{1}, w_{2}, w_{3}, \dots, w_{n}$, where $n$ is the number of features.

The output can be plotted as an S-shaped curve, which is also called a Sigmoid function. The curve helps to visualize the predicted probabilities and how outputs are assigned. This model is called regression since the prediction is a range of values between 0 and 1, but it performs binary classification since the output can only have two possible values, 0 or 1.

During the model training process, the weights are calculated based on the output values. Then, during model evaluation, the testing samples are assigned output values by substituting the values of their input features into the equation, multiplying each of them with their respective calculated weights, adding them up, and the resulting output would be assigned 0 or 1 based on the value to which it is closest.
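In the standard formulation of logistic regression, the weighted sum on the right-hand side of Equation (2) is passed through the sigmoid function to obtain a probability, which is then mapped to 0 or 1 depending on which value it is closer to. The sketch below illustrates this under that assumption; the weights, bias, and input values are hypothetical and are not learned from the dataset.

```python
import numpy as np

def sigmoid(z):
    """S-shaped function mapping any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_label(x, weights, bias, threshold=0.5):
    """Compute the weighted sum, squash it, and threshold the result."""
    score = bias + np.dot(weights, x)      # w0 + w1*x1 + ... + wn*xn
    probability = sigmoid(score)
    return int(probability >= threshold), float(probability)

# Hypothetical weights for a two-feature model.
weights = np.array([2.0, -1.0])
bias = -0.5

label_a, prob_a = predict_label(np.array([1.0, 0.5]), weights, bias)
label_b, prob_b = predict_label(np.array([0.0, 1.0]), weights, bias)
print(label_a, round(prob_a, 3))
print(label_b, round(prob_b, 3))
```

A threshold of 0.5 corresponds to assigning the output to whichever of 0 or 1 the predicted probability is closest to.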

An optimization algorithm is used to find the optimal weights and bias during training. Conventionally, gradient descent is used for this purpose. An LR Model trained on the same dataset with different optimization algorithms may result in different outcomes.

To prevent overfitting the S-shaped curve to the training data, regularization can be applied through an L1 or L2 penalty term. An LR model can be trained using either one of these penalties or neither of them. A penalty term discourages large weights, which tends to improve the model's performance on unseen samples.

Using Python’s scikit-learn library, we experimented with several parameters for Logistic Regression. The penalty parameter trains the LR model with either L1 or L2 regularization (or none) to avoid overfitting to the training set. The class_weight parameter is None by default but can be set to balanced for an imbalanced training set. This assigns samples of the minority class a higher weight and majority class samples a lower weight, which may improve model performance. The solver parameter selects the optimization algorithm. The C parameter specifies the inverse of the regularization strength, so a smaller C specifies stronger regularization [54].

We experimented with the penalty, solver, and C parameters and found that class_weight = ‘balanced’ and penalty = ‘L2’ worked well for all datasets. We then varied C and solver, and evaluated the performance of the logistic regression models. Table 5 summarizes the different LR models resulting from parameter variations.
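Varying C and solver while keeping balanced class weights and L2 regularization can be sketched as follows. The C values and solvers listed here are illustrative assumptions, not the configurations of Table 5, and the data are random placeholders. The sketch relies on L2 being scikit-learn's default penalty for `LogisticRegression`.

```python
from itertools import product

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((21, 5))             # placeholder features, not the real data
y = np.array([1] * 8 + [0] * 13)    # 8 "HIGH" (1) and 13 "LOW" (0) labels

C_values = [0.1, 1.0, 10.0]          # illustrative; smaller C = stronger regularization
solvers = ["lbfgs", "liblinear"]     # illustrative; both support the L2 penalty

lr_models = {}
for C, solver in product(C_values, solvers):
    # The penalty is left at its default, which is L2.
    model = LogisticRegression(
        class_weight="balanced", C=C, solver=solver, max_iter=1000
    )
    lr_models[(C, solver)] = model.fit(X, y)

for (C, solver), model in lr_models.items():
    print(C, solver, model.score(X, y))
```

The printed values are training accuracies on placeholder data and carry no meaning beyond showing that each configuration can be fitted and scored.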

3.6.3. Random Forest

Random Forest (RF) is an ensemble supervised learning technique that can be used for classification and regression applications [57]. To understand how RF works, one must first understand how Decision Trees (DT) work. A Decision Tree is another machine learning technique that is used for classification and regression. It works as a flowchart, splitting the dataset based on different features at different levels, to ultimately predict the label of a target sample [58].

Random Forest implements multiple decision trees, where a random subset of features is dropped in each decision tree, thus minimizing the chance of overfitting to the training data. The class label of a sample is predicted by feeding that sample’s feature values into each decision tree, obtaining each decision tree’s predicted class label, and then taking the majority of votes to assign the output class label [59].

An RF model’s accuracy is based on multiple parameters, namely the number of decision trees, the best-split algorithm, and whether the tree has been pre-pruned or not. Pre-pruning refers to limiting the growth of the trees. This is conducted to further ensure that the model does not overfit. One way of conducting this is by limiting the maximum depth of the trees or by varying certain parameters during training.

The n_estimators parameter refers to the number of decision trees used in the random forest. The criterion parameter specifies the method of choosing the best feature for splitting the data at each node. We experimented with the gini, log_loss, and cross-entropy methods for criterion and found that the log_loss method results in better performance. The max_depth and ccp_alphas parameters are used to prune the trees. max_depth refers to the maximum depth of each decision tree in the RF model, whereas ccp_alphas (the ccp_alpha argument in scikit-learn) is the complexity parameter for minimal cost-complexity pruning, which determines which sub-trees are retained in each tree based on their cost complexity. By limiting the depth and complexity of the trees, we reduce the risk of overfitting. Table 6 summarizes the different RF models resulting from parameter variations.

4. Results

In this section, we carry out extensive experiments using the three classification methods detailed in Section 3.6 and datasets in Table 2. All the classification models are implemented using Python’s scikit-learn library [45,53,54]. The performance of a model is evaluated based on its prediction accuracy using two metrics: testing accuracy and LOOCV accuracy.
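
The two metrics can be computed along the following lines. This is a minimal sketch on synthetic data: the 70/30 stratified split, the kNN model used as the example estimator, and the choice to run LOOCV over the full sample set are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the essential-oil data (assumption, not the real dataset).
X, y = make_classification(n_samples=40, n_features=8, n_informative=4,
                           random_state=1)
model = KNeighborsClassifier(n_neighbors=3)

# Testing accuracy: fit on a stratified training split, score on the held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)
testing_accuracy = model.fit(X_train, y_train).score(X_test, y_test)

# LOOCV accuracy: each sample is held out once; every fold scores 0 or 1.
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
loocv_accuracy = loo_scores.mean()

print(f"testing accuracy: {testing_accuracy:.3f}")
print(f"LOOCV accuracy:   {loocv_accuracy:.3f}")
```

Because LOOCV trains on all but one sample in every fold, it is a common choice when the number of samples is very small, as with 21 essential oils.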

Certain models exhibited a notable trend wherein performance seemed to improve across the last five datasets (Table 2). These datasets, characterized by cumulative percentage compositions of each chemical compound exceeding 10%, were subject to focused examination. In visualizing the outcomes of these experiments, the average accuracy across all twelve datasets is computed. Additionally, to provide nuanced insights, the results from the last five datasets are visualized separately. This approach helps in identifying datasets that demonstrate better performance across all models.

4.1. k-Nearest Neighbors Results

The ten kNN models specified in Table 4 were evaluated by plotting the varying parameters with testing and LOOCV accuracies. Figure 4 describes the performance of these kNN models.

From Figure 4a,b, it can be seen that the best-performing model is kNN_M4. With reference to Table 4, this result suggests that k = 3 and weights = ‘distance’ are the best model parameters. It is also worth noting that the testing accuracies are higher than the LOOCV accuracies. This may indicate a bias in our training set, despite stratification.
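
To illustrate why the weights setting matters when k = 3, the toy example below contrasts uniform and distance-based voting in scikit-learn's KNeighborsClassifier. The three one-dimensional training points and the query are invented for illustration and have no connection to the essential-oil data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data: one class-0 point very close to the query, two class-1 points far away.
X_train = np.array([[0.0], [1.0], [1.1]])
y_train = np.array([0, 1, 1])
query = np.array([[0.1]])

uniform_knn = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_train, y_train)
distance_knn = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_train, y_train)

# Uniform voting: class 1 wins two votes to one.
uniform_pred = int(uniform_knn.predict(query)[0])
# Distance weighting: the class-0 neighbor has weight 1/0.1 = 10,
# which outweighs the class-1 neighbors (1/0.9 + 1/1.0, about 2.1).
distance_pred = int(distance_knn.predict(query)[0])

print(uniform_pred, distance_pred)
```

With distance weighting, a single very close neighbor can override several distant ones, which can help when samples of the same class cluster tightly.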

4.2. Logistic Regression Results

The twelve LR models specified in Table 5 are evaluated by plotting their testing and LOOCV accuracies. Figure 5 shows the accuracies of all the models averaged across all the twelve datasets, and the accuracies across the last five datasets only.

Figure 5a,b report that model LR_M10 seems to perform the best. With reference to Table 5, C = 0.1, solver = ‘liblinear’, penalty = ‘L2’, and class_weight = ‘balanced’ are the best model parameters. It is also worth noting that there is a difference between the testing and LOOCV accuracies, and this may indicate a bias in our training data. It is also likely that the LR models are not performing well as the chemical composition data may not be linearly related.

4.3. Random Forest Results

Table 6 summarizes three RF models, RF_1, RF_2, and RF_3, by varying the n_estimators, max_depth, and ccp_alphas parameters. That is, for each model, one parameter is varied while the other two are fixed. The accuracy results against testing and LOOCV metrics of each RF model are plotted twice, once using all the twelve datasets and another using only the last five datasets. These results are plotted in Figure 6, Figure 7 and Figure 8, for each model, respectively.

Figure 6a,b report the performance of model RF_1. Both figures show that the graphs peak at n_estimators = 6 and 12, suggesting that these two parameter values result in the model having the highest accuracy, while the other two parameters are fixed.

Looking at Figure 7a,b, which plot the performance of model RF_2, we can see the disparate performance of the same model against the two evaluation metrics, testing and LOOCV. LOOCV results in much better prediction accuracy than testing, where the model performance peaks around max_depth = 4 for LOOCV.

Evaluating the performance of model RF_3 in Figure 8a,b, the accuracy is highest at ccp_alphas = 0.25 for the LOOCV metric.

Based on Figure 6, Figure 7 and Figure 8, we can conclude that the best model parameters for the RF model would be n_estimators = 12, max_depth = 4, and ccp_alphas = 0.25.

All the experiments conducted thus far used the datasets whose dimensionality was reduced (from 428 dimensions) as per the feature selection process presented in Section 3.2. To maintain a more objective evaluation approach, we also employ Principal Component Analysis (PCA) for dimensionality reduction [60,61]. PCA transforms a high-dimensional dataset by projecting data points onto a new subspace defined by newly created dimensions, i.e., principal components. The idea is to retain the components that contribute the most to explaining the data, as measured by the variance they capture. Accordingly, we perform PCA on our curated dataset (see Section 3.1) twice, retaining enough principal components to explain 90% and 95% of the variance, respectively. This leads to the creation of two datasets: the first contains 16 principal components, and the second contains 18.
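
Variance-based selection of principal components can be sketched with scikit-learn's PCA, which accepts a fraction between 0 and 1 as n_components. The matrix below is random data that only mimics the shape of the curated dataset (21 oils by 428 compounds), so the resulting component counts will differ from the 16 and 18 reported for the real data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Random stand-in with the shape of the curated dataset (assumption, not real data).
rng = np.random.default_rng(0)
X = rng.random((21, 428))

components = {}
explained = {}
for target in (0.90, 0.95):
    # A float n_components keeps the fewest components whose cumulative
    # explained variance exceeds the requested fraction.
    pca = PCA(n_components=target, svd_solver="full")
    X_reduced = pca.fit_transform(X)
    components[target] = pca.n_components_
    explained[target] = pca.explained_variance_ratio_.sum()
    print(f"target={target:.2f} components={components[target]} "
          f"explained={explained[target]:.3f} shape={X_reduced.shape}")
```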

In Figure 9, PCA was conducted such that 90% of the data variance is maintained. This resulted in a dataset that contains 16 dimensions. We observe from both Figure 9a,b that the prediction model exhibits overfitting when trained and tested using a 50%–50% split. As such, it would not be fair to accept or conclude the high accuracy of 90.91% reported in Figure 9b.

Similarly, Figure 10 reports the prediction results from using PCA where 95% of the data variance is explained by 18 principal components (dimensions). Figure 10a depicts fluctuation of accuracy across the different kNN models (see Table 4), but the model stabilizes under the configuration of “kNN_M9” and “kNN_M10”. On the other hand, Figure 10b shows a robust and stable performance across most of the Logistic Regression models (see Table 5) and far fewer overfitting issues. Consequently, under “LR_M11”, the highest reported accuracy can be considered.

5. Discussion

This section discusses the results obtained from our experiments and adds context to their applicability in clinical studies.

5.1. Experimental Results

In Section 4, we extensively experimented with three prominent classification models, kNN, LR, and RF, by varying their respective parameters and underlying datasets. Table 7 summarizes the parameters under which each model performs the best across all twelve datasets (Table 2).

Next, we compare the accuracy results from using each of the twelve datasets from Table 2 in order to determine which dataset performs the best overall. The winning dataset will be used in our last experiment later in this section. We apply the best-performing kNN, LR, and RF models to each of the twelve datasets, then we take the average accuracy across the three models.

Figure 11 plots the average performance of the three best models against each of the twelve datasets. The average testing and LOOCV accuracies resulting from each dataset are reported. Comparing all twelve datasets, more_than_50_compounds results in the highest accuracy. It is also worth noting that, in general, the average LOOCV accuracy across the different datasets is higher than the average testing accuracy. This observation has been consistent in all the Logistic Regression and Random Forest experiments depicted in Figure 5, Figure 6, Figure 7 and Figure 8.

To determine which of the three models performs best, we apply the best kNN, LR, and RF models from Table 7 on the best dataset determined in Figure 11, i.e., more_than_50_compounds. Figure 12 depicts this comparison. The Logistic Regression and Random Forest models having the parameters listed in Table 7 result in the highest accuracy of nearly 81% on the LOOCV metric, while kNN leads on the testing metric with an accuracy of nearly 82%. This conclusion corroborates our observations from the literature [62].

Due to the large number of features in our original dataset (see Section 3.1), one key aspect of this study is dimensionality reduction. Dimensionality reduction was conducted in two ways: following our proposed feature selection method described in Section 3.2, and using Principal Component Analysis (PCA) [60,61]. Both methods resulted in similar model performances, achieving around 81% accuracy. Figure 12 also depicts the best result achieved by preprocessing the dataset using PCA.

5.2. Application

This study harnesses machine learning to explore the potential therapeutic properties of essential oils in treating Alzheimer’s Disease. Our study underscores the potential of integrating computational models with traditional pharmacological approaches to accelerate the discovery of novel therapeutic agents.

The biological relevance of selected compounds provides insights into possible mechanisms of action against Alzheimer’s Disease progression. 1,8-Cineole, commonly found in eucalyptus oil, has been documented for its anti-inflammatory and antioxidant properties [63]. Research indicates that 1,8-Cineole can modulate the activity of key enzymes and inflammatory mediators in the brain, which are crucial in the pathology of Alzheimer’s Disease. 1,8-Cineole has also been reported to exhibit significant antioxidant and anti-Alzheimer’s Disease activities, potentially by reducing oxidative stress, which suggests a potential medicinal approach for Alzheimer’s Disease [64].

Camphene, another significant compound identified, exhibits strong antioxidant properties that may protect neuronal cells from oxidative stress, which is a known contributor to Alzheimer’s Disease. Found in high concentrations in ginger and rosemary oils, Camphene’s role in reducing oxidative damage underscores its relevance in slowing disease progression [65].

Additionally, α-Pinene has demonstrated potential in inhibiting acetylcholinesterase, an enzyme associated with the degradation of the neurotransmitter acetylcholine, which is notably diminished in Alzheimer’s Disease patients. By modulating acetylcholine levels, α-Pinene may improve cognitive function and communication between neurons, providing a plausible mechanism through which essential oils could benefit Alzheimer’s Disease patients [66].

Understanding the interactions of these bioactive compounds with the biological pathways affected by Alzheimer’s Disease allows us to hypothesize their mechanisms of action. This not only enhances our understanding of disease pathology but also directs future research toward interventions that could modulate these pathways more effectively. For instance, the anti-inflammatory properties of 1,8-Cineole could be leveraged to develop treatments that target inflammation-related pathways in Alzheimer’s Disease.

The insights gained from our experimental analysis suggest new directions for research into the therapeutic potential of essential oils and their constituents. Further in vivo and clinical studies are necessary to validate these findings and to assess the efficacy and safety of these compounds in human populations.

This study proposes a framework for integrating essential oils into existing treatment strategies. Compounds such as 1,8-Cineole, Camphene, and α-Pinene, highlighted for their neuroprotective and anti-inflammatory properties, could synergistically enhance the effectiveness of current pharmacological treatments like cholinesterase inhibitors and NMDA receptor antagonists. Moreover, the antioxidative properties of these compounds suggest their use in preventative strategies aimed at high-risk populations, potentially delaying or even preventing the onset of Alzheimer’s symptoms. This study invites a re-evaluation of current treatment protocols and encourages the development of holistic, multi-targeted treatment approaches. These insights could foster innovative clinical trials and might lead to the development of a new class of neuroprotective medications that address the complex pathophysiology of Alzheimer’s Disease more comprehensively.

6. Conclusions and Future Work

This study was conducted in the hope of bridging the gap between traditional knowledge of essential oils and modern machine learning techniques and offering a unique perspective on Alzheimer’s Disease therapeutics. Particularly, this study investigates the predictive capability of machine learning techniques on the impact of plants’ essential oils on Alzheimer’s Disease progression.

We curated data from 21 essential oils commercially available in the United Arab Emirates market. The 21 essential oil samples have a total of 428 chemical compounds, which, from a data analytics perspective, result in an extremely high-dimensional dataset that renders predictive models useless due to data sparsity. Extensive data preprocessing and feature selection steps were carried out in order to increase the efficacy of the applied machine learning models.

Several classification models were configured and evaluated against two accuracy metrics. Experimental results showcased promising accuracy in predicting essential oils’ activity against Alzheimer’s Disease progression, and suggest that Random Forest and Logistic Regression have the potential to be nearly 82% accurate.

We emphasize that further in vivo and clinical studies are necessary to validate our findings and to assess the efficacy and safety of these compounds in human populations. This is an important and natural progression of this work.

While our models exhibited robust performance, the exploration must continue. One potential future work is to validate the findings of this study clinically to measure the effectiveness and safety of essential oil compounds in human populations. Another future work is to employ and discuss more feature selection and model validation techniques and compare their impact on prediction capability. A third future direction would be employing data augmentation techniques and integrating neural networks to unravel complex relationships within chemical features. This holistic approach aims to enhance model generalization and uncover nuanced patterns, contributing to a more comprehensive understanding of essential oils’ efficacy in preventing Alzheimer’s Disease progression and exploring potential treatments.

Author Contributions

Conceptualization, R.M.A., N.S.A. and K.A.-H.; Methodology, R.M.A., K.A.-H. and J.J.K.; Software, J.J.K.; Validation, K.A.-H. and J.J.K.; Formal analysis, K.A.-H., and J.J.K.; Investigation, J.J.K.; Resources, R.M.A.; Data curation, R.M.A. and N.S.A.; Writing—original draft, R.M.A. and J.J.K.; Writing—review and editing, K.A.-H., R.M.A. and N.S.A.; Visualization, J.J.K. and R.M.A.; Supervision, R.M.A. and K.A.-H.; Project administration, R.M.A. and K.A.-H. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported in part by Research Fund 2023-24-1004 from Rochester Institute of Technology, Dubai.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data that support the findings of this article are publicly available on GitHub at https://github.com/researchrepo1/EssentialOilsDataset (accessed on 30 May 2024).

Acknowledgments

The authors would like to thank the reviewers for their constructive feedback that contributed to the enhancement of this study.

Conflicts of Interest

The authors have no competing interests to declare that are relevant to the content of this article.

Abbreviations

The following abbreviations are used in this manuscript:

FDA	Food and Drug Administration
CNS	Central Nervous System
SVM	Support Vector Machine
kNN	k-Nearest Neighbor
LR	Logistic Regression
RF	Random Forest

References

1. Fabricant, D.S.; Farnsworth, N.R. The Value of Plants Used in Traditional Medicine for Drug Discovery. Environ. Health Perspect. 2001, 109, 69–75.
2. Hosseinzadeh, S.; Jafarikukhdan, A.; Hosseini, A.; Armand, R. The Application of Medicinal Plants in Traditional and Modern Medicine: A Review of Thymus vulgaris. Int. J. Clin. Med. 2015, 6, 635–642.
3. Atanasov, A.G.; Waltenberger, B.; Pferschy-Wenzig, E.M.; Linder, T.; Wawrosch, C.; Uhrin, P.; Temml, V.; Wang, L.; Schwaiger, S.; Heiss, E.H.; et al. Discovery and resupply of pharmacologically active plant-derived natural products: A review. Biotechnol. Adv. 2015, 33, 1582–1614.
4. Adegboye, O.; Field, M.; Kupz, A.; Pai, S.; Sharma, D.; Smout, M.; Wangchuk, P.; Wong, Y.; Loiseau, C. Natural-Product-Based Solutions for Tropical Infectious Diseases. Clin. Microbiol. Rev. 2021, 34, e0034820.
5. Yousaf, M.; Razmovski-Naumovski, V.; Zubair, M.; Chang, D.; Zhou, X. Synergistic Effects of Natural Product Combinations in Protecting the Endothelium Against Cardiovascular Risk Factors. J. Evid.-Based Integr. Med. 2022, 27, 2515690X221113327.
6. Zhang, J.; He, Y.; Jiang, X.; Jiang, H.; Shen, J. Nature brings new avenues to the therapy of central nervous system diseases-An overview of possible treatments derived from natural products. Sci. China Life Sci. 2019, 62, 1332–1367.
7. Zhong, Z.; He, X.; Ge, J.; Zhu, J.; Yao, C.; Cai, H.; Ye, X.Y.; Xie, T.; Bai, R. Discovery of small-molecule compounds and natural products against Parkinson’s disease: Pathological mechanism and structural modification. Eur. J. Med. Chem. 2022, 237, 114378.
8. Ayaz, M.M.; Sadiq, A.; Junaid, M.; Ullah, F.; Subhan, F.; Ahmed, J. Neuroprotective and Anti-Aging Potentials of Essential Oils from Aromatic and Medicinal Plants. Front. Aging Neurosci. 2017, 9, 168.
9. Khan, A.; Amjad, M.S.; Saboon. GC-MS analysis and biological activities of Thymus vulgaris and Mentha arvensis essential oil. Turk. J. Biochem. 2019, 44, 388–396.
10. Rahmi, D.; Yunilawati, R.; Jati, B.; Setiawati, I.; Riyanto, A.; Batubara, I.; Astuti, R. Antiaging and Skin Irritation Potential of Four Main Indonesian Essential Oils. Cosmetics 2021, 8, 94.
11. Tu, P.T.B.; Tawata, S. Anti-Oxidant, Anti-Aging, and Anti-Melanogenic Properties of the Essential Oils from Two Varieties of Alpinia zerumbet. Molecules 2015, 20, 16723–16740.
12. Tomaino, A.; Cimino, F.; Zimbalatti, V.; Venuti, V.; Sulfaro, V.; De Pasquale, A.; Saija, A. Influence of Heating on antioxidant activity and the chemical composition of some spice essential oils. Food Chem. 2005, 89, 549–554.
13. Moore, M.; Díaz-Santos, M.; Vossel, K. Alzheimer’s Association 2021 Facts and Figures Report. Alzheimer’s Assoc. 2021, 17, 327–406.
14. Chen, X.; Drew, J.; Berney, W.; Lei, W. Neuroprotective Natural Products for Alzheimer’s Disease. Cells 2021, 10, 1309.
15. James, B.; Leurgans, S.; Hebert, L.; Scherr, P.; Yaffe, K.; Bennett, D. Contribution of Alzheimer disease to mortality in the United States. Neurology 2014, 82, 1045–1050.
16. Silva, M.; Loures, C.; Alves, L.; Cruz de Souza, L.; Borges, K.; Carvalho, M. Alzheimer’s disease: Risk factors and potentially protective measures. J. Biomed. Sci. 2019, 26, 33.
17. Scott, L.; Goa, K. Galantamine: A review of its use in Alzheimer’s disease. Drugs 2000, 60, 1095–1122.
18. Woo, C.C.; Miranda, B.; Sathishkumar, M.; Dehkordi-Vakil, F.; Yassa, M.A.; Leon, M. Overnight olfactory enrichment using an odorant diffuser improves memory and modifies the uncinate fasciculus in older adults. Front. Neurosci. 2023, 17, 1200448.
19. Wegener, B.A.; Croy, I.; Haehner, A.; Hummel, T. Olfactory training with older people: Olfactory training. Int. J. Geriatr. Psychiatry 2017, 33, 212–220.
20. Abdel-Hady, H.; Morsi, E.; El-wakil, E. In-vitro Antimicrobial Potentialities of Phylunthus emblica Leaf Extract against Some Human Pathogens. Egypt. J. Chem. 2021.
21. Soni, N.; Mehta, S.; Satpathy, G.; Gupta, R. Estimation of nutritional, phytochemical, antioxidant and antibacterial activity of dried fig (Ficus carica). Pharmacogn. Phytochem. 2014, 3, 158–165.
22. Ayoub, N.; Singab, A.N.; Mostafa, N.; Schultze, W. Volatile Constituents of Leaves of Ficus carica Linn. Grown in Egypt. J. Essent. Oil Bear. Plants 2013, 13, 316–321.
23. Nafis, A.; Ayoub, K.; Chaima, A.; Custódio, L.; Vitalini, S.; Iriti, M.; Hassani, L. A Comparative Study of the in Vitro Antimicrobial and Synergistic Effect of Essential Oils from Laurus nobilis L. and Prunus armeniaca L. from Morocco with Antimicrobial Drugs: New Approach for Health Promoting Products. Antibiotics 2020, 9, 140.
24. Thambi, M.; Shafi, M. Rhizome Essential Oil Composition of Costus Speciosus and its Antimicrobial Properties. Int. J. Pharm. Res. Allied Sci. 2015, 4, 28–32.
25. Flores, M.; Saravia, C.; Vergara, C.; Avila, F.; Valdés, H.; Ortiz-Viedma, J.; Miranda, J. Avocado Oil: Characteristics, Properties, and Applications. Molecules 2019, 24, 2172.
26. Woolf, A.; Wong, M.; Eyres, L.; Mcghie, T.; Lund, C.; Olsson, S.; Wang, Y.; Bulley, C.; Wang, M.; Friel, E.; et al. Avocado Oil. In Gourmet and Health-Promoting Specialty Oils; Elsevier: Amsterdam, The Netherlands, 2009; pp. 73–125.
27. Sagrero-Nieves, L.; Bartley, J.P. Volatile components of avocado leaves (Persea americana mill) from the Mexican race. J. Sci. Food Agric. 1995, 67, 49–51.
28. Stanojević, L.; Marjanović-Balaban, Z.R.; Kalaba, V.; Stanojević, J.; Cvetkovic, D. Chemical Composition, Antioxidant and Antimicrobial Activity of Chamomile Flowers Essential Oil (Matricaria chamomilla L.). J. Essent. Oil Bear. Plants 2016, 19, 2017–2028.
29. Yang, S.A.; Jeon, S.K.; Lee, E.J.; Shim, C.H.; Lee, I.S. Comparative study of the chemical composition and antioxidant activity of six essential oils and their components. Nat. Prod. Res. 2010, 24, 140–151.
30. Bucur, L. GC-MS analysis and bioactive properties of zingiberis rhizoma essential oil. FARMACIA 2020, 68, 280–287.
31. Perveen, K.; Bokhari, N.A.; Siddique, I.; Al-Rashid, S.A. Antifungal Activity of Essential Oil of Commiphora molmol Oleo Gum Resin. J. Essent. Oil Bear. Plants 2018, 21, 667–673.
32. Jiang, R.; Sun, L.; Wang, Y.; Liu, J.; Liu, X.; Feng, H.; Zhao, D. Chemical Composition, and Cytotoxic, Antioxidant and Antibacterial Activities of the Essential Oil from Ginseng Leaves. Nat. Prod. Commun. 2014, 9, 865–868.
33. Najda, A.; Gantner, M. Chemical composition of essential oils from the buds and leaves of cultivated hazelnut. Acta Sci. Polonorum. Hortorum Cultus Ogrod. 2012, 11, 91–100.
34. Zhou, Q.; Han, J.; Lyu, C.; Meng, X.; Tian, J.; Tan, H. Construction and Adulteration Detection Based on Fingerprint of Volatile Components in Hazelnut Oil. J. Food Nutr. Res. 2022, 10, 164–174.
35. Elaguel, A.; Kallel, I.; Gargouri, B.; Ben Amor, I.; Hadrich, B.; Ben Messaoud, E.; Gdoura, R.; Lassoued, S.; Gargouri, A. Lawsonia inermis essential oil: Extraction optimization by RSM, antioxidant activity, lipid peroxydation and antiproliferative effects. Lipids Health Dis. 2019, 18, 196.
36. Milani, M.; Dana, M.G.; Ghanbarzadeh, B.; Alizadeh, A.; Afshar, P.G. Comparison of the Chemical Compositions and Antibacterial Activities of Two Iranian Mustard Essential Oils and Use of these Oils in Turkey Meats as Preservatives. Appl. Food Biotechnol. 2019, 6, 225–236.
37. El-wakil, E.A.; El-Sayed, M.M.; Abdel-Lateef, E.E.S. GC-MS Investigation of Essential oil and antioxidant activity of Egyptian White Onion (Allium cepa L.). Int. J. Pharma Sci. Res. (IJPSR) 2015, 6, 537–543.
38. Jiang, Y.; Wu, N.; Fu, Y.J.; Wang, W.; Luo, M.; Zhao, C.J.; Zu, Y.G.; Liu, X.L. Chemical composition and antimicrobial activity of the essential oil of Rosemary. Environ. Toxicol. Pharmacol. 2011, 32, 63–68.
39. Shahrajabian, M.H. A Candidate for Health Promotion, Disease Prevention and Treatment, Common Rue (Ruta graveolens L.), an Important Medicinal plant in Traditional Medicine. Curr. Rev. Clin. Exp. Pharmacol. 2022, 17, 2–11.
40. Soleimani, M.; Aberoomand azar, P.; Saber-Tehrani, M.; Rustaiyan, A. Volatile Composition of Ruta graveolens L. of North of Iran. World Appl. Sci. J. 2009, 7, 124–126.
41. Zhang, X.; Teixeira da Silva, J.; Jia, Y.; Zhao, J.; Ma, G. Chemical Composition of Volatile Oils from the Pericarps of Indian Sandalwood (Santalum album) by Different Extraction Methods. Nat. Prod. Commun. 2012, 7, 93–96.
42. Akhbari, M.; Batooli, H.; Jookar, F. Composition of essential oil and biological activity of extracts of Viola odorata L. from central Iran. Nat. Prod. Res. 2011, 26, 802–809.
43. Fahmy, N.; Fayez, S.; Uba, A.; Shariati, M.A.; Aljohani, A.; El-Ashmawy, I.; Batiha, G.; Eldahshan, O.; Singab, A.N.; Zengin, G. Comparative GC-MS Analysis of Fresh and Dried Curcuma Essential Oils with Insights into Their Antioxidant and Enzyme Inhibitory Activities. Plants 2023, 12, 1785.
44. Boukhaloua, A.H.E.; Berrayah, M.; Bennabi, F.; Ayache, A.; Fatima Zohra, A. Antibacterial activity and identification by GC/MS of the chemical composition of essential oils of Juniperus phoenecea and Juniperus oxycedrus L. from Western Algeria: Tiaret province. Ukr. J. Ecol. 2022, 12, 31–39.
45. Scikit-Learn. Cross-Validation: Evaluating Estimator Performance. 2024. Available online: https://scikit-learn.org/stable/modules/cross_validation.html (accessed on 27 May 2024).
46. Oliveira, A.P.; Silva, L.R.; Ferreres, F.; Guedes de Pinho, P.; Valentão, P.; Silva, B.M.; Pereira, J.A.; Andrade, P.B. Chemical Assessment and in Vitro Antioxidant Capacity of Ficus carica Latex. J. Agric. Food Chem. 2010, 58, 3393–3398.
47. Bonesi, M.; Tenuta, M.C.; Loizzo, M.R.; Sicari, V.; Tundis, R. Potential Application of Prunus armeniaca L. and P. domestica L. Leaf Essential Oils as Antioxidant and of Cholinesterases Inhibitors. Antioxidants 2019, 8, 2.
48. Alahmady, N.F.; Alkhulaifi, F.M.; Abdullah Momenah, M.; Ali Alharbi, A.; Allohibi, A.; Alsubhi, N.H.; Ahmed Alhazmi, W. Biochemical characterization of chamomile essential oil: Antioxidant, antibacterial, anticancer and neuroprotective activity and potential treatment for Alzheimer’s disease. Saudi J. Biol. Sci. 2024, 31, 103912.
49. Ahmed, H.Y.; Kareem, S.M.; Atef, A.; Safwat, N.A.; Shehata, R.M.; Yosri, M.; Youssef, M.; Baakdah, M.M.; Sami, R.; Baty, R.S.; et al. Optimization of Supercritical Carbon Dioxide Extraction of Saussurea costus Oil and Its Antimicrobial, Antioxidant, and Anticancer Activities. Antioxidants 2022, 11, 1960.
50. Raquel, S.; Rodolfo, S.H.; Maria, G.G. Effects of Spices (Saffron, Rosemary, Cinnamon, Turmeric and Ginger) in Alzheimer’s Disease. Curr. Alzheimer Res. 2021, 18, 347–357.
51. Bai, E.W. Big data: The curse of dimensionality in modeling. In Proceedings of the 33rd Chinese Control Conference, Nanjing, China, 28–30 July 2014; pp. 6–13.
52. Gholamy, A.; Kreinovich, V.; Kosheleva, O. Why 70/30 or 80/20 Relation between Training and Testing Sets: A Pedagogical Explanation; UTEP-CS-18-09; The University of Texas at El Paso: El Paso, TX, USA, February 2018.
53. Scikit-Learn. Machine Learning in Python KNeighborsClassifier. 2024. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html (accessed on 27 May 2024).
54. Scikit-Learn. Machine Learning in Python Logistic Regression. 2024. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html (accessed on 1 June 2024).
55. Zhang, Z. Introduction to machine learning: k-nearest neighbors. Ann. Transl. Med. 2016, 4, 218.
56. Pal, A. Logistic regression: A simple primer. Cancer Res. Stat. Treat. 2021, 4.
57. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
58. Rokach, L.; Maimon, O. Decision Trees. In Data Mining and Knowledge Discovery Handbook; Maimon, O., Rokach, L., Eds.; Springer: Boston, MA, USA, 2005; pp. 165–192.
59. Ali, J.; Khan, R.; Ahmad, N.; Maqsood, I. Random Forests and Decision Trees. Int. J. Comput. Sci. Issues (IJCSI) 2012, 9, 272.
60. Jackson, J.E. A User’s Guide to Principal Components; John Wiley & Sons: Hoboken, NJ, USA, 2005.
61. Jolliffe, I.T. Principal Component Analysis for Special Types of Data; Springer: Berlin/Heidelberg, Germany, 2002.
62. Ragno, A.; Baldisserotto, A.; Antonini, L.; Sabatino, M.; Sapienza, F.; Baldini, E.; Buzzi, R.; Vertuani, S.; Manfredini, S. Machine Learning Data Augmentation as a Tool to Enhance Quantitative Composition–Activity Relationships of Complex Mixtures. A New Application to Dissect the Role of Main Chemical Components in Bioactive Essential Oils. Molecules 2021, 26, 6279.
63. Polito, F.; Fratianni, F.; Nazzaro, F.; Amri, I.; Kouki, H.; Khammassi, M.; Hamrouni, L.; Malaspina, P.; Cornara, L.; Khedhri, S.; et al. Essential Oil Composition, Antioxidant Activity and Leaf Micromorphology of Five Tunisian Eucalyptus Species. Antioxidants 2023, 12, 867.
64. Tan, X.; Xu, R.; Li, A.P.; Li, D.; Wang, Y.; Zhao, Q.; Long, L.P.; Fan, Y.Z.; Zhao, C.X.; Liu, Y.; et al. Antioxidant and anti-Alzheimer’s disease activities of 1,8-cineole and its cyclodextrin inclusion complex. Biomed. Pharmacother. 2024, 175, 116784.
65. EL Hachlafi, N.; Aanniz, T.; El Menyiy, N.; El Baaboua, A.; El Omari, N.; Balahbib, A.; Ali Shariati, M.; Zengin, G.; Fikri-Benbrahim, K.; Bouyahya, A.; et al. In Vitro and in Vivo Biological Investigations of Camphene and Its Mechanism Insights: A Review. Food Rev. Int. 2023, 39, 1799–1826.
66. Chen, S.X.; Xiang, J.Y.; Han, J.X.; Feng, Y.; Li, H.Z.; Chen, H.; Xu, M. Essential Oils from Spices Inhibit Cholinesterase Activity and Improve Behavioral Disorder in AlCl3 Induced Dementia. Chem. Biodivers. 2022, 19, e202100443.

Figure 1. Concentrations of 1,8-Cineole, Camphene, Camphor, Limonene, Linalool, and Spathulenol compounds in essential oils.

Figure 2. Concentrations of α-Pinene, α-Terpineol, α-Thujene, β-Elemene, β-Pinene, and γ-Cadinene compounds in essential oils.

Figure 3. Heatmap representing concentrations of compounds in essential oils.

Figure 4. Average testing and LOOCV accuracies for the k Nearest Neighbors models. (a) Across all the twelve datasets; (b) Across the last five datasets.

Figure 5. Average testing and LOOCV accuracies for the Logistic Regression models. (a) Across all the twelve datasets; (b) Across the last five datasets.

Figure 6. Average testing and LOOCV accuracies for the n_estimators parameter. (a) Across all the twelve datasets; (b) Across the last five datasets.

Figure 7. Average testing and LOOCV accuracies for the max_depth parameter. (a) Across all the twelve datasets; (b) Across the last five datasets.

Figure 8. Average testing and LOOCV accuracies for the ccp_alphas parameter. (a) Across all the twelve datasets; (b) Across the last five datasets.

Figure 9. Model accuracies after PCA with 16 principal components explaining 90% of the variance. (a) kNN; (b) Logistic Regression.

Figure 10. Model accuracies after PCA with 18 principal components explaining 95% of the variance. (a) kNN; (b) Logistic Regression.

Figure 11. Average performance of the three best algorithms across the different datasets.

Figure 12. Visualizing the performance of the three classification models on the more_than_50_compounds and PCA datasets.
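
The PCA settings named in the captions of Figures 9 and 10 (components retained until 90% or 95% of the variance is explained) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `pca_for_variance`, the standardization step, and the random stand-in matrix are assumptions, so the component counts it yields will not match the 16 and 18 components reported for the real data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_for_variance(X, variance):
    """Project X onto the fewest principal components explaining `variance`.

    Standardizing first is an assumption; the source does not state it here.
    """
    X_scaled = StandardScaler().fit_transform(X)
    # A float in (0, 1) asks scikit-learn to pick the number of components
    # whose cumulative explained variance reaches that fraction.
    pca = PCA(n_components=variance, svd_solver="full")
    X_reduced = pca.fit_transform(X_scaled)
    return X_reduced, pca


# Synthetic stand-in: 21 oils (rows) by 40 hypothetical compounds (columns).
rng = np.random.default_rng(0)
X = rng.random((21, 40))

X_90, pca_90 = pca_for_variance(X, 0.90)
X_95, pca_95 = pca_for_variance(X, 0.95)
print(pca_90.n_components_, pca_95.n_components_)
```

Because there are only 21 samples, at most 20 components can carry variance after centering, which is consistent with the 16 and 18 components quoted in the captions.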

Table 1. Data representation of the 21 essential oils, showing a subset of their chemical compounds (concentrations in percentages).

Essential Oil	Camphene	Limonene	Linalool	α-Pinene	β-Pinene	1,8-Cineole	…	Activity
Amla	0	0	0	0	0	0	…	LOW
Anjeer	0	0	0	0.6	0	0.2	…	HIGH
Apricot	0	2.54	6.38	1.37	0	0	…	HIGH
Avocado	0	0.1	0.01	0.67	0	0.3	…	LOW
Chamomile	0	0.3	0.3	1.9	0.1	0.1	…	HIGH
Costus	4.96	0.51	0.4	1.02	0.04	1.73	…	HIGH
Ginger	32.79	0	0.59	18.05	2.95	0	…	HIGH
Ginseng	0	0	0	0	0	0	…	LOW
Grapefruit	0	94.2	0	1	2.17	0	…	LOW
Gum Myrrh	0	0	0	0	0	0	…	LOW
Hazelnut	0	3.9	0	0	0	1	…	LOW
Henna	0.13	0	1.58	0.16	0.54	0	…	LOW
Juniper	0.9	12.1	0	29.1	17.6	0	…	LOW
Mint	0	0	2.22	0.69	1.14	0	…	HIGH
Mustard	0	0	0	0	0	0	…	LOW
Onion	0	0	0	0	0	0	…	LOW
Rosemary	11.38	1.32	0.25	20.14	6.95	26.54	…	HIGH
Sadab	0	0	0	0	0	0	…	LOW
Sandal	0	0	0	0	0	0	…	HIGH
Sweet violet	0	0	3.06	1.31	0.62	1.92	…	LOW
Turmeric	0	0	0	0	0	0	…	LOW

Table 2. Composition distribution of the twelve datasets.

Dataset Number	Composition Sum Threshold Criteria	Number of Chemical Compounds	Dataset Name
1	Sum greater than 0	428	all_compounds.csv
2	Sum greater than 0.1	416	more_than_0.1_compounds.csv
3	Sum greater than 0.2	389	more_than_0.2_compounds.csv
4	Sum greater than 0.5	301	more_than_0.5_compounds.csv
5	Sum greater than 1	226	more_than_01_compounds.csv
6	Sum greater than 2	139	more_than_02_compounds.csv
7	Sum greater than 5	77	more_than_05_compounds.csv
8	Sum greater than 10	46	more_than_10_compounds.csv
9	Sum greater than 20	25	more_than_20_compounds.csv
10	Sum greater than 30	16	more_than_30_compounds.csv
11	Sum greater than 40	9	more_than_40_compounds.csv
12	Sum greater than 50	8	more_than_50_compounds.csv
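
The threshold criteria in Table 2 can be read as a column filter: a compound is kept only if its concentration, summed over all oils, exceeds the threshold. The sketch below illustrates that reading with pandas. The interpretation of "sum" as a per-compound sum across oils, the helper name `filter_by_composition_sum`, and the four-oil excerpt are assumptions made for illustration; the values themselves come from Table 1.

```python
import pandas as pd


def filter_by_composition_sum(df, threshold, label_col="Activity"):
    """Keep compound columns whose summed percentage across oils exceeds threshold."""
    features = df.drop(columns=[label_col])
    keep = features.columns[features.sum(axis=0) > threshold]
    return df[list(keep) + [label_col]]


# Four oils and three compounds copied from Table 1 (all four are HIGH).
oils = pd.DataFrame(
    {
        "Camphene": [0.0, 4.96, 32.79, 11.38],
        "Limonene": [0.0, 0.51, 0.0, 1.32],
        "1,8-Cineole": [0.2, 1.73, 0.0, 26.54],
        "Activity": ["HIGH", "HIGH", "HIGH", "HIGH"],
    },
    index=["Anjeer", "Costus", "Ginger", "Rosemary"],
)

# Analogue of dataset 9 ("Sum greater than 20") on this small excerpt.
subset = filter_by_composition_sum(oils, threshold=20)
print(list(subset.columns))
```

On this excerpt Limonene sums to 1.83 and is dropped, while Camphene (49.13) and 1,8-Cineole (28.47) are kept.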

Table 3. Average accuracy (%) for each algorithm using different model evaluation techniques.

Algorithm	Testing (70:15:15)	Testing (50:25:25)	Testing (50:50)	Mean CV (70:15:15)	Mean CV (50:25:25)	Mean LOOCV
kNN	36.88	35.72	78.03	32.42	30.5	59.92
Logistic Regression	60.42	40.27	86.11	33.47	33.5	64.68
Random Forest	38.03	39.29	70.45	39.26	39.29	67.06
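
Two of the evaluation schemes in Table 3, a train/validation/test split and leave-one-out cross-validation (LOOCV), can be sketched with scikit-learn as below. The helper names `three_way_split` and `loocv_accuracy`, the random seed, the absence of stratification, and the synthetic data are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier


def three_way_split(X, y, val_frac, test_frac, seed=0):
    """Split into train/validation/test parts, e.g. 70:15:15 or 50:25:25."""
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=val_frac + test_frac, random_state=seed
    )
    # Divide the held-out part between validation and test.
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=test_frac / (val_frac + test_frac), random_state=seed
    )
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


def loocv_accuracy(model, X, y):
    """Mean accuracy when each sample is held out once as the test set."""
    scores = cross_val_score(model, X, y, cv=LeaveOneOut(), scoring="accuracy")
    return scores.mean()


# Synthetic stand-in with the same number of samples as the study (21 oils).
rng = np.random.default_rng(0)
X = rng.random((21, 8))
y = np.array([0, 1] * 10 + [0])

train, val, test = three_way_split(X, y, val_frac=0.15, test_frac=0.15)
acc = loocv_accuracy(KNeighborsClassifier(n_neighbors=3), X, y)
print(len(train[0]), len(val[0]), len(test[0]), round(acc, 3))
```

With only 21 samples, a 15% partition holds three or four oils, so a single misclassification shifts the test accuracy by a large amount, whereas LOOCV uses every oil once as a test case.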

Table 4. kNN models and their parameters.

Weights	k = 1	k = 3	k = 5	k = 7	k = 9
Uniform	kNN_M1	kNN_M3	kNN_M5	kNN_M7	kNN_M9
Distance	kNN_M2	kNN_M4	kNN_M6	kNN_M8	kNN_M10
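
The ten configurations in Table 4 can be enumerated programmatically. The sketch below builds them with scikit-learn's `KNeighborsClassifier`; the helper name `build_knn_grid` is an assumption, and every parameter not listed in the table (such as the distance metric) is left at the library default, which the source does not confirm.

```python
from sklearn.neighbors import KNeighborsClassifier


def build_knn_grid():
    """Return the Table 4 models keyed by name (kNN_M1 ... kNN_M10)."""
    models = {}
    index = 1
    for k in (1, 3, 5, 7, 9):
        # Odd-numbered models use uniform weights, even-numbered use distance.
        for weights in ("uniform", "distance"):
            models[f"kNN_M{index}"] = KNeighborsClassifier(
                n_neighbors=k, weights=weights
            )
            index += 1
    return models


knn_models = build_knn_grid()
for name, model in knn_models.items():
    print(name, model.n_neighbors, model.weights)
```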

Table 5. Logistic Regression models and their parameters.

Solver/C	C = 0.001	C = 0.01	C = 0.1
lbfgs	LR_M1	LR_M5	LR_M9
liblinear	LR_M2	LR_M6	LR_M10
newton-cg	LR_M3	LR_M7	LR_M11
newton-cholesky	LR_M4	LR_M8	LR_M12
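
The twelve solver and regularization combinations in Table 5 can be generated in the same way. This sketch assumes scikit-learn's `LogisticRegression` (the `newton-cholesky` solver needs version 1.2 or later); the helper name `build_lr_grid` is hypothetical, and parameters not shown in the table are left at their defaults.

```python
from sklearn.linear_model import LogisticRegression


def build_lr_grid():
    """Return the Table 5 models keyed by name (LR_M1 ... LR_M12)."""
    solvers = ("lbfgs", "liblinear", "newton-cg", "newton-cholesky")
    models = {}
    index = 1
    # C is the inverse regularization strength: smaller C, stronger penalty.
    for C in (0.001, 0.01, 0.1):
        for solver in solvers:
            models[f"LR_M{index}"] = LogisticRegression(C=C, solver=solver)
            index += 1
    return models


lr_models = build_lr_grid()
for name, model in lr_models.items():
    print(name, model.solver, model.C)
```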

Table 6. Random Forest models and their respective parameters.

Model	Varying Parameter	n_estimators	criterion	max_depth	ccp_alphas
RF_1	n_estimators	1–20	log_loss	6	0.1
RF_2	max_depth	4	log_loss	2–14	N/A
RF_3	ccp_alphas	4	log_loss	N/A	0.0–1.0
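
The three Random Forest sweeps in Table 6 each vary one parameter while holding the others fixed. A possible scikit-learn rendering is sketched below, with several assumptions: the step sizes (1 for n_estimators and max_depth, 0.1 for the pruning parameter) are guesses, "N/A" entries are mapped to the library defaults, and the table's ccp_alphas corresponds to the scikit-learn argument `ccp_alpha`. The `log_loss` criterion requires scikit-learn 1.1 or later.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def build_rf_sweeps(seed=0):
    """Return one list of models per sweep in Table 6 (RF_1, RF_2, RF_3)."""
    rf_1 = [  # vary the number of trees, 1 to 20
        RandomForestClassifier(n_estimators=n, criterion="log_loss",
                               max_depth=6, ccp_alpha=0.1, random_state=seed)
        for n in range(1, 21)
    ]
    rf_2 = [  # vary the maximum tree depth, 2 to 14 (ccp_alpha left at default)
        RandomForestClassifier(n_estimators=4, criterion="log_loss",
                               max_depth=d, random_state=seed)
        for d in range(2, 15)
    ]
    rf_3 = [  # vary cost-complexity pruning, 0.0 to 1.0 (max_depth left at default)
        RandomForestClassifier(n_estimators=4, criterion="log_loss",
                               ccp_alpha=float(a), random_state=seed)
        for a in np.linspace(0.0, 1.0, 11)
    ]
    return {"RF_1": rf_1, "RF_2": rf_2, "RF_3": rf_3}


rf_sweeps = build_rf_sweeps()
print({name: len(models) for name, models in rf_sweeps.items()})
```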

Table 7. The specifications of the best kNN, LR, and RF models.

Algorithm	Model Parameter	Best Performing Value
kNN	k	3
kNN	weights	distance
Logistic Regression	penalty	L2
Logistic Regression	class_weight	balanced
Logistic Regression	solver	liblinear
Logistic Regression	C	0.1
Random Forest	criterion	log_loss
Random Forest	n_estimators	12
Random Forest	max_depth	4
Random Forest	ccp_alphas	0.25

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Amawi, R.M.; Al-Hussaeni, K.; Keeriath, J.J.; Ashmawy, N.S. A Machine Learning Approach to Evaluating the Impact of Natural Oils on Alzheimer’s Disease Progression. Appl. Sci. 2024, 14, 6395. https://doi.org/10.3390/app14156395
