Article

Deep Learning-Based Truthful and Deceptive Hotel Reviews

Devbrat Gupta 1, Anuja Bhargava 1, Diwakar Agarwal 1, Mohammed H. Alsharif 2, Peerapong Uthansakul 3, Monthippa Uthansakul 3 and Ayman A. Aly 4

1 Department of Electronics & Communication, GLA University, Mathura 281406, India
2 Department of Electrical Engineering, College of Electronics and Information Engineering, Sejong University, Seoul 05006, Republic of Korea
3 School of Telecommunication Engineering, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand
4 Department of Mechanical Engineering, College of Engineering, Taif University, Taif 21944, Saudi Arabia
* Authors to whom correspondence should be addressed.
Sustainability 2024, 16(11), 4514; https://doi.org/10.3390/su16114514
Submission received: 20 March 2024 / Revised: 19 May 2024 / Accepted: 23 May 2024 / Published: 26 May 2024

Abstract

For sustainable hospitality and tourism, the validity of online evaluations is crucial at a time when they influence travelers’ choices, so distinguishing between truthful and deceptive hotel reviews is an urgent need. Misleading “opinion spam” is common in the hospitality sector, deceiving potential customers and harming the standing of hotel review websites. This data science project aims to create a reliable detection system that correctly recognizes and classifies hotel reviews as either truthful or deceptive. In natural language processing, sentiment analysis is essential for determining the emotional tone of a text. With a dataset of 800 truthful and 800 deceptive reviews, this study investigates the sentiment analysis performance of three deep learning models: a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Recurrent Neural Network (RNN). The CNN model yielded the highest accuracy rates on the training, testing, and validation sets, measuring 98%, 77%, and 80%, respectively. Despite showing balanced precision and recall, the LSTM model was less accurate than the CNN model, at roughly 60%. The RNN model trailed further, with accuracy rates of 57%, 57%, and 58%, reflecting difficulties in capturing sequential relationships. A thorough assessment of every model’s performance was conducted using ROC curves and classification reports.

1. Introduction

In the realm of sustainable hospitality and tourism, online reviews shape travelers’ choices, so a comprehensive system that can discern between genuine and fraudulent hotel reviews is required. This study tackles the urgent problem of false “opinion spam” in the hospitality industry, which undermines online hotel reviews. Building upon the thorough studies conducted in [1,2], this research intends to construct a dependable detection system that can reliably classify hotel evaluations as either genuine or deceptive. The successful application of such a system protects legitimate online hotel review sites and improves the overall traveler experience. The ultimate objective is to support the integrity of the hotel industry by giving clients reliable information so they can plan their trips with confidence.
The veracity of online reviews, especially those about hotels, is crucial in the digital age. Hotel reviews have developed into a significant factor in tourists’ decision-making. However, the prevalence of false opinions, or “opinion spam,” has grown into a worrying problem in the large sea of hotel evaluations. False hotel reviews can cause travelers financial losses and irritation and can damage the reputation of review sites. The “Deceptive Opinion Spam” dataset, which includes reviews for 20 Chicago hotels labeled as truthful or deceptive, is the basis of this project’s attempt to solve the challenge of differentiating between honest and dishonest hotel evaluations; each of these categories is further divided into positive and negative reviews. Comprehensive investigations of positive and negative deceptive opinion spam were carried out in reference papers [1,2], respectively; these publications are essential knowledge sources for our project since they provide useful information about misleading reviews and possible ways to spot them. The objectives of this research are as follows:
  • Development of a robust detection system: to develop a highly accurate and dependable system for identifying and classifying false hotel reviews, ensuring their separation from real ones. Using the foundational research publications [1,2] as a guide, we explore the linguistic and psycholinguistic features that are suggestive of dishonesty in hotel evaluations.
  • Performance evaluation: to assess how well different machine learning models and methods perform at spotting false information in hotel reviews.
  • Sentiment–deceit relationship analysis: to investigate the complex relationship between sentiment and deceit in hotel reviews, potentially illuminating how feelings affect deceptive behavior.
  • Theoretical contributions: to advance our knowledge of the language patterns and cognitive characteristics connected to deceptive reviews, making theoretical contributions to the field of computational linguistics.
  • Assessment of real-world applicability: to assess the viability and efficacy of integrating the developed detection system into real hotel review systems, thereby boosting their credibility.
This study assumes that the “Deceptive Opinion Spam” dataset, and particularly the reviews for 20 Chicago hotels, is representative of the broader landscape of hotel reviews, and that findings derived from it generalize to a wider context. The accuracy of the detection system relies on identifying relevant features indicative of deceptive or truthful reviews; this study acknowledges that variations in feature selection could affect the system’s performance. The dataset comprises 800 truthful and 800 deceptive reviews, 1600 in total, which is a small corpus for deep learning models and therefore prone to overfitting.
The main issue that this study attempts to solve is the pervasive problem of fraudulent online reviews. These reviews can deceive buyers, affect their purchase decisions, and damage the reputation of review sites. Our dataset is drawn from the hospitality industry, which is particularly susceptible to fraudulent reviews that can damage hotel brands and cause financial losses. Identifying and classifying deceptive reviews is a serious difficulty in this domain. Deep learning models like CNNs, LSTMs, and RNNs need large amounts of training data, yet false reviews are frequently sparser and more detailed than real ones. This scarcity of data can lead to overfitting, in which models perform well on training data but struggle to generalize to unseen data. Notwithstanding these difficulties, the results of our work have important real-world applications. Review sites can use the developed models to identify and flag potentially fraudulent reviews, making the user experience more dependable and trustworthy. This affects not just the hotel sector but also other industries where customer decision-making is heavily influenced by online reviews.
The remainder of this paper is structured as follows. Related works are presented in Section 2. Section 3 describes the methodology, covering data preprocessing, feature engineering and visualization, model selection, model training, and performance metrics. Detailed results and analysis of the three deep learning techniques are presented in Section 4. Finally, the research is concluded in Section 5.

2. Literature Review

There are a variety of methods for identifying and comprehending fraudulent activities related to fake internet reviews. Numerous creative methods have been applied in the realm of deceptive review detection. To identify misleading opinion spam, Ott et al. [2] created a machine learning framework, which was a precursor to automated analysis. Using deep learning techniques, Jain et al. [3] built on this to further advance the identification of false reviews. Moon et al. [4], who investigated false customer reviews using a survey-based text categorization approach, supplemented this line of research.
Parallel to this, Plotkina et al. [5] investigated the identification of fraudulent reviews, emphasizing their findings from both computational and human viewpoints. An efficient representation of fraudulent opinion identification was provided by Cagnina and Rosso [6], who concentrated on both intra- and cross-domain classification. Chang et al. [7] extended these domain-specific approaches by proposing a rumor-based model to identify fake reviews in the hospitality sector, specifically in hotel reviews. Filieri [8] contributed to the discussion by looking at variables that affect the validity of online user reviews.
Larger-scale data have made more thorough research possible. In an investigation into potentially false TripAdvisor hotel evaluations, Harris [9] revealed the breadth and depth of dishonest internet review techniques. A model for fake review identification was presented by Cao et al. [10], emphasizing multi-feature learning and independent training for classification. By examining phony review comments through the prism of rumor and lie theories, Lin et al. [11] advanced the theoretical understanding of the topic. Pascucci et al. [12] created a tool for detecting fraudulent reviews in the hotel industry using computational stylometry. Authentic and fake user-generated hotel evaluations were identified by Banerjee et al. [13,14], who used linguistic analysis to verify the veracity of online reviews. Rout et al. [15] addressed deceptive reviews using both labeled and unlabeled data, while Martinez-Torres and Toral [16] used machine learning techniques to identify false evaluations in the hospitality sector. The resilience of word and character n-gram combinations in distinguishing between false and accurate opinions was examined by Siagian and Aritsugi [17]. The exaggeration in phony versus real web reviews for upscale and inexpensive lodgings was investigated in [18]. Lastly, a thorough framework for fake review identification that merges coarse- and fine-grained features was presented in [19], providing a reliable detection technique. By establishing connections between these studies, we obtain a more comprehensive grasp of the dynamic field of fraudulent review identification, showcasing the range of approaches and structures created to tackle this important problem in the era of online reviews. In this work, three well-known deep learning techniques, Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), and Convolutional Neural Networks (CNNs), are applied using a robust strategy, drawing on insights from the abovementioned related works. This multi-pronged approach seeks to attain maximum precision in tackling the issues presented by fraudulent opinion spam and phony customer evaluations [18,19,20,21,22,23,24,25,26,27,28,29].

3. Materials and Methods

This section delves into the experiments and techniques used to address the research concerns about identifying deceptive opinion spam in hotel reviews. The main goal is to apply cutting-edge deep learning methods, namely Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and Recurrent Neural Networks (RNNs), based on the findings from an extensive literature review, as illustrated in the workflow block diagram in Figure 1.

3.1. Data Preprocessing, Feature Engineering, and Visualization

A comprehensive investigation of data preparation methods is conducted before the model is trained. This includes managing missing values, encoding labels, and cleaning the text data. Feature engineering is used to extract pertinent information from the reviews, and visualization techniques provide a thorough understanding of the dataset. First, the customer reviews in the DataFrame are preprocessed by a Python script. It uses a clean_text function to convert text to lowercase and eliminate superfluous whitespace, digits, and punctuation. In addition, both custom and common English stop words are removed. This preprocessing improves the textual data and qualifies it for further investigations such as sentiment analysis or machine learning model training.
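The preprocessing script itself is not reproduced in the paper. A minimal sketch of such a clean_text routine, assuming a pandas DataFrame with review and label columns, could look as follows; the toy data and the custom stop-word set below are placeholders, not the study’s actual lists:

```python
import re
import string

import pandas as pd
from nltk.corpus import stopwords  # requires a one-time nltk.download("stopwords")

# Toy DataFrame standing in for the real review corpus.
df = pd.DataFrame({"review": ["The room was GREAT!! Floor 25, amazing view.",
                              "Dirty room, rude staff... would not return."],
                   "label": ["truthful", "deceptive"]})

CUSTOM_STOPS = {"hotel", "chicago"}  # hypothetical custom stop words
STOPS = set(stopwords.words("english")) | CUSTOM_STOPS

def clean_text(text: str) -> str:
    """Lowercase, strip punctuation and digits, collapse whitespace, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return " ".join(w for w in text.split() if w not in STOPS)

df["review"] = df["review"].apply(clean_text)
```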
For a further exploratory data analysis, the distribution of classes (truthful vs. deceptive) is visualized using a count plot, as shown in Figure 2. It is observed that the number of truthful and deceptive reviews is 800 each. Machine learning models need to be trained using this balanced distribution to avoid biases towards any one class and to enable the model to acquire patterns from both classes equally.
The text data are transformed using TF-IDF vectorization, and the top 20 words by frequency are visualized in a barplot, as shown in Figure 3. Each bar represents a word, and its height indicates how frequently the word appears or how important it is. This visualization highlights the most prominent or characteristic words in the dataset. The barplot graphically displays the top 20 words according to their TF-IDF scores, and this procedure aids in identifying the significance of terms in the text data, a standard step in preparing textual data for machine learning tasks.
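As an illustration, the Figure 3 barplot could be produced along the following lines; ranking terms by mean TF-IDF score and the max_features cap are assumptions, since the paper does not specify them:

```python
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(max_features=5000)
tfidf = vectorizer.fit_transform(df["review"])   # documents x terms sparse matrix

mean_scores = tfidf.mean(axis=0).A1              # average TF-IDF score per term
terms = vectorizer.get_feature_names_out()
top20 = sorted(zip(terms, mean_scores), key=lambda t: t[1], reverse=True)[:20]

words, scores = zip(*top20)
plt.bar(words, scores)
plt.xticks(rotation=45, ha="right")
plt.ylabel("Mean TF-IDF score")
plt.title("Top 20 words by TF-IDF")
plt.tight_layout()
plt.show()
```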
Histograms illustrate the distribution of word lengths and character lengths in the reviews, segmented by truthfulness and deceptiveness, as shown in Figure 4 and Figure 5, respectively. In Figure 4, the X-axis shows ranges of word lengths and the Y-axis shows the count of reviews falling into each range. The data are grouped by label, and the height of each bar indicates how many reviews in a given category have word lengths in that range. The histogram offers a comprehensive perspective on the distribution of word lengths among the categories, facilitating a refined examination of linguistic trends. Figure 5 is analogous, with the X-axis representing ranges of character lengths: the height of each bar represents the proportion of reviews in the truthful or deceptive category that falls within that range. This histogram enables a thorough analysis of the distribution of character lengths among the categories, exposing possible differences in language use.
A word cloud was generated for truthful and deceptive reviews, providing a visual representation of the most frequent words, as shown in Figure 6. Each word’s size in the cloud corresponds to how frequently it occurs in the dataset, with larger words indicating higher frequency. Words may be displayed in different colors for aesthetic reasons; the frequency information is carried by size alone. This word cloud can be used to rapidly discover common terms and patterns in reviews classified as truthful and deceptive, with commonly used words standing out visually.
Words that are distinctive or carry substantial weight in a particular context are highlighted using the TF-IDF (Term Frequency-Inverse Document Frequency) approach. A TF-IDF bar plot illustrates how different words in a corpus are related to one another; higher TF-IDF scores indicate that a word is more distinctive within a context relative to the entire corpus. In this case, the TF-IDF bar plot shows which terms are most important in truthful and deceptive reviews, as shown in Figure 7. Because they are uncommon, or regularly appear in one category but not the other, these terms are given more weight. By suggesting terms that may be indicative of each class, this visualization helps readers understand the crucial distinctions between truthful and deceptive reviews.
A scatter plot represents data points graphically in two dimensions. Here, words are represented in a scatter plot through word embeddings, which translate words into numerical vectors. These high-dimensional vectors are reduced to two principal components using principal component analysis (PCA), yielding a two-dimensional representation, as shown in Figure 8. A scatter plot of word embeddings helps us comprehend the relationships between words based on their semantic similarity: words that lie closer together in the scatter plot are semantically related and may be used in similar contexts. This graphic can highlight clusters of words with related meanings, giving a better understanding of the linguistic structure of truthful and deceptive reviews.
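A minimal sketch of this projection is shown below, assuming a trained gensim Word2Vec model w2v (see Section 3.5) and an illustrative word list:

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Illustrative vocabulary; in practice the words come from the review corpus.
words = [w for w in ["clean", "comfortable", "staff", "location",
                     "dirty", "smell", "rude", "noise"] if w in w2v.wv]
vectors = [w2v.wv[w] for w in words]

coords = PCA(n_components=2).fit_transform(vectors)  # embeddings -> 2 principal components
plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), word in zip(coords, words):
    plt.annotate(word, (x, y))
plt.title("Word embeddings projected onto two principal components")
plt.show()
```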
The spaCy module is used to categorize words based on their part-of-speech tags, creating word count distributions over informative and imaginative language, as depicted in Figure 9; adjectives, verbs, nouns, and other parts of speech are emphasized. Sentiment polarity is calculated using TextBlob, and histograms depict the sentiment distribution of truthful and deceptive reviews, as illustrated in Figure 10, giving a general idea of the emotional content of the reviews in each category. Further, Table 1 showcases bigrams and trigrams for both truthful and deceptive reviews, which facilitates comprehension of the typical expressions and word combinations within each category. A horizontal bar chart compares the frequency of top words in truthful and deceptive reviews using count vectorization, as shown in Figure 11, which makes it easier to find terms that are unique to each category. These analyses collectively provide insights into the dataset’s characteristics and contribute to its preparation for subsequent modeling or further investigation.
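A compact sketch of these three analyses, using spaCy, TextBlob, and scikit-learn’s CountVectorizer, is given below; the review and label column names follow the earlier sketches and are assumptions about the actual code:

```python
from collections import Counter

import spacy
from textblob import TextBlob
from sklearn.feature_extraction.text import CountVectorizer

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

# Part-of-speech counts (adjectives, verbs, nouns, ...) for each review.
df["pos_counts"] = df["review"].apply(lambda t: Counter(tok.pos_ for tok in nlp(t)))

# Sentiment polarity in [-1, 1] via TextBlob (Figure 10).
df["polarity"] = df["review"].apply(lambda t: TextBlob(t).sentiment.polarity)

# Most frequent n-grams per class (Table 1, Figure 11).
def top_ngrams(texts, n, k=20):
    cv = CountVectorizer(ngram_range=(n, n))
    counts = cv.fit_transform(texts).sum(axis=0).A1
    return sorted(zip(cv.get_feature_names_out(), counts), key=lambda t: -t[1])[:k]

truthful_bigrams = top_ngrams(df.loc[df["label"] == "truthful", "review"], n=2)
deceptive_trigrams = top_ngrams(df.loc[df["label"] == "deceptive", "review"], n=3)
```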

3.2. Choice of Model

Three distinct deep learning models have been selected for the dataset: a Simple Recurrent Neural Network (RNN), a Convolutional Neural Network (CNN), and a Long Short-Term Memory (LSTM) network. With the ability to identify temporal correlations and spatial patterns in textual content, these models are highly suitable for sequential data and text classification applications. A detailed explanation of the three models, the CNN, LSTM, and RNN, follows.
CNNs are a kind of deep learning model mainly utilized for text and image processing tasks because of their capacity to identify hierarchical features and local patterns. Our CNN architecture employs an embedding layer to turn words into dense vectors of fixed size (16 in this case), followed by a convolutional layer and a global max-pooling layer. The embedding layer transforms the supplied words into dense vectors of a predetermined embedding dimension (16); this layer, which is either trained from scratch or initialized with pre-learned embeddings, aids in capturing word relationships. The convolutional layer applies filters to the word embeddings through a sliding-window method, identifying regional trends and characteristics; our CNN employs a Conv1D layer with a kernel size of 5 and 128 filters. The global max-pooling layer reduces dimensionality while preserving the most important features by taking the maximum value from each filter. The dense layer is a fully connected layer with a single output node and a sigmoid activation function for binary classification. A total of 3,780,689 parameters make up the CNN architecture; of these, 10,497 are trainable and the remainder are not, reflecting the use of frozen pre-trained embeddings. This approach successfully identifies patterns and features in the text data.
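A Keras sketch consistent with this description is given below; the layer sizes reproduce the reported trainable-parameter count, since the Conv1D layer contributes (5 × 16) × 128 + 128 = 10,368 weights and the dense layer 129, totaling 10,497. The names vocab_size, max_len, and embedding_matrix refer to the tokenization and Word2Vec steps sketched in Section 3.5 and are assumptions:

```python
from tensorflow.keras import layers, models

cnn = models.Sequential([
    layers.Embedding(vocab_size, 16, trainable=False),   # frozen 16-dim word embeddings
    layers.Conv1D(filters=128, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),                # truthful vs. deceptive output
])
cnn.build(input_shape=(None, max_len))
cnn.layers[0].set_weights([embedding_matrix])             # load pre-learned word vectors
```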
LSTMs are recurrent neural networks that can handle sequential data with long-range dependencies. Our LSTM design consists of two LSTM layers with dropout for regularization, followed by a dense layer for classification. Words are converted into dense vectors by the embedding layer, as explained for the CNN model. The dropout layer is added after the embedding layer and drops units at random during training to avoid overfitting. The first LSTM layer contains 50 units, with its return_sequences parameter set to True so that the following LSTM layer can process full sequences; the 50-unit second LSTM layer provides the final output. The dense layer is a fully connected layer with one output node and a sigmoid activation function. There are 3,803,843 parameters in the LSTM design overall, of which 33,651 are trainable. The LSTM’s sequential processing capacity, which aids in capturing dependencies within the text data, makes it appropriate for sentiment analysis and other text-based tasks.
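A corresponding Keras sketch is shown below; the two LSTM layers and the dense layer account for 13,400 + 20,200 + 51 = 33,651 trainable parameters, matching the reported figure. The dropout rate is not stated in the paper, so 0.2 is an assumption:

```python
lstm = models.Sequential([
    layers.Embedding(vocab_size, 16, trainable=False),  # frozen embeddings, as in the CNN sketch
    layers.Dropout(0.2),                  # rate not given in the paper; 0.2 is an assumption
    layers.LSTM(50, return_sequences=True),  # pass the full sequence to the next LSTM
    layers.LSTM(50),
    layers.Dense(1, activation="sigmoid"),
])
```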
Simple RNNs are made to handle sequential data but, because of the vanishing gradient problem, they may not handle long-range relationships as well as LSTMs. For binary classification, the RNN architecture combines a SimpleRNN layer with a dense layer. The embedding layer converts words into dense vectors as in the earlier models. The SimpleRNN layer has 100 units and processes sequential data recurrently, making it suitable for time series or text sequences. The dense layer, as in the other models, is fully connected, has a single output node, and uses a sigmoid activation function. There are 3,781,993 parameters in the RNN design overall, of which 11,801 are trainable. RNNs are effective for shorter sequences but less reliable than LSTMs for longer ones.
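A matching sketch follows; the SimpleRNN layer contributes (16 + 100) × 100 + 100 = 11,700 trainable weights and the dense layer 101, totaling the reported 11,801:

```python
rnn = models.Sequential([
    layers.Embedding(vocab_size, 16, trainable=False),  # frozen embeddings, as above
    layers.SimpleRNN(100),                # recurrent pass over the token sequence
    layers.Dense(1, activation="sigmoid"),
])
```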

3.3. Training the Model, Performance of the Model, and Metrics

The dataset was used to train three distinct deep learning models: a Simple Recurrent Neural Network (RNN), a Convolutional Neural Network (CNN), and a Long Short-Term Memory (LSTM) network. The sequential LSTM model was constructed with an embedding layer, dropout for regularization, and dense layers; it was compiled using the Adam optimizer with binary cross-entropy loss and trained on the training data for 100 epochs. The sequential CNN model was built with an embedding layer, a 1D convolutional layer, and a global max-pooling layer; it was compiled with the same Adam optimizer and binary cross-entropy loss and trained for 100 epochs. The sequential RNN model was constructed with an embedding layer and a SimpleRNN layer and, like the other models, was trained for 100 epochs using the Adam optimizer and binary cross-entropy loss.
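A sketch of this shared training setup is given below; X_train, y_train, X_val, and y_val stand for the padded index sequences and binary labels and are assumed names:

```python
for model in (cnn, lstm, rnn):
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    history = model.fit(X_train, y_train, epochs=100,
                        validation_data=(X_val, y_val))  # 100 epochs, as in the paper
```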
The models’ performance was assessed on the test set following training. Each model’s accuracy on the test data quantifies its ability to generalize to new cases. The following evaluation metrics were employed: accuracy, the percentage of correctly classified instances out of all instances in the test set; the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate against the false positive rate and offers valuable information about the model’s class discrimination capabilities; and the area under the ROC curve (AUC), a scalar summary of the model’s overall performance, with higher AUC values indicating better discriminating power.
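As a sketch, the ROC curve and AUC for one model could be computed with scikit-learn as follows (X_test and y_test are assumed names):

```python
from sklearn.metrics import roc_curve, auc

y_prob = cnn.predict(X_test).ravel()     # predicted probability of class 1
fpr, tpr, _ = roc_curve(y_test, y_prob)
print(f"CNN AUC = {auc(fpr, tpr):.2f}")  # Section 4 reports about 0.82 for the CNN
```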

3.4. Classification Report

A classification report provides a more thorough understanding of the models’ performance for each class (truth and deception) by presenting metrics such as precision, recall, and F1-score.
Taken as a whole, these metrics offer a thorough evaluation of how well the trained models categorize reviews as honest or dishonest. The classification report, which details the precision, recall, and F1-score for each class, together with the ROC curve and AUC values, which indicate discriminatory power, enables a more comprehensive assessment of the models’ performance.
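Such a report can be generated with scikit-learn, as in the following sketch; the 0.5 decision threshold and the class-name order are assumptions:

```python
from sklearn.metrics import classification_report

y_pred = (cnn.predict(X_test).ravel() >= 0.5).astype(int)  # 0.5 threshold (assumed)
print(classification_report(y_test, y_pred,
                            target_names=["deceptive", "truthful"]))  # class order assumed
```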

3.5. Overall Workflow, Improvements, and Applications

The dataset is tokenized, preprocessed, and converted into sequences to be used as input to the models. Word embeddings are employed, with Word2Vec used to construct an embedding matrix. A sequential LSTM model is constructed with a dropout layer for regularization. The models are trained and assessed, and their performance is displayed using accuracy plots and a Receiver Operating Characteristic (ROC) curve; the LSTM model’s accuracy on the test set is reported in Section 4. The sequential CNN model comprises three layers: an embedding layer, a 1D convolutional layer, and a global max-pooling layer. It undergoes training, evaluation, and performance visualization, and its test-set accuracy is likewise reported in Section 4.
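A sketch of this embedding-matrix construction, assuming gensim’s Word2Vec and the Keras Tokenizer (the window and min_count settings are assumptions), is shown below; it produces the vocab_size and embedding_matrix consumed by the model sketches in Section 3.2:

```python
import numpy as np
from gensim.models import Word2Vec
from tensorflow.keras.preprocessing.text import Tokenizer

# Fit a Keras tokenizer on the cleaned reviews (index 0 is reserved for padding).
tokenizer = Tokenizer()
tokenizer.fit_on_texts(df["review"])
vocab_size = len(tokenizer.word_index) + 1

# Train Word2Vec on the tokenized reviews; vector_size matches the 16-dim embedding layers.
sentences = [text.split() for text in df["review"]]
w2v = Word2Vec(sentences, vector_size=16, window=5, min_count=1)

# Copy each known word's vector into the matrix used to initialize the embedding layers.
embedding_matrix = np.zeros((vocab_size, 16))
for word, idx in tokenizer.word_index.items():
    if word in w2v.wv:
        embedding_matrix[idx] = w2v.wv[word]
```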
A SimpleRNN layer and an embedding layer are used to build the sequential RNN model, which likewise undergoes training, evaluation, and performance visualization; its test-set accuracy is reported in Section 4. ROC curves are plotted for every model to create a visual comparison of their performance, and classification reports provide each model’s precision, recall, and F1-score. Each model’s training history is displayed on a single plot, making it possible to compare training and validation accuracies over epochs.
The selected models can be used to categorize customer reviews as being honest or dishonest, which helps identify opinion spam. Additional enhancements could include experimenting with other topologies, using ensemble approaches, and fine-tuning hyperparameters to optimize model performance. This all-encompassing method, using LSTM, CNN, and RNN models, offers a deep analysis of the dataset and reveals the efficacy of each model for the particular goal of identifying honest and dishonest evaluations.

4. Results

Sentiment analysis is an essential part of natural language processing that seeks to ascertain a text’s emotional tone. In this study, using a dataset with 800 truthful and 800 deceptive reviews, we investigated the sentiment analysis performance of three distinct deep learning models: a Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network, and a Recurrent Neural Network (RNN). Their evaluation yielded different conclusions about each algorithm’s ability to extract sentiment from text. The best-performing model was the CNN, which obtained accuracies of 98%, 77%, and 80% on the training, testing, and validation sets, respectively, as illustrated in Figure 12. Its capacity to capture salient local patterns in the text was advantageous in comprehending the intricate background of the reviews.
With accuracies of 60%, 61%, and 60% on the training, testing, and validation sets, respectively, the LSTM model came second, demonstrating balanced precision and recall for both the truthful and deceptive classes. The RNN model trailed behind, attaining only 57%, 57%, and 58% accuracy on the training, testing, and validation sets, respectively, as indicated in Figure 12, suggesting it had difficulties in accurately capturing sequential relationships. A thorough understanding of each model’s performance can be obtained from Table 2, Table 3, Table 4, Table 5 and Table 6. The CNN model showed high precision, recall, and F1-score values for both classes on the training and testing sets, indicating a balanced performance. The precision, recall, and F1-score values of the LSTM model showed a similar pattern, averaging around 60% for both classes. For both classes, the RNN model’s precision, recall, and F1-score values were approximately 57%, indicating its limits.
A Receiver Operating Characteristic (ROC) curve is a graphical representation of a binary classification model’s performance across different threshold settings, and the area under the curve (AUC) is a scalar value that summarizes the model’s overall performance. The CNN model’s AUC of 0.82 is reasonably good, indicating strong discriminatory power and successful class distinction. The LSTM model’s AUC of 0.65 is moderate in comparison, suggesting some discriminating ability with room for improvement. With an AUC of 0.60, the RNN model shows weak discriminatory power and is closer to random guessing (0.5), indicating a limited capacity to discern between classes. In this comparison, the CNN model is the most effective, followed by the LSTM, while the RNN model shows the worst discriminatory power. Overall, AUC values closer to 1 indicate better performance, as represented in Figure 13.

5. Conclusions and Future Scope

In conclusion, this study explored sentiment analysis with deep learning models, namely Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Recurrent Neural Networks (RNNs), and provides subtle insights into their efficacy. When extracting sentiment from textual data, the CNN model surpasses the LSTM and RNN models in accuracy, precision, recall, and F1-score. According to this study, different deep learning models perform at different levels in sentiment analysis. The CNN model outperforms both the RNN and LSTM with accuracy rates of 98%, 77%, and 80% on the training, testing, and validation sets, respectively, demonstrating its greater accuracy and discriminatory power. At roughly 60%, the LSTM shows moderate accuracy but competitive precision and recall for both the truthful and deceptive classes. The RNN model performs worst, particularly in capturing sequential relationships, as evidenced by its lower accuracy rates of 57%, 57%, and 58% on the training, testing, and validation sets.
The models’ performance is further contextualized by the Receiver Operating Characteristic (ROC) curve and area under the curve (AUC) analyses. The CNN model’s strong performance is supported by an AUC of 0.82, which is reasonably good and indicates clear class distinction and strong discriminatory power. With an AUC of 0.65, the LSTM model is moderate, showing some discriminating power but leaving room for improvement. With an AUC of 0.60, the RNN model has poor discriminatory power; it is less able to distinguish between classes and is closer to random guessing (0.5). The comparative analysis highlights the effectiveness of the CNN model, followed by the LSTM model, while the RNN model shows the worst discriminatory power. To reduce overfitting and enhance model generalization, future research should concentrate on overcoming the data limitations. Studying deep learning algorithms beyond RNNs, LSTMs, and CNNs might reveal more about sentiment analysis. Improvements in model performance could come from tuning model parameters, adopting larger datasets, and experimenting with pre-trained embeddings. Research could also explore hybrid models or ensemble approaches to exploit the strengths of several algorithms. The effectiveness of sentiment analysis algorithms must be improved through ongoing assessment and adaptation to evolving datasets and language subtleties.

Author Contributions

Conceptualization, D.G. and A.B.; methodology, D.G. and D.A.; software, M.H.A., A.B. and D.G.; validation, P.U. and M.H.A.; investigation, D.A. and M.U.; resources, D.G. and A.A.A.; data curation, D.G. and A.B.; writing—original draft preparation, D.G. and A.B.; writing—review and editing, A.A.A., P.U. and M.U.; funding acquisition, P.U. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by (i) Suranaree University of Technology (SUT), (ii) Thailand Science Research and Innovation (TSRI) and (iii) National Science, Research and Innovation Fund (NSRF). Also, this research was funded by Taif University, Saudi Arabia, Project No. (TU-DSPP-2024-34).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data can be made available on request.

Acknowledgments

The authors extend their appreciation to Taif University, Saudi Arabia, for supporting this work through project number (TU-DSPP-2024-34).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ott, M.; Choi, Y.; Cardie, C.; Hancock, J.T. Finding Deceptive Opinion Spam by Any Stretch of the Imagination. arXiv 2011, arXiv:1107.4557.
  2. Ott, M.; Cardie, C.; Hancock, J.T. Negative Deceptive Opinion Spam. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, USA, 9–14 June 2013; Association for Computational Linguistics: Atlanta, GA, USA, 2013; pp. 497–501. Available online: https://aclanthology.org/N13-1053 (accessed on 22 May 2024).
  3. Jain, N.; Kumar, A.; Singh, S.; Singh, C.; Tripathi, S. Deceptive Reviews Detection Using Deep Learning Techniques. In Natural Language Processing and Information Systems; Métais, E., Meziane, F., Vadera, S., Sugumaran, V., Saraee, M., Eds.; Springer International Publishing: Cham, Switzerland, 2019.
  4. Moon, S.; Kim, M.-Y.; Iacobucci, D. Content analysis of fake consumer reviews by survey-based text categorization. Int. J. Res. Mark. 2021, 38, 343–364.
  5. Plotkina, D.; Munzel, A.; Pallud, J. Illusions of truth—Experimental insights into human and algorithmic detections of fake online reviews. J. Bus. Res. 2020, 109, 511–523.
  6. Cagnina, L.C.; Rosso, P. Detecting Deceptive Opinions: Intra and Cross-Domain Classification Using an Efficient Representation. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 2017, 25 (Suppl. S2), 151–174.
  7. Chang, L.-W.; Hamilton, J.F. Dynamics of Robotic Manipulators with Flexible Links. J. Dyn. Syst. Meas. Control. 1991, 113, 54–59.
  8. Filieri, R. What makes an online consumer review trustworthy? Ann. Tour. Res. 2016, 58, 46–64.
  9. Harris, C.G. Decomposing TripAdvisor: Detecting Potentially Fraudulent Hotel Reviews in the Era of Big Data. In Proceedings of the 2018 IEEE International Conference on Big Knowledge (ICBK), Singapore, 17–18 November 2018; pp. 243–251.
  10. Cao, N.; Ji, S.; Chiu, D.K.W.; Gong, M. A deceptive reviews detection model: Separated training of multi-feature learning and classification. Expert Syst. Appl. 2022, 187, 115977.
  11. Lin, C.H.; Hsu, P.Y.; Cheng, M.S.; Lei, H.T.; Hsu, M.C. Identifying Deceptive Review Comments with Rumor and Lie Theories. In Advances in Swarm Intelligence; Tan, Y., Takagi, H., Shi, Y., Niu, B., Eds.; Springer International Publishing: Cham, Switzerland, 2017.
  12. Pascucci, A.; Manna, R.; Caterino, C.; Masucci, V.; Monti, J. Is this hotel review truthful or deceptive? A platform for disinformation detection through computational stylometry. In Proceedings for the First International Workshop on Social Threats in Online Conversations: Understanding and Management; European Language Resources Association: Marseille, France, 2020; pp. 35–40. Available online: https://aclanthology.org/2020.stoc-1.6 (accessed on 22 May 2024).
  13. Banerjee, S.; Chua, A.Y.K.; Kim, J.-J. Distinguishing between authentic and fictitious user-generated hotel reviews. In Proceedings of the 2015 6th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Dallas-Fortworth, TX, USA, 13–15 July 2015; pp. 1–7.
  14. Banerjee, S.; Chua, A.Y.K.; Kim, J.-J. Don’t be deceived: Using linguistic analysis to learn how to discern online review authenticity. J. Assoc. Inf. Sci. Technol. 2017, 68, 1525–1538.
  15. Rout, J.K.; Singh, S.; Jena, S.K.; Bakshi, S. Deceptive review detection using labeled and unlabeled data. Multimed. Tools Appl. 2017, 76, 3187–3211.
  16. Martinez-Torres, M.R.; Toral, S.L. A machine learning approach for the identification of the deceptive reviews in the hospitality sector using unique attributes and sentiment orientation. Tour. Manag. 2019, 75, 393–403.
  17. Siagian, A.H.A.M.; Aritsugi, M. Robustness of Word and Character N-Gram Combinations in Detecting Deceptive and Truthful Opinions. J. Data Inf. Qual. 2020, 12, 5.
  18. Banerjee, S. Exaggeration in fake vs. authentic online reviews for luxury and budget hotels. Int. J. Inf. Manag. 2022, 62, 102416.
  19. Cao, N.; Ji, S.; Chiu, D.K.W.; He, M.; Sun, X. A deceptive review detection framework: Combination of coarse and fine-grained features. Expert Syst. Appl. 2020, 156, 113465.
  20. Banerjee, S.; Chua, A.Y.K. Applauses in hotel reviews: Genuine or deceptive? In Proceedings of the 2014 Science and Information Conference, London, UK, 27–29 August 2014; pp. 938–942.
  21. Deshai, N.; Bhaskara Rao, B. Unmasking deception: A CNN and adaptive PSO approach to detecting fake online reviews. Soft Comput. 2023, 27, 11357–11378.
  22. Mohawesh, R.; Xu, S.; Tran, S.N.; Ollington, R.; Springer, M.; Jararweh, Y.; Maqsood, S. Fake Reviews Detection: A Survey. IEEE Access 2021, 9, 65771–65802.
  23. Moon, S.; Kim, M.-Y.; Bergey, P.K. Estimating deception in consumer reviews based on extreme terms: Comparison analysis of open vs. closed hotel reservation platforms. J. Bus. Res. 2019, 102, 83–96.
  24. Ren, Y.; Ji, D. Neural networks for deceptive opinion spam detection: An empirical study. Inf. Sci. 2017, 385–386, 213–224.
  25. Salunkhe, A. Attention-based Bidirectional LSTM for Deceptive Opinion Spam Classification. arXiv 2021, arXiv:2112.14789.
  26. Shinde, S.A.; Pawar, R.R.; Jagtap, A.A.; Tambewagh, P.A.; Rajput, P.U.; Mali, M.K.; Kale, S.D.; Mulik, S.V. Deceptive opinion spam detection using bidirectional long short-term memory with capsule neural network. Multimed. Tools Appl. 2024, 83, 45111–45140.
  27. Siagian, A.H.A.M.; Aritsugi, M. Exploiting Function Words Feature in Classifying Deceptive and Truthful Reviews. In Proceedings of the 2018 Thirteenth International Conference on Digital Information Management (ICDIM), Berlin, Germany, 24–26 September 2018; pp. 51–56.
  28. Wang, E.Y.; Fong, L.H.N.; Law, R. Detecting fake hospitality reviews through the interplay of emotional cues, cognitive cues and review valence. Int. J. Contemp. Hosp. Manag. 2022, 34, 184–200.
  29. Yoo, K.-H.; Gretzel, U. Comparison of Deceptive and Truthful Travel Reviews. In Information and Communication Technologies in Tourism 2009; Höpken, W., Gretzel, U., Law, R., Eds.; Springer: Vienna, Austria, 2009.
Figure 1. Block diagram of workflow of identification of opinion spam in hotel reviews.
Figure 2. Bar plot of the distribution of classes (truthful vs. deceptive).
Figure 3. Bar plot of the top 20 words by frequency.
Figure 4. Histogram distribution of word lengths in the reviews, split by truthfulness and deceptiveness.
Figure 5. Histogram distribution of character lengths in the reviews, split by truthfulness and deceptiveness.
Figure 6. A visual representation of the most frequent words as a word cloud map for truthful and deceptive reviews.
Figure 7. Bar plot of TF-IDF scores for truthful and deceptive reviews.
Figure 8. Scatter plot of a two-dimensional representation, in terms of data points, of the word embeddings for truthful and deceptive reviews.
Figure 9. Distribution of informative and imaginative words with truthful and deceptive labels.
Figure 10. Sentiment distribution polarity.
Figure 11. Top words by frequency distribution.
Figure 12. Plot of training and validation accuracy history comparison.
Figure 13. ROC curve.
Table 1. Most common bigrams and trigrams.

| Truthful Bigram | Count | Deceptive Bigram | Count | Truthful Trigram | Count | Deceptive Trigram | Count |
|---|---|---|---|---|---|---|---|
| “in the” | 659 | “at the” | 650 | “the room was” | 160 | “the front desk” | 161 |
| “The Hotel” | 590 | “in the” | 619 | “the front desk” | 131 | “stay at the” | 132 |
| “of the” | 529 | “of the” | 544 | “The hotel is” | 105 | “stayed at the” | 126 |
| “at the” | 479 | “I was” | 532 | “stayed at the” | 98 | “the room was” | 118 |
| “the room” | 444 | “The Hotel” | 530 | “in the room” | 89 | “one of the” | 101 |
| “and the” | 436 | “and the” | 481 | “stay at the” | 69 | “The staff was” | 95 |
| “to the” | 393 | “the room” | 448 | “of the hotel” | 67 | “My husband and” | 87 |
| “on the” | 334 | “And I” | 402 | “The staff was” | 67 | “husband and I” | 80 |
| “it was” | 311 | “it was” | 401 | “stay here again” | 63 | “at the hotel” | 79 |
| “This hotel” | 305 | “This hotel” | 382 | “at this hotel” | 62 | “hotel in Chicago” | 77 |
| “room was” | 295 | “to the” | 348 | “the th floor” | 61 | “I stayed at” | 73 |
| “for a” | 290 | “I had” | 299 | “We had a” | 59 | “I had to” | 71 |
| “we were” | 274 | “In Chicago” | 263 | “our room was” | 55 | “in the room” | 70 |
| “I was” | 267 | “When I” | 253 | “The hotel was” | 54 | “The hotel was” | 70 |
| “for the” | 253 | “for a” | 250 | “at the hotel” | 54 | “at this hotel” | 65 |
| “from the” | 233 | “for the” | 244 | “room was very” | 51 | “recommend this hotel” | 61 |
| “was very” | 230 | “to be” | 243 | “on the th” | 50 | “My wife and” | 61 |
| “was a” | 207 | “the staff” | 240 | “one of the” | 50 | “The hotel is” | 58 |
| “And I” | 207 | “we were” | 238 | “I stayed at” | 49 | “the rooms are” | 58 |
| “the staff” | 204 | “I would” | 238 | “the rooms are” | 49 | “The rooms were” | 55 |
Table 2. Accuracy for training, testing, and validation.

| Model | Training (%) | Testing (%) | Validation (%) |
|---|---|---|---|
| CNN | 98 | 77 | 80 |
| LSTM | 60 | 61 | 60 |
| RNN | 57 | 57 | 58 |
Table 3. Precision (%) for training, testing, and validation, per class (0 and 1).

| Model | Training (0) | Training (1) | Testing (0) | Testing (1) | Validation (0) | Validation (1) |
|---|---|---|---|---|---|---|
| CNN | 97 | 96 | 77 | 77 | 45 | 49 |
| LSTM | 61 | 64 | 58 | 56 | 51 | 56 |
| RNN | 65 | 62 | 55 | 58 | 50 | 53 |
Table 4. Recall (%) for training, testing, and validation, per class (0 and 1).

| Model | Training (0) | Training (1) | Testing (0) | Testing (1) | Validation (0) | Validation (1) |
|---|---|---|---|---|---|---|
| CNN | 96 | 97 | 74 | 79 | 45 | 49 |
| LSTM | 71 | 53 | 72 | 51 | 60 | 46 |
| RNN | 60 | 66 | 53 | 61 | 48 | 55 |
Table 5. F1-score (%) for training, testing, and validation, per class (0 and 1).

| Model | Training (0) | Training (1) | Testing (0) | Testing (1) | Validation (0) | Validation (1) |
|---|---|---|---|---|---|---|
| CNN | 96 | 96 | 75 | 78 | 45 | 49 |
| LSTM | 66 | 58 | 64 | 57 | 55 | 51 |
| RNN | 62 | 64 | 54 | 59 | 49 | 54 |
Table 6. Support (number of actual occurrences of each class) for training, testing, and validation.

| Model | Training (0) | Training (1) | Testing (0) | Testing (1) | Validation (0) | Validation (1) |
|---|---|---|---|---|---|---|
| CNN | 488 | 472 | 154 | 166 | 154 | 166 |
| LSTM | 488 | 472 | 154 | 166 | 154 | 166 |
| RNN | 488 | 472 | 154 | 166 | 154 | 166 |
