A Sustainable Price Prediction Model for Airbnb Listings Using Machine Learning and Sentiment Analysis
Abstract
1. Introduction
Machine Learning’s Impact on Sustainable Education
2. Literature Review
2.1. Using Machine Learning in Higher Education as a Sustainable
2.2. Factors That Impact Room Prices
2.3. Price Prediction
3. Research Methodology
3.1. Data Collection
3.2. Data Pre-Processing
- We removed features with frequent and irreparable missing values.
- We converted certain features to floating-point values by removing currency symbols (e.g., the dollar sign in prices).
- We eliminated irrelevant or uninformative features, such as ‘picture_url’, ‘listing_url’, ‘host_id’, ‘host_name’ and ‘description’.
- We also eliminated constant-valued fields and duplicate features.
- We transformed Boolean values into binary (zero or one) representations and converted ordinal values to numeric values. The feature values were converted to integers so that the different regression models could be trained on the dataset.
- One column of particular interest was ‘amenities’, a text variable with over 2000 unique values. We examined, cleaned and aggregated this column and then split it into individual binary variables, each indicating whether a particular amenity was included in the listing (1 if present, 0 otherwise). The top 25 amenities were identified by drawing on research studies that investigated the impact of amenities; notably, the study by Garcia-López et al. [45] provided insights that guided the selection of the most influential amenities.
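
The preprocessing steps above can be sketched in Python as follows. This is a minimal, standard-library-only illustration, not the author's code. It assumes Inside Airbnb-style raw values: prices stored as strings such as "$1,250.00", Boolean flags stored as "t"/"f", and amenities stored as a JSON array string. The column names `instant_bookable` and `host_response_time`, the ordinal mapping and `SELECTED_AMENITIES` are hypothetical placeholders; the study's actual list of 25 amenities is not reproduced here.

```python
import json

# Hypothetical stand-in for the study's literature-guided list of 25 amenities.
SELECTED_AMENITIES = ["Wifi", "Kitchen", "Air conditioning", "Free parking on premises"]

# Hypothetical ordinal mapping for a host response-time field (higher = faster).
RESPONSE_TIME_ORDER = {
    "a few days or more": 0,
    "within a day": 1,
    "within a few hours": 2,
    "within an hour": 3,
}

# Columns dropped as irrelevant or uninformative (the subset named in the text).
DROP_COLUMNS = ("picture_url", "listing_url", "host_id", "host_name", "description")


def parse_price(raw):
    """Remove the currency symbol and thousands separators: "$1,250.00" -> 1250.0."""
    return float(raw.replace("$", "").replace(",", "").strip())


def parse_bool(raw):
    """Map an assumed "t"/"f" flag to 1/0."""
    return 1 if raw.strip().lower() in ("t", "true") else 0


def encode_amenities(raw, selected=SELECTED_AMENITIES):
    """Split an assumed JSON-array amenities string into one 0/1 column per amenity."""
    present = {item.strip().lower() for item in json.loads(raw)}
    return {
        "amenity_" + name.lower().replace(" ", "_"): int(name.lower() in present)
        for name in selected
    }


def preprocess_listing(row):
    """Apply the cleaning steps to one listing given as a dict of raw strings."""
    out = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
    out["price"] = parse_price(out["price"])
    out["instant_bookable"] = parse_bool(out["instant_bookable"])
    # Unrecognised or missing categories get -1, a placeholder code.
    out["host_response_time"] = RESPONSE_TIME_ORDER.get(out["host_response_time"], -1)
    out.update(encode_amenities(out.pop("amenities")))
    return out


def drop_constant_columns(rows):
    """Remove features that take the same value in every listing."""
    constant = {k for k in rows[0] if len({r[k] for r in rows}) == 1}
    return [{k: v for k, v in r.items() if k not in constant} for r in rows]
```

For a raw row with `"price": "$1,250.00"`, `"instant_bookable": "t"` and `"amenities": '["Wifi", "Kitchen"]'`, `preprocess_listing` drops `listing_url`, returns a price of 1250.0 and sets `amenity_wifi` and `amenity_kitchen` to 1 while the other selected amenities are 0. Selecting amenities from a fixed list, rather than by frequency, mirrors the literature-guided selection described above.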
3.3. Sentiment Analysis on the Reviews
- Polarity. This measures the sentiment expressed in a sentence on a scale from −1 to 1, where −1 signifies a negative sentiment, 1 a positive sentiment and values near 0 a neutral one.
- Subjectivity. This captures the degree to which the sentence reflects personal states, such as opinions, emotions and beliefs. It is measured on a scale from 0 to 1, where values closer to 0 indicate an objective sentence based on factual information, while values closer to 1 indicate a subjective statement influenced by personal views [48].
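
A minimal sketch of how review-level polarity and subjectivity could be turned into listing-level features is shown below. It assumes TextBlob [46] as the scorer (its `TextBlob(text).sentiment` attribute exposes `polarity` and `subjectivity`) and assumes, for illustration only, that each listing's scores are averaged over its reviews; the exact aggregation used in the study is not stated in this section. The scorer is passed in as a function, so the aggregation runs without TextBlob installed.

```python
from collections import defaultdict
from statistics import mean


def textblob_score(text):
    """Score one review with TextBlob (requires the textblob package)."""
    from textblob import TextBlob  # imported lazily so the aggregation runs without it

    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def listing_sentiment(reviews, score=textblob_score):
    """Average polarity (-1..1) and subjectivity (0..1) per listing.

    `reviews` is an iterable of (listing_id, comment) pairs. Averaging is an
    assumed aggregation, not necessarily the one used in the study.
    """
    scores = defaultdict(list)
    for listing_id, comment in reviews:
        if not comment or not comment.strip():
            continue  # skip empty reviews
        polarity, subjectivity = score(comment)
        if not (-1.0 <= polarity <= 1.0 and 0.0 <= subjectivity <= 1.0):
            raise ValueError(f"score out of range for listing {listing_id}")
        scores[listing_id].append((polarity, subjectivity))
    return {
        lid: {
            "polarity": mean(p for p, _ in pairs),
            "subjectivity": mean(s for _, s in pairs),
        }
        for lid, pairs in scores.items()
    }
```

With TextBlob installed, `listing_sentiment(pairs)` uses the default scorer; passing another `score` function (for example a wrapper around VADER, as in [47]) changes only the scoring step.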
3.4. Feature Construction
3.5. Regression Models
3.6. Model Performance Evaluation
4. Results and Discussion
4.1. Feature Correlation Analysis
4.2. Regression Model Analysis
4.3. Feature Importance Analysis
5. Conclusions, Implications and Future Works
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Airbnb. About inside Airbnb. Available online: https://www.airbnb.com/about/about-us (accessed on 22 March 2023).
2. Carrasco-Santos, M.J.; Peña-Romero, A.; Guerrero-Navarro, D. A Luxury Tourist Destination in Housing for Tourist Purposes: A Study of the New Airbnb Luxe Platform in the Case of Marbella. J. Theor. Appl. Electron. Commer. Res. 2023, 18, 1020–1040.
3. Suh, J.; Tosun, C.; Eck, T.; An, S. A Cross-Cultural Study of Value Priorities between US and Chinese Airbnb Guests: An Analysis of Social and Economic Benefits. Sustainability 2022, 15, 223.
4. Tian, F.; Sun, F.; Hu, B.; Dong, Z. The Impact on Bed and Breakfast Prices: Evidence from Airbnb in China. Sustainability 2022, 14, 13834.
5. Gyódi, K. Airbnb in European cities: Business as usual or true sharing economy? J. Clean. Prod. 2019, 221, 536–551.
6. Barron, K.; Kung, E.; Proserpio, D. The Effect of Home-Sharing on House Prices and Rents: Evidence from Airbnb. Mark. Sci. 2020, 40, 23–47.
7. Sheppard, S.; Udell, A. Do Airbnb properties affect house prices? Williams Coll. Dep. Econ. Work. Pap. 2016, 3, 43.
8. Ndaguba, E.; Zyl, C.V. Professionalizing Sharing Platforms for Sustainable Growth in the Hospitality Sector: Insights Gained through Hierarchical Linear Modeling. Sustainability 2023, 15, 8267.
9. Sutherland, I.; Kiatkawsin, K. Determinants of guest experience in Airbnb: A topic modeling approach using LDA. Sustainability 2020, 12, 3402.
10. Zhang, K.; Pan, Z.; Shi, S. The Prediction of Booking Destination on Airbnb Dataset; UC San Diego: San Diego, CA, USA, 2015.
11. Wu, Y.; Zhou, Z. New User Booking Prediction for Airbnb Historical Data; UC San Diego: San Diego, CA, USA, 2015.
12. Ulfsson, H. Predicting Airbnb User’s Desired Travel Destinations. Ph.D. Thesis, KTH Royal Institute of Technology, Stockholm, Sweden, 2017.
13. Gómez, D.; Cantu-Ortiz, F.; Contreras, V.; Diaz Ramos, R. Mexico city’s airbnb listing price analysis using regression. In Proceedings of the 6th IADIS International Conference Connected Smart Cities, Virtual Conference, 21–23 July 2020.
14. Luo, Y.; Zhou, X.; Zhou, Y. Predicting Airbnb Listing Price Across Different Cities; Stanford University: Stanford, CA, USA, 2019.
15. Fuentes, J.E.G. Airbnb Listings in New York City: Price Prediction and Analysis. Ph.D. Thesis, Utica College, Utica, NY, USA, 2020.
16. Rezazadeh Kalehbasti, P.; Nikolenko, L.; Rezaei, H. Airbnb Price Prediction Using Machine Learning and Sentiment Analysis. In Proceedings of the Machine Learning and Knowledge Extraction: 5th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2021, Virtual Event, 17–20 August 2021; Proceedings 5. Springer: Berlin/Heidelberg, Germany, 2021; pp. 173–184.
17. Zhao, C.; Wu, Y.; Chen, Y.; Chen, G. Multiscale Effects of Hedonic Attributes on Airbnb Listing Prices Based on MGWR: A Case Study of Beijing, China. Sustainability 2023, 15, 1703.
18. Zhang, Z.; Chen, R.J.; Han, L.D.; Yang, L. Key factors affecting the price of Airbnb listings: A geographically weighted approach. Sustainability 2017, 9, 1635.
19. Chattopadhyay, M.; Mitra, S. Do airbnb host listing attributes influence room pricing homogenously? Int. J. Hosp. Manag. 2019, 81, 54–64.
20. Kakar, V.; Voelz, J.; Wu, J.; Franco, J. The visible host: Does race guide Airbnb rental rates in San Francisco? J. Hous. Econ. 2018, 40, 25–40.
21. Teubner, T.; Hawlitschek, F.; Dann, D. Price determinants on AirBnB: How reputation pays off in the sharing economy. J. Self-Gov. Manag. Econ. 2017, 5, 53–80.
22. Cheng, M.; Jin, X. What do Airbnb users care about? An analysis of online review comments. Int. J. Hosp. Manag. 2019, 76, 58–70.
23. Abdar, M.; Yen, N. Analysis of user preference and expectation on shared economy platform: An examination of correlation between points of interest on Airbnb. Comput. Hum. Behav. 2020, 107, 105730.
24. Mohsin, A.; Lengler, J. Airbnb hospitality: Exploring users and non-users’ perceptions and intentions. Sustainability 2021, 13, 10884.
25. Ma, X.; Hancock, J.T.; Lim Mingjie, K.; Naaman, M. Self-disclosure and perceived trustworthiness of Airbnb host profiles. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing, Portland, OR, USA, 25 February–1 March 2017; pp. 2397–2409.
26. Ma, X.; Neeraj, T.; Naaman, M. A computational approach to perceived trustworthiness of airbnb host profiles. In Proceedings of the International AAAI Conference on Web and Social Media, Montreal, QC, Canada, 15–18 May 2017; Volume 11, pp. 604–607.
27. Quattrone, G.; Greatorex, A.; Quercia, D.; Capra, L.; Musolesi, M. Analyzing and predicting the spatial penetration of Airbnb in US cities. EPJ Data Sci. 2018, 7, 31.
28. Kuleto, V.; Ilić, M.; Dumangiu, M.; Ranković, M.; Martins, O.M.; Păun, D.; Mihoreanu, L. Exploring opportunities and challenges of artificial intelligence and machine learning in higher education institutions. Sustainability 2021, 13, 10424.
29. Chang, R. Report Artificial Intelligence to Grow 47.5 Years. 2017. Available online: https://thejournal.com/articles/2017/03/24/ai-market-to-grow-47.5-percent-over-next-four-years.aspx (accessed on 21 August 2023).
30. Lacity, M.; Scheepers, R.; Willcocks, L.; Craig, A. Reimagining the University at Deakin: An IBM Watson Automation Journey. The Outsourcing Unit Working Research Paper Series; OUWP: London, UK, 2017.
31. Ilić, M.P.; Păun, D.; Popović Šević, N.; Hadžić, A.; Jianu, A. Needs and Performance Analysis for Changes in Higher Education and Implementation of Artificial Intelligence, Machine Learning, and Extended Reality. Educ. Sci. 2021, 11, 568.
32. Gollapalli, M.; Rahman, A.; Alkharraa, M.; Saraireh, L.; AlKhulaifi, D.; Salam, A.A.; Krishnasamy, G.; Alam Khan, M.A.; Farooqui, M.; Mahmud, M.; et al. SUNFIT: A Machine Learning-Based Sustainable University Field Training Framework for Higher Education. Sustainability 2023, 15, 8057.
33. Wen, Y.; Zhao, X.; Li, X.; Zang, Y. Explaining the Paradox of World University Rankings in China: Higher Education Sustainability Analysis with Sentiment Analysis and LDA Topic Modeling. Sustainability 2023, 15, 5003.
34. Shi, Y.; Guo, F. Exploring Useful Teacher Roles for Sustainable Online Teaching in Higher Education Based on Machine Learning. Sustainability 2022, 14, 14006.
35. Said, C. Window into Airbnb’s hidden impact on S.F. San Francisco Chronicle, June 2014. Available online: https://www.sfchronicle.com/business/item/window-into-airbnb-s-hidden-impact-on-s-f-30110.php (accessed on 5 March 2023).
36. Deisenroth, M.; Faisal, A.; Ong, C. Mathematics for Machine Learning; Cambridge University Press: Cambridge, UK, 2020.
37. Mason, C.; Quigley, J. Non-parametric hedonic housing prices. Hous. Stud. 1996, 11, 373–385.
38. Koenker, R. Quantile Regression in R: A Vignette. 2012. Available online: https://cran.r-project.org/web/packages/quantreg/vignettes/rq.pdf (accessed on 10 November 2019).
39. Kalehbasti, P.; Nikolenko, L.; Rezaei, H. Airbnb price prediction using machine learning and sentiment analysis. arXiv 2019, arXiv:1907.12665.
40. Ma, Y.; Zhang, Z.; Ihler, A.; Pan, B. Estimating warehouse rental price using machine learning techniques. Int. J. Comput. Commun. Control. 2018, 13, 235–250.
41. Yu, H.; Wu, J. Real Estate Price Prediction with Regression and Classification; CS229 (Machine Learning) Final Project Reports; Stanford University: Stanford, CA, USA, 2016.
42. Masiero, L.; Nicolau, J.L.; Law, R. A demand-driven analysis of tourist accommodation price: A quantile regression of room bookings. Int. J. Hosp. Manag. 2015, 50, 1–8.
43. Wang, D.; Nicolau, J.L. Price determinants of sharing economy based accommodation rental: A study of listings from 33 cities on Airbnb.com. Int. J. Hosp. Manag. 2017, 62, 120–131.
44. Li, Y.; Pan, Q.; Yang, T.; Guo, L. Reasonable price recommendation on Airbnb using Multi-Scale clustering. In Proceedings of the 2016 35th Chinese Control Conference (CCC), Chengdu, China, 27–29 July 2016; pp. 7038–7041.
45. Garcia-López, M.À.; Jofre-Monseny, J.; Martínez-Mazza, R.; Segú, M. Do short-term rental platforms affect housing markets? Evidence from Airbnb in Barcelona. J. Urban Econ. 2020, 119, 103278.
46. Loria, S.; Keen, P.; Honnibal, M.; Yankovsky, R.; Karesh, D.; Dempsey, E. Textblob: Simplified text processing. Second. Textblob Simpl. Text Process. 2014, 3, 2014.
47. Abiola, O.; Abayomi-Alli, A.; Tale, O.A.; Misra, S.; Abayomi-Alli, O. Sentiment analysis of COVID-19 tweets from selected hashtags in Nigeria using VADER and Text Blob analyser. J. Electr. Syst. Inf. Technol. 2023, 10, 5.
48. Abayomi-Alli, A.; Abayomi-Alli, O.; Misra, S.; Fernandez-Sanz, L. Study of the Yahoo-Yahoo Hash-Tag tweets using sentiment analysis and opinion mining algorithms. Information 2022, 13, 152.
49. Petz, G.; Karpowicz, M.; Fürschuß, H.; Auinger, A.; Stříteský, V.; Holzinger, A. Opinion mining on the web 2.0–characteristics of user generated content and their impacts. In Proceedings of the Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data: Third International Workshop, HCI-KDD 2013, Held at SouthCHI 2013, Maribor, Slovenia, 1–3 July 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 35–46.
50. Airbnb. Airbnb Data Assumptions. Available online: http://insideairbnb.com/data-assumptions/ (accessed on 15 June 2023).
51. Maulud, D.; Abdulazeez, A.M. A review on linear regression comprehensive in machine learning. J. Appl. Sci. Technol. Trends 2020, 1, 140–147.
52. Frank, E.; Trigg, L.; Holmes, G.; Witten, I.H. Naive Bayes for regression. Mach. Learn. 2000, 41, 5–25.
53. Ranstam, J.; Cook, J. LASSO regression. J. Br. Surg. 2018, 105, 1348.
54. Li, Y.; Yang, R.; Wang, X.; Zhu, J.; Song, N. Carbon Price Combination Forecasting Model Based on Lasso Regression and Optimal Integration. Sustainability 2023, 15, 9354.
55. McDonald, G.C. Ridge regression. Wiley Interdiscip. Rev. Comput. Stat. 2009, 1, 93–100.
56. Bishop, C.M.; Tipping, M.E. Bayesian regression and classification. Nato Sci. Ser. Sub Ser. III Comput. Syst. Sci. 2003, 190, 267–288.
57. Khan, M.A.; Khan, R.; Algarni, F.; Kumar, I.; Choudhary, A.; Srivastava, A. Performance evaluation of regression models for COVID-19: A statistical and predictive perspective. Ain Shams Eng. J. 2022, 13, 101574.
58. Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)?–Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 2014, 7, 1247–1250.
59. De Myttenaere, A.; Golden, B.; Le Grand, B.; Rossi, F. Mean absolute percentage error for regression models. Neurocomputing 2016, 192, 38–48.
60. Bangare, M.L.; Bangare, P.M.; Ramirez-Asis, E.; Jamanca-Anaya, R.; Phoemchalard, C.; Bhat, D.A.R. Role of machine learning in improving tourism and education sector. Mater. Today Proc. 2022, 51, 2457–2461.
61. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
62. Aggarwal, K.; Kirchmeyer, M.; Yadav, P.; Keerthi, S.S.; Gallinari, P. Conditional generative adversarial networks for regression. arXiv190512868 Cs Stat. 2019, 133, 142–146.
63. Yu, J.; Wen, Y.; Yang, L.; Zhao, Z.; Guo, Y.; Guo, X. Monitoring on triboelectric nanogenerator and deep learning method. Nano Energy 2022, 92, 106698.

Model Name | MSE | MAE | RMSE | R² |
---|---|---|---|---|
KNN regression | 4369.473 | 2.2966 | 66.102 | 0.9728 |
SVR (without amenities) | 0.0007 | 0.0025 | 0.0257 | 0.995 |
SVR (with amenities) | 0.0006 | 0.0019 | 0.0245 | 0.996 |
Decision tree regression (without amenities) | 5518.656 | 23.7928 | 74.2877 | 0.9656 |
Decision tree regression (with amenities) | 5855.591 | 23.8494 | 76.5219 | 0.9635 |
Ridge Regression | 0.0006 | 0.0016 | 0.0253 | 0.997 |
Lasso Regression | 0.0006 | 0.0019 | 0.0258 | 0.997 |
Bayesian Regression | 0.0007 | 0.0179 | 0.02759 | 0.985 |

Model Name | MSE | MAE | RMSE | R² |
---|---|---|---|---|
KNN regression | 2,268,707.48 | 42.7961 | 1506.2229 | 0.2998 |
SVR (without amenities) | 0.0006 | 0.0024 | 0.0245 | 0.9998 |
SVR (with amenities) | 0.0005 | 0.0008 | 0.02144 | 0.9998 |
Decision tree regression (without amenities) | 2,257,681.98 | 50.4614 | 1502.5585 | 0.3032 |
Decision tree regression (with amenities) | 2,156,169.00 | 49.2969 | 1468.3899 | 0.3345 |
Ridge Regression | 0.0005 | 0.0016 | 0.0241 | 0.9998 |
Lasso Regression | 0.0005 | 0.0016 | 0.0241 | 0.9998 |
Bayesian Regression | 0.0008 | 0.0018 | 0.0284 | 0.9992 |

Model Name | MSE | MAE | RMSE | R² |
---|---|---|---|---|
KNN regression | 843,216.88 | 16.12 | 918.27 | 0.5605 |
SVR (without amenities) | 0.0008 | 0.0015 | 0.0278 | 0.99995 |
SVR (with amenities) | 0.0008 | 0.0013 | 0.0277 | 0.99996 |
Decision tree regression (without amenities) | 183,687.91 | 29.87 | 428.59 | 0.9043 |
Decision tree regression (with amenities) | 183,687.91 | 29.87 | 428.59 | 0.9043 |
Ridge Regression | 0.0005 | 0.0016 | 0.0234 | 0.99995 |
Lasso Regression | 0.0005 | 0.0016 | 0.0233 | 0.99995 |
Bayesian Regression | 0.0005 | 0.0016 | 0.0234 | 0.99952 |
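
The error columns in the tables above follow the standard definitions; for instance, each RMSE entry is the square root of the corresponding MSE (√4369.473 ≈ 66.10 for KNN regression in the first table). The sketch below computes the four metrics for paired true and predicted prices, assuming that the last column of each table is the coefficient of determination R². It is an illustrative, dependency-free implementation rather than the evaluation code used in the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R^2 for paired true/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n                             # mean squared error
    mae = sum(abs(e) for e in errors) / n        # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R^2 = 1 - SS_res / SS_tot; undefined when all true values are equal.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse), "R2": r2}
```

For example, `regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])` gives an MSE of 100, an MAE of 10, an RMSE of 10 and an R² of about 0.992. In practice, scikit-learn's `mean_squared_error`, `mean_absolute_error` and `r2_score` compute the same quantities.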

Ridge Regression | Lasso Regression | Bayesian Regression |
---|---|---|
subjectivity | bedrooms | polarity |
polarity | accommodates | subjectivity |
bedrooms | beds | bedrooms |
accommodates | bathrooms | accommodates |
beds | property type | beds |
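
The table above lists the five most influential features for each linear model. One common way to obtain such a ranking, shown here as an assumption because this section does not state the author's procedure, is to sort features by the magnitude of their coefficients after putting the features on a common scale, i.e. by |β_j|·s_j, where s_j is the standard deviation of feature j. The function name and the coefficient values in the test data are hypothetical.

```python
from statistics import pstdev


def rank_features(coefficients, columns, top_k=5):
    """Rank features by |coefficient| x std(feature), largest first.

    coefficients: mapping feature name -> fitted coefficient on the raw scale.
    columns: mapping feature name -> list of that feature's training values.
    Scaling by the standard deviation makes coefficients of features measured
    in different units (bedrooms vs. polarity) comparable.
    """
    scored = {
        name: abs(beta) * pstdev(columns[name])
        for name, beta in coefficients.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_k]
```

With scikit-learn, fitted coefficients are available as the `coef_` attribute of `Ridge`, `Lasso` or `BayesianRidge` (the latter assumed here as the Bayesian regression variant). If the features are standardized before fitting, ranking by the raw absolute coefficients gives the same order.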

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alharbi, Z.H. A Sustainable Price Prediction Model for Airbnb Listings Using Machine Learning and Sentiment Analysis. Sustainability 2023, 15, 13159. https://doi.org/10.3390/su151713159