Engineering Proceedings
  • Proceeding Paper
  • Open Access

15 November 2023

Machine Learning for Accurate Office Room Occupancy Detection Using Multi-Sensor Data †

1 Department of Computer Engineering, Ahmadu Bello University, Zaria 810211, Nigeria
2 Department of Electrical and Electronics Engineering, Kaduna Polytechnic, Kaduna 800282, Nigeria
* Author to whom correspondence should be addressed.
Presented at the 10th International Electronic Conference on Sensors and Applications (ECSA-10), 15–30 November 2023; Available online: https://ecsa-10.sciforum.net/.
This article belongs to the Proceedings The 10th International Electronic Conference on Sensors and Applications

Abstract

In this paper, we present a comparative study of several machine learning (ML) approaches for accurate office room occupancy detection through the analysis of multi-sensor data. Our study uses the occupancy detection dataset, which incorporates data from temperature, humidity, light, and CO2 sensors, with ground-truth labels obtained from time-stamped images captured at one-minute intervals. Traditional ML techniques, including Decision Trees (DT), Gaussian Naïve Bayes (NB), K-Nearest Neighbors (KNN), Logistic Regression (LR), Support Vector Machines (SVM), Multilayer Perceptron (MLP), and Quadratic Discriminant Analysis (QDA), are compared alongside advanced ensemble methods such as RandomForest (RF), Bagging, AdaBoost, GradientBoosting, and ExtraTrees, as well as our custom voting and multiple stacking classifiers. Hyperparameter optimization was also performed for selected models to improve classification accuracy. Model performance was evaluated through 5-fold cross-validation experiments. The results highlight the suitability of the candidate and ensemble methods and demonstrate the potential of ML techniques for improving detection accuracy. Notably, LR and SVM exhibited superior performance, achieving average accuracies of 98.88 ± 0.70% and 98.65 ± 0.96%, respectively. Additionally, our custom voting and stacking ensembles improved classification outcomes compared to the base ensemble schemes, as indicated by various evaluation metrics.

1. Introduction

Occupancy detection refers to the process of determining whether a space or area is currently occupied by people or objects. This can be accomplished through various means and technologies and serves several purposes in different domains, including building management, safety, security, energy conservation, and automation. For instance, efficient energy management in office spaces is a growing concern today, as environmental sustainability and cost-effectiveness go hand in hand. Estimates indicate that precise office room occupancy detection can lead to energy savings ranging from 30% to 42% [1,2]. These savings can reach up to 80% when occupancy data are integrated into HVAC (Heating, Ventilation, and Air Conditioning) control algorithms [3]. Therefore, there is a growing need for accurate occupancy detection methods to harness the full potential of these energy-saving opportunities. This quest for precision has led to substantial research efforts, especially in the application of ML models. Previous studies have shown that, with sufficient relevant data, occupancy detection can reach high accuracy [4,5,6]. In this paper, we use multi-sensor data, which are becoming increasingly popular in ML applications because they can provide more accurate and reliable results than a single sensor. The significant contributions of this paper include:
  • Systematic comparison of a wide range of ML models, from traditional to advanced ensemble methods;
  • Optimizing hyperparameters of selected models in order to enhance performance;
  • Evaluating custom voting and multiple stacking classifiers and demonstrating their role in improving classification performance.

3. Materials and Methods

3.1. Data Collection and Preprocessing

We utilized the publicly available occupancy detection dataset, which includes sensor data from temperature, humidity, light, and CO2 sensors, as well as ground-truth labels obtained from time-stamped images captured at minute intervals [7].

3.2. Feature Engineering

We performed correlation analysis on the dataset to identify relevant features for the occupancy detection task. Most features have strong positive correlations with the target variable (occupancy); the exceptions are humidity and humidity ratio, which correlate only weakly with the target. Nevertheless, we retained all features and did not apply a threshold to any sensor data in our analysis. Discarding features below a specific correlation threshold and trying other feature combinations are left for future research.
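
As an illustration of this kind of correlation analysis, the following minimal sketch computes the Pearson correlation between each feature and the occupancy label. The paper does not state which correlation coefficient was used, so Pearson is an assumption here, and the readings are made-up toy values rather than rows from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy readings, invented for illustration only (NOT taken from the dataset).
light = [0.0, 5.0, 420.0, 450.0, 0.0, 430.0]
humidity = [27.0, 26.1, 27.3, 26.0, 26.8, 26.5]
occupancy = [0, 0, 1, 1, 0, 1]

correlations = {
    "light": pearson(light, occupancy),
    "humidity": pearson(humidity, occupancy),
}

# Rank features by the absolute strength of their correlation with the target.
for name, r in sorted(correlations.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>8}: r = {r:+.3f}")
```

In this toy example the light readings track occupancy closely while humidity does not, which mirrors the qualitative pattern described above; the actual coefficients for the dataset are not reproduced here.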

3.3. Model Selection

For our analysis, we selected a set of traditional ML models as well as advanced (ensemble) models for the comparative study. The traditional ML models include Decision Trees, Gaussian Naïve Bayes, KNN, LR, SVM, MLP, and QDA. The ensemble methods include RF, Bagging, AdaBoost, GradientBoosting, and ExtraTrees. Furthermore, we tried several custom ensemble methods, as follows:
  • Voting Classifier, consisting of LR, RF, and SVM;
  • StackingClassifier1, consisting of LR, RF, and SVM as base estimators with LR as the final estimator;
  • StackingClassifier2, consisting of Decision Tree, KNN, and MLP Classifiers as base estimators with LR as the final estimator;
  • StackingClassifier3, consisting of GaussianNB, SVM, and QDA as base estimators with LR as the final estimator;
  • StackingClassifier4, consisting of RF, MLP Classifier, and SVM as base estimators with LR as the final estimator.
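
The voting ensemble and StackingClassifier1 listed above could be assembled along the following lines. This is a sketch only: the class names in the text suggest scikit-learn, but the paper does not name the library, so its use is an assumption. Hard voting, the estimator settings, and the synthetic data from `make_classification` are likewise assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC


def base_estimators():
    """LR, RF and SVM base learners; the settings here are illustrative guesses."""
    return [
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=20, random_state=0)),
        ("svm", SVC(random_state=0)),
    ]


# Voting ensemble over LR, RF and SVM (hard majority voting is an assumption).
voting = VotingClassifier(estimators=base_estimators(), voting="hard")

# StackingClassifier1: the same base learners with LR as the final estimator.
stacking1 = StackingClassifier(
    estimators=base_estimators(),
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

# Synthetic stand-in for the five sensor-derived features (not the real data).
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=1, random_state=0
)

for name, model in [("Voting", voting), ("StackingClassifier1", stacking1)]:
    model.fit(X, y)
    print(name, "training accuracy:", round(model.score(X, y), 3))
```

The other stacking variants would differ only in the list returned by `base_estimators()`.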

3.4. Hyperparameter Optimization

In order to obtain better performance, we further performed parameter tuning via grid search for RF, SVM, and KNN classifiers. Each grid search was performed with 5-fold cross-validation. For RF, the search was conducted over the number of estimators (10, 20, 30), maximum depth (15, 20, 30, 50), and criterion (gini, entropy). Also, the SVM was tuned over C (1, 10, 100) and kernel types (linear, poly, rbf, sigmoid). Finally, KNN was optimized by searching for the optimal number of neighbors (2, 3, 5, 10, 15, 20).
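
To make the size of these searches concrete, the sketch below enumerates the three grids exactly as listed above and counts the candidate settings and cross-validation fits. The dictionary keys follow scikit-learn parameter naming, which is an assumption about the authors' tooling, and the `expand` helper is hypothetical.

```python
from itertools import product

CV_FOLDS = 5  # each grid search used 5-fold cross-validation

# Search spaces as described in the text; key names assume scikit-learn.
param_grids = {
    "RF": {
        "n_estimators": [10, 20, 30],
        "max_depth": [15, 20, 30, 50],
        "criterion": ["gini", "entropy"],
    },
    "SVM": {
        "C": [1, 10, 100],
        "kernel": ["linear", "poly", "rbf", "sigmoid"],
    },
    "KNN": {
        "n_neighbors": [2, 3, 5, 10, 15, 20],
    },
}


def expand(grid):
    """Return every combination of values in a grid as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


candidates = {name: expand(grid) for name, grid in param_grids.items()}

for name, combos in candidates.items():
    fits = len(combos) * CV_FOLDS
    print(f"{name}: {len(combos)} candidates, {fits} cross-validation fits")
```

This gives 24 candidates (120 fits) for RF, 12 candidates (60 fits) for SVM, and 6 candidates (30 fits) for KNN. Dictionaries of this shape can be passed as `param_grid` to scikit-learn's `GridSearchCV`, which by default also refits the best candidate once on the full training data.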

3.5. Model Training and Evaluation

Rigorous cross-validation experiments (using 5-fold cross-validation) were performed in order to assess the performance of the models. We then split the dataset into 70% training and 30% testing, retrained each model on the training set, and evaluated the models’ performance on the test set using accuracy, precision, recall, and F1-score as performance metrics.
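
The four metrics, and the mean ± standard deviation form used to report cross-validation results, can be sketched as follows. The labels and fold scores are invented for illustration, and the use of the population standard deviation is an assumption, since the paper does not say how the spread was computed.

```python
from statistics import mean, pstdev


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = occupied)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def summarize(fold_scores):
    """Format per-fold scores as 'mean ± std' in percent (population std)."""
    return f"{100 * mean(fold_scores):.2f} ± {100 * pstdev(fold_scores):.2f}%"


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)

# Hypothetical accuracies from five folds (not results from the paper).
fold_scores = [0.98, 0.99, 1.00, 0.97, 0.99]
print(summarize(fold_scores))
```

With one false positive and one false negative among eight samples, all four metrics equal 0.75 in this toy case.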

4. Results and Discussions

Table 1 and Table 2 show the 5-fold cross-validation results and the test results, respectively, for the traditional ML models, while Table 3 and Table 4 show the corresponding results for the ensemble models.
Table 1. Cross-validation results for the traditional ML methods.
Table 2. Results of the traditional ML methods on test set.
Table 3. Cross-validation results for the ensemble methods.
Table 4. Experimental results of the ensemble methods on test set.
From Table 1, LR and SVM achieved the highest validation accuracies of 98.88% and 98.65%, respectively. These models also demonstrated strong precision, recall, and F1-score values, indicating their suitability for accurate occupancy detection. On the test data (Table 2), the SVM, LR, KNN, and DT models all exhibit accuracy above 99%.
The ensemble methods (Table 3), particularly our voting and stacking models, also perform well, with StackingClassifier1 achieving the highest validation accuracy of approximately 98.89 ± 0.72%. Classification results on the test data (Table 4) indicate that most ensemble methods achieve high accuracy, with RF, ExtraTrees, and StackingClassifier1 being particularly notable at above 99.30%. These models also exhibit strong precision, recall, and F1-score values, reflecting their effectiveness in making accurate predictions. The voting ensemble, although it recorded a slightly lower accuracy, still demonstrates a good balance between precision and recall.
For the optimized models, the best hyperparameters found for the respective algorithms were: SVM (C = 10 and kernel = ‘linear’), KNN (n_neighbors = 20), and RF (n_estimators = 50, max_depth = 44, and criterion = ‘entropy’). Using these parameters, we obtained the test results presented in Table 5. The performance improvements recorded for KNN and RF show that hyperparameter optimization can improve the predictive accuracy of ML classifiers.
Table 5. Experimental results of the optimized methods on test set.

5. Conclusions

In conclusion, this paper has presented a comparative study of ML approaches for office room occupancy detection using multi-sensor data. Our findings indicate that LR and SVM achieved impressive performance. Furthermore, our custom stacking ensembles demonstrated significant improvements over most base ensemble schemes. The study provides a comprehensive insight into the potential of several ML techniques in the domain of room occupancy detection.

Author Contributions

Conceptualization, Y.I. and A.I.M.; methodology, Y.I. and U.Y.B.; software, Y.I. and U.Y.B.; validation, U.Y.B., A.I.M. and Y.I.; investigation, Y.I.; resources, Y.I. and U.Y.B.; data curation, Y.I.; writing—original draft preparation, Y.I.; writing—review and editing, U.Y.B. and A.I.M.; visualization, A.I.M.; supervision, Y.I.; project administration, A.I.M.; funding acquisition, Y.I., U.Y.B. and A.I.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data used in this study were obtained from the UCI Machine Learning Repository (https://doi.org/10.24432/C5X01N). The dataset is publicly available and can be accessed at https://archive.ics.uci.edu/dataset/357/occupancy+detection (accessed on 7 June 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Erickson, V.L.; Carreira-Perpiñán, M.Á.; Cerpa, A.E. Occupancy modeling and prediction for building energy management. ACM Trans. Sens. Netw. (TOSN) 2014, 10, 1–28.
  2. Kim, Y.-M.; Lee, Y.-H.; Pyo, C.-S. Accurate Occupancy Detection via Label Noise Filtering Technique. In Proceedings of the 2020 International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 21–23 October 2020; pp. 1381–1383.
  3. Brooks, J.; Kumar, S.; Goyal, S.; Subramany, R.; Barooah, P. Energy-efficient control of under-actuated HVAC zones in commercial buildings. Energy Build. 2015, 93, 160–168.
  4. Zemouri, S.; Gkoufas, Y.; Murphy, J. A machine learning approach to indoor occupancy detection using non-intrusive environmental sensor data. In Proceedings of the 3rd International Conference on Big Data and Internet of Things, Melbourne, Australia, 22–24 August 2019; pp. 70–74.
  5. Zhao, H.; Hua, Q.; Chen, H.-B.; Ye, Y.; Wang, H.; Tan, S.X.-D.; Tlelo-Cuautle, E. Thermal-sensor-based occupancy detection for smart buildings using machine-learning methods. ACM Trans. Des. Autom. Electron. Syst. (TODAES) 2018, 23, 1–21.
  6. Toutiaee, M. Occupancy detection in room using sensor data. arXiv 2021, arXiv:2101.03616.
  7. Candanedo, L.M.; Feldheim, V. Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models. Energy Build. 2016, 112, 28–39.
  8. Yang, Z.; Li, N.; Becerik-Gerber, B.; Orosz, M. A systematic approach to occupancy modeling in ambient sensor-rich buildings. Simulation 2014, 90, 960–977.
  9. Dong, B.; Andrews, B.; Lam, K.P.; Höynck, M.; Zhang, R.; Chiou, Y.-S.; Benitez, D. An information technology enabled sustainability test-bed (ITEST) for occupancy detection through an environmental sensing network. Energy Build. 2010, 42, 1038–1046.
  10. Lam, K.P.; Höynck, M.; Dong, B.; Andrews, B.; Chiou, Y.-S.; Zhang, R.; Benitez, D.; Choi, J. Occupancy detection through an extensive environmental sensor network in an open-plan office building. IBPSA Build. Simul. 2009, 145, 1452–1459.
  11. Zuraimi, M.; Pantazaras, A.; Chaturvedi, K.; Yang, J.; Tham, K.; Lee, S. Predicting occupancy counts using physical and statistical CO2-based modeling methodologies. Build. Environ. 2017, 123, 517–528.
  12. Kraipeerapun, P.; Amornsamankul, S. Room occupancy detection using modified stacking. In Proceedings of the 9th International Conference on Machine Learning and Computing, Singapore, 24–26 February 2017; pp. 162–166.
  13. Dutta, J.; Roy, S. OccupancySense: Context-based indoor occupancy detection & prediction using CatBoost model. Appl. Soft Comput. 2022, 119, 108536.
  14. Elkhoukhi, H.; Bakhouya, M.; El Ouadghiri, D.; Hanifi, M. Using stream data processing for real-time occupancy detection in smart buildings. Sensors 2022, 22, 2371.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
