An Audio-Based SLAM for Indoor Environments: A Robotic Mixed Reality Presentation
Abstract
1. Introduction
2. Related Studies
3. Materials and Method
3.1. Overall System Architecture
3.2. Active Speaker Localization Using Microphone Array
3.3. Audio-Based Ellipsoidal Virtual HoloSLAM Algorithm Implementation
Action | Voice function command |
---|---|
Start: launch the Holo-landmark hologram app | Start |
Place virtual landmark observation | place virtual landmark |
Take a picture (if needed) | takepicture |
Exit: close the Holo-landmark hologram app | Exit |
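The command table above can be read as a small lookup from recognized phrases to app actions. The following is a minimal sketch under stated assumptions, not the authors' implementation: the action names (`launch_app`, `place_landmark`, `take_picture`, `close_app`) and the helper functions are hypothetical, and the speech recognizer is assumed to deliver plain text.

```python
from typing import Dict, Optional

# Phrases are taken from the command table; the action names on the right
# are hypothetical labels introduced only for this illustration.
COMMANDS: Dict[str, str] = {
    "start": "launch_app",
    "place virtual landmark": "place_landmark",
    "takepicture": "take_picture",
    "exit": "close_app",
}


def normalize(utterance: str) -> str:
    """Lower-case the utterance and collapse repeated whitespace."""
    return " ".join(utterance.lower().split())


def dispatch(utterance: str) -> Optional[str]:
    """Return the action for a recognized phrase, or None if it is unknown."""
    return COMMANDS.get(normalize(utterance))


if __name__ == "__main__":
    for phrase in ("Start", "Place Virtual Landmark", "hello"):
        print(phrase, "->", dispatch(phrase))
```

Returning `None` for unrecognized phrases keeps the sketch passive: anything outside the four listed commands triggers no action.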
1. Start.
2. Initialization: SLAM initialization, NAO robot initialization, and launch of the Holo-landmark hologram app (voice function command: start).
3. Get Observation (4): is the active speaker identified?
   - Yes: the Holo-landmark hologram app places a virtual landmark (voice function command: place virtual landmark).
   - No: no action (wait for speaker identification).
4. While not_stop:
   - Prediction step: use the sonar to check for a safe distance to move.
     - Safe distance: issue the move command.
     - No safe distance: turn 180 degrees.
   - Is the active speaker identified?
     - Yes: the Holo-landmark hologram app places a virtual landmark (voice function command: place virtual landmark).
     - No: no action (wait for speaker identification).
   - Data association (5): virtual landmark matching and data-association simplification.
   - Correction step: run the standard Ellipsoidal-SLAM update step.
   - Augmented map: add new virtual landmarks to the map.
   - Check whether the required number of iterations has been reached. If not, go to step 4.
5. End: close the Holo-landmark hologram app (voice function command: stop).
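The control flow of the listing above can be sketched in a few lines of Python. This is an illustrative skeleton only, under several assumptions: `observe` and `path_is_clear` are hypothetical callbacks standing in for speaker identification and the sonar check, nearest-neighbour gating stands in for the paper's data association, the Ellipsoidal-SLAM update is represented by a log entry rather than computed, and the initial observation before the loop is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class LoopState:
    landmarks: List[Point] = field(default_factory=list)  # map of virtual landmarks
    log: List[str] = field(default_factory=list)          # trace of actions taken


def associate(landmarks: List[Point], obs: Point, gate: float = 0.5) -> Optional[int]:
    """Nearest-neighbour match within `gate` (an assumed stand-in for data association)."""
    best, best_dist = None, gate
    for i, (x, y) in enumerate(landmarks):
        dist = ((x - obs[0]) ** 2 + (y - obs[1]) ** 2) ** 0.5
        if dist <= best_dist:
            best, best_dist = i, dist
    return best


def run_loop(
    observe: Callable[[int], Optional[Point]],
    path_is_clear: Callable[[int], bool],
    max_iters: int,
) -> LoopState:
    state = LoopState()
    state.log.append("start")
    for k in range(max_iters):
        # Prediction step: move if the sonar reports a safe distance, else turn around.
        state.log.append("move" if path_is_clear(k) else "turn_180")
        obs = observe(k)
        if obs is None:
            # Active speaker not identified: take no action this iteration.
            state.log.append("wait")
            continue
        state.log.append("place_virtual_landmark")
        if associate(state.landmarks, obs) is None:
            # Unmatched observation: augment the map with a new virtual landmark.
            state.landmarks.append(obs)
            state.log.append("augment_map")
        else:
            # Matched observation: placeholder for the Ellipsoidal-SLAM update step.
            state.log.append("correction")
    state.log.append("stop")
    return state


if __name__ == "__main__":
    sightings = {0: (1.0, 0.0), 2: (1.1, 0.0), 3: (3.0, 2.0)}
    result = run_loop(lambda k: sightings.get(k), lambda k: k != 1, max_iters=4)
    print(result.landmarks)
    print(result.log)
```

Passing the sensing steps in as callbacks keeps the loop testable without a robot; on real hardware they would wrap the robot's sonar and the speaker-identification module.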
3.4. Active Speaker Representation and Modeling
3.4.1. Feature Extraction Techniques for Active Speaker Identification
3.4.2. Active Speaker Representations (Classification Algorithms)
Gaussian Mixture Model
Support Vector Machine (SVM)
Deep Learning-Based Models Architecture
1. Convolutional Long Short-Term Memory Network
2. Time-Delay Neural Networks (TDNNs)
4. Experimental Results
4.1. Active Speaker Identification
4.2. Audio Ellipsoidal-HoloSLAM Algorithm
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
k-th Split | MFCC (Full covariance) | MFCC (Diag) | MFCC (Tied) | MFCC (Spherical) | GFCC (Full covariance) | GFCC (Diag) | GFCC (Tied) | GFCC (Spherical) |
---|---|---|---|---|---|---|---|---|
1 | 79.687500 | 43.750000 | 35.937500 | 32.031250 | 39.062500 | 50.000000 | 44.791667 | 43.359375 |
2 | 21.875000 | 14.843750 | 17.187500 | 18.359375 | 39.062500 | 39.062500 | 39.062500 | 40.234375 |
3 | 78.461538 | 42.307692 | 55.384615 | 46.153846 | 38.461538 | 36.153846 | 36.923077 | 43.846154 |
4 | 21.538462 | 13.846154 | 35.384615 | 31.923077 | 38.461538 | 50.000000 | 46.153846 | 43.076923 |
5 | 21.538462 | 56.923077 | 45.641026 | 39.615385 | 44.615385 | 41.538462 | 40.512821 | 40.384615 |
6 | 20.000000 | 56.923077 | 44.615385 | 53.076923 | 32.307692 | 35.384615 | 35.897436 | 36.153846 |
7 | 84.615385 | 87.692308 | 86.666667 | 68.846154 | 32.307692 | 33.846154 | 32.820513 | 33.461538 |
8 | 84.615385 | 46.923077 | 57.435897 | 46.923077 | 40.000000 | 37.692308 | 47.179487 | 45.000000 |
9 | 83.076923 | 44.615385 | 36.410256 | 47.307692 | 40.000000 | 50.000000 | 54.871795 | 50.000000 |
10 | 23.076923 | 14.615385 | 36.410256 | 47.307692 | 69.230769 | 53.846154 | 48.717949 | 45.769231 |
Model | Training Parameters | Recognition Rate (%) | Loss Function Value | Features Used (MFCC/GFCC) |
---|---|---|---|---|
Conv-LSTM | 824,322 | 93.75 | 0.1989 | MFCC |
Dense Neural Network | 429,936 | 87.5 | 0.6421 | MFCC |
Time Delay Neural Network | 88,322 | 100 | 0.0003 | MFCC |
Conv-LSTM | 824,322 | 90.625 | 0.2838 | GFCC |
Dense Neural Network | 429,936 | 93.75 | 0.1724 | GFCC |
Time Delay Neural Network | 88,322 | 93.75 | 0.1666 | GFCC |
Specification | Details |
---|---|
Height | 58 cm (22.8 inches) |
Weight | 4.3 kg (9.5 lbs) |
Degrees of Freedom | 25 |
Sensors | Two HD cameras; four microphones; touch sensors (head, hands, feet); inertial measurement unit (IMU); ultrasonic sensors |
Processing Unit | Intel Atom Z530 processor |
Memory | 1 GB RAM |
Operating System | Linux-based NAOqi OS |
Connectivity | Ethernet; Wi-Fi; Bluetooth |
Power Source | Rechargeable lithium-ion battery |
Battery Life | Up to 90 min of continuous operation |
Development Framework | Choregraphe (graphical programming software); Python SDK; C++ SDK |
Algorithm | Nao Position Error (m) | Nao Orientation Error (rad) | Virtual Landmarks Error (m) |
---|---|---|---|
Nao IMU | 33.01 | 0.675 | |
Audio-based Ellipsoidal Virtual HoloSLAM | 0.0184 | 0.119 | 0.010 |
Total number of times speaker identification was called | 23 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lahemer, E.S.F.; Rad, A. An Audio-Based SLAM for Indoor Environments: A Robotic Mixed Reality Presentation. Sensors 2024, 24, 2796. https://doi.org/10.3390/s24092796