Editorial

Human–Computer Interaction for Intelligent Systems

Matúš Pleva 1,*, Yuan-Fu Liao 2 and Patrick Bours 3

1 Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Němcovej 32, 040 01 Košice, Slovakia
2 Artificial Intelligence Innovation, Industry Academia Innovation School, National Yang Ming Chiao Tung University, No.1001, University Road, Hsinchu 30010, Taiwan
3 Department of Information Security and Communication Technology, Faculty of Information Technology and Electrical Engineering, Norwegian University of Science and Technology, Teknologivegen 22, NO-2815 Gjøvik, Oppland, Norway
* Author to whom correspondence should be addressed.
Electronics 2023, 12(1), 161; https://doi.org/10.3390/electronics12010161
Submission received: 26 December 2022 / Accepted: 27 December 2022 / Published: 29 December 2022
(This article belongs to the Special Issue Human Computer Interaction for Intelligent Systems)

1. Introduction

The further development of human–computer interaction applications is still in great demand, as users expect increasingly natural interactions. For example, speech communication in many languages is expected as a basic feature of intelligent systems, such as robotic systems, autonomous vehicles, or virtual assistants. For this Special Issue, we invited submissions from researchers addressing the unique opportunities and challenges of human–computer interaction with intelligent systems. We encouraged authors to describe systems built for different languages, as well as multilingual systems, and we invited submissions from researchers studying the linguistic, emotional, prosodic, and dialogue aspects of speech communication. We also welcomed work on other input and output modalities, including multimodal systems, fusion/fission algorithms, and deep learning methods. We encouraged authors to report state-of-the-art results in detail and to provide useful reviews and the data used to build such systems, in order to support development in these areas. The rapidly growing domain of virtual reality applications is of interest both as an application domain in which new interfaces and interaction methods are needed and as a potential testbed for evaluating speech and other interface modalities.

2. Short Presentation of the Papers

Every high-quality research effort starts with a thorough state-of-the-art review. We are proud to present excellent reviews in our collection on speech emotion recognition [1], automatic spelling correction [2], and the use of art in virtual reality [3].

2.1. Review Papers

Lieskovská et al. [1] presented a review of recent developments in speech emotion recognition and examined the impact of various attention mechanisms on speech emotion recognition performance. An overall comparison of the systems was performed on the widely used IEMOCAP [4] benchmark database.
Hládek et al. [2] created a survey of automatic spelling correction algorithms. It follows on from the previous survey by Kukich [5] from 1992 and covers the research conducted in the nearly three decades since that paper. The article proposes a theoretical framework, an overview of the approaches, benchmarks, and evaluation methods. As the first comprehensive survey on this topic in a long time, it offers valuable insight for researchers.
Aldridge and Bethel [3] assessed how art is being used in virtual reality (VR) and investigated the feasibility of brain-injury patients participating in virtual art therapy. The studies included in this review highlight the importance of the artistic subject matter, sensory stimulation, and measurable performance outcomes for assessing the effect that VR art therapy has on motor impairment.

2.2. Research Papers

Stark et al. [6] described the design and implementation of a new method for the control and monitoring of mechatronic systems connected to the IoT network, using a selected segment of extended reality to create an innovative form of human–machine interaction. In the proposed solution, modern methods for the detection and recognition of 3D objects in augmented reality are used instead of conventional control and monitoring of mechatronic IoT systems based on scanning QR codes.
Machová et al. [7] focused on increasing the effectiveness of lexicon-based sentiment analysis. Within the research, two lexicons were built: the first was a big, domain-dependent lexicon created by translating and merging several existing dictionaries, and the second was a small, domain-independent lexicon, since it contained only words with the same meaning across different domains. These lexicons were labeled by assigning a degree of polarity to each word using Particle Swarm Optimization methods. The article also contains the results of experiments on the distribution of polarity values for different labeling techniques. The created lexicons were used in a new approach to sentiment analysis and evaluated. When the lexicon does not contain the words used in an analyzed text, lexicon-based sentiment analysis by itself fails; for such cases, it was supplemented with a machine learning model for sentiment analysis. This hybrid approach achieved very good results.
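The hybrid strategy can be illustrated with a minimal, hypothetical sketch (the toy lexicon and the stub fallback model below are our own illustration, not the authors' implementation): score a text against a polarity lexicon, and hand over to a machine learning model only when no lexicon words match.

```python
# Toy polarity lexicon; degrees of polarity are invented for illustration.
LEXICON = {"good": 0.8, "great": 1.0, "bad": -0.7, "terrible": -1.0}

def ml_fallback(tokens):
    # Stand-in for a trained machine learning sentiment model.
    return 0.0

def hybrid_sentiment(text):
    tokens = text.lower().split()
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    if hits:                        # lexicon-based path
        return sum(hits) / len(hits)
    return ml_fallback(tokens)      # ML path when the lexicon fails

print(hybrid_sentiment("a great and good day"))   # lexicon path: 0.9
print(hybrid_sentiment("utterly mediocre"))       # no lexicon hits: falls back
```

The design point is the fallback: the lexicon handles covered vocabulary cheaply, and the learned model only fires on out-of-lexicon text.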
The paper by Szabóová et al. [8] analyzes emotions in text obtained from a dialogue between a human and a robot, and thus combines the field of sentiment analysis with human–robot interaction (HRI). Information about the emotional state of the person the robot is interacting with can help the robot choose the most appropriate response. Both a lexicon-based approach and machine learning were used for emotion recognition (Naïve Bayes (multinomial, Bernoulli, and Gaussian), a Support Vector Machine, and a feed-forward neural network, using various data representations such as Bag-of-Words, TF-IDF, and sentence embeddings (ConceptNet Numberbatch)). The result of the experiments was an ensemble classifier consisting of the nine best models for each emotion. The model was demonstrated in four different scenarios with the humanoid robot NAO. The results showed that the scenario best accepted by humans was the one combining emotion classification with emotional movements of the robot.
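As a purely illustrative sketch of a per-emotion ensemble (the keyword scorers below are toy stand-ins, not the classifiers used in the paper), each emotion gets its own scoring model and the highest-scoring emotion wins:

```python
def make_keyword_scorer(keywords):
    """Build a trivial scorer that counts keyword hits in a text."""
    def score(text):
        t = text.lower()
        return sum(k in t for k in keywords)
    return score

# One stub model per emotion; the real system selected the best trained
# model per class from Naïve Bayes, SVM, and neural-network candidates.
models = {
    "joy":     make_keyword_scorer(["glad", "happy", "great"]),
    "sadness": make_keyword_scorer(["sad", "miss", "lost"]),
    "anger":   make_keyword_scorer(["hate", "angry", "furious"]),
}

def classify_emotion(text):
    # Each per-emotion model votes with its score; the argmax is the label.
    return max(models, key=lambda e: models[e](text))

print(classify_emotion("I am so happy and glad to see you"))  # joy
```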
Shao et al. [9] presented the classification of a dual-arm robot operator’s mental workload using the heart rate variability (HRV) signal. An average classification accuracy of 98.77% was obtained using the k-nearest neighbor (KNN) method.
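To make the classification step concrete, a minimal KNN over hand-made HRV-style feature vectors might look as follows (the feature values and labels are invented for illustration; the paper's actual features and data differ):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy HRV-style feature vectors (e.g. mean RR interval, SDNN) -> workload label.
train = [((0.80, 0.050), "low"),  ((0.78, 0.060), "low"),
         ((0.60, 0.020), "high"), ((0.62, 0.030), "high")]

print(knn_predict(train, (0.61, 0.025)))  # "high"
```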
Ondáš et al. [10] introduced a novel pediatric audiometry application for hearing detection in the home environment. Conditioned play audiometry principles were adopted to create a speech audiometry application in which children help the virtual robot Thomas assign words to pictures. Several game scenarios, together with issues around the test-setting conditions, were created, tested, and discussed.
Agarwal et al. [11] focused on designing a grammar detection system that uses both the structural and contextual information of sentences to validate whether English sentences are grammatically correct. The paper proposes a new Lex-Pos sequencing approach that encodes both the linguistic and the syntactic information of a sentence. A Long Short-Term Memory (LSTM) neural network architecture was employed to build the grammar classifier. The study conducted nine experiments to validate the strength of the Lex-Pos sequences, and the results showed that the Lex-Pos-based models give more accurate predictions and are more stable.
Trnka et al. [12] described a system for predicting the values of Activation and Valence (AV) directly from the sound of emotional speech utterances, without the use of their semantic content or any other additional information. The system uses x-vectors to represent the sound characteristics of an utterance and a Support Vector Regressor to estimate the AV values. The aim of the work was to test whether, in each unseen database, the predicted Valence and Activation values would place emotion-tagged utterances in the AV space in accordance with expectations based on Russell’s circumplex model of affective space.
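The final placement step can be sketched as mapping a predicted (valence, activation) pair onto a quadrant of Russell's circumplex (the quadrant labels below are simplified illustrations, not the emotion categories used in the paper):

```python
def av_quadrant(valence, activation):
    """Map Activation-Valence coordinates (each in [-1, 1], origin at the
    neutral center) to a quadrant of Russell's circumplex model."""
    if valence >= 0:
        return "happy/excited" if activation >= 0 else "calm/content"
    return "angry/afraid" if activation >= 0 else "sad/bored"

print(av_quadrant(0.7, 0.6))    # high valence, high activation
print(av_quadrant(-0.5, 0.8))   # low valence, high activation
```

An emotion-tagged utterance "places correctly" when its predicted AV pair lands in the quadrant the circumplex model assigns to that emotion.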
Gondi and Pratap [13] from Facebook AI Research presented a performance evaluation of offline speech recognition on a Raspberry Pi CPU compared to a Jetson Nano GPU. It was shown that, after PyTorch mobile optimization and quantization, the models can achieve real-time inference on the Raspberry Pi CPU with only a small degradation in word error rate. The Jetson Nano GPU, on the other hand, achieves three to five times lower inference latency than the Raspberry Pi.
Seo and Kim [14] presented self-attentive multi-layer aggregation with feature recalibration and deep length normalization for a text-independent speaker verification system. As a baseline, a ResNet with scaled channel width and layer depth was used to reduce the number of model parameters. A self-attention mechanism was applied to perform multi-layer aggregation with dropout regularization and batch normalization. Furthermore, deep length normalization was applied to the recalibrated features during training. Experimental results on the VoxCeleb1 [15] evaluation dataset showed that the performance of the proposed methods was comparable to that of state-of-the-art models.
Bačíková et al. [16] used the term domain usability (DU) to describe the aspects of a user interface related to terminology and the domain. A new method called ADUE (Automatic Domain Usability Evaluation) for the automated evaluation of selected DU properties on existing user interfaces was introduced. The authors executed ADUE on several real-world Java applications and reported their findings.
Lin et al. [17] developed posting recommendation systems (RSs) to support users in composing reasonable posts and receiving effective answers. The posting RSs were evaluated in a user study with 27 participants and three tasks, examining whether users engaged more in the question-generation process. The results show that the proposed mechanism enables the production of more understandable question posts, which leads experts to devote more attention to answering them.
Jinsakul et al. [18] presented an innovative approach to improving the Thai government’s systems for handicraft products with a 3D display option for smartphones. Evaluation results from the 1775 participants in this study showed that the proposed 3D handicraft product application attracted users’ attention.

Author Contributions

Conceptualization, M.P.; methodology, M.P. and Y.-F.L.; writing—original draft preparation, M.P.; writing—review and editing, M.P. and P.B.; supervision, Y.-F.L.; funding acquisition, M.P. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by the Slovak Research and Development Agency (Agentúra na podporu výskumu a vývoja) under projects APVV-SK-TW-21-0002 and APVV-SK-TW-17-0005; the Scientific Grant Agency (Vedecká grantová agentúra MŠVVaŠ SR a SAV), project numbers VEGA 1/0753/20 and VEGA 2/0165/21; and the Cultural and Educational Grant Agency (Kultúrna a edukačná grantová agentúra MŠVVaŠ SR), project number KEGA 009TUKE-4-2019, all funded by the Ministry of Education, Science, Research, and Sport of the Slovak Republic.

Acknowledgments

We would like to thank all the authors for the papers they submitted to this Special Issue. We would also like to acknowledge all the reviewers for their careful and timely reviews to help improve the quality of this Special Issue.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lieskovská, E.; Jakubec, M.; Jarina, R.; Chmulík, M. A Review on Speech Emotion Recognition Using Deep Learning and Attention Mechanism. Electronics 2021, 10, 1163. [Google Scholar] [CrossRef]
  2. Hládek, D.; Staš, J.; Pleva, M. Survey of Automatic Spelling Correction. Electronics 2020, 9, 1670. [Google Scholar] [CrossRef]
  3. Aldridge, A.; Bethel, C.L. A Systematic Review of the Use of Art in Virtual Reality. Electronics 2021, 10, 2314. [Google Scholar] [CrossRef]
  4. Busso, C.; Bulut, M.; Lee, C.C.; Kazemzadeh, A.; Mower, E.; Kim, S.; Chang, J.N.; Lee, S.; Narayanan, S.S. IEMOCAP: Interactive emotional dyadic motion capture database. Lang. Resour. Eval. 2008, 42, 335–359. [Google Scholar] [CrossRef]
  5. Kukich, K. Techniques for automatically correcting words in text. ACM Comput. Surv. (CSUR) 1992, 24, 377–439. [Google Scholar] [CrossRef]
  6. Stark, E.; Kučera, E.; Haffner, O.; Drahoš, P.; Leskovský, R. Using Augmented Reality and Internet of Things for Control and Monitoring of Mechatronic Devices. Electronics 2020, 9, 1272. [Google Scholar] [CrossRef]
  7. Machová, K.; Mikula, M.; Gao, X.; Mach, M. Lexicon-based Sentiment Analysis Using the Particle Swarm Optimization. Electronics 2020, 9, 1317. [Google Scholar] [CrossRef]
  8. Szabóová, M.; Sarnovský, M.; Maslej Krešňáková, V.; Machová, K. Emotion Analysis in Human–Robot Interaction. Electronics 2020, 9, 1761. [Google Scholar] [CrossRef]
  9. Shao, S.; Wang, T.; Wang, Y.; Su, Y.; Song, C.; Yao, C. Research of HRV as a Measure of Mental Workload in Human and Dual-Arm Robot Interaction. Electronics 2020, 9, 2174. [Google Scholar] [CrossRef]
  10. Ondáš, S.; Kiktová, E.; Pleva, M.; Oravcová, M.; Hudák, L.; Juhár, J.; Zimmermann, J. Pediatric Speech Audiometry Web Application for Hearing Detection in the Home Environment. Electronics 2020, 9, 994. [Google Scholar] [CrossRef]
  11. Agarwal, N.; Wani, M.A.; Bours, P. Lex-Pos Feature-Based Grammar Error Detection System for the English Language. Electronics 2020, 9, 1686. [Google Scholar] [CrossRef]
  12. Trnka, M.; Darjaa, S.; Ritomský, M.; Sabo, R.; Rusko, M.; Schaper, M.; Stelkens-Kobsch, T.H. Mapping Discrete Emotions in the Dimensional Space: An Acoustic Approach. Electronics 2021, 10, 2950. [Google Scholar] [CrossRef]
  13. Gondi, S.; Pratap, V. Performance Evaluation of Offline Speech Recognition on Edge Devices. Electronics 2021, 10, 2697. [Google Scholar] [CrossRef]
  14. Seo, S.; Kim, J.H. Self-Attentive Multi-Layer Aggregation with Feature Recalibration and Deep Length Normalization for Text-Independent Speaker Verification System. Electronics 2020, 9, 1706. [Google Scholar] [CrossRef]
  15. Nagrani, A.; Chung, J.S.; Zisserman, A. VoxCeleb: A large-scale speaker identification dataset. arXiv 2017, arXiv:1706.08612. [Google Scholar]
  16. Bačíková, M.; Porubän, J.; Sulír, M.; Chodarev, S.; Steingartner, W.; Madeja, M. Domain Usability Evaluation. Electronics 2021, 10, 1963. [Google Scholar] [CrossRef]
  17. Lin, Y.L.; Chien, S.Y.; Chen, Y.J. Posting Recommendations in Healthcare Q&A Forums. Electronics 2021, 10, 278. [Google Scholar] [CrossRef]
  18. Jinsakul, N.; Tsai, C.F.; Wang, P. Sentiment Level Evaluation of 3D Handicraft Products Application for Smartphones Usage. Electronics 2021, 10, 199. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
