Search Results (26)

Search Parameters:
Keywords = vocal interface

17 pages, 1064 KiB  
Review
Vocal Communication Between Cobots and Humans to Enhance Productivity and Safety: Review and Discussion
by Yuval Cohen, Maurizio Faccio and Shai Rozenes
Appl. Sci. 2025, 15(2), 726; https://doi.org/10.3390/app15020726 - 13 Jan 2025
Cited by 1 | Viewed by 1109
Abstract
This paper explores strategies for fostering efficient vocal communication and collaboration between human workers and collaborative robots (cobots) in assembly processes. Vocal communication supports the worker's division of attention, since it leaves the worker's eyes and hands free to remain dedicated to the task at hand. Speech generation and speech recognition are prerequisites for effective vocal communication. This study focuses on cobot assistive tasks, where the human is in charge of the work and performs the main tasks while the cobot assists with various peripheral jobs, such as bringing tools, parts, or materials, returning or disposing of them, or screwing and packaging the products. A nuanced understanding of these interactions is necessary for optimizing human–robot collaboration and enhancing overall productivity and safety. Through a comprehensive review of the relevant literature and an illustrative example with worked scenarios, the manuscript identifies key factors influencing successful vocal communication and proposes practical strategies for implementation.
(This article belongs to the Special Issue Artificial Intelligence Applications in Industry)

10 pages, 722 KiB  
Article
Gaze Orienting in the Social World: An Exploration of the Role Played by Caregiving Vocal and Tactile Behaviors in Infants with Visual Impairment and in Sighted Controls
by Serena Grumi, Elena Capelli, Federica Morelli, Luisa Vercellino, Eleonora Mascherpa, Chiara Ghiberti, Laura Carraro, Sabrina Signorini and Livio Provenzi
Brain Sci. 2024, 14(5), 474; https://doi.org/10.3390/brainsci14050474 - 8 May 2024
Cited by 2 | Viewed by 1511
Abstract
Infant attention is a cognitive function that underlies sensory–motor integration processes at the interface between the baby and the surrounding physical and socio-relational environment, mainly with the caregivers. The investigation of the role of non-visual inputs (i.e., vocal and tactile) provided by the caregivers in shaping infants' attention in the context of visual impairment is relevant from both a theoretical and a clinical point of view. This study investigated social attention (i.e., gaze orientation) skills in a group of visually impaired (VI) infants and age-matched sighted controls (SCs) between 9 and 12 months of age. Moreover, the role of VI severity and of maternal vocalizations and touch in shaping social attention was investigated. Overall, 45 infants and their mothers participated in a video-recorded 4 min interaction procedure, including a play and a still-face episode. The infants' gaze orientation (i.e., mother-directed, object-directed, or unfocused) and the types of maternal vocalizations and touch (i.e., socio-cognitive, affective) were micro-analytically coded. Maternal vocalizations and touch were found to influence gaze orientation differently in VI infants compared with SCs. Moreover, the group comparisons during the play episode showed that controls were predominantly oriented to the mothers, while VI infants were less socially oriented. Visual impairment severity did not emerge as linked to social attention. These findings contribute to our understanding of socio-cognitive developmental trajectories in VI infants and highlight the need for tailored interventions to promote optimal outcomes for VI populations.

16 pages, 1449 KiB  
Review
Central Autonomic Mechanisms Involved in the Control of Laryngeal Activity and Vocalization
by Marta González-García, Laura Carrillo-Franco, Carmen Morales-Luque, Marc Stefan Dawid-Milner and Manuel Víctor López-González
Biology 2024, 13(2), 118; https://doi.org/10.3390/biology13020118 - 13 Feb 2024
Cited by 2 | Viewed by 3663
Abstract
In humans, speech is a complex process that requires the coordinated involvement of various components of the phonatory system, which are monitored by the central nervous system. The larynx in particular plays a crucial role, as it enables the vocal folds to meet and converts the exhaled air from our lungs into audible sounds. Voice production requires precise and sustained exhalation, which generates the airflow and subglottal pressure needed at the glottis. Voluntary vocal production begins in the laryngeal motor cortex (LMC), a structure found in all mammals, although its specific cortical location varies in humans. The LMC interfaces with various structures of the central autonomic network associated with cardiorespiratory regulation to allow precise coordination between breathing and vocalization. The main subcortical structure involved in this relationship is the mesencephalic periaqueductal grey matter (PAG). The PAG is the key link to the autonomic pontomedullary structures such as the parabrachial complex (PBc), the Kölliker–Fuse nucleus (KF), the nucleus tractus solitarius (NTS), and the nucleus retroambiguus (nRA), which modulate cardiovascular autonomic activity in the vasomotor centers and respiratory activity at the level of the generators of the laryngeal–respiratory motor patterns that are essential for vocalization. These autonomic structures are not only involved in the generation and modulation of cardiorespiratory responses to various stressors but also help to shape the cardiorespiratory motor patterns that are important for vocal production. Clinical studies show increased activity in the central circuits responsible for vocalization in certain speech disorders, such as spasmodic dysphonia due to laryngeal dystonia.
(This article belongs to the Special Issue Cardiovascular Autonomic Function: From Bench to Bedside)

13 pages, 833 KiB  
Article
Development of an Industrial Safety System Based on Voice Assistant
by Jaime Paúl Ayala Taco, Oswaldo Alexander Ibarra Jácome, Jaime Luciano Ayala Pico and Brian Andrés López Castro
Appl. Sci. 2023, 13(21), 11624; https://doi.org/10.3390/app132111624 - 24 Oct 2023
Cited by 1 | Viewed by 1599
Abstract
Currently, the human–machine interfaces (HMIs) used in industry have limitations, due either to users' cognitive abilities or to the characteristics of the interfaces themselves, which hinder communication and interaction between humans and equipment. For this reason, this work presents an alternative interaction model based on the Alexa voice assistant, which promotes more natural, intuitive, direct, and understandable communication. The purpose of this work is to develop an industrial safety system for a controlled electric motor based on the Alexa voice assistant, which allows the monitoring of operating parameters such as phase current, housing temperature, and rotor vibration, and also makes it possible to start and shut down the motor and change its direction of rotation, protected by a prior password as a safety measure. Commercial smart devices and Arduino-compatible modules were used to achieve this, providing them with Internet of Things (IoT) connectivity. In addition, several software platforms, such as Blynk, Tuya Smart, Node-RED, and Voiceflow, are used for data transmission, device management, and programming of the Alexa skill that executes the safety and operation system. This demonstrates the potential of voice assistants in industry to deliver information to humans more naturally and to provide timely notifications. However, problems became evident, such as the influence of environmental noise when communicating with the assistant, unclear vocalization of words, low voice volume, and language accents; addressing these issues would raise the security level of the system and help prevent potential identity theft.
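As a rough illustration of the password-protected actuation flow described above, the following sketch shows how a recognized motor command could be gated behind a spoken safety password. It is not the authors' code and is independent of any specific Alexa or Voiceflow API; the function and variable names (SAFETY_PASSWORD, send_to_motor, handle_voice_request) are hypothetical.

```python
# Minimal sketch (hypothetical names, not the authors' implementation):
# a recognized voice command is executed only if the spoken password matches.
from typing import Optional

SAFETY_PASSWORD = "orange-42"            # spoken passphrase required for actuation
VALID_COMMANDS = {"start", "stop", "reverse"}

def send_to_motor(command: str) -> None:
    # Placeholder for the IoT call that reaches the motor controller.
    print(f"[motor] executing: {command}")

def handle_voice_request(command: str, spoken_password: Optional[str]) -> str:
    """Return the reply the voice assistant should speak back."""
    command = command.lower().strip()
    if command not in VALID_COMMANDS:
        return "Sorry, I did not understand that motor command."
    if spoken_password != SAFETY_PASSWORD:
        return "The safety password is incorrect. The motor was not actuated."
    send_to_motor(command)
    return f"Command '{command}' sent to the motor."

if __name__ == "__main__":
    print(handle_voice_request("reverse", "orange-42"))
    print(handle_voice_request("start", "wrong-password"))
```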

21 pages, 19613 KiB  
Article
Efficient Self-Attention Model for Speech Recognition-Based Assistive Robots Control
by Samuel Poirier, Ulysse Côté-Allard, François Routhier and Alexandre Campeau-Lecours
Sensors 2023, 23(13), 6056; https://doi.org/10.3390/s23136056 - 30 Jun 2023
Cited by 1 | Viewed by 1929
Abstract
Assistive robots are tools that people living with upper body disabilities can leverage to autonomously perform Activities of Daily Living (ADL). Unfortunately, conventional control methods still rely on low-dimensional, easy-to-implement interfaces such as joysticks that tend to be unintuitive and cumbersome to use. In contrast, vocal commands may represent a viable and intuitive alternative. This work represents an important step toward providing a viable vocal interface for people living with upper limb disabilities by proposing a novel lightweight vocal command recognition system. The proposed model leverages the MobileNetV2 architecture, augmenting it with a novel approach to the self-attention mechanism, achieving new state-of-the-art performance for Keyword Spotting (KWS) on the Google Speech Commands Dataset (GSCD). Moreover, this work presents a new dataset, referred to as the French Speech Commands Dataset (FSCD), comprising 4963 vocal command utterances. Using the GSCD as the source, we applied Transfer Learning (TL) to adapt the model to this cross-language task; TL significantly improved model performance on the FSCD. The viability of the proposed approach is further demonstrated through real-life control of a robotic arm by four healthy participants using both the proposed vocal interface and a joystick.
(This article belongs to the Special Issue Integration of Advanced Sensors in Assistive Robotic Technology)
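To illustrate the cross-language transfer-learning step described in the abstract, here is a minimal PyTorch sketch. It assumes a generic convolutional keyword-spotting backbone (not the paper's MobileNetV2 + self-attention model), a hypothetical pretrained checkpoint, and dummy tensors in place of the GSCD/FSCD data: the source-language backbone is frozen and only a new classification head is fine-tuned on the target-language commands.

```python
# Hedged sketch of cross-language transfer learning for keyword spotting.
# "pretrained_kws.pt" and the class counts are illustrative assumptions.
import torch
import torch.nn as nn

class KWSNet(nn.Module):
    def __init__(self, n_classes: int):
        super().__init__()
        # Lightweight convolutional backbone over log-mel spectrograms.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
        )
        self.head = nn.Linear(16 * 4 * 4, n_classes)

    def forward(self, x):
        return self.head(self.backbone(x))

model = KWSNet(n_classes=35)                        # source task, e.g. English commands
# model.load_state_dict(torch.load("pretrained_kws.pt"))  # hypothetical checkpoint

for p in model.backbone.parameters():               # freeze the source-language backbone
    p.requires_grad = False
model.head = nn.Linear(16 * 4 * 4, 25)              # new head for the French command set

optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative fine-tuning step on a dummy batch (8 spectrograms of 40 x 101).
x = torch.randn(8, 1, 40, 101)
y = torch.randint(0, 25, (8,))
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```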

16 pages, 3087 KiB  
Article
Design and Implementation of a Framework for Smart Home Automation Based on Cellular IoT, MQTT, and Serverless Functions
by Marco Esposito, Alberto Belli, Lorenzo Palma and Paola Pierleoni
Sensors 2023, 23(9), 4459; https://doi.org/10.3390/s23094459 - 3 May 2023
Cited by 19 | Viewed by 7745
Abstract
Smart objects and home automation tools are becoming increasingly popular, and the number of smart devices that each dedicated application has to manage is increasing accordingly. The emergence of technologies such as serverless computing and dedicated machine-to-machine communication protocols represents a valuable opportunity to facilitate the management of smart objects and the replicability of new solutions. The aim of this paper is to propose a framework for home automation applications that can be applied to control and monitor any appliance or object in a smart home environment. The proposed framework makes use of a dedicated message-exchange protocol based on MQTT and of cloud-deployed serverless functions. Furthermore, a vocal command interface is implemented to let users control the smart object with vocal interactions, greatly increasing the accessibility and intuitiveness of the proposed solution. A smart object, namely a smart kitchen fan extractor system, was developed, prototyped, and tested to illustrate the viability of the proposed solution. The smart object is equipped with a narrowband IoT (NB-IoT) module to send and receive commands to and from the cloud. To assess the performance of the proposed solution, the suitability of NB-IoT for the transmission of MQTT messages was evaluated. The results show that NB-IoT offers acceptable latency despite minimal packet loss.
(This article belongs to the Special Issue Internet of Things for Smart Homes Ⅲ)
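A minimal sketch of the MQTT message flow between a recognized vocal command and the smart object, using the paho-mqtt client. The broker address and topic names are hypothetical, and the cloud serverless functions that the framework places between the voice interface and the device are omitted here.

```python
# Hedged sketch using the paho-mqtt client (pip install paho-mqtt).
# Broker and topics are hypothetical; serverless routing is omitted.
import json
import paho.mqtt.client as mqtt

BROKER = "broker.example.com"           # hypothetical MQTT broker
CMD_TOPIC = "home/kitchen/fan/cmd"      # command topic for the smart fan extractor
STATE_TOPIC = "home/kitchen/fan/state"  # state reported back by the device

def on_message(client, userdata, msg):
    # The smart object would parse the command and actuate the fan here.
    payload = json.loads(msg.payload)
    print(f"[device] received on {msg.topic}: {payload}")
    client.publish(STATE_TOPIC, json.dumps({"speed": payload.get("speed", 0)}))

client = mqtt.Client()                  # paho-mqtt 1.x constructor; v2 also takes a callback API version
client.on_message = on_message
client.connect(BROKER, 1883)
client.subscribe(CMD_TOPIC)

# A recognized vocal command such as "set the kitchen fan to speed two"
# would end up as a small JSON message on the command topic:
client.publish(CMD_TOPIC, json.dumps({"action": "set_speed", "speed": 2}))
client.loop_forever()
```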

12 pages, 4941 KiB  
Article
Droplets Patterning of Structurally Integrated 3D Conductive Networks-Based Flexible Strain Sensors for Healthcare Monitoring
by Yang Zhang, Danjiao Zhao, Lei Cao, Lanlan Fan, Aiping Lin, Shufen Wang, Feng Gu and Aibing Yu
Nanomaterials 2023, 13(1), 181; https://doi.org/10.3390/nano13010181 - 30 Dec 2022
Cited by 5 | Viewed by 2241
Abstract
Flexible strain sensors with significant extensibility, stability, and durability are essential for public healthcare due to their ability to monitor vital health signals noninvasively. However, thus far, conductive networks have been plagued by inconsistent interface states between the conductive components, which hampers the ultimate sensitivity. Here, we demonstrate flexible strain sensors based on structurally integrated 3D conductive networks of hybrid Ag nanorods/nanoparticles (AgNRs/NPs), fabricated by combining a droplet-based aerosol jet printing (AJP) process with a feasible transfer process. The structurally integrated 3D conductive networks were intentionally developed by tuning droplet deposition behaviors at multiple scales for efficient hybridization and ordered assembly of the AgNRs/NPs. The hybrid AgNRs/NPs enhance interfacial conduction and mechanical properties during stretching. Within a strain range of 25%, the developed sensor demonstrates an ideal gauge factor of 23.18. When monitoring finger bending, arm bending, squatting, and vocalization in real time, the fabricated sensors responded effectively to human movements. Our findings demonstrate that the efficient droplet-based AJP process is particularly capable of producing advanced flexible devices for optoelectronics and wearable electronics applications.
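For context on the quoted figure, the gauge factor (GF) of a strain sensor is the relative resistance change per unit strain; assuming the reported GF holds across the full 25% strain range, the implied resistance change is:

\[
\mathrm{GF} \;=\; \frac{\Delta R / R_0}{\varepsilon}
\qquad\Rightarrow\qquad
\frac{\Delta R}{R_0} \;=\; \mathrm{GF}\cdot\varepsilon \;\approx\; 23.18 \times 0.25 \;\approx\; 5.8,
\]

i.e., roughly a 580% relative resistance change at maximum strain.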

19 pages, 2162 KiB  
Article
Prediction of Voice Fundamental Frequency and Intensity from Surface Electromyographic Signals of the Face and Neck
by Jennifer M. Vojtech, Claire L. Mitchell, Laura Raiff, Joshua C. Kline and Gianluca De Luca
Vibration 2022, 5(4), 692-710; https://doi.org/10.3390/vibration5040041 - 13 Oct 2022
Cited by 4 | Viewed by 3558
Abstract
Silent speech interfaces (SSIs) enable speech recognition and synthesis in the absence of an acoustic signal. Yet, the archetypal SSI fails to convey the expressive attributes of prosody, such as pitch and loudness, leading to lexical ambiguities. The aim of this study was to determine the efficacy of using surface electromyography (sEMG) as an approach for predicting continuous acoustic estimates of prosody. Ten participants performed a series of vocal tasks, including sustained vowels, phrases, and monologues, while acoustic data were recorded simultaneously with sEMG activity from muscles of the face and neck. A battery of time-, frequency-, and cepstral-domain features extracted from the sEMG signals was used to train deep regression neural networks to predict fundamental frequency and intensity contours derived from the acoustic signals. We achieved an average accuracy of 0.01 ST and precision of 0.56 ST for the estimation of fundamental frequency, and an average accuracy of 0.21 dB SPL and precision of 3.25 dB SPL for the estimation of intensity. This work highlights the potential of sEMG as an alternative means of detecting prosody and shows promise for improving SSIs in future development.
(This article belongs to the Special Issue Feature Papers in Vibration)
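A minimal sketch (not the authors' pipeline) of the general idea, regressing a fundamental-frequency contour from frame-wise sEMG features. The feature set, window length, channel count, and model size are illustrative stand-ins for the paper's broader time/frequency/cepstral feature battery and deep regression networks, and synthetic arrays stand in for the recorded data.

```python
# Hedged sketch: frame-wise RMS features from multi-channel sEMG regressed
# onto an F0 contour with a small neural network (all parameters illustrative).
import numpy as np
from sklearn.neural_network import MLPRegressor

FS = 2000            # assumed sEMG sampling rate (Hz)
WIN = 200            # 100 ms analysis frames
N_CH = 8             # assumed number of face/neck sEMG channels

def frame_rms(emg: np.ndarray) -> np.ndarray:
    """emg: (n_samples, n_channels) -> (n_frames, n_channels) RMS features."""
    n_frames = emg.shape[0] // WIN
    frames = emg[: n_frames * WIN].reshape(n_frames, WIN, -1)
    return np.sqrt((frames ** 2).mean(axis=1))

# Synthetic stand-ins for recorded sEMG and the simultaneously derived F0 contour.
rng = np.random.default_rng(0)
emg = rng.standard_normal((FS * 10, N_CH))                   # 10 s of sEMG
f0 = 120 + 20 * np.sin(np.linspace(0, 6, FS * 10 // WIN))    # Hz, one value per frame

X = frame_rms(emg)
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
model.fit(X, f0)
print("predicted F0 (first 5 frames):", model.predict(X[:5]).round(1))
```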

15 pages, 431 KiB  
Review
The Role of Artificial Intelligence in Decoding Speech from EEG Signals: A Scoping Review
by Uzair Shah, Mahmood Alzubaidi, Farida Mohsen, Alaa Abd-Alrazaq, Tanvir Alam and Mowafa Househ
Sensors 2022, 22(18), 6975; https://doi.org/10.3390/s22186975 - 15 Sep 2022
Cited by 15 | Viewed by 6598
Abstract
Background: Brain traumas, mental disorders, and vocal abuse can result in permanent or temporary speech impairment, significantly reducing one's quality of life and occasionally resulting in social isolation. Brain–computer interfaces (BCIs) can support people who have issues with their speech or who have been paralyzed to communicate with their surroundings via brain signals. Therefore, EEG signal-based BCI has received significant attention in the last two decades for multiple reasons: (i) clinical research has yielded detailed knowledge of EEG signals, (ii) EEG devices are inexpensive, and (iii) the technology has applications in medical and social fields. Objective: This study explores the existing literature and summarizes EEG data acquisition, feature extraction, and artificial intelligence (AI) techniques for decoding speech from brain signals. Method: We followed the PRISMA-ScR guidelines to conduct this scoping review. We searched six electronic databases: PubMed, IEEE Xplore, the ACM Digital Library, Scopus, arXiv, and Google Scholar. We carefully selected search terms based on the target intervention (i.e., imagined speech and AI) and the target data (EEG signals), and some of the search terms were derived from previous reviews. The study selection process was carried out in three phases: study identification, study selection, and data extraction. Two reviewers independently carried out study selection and data extraction. A narrative approach was adopted to synthesize the extracted data. Results: A total of 263 studies were evaluated; however, 34 met the eligibility criteria for inclusion in this review. We found 64-electrode EEG devices to be the most widely used in the included studies. The most common signal normalization and feature extraction methods in the included studies were the bandpass filter and wavelet-based feature extraction. We categorized the studies based on AI techniques, such as machine learning (ML) and deep learning (DL). The most prominent ML algorithm was the support vector machine, and the most prominent DL algorithm was the convolutional neural network. Conclusions: EEG signal-based BCI is a viable technology that can enable people with severe or temporary voice impairment to communicate with the world directly from their brains. However, the development of BCI technology is still in its infancy.
(This article belongs to the Special Issue Brain Activity Monitoring and Measurement)
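A compact sketch of the pipeline the review identifies as most common: a bandpass filter for normalization, wavelet-based features, and a support vector machine classifier. Channel count, band edges, wavelet choice, and the synthetic labels are illustrative assumptions standing in for any particular study.

```python
# Hedged sketch of a typical EEG imagined-speech decoding pipeline:
# bandpass filter -> wavelet sub-band energies -> SVM (parameters illustrative).
import numpy as np
import pywt                                # PyWavelets
from scipy.signal import butter, filtfilt
from sklearn.svm import SVC

FS = 256                                   # assumed EEG sampling rate (Hz)

def bandpass(x: np.ndarray, lo: float = 0.5, hi: float = 40.0) -> np.ndarray:
    b, a = butter(4, [lo, hi], btype="band", fs=FS)
    return filtfilt(b, a, x, axis=-1)

def wavelet_features(trial: np.ndarray) -> np.ndarray:
    """trial: (n_channels, n_samples) -> flat vector of log sub-band energies."""
    feats = []
    for ch in trial:
        coeffs = pywt.wavedec(ch, "db4", level=4)
        feats.extend(np.log1p(np.sum(c ** 2)) for c in coeffs)
    return np.asarray(feats)

# Synthetic stand-in for imagined-speech EEG: 40 trials, 8 channels, 2 s each.
rng = np.random.default_rng(1)
trials = rng.standard_normal((40, 8, 2 * FS))
labels = rng.integers(0, 2, size=40)       # two imagined words

X = np.stack([wavelet_features(bandpass(t)) for t in trials])
clf = SVC(kernel="rbf").fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```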

15 pages, 4970 KiB  
Article
Recognition of Uni-Stroke Characters with Hand Movements in 3D Space Using Convolutional Neural Networks
by Won-Du Chang, Akitaka Matsuoka, Kyeong-Taek Kim and Jungpil Shin
Sensors 2022, 22(16), 6113; https://doi.org/10.3390/s22166113 - 16 Aug 2022
Cited by 3 | Viewed by 2177
Abstract
Hand gestures are a common means of communication in daily life, and many attempts have been made to recognize them automatically. Developing systems and algorithms to recognize hand gestures is expected to enhance the experience of human–computer interfaces, especially when there are difficulties in communicating vocally. A popular approach to recognizing hand gestures is the air-writing method, where people write letters in the air by hand. The arm movements are tracked with a smartwatch or band with embedded accelerometer and gyroscope sensors; a computer system then recognizes the written letters. One of the greatest difficulties in developing algorithms for air writing is the diversity of human hand and arm movements, which makes it difficult to build signal templates for air-written characters or network models. This paper proposes a method for recognizing air-written characters using an artificial neural network. We utilized uni-stroke-designed characters and present a network model with inception modules and an ensemble structure. The proposed method was successfully evaluated on air-written characters (Arabic numerals and English letters) from 18 people, achieving 91.06% accuracy, which roughly halves the error rate reported in recent studies.
(This article belongs to the Special Issue Vision and Sensor-Based Sensing in Human Action Recognition)
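A minimal sketch of an inception-style 1-D convolutional block over IMU sequences of the kind used for air-written characters. The architecture below is illustrative only: it shows parallel convolutions with different kernel sizes whose outputs are concatenated, while the paper's exact layer configuration and ensemble structure are omitted.

```python
# Hedged sketch of an inception-style block for air-writing recognition
# from 6-axis IMU data (3-axis accelerometer + 3-axis gyroscope).
import torch
import torch.nn as nn

class Inception1d(nn.Module):
    def __init__(self, in_ch: int, branch_ch: int):
        super().__init__()
        # Parallel 1-D convolutions with kernel sizes 1, 3, and 5.
        self.branches = nn.ModuleList([
            nn.Conv1d(in_ch, branch_ch, k, padding=k // 2) for k in (1, 3, 5)
        ])

    def forward(self, x):                        # x: (batch, channels, time)
        return torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)

class AirWritingNet(nn.Module):
    def __init__(self, n_classes: int = 36):     # 10 digits + 26 letters
        super().__init__()
        self.block = Inception1d(in_ch=6, branch_ch=16)
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Linear(3 * 16, n_classes)

    def forward(self, x):
        return self.fc(self.pool(self.block(x)).squeeze(-1))

# Four example sequences of 128 time steps from a wrist-worn IMU.
logits = AirWritingNet()(torch.randn(4, 6, 128))
print(logits.shape)                               # torch.Size([4, 36])
```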

22 pages, 10134 KiB  
Article
Marvin: An Innovative Omni-Directional Robotic Assistant for Domestic Environments
by Andrea Eirale, Mauro Martini, Luigi Tagliavini, Dario Gandini, Marcello Chiaberge and Giuseppe Quaglia
Sensors 2022, 22(14), 5261; https://doi.org/10.3390/s22145261 - 14 Jul 2022
Cited by 25 | Viewed by 4464
Abstract
Population aging and pandemics have been shown to cause the isolation of elderly people in their houses, generating the need for a reliable assistive figure. Robotic assistants are the new frontier of innovation for domestic welfare, and elderly monitoring is one of the services a robot can handle for collective well-being. Despite these emerging needs, in the current landscape of robotic assistants there are no platforms that successfully combine reliable mobility in cluttered domestic spaces with lightweight and offline Artificial Intelligence (AI) solutions for perception and interaction. In this work, we present Marvin, a novel assistive robotic platform we developed with a modular layer-based architecture, merging a flexible mechanical design with cutting-edge AI for perception and vocal control. We focus the design of Marvin on three target service functions: monitoring of elderly and reduced-mobility subjects, remote presence and connectivity, and night assistance. Compared to previous works, we propose a compact omnidirectional platform, which enables agile mobility and effective obstacle avoidance. Moreover, we design a controllable positioning device, which easily allows the user to access the interface for connectivity and extends the visual range of the camera sensor. We also carefully consider the privacy issues arising from private data collection on cloud services, a critical aspect of commercial AI-based assistants. To this end, we demonstrate how lightweight deep learning solutions for visual perception and vocal command can be adopted, running entirely offline on the embedded hardware of the robot.

23 pages, 18059 KiB  
Article
Male African Elephant (Loxodonta africana) Behavioral Responses to Estrous Call Playbacks May Inform Conservation Management Tools
by Caitlin E. O’Connell-Rodwell, Monica N. Sandri, Jodie L. Berezin, Jaquelyn M. Munevar, Colleen Kinzley, Jason D. Wood, Maggie Wiśniewska and J. Werner Kilian
Animals 2022, 12(9), 1162; https://doi.org/10.3390/ani12091162 - 1 May 2022
Cited by 4 | Viewed by 4283
Abstract
Driven by reproductive motives, male African elephants (Loxodonta africana) in musth often expand their home ranges to locate estrous females. This extended range, coupled with the heightened aggression often observed in musth males, can be particularly problematic in regions where human-modified landscapes and elephant territories increasingly overlap. Several mitigation tools have been tested to resolve a wide range of human–elephant conflicts, with varying degrees of success due to geographical disparities and habituation. We present findings on the potential application of estrous call playbacks in manipulating the behavior and movement of male elephants non-invasively, particularly mature musth adults and younger post-dispersal males, in Etosha National Park. Estrous vocalizations were presented across 26 experimental trials to mature musth adults (n = 5), mature non-musth adults (n = 6), and non-musth males belonging to younger, post-dispersal age classes (n = 8), with behavioral responses scored on a gradient scale from 0 to 1. Both mature musth adults and younger non-musth elephants were significantly more likely to respond with the highest intensity by approaching the acoustic source, compared to mature non-musth adults, which avoided the call. However, younger males tested in the presence of an older, higher-ranking male tended to react with a lower intensity than those tested alone. This result likely demonstrates the influence of social hierarchy and associations on male elephant behavior. We also observed a significant increase in physiological response, measured by defecation rate, across all male groups in response to the estrous call playbacks. Our findings suggest that using estrous calls as acoustic deterrents may effectively and non-invasively aid in reducing tension at the human–elephant interface, depending on the age, social context, and reproductive status of the male elephant.
(This article belongs to the Special Issue Elephant Communication)

37 pages, 21822 KiB  
Article
Comparing the Effectiveness of Speech and Physiological Features in Explaining Emotional Responses during Voice User Interface Interactions
by Danya Swoboda, Jared Boasen, Pierre-Majorique Léger, Romain Pourchon and Sylvain Sénécal
Appl. Sci. 2022, 12(3), 1269; https://doi.org/10.3390/app12031269 - 25 Jan 2022
Cited by 9 | Viewed by 4522
Abstract
The rapid rise of voice user interface technology has changed the way users traditionally interact with interfaces, as tasks requiring gestural or visual attention are replaced by vocal commands. This shift has equally affected designers, who must set aside common digital interface guidelines in order to adapt to non-visual user interaction (No-UI) methods. The guidelines regarding voice user interface evaluation are far from the maturity of those surrounding digital interface evaluation, resulting in a lack of consensus and clarity. Thus, we sought to contribute to the emerging literature on voice user interface evaluation and, consequently, assist user experience professionals in their quest to create optimal vocal experiences. To do so, we compared the effectiveness of physiological features (e.g., phasic electrodermal activity amplitude) and speech features (e.g., spectral slope amplitude) in predicting the intensity of users' emotional responses during voice user interface interactions. We performed a within-subjects experiment in which the speech, facial expression, and electrodermal activity responses of 16 participants were recorded during voice user interface interactions purposely designed to elicit frustration and shock, resulting in 188 analyzed interactions. Our results suggest that the physiological measure of facial expression and its extracted feature, automatic facial expression-based valence, is the most informative of emotional events experienced during voice user interface interactions. By comparing the unique effectiveness of each feature, theoretical and practical contributions may be noted, as the results contribute to the voice user interface literature while providing key insights favoring efficient voice user interface evaluation.
(This article belongs to the Special Issue User Experience for Advanced Human-Computer Interaction II)
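As an illustration of one of the speech features mentioned above, a simple way to compute a spectral-slope value for a short speech frame is to fit a line to its log-magnitude spectrum. The sketch below is generic, with an assumed sampling rate and frame length, and is not necessarily the exact feature definition used in the study.

```python
# Hedged sketch: spectral slope of a short speech frame as the slope of a
# linear fit to the log-magnitude spectrum (frame length and FS are assumptions).
import numpy as np

FS = 16000                                # assumed sampling rate (Hz)

def spectral_slope(frame: np.ndarray) -> float:
    """Return the slope (dB/Hz) of a line fitted to the log-magnitude spectrum."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed)) + 1e-12
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / FS)
    log_mag = 20.0 * np.log10(spectrum)
    slope, _intercept = np.polyfit(freqs, log_mag, deg=1)
    return slope

# A 25 ms synthetic frame: a decaying harmonic stack standing in for voiced speech.
t = np.arange(int(0.025 * FS)) / FS
frame = sum((1.0 / k) * np.sin(2 * np.pi * 200 * k * t) for k in range(1, 6))
print(f"spectral slope: {spectral_slope(frame):.4f} dB/Hz")
```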

11 pages, 15140 KiB  
Communication
Neural Network-Enabled Flexible Pressure and Temperature Sensor with Honeycomb-like Architecture for Voice Recognition
by Yue Su, Kainan Ma, Xu Zhang and Ming Liu
Sensors 2022, 22(3), 759; https://doi.org/10.3390/s22030759 - 19 Jan 2022
Cited by 15 | Viewed by 3993
Abstract
Flexible pressure sensors have been studied as wearable voice-recognition devices to be utilized in human–machine interaction. However, the development of highly sensitive, skin-attachable, and comfortable sensing devices that achieve clear voice detection remains a considerable challenge. Herein, we present a wearable and flexible pressure and temperature sensor with a sensitive response to vibration, which can accurately recognize the human voice when combined with an artificial neural network. The device consists of a polyethylene terephthalate (PET) film printed with a silver electrode, a filament-microstructured polydimethylsiloxane (PDMS) film embedded with single-walled carbon nanotubes, and a polyimide (PI) film sputtered with a patterned Ti/Pt thermistor strip. The developed pressure sensor exhibits a pressure sensitivity of 0.398 kPa⁻¹ in the low-pressure regime, and the fabricated temperature sensor shows a desirable temperature coefficient of resistance of 0.13%/°C in the range of 25 °C to 105 °C. By training and testing the neural network model with waveform data obtained from the sensor during human pronunciation, the vocal fold vibrations of different words can be successfully recognized, with a total recognition accuracy of 93.4%. Our results suggest that the fabricated sensor has substantial potential for application in human–computer interface fields such as voice control, vocal healthcare monitoring, and voice authentication.
(This article belongs to the Topic Artificial Intelligence in Sensors)
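For reference, the quoted figures correspond to the standard definitions of pressure sensitivity and temperature coefficient of resistance (the exact response variable, current or resistance, follows the paper's own convention):

\[
S \;=\; \frac{\partial\,(\Delta R / R_0)}{\partial P} \;\approx\; 0.398\ \mathrm{kPa^{-1}},
\qquad
\mathrm{TCR} \;=\; \frac{R_T - R_{T_0}}{R_{T_0}\,(T - T_0)} \;\approx\; 0.13\ \%/^{\circ}\mathrm{C}.
\]

Over the quoted 25 °C to 105 °C span, this TCR corresponds to a total relative resistance change of roughly 0.13 %/°C × 80 °C ≈ 10%.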

12 pages, 547 KiB  
Article
A Novel FPGA-Based Intent Recognition System Utilizing Deep Recurrent Neural Networks
by Kyriaki Tsantikidou, Nikolaos Tampouratzis and Ioannis Papaefstathiou
Electronics 2021, 10(20), 2495; https://doi.org/10.3390/electronics10202495 - 13 Oct 2021
Cited by 2 | Viewed by 2560
Abstract
In recent years, systems that monitor and control home environments, based on non-vocal and non-manual interfaces, have been introduced to improve the quality of life of people with mobility difficulties. In this work, we present the reconfigurable implementation and optimization of such a novel system that utilizes a recurrent neural network (RNN). As demonstrated by the real-world results, FPGAs prove to be very efficient when implementing RNNs. In particular, our reconfigurable implementation is more than 150× faster than a high-end Intel Xeon CPU executing the reference inference tasks. Moreover, the proposed system achieves a more than 300× improvement in energy efficiency compared with the server CPU, while, in terms of achieved GFLOPS/W, it outperforms even a server-tailored GPU. An additional important contribution of this work is that the implementation and optimization process demonstrated can also serve as a reference for anyone implementing the inference tasks of RNNs in reconfigurable hardware; this is further facilitated by the fact that our C++ code, which is tailored for a high-level synthesis (HLS) tool, is distributed as open source and can easily be incorporated into existing HLS libraries.
