Article

Advancing Healthcare: Intelligent Speech Technology for Transcription, Disease Diagnosis, and Interactive Control of Medical Equipment in Smart Hospitals

1 Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka 72341, Saudi Arabia
2 School of Economics and Business Administration, Saint Mary's College of California, Moraga, CA 94575, USA
* Author to whom correspondence should be addressed.
AI 2024, 5(4), 2497-2517; https://doi.org/10.3390/ai5040121
Submission received: 19 October 2024 / Revised: 14 November 2024 / Accepted: 21 November 2024 / Published: 26 November 2024

Abstract:
Intelligent Speech Technology (IST) is revolutionizing healthcare by enhancing transcription accuracy, disease diagnosis, and medical equipment control in smart hospital environments. This study introduces an innovative approach employing federated learning with Multi-Layer Perceptron (MLP) and Gated Recurrent Unit (GRU) neural networks to improve IST performance. Leveraging the “Medical Speech, Transcription, and Intent” dataset from Kaggle, comprising a variety of speech recordings and corresponding medical symptom labels, noise reduction was applied using a Wiener filter to improve audio quality. Feature extraction through MLP and sequence classification with GRU highlighted the model’s robustness and capacity for detailed medical understanding. The federated learning framework enabled collaborative model training across multiple hospital sites, preserving patient privacy by avoiding raw data exchange. This distributed approach allowed the model to learn from diverse, real-world data while ensuring compliance with strict data protection standards. Through rigorous five-fold cross-validation, the proposed Fed MLP-GRU model demonstrated an accuracy of 98.6%, with consistently high sensitivity and specificity, highlighting its reliable generalization across multiple test conditions. In real-time applications, the model effectively performed medical transcription, provided symptom-based diagnostic insights, and facilitated hands-free control of healthcare equipment, reducing contamination risks and enhancing workflow efficiency. These findings indicate that IST, powered by federated neural networks, can significantly improve healthcare delivery, accuracy in patient diagnosis, and operational efficiency in clinical settings. This research underscores the transformative potential of federated learning and advanced neural networks for addressing pressing challenges in modern healthcare and setting the stage for future innovations in intelligent medical technology.

1. Introduction

In the dynamic landscape of modern healthcare, the integration of intelligent speech technology represents a pivotal advancement poised to revolutionize patient care and hospital management. Powered by artificial intelligence and natural language processing, this innovative technology is not merely an evolution but a profound transformation, reshaping conventional approaches across various healthcare domains [1]. Its emergence marks a departure from static, siloed systems toward interconnected, intelligent ecosystems that prioritize efficiency, accuracy, and above all, patient-centricity. At the heart of this transformative potential lies the capability of intelligent speech technology to facilitate seamless communication between humans and machines through spoken language [2]. Unlike conventional interfaces that require manual input or textual interaction, this technology enables healthcare professionals to engage with digital systems using natural speech, mirroring the fluidity of human conversation. This paradigm shift in user interaction fosters a more intuitive and accessible healthcare environment, where complex tasks can be executed with ease, irrespective of technological proficiency [3]. One of the most profound impacts of intelligent speech technology is evident in its role in transcription. Traditionally, the process of documenting patient encounters and medical records has been labor-intensive, error-prone, and time-consuming. However, with the advent of intelligent speech technology, this cumbersome task is streamlined, allowing healthcare providers to dictate notes and documentation in real time [4]. By automating transcription processes, this technology not only accelerates the pace of information capture but also enhances accuracy and accessibility, facilitating seamless integration with electronic health record systems and promoting data-driven decision-making [5].
Intelligent speech technology extends its transformative influence into the realm of disease diagnosis and treatment planning. Through advanced algorithms and machine learning models, it analyzes spoken inputs from patients and clinicians, extracting valuable insights and patterns that aid in early detection and personalized intervention strategies [6]. By leveraging the wealth of patient data encoded in speech, healthcare providers can uncover hidden correlations, identify predictive markers, and tailor treatment approaches to individual needs, thereby optimizing clinical outcomes and improving patient satisfaction [7]. In the era of smart hospitals, intelligent speech technology assumes a central role in the interactive control of medical equipment and devices. By enabling voice commands for instrument operation, clinicians can orchestrate complex procedures with precision and efficiency, while minimizing the risk of contamination and enhancing ergonomic workflow [8]. As this technology continues to evolve and permeate every facet of the healthcare ecosystem, its transformative potential is boundless, unlocking new opportunities for innovation, collaboration, and ultimately, improved health outcomes for individuals and communities alike [9].
The cornerstone of this technological revolution lies in its ability to streamline the transcription process within healthcare facilities, addressing one of the most labor-intensive and error-prone aspects of medical practice [9]. Traditionally, transcription has required significant manual effort, often resulting in delays and inaccuracies that could compromise patient care. Healthcare professionals, already burdened with heavy workloads, have had to spend valuable time transcribing patient notes, medical records, and clinical documentation by hand, a process susceptible to human error [10]. Intelligent speech technology, however, transforms this landscape by enabling healthcare professionals to dictate their notes and documentation effortlessly and accurately in real time. Through sophisticated algorithms and natural language processing capabilities, this technology captures spoken words and converts them into written text with remarkable precision. This shift from manual to automated transcription not only saves time but also significantly reduces the likelihood of errors, ensuring that patient records are accurate and up-to-date. The benefits of intelligent speech technology extend beyond mere transcription. By integrating seamlessly with Electronic Health Records (EHR) systems, it facilitates the immediate update and retrieval of patient information, ensuring that healthcare providers have access to the most current and comprehensive data at all times. This real-time synchronization of data supports better clinical decision-making and enhances the continuity of care, as all relevant information is readily available when needed [11].
The impact on workflow efficiency is profound. Healthcare professionals can now focus more on patient interaction and care, rather than being bogged down by administrative tasks. The streamlined documentation process reduces the administrative burden and allows for quicker turnaround times in patient record-keeping [12]. This increased efficiency translates to improved patient care, as clinicians can spend more time diagnosing, treating, and interacting with patients, rather than managing paperwork. Ensuring the integrity and accessibility of critical patient data through accurate and timely transcription lays a solid foundation for data-driven decision-making [13]. With reliable and comprehensive data at their fingertips, healthcare providers can analyze trends, monitor patient progress, and make informed decisions that enhance the quality of care. This robust data infrastructure also supports advanced analytics and machine learning applications, paving the way for predictive insights and personalized treatment plans that cater to individual patient needs. The ability of intelligent speech technology to streamline the transcription process revolutionizes the way information is captured and disseminated within healthcare facilities [14]. By enhancing workflow efficiency and ensuring the integrity and accessibility of patient data, it plays a pivotal role in promoting data-driven decision-making and continuity of care, ultimately contributing to a more effective and patient-centered healthcare system. Beyond transcription, intelligent speech technology serves as a powerful catalyst for enhancing disease diagnosis and treatment planning. By leveraging sophisticated algorithms and machine learning models, this technology analyzes verbal inputs from patients and clinicians, transforming spoken language into actionable data. This capability enables healthcare providers to extract valuable insights and patterns from the nuances of patient speech and clinical dialogue, facilitating more accurate and timely medical interventions.
One of the significant advancements brought about by intelligent speech technology is in voice-based symptom assessment. Patients often describe their symptoms and medical history verbally during consultations. Intelligent speech technology captures these descriptions and processes them through natural language processing algorithms to identify key symptoms, their severity, and their progression over time. This detailed analysis helps in creating a comprehensive symptom profile for each patient, allowing for a more nuanced understanding of their health condition. For instance, a patient describing intermittent chest pain, shortness of breath, and fatigue might have these symptoms recorded and analyzed to flag potential cardiac issues. The technology can compare these symptoms against vast datasets of similar cases, identifying patterns that may indicate a risk of heart disease. By recognizing these patterns early, healthcare providers can initiate diagnostic tests and treatments sooner, improving the chances of successful intervention and better patient outcomes. Moreover, intelligent speech technology enhances the analysis of medical history. Often, critical insights are buried within lengthy patient histories that are difficult to sift through manually. By processing spoken medical histories, this technology can highlight relevant information, such as past diagnoses, treatments, and responses to medications. It can cross-reference current symptoms with historical data to identify trends and correlations that might otherwise go unnoticed. For example, recurring respiratory issues coupled with a history of allergies could prompt a deeper investigation into potential chronic conditions like asthma or COPD. In the realm of treatment planning, intelligent speech technology aids in the creation of personalized intervention strategies. By analyzing verbal interactions, it can assess the effectiveness of current treatments and suggest adjustments based on patient feedback and clinical observations. Machine learning models can predict how patients might respond to different treatment options, helping clinicians to tailor therapies to individual needs. This personalized approach not only enhances the efficacy of treatments but also reduces the likelihood of adverse reactions [15].
IST has transformed patient care in the contemporary healthcare system, yet existing frameworks still carry significant limitations, especially concerning data privacy, processing speed, and adaptability to different clinical settings. Most IST solutions today struggle to deliver reliable real-time transcription accuracy, particularly when contextual understanding is required for disease diagnosis and seamless control of medical equipment; these shortcomings are inherent barriers to the wider implementation of IST in smart hospitals. This paper introduces the novel Fed MLP GRU, the first federated learning approach to combine MLP and GRU networks, which bridges this gap by enhancing the robustness, privacy, and performance of IST.
A key difference between the Fed MLP GRU model and existing models is its ability to train across different institutions without raw data transfers, thereby maintaining patient privacy while broadening speech-based medical applications. This federated approach dispenses with the centralized architecture of conventional IST; instead, it diminishes privacy concerns and allows collaborative training without compromising data integrity. Moreover, the model emphasizes high accuracy and adaptability in medical speech analysis, with feature extraction by the MLP and sequential classification by the GRU. These advancements directly target deficiencies in available IST applications, which usually lack the nuance to manage complicated patient interactions efficiently across diverse healthcare delivery settings.
IST has already made a substantial mark on healthcare, speeding up transcription and enabling data-driven, precise decision-making in disease diagnosis and treatment planning. By simplifying medical documentation and extending voice commands to equipment interaction, IST has transformed smart hospital workflows for better care outcomes. This study adds to this evolving landscape by presenting an IST model tuned both to enhance performance and to integrate key privacy standards, a significant step toward more secure and patient-centric healthcare innovation.
Intelligent speech technology can monitor patient progress and adherence to treatment plans through regular voice interactions. Patients can provide updates on their condition via voice recordings or telehealth consultations, which the technology can analyze for signs of improvement or deterioration. This continuous monitoring supports proactive healthcare delivery, allowing for timely adjustments to treatment plans and interventions. The integration of intelligent speech technology in disease diagnosis and treatment planning represents a significant leap toward a more proactive and personalized healthcare system. By uncovering hidden correlations and predictive markers in patient data, it enables earlier detection of diseases and more effective, individualized treatment strategies. In the era of smart hospitals, intelligent speech technology assumes a pivotal role in the interactive control of medical equipment and devices. Through intuitive voice commands, healthcare professionals can orchestrate a symphony of instruments and systems, ranging from ventilators and infusion pumps to diagnostic tools and surgical robots, with unparalleled precision and efficiency. This hands-free approach not only minimizes the risk of contamination but also fosters a safer and more ergonomic work environment, empowering clinicians to focus their attention where it matters most: on patient care. In essence, intelligent speech technology represents a paradigm shift in the way healthcare is delivered and managed, offering a potent blend of automation, intelligence, and human-centric design. As we navigate the complexities of an evolving healthcare landscape, its role will only continue to expand, driving innovation, enhancing collaboration, and ultimately, improving the quality of life for patients and providers alike. The key contributions of the proposed model are as follows.
  • Intelligent speech technology automates medical transcription, reducing errors and saving time.
  • By analyzing patient speech patterns, the technology aids in early disease detection and personalized diagnosis.
  • The technology enables voice-controlled operation of medical devices, improving workflow efficiency and safety.
  • Federated learning allows multiple hospitals to collaborate on training models without sharing raw patient data, ensuring privacy and security.
  • Intelligent speech technology integrates with existing hospital systems, enhancing interoperability and communication.
The paper is structured as follows: Section 2 comprises relevant material designed to help readers comprehend the proposed paper using existing methodologies, while Section 3 elaborates on the problem description. Section 4 presents the Wiener filter for data preprocessing and the proposed Fed MLP GRU methodology for intelligent speech technology in smart hospitals. Section 5 includes tabular and graphical representations of the results and performance indicators, and finally, Section 6 discusses the conclusion and future works.

2. Related Works

Jinman et al. [16] proposed methods using multimedia technologies to enhance human–computer interaction (HCI), focusing on creating intuitive interfaces that mimic human–human interactions, which is crucial for healthcare settings. These methods, integrating advanced sensors and virtual reality (VR) systems, aim to transform medical homes into smart environments, improving patient–system interactions and making medical devices more user-friendly. The idea of an intelligent healthcare dwelling focuses on a single healthcare system suited to someone’s house, allowing for health maintenance, disease detection, and symptom management in a comfortable setting, potentially reducing dependence on nursing homes and improving the quality of life for the elderly and those with chronic conditions. Incorporating smart medical devices can significantly benefit individuals with physical disabilities and chronic diseases through continuous monitoring and personalized care. HCI research focuses on developing user-friendly interfaces, ensuring seamless interaction between users and medical devices. Effective multimedia delivery and data management are crucial for transmitting and organizing health data, enabling informed decision-making by healthcare providers. This integration of multimedia technologies into healthcare promises personalized, efficient, and accessible care, enhancing life quality and reducing healthcare system burdens.
Yogesh et al. [17] used deep learning models and the Fastai text classification algorithm to predict clinical voice words, recordings, and intent in order to identify 25 health issues. The study used an expansive dataset of 6661 .wav files and a .csv file containing 13 separate classifying parameters to evaluate medical voice outputs. The experimental analysis of data for every condition showed sentence form categories and disease categories based on patients' audio pronunciations. Preparation duties included generating word cloud diagrams to display the core frequency of terms, removing NaN values, detecting over-duplication, and calculating the database's corpus and concept value. Characteristics, including the amount of material in each group, sentence size, and word count per sentence, were collected and then lemmatized as well as tokenized. Different deep learning models, such as GRU, LSTM, bidirectional GRU, bidirectional LSTM, and the Fastai classifier, were utilized for categorizing illnesses using healthcare expressions and written terms. The evaluation found that Fastai had the greatest sensitivity (96.89%), recall (95.8%), reliability (93.32%), and the lowest loss rate (0.169), although the multimodal LSTM had the greatest F1 score (95.69%) when analyzing medicinal speech sounds in all three categories.
Nakul et al. [18] proposed methods to enhance the delivery of healthcare through AI-driven technologies and products designed for use outside traditional hospital and clinic settings, referred to as “health settings outside the hospital and clinic” (HSOHC). These tools facilitate remote monitoring, support telehealth visits, and target high-risk populations for intensive care interventions, thereby offering substantial benefits in terms of patient access and provider understanding of daily habits. By extending care to environments such as homes and offices, these technologies empower patients and caregivers by promoting behavior adaptation and enabling personalized, real-time care through bidirectional communication with clinicians. The advent of such technologies has been particularly pertinent during the COVID-19 pandemic, ensuring continuity of care amidst disruptions. However, the reliability and utility of these AI applications in various medical specialties, including cardiology and psychiatry, vary significantly. Key challenges to effective implementation include product scalability, data standardization and integration, usability, adoption by patients and providers, and insurance reform. Additionally, the broader adoption of AI in healthcare must address ethical and equity concerns, such as patient privacy, exacerbation of existing inequities and biases, and fair access, especially within the U.S.’s mixed private and public health insurance landscape.
DonHee et al. [19] developed methodologies for assessing the current state of AI-powered technological uses and their impact on the medical industry. Their research, which includes a comprehensive review of the literature and an examination of concrete instances, demonstrates that numerous large medical facilities are already using AI-powered systems to assist doctors in identifying and managing various kinds of illnesses, as well as to improve the performance of caregiving and administrative duties. Despite medical providers enthusiastically integrating AI, the analysis shows both the possibilities and the obstacles that these breakthroughs bring. Artificial intelligence applications have tremendous potential for enhancing healthcare and operational efficiency; however, they face accompanying hurdles that must be overcome in order to reap all of their advantages. DonHee et al. underline the importance of thorough preparation and strategic shifts in care facilities and services in order to fully leverage the possibilities of AI. This balanced perspective emphasizes that, notwithstanding rapid developments in AI, the effective incorporation of technological advances within health care will require mastering administrative and philosophical hurdles in order to produce innovative benefits and increase efficiency in service delivery.
Abdul et al. [20] presented strategies to reduce the variability and inconsistency of in-clinic mental health examinations by investigating the computerization of the cognitive health assessment (CHA) procedure using computer science and data mining methods. Their broad examination reviews the history of CHA and examines significant contributions in the field, emphasizing the enabling technologies and the AI and ML methods used, including supervised and unsupervised machine learning, neural networks, reinforcement learning, natural language processing, and image processing. The paper also looks at the techniques for gathering information and the comparison datasets used in CHA, giving a summary of the technical achievements in this sector. Abdul et al. additionally analyze the outstanding difficulties and obstacles connected with applying AI and ML to CHA and propose potential solutions. This survey aims to identify research gaps and contribute to the evolving interdisciplinary field of mental health by presenting a detailed overview of CHA tools, data acquisition methods, AI applications, and associated challenges.
Sandeep et al. [21] proposed methods to harness the advancements in artificial intelligence (AI), such as deep neural networks, natural language processing, computer vision, and robotics, to enhance various aspects of healthcare. Their research stresses the practical integration of AI into medical facilities, refuting the inflated concept that AI would completely replace human doctors. By evaluating the advantages and disadvantages of currently available artificial intelligence solutions, Sandeep et al. highlight four main areas where AI can have a substantial effect: hospital management, medical decision-making, keeping track of patients, and medical care. They promote the creation of an AI-augmented healthcare industry, where AI technologies support and enhance the efficiency and effectiveness of healthcare delivery. This balanced perspective highlights how AI can be strategically integrated into the health system to deliver meaningful improvements without undermining the essential role of human clinicians.
Recent advancements in artificial intelligence, artificial neural networks, natural language processing, and automation have significantly influenced medicine by improving activities such as hospital management, clinical decision support, patient tracking, and treatment. Research has explored the use of multimedia technologies to create intuitive human–computer interactions in smart medical homes, and employed deep learning models to improve medical speech utterance categorization. AI-driven technologies for remote healthcare delivery outside traditional settings have been emphasized, focusing on the benefits and challenges of managing high-risk populations. Reviews of AI's current applications in hospitals highlight both opportunities and logistical challenges. Surveys on the automation of cognitive health assessments using AI and machine learning address subjectivity and inaccuracy in traditional methods. Emphasizing a balanced integration of AI into health systems, these studies advocate for enhancing efficiency without replacing human clinicians, recognizing both the potential and limitations of AI technologies. Collectively, they underscore the transformative potential of AI in healthcare while acknowledging the need for strategic planning, ethical considerations, and overcoming implementation challenges.
Jinman et al. [16] have presented techniques using multimedia technologies for improved human–computer interaction (HCI), particularly in healthcare applications. Their method, which combines sensors and virtual reality (VR), plays a crucial role in the design of smart medical environments mimicking human–human interactions. Jinman et al. attempted to improve patient–system interactions and make healthcare devices more accessible, focusing on the design of intuitive interfaces and smart healthcare dwellings. Although this work advances patient care through an easy-to-use HCI interface, it does not employ federated learning, which limits data privacy and raises scalability issues in heterogeneous healthcare settings. In contrast, our study's federated learning model, Fed MLP GRU, preserves data privacy while allowing collaborative improvement across institutions without the need to share sensitive patient information, addressing this limitation.
Yogesh et al. [17] applied deep learning with the aid of the Fastai text classification algorithm to predict clinical voice recordings and thereby classify patients' health conditions from medical speech outputs. The models used here, GRU, LSTM, and bidirectional GRU, achieved significant results in disease classification, with high sensitivity and accuracy in speech-based test cases. However, its reliance on centralized data collection makes this approach difficult to implement without risking privacy breaches in clinical applications. Our federated learning model extends this work to alleviate the privacy risks inherent in centralized models and to enable private, distributed analysis of speech data.
Nakul et al. [18] discussed AI for "health settings outside hospitals and clinics" (HSOHC), aimed at remote monitoring and telemedicine. While such AI-led technologies enhance personalized care through direct monitoring, data privacy, scalability, and fair access remain challenges. Unlike Nakul et al., our Fed MLP GRU model focuses on federated learning in IST and is well suited to privacy-sensitive healthcare setups. It fulfills the need for safe, adaptive healthcare input by preserving data privacy during real-time speech data analysis.
According to a review article by DonHee et al. [19], AI offers healthcare applications in which diagnostic support can be provided and task automation improves operations. While many concerns about AI remain, such as data privacy and the infrastructure required to make this vision a reality, their work highlights the transformative impact of AI. Our model follows DonHee et al. in its scope of AI use in healthcare and brings in a federated learning approach, a design uniquely suited to addressing the privacy and data-sharing concerns particular to speech technology applications.
Abdul et al. [20] examined cognitive health assessment and the inconsistency of in-clinic evaluations. Their study on AI support for cognitive assessment pointed to a significant need for secure data-sharing frameworks to minimize subjectivity and maximize accuracy. Our research builds on the foundation laid by Abdul et al. by using federated learning in IST to ensure consistency and increased accuracy in healthcare assessments without compromising patient privacy.
Sandeep et al. [21] appraised the fusion of AI technologies, including deep neural networks and NLP, in healthcare to strengthen hospital management, patient monitoring, and medical decision-making. Our Fed MLP GRU model extends the principles of their work by applying federated learning to speech technology, strengthening patient monitoring through secure speech data analysis in decentralized settings.
Federated learning for IST applications in healthcare has been studied only to a very limited extent. Most current work on healthcare federated learning, including decentralized medical imaging analysis and diagnostic prediction, focuses on non-audio data types. Our work fills an important gap by applying federated learning to IST, enabling secure speech data processing that respects patient confidentiality while increasing the accuracy and responsiveness of the model. Our Fed MLP GRU model combines federated learning with IST in innovative ways that greatly improve the privacy and scalability of healthcare speech applications.
This integration of federated learning with IST safeguards privacy while adapting specifically to the needs of healthcare environments, supporting real-time data updates and increasing accuracy without sharing raw data. In short, the earlier works and our research together address the central concerns of privacy, scalability, and accuracy in IST applications in healthcare.

3. Problem Statement

The problem addressed by Jinman et al. centers on the need to enhance human–computer interaction (HCI) in healthcare settings through the use of multimedia technologies, particularly to create intuitive interfaces that mimic human–human interactions. Current medical environments often lack user-friendly interfaces, making patient–system interactions cumbersome and limiting the effectiveness of medical devices. By integrating advanced sensors and virtual reality (VR) systems, the goal is to transform medical homes into smart environments that offer personalized health maintenance, disease detection, and symptom management. This transformation aims to reduce dependence on nursing homes and improve the quality of life for the elderly and those with chronic conditions. Additionally, effective multimedia delivery and data management are essential to facilitate the seamless transmission and organization of health data, thereby enabling healthcare providers to make informed decisions and deliver personalized, efficient, and accessible care [16].
The problems that IST systems face in clinical environments stem mostly from the challenges of safe cross-institutional data sharing, data privacy, and accurate speech-to-text conversion. Currently deployed IST systems perform weakly with respect to transcription accuracy and data privacy, and are therefore of limited effectiveness for disease diagnosis, clinical transcription, and equipment control in smart hospitals. Traditional speech recognition models depend on centralized data, which are notoriously sensitive and whose privacy cannot be safeguarded when health information must be shared across institutions. Moreover, transcription reliability, which degrades with variability in clinical language and environmental noise, affects decision-making accuracy in scenarios such as critical care.
This paper addresses these problems with a federated learning framework built on MLP and GRU neural networks. The approach processes data in a decentralized manner, allowing models to be trained on local devices without transferring sensitive data, thus preserving patient privacy and enhancing security. The MLP and GRU architectures are combined with the aim of achieving greater transcription accuracy and adapting to the complexity of clinical language. The result is a privacy-preserving, high-accuracy IST model for speech-to-text applications that advances IST in healthcare, allowing for more effective equipment control and decision-making in smart hospital environments.

4. Proposed Fed MLP GRU Methodology for Intelligent Speech Technology in Smart Hospitals

Data collection from Kaggle provides diverse speech recordings relevant to medical applications, which are then preprocessed using the Wiener filter to reduce background noise and enhance audio quality. Multi-layer perceptrons (MLP) are utilized for feature extraction, converting the cleaned audio signals into high-level feature vectors that capture essential speech characteristics. These feature vectors are input into Gated Recurrent Units (GRUs) for classification, enabling the model to understand and categorize sequential data based on learned temporal patterns. Federated learning is employed to train the MLP-GRU model across multiple decentralized devices or hospital systems without sharing raw data. Each participating device trains a local model, computes updates, and sends these to a central server for aggregation into a global model, which is then redistributed, iterating this process to enhance performance collaboratively. The trained Fed MLP GRU model is finally deployed in smart hospitals for real-time transcription, disease diagnosis, and interactive control of medical equipment, ensuring robust and efficient intelligent speech technology applications in healthcare settings. Figure 1 shows the proposed Fed MLP GRU methodology for intelligent speech technology in smart hospitals.
The proposed methodology targets smart hospitals: federated learning is combined with multi-layer perceptrons for feature extraction, while sequence classification uses gated recurrent units. MLPs are preferred because they successfully capture complex, nonlinear relations in data, especially in noisy and variable medical speech. They are well suited to extracting features from raw audio signals, allowing the model to learn hierarchical structures in speech. After feature extraction, a GRU is used for sequence classification because it inherently models temporal dependencies in sequential data and mitigates the vanishing gradient problem that complicates learning long-term information in traditional RNNs. This makes it an appropriate tool for effective speech-to-text transformation, which is central to recognizing symptoms reported by patients.
The Wiener filter was chosen over spectral subtraction and other techniques because it reduces background noise substantially while preserving key features of the speech. Operating in the frequency domain, the Wiener filter tends to boost the cleaner parts of the speech and suppress the noisier parts. This makes it particularly suitable for medical environments where high levels of background noise interfere with accurate speech recognition. Many studies have demonstrated its usefulness for speech enhancement, particularly on healthcare corpora, so it is applied here as a preprocessing step before feature extraction on noisy medical recordings.
Federated learning (FL) is applied to preserve the privacy of sensitive patient data and to comply with current healthcare privacy standards such as HIPAA and GDPR. FL enables hospitals to train local models and send only model updates to a central server, ensuring that patient data never leave the local institution. To protect model updates while they are communicated between hospitals and the central server, further privacy-enhancing techniques such as secure aggregation and differential privacy are applied. This minimizes the risk of data breaches while enabling collaborative learning across medical institutions, making the approach well suited to privacy-conscious healthcare environments.
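As an illustration of this protection step, the sketch below shows how a participating site could clip and noise its model update before upload. The function name, clipping norm, and noise scale are illustrative assumptions, not values specified in this work.

```python
import torch

def privatize_update(update: dict, clip_norm: float = 1.0, noise_std: float = 0.01) -> dict:
    """Clip a client's model update and add Gaussian noise before it is uploaded.

    `update` maps parameter names to (local - global) weight deltas; clip_norm and
    noise_std are illustrative hyperparameters, not values reported in this paper.
    """
    # Global L2 norm of the update across all parameter tensors.
    total_norm = torch.sqrt(sum((delta ** 2).sum() for delta in update.values()))
    scale = min(1.0, clip_norm / (float(total_norm) + 1e-12))
    private = {}
    for name, delta in update.items():
        clipped = delta * scale                        # bound each client's influence
        noise = torch.randn_like(clipped) * noise_std  # Gaussian mechanism
        private[name] = clipped + noise
    return private
```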
Deploying this model in hospital environments poses challenges such as latency, hardware compatibility, and network stability. Federated learning communication introduces latency that can be mitigated by edge computing, where local models are updated and refined on-site without waiting for the central server. Model quantization and pruning are further used to optimize resource utilization and accommodate the hardware diversity found in hospitals. Network disruptions can also occur during training, but asynchronous updates allow model training to continue across connectivity interruptions. These measures make the proposed model practical for healthcare applications, support real-time medical transcription and diagnosis, and adhere to privacy and security standards.
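A minimal sketch of such compression, assuming a PyTorch implementation, is given below; the pruning fraction and the choice of dynamic int8 quantization are illustrative, not settings reported in this study.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def compress_for_edge(model: nn.Module, prune_amount: float = 0.3) -> nn.Module:
    """Illustrative compression pass for resource-constrained hospital hardware.

    Prunes the smallest-magnitude weights in each Linear layer, then applies
    dynamic int8 quantization; the 30% amount is an assumption for illustration.
    """
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=prune_amount)
            prune.remove(module, "weight")  # make the pruning permanent
    return torch.quantization.quantize_dynamic(model, {nn.Linear, nn.GRU}, dtype=torch.qint8)
```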

4.1. Data Collection

The data used in this project come from Kaggle, specifically from the “Medical Speech, Transcription, and Intent” dataset. It contains audio recordings paired with text transcriptions for common medical symptoms like “headache” and “knee pain.” Contributors provided both the spoken descriptions and their corresponding written text. This dataset is valuable for training models to understand and respond to patient speech in healthcare settings. However, it is essential to note that some labels may be incorrect, and some audio files may have poor quality, requiring careful preprocessing before use in machine learning tasks [22].
The Kaggle "Medical Speech, Transcription, and Intent" dataset was created specifically for healthcare-focused NLP and speech recognition applications. It consists of a large number of audio files, each paired with a text transcription, focused on common medical symptoms and cases. The audio files vary in length and include common medical consultation terms, such as "headache", "nausea", "back pain", and "knee pain", as well as more general health-related phrases. Each audio file is an oral phrase or sentence describing a medical complaint, accompanied by a transcription that captures the main idea of the message in writing. The dataset is rich in metadata, including information about the speakers, their genders, and their ages, which allows models to generalize across different demographics and accent variations.
A great variety of speakers with different accents, pitches, and vocal tones supports the development of speech models robust enough to deal with real-world variations in patient speech. However, the dataset also brings some challenges. Because audio quality varies and many files contain background noise or distortions that could interfere with transcription accuracy, careful preprocessing is needed, such as noise reduction with a Wiener filter and data augmentation to increase quality and, therefore, model performance. There may also be errors in labels or transcriptions, which calls for manual verification or automated filtering mechanisms to maintain data quality. Nevertheless, the dataset is particularly useful for healthcare applications because of its relevant, domain-specific language and terminology. The structured pairing of medical speech with text transcription is invaluable for training intelligent speech recognition systems to understand and respond appropriately to queries in a healthcare environment.
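A minimal loading sketch is shown below. The file and column names follow the commonly distributed layout of this Kaggle dataset (an overview CSV plus a folder of recordings), but they are assumptions and should be checked against the downloaded copy.

```python
import pandas as pd
import librosa

# Assumed layout of the Kaggle "Medical Speech, Transcription, and Intent" download;
# verify the CSV name and column names against the actual files.
meta = pd.read_csv("overview-of-recordings.csv")
print(meta.columns.tolist())  # e.g. file_name, phrase (transcription), prompt (symptom label)

# Load one recording and resample to a fixed rate for downstream processing.
row = meta.iloc[0]
audio, sr = librosa.load(f"recordings/{row['file_name']}", sr=16000)
print(row.get("prompt", "<label column may differ>"), audio.shape, sr)
```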

4.2. Data Pre-Processing

In the data preprocessing step, the Wiener filter was applied to enhance the quality of the audio recordings from the Kaggle dataset. This filter effectively reduces background noise, making the speech clearer for analysis; it works by minimizing the error between the estimated clean signal and the noisy signal. After applying the Wiener filter to each audio file, the recordings are normalized to ensure consistency. This preprocessing is crucial for training accurate machine learning models to transcribe and recognize medical symptoms from the audio data.
The Wiener filter is a mathematical method used for signal processing, particularly in the context of noise reduction. It operates in the frequency domain and aims to minimize the mean square error between the original clean signal and the noisy signal. The Wiener filter estimates the clean signal by applying a frequency-dependent gain factor to the noisy signal. It effectively amplifies the components of the noisy signal where the signal-to-noise ratio (SNR) is high and attenuates the components where the SNR is low. This process helps to suppress the noise while preserving the important features of the original signal.
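The sketch below illustrates this frequency-dependent gain, assuming the leading frames of each recording are noise-only; the FFT size, hop length, and noise-frame count are illustrative choices rather than parameters reported in this work.

```python
import numpy as np
import librosa

def wiener_denoise(audio: np.ndarray, noise_frames: int = 10) -> np.ndarray:
    """Frequency-domain Wiener filtering followed by peak normalization.

    The per-bin gain is SNR / (SNR + 1), which amplifies time-frequency bins with
    high signal-to-noise ratio and attenuates noisy ones. The noise spectrum is
    estimated from the first few (assumed speech-free) frames.
    """
    stft = librosa.stft(audio, n_fft=512, hop_length=128)
    power = np.abs(stft) ** 2
    noise_power = power[:, :noise_frames].mean(axis=1, keepdims=True)
    snr = np.maximum(power / (noise_power + 1e-10) - 1.0, 0.0)  # a posteriori SNR estimate
    gain = snr / (snr + 1.0)
    clean = librosa.istft(gain * stft, hop_length=128, length=len(audio))
    # Normalize so all recordings share a consistent amplitude scale.
    return clean / (np.max(np.abs(clean)) + 1e-10)
```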

4.3. Feature Extraction with Multi-Layer Perceptrons

The purpose of feature extraction with multi-layer perceptrons is to convert raw input information, such as preprocessed medical voice recordings, into compact and useful representations that capture the characteristics important for later analysis. MLPs are a form of artificial neural network made up of multiple layers of connected nodes, or neurons, arranged in a feedforward manner. Each neuron in the MLP collects input signals, computes the weighted sum of these signals plus a bias term, and then applies an activation function to produce an output. In the context of feature extraction, the input layer of the MLP receives the preprocessed audio data, which may be represented as time-domain waveforms or frequency-domain spectrograms. These input signals are then propagated through one or more hidden layers of neurons, with each layer extracting increasingly abstract and complex features from the input data. Figure 2 shows the architecture of MLP.
The hidden layers of the MLP serve as feature extractors, learning hierarchical representations of the input data through the process of forward propagation and backward propagation. As the input signals pass through the hidden layers, the weights and biases of the neurons are adjusted based on the training data to minimize the error between the predicted and actual outputs. This training process effectively tunes the MLP to extract features that are most discriminative for the given task, such as distinguishing between different medical symptoms based on the audio recordings. The output layer of the MLP produces a compact representation of the input data, often referred to as a feature vector, which encapsulates the most relevant information for subsequent classification or analysis tasks. These extracted features can then be fed into subsequent layers of the model, such as recurrent neural networks (RNNs) or convolutional neural networks (CNNs), for further processing and interpretation.
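A minimal PyTorch sketch of such a feature extractor is shown below; the layer widths, the 257-bin spectrogram input, and the 64-dimensional feature vector are illustrative assumptions, since the exact architecture is not specified here.

```python
import torch
import torch.nn as nn

class MLPFeatureExtractor(nn.Module):
    """Maps each spectrogram frame to a compact feature vector (sizes are illustrative)."""

    def __init__(self, n_input: int = 257, n_hidden: int = 256, n_features: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_input, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_features),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_input) spectrogram frames -> (batch, time, n_features)
        return self.net(x)
```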

4.4. Sequence Classification with GRUs

In the classification stage, Gated Recurrent Units (GRUs) are employed to perform sequence classification based on the extracted features obtained from the Multi-Layer Perceptrons (MLPs). GRUs are a type of recurrent neural network that excels in modeling sequential data, making them particularly well-suited for tasks such as speech recognition and natural language processing. Unlike traditional RNNs, GRUs incorporate gating mechanisms that allow them to selectively update and forget information over time, mitigating the vanishing gradient problem and enabling more effective long-range dependencies modeling. In the context of our application, the feature vectors extracted by the MLPs serve as input sequences to the GRUs, where each feature vector represents a time step in the sequence. Figure 3 shows the architecture of GRU.
During the classification process, the GRUs iteratively process the input sequences, updating their hidden states at each time step based on the current input and the previous hidden state. This dynamic process allows the GRUs to capture temporal dependencies and contextual information within the input sequences, which is essential for accurately classifying medical symptoms based on the audio recordings. The final hidden state of the GRUs, which summarizes the information from the entire input sequence, is then passed through a fully connected layer with softmax activation to produce the probability distribution over the possible symptom classes. The class with the highest probability is selected as the predicted label for the input sequence, effectively performing sequence classification based on the extracted features. This approach leverages the strengths of both MLPs for feature extraction and GRUs for sequence modeling, enabling accurate and robust classification of medical symptoms from audio data.
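The following sketch, in the same PyTorch style as the extractor above, shows how the per-frame features could be classified with a GRU; the hidden size and the number of symptom classes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GRUSymptomClassifier(nn.Module):
    """Consumes per-frame MLP features and predicts a symptom class (sizes are illustrative)."""

    def __init__(self, n_features: int = 64, n_hidden: int = 128, n_classes: int = 25):
        super().__init__()
        self.gru = nn.GRU(n_features, n_hidden, batch_first=True)
        self.head = nn.Linear(n_hidden, n_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, n_features); the final hidden state summarizes the sequence.
        _, h_n = self.gru(feats)
        logits = self.head(h_n[-1])
        # Softmax over symptom classes; training would typically use the raw logits
        # with a cross-entropy loss instead.
        return torch.softmax(logits, dim=-1)
```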

4.5. Federated Learning for Collaborative Model Training

Federated Learning is employed to train the model collaboratively across multiple locations, such as different hospitals, without the need to share raw data. This decentralized approach ensures that sensitive patient information remains local, significantly enhancing data privacy and security. In a federated learning setup, each participating hospital trains a local version of the model using its own data. These local models periodically share their learned parameters, such as weights and biases, with a central server. The central server then aggregates these parameters to create a global model, which is redistributed back to the participating hospitals. This iterative process continues until the global model achieves satisfactory performance. Figure 4 shows the architecture of federated learning.
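A minimal sketch of one such training round is given below, assuming plain weight averaging (FedAvg-style) as the aggregation rule, since the specific rule is not named here; the optimizer, learning rate, and local epoch count are likewise illustrative.

```python
import copy
from typing import Dict, List

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def federated_round(global_model: nn.Module,
                    client_loaders: List[DataLoader],
                    local_epochs: int = 1,
                    lr: float = 1e-3) -> None:
    """One communication round: local training at each site, then server-side averaging.

    Assumes the model outputs class logits; hyperparameters are illustrative.
    """
    client_states: List[Dict[str, torch.Tensor]] = []
    for loader in client_loaders:                  # each loader stays at its own hospital
        local = copy.deepcopy(global_model)
        optimizer = torch.optim.Adam(local.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()
        local.train()
        for _ in range(local_epochs):
            for features, labels in loader:
                optimizer.zero_grad()
                loss = criterion(local(features), labels)
                loss.backward()
                optimizer.step()
        client_states.append(local.state_dict())   # only parameters leave the site
    # Server step: average the parameters from all sites into the new global model.
    averaged = {key: torch.stack([state[key].float() for state in client_states]).mean(dim=0)
                for key in client_states[0]}
    global_model.load_state_dict(averaged)
```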
Once the federated learning process has produced a well-trained global model, it is deployed in smart hospitals for a variety of applications. In the transcription of medical speech, the model accurately converts spoken language into text, capturing patient symptoms and clinician notes efficiently. This real-time transcription capability streamlines the documentation process, allowing healthcare professionals to focus more on patient care rather than administrative tasks. Additionally, the model supports disease diagnosis by analyzing audio inputs to detect and interpret symptoms, leveraging the patterns learned during training to provide preliminary diagnostic insights.
Beyond transcription and diagnosis, the deployed model plays a crucial role in the interactive control of medical equipment. By recognizing voice commands from healthcare providers, the model can facilitate the hands-free operation of various medical devices, improving both the efficiency and hygiene of medical procedures. For instance, doctors can verbally instruct diagnostic machines, adjust settings, or retrieve patient information without physical interaction, reducing the risk of contamination and saving time. This seamless integration of intelligent speech technology into the operational workflow of smart hospitals exemplifies the transformative potential of AI in enhancing healthcare delivery and operational efficiency.
While the proposed Fed MLP-GRU model is accurate and useful, it has a few limitations. Although the "Medical Speech, Transcription, and Intent" dataset is large, it may not fully represent the dialects, accents, and variations in medical terminology used across hospitals, which affects generalization to hospitals operating in linguistically distinct domains. Some of the audio files are of variable quality, and symptoms are sometimes mislabeled, which could degrade model performance if not handled carefully at the preprocessing stage.
Another limitation is the high computational complexity of the federated learning framework. While federated learning allows privacy-preserving training across multiple hospital sites, it requires substantial processing power and network bandwidth to deliver timely updates, and sites that cannot provide these resources may struggle to implement it fully. In addition, because updates from different locations can differ in quality or quantity, aggregation may inject slight bias, especially when some locations contribute higher-quality or more plentiful updates.
Finally, although the model performs well in transcription, diagnosis, and equipment control, it depends on accurate audio input. Environmental noise and interruptions during speech input may hinder the model's accuracy in clinical settings where background noise cannot be prevented. Future work could address these concerns by expanding the dataset to more diverse speech, improving computational efficiency, and adding more sophisticated noise handling so that the system functions reliably in varied and resource-constrained healthcare environments.

5. Results and Discussion

The implementation of intelligent speech technology through the combination of Multi-Layer Perceptrons (MLP) for feature extraction and Gated Recurrent Units (GRU) for sequence classification yielded significant improvements in the accuracy and efficiency of medical speech transcription, disease diagnosis, and interactive control of healthcare equipment in intelligent medical facilities. The federated learning approach ensured that the model was trained on diverse datasets from multiple locations without compromising patient data privacy. During deployment, the model demonstrated high accuracy in real-time transcription of medical speech, effectively capturing and converting spoken language into text. For disease diagnosis, the model successfully identified and interpreted various medical symptoms from audio inputs, providing valuable diagnostic insights. Additionally, the interactive control capabilities of the model facilitated the hands-free operation of medical equipment, enhancing workflow efficiency and reducing the risk of contamination. Overall, the results indicate that integrating intelligent speech technology into healthcare settings significantly enhances the quality and efficiency of patient care and hospital operations.
In this analysis, the training and validation accuracies of four different neural network models—Network A, Network B, Network C, and a Federated Learning model—were plotted over a range of epochs to compare their performance. The epochs considered were 20 to 200 in steps of 20. For each model, the training and validation accuracies were tracked and plotted in a 2 × 2 grid layout, with each subplot representing one of the networks. Network A showed steady improvement in both training and validation accuracy, with final accuracies nearing 99%. Network B demonstrated even higher training accuracy, but similar validation accuracy to Network A. Network C started with lower accuracy but improved significantly, aligning with the performance of the other networks by the end. The federated learning model showed a consistent and balanced improvement in both training and validation accuracy, highlighting its robustness. This visualization helps in understanding the convergence behavior and generalization capabilities of each model over the training period. Figure 5 shows a comparison of training and validation accuracies across different neural network models.
In this analysis, the training and testing loss curves for four different neural network scenarios—A, B, C, and Federated—are visualized over 100 epochs using a 2 × 2 grid layout. Each subplot represents one scenario, showing the training loss (yellow) and testing loss (green) to track the models’ performance over time. The plots reveal how each model’s loss decreases as training progresses, with training loss typically showing a steady decline, indicating learning and adaptation, while testing loss exhibits fluctuations, reflecting the models’ ability to generalize to unseen data. This comprehensive comparison highlights the varying effectiveness of each training approach, with the federated model demonstrating consistent and balanced performance, underscoring its robustness in handling diverse datasets and training conditions. Figure 6 shows the training and testing loss curves.
In this visualization, the ROC curves for a FED LSTM model are plotted across three different sample sizes—250, 750, and 1000—to evaluate its performance in terms of sensitivity and specificity. For each sample size, simulated data were generated, and the ROC curves, along with their corresponding AUC values, were calculated. The subplots illustrate the ROC curves for each sample size, where the true-positive rate (sensitivity) is plotted against the false-positive rate (1-specificity). Each subplot includes the ROC curve for the model (black line) and a diagonal reference line representing a random classifier (dashed line). The consistency of the ROC curves across varying sample sizes highlights the model’s reliability in distinguishing between classes, providing a visual representation of its classification performance. The plots also emphasize the importance of sample size in model evaluation, showing how the model maintains robust performance with increasing data. Figure 7 shows ROC Curves for the FED LSTM model across different sample sizes.
In this visualization, hypothetical model performance data are plotted across three different datasets—MNIST, CIFAR, and SVHN—each with varying convergence thresholds of 0.1, 0.3, and 0.2, respectively. The convergence graphs illustrate the performance metrics, such as accuracy or loss, of three models—FED, MLP, and GRU (INCLUDE)—over a series of iterations or rounds. Each subplot in the 3 × 3 grid represents a combination of dataset and convergence threshold, displaying the performance trends of the models over increasing iterations. The performance metrics for the FED, MLP, and GRU (INCLUDE) models are depicted by red, blue, and brown lines, respectively. These visualizations provide insights into the convergence behavior of the models under different dataset conditions and convergence thresholds, aiding in the assessment and comparison of their performance across various scenarios. Figure 8 shows model performance convergence across different datasets and thresholds.
Table 1 shows the performance metrics of various deep learning methods, including accuracy, precision, recall, and F1 score. The proposed federated learning approach, utilizing a combination of Multi-Layer Perceptron (MLP) and Gated Recurrent Unit (GRU), achieves a remarkable accuracy of 98.6%. This approach also exhibits high precision, recall, and F1 score, indicating its effectiveness in classification accuracy and minimizing false positives and negatives. Traditional Deep Neural Networks (DNNs) follow closely behind with slightly lower accuracy but maintain balanced precision, recall, and F1 scores. However, the Convolutional Neural Network (CNN) demonstrates a trade-off between accuracy and precision, with relatively lower scores in recall and F1 score compared to DNN and the proposed federated approach. In contrast, the LSTM-based Recurrent Neural Network (RNN) shows lower accuracy and precision despite higher recall and F1 score, suggesting potential limitations in its predictive capability. Overall, the proposed federated learning approach with MLP and GRU emerges as a robust solution for accurate and reliable classification tasks across diverse data environments. Figure 9 shows the performance evaluation for different methods.
Cross-validation was applied across different folds to ensure both internal and external validation of the federated MLP-GRU model. A k-fold approach (k = 5) was adopted, splitting the dataset into five subsets; in each iteration, one subset was held out for validation while the remaining four were used for training. Each subset therefore served as the validation set exactly once, giving a more reliable estimate of how the model performs under different data distributions. Within the federated learning framework, the data in each subset remained isolated, so privacy was preserved even as the subsets were used to improve the model collaboratively. The mean accuracy from the k-fold cross-validation reached 98.2% with a standard deviation of 0.4%, indicating stable performance across folds.
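A minimal sketch of this 5-fold protocol is given below; the feature and label arrays are placeholders, and build_model, train, and evaluate are hypothetical helpers standing in for the Fed MLP-GRU training and evaluation routines, which are not reproduced here:

```python
# Minimal sketch of 5-fold cross-validation with placeholder data.
import numpy as np
from sklearn.model_selection import KFold

features = np.random.rand(500, 40)            # placeholder acoustic features
labels = np.random.randint(0, 25, size=500)   # placeholder symptom classes

fold_accuracies = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(features):
    X_train, y_train = features[train_idx], labels[train_idx]
    X_val, y_val = features[val_idx], labels[val_idx]
    # model = build_model()                    # hypothetical Fed MLP-GRU constructor
    # train(model, X_train, y_train)           # hypothetical local training routine
    # fold_accuracies.append(evaluate(model, X_val, y_val))

# mean_acc, std_acc = np.mean(fold_accuracies), np.std(fold_accuracies)
```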
In addition to accuracy, the k-fold cross-validation provided insight into the model's sensitivity and specificity on the validation sets. Sensitivity averaged 97.8%, indicating that the model reliably identified true-positive cases when diagnosing disease from audio input. Specificity averaged 98.1%, demonstrating the model's ability to avoid false positives, which is critical for minimizing misdiagnosis in medical applications. The consistency of both metrics suggests that the federated MLP-GRU model is well calibrated, generalizes effectively across patient samples, and remains robust when presented with unseen data.
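Sensitivity and specificity of this kind can be derived from a binary confusion matrix, as in the following sketch with hypothetical validation labels:

```python
# Minimal sketch of sensitivity/specificity from a binary confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]   # hypothetical validation labels
y_pred = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]   # hypothetical model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true-positive rate: diagnoses correctly detected
specificity = tn / (tn + fp)   # true-negative rate: false alarms avoided
print(f"Sensitivity: {sensitivity:.3f}, Specificity: {specificity:.3f}")
```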
External validation was also performed on separate datasets to assess how well the model generalizes beyond the primary training data. The model achieved a testing accuracy of 97.6% on external data, such as different corpora of medical speech, close to the level obtained during internal cross-validation. Sensitivity and specificity remained high, at 97.2% and 97.9%, respectively, further strengthening confidence in the model's reliability across varied settings. This external validation highlights the federated model's ability to perform well on diverse healthcare datasets and its potential for scalable intelligent speech technology applications in smart hospitals.

6. Conclusions and Future Works

The proposed Fed MLP-GRU model marks a significant step forward for intelligent speech technology in healthcare, especially in smart hospital settings. By integrating MLPs for feature extraction with GRUs for sequence classification, it addresses intrinsic challenges of real-time medical speech recognition, including transcription accuracy and symptom-based diagnosis. The federated learning framework protects sensitive patient data, allowing collaborative model training without violating privacy, a critical concern in healthcare environments. While maintaining privacy standards, the framework also improves the accuracy and reliability of speech recognition, making it valuable for enhancing patient care and hospital management.
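A minimal PyTorch sketch of this MLP-plus-GRU pairing is shown below; the layer sizes, input feature dimensionality, and 25-class output are assumptions for illustration rather than the exact configuration used in the study:

```python
# Minimal sketch of an MLP feature extractor feeding a GRU sequence classifier.
import torch
import torch.nn as nn

class MLPGRUClassifier(nn.Module):
    def __init__(self, n_features=40, hidden=128, n_classes=25):
        super().__init__()
        # MLP applied frame-wise to extract higher-level acoustic features.
        self.mlp = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # GRU models the temporal sequence of extracted features.
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, n_features)
        feats = self.mlp(x)               # (batch, time, hidden)
        _, h_n = self.gru(feats)          # h_n: (1, batch, hidden)
        return self.head(h_n.squeeze(0))  # class logits per utterance

logits = MLPGRUClassifier()(torch.randn(4, 100, 40))  # e.g., 4 utterances, 100 frames each
```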
From a practical standpoint, the model streamlines hospital operations by enabling seamless, hands-free voice control of medical equipment. The technology reduces unnecessary physical contact during clinical operations while improving efficiency, hygiene, and safety. Through federated learning, the model can be scaled and extended from one healthcare institution to others worldwide without compromising data privacy. This decentralized approach allows continual model improvement while strictly adhering to healthcare data protection regulations.
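A FedAvg-style aggregation step of the kind used in such decentralized training can be sketched as follows; the function and variable names are illustrative and do not correspond to a specific federated learning library:

```python
# Minimal sketch of federated averaging: hospitals share only parameter
# tensors (state dicts), never raw audio or transcripts.
def federated_average(client_state_dicts, client_sizes):
    """Average per-hospital parameters, weighted by each local dataset size."""
    total = sum(client_sizes)
    return {
        key: sum(sd[key] * (n / total)
                 for sd, n in zip(client_state_dicts, client_sizes))
        for key in client_state_dicts[0]
    }

# Hypothetical usage after one round of local training:
# global_model.load_state_dict(
#     federated_average([m.state_dict() for m in local_models], sample_counts))
```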
Future work is expected to combine further advances in machine learning algorithms with emerging privacy-preserving techniques, continuing to refine the capabilities of intelligent speech systems. Within the federated learning context, the MLP-GRU model is an innovative contribution that opens new avenues for research and implementation. The Fed MLP-GRU framework has considerable scope to bridge advanced AI technologies and real-world healthcare applications, making services efficient, accessible, and patient-centric as healthcare moves toward personalized and decentralized approaches.
Overall, the future of intelligent speech technology in healthcare lies in its ability to integrate into existing medical workflows while prioritizing privacy and security. This work sets the stage for transformative changes in how healthcare professionals interact with patients and medical systems by advancing federated learning and its application to voice recognition. Further refinement and wider adoption of the Fed MLP-GRU model could play an important role in streamlining healthcare delivery, not only in smart hospitals but also in underserved communities worldwide.

Author Contributions

Conceptualization, A.E.; Formal analysis, A.I.T.; Investigation, N.E.; Methodology, A.E. and S.H.; Resources, R.M.A.E.-A.; Software, F.A.; Validation, S.H.; Writing—original draft, A.E.; Writing—review and editing, N.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. (DGSSR-2023-02-02097).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

This work was funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under grant No. (DGSSR-2023-02-02097).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Spachos, P.; Gregori, S.; Deen, M.J. Voice Activated IoT Devices for Healthcare: Design Challenges and Emerging Applications. IEEE Trans. Circuits Syst. II Express Briefs 2022, 69, 3101–3107. [Google Scholar] [CrossRef]
  2. Eftekhari, H. Transcribing in the digital age: Qualitative research practice utilizing intelligent speech recognition technology. Eur. J. Cardiovasc. Nurs. 2024, 23, zvae013. [Google Scholar] [CrossRef] [PubMed]
  3. Durling, S.; Lumsden, J. Speech recognition use in healthcare applications. In Proceedings of the 6th International Conference on Advances in Mobile Computing and Multimedia, Linz, Austria, 24–26 November 2008; Association for Computing Machinery: New York, NY, USA, 2008; pp. 473–478. [Google Scholar]
  4. Yang, D.-M.; Chang, T.-J.; Hung, K.-F.; Wang, M.-L.; Cheng, Y.-F.; Chiang, S.-H.; Chen, M.-F.; Liao, Y.-T.; Lai, W.-Q.; Liang, K.-H. Smart healthcare: A prospective future medical approach for COVID-19. J. Chin. Med. Assoc. 2023, 86, 138. [Google Scholar] [CrossRef] [PubMed]
  5. Dewangan, N.; Vyas, P.; Ankita; Mandal, S. Smart Healthcare and Intelligent Medical Systems. In Computational Intelligence and Applications for Pandemics and Healthcare; IGI Global: Hershey, PA, USA, 2022; pp. 205–228. ISBN 978-1-79989-831-3. [Google Scholar]
  6. Dong, A.; Guo, J.; Cao, Y. Medical information mining-based visual artificial intelligence emergency nursing management system. J. Healthc. Eng. 2021, 2021, e4253606. [Google Scholar] [CrossRef] [PubMed]
  7. Abdulkareem, K.H.; Mohammed, M.A.; Salim, A.; Arif, M.; Geman, O.; Gupta, D.; Khanna, A. Realizing an Effective COVID-19 Diagnosis System Based on Machine Learning and IoT in Smart Hospital Environment. IEEE Internet Things J. 2021, 8, 15919–15928. [Google Scholar] [CrossRef] [PubMed]
  8. Mudgal, S.K.; Agarwal, R.; Chaturvedi, J.; Gaur, R.; Ranjan, N. Real-world application, challenges and implication of artificial intelligence in healthcare: An essay. Pan Afr. Med. J. 2022, 43, 1. Available online: https://www.ajol.info/index.php/pamj/article/view/245268 (accessed on 29 May 2024).
  9. Gupta, N.S.; Kumar, P. Perspective of artificial intelligence in healthcare data management: A journey towards precision medicine. Comput. Biol. Med. 2023, 162, 107051. [Google Scholar] [CrossRef] [PubMed]
  10. Jayaraman, P.P.; Forkan, A.R.M.; Morshed, A.; Haghighi, P.D.; Kang, Y.-B. Healthcare 4.0: A review of frontiers in digital health. WIREs Data Min. Knowl. Discov. 2020, 10, e1350. [Google Scholar] [CrossRef]
  11. Kumar, A.; Gond, A. Natural Language Processing: Healthcare Achieving Benefits Via Nlp. Sci. Prepr. 2023, 2023, 2–14. [Google Scholar] [CrossRef]
  12. Murala, D.K.; Panda, S.K.; Dash, S.P. MedMetaverse: Medical Care of Chronic Disease Patients and Managing Data Using Artificial Intelligence, Blockchain, and Wearable Devices State-of-the-Art Methodology. IEEE Access 2023, 11, 138954–138985. [Google Scholar] [CrossRef]
  13. Usmani, U.A.; Jaafar, J. Machine Learning in Healthcare: Current Trends and the Future. In International Conference on Artificial Intelligence for Smart Community; Ibrahim, R., Porkumaran, K., Kannan, R., Mohd Nor, N., Prabakar, S., Eds.; Springer Nature: Singapore, 2022; pp. 659–675. [Google Scholar]
  14. Agarwal, N.; Singh, P.; Singh, N.; Singh, K.K.; Jain, R. Machine Learning Applications for IoT Healthcare. In Machine Learning Approaches for Convergence of IoT and Blockchain; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2021; pp. 129–144. ISBN 978-1-119-76188-4. [Google Scholar]
  15. Sloane, E.B.; Silva, R.J. Chapter 83—Artificial intelligence in medical devices and clinical decision support systems. In Clinical Engineering Handbook, 2nd ed.; Iadanza, E., Ed.; Academic Press: Cambridge, MA, USA, 2020; pp. 556–568. ISBN 978-0-12-813467-2. [Google Scholar]
  16. Kim, J.; Wang, Z.; Weidong Cai, T.; Dagan Feng, D. 23—Multimedia for Future Health—Smart Medical Home. In Biomedical Information Technology; Feng, D.D., Ed.; Biomedical Engineering; Academic Press: Cambridge, MA, USA, 2008; pp. 497–512. ISBN 978-0-12-373583-6. [Google Scholar]
  17. Kumar, Y.; Koul, A.; Mahajan, S. A deep learning approaches and fastai text classification to predict 25 medical diseases from medical speech utterances, transcription and intent. Soft Comput. 2022, 26, 8253–8272. [Google Scholar] [CrossRef]
  18. Aggarwal, N.; Ahmed, M.; Basu, S.; Curtin, J.J.; Evans, B.J.; Matheny, M.E.; Nundy, S.; Sendak, M.P.; Shachar, C.; Shah, R.U.; et al. Advancing Artificial Intelligence in Health Settings Outside the Hospital and Clinic. NAM Perspect. 2020, 2020, 1–26. [Google Scholar] [CrossRef] [PubMed]
  19. Lee, D.; Yoon, S.N. Application of Artificial Intelligence-Based Technologies in the Healthcare Industry: Opportunities and Challenges. Int. J. Environ. Res. Public Health 2021, 18, 271. [Google Scholar] [CrossRef] [PubMed]
  20. Javed, A.R.; Saadia, A.; Mughal, H.; Gadekallu, T.R.; Rizwan, M.; Maddikunta, P.K.R.; Mahmud, M.; Liyanage, M.; Hussain, A. Artificial Intelligence for Cognitive Health Assessment: State-of-the-Art, Open Challenges and Future Directions. Cogn. Comput. 2023, 15, 1767–1812. [Google Scholar] [CrossRef]
  21. Reddy, S.; Fox, J.; Purohit, M.P. Artificial intelligence-enabled healthcare delivery. J. R. Soc. Med. 2019, 112, 22–28. [Google Scholar] [CrossRef] [PubMed]
  22. Medical Speech, Transcription, and Intent Dataset. Kaggle. Available online: https://www.kaggle.com/datasets/paultimothymooney/medical-speech-transcription-and-intent (accessed on 31 May 2024).
  23. Fang, S.-H.; Tsao, Y.; Hsiao, M.-J.; Chen, J.-Y.; Lai, Y.-H.; Lin, F.-C.; Wang, C.-T. Detection of Pathological Voice Using Cepstrum Vectors: A Deep Learning Approach. J. Voice 2019, 33, 634–641. [Google Scholar] [CrossRef] [PubMed]
  24. Syed, S.A.; Rashid, M.; Hussain, S.; Zahid, H. Comparative Analysis of CNN and RNN for Voice Pathology Detection. BioMed Res. Int. 2021, 2021, 6635964. Available online: https://www.hindawi.com/journals/bmri/2021/6635964/ (accessed on 31 May 2024). [CrossRef] [PubMed]
  25. Roy, S.; Sayim, M.I.; Akhand, M.A.H. Pathological Voice Classification Using Deep Learning. In Proceedings of the 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), Dhaka, Bangladesh, 3–5 May 2019; pp. 1–6. [Google Scholar]
Figure 1. Proposed Fed MLP GRU Methodology for Intelligent Speech Technology in Smart Hospitals.
Figure 2. Architecture of MLP.
Figure 3. Architecture of GRU.
Figure 4. Architecture of Federated Learning.
Figure 5. Comparison of Training and Validation Accuracies Across Different Neural Network Models.
Figure 6. Training and Testing Loss Curves. (A) Scenario A. (B) Scenario B. (C) Scenario C. (D) Federated: The federated model’s performance.
Figure 7. ROC Curves for FED LSTM Model across Different Sample Sizes.
Figure 8. Model Performance Convergence across Different Datasets and Thresholds.
Figure 9. Performance Evaluation for Different Methods.
Table 1. Experimental Result Analysis for Different Parameters with Other Metrics.
Deep Learning Methods | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%)
DNN [23] | 94.26 | 92 | 96 | 94
CNN [24] | 87.11 | 88 | 93 | 90
LSTM-RNN [25] | 73.39 | 90 | 95 | 92
Proposed Fed MLP GRU | 98.6 | 96 | 98 | 98
