Systematic Review

Human-in-the-Loop XAI for Predictive Maintenance: A Systematic Review of Interactive Systems and Their Effectiveness in Maintenance Decision-Making

by Nuuraan Risqi Amaliah 1, Benny Tjahjono 1,* and Vasile Palade 2

1 Centre for E-Mobility and Clean Growth, Coventry University, Coventry CV1 5FB, UK
2 Centre for Computational Science and Mathematical Modelling, Coventry University, Coventry CV1 5FB, UK
* Author to whom correspondence should be addressed.
Electronics 2025, 14(17), 3384; https://doi.org/10.3390/electronics14173384
Submission received: 16 July 2025 / Revised: 23 August 2025 / Accepted: 25 August 2025 / Published: 26 August 2025
(This article belongs to the Special Issue Explainability in AI and Machine Learning)

Abstract

Artificial intelligence (AI) plays a pivotal role in Industry 4.0, with predictive maintenance (PdM) emerging as a core application for improving operational efficiency by reducing unplanned downtime and extending asset life. Despite these advancements, the black-box nature of AI models remains a significant barrier to adoption, as industry stakeholders require systems that are both transparent and trustworthy. This study presents a systematic literature review examining how human-in-the-loop explainable AI (HITL-XAI) approaches can enhance the effectiveness and adoption of AI systems in PdM contexts. This review followed the PRISMA methodology, employing predefined search strings across Scopus, ProQuest, and EBSCO databases. Sixty-three peer-reviewed journal articles, published between 2019 and early 2025, were included in the final analysis. The selected studies span various domains, including industrial manufacturing, energy, and transportation, with findings synthesized through both descriptive and thematic analyses. A key gap identified is the limited empirical exploration of generative AI (GenAI) in improving the usability, interpretability, and trustworthiness of HITL-XAI systems in PdM applications. This review outlines actionable insights for integrating explainability and GenAI into existing rule-based PdM systems to support more adaptive and reliable maintenance strategies. Ultimately, the findings underscore the importance of designing HITL-XAI systems that not only demonstrate high model performance but are also effectively aligned with operational workflows and the cognitive needs of maintenance personnel.

1. Introduction

Predictive maintenance (PdM) has emerged as one of the most impactful industrial applications of artificial intelligence (AI), enabling organizations to reduce downtime, optimize operational efficiency, and extend asset life by anticipating equipment failures before they occur [1]. PdM was the leading industrial AI use case in 2019, accounting for 24% of deployments [2]. Despite its value, the adoption of AI-driven PdM remains limited in many sectors, hindered by high implementation costs, a shortage of skilled personnel, and insufficient organizational support [3]. A key technical challenge is the scarcity of labeled failure data, which limits the applicability of supervised machine learning techniques and necessitates more context-aware and unsupervised approaches [4]. Moreover, in dynamic industrial settings, not all detected anomalies signify actionable faults, requiring human expertise and intervention to validate the system’s outputs [5].
Such challenges are not confined to theoretical discourse but are also evident in practical applications. In a project conducted by an international engineering firm serving the manufacturing and utility sectors, a rule-based PdM system was deployed to monitor industrial assets using multivariate sensor data. Owing to the absence of reliable labeled fault data, the system relied on predefined thresholds (e.g., vibration exceeding 3.5 g for more than one minute, or temperature surpassing 49 °C in conjunction with high vibration) rather than on learned models. While this approach enabled timely alerts, all detected anomalies produced the same generic message (e.g., “High vibration—investigation required”), irrespective of severity, historical context, or potential root cause. As illustrated in Figure 1, the sensor dashboard displays temperature and vibration data alongside several triggered alerts on the time-series graph; however, it does not offer actionable insights to support maintenance decision-making. Figure 2 shows the event log interface, which records all triggered alerts within the manufacturing environment. Due to the uniformity of alert messages and severity levels, engineers frequently faced uncertainty regarding prioritization and appropriate response strategies. This often resulted in confusion, inaction, and ultimately diminished the effectiveness of the PdM system as a decision-support tool.
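To make the limitation concrete, the sketch below encodes such a static rule in Python. The thresholds are the illustrative values quoted above, and all class, variable, and message names are hypothetical rather than taken from the deployed system; note that every trigger collapses to the same generic message.

```python
from dataclasses import dataclass

@dataclass
class SensorWindow:
    vibration_g: list[float]   # vibration samples (g) over the last minute
    temperature_c: float       # latest temperature reading (degrees C)

def check_alerts(window: SensorWindow) -> list[str]:
    alerts = []
    # Rule 1: vibration above 3.5 g sustained for the whole one-minute window.
    sustained_high_vibration = all(v > 3.5 for v in window.vibration_g)
    if sustained_high_vibration:
        # Every trigger yields the same generic text: no severity,
        # historical context, or candidate root cause.
        alerts.append("High vibration - investigation required")
    # Rule 2: temperature above 49 degrees C in conjunction with high vibration.
    if window.temperature_c > 49 and sustained_high_vibration:
        alerts.append("High vibration - investigation required")
    return alerts

print(check_alerts(SensorWindow(vibration_g=[3.7] * 60, temperature_c=52.0)))
```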
To address the limitations inherent in static-threshold methodologies, numerous organizations are transitioning towards more adaptive AI-driven predictive maintenance (PdM) systems that possess the capability to learn intricate temporal patterns within sensor data. Nevertheless, this transition introduces a novel challenge: the opaque, “black box” nature of many cutting-edge models. Operators are frequently required to rely on decisions generated by these systems without comprehending the underlying rationale [6]. Such a deficiency in transparency exacerbates the very issues that intelligent PdM aims to resolve—namely, ambiguous alerts and unverified predictions. In high-stakes industrial environments, where erroneous decisions may result in costly failures or safety hazards, the importance of trust, accountability, and interpretability cannot be overstated [7,8,9].
This growing reliance on opaque AI models has spurred the development of explainable artificial intelligence (XAI), driven by the recognition that accuracy alone is insufficient if users cannot interpret or act on a system’s outputs. XAI systems aim to produce outputs that are not only accurate but also understandable and actionable for end-users. Studies have shown that incorporating explainability into PdM systems significantly enhances user trust, adoption, and decision-making effectiveness [10]. Scholars increasingly argue that explanation should be treated not as a technical add-on but as a human-centered design problem, requiring consideration of users’ context, expertise, and cognitive demands [11,12]. This is especially relevant in PdM, where engineers must make time-critical decisions under uncertainty. The way explanations are presented can strongly influence how users engage with AI. For example, users develop mental models of an AI’s confidence score to decide when to trust its recommendations [13], highlighting that effective explanations can actively support human–AI decision-making. Importantly, the need for XAI is not universal but depends on context. In high-risk domains like PdM, interpretability is vital to ensure the safe and accountable use of AI [14]. Conversely, routine or low-stakes applications may not require the same level of transparency, and simpler models may suffice. As a result, many researchers advocate for a risk-based approach, focusing on explainability efforts where the consequences of AI decisions are most critical [15,16]. This targeted approach helps avoid unnecessary complexity in lower-risk scenarios, aligning with emerging regulatory frameworks such as the European Union Artificial Intelligence Act (EU AI Act) and the US Algorithmic Accountability Act, which both emphasize transparency in safety-critical domains [17].
However, merely providing explanations does not ensure user comprehension or trust. The inherently human-centric nature of explainability necessitates that explanations be customized to align with users’ backgrounds, objectives, and levels of expertise [18]. Furthermore, developers tend to craft techniques from a technical or developer-centric standpoint, often without incorporating input from end-users. This disconnect results in explanations that may be technically accurate but lack practical utility [11,19]. Such an orientation overlooks the fact that explanation is fundamentally a human-centered process, influenced by cognition, interaction, and sociotechnical context. Merely adding post hoc transparency to deployed models proves insufficient, particularly in critical decision-making scenarios [20]. Additionally, maintenance operators often demonstrate algorithm aversion when faced with opaque, excessively technical, or poorly aligned explanations [21]. This issue is especially significant in PdM, where decisions based on AI outputs have financial and safety repercussions. These challenges underscore the necessity for more interactive and user-inclusive approaches to explanation—approaches that recognize the human role not merely as passive recipients of AI outputs but as active participants in interpretation and decision-making.
To make AI more collaborative and context-sensitive, recent research has turned to Human-in-the-Loop (HITL) approaches in XAI [22]. HITL involves the active participation of human experts in the design, training, and validation of AI systems. By integrating human judgment with automated learning, HITL-XAI facilitates context-sensitive insights, personalized explanations, and iterative model refinement [19,23]. This direction aligns with the transition from Industry 4.0 to Industry 5.0, which emphasizes human-centric, resilient, and sustainable industrial practices [24]. In the PdM domain, this shift repositions operators from passive recipients of algorithmic output to active collaborators in decision-making [9]. HITL ensures that technology complements human expertise rather than replacing it by placing them at the core of AI systems [25].
Despite growing academic interest, practical implementation of HITL in PdM, especially regarding time-series sensor data and complex human–machine interactions, remains under-investigated. To synthesize evidence from the fragmented and interdisciplinary research on this topic, this study adopts a systematic literature review (SLR) methodology [26]. This review consolidates current knowledge and highlights critical gaps at the intersection of HITL, temporal interpretability, and industrial decision support within PdM. Although PdM depends heavily on temporal data, little research has examined how HITL-XAI systems preserve interpretability in dynamic conditions or how expert input is structured to enable meaningful collaboration. Consequently, there is limited understanding of the human-centric outcomes and participatory mechanisms vital for effective HITL-XAI deployment in PdM. To address this, this review investigates the following research questions (RQs):
  • RQ1: How do human-in-the-loop XAI techniques for predictive maintenance maintain interpretability while handling temporal data?
  • RQ2: What human-centric metrics are used to evaluate XAI’s effectiveness on maintenance decision-making?
  • RQ3: How do XAI systems involve maintenance experts in their design and use, particularly in addressing challenges to sustaining effective human–AI collaboration?
Previous literature reviews have primarily addressed XAI in broader contexts, such as manufacturing or cyber–physical systems [27,28], and to the best of the authors’ knowledge, none have comprehensively examined how HITL-XAI systems address the specific demands of temporal data and human involvement in PdM contexts. Therefore, this review aims to advance both academic and practical understanding of how HITL approaches can enhance the effectiveness and acceptability of XAI in PdM systems.
The remainder of this paper is structured as follows. Section 2 describes the review methodology, including selection criteria, data extraction, and quality assessment. Section 3 presents the findings, including descriptive and thematic analyses. Section 4 synthesizes these results and discusses their implications. Finally, Section 5 concludes this study and provides directions for future research.

2. Methods

This study adopts a systematic literature review (SLR) guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework to rigorously analyze existing research on HITL-XAI in PdM. Compared with narrative or scoping reviews, an SLR provides a more structured and methodologically robust approach to synthesizing evidence in emerging and conceptually fragmented fields, thereby facilitating the identification of research gaps, thematic trends, and future directions [26]. PRISMA enhances methodological transparency, mitigates bias, and supports reproducibility, making it superior to traditional narrative reviews, which often lack systematic rigor [29]. The PRISMA 2020 checklist (Table A1) and PRISMA Abstract checklist (Table A2) are presented in Appendix A. To ensure a comprehensive and replicable search strategy, three multidisciplinary databases—Scopus, ProQuest, and EBSCO—were selected. These databases were chosen to capture the interdisciplinary nature of HITL-XAI in PdM. The use of multiple databases also increased the likelihood of retrieving research that addressed both the technical and human-centric dimensions of interactive PdM systems [30]. However, all 30 articles retrieved from EBSCO were duplicates of records in Scopus or ProQuest and were therefore excluded.
Table 1 outlines the inclusion criteria used to identify studies relevant to the research questions. The search strategy combined terms related to PdM and XAI using Boolean operators. Pilot testing showed that including explicit terms like “HITL” or niche theoretical phrases significantly reduced the number of results. To maintain breadth, broader terms such as “interactive AI” were used instead. As a result, two broad search strings were finalized. The search was limited to academic journal articles published in English between 2019 and 2025, as relevant research began to emerge meaningfully only after 2018. Earlier publications were minimal and, upon screening, found to be conceptually or methodologically unrelated to this review. Notably, the search and data extraction were conducted on 23 February 2025; therefore, studies published after this date, even within the specified range, were not included.
The sample articles were extracted into a comma-separated values (CSV) file for screening and eligibility assessment via a structured spreadsheet. Metadata and key findings were also recorded in this format to support the descriptive analysis. Figure 3 illustrates the number of articles included and excluded at each stage. The initial search retrieved 2810 records, from which 400 duplicates and 745 non-journal publications (e.g., books, conference papers, editorials) were removed. After screening against the inclusion criteria, 66 articles were selected for full-text review. A total of 63 studies were ultimately included in the final analysis, with 3 excluded at this stage. Exclusion criteria were applied to articles that, despite their relevance to PdM or related domains, lacked a human-centric perspective or did not incorporate HITL mechanisms. The criterion of “relevance to PdM” was operationalized to include adjacent fields such as quality control or condition monitoring only when a clear link to maintenance objectives or decision-making was demonstrated.
Table 2 presents the predetermined codebook, developed iteratively through pilot testing and prior research, which guided the analysis using NVivo 14. NVivo supports transparency and rigor by documenting coding and analysis for credible and reproducible results [31]. The themes were categorized into three groups: XAI techniques, evaluation metrics, and HITL integration. In the context of an SLR, descriptive analysis and thematic analysis are two key qualitative methods used to synthesize and interpret findings across multiple studies, each serving a distinct purpose. Descriptive analysis offers a structured summary of the included studies, focusing on characteristics such as study design, sample size, context, and main findings [32]. Thematic analysis facilitates the identification and interpretation of deeper patterns and themes within qualitative data and is particularly valued for its flexibility and systematic approach in qualitative synthesis [33]. The following section presents the results of this review, incorporating both the descriptive and thematic analyses.

3. Findings of the Literature Review

3.1. Descriptive Analysis

Figure 4 presents the annual distribution of published articles, including those analyzed in this study. A clear upward trend in HITL-XAI research within PdM is evident, with the number of publications increasing from 4 in 2021 to a peak of 21 in 2024. The observed decline in publications for 2025 reflects data collected early in the year (see Methodology) and thus does not reflect the full year’s expected output. To address this temporal limitation, a dotted exponential trendline (Excel default) was applied to estimate 2025 publication volumes, illustrating the general upward trajectory of HITL-XAI research in PdM rather than serving as a precise prediction. This trend reflects increasing academic interest in interpretable and human-centered predictive maintenance systems. The 63 selected articles appeared in 45 different journals, with Institute of Electrical and Electronics Engineers (IEEE) Access (n = 5) and Sensors (n = 4) being the most represented. This spread implies a growing interdisciplinary interest spanning engineering, computing, and applied AI. Empirical methodologies dominated the reviewed studies, comprising 93.65% of the sample. These studies typically involved real-world deployments, case studies, or user evaluations. Only four studies (6.35%) adopted analytical approaches such as theoretical model formulation or simulation-based evaluation.
Figure 5 illustrates the reviewed literature domains and use cases. It spans a broad range of application areas, including energy systems, transportation, AI/machine learning (ML) systems, and healthcare, with industrial settings being the most frequently represented. Among the use cases, fault diagnosis emerges as the most prominent. Notably, trust-building appears in six studies, suggesting that some researchers recognize the need to examine user trust as a distinct focus alongside more technical use cases. This spread of domains and use cases reflects diverse operational contexts and user needs, which, in turn, influences how explainability is implemented and evaluated. Consequently, approaches to temporal modeling, explanation techniques, and evaluation strategies vary considerably across studies and are further discussed in the Thematic Analysis (Section 3.2).

3.2. Thematic Analysis

This section presents the thematic findings of the SLR that address the technical challenges, human-centric evaluation, and collaborative design in HITL-XAI systems for PdM. Using a predefined codebook and inductive coding, the thematic analysis generated five cross-cutting themes: (1) Model Interpretability in Practice, (2) Evolutions and Limitations of XAI Methods, (3) Trust Dynamics and Human Reliance on XAI, (4) Collaborative Design and Human–AI Interaction, and (5) Factors Influencing the Efficacy of Explanations. Table 3 summarizes the five thematic categories identified during analysis, each accompanied by two key findings. These findings are discussed in detail in the following subsections.
The technical landscape of XAI implementation, along with its developments and limitations, is discussed in Section 3.2.1 and Section 3.2.2. The subsequent sections (Section 3.2.3 and Section 3.2.5) address RQ2 by synthesizing human-centric evaluation metrics and examining how various factors influence the usability of explanations. To address RQ3, Section 3.2.4 explores how different stakeholders shape XAI systems and how this, in turn, affects their adoption within the industry.

3.2.1. Model Interpretability in Practice

Interpretability methods ranged from graphical heatmaps [7,34,35,36] and plots [37,38,39,40] to feature importance visualizations [20,41,42,43], web interfaces [5,19,44,45], and dashboards [18,46,47]. Some studies introduced designs like assistants based on large language models (LLMs) [48], flower glyphs [20,38], and explanation levelling [49]. Heatmaps and feature importance visualizations remain popular for their simplicity and effectiveness, while textual explanations are gaining traction, especially in interactive systems. The variety of explanation modalities reflects the differing needs across domains: industrial users prefer visual formats with alerts and timelines, while healthcare and infrastructure sectors lean toward textual or rule-based explanations.
Additionally, platforms varied from purpose-built research software [50,51] to widely adopted visualization platforms [52,53,54,55], reflecting a shift toward both custom and interpretable systems. Explanations deployed via dashboards, digital twins, or semantic reasoners were only effective when the interface aligned with domain workflows.
Temporal adaptation methods are essential for handling time-series data, as they directly influence model transparency. For instance, temporal signals were enhanced using lag features and segmentation [56] and a Fast-Fourier Transform (FFT)-based approach was aligned with known fault frequencies in bearings [36], allowing explanations to match with familiar degradation patterns. This alignment between temporal structure and user expectations was especially evident in manufacturing, rotating machinery, and healthcare domains [4,57].
Many studies use windowing techniques (fixed, sliding, adaptive) alongside normalization, forward filling, or duplicate removal for consistency [4,40,58,59]. Additionally, frequency-domain transformations [36,60,61] and shapelet extraction [35] capture stable temporal patterns. These methods improve interpretability by aligning explanations with operator reasoning on fault patterns, crucial for transparent decision-making.
Furthermore, techniques such as semantic augmentation [46], lag feature construction [56], and signal transformation [62] demonstrate the sophistication of PdM pre-processing pipelines. These methods are tailored to domain needs and are highlighted as closely linked to modeling choices and application contexts.
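The sketch below illustrates, under assumed parameter choices (the window length, overlap, lag offsets, and 120 Hz fault band are not drawn from any reviewed study), how sliding windows, FFT band energy, and lag features of the kind cited above can be computed with NumPy.

```python
import numpy as np

def sliding_windows(signal: np.ndarray, length: int, step: int) -> np.ndarray:
    """Split a 1-D sensor signal into overlapping fixed-length windows."""
    n = (len(signal) - length) // step + 1
    return np.stack([signal[i * step : i * step + length] for i in range(n)])

def fft_band_energy(window: np.ndarray, fs: float, band: tuple[float, float]) -> float:
    """Energy in a frequency band, e.g., around a known bearing fault frequency."""
    spectrum = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(np.sum(spectrum[mask] ** 2))

def lag_features(window: np.ndarray, lags: tuple[int, ...] = (1, 5, 10)) -> np.ndarray:
    """Simple lagged values relative to the most recent sample, for temporal context."""
    return np.array([window[-1 - lag] for lag in lags])

# Example: 10 kHz vibration signal, 1 s windows with 50% overlap,
# band energy around an assumed 120 Hz fault frequency.
fs = 10_000
signal = np.random.randn(fs * 10)
windows = sliding_windows(signal, length=fs, step=fs // 2)
features = np.array(
    [[fft_band_energy(w, fs, (110, 130)), *lag_features(w)] for w in windows]
)
```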
  • Finding 1: Temporal adaptation strategies enhance transparency in PdM workflows.
SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIMEs) are among the most widely adopted XAI methods, often praised for their model-agnostic design, theoretical grounding, and ability to provide local feature importance explanations [20,38]. For example, a LIME was applied to improve model transparency and foster trust among engineers [25], highlighting its usefulness in decision-making tasks. Similarly, in the context of industrial anomaly detection and failure prediction, both methods have been described as effective in identifying key features [63].
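As a minimal illustration of how such local attributions are typically produced, the sketch below applies TreeSHAP to a tree-based remaining-useful-life regressor trained on synthetic windowed sensor features; the data, feature names, and model choice are assumptions for illustration only, not taken from any reviewed study.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for windowed sensor features and a remaining-useful-life target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 100 - 20 * X[:, 0] - 10 * X[:, 2] + rng.normal(scale=5, size=500)
feature_names = ["vib_rms", "vib_peak", "temp_mean", "temp_slope"]

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Local, per-prediction feature attributions via TreeSHAP.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])   # shape: (1, n_features)
for name, value in zip(feature_names, shap_values[0]):
    print(f"{name}: {value:+.2f}")           # contribution to this one prediction
```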
However, despite their popularity, SHAP and LIMEs have faced increasing criticism for their limitations in dynamic or temporal contexts. Many general-purpose XAI methods have been criticized for failing to address complex, non-linear temporal dependencies [64]. Rather than simply identifying important variables, it is crucial to understand how variables from different time points influence model predictions. This critique was reinforced by observations that these methods are “often insufficient and continue to encounter challenges with generalization due to their inherent limitations” [45]. In addition, SHAP and LIMEs were further criticized for producing ad hoc explanations that lack domain relevance and fail to account for data uncertainties [65].
Several empirical studies underscore these limitations. The failure of LIMEs to effectively explain a gated recurrent unit-based prediction in water pump failure detection was reported [63], while SHAP’s difficulties in handling interdependent temporal features were highlighted separately [56]. Moreover, SHAP’s computational inefficiency with Long Short-Term Memory (LSTM) models when compared to tree-based methods has been noted [58]. In contrast, alternative approaches such as rule extraction [66] and counterfactual tables [67] were considered more actionable and easier to validate, particularly in smart factory settings.
  • Finding 2: Declining reliance on SHAP/LIMEs in temporal domains highlights the need for specialized XAI methods.
These findings indicate that in complex, temporal, or highly interconnected domains, the foundational assumptions of SHAP and LIMEs limit their effectiveness. Consequently, there is a growing shift toward specialized XAI techniques better aligned with time-series data and industrial decision-making. Studies have explicitly called for approaches that address temporal dependencies and promote domain-relevant interpretability to support this shift [56,64,66].

3.2.2. Evolutions and Limitations of XAI Methods

Many studies demonstrated that interpretable models such as Random Forest (RF), rule-based classifiers, fuzzy logic systems, and neuro-symbolic architectures can perform comparably, or even outperform, deep learning “black-box” models. For instance, an RF model was integrated to predict customer sentiment escalation, consistently outperforming Extreme Gradient Boosting (XGBoost) and LSTM [58], while neuro-symbolic nodes (NSNs) were employed for fault diagnosis and achieved 96% accuracy [61], both approaches being interpretable and transparent. Similarly, a Deep Expert Network (DEN) combining symbolic AI with neural networks was developed to enable engineers to audit decision routes, outperforming Residual Network (ResNet) and Weighted k-Nearest Neighbors (WKN) in terms of generalization and noise resilience [68]. In another study, fuzzy-rule-based classifiers were applied across five real-world binary classification datasets, with accuracy scores reaching up to 98.99% (with 0.84 standard deviation) and area under the curve (AUC) scores up to 0.9841 (with 0.01 standard deviation) for the Autism dataset. The reported standard deviations indicate the variation observed over 30 repetitions of the experiments performed on each dataset [57]. These models are favored for their robustness in imbalanced settings and resistance to overfitting.
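The sketch below illustrates the point with scikit-learn: on synthetic fault data, a shallow decision tree can match a random forest while exposing its decision logic as readable if-then rules. The data, feature names, and depth limit are illustrative assumptions, not values from the cited studies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
y = ((X[:, 0] > 0.8) & (X[:, 2] > 0.3)).astype(int)   # toy fault label
names = ["vib_rms", "temp_mean", "pressure_drop"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

rf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)
tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_tr, y_tr)

# On well-structured data the transparent model is competitive ...
print("RF accuracy:  ", rf.score(X_te, y_te))
print("Tree accuracy:", tree.score(X_te, y_te))
# ... and its decision logic can be audited directly as explicit rules.
print(export_text(tree, feature_names=names))
```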
  • Finding 3: Inherently interpretable models remain competitive across domains.
The selected studies spanning across five sectors highlight the diverse operational contexts of HITL-XAI. High-risk industries like aviation and nuclear engineering prioritized interpretable or hybrid deep learning models to support human oversight [5,8,55,64], whereas manufacturing sectors focused more on fault detection [69,70,71], predictive maintenance [38,51,62], and process optimization [25,50,53].
Additionally, other domains emphasized regulatory compliance (e.g., General Data Protection Regulation) [22] and user-friendly visualizations [57]. For example, in viscose fiber production, causal discovery algorithms aligned with process engineers’ workflows [56], while aerospace maintenance leveraged natural language interfaces for troubleshooting [55]. This domain specificity underscores the need for tailored XAI solutions that address unique operational risks and user expertise levels.
In addition, customizing explanations to domain needs increased model effectiveness and reduced false alarms. Gradient-weighted Class Activation Mapping (Grad-CAM) was integrated into a Convolutional Neural Network (CNN) for drill bit wear detection, achieving 91.8% accuracy while providing engineers with frequency-based attention maps [60]. In the context of reservoir management, Local Rule-Based Explanations (LUXs) and Anchor were applied to derive high-fidelity, concise rules, reducing water production by up to 69% and facilitating AI adoption in high-stakes environments [72].
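As a rough sketch of the Grad-CAM idea referenced above (not the cited study's implementation), the PyTorch snippet below computes a class activation map for a toy CNN over a spectrogram-like input; the architecture, layer choice, and input shape are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Small stand-in for a wear-detection CNN; layer layout is illustrative."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.classifier = nn.Linear(16 * 8 * 8, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def grad_cam(model: nn.Module, x: torch.Tensor, target_class: int) -> torch.Tensor:
    """Gradient-weighted class activation map over the last conv layer's feature maps."""
    activations, gradients = [], []
    layer = model.features[2]   # last Conv2d layer
    h1 = layer.register_forward_hook(lambda m, i, o: activations.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: gradients.append(go[0]))
    try:
        logits = model(x)
        logits[0, target_class].backward()
    finally:
        h1.remove()
        h2.remove()
    acts, grads = activations[0], gradients[0]                 # (1, C, H, W)
    weights = grads.mean(dim=(2, 3), keepdim=True)             # per-channel importance
    cam = torch.relu((weights * acts).sum(dim=1)).squeeze(0)   # (H, W)
    return cam / (cam.max() + 1e-8)

model = TinyCNN().eval()
spectrogram = torch.randn(1, 1, 64, 64)          # fake 1-channel vibration spectrogram
heatmap = grad_cam(model, spectrogram, target_class=1)
print(heatmap.shape)                             # (64, 64) time-frequency attention map
```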
  • Finding 4: Domain-specific XAI improves adaptation and usability but requires further validation in industrial settings.

3.2.3. Trust Dynamics and Human Reliance on XAI

Several studies revealed the contradictory effects of explanations: over-reliance emerged when users accepted seemingly plausible outputs without critical evaluation [13,22], while under-reliance occurred when explanations were misaligned with users’ intuition or lacked contextual grounding [5,49,73]. Furthermore, global explanations often overwhelmed non-expert users [18], while local ones lacked generalizability. Additionally, users often mistrust overly simplistic models due to complexity-competence bias [21].
Trust in XAI systems proved to be context-dependent and fragile, influenced by explanation design and user expertise. Both insufficient and excessive transparency can erode agency, especially when AI logic and system physics are opaque (“double black box”), and hybrid intelligence has been highlighted as a way to sustain adaptability [74]. The same authors argue that interpretability is needed primarily by expert users. Finally, visual explanations aligned with technician reasoning (e.g., FFT envelopes, moving averages) were found to enhance trust [36,75].
  • Finding 5: Explanation design can both build and erode trust.
It has been emphasized that XAI must cater to diverse users, ranging from frontline operators to technical experts [62]. The need for user studies to align XAI systems with stakeholder needs has been highlighted [51], where industrial operators prioritize actionable insights and data scientists require detailed technical outputs [20]. Experienced users often favor transparency and control [6], while field specialists, such as engineers, reject anthropomorphic interfaces in favor of abstract formats [18]. Variation in comprehension also appears among data scientists within the same company, despite agreement that the explanations were clear [19]. Moreover, some end-users prioritize model accuracy over technical transparency, trusting developers to manage complexity [18]. Critically, explanations must also address role-specific anxieties; for instance, decision-makers are likely to reject new systems unless the justifications clarify workforce impacts and resource trade-offs [9].
  • Finding 6: Domain expectations and role shape explanation usability.

3.2.4. Collaborative Design and Human–AI Interaction

Effective human–AI collaboration hinges on role-specific interfaces and iterative feedback mechanisms. Interactive systems that allow for annotation, overrides, or threshold adjustments enhance engagement by fostering users’ psychological ownership [76]. For instance, approval–rejection workflows were implemented in hydraulic systems [51], while field engineers were enabled to refine edge AI predictions via mobile apps, closing the loop between real-world context and model outputs [77]. Tools like Grafana dashboards [63] and Unity-based digital twins [55] further empower users to validate predictions and refine models in real time. However, sustaining user input remains a challenge; engagement was found to drop by 60% post-training unless systems demonstrated clear personal utility [76]. To address this, simplified annotation processes and transparent communication of uncertainty have been shown to help bridge the gap, particularly for non-expert users [37,44,78]. In line with this, XAI outputs were shown to help small and medium-sized enterprise (SME) managers calibrate deep learning insights with human judgment, ensuring decisions remained contextually and ethically appropriate [79]. Together, these findings highlight the essential role of continuous feedback in maintaining trust and human oversight.
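A minimal sketch of such a feedback loop is given below: engineers validate or reject alerts, and the alert threshold is recalibrated when recent feedback indicates mostly false alarms. The class, thresholds, and recalibration rule are illustrative assumptions, not a mechanism reported in the reviewed studies.

```python
from dataclasses import dataclass, field

@dataclass
class AlertFeedbackLoop:
    """Sketch of a human-in-the-loop annotation/override cycle (names are illustrative)."""
    threshold: float = 3.5                       # current vibration alert threshold (g)
    feedback: list[tuple[float, bool]] = field(default_factory=list)

    def record(self, reading: float, confirmed_fault: bool) -> None:
        """Engineer validates (True) or rejects (False) a triggered alert."""
        self.feedback.append((reading, confirmed_fault))

    def recalibrate(self, min_samples: int = 20, step: float = 0.1) -> float:
        """Nudge the threshold up when most recent alerts were rejected as false alarms."""
        recent = self.feedback[-min_samples:]
        if len(recent) >= min_samples:
            false_alarm_rate = sum(not ok for _, ok in recent) / len(recent)
            if false_alarm_rate > 0.5:
                self.threshold += step
        return self.threshold

loop = AlertFeedbackLoop()
for reading in [3.6, 3.7, 3.9] * 7:               # 21 triggered alerts
    loop.record(reading, confirmed_fault=False)   # all rejected by the engineer
print(loop.recalibrate())                         # threshold nudged upward (approx. 3.6)
```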
  • Finding 7: Feedback loops improve both model calibration and user understanding.
Systems built with domain expert input, such as rule-based logic in steel production [52] or flexible threshold setting in utilities [5], showed higher usability and trust by aligning with workflow priorities. For example, SHAP-based visualizations were integrated into an iterative feature selection process to foster collaboration between data scientists and experts while maintaining human oversight for regulatory compliance [20]. Similarly, machining experts were able to adjust model thresholds and validate outputs without requiring ML expertise, ensuring alignment with physical process knowledge [42]. Co-design processes also shape explanation formats: experts often favor tabular displays over charts for usability [53], and frameworks based on the Observe–Orient–Decide–Act (OODA) loop have been used to tailor explanations to specific roles (e.g., operators vs. engineers), improving fault detection in hydraulic systems [62]. However, overly technical methods such as feature attribution risk alienating non-experts unless paired with intuitive interfaces [22]. Feedback-driven dashboards have also been shown to reduce false positives by 25% but highlight tensions between automation and expert control [47].
  • Finding 8: Domain-specific co-design increases system adoption.
Collectively, these findings stress the importance of adaptable platforms that balance transparency with workflow integration. Co-design efforts using a participatory approach for auto-response agents have demonstrated how user-aligned explanations and real-time feedback can boost user confidence, though challenges such as privacy concerns and engagement sustainability persist (e.g., an agent explaining unavailability with context-specific messages such as “she’s in a silent environment” or “the phone is in my pocket” as more data are gathered) [54]. These auto-responses incorporate contextual information about the user’s unavailability to improve the situational awareness of their contacts.

3.2.5. Factors Influencing the Efficacy of Explanations in Decision-Making

While most studies employed dual-evaluation strategies, namely, technical metrics (e.g., accuracy, F1-score, fault recall) and human-centric measures (e.g., Likert scales, task time, trust scores), explanations often failed to directly improve decision accuracy. In 70% of cases, no gains in decision accuracy were observed; instead, benefits such as reduced task time or improved trust calibration emerged [49,51,80]. Exceptions emerged in error-focused interfaces [13] and causal reasoning systems [81], where explanations helped users diagnose and correct model mistakes. Hybrid metrics such as the “pragmatism score” have helped bridge the gap by linking explanation clarity to actionable outcomes [65], underscoring the need for evaluation frameworks that harmonize algorithmic performance with human comprehension.
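The sketch below illustrates what such a dual-evaluation setup can look like in code, pairing technical metrics with per-participant human-centric measures; the data, session fields, and aggregation choices are illustrative assumptions.

```python
import statistics
from sklearn.metrics import accuracy_score, f1_score

# Technical side: model quality on a held-out fault-detection set (toy labels).
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]
technical = {"accuracy": accuracy_score(y_true, y_pred),
             "f1": f1_score(y_true, y_pred)}

# Human-centric side: per-participant task time (s), 1-5 trust Likert score,
# and whether the maintenance decision made with the explanation was correct.
sessions = [
    {"task_time_s": 95, "trust_likert": 4, "decision_correct": True},
    {"task_time_s": 140, "trust_likert": 3, "decision_correct": False},
    {"task_time_s": 80, "trust_likert": 5, "decision_correct": True},
]
human = {
    "mean_task_time_s": statistics.mean(s["task_time_s"] for s in sessions),
    "mean_trust": statistics.mean(s["trust_likert"] for s in sessions),
    "decision_accuracy": statistics.mean(s["decision_correct"] for s in sessions),
}
print(technical, human)
```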
  • Finding 9: Explanation quality does not guarantee better outcomes.
Explanation design significantly affects usability. Simplified visualizations, such as glyph-based charts [77] or t-distributed Stochastic Neighbor Embedding (t-SNE) overlays [60,62], aided decision-making under cognitive load, while dense technical plots overwhelmed users in low-bandwidth environments [21]. Similarly, causal chains [56] and contrastive “what-if” tables [67] improved actionability by highlighting root causes and alternative scenarios, particularly in safety-critical domains like robotics and drilling.
  • Finding 10: Modality and structure of explanations significantly affect user comprehension.

4. Discussion

4.1. Answering Research Questions

To investigate the research questions guiding this review, the following discussion synthesizes key findings, linking them to theoretical and practical considerations in HITL-XAI for PdM. The subsections are organized thematically around interpretability in temporal data, human-centric evaluation metrics, collaborative system design, and trust dynamics, providing a holistic view of how XAI methods align with the unique demands of the PdM domain.

4.1.1. Interpretability in Temporal Data

The findings reveal critical insights into how XAI techniques preserve interpretability while handling temporal data in PdM. Finding 1 highlights that temporal adaptation strategies such as attention mechanisms in recurrent models or dynamic rule-based systems can enhance transparency by aligning explanations with the sequential nature of sensor data and equipment degradation patterns [36,46,52,60]. This demonstrates that tailoring interpretability mechanisms to the data structure can support both high model performance and explainability.
Finding 2 and Finding 3 further support this insight by presenting the declining relevance of post hoc tools like SHAP and LIMEs in PdM. Although useful for debugging, these methods often fail to capture temporal dependencies and domain-specific reasoning [45,64]. This suggests a need for specialized XAI approaches that align with PdM’s temporal nature, such as neural-symbolic integrations with updateable expert knowledge or dynamic explanation generation that evolves with each model update [56,59]. Nonetheless, some studies continue to support SHAP and LIMEs despite their limitations [8,20,38], indicating their potential utility in simpler use cases or as complementary tools. This demonstrates the need for further comparative research to define the appropriate boundaries for using post hoc versus embedded interpretability methods in PdM contexts.
Additionally, interpretable-by-design models, such as decision trees and hybrid neuro-symbolic systems, remain competitive and are increasingly preferred over black-box alternatives [55,65,68]. Contrary to the assumption that black-box models guarantee superior performance, recent evidence shows that domain-specific transparency can be achieved without compromising accuracy [70]. The trend toward interpretable architecture suggests that transparency in PdM should be embedded in the model itself rather than applied retrospectively.
However, Finding 4 cautions that even domain-specific XAI methods must be rigorously validated in real-world settings to ensure their practical effectiveness [44,53]. Simulated or controlled environments often overlook industrial challenges such as noisy data, time constraints, and high-stakes decision-making [13,49]. This finding directly addresses RQ1 by emphasizing that future XAI for PdM must move beyond algorithmic innovation toward industrial robustness.

4.1.2. Human-Centric Metrics for Evaluating XAI Effectiveness

Finding 9 and Finding 10 address RQ2 and reveal a critical gap between technical explanation quality and actual decision-making outcomes. While traditional XAI metrics such as explanation fidelity (e.g., feature importance accuracy) are commonly used, they do not necessarily correlate with improved human decisions [34,80]. As Finding 9 notes, high-fidelity explanations often fail to account for usability or cognitive burden.
Finding 10 further adds that the design of explanations, such as modality (textual vs. visual), structure (local vs. global), and context awareness, significantly affects user comprehension and trust [22,60]. Visualizing prioritized anomalies in time-series data allowed operators to detect 85–95% of attacks by monitoring only the top 4–7% of sensors [35]. This drastically improves efficiency, especially in time-critical contexts.
Nonetheless, these performance metrics can be inflated in simulated environments [21]. As highlighted in Finding 4, trust levels observed in non-expert participants under experimental conditions appear to not reflect the skepticism and contextual judgment exercised by domain experts in real-world settings [51,74]. This discrepancy underlines the need for context-aware and role-sensitive evaluation frameworks [18].
Ultimately, this synthesis answers RQ2 by advocating for a multidimensional evaluation approach that includes task performance, cognitive load, and user satisfaction [53]. Metrics must be calibrated to reflect the diverse roles and cognitive demands of PdM stakeholders, from frontline operators to strategic managers.

4.1.3. Collaborative Design for Sustaining Human–AI Collaboration

Finding 7 and Finding 8 collectively answer RQ3 by identifying key enablers of sustained human–AI collaboration in XAI systems. Two critical pillars emerge: domain-specific co-design and iterative feedback loops. Finding 8 highlights the importance of involving maintenance practitioners early in the system development process. Case studies involving participatory design workshops with engineers report higher adoption rates, as explanations are grounded in the users’ terminology and real-world workflows [72].
Furthermore, Finding 7 complements this by showing how iterative feedback loops not only enhance model calibration but also foster user learning and trust. These loops enable the development of shared mental models between users and systems, which are essential for sustaining human–AI alignment over time [51,52]. However, these mechanisms also reveal several challenges. For instance, without tangible incentives, such as demonstrating how user feedback improves predictions, user participation tends to decline [76]. This suggests that fully manual HITL configurations are unlikely to be sustainable in the long term.
Another critical insight is that the effectiveness of HITL frameworks is highly contingent on domain expertise, as well as the quality of the model and training data [7,52]. Domain experts play a vital role in both interpreting system output and guiding model refinement. Yet, too much transparency, such as overloading users with technical details, can lead to conflicting interpretations between different stakeholders and eventually undermine both usability and knowledge transfer [22].
To address these challenges, several studies propose adaptive interface designs, including role-based interfaces [18,44] and explanations with adjustable depth [5]. These allow engineers to access detailed technical information while offering managers a higher-level strategic summary. Hybrid interaction strategies, such as combining automated alerts with optional deep-dive explanations, appear promising in resolving interpretability disputes [71].
Collectively, these insights imply that sustainable HITL-XAI systems should balance semi-automation with human oversight, domain expertise, and context-sensitive design. Periodic expert reviews and adaptive feedback mechanisms are essential to maintaining alignment between evolving system behavior and user expectations. Further research is needed to explore the long-term dynamics of HITL engagement, as current studies often lack longitudinal data on participation dynamics and system evolution.

4.1.4. Trust Dynamics in PdM

Although trust was not the explicit focus of any research question, it emerged as a cross-cutting theme throughout this review. Finding 5 reveals that explanation design significantly influences trust. For instance, non-experts demonstrate a tendency to over-trust AI outputs [80], while domain experts tend to distrust overly simplistic or overly confident predictions [21,56]. In high-stakes scenarios, operators preferred explanations that explicitly acknowledge uncertainty, which helped calibrate their reliance on the system rather than simply maximize it [44]. Trust calibration refers to aligning the user’s level of trust with the actual reliability and limitations of the system [80]. This is important because both under-trust and over-trust can be equally harmful. Effective explanations therefore support users in forming an appropriate degree of trust so that reliance is proportionate to the system’s demonstrated performance [8].
Finding 6 reinforces that trust is shaped by user roles and domain expectations in explanation usability. Experts often reject generic, one-size-fits-all explanations and instead prefer context-aware outputs [65]. In multi-sensor environments, for example, they value explanations that filter irrelevant information and emphasize actionable insights [22]. This role-specific preference becomes even more apparent in industrial settings, where different stakeholders prioritize different types of information [18]. Operators in time-sensitive roles often prefer concise, actionable local explanations [64], whereas managers require global insights to support long-term planning [9,45]. These distinctions highlight the importance of tailoring explanations to user roles, decision-making contexts, and cognitive demands to preserve trust and usability.
Based on the evidence, it is reasonable to infer that trust is dynamic and must be managed through context-aware and skepticism-preserving explanation strategies [5,76]. Effective XAI systems for PdM should surface limitations, conflicting evidence, and potential data biases rather than aim for false certainty [46,57]. Trust must be earned through transparency that evolves with user expertise and task demands [53]. Our findings echo the view that HITL-XAI systems must be tailored to stakeholder-specific explainability needs and usage contexts to avoid cognitive overload and misplaced trust [11].

4.2. Use Cases of Generative AI in HITL-XAI from the Literature

The integration of generative AI (GenAI) into PdM systems, particularly within HITL-XAI contexts, has recently emerged as a promising yet underexplored research direction. Among the reviewed studies, only a few explicitly incorporate GenAI components in their use cases [44,48,77]. These works highlight the potential of LLMs and large vision–language models (LVLMs) to enhance human-centered, explanation-driven decision-making in industrial environments.
One study proposed an explainable Edge AI (XEdgeAI), a modular framework for low-resource visual quality inspection that employs LVLMs to generate both local and global explanations [77]. Although their use case is not specifically designed for PdM, it addresses challenges common to decentralized PdM settings such as resource constraints, expert feedback integration, and non-stationary machine behaviors. By combining Grad-CAM with generative vision–language models, XEdgeAI offers accessible explanations through interactive interfaces, complemented by quantifiable interpretability metrics.
Another research team developed a cognitive assistance framework for power grid maintenance that integrates digital twins, sensor networks, and LLMs [48]. In this context, GenAI serves as a semantic translator, converting complex condition-monitoring data into actionable work orders, thereby addressing workforce shortages and operational overloads. Their emphasis on feature-level explainability and interactive decision support aligns closely with HITL-XAI principles by fostering human–AI collaboration in high-stakes industrial scenarios. Both studies exemplify the ongoing transition from static interpretability toward conversational, generative explanation modalities that support mutual learning between operators and AI systems.
Extending these efforts, a conversational decision support system was proposed for high-voltage energy infrastructure using a Retrieval-Augmented Generation (RAG) approach combining an LLM with a domain-specific knowledge graph [44]. Although formal XAI techniques are identified as future work, the current system incorporates key HITL-XAI elements, including interactive explanation, contextualized reasoning, and user feedback integration via a natural user interface. The framework supports real-time operator decision-making and emphasizes socio-technical considerations, including user trust and interface design. The study, therefore, represents a transitional yet relevant example of generative HITL-XAI for PdM and bridges current decision support needs with emerging explainability goals.
Despite recent advancements, these approaches remain limited in scope, often semi-supervised and constrained by issues such as data drift, trust calibration, and the lack of adaptive user modeling. Future research should address these gaps through domain-aware generation techniques, real-time feedback architectures, and robust human–AI interaction design to enable more effective deployment in unstructured, high-stakes environments.

4.3. Illustrative Case: Integrating GenAI and HITL-XAI into PdM

To bridge the theoretical insights of this review with practical application, this section introduces a forward-looking conceptual framework in the form of an illustrative case study. This proposal builds on this review’s findings to envision potential future directions, rather than representing a direct result of the reviewed literature. The case demonstrates how the principles discussed in the preceding sections could be operationalized by conceptually enhancing a company’s PdM system to incorporate interactive HITL, XAI, and GenAI capabilities. The framework aims to illustrate how these components can be meaningfully integrated into an operational environment to support informed and trustworthy decision-making.
The first enhancement focuses on the explainability layer. Traditional PdM dashboards often rely on basic visualizations and threshold-based alerts, which may lack sufficient context for users to determine when action is required or whether an alert is a false alarm. In the enhanced system, XAI components generate context-aware explanations by highlighting relevant sensor patterns, contributory features, and affected time windows. These visual explanations help bridge the gap between raw model outputs and human reasoning. Figure 6 illustrates an example of an augmented PdM dashboard, where the grey text box presents the current alert message and the green text box shows a context-enriched message generated by the XAI features. Alerts are now accompanied by more informative descriptions, helping engineers understand the rationale behind each triggered event.
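A minimal sketch of how such a context-enriched message could be assembled from an alert, feature attributions, and event history is shown below; the function, field names, values, and wording are hypothetical and serve only to illustrate the contrast with the generic alert.

```python
def enriched_alert(sensor: str, value: float, threshold: float,
                   top_features: list[tuple[str, float]],
                   similar_past_events: int) -> str:
    """Turn a bare threshold breach into a context-enriched alert message.
    All field names and phrasing are illustrative, not taken from any deployed system."""
    drivers = ", ".join(f"{name} ({weight:+.2f})" for name, weight in top_features)
    return (
        f"{sensor} at {value:.1f} exceeds threshold {threshold:.1f}. "
        f"Main contributing features: {drivers}. "
        f"{similar_past_events} similar events recorded on this asset in the last 90 days; "
        f"review asset condition before the next shift."
    )

print(enriched_alert("Vibration", 4.2, 3.5,
                     top_features=[("vib_rms", 0.61), ("temp_slope", 0.22)],
                     similar_past_events=3))
```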
The second component introduces HITL features that allow users to interact with the system. Engineers can validate, reject, or annotate flagged anomalies via the dashboard, creating a feedback loop that progressively refines the model’s detection logic. This process supports adaptive learning and reflects real-world practices in which expert feedback is essential for managing edge cases and evolving machine behavior. Explanation effectiveness is moderated by user expertise: senior engineers often prefer technical summaries for deeper engagement and model refinement, while novice users benefit from simplified, structured explanations. This highlights the need for role-based interaction privileges to safeguard model quality from potentially inaccurate feedback.
To enhance explanation accessibility and enable natural language interaction, a GenAI-powered assistant is embedded within the system. This assistant, built on an LLM, serves as a semantic bridge between users and the XAI layer. When an alert occurs, XAI generates a narrative explanation contextualized by historical patterns and machine-specific information. Users can then query the assistant in natural language to request clarifications, explore relevant history, or suggest refinements to the model. Figure 7 illustrates a prototype chatbot interface in which an engineer interacts with the GenAI assistant, questioning an alert or adding contextual knowledge, both of which can feed into ongoing model adaptation.
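The sketch below illustrates one way the assistant's context could be assembled for a single query; `build_assistant_prompt` and `call_llm` are hypothetical placeholders (no specific LLM API is assumed), and all alert, history, and question strings are invented for illustration.

```python
def build_assistant_prompt(alert: str, xai_summary: str,
                           history_snippets: list[str], user_question: str) -> str:
    """Assemble the context handed to the GenAI assistant for one engineer query."""
    history = "\n".join(f"- {h}" for h in history_snippets)
    return (
        "You are a maintenance decision-support assistant.\n"
        f"Current alert: {alert}\n"
        f"XAI explanation: {xai_summary}\n"
        f"Relevant maintenance history:\n{history}\n"
        f"Engineer's question: {user_question}\n"
        "Answer concisely, state uncertainty explicitly, and cite which "
        "sensor evidence supports each claim."
    )

def call_llm(prompt: str) -> str:   # hypothetical placeholder, not a real API
    raise NotImplementedError("Wire this to the deployed LLM service.")

prompt = build_assistant_prompt(
    alert="Vibration 4.2 g exceeds the 3.5 g threshold on Pump-07",
    xai_summary="vib_rms and temp_slope are the top contributing features",
    history_snippets=["Bearing replaced three months ago",
                      "Similar alert cleared as a false alarm last month"],
    user_question="Is this likely a repeat of last month's false alarm?",
)
# response = call_llm(prompt)   # would return the assistant's grounded answer
```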
This case study demonstrates the practical feasibility of integrating XAI, HITL features, and GenAI into a unified PdM system. It illustrates how generative models can transform raw sensor data into contextualized, actionable insights, thereby significantly enhancing interpretability, adaptability, and user engagement. However, alongside these benefits, the use of GenAI introduces notable risks that require careful management. These include the generation of biased or factually inaccurate outputs, arising from either data drift (due to outdated or unrepresentative training data) [44] or hallucinations (plausible-sounding but incorrect content generated by the model) [48]. If not adequately mitigated, these risks can reduce user trust and compromise decision-making, particularly in safety-critical contexts.
Furthermore, challenges persist in ensuring the robustness of explanations and in seamlessly aligning these technologies with established human and organizational workflows. Consequently, future research should prioritize real-time deployments and in situ evaluations to rigorously assess system reliability and usability under practical conditions and across varying levels of user expertise.

4.4. Theoretical Contributions

This study contributes theoretically by identifying critical research gaps at the intersection of HITL-XAI, GenAI, and PdM. It highlights the underexplored relationship between GenAI and explanation quality, particularly in how these factors jointly influence decision-making effectiveness in industrial maintenance. While GenAI is increasingly proposed as a solution for enhancing explanation usability, its actual influence on interpretability, trust, and decision outcomes remains insufficiently studied in PdM settings. Further, the analysis reveals that most existing studies prioritize technical performance metrics and neglect how explanations are interpreted and acted upon by different user groups [9,51]. Evaluation frameworks often rely on simulated or survey-based studies that do not fully capture the high-stakes cognitive and contextual complexities of PdM decisions, where errors can have costly consequences [18].
A notable theoretical gap is the lack of longitudinal research investigating user engagement in HITL systems over time. Most studies assess one-time feedback without accounting for how user involvement evolves or diminishes, nor do they explore semi-automated annotation mechanisms to reduce user fatigue and maintain annotation quality [76]. Addressing this gap would deepen the understanding of sustainable human–AI collaboration in PdM systems. Finally, this study reveals a disconnect between positive user feedback reported in controlled studies and actual adoption of XAI systems in operational environments. This suggests the need for research that includes additional stakeholders, such as maintenance planners or decision-makers, to better evaluate the organizational factors influencing XAI system adoption.

4.5. Practical Implications

From a practical perspective, this review offers several key insights for the design, evaluation, and deployment of HITL-XAI and GenAI systems in PdM. First, it is imperative to shift the focus of evaluation from purely technical performance metrics toward usability-centered criteria that are aligned with operator needs and real-world decision-making. This study encourages developers and system integrators to embed human-centered design principles that prioritize explanation clarity, trust, and the delivery of actionable insight within existing operational workflows. Second, for future implementation, researchers should consider adopting participatory co-design methods during system development. This includes integrating modular interfaces that can accommodate diverse user roles and prioritize deployments in settings with access to real-time sensor data and established operational feedback loops. Establishing long-term, scoped field trials is essential for evaluating system performance over time under realistic constraints.
Finally, this review cautions against solely relying on controlled or simulated environments for evaluating XAI tools intended for high-risk industrial settings. Practitioners are urged to validate system performance in live or high-fidelity environments that accurately reflect operational constraints and consequences. This shift is critical for building user trust and ensuring that XAI tools provide meaningful support in real-world conditions. Additionally, it is vital to engage multiple user groups, beyond direct system users like engineers, to include supervisors, maintenance planners, and other decision-makers whose perspectives are crucial for system adoption and sustained use.

5. Conclusions

This review synthesized 63 peer-reviewed studies to explore the role of HITL-XAI for PdM. The findings addressed three key research questions. First, while temporal modeling and interpretable-by-design strategies have enhanced transparency, post hoc methods remain underdeveloped and lack industrial validation. Second, effective decision-making hinges not only on the quality of explanations but also on human factors such as usability, cognitive load, and contextual relevance. This highlights the need for multidimensional and role-sensitive evaluation frameworks. Finally, feedback loops and co-design approaches show promise for fostering user engagement, though the long-term sustainability of such interactions remains underexplored.
Overall, this review highlights a critical gap between the technical maturity of predictive models and the underdeveloped integration of human factors in XAI systems. Current evaluation methods often focus narrowly on model performance, neglecting crucial aspects of user experience, contextual alignment, and the impact on real-world decision-making outcomes. Furthermore, despite the frequent citation of human feedback as a design input, there is a notable absence of studies exploring the maintenance and adaptation of these feedback loops over time. By synthesizing current advances in HITL-XAI and the emerging role of GenAI for PdM, this review highlights both the potential and the current limitations of these technologies, especially in environments with real-time, unlabeled streaming data.
Despite its methodological rigor, this review has certain limitations. The search strategy excluded gray literature, which may have constrained insights into practical innovations not yet captured in academic discourse. Additionally, the intentional omission of the term “HITL” from the search string broadened recall of relevant work that does not use the label, but it may have overlooked niche studies that use the term explicitly without matching the other search terms. Future research should empirically assess HITL mechanisms and the effectiveness of generative explanations, as well as explore how semi-automated HITL approaches can lessen annotation workloads while maintaining user engagement.
As PdM systems are increasingly used in sustainable technologies—such as electric vehicle fleets, battery storage systems, and renewable energy infrastructure [82]—future work should examine how HITL-XAI designs can be customized to meet the specific operational and lifecycle needs of these areas. A key emerging factor is the rising use of advanced manufacturing processes, especially additive manufacturing (AM), in these sectors. Manufacturing data from AM parts, which capture factors that directly affect part performance and failure modes, provide valuable input for creating more precise PdM models. Combining such manufacturing insights with digital twin frameworks could further support end-to-end lifecycle monitoring, connecting design, production, and maintenance within a single system [83,84]. Understanding how explanation quality, feedback integration, and model transparency function in these contexts will be crucial for aligning HITL-XAI with larger clean growth goals. This research path not only improves the technical strength of PdM systems but also helps ensure that human–AI collaboration effectively promotes sustainable and resilient maintenance.

Author Contributions

Conceptualization, N.R.A. and B.T.; methodology, N.R.A.; validation, B.T. and V.P.; formal analysis, N.R.A.; investigation, N.R.A.; resources, B.T.; data curation, N.R.A.; writing—original draft preparation, N.R.A.; writing—review and editing, N.R.A., B.T., and V.P.; visualization, N.R.A.; supervision, B.T. and V.P.; funding acquisition, B.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: artificial intelligence
AM: additive manufacturing
AUC: area under the curve
CNN: convolutional neural network
CSV: comma-separated values
DEN: deep expert network
EU AI Act: European Union Artificial Intelligence Act
FFT: fast Fourier transform
GenAI: generative artificial intelligence
Grad-CAM: gradient-weighted class activation mapping
HITL: human-in-the-loop
IEEE: Institute of Electrical and Electronics Engineers
LIMEs: local interpretable model-agnostic explanations
LLM: large language models
LSTM: long short-term memory
LUXs: local rule-based explanations
LVLM: large vision-language model
ML: machine learning
NSN: neuro-symbolic nodes
OODA: observe–orient–decide–act
PdM: predictive maintenance
PRISMA: preferred reporting items for systematic reviews and meta-analyses
RAG: retrieval-augmented generation
ResNet: residual network
RF: random forest
RQs: research questions
SHAP: SHapley Additive exPlanations
SLR: systematic literature review
SME: small and medium-sized enterprise
t-SNE: t-distributed stochastic neighbor embedding
US: United States
WKN: weighted k-nearest neighbors
XAI: explainable artificial intelligence
XEdgeAI: explainable edge AI
XGBoost: extreme gradient boosting

Appendix A

Table A1. PRISMA 2020 checklist.
Section and Topic | Item Number | Checklist Item | Location Where the Item Is Reported
TITLE
Title | 1 | Identify the report as a systematic review. | Title
ABSTRACT
Abstract | 2 | See the PRISMA 2020 for Abstracts checklist.
INTRODUCTION
Rationale | 3 | Describe the rationale for the review in the context of existing knowledge. | 1
Objectives | 4 | Provide an explicit statement of the objective(s) or question(s) the review addresses. | 1
METHODS
Eligibility criteria | 5 | Specify the inclusion and exclusion criteria for the review and how studies were grouped for the syntheses. | 2
Information sources | 6 | Specify all databases, registers, websites, organisations, reference lists and other sources searched or consulted to identify studies. Specify the date when each source was last searched or consulted. | 2
Search strategy | 7 | Present the full search strategies for all databases, registers and websites, including any filters and limits used. | 2
Selection process | 8 | Specify the methods used to decide whether a study met the inclusion criteria of the review, including how many reviewers screened each record and each report retrieved, whether they worked independently, and if applicable, details of automation tools used in the process. | 2
Data collection process | 9 | Specify the methods used to collect data from reports, including how many reviewers collected data from each report, whether they worked independently, any processes for obtaining or confirming data from study investigators, and if applicable, details of automation tools used in the process. | 2
Data items | 10a | List and define all outcomes for which data were sought. Specify whether all results that were compatible with each outcome domain in each study were sought (e.g., for all measures, time points, analyses), and if not, the methods used to decide which results to collect. | 3
10b | List and define all other variables for which data were sought (e.g., participant and intervention characteristics, funding sources). Describe any assumptions made about any missing or unclear information. | 3
Study risk of bias assessment | 11 | Specify the methods used to assess risk of bias in the included studies, including details of the tool(s) used, how many reviewers assessed each study and whether they worked independently, and if applicable, details of automation tools used in the process. | 2
Effect measures | 12 | Specify for each outcome the effect measure(s) (e.g., risk ratio, mean difference) used in the synthesis or presentation of results. | N/A
Synthesis methods | 13a | Describe the processes used to decide which studies were eligible for each synthesis (e.g., tabulating the study intervention characteristics and comparing against the planned groups for each synthesis (item #5)). | 2
13b | Describe any methods required to prepare the data for presentation or synthesis, such as handling of missing summary statistics, or data conversions. | 2
13c | Describe any methods used to tabulate or visually display results of individual studies and syntheses. | 2
13d | Describe any methods used to synthesize results and provide a rationale for the choice(s). If meta-analysis was performed, describe the model(s), method(s) to identify the presence and extent of statistical heterogeneity, and software package(s) used. | 2
13e | Describe any methods used to explore possible causes of heterogeneity among study results (e.g., subgroup analysis, meta-regression). | 2
13f | Describe any sensitivity analyses conducted to assess robustness of the synthesized results. | 2
Reporting bias assessment | 14 | Describe any methods used to assess risk of bias due to missing results in a synthesis (arising from reporting biases). | N/A
Certainty assessment | 15 | Describe any methods used to assess certainty (or confidence) in the body of evidence for an outcome. | N/A
RESULTS
Study selection | 16a | Describe the results of the search and selection process, from the number of records identified in the search to the number of studies included in the review, ideally using a flow diagram. | 2
16b | Cite studies that might appear to meet the inclusion criteria, but which were excluded, and explain why they were excluded. | N/A
Study characteristics | 17 | Cite each included study and present its characteristics. | N/A
Risk of bias in studies | 18 | Present assessments of risk of bias for each included study. | N/A
Results of individual studies | 19 | For all outcomes, present, for each study: (a) summary statistics for each group (where appropriate) and (b) an effect estimate and its precision (e.g., confidence/credible interval), ideally using structured tables or plots. | 3
Results of syntheses | 20a | For each synthesis, briefly summarise the characteristics and risk of bias among contributing studies. | 3
20b | Present results of all statistical syntheses conducted. If meta-analysis was done, present for each the summary estimate and its precision (e.g., confidence/credible interval) and measures of statistical heterogeneity. If comparing groups, describe the direction of the effect. | 3
20c | Present results of all investigations of possible causes of heterogeneity among study results. | 3
20d | Present results of all sensitivity analyses conducted to assess the robustness of the synthesized results. | 3
Reporting biases | 21 | Present assessments of risk of bias due to missing results (arising from reporting biases) for each synthesis assessed. | N/A
Certainty of evidence | 22 | Present assessments of certainty (or confidence) in the body of evidence for each outcome assessed. | 3
DISCUSSION
Discussion | 23a | Provide a general interpretation of the results in the context of other evidence. | 4
23b | Discuss any limitations of the evidence included in the review. | 5
23c | Discuss any limitations of the review processes used. | 5
23d | Discuss implications of the results for practice, policy, and future research. | 5
OTHER INFORMATION
Registration and protocol | 24a | Provide registration information for the review, including register name and registration number, or state that the review was not registered. | N/A
24b | Indicate where the review protocol can be accessed, or state that a protocol was not prepared. | N/A
24c | Describe and explain any amendments to information provided at registration or in the protocol. | N/A
Support | 25 | Describe sources of financial or non-financial support for the review, and the role of the funders or sponsors in the review. | 5
Competing interests | 26 | Declare any competing interests of review authors. | 5
Availability of data, code and other materials | 27 | Report which of the following are publicly available and where they can be found: template data collection forms; data extracted from included studies; data used for all analyses; analytic code; any other materials used in the review. | 5
Table A2. PRISMA abstract checklist.
Section and Topic | Item Number | Checklist Item | Reported (Yes/No)
TITLE
Title | 1 | Identify the report as a systematic review. | Yes
BACKGROUND
Objectives | 2 | Provide an explicit statement of the main objective(s) or question(s) the review addresses. | Yes
METHODS
Eligibility criteria | 3 | Specify the inclusion and exclusion criteria for the review. | Yes
Information sources | 4 | Specify the information sources (e.g., databases, registers) used to identify studies and the date when each was last searched. | Yes
Risk of bias | 5 | Specify the methods used to assess risk of bias in the included studies. | Yes
Synthesis of results | 6 | Specify the methods used to present and synthesise results. | Yes
RESULTS
Included studies | 7 | Give the total number of included studies and participants and summarise relevant characteristics of studies. | Yes
Synthesis of results | 8 | Present results for main outcomes, preferably indicating the number of included studies and participants for each. If meta-analysis was done, report the summary estimate and confidence/credible interval. If comparing groups, indicate the direction of the effect (i.e., which group is favoured). | Yes
DISCUSSION
Limitations of evidence | 9 | Provide a brief summary of the limitations of the evidence included in the review (e.g., study risk of bias, inconsistency and imprecision). | Yes
Interpretation | 10 | Provide a general interpretation of the results and important implications. | Yes
OTHER
Funding | 11 | Specify the primary source of funding for the review. | No
Registration | 12 | Provide the register name and registration number. | No

References

  1. Cinar, Z.M.; Abdussalam Nuhu, A.; Zeeshan, Q.; Korhan, O.; Asmael, M.; Safaei, B. Machine Learning in Predictive Maintenance towards Sustainable Smart Manufacturing in Industry 4.0. Sustainability 2020, 12, 8211. [Google Scholar] [CrossRef]
  2. Rykov, M. The Top 10 Industrial AI Use Cases. IoT Analytics. Available online: https://iot-analytics.com/the-top-10-industrial-ai-use-cases/ (accessed on 9 August 2025).
  3. Shang, G.; Low, S.P.; Lim, X.Y.V. Prospects, drivers of and barriers to artificial intelligence adoption in project management. Built Environ. Proj. Asset Manag. 2023, 13, 629–645. [Google Scholar] [CrossRef]
  4. Hermansa, M.; Kozielski, M.; Michalak, M.; Szczyrba, K.; Wróbel, Ł.; Sikora, M. Sensor-Based Predictive Maintenance with Reduction of False Alarms—A Case Study in Heavy Industry. Sensors 2021, 22, 226. [Google Scholar] [CrossRef]
  5. Liu, D.; Alnegheimish, S.; Zytek, A.; Veeramachaneni, K. MTV: Visual Analytics for Detecting, Investigating, and Annotating Anomalies in Multivariate Time Series. Proc. ACM Hum. Comput. Interact. 2022, 6, 1–30. [Google Scholar] [CrossRef]
  6. Garouani, M.; Ahmad, A.; Bouneffa, M.; Hamlich, M.; Bourguin, G.; Lewandowski, A. Towards big industrial data mining through explainable automated machine learning. Int. J. Adv. Manuf. Technol. 2022, 120, 1169–1188. [Google Scholar] [CrossRef]
  7. Martinović, B.; Bijanić, M.; Danilović, D.; Petrović, A.; Delibasić, B. Unveiling Deep Learning Insights: A Specialized Analysis of Sucker Rod Pump Dynamographs, Emphasizing Visualizations and Human Insight. Mathematics 2023, 11, 4782. [Google Scholar] [CrossRef]
  8. Najar, M.; Wang, H. Establishing operator trust in machine learning for enhanced reliability and safety in nuclear Power Plants. Prog. Nucl. Energy 2024, 173, 105280. [Google Scholar] [CrossRef]
  9. van Oudenhoven, B.; Van de Calseyde, P.; Basten, R.; Demerouti, E. Predictive maintenance for industry 5.0: Behavioural inquiries from a work system perspective. Int. J. Prod. Res. 2022, 61, 7846–7865. [Google Scholar] [CrossRef]
  10. Ingemarsdotter, E.; Kambanou, M.L.; Jamsin, E.; Sakao, T.; Balkenende, R. Challenges and solutions in condition-based maintenance implementation—A multiple case study. J. Clean. Prod. 2021, 296, 126420. [Google Scholar] [CrossRef]
  11. Liao, Q.V.; Varshney, K.R. Human-Centered Explainable AI (XAI): From Algorithms to User Experiences. April 2022. Available online: http://arxiv.org/abs/2110.10790 (accessed on 5 May 2025).
  12. Moosavi, S.; Razavi-Far, R.; Palade, V.; Saif, M. Explainable Artificial Intelligence Approach for Diagnosing Faults in an Induction Furnace. Electronics 2024, 13, 1721. [Google Scholar] [CrossRef]
  13. Bansal, G.; Wu, T.; Zhou, J.; Fok, R.; Nushi, B.; Kamar, E.; Ribeiro, M.T.; Weld, D. Does the whole exceed its parts? The effect of ai explanations on complementary team performance. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Virtual, 8–13 May 2021; pp. 1–16. [Google Scholar] [CrossRef]
  14. Marques-Silva, J.; Ignatiev, A. Delivering Trustworthy AI through Formal XAI. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; AAAI Press: Palo Alto, CA, USA, 2022; Volume 36, pp. 12342–12350. [Google Scholar] [CrossRef]
  15. Ali, S.; Abuhmed, T.; El-Sappagh, S.; Muhammad, K.; Alonso-Moral, J.M.; Confalonieri, R.; Guidotti, R.; Del Ser, J.; Díaz-Rodríguez, N.; Herrera, F. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Inf. Fusion 2023, 99, 101805. [Google Scholar] [CrossRef]
  16. Sadeghi, Z.; Alizadehsani, R.; Cifci, M.A.; Kausar, S.; Rehman, R.; Mahanta, P.; Bora, P.K.; Almasri, A.; Alkhawaldeh, R.S.; Hussain, S.; et al. A review of Explainable Artificial Intelligence in healthcare. Comput. Electr. Eng. 2024, 118, 109370. [Google Scholar] [CrossRef]
  17. Sovrano, F.; Sapienza, S.; Palmirani, M.; Vitali, F. Metrics, Explainability and the European AI Act Proposal. J 2022, 5, 126–138. [Google Scholar] [CrossRef]
  18. Herm, L.-V.; Steinbach, T.; Wanner, J.; Janiesch, C. A nascent design theory for explainable intelligent systems. Electron. Mark. 2022, 32, 2185–2205. [Google Scholar] [CrossRef]
  19. Kotsiopoulos, T.; Papakostas, G.; Vafeiadis, T.; Dimitriadis, V.; Nizamis, A.; Bolzoni, A.; Bellinati, D.; Ioannidis, D.; Votis, K.; Tzovaras, D.; et al. Revolutionizing defect recognition in hard metal industry through AI explainability, human-in-the-loop approaches and cognitive mechanisms. Expert Syst. Appl. 2024, 255, 124839. [Google Scholar] [CrossRef]
  20. Zacharias, J.; von Zahn, M.; Chen, J.; Hinz, O. Designing a feature selection method based on explainable artificial intelligence. Electron. Mark. 2022, 32, 2159–2184. [Google Scholar] [CrossRef]
  21. Wanner, J.; Herm, L.-V.; Heinrich, K.; Janiesch, C. A social evaluation of the perceived goodness of explainability in machine learning. J. Bus. Anal. 2022, 5, 29–50. [Google Scholar] [CrossRef]
  22. He, G.; Balayn, A.; Buijsman, S.; Yang, J.; Gadiraju, U. Opening the Analogical Portal to Explainability: Can Analogies Help Laypeople in AI-assisted Decision Making? J. Artif. Intell. Res. 2024, 81, 117–162. [Google Scholar] [CrossRef]
  23. Ehsan, U.; Wintersberger, P.; Liao, Q.V.; Watkins, E.A.; Manger, C.; Daumé, H., III; Riener, A.; Riedl, M.O. Human-Centered Explainable AI (HCXAI): Beyond Opening the Black-Box of AI. In Proceedings of the CHI Conference on Human Factors in Computing Systems, New Orleans, LA, USA, 29 April–5 May 2022; pp. 1–7. [Google Scholar]
  24. Breque, M.; De Nul, L.; Petridis, A. Industry 5.0—Towards a Sustainable, Human-Centric and Resilient European Industry; Publications Office of the European Union: Luxembourg, 2021. [Google Scholar] [CrossRef]
  25. Pan, Y.; Stark, R. An interpretable machine learning approach for engineering change management decision support in automotive industry. Comput. Ind. 2022, 138, 103633. [Google Scholar] [CrossRef]
  26. Sauer, P.C.; Seuring, S. How to conduct systematic literature reviews in management research: A guide in 6 steps and 14 decisions. Rev. Manag. Sci. 2023, 17, 1899–1933. [Google Scholar] [CrossRef]
  27. Angelov, P.P.; Soares, E.A.; Jiang, R.; Arnold, N.I.; Atkinson, P.M. Explainable artificial intelligence: An analytical review. WIREs Data Min. Knowl. Discov. 2021, 11, e1424. [Google Scholar] [CrossRef]
  28. Moosavi, S.; Farajzadeh-Zanjani, M.; Razavi-Far, R.; Palade, V.; Saif, M. Explainable AI in Manufacturing and Industrial Cyber–Physical Systems: A Survey. Electronics 2024, 13, 3497. [Google Scholar] [CrossRef]
  29. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef] [PubMed]
  30. Cheng, X.; Chaw, J.K.; Goh, K.M.; Ting, T.T.; Sahrani, S.; Ahmad, M.N.; Kadir, R.A.; Ang, M.C. Systematic Literature Review on Visual Analytics of Predictive Maintenance in the Manufacturing Industry. Sensors 2022, 22, 6321. [Google Scholar] [CrossRef]
  31. Dalkin, S.; Forster, N.; Hodgson, P.; Lhussier, M.; Carr, S.M. Using computer assisted qualitative data analysis software (CAQDAS.; NVivo) to assist in the complex process of realist theory generation, refinement and testing. Int. J. Soc. Res. Methodol. 2021, 24, 123–134. [Google Scholar] [CrossRef]
  32. Vaismoradi, M.; Turunen, H.; Bondas, T. Content analysis and thematic analysis: Implications for conducting a qualitative descriptive study. Nurs. Health Sci. 2013, 15, 398–405. [Google Scholar] [CrossRef]
  33. Braun, V.; Clarke, V. Using thematic analysis in psychology. Qual. Res. Psychol. 2006, 3, 77–101. [Google Scholar] [CrossRef]
  34. Brito, L.C.; Susto, G.A.; Brito, J.N.; Duarte, M.A.V. Fault Diagnosis using eXplainable AI: A transfer learning-based approach for rotating machinery exploiting augmented synthetic data. Expert Syst. Appl. 2023, 232, 120860. [Google Scholar] [CrossRef]
  35. Lim, S.; Kim, J.; Lee, T. Shapelet-Based Sensor Fault Detection and Human-Centered Explanations in Industrial Control System. IEEE Access 2023, 11, 138033–138051. [Google Scholar] [CrossRef]
  36. Lundström, A.; O'Nils, M.; Qureshi, F.Z. Contextual Knowledge-Informed Deep Domain Generalization for Bearing Fault Diagnosis. IEEE Access 2024, 12, 196842–196854. [Google Scholar] [CrossRef]
  37. Lughofer, E. Evolving multi-user fuzzy classifier systems integrating human uncertainty and expert knowledge. Inf. Sci. 2022, 596, 30–52. [Google Scholar] [CrossRef]
  38. Spangler, R.M.; Raeisinezhad, M.; Cole, D.G. Explainable, Deep Reinforcement Learning–Based Decision Making for Operations and Maintenance. Nucl. Technol. 2024, 210, 2331–2345. [Google Scholar] [CrossRef]
  39. Zabaryłło, M.; Barszcz, T. Proposal of Multidimensional Data Driven Decomposition Method for Fault Identification of Large Turbomachinery. Energies 2022, 15, 3651. [Google Scholar] [CrossRef]
  40. Zeng, C.; Zhao, G.; Xie, J.; Huang, J.; Wang, Y. An explainable artificial intelligence approach for mud pumping prediction in railway track based on GIS information and in-service train monitoring data. Constr. Build. Mater. 2023, 401, 132716. [Google Scholar] [CrossRef]
  41. Krupp, L.; Wiede, C.; Friedhoff, J.; Grabmaier, A. Explainable Remaining Tool Life Prediction for Individualized Production Using Automated Machine Learning. Sensors 2023, 23, 8523. [Google Scholar] [CrossRef]
  42. Orošnjak, M.; Beker, I.; Brkljač, N.; Vrhovac, V. Predictors of Successful Maintenance Practices in Companies Using Fluid Power Systems: A Model-Agnostic Interpretation. Appl. Sci. 2024, 14, 5921. [Google Scholar] [CrossRef]
  43. Usuga-Cadavid, J.P.; Lamouri, S.; Grabot, B.; Fortin, A. Using deep learning to value free-form text data for predictive maintenance. Int. J. Prod. Res. 2022, 60, 4548–4575. [Google Scholar] [CrossRef]
  44. Ieva, S.; Loconte, D.; Loseto, G.; Ruta, M.; Scioscia, F.; Marche, D.; Notarnicola, M. A Retrieval-Augmented Generation Approach for Data-Driven Energy Infrastructure Digital Twins. Smart Cities 2024, 7, 3095–3120. [Google Scholar] [CrossRef]
  45. Rajaoarisoa, L.; Randrianandraina, R.; Nalepa, G.J.; Gama, J. Decision-making systems improvement based on explainable artificial intelligence approaches for predictive maintenance. Eng. Appl. Artif. Intell. 2025, 139, 109601. [Google Scholar] [CrossRef]
  46. Steenwinckel, B.; De Paepe, D.; Hautte, S.V.; Heyvaert, P.; Bentefrit, M.; Moens, P.; Dimou, A.; Bossche, B.V.D.; De Turck, F.; Van Hoecke, S.; et al. FLAGS: A methodology for adaptive anomaly detection and root cause analysis on sensor data streams by fusing expert knowledge with machine learning. Futur. Gener. Comput. Syst. 2021, 116, 30–48. [Google Scholar] [CrossRef]
  47. Wanner, J.; Herm, L.-V.; Heinrich, K.; Janiesch, C. The effect of transparency and trust on intelligent system acceptance: Evidence from a user-based study. Electron. Mark. 2022, 32, 2079–2102. [Google Scholar] [CrossRef]
  48. Gitzel, R.; Hoffmann, M.W.; Heiden, P.Z.; Skolik, A.; Kaltenpoth, S.; Müller, O.; Kanak, C.; Kandiah, K.; Stroh, M.-F.; Boos, W.; et al. Toward Cognitive Assistance and Prognosis Systems in Power Distribution Grids—Open Issues, Suitable Technologies, and Implementation Concepts. IEEE Access 2024, 12, 107927–107943. [Google Scholar] [CrossRef]
  49. Simard, S.R.; Gamache, M.; Doyon-Poulin, P. Development and Usability Evaluation of VulcanH, a CMMS Prototype for Preventive and Predictive Maintenance of Mobile Mining Equipment. Mining 2024, 4, 326–351. [Google Scholar] [CrossRef]
  50. Agostinho, C.; Dikopoulou, Z.; Lavasa, E.; Perakis, K.; Pitsios, S.; Branco, R.; Reji, S.; Hetterich, J.; Biliri, E.; Lampathaki, F.; et al. Explainability as the key ingredient for AI adoption in Industry 5.0 settings. Front. Artif. Intell. 2023, 6, 1264372. [Google Scholar] [CrossRef]
  51. Gentile, D.; Donmez, B.; Jamieson, G.A. Human performance consequences of normative and contrastive explanations: An experiment in machine learning for reliability maintenance. Artif. Intell. 2023, 321, 103945. [Google Scholar] [CrossRef]
  52. Beden, S.; Lakshmanan, K.; Giannetti, C.; Beckmann, A. Steelmaking Predictive Analytics Based on Random Forest and Semantic Reasoning. Appl. Sci. 2023, 13, 12778. [Google Scholar] [CrossRef]
  53. Galanti, R.; de Leoni, M.; Monaro, M.; Navarin, N.; Marazzi, A.; Di Stasi, B.; Maldera, S. An explainable decision support system for predictive process analytics. Eng. Appl. Artif. Intell. 2023, 120, 105904. [Google Scholar] [CrossRef]
  54. Jain, P.; Farzan, R.; Lee, A.J. Co-Designing with Users the Explanations for a Proactive Auto-Response Messaging Agent. Proc. ACM Hum. Comput. Interact. 2023, 7, 1–23. [Google Scholar] [CrossRef]
  55. Siyaev, A.; Valiev, D.; Jo, G.-S. Interaction with Industrial Digital Twin Using Neuro-Symbolic Reasoning. Sensors 2023, 23, 1729. [Google Scholar] [CrossRef] [PubMed]
  56. Choudhary, A.; Vuković, M.; Mutlu, B.; Haslgrübler, M.; Kern, R. Interpretability of Causal Discovery in Tracking Deterioration in a Highly Dynamic Process. Sensors 2024, 24, 3728. [Google Scholar] [CrossRef] [PubMed]
  57. Souza, P.V.d.C.; Lughofer, E. EFNC-Exp: An evolving fuzzy neural classifier integrating expert rules and uncertainty. Fuzzy Sets Syst. 2023, 466, 108438. [Google Scholar] [CrossRef]
  58. Nguyen, A.; Foerstel, S.; Kittler, T.; Kurzyukov, A.; Schwinn, L.; Zanca, D.; Hipp, T.; Da Jun, S.; Schrapp, M.; Rothgang, E.; et al. System Design for a Data-Driven and Explainable Customer Sentiment Monitor Using IoT and Enterprise Data. IEEE Access 2021, 9, 117140–117152. [Google Scholar] [CrossRef]
  59. Nadim, K.; Ragab, A.; Ouali, M.-S. Data-driven dynamic causality analysis of industrial systems using interpretable machine learning and process mining. J. Intell. Manuf. 2023, 34, 57–83. [Google Scholar] [CrossRef]
  60. Senjoba, L.; Ikeda, H.; Toriya, H.; Adachi, T.; Kawamura, Y. Enhancing Interpretability in Drill Bit Wear Analysis through Explainable Artificial Intelligence: A Grad-CAM Approach. Appl. Sci. 2024, 14, 3621. [Google Scholar] [CrossRef]
  61. Li, Q.; Qin, L.; Xu, H.; Lin, Q.; Qin, Z.; Chu, F. Transparent information fusion network: An explainable network for multi-source bearing fault diagnosis via self-organized neural-symbolic nodes. Adv. Eng. Informatics 2025, 65, 103156. [Google Scholar] [CrossRef]
  62. Tran, T.-A.; Ruppert, T.; Abonyi, J. The Use of eXplainable Artificial Intelligence and Machine Learning Operation Principles to Support the Continuous Development of Machine Learning-Based Solutions in Fault Detection and Identification. Computers 2024, 13, 252. [Google Scholar] [CrossRef]
  63. Dintén, R.; Zorrilla, M. Design, Building and Deployment of Smart Applications for Anomaly Detection and Failure Prediction in Industrial Use Cases. Information 2024, 15, 557. [Google Scholar] [CrossRef]
  64. Bhakte, A.; Chakane, M.; Srinivasan, R. Alarm-based explanations of process monitoring results from deep neural networks. Comput. Chem. Eng. 2023, 179, 108442. [Google Scholar] [CrossRef]
  65. Kabir, S.; Hossain, M.S.; Andersson, K. An Advanced Explainable Belief Rule-Based Framework to Predict the Energy Consumption of Buildings. Energies 2024, 17, 1797. [Google Scholar] [CrossRef]
  66. Zou, D.; Zhu, Y.; Xu, S.; Li, Z.; Jin, H.; Ye, H. Interpreting Deep Learning-based Vulnerability Detector Predictions Based on Heuristic Searching. ACM Trans. Softw. Eng. Methodol. 2021, 30, 1–31. [Google Scholar] [CrossRef]
  67. Kisten, M.; Ezugwu, A.E.-S.; Olusanya, M.O. Explainable Artificial Intelligence Model for Predictive Maintenance in Smart Agricultural Facilities. IEEE Access 2024, 12, 24348–24367. [Google Scholar] [CrossRef]
  68. Li, Q.; Liu, Y.; Sun, S.; Qin, Z.; Chu, F. Deep expert network: A unified method toward knowledge-informed fault diagnosis via fully interpretable neuro-symbolic AI. J. Manuf. Syst. 2024, 77, 652–661. [Google Scholar] [CrossRef]
  69. Martakis, P.; Movsessian, A.; Reuland, Y.; Pai, S.G.S.; Quqa, S.; Cava, D.G.; Tcherniak, D.; Chatzi, E. A semi-supervised interpretable machine learning framework for sensor fault detection. Smart Struct. Syst. 2022, 29, 251–266. [Google Scholar] [CrossRef]
  70. Wang, Z.; Zhou, Z.; Xu, W.; Sun, C.; Yan, R. Physics informed neural networks for fault severity identification of axial piston pumps. J. Manuf. Syst. 2023, 71, 421–437. [Google Scholar] [CrossRef]
  71. Chew, M.Y.L.; Yan, K.; Shao, H. Enhancing Interpretability of Data-Driven Fault Detection and Diagnosis Methodology with Maintainability Rules in Smart Building Management. J. Sensors 2022, 2022, 5975816. [Google Scholar] [CrossRef]
  72. Kuk, E.; Bobek, S.; Nalepa, G.J. Explainable proactive control of industrial processes. J. Comput. Sci. 2024, 81, 102329. [Google Scholar] [CrossRef]
  73. Sobrie, L.; Verschelde, M.; Roets, B. Explainable real-time predictive analytics on employee workload in digital railway control rooms. Eur. J. Oper. Res. 2024, 317, 437–448. [Google Scholar] [CrossRef]
  74. Wahlström, M.; Tammentie, B.; Salonen, T.-T.; Karvonen, A. AI and the transformation of industrial work: Hybrid intelligence vs double-black box effect. Appl. Ergon. 2024, 118, 104271. [Google Scholar] [CrossRef] [PubMed]
  75. Dang, J.-F.; Chen, T.-L.; Huang, H.-Y. The human-centric framework integrating knowledge distillation architecture with fine-tuning mechanism for equipment health monitoring. Adv. Eng. Informatics 2025, 65, 103167. [Google Scholar] [CrossRef]
  76. Gómez-Carmona, O.; Casado-Mansilla, D.; López-De-Ipiña, D.; García-Zubia, J. Human-in-the-loop machine learning: Reconceptualizing the role of the user in interactive approaches. Internet Things 2024, 25, 101048. [Google Scholar] [CrossRef]
  77. Nguyen, H.T.T.; Nguyen, L.P.T.; Cao, H. XEdgeAI: A human-centered industrial inspection framework with data-centric Explainable Edge AI approach. Inf. Fusion 2025, 116, 102782. [Google Scholar] [CrossRef]
  78. Lughofer, E. Evolving multi-label fuzzy classifier with advanced robustness respecting human uncertainty. Knowl. Based Syst. 2022, 255, 109717. [Google Scholar] [CrossRef]
  79. Abrokwah-Larbi, K. The role of IoT and XAI convergence in the prediction, explanation, and decision of customer perceived value (CPV) in SMEs: A theoretical framework and research proposition perspective. Discov. Internet Things 2025, 5, 4. [Google Scholar] [CrossRef]
  80. Fok, R.; Weld, D.S. In search of verifiability: Explanations rarely enable complementary performance in AI-advised decision making. AI Mag. 2024, 45, 317–332. [Google Scholar] [CrossRef]
  81. Diehl, M.; Ramirez-Amaro, K. A causal-based approach to explain, predict and prevent failures in robotic tasks. Robot. Auton. Syst. 2023, 162, 104376. [Google Scholar] [CrossRef]
  82. Shin, W.; Han, J.; Rhee, W. AI-assistance for predictive maintenance of renewable energy systems. Energy 2021, 221, 119775. [Google Scholar] [CrossRef]
  83. Nasiri, S.; Khosravani, M.R. Machine learning in predicting mechanical behavior of additively manufactured parts. J. Mater. Res. Technol. 2021, 14, 1137–1153. [Google Scholar] [CrossRef]
  84. Ucar, A.; Karakose, M.; Kırımça, N. Artificial Intelligence for Predictive Maintenance Applications: Key Components, Trustworthiness, and Future Trends. Appl. Sci. 2024, 14, 898. [Google Scholar] [CrossRef]
Figure 1. The rule-based PdM system showing the sensor dashboard with triggered alerts.
Figure 2. The rule-based PdM system displaying the event log interface with outlined boxes showing the identical alert messages and severity levels.
Figure 3. Systematic literature review process based on the PRISMA flow diagram [29].
Figure 4. Annual distribution of retrieved articles and included articles for the SLR. The “Estimated Publication Trend” line is based on an exponential fit to the observed annual data and serves as a visual illustration of expected growth for 2025. This estimate assumes continued growth consistent with previous years and does not represent a formal statistical forecast.
Figure 5. Clustered bar chart of selected articles categorized by: (a) application domains; (b) use cases.
Figure 6. Visual interface of the proposed enhanced PdM system, showing the sensor dashboard used for real-time monitoring and context-aware alert explanations generated by the XAI feature.
Figure 7. Illustration of the proposed enhanced PdM system, highlighting (a) the context-aware XAI component, (b) the interactive HITL feedback feature, and (c) the GenAI-based assistant.
Table 1. Eligibility criteria for the literature identification and screening process.
Study Selection Parameters:
- Databases: Scopus, ProQuest, EBSCO
- Studies contain the following search strings in the title, abstract, and keywords: (“predictive maintenance” OR “PdM” OR “condition-based maintenance” OR “condition monitoring” OR “failure prediction”) AND (“explainable AI” OR “XAI” OR “explainable artificial intelligence” OR “AI-assisted decision making” OR “AI-advised decision making”)
- Document type: academic journal article
- Publication year: 2019–2025
- Language: English
Title and Abstract Screening:
- Addresses the research questions
- Indicates use of explainable or interpretable models
- Involves human-in-the-loop (HITL) or interactive AI systems
- Is not a secondary research article
Full-Text Assessment:
- Full text available for review
- Presents a human-centric or HITL component
- Relevant to predictive maintenance or related applied contexts, unless directly linked to HITL-XAI systems
Table 2. Predetermined codebook for thematic analysis.
Group: XAI techniques
Themes: XAI methodologies; Temporal data adaptation; Interpretability design
Codes: Model architecture; XAI techniques; Dataset type; Domain application; Proposed XAI method; Adaptation strategy; Explanation platform; Design type
Group: Evaluation
Themes: Technical performance; Human-centric impact
Codes: Technical metric; Computational efficiency; Benchmark comparison; Evaluation type; User satisfaction; Decision accuracy; Empirical evidence or outcome
Group: HITL integration
Themes: Collaboration dynamic; Interaction design; Challenges and Findings
Codes: HITL methods; User roles; Task allocation; Workforce skepticism; Real-time collaboration features; HITL case studies; HITL challenges
Table 3. Summary of five themes and associated findings from the thematic analysis.
Model Interpretability in Practice:
1. Temporal adaptation strategies improve transparency in PdM workflows
2. The declining use of SHAP and LIMEs in temporal contexts underscores the need for specialized time-aware XAI methods
Evolutions and Limitations of XAI Methods:
3. Inherently interpretable models remain competitive across domains
4. Domain-specific XAI techniques enhance usability but require further validation in industrial environments
Trust Dynamics and Human Reliance on XAI:
5. Explanation design can both strengthen and undermine user trust
6. Trust and usability are influenced by domain norms and user roles
Collaborative Design and Human–AI Interaction:
7. Feedback loops enhance both model calibration and user understanding
8. Domain-specific co-design approaches increase system adoption and relevance
Factors Influencing the Efficacy of Explanations in Decision-Making:
9. High-quality explanations do not always lead to better decisions
10. The modality and structure of explanations significantly affect user comprehension
