Data Interpretation in Structural Health Monitoring: Toward a Universal Language
Abstract
1. Introduction: Toward a Standardized Data Language in SHM and Beyond
- To address the critical challenge of effective data communication in SHM, framing it as a problem of linguistic coherence.
- To propose a perspective on data analysis, treating it as a language with key linguistic elements—syntax, semantics, and pragmatics.
- To outline a standardized protocol for data interpretation, reducing ambiguities, including those from inconsistent labeling, and improving the reliability of structural diagnostics.
2. Contributions and Novelty
- A linguistic framework for SHM: Drawing parallels with natural language, this work formalizes SHM data processing in terms of syntax, semantics, and pragmatics, providing a structured approach to reduce diagnostic ambiguity.
- A standardized lexicon for structural conditions: A universal classification protocol is proposed to improve the interoperability of AI models across different monitoring systems.
- A conceptual approach to category standardization: The study highlights the impact of subjective labeling and proposes methods to align classification criteria across disciplines, ensuring consistency in diagnostics.
3. A New Perspective: Data as Language in SHM and Beyond
4. Parallelism Between Natural Language and Data Language
4.1. Syntax: Rules That Shape Monitoring
4.2. Grammatical Structures and SHM Syntax
- Declarative syntax (data collection protocols): In SHM, data collection follows a structured set of protocols, much like declarative sentences in language. For example, the placement of sensors in a UAV’s fuselage at stress-critical points ensures that the system can “speak” clearly about its health. The data are gathered in a pre-determined manner, similar to how declarative syntax structures factual statements [1].
- Interrogative syntax (diagnostic testing): When testing a structure, engineers often simulate specific scenarios, much like asking a question in natural language. The system responds with data, and the engineers must “decode” the response to assess the structure’s condition. This process mirrors interrogative syntax in language, where a question seeks information [1,6].
4.3. Vocabulary, Lexicon, and Neologisms
- Building SHM vocabulary: Just as vocabulary in natural language evolves to accommodate new concepts, SHM expands its “vocabulary” with new variables and sensor technologies. The richer the vocabulary of data (more variables and sensors), the more comprehensive the understanding of the structure’s health [1,5].
- Neologisms in SHM: Just as language evolves by introducing new words (neologisms), SHM introduces new variables or sensor types to capture emerging phenomena. For example, the introduction of micro-deformation sensors can be seen as adding new “words” to the SHM lexicon, allowing for more nuanced monitoring and analysis [1,8]. However, while variables and sensors are essential elements, the most significant source of interpretive divergence often lies in the categorization of data clusters. When clusters are labeled arbitrarily, similar structural conditions can be assigned to different classes and so acquire non-equivalent diagnostic potential. Standardizing these categories, much like defining precise terms in a lexicon, is vital for reducing inconsistencies in diagnostics.
4.4. Cultural Dependency and Bias in Syntax
- Cultural influence in SHM: Different fields within engineering may prioritize different aspects of structural monitoring, just as different cultures emphasize different linguistic structures. For example, while aerospace engineers might focus on stress and fatigue in UAV structures, civil engineers monitoring bridges may prioritize load-bearing capacity and material deformation [7,9].
5. Methodology
5.1. Principal Component Analysis (PCA)
- Relevance: PCA reduces the dimensionality of datasets, retaining only the most critical variables. It mirrors the semantic process in language, where core meanings are distilled from complex sentences.
- Support: By showcasing PCA’s ability to highlight essential features, the paper demonstrates how a “grammar” of data can be created, reducing ambiguity in SHM.
- Example: Mujica et al.’s work on PCA-based damage indicators illustrates the role of dimensionality reduction in isolating significant patterns for consistent interpretation [10].
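The dimensionality-reduction idea above can be illustrated with a minimal sketch. It assumes two correlated sensor channels, a baseline of healthy measurements, and a single retained principal component found by power iteration; the data values and function names are hypothetical, and the squared residual used here is a simplified Q-type index rather than the exact indicator of [10].

```python
import math

def mean_vector(rows):
    """Column-wise mean of the baseline measurements."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows, mu):
    """Sample covariance matrix of the mean-centered baseline."""
    n, d = len(rows), len(mu)
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Dominant eigenvector of the covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def q_statistic(sample, mu, component):
    """Squared residual left after projecting onto the retained component."""
    c = [x - m for x, m in zip(sample, mu)]
    score = sum(ci * pi for ci, pi in zip(c, component))
    residual = [ci - score * pi for ci, pi in zip(c, component)]
    return sum(r * r for r in residual)

# Hypothetical baseline: two sensor channels that rise and fall together.
baseline = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
mu = mean_vector(baseline)
component = first_component(covariance(baseline, mu))

q_healthy = q_statistic((3.5, 3.5), mu, component)  # follows the baseline pattern
q_damaged = q_statistic((3.5, 1.0), mu, component)  # breaks the correlation
```

A sample that respects the baseline correlation leaves almost no residual, while one that breaks it yields a much larger value, which is the sense in which the retained component acts as a shared "grammar" for the data.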
5.2. Statistical Process Control (SPC)
- Relevance: SPC techniques, such as control charts, are essential for identifying outliers and understanding process variability. These methods introduce standardized protocols for monitoring changes in structural health.
- Support: The use of SPC emphasizes the need for universal criteria to distinguish normal variations from critical anomalies, mirroring linguistic syntax.
- Example: Ruiz and Mujica’s application of SPC in monitoring multi-sensor systems highlights the value of consistent thresholds in diagnostics [3].
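A control chart of the kind mentioned above can be sketched as follows. This is a minimal Shewhart-style example that assumes limits at three sample standard deviations around the baseline mean; the readings and function names are hypothetical and are not taken from [3].

```python
from statistics import mean, stdev

def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits at k standard deviations."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center, center + k * spread

def out_of_control(values, lower, upper):
    """Indices of readings that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical baseline feature values recorded on the healthy structure.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
lower, center, upper = control_limits(baseline)

# New readings: only the third one departs clearly from the baseline.
flagged = out_of_control([10.0, 10.1, 11.5, 9.9], lower, upper)
```

Because the limits are derived from the baseline by a fixed rule, two teams applying the same rule to the same data obtain the same threshold, which is the kind of universal criterion the section argues for.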
5.3. Machine Learning Techniques
- Relevance: Algorithms like K-Nearest Neighbors (KNN) and Support Vector Machines (SVM) classify structural states by identifying patterns in the data. They demonstrate how “meaning” can be assigned to numerical inputs.
- Support: Machine learning exemplifies the integration of semantics into SHM by providing standardized interpretations of complex datasets, showing the need for a shared “vocabulary” in data analysis.
- Example: Ruiz’s use of supervised learning models in aeronautical SHM illustrates how classification algorithms can standardize diagnostics [1].
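To make the idea of assigning "meaning" to numerical inputs concrete, here is a minimal K-Nearest Neighbors sketch in plain Python. The two-dimensional feature vectors, the labels "healthy" and "damaged", and the function name are illustrative assumptions, not data or code from [1].

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label a query by majority vote among its k nearest training samples.

    train is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labeled feature vectors (e.g. two damage-sensitive indices).
train = [
    ((0.10, 0.20), "healthy"),
    ((0.15, 0.25), "healthy"),
    ((0.20, 0.10), "healthy"),
    ((0.90, 0.80), "damaged"),
    ((0.85, 0.95), "damaged"),
    ((0.80, 0.90), "damaged"),
]

state_a = knn_predict(train, (0.15, 0.20))
state_b = knn_predict(train, (0.85, 0.90))
```

The classifier can only return labels that appear in the training set, so any inconsistency in how those labels were assigned is carried directly into the diagnosis.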
5.4. Context-Aware Algorithms
- Relevance: Pragmatics in language depends on context; similarly, SHM requires models that incorporate environmental conditions, such as load or temperature, into their analyses.
- Support: Highlighting context-aware algorithms strengthens the argument for a shared “pragmatics” in SHM, ensuring diagnostics account for external variables.
- Example: Mujica’s studies on integrating environmental factors into PCA-based diagnostics illustrate the importance of context in reliable assessments [10].
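One simple way to account for context is to remove the expected environmental trend from a feature before judging it. The sketch below assumes a linear dependence of the feature on temperature, fitted by ordinary least squares on healthy baseline data; this is a stand-in for illustration, not the PCA-based approach of [10], and all values and names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def compensated(feature, temperature, slope, intercept):
    """Feature value minus what the temperature alone would predict."""
    return feature - (slope * temperature + intercept)

# Hypothetical healthy baseline: the feature drifts with temperature (deg C).
temperatures = [10.0, 15.0, 20.0, 25.0, 30.0]
features = [95.0, 92.5, 90.0, 87.5, 85.0]
slope, intercept = fit_line(temperatures, features)

# A low raw value on a hot day is explained by temperature alone.
hot_day = compensated(82.5, 35.0, slope, intercept)
# A smaller raw drop at a mild temperature is not explained by context.
mild_day = compensated(86.0, 20.0, slope, intercept)
```

Here the raw value 82.5 is lower than anything in the baseline yet leaves no residual, whereas 86.0 at 20 degrees leaves a residual of about -4, so only the second reading would be treated as a candidate anomaly.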
5.5. Case-Based Reasoning (CBR)
- Relevance: CBR uses historical cases in SHM to interpret new structural health data, similar to how prior experiences guide decision-making in language interpretation.
- Support: The inclusion of CBR emphasizes the potential for developing “contextual databases” in SHM that enable shared learning across different structures and monitoring systems.
- Example: Mujica et al.’s application of CBR in structural health monitoring demonstrates how leveraging historical data can enhance fault detection and diagnostic accuracy [11].
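The retrieve, reuse, and retain steps of case-based reasoning can be sketched as below. The sketch assumes cases are stored as feature vectors with a diagnosis, that similarity is plain Euclidean distance, and that a query too far from every stored case is deferred to an expert; the class names, threshold, and diagnoses are hypothetical and do not reproduce the system in [11].

```python
import math
from dataclasses import dataclass, field

@dataclass
class Case:
    features: tuple
    diagnosis: str

@dataclass
class CaseBase:
    cases: list = field(default_factory=list)

    def retrieve(self, features):
        """Retrieve the stored case closest to the new observation."""
        return min(self.cases, key=lambda c: math.dist(c.features, features))

    def solve(self, features, max_distance=0.5):
        """Reuse the nearest diagnosis only if the case is similar enough."""
        nearest = self.retrieve(features)
        if math.dist(nearest.features, features) <= max_distance:
            return nearest.diagnosis
        return "needs expert review"

    def retain(self, features, diagnosis):
        """Store a confirmed diagnosis so later queries can reuse it."""
        self.cases.append(Case(features, diagnosis))

# Hypothetical historical cases.
case_base = CaseBase([
    Case((0.1, 0.1), "healthy"),
    Case((0.9, 0.9), "delamination"),
])
```

Calling `case_base.solve((0.2, 0.1))` reuses the "healthy" diagnosis, while a query far from both stored cases is deferred until an expert confirms it and `retain` adds it to the base. Sharing such a case base across structures presupposes that the stored diagnoses use the same categories.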
5.6. Multi-Sensor Data Fusion
- Relevance: Combining data from multiple sensors ensures a comprehensive view of structural health, similar to integrating multiple linguistic cues for better communication.
- Support: Multi-sensor fusion illustrates the necessity for a shared syntax and semantics to interpret diverse data streams consistently.
- Example: Ruiz’s work on guided wave analysis for UAV structures demonstrates the importance of coordinated sensor networks in SHM [1].
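As a small illustration of fusion, the sketch below combines several sensor estimates of the same quantity by inverse-variance weighting, a standard technique chosen here for brevity rather than the guided-wave processing of [1]. It assumes each sensor reports a value together with a known noise variance; the numbers and the function name are hypothetical.

```python
def fuse(readings):
    """Inverse-variance weighted average of (value, variance) pairs.

    Returns the fused value and the variance of the fused estimate.
    """
    weights = [1.0 / variance for _, variance in readings]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, readings)) / total
    return value, 1.0 / total

# Two hypothetical strain estimates: the second sensor is four times noisier.
fused_value, fused_variance = fuse([(10.0, 1.0), (20.0, 4.0)])
```

The fused value lies closer to the more precise sensor, and the fused variance is smaller than either input variance. The calculation is only meaningful when all readings share units and a reference frame, which is where a common syntax for data streams matters.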
6. Case Studies
6.1. SHM for UAV Fuselage
6.2. Torsional Wave Detection
6.3. Aeronautical Structure Monitoring
6.4. UAV Wing Monitoring
6.5. Thermal Hot Spot Detection
7. The Future of SHM: Toward a Universal Lexicon and Standardized Language of Data
7.1. Creating a Common Syntax: Regulating Data Collection
- Sensor placement: A unified framework for sensor placement across various structures would ensure that data from different projects can be meaningfully compared [15].
- Variable selection: Establishing standard criteria for selecting variables like stress, strain, and temperature will reduce biases and ensure that only relevant data are collected and analyzed [17].
- Labeling of data clusters: Developing consistent methods for labeling data clusters, based on categories with diagnostic significance, minimizes ambiguity and improves the interpretability of SHM results. This helps ensure that similar structural conditions carry equivalent diagnostic potential across monitoring systems [10].
7.2. Semantics of Data Interpretation: Standardizing Meaning
- Universal models for dimensionality reduction: Shared models like PCA can distill key features from data, improving the consistency of interpretation [16].
- Benchmarking machine learning algorithms: Algorithms like K-Nearest Neighbors (KNN) should be evaluated against standard datasets to minimize biases and enhance reliability [15].
7.3. Pragmatics and Contextual Influence: Controlling for Environmental Factors
7.4. Proposed Universal Lexicon for SHM
- Damage identification: Detecting and characterizing changes in a structure that may affect its performance [15].
- Statistical pattern recognition: Utilizing statistical methods to identify patterns in data for damage detection [16].
- Structural degradation monitoring: Observing and assessing the deterioration of structural components over time [17].
- Machine learning in SHM: Algorithms that enable systems to learn from data and improve damage detection accuracy [16].
- Sensor technologies: Devices and methods used to collect data on structural integrity [15].
- Data preprocessing: Techniques for preparing raw data for analysis, including cleaning and normalization [17].
- Damage prognosis: Predicting the future condition and remaining useful life of a structure [15].
- Operational and environmental variability: Factors that influence structural performance and monitoring data [16].
- Feature extraction: Identifying relevant information from data that indicates structural health [17].
- Anomaly detection: Identifying deviations from normal behavior that may indicate damage [15].
- Category labeling: Assigning diagnostic categories to structural states or conditions based on the observed data. These labels often reflect subjective thresholds or domain-specific interpretations, leading to a non-equivalent diagnostic potential. Establishing criteria for consistent and context-aware labeling is critical for ensuring reliable diagnostics across similar structural conditions.
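The category labeling entry can be made concrete with a small mapping from project-specific labels to a shared set of categories. The three standard categories and every local label below are hypothetical placeholders; the source proposes the idea of a shared lexicon but does not fix these names.

```python
from enum import Enum

class Condition(Enum):
    """Hypothetical standardized diagnostic categories."""
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    DAMAGED = "damaged"

# Hypothetical local labels as different teams might write them.
LOCAL_TO_STANDARD = {
    "ok": Condition.HEALTHY,
    "nominal": Condition.HEALTHY,
    "wear": Condition.DEGRADED,
    "minor crack": Condition.DEGRADED,
    "fault": Condition.DAMAGED,
    "delamination": Condition.DAMAGED,
}

def standardize(local_label):
    """Translate a local label into the shared category, or fail loudly."""
    key = local_label.strip().lower()
    try:
        return LOCAL_TO_STANDARD[key]
    except KeyError:
        raise ValueError(f"no standard category for label {local_label!r}") from None
```

For example, `standardize(" Nominal ")` returns `Condition.HEALTHY`. Raising an error on unmapped labels, rather than guessing, forces each new "neologism" to be defined explicitly before it enters the shared vocabulary.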
7.5. Benchmarking and Standardization
7.6. The Emergence of a Data Lexicon: Defining Neologisms in SHM
- A shared SHM lexicon: A glossary of standardized terms for describing defects, measurements, and algorithms will unify the SHM community [15].
- New categories for data clusters: As advanced sensors are developed, defining consistent categories for clustering similar structural states will help reduce the arbitrariness in diagnostic labeling. These categories will provide a more reliable framework for interpreting data across different conditions [16].
8. Scalability and Automation of AI in SHM
8.1. Current Practices in Data Analysis by AI in SHM
- Data collection: Sensors installed on structures collect raw data, including vibration signals, strain measurements, and environmental variables. These datasets often lack standardization, resulting in inconsistent formats and representations.
- Preprocessing: Domain-specific preprocessing pipelines clean, normalize, and organize the data. However, these steps introduce variability, as preprocessing approaches differ significantly among practitioners.
- Feature extraction: Relevant features, such as principal components or modal parameters, are extracted to enable machine learning (ML) models to detect anomalies or classify structural states.
- Modeling: AI models, including supervised and unsupervised learning algorithms, are trained to identify patterns in the data and predict damage or degradation.
- Interpretation and decision making: The results produced by AI models are interpreted by experts, who make decisions based on the insights provided. This step remains semi-automated, often requiring substantial human input due to ambiguities in data interpretation.
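The five stages above can be sketched as one small pipeline. The choices made here are assumptions for illustration only: preprocessing removes the signal offset, the extracted feature is the root-mean-square amplitude, and the "model" is a fixed threshold standing in for a trained classifier. All names and values are hypothetical.

```python
import math
from statistics import mean

def preprocess(signal):
    """Preprocessing: remove the constant offset from the raw signal."""
    mu = mean(signal)
    return [s - mu for s in signal]

def extract_feature(signal):
    """Feature extraction: root-mean-square amplitude."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def classify(feature, threshold):
    """Modeling: a threshold rule standing in for a trained model."""
    return "anomalous" if feature > threshold else "normal"

def run_pipeline(raw, threshold):
    """Chain the stages and return a record an expert can interpret."""
    feature = extract_feature(preprocess(raw))
    return {"feature": "rms", "value": feature,
            "state": classify(feature, threshold)}

# Data collection is represented by two hypothetical raw vibration records.
quiet = run_pipeline([6.0, 4.0, 6.0, 4.0], threshold=1.5)
loud = run_pipeline([8.0, 2.0, 8.0, 2.0], threshold=1.5)
```

Every stage embeds a choice (which offset to remove, which feature, which threshold), and the output record names the feature and state explicitly so that the final interpretation step does not depend on undocumented conventions.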
8.2. Accelerating AI in SHM Through Standardization
8.2.1. Improving Scalability
- Interoperability across systems: A universal lexicon makes data collected from different SHM systems easier to integrate, so that AI models trained on one dataset are more likely to generalize to new datasets without extensive retraining.
- Reusable preprocessing pipelines: Standardized preprocessing methods reduce the variability introduced by practitioner-specific choices, enabling automated and consistent data preparation for various applications.
- Unified training datasets: Benchmark datasets aligned with the universal lexicon provide high-quality inputs for training AI models, improving accuracy and robustness across deployments.
- Consistency in category labeling: Standardized approaches to labeling data clusters address the arbitrariness inherent in category assignments. These assignments often vary based on subjective thresholds or context-specific criteria, introducing inconsistencies that reduce diagnostic reliability. By defining universal criteria for labeling, this effort ensures more consistent interpretations across applications.
8.2.2. Enhancing Automation
- Automated feature selection: Standardized data representations allow AI systems to automatically identify relevant variables, minimizing reliance on manual feature engineering.
- Real-time monitoring: Consistent data inputs enable real-time AI analysis, allowing SHM systems to detect anomalies and trigger alerts with minimal delay.
- Context-aware AI: Including environmental and operational variables in a standardized format allows AI systems to adapt to varying conditions, enhancing reliability.
- Reduction of diagnostic ambiguity: Addressing the arbitrariness in cluster labeling ensures that automated systems categorize structural states with greater precision, reducing potential errors stemming from subjective or non-standard labeling practices.
8.3. Generalization and Transfer Learning
8.4. Strengthening Human–AI Collaboration
- Transparent models: A universal lexicon ensures that AI outputs are interpretable, enabling experts to validate model decisions with greater confidence.
- Decision support systems: Automated analyses integrated into decision-making dashboards reduce cognitive load for engineers while providing clear, actionable insights.
8.5. Integrating Data Science, AI, and ML for SHM Scalability
- Data science: Automated preprocessing pipelines and unified datasets streamline data preparation and exploratory analysis.
- AI: Scalable, context-aware AI models generalize across diverse applications, reducing development time and improving efficiency.
- ML: Enhanced feature extraction and transfer learning methods allow ML algorithms to perform more effectively on standardized inputs, improving prediction and classification accuracy.
9. Conclusions
- Improved diagnostic accuracy: By minimizing subjective biases in data collection, cluster labeling, and interpretation, SHM systems can provide more accurate and reliable diagnoses.
- Enhanced collaboration: A common language facilitates interdisciplinary and international cooperation, enabling researchers and practitioners to share knowledge more effectively.
- Scalability and automation: Standardized terminology and methodologies are crucial for integrating SHM into automated systems, including AI and machine learning models.
- Educational impact: A universal lexicon provides a foundation for training future SHM professionals, ensuring consistency and clarity in the field.
- Developing and validating the proposed lexicon through interdisciplinary collaboration and case studies.
- Establishing international working groups to refine and formalize SHM standards.
- Leveraging AI and machine learning to operationalize the lexicon in real-time monitoring systems, with particular emphasis on reducing the arbitrariness in labeling data clusters.
- Promoting the lexicon through education and training initiatives to ensure widespread adoption.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- [1] Camacho, J.; Ruiz, M.; Mujica, L.E.; Villamizar, R. Implementation of a piezo-diagnostics approach for damage detection based on PCA in a Linux-based embedded platform. Sensors 2018, 18, 3730.
- [2] Camacho, J.; Ruiz, M.; Mujica, L.E. PCA based stress monitoring of cylindrical specimens using PZTs and guided waves. Sensors 2017, 17, 2788.
- [3] Mujica, L.E.; Vehí, J.; Ruiz, M.; Verleysen, M.; Staszewski, W.; Worden, K. Multivariate statistics process control for dimensionality reduction in structural assessment. Mech. Syst. Signal Process. 2008, 22, 155–171.
- [4] Echeverría, R. Ontología del Lenguaje; Joaquín Mortiz: Mexico City, Mexico, 1994.
- [5] Salvatierra Cáceres, J.C. UAV Wing Monitoring for Stress and Vibration Detection. Master’s Thesis, Universitat Politècnica de Catalunya, Barcelona, Spain, 2018.
- [6] Corrales Mascaró, J. Torsional Wave Detection in SHM Using PCA. Master’s Thesis, Universitat Politècnica de Catalunya, Barcelona, Spain, 2019.
- [7] Mascaró Janer, D. Aeronautical Structure Monitoring with UAV Sensors. Master’s Thesis, Universitat Politècnica de Catalunya, Barcelona, Spain, 2020.
- [8] Watzlawick, P. The Invented Reality: How Do We Know What We Believe We Know? W. W. Norton & Company: New York, NY, USA, 1984.
- [9] Leal, Y.; Ruiz, M.; Lorencio, C.; Bondia, J.; Mujica, L.E.; Vehí, J. Principal component analysis in combination with case-based reasoning for detecting therapeutically correct and incorrect measurements in continuous glucose monitoring systems. Biomed. Signal Process. Control 2013, 8, 603–614.
- [10] Mujica, L.E.; Rodellar, J.; Fernández, A.; Güemes, A. Q-statistic and T2-statistic PCA-based measures for damage assessment in structures. Struct. Health Monit. 2011, 10, 539–553.
- [11] Mujica, L.E.; Rodellar, J.; Ferreiro, I.; Ruiz, M.L. A case-based reasoning approach for fault detection in aerospace structures. In Proceedings of the 6th International Workshop on Structural Health Monitoring, Stanford, CA, USA, 11–13 September 2007; Stanford University: Stanford, CA, USA, 2008; pp. 1–8.
- [12] Vargas Gómez, M. SHM for UAV Fuselage Using PCA and Machine Learning. Master’s Thesis, Universitat Politècnica de Catalunya, Barcelona, Spain, 2018.
- [13] Mohammad Jabeen, H.F. Monitorización e Identificación de Puntos Calientes Térmicos en Estructuras Utilizando Transductores Piezoeléctricos. Master’s Thesis, Universitat Politècnica de Catalunya, Barcelona, Spain, 2022.
- [14] Mujica, L.E.; Ruiz, M.; Pozo, F.; Rodellar, J.; Güemes, A. A structural damage detection indicator based on principal component analysis and statistical hypothesis testing. Smart Mater. Struct. 2013, 23, 025014.
- [15] Farrar, C.R.; Worden, K. Structural Health Monitoring: A Machine Learning Perspective; Wiley: Hoboken, NJ, USA, 2007.
- [16] Worden, K.; Farrar, C.R. An Overview of Intelligent Fault Detection in Systems and Structures. Struct. Health Monit. 2007, 6, 3–35.
- [17] Staszewski, W.J.; Boller, C.; Tomlinson, G.R. Advanced Signal Processing Techniques for Vibration-Based Health Monitoring. Smart Mater. Struct. 2004, 13, 261–269.
Aspect | Current Problem | Proposed Solution (Data Language) |
---|---|---|
Bias in Machine Learning | Arbitrary data categorization introduces distortions in models and reduces their generalization ability. | Define a clear semantic framework for data, eliminating subjective interpretations. |
Interoperability | Lack of standards in data organization prevents efficient model transfer across disciplines. | Structure data with shared syntactic rules to improve compatibility. |
Automation in Data Science | AI cannot operate effectively with ambiguous or inconsistently structured information. | Create a unified syntax, semantics, and pragmatics framework that enables algorithms to interpret data as a language. |
Replicability in Studies | Subjectivity in variable selection and labeling makes comparisons between research studies difficult. | Apply a standardized data language to ensure consistent interpretation across studies. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ruiz, M.; Gualdrón, Ó.; Peral Mondaza, J.A.; Mujica Delgado, L.E. Data Interpretation in Structural Health Monitoring: Toward a Universal Language. Sensors 2025, 25, 3054. https://doi.org/10.3390/s25103054