Preliminary Studies to Bridge the Gap: Leveraging Informal Software Architecture Artifacts for Structured Model Creation
Abstract
1. Introduction
2. Background
2.1. Architecture Diagrams
2.2. Behavioral (Activity) Diagrams
2.3. Logical Network Diagrams
2.4. Cloud Architecture Diagrams
2.5. Structural Diagrams
2.6. Other Diagram Types
- Sequence Diagrams
- Use Case Diagrams
- Deployment Diagrams
- Component Diagrams
- State Machine Diagrams
- Class Diagrams
- Package Diagrams
3. Research on Diagramming Tools
3.1. Overview of Diagramming Tools
3.2. Integration with Visual Studio Code
3.3. Evaluation and Selection of Diagramming Tools
4. Methodology
- Step 1: Informal Diagram Creation: Begin with a diagram demonstrating the system’s features. This step relies on visual thinking, which has been shown to enhance the understanding and communication of complex systems [16]. While various diagramming tools can be used, the focus is on tools that embed structural metadata within the diagram file.
- Choose a diagramming tool that supports metadata embedding
- Ensure the diagram captures essential system elements and relationships
- Consider using standardized notations (e.g., UML or ArchiMate) for improved interoperability
- Step 2: Structural Data Extraction: Extract the embedded structural data from the diagram file. This step is crucial for preserving the semantic information inherent in the visual representation.
- Parsing the file structure
- Identifying and isolating the metadata section
- Decoding the metadata into a machine-readable format
- Step 3: Intermediate Format Conversion (Optional): If needed, convert the extracted data into an intermediate format for easier processing. This step adheres to the principle of separation of concerns, isolating the complexities of different file formats from the core transformation logic.
- Choose a format that balances human readability with machine processability (e.g., JSON or YAML)
- Ensure the chosen format can adequately represent all relevant diagram elements and their properties
- Consider using established data exchange formats such as XMI (XML Metadata Interchange) for improved interoperability [20]
- Step 4: Graph Model Creation: Transform the data into a graph-based model using a library such as NetworkX. This step leverages graph theory to represent the system structure.
- Graph theory concepts (nodes, edges, and properties) [21]
- Isomorphism between visual diagrams and graph structures
- Node creation: Represent system elements as graph nodes
- Edge creation: Represent relationships between elements as graph edges
- Property mapping: Attach relevant metadata to nodes and edges
- Preservation of structural semantics from the original diagram
- Step 5: Information Inference (Optional): Analyze the graph model to infer additional information not explicitly present in the original diagram. This step employs various analytical techniques to enhance the model’s utility.
- Geometric analysis for containment and proximity relationships
- Hierarchical structure detection
- Path analysis for indirect relationships
- Pattern recognition for identifying common architectural styles or design patterns
5. Data Extraction from Informal Artifacts (Example Case)
5.1. File Formats
5.2. The .drawio File Format
Listing 1. XML Representation of Diagram from draw.io (Diagrams.net).
<mxfile>
  <diagram id="juJWlWKBKwTHVthH_bm8" name="Page-1">
    <mxGraphModel dx="684" dy="351" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100">
      <root>
        …
      </root>
    </mxGraphModel>
  </diagram>
</mxfile>
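As a rough illustration of how this layout can be read, the sketch below parses an uncompressed .drawio document with Python's standard library and separates vertices from edges. The sample XML and the function name `list_cells` are hypothetical and written only for this example; the `vertex`, `edge`, `source`, and `target` attributes are the ones the later listings rely on.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the uncompressed .drawio layout.
SAMPLE = """<mxfile>
  <diagram id="d1" name="Page-1">
    <mxGraphModel>
      <root>
        <mxCell id="0"/>
        <mxCell id="1" parent="0"/>
        <mxCell id="a" value="API" vertex="1" parent="1"/>
        <mxCell id="b" value="Database" vertex="1" parent="1"/>
        <mxCell id="e" edge="1" source="a" target="b" parent="1"/>
      </root>
    </mxGraphModel>
  </diagram>
</mxfile>"""


def list_cells(xml_text):
    """Split the mxCell elements of a diagram into vertices and edges."""
    root = ET.fromstring(xml_text)
    vertices, edges = {}, []
    for cell in root.iter("mxCell"):
        if cell.get("vertex") == "1":
            # Map the cell ID to its visible label (empty if unlabeled)
            vertices[cell.get("id")] = cell.get("value", "")
        elif cell.get("edge") == "1":
            edges.append((cell.get("source"), cell.get("target")))
    return vertices, edges


vertices, edges = list_cells(SAMPLE)
print(vertices)
print(edges)
```

Cells with neither flag (IDs "0" and "1" in the sample) are structural layer containers and are skipped here.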
5.3. The .drawio.svg File Format
Listing 2. XML/SVG Content Extraction and Transformation Pipeline with Decoding.
XPath_expression('/svg/@content', '\\n')
Find_/_Replace({'option': 'Regex', 'string': 'content="'}, '', true, false, true, false)
Find_/_Replace({'option': 'Regex', 'string': '"'}, '', true, false, true, false)
From_HTML_Entity()
XML_Beautify('\\t', 'disabled')
XPath_expression('/mxfile/diagram[text()]', '\\n')
Strip_HTML_tags(true, true)
From_Base64('A-Za-z0-9+/=', true, false)
Raw_Inflate(0, 0, 'Adaptive', false, false)
URL_Decode()
XML_Beautify('\\t')
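The same pipeline can be approximated in Python with the standard library. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the SVG root carries the diagram in a `content` attribute and that the `<diagram>` text is a Base64-encoded, raw-deflated, URL-encoded payload, as the recipe implies. The function names and the round-trip helper `encode_diagram` are hypothetical; the helper exists only to build a test input.

```python
import base64
import zlib
import xml.etree.ElementTree as ET
from urllib.parse import quote, unquote
from xml.sax.saxutils import quoteattr


def decode_diagram(payload):
    """Base64-decode, raw-inflate, then URL-decode a compressed diagram payload."""
    raw = base64.b64decode(payload)
    inflated = zlib.decompress(raw, -15)  # negative wbits: raw deflate, no zlib header
    return unquote(inflated.decode("utf-8"))


def extract_diagram_xml(svg_text):
    """Read the 'content' attribute of the SVG root and decode its <diagram> text."""
    content = ET.fromstring(svg_text).get("content")  # entities are unescaped by the parser
    diagram = ET.fromstring(content).find("diagram")
    return decode_diagram(diagram.text.strip())


def encode_diagram(xml_text):
    """Inverse of decode_diagram; used here only to build a test payload."""
    compressor = zlib.compressobj(wbits=-15)
    data = compressor.compress(quote(xml_text, safe="").encode("utf-8"))
    data += compressor.flush()
    return base64.b64encode(data).decode("ascii")


# Build a synthetic .drawio.svg-style document and decode it again.
xml_text = '<mxGraphModel><root><mxCell id="0"/></root></mxGraphModel>'
mxfile = f"<mxfile><diagram>{encode_diagram(xml_text)}</diagram></mxfile>"
svg = f'<svg xmlns="http://www.w3.org/2000/svg" content={quoteattr(mxfile)}/>'
print(extract_diagram_xml(svg) == xml_text)
```

Files whose `<diagram>` element holds plain XML children instead of compressed text would need a different branch, which this sketch does not cover.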
5.4. The .drawio.png File Format
Listing 3. URL Decoding, Regex Matching, and XML Beautification Workflow.
URL_Decode()
Regular_expression('User defined', '<mxfile>.*</mxfile>', true, true, false, false, false, false, 'List matches')
XML_Beautify('\\t')
6. Extracting Data from PNGs
6.1. Retrieving the MxFile
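Listing 4. Extracting mxfile Content from PNG with URL Decoding and Regex.
import re
from urllib.parse import unquote

def get_mxfile(fpath):
    # Read the raw PNG bytes; the context manager closes the file
    with open(fpath, mode='rb') as f:
        pngbytes = f.read()
    # Undecodable binary image data is dropped; the embedded text survives
    png = pngbytes.decode('utf-8', errors='ignore')
    decoded = unquote(png, encoding='utf-8')
    # DOTALL lets the match span line breaks inside the XML
    match = re.search('<mxfile>.*</mxfile>', decoded, re.DOTALL)
    if match is None:
        raise ValueError(f'No <mxfile> element found in {fpath}')
    return match.group(0)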
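A more structured alternative to decoding the whole file as text is to walk the PNG chunk list and read only the textual chunks. The sketch below is a minimal illustration, assuming the diagram is stored URL-encoded in a `tEXt` chunk whose keyword is `mxfile`; that keyword, and the helper names `read_text_chunks` and `make_chunk`, are assumptions made for this example. The test input is a synthetic byte string, not a complete image.

```python
import struct
import zlib
from urllib.parse import quote, unquote

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_text_chunks(png_bytes):
    """Return {keyword: text} for every tEXt chunk in a PNG byte string."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks, pos = {}, len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        # Each chunk starts with a 4-byte big-endian length and a 4-byte type
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            # tEXt payload is: keyword, NUL separator, Latin-1 text
            keyword, _, text = data.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return chunks


def make_chunk(ctype, data):
    """Build one PNG chunk (helper for the synthetic test input only)."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


xml_text = "<mxfile><diagram>demo</diagram></mxfile>"
png = (PNG_SIGNATURE
       + make_chunk(b"tEXt", b"mxfile\x00" + quote(xml_text).encode("latin-1"))
       + make_chunk(b"IEND", b""))
print(unquote(read_text_chunks(png)["mxfile"]))
```

Reading by chunk avoids regex matches against unrelated binary data, at the cost of depending on where the tool chooses to store the payload.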
6.2. Intermediate Format Conversion
Listing 5. Parsing and Converting mxGraph XML to JSON Using xmltodict.
import json
import xmltodict

# get_xml and MxGraph are helpers that are not shown in this listing
xml = get_xml(fpath)
d = xmltodict.parse(xml)
mxgraph = d['mxfile']['diagram']['mxGraphModel']
graph = MxGraph(mxgraph)
print(json.dumps(graph.g, indent=4))
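To show where keys such as `@id` and `@vertex` in the intermediate format come from, the following sketch mimics xmltodict's default convention of prefixing attribute names with `@`, using only the standard library. It is a simplified stand-in, not xmltodict itself: repeated child tags overwrite one another here, whereas xmltodict collects them into a list. The sample XML and the function name `cell_to_dict` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def cell_to_dict(cell):
    """Map one XML element to a dict, prefixing attribute names with '@'."""
    d = {f"@{name}": value for name, value in cell.attrib.items()}
    for child in cell:
        # Nested elements (e.g. mxGeometry) become nested dicts keyed by tag
        d[child.tag] = cell_to_dict(child)
    return d


xml_text = """<root>
  <mxCell id="a" value="API" vertex="1" parent="1">
    <mxGeometry x="40" y="40" width="120" height="60" as="geometry"/>
  </mxCell>
  <mxCell id="e" edge="1" source="a" target="b" parent="1"/>
</root>"""

elements = [cell_to_dict(c) for c in ET.fromstring(xml_text).iter("mxCell")]
print(json.dumps(elements, indent=4))
```

The resulting list of dictionaries has the shape that a function such as `to_networkx` in Listing 6 expects as input.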
7. Creating Models
Intermediate Format Conversion
Listing 6. Converting Diagram Elements to a NetworkX Graph with Nodes and Edges.
import networkx as nx

def to_networkx(elements):
    G = nx.Graph()
    nodes = []
    edges = []
    # Loop over all diagram elements
    for element in elements:
        # Get the element ID
        _id = element.get('@id', None)
        # If the element is a vertex
        if element.get('@vertex', None) == '1':
            nodes.append((_id, element))
        # If the element is an edge
        elif element.get('@edge', None) == '1':
            src = element.get('@source', None)
            tgt = element.get('@target', None)
            edges.append((src, tgt, element))
    # Add the nodes, using each element dict as the node attributes
    G.add_nodes_from(nodes)
    # Add the edges, using each element dict as the edge attributes
    for src, tgt, attrs in edges:
        print(f'Adding edge {src} --> {tgt}')
        G.add_edge(src, tgt, **attrs)
    return G
8. Inferring Additional Information
8.1. Geometric Inferences
Listing 7. Determining Containment Relationships Between Graph Nodes Based on Bounds.
for i in graph.nodes:
    for j in graph.nodes:
        # The bounds of element i
        xi_lim = (i.x, i.x + i.width)
        yi_lim = (i.y, i.y + i.height)
        # The bounds of element j
        xj_lim = (j.x, j.x + j.width)
        yj_lim = (j.y, j.y + j.height)
        # True if element j's x bounds are inside element i's x bounds
        xj_in_xi = (xi_lim[0] < xj_lim[0] and xj_lim[1] < xi_lim[1])
        # True if element j's y bounds are inside element i's y bounds
        yj_in_yi = (yi_lim[0] < yj_lim[0] and yj_lim[1] < yi_lim[1])
        # If element j's x and y bounds are inside element i's bounds,
        # create a relationship identifying that element j is inside i
        if xj_in_xi and yj_in_yi:
            graph.add_edge(j, i, relationship='in')
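Because the strict-containment test links an element to every shape that encloses it, a deeply nested element ends up with an edge to each ancestor. One possible refinement, sketched below with hypothetical names (`Box`, `contains`, `direct_container`) and made-up coordinates, is to keep only the smallest enclosing shape as the direct container. This is an illustrative variation, not part of the method described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    name: str
    x: float
    y: float
    width: float
    height: float

    @property
    def area(self):
        return self.width * self.height


def contains(outer, inner):
    """True if inner lies strictly inside outer on both axes."""
    return (outer.x < inner.x and inner.x + inner.width < outer.x + outer.width
            and outer.y < inner.y and inner.y + inner.height < outer.y + outer.height)


def direct_container(box, boxes):
    """Smallest box that strictly contains `box`, or None if there is none."""
    candidates = [b for b in boxes if contains(b, box)]
    return min(candidates, key=lambda b: b.area, default=None)


# Hypothetical layout: a server inside a subnet inside a cloud boundary.
boxes = [
    Box("cloud", 0, 0, 400, 300),
    Box("subnet", 20, 20, 200, 150),
    Box("server", 40, 40, 80, 60),
]
for b in boxes:
    parent = direct_container(b, boxes)
    print(b.name, "in", parent.name if parent else None)
```

The strict inequalities also guarantee that a box never contains itself, so no self-loops are produced.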
8.2. Parent–Child Relationships
Listing 8. Establishing Parent–Child Relationships Between Graph Nodes.
for i in graph.nodes:
    for j in graph.nodes:
        if i.parent == j:
            graph.add_edge(i, j, relationship='parent')
            graph.add_edge(j, i, relationship='child')
8.3. Other Inferences
- Traversing Intermediate Connections: This technique allows for an understanding of the indirect relationships between diagram elements, as suggested in the methodology’s discussion of path analysis.
- Grouping by Proximity: This method groups related elements based on their spatial proximity, enhancing the model’s interpretability.
- Element Type Identification and Edge Labeling: Future work could involve refining the model by identifying different diagram elements and incorporating edge labels, adding further granularity to the analysis.
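As one possible reading of grouping by proximity, the sketch below clusters elements whose center points are chained within a distance threshold, using a small union-find structure. The threshold, the coordinates, and the function name `group_by_proximity` are assumptions chosen for illustration; other definitions of proximity (for example, edge-to-edge distance) would be equally reasonable.

```python
from itertools import combinations
from math import hypot


def group_by_proximity(centers, max_dist):
    """Cluster element IDs whose centers are chained within max_dist of each other."""
    parent = {k: k for k in centers}

    def find(k):
        # Follow parent links to the cluster representative (with path halving)
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for a, b in combinations(centers, 2):
        (ax, ay), (bx, by) = centers[a], centers[b]
        if hypot(ax - bx, ay - by) <= max_dist:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for k in centers:
        groups.setdefault(find(k), set()).add(k)
    # Sort for a deterministic result
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical element centers in diagram coordinates.
centers = {"web": (0, 0), "api": (50, 0), "db": (100, 0), "backup": (500, 400)}
print(group_by_proximity(centers, max_dist=60))
```

Note that "web" and "db" are 100 units apart yet land in the same group, because each is within the threshold of "api"; this chaining behavior is a design choice of the sketch.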
9. End-to-End Example
9.1. Creating the Model
9.2. Indexing with Graph Databases
9.3. Querying with Cypher
Listing 9. Querying Data Lake Relationships in a Graph Database Using Cypher.
MATCH (db:Database)<-[r:EDGE*0..4]->(n)
WHERE db.label STARTS WITH 'Data Lake'
RETURN db, n
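For readers without a graph database at hand, a comparable question can be answered in plain Python. The sketch below runs a breadth-first search that ignores edge direction and stops after four hops, starting from nodes whose label begins with "Data Lake". It returns distinct nodes with their hop counts rather than the individual paths Cypher would enumerate; the node IDs, labels, and function names are hypothetical.

```python
from collections import deque


def undirected(edges):
    """Build an adjacency map that ignores edge direction."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def within_hops(adjacency, start, max_hops):
    """Map each node reachable in at most max_hops steps to its hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


# Hypothetical model: a simple chain of six connected elements.
labels = {"n1": "Data Lake (raw)", "n2": "ETL", "n3": "Warehouse",
          "n4": "API", "n5": "Dashboard", "n6": "Browser"}
edges = [("n1", "n2"), ("n2", "n3"), ("n3", "n4"), ("n4", "n5"), ("n5", "n6")]

adjacency = undirected(edges)
for node, label in labels.items():
    if label.startswith("Data Lake"):
        reachable = within_hops(adjacency, node, max_hops=4)
        print(node, sorted(reachable))
```

The start node itself is included with a hop count of zero, mirroring the lower bound of the `*0..4` range in the query.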
10. Results and Comparison with Other Methods
10.1. Methodology Effectiveness
- Accessibility and Ease of Use: The methodology employs widely available tools (e.g., Draw.io) and Python libraries, making it accessible to many users.
- Flexibility: The process can handle various file formats and diagram types, accommodating different needs and preferences in software architecture modeling.
- Scalability: The approach is scalable, allowing users to analyze large and complex systems by converting informal diagrams into structured models that can be easily queried and analyzed.
10.2. Comparison with Other Methods
- Traditional Methods: Traditional methods, such as those using UML (Unified Modeling Language) or SysML (Systems Modeling Language) [9], require specialized knowledge and tools to create semantically precise models. These methods ensure a high level of rigor but can be inaccessible to those without specific training in these languages.
- This Methodology: By contrast, the approach described here allows users to start with informal diagrams and gradually transition to formal models. This lowers the barrier to entry, enabling a wider range of professionals to participate in model creation and analysis.
- Microsoft Visio and Lucidchart: Tools such as Microsoft Visio and Lucidchart are popular for creating diagrams. However, they often lack the integration to convert these diagrams into formal models that can be analyzed using advanced techniques such as graph theory. Additionally, these tools are proprietary, which may limit accessibility and flexibility.
- This Methodology: Combining Draw.io, an open-source tool, with Python's NetworkX offers a more flexible and cost-effective solution. Users are not locked into a specific ecosystem and can easily integrate this methodology with other open-source tools.
- MBSE Approaches: MBSE frameworks, such as those following the Model-Driven Architecture (MDA) paradigm, provide a rigorous approach to model creation, focusing on separating design and architecture. These methods are powerful but can be complex and resource-intensive.
- This Methodology: While this methodology aligns with some principles of MBSE (e.g., the separation of concerns during format conversion), it offers a more lightweight and user-friendly alternative. It is particularly well-suited for organizations or projects where full-scale MBSE adoption is impractical due to time, cost, or expertise constraints.
11. Conclusions and Further Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Basili, V.; Briand, L.; Bianculli, D.; Nejati, S.; Pastore, F.; Sabetzadeh, M. Software Engineering Research and Industry: A Symbiotic Relationship to Foster Impact. IEEE Softw. 2018, 35, 44–49.
- Richards, M.; Ford, N. Fundamentals of Software Architecture. O’Reilly Media, Inc. Available online: https://learning.oreilly.com/library/view/fundamentals-ofsoftware/9781492043447/ (accessed on 10 April 2024).
- Carroll, E.; Malins, R. Systematic Literature Review: How is Model-Based Systems Engineering Justified? Sandia National Laboratories: Albuquerque, NM, USA, 2016.
- Ozkaya, M. Do the informal & formal software modeling notations satisfy practitioners for software architecture modeling? Inf. Softw. Technol. 2018, 95, 15–33.
- Keim, J.; Schneider, Y.; Koziolek, A. Towards consistency analysis between formal and informal software architecture artefacts. In Proceedings of the 2019 IEEE/ACM 2nd International Workshop on Establishing the Community-Wide Infrastructure for Architecture-Based Software Engineering (ECASE), Montreal, QC, Canada, 27 May 2019; pp. 6–12.
- Ali, N.; Baker, S.; O’Crowley, R.; Herold, S.; Buckley, J. Architecture consistency: State of the practice, challenges and requirements. Empir. Softw. Eng. 2018, 23, 224–258.
- Fowler, M. Software Architecture Guide. Available online: https://martinfowler.com/architecture/ (accessed on 10 April 2024).
- Object Management Group. OMG® Unified Modeling Language® (OMG UML®), Version 2.5.1. 2023. Available online: https://www.omg.org/spec/UML/2.5.1/PDF (accessed on 10 April 2024).
- Object Management Group. OMG Systems Modeling Language™ (SysML®), Version 2.0 Beta, Part 1 Language Specification. 2023. Available online: https://www.omg.org/spec/SysML/2.0/Beta1/Language/PDF (accessed on 1 March 2024).
- JGraph Ltd. draw.io. July 2023. Available online: https://www.drawio.com/ (accessed on 1 March 2024).
- JGraph Ltd. Github—jgraph/drawio-desktop (Source Code). July 2023. Available online: https://github.com/jgraph/drawio-desktop (accessed on 1 March 2024).
- Henning Dieterichs. Github—hediet/vscode-drawio (Source Code). July 2023. Available online: https://github.com/hediet/vscode-drawio (accessed on 10 February 2024).
- Henning Dieterichs. Draw.io Integration—Visual Studio Marketplace. July 2023. Available online: https://marketplace.visualstudio.com/items?itemName=hediet.vscode-drawio (accessed on 10 April 2024).
- JGraph Ltd. MxGraph. Available online: https://jgraph.github.io/mxgraph/ (accessed on 10 April 2024).
- Pastor, O.; Molina, J.C. Model-Driven Architecture in Practice: A Software Production Environment Based on Conceptual Modeling; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2007.
- Franconeri, S.L.; Padilla, L.M.; Shah, P.; Zacks, J.M.; Hullman, J. The Science of Visual Data Communication: What Works. Psychol. Sci. Public Interest 2021, 22, 110–161.
- Bentrad, S.; Meslati, D. Visual Programming and Program Visualization—Toward an Ideal Visual Software Engineering System. ACEEE Int. J. Inf. Technol. 2011, 1, 43–49.
- Kaplan, J. Agile Architecture in Practice. 2023. Available online: https://jdkaplan.com/articles/agile-architecture-in-practice (accessed on 15 March 2024).
- Leipzig, J.; Nüst, D.; Hoyt, C.T.; Ram, K.; Greenberg, J. The role of metadata in reproducible computational research. Patterns 2021, 2, 100322.
- Object Management Group. XML Metadata Interchange (XMI), Version 2.5.1. 2015. Available online: https://www.omg.org/spec/XMI/2.5.1/PDF (accessed on 10 April 2024).
- Majeed, A.; Rauf, I. Graph Theory: A Comprehensive Survey about Graph Theory Applications in Computer Science and Social Networks. Inventions 2020, 5, 10.
- Leskovec, J.; Lang, K.J.; Mahoney, M. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 631–640.
- Li, W.; Zhou, X.; Wu, S. An Integrated Software Framework to Support Semantic Modeling and Reasoning of Spatiotemporal Change of Geographical Objects: A Use Case of Land Use and Land Cover Change Study. ISPRS Int. J. Geo-Inf. 2016, 5, 179.
- Würsch, M.; Ghezzi, G.; Hert, M.; Reif, G.; Gall, H. SEON: A pyramid of ontologies for software evolution and its applications. Computing 2012, 94, 857–885.
- GCHQ. CyberChef. Available online: https://gchq.github.io/CyberChef/ (accessed on 10 April 2024).
- Robinson, I.; Webber, J.; Eifrem, E. Graph Databases, 2nd ed.; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2015; Available online: https://learning.oreilly.com/library/view/graph-databases-2nd/9781491930885/ (accessed on 10 April 2024).
- Kassab, M.; Mazzara, M.; Lee, J.; Succi, G. Software architectural patterns in practice: An empirical study. Innov. Syst. Softw. Eng. 2018, 14, 263–271.
- Schilling, R.D.; Aier, S.; Winter, R. Designing an Artifact for Informal Control in Enterprise Architecture Management. In Proceedings of the ICIS, 2019, Munich, Germany, 15–18 December 2019.
- Rabelo, L.; Bhide, S.; Gutierrez, E. Artificial Intelligence: Advances in Research and Applications; Nova Science Publishers, Inc.: Sebastopol, CA, USA, 2018.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kaplan, J.; Rabelo, L. Preliminary Studies to Bridge the Gap: Leveraging Informal Software Architecture Artifacts for Structured Model Creation. Information 2024, 15, 642. https://doi.org/10.3390/info15100642