NLP and Text Mining for Enriching IT Professional Skills Frameworks

Zare, Danial; Fernandez-Sanz, Luis; Pospelova, Vera; López-Baldominos, Inés

doi:10.3390/app15179634

by Danial Zare, Luis Fernandez-Sanz *, Vera Pospelova and Inés López-Baldominos

Department of Computer Sciences, University of Alcala, E-28805 Alcalá de Henares, Spain

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(17), 9634; https://doi.org/10.3390/app15179634

Submission received: 30 July 2025 / Revised: 18 August 2025 / Accepted: 27 August 2025 / Published: 1 September 2025

(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications—2nd Edition)

Featured Application

Applying Natural Language Processing (NLP) to IT professional skills frameworks makes it possible to analyse the large volume of textual information and the complex relationships currently expressed in traditional documents, which manual methods cannot realistically handle.

Abstract

The European e-Competence Framework (e-CF) and the European Skills, Competences, Qualifications and Occupations (ESCO) classification are two key initiatives developed by the European Commission to support skills transparency, mobility, and interoperability across labour and education systems. While e-CF defines essential competences for ICT professionals through a structured framework, it provides only a limited number of illustrative skills and knowledge examples for each competence. In contrast, ESCO offers a rich, multilingual taxonomy of skills and knowledge, each accompanied by a detailed description, alternative labels, and links to relevant occupations. This paper explores the possibility of enriching the e-CF framework by linking it to relevant ESCO ICT skills using text embeddings (MPNet) and cosine similarity. This approach extends each e-CF competence to 15–25 semantically aligned skills and knowledge items, all with full descriptions and official translations into all EU languages, instead of the current 4–10 brief examples. This significantly improves the clarity, usability, and interpretability of e-CF competences for the various stakeholders. Furthermore, since ESCO terminology serves as the foundation for labour market analysis across the EU, establishing this linkage provides a valuable bridge between the e-CF competence model and real-time labour market intelligence, a connection that is not currently available. The results of this study offer practical insights into the application of semantic technologies to the enhancement and mutual alignment of European ICT skills frameworks.

Keywords: skills frameworks; ESCO; e-CF; NLP; text mining; mapping

1. Introduction

The rapid expansion of the Information and Communication Technology (ICT) sector has fundamentally transformed economies worldwide. Demand for skilled ICT professionals has surged as organizations across industries increasingly rely on digital infrastructure. As a consequence, the ICT profession generates many job opportunities [1]. However, studies confirm a scarcity of skilled professionals in the technology sector: the qualified workforce falls short of the demand created by the development of new processes. The European Union (EU) itself has reported that the skills gap in ICT specialists hinders growth in EU countries, with more than half of the companies surveyed struggling to fill ICT vacancies according to a study published by Eurostat [2,3]. Moreover, the EU has set up the Digital Decade policy programme to guide Europe’s digital transformation, with concrete targets and objectives such as 20 million ICT specialists by 2030.

The term ‘skill’ refers to the ability to carry out managerial or technical tasks. Skills may be cognitive or practical (knowing how to do something) [4]. ‘Knowledge’ is another important term, referring to the body of facts that can be applied in a field of work or study (knowing what to do) [4]. Another term frequently used in the literature is ‘competency’. Competency (or competence) typically refers to a demonstrated ability to apply knowledge, skills and attitudes to achieve observable results [4]. In the digital labour market, skills are traditionally categorized into “hard” and “soft” skills. Hard skills are those specific to a discipline that a person can acquire through formal learning processes, such as how to use a programming language [5]. Soft skills [5], such as leadership or communication capacity, are mainly transversal and cannot be easily acquired through formal learning, as they relate to behavioural patterns. These skills can play a decisive role in determining the quality of the match between a worker and an open job vacancy, the worker’s performance, or the length of time the worker stays in the job [6].

Skill frameworks are structured and systematic approaches to identifying, organizing, and evaluating the skills needed for success in various occupations, sectors and disciplines [7]. These frameworks provide a common language and a reference point for employers, educators, employees, and policymakers to understand and develop the skills needed across various industries, particularly in fields such as ICT, healthcare, and engineering. The most important skills frameworks for ICT professionalism at European level are ESCO [8] (strictly speaking, not a framework but a labour classification) and the European standard EN 16234, known as e-CF [4]. ESCO [8] is a multilingual classification of Skills, Competences, Qualifications, and Occupations created by the European Commission to improve the supply of information on skills demand in the labour market. e-CF [4] is a reference framework of competences applied within the ICT sector that can be used and understood by ICT user and supply companies, ICT practitioners, managers and human resources departments, the public sector, and educational and social partners across Europe [9].

The importance of skills frameworks is indisputable. They help to create a common language for describing the competencies needed or recommended for various occupations [10]. This standardization allows clearer communication between employers, employees, and educational institutions [11]. Frameworks also help to align education and training programmes with the actual needs of industry. Finally, governments and organizations use skills frameworks for workforce and labour market analysis: by identifying emerging trends and skill shortages, they can design policies and training initiatives that address those gaps [12].

The e-CF framework serves as a widely recognized reference model for identifying and developing ICT professional competences. However, it has two notable limitations. First, it provides only a limited set of illustrative skills and knowledge items for each defined e-Competence, which can reduce the granularity, clarity, and adaptability of the framework when applied in diverse professional, educational, or recruitment contexts. Second, each illustrative example is limited to a bare expression (e.g., “maintain data integrity and interoperability”) without additional description or explanation, which can lead to ambiguity or limited understanding, especially for users who are not deeply familiar with the framework or with specific ICT terminology. Enriching e-CF with additional, fully described examples of skills and knowledge helps to clarify the meaning and scope of each e-Competence, reducing ambiguity and improving interpretability. Mitigating these two weak points is the key motivation for the study presented in this article, which aims to enhance the information richness and versatility of the e-CF standard framework.

Despite the valuable contributions of both frameworks (e-CF and ESCO), their separate structures and terminologies make it challenging for stakeholders to cross-reference and align skills and competences effectively. This paper proposes the use of natural language processing (NLP) techniques to automate the mapping of ESCO’s ICT-related skills to the 41 e-competences defined in e-CF. Existing approaches to skills mapping often rely on manual analysis or basic keyword matching to find similarities, which is inadequate given the nuances of technical skill descriptions and the number and complexity of the relationships between competences and their detailed components. For example, ESCO contains approximately 14,000 skills, of which 1238 are classified as ICT skills: those obtained by querying the ESCO occupations database and selecting the occupations associated with ISCO codes 133, 25, and 35 (respectively, Information and Communications Technology Service Managers, Information and Communications Technology Professionals, and Information and Communications Technicians). Each of these ESCO skills is documented with a distinct title, alternative labels, a description, and additional metadata. On the e-CF side (EN 16234), there are 41 competences with a total of 278 knowledge examples (6.78 per competence on average) and 245 skill examples (5.97 per competence on average). Comparing these 523 e-CF examples with the 1238 ESCO ICT skills requires the analysis of 647,474 potential relationships, which is clearly unfeasible for manual methods without the support of advanced NLP. This has motivated our proposal to apply solid NLP methods to this type of analysis, which is needed for the mutual enrichment of both frameworks and enables more detailed and pragmatic references for ICT professional profiles.
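The size of the comparison space follows directly from the counts quoted above. As a quick arithmetic check (the variable names are ours and purely illustrative), every e-CF knowledge or skill example is paired with every ESCO ICT skill:

```python
# Counts quoted in the text above.
ESCO_ICT_SKILLS = 1238        # unique ESCO skills/knowledge items linked to ICT occupations
ECF_KNOWLEDGE_EXAMPLES = 278  # knowledge examples across the 41 e-competences
ECF_SKILL_EXAMPLES = 245      # skill examples across the 41 e-competences

# Each e-CF example can in principle be related to each ESCO ICT skill.
ecf_examples = ECF_KNOWLEDGE_EXAMPLES + ECF_SKILL_EXAMPLES
candidate_pairs = ESCO_ICT_SKILLS * ecf_examples

print(f"e-CF examples: {ecf_examples}")           # 523
print(f"candidate pairs: {candidate_pairs:,}")    # 647,474
```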

The remainder of the paper is structured as follows: Section 2 discusses the relevant details extracted from the literature review on the two frameworks and on advanced NLP methods suitable for this case. Section 3 presents the applied methodology, the data preparation and text processing methods, and the metrics used for the analysis of the frameworks. Section 4 presents the results and their discussion, while Section 5 summarizes the main conclusions and outlines possible future work.

2. Literature Review

This section starts by reviewing and describing the two target frameworks in our work: e-CF and ESCO. A brief overview of previous studies and related work on mapping and linking frameworks will follow and will also highlight the role of NLP in facilitating such connections. Finally, the section concludes with a summary of the motivation behind our research and a short explanation on the relevance and value of this study.

2.1. Skills Frameworks

The increasing digitization of the global economy has spurred the development of numerous skills frameworks aimed at standardizing, categorizing, and communicating workforce competencies. These frameworks are pivotal for aligning education and training with labour market needs, fostering transparency in skill recognition, and enhancing workforce mobility. Prominent among these frameworks in Europe are the European e-Competence Framework (e-CF) [4] and the European Skills, Competences, Qualifications, and Occupations (ESCO) [8] taxonomy, both playing a critical role in structuring ICT-related skills and competences in Europe.

2.1.1. e-CF (Standard EN 16234)

e-CF is a reference framework of e-competences applied within the ICT sector that can be used and understood by ICT user and supply companies, ICT practitioners, managers and human resources departments, the public sector, and educational and social partners across Europe. The development of this framework was triggered by the need to support mutual understanding and provide transparency of language across Europe through the articulation of the competences required and deployed by ICT professionals, including both practitioners and managers [9].

The standard is structured across four dimensions. Dimension 1 categorizes competences into five main areas, representing the main different functions or processes within the ICT domain. Dimension 2 defines 41 e-competences that describe the associated functions, responsibilities and behaviours required for various ICT roles. These e-competences cover a wide range of activities, from technical tasks like systems integration to strategic roles such as ICT governance. Each e-competence is also defined across five proficiency levels, although not all five levels are meaningful for all e-competences. These levels (dimension 3) describe increasing levels of expertise, responsibility, and autonomy in performing ICT-related tasks.

As noted in the introduction, the present version of the e-CF framework illustrates each e-Competence with a limited number of examples of skills and knowledge items (dimension 4; around 6 or 7 in each category for each e-competence). Moreover, these examples consist of a bare title without further detail or clarification. As we will see in the next section, all ESCO skills and knowledge items are richly described with a precise title, alternative labels (e.g., synonyms, spelling variations, technology versions), a detailed description, and links to related occupations and knowledge areas. Mapping and linking the elements of both frameworks could therefore enrich and complement the information they provide to practitioners. This has inspired the work presented in this study.

2.1.2. ESCO

ESCO [8] is a multilingual classification of skills, competences, qualifications, and occupations which was first mooted in 2008 in the European Commission (EC)’s New Skills for New Jobs Communication [13]. The ESCO classification is structured into three interconnected pillars: Occupations, Skills and Competences, and Qualifications. After a decade of development, it currently contains information on the skills of more than 3000 occupations. Separately, it provides descriptions of about 14,000 skills and knowledge items. The Occupations pillar includes more than 3000 job titles mapped to descriptions and alternative job titles used in different languages, allowing employers and job seekers to refer to job roles consistently across Europe. The Skills and Competences pillar defines specific abilities and knowledge areas recommended for the various occupations, thus facilitating better alignment between the demands of employers and the profiles of workers. Finally, the Qualifications pillar provides references to formal national qualifications recognized across Europe, aiding the recognition of educational achievements and linking them to job markets. Together, these pillars form a comprehensive taxonomy that enables a clearer understanding of the relationships between job roles, required skills, and relevant qualifications. It is important to note that this study focuses exclusively on the ICT-related skills defined within the ESCO classification. Specifically, as noted in the introduction, these are the skills linked to occupations in ISCO occupation groups 133, 25, and 35.

This framework was part of the Europe 2020 strategy, designed to address skill gaps, improve job matching, and enhance the recognition of skills across borders. ESCO is available in 27 EU languages, which makes it particularly useful for cross-border recruitment and for individuals seeking work in different EU countries. ESCO is used in a variety of applications, from public employment services and private recruitment platforms to educational institutions and policymakers. Employers can use ESCO to define job requirements and assess candidates’ skills in a standardized way, while job seekers can identify the skills needed for specific occupations. It is also an information resource for training providers and careers guidance counsellors [11]. Educational institutions, vocational training centres or professional training organizations can also use the framework to design curricula that are better aligned with labour market needs [14]. Additionally, ESCO supports the development of digital tools, such as job-matching platforms, that help bridge the gap between skills supply and demand [15].

Generating the mapping from these ESCO ICT skills to e-CF competences could benefit the study of ICT professional labour market and profiles by bridging both models, facilitating their interoperability by supporting automated tools for job matching, CV analysis and by harmonizing educational curricula with actual job market expectations. Furthermore, this will allow multiple phrasing alternatives, improving search and semantic alignment in multilingual or diverse organizational settings. This list of benefits has strengthened our motivation to enrich the e-CF framework with the link to the information from ESCO.

2.2. Skills Frameworks Mapping and the Application of NLP

In recent years, research into automating competency mapping has been relatively limited. Traditional NLP techniques include keyword matching and statistical methods such as Term Frequency-Inverse Document Frequency (TF-IDF), while more sophisticated machine learning (ML) and deep learning (DL) models include Doc2Vec, BERT and other LLMs [16,17,18]. The authors of [19] paid particular attention to the rapidly evolving application of LLMs, evaluating their potential for generating synthetic training data or for improving the accuracy of skill matching by re-ranking candidate results [20,21]. Connecting individual skills and qualifications to job requirements, career paths, or training opportunities requires sophisticated mapping methodologies [19]. A prominent approach uses semantic similarity techniques [16,22,23]. This typically entails representing texts (such as sentences from a resume, job description, or training material) as numerical vectors using embedding models and then calculating the distance or similarity (e.g., cosine similarity) between these vectors [22,23,24].

Skill or competency framework mapping has historically depended on the manual efforts of experts [25,26,27]. As an example, prior research has explored competency framework mapping for public health training [28], quantitatively assessing the alignment of eight competency frameworks with an international standard for public health training and education programmes. Several terms have been used in the literature to refer to mappings between skill taxonomies [29]. One is “crosswalk”, derived from the idea of creating a path to cross a street, which describes the connection between two taxonomies or sets of educational standards [25,27]. Other terms such as “transfer” [26] and “alignment” [30,31] have also been used in related work. Notably, Yilmazel et al. [31] explored an automated approach that employed rule-based techniques to systematically extract features from the skill descriptions of one competency standard. These extracted features were used to train a machine learning classifier that mapped them to another standard. A slightly enhanced method was employed in [32] to map the concepts connecting the interviewees’ experiences with the project management competencies required for the development of the framework. The C-map tool, a visual representation technique that helps in organizing and representing knowledge or concepts [33], was used for this purpose. Despite the advantages of this method, manual revisions remained necessary for the mapping process, and the results were subject to the interviewees’ experiences. On the other hand, Choi, Song, and Zhu [30] transformed skill statements into a verb phrase graph and a noun phrase graph, and then measured the similarity between skills through graph matching for competency mapping. A more recent study investigated learning skill equivalencies across platform taxonomies [29]. The researchers proposed using text embedding and cosine similarity to identify skill mappings across digital learning platforms, including ASSISTments, Khan Academy, and Cognitive Tutor. They evaluated several models, such as Bag-of-Words, TF-IDF, Context2Vec, Skip-Gram, and Text-Associated Matrix Factorization.

Recently, Jemal et al. [34] proposed a new approach for mapping competency frameworks, similar to the ICT ones of this study, using LLMs. They investigated various pre-trained LLMs to encode the competency names of each framework and selected cosine similarity to measure semantic similarity scores, as it facilitated the identification of equivalent or closely related competencies across different frameworks. The success of these NLP techniques on similar skills frameworks inspired us to apply them to the case of e-CF and ESCO in this study. The choice of MPNet in our work was motivated by prior empirical evidence, particularly the results reported by Jemal et al. [34]. In that study (especially in its Table 4), MPNet consistently achieved the highest Recall values (Recall@5 = 0.91, Recall@3 = 0.84, Recall@1 = 0.55) and matched the highest MRR score (0.76) when compared with other LLM-based embedding methods, including BERT, RoBERTa, and DistilBERT, in a similar competency-mapping context. Given that our task of mapping e-CF to ESCO ICT skills is structurally and semantically analogous to the PMBOK-based mappings in that work, we adopted MPNet as the embedding model expected to yield the best recall and ranking performance. In addition, according to the findings of [35], MPNet (specifically the MPNet base v2 model) demonstrates superior performance compared to the other sentence transformer models evaluated. That study reports that the combination of MPNet with Logistic Regression (MPNet + LR) achieved high accuracy and precision in classifying similar sentences and outperformed competing models such as RoBERTa-based variants across key metrics including WSS@95, RRF@10, and ATD. These results indicate that MPNet’s deep contextual embeddings are particularly effective for sentence similarity tasks, delivering contextually relevant and semantically accurate results while maintaining efficient processing speed and memory usage.

3. Analysis of the Framework

This section describes the process for collecting the required textual data for the analysis of the frameworks and their mutual links. It also describes the NLP techniques that supported this analysis and the tools used for the study.

3.1. Methodology

Based on the present limitations of the e-CF framework mentioned in the introduction and Section 2, our study starts by stating the research questions inspired by our motivation for enriching the e-CF framework with information from ESCO. The first main question is the following one:

RQ1: How can e-CF be enriched in its dimension 4 by adding more applicable examples of skills and knowledge from ESCO?

As a logical complement, a second research question follows directly:

RQ2: Which ESCO ICT skill or knowledge item is the closest equivalent to each of the illustrative examples in dimension 4 of the e-CF competences?

Figure 1 illustrates the overall workflow of the methodology applied to the analysis. First, we convert the data from the e-CF framework and the ESCO classification into textual format. In the next phase, we transform these textual data into text embeddings using a pre-trained Large Language Model (LLM). Finally, we compare the resulting embeddings using the Cosine Similarity algorithm to identify semantic relationships.

All steps of this process are implemented using custom Python scripts (Python 3.10). The source code and results of this implementation are publicly available in our Zenodo repository (https://zenodo.org/uploads/16575072, accessed on 26 August 2025).

3.2. Data Preparation and Pre-Processing

As a first step towards applying NLP techniques to the skills frameworks, the textual data from each framework was stored in separate CSV files. For example, the ESCO data can be downloaded directly from the official ESCO website as CSV and ODS files. Since our primary goal is to work specifically on the ICT-related skills in ESCO, the first step was to extract the list of the 110 ICT-related occupations in the ESCO classification. These occupations fall under the ISCO occupation groups 133 (Information and Communications Technology Service Managers: 13 occupations), 25 (Information and Communications Technology Professionals: 75 occupations), and 35 (Information and Communications Technicians: 22 occupations). The next step was the extraction of all the skills and knowledge items linked to each of these 110 occupations, resulting in 1238 unique items after removing duplicates. All the information related to these items (including their names, alternative names, hidden names, descriptions, and more) was compiled into a CSV file, referred to here as the ESCO ICT Skills dataset. Table 1 illustrates a portion of the textual data of ESCO ICT skills.
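The extraction just described can be sketched as follows. This is a minimal illustration on in-memory toy records, not the authors' script: the field names (`id`, `isco_code`, `occupation_id`, `skill_id`) and the identifiers are hypothetical and do not reproduce the column names of the official ESCO download. It assumes that ISCO codes are hierarchical, so that membership of groups 133, 25 and 35 can be tested by prefix.

```python
# ISCO groups used in the text to delimit ICT occupations.
ICT_ISCO_PREFIXES = ("133", "25", "35")


def is_ict_occupation(isco_code: str) -> bool:
    """True if the ISCO code belongs to group 133, 25 or 35 (prefix match)."""
    return isco_code.startswith(ICT_ISCO_PREFIXES)


def collect_ict_skills(occupations, relations):
    """Return the sorted, de-duplicated skill ids linked to ICT occupations."""
    ict_ids = {occ["id"] for occ in occupations if is_ict_occupation(occ["isco_code"])}
    # A set removes skills that are linked to more than one ICT occupation.
    return sorted({rel["skill_id"] for rel in relations if rel["occupation_id"] in ict_ids})


# Toy data with hypothetical identifiers (not real ESCO records).
occupations = [
    {"id": "occ-1", "isco_code": "2512"},
    {"id": "occ-2", "isco_code": "3513"},
    {"id": "occ-3", "isco_code": "1330"},
    {"id": "occ-4", "isco_code": "2221"},  # outside the three ICT groups
]
relations = [
    {"occupation_id": "occ-1", "skill_id": "skill-a"},
    {"occupation_id": "occ-2", "skill_id": "skill-a"},  # duplicate across occupations
    {"occupation_id": "occ-3", "skill_id": "skill-b"},
    {"occupation_id": "occ-4", "skill_id": "skill-c"},  # linked to a non-ICT occupation
]

ict_skills = collect_ict_skills(occupations, relations)
print(ict_skills)  # ['skill-a', 'skill-b']
```

With the real data, the same two steps (filter occupations by group, then collect and de-duplicate the linked items) would be applied to the rows read from the downloaded files.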

Similarly, the data from the e-CF framework was organized and saved in a structured tabular format. Table 2 displays a portion of this dataset.

3.3. Text Embedding Using LLMs

Word embeddings are currently the standard representation in machine learning methods for NLP. An embedding is a matrix that represents the interdependence between the words of a given linguistic corpus. This matrix has size N × d, where N is the number of words in the corpus and d is the embedding dimension, most often 100 or 300 [36], so that each word is represented by a vector of real numbers. Word embeddings are thus a distributed representation of text in an n-dimensional space that tries to capture word meanings [37]. Embeddings produced by contextual models go beyond lexical similarity by capturing the meaning of words in context. For example, they can differentiate between “bank” as a financial institution and “bank” as a riverbank.

Large language models (LLMs) are deep learning models based on the transformer architecture, with millions or billions of parameters, trained on massive text corpora to understand, encode, and generate text. They demonstrate strong performance on a wide range of NLP tasks, often exhibiting emergent capabilities as a result of scale [38]. Their size enables them to capture complex semantic patterns effectively, and their attention mechanisms let them discern relationships between the words of a sentence and handle longer contexts better. Transformers outperform traditional models such as RNNs, LSTMs, and CNNs primarily because they avoid step-by-step recurrence, processing sentences as a whole rather than sequentially. Following [34], this study employed an LLM to encode the textual data so as to capture and process linguistic nuances accurately.

As explained in Section 2.2, the text embeddings were produced by a pretrained MPNet language model. The motivation for this choice, as discussed earlier, lies in its superior performance on evaluation metrics such as Recall@k and MRR [34]. Specifically, we used the all-mpnet-base-v2 model, a base MPNet sentence transformer. The model was used in its general-purpose pretrained version, which is well suited to generating high-quality sentence embeddings, rather than being fine-tuned on our dataset. In this implementation, the text data from all columns of each row were concatenated without additional preprocessing steps such as lowercasing, lemmatization, or stop-word removal, and the embeddings were generated directly from the raw text. The MPNet model is robust to textual variations and can capture semantic similarity effectively without explicit preprocessing. It is highly capable of processing input sentences in conjunction with multiple corpora, allowing it to retrieve the most similar sentences from the relevant corpus along with their associated similarity scores [35].
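The encoding step described above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the helper names `row_to_text` and `embed`, the column names, the alternative label, and the choice of a single space as separator are our assumptions. The only library call used is `SentenceTransformer(...).encode(...)` from the sentence-transformers package, imported lazily so that the concatenation helper can be run without downloading the model.

```python
def row_to_text(row: dict) -> str:
    """Concatenate all non-empty column values of one row into one raw string.

    No lowercasing, lemmatization or stop-word removal is applied, matching
    the description in the text. The single-space separator is an assumption.
    """
    parts = [str(value).strip() for value in row.values()
             if value is not None and str(value).strip()]
    return " ".join(parts)


def embed(texts: list[str]):
    """Encode texts with the general-purpose all-mpnet-base-v2 checkpoint."""
    # Lazy import: requires the sentence-transformers package and a model download.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-mpnet-base-v2")
    return model.encode(texts)  # one embedding vector per input text


# Hypothetical row: the title appears in the paper, the other values are placeholders.
esco_row = {
    "title": "manage ICT data architecture",
    "alt_labels": "hypothetical alternative label",
    "description": "",
}
print(row_to_text(esco_row))
# manage ICT data architecture hypothetical alternative label
```

Calling `embed([row_to_text(r) for r in rows])` on the rows of each dataset would then give one vector per ESCO item or e-CF entry.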

3.4. Similarity Metrics

A similarity metric is a real-valued function used to calculate the similarity between two items. This similarity computation is achieved by mapping distances to similarities within a vector space [34]. Cosine similarity was applied here because it effectively quantifies semantic similarity based on the angular relationship between vectors, independently of their magnitudes, while alternative metrics such as the dot product and TS-SS may distort results or add computational complexity. The value of cosine similarity ranges from −1 (opposite) to 1 (identical). Given two vectors A and B:

$$\text{CosineSimilarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

where $A \cdot B$ is the dot product of $A$ and $B$, and $\|A\|$ and $\|B\|$ are the magnitudes (lengths) of the vectors, computed as:

$$\|A\| = \sqrt{\sum_{i=1}^{n} A_{i}^{2}}, \qquad \|B\| = \sqrt{\sum_{i=1}^{n} B_{i}^{2}}$$

Figure 2 illustrates this for two vectors, denoted as A and B: as the angle between the vectors increases, the similarity between them decreases, and vice versa [22].
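As a worked illustration of the definition above, the following pure-Python sketch computes the cosine similarity of two small vectors. The function name is ours, and rejecting zero-length vectors is our own choice, since the formula is undefined when a magnitude is zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity (A . B) / (||A|| ||B||) of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))   # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))   # orthogonal     -> 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite       -> -1.0
```

The first example also shows the independence from magnitude: doubling the length of a vector leaves the score unchanged.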

4. Results and Discussion

The methods described above were implemented in Python, complemented with various NLP libraries. First, the pre-trained large language model (LLM) was loaded. After the textual data of the e-CF framework and the ESCO classification had been imported from the CSV files, the texts were converted into embeddings using the loaded model. Next, the embeddings of the ESCO skills and of the e-CF e-Competences were compared using the cosine similarity function. Finally, the results were collected in a DataFrame and stored in a CSV file.
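The comparison and ranking step can be illustrated with a small sketch. It is not the published implementation: three-dimensional toy vectors stand in for the MPNet embeddings, and the names `rank_skills`, `skill_x`, `skill_y` and `skill_z` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_skills(competence_vec, skill_vecs):
    """Return (skill name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine(competence_vec, vec)) for name, vec in skill_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy vectors standing in for the embedding of one e-Competence and three skills.
competence = [1.0, 1.0, 0.0]
skills = {
    "skill_x": [1.0, 0.9, 0.1],
    "skill_y": [0.0, 0.2, 1.0],
    "skill_z": [1.0, 0.0, 0.0],
}

ranking = rank_skills(competence, skills)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Repeating this for every e-Competence yields one descending list per competence, which is the shape of result reported in the next subsection.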

4.1. The Most Similar ESCO ICT Skills to e-Competences

Table 3 illustrates a partial view of the results, with the list of skills for each e-Competence sorted in descending order of similarity. For example, the most similar ESCO ICT skills to the e-Competence A.1. Information Systems and Business Strategy Alignment include, in order: ‘Develop information security strategy’, ‘optimise choice of ICT solution’, ‘analyse ICT system’, ‘manage ICT data architecture’, ‘Manage IT Security Compliances’, ‘Manage Information Sources’, and so on. Similarly, the most similar ESCO ICT skills to the e-Competence C.4. Problem Management include, in order: ‘ICT Problem Management Techniques’, ‘Implement ICT Risk Management’, ‘Identify ICT System Weaknesses’, ‘Solve ICT System Problems’, ‘Manage System Security’, and so on.

The complete results of the most similar ESCO ICT skills to each e-Competence can be explored by visiting the Zenodo repository (https://zenodo.org/uploads/16575072 (accessed on 26 August 2025)).

4.1.1. Threshold Analysis

It is crucial to define an appropriate thresholding method for selecting matching skills to identify the most relevant ESCO ICT Skills for each e-Competence using MPNet text embeddings and cosine similarity. The first step is the analysis of commonly used approaches seen in the literature to then decide the application to our case. The first common method is to choose a fixed number of top matches (e.g., selecting the top 10 most similar skills for each e-Competence). This method assumes that each e-Competence has the same number of relevant ESCO skills, which is rarely the case. Some e-Competences may have many highly similar ESCO skills, while others may only have a few. Forcing the same number of matches leads to including weakly relevant matches in some cases and missing out on good matches in others. The second method is to choose a fixed similarity threshold (e.g., selecting all skills with similarity >0.4). Cosine similarity scores vary depending on the semantic density and structure of each e-Competence. Therefore, even a similarity of 0.35 could indicate meaningful alignment for some e-Competences, while 0.4 might be too low for others. This rigid threshold fails to account for natural variation across different e-Competences and may either miss relevant skills or include noisy ones. The third option is to combine a fixed number and fixed threshold (e.g., selecting up to 10 skills with similarity >0.4). This hybrid method inherits the limitations of both approaches. It arbitrarily caps the number of matches, which may not reflect the actual distribution of relevant skills per e-Competence. Also, using a fixed similarity threshold still assumes a universal scale, ignoring variations in score distribution for each competence.
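The three common selection strategies discussed above can be contrasted on a toy score list. The function names and the similarity values below are illustrative only; they are not results from the study.

```python
def top_n(scores, n):
    """Fixed number: keep the n most similar skills."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def above_threshold(scores, threshold):
    """Fixed threshold: keep every skill whose similarity exceeds the threshold."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [name for name in ranked if scores[name] > threshold]


def top_n_above_threshold(scores, n, threshold):
    """Hybrid: keep at most n skills among those above the threshold."""
    return above_threshold(scores, threshold)[:n]


# Hypothetical similarity scores of four skills for one e-Competence.
scores = {"s1": 0.62, "s2": 0.41, "s3": 0.37, "s4": 0.12}

print(top_n(scores, 3))                       # ['s1', 's2', 's3']
print(above_threshold(scores, 0.4))           # ['s1', 's2']
print(top_n_above_threshold(scores, 1, 0.4))  # ['s1']
```

In this toy case the fixed-number rule admits `s3` although it sits below the 0.4 cut-off, while the hybrid rule drops `s2` although it sits above it, which mirrors the limitations described in the text.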

Because these common methods tend to give suboptimal results, we adopted a different approach: a dynamic threshold based on the distribution of similarities. For each e-Competence, we compute the mean and standard deviation of the similarity scores across all 1238 ESCO ICT skills. We then select those skills whose similarity scores exceed:

$$\text{Threshold} = \text{Mean} + (k \times \text{STD}), \quad \text{for } k = 1, 1.5, 2, \dots$$

This method implicitly assumes that the similarity scores are approximately normally distributed, or at least symmetric and continuous enough for the mean and standard deviation to be statistically meaningful indicators of central tendency and dispersion. Under normality, the interpretation of thresholds like “Mean + 1.5 × std” corresponds to selecting values in the upper tail of the distribution, thus targeting significantly above-average matches. If the similarity scores are not normally distributed, using such thresholds could lead to biased or misleading selection. The Kolmogorov–Smirnov (K–S) test is a non-parametric statistical test that compares the empirical distribution function (EDF) of the observed data with the cumulative distribution function (CDF) of a reference distribution—in this case, the normal distribution. The K–S test was applied to check the similarity scores for each e-Competence separately, confirming that normality holds (or approximately holds) for the data distributions involved in threshold computation.
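As an illustration of the comparison the K–S test performs, the sketch below computes only the K–S statistic D, the largest gap between the EDF of a sample and a normal CDF. It assumes the reference normal distribution is fitted with the sample mean and sample standard deviation, which is an assumption and not a detail stated in the study. Both samples are synthetic, and the function name is hypothetical. Turning D into a p-value is left to a statistics library.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(sample):
    """Largest distance between the sample EDF and a fitted normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))  # assumption: parameters from the sample
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The EDF jumps from i/n to (i+1)/n at x, so both sides are checked.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

n = 200
# Synthetic sample placed at the quantiles of a normal distribution.
near_normal = [NormalDist(0.35, 0.10).inv_cdf((i + 0.5) / n) for i in range(n)]
# Synthetic, strongly right-skewed sample (quantiles of an exponential distribution).
skewed = [-math.log(1 - (i + 0.5) / n) for i in range(n)]

d_normal = ks_statistic_normal(near_normal)
d_skewed = ks_statistic_normal(skewed)
print(f"D (near-normal) = {d_normal:.4f}")
print(f"D (skewed)      = {d_skewed:.4f}")
```

A small D means the empirical distribution stays close to the normal reference, whereas the skewed sample yields a clearly larger D.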

The threshold is calculated dynamically for each e-Competence, making it sensitive to the distribution of similarity scores specific to that competence. The standard deviation reflects how dispersed the scores are, so selecting skills that lie significantly above the mean helps capture only strongly relevant matches. Furthermore, the strictness of the filtering can be controlled by adjusting the value of k: a lower k favours broader coverage, and a higher k ensures stricter filtering. Table 4 shows the thresholds obtained for each e-Competence with different values of k. The values T, N and P in Table 4 represent the threshold, the number of skills above the threshold and the percentage of skills above the threshold, respectively.
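A minimal sketch of the Mean + (k × STD) rule follows, producing the three quantities T, N and P for one e-Competence. The scores are synthetic: 1238 values placed at the quantiles of a normal distribution with an arbitrarily chosen mean of 0.35 and standard deviation of 0.10, so the output shows what the rule yields under exact normality and is not data from the study. The use of the population standard deviation (`pstdev`) is also an assumption.

```python
from statistics import NormalDist, mean, pstdev

def dynamic_threshold(scores, k):
    """Threshold = mean + k * standard deviation of one competence's scores."""
    return mean(scores) + k * pstdev(scores)

def summarise(scores, k):
    """Return (T, N, P): threshold, count above it, percentage above it."""
    t = dynamic_threshold(scores, k)
    n_selected = sum(1 for s in scores if s > t)
    return t, n_selected, 100.0 * n_selected / len(scores)

# Synthetic similarity scores for ONE hypothetical e-Competence.
n_skills = 1238
reference = NormalDist(mu=0.35, sigma=0.10)
scores = [reference.inv_cdf((i + 0.5) / n_skills) for i in range(n_skills)]

results = {k: summarise(scores, k) for k in (1, 1.5, 2)}
for k, (t, n, p) in results.items():
    print(f"k={k}: T={t:.3f}  N={n}  P={p:.1f}%")
```

Because the threshold is recomputed from each competence's own scores, the same value of k gives a different T for every e-Competence, which is what distinguishes this rule from a fixed cut-off.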

Analysing the number and percentage of ESCO ICT skills selected for each e-Competence under different threshold values helped to find a value of k that ensures high precision (i.e., selecting only strongly relevant skills) while avoiding an excessively large or small selection that might either dilute quality or miss key matches. At k = 1, on average 15–17% of ESCO ICT skills per competence were selected. This is a relatively large number (often over 200 skills), many of which may have only moderate semantic similarity, which introduces noise and reduces the discriminative power of the matching. At k = 1.5, the percentage drops to about 6–8%, which improves precision. However, a considerable number of skills per e-Competence (~80–100) is still selected, and in many cases their similarity scores are only marginally above average.

With k = 2, the threshold increases significantly, meaning only those skills with strong semantic similarity (i.e., outliers far above the average) are selected. The number of selected skills drops to 2–4% of the total (usually fewer than 40 skills per e-Competence). This tighter selection improves precision by focusing on the most relevant and confidently matched skills and avoids overwhelming downstream processes (e.g., manual validation, integration into frameworks). This was confirmed by the two senior experts of the department who acted as independent human assessors of the results. Therefore, the analysis will focus on the ESCO ICT skills that are most similar to each e-Competence, based on the threshold calculated using k = 2 from this point onwards.

4.1.2. Skills/Knowledge and Number of Repetitions

The extracted skills for each e-Competence were further categorized, according to their nature in the ESCO classification, into two groups: Skill/Competence and Knowledge. This categorisation is provided by ESCO itself: each item in the Skills Pillar was already categorized by the more than 200 experts who developed the classification, together with the consultants, taxonomists and the ESCO maintenance committee who supported the work. For example, Table 5 presents this categorization for the e-Competence B.2. Component Integration. The skills are listed in descending order of similarity score.

Out of the 20 extracted skills, 14 are categorized as Skill/Competence, accounting for 70% of the total and 6 are categorized as Knowledge, accounting for 30% of the total. This distribution shows a strong alignment with the practical and applied nature of B.2. Component Integration, which focuses on designing, integrating, and deploying system components in a coherent manner. The predominance of skill/competence-type entries suggests that the semantic mapping has effectively captured the operational and implementation-oriented aspects of this e-Competence. Moreover, the knowledge-type entries such as ICT system integration, hardware platforms, and system design support the theoretical and foundational understanding necessary for effective component integration.

4.1.3. Complementary Human Check

Human checking of the results of an NLP analysis is usual practice in the literature, and a good number of different methods are available [39]. This study relied on the independent opinion of two senior experts with long experience in ICT skills frameworks who work in the Department of Computer Science at Universidad de Alcalá. They helped to determine the filtering threshold for the selection of ESCO items for each competence. For the specific results, they were given a small subset of 10 competences, each with 5 related ESCO items, and asked for their opinion on the relevance of each item for the mapping. They could rate relevance in three categories: Totally linked, Relevant, Irrelevant. The level of agreement was high, reaching 88%, with a positive opinion (Totally linked or Relevant) in 100% of cases. Regarding the typical indicators of inter-rater reliability, the traditional Cohen’s Kappa is 0.4339, a moderate value, while the PABAK indicator [40], which avoids the paradox of high agreement combined with a low traditional Kappa, reaches 0.865, a very high value. The high percentage agreement and the high PABAK value provide reasonable validation of the results of the NLP analysis.
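The contrast between Cohen's Kappa and PABAK can be seen on a small invented example. The ratings below are hypothetical and do not reproduce the experts' data or the figures reported above. PABAK is computed here with the usual generalisation to several categories, (c × Po − 1)/(c − 1), which reduces to 2Po − 1 for two categories; treating the three rating categories this way is an assumption about how the indicator was applied.

```python
from collections import Counter

def observed_agreement(r1, r2):
    """Share of items on which both raters chose the same category."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohen_kappa(r1, r2):
    """Agreement corrected by the chance agreement implied by each rater's marginals."""
    n = len(r1)
    p_o = observed_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(r1) | set(r2))
    return (p_o - p_e) / (1 - p_e)

def pabak(r1, r2, n_categories):
    """Prevalence- and bias-adjusted kappa: chance agreement fixed at 1/n_categories."""
    p_o = observed_agreement(r1, r2)
    return (n_categories * p_o - 1) / (n_categories - 1)

# Hypothetical ratings: T = Totally linked, R = Relevant (Irrelevant is never used).
rater_1 = ["T", "T", "T", "T", "T", "T", "T", "T", "R", "R"]
rater_2 = ["T", "T", "T", "T", "T", "T", "T", "T", "T", "R"]

p_o = observed_agreement(rater_1, rater_2)
kappa = cohen_kappa(rater_1, rater_2)
adjusted = pabak(rater_1, rater_2, n_categories=3)
print(f"agreement={p_o:.2f}  kappa={kappa:.4f}  PABAK={adjusted:.2f}")
```

When almost all ratings fall into one category, the expected chance agreement is high and Kappa is pulled down even though the raters rarely disagree; PABAK removes that dependence on prevalence.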

4.1.4. Answering RQ1

The first research question for this study was “RQ1: How can e-CF be enriched by adding more skills and knowledges from ESCO?”. As discussed in Section 2, the e-CF framework currently provides a limited number of example skills and knowledge items for each e-Competence, which may not fully capture the scope and practical applications of those competences. The purpose of dimension 4 in e-CF has never been to provide an exhaustive or wide sample of the skills or knowledge items linked to each e-competence, but only some illustrative examples. However, less experienced practitioners and readers of the standard may need a larger number of examples to gain a clearer understanding of each e-competence and more clues for identifying in practice the elements it covers.

Semantically similar ESCO ICT skills can be identified for each e-Competence by using MPNet text embeddings and cosine similarity and applying a dynamic thresholding method based on the distribution of similarity scores (Mean + (k × STD)). This made it possible to select only those ESCO skills that are strongly aligned with each competence. For example, e-Competence A.1 Information Systems and Business Strategy Alignment was linked to highly relevant ESCO skills such as ‘develop information security strategy’, ‘optimize choice of ICT solution’, ‘analyse ICT system’ and so on, providing a richer, more practical view of the competence in real-world contexts.

This enrichment brings multiple benefits. ESCO skills are well-defined, with detailed descriptions, alternative labels, and links to occupations and related knowledge, making them highly informative for end-users. Linking these to e-CF competences improves clarity, enhances semantic understanding, and enables better application in areas such as workforce development, education design, and job matching. Furthermore, since ESCO is maintained by the European Commission and widely adopted across labour and education systems, aligning it with e-CF fosters interoperability between frameworks and increases the usability of e-CF across multilingual and cross-border contexts. Overall, this integration supports a more flexible, data-driven, and future-ready competence framework for Europe’s evolving digital landscape.

4.2. Equivalent ESCO ICT Skills to Skills and Knowledge Examples in e-CF

As previously mentioned, the e-CF framework provides only a limited number of skills and knowledge examples for each e-Competence (dimension 4). These examples are often brief, and their interpretation can be challenging, especially for beginners or those unfamiliar with the specific terminology. The method presented above makes it possible to compare the skills and knowledge examples in each e-Competence with the ESCO ICT skills in order to identify the most equivalent ESCO entry for each item.

For example, Table 6 shows the most similar ESCO ICT skills to the third knowledge example (K3) and the fourth skill example (S4) within e-Competence A.8. The results are sorted in descending order of similarity.

A relevant outcome of Table 6 is that a single knowledge example, such as K3 in e-Competence A.8, has multiple equivalent ESCO ICT skills, so the equivalence is not a simple one-to-one link. The equivalent ESCO skills come with alternative names, a detailed description, and additional metadata such as the related occupations and relevant knowledge areas. Although this does not allow the automatic definition of all the examples present in dimension 4 of e-CF, it provides several key benefits (e.g., enhanced interpretability of e-CF knowledge examples, richer context for implementation in real-world settings, better alignment between competence frameworks and labour market data, etc.).

Answering RQ2

The second research question, RQ2 “What is the most equivalent ESCO ICT skill or knowledge for each of the illustrative examples in the dimension 4 of e-CF competences?”, can be answered through the results of the analysis with text embeddings and cosine similarity (as described in Section 3). It is possible to identify the most similar ESCO item for each example associated with every e-Competence in the e-CF framework. However, several facts and findings are worth pointing out:

Comparing the bare name of a skill or knowledge example in dimension 4 of e-CF is challenging because it consists only of a short text (e.g., “K3 green ICT and environmental standards”) with no additional description or information. NLP-based methods are more precise and robust when the texts being compared are longer. As a result, the same example K3 in Table 6 is linked to several ESCO items with a high degree of similarity, possibly because each of them represents one aspect of the intended meaning of K3, with no clear way of singling out only one given the scarce explanation of the example. This reflects the same uncertainty that a human may experience when attempting the same comparison. The effectiveness of mapping the examples of dimension 4 of e-CF to ESCO is therefore hindered by the scarce information provided in the standard. Possibly only those examples whose text is almost identical to the name of an ESCO item can offer a very clear link with strict semantic allocation (e.g., S1 “create and manage a test plan” of the B.3 Testing competence, linked to “plan software testing”). It is worth recalling that ESCO acknowledged that the development of its ICT skills and knowledge items was also inspired by the preceding 2016 version of the e-CF standard [41], which was only slightly modified in the present 2019 version used in the analysis.
Although the previous point might be considered a problem for the effectiveness of the analysis, the mapping of the examples of dimension 4 should be regarded as pre-processed information for users of the frameworks rather than as a precise solution. The order of similarity is meaningful, but choosing the threshold above which the ESCO item or items in the list are adequate equivalents of the example requires an additional human decision based on further information about the context of the case. Unfortunately, our human check (see Section 4.1.3) could not help here, given how much the assessment depends on context, so the experts were not consulted on this point. One example may initially be connected to several ESCO items, and one ESCO item may be selected as the most equivalent to several examples of dimension 4 of e-CF (see Section 4.2). The level of granularity of the items on both sides might also influence the results.

While this mapping is inconclusive for those who would like a simple and evident set of links between both frameworks, it still offers several key advantages. Firstly, ESCO skills are well-defined and richly described, including not only clear definitions but also alternative labels, related occupations, and associated knowledge areas. Linking e-CF examples to their closest ESCO equivalence candidates enhances understanding, especially for users who require more context or practical interpretation. Users can decide on the best options once this processing of 647,474 potential relationships has reduced them to a manageable list of candidate ESCO items for each case. Secondly, this alignment bridges two important European frameworks, increasing semantic consistency, searchability, and interoperability across education, training, and labour market applications.

5. Conclusions and Future Work

This study showed how NLP can support two relevant actions for improving the European e-Competence Framework (e-CF) by enriching it with information extracted from the ESCO set of ICT skills and knowledge items. While e-CF has been adopted as a reference for ICT competence development in multinational companies in Europe, it has a key limitation: it provides only a small set of illustrative examples (typically 6–8) for each e-Competence. NLP allowed semantic mapping by using text embedding techniques based on a pre-trained MPNet language model and cosine similarity. This approach enabled the identification of the most relevant ESCO skill and knowledge items that can be logically added as meaningful illustrative examples for each e-CF e-competence in dimension 4.

Another limitation of e-CF is that its examples in dimension 4 are often concise and lack sufficient context for practical understanding or implementation by non-experts. This limitation can be effectively addressed by mapping ESCO skills to e-CF. Each ESCO skill is supported by a detailed description, alternative labels, and connections to relevant occupations, which collectively enrich the semantic depth and practical clarity of each e-CF competence. The results of the mapping are not simple and conclusive, because a dimension 4 example, a short text without additional explanation, does not provide enough content for a more precise alignment with ESCO items. However, the list of most similar candidates can help practitioners decide which ESCO items are best connected to each of the examples of dimension 4 of e-CF, which makes the analysis feasible compared with examining more than 600,000 possible relationships among items of both frameworks.

This bidirectional mapping brings significant advantages for a variety of stakeholders. Educators can design more targeted curricula, HR professionals can develop more accurate job profiles, and policymakers can better align digital competence frameworks with labour market data, particularly because ESCO is already the foundational taxonomy used in EU-wide labour market analysis: all reports and labour intelligence tools of the EU use the terminology of ESCO. Linking ESCO to e-CF therefore bridges the gap between competence frameworks and real-world workforce needs. However, it is worth noting that this work and the adopted approach are not intended to support automated decision support systems (DSS) for, e.g., job matching, hiring, and educational decisions. The goal was to enrich the information available in the text of EN16234 to facilitate its application by humans. Such DSS might be influenced by possible bias in the underlying LLMs or in the data fed to them, raising ethical and fairness concerns in possible real-world applications. That line of work with DSS would require additional research to ensure a proper application within ethical considerations.

The methodology presented here can be extended to further aspects of e-CF enrichment. The same NLP-based approach can be used to map not only other frameworks to each other but also training courses to frameworks like e-CF or ESCO, enabling the evaluation of how well educational offerings align with current job market demands. It is even possible to add information to e-CF that is not explicit in the standard: (a) detecting references to attitudes and soft skills in the text of e-competences, as they are only “embedded in all three dimensions” according to Section 5.6 of EN16234 (as opposed to the illustrative examples of skills and knowledge in dimension 4 used in this analysis); (b) analyzing the possible allocation of a proficiency level to each of the examples of dimension 4, connecting it to one of the levels defined in dimension 3.

The focus of this study is on the ICT profession and, specifically, on two European frameworks: ESCO and e-CF. Two main reasons have motivated this scope: (a) the existence of a standard like EN16234 on ICT professional competences, with specific opportunities and gaps in detail that make it suitable for enriching and complementing its information; (b) the especially fast evolution of ICT, which calls for agile solutions supported by NLP for managing the information on professional profiles and skills. One objective of this work was to apply the proposed methodology within a well-defined terminology and data-rich context, with well-structured models and frameworks. Extension to new cases would start with other skills frameworks also applicable to ICT professionalism (such as SFIA, O*NET, or national qualification frameworks). It is also possible to plan a broader applicability with subsets of skills like soft skills (where solid frameworks like SkillsMatch exist [42]) or digital skills for all (with EU official frameworks like DigComp [43]). However, it is important to note that offering a universal solution for all possible skills frameworks, without any limitation in the scope or the structuredness of the models, is beyond the intentions of this study.

Finally, exploring other pre-trained language models, such as BERT, Sentence-BERT (S-BERT), or RoBERTa, and comparing their performance against MPNet could help identify the most effective model for semantic mapping across the above-mentioned skills frameworks.

Author Contributions

Conceptualization, L.F.-S. and D.Z.; methodology, L.F.-S. and D.Z.; software, D.Z.; validation, V.P., I.L.-B. and L.F.-S.; formal analysis, D.Z.; investigation, V.P., I.L.-B. and L.F.-S.; resources, D.Z.; data curation, D.Z., V.P., I.L.-B. and L.F.-S.; writing—original draft preparation, L.F.-S. and D.Z.; writing—review and editing, V.P. and I.L.-B.; visualization, D.Z.; supervision, L.F.-S. and V.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Statement excluded as the study did not involve humans.

Data Availability Statement

Relevant data available at https://zenodo.org/uploads/16575072 (accessed on 26 August 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ICT	Information and Communication Technology
EU	European Union
NLP	Natural Language Processing
LLM	Large Language Model
e-CF	European e-Competence Framework
ESCO	European Skills, Competences, Qualifications and Occupations

References

1. Vu, K.; Hanafizadeh, P.; Bohlin, E. ICT as a driver of economic growth: A survey of the literature and directions for future research. Telecommun. Policy 2020, 44, 101922.
2. Digital Skills & Jobs Platform. Eurostat Survey on the Skills Gap|Digital Skills and Jobs Platform. Available online: https://digital-skills-jobs.europa.eu/en/latest/news/ict-specialists-skills-gap-hinders-growth-eu-countries (accessed on 6 September 2023).
3. Alldigital, Huawei, and Supported by EY. Strategies to Address the Digital Skills Gap in the EU. Available online: https://www.europeandigitalskills.eu/sites/TDSG/uploads/files/white-paper-eu-digital-skills-gap.pdf (accessed on 2 July 2025).
4. EN 16234-1:2021; e-Competence Framework (e-CF)—A Common European Framework for ICT Professionals in All Sectors—Part 1: Framework. CEN European Committee for Standardization: Brussels, Belgium, 2021.
5. Tissot, P.; Centre Européen Pour le développement de la Formation Professionnelle. Terminology of Vocational Training Policy: A Multilingual Glossary for an Enlarged Europe; Office for Official Publications of the European Communities: Luxembourg, 2004.
6. Blázquez, M. Skills-Based Profiling and Matching in PES; Publications Office of the European Union: Luxembourg, 2014.
7. Geskus, D. Skill frameworks: Definition and Use. Available online: https://www.learned.io/en/hr-dictionary/skill-frameworks-definition-and-use/ (accessed on 28 June 2025).
8. European Commission. European Multilingual Classification of Skills, Competences, Qualifications and Occupations. ESCO. Available online: https://esco.ec.europa.eu/en/classification (accessed on 19 June 2024).
9. Fernández-Sanz, L.; Gómez-Pérez, J.; Castillo-Martínez, A. e-Skills Match: A framework for mapping and integrating the main skills, knowledge and competence standards and models for ICT occupations. Comput. Stand. Interfaces 2017, 51, 30–42.
10. Bowers, D.; Sabin, M. Using a Professional Skills Framework to Support the Assessment of Dispositions in IT Education. In Proceedings of the 23rd Annual Conference on Information Technology Education, Chicago, IL, USA, 21–24 September 2022; ACM: New York, NY, USA, 2022; pp. 103–109.
11. González-Pérez, L.I.; Ramírez-Montoya, M.S. Components of Education 4.0 in 21st Century Skills Frameworks: Systematic Review. Sustainability 2022, 14, 1493.
12. Nikoloski, D.; Sulich, A.; Sołoducho-Pelc, L.; Mancheski, G.; Angelski, M.; Petkoska, M.M. Identifying green skills gaps through labor market intelligence. J. Infrastruct. Policy. Dev. 2024, 8, 4868.
13. De Smedt, J.; le Vrang, M.; Papantoniou, A. ESCO: Towards a Semantic Web for the European Labor Market. In Proceedings of the Workshop on Linked Data on the Web, LDOW 2015, Florence, Italy, 19 May 2015; Available online: https://ceur-ws.org/Vol-1409/paper-10.pdf (accessed on 26 August 2025).
14. CEN/TS 17699:2022; Guidelines for Developing ICT Professional Curricula as Scoped by EN 16234-1 (e-CF). CEN: Brussels, Belgium, 2022. Available online: https://standards.iteh.ai/catalog/standards/sist/8e0e2338-0b25-4b4b-add2-560b031c7d94/sist-ts-cen-ts-17699-2022 (accessed on 26 August 2025).
15. EN 16234-2:2021; e-Competence Framework (e-CF)—A Common European Framework for ICT Professionals in All Industry Sectors—Part 2: User Guide. CEN European Committee for Standardization: Brussels, Belgium, 2021.
16. Fraile, F.; Psarommatis, F.; Alarcón, F.; Joan, J. A Methodological Framework for Designing Personalised Training Programs to Support Personnel Upskilling in Industry 5.0. Computers 2023, 12, 224.
17. Chang, X.; Wang, B.; Hui, B. Towards an Automatic Approach for Assessing Program Competencies. In Proceedings of the LAK22: 12th International Learning Analytics and Knowledge Conference, Online, 21–25 March 2022; ACM: New York, NY, USA, 2022; pp. 119–129.
18. Gugnani, A.; Misra, H. Implicit Skills Extraction Using Document Embedding and Its Use in Job Recommendation. AAAI 2020, 34, 13286–13293.
19. Nkrumah, S.K.; Tucker, S.M.; Boyle, F.; Walsh, J. A review of competency frameworks and AI-driven NLP techniques for skill extraction, mapping and recommending: Informing the design of the reshape interactive digital skills platform. In Proceedings of the 17th International Conference on Education and New Learning Technologies, Palma, Spain, 30 June–2 July 2025; pp. 7056–7064.
20. Decorte, J.-J.; Verlinden, S.; Van Hautte, J.; Deleu, J.; Develder, C.; Demeester, T. Extreme Multi-Label Skill Extraction Training using Large Language Models. arXiv 2023.
21. Clavié, B.; Soulié, G. Large Language Models as Batteries-Included Zero-Shot ESCO Skills Matchers. arXiv 2023.
22. Mason, C.M.; Chen, H.; Evans, D.; Walker, G. Illustrating the application of a skills taxonomy, machine learning and online data to inform career and training decisions. IJILT 2023, 40, 353–371.
23. Neutel, S.; de Boer, M.H.T. Towards Automatic Ontology Alignment using BERT. In Proceedings of the AAAI Spring Symposium Combining Machine Learning with Knowledge Engineering, Stanford University, Palo Alto, CA, USA, 22–24 March 2021; Volume 2846.
24. Demchenko, Y.; Maijer, M.; Comminiello, L. Data Scientist Professional Revisited: Competences Definition and Assessment, Curriculum and Education Path Design. In Proceedings of the 2021 4th International Conference on Big Data and Education, London, UK, 3–5 February 2021; ACM: New York, NY, USA, 2021; pp. 52–62.
25. Conley, D.T. Crosswalk Analysis of Deeper Learning Skills to Common Core State Standards; Educational Policy Improvement Center (NJ1): Eugene, OR, USA, 2011; p. 17. Available online: https://files.eric.ed.gov/fulltext/ED537878.pdf (accessed on 26 August 2025).
26. Razzaq, L.; Heffernan, N.T.; Feng, M.; Pardos, Z.A. Developing Fine-Grained Transfer Models in the ASSISTment System; OCP Science imprint: Philadelphia, PA, USA, 2007; Volume 5, Available online: https://web.cs.wpi.edu/~leenar/publications/ticl_final.pdf (accessed on 26 August 2025).
27. Subramaniam, M.; Ahn, J.; Waugh, A.; Taylor, N.G.; Druin, A.; Fleischmann, K.R.; Walsh, G. Crosswalk between the ‘Framework for K-12 Science Education’ and ‘Standards for the 21st-Century Learner’: School Librarians as the Crucial Link. Sch. Libr. Res. 2013, 16, 28.
28. Coombe, L.; Severinsen, C.A.; Robinson, P. Mapping competency frameworks: Implications for public health curricula design. Aust. N. Z. J. Public Health 2022, 46, 564–571.
29. Li, Z.; Ren, C.; Li, X.; Pardos, Z.A. Learning Skill Equivalencies Across Platform Taxonomies. In Proceedings of the LAK21: 11th International Learning Analytics and Knowledge Conference, Irvine, CA, USA, 12–16 April 2021; ACM: New York, NY, USA, 2021; pp. 354–363.
30. Choi, N.; Song, I.-Y.; Zhu, Y. A Model-Based Method for Information Alignment: A Case Study on Educational Standards. J. Comput. Sci. Eng. 2016, 10, 85–94.
31. Yilmazel, O.; Balasubramanian, N.; Harwell, S.; Bailey, J.; Diekema, A.; Liddy, E. Text Categorization for Aligning Educational Standards. In Proceedings of the 2007 40th Annual Hawaii International Conference on System Sciences (HICSS’07), Waikoloa, HI, USA, 3–6 January 2007; IEEE: New York, NY, USA, 2007; p. 73.
32. Takey, S.M.; Carvalho, M.M.D. Competency mapping in project management: An action research study in an engineering company. Int. J. Proj. Manag. 2015, 33, 784–796.
33. Cañas, A.J.; Carnot, M.J.; Feltovich, P.J.; Coffey, J.W. A Summary of Literature Pertaining to the Use of Concept Mapping Techniques and Technologies for Education and Performance Support. Report to the Chief of Naval Education and Training. 2003. Available online: https://www.researchgate.net/publication/220017490_A_Summary_of_Literature_Pertaining_to_the_Use_of_Concept_Mapping_Techniques_and_Technologies_for_Education_and_Performance_Support (accessed on 26 August 2025).
34. Jemal, I.; Armand, N.S.W.; Chikhaoui, B. A new approach for competency frameworks mapping using large language models. Expert Syst. Appl. 2025, 263, 125648.
35. Hussain, S.A.; Kohli, R.; Zahoor, S.; Sofi, S.A. Transforming the GUI Landscape: Harnessing the Power of MPNet base v2 Sentence Transformers. Procedia Comput. Sci. 2025, 259, 1809–1816.
36. Czajka, M.M.; Kubacka, D.; Świetlicka, A. Embedding representation of words in sign language. J. Comput. Appl. Math. 2025, 465, 116590.
37. Birunda, S.S.; Devi, R.K. A Review on Word Embedding Techniques for Text Classification. In Innovative Data Communication Technologies and Application; Lecture Notes on Data Engineering and Communications Technologies; Raj, J.S., Iliyasu, A.M., Bestak, R., Baig, Z.A., Eds.; Springer: Singapore, 2021; Volume 59, pp. 267–281.
38. Minaee, S.; Mikolov, T.; Nikzad, N.; Chenaghlu, M.; Socher, R.; Amatriain, X.; Gao, J. Large Language Models: A Survey. arXiv 2024, arXiv:2402.06196.
39. Madsen, A.; Reddy, S.; Chandar, S. Post-hoc Interpretability for Neural NLP: A Survey. ACM Comput. Surv. 2022, 55, 1–42.
40. Byrt, T.; Bishop, J.; Carlin, J.B. Bias, prevalence and kappa. J. Clin. Epidemiol. 1993, 46, 423–429.
41. EN 16234-1:2016; e-Competence Framework (e-CF) a Common European Framework for ICT Professionals in All Industry Sectors Part 1: Framework. CEN: Brussels, Belgium, 2016.
42. Pospelova, V.; Baldominos, I.L.; Fernández-Sanz, L.; Castillo-Martínez, A. Big data and skills frameworks to determine recommended profiles of soft skills for IS development. In Proceedings of the Information Systems Development: Crossing Boundaries between Development and Operations (DevOps) in Information Systems (ISD 2021), Valencia, Spain, 8–10 September 2021.
43. Vuorikari, R.; Kluzer, S.; Punie, Y. DigComp 2.2: The Digital Competence Framework for Citizens—With New Examples of Knowledge, Skills and Attitudes; JRC Publications Repository. Available online: https://publications.jrc.ec.europa.eu/repository/handle/JRC128415 (accessed on 17 August 2023).

Figure 1. Analysis methodology.

Figure 2. Similarity between two vectors A and B.

Table 1. Textual data of ESCO—1238 ESCO ICT skills.

ESCO ICT Skill	Alternative Names	Hidden Names	Description
3D lighting	3D lighting effect		The arrangement or digital effect which simulates lighting in a 3D environment.
ASP.NET	ASP.NET framework	ASP.NET 3.5, ASP.net, ASPX ASP+, ASP.NET 2.0, Aspx	The techniques and principles of software development, such as analysis, algorithms, coding, testing and compiling of programming paradigms in ASP.NET.
adjust ICT system capacity	adjust ICT network capacity		Change the scope of an ICT system by adding or reallocating additional ICT system components, such as network components, servers or storage to meet capacity or volume demands.
⋮	⋮	⋮	⋮

Table 2. Textual data of e-CF framework.

Dimension 2 e-Competence	Description	Dimension 3 Proficiency Level	Dimension 4 Knowledge Examples	…
A.1. Information Systems and Business Strategy Alignment	Anticipates long-term business requirements, influences improvement of the organization’s process efficiency and effectiveness. Determines the IS model and enterprise architecture maintaining consistency with organizational policy and ensuring a secure environment. Recognizes …	Level 4. Provides leadership for the construction and implementation of long-term innovative IS solutions. Level 5. Provides IS strategic leadership to reach consensus and commitment from …	K1 business strategy concepts, K2 trends and implications of ICT internal or external developments, K3 potential and opportunities of relevant business models, K4 business aims and organizational objectives, K5 issues and …	…
A.2. Service Level Management	Defines, validates and makes applicable service level agreements (SLAs) and underpinning contracts tailored to services offered. Negotiates service performance levels taking into account the needs and capacity of stakeholders and business.	Level 3 Ensures the content of the SLA. Level 4 Negotiates revision of SLAs, in accordance with the …	K1 SLA documentation, K2 how to compare and interpret management data, K3 elements forming the metrics of service level agreements, K4 how service …	…
⋮	⋮	⋮	⋮

Table 3. Partial list of similarity values between ESCO ICT skills and e-CF for e-Competences A.1 and C.4.

A.1. Information Systems and Business Strategy Alignment	Similarity	C.4. Problem Management	Similarity
develop information security strategy	0.689305	ICT problem management techniques	0.703097
optimize choice of ICT solution	0.675653	implement ICT risk management	0.666545
analyze ICT system	0.65918	identify ICT system weaknesses	0.665475
manage ICT data architecture	0.652841	solve ICT system problems	0.663608
design enterprise architecture	0.638055	manage system security	0.658589
manage ICT project	0.635664	handle cybersecurity incidents	0.647437
conduct impact evaluation of ICT processes on business	0.633256	perform ICT troubleshooting	0.64483
develop solutions to information issues	0.63275	maintain ICT system	0.630015
propose ICT solutions to business problems	0.62897	advice on security risk management	0.629663
business ICT systems	0.628387	execute ICT audits	0.623198
ICT architectural frameworks	0.625536	lead disaster recovery exercises	0.621125
manage IT security compliances	0.625014	identify ICT security risks	0.618774
apply ICT systems theory	0.620004	propose ICT solutions to business problems	0.610768
implement ICT risk management	0.61911	establish an ICT security prevention plan	0.610649
execute ICT audits	0.616462	respond to incidents in cloud	0.607902
analyze business plans	0.615702	system backup best practice	0.605229
manage business knowledge	0.608695	provide ICT support	0.602158
manage system security	0.607102	maintain ICT server	0.598241
analyze business requirements	0.600925	establish an ICT customer support process	0.596271
⋮	⋮	⋮	⋮

Table 4. Threshold based on the value of k.

			k = 1			k = 1.5			k = 2
e-Competence	Mean	STD	T1	N1	P1	T2	N2	P2	T3	N3	P3
A.1	0.359174	0.108794	0.46796808	208	17%	0.52236505	98	8%	0.57676202	33	3%
A.2	0.280455	0.095374	0.37582887	212	17%	0.42351565	81	7%	0.47120243	20	2%
A.3	0.331246	0.108783	0.44002883	207	17%	0.49442018	96	8%	0.54881152	32	3%
A.4	0.346605	0.110622	0.45722772	208	17%	0.51253892	91	7%	0.56785012	33	3%
A.5	0.345457	0.103856	0.44931237	192	16%	0.50124019	90	7%	0.55316802	35	3%
A.6	0.362504	0.100225	0.46272926	203	16%	0.51284177	89	7%	0.56295428	27	2%
A.7	0.359154	0.099462	0.45861647	188	15%	0.50834751	95	8%	0.55807856	40	3%
A.8	0.247841	0.098681	0.34652143	169	14%	0.39586186	80	6%	0.44520229	44	4%
A.9	0.322629	0.091987	0.41461639	181	15%	0.46060994	82	7%	0.50660348	32	3%
A.10	0.320139	0.088158	0.40829661	195	16%	0.45237555	75	6%	0.49645448	37	3%
B.1	0.332462	0.091072	0.42353407	205	17%	0.46907021	77	6%	0.51460634	23	2%
B.2	0.353304	0.085212	0.43851642	196	16%	0.48112238	76	6%	0.52372834	20	2%
B.3	0.359076	0.091263	0.45033952	189	15%	0.49597105	81	7%	0.54160259	27	2%
B.4	0.400683	0.09851	0.49919383	203	16%	0.54844901	69	6%	0.59770418	11	1%
B.5	0.350833	0.099502	0.45033529	173	14%	0.5000862	75	6%	0.5498371	31	3%
B.6	0.358816	0.097748	0.45656411	206	17%	0.50543806	88	7%	0.554312	24	2%
C.1	0.345209	0.095391	0.44060045	200	16%	0.48829597	90	7%	0.53599149	36	3%
C.2	0.385289	0.101528	0.48681726	199	16%	0.53758139	92	7%	0.58834552	24	2%
C.3	0.369409	0.110855	0.48026374	220	18%	0.53569124	93	8%	0.59111874	24	2%
C.4	0.360058	0.105649	0.46570708	192	16%	0.51853183	86	7%	0.57135658	33	3%
C.5	0.341335	0.092815	0.43414981	195	16%	0.4805573	88	7%	0.5269648	27	2%
D.1	0.294545	0.113822	0.40836722	182	15%	0.46527818	92	7%	0.52218914	41	3%
D.2	0.303714	0.11049	0.41420385	189	15%	0.469449	91	7%	0.52469415	42	3%
D.3	0.278496	0.101334	0.37982969	191	15%	0.43049677	103	8%	0.48116384	51	4%
D.4	0.309614	0.102691	0.41230445	193	16%	0.46364974	81	7%	0.51499504	33	3%
D.5	0.266207	0.098868	0.36507475	193	16%	0.41450866	100	8%	0.46394257	44	4%
D.6	0.291185	0.093712	0.38489674	192	16%	0.43175259	89	7%	0.47860845	31	3%
D.7	0.316185	0.099481	0.4156655	175	14%	0.46540586	87	7%	0.51514622	45	4%
D.8	0.297701	0.099224	0.39692552	214	17%	0.4465376	87	7%	0.49614968	29	2%
D.9	0.284809	0.098598	0.38340649	196	16%	0.43270537	87	7%	0.48200424	35	3%
D.10	0.325382	0.101363	0.42674541	217	18%	0.47742713	83	7%	0.52810886	31	3%
D.11	0.352069	0.100741	0.45281065	202	16%	0.50318123	92	7%	0.55355181	28	2%
E.1	0.254621	0.094824	0.34944512	192	16%	0.39685703	91	7%	0.44426894	42	3%
E.2	0.327934	0.098999	0.42693249	201	16%	0.47643185	84	7%	0.5259312	31	3%
E.3	0.322887	0.106451	0.42933796	197	16%	0.48256352	90	7%	0.53578908	29	2%
E.4	0.281301	0.100163	0.38146471	211	17%	0.43154635	89	7%	0.48162798	37	3%
E.5	0.327938	0.102579	0.43051722	211	17%	0.48180669	84	7%	0.53309617	27	2%
E.6	0.312241	0.109632	0.42187283	191	15%	0.47668883	86	7%	0.53150483	42	3%
E.7	0.3055	0.103337	0.40883689	202	16%	0.46050548	94	8%	0.51217406	33	3%
E.8	0.338597	0.110367	0.44896467	202	16%	0.50414838	100	8%	0.55933209	35	3%
E.9	0.342209	0.116822	0.45903091	204	16%	0.51744173	90	7%	0.57585255	37	3%
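
The thresholds in Table 4 are consistent with the form T = Mean + k × STD (for A.1 and k = 1: 0.359174 + 0.108794 ≈ 0.467968), with N the number of ESCO ICT skills above T and P their share of the 1238 skills. A minimal sketch under those assumptions follows. The function names, the use of the population standard deviation, and the strict "greater than" comparison are assumptions, not details confirmed by the source.

```python
from statistics import mean, pstdev


def threshold(mean_value, std_value, k):
    """Threshold placed k standard deviations above the mean."""
    return mean_value + k * std_value


def threshold_summary(similarities, k):
    """Return (T, N, P) for one e-Competence.

    similarities: similarity of every ESCO ICT skill to the e-Competence.
    Assumption: population standard deviation and a strict '>' cut-off.
    """
    t = threshold(mean(similarities), pstdev(similarities), k)
    n = sum(1 for s in similarities if s > t)
    return t, n, n / len(similarities)


# Recompute the A.1 thresholds from the Mean and STD reported in Table 4.
for k in (1, 1.5, 2):
    print(k, round(threshold(0.359174, 0.108794, k), 6))

# Toy similarity list (hypothetical values) to show the full summary.
print(threshold_summary([0.1, 0.2, 0.3, 0.4, 0.9], k=1))
```

Raising k tightens the cut-off: in Table 4 the share of retained skills falls from about 16% at k = 1 to about 3% at k = 2.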

Table 5. Classification of Skill/Competence or Knowledge in e-Competence B.2.

B.2. Component Integration
ESCO ICT Skills	Skill Type
integrate system components	skill/competence
define integration strategy	skill/competence
ICT system integration	knowledge
design component interfaces	skill/competence
ICT system programming	knowledge
acquire system component	skill/competence
solution deployment	knowledge
align software with system architectures	skill/competence
use interface description language	skill/competence
analyze software specifications	skill/competence
interfacing techniques	knowledge
interpret technical requirements	skill/competence
execute integration testing	skill/competence
hardware components	knowledge
manage system testing	skill/competence
deploy ICT systems	skill/competence
hardware platforms	knowledge
keep up with the latest information systems solutions	skill/competence
system design	knowledge
maintain ICT server	skill/competence

Table 6. The most similar ESCO ICT skills to K3 and S4 in e-Competence A.8.

K3 Green ICT and Environmental Standards		S4 Analyze Social and Financial Sustainability Implications of ICT Developments and Operations
ESCO ICT Skill	Similarity	ESCO ICT Skill	Similarity
ICT environmental policies	0.772534311	ICT market	0.626046836
green computing	0.714408696	ICT environmental policies	0.61523807
protect the environment from the impact of the digital technologies	0.635635018	conduct impact evaluation of ICT processes on business	0.591272593
develop environmental policy	0.560505867	apply ICT systems theory	0.534303904
legal requirements of ICT products	0.545170069	optimize choice of ICT solution	0.521724939
sustainable technologies	0.535755217	ICT safety	0.518734813
manage environmental impact of operations	0.530024529	green computing	0.516135335
⋮	⋮	⋮	⋮
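
Selecting the closest ESCO ICT skills for an e-CF knowledge or skill example, as in Table 6, amounts to ranking precomputed similarity scores. The sketch below uses the K3 scores listed in Table 6. The function name `top_matches` and the optional `min_similarity` cut-off are hypothetical, introduced only for illustration.

```python
import heapq


def top_matches(scores, n=3, min_similarity=0.0):
    """Return the n highest-scoring (skill, similarity) pairs.

    scores: mapping from ESCO ICT skill label to similarity value.
    min_similarity: hypothetical cut-off; pairs below it are dropped.
    """
    candidates = [(label, s) for label, s in scores.items() if s >= min_similarity]
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])


# Similarities to "K3 green ICT and environmental standards" from Table 6.
k3_scores = {
    "green computing": 0.714408696,
    "sustainable technologies": 0.535755217,
    "ICT environmental policies": 0.772534311,
    "develop environmental policy": 0.560505867,
    "legal requirements of ICT products": 0.545170069,
    "manage environmental impact of operations": 0.530024529,
    "protect the environment from the impact of the digital technologies": 0.635635018,
}

for label, score in top_matches(k3_scores, n=3):
    print(f"{score:.3f}  {label}")
```

`heapq.nlargest` avoids sorting the full list when only a few matches are needed, which is convenient when each e-CF item is compared against all 1238 ESCO ICT skills.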

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
