1. Introduction and Literature Review
Businesses need effective leaders who understand and can manage the complexities of the rapidly changing global environment. We are witnessing a paradigm shift in the traditional understanding of management and leadership. In recent years, leadership has increasingly emphasized qualities such as creativity, guidance, motivation, and influence, diverging from traditional command- and authority-based management paradigms [1,2]. These qualities are considered essential for navigating organizational complexities and fostering innovation [3].
In human resource management, enterprises seeking to remain competitive must expect potential leadership candidates to appreciate and care for employees, involve them in decision-making, ensure transparency, improve the working atmosphere, reduce stress, and increase resources [4]. Leadership candidates should acquire these competencies through training programs, experience, or both during their career development. The transformational leadership approach provides a roadmap for acquiring them.
According to Burns, who first introduced the concept of “transformational leadership”, a transformational leader is someone who recognizes the wants or needs of customers, employees, and other stakeholders, meets these needs, understands how to motivate the relevant parties, and enables them to move toward higher-than-normal goals [5]. Theories on transformational leadership that began with Burns emphasize the leader’s ability to be visionary, to make all internal and external stakeholders believe in the proposed vision, to be a role model, to build close relationships, to instill confidence in change, to convince and encourage the stakeholders, to motivate them to adapt to change, to turn personal interests into common interests, and to mobilize and direct them toward common goals. Transformational leaders can be categorized into various types. Among intellectual transformational leaders are thinkers such as Locke, Bentham, and Mill, while notable reformist leaders include Alexander the Great. Leaders who radically changed existing systems include Luther and Lenin. Alongside these examples, Jeff Bezos’ vision for Amazon and its entry into e-commerce exemplify successful transformational leadership based on short-term goals [6,7].
In career planning, public administrators’ technical, managerial, and leadership skills are now developed through better training curricula than in the past [8]. An appropriate management approach that aligns with the corporate culture is a competency that administrator candidates gain through experience. The ability to plan careers by selecting leadership candidates from the current employee profile gives human resource management a proactive capability. Effective data collection is critical in this context because it informs resource management and enhances employee capability. Systematic data collection enables HR managers to identify potential leaders within the organization, ensuring a well-prepared talent pipeline that aligns with strategic goals [9]. Additionally, collecting and analyzing data on employee performance, skills, and career aspirations supports informed decisions about training and development programs, thereby fostering a more competent and motivated workforce [10]. This proactive approach not only aids succession planning but also enhances overall organizational agility by ensuring that the right talent is ready to meet future challenges [11]. Effective data collection therefore serves as the foundation for strategic HR practices that support long-term organizational success. Consequently, the selection and training of leadership candidates is an important subject of study in organizational development, transformation, and human resources, and enterprises and organizations need model-based approaches to guide it.
Although studies on developing model-based approaches are limited, research indicates that individuals with high-quality relationships with their managers are more likely to accept change [12]. Under evolving market conditions, enterprises seeking to enhance their competitiveness are increasingly adopting transformational leadership. In the public sector, where the demand for improved public services is rising, HR management decisions should be guided by a systematic, data-driven leadership approach [13].
A review of the literature shows that academic research on “Transformational Leadership” has grown since the 2000s, with the number of publications rising from 19 per year in 2010 to 157 per year in 2020 [14]. The importance and popularity of this concept have increased markedly in recent years. Likewise, “Job Satisfaction” is a prominent topic in the field of leadership. Leadership approaches that focus on ethical values, such as “Ethical Leadership” and “Service Leadership”, have also emerged as important areas of research [14,15]. These approaches emphasize that leaders should behave in accordance with ethical values and adopt a service-oriented stance. Researchers emphasize these concepts to understand the various dimensions of leadership and their interactions.
Based on these analyses, certain inferences can be drawn about the future direction of leadership studies. In particular, topics such as “Transformational Leadership” and “Job Satisfaction” can be expected to keep growing in importance, and leadership approaches focusing on ethical values, together with research on different leadership styles, are likely to persist. The leadership literature developed significantly between 2000 and 2022, with certain concepts coming to the forefront. These analyses provide a valuable resource for understanding trends in leadership research and for guiding future studies, which can examine these findings in greater depth and contribute to the development of leadership theories and practices.
Technology is driving digital transformation and demands a set of skills that leaders do not yet fully understand and must develop. Cloud computing has facilitated machine learning (ML) and artificial intelligence (AI) in areas where human insight is limited, supplying the data volume and scale that analytics algorithms require to support decision-making and enabling transformative technologies that are changing entire industry sectors [16]. The extensive data streams of the digital economy have created a new paradigm for business intelligence (BI), increasing the potential of advanced analytical and cognitive data tools. Big data structures are used in BI to extract value from large amounts of data for effective business decisions [17]. Sound leadership, in turn, is achieved when administrators act as role models and manage their employees effectively; appreciating and caring for employees, involving them in decision-making, providing transparency, improving the work atmosphere, reducing stress, and increasing resources are its key characteristics.
Leadership skills and information availability are also critical factors for the use of performance information. Decentralized structures in the public sector can foster a culture of training and development through networks and intensive collaboration. New technologies such as ML and AI are geared toward processing larger data volumes and a greater number of traits in leadership research. However, the relationship between leadership effectiveness and traits is harder to interpret with these methods than with traditional ones. It is therefore important to consider methodological approaches and new methods in leadership research.
In various studies examining behavior, data were collected and analyzed from experiments in which subjects were exposed to virtual reality simulations of real-life, digitalization-related experiences [18,19,20,21,22]. ML techniques offer the opportunity to develop innovative, model-based analytical approaches in leadership research [23]. The use of AI and ML techniques in leadership research contributes to the literature because it enables analysis with predictive causal models that complement hypothesis-based research [23,24]. Applying these techniques to large and diverse datasets helps produce meaningful results. ML is a discipline that enables computers to develop predictive models from empirical data using algorithms drawn from AI [25]. ML algorithms can now examine a large number of variables and generate combinations that reliably predict outcomes [23]. Recently, a growing number of researchers have investigated how ML techniques applied to big data can be used to study the behavior of individuals in the workplace [26]. Some leading companies have started to use AI techniques such as ML to digitize decision-making and improve processes, increasing both employee engagement and customer satisfaction [27]. In a study on employee turnover in human resource management, the incidence of and reasons for turnover in organizations were investigated using ML algorithms, and forecasts were made with an accuracy of 93%. That study identified monthly income, hourly wage, job level, and age as the most important factors and assessed ways to address the causes of turnover [22].
Academics and practitioners have used various ML and AI algorithms to analyze datasets and build forecast models that predict relationships. For instance, in an organization where board members are the decision-makers, complex human resources data were analyzed with ML algorithms, yielding significant results in identifying and predicting relationships in the dataset [28]. In a study that developed an AI model combining artificial neural networks and fuzzy logic, the Multifactor Leadership Scale was used to measure leadership perception in 102 construction companies in the city of Adana, and the BP, HB, and GA optimization algorithms were found to predict leadership perception successfully [29].
In another study, a computational ML method was used to identify and predict leadership perceptions of prominent individuals by representing each individual as a high-dimensional semantic vector derived from large-scale news media datasets. A model that paired these representations with respondents’ ratings of leadership effectiveness achieved high out-of-sample accuracy in predicting those ratings and was shown to predict the leadership perceptions of different demographic and political subgroups [30].
A study investigating the relationship between personality traits and leadership with ML techniques conducted an empirical analysis to resolve uncertainties about that relationship. Using a large database of trait variables and leadership role occupancy (n = 3385), it compared the predictive performance of traditional (parametric) linear models (LMs) with that of the nonparametric Random Forest (RF) technique. This comparison tested the complexity of the leader trait paradigm, and the study showed how academics can open the black box of RF models with a range of analytical techniques. The authors state that the complex yet explicable results obtained with ML techniques represent an important step forward in the study of leadership [24].
A study that used text mining and unsupervised ML methods to identify, in a holistic and consistent manner, the management practices that lead to employee satisfaction examined a large number of employees (n = 5650) and found that investing in tools that foster employees’ positive emotions both increases employee motivation and improves company profitability [31]. However, positive practices or virtuousness on the part of administrators have not yet been consistently explained.
Many new ML methods are designed for larger data volumes (i.e., larger sample sizes) and higher dimensionality (i.e., a greater number of features, such as predictor variables) than are the norm in leadership research. Furthermore, while advances in ML can improve forecasts, their results are often harder to interpret than those of traditional approaches. This raises the question of to what extent ML algorithms and techniques can clarify the relationship between traits and leader effectiveness (or any other leadership relationship). In recent years, advances in big data science and data analytics have opened opportunities for innovative work on human psychology, productivity, and behavioral differences. Nevertheless, despite their potential, big data and ML techniques are rarely used in management studies, and especially in leadership research [30]. Developments related to Industry 4.0 have popularized data-driven systems in leadership, organizational management, and customer relations [32].
We therefore conclude that approaches that effectively apply ML and AI models to big data in the analysis of leadership skills can provide significant benefits to businesses seeking successful and sustainable results in human resource career planning. In this study, human resources big data are analyzed using ML algorithms, and employee titles are forecast to identify potential leadership candidates. Section 2 presents the model developed and the algorithms used. Section 3 describes the case study and shares the model implementations. Sections 4 and 5 evaluate the results.
3. Case Study of Human Resources of Turkish Post Corporation
The Turkish Post Corporation (PTT) is a well-established, 183-year-old public institution. The PTT offers postal, cargo, logistics, e-services, and postal banking services and has the most widespread service network of any organization in Turkey. With nearly 40,000 employees, the PTT has a highly diverse employee profile, and the results of a case analysis based on big data obtained from these employees are expected to provide valuable outputs for the literature. This study aims to predict the titles of PTT personnel with ML algorithms and to use the results in human resource career planning.
3.1. Big Data Analysis and Basic Statistics
The dataset lacks the volume and velocity typical of big data, but applying advanced machine learning algorithms such as k-Nearest Neighbor, Random Forest, Gradient Boosting, and Support Vector Machine substantially extends what can be learned from it. These algorithms support a sophisticated analysis that extracts meaningful and valuable insights, thereby positioning the analysis within the realm of big data analytics. For instance, RF and GB algorithms excel at uncovering complex relationships among numerous features, while kNN and SVM algorithms are well suited to classification and prediction tasks. This methodological approach compensates for the dataset’s modest size, providing analytical capabilities comparable to those typically associated with larger datasets.
The strategic application of these machine learning techniques offers substantial advantages in business decision-making. RF and GB algorithms can identify and prioritize significant features, helping to optimize business operations and revealing critical insights into customer behavior patterns, market trends, and operational efficiencies. kNN and SVM algorithms enhance predictive analytics, facilitating the forecasting of future trends and events, which supports proactive measures and strategic planning. Although the dataset comprises only 30 columns and 4890 rows, the integration of structured, semi-structured, and unstructured data types enriches its analytical potential and aligns it more closely with the criteria of big data. The use of these machine learning algorithms thus elevates the dataset’s significance and shows that valuable big data insights can be derived from it.
A data cleaning process was carried out to correct inconsistencies and errors and to handle missing values in the raw dataset. Data integration was then undertaken to combine and harmonize the data obtained from different sources across the 81 provincial directorates in Turkey. Finally, in a data validation step, the provincial data were cross-checked against the central records.
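The cross-check between provincial and central records can be illustrated with a minimal Python sketch. It assumes that each record is a dictionary matched on a registry number; the field names (`registry_number`, `title`, `birth_year`) and the helper `cross_check` are hypothetical illustrations, not the schema or procedure actually used for the PTT data.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def cross_check(provincial: List[Record], central: List[Record],
                key: str = "registry_number"
                ) -> Tuple[List[object], List[object], List[Tuple[object, str]]]:
    """Compare provincial records with central records matched on `key`.

    Returns three lists: keys missing from the central data, keys missing
    from the provincial data, and (key, field) pairs whose values differ.
    """
    prov = {r[key]: r for r in provincial}
    cent = {r[key]: r for r in central}
    missing_in_central = sorted(k for k in prov if k not in cent)
    missing_in_provincial = sorted(k for k in cent if k not in prov)
    mismatches = []
    for k in sorted(prov.keys() & cent.keys()):
        # Only compare fields present in both versions of the record.
        for field in sorted(prov[k].keys() & cent[k].keys()):
            if prov[k][field] != cent[k][field]:
                mismatches.append((k, field))
    return missing_in_central, missing_in_provincial, mismatches


# Hypothetical records: 1 disagrees on birth year; 2 and 3 are unmatched.
provincial = [{"registry_number": 1, "title": "P2", "birth_year": 1980},
              {"registry_number": 2, "title": "G3", "birth_year": 1975}]
central = [{"registry_number": 1, "title": "P2", "birth_year": 1981},
           {"registry_number": 3, "title": "D1", "birth_year": 1990}]
report = cross_check(provincial, central)
```

Flagging rather than silently overwriting discrepancies keeps the validation step auditable: each mismatch can be resolved against the authoritative source.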
In this study, many parameters thought to influence leadership, and title type in particular, were taken into consideration, including education, personal development, demographic structure, and employees’ working conditions. Within the scope of these parameters, data on approximately 5000 employees were analyzed, and selected titles, together with the data belonging to them, were extracted from the general dataset. To avoid biasing the statistical analyses, titles with particularly low record counts were excluded, and a total of 4890 records belonging to titles with high record counts were included. The study was thus conducted on 141,810 data cells describing 29 attributes of 4890 employees.
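The exclusion of titles with low record counts, and the resulting cell count, can be sketched as follows. The threshold `min_count` is an assumption, since the text does not state the cut-off used, and the title code `X1` is a hypothetical placeholder.

```python
from collections import Counter
from typing import Dict, List


def drop_rare_titles(records: List[Dict[str, str]], min_count: int,
                     label: str = "title") -> List[Dict[str, str]]:
    """Keep only records whose title occurs at least `min_count` times."""
    counts = Counter(r[label] for r in records)
    return [r for r in records if counts[r[label]] >= min_count]


# Toy data: three P2 records, two G3 records, and one record with a rare,
# hypothetical title code "X1". The threshold of 2 is arbitrary.
toy = ([{"title": "P2"} for _ in range(3)]
       + [{"title": "G3"} for _ in range(2)]
       + [{"title": "X1"}])
kept = drop_rare_titles(toy, min_count=2)

# Cell count reported in the text: 4890 employees x 29 attributes.
n_cells = 4890 * 29
```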
The variables Title Start Date, Title End Date, Title Total Days, Registry Number, PTT Service Year, PTT Start Date, PTT Turnover Date, Year of Birth, Gender, Age, Number of Children, Place of Birth, Province of Registration, Education Status, Manual School, Manual Department, Duration of Education, Total Duration of Higher Education, Date of Graduation, Health, Number of Awards, Number of Penalties, Duration of Education, Number of Trainings, Certificate, Language Score, Report, Union, and Unified Service Days were defined as input variables, and one variable (Title) was defined as the output variable. Tables explaining the input and output variables are provided in Appendix A to aid understanding of the descriptive statistics.
The dataset, in which titles were coded as categories, was visualized using a clustered column graph to compare title values. Descriptive statistics were obtained from the years of service that PTT personnel spent under the titles they hold. Both the years of service of different personnel under different titles and the years of service of the same person under different titles were taken into account; the resulting average working time for the titles of the executive staff is shown in Figure 2.
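A minimal sketch of how an average time-in-title can be computed when the same employee may appear under several titles: each (employee, title) stint contributes its own duration to that title’s mean. The tuple layout, the use of a Title Total Days-style field, and the 365.25-day year are assumptions for illustration; the records are invented.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years


def mean_years_by_title(stints: Iterable[Tuple[int, str, float]]) -> Dict[str, float]:
    """Average years spent in each title.

    Each stint is (registry_number, title, title_total_days). An employee who
    held several titles contributes one stint to each of those titles.
    """
    years = defaultdict(list)
    for _registry, title, total_days in stints:
        years[title].append(total_days / DAYS_PER_YEAR)
    return {title: round(mean(vals), 2) for title, vals in years.items()}


# Invented stints for two employees.
stints = [
    (101, "P2", 3652.5),   # employee 101: 10 years as postman
    (101, "G3", 1826.25),  # same employee: 5 years as office worker
    (102, "P2", 7305.0),   # employee 102: 20 years as postman
]
averages = mean_years_by_title(stints)
```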
When the provincial and central administrators are evaluated together, the descriptive statistics of the three titles with the highest average years of service are shown in Table 5.
When the provincial and central administrators are evaluated together, the descriptive statistics of the three titles with the lowest average years of service are shown in Table 6.
Descriptive statistics were likewise obtained for all staff from the years of service spent under the titles they hold. Again, both the years of different staff under different titles and the years of the same staff member under different titles were taken into account; the average working time of the staff in their titles is presented in Figure 3.
When provincial and headquarters employees are evaluated together, the descriptive statistics of the three titles with the highest average years of service are shown in Table 7.
When provincial and headquarters employees are evaluated together, the descriptive statistics of the three titles with the lowest average years of service are shown in Table 8.
In terms of years of service, the titles B4 (Provincial Director), D1 (Deliverer), G3 (Office Worker), K4 (Controller), K7 (Security Officer), M6 (Manager), P2 (Postman), T9 (Technicist), and V4 (Treasurer) show more years of service than the other titles. The high number of staff in these titles is attributed to the fact that they serve in the provinces.
When the length of service of PTT staff is evaluated in years by title, without distinguishing between administrators and employees, staff holding the titles B4 (Provincial Director), D1 (Deliverer), D7 (Assistant Auditor), G3 (Office Worker), K7 (Security Officer), P2 (Postman), T9 (Technicist), and V4 (Treasurer) are observed to have spent more years in their titles than staff in other titles.
The box plot of years of service by title is shown in Figure 4. An outlier is an unusually large or small observation that lies far from the bulk of a dataset. Outliers can have a disproportionate effect on statistical analyses and lead to misleading results or interpretations, so they are often excluded from analysis; in this study, however, they were retained to avoid introducing bias. The titles with the most outliers are K7 (Security Officer) and G3 (Office Worker), followed by B12 (Computer Operator), B4 (Provincial Director), D1 (Deliverer), D5 (Typist), K4 (Controller), M3 (Official (No.3)), M6 (Manager), P2 (Postman), P4 (Programmer), T5 (Technical Manager), T9 (Technicist), U3 (Assistant Specialist), V3 (Data Preparation and Control Operator), and V4 (Treasurer).
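Box plots commonly flag outliers with Tukey’s fences, i.e., values lying more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The sketch below applies that rule with Python’s standard library. Whether the plotting software behind Figure 4 uses exactly this quartile method and multiplier is an assumption, and the sample values are invented.

```python
from statistics import quantiles
from typing import List, Sequence


def tukey_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Flag, but do not drop, the outliers: the study kept them in the analysis.
    return [v for v in values if v < low or v > high]


# Invented years-in-title values for one title code.
years_in_title = [1, 2, 2, 3, 3, 4, 4, 5, 30]
flagged = tukey_outliers(years_in_title)
```

Here the quartiles are 2 and 4, so the fences are -1 and 7 and only the 30-year value is flagged.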
An analysis of the gender data of PTT staff by years of service shows that 73% of the 4890 employees are male and 27% are female. The 3568 male employees have an average of 19.26 years of service (standard deviation 10.82), and the 1322 female employees an average of 13.45 years (standard deviation 9.71). These results are considered to be related to the high number of operational-level employees. The frequency distribution of male and female employees by title, with a normal distribution curve, is presented in Figure 5.
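The gender shares reported above follow directly from the group sizes, and the same figures yield the combined (sample-size-weighted) mean years of service. The combined mean is not reported in the text; the sketch below merely derives it from the stated group means as an arithmetic illustration.

```python
# Group sizes and mean years of service as reported in the text.
n_male, n_female = 3568, 1322
mean_male, mean_female = 19.26, 13.45

total = n_male + n_female
share_male = 100 * n_male / total      # about 72.97 %
share_female = 100 * n_female / total  # about 27.03 %

# Combined mean: each group mean weighted by its number of employees.
combined_mean = (n_male * mean_male + n_female * mean_female) / total
```

The combined mean comes out at roughly 17.7 years, closer to the male mean because men make up almost three quarters of the staff.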
PTT staff have served for different periods (years) under different titles. Figure 6 shows the histogram of title service durations by gender, together with the title held, overlaid with a normal distribution curve.
An analysis of length of service by title shows that time spent under a title is concentrated between 0 and 7 years, for both male and female staff. On average, a male employee has served under a title for 9.2 years and a female employee for 7.86 years.
Multidimensional Scaling Algorithms (MDSs)
Multidimensional scaling (MDS) algorithms iteratively move points around in a simulation of a physical model: if two points are too close together (or too far apart), a force pushes them apart (or pulls them together). At each step, the change in a point’s position corresponds to the sum of the forces acting on it. The MDS plot of the title variables in this study is shown in Figure 7.
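The force-based procedure described above corresponds to metric MDS fitted by gradient descent on raw stress, the sum of squared differences between embedded and target distances. The pure-Python sketch below illustrates this under that assumption; it is not Orange’s implementation, and the learning rate, iteration count, and random initialization are arbitrary choices.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def raw_stress(dist: Matrix, pos: Matrix) -> float:
    """Sum over point pairs of (embedded distance - target distance)^2."""
    n = len(dist)
    return sum((math.dist(pos[i], pos[j]) - dist[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def mds_2d(dist: Matrix, iters: int = 500, lr: float = 0.05,
           seed: int = 0) -> Matrix:
    """Embed points in 2-D so that their distances approximate `dist`."""
    n = len(dist)
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(iters):
        forces = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                # Positive if i and j are too close (push apart),
                # negative if they are too far apart (pull together).
                push = (dist[i][j] - d) / d
                forces[i][0] += push * dx
                forces[i][1] += push * dy
        # Each point moves in proportion to the sum of the forces on it.
        for i in range(n):
            pos[i][0] += lr * forces[i][0]
            pos[i][1] += lr * forces[i][1]
    return pos


# Target distances of a 3-4-5 right triangle (invented example).
target = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
start = mds_2d(target, iters=0)  # the random initial layout
layout = mds_2d(target)
```

Each force term is the negative gradient of the raw stress with respect to a point’s position, so the iteration is ordinary gradient descent with `lr` as the step size.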
When the relationships in the data are evaluated with MDS, records with the same title are generally located close to each other, which suggests that they are similar. Gaps or sparser regions between different groups of titles indicate differences between these titles. Titles are represented by different colors, and more intense color indicates higher data density; the titles in these dense regions can therefore be inferred to be more strongly represented in the data.
Employees are grouped by title using different colors. Areas where colors are concentrated indicate that certain title categories cluster together and share similar characteristics. For example, the concentration of green, purple, and brown dots in the upper left corner of the graph suggests that employees with titles such as D1, G3, and K4 have similar traits. Similarly, the concentration of yellow and light blue dots in the lower right corner indicates that title categories such as M6 and T9 form a distinct group (see Appendix A for more information about employee titles). Such clusters suggest that employees in these title categories have similar job roles or responsibilities. Isolated points (e.g., the V4 and TB title categories in the upper right corner) indicate that employees in these categories differ from the others and may possess special skills or perform different job functions.
The breadth of the color scale shows that the dataset contains many highly diverse title categories, indicating that the HR data are comprehensive and varied and reflect employees’ different skill sets and job roles. The density of points in a region of the graph reflects the number of employees there; for example, the dense points representing the P8 and PB categories in the center suggest that these categories have more employees than others. Such visualizations provide a practical example of how big data analytics can be used in human resource management to support more data-driven and effective HR strategies [39,40]. Understanding how employees are distributed across titles and which title groups share similar characteristics provides critical information for talent management, training and development programs, and workforce planning.
In summary, this study used multidimensional scaling (MDS) to visualize the titles of PTT staff on a two-dimensional plane, clearly revealing the similarities and differences between titles. The resulting graph shows how titles are grouped and how they relate to one another. The method is particularly useful for examining the distribution and density of the dataset and for understanding the hierarchical structure of, and transitions between, titles.
3.2. Machine Learning (ML) Model
The ML algorithms in this study were implemented in Orange 3.35, an open-source machine learning program, to obtain the forecast data. Orange is built on Python, is free of charge, and runs on all major operating systems. It offers a large, diverse toolbox for visually composing data analysis workflows. The ML model was designed to evaluate both the known and unknown states of the output variable in the raw data. The input and output variables were defined in the “Select Columns” widget, and the training and test proportions were set in the “Data Sampler” widget: 75% of the data were used for training and 25% for testing. Algorithms that produced meaningful results for this study were selected from Orange and added to the model, and the model results were evaluated by comparing the predictions with the test data. The MDS visualization created for the model is also presented in this study. A screenshot of the ML model workflow designed in Orange for this study is presented in Figure 8.
Four ML algorithms yielded statistically significant results in predicting the titles of PTT staff: the k-Nearest Neighbor (kNN), Random Forest (RF), Gradient Boosting (GB), and Support Vector Machine (SVM) algorithms.
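To make the first of these algorithms concrete, the sketch below implements a plain k-nearest-neighbor classifier: a record receives the title most common among its k closest training records under Euclidean distance. The feature values and title labels are invented, `k=3` is an arbitrary choice, and features are not normalized here; Orange’s kNN learner has its own defaults (number of neighbors, metric, weighting) that may differ.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[str],
                x: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training points closest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Invented features: (years of service, number of trainings).
train_X = [(1, 0), (2, 1), (1, 1), (20, 8), (22, 9), (19, 7)]
train_y = ["P2", "P2", "P2", "M6", "M6", "M6"]

pred_senior = knn_predict(train_X, train_y, (21, 8))
pred_junior = knn_predict(train_X, train_y, (1.5, 0.5))
```

Because kNN stores the training data and defers all work to prediction time, it has no real training phase; its cost lies in the distance computations for each new record.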
3.3. Analysis of Model Results
Table 9 presents the performance measurement values obtained with the ML algorithms for title type prediction.
In this study, the performance of four machine learning models, k-Nearest Neighbor (kNN), Random Forest (RF), Gradient Boosting (GB), and Support Vector Machine (SVM), was evaluated using several metrics. The kNN model stood out with a high AUC value (0.979) but performed worse on the other metrics (accuracy: 0.646; F1 score: 0.624) than RF and GB. RF and GB generally performed better on AUC and the other metrics, with RF’s training time and GB’s fast training noted as both strengths and weaknesses. The SVM model performed worse, with low AUC and accuracy values, and struggled particularly with training time and parameter optimization on large datasets. These results favor ensemble models such as RF and GB, while indicating that models such as SVM may still merit consideration for special cases.
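Accuracy and F1 for a multiclass problem such as title prediction can be computed as in the sketch below. Per-class F1 is 2TP/(2TP + FP + FN), and the overall value here is a support-weighted average over classes; this averaging convention is common but is an assumption with respect to the values in Table 9. AUC is omitted because it requires predicted probabilities rather than labels.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label) -> float:
    """F1 = 2TP / (2TP + FP + FN) with `label` treated as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred) -> float:
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    n = len(y_true)
    return sum(cnt / n * f1_for_class(y_true, y_pred, c)
               for c, cnt in support.items())


# Invented labels: one P2 record is mispredicted as G3.
y_true = ["P2", "P2", "G3", "G3"]
y_pred = ["P2", "G3", "G3", "G3"]
acc = accuracy(y_true, y_pred)
f1 = weighted_f1(y_true, y_pred)
```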
The forecast data were obtained in two steps: first using the dataset with titles, and then using the dataset without title codes. Table 10 gives the performance values of the ML algorithms in both steps.
The ML algorithms were run with a 75% training and 25% test split, without cross-validation. The test set, 25% of the dataset, comprised 1222 records, and the remaining 3668 records of the same dataset were used for training. Records were assigned to both sets at random.
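The split sizes quoted above follow from rounding 25% of 4890 records (1222.5) down to 1222. A minimal sketch of a seeded random 75/25 split that reproduces these counts is shown below; the rounding rule and the seed are assumptions, and Orange’s Data Sampler may round differently.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(records: Sequence[T], test_fraction: float = 0.25,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle indices with a fixed seed and cut off the test share."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(records) * test_fraction)  # rounds 1222.5 down to 1222
    test = [records[i] for i in idx[:n_test]]
    train = [records[i] for i in idx[n_test:]]
    return train, test


# Stand-in records: the integers 0..4889 play the role of employee rows.
train, test = train_test_split(list(range(4890)))
```

A single split without cross-validation, as used in the study, gives one estimate per algorithm; repeating the split with different seeds would show how much the scores vary.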
We randomly selected 100 records from the forecast data of the ML algorithms and counted the true and false forecasts to calculate the accuracy rate. Table 11 shows the numbers of true and false forecasts and the rate for each algorithm. According to this table, the kNN algorithm produced the forecasts closest to the real data, with a true-forecast rate of 96%, while the SVM algorithm performed worst, with a rate of 41%.
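The evaluation on 100 randomly drawn forecasts can be sketched as follows: draw a sample of record indices without replacement, then count matching and non-matching predictions. The function name, seed, and labels are hypothetical; the sketch only shows the counting procedure, not the data behind Table 11.

```python
import random
from typing import Sequence, Tuple


def sampled_true_rate(y_true: Sequence[str], y_pred: Sequence[str],
                      sample_size: int = 100, seed: int = 0
                      ) -> Tuple[int, int, float]:
    """Count true/false forecasts in a random sample drawn without replacement."""
    idx = random.Random(seed).sample(range(len(y_true)), sample_size)
    n_true = sum(y_true[i] == y_pred[i] for i in idx)
    n_false = sample_size - n_true
    return n_true, n_false, 100 * n_true / sample_size


# Invented labels for a 1222-record test set: every fourth forecast is wrong.
y_true = ["P2"] * 1222
y_pred = ["G3" if i % 4 == 0 else "P2" for i in range(1222)]
n_true, n_false, rate = sampled_true_rate(y_true, y_pred)
```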
Model outputs were also evaluated using multidimensional scaling, a statistical method for visualizing and analyzing high-dimensional data in a lower-dimensional space. The technique makes complex relationships easier to understand and provides useful insights in a variety of fields. The projection tries to preserve the distances between the original points as closely as possible, but a perfect fit is often difficult to achieve because the data are high-dimensional or the distances are not Euclidean. As input, the MDS widget requires either a dataset or a distance matrix.
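Because MDS accepts either a dataset or a distance matrix, a dataset must first be converted into pairwise distances. A minimal sketch, assuming Euclidean distance on numeric features with optional z-score standardization, is given below; the distance metric and preprocessing actually applied by Orange’s MDS widget may differ, and the rows are invented.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def distance_matrix(rows: Sequence[Sequence[float]],
                    standardize: bool = True) -> List[List[float]]:
    """Pairwise Euclidean distances between rows of numeric features.

    With standardize=True each column is z-scored first, so that a feature
    measured in days does not dominate one measured in counts.
    """
    data = [list(map(float, r)) for r in rows]
    if standardize:
        stats = [(mean(col), pstdev(col)) for col in zip(*data)]
        # Constant columns (zero spread) are mapped to 0.0.
        data = [[(v - m) / s if s else 0.0 for v, (m, s) in zip(r, stats)]
                for r in data]
    return [[math.dist(a, b) for b in data] for a in data]


# Invented rows: (years of service, number of trainings).
D = distance_matrix([(0, 0), (3, 4), (6, 8)], standardize=False)
```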
4. Results and Evaluations
The use of performance data in the public sector can be enhanced by decentralized structures, promoting a culture of training and development through networks and collaborative efforts. Emerging technologies like ML and AI are adept at processing extensive and diverse datasets in leadership research. However, understanding the relationship between leadership effectiveness and traits through these technologies is less straightforward compared to traditional methods. Therefore, it is crucial to consider methodological approaches and novel methods in leadership research.
Machine learning performs best with large and diverse datasets. Human resource data span a wide variety of data sources related to candidates, and such a broad spectrum of data improves the learning and generalization capabilities of machine learning algorithms. Because machine learning can predict future events from past data, it offers a significant advantage in strategic human resource decisions.
This study was conducted to identify potential leadership candidates using data-driven systems in HR. Identifying such candidates is important for approaching the career development process from a transformational leadership perspective.
In this study, we investigated the importance of transformational leadership in HR management and the use of ML and AI techniques to identify potential leadership candidates in large-scale organizations. The forecast model developed with data on 4890 PTT employees achieved a 96% true-forecast rate with the kNN algorithm on a random sample of 100 forecasts, indicating that big data and ML algorithms can be an effective tool for identifying leadership candidates.
Predicting potential leadership candidates is an important step in HR management and leadership development practice. In the public sector and other large-scale businesses, big data and ML algorithms can support more objective, data-driven programs for selecting and developing leadership candidates that take corporate culture into account, making it possible to train visionary and inspiring leaders with transformational leadership skills in a sustainable way and to ensure competitive business management. This study therefore contributes to the literature by applying artificial intelligence and machine learning algorithms in leadership research on public institution data and by producing forecasts that can benefit the public sector. If potential leadership candidates can be identified with the help of a decision support model, the competencies they will need can be planned for and acquired through in-service training and experience over the course of their careers.
Data that could not be included in this study, such as salary information and the ratio of salary differences between titles, may also influence the analysis. This study is based on the PTT’s limited human resource dataset, which excludes personal data, and its results and evaluations are limited to this scope. Future studies could explore how machine learning can be applied to areas such as leadership training, leadership behavior, and the leadership process.