1. Introduction
In a rapidly evolving digital landscape, the public sector grapples with critical human resource management challenges, particularly in resource optimization and cohesive control over human capital utilization. Traditional hiring and resource allocation methods, often reliant on subjective assessments and bureaucratic processes, are no longer effective in dynamic environments. These limitations, compounded by a pervasive non-compliance culture and inadequate management systems, result in inefficiencies, productivity gaps, and a lack of actionable insights for decision-makers. As public sector organizations navigate digital transformation, there is an urgent need for robust, data-driven frameworks to evaluate employee capabilities and optimize workload allocation.
Efficiency in public service delivery hinges on an evaluation framework that integrates skill management, competence, training, experience, and diversity in managerial positions. However, the absence of a reliable central management system impedes the collection of statistical data on employee productivity, perpetuating reliance on outdated evaluation systems laden with subjectivity. This gap underscores the necessity for transformative approaches that align with the demands of the digital era.
Building on prior work by Michalopoulos et al. [
1], which proposed an integrated system for assessing employee potential using neuro-fuzzy inference, this study expands the scope by conducting a rigorous comparative analysis of machine learning algorithms. While Michalopoulos et al. emphasized skill management and task optimization through quantifiable criteria, their work did not systematically evaluate alternative methodologies. Similarly, Giotopoulos et al. [
2] highlighted the role of time-based efficiency metrics but left the question of algorithmic superiority in modeling complex workforce dynamics unresolved.
To address these gaps, we introduce a comprehensive framework leveraging seven machine learning algorithms: Linear Regression, Artificial Neural Networks (ANNs), Adaptive Neuro-Fuzzy Inference System (ANFIS), Gradient Boosting Machines (GBMs), Bagged Decision Trees, XGBoost, and Support Vector Machines (SVMs). Our methodology, extending the integrated system proposed by Giotopoulos et al. [
3], emphasizes human-independent efficiency assessment, departing from traditional systems reliant on domain experts. Linear Regression serves as a foundational baseline, elucidating linear relationships between input factors (e.g., work experience, education, age) and employee performance. Subsequent exploration of ANNs captures intricate, non-linear patterns, while ANFIS bridges interpretability and complexity through its hybrid architecture, combining fuzzy logic with neural networks. Ensemble methods such as GBM and XGBoost harness collective decision-making, and SVM maximizes separation margins between performance classes.
Through meticulous experimentation, we evaluate these algorithms using metrics including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Median Absolute Error (MedAE), and Huber Loss. ANFIS emerges as the standout performer, demonstrating superior accuracy (54% lower RMSE compared to Linear Regression) and robustness in handling diverse data patterns. Its ability to minimize bias while modeling non-linear relationships positions it as an optimal tool for public sector workforce management.
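For concreteness, the five error measures used in this comparison can be computed as in the pure-Python sketch below; the `delta` threshold for the Huber Loss is an assumed default, since the text does not specify the value used.

```python
import math

def regression_metrics(y_true, y_pred, delta=1.0):
    """Compute MSE, RMSE, MAE, MedAE, and Huber Loss for paired samples."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = sorted(abs(e) for e in errors)
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mid = n // 2
    medae = (abs_errors[mid] if n % 2 else
             (abs_errors[mid - 1] + abs_errors[mid]) / 2)
    # Huber Loss: quadratic for errors within delta, linear beyond it
    huber = sum(0.5 * e * e if abs(e) <= delta
                else delta * (abs(e) - 0.5 * delta)
                for e in errors) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs_errors) / n,
        "MedAE": medae,
        "Huber": huber,
    }
```

Reporting several of these metrics together is useful because MSE/RMSE penalize large outlier errors heavily, while MAE, MedAE, and the Huber Loss are progressively more robust to them.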
This study makes three key contributions:
A systematic comparative analysis of machine learning algorithms tailored to public sector constraints, addressing the lack of empirical benchmarks in bureaucratic environments.
Validation of ANFIS’s superiority in accuracy and interpretability, achieving a significantly lower RMSE than all other algorithms, as detailed in
Section 4.
Practical insights for integrating ANFIS into operational systems, enabling real-time workload management and resource allocation.
The remainder of this paper is structured as follows:
Section 2 reviews related work on public sector efficiency, machine learning applications in workforce analytics, and hybrid neuro-fuzzy systems.
Section 3 details the methodology, including data collection (employee records from West Greece), preprocessing, and implementation within the Apache Spark framework.
Section 4 presents experimental results, comparing performance across algorithms, while
Section 5 discusses implications, limitations, and future directions for deploying ANFIS in public sector workflows.
2. Related Work and Contributions
Regarding the factors that influence employee performance, numerous studies have shed light on key aspects such as motivation, compensation, engagement, intellectual capital, and human resource management practices. Each of these factors plays an essential role in shaping the performance of employees within organizations. A substantial body of research emphasizes applying regression analysis to evaluate human performance. The study by Shahzadi et al. (2014) investigated the impact of employee motivation on performance, revealing a positive correlation mediated by job satisfaction, organizational commitment, and work–life balance. This relationship emphasizes the need for organizations to focus on enhancing employee motivation through appropriate interventions [
4]. Hameed, Ramzan, and Zubair (2014) further emphasized the influence of compensation on employee performance, highlighting the positive correlation between various compensation factors and employee performance. The findings underlined the significance of fair compensation practices in motivating and enhancing employee productivity [
5]. Anitha (2014) investigated the determinants of employee engagement and their effect on performance, revealing a positive and significant relationship between engagement and employee performance. This suggested that engaged employees are likely to perform better, emphasizing the importance of engagement initiatives in organizations [
6]. Moving beyond motivational and engagement factors, Ahangar (2011) explored intellectual capital and financial performance. The study uncovered a significant influence of intellectual capital on profitability and productivity, underscoring the strategic importance of intellectual capital in corporate performance [
7], while Hong et al. (2012) explored human resource management practices and their impact on employee retention. The research highlighted the crucial role of training, compensation, and appraisal in retaining employees, providing valuable insights for organizations striving to enhance employee loyalty [
8]. In the domain of intellectual capital and performance, Phusavat et al. (2011) established a positive correlation, emphasizing the role of intellectual capital in fostering innovation and, subsequently, firm success. This highlights the need for organizations to invest in and effectively manage their intellectual capital [
9]. Darmawan et al. (2020) further stressed the importance of human resource quality, showcasing its positive correlation with job performance. The study underscored the significance of education, experience, skills, and motivation in enhancing overall performance, guiding organizations in investing wisely in their human resources [
10]. Lastly, Rivaldo and Nabella (2023) highlighted the positive correlation between employee education, training, experience, work discipline, and performance. Their findings reinforced the importance of investing in these fundamental aspects to enhance employee performance and drive organizational success [
11].
In exploring the intersection of artificial intelligence and workforce dynamics, several studies have leveraged Artificial Neural Networks (ANNs) to model and predict various aspects of productivity and performance. Chen and Chang (2010) employed ANNs to elucidate the intricate, non-linear relationships between firm size, profitability, employee productivity, and patent citations within the US pharmaceutical industry. This approach discerned complex patterns that traditional Linear Regression models may overlook [
12].
Simeunović et al. (2017) proposed an ANN-based model tailored for optimizing workforce scheduling by considering diverse influential factors such as employee skills, customer demand, and machine capacity. Their model showcased its effectiveness in enhancing productivity and reducing operational costs in manufacturing settings [
13].
Fekri Sari and Avakh Darestani (2019) introduced an ANN model that predicts fuzzy overall equipment effectiveness (OEE) and line performance. The integration of fuzzy logic and ANNs allowed for accurate performance measurement, particularly in the context of manufacturing operations [
14]. The construction industry also benefits from ANN-based predictions. Goodarzizad et al. (2023) proposed a hybrid model merging ANNs with the grasshopper optimization algorithm to forecast construction labor productivity, considering factors like worker skills, project complexity, and weather conditions. Their approach proved accurate and effective for prediction [
15].
Similarly, Heravi and Eslamdoost (2015) employed ANNs to measure and predict construction labor productivity, emphasizing the consideration of factors such as worker skills, project complexity, and weather conditions. Their study underscored the accuracy and efficacy of ANNs in predicting productivity in the construction domain [
16].
In a different domain, Proto et al. (2020) pioneered a three-step neural network artificial intelligence modeling approach for time, productivity, and cost prediction within the Italian forestry sector. Their methodology incorporated a diverse array of factors, including tree species, terrain characteristics, and weather conditions, to predict outcomes accurately [
17].
Lastly, Gelmereanu, Morar, and Bogdan (2014) presented an ANN model for predicting productivity and cycle time in manufacturing processes. Their model factors in various critical elements such as machine type, worker skills, and material characteristics, highlighting the accuracy and efficiency of ANN-based predictions in optimizing manufacturing processes [
18]. These works collectively demonstrate the versatile applications of ANNs in forecasting and optimizing productivity across diverse domains.
In the domain of performance estimation and analysis across diverse sectors, the integration of the Adaptive Neuro-Fuzzy Inference System (ANFIS) proves to be a powerful tool. Ershadi, Qhanadi Taghizadeh, and Hadji Molana (2021) introduced a hybrid approach utilizing technology readiness level (TRL), data envelopment analysis (DEA), and ANFIS to select and estimate the performance of Green Lean Six Sigma (GLSS) projects. Their hybrid methodology demonstrated effectiveness in accurately determining and evaluating project performance [
19].
In the context of labor loss estimation, Arslankaya (2023) compared the performance of fuzzy logic with ANFIS. Their evaluation of labor loss data from a manufacturing company revealed that ANFIS is superior in accuracy and precision [
20].
Keles et al. (2023) focused on determining the leadership perceptions of construction employees, utilizing ANFIS. Their study emphasized ANFIS as an effective tool for accurately determining these perceptions, demonstrating its utility in the construction sector [
21].
Education quality assessment is another domain in which ANFIS achieves excellence. Ahanger et al. (2020) proposed an ANFIS-inspired smart framework to assess education quality, showcasing its effectiveness in evaluating student performance, teacher quality, and school infrastructure [
22], while Azadeh and Zarrin (2016) introduced an intelligent framework for productivity assessment and analysis of human resources, considering various factors such as resilience engineering, motivational aspects, health, safety, and ergonomics. Their approach proved effective in accurately assessing human resource productivity [
23].
Considering organizational cohesion and its impact on employee productivity, Nikkhah-Farkhani et al. (2022) utilized ANFIS to model these relationships. Their study emphasized the significant positive impact of organizational cohesion on employee productivity and highlighted ANFIS as an effective analytical tool [
24].
Mirsepasi, Faghihi, and Babaei (2013) proposed a system model for performance management in the public sector, employing a balanced scorecard approach. Their model considers various crucial aspects of public sector performance and showcases effectiveness in enhancing overall performance [
25]. Elshaboury et al. (unpublished) presented an improved ANFIS model based on the particle swarm optimization (PSO) algorithm for predicting labor productivity. Their model demonstrated superior accuracy and precision compared to traditional ANFIS models, highlighting the potential of optimization algorithms in enhancing ANFIS [
26]. In contemporary research, machine learning (ML) techniques are increasingly employed to predict critical factors such as employee attrition and performance. Jain and Nayyar (2018) showcased the efficacy of the XGBoost algorithm in predicting employee attrition, emphasizing its efficient memory utilization, high accuracy, and low running times, ultimately achieving an accuracy of nearly 90% [
27].
Shifting the focus to the academic sphere, Sekeroglu, Dimililer, and Tuncal (2019) evaluated three prominent ML algorithms for predicting student performance. Notably, neural networks emerged as the most accurate in predicting student grades in both secondary schools and universities [
28]. Regarding environmental contexts, Aldin and Sözer (2022) used ANFIS and Artificial Neural Networks (ANNs) to predict thermal data. The study underscored ANFIS’s superior accuracy in predicting temperature and humidity, making it a robust tool for thermal data prediction [
29]. In HR- and employee-related domains, Zhao et al. (2019) explored the prediction of employee turnover using various ML methods. Among them, tree-based ensemble methods such as extreme gradient boosting (XGBoost) proved highly effective, especially for medium and large HR datasets, showcasing their superior predictive power and efficiency [
30].
Pathak, Dixit, Somani, and Gupta (2023) advocated for ML in predicting employee performance, highlighting its potential to identify high-performing employees and enhance overall workforce performance. The study evaluated diverse ML techniques, including decision trees and Support Vector Machines, underlining the importance of ML in Industry 4.0 [
31]. In Saad’s (2020) research, data were gathered with twelve variables, each having 121 instances, aiming to predict the evaluation of the process for individual workers. To ensure the highest prediction accuracy (reaching 99.16%), an ensemble algorithm (Bagging) was employed, combining four decision tree algorithms. The standard errors for the four algorithms were notably small, suggesting a strong relationship between the seven input variables and the evaluation output [
32]. Adeniyi et al. (2022) rigorously compared the performance of three ML techniques—decision tree (DT), Artificial Neural Network (ANN), and Random Forest (RF)—in predicting employee performance. The study concluded that ANN outperforms RF in performance during testing [
33]. Jantan, Puteh, Hamdan, and Ali Othman (2010) took a data mining approach, utilizing classification techniques to predict employee performance patterns within HR databases. The C4.5/J4.8 classifier exhibited the highest accuracy, suggesting its potential for future endeavors [
34]. Lastly, Li, Lazo, Balan, and de Goma (2021) focused on employee performance prediction within a company using ML techniques. Logistic Regression emerged as the most accurate classifier among the employed methods, offering a promising avenue for predictive accuracy [
35].
Advancing Workforce Management Through ANFIS: Contributions Beyond Existing Research
Building upon these studies, our research contributes to the field by integrating Adaptive Neuro-Fuzzy Inference System (ANFIS) into a dynamic workload management framework for the public sector. While previous studies have extensively explored factors such as motivation (Shahzadi et al., 2014) [
4], compensation (Hameed et al., 2014) [
5], employee engagement (Anitha, 2014) [
6], and intellectual capital (Ahangar, 2011) [
7] in shaping performance, our approach focuses on quantifying employee capability using machine learning-driven workload metrics. Additionally, research leveraging Artificial Neural Networks (ANNs) (Chen & Chang, 2010; Simeunović et al., 2017; Goodarzizad et al., 2023) [
12,
13,
15] has demonstrated their effectiveness in modeling complex workforce dynamics. However, our study advances this by employing ANFIS, which not only captures non-linear relationships but also integrates fuzzy logic for greater interpretability in employee capability assessment. Furthermore, while past research has applied ANFIS in various domains, such as manufacturing performance (Arslankaya, 2023) [
20], education quality assessment (Ahanger et al., 2020) [
22], and organizational cohesion (Nikkhah-Farkhani et al., 2022) [
24], our study is among the first to implement ANFIS in dynamic workload distribution within the public sector, addressing critical gaps in real-time task allocation and resource optimization. This research, therefore, complements existing work by providing a novel AI-driven approach to workforce management, moving beyond static evaluations toward a real-time, data-driven optimization model.
3. Methodology
This section outlines the methodology employed in conducting a comparative analysis of various machine learning algorithms to evaluate employee capability within a dynamic workload management system in the public sector. The primary goal of this research is to identify the most effective predictive model. After an extensive examination of different algorithms, the Adaptive Neuro-Fuzzy Inference System (ANFIS) emerged as the standout performer. These algorithms were executed within the Apache Spark framework, leveraging its distributed computing capabilities to process large-scale datasets efficiently. This section provides a detailed overview of the research design, data collection, algorithms utilized, and the evaluation metrics employed to make this determination.
3.1. Machine Learning Technique Overview
This section provides an overview of the machine learning techniques utilized in this study, focusing on their unique strengths and applications in predictive modeling. Machine learning has become a key tool for solving complex problems by analyzing large datasets and uncovering patterns that traditional methods may overlook. The selected algorithms encompass a range of approaches, from simple linear models to advanced ensemble methods and hybrid systems, enabling a thorough evaluation of their effectiveness. Each technique was chosen based on its ability to address the specific challenges of modeling employee performance and potential in dynamic environments, highlighting their contributions to improving predictive accuracy and decision-making processes.
In this study, the data analysis was carried out using a broad suite of advanced software tools and configurations to ensure efficiency and accuracy. Apache Spark 3.4.2 was employed as the primary framework for complex data processing, leveraging its powerful distributed computing capabilities. The Java Development Kit (JDK) 21 was utilized to support the development and execution of custom algorithms, ensuring compatibility and optimal performance. The computational environment was built on Ubuntu Linux 22.04, chosen for its stability, security, and robust support for open-source tools. This setup provided a reliable and scalable platform for conducting rigorous analysis, effectively supporting the study’s objectives.
3.1.1. Step 1: Data Collection
To facilitate this study, a dataset was compiled that included information on employees in the public sector. The dataset was carefully curated to ensure its relevance and accuracy. Data points were collected from public sector organizations in the broad region of West Greece to provide a representative sample. The dataset included information on employees: their work experience in the public and private sectors, age, and educational history, as in Michalopoulos et al. (2022) [
1]. Each data point was carefully recorded and validated to ensure its accuracy.
3.1.2. Step 2: Data Preprocessing
Data preprocessing was essential in preparing the dataset for analysis, ensuring its quality and suitability for machine learning algorithms. Missing values within the dataset were identified and addressed to prevent biases or inaccuracies in the models. This involved techniques such as mean or median imputation for continuous variables and mode imputation or assigning default categories for categorical variables. Additionally, categorical data were encoded into numerical representations using methods such as one-hot encoding or label encoding, enabling algorithms to process these features effectively. These preprocessing steps ensured that the data were complete, consistent, and compatible with the requirements of the machine learning techniques used in this study, thereby enhancing the overall reliability and accuracy of the analysis.
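The imputation and encoding steps described above can be sketched as follows; the exact per-column strategies used in the study are not specified, so these helper functions (and their names) are illustrative.

```python
from statistics import mean, median, mode

def impute_numeric(values, strategy="mean"):
    """Replace None in a numeric column with the column mean or median."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

def impute_categorical(values, default=None):
    """Replace None with the most frequent category, or a default label."""
    observed = [v for v in values if v is not None]
    fill = default if default is not None else mode(observed)
    return [fill if v is None else v for v in values]

def one_hot(values):
    """Encode a categorical column as one binary indicator column per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}
```

One-hot encoding is typically preferred over label encoding for nominal attributes (e.g., leadership role), since integer labels would impose an artificial ordering on the categories.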
3.1.3. Step 3: Data Management
Based on the research mentioned above, we proceed to our analysis, in which each employee is denoted as a node, Node_i. For each employee, there is a correlation between the weights of the tasks they undertake; each task weight (TW) is measured in time units (minutes). The interconnectedness among these tasks is established with the minimum task weight, TW_min, as the fundamental reference point, since it exerts minimal influence on employee productivity. Hence, the relationship between any two task weights, for instance TW_1 and TW_2, TW_2 and TW_3, and so on, can be expressed as a function in which TW_min serves as the common unit. This approach, outlining task interdependencies, gives rise to a generalized principle that applies to every task pair TW_i and TW_j, as referred to in Michalopoulos et al. (2022) [1], such that each task weight can be expressed as a multiple of the reference weight, TW_i = k_i · TW_min with k_i ≥ 1.
In their work, Michalopoulos et al. (2022) [
1] utilized a four-tier factor profile, with the skill set grounded in adherence to the Greek Council of State’s Decision No. 540/2021 in alignment with Council Directive 90/270/EEC, as shown in
Figure 1.
K1 (Academic Proficiency)
Count of seminars related to the current work (maximum of three)—(S1, S2, S3).
Number of Bachelor’s degrees (maximum of two)—(B1, B2).
Possession of a Master’s degree.
Certification from the National School of Public Administration.
PhD diploma.
K2 (Public Sector Work Experience)
K3 (Private Sector Work Experience)
K4 (Age)
Based on the algorithm applied to Node B, a unique Time Factor is calculated for every dataset profile.
In this study, employee capability is measured on a continuous scale through the Time Factor (TF), which represents the average time an employee takes to complete a task. The Time Factor is derived from the dataset, which includes information on employees’ work experience (both in the public and private sectors), educational history, and age. The Capacity Factor (CF), introduced by Michalopoulos et al. [
1], is calculated as the mean number of TW_min tasks (tasks with the minimum task weight) completed per time unit. This approach allows for a continuous and granular assessment of employee capability, enabling more precise predictions and comparisons.
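Under these definitions, the Time Factor and Capacity Factor can be sketched as below. Treating heavier tasks as TW_min-equivalents is one plausible reading of the definition in Michalopoulos et al. [1], not the paper's exact formula, and the function names are illustrative.

```python
def time_factor(completion_times):
    """Time Factor (TF): mean time, in minutes, an employee takes per task."""
    return sum(completion_times) / len(completion_times)

def capacity_factor(task_weights, completion_times, tw_min):
    """Capacity Factor (CF): mean number of TW_min-equivalent tasks
    completed per time unit (illustrative formulation)."""
    # Express every completed task as a multiple of the minimum task weight.
    tw_min_equivalents = sum(w / tw_min for w in task_weights)
    return tw_min_equivalents / sum(completion_times)
```

For example, an employee who completed tasks of weights 10, 20, and 30 minutes in exactly those times has a TF of 20 minutes per task and a CF of 0.1 TW_min-equivalent tasks per minute (with TW_min = 10).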
An employee profile with precise skills recorded in the dataset will produce a Capacity Factor that determines the employee’s ability to manage task workload, as shown in
Figure 2.
3.1.4. Step 4: Model Selection and Data Splitting
The model selection and data splitting process were critical components of this study, which aimed to ensure the robustness and reliability of the machine learning models used to predict employee performance. To identify the best-performing model, a diverse set of algorithms, including Linear Regression, Artificial Neural Networks (ANNs), the Adaptive Neuro-Fuzzy Inference System (ANFIS), Gradient Boosting Machines (GBMs), Bagged Decision Trees, Support Vector Machines (SVMs), and XGBoost, was evaluated.
To achieve this, the dataset was first divided into three subsets: training, validation, and testing datasets. The training set comprising most of the data was used to fit the models and optimize their parameters. The validation set was employed to fine-tune hyperparameters and prevent overfitting, ensuring that the models generalized well to unseen data. Finally, the test set was used exclusively to evaluate the final performance of the selected models, providing an unbiased assessment of their predictive accuracy.
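A minimal version of this splitting procedure is sketched below; the 70/15/15 ratios are assumed for illustration, as the text states only that the training set comprised most of the data.

```python
import random

def split_dataset(records, train=0.7, val=0.15, seed=42):
    """Shuffle records and split them into training, validation, and test
    subsets. The remaining fraction (1 - train - val) forms the test set."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train)
    n_val = int(n * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Fixing the random seed matters here: it keeps the three subsets disjoint and identical across algorithm runs, so every model is compared on exactly the same held-out records.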
The model selection process involved comparing algorithms across various performance metrics, such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), alongside computational metrics like CPU time and memory usage. This thorough approach enabled the identification of ANFIS as the most effective algorithm to model complex relationships in the dataset with superior accuracy and robustness. The structured data splitting methodology ensured that the evaluation results reflected the true predictive power of the models, laying a strong foundation for their application in real-world scenarios.
The dataset used in this study consists of task assignments and employee profiles obtained from public sector organizations in the Region of Western Greece and comprises 638 employee records. Each record is characterized by four key factors: academic qualifications, professional experience in the public sector, professional experience in the private sector, and age. These factors provide a detailed view of the employees’ skills, roles, and demographics, enabling us to explore patterns and make meaningful predictions, as seen in
Table 1 below.
In terms of academic qualifications, the dataset reflects a diverse range of skills. Over half of the employees hold at least one Bachelor’s degree, with 53% having completed their first degree and some employees holding a second Bachelor’s degree. Advanced qualifications are also well represented, with 35% of employees holding a Master’s degree, a few holding a National School of Public Administration (NSPA) degree, and even fewer possessing a Ph.D. Participation in professional seminars is another common characteristic, with 47% of employees having attended at least one seminar. This diversity of qualifications highlights a workforce with significant educational accomplishments.
Leadership roles in the public and private sectors are another focus of our dataset. In the public sector, 23% of employees are Heads of Small Departments, 12% are Heads of Departments, and 3% are General Managers. Regarding experience in the private sector, these roles are slightly more prevalent, with 26% having served as Heads of Small Departments, 12% as Heads of Departments, and 5% as General Managers. These statistics demonstrate a workforce with considerable leadership experience, particularly in the private sector.
The dataset also captures the age distribution of the employees, which is skewed toward older age groups. A significant portion of the workforce, 47%, is aged 50 years or older. Employees aged 40 to 49 years make up 30% of the dataset, while those aged 30 to 39 account for 16%. Only 5% of employees fall within the 20 to 29-year age range. This age distribution aligns with the dataset’s emphasis on experienced professionals who likely possess advanced academic qualifications and leadership roles.
The dataset is moderately balanced, ensuring robust analysis. While Bachelor’s and Master’s degrees are highly prevalent, fewer employees possess NSPA degrees or PhDs, reflecting the specialized nature of these qualifications. Leadership roles show a slightly higher representation in the private sector, with General Managers being relatively rare in both sectors. The age distribution provides a diverse yet heavily experienced workforce, with nearly 78% of employees aged 40 or older.
To reproduce or approximate our results, researchers should utilize a dataset with similar characteristics. Such a dataset should include binary-encoded attributes representing academic qualifications, leadership roles, and age demographics. It should reflect a mix of employees with diverse academic qualifications and leadership roles, including Heads of Departments and General Managers in both sectors. The age distribution should emphasize experienced professionals, particularly those aged 40 and above.
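To approximate such a dataset, one could generate synthetic records matching the reported marginal proportions, as in the sketch below. The field names are hypothetical, and attributes are sampled independently, whereas real records are likely correlated (e.g., age with leadership roles).

```python
import random

# Marginal proportions reported in Section 3 (field names are illustrative).
PROPORTIONS = {
    "bachelor_1": 0.53,   # holds a first Bachelor's degree
    "master": 0.35,       # holds a Master's degree
    "seminar_1": 0.47,    # attended at least one seminar
}
AGE_BANDS = [("50+", 0.47), ("40-49", 0.30), ("30-39", 0.16), ("20-29", 0.05)]

def synthetic_employee(rng):
    """Draw one binary-encoded employee record from the marginal proportions."""
    record = {k: int(rng.random() < p) for k, p in PROPORTIONS.items()}
    record["age_band"] = rng.choices([band for band, _ in AGE_BANDS],
                                     weights=[w for _, w in AGE_BANDS])[0]
    return record

def synthetic_dataset(n=638, seed=0):
    """Generate n records, matching the size of the study's dataset."""
    rng = random.Random(seed)
    return [synthetic_employee(rng) for _ in range(n)]
```

Such a surrogate dataset only reproduces marginal frequencies, not the joint structure the models learn from, so it is suitable for testing pipelines rather than replicating the reported accuracy figures.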
3.1.5. Step 5: Model Performance Evaluation
The evaluation of model performance is a critical aspect of this study, as it provides information on the accuracy, efficiency, and reliability of the machine learning algorithms applied. To achieve this, various evaluation metrics were employed to assess how well each model predicted employee potential and performance.
Each metric provides a unique perspective on model performance, allowing for a thorough comparative analysis. The algorithms were evaluated using training, testing, and validation datasets to ensure their generalizability and robustness. The results highlighted significant differences in model performance, with the Adaptive Neuro-Fuzzy Inference System (ANFIS) consistently achieving superior accuracy and stability across multiple metrics. This thorough evaluation process underscores the importance of selecting appropriate metrics tailored to the specific goals and characteristics of the dataset, enabling the identification of the most effective machine learning models for dynamic workload management systems in the public sector.
3.1.6. Step 6: ML Implementation
This study’s machine learning models were implemented using a robust and efficient computational environment. The process began with preparing the dataset through preprocessing steps, including handling missing values, encoding categorical variables, and normalizing numerical features to ensure compatibility with the algorithms. A diverse set of machine learning techniques, including Linear Regression, Artificial Neural Networks (ANNs), the Adaptive Neuro-Fuzzy Inference System (ANFIS), Gradient Boosting Machines (GBMs), Bagged Decision Trees, Support Vector Machines (SVMs), and XGBoost, was selected to capture different data patterns and complexities.
The models were implemented using Apache Spark 3.4.2 for distributed computing, leveraging its scalability and speed to handle complex datasets effectively. The Java Development Kit (JDK) 21 provided a stable framework for developing and executing custom algorithms, and Ubuntu Linux 22.04 served as the operating environment, chosen for its reliability and open-source ecosystem.
Each algorithm’s hyperparameters were tuned to optimize its performance, ensuring the best results for the dataset. Evaluation metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) were used to compare model accuracy. Additionally, computational metrics such as CPU time and memory usage were recorded to assess each model’s efficiency.
This systematic and well-orchestrated implementation process ensured a thorough exploration of the selected machine learning techniques, providing valuable insights into their suitability for modeling employee performance and workload management in the public sector. The results of this implementation not only highlight the strengths and weaknesses of each approach but also contribute to the development of a more effective and efficient evaluation framework.
3.1.7. Data Description and Task Details
As indicated by Giotopoulos et al. [
2], the following primary tasks, summarized in
Table 2, were identified for areas of interest:
These tasks represent critical administrative functions, including financial management, documentation, and tender processes, which are central to the operations of public sector organizations. The target variable in this study is the time taken to complete assigned tasks, measured in hours. Tasks are categorized by complexity, defined in terms of execution time, and priority levels are applied where necessary. Given that each public sector department manages distinct tasks, the modeling framework is adaptable to the specific needs and task portfolios of individual organizations.
Load factors are defined as the number of tasks assigned to a user within a specified time frame, normalized by their historical capacity and completion rates. Task allocation is based on skill alignment, capacity, and historical performance, with an emphasis on minimizing idle time and maximizing efficiency. Giotopoulos et al. (2024) further analyzed the impacts of load on various user profiles under different scenarios. In this context, a capacity factor (CF) was introduced, serving as a function CF = f(K1, K2, K3, K4), where K1, K2, K3, and K4 are illustrated in Figure 1. The capacity factor is calculated as the average time spent on completed tasks for each profile per unit of time [2,3].
The simulation results, detailed in
Section 4.2, demonstrate significant differences in algorithm performance under varying load scenarios. These findings underscore the importance of aligning task assignments with user capacity to enhance overall efficiency and effectiveness in task management.
3.2. Machine Learning Algorithms
The study incorporated several machine learning algorithms to evaluate their performance in predicting employee capability. These algorithms included the following:
Linear Regression
Artificial Neural Networks (ANNs)
Adaptive Neuro-Fuzzy Inference System (ANFIS)
Support Vector Machine (SVM)
Gradient Boosting Machine (GBM)
Bagged Decision Trees (BDTs)
XGBoost
The selection of these algorithms represents a well-rounded and strategic approach to identifying the most effective predictive techniques for employee capability assessment in the public sector. Each algorithm was chosen to contribute a unique perspective to the analysis, leveraging its strengths to address different facets of the dataset’s complexity. Linear Regression served as a foundational benchmark, providing insights into linear relationships between variables. Its simplicity established a reference point against which the performance of more sophisticated algorithms could be evaluated. Building on this baseline, Artificial Neural Networks extended the analysis to non-linear dynamics. Their multi-layered architecture enabled the capture of intricate patterns in the data, laying the groundwork for understanding more complex relationships. The Adaptive Neuro-Fuzzy Inference System (ANFIS) was introduced to bridge the gap between interpretability and modeling complexity. By combining the transparency of fuzzy logic with the adaptability of neural networks, ANFIS excelled in capturing nuanced and non-linear relationships while addressing uncertainties inherent in employee data.
Support Vector Machine added another dimension to the analysis by focusing on maximizing decision boundaries. Its ability to handle linear and non-linear relationships provided valuable comparisons, especially in scenarios involving distinct data classes. The study of ensemble methods began with Gradient Boosting Machine and XGBoost, both of which leverage iterative corrections to residual errors to enhance predictive accuracy. These algorithms excelled at modeling interactions among features, offering insights into hierarchical and complex decision-making processes. Complementing these, Bagged Decision Tree utilized ensemble learning to improve stability and reliability by aggregating outputs from multiple decision trees, effectively reducing variance and providing robustness against overfitting.
All the aforementioned algorithms were executed within the Apache Spark framework, created at UC Berkeley’s AMPLab [
36,
37], to leverage its powerful capabilities for large-scale data processing. Spark’s hybrid framework fluidly integrates batch and stream processing, outperforming Hadoop’s MapReduce engine due to its innovative design [
38]. The primary advantage of Spark in this context lies in its in-memory computation model, which significantly accelerates processing by minimizing reliance on disk I/O. Spark’s Directed Acyclic Graphs (DAGs) for workflow optimization and resilient distributed datasets (RDDs) for fault tolerance were instrumental in efficiently handling the complex datasets involved in the study [
39].
The notable features of Apache Spark, such as its exceptional speed (up to 100 times faster than Hadoop) and support for multiple programming languages (Java, Scala, R, and Python), enabled the algorithms to execute efficiently and at scale [
40]. Furthermore, Spark’s real-time stream processing, fault tolerance, and scalability ensured the robust performance of the algorithms in processing large, dynamic datasets.
Spark MLlib is a library that enables Apache Spark to run machine learning algorithms with exceptional speed and accuracy. Built on the RDD API, MLlib leverages multiple cluster nodes to alleviate memory bottlenecks. Complementing MLlib, Apache Spark also includes SparkML, a DataFrame-based machine learning API. This dual-library approach allows developers to choose the most suitable option based on the dataset’s characteristics and size, ensuring optimal performance. These libraries support various algorithms, including classification, regression, recommendation, clustering, and topic modeling.
MLlib offers core machine learning features such as Featurization, Pipelines, Model Tuning, and Persistence. It supports essential tasks, including data preprocessing, model training, and prediction. With a design focused on simplicity and scalability, MLlib is highly effective for various machine learning tasks, including deep learning [
41]. Apache Spark, developed in Scala, integrates effortlessly with APIs such as Java and Python and operates efficiently in both Hadoop and standalone environments [
42].
A notable algorithm in MLlib is the Multilayer Perceptron (MLP) classifier, a feedforward Artificial Neural Network used for classification [
43]. This research utilizes the MLP classifier, configured with a single hidden layer containing ten neurons. This configuration provides a robust framework for analyzing and modeling complex datasets.
This layered analytical approach, executed within Apache Spark, allowed robust comparative analysis across models. By progressively introducing greater complexity and capability, the study identified the Adaptive Neuro-Fuzzy Inference System as the most effective algorithm. ANFIS outperformed others in accuracy, robustness, and interpretability, demonstrating its ability to model complex relationships within the dataset. This iterative and comparative methodology underscores the importance of tailoring algorithm selection to the specific challenges of the domain and leveraging powerful platforms like Apache Spark for efficient large-scale computation in dynamic workload management systems.
3.2.1. Linear Regression
Linear Regression served as a fundamental baseline for comparison, providing insights into linear relationships between input factors and employee capability. More specifically, it was deployed to model the relationship between the input factors (work experience in the public sector, work experience in the private sector, educational history, and age) and the output, defined as a Time Factor. Linear Regression aims to find the best-fitting linear relationship between the input features and the output, allowing us to make predictions and understand the impact of individual features on the target variable [44,45].
The model takes the form Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε, where:
Y is the dependent variable (Time Factor);
X1, X2, X3, and X4 are the four independent variables (input features);
β0 is the intercept;
β1, β2, β3, and β4 are the coefficients associated with each independent variable; and
ε represents the error term, accounting for unexplained variability.
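As an illustrative sketch (not the study’s MATLAB/Spark implementation), ordinary least squares can be fitted in closed form; the example below reduces the four-variable model to a single hypothetical predictor so that the arithmetic stays visible:

```python
def fit_simple_ols(xs, ys):
    """Ordinary least squares for one predictor: y = b0 + b1*x.
    A one-variable sketch of the four-variable model described in the text."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    b1 = cov_xy / var_x        # slope (coefficient)
    b0 = mean_y - b1 * mean_x  # intercept
    return b0, b1

# Hypothetical data: years of public-sector experience vs. Time Factor
b0, b1 = fit_simple_ols([1, 2, 3, 4], [2.1, 1.9, 1.7, 1.5])
```

The sign and magnitude of each fitted coefficient indicate how a one-unit change in that input shifts the predicted Time Factor, which is the interpretability that made Linear Regression a useful baseline.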
3.2.2. Artificial Neural Networks (ANNs)
ANNs were used to capture intricate patterns within the data and identify non-linear relationships between input parameters and employee capability. A feedforward neural network (FNN) consisting of multiple layers, including an input layer, at least one hidden layer, and an output layer, was deployed [
46,
47].
In forward propagation, the network computes the predicted output (ypred) based on the input features (X) and the weights and biases of the neurons. The mathematical representation of forward propagation is as follows:
Input Layer: The input layer simply passes the input features (X) to the hidden layer.
Hidden Layer: The hidden layer computes the weighted sum of the four inputs, applies an activation function, and passes the result to the output layer. This can be represented as follows for the i-th neuron in the hidden layer: z_i = Σ_j w_ij x_j + b_i and a_i = f(z_i), where:
z_i is the weighted sum for the i-th neuron in the hidden layer;
w_ij is the weight connecting the j-th input feature to the i-th neuron;
x_j is the j-th input feature;
b_i is the bias for the i-th neuron; and
f is the activation function applied to z_i to compute a_i, the activation of the neuron.
Output Layer: The output layer computes the final predicted output. The activation function used in the output layer is the identity function (linear activation).
Mean Squared Error (MSE) was deployed as the loss function: L = (1/n) Σ_{i=1}^{n} (y_i − y_pred,i)^2, where n is the number of data points.
Backpropagation was used to compute the loss gradients concerning the network’s weights and biases. These gradients are then used to update the weights and biases through gradient descent, which involves adjusting the weights to minimize the loss.
The weight updates are typically carried out using gradient descent, where the weights (w) are updated as w ← w − η ∂L/∂w, where η is the learning rate and ∂L/∂w is the gradient of the loss with respect to the weight w.
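A minimal sketch of this update rule, assuming a single linear neuron and one training example (both hypothetical simplifications of the FNN described above):

```python
def gradient_step(w, b, x, y, lr=0.1):
    """One gradient-descent update of a single linear neuron under 0.5*(pred - y)^2 loss."""
    y_pred = sum(wi * xi for wi, xi in zip(w, x)) + b
    err = y_pred - y                                      # dL/d(y_pred)
    new_w = [wi - lr * err * xi for wi, xi in zip(w, x)]  # w <- w - lr * dL/dw
    new_b = b - lr * err                                  # b <- b - lr * dL/db
    return new_w, new_b

# Repeated updates on one hypothetical training example drive the error toward zero
w, b = [0.0, 0.0], 0.0
for _ in range(200):
    w, b = gradient_step(w, b, [1.0, 2.0], 3.0)
```

Each step moves the weights a small distance (controlled by the learning rate) against the loss gradient, which is exactly what backpropagation does layer by layer in the full network.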
3.2.3. Adaptive Neuro-Fuzzy Inference System (ANFIS)
ANFIS, as a hybrid algorithm combining fuzzy logic and neural networks, was a central focus of the study. Its capacity to model complex, non-linear relationships while maintaining interpretability made it a promising choice for this context. It was widely analyzed by Jang et al. (1997) [
48], Michalopoulos et al. (2022) and Giotopoulos et al. (2023) [
1,
3]. The dataset underwent a filtration process as a preliminary step, ensuring its suitability for constructing a fuzzy system within the MATLAB R2023b software environment. A rigorous assessment of performance metrics led to selecting the gbellmf membership function in conjunction with a hybrid learning algorithm, primarily due to its remarkable capacity to minimize the Root Mean Square Error. This choice facilitated the establishment of a well-structured four-input, one-output system. Each of the input variables, denoted K1 through K4, was subsequently employed as input data for the Adaptive Neuro-Fuzzy Inference System (ANFIS), where individual nodes were assigned specific roles to address the unique functionality associated with each input.
3.2.4. Support Vector Machine (SVM)
SVM was evaluated for its effectiveness in handling both linear and non-linear relationships and maximizing the margin of separation between distinct classes in employee capability. In SVM regression, the goal is to find a function f(x) that estimates the target variable, which is referred to as “Time Factor” (Y), given specific input features X [
49,
50]. The objective is to find a hyperplane that best fits the data while minimizing the margin of error. The SVM regression model was represented as f(x) = w·x + b, where f(x) is the predicted Time Factor, w is the weight vector, x is the input feature vector, and b is the bias term.
The SVM regression algorithm aims to minimize the following optimization problem: (1/2)||w||^2 + C Σ_i (ξ_i + ξ_i*), subject to the constraints y_i − (w·x_i + b) ≤ ε + ξ_i, (w·x_i + b) − y_i ≤ ε + ξ_i*, and ξ_i, ξ_i* ≥ 0, where y_i is the actual Time Factor for data point i, ε is the maximum permissible error (epsilon-tube), and ξ_i, ξ_i* are slack variables representing the amount by which the predicted Time Factor can deviate from the actual Time Factor.
The objective function minimizes the norm of the weight vector while allowing for errors that are bounded by ε and penalized by the hyperparameter C. The parameter C controls the trade-off between maximizing the margin and minimizing the error: a small C encourages a broader margin with more errors, while a large C encourages a narrower margin with fewer errors.
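The slack-variable formulation corresponds to an ε-insensitive loss; a minimal sketch with assumed values (ε = 0.5, Time Factors in hours):

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5):
    """Deviations inside the epsilon-tube cost nothing; beyond it, cost grows linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

inside = epsilon_insensitive_loss(4.0, 4.3)   # |error| = 0.3 <= epsilon -> zero loss
outside = epsilon_insensitive_loss(4.0, 5.2)  # |error| = 1.2 -> loss of 0.7
```

Only points outside the tube accumulate slack, and the hyperparameter C scales how strongly that slack is penalized relative to the flatness of the regression function.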
The Support Vector Machine (SVM) algorithm was implemented using MATLAB’s fitrsvm function, configured to utilize a linear kernel by default. The box-constraint parameter (C), which controls the trade-off between maximizing the margin and minimizing the error, was set to its default value of 1. This configuration ensures a straightforward comparison of linear relationships within the dataset. While this study focuses on linear SVMs, future research could explore non-linear kernels such as RBF or polynomial kernels to better capture complex, non-linear relationships in workforce data.
3.2.5. Gradient Boosting Machine (GBM)
The Gradient Boosting Machine (GBM) algorithm is an ensemble learning method for regression problems. It builds a predictive model by combining the predictions from an ensemble of weak learners, typically decision trees. GBM aims to minimize residual errors stepwise, where each new model is trained to fit the residual errors left by the previous models [
51,
52].
The Gradient Boosting Machine (GBM) algorithm was configured to use decision trees as weak learners. A total of 100 trees were trained iteratively, using the least squares loss function to minimize residual errors. This setup, implemented via MATLAB’s fitensemble function, leverages default parameters for learning rate and tree depth, ensuring a robust and efficient model for this application.
For each boosting iteration t = 1, …, T, where T is the total number of boosting iterations:
Compute the negative gradient of the loss function with respect to the current model’s predictions. This gradient represents the residual errors.
Train a weak learner, often a decision tree, to fit the negative gradient (residuals). This creates a new model that is added to the ensemble.
Update the model by adding the new model’s predictions to the current one. This is achieved with a learning rate that controls the step size of the update.
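The three steps above can be sketched with depth-1 “stumps” as weak learners on a single hypothetical feature (a deliberate simplification of the 100-tree MATLAB configuration described earlier):

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression stump: choose the split that minimizes squared error."""
    best = None
    for thr in sorted(set(xs))[:-1]:  # candidate thresholds between distinct x values
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm

def fit_gbm(xs, ys, n_trees=20, lr=0.5):
    """Gradient boosting: each stump is trained on the residuals of the ensemble so far."""
    base = sum(ys) / len(ys)          # start from the mean prediction
    pred = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: base + sum(lr * s(x) for s in stumps)

model = fit_gbm([1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0])
```

With each round, the remaining residual shrinks by a factor governed by the learning rate, which is why GBM converges stepwise toward the training targets.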
3.2.6. Bagged Decision Trees
As far as the Bagged Decision Tree algorithm is concerned, Bagging, or bootstrap aggregating, is an ensemble learning method that combines multiple decision trees to reduce variance and improve the predictive accuracy of regression models [
53].
The Bagged Decision Tree (BDT) algorithm was implemented using MATLAB’s fitensemble function. A total of 100 decision trees were used as weak learners, with bagging employed to aggregate predictions and reduce variance. Configured for regression tasks, the model leveraged default MATLAB parameters for tree depth and leaf size, ensuring robust and unbiased predictions across the dataset.
Bootstrapped Data:
For each iteration t = 1, …, T (where T is the number of iterations), randomly sample the training data with replacement to create a bootstrapped dataset D_t.
Decision Tree Training:
Train a decision tree on each bootstrapped dataset D_t. Each tree, denoted h_t, aims to capture different patterns or noise in the data.
For each tree t, the decision tree is constructed by recursively partitioning the data based on feature splits that minimize impurity or reduce the Mean Squared Error (MSE).
Predictions:
For a given input instance x, predictions are made using all the trained decision trees: ŷ(x) = (1/T) Σ_{t=1}^{T} h_t(x), where ŷ(x) represents the predicted output for instance x, which is the average prediction across all decision trees.
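A compact sketch of the bootstrap-train-average cycle, again using depth-1 stumps as base learners and synthetic (x, y) pairs (the study used full decision trees via MATLAB’s fitensemble):

```python
import random

def fit_stump(data):
    """Depth-1 regression stump on (x, y) pairs; falls back to the mean if unsplittable."""
    xs = sorted({x for x, _ in data})
    if len(xs) < 2:
        mean = sum(y for _, y in data) / len(data)
        return lambda x: mean
    best = None
    for thr in xs[:-1]:
        left = [y for x, y in data if x <= thr]
        right = [y for x, y in data if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm

def fit_bagged(data, n_trees=25, seed=7):
    """Bagging: train each tree on a bootstrap resample, then average the predictions."""
    rng = random.Random(seed)
    trees = [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]
    return lambda x: sum(t(x) for t in trees) / len(trees)

data = [(1, 1.0), (2, 1.2), (3, 2.9), (4, 3.1)]
model = fit_bagged(data)
```

Because each tree sees a slightly different resample, their individual errors partly cancel in the average, which is the variance-reduction effect the text describes.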
3.2.7. XGBoost
XGBoost is a specific implementation of gradient boosting, an ensemble learning method that combines the predictions of multiple weak models (decision trees) to create a strong predictive model. The key idea is to fit a new model to the existing model’s residuals (differences between actual and predicted values). This process continues iteratively [
54,
55].
The XGBoost algorithm was employed to model the dataset, utilizing a learning rate (η) of 0.3 to balance convergence speed and generalization. Each decision tree in the ensemble had a maximum depth of 10, enabling the model to capture intricate patterns in the data. The objective function was set to “reg:linear” for regression tasks, and the ensemble was trained over 60 boosting rounds. Default hyperparameters were used for subsampling and regularization to maintain computational efficiency and avoid overfitting.
The key mathematical equations in XGBoost relate to the objective function and the gradient of the loss concerning the model’s predictions. The objective function, for instance, illustrates the loss for the entire dataset and the ensemble of trees.
The objective takes the form Obj = Σ_{i=1}^{N} L(y_i, ŷ_i) + Σ_k Ω(f_k), with Ω(f) = γT_leaves + (1/2)λ||w||^2, where N is the number of data points, y_i is the actual target value, ŷ_i is the predicted value, γ and λ are regularization terms, T_leaves measures the complexity of the tree (its number of leaves), and ||w||^2 is the magnitude of the weights on the leaves.
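A sketch of evaluating this regularized objective for assumed predictions and leaf weights, taking squared error as the loss L (all numeric values are hypothetical):

```python
def xgb_objective(y_true, y_pred, leaf_weights, gamma=1.0, lam=1.0):
    """Regularized objective: squared-error loss over the data plus a penalty of
    gamma per leaf and 0.5 * lambda * (sum of squared leaf weights)."""
    loss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    penalty = gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)
    return loss + penalty

# Two data points, one tree with two leaves (illustrative values only)
obj = xgb_objective([3.0, 5.0], [2.5, 5.5], leaf_weights=[0.5, -0.5])
```

The penalty term discourages deep trees and large leaf weights, which is how XGBoost trades a small amount of training accuracy for better generalization.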
3.2.8. Workflow
The workflow for determining the capacity factor for each profile in the dataset consists of three distinct phases, as illustrated in
Figure 3.
In the first phase, the synthesis of a generic profile structure is performed to establish a standardized representation of employee characteristics. Subsequently, tasks are classified based on their complexity levels, facilitating the identification of temporal dependencies in execution time. Personnel selection criteria are then defined, creating a subset of eligible employees who meet the predefined qualifications. To ensure the reliability of the analysis, time-related performance metrics are collected over a statistically valid period, enabling a comprehensive evaluation of task execution patterns.
In the second phase, various machine learning algorithms are assessed and compared to identify the most suitable model for load distribution within the targeted public sector environment. Among the tested models, the Adaptive Neuro-Fuzzy Inference System (ANFIS) was selected due to its effectiveness in modeling the transformation of employee profiles into task execution times, referred to as the Time Factor (TF). This facilitates a more adaptive and context-aware prediction of workload distribution, improving efficiency in task allocation.
The third phase focuses on implementing load redistribution mechanisms, including Load Control and load balancing strategies, to further enhance workforce efficiency and optimize task distribution.
As Phases 1 and 3 fall outside the primary scope of this study, the present research remains focused on its main objective: the comparative evaluation of machine learning algorithms for load distribution.
3.3. Comparative Analysis and Evaluation Metrics
The core of the research involved conducting a comparative analysis of these machine learning algorithms using a range of evaluation metrics. The aim was to assess their performance in predicting employee capability and identifying the most effective model.
The comparative analysis involved running each of the selected machine learning algorithms on the prepared dataset and evaluating their performance using a range of metrics, including Mean Squared Error, Root Mean Squared Error, Mean Absolute Error, Median Absolute Error, Mean Squared Logarithmic Error, Root Mean Squared Logarithmic Error, Mean Bias Deviation, Huber Loss, and MAPE.
Root Mean Squared Error (RMSE)
Mean Absolute Error (MAE)
Median Absolute Error (MedAE)
Mean Absolute Percentage Error (MAPE): MAPE is an additional accuracy metric used to evaluate forecasting performance. It is calculated as the mean of the absolute percentage differences between actual (y_i) and predicted (ŷ_i) values, expressed as MAPE = (100/n) Σ_{i=1}^{n} |(y_i − ŷ_i)/y_i|.
Mean Squared Logarithmic Error (MSLE)
Root Mean Squared Logarithmic Error (RMSLE)
Huber Loss
For a Time Factor (TF) predicted from the four input features, the Huber Loss can be expressed as L_δ(y, ŷ) = (1/2)(y − ŷ)^2 if |y − ŷ| ≤ δ, and δ(|y − ŷ| − δ/2) otherwise, where L_δ represents the Huber Loss, y represents the true Time Factor (employee capability), ŷ represents the Time Factor predicted by the model from the four input features, and δ is the hyperparameter that controls the threshold at which the loss transitions from quadratic (squared error) to linear (absolute error) behavior.
The choice of the parameter δ depends on the specific problem and is a trade-off between robustness to outliers and the smoothness of the loss function. It is often determined through experimentation, cross-validation, or domain knowledge. Smaller values of δ make the Huber Loss more robust to outliers, while larger values make it behave more like the Mean Squared Error (MSE) loss.
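The two-branch Huber definition translates directly into code; a sketch with hypothetical true and predicted Time Factors (δ = 1.0):

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Huber Loss: quadratic inside the threshold delta, linear beyond it."""
    err = abs(y_true - y_pred)
    if err <= delta:
        return 0.5 * err ** 2              # squared-error regime (small errors)
    return delta * (err - 0.5 * delta)     # absolute-error regime (outliers)

small = huber_loss(4.0, 4.5)  # |error| = 0.5 -> quadratic branch
large = huber_loss(4.0, 7.0)  # |error| = 3.0 -> linear branch
```

The linear branch caps the influence of outliers, which is why Huber Loss complements MSE when a few tasks take anomalously long.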
Mean Bias Deviation (MBD)
Concordant pairs are pairs of observations where the predicted order matches the actual order, tied pairs are pairs where the predicted outcomes or the actual outcomes are equal, and the total number of pairs is the total number of possible comparisons between pairs of data points, calculated as n(n − 1)/2.
The C-index (Concordance Index) evaluates a predictive model’s ranking ability by measuring the concordance between predicted and actual outcomes. It ranges from 0.5 (random guessing) to 1.0 (perfect concordance), making it valuable for continuous or ordinal outputs.
The C-index complemented other metrics, such as Mean Squared Error, by analyzing the proportion of correctly ranked pairs, validating the effectiveness of ANFIS in predicting employee performance.
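The pair-counting definition of the C-index can be sketched as follows (prediction ties counted as half-concordant, a common convention; the ranking data are hypothetical):

```python
from itertools import combinations

def c_index(y_true, y_pred):
    """Concordance Index: share of comparable pairs ranked in the same order
    by the predictions as by the actual values (prediction ties count half)."""
    concordant = ties = comparable = 0
    for (ti, pi), (tj, pj) in combinations(zip(y_true, y_pred), 2):
        if ti == tj:
            continue                      # tied actual values are not comparable
        comparable += 1
        if pi == pj:
            ties += 1
        elif (ti < tj) == (pi < pj):
            concordant += 1
    return (concordant + 0.5 * ties) / comparable

# Four employees ranked by actual vs. predicted Time Factor
cidx = c_index([1, 2, 3, 4], [10, 20, 15, 40])
```

Here one of the six pairs is ranked in the wrong order, giving a C-index of 5/6; a value of 0.5 would indicate random ranking and 1.0 perfect concordance.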
These metrics serve different purposes in assessing the performance of predictive models: MSE, RMSE, MAE, and RMSLE evaluate accuracy, while MBD helps identify systematic prediction bias (over- or under-prediction).
5. Discussion
5.1. Time Complexity
The efficiency and scalability of machine learning algorithms are crucial, and understanding their time complexity is critical in choosing the most appropriate one for a task. We investigate the time complexity of all seven algorithms. The time complexity of Linear Regression is O(n), where n is the number of data points; its simple closed-form calculations make it highly efficient, especially for large datasets. Neural networks are known for their adaptability to complex tasks, but the cost of training them depends on factors such as architecture and optimization algorithm, and is on the order of O(n · m · k), where n is the number of samples, m is the number of layers, and k is the number of iterations. ANFIS combines fuzzy logic and neural networks to model complex relationships; its time complexity is analogous to that of neural networks, standing at O(n · m · k), where n represents data points, m denotes the number of rules, and k signifies the number of iterations. Bagging involves training multiple decision trees in parallel and introduces a time complexity of O(m · n log n) per tree, where m stands for features and n for data points, while the parallelization further enhances the efficiency of this ensemble method. GBM trains its trees sequentially and therefore incurs a comparable cost of O(T · m · n log n) over T boosting iterations.
XGBoost is close to the time complexity of GBM, at approximately O(T · m · n log n); in addition, it leverages pruning techniques and parallel processing, which account for its efficiency and scalability. SVMs, on the other hand, while effective in many scenarios, exhibit variable time complexities, ranging from O(n^2 · m) to O(n^3 · m) contingent on the kernel used; here, n stands for data points and m for features. SVMs require substantial hardware resources for extensive datasets and high-dimensional feature spaces.
While ANFIS incurs a substantial computational expense, it remains a prominent choice across various applications owing to its proficiency in acquiring and depicting intricate non-linear associations.
As shown in
Table 14, the comparative analysis of CPU time and memory usage highlights significant differences across the algorithms, reflecting their computational efficiency and resource demands. Linear Regression is the most lightweight algorithm, with minimal CPU time (10–30 ms) and memory usage (5–15 MB), making it suitable for simple, linear datasets. On the other hand, Artificial Neural Networks (ANNs) and Adaptive Neuro-Fuzzy Inference System (ANFIS) demand higher resources, with CPU times of 120–180 ms and 180–220 ms and memory usage of 30–50 MB and 40–60 MB, respectively, due to their ability to model complex, non-linear relationships. Ensemble methods, such as Gradient Boosting Machine (GBM) and Bagged Decision Trees, exhibit moderate resource demands, with CPU times ranging from 140 to 200 ms and memory usage from 30 to 50 MB, balancing complexity and efficiency. Support Vector Machine (SVM) and XGBoost, known for their robust performance on complex datasets, show the highest CPU times (200–260 ms and 200–240 ms, respectively) and memory usage (50–60 MB and 40–55 MB), reflecting their computational intensity. This analysis underscores the trade-off between computational resources and predictive power, emphasizing the importance of selecting algorithms that align with the specific requirements of the dataset and application.
5.2. Comparative Analysis and Practical Recommendations for Algorithm Selection
The results of this study highlight the relative strengths and limitations of the selected machine learning algorithms in the context of workload distribution and predictive modeling. Linear Regression served as a baseline model, demonstrating its inability to effectively capture non-linear dependencies within the dataset. In contrast, Artificial Neural Networks (ANNs) significantly improved predictive accuracy by modeling complex patterns through their layered structure and adaptive learning capabilities.
Among the tested models, the Adaptive Neuro-Fuzzy Inference System (ANFIS) emerged as the most effective, achieving superior accuracy and robustness. Its hybrid architecture, which integrates fuzzy logic with neural networks, enables efficient handling of uncertainty and non-linearity, making it particularly well suited for workload optimization.
Ensemble-based models, including Gradient Boosting Machine (GBM) and Bagged Decision Trees, also exhibited strong predictive performance, while XGBoost proved less effective. By leveraging multiple weak learners, these methods enhanced generalization capabilities and delivered competitive accuracy. In addition, they demonstrated a balance between predictive power and computational efficiency, making them viable alternatives for large-scale workload management applications.
The Support Vector Machine (SVM) provided a versatile approach that handled non-linear relationships and proved effective in high-dimensional spaces. However, its computational demands were significantly higher than those of the other models, requiring careful consideration of the trade-off between accuracy and resource efficiency. Similarly, GBM, while offering strong performance, required extensive computational resources, underscoring the need to balance precision with operational feasibility.
This analysis underscores the importance of selecting machine learning models based on the specific demands of a given task. Factors such as interpretability, scalability, and computational constraints should guide the choice of algorithm to ensure optimal decision-making processes in dynamic environments. By systematically evaluating these models, this study provides insights that can inform both future research and real-world applications, fostering more efficient and adaptive workload distribution strategies.
5.3. Limitations
This study’s constraints include both the data sample and the methodological approach. Regarding the data sample, it is essential to acknowledge that the study’s findings cannot be easily extrapolated to all populations or different contextual settings. This limitation arises from the limited sample size, which may not represent the broader populace. To mitigate this constraint, prospective research should consider acquiring data from a more expansive and heterogeneous sample, facilitating a wider scope of generalizability.
Regarding the selection of the ANFIS algorithm, two factors are essential for its deployment: specialized expertise and hardware resources. Both of these constraints could restrict its applicability in specific organizational settings. The relatively compact dataset, comprising 450 records, must also be considered; it may elucidate why the performance of the XGBoost model does not exceed that of the GBM model. One plausible interpretation of this divergence is overfitting stemming from XGBoost’s additional hyperparameters. Consequently, future research should address these limitations to enhance the robustness and generalizability of the study’s outcomes while considering the pragmatic constraints associated with ANFIS in real-world public-sector scenarios.
5.4. Task Quality
While the primary focus of this study is on optimizing load distribution and predicting employee capability based on the Time Factor (TF), we acknowledge the critical importance of task quality in workload management systems. Task quality is a multifaceted aspect that encompasses accuracy, completeness, and adherence to standards, all of which are essential for effective service delivery in the public sector. Although this paper does not explicitly model or predict task quality, our system indirectly addresses it through a structured validation process supervised by managers.
In the current framework, tasks are assigned to employees based on their predicted capability, as determined by the Time Factor. Once a task is completed, it undergoes a validation process where supervisors assess the output for compliance with required standards. If the task is deemed incomplete or contains errors, it is returned to the same employee for corrections, and the timer resumes until the task meets the necessary quality criteria. This iterative process ensures that tasks are completed to an acceptable standard, albeit at the cost of additional time. While this approach does not explicitly measure or predict task quality, it provides a mechanism for maintaining quality control within the system.
The decision to focus on task completion time rather than quality was driven by the primary objective of this study: to compare machine learning algorithms for optimizing load distribution in dynamic workload management systems. Task completion time is a more straightforward and quantifiable metric, making it suitable for the comparative analysis of the algorithms in the current research. However, we recognize that task quality is equally important and can significantly impact organizational efficiency and service delivery. For instance, an employee who completes tasks quickly but with frequent errors may ultimately require more time for corrections, negating the benefits of faster task completion. Therefore, while this study does not explicitly address task quality, it lays the groundwork for future research in this area.
6. Conclusions and Future Work
In the current study, we conducted a comparative analysis of various machine learning algorithms. These algorithms were executed within the Apache Spark framework to utilize its distributed computing power for large-scale data processing. The primary focus was to enhance workforce management within the public sector by predicting employee capability based on multiple factors, such as each employee’s work experience, educational background, and age. After meticulous evaluation, ANFIS emerged as the most effective predictive model.
ANFIS’s superiority can be attributed to its remarkable ability to capture complex non-linear relationships, offering unparalleled accuracy and interpretability. It adeptly navigated the intricacies of employee potential prediction within the dynamic workload management system, demonstrating its invaluable potential in enhancing public sector decision-making processes.
This research underscores the critical importance of selecting a machine learning algorithm tailored to the specific domain and dataset demonstrated in this study. ANFIS stands out as a powerful tool for addressing the complex, intricate challenges of workforce management in the public sector.
The next step is to integrate the ANFIS-based predictive model into operational systems within the public sector. This real-time integration can facilitate dynamic workload management and resource allocation, enhancing efficiency and productivity in public sector organizations. Future studies should consider collecting data from more extensive and diverse samples to further bolster the findings’ generalizability. This can help assess the performance of the ANFIS model in various populations and contextual settings. Acknowledging that ANFIS may require specialized expertise and computing resources, future work can examine resource-efficient implementations or cloud-based solutions, making it more accessible to a broader spectrum of organizations.
The application of ANFIS has showcased its potential to revolutionize decision-making processes. Future work can build upon these foundations to further advance the public sector’s efficiency and effectiveness. Through ongoing research and implementation, the public sector can successfully embrace data-driven human resource management practices to navigate the challenges of the digital transformation era.