1. Introduction and Background
Sports analytics has become an increasingly important tool for enhancing team performance and reducing costs within the fiercely competitive sports industry. This study examines NBA player performance data, employing a variety of data mining (DM) techniques to uncover significant trends and patterns. The insights gained offer valuable guidance for enhancing strategic decision-making, financial management, and tactical planning. It also identifies the key predictors of injury risk, underscoring the critical role of injury prevention and management strategies [1,2,3,4].
The core issue this research addresses is the methodological underpinnings of sports analytics, with a particular focus on data science. It uses principal component analysis (PCA) for feature selection, leveraging feature loadings to estimate injury costs alongside advanced basketball performance analytics. Players were categorized by age and position, and their data were analyzed in conjunction with PCA-derived feature loadings to gauge the impact of these attributes on earnings and performance. Despite the growing prevalence of data analytics in sports, the accuracy and effectiveness of these methods remain to be validated. Thus, the objective was to analyze NBA player data to identify patterns and trends that inform performance and injury risk, aiding strategic decision-making, financial planning, and tactical execution. The methodology presented can guide future sports analytics research, supporting the effective and accurate use of data science methods to boost team performance and success.
Sports, including basketball, are affected by variables such as age, position, and health, which influence performance and financial outcomes. This research applies DM techniques to analyze socioeconomic, demographic, and injury-related attributes in relation to basketball performance metrics. The extraction and analysis of sophisticated sports metrics yield insights that support decision makers in strategy, budgeting, tactics, and training to minimize risks and financial losses [5,6].
Athletes hold significant popularity beyond the sports sphere, affecting social, economic, and marketing activities. Enhanced player and team performance elevates the social and financial standing of all involved, highlighting the interconnectedness of success across these domains [7].
Sports clubs pursue profit both directly, through competition wins, and indirectly, through advertising, attendance, player reputation, and partnerships. The sports industry therefore seeks to combine sports analytics with business analytics in order to find the critical path between victories and cost reduction; for example, one study estimates the optimal ticket price based on the city, team reputation, and fans’ level of interest [6,8].
Big data analysis in basketball has shown the potential of economics and game theory for predicting trends across various domains. Player and team momentum are referred to in sports jargon as the “hot/cold hand” in basketball and the “hot/cold streak” in baseball. This momentum derives from psychological, social, economic, and form-related factors; nevertheless, studies have shown that such correlations are difficult to project accurately [6,9,10]. Another study found that athletes with high social ranking and reputation exhibit more pronounced “winner” and “loser” characteristics in their playing attitudes [11].
An athletic injury requires proper medical and recovery support before the athlete can return to action. Sports clubs must therefore recognize that every part of the service chain (doctors, physiotherapists, performance coaches, trainers, coaches, etc.) is involved in each case, and that this involvement must be matched by appropriate financial analysis at the management level. Hence, the impact of injuries should be taken seriously in team strategy and in load, rest, and recovery management. Notably, sports analytics and other combined analytics provided by microtechnology (such as training process, type of training, injury history, stress tolerance, recovery adaptation, psychological status, and leadership skills) also have financial implications [11,12,13].
The focus of health and medical businesses on sports since the 1990s highlights the growing economic returns from athlete care, from college to professional levels. Sports industry stakeholders, including doctors, performance coaches, physical trainers, physicians, and others who care for athletes, play a crucial role in addressing issues that affect performance and in guiding athletes towards success [11,14].
Studies indicate that effective sports data analysis can improve performance and have a positive economic impact. Sports ecosystems rely on data to glean insights, with media and betting companies increasingly using sports analytics for evaluation and prediction [15,16], including advanced statistics presented through metrics, infographics, and visualizations [17]. Resource management is crucial in the sports industry, where the challenge is to hire the best personnel at the lowest cost, and data analysis in talent management is key to maintaining this balance [18].
The following aims, objectives and corresponding contributions succinctly convey the primary goals of the research. By focusing on enhancing data analysis techniques, supporting strategic decisions in team management, and analyzing the financial implications of injuries, the research directly addresses the critical needs of decision makers in the sports industry.
- (i) Enhance Feature Selection and Data Preprocessing: utilize PCA to refine feature selection, with a focus on estimating the economic impact of injuries; implement advanced data preprocessing techniques to minimize data noise and enhance the reliability of results, thereby providing decision makers with a refined understanding of the attributes that influence injury-related expenses in the NBA;
- (ii) Optimize Team Management and Performance Analysis: employ clustering techniques to identify players’ peak performance periods, based on age and position, in conjunction with advanced basketball performance analytics, in order to support sports clubs in strategic budget allocation, player selection, rotation management, and a clear strategic vision;
- (iii) Analyze Financial Implications of Health Pathologies and Injuries: determine which health pathologies and injuries incur the most significant financial costs, and provide insights that help decision makers formulate strategies for game selection, training schedules, load management, and recovery processes to mitigate financial risks and optimize player health and performance.
This study employs DM methodologies to explore the economic and performance impacts of injuries in NBA players, offering scientific value by highlighting the financial losses from such injuries. By analyzing both economic and performance impacts, this research provides a comprehensive understanding of the effects of musculoskeletal injuries on players and teams, thereby informing injury prevention strategies and improving sports player health management.
This research involves the careful assembly and analysis of a comprehensive high-volume and complex dataset, encompassing a collection of player performance metrics, detailed injury records, and economic indicators spanning over two decades of NBA history. Leveraging DM techniques, including PCA for dimensionality reduction, and age group analysis for identifying patterns in player performance and injury data, our study explores the complex interplay between player attributes and their multifaceted effects on performance outcomes and financial implications. Such an approach allows for a depth of analysis, uncovering dynamics that have remained underexplored in the literature.
Notably, this study measures the financial impact of player injuries with adjustments for inflation and the real purchasing power of money. From this perspective, domain experts can assess how injuries influence not only the immediate financial landscape but also the long-term economic sustainability of team operations. The approach recalibrates the financial burden of player injuries to account for inflation over time, ensuring that financial assessments remain accurate and relevant across different economic conditions. It also provides insight into how injury-related expenditures affect a team’s purchasing power in terms of the goods and services that can be acquired, considering the broader economic implications of these costs. By incorporating these insights into our analysis, we can more accurately quantify the economic impact of player injuries, considering the evolving economic environment and its effect on financial planning and management within NBA teams.
This innovative approach not only deepens our understanding of the economic dimensions of sports injuries but also furnishes stakeholders with critical insights for making informed decisions that align with both current and future financial realities. By integrating these novel analytical tools into our methodology, we offer tangible, actionable insights that extend beyond academic inquiry to inform strategic decision-making in professional sports management. Thus, our research makes a significant leap forward in the application of data science to sports analytics, bridging the gap between theoretical models and practical applications. The introduction of novel analytical techniques enriches the academic discourse on sports analytics, while also equipping practitioners with the data-driven strategies necessary to navigate the complexities of the modern NBA landscape.
The manuscript integrates classification and clustering methods to analyze NBA player data, with a focus on understanding the impacts of player demographics, injuries, and performance metrics on financial outcomes. This comprehensive approach showcases the manuscript’s methodological contributions to sports analytics, particularly in the innovative application of DM techniques to the field of basketball analytics. Overall, this work provides valuable insights which can inform injury prevention strategies and improve the management of sport player health.
2. Data and Methods
The methodology of this study, together with its key research questions, aims, and objectives, was designed to help domain experts understand the relationships and correlations among socioeconomic and demographic factors and advanced basketball performance analytics for players and teams.
In this research, demographic indicators refer to the characteristics of NBA players that can be measured and categorized, such as age and playing position. These indicators are used to identify patterns in player performance and injury trends within specific demographic segments. In addition, socioeconomic indicators were defined as factors that relate to both economic and social dimensions, impacting players’ professional careers and teams’ financial decisions. Hence, this includes players’ salaries and the economic impact of injuries. These indicators help us analyse how economic aspects influence, and are influenced by, player performance and team strategies.
2.3. Methodology
The methodology section describes the process used to collect and analyse data on NBA basketball players from 1996–1997 to 2019–2020. The data were primarily scraped from various sources such as [46,47,48]. To extract the maximum amount of information from these unstructured and heterogeneous data, DM techniques were used for feature loading, variance explainability, PCA, and exploratory categorization. The methodology comprised data collection, pre-processing, analysis, and result evaluation. Data scraping and wrangling were carried out using Python, SQL scripts, Power BI, and the KNIME Analytics Platform.
The code for data scraping, and the relevant KNIME workflows, Power BI and Excel files for this data analysis, can be accessed on GitHub at: https://github.com/vsarlis/NBAinjurydemoeconomics (accessed on 2 April 2023).
Data preparation and pre-processing include several steps to ensure that the data are cleaned, transformed, and formatted correctly. The first step is to identify and remove missing values, outliers, and irrelevant data, which is crucial for the consistency and accuracy of the data; to this end, multiple data sources were used for cross-validation. Next, the features are scaled and normalized so that all of them are on the same scale. The data are then organized into a format that is easy to use and understand, which may include creating new features or variables, or aggregating data by player or team. Lastly, feature selection retains the features most relevant to the analysis and modelling, which helps prevent overfitting and limits model complexity. The gathered data underwent an Extract, Transform, and Load (ETL) process for data cleansing, standardization, and homogenization to improve the quality of data for analysis [53,54,55].
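The cleaning steps above (removing missing values and outliers, then scaling features) can be illustrated with a minimal, standard-library Python sketch. The text does not state which outlier rule or scaling method was used, so this sketch assumes Tukey's IQR fences (k = 1.5) for outliers and z-score standardization for scaling; the function names and the `PER` column are illustrative only and are not taken from the authors' KNIME workflows.

```python
from statistics import mean, pstdev, quantiles

def drop_missing(rows, columns):
    """Keep only rows in which every listed column has a value."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]

def iqr_bounds(values, k=1.5):
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] count as outliers."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(rows, column, k=1.5):
    """Drop rows whose value in `column` lies outside the IQR fences."""
    low, high = iqr_bounds([r[column] for r in rows], k)
    return [r for r in rows if low <= r[column] <= high]

def add_zscore(rows, column):
    """Add a standardized copy of `column` (mean 0, unit population std)."""
    values = [r[column] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    for r in rows:
        r[column + "_z"] = (r[column] - mu) / sigma if sigma else 0.0
    return rows

# Illustrative per-game PER values, including one missing entry and one outlier.
rows = [{"PER": v} for v in [14, 15, 15, 16, 16, 17, 15, 14, 16, 80]] + [{"PER": None}]
clean = add_zscore(remove_outliers(drop_missing(rows, ["PER"]), "PER"), "PER")
print(len(clean))  # 9: the missing entry and the value 80 are removed
```

The IQR rule is used here because it does not assume normality; a z-score cut-off would be an equally plausible choice.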
The data model flow in Figure 2 starts with data collection from various sources, followed by data scraping and wrangling with Python scripts for retrieval. Next, data preparation cleans, transforms, and normalizes the data for analysis. Data pre-processing removes inconsistencies and errors, while data cleansing, performed with KNIME Analytics tools, identifies and corrects or removes errors and inaccuracies. ETL transforms data from different sources into a format suitable for analysis and loads them into a data warehouse. Data transformation converts data into formats suitable for analysis, while data consolidation combines data from multiple sources into a single dataset.
In the data analysis stage, exploratory data analysis (EDA) was performed and dimensionality reduction was applied via PCA, together with feature selection methods, to identify the most important variables. Moreover, clustering and classification techniques were used to group similar data points and to predict outcomes or assign data to classes. These methods leverage DM and machine learning to identify relationships, detect anomalies, and make predictions from the available data. In the final stage, visualizations and reports were produced to communicate insights and findings to decision makers, while evaluation measured the accuracy of predictions and the impact of specific variables on outcomes.
EDA is an important step before clustering, as it reveals the characteristics and structure of the data, including patterns, outliers, and trends, which can inform the selection of an appropriate clustering algorithm and the tuning of its parameters. PCA, on the other hand, reduces the dimensionality of the data and helps analyse the distribution of data points. Together, EDA and PCA are powerful techniques for gaining insights into basketball data: they can be used to identify patterns, trends, and outliers and to inform the selection and tuning of an appropriate classification algorithm [56,57,58].
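As a rough illustration of how PCA yields explained-variance ratios and feature loadings, the following Python sketch computes PCA through a singular value decomposition of the standardized data with NumPy. It is not the authors' KNIME implementation; defining loadings as eigenvectors scaled by the square root of their eigenvalues (which, for standardized data, equals the feature-component correlations) is one common convention and an assumption here.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the standardized data matrix.

    X has shape (n_samples, n_features). Returns (scores, loadings,
    explained_variance_ratio) for the first `n_components` components.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each feature so that metrics on different scales
    # (e.g. PTS versus TS%) contribute comparably.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Xs, full_matrices=False)
    eigvals = s**2 / (Xs.shape[0] - 1)   # variance captured by each component
    ratio = eigvals / eigvals.sum()       # explained-variance ratio
    components = vt[:n_components].T      # shape (n_features, n_components)
    loadings = components * np.sqrt(eigvals[:n_components])
    scores = Xs @ components              # projection of each sample
    return scores, loadings, ratio[:n_components]

# Synthetic example: two strongly correlated features and one independent one.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), rng.normal(size=200)])
scores, loadings, ratio = pca(X, n_components=2)
print(np.round(ratio, 3))
```

Here the first component absorbs the two redundant features, which is the behaviour that makes loadings useful for spotting overlapping performance metrics.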
The data analysis stage focused on advanced basketball analytics from the 1996–97 to 2019–20 regular seasons. The consolidated dataset includes 585,800 rows, one per game for each of 1577 players from 35 teams, combined with absence analytics (injuries, health problems, suspensions, and reasons for non-selection for a game), demographics (age, country, college, origin, etc.), financial information (player’s salary, team’s salary cap, etc.), and advanced basketball performance analytics. The basketball analytics used in this paper include rating and performance indicators such as Points (PTS), Rebounds (REBS), Assists (AST), Total Rebound Percentage (TRB%), Turnover Percentage (TOV%), Usage Percentage (USG%), True Shooting Percentage (TS%), Assist Percentage (AST%), Steal Percentage (STL%), Block Percentage (BLK%), Net Rating (NetRtg), Player Efficiency Rating (PER), Win Shares (WS), Win Shares per 48 min (WS per_48), Box Plus/Minus (BPM or Plus/Minus or +/−), and Value Over Replacement Player (VORP). Data were filtered for players who averaged more than 10 min per game (mpg) and played at least 19 games (gp) in a season, with the filter applied to each of the 24 regular seasons. This threshold was set to capture players who consistently contributed to their teams throughout the season, providing a reliable basis for evaluating performance and financial implications. By concentrating on regularly participating players, we aimed to improve the precision of our insights into the dynamics of player performance, injuries, and their economic impact on NBA teams, so that the patterns and trends extracted from the dataset rest on substantial, consistent player engagement across multiple seasons.
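The participation threshold can be expressed as a simple per-season filter. This Python sketch assumes one record per player-season with games played and minutes per game already aggregated; the `PlayerSeason` type and the sample players are hypothetical. Following the text, more than 10 minutes per game (strict) and at least 19 games (inclusive) are required.

```python
from dataclasses import dataclass

MIN_MPG = 10.0  # average playing time must exceed 10 min per game
MIN_GP = 19     # at least 19 games played in the season

@dataclass
class PlayerSeason:
    player: str
    season: str            # e.g. "2019-20"
    games_played: int
    minutes_per_game: float

def is_regular_contributor(ps: PlayerSeason) -> bool:
    """Apply the per-season participation threshold."""
    return ps.minutes_per_game > MIN_MPG and ps.games_played >= MIN_GP

def filter_regulars(seasons):
    return [ps for ps in seasons if is_regular_contributor(ps)]

# Hypothetical player-seasons illustrating the boundary cases.
sample = [
    PlayerSeason("Player A", "2019-20", 60, 31.5),  # kept
    PlayerSeason("Player B", "2019-20", 19, 12.0),  # kept: exactly 19 games
    PlayerSeason("Player C", "2019-20", 45, 10.0),  # dropped: not over 10 mpg
    PlayerSeason("Player D", "2019-20", 18, 25.0),  # dropped: fewer than 19 games
]
print([ps.player for ps in filter_regulars(sample)])  # ['Player A', 'Player B']
```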
In Table A1, inflation rates are calculated with 2019–2020 as the baseline dollar value, taking into account the time value of money; these rates were applied to the salaries to measure year-by-year changes in the purchasing power of money [59]. Inflation reduces the purchasing power of the dollar, so a salary that is not adjusted for inflation loses real value over time. In practice, however, NBA players’ salaries have been rising faster than inflation in recent years. This is partly due to the increasing popularity and revenue of the league, which allow teams to pay higher salaries to attract top players. Additionally, the NBA salary cap is determined by league revenue, which tends to increase over time, so player salaries tend to increase as well [60,61,62,63,64,65].
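Converting a nominal salary into 2019–20 dollars amounts to scaling it by the ratio of price levels between the base season and the salary's season. The Python sketch below assumes a consumer price index (CPI) style level per season; the index values shown are illustrative placeholders, not the rates reported in Table A1.

```python
BASE_SEASON = "2019-20"

def to_base_dollars(nominal, season, price_index, base=BASE_SEASON):
    """Express `nominal` dollars from `season` in `base`-season dollars.

    real = nominal * P(base) / P(season), where P is a price-level index.
    """
    return nominal * price_index[base] / price_index[season]

# Illustrative price-level index values (placeholders, not Table A1).
price_index = {"2017-18": 96.0, "2018-19": 98.0, "2019-20": 100.0}

salary_2017_18 = 4_800_000
print(to_base_dollars(salary_2017_18, "2017-18", price_index))  # 5000000.0
```

A salary from the base season is returned unchanged, and earlier salaries are scaled up in proportion to the cumulative price increase since that season.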
To improve data quality and insights, records that existed in more than one source were retained, and outliers were excluded from the consolidated dataset, resulting in a final working dataset of 490,584 rows. Salaries were expressed in inflation-adjusted baseline dollars to measure changes in the purchasing power of money over time. The limitations of the approach and potential areas for future work are also discussed. The results of this study provide insights into the relationships between various performance metrics, demographics, and financial information of NBA basketball players.
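Retaining only records confirmed by more than one source can be sketched as a set-membership count. In this Python sketch each source is assumed to yield hashable record keys such as `(player, game_date)`; the source names and keys are hypothetical.

```python
from collections import defaultdict

def confirmed_records(sources, min_sources=2):
    """Return keys that appear in at least `min_sources` distinct sources.

    `sources` maps a source name to an iterable of record keys.
    """
    seen_in = defaultdict(set)
    for name, keys in sources.items():
        for key in keys:
            seen_in[key].add(name)
    return {key for key, names in seen_in.items() if len(names) >= min_sources}

# Hypothetical sources keyed by (player, game date).
sources = {
    "source_a": {("P1", "2019-10-22"), ("P2", "2019-10-22"), ("P3", "2019-10-23")},
    "source_b": {("P2", "2019-10-22"), ("P3", "2019-10-23"), ("P4", "2019-10-24")},
    "source_c": {("P3", "2019-10-23")},
}
print(sorted(confirmed_records(sources)))
# [('P2', '2019-10-22'), ('P3', '2019-10-23')]
```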
To ensure the robustness of our methodology and the reliability of our findings, we employ additional quality controls, such as mutual information, correlation-based feature selection, and recursive feature elimination. These methods assess the relevance of features for our classification tasks, keeping the analysis precise and pertinent. Furthermore, the integration of EDA, ECA, and PCA facilitates a comprehensive yet nuanced exploration of basketball analytics. We complement the analysis with visual representations, including plots and graphs, that clarify the relationships between features and class labels. This visual approach aids pattern recognition as well as the identification of outliers and trends, making complex data easier for decision makers in sports management to interpret. By adopting these analytical techniques and quality controls, our research offers insights into basketball data that support informed decision-making for coaches, team managers, and players, and ensures that our conclusions are derived from a thorough and reliable examination of the data [56,58,66,67].
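Of the quality controls listed, the correlation-based step is the simplest to illustrate. The NumPy sketch below is a greedy redundancy filter that keeps a feature only if its absolute Pearson correlation with every previously kept feature is below a threshold; this is a simplification, not the full correlation-based feature selection (CFS) merit search, and the 0.9 threshold and the synthetic data are assumptions.

```python
import numpy as np

def correlation_filter(X, names, threshold=0.9):
    """Greedy redundancy filter over the columns of X (n_samples, n_features).

    Features are visited in the given order; a feature is kept only if its
    absolute Pearson correlation with every already-kept feature is below
    `threshold`.
    """
    corr = np.abs(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    kept = []
    for j in range(len(names)):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Synthetic data: the second column is nearly a linear function of the first;
# the third is independent. Column names are illustrative only.
rng = np.random.default_rng(1)
per = rng.normal(size=300)
X = np.column_stack([per, 0.5 * per + 0.01 * rng.normal(size=300), rng.normal(size=300)])
print(correlation_filter(X, ["PER", "WS", "AST%"]))  # ['PER', 'AST%']
```

Because the filter is order-dependent, features visited first take priority; ordering them by relevance to the target (e.g. by mutual information) is a common refinement.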
3. Results
Basketball is a sport with a high level of ambiguity, in which many parameters are intercorrelated within a multivariate model. This section presents the findings from clustering by age (RQ2) and from feature selection for injury-cost relationships (RQ1), obtained with state-of-the-art techniques. To examine how age and position relate to high earnings in the NBA, clustering was performed and associations between demographic and financial analytics were established (RQ3).
Figure 3 indicates that musculoskeletal injuries account for approximately half (47.2%) of the total cost of health pathologies and injuries for NBA teams, while head injuries account for 27.4% and general health issues for 25.4% of the total financial losses incurred while players are injured and out of the line-up but still earning a salary. Data over 10 seasons were analysed, and the total cost of health pathologies and injuries for NBA players was found to exceed 150 million dollars. The annual evolution of this cost, from some 12 million dollars in 2010–11 to 16 million dollars in 2019–20, is illustrated in Figure A1. The high-level effect of injuries by organ system and major anatomical area on the NBA is presented in Figure 3. The study analysed NBA player performance data spanning the 1996–97 to 2019–20 seasons, sourced from official, publicly available data sources [46,47,48]. In the current study, the financial losses from player health pathologies and injuries were investigated based on the injury classification performed in previous research [2] (RQ1).
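The cost shares reported above can be computed once each absence is assigned an injury category and a monetary cost. The text does not give the exact costing formula, so this Python sketch assumes that the cost of an absence is the player's salary prorated per game over an 82-game regular season; the records are hypothetical.

```python
from collections import defaultdict

GAMES_PER_SEASON = 82  # assumption: salary prorated over a full regular season

def absence_cost(salary, games_missed, games_per_season=GAMES_PER_SEASON):
    """Salary paid for the games a player missed (per-game proration)."""
    return salary * games_missed / games_per_season

def cost_share_by_category(records):
    """records: iterable of (category, salary, games_missed).

    Returns (total cost, {category: share of total cost}).
    """
    totals = defaultdict(float)
    for category, salary, missed in records:
        totals[category] += absence_cost(salary, missed)
    grand_total = sum(totals.values())
    return grand_total, {c: t / grand_total for c, t in totals.items()}

# Hypothetical absences (category, annual salary in USD, games missed).
records = [
    ("musculoskeletal", 8_200_000, 10),
    ("head", 4_100_000, 10),
    ("general health", 4_100_000, 10),
]
total, shares = cost_share_by_category(records)
print(total, shares)
```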
Additionally, data for 24 NBA seasons (from 1996–97 up to 2019–20) were analysed to identify the player position earning the highest salary per year. For that purpose, the 1577 players were categorized into three classes, namely Centers (“C”), Forwards (“F”), and Guards (“G”). A “Forward” plays as a Power Forward or Small Forward, a “Guard” plays as a Shooting Guard and/or Point Guard, and a “Center” plays as a Center most of the time (RQ3).
Figure 3 and Figure A1 present these three classes in relation to the average salary for each class. Accordingly, Centers are the best-paid players in the league, not only for their height but also for their strong performance [68,69,70,71]. The law of supply and demand helps explain the relationship between players in the “Center” class and the salaries that teams offer: it is difficult to find a player of such height who also delivers the highest quality and performance at this competitive level [72].
According to data analysis for 1577 players, Centers accounted for 21.3% of the total, with an average salary of USD 6.84 million, while Guards were 38% of the total, with the lowest average annual salary of USD 5.37 million, over 24 NBA seasons (Table 1) (RQ3).
Figure 4 displays a detailed analysis of NBA salary trends over two decades, grouped by player position: Centers (C), Forwards (F), and Guards (G). Each column represents the average inflation-adjusted salary for a position, offering insights into the economic evolution of the league from 1996–97 to 2019–20 and a clear view of real earning power over time. The year-over-year percentage change is also indicated, highlighting salary fluctuations that can be associated with market changes, collective bargaining agreements, and shifts in the league’s economic landscape. For instance, Centers saw peak increases of 34.02% in 1997–98 and 33.21% in 2016–17, while Guards experienced a significant rise of 37.54% in 2016–17. Forwards saw a less dramatic but still notable peak increase of 29.02% in 2015–16.
The global average inflation-adjusted salary across all positions shows steady growth in player compensation, with the 2019–20 average salary 7.49% higher than that of the previous season. These insights are particularly relevant to teams’ financial strategy and to the dynamics of player contract negotiations. Furthermore, the data underscore the importance of position-specific financial planning, especially when accounting for the impact of player injuries and the adjustments to team salary caps and budgets that may subsequently be necessary.
Figure 4 also provides a drill-down of annual NBA salaries by position. From the 1996–97 season to 2019–20, salaries increased for “Centers” from USD 3.88 to USD 9.44 million (143% increase), for “Forwards” from USD 3.54 to USD 7.6 million (115% increase), and for “Guards” from USD 3.26 to USD 8.05 million (147% increase), while the average salary for all players rose from USD 3.49 to USD 8.17 million (134% increase) (RQ3). These results suggest that the game is changing, with players increasingly expected to be capable of playing in whichever position is needed during matches.
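The percentage increases quoted above follow from the usual relative-change formula, (new - old) / old * 100, which also gives the year-over-year changes shown in Figure 4. This short Python check reproduces the rounded figures from the average salaries stated in the text (USD millions).

```python
def pct_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100

# Average salaries in USD millions, 1996-97 and 2019-20, as stated in the text.
salary_1996_97 = {"C": 3.88, "F": 3.54, "G": 3.26, "All": 3.49}
salary_2019_20 = {"C": 9.44, "F": 7.60, "G": 8.05, "All": 8.17}

for pos in salary_1996_97:
    change = pct_change(salary_1996_97[pos], salary_2019_20[pos])
    print(f"{pos}: {change:.0f}%")
# C: 143%
# F: 115%
# G: 147%
# All: 134%
```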
Age is undoubtedly a vital criterion in a player’s career. On this basis, sports clubs invest in players’ careers, create their strategies, structure their team rosters, and make important budget decisions. Age influences performance: professional basketball players older than 30 have lower speed and/or jumping ability than younger players, indicating that performance declines considerably by that age [73].
In addition, this research study aimed to correlate NBA player age with salaries and advanced basketball performance analytics.
Figure 5 presents a matrix of age, salary, and advanced basketball performance analytics, namely PTS, REBS, AST, TRB%, TOV%, USG%, TS%, AST%, STL%, BLK%, NetRtg, PER, WS, WS per_48, BPM (or Plus/Minus or +/−), and VORP, which were selected as the most appropriate rating analytics based on the basketball analytics literature [1,19,74,75,76] (RQ2 and RQ3).
Figure 5 and Figure A2 summarize the average annual salary paid by each team across the 24 NBA seasons from 1996–97 to 2019–20. New Orleans Pelicans (NOP), Miami Heat (MIA), and New York Knicks (NYK) are the top 3 clubs by average salary: USD 6 M, USD 5.74 M, and USD 5.72 M, respectively. On the other hand, Atlanta Hawks (ATL), Charlotte Hornets (CHA), and Philadelphia 76ers (PHI) are the teams offering the lowest salaries: USD 4.04 M, USD 4.03 M, and USD 3.97 M, respectively. The bottom 3 clubs, SEA, NJN, and VAN, were excluded from this comparison because these teams ceased to exist many years ago, when lower salaries were the norm.
As illustrated in Figure 6, the green highlighted cells indicate where players reach their peak performance within each age group. The analysis of 24 NBA seasons showed that players aged between 27 and 29 are at the peak of their careers in terms of average performance across the most widely accepted advanced basketball rating analytics. However, their maximum average earnings are achieved between 29 and 34 years of age (RQ2 and RQ3), with the maximum average salary for an NBA player reached at age 34.
Regarding performance evaluation, correlation analysis across tactical formations, SportVU video-tracking analytics, physical performance measurements, and psychometric/biometric analytics is highly complex and limited. Given the possible implications outlined, this study can nonetheless provide valuable insights to decision makers. By performing clustering through dimensionality reduction, based on feature extraction (PCA) and feature selection (Wrapper, Filter, and Embedded methods), eight age clusters were formed, as presented in Figure 6 and detailed in Table 2. Approximately 50% of NBA players are aged between 23 and 28. Players aged between 29 and 33 receive the highest salaries, of approximately 8.2 million dollars.
Table 2 and Figure A3 categorize NBA players into eight distinct age groups to reflect the various financial tiers within player contracts and to capture the diversity of player earnings. This segmentation is critical for analysing the financial impact of player injuries on team economics. The age ranges facilitate an in-depth exploration of how age influences both the occurrence of injuries and salary structures in the NBA. The data show the percentage of players in each age cluster, the average inflation-adjusted salary in U.S. dollars, and the associated PER, a measure of on-court performance. These groups are essential for investigating correlations and patterns that elucidate the complex relationships between player age, performance, and accrued professional experience within the league. In doing so, the study provides a nuanced understanding of how age-related factors contribute to economic considerations in team management.
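A per-group summary like the one in Table 2 (share of players, mean inflation-adjusted salary, and mean PER per age group), plus the peak group for each metric, can be sketched as below. The eight bin boundaries are hypothetical placeholders chosen for illustration; Table 2 defines the actual groups, which were derived through the clustering described above. The player tuples are also hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean

# Hypothetical inclusive upper bounds for eight age groups (not Table 2's).
UPPER_BOUNDS = [22, 24, 26, 28, 30, 33, 36]
LABELS = ["<=22", "23-24", "25-26", "27-28", "29-30", "31-33", "34-36", "37+"]

def age_group(age):
    """Map an age to its group label using the bounds above."""
    return LABELS[bisect_left(UPPER_BOUNDS, age)]

def summarize(players):
    """players: iterable of (age, inflation_adjusted_salary, per)."""
    groups = defaultdict(list)
    for age, salary, per in players:
        groups[age_group(age)].append((salary, per))
    n = sum(len(v) for v in groups.values())
    return {
        label: {
            "share": len(v) / n,
            "avg_salary": mean(s for s, _ in v),
            "avg_per": mean(p for _, p in v),
        }
        for label, v in groups.items()
    }

def peak_group(summary, metric):
    """Age group with the highest average value of `metric`."""
    return max(summary, key=lambda label: summary[label][metric])

# Hypothetical players: (age, salary in 2019-20 USD, PER).
players = [(21, 2.0e6, 12.0), (25, 5.0e6, 16.0), (28, 7.0e6, 18.0),
           (28, 8.0e6, 19.0), (32, 9.0e6, 15.0)]
summary = summarize(players)
print(peak_group(summary, "avg_per"), peak_group(summary, "avg_salary"))  # 27-28 31-33
```

Reporting the peak performance group and the peak salary group separately is what exposes the gap between when players perform best and when they are paid most.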
Table 2 focuses on the overall percentage of players by age, their inflation-adjusted salaries, and their efficiency ratings (PER), while Figure A3 breaks players down by age and position and includes detailed performance analytics. Together, they suggest that players’ salaries tend to increase and peak in their late twenties to early thirties, whereas PER peaks in the mid to late twenties and then varies or decreases. Younger players make up a smaller portion of the league, and the most common age bracket is from 23 to 26 years old. Overall, these insights indicate a complex interplay between age, experience, position, and financial compensation in the NBA.
Guards (G) show their best playing performance at ages 29–30, and in the same cluster they earn their highest average salary, ~7.5 million dollars. Forwards (F) play best at ages 27–28 but earn their highest salary, ~8.6 million dollars, at 31–33 years old. Centers (C), on the other hand, are paid the most at ages 31–33, with 9.4 million dollars, while their highest performance comes at ages 25–26.
The radial chart (also known as a spider chart) in Figure 7 visualizes and compares the average salaries of teams over the 24-year timespan. Notes on specific teams are as follows:
3 New Jersey Nets changed their brand name at the end of the 2011–12 season; from the 2012–13 season onwards, they have used the name Brooklyn Nets;
4 New Orleans Hornets changed their brand name at the end of the 2012–13 season; from the 2013–14 season onwards, they have used the name New Orleans Pelicans;
5 Seattle SuperSonics played their last NBA season in 2007–08;
6 Vancouver Grizzlies played 6 seasons in the period from 1995 to 2001.
The tornado funnel diagram (Figure 8) illustrates the relationship between age and inflation-adjusted salary for all NBA players, in descending salary order. The percentage in brackets expresses each value relative to the first (highest) value, for ease of comparison. Based on the results, players in the age group of 29–34 receive the highest salaries (Table 2). One possible explanation is that teams select them as experienced professionals who can mentor younger players and help achieve the best mix for more wins and championships.
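The bracketed percentages in the tornado diagram express each salary relative to the highest one after sorting in descending order. A minimal Python sketch of that calculation, using hypothetical average salaries per age group in USD millions:

```python
def relative_to_top(values):
    """Sort (label, amount) pairs in descending order and add each amount
    as a percentage of the top amount."""
    ordered = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    top = ordered[0][1]
    return [(label, amount, 100 * amount / top) for label, amount in ordered]

# Hypothetical average salaries per age group (USD millions).
salaries = {"23-24": 4.0, "29-30": 8.0, "31-33": 8.2}
for label, amount, pct in relative_to_top(salaries):
    print(f"{label}: {amount:.1f} ({pct:.1f}%)")
```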
The analysis faces limitations and challenges such as the availability and quality of data, data size, the complexity of the game, selection bias, injuries, human factors, the validity of the metrics, and modelling assumptions. Basketball is a complex game, making it difficult to capture accurately all the nuances and interactions of players and teams. Selection bias, injuries, and player turnover can greatly affect a team’s performance, making it difficult to draw accurate conclusions about a team’s true talent level. Additionally, human factors such as coaching, player motivation, and team dynamics are difficult to quantify and incorporate into analytics.
4. Discussion
To avoid the additional costs caused by an injury or health pathology, important criteria such as age, nutrition, training status, history and load monitoring, psychological status, social status and lifestyle, stress tolerance, and proper recovery procedures need to be considered [11].
During their careers, players can change playing position for various reasons, such as coaching staff decisions, adaptation to opponents, targeting of specific skills, body transformation, and age. For consistency, each player was assigned to his most frequent position (RQ3).
A basketball team comprises 12 active players, all of whom are eligible to play at any moment during a game. Therefore, proper roster selection can enhance team performance, and every small good decision made during the game can help a team take the lead or win the match. Over the past 10 years, the prevailing coaching trend has been to distribute playing time across all players on the roster, maintaining or changing the game tempo according to the team’s strategy. The NBA’s Sixth Man of the Year award is one sign that technical staff pay attention to all the players on the roster. Accordingly, appropriate roster selection, team management, role assignment, and rotation are significant factors for achieving better results and can add value to a team [77,78,79,80].
Significant decreases in NBA salaries occurred in the 2011–12 and 2012–13 seasons following the NBA lockout, which shortened the 2011–12 season from 82 to 66 games per team; the average annual salary changed by −3.38% and −9.52%, respectively, compared with the previous season’s average [50,81]. The global economic recession of 2007–2009 also influenced NBA player salaries, resulting in an average decrease of 8% in the 2009–10 season [82,83] (RQ3).
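The year-over-year figures above compare each season's average salary with the previous season's. The following is a minimal Python sketch of that calculation. It assumes per-season lists of player salaries keyed by labels such as `"2011-12"`, which is a hypothetical layout; the example numbers are made up and do not reproduce the reported values.

```python
from statistics import mean


def yoy_change(salaries_by_season: dict[str, list[float]]) -> dict[str, float]:
    """Percentage change in average salary versus the previous season.

    Season labels like '2010-11' sort chronologically as strings, so
    sorting the keys gives the season order.
    """
    seasons = sorted(salaries_by_season)
    averages = {s: mean(salaries_by_season[s]) for s in seasons}
    return {
        curr: round(100.0 * (averages[curr] - averages[prev]) / averages[prev], 2)
        for prev, curr in zip(seasons, seasons[1:])
    }


# Made-up salaries (USD) for three consecutive seasons.
example = {
    "2010-11": [4.0e6, 6.0e6],
    "2011-12": [4.5e6, 5.0e6],
    "2012-13": [3.8e6, 5.7e6],
}
print(yoy_change(example))  # {'2011-12': -5.0, '2012-13': 0.0}
```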
The aggregated data, combining advanced basketball performance analytics, demographics, and financial information, were presented earlier. The combined data reveal relationships and influences among the performance, demographic, and economic variables of teams and players. Using the pre-processed aggregated data, we also examined the relationship between player health pathologies/injuries and the financial costs they impose in the NBA.
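Building an aggregated dataset of this kind typically means joining the performance, demographic, financial, and injury tables on a shared key. The source does not specify its join logic. The sketch below therefore assumes that every table is keyed by `(player, season)` and performs an inner join in plain Python. Field names such as `per` and `salary` are illustrative.

```python
# Hypothetical key: (player name, season label).
Key = tuple[str, str]


def aggregate(*sources: dict[Key, dict]) -> dict[Key, dict]:
    """Inner-join per-player-season tables on (player, season).

    Only keys present in every source are kept; fields are merged left to
    right, so a later source overwrites a same-named field from an earlier one.
    """
    if not sources:
        return {}
    common = set(sources[0]).intersection(*sources[1:])
    merged: dict[Key, dict] = {}
    for key in sorted(common):
        row: dict = {}
        for source in sources:
            row.update(source[key])
        merged[key] = row
    return merged


# Illustrative tables; field names and values are invented.
performance = {("Player A", "2015-16"): {"per": 20.1}, ("Player B", "2015-16"): {"per": 12.0}}
demographics = {("Player A", "2015-16"): {"age": 27, "position": "G"}}
salaries = {("Player A", "2015-16"): {"salary": 7.0e6}, ("Player B", "2015-16"): {"salary": 1.0e6}}
# Player B is dropped because it has no demographics row.
print(aggregate(performance, demographics, salaries))
```

An inner join silently discards incomplete player-seasons. A left join with explicit missing-value handling may be preferable when injury records are sparse.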
Table 1 presents three groups of conditions, namely general health problems, head injuries, and musculoskeletal injuries, in the NBA over 10 seasons, from 2010 to 2020. These results make clear that club owners, as well as medical and technical staff, should structure their strategy for training, game selection, load management, and recovery so that players return to the active roster healthy, avoiding any further complications (RQ1 and RQ2).
The novelty of our research lies in the sophisticated integration of clustering and classification techniques through exploratory-based categorization, complemented by the application of PCA-driven pattern recognition for nuanced feature selection. This approach enables us to unearth complex associations among demographics, performance metrics, and financial variables specific to NBA players, offering fresh perspectives on how these factors interrelate to influence outcomes within the league. The methodology goes beyond conventional analyses by thoroughly detailing the data collection and preprocessing steps, ensuring the integrity and robustness of our dataset before applying our advanced analytical techniques.
Our analysis extends into the economic domain, investigating the consequences of player injuries and health issues, an aspect only marginally explored within sports analytics. Through a rigorous quantitative methodology, we unveil the profound economic burdens these health-related factors impose on NBA teams. Our research outlines the direct and indirect costs associated with player injuries, from immediate health expenses to longer-term impacts on team performance and player valuation. By mapping these economic implications, our study yields critical insights, fostering more informed strategies around player health management and financial structuring. This comprehensive approach not only charts a course for the application of advanced analytics in sports management, but also aids empirical research on the complex interplay between player health and the financial mechanisms of professional sports leagues.
Regarding threats to the validity of this study, we acknowledge that potential confounding variables that could influence the observed relationships were not examined. The scope of our study does not encompass factors such as nutrition, training programs, load monitoring, and psychological or social influences, which could also play significant roles in injury prevention and recovery. While these dimensions merit consideration, our focused examination of the NBA provides a foundational analysis that invites further inquiry into these areas. Moreover, our findings, while robust within the context of NBA data, may require adaptation before being applied to other basketball leagues or sports. This limitation underscores the need for tailored analytical frameworks that capture the unique dynamics of different sporting environments [84].
5. Conclusions and Future Work
The aim of this work was to help decision makers shape their vision and strategy, allocate budgets and investments appropriately, and win championships at minimum cost in terms of money and time. Age and position criteria are factored into player wage estimation, based on advanced basketball analytics. In addition, injury history and health problem patterns can help technical staff structure their training, load management, psychological encouragement, recovery procedures, tactics, and player nutrition.
Throughout the manuscript, the exploration of multivariate relationships between player performance metrics, salaries, and age is meticulously conducted. Advanced DM techniques are employed, encompassing the application of PCA for feature selection, PCA-driven pattern recognition, exploratory-based categorization, as well as classification and clustering processes. The feature selection process is pivotal, enabling the identification and prioritization of the most significant variables impacting player performance and economic outcomes. Subsequently, classification methods are utilized to categorize players based on these selected features, revealing insights into how different attributes correlate with performance levels and financial rewards. Clustering techniques further segment players into distinct groups based on age, position, and performance metrics, facilitating a deeper understanding of the dynamics within player cohorts. These combined methods afford a nuanced comprehension of the complex interplay between a player’s age, performance, and salary, transcending the capabilities of simple database queries. By integrating feature selection with classification and clustering, patterns and insights crucial for strategic decision-making in sports management are unveiled, offering a comprehensive analysis of the underlying economic and performance-related implications.
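To make PCA-based feature selection concrete, the sketch below standardizes each feature and forms the correlation matrix. It then estimates the first principal component by power iteration and ranks features by the absolute value of their loading. This is an illustrative stand-in rather than the authors' method. The source does not state how its PCA was computed or how many components were kept, power iteration is a swapped-in numerical method, and the feature names in the example are hypothetical. For a single component, defining loadings as eigenvector × √eigenvalue only rescales all entries equally, so the ranking is the same under either convention.

```python
from statistics import mean, pstdev


def standardize(columns: list[list[float]]) -> list[list[float]]:
    """Z-score each feature column (population standard deviation)."""
    out = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    return out


def correlation_matrix(z: list[list[float]]) -> list[list[float]]:
    """Covariance of z-scored columns, i.e. the correlation matrix."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in z] for ci in z]


def first_component(matrix: list[list[float]], iterations: int = 500) -> list[float]:
    """Dominant eigenvector of a positive semi-definite matrix by power
    iteration (assumes a unique largest eigenvalue)."""
    k = len(matrix)
    v = [1.0 / (i + 1) for i in range(k)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            raise ValueError("power iteration collapsed to the zero vector")
        v = [x / norm for x in w]  # keep the iterate at unit length
    return v


def rank_by_loading(names: list[str], columns: list[list[float]]) -> list[tuple[str, float]]:
    """Rank features by |loading| on the first principal component."""
    loadings = first_component(correlation_matrix(standardize(columns)))
    return sorted(zip(names, loadings), key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical features: x and y move together, z is mostly unrelated.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
z = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]
for name, loading in rank_by_loading(["x", "y", "z"], [x, y, z]):
    print(f"{name}: {loading:+.3f}")
```

Features with the largest absolute loadings carry most of the variance captured by the component and are the natural candidates to keep. Inspecting further components would require deflation or a full eigendecomposition.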
Our study enriches the sports analytics field by introducing a novel methodology that integrates PCA for refined feature selection, clustering techniques to pinpoint players’ peak performance periods, and an in-depth financial analysis of the impacts of health pathologies and injuries. Unlike the existing literature, our approach offers a more precise understanding of how player attributes influence performance and economic outcomes in the NBA.
Avoiding injuries is of major importance in sports. According to the findings, musculoskeletal injuries account for 47.2% of the total cost of health problems and pathologies in the NBA (RQ1). Based on the age criterion, clubs can form their strategy and allocate their budget accordingly (RQ2).
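A share such as the 47.2% above is each group's total cost divided by the overall total. The sketch below assumes a hypothetical list of `(group, cost)` records. It does not reproduce the source's exact cost definition, so the cost measure is left to the caller.

```python
from collections import defaultdict


def cost_shares(cases: list[tuple[str, float]]) -> dict[str, float]:
    """Percentage of the total cost attributable to each injury/health group.

    `cases` holds hypothetical (group, cost) pairs; how each cost is
    measured is left to the caller.
    """
    totals: dict[str, float] = defaultdict(float)
    for group, cost in cases:
        totals[group] += cost
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total cost must be positive")
    return {group: round(100.0 * cost / grand_total, 1) for group, cost in totals.items()}


# Invented costs (USD millions) for illustration only.
cases = [("musculoskeletal", 2.0), ("head", 1.0), ("musculoskeletal", 1.0)]
print(cost_shares(cases))  # {'musculoskeletal': 75.0, 'head': 25.0}
```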
One of the main objectives was to establish relationships among economic, demographic, injury, and pathology metrics using advanced basketball statistics. Sports performance is a multivariate phenomenon that is difficult to predict because of its considerable uncertainty. Some of the parameters involved are physical shape, psychology, social background, lifestyle, mental status, playing usage, and monetary traits [85]. There are many interdependencies and correlations among these parameters. Therefore, the purpose of analytics is to minimize risk and forecast under uncertainty in order to maximize efficiency [1].
Player position matters not only for starting line-up selection but also for rotation management and for financial reasons (RQ3). According to advanced basketball analytics, NBA players aged 27–29 are at the peak of their performance, while their highest average salary comes at ages 29–30 (RQ2 and RQ3). Additionally, top-level NBA players in the “Center” position are fewer (21.3% of the total) but earn more on average (USD 5.69 million) than “Forwards” (USD 5.1 million) and “Guards” (USD 4.48 million). Therefore, “Centers” have the highest average annual earnings in the NBA, but “Centers” who combine quality and performance characteristics are difficult to find (RQ3).
Further, this study proposes exploring the socio-economic impacts of NBA careers in more depth. Analysing how changes in player performance due to aging or injuries affect marketability, sponsorship opportunities, and post-career prospects could offer a holistic view of a player’s economic lifecycle within and beyond their active sports career.
Basketball is a team sport in which minor details before and during a match matter. Good or bad choices, effective game tempo, the team’s condition in the first minutes of the match, and efficient offensive and defensive matchups are examples of factors that give a club an advantage or a disadvantage against its opponents. Moreover, some players influence a match through their energy, motivation, and psychological support for the whole team. Data analysts cannot easily quantify these characteristics or assign them to statistical categories; instead, such characteristics need to be combined and aggregated into a sophisticated model. Nevertheless, domain experts, with the help of data scientists, can recognize important patterns during the game in order to increase productivity and minimize bias [86]. Data experts and technical staff can also use advanced analytics to evaluate, track, monitor, and forecast team and player performance. Based on these analyses, technical staff can make significant decisions on roster management, team culture fit, strategy, vision, scouting, future transfers, and new talent acquisitions [87].
Future Work
Beyond the current research, further work can address a variety of predictive and descriptive areas of sports analytics. Possible areas of future focus include the impact of psychology on player and team performance, medical and health issues, recovery, tactics, biometrics, social factors, nutrition, and leadership. These can be studied not only for basketball but also for other team sports. Applying sports data analysis to individual sports is another appealing opportunity to identify patterns and insights.
For club owners, management, and technical staff, it is vital to receive accurate predictions for uncertain situations that could cost them financially or in terms of performance. Psychological and behavioural analysis of athletes is also vital for preparing them mentally. Nowadays, players and teams engage extensively through social networks, motivating sports data scientists to mine these networks for valuable information and insights. Studying social networks can support behavioural analysis, persona clustering, and pattern recognition. This social network activity can be associated with fan and team activities for descriptive analysis, and also used to find associations and make predictions across a variety of attributes, as discussed previously.
Many platforms and applications use data to quantify biometrics, muscle soreness, sleep quality, physical condition, nutrition, fan engagement, training, marketing analytics, and social analytics. Their purpose is not only data visualization but also, from a sports business perspective, avoiding extra costs. In addition, the convergence of science and technology can reduce unnecessary investments and optimize the productivity and effectiveness of the daily routines of teams and players.
The COVID-19 pandemic created new norms worldwide and dramatically affected every sport. For that reason, the sports industry should adjust training, fan engagement, and contacts, and focus on every detail that could affect any member of each team. Data science and analytics can likewise help minimize emerging risks among team members and address a wide range of physical and mental issues that could cause economic, mental, and health problems [88,89].
Additionally, data analytics could identify injury factors through prescriptive analysis. Big data analysis can determine which period of a game produces the most injuries and whether fatigue plays a significant role in them. Furthermore, it is useful to understand how much player fatigue affects the last minutes of the game and whether a player’s leadership and mental skills can alleviate this effect. Finally, sleep quality and the length and number of trips can influence whether players and teams perform well. Analysis of these factors can help technical staff with rotation and load management to keep the team fresh.
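As a starting point for the game-period question, injury events can be tallied by period. The following is a minimal Python sketch. It assumes hypothetical `(player, period)` event records, with periods 1-4 for quarters and 5 or higher for overtime.

```python
from collections import Counter


def injuries_by_period(events: list[tuple[str, int]]) -> tuple[int, Counter]:
    """Tally injury events per game period and return the period with the
    most injuries together with the full counts.

    Periods 1-4 are quarters; 5 and above are overtime periods.
    """
    if not events:
        raise ValueError("no injury events supplied")
    counts = Counter(period for _player, period in events)
    worst_period, _ = counts.most_common(1)[0]
    return worst_period, counts


# Hypothetical events: (player, period in which the injury occurred).
events = [("Player A", 4), ("Player B", 4), ("Player C", 1), ("Player D", 2)]
worst, counts = injuries_by_period(events)
print(worst, dict(sorted(counts.items())))  # 4 {1: 1, 2: 1, 4: 2}
```

Raw counts do not adjust for exposure. Relating injuries to fatigue would additionally require minutes played or workload data per period.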