1. Introduction
The COVID-19 pandemic has indelibly marked global public health, economic stability, and societal norms. As of 2024, the virus has prompted varied responses from different jurisdictions, influenced by geographic, demographic, and socio-economic factors [1, 2]. Initial measures in the United States focused predominantly on curtailing transmission through widespread lockdowns and mask mandates, often without consideration for regional disparities in healthcare access, political alignment, and public compliance levels. COVID-19 has had devastating impacts on the United States (U.S.) population. Specifically, the pandemic impacted certain populations differently than others based on demographics [3, 4, 5, 6], financial status [3, 7], behavioral/psychographics [4, 8, 9, 10, 11], and geographies [6, 8]. Additionally, socio-economic factors such as exposure to media and political party affiliation [12] played an important role in influencing vaccine uptake [9], and different jurisdictions (e.g., counties, states, and federal) had varying responses on how to control and reduce risks of the pandemic (e.g., mask mandates and school closures) [8, 13].
With the introduction of multiple vaccine boosters and new vaccine technologies, understanding the current landscape of vaccination uptake is critical. This variability in pandemic responses and their outcomes presents a critical opportunity for analysis. Understanding the effectiveness of different strategies across diverse contexts is vital for preparing more resilient public health responses in the future. Thus, our study focuses on a data-driven approach to dissect these varied responses within the United States at the county level.
As the global community continues to navigate the challenges of COVID-19 and future pandemics, the insights derived from this analysis aim to contribute to a more informed, agile, and region-specific response strategy that can be adapted to the unique needs of diverse populations.
Employing clustering analysis, this research identifies patterns and correlations between pandemic outcomes and the socio-economic characteristics of counties. By applying unsupervised machine learning techniques, specifically K-Means and hierarchical clustering, we categorized counties into distinct groups based on their performance in managing COVID-19—assessed through metrics such as mortality rates and vaccine uptake. The analysis was further enhanced by Principal Component Analysis (PCA) to identify the most significant variables contributing to these outcomes, ensuring a robust dimensionality reduction and feature selection process. Specifically, we aimed to achieve the following:
1. Analyze vaccination rates and COVID-19 mortality rates at the county level.
2. Explore the relationship between these rates and various socio-economic variables such as median household income, education levels, racial composition, and political affiliation.
3. Identify distinct clusters of counties that share similar socio-economic profiles and vaccination responses.
4. Provide insights and recommendations for public health agencies to improve vaccination uptake and manage pandemic responses more effectively.
The clustering of the demographic data allowed us to identify distinct groups of U.S. counties that share similar socio-economic and demographic profiles. By doing so, we can understand how different socio-economic factors influence COVID-19 vaccine hesitancy across various regions. This approach helps in recognizing patterns and disparities that may not be evident when analyzing data on an individual basis. In addition, the criteria for selecting variables were based on their relevance to vaccine hesitancy and their availability at the county level. Key variables included socio-economic factors (e.g., median household income and higher education levels), demographic factors (e.g., racial composition and age distribution), health-related factors (e.g., vaccine hesitancy rates and COVID-19 case rates), and political affiliation (e.g., percentage of Republican voters). These variables were chosen because they are known to influence health behaviors and vaccination uptake, as documented in previous research. The selected variables provided a comprehensive overview of the factors that potentially affect vaccine hesitancy. By including socio-economic, demographic, health-related, and political factors, we aimed to capture the multi-faceted nature of vaccine hesitancy. Some variables were excluded due to a lack of reliable data at the county level or because they were not directly relevant to the focus of our study. For example, variables like federal education investment were excluded due to high levels of missing data, and variables like ICU bed occupancy were excluded because they were not directly linked to vaccine hesitancy.
This study is unique in its comprehensive approach to clustering U.S. counties based on a wide range of socio-economic, demographic, health-related, and political variables. While previous studies have focused on individual factors influencing vaccine hesitancy, our study integrated multiple variables to provide a more holistic view. Additionally, the use of advanced clustering techniques like K-Means and hierarchical clustering sets our study apart by identifying distinct county clusters that share similar profiles.
In the following section, we highlight the relevant academic research conducted on this subject to date and its implications for informing future pandemic decision-making.
2. Literature Review
During the COVID-19 pandemic peak years of 2020–2021, the United States followed a broad, conservative approach to contain the pandemic that focused strongly on preventing transmission [8]. This was logical, given the contagious nature of the virus; however, there might have been underlying systemic differences putting certain groups of society at a bigger disadvantage than others [6, 13]. Vaccine hesitancy, even to date, has been cited as a barrier to the effective control of COVID-19; therefore, it is imperative to understand its root causes to better handle future unforeseen circumstances. Previous geography-specific analyses on the spread of the pandemic were mostly limited to the number of “cases”, “deaths”, and “vaccines” [14, 15, 16, 17]. This was due to the unavailability of the data [18] and the limitations of those data that do exist [6, 19, 20]. While current approaches are informative in nature, they lack the prescriptive aspect of data analysis and fail to provide recommendations on what actions should be taken to promote vaccination.
Clustering analysis has been increasingly used in recent studies to explore various dimensions of the COVID-19 pandemic. For instance, Otto et al. utilized advanced clustering techniques to examine mobility data and its correlation with virus transmission rates across Europe, highlighting how mobility patterns could predict outbreak severity [21]. The integration of socio-economic data into clustering analyses has also been a critical development, revealing disparities in health outcomes. A study by Paul et al. combined health data with economic indicators using a multi-layered clustering approach, identifying high-risk areas and suggesting targeted interventions [22].
In the United States, studies have applied K-Means and hierarchical clustering to better understand the socio-economic drivers behind the pandemic’s impact. For example, Callaghan et al. and Fridman et al. demonstrated that political beliefs and party affiliations significantly impact individuals’ willingness to get vaccinated against COVID-19. Areas with higher proportions of Republican voters tend to have lower vaccination rates, likely due to differing attitudes towards the pandemic and vaccination policies [23, 24].
Clustering techniques have also been used to analyze spatial and temporal patterns of disease spread, healthcare access, and vaccination behaviors. Singh et al. utilized hierarchical clustering to identify regions with similar healthcare access and outcomes, providing insights for targeted healthcare interventions [25]. Wang et al. used K-Means clustering to categorize countries based on their COVID-19 response strategies, revealing distinct clusters with varying levels of success in controlling the virus [26].
Combining socio-economic variables with vaccination data in clustering analysis provides a comprehensive view of the factors influencing vaccine uptake. Biswas et al. conducted a clustering analysis integrating socio-economic, demographic, and health-related variables to study the determinants of COVID-19 vaccine acceptance in India, highlighting the importance of considering multiple dimensions to understand vaccine hesitancy comprehensively [27]. Similarly, Barber and Kim integrated socio-economic and behavioral data in their clustering analysis of influenza vaccination rates, identifying distinct clusters with unique vaccination behaviors and socio-economic profiles [28]. These studies underscore the utility of clustering analysis in understanding and managing pandemics by linking data-driven insights with practical policy applications. They also highlight the potential for these techniques to facilitate better preparedness and response strategies for future global health crises.
While significant progress has been made in understanding the determinants of COVID-19 vaccination uptake, gaps remain in the literature. Many studies have focused on individual factors, such as income, education, or political affiliation, without considering the combined impact of these variables. Additionally, there is limited research on the application of advanced clustering techniques to analyze vaccination behaviors at the county level in the United States. This study aims to fill these gaps by employing hierarchical and K-Means clustering to analyze the combined impact of multiple socio-economic variables on COVID-19 vaccination rates at the U.S. county level. We also employ multiple regression analysis within each cluster to evaluate the impact of various socio-economic variables on vaccine hesitancy for the corresponding counties, with the goal of showing how targeted policymaking can help address such issues during a pandemic. In the following sections, the data sources and methodologies to accomplish this goal are provided in detail.
3. Materials and Methods
This section outlines the structured approach undertaken to gather, clean, analyze, and visualize the data used in our clustering analysis of COVID-19 responses at the county level in the United States. The methodology is divided into two main areas: (1) data collection and cleaning and (2) computation and analysis.
3.1. Data Collection and Cleaning
The data utilized for this study were obtained from a study by Dong et al. [29] at Johns Hopkins University, where various healthcare-related and socio-economic variables were selected to ensure a robust and comprehensive dataset. The data were sourced from various reputable agencies including the Economic Research Service (ERS) at the United States Department of Agriculture (USDA) [30], the Census Bureau [31], Johns Hopkins University (JHU)’s COVID-19 research center [32], and Harvard University’s dataverse [33].
Table 1 summarizes the variables used in this study. To ensure uniformity, variables were codified into concise labels. The codification of each variable follows the XX-YY pattern, where XX is a two-letter abbreviation for the category of the variable and YY is a two-letter abbreviation for the name of the variable itself. For instance, the median household income was labeled as SO-MI, with SO representing the socio-economic category and MI representing median household income.
The obtained data account for 3063 counties (from the total of 3142 counties) in the United States. There were a total of 79 counties not included in this study due to the unavailability of some data among the variables used in this study. The missing counties are mostly located in Nebraska, South Dakota, and Iowa in the United States.
It is important to note that vaccine hesitancy (HE-VH) data were obtained from the work by Tiu et al. [34], who leveraged the COVID-19 vaccination tracking dashboard maintained by Georgetown University [35]. The vaccine hesitancy metric in the collected data was based on the estimates provided by the U.S. Department of Health & Human Services [36] by means of surveying the population within the United States. Additionally, COVID-19 case rates (HE-CD, HE-CS) refer to cumulative death cases due to COVID-19 per 100,000 people as of December 2021 and September 2021, respectively. Both timeframes were included because of the COVID-19 vaccine approval in November 2021 in the United States; a measure of the COVID-19 rate before and after vaccine approval was therefore provided for comparison purposes. However, due to high correlation between the two, only one of the COVID-19 case rate variables (HE-CD) was used as a predictor variable in the regression analysis to avoid multicollinearity.
Finally, a preliminary analysis of the distribution of the variables used in this study was performed by means of density plots to better comprehend the variations compared to one another. Density plots in Appendix A.1 illustrate this notion further. It is important to note that these figures have been augmented with the cluster categories obtained from the K-Means model to better assist with the comparison.
Subsequently, we screened the data for completeness and accuracy, removing any variables with more than 50% missing values to maintain the quality of our analysis. Non-numeric and irrelevant columns were excluded, and missing values were addressed by removing rows with incomplete data. These data cleaning and pre-processing steps provided a solid foundation for the subsequent clustering and analytical procedures.
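The screening steps described above can be sketched in a few lines. This is only an illustrative sketch in plain Python, not the code used in the study: the function name `clean_rows`, the toy records, and the column label `XX-ZZ` are hypothetical, and missing values are assumed to be encoded as `None`.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def clean_rows(rows: List[Row], max_missing: float = 0.5) -> List[Row]:
    """Drop sparse columns, then drop rows that still contain gaps."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    n = len(rows)
    # Keep a column only if its share of missing values is at most max_missing.
    kept = [
        c for c in columns
        if sum(r.get(c) is None for r in rows) / n <= max_missing
    ]
    # Listwise deletion: keep rows with no missing value in the kept columns.
    return [
        {c: r[c] for c in kept}
        for r in rows
        if all(r.get(c) is not None for c in kept)
    ]


# Hypothetical toy records; "XX-ZZ" is an invented column that is 75% missing.
toy = [
    {"SO-MI": 52000.0, "HE-VH": 0.21, "XX-ZZ": None},
    {"SO-MI": 61000.0, "HE-VH": None, "XX-ZZ": None},
    {"SO-MI": 48000.0, "HE-VH": 0.30, "XX-ZZ": 1.0},
    {"SO-MI": 75000.0, "HE-VH": 0.12, "XX-ZZ": None},
]
cleaned = clean_rows(toy)
```

In this toy example the sparse column is removed first, so only the single record with a missing `HE-VH` value is lost in the row-wise step. Applying the two steps in the opposite order would discard more records.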
Next, we examined the correlation matrix to understand the relationships between different variables. The correlation plot in Figure 1 revealed several notable associations. For instance, vaccine hesitancy rates in December 2021 (HE-VH) were negatively correlated with the percentage of individuals with higher education (SO-HE). Conversely, political affiliation (percentage of Republican voters (PO-RE)) showed a positive correlation with vaccine hesitancy (HE-VH), aligning with previous findings that political beliefs significantly impact vaccination behaviors.
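As a reminder of what each cell of such a correlation matrix contains, the sketch below computes a Pearson coefficient by hand. The text does not state which correlation coefficient was used, so Pearson is an assumption here, and the three short lists are made-up values that merely mimic the direction of the reported associations.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, NOT taken from the study's dataset.
hesitancy = [0.30, 0.25, 0.20, 0.15, 0.10]          # stands in for HE-VH
higher_education = [20.0, 30.0, 35.0, 50.0, 60.0]   # stands in for SO-HE
republican_share = [70.0, 62.0, 55.0, 41.0, 35.0]   # stands in for PO-RE

r_edu = pearson(hesitancy, higher_education)   # negative for these toy values
r_rep = pearson(hesitancy, republican_share)   # positive for these toy values
```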
In addition to the preliminary and exploratory analyses of the data, various computational methods were employed to better understand and interpret the relationship among variables and profiles of the population. Table 2 summarizes these methodologies and the applicable uses/domains they were applied to in this work. More in-depth explanations and applications are provided in the subsequent sections.
Table 1. Description of variables used in the analysis.
Code | Category | Variable Name | Description |
--- | --- | --- | --- |
DE-FI | Demographic | FIPS | County-level FIPS code [30] |
DE-PO | Demographic | Population 2020 | Population as of 2020 [31] |
DE-MA | Demographic | Median Age | Median age of the population [37] |
DE-BP | Demographic | Black Population (%) | % of the population that is Black alone [38] |
DE-HP | Demographic | Hispanic Population (%) | % of the population that is Hispanic or Latino [39] |
SO-HE | Socio-economic | Higher Education (%) | % of adults with a high school diploma only, 2015–2019 [30] |
SO-MI | Socio-economic | Median Household Income | Median household income [40] |
SO-VH | Socio-economic | Vehicles per Household | Average number of vehicles per household [41] |
HE-MV | Health | MMR Vaccination Coverage | Measles, mumps, and rubella (MMR) vaccination coverage (0–1) [42] |
HE-VH | Health | Vaccine Hesitancy December 2021 | % of the population hesitant to vaccinate as of 15 December 2021 [35,36] |
HE-VS | Health | Vaccine Hesitancy September 2021 | % of the population hesitant to vaccinate as of September 2021 [35] |
HE-CD | Health | Case Rate December 2021 | Cumulative COVID-19 death cases per 100,000 people as of December 2021 [35] |
HE-CS | Health | Case Rate September 2021 | Cumulative COVID-19 death cases per 100,000 people as of September 2021 [35] |
HE-WI | Health | Without Insurance (%) | % of people who reported not having health insurance [43] |
PO-RE | Political | Republican (%) | % of Republican voters during the 2020 presidential election [44] |
These initial findings provided an overview of the socio-economic factors influencing vaccination rates and highlighted the complex interplay between these variables. The correlation analysis informed a more detailed clustering analysis, enabling us to categorize counties into distinct groups with similar characteristics and vaccination responses.
In the following sections, additional statistical methods are employed to set the stage for a more comprehensive clustering analysis.
3.2. Computation and Analysis
To ensure a robust clustering analysis, we scaled the data to normalize the variables. This step was crucial for mitigating the impact of different units and scales, ensuring that each variable contributed equally to the clustering process. Next, we performed Principal Component Analysis (PCA) to reduce the dimensionality of the dataset. The cumulative explained variance plot (Figure 2a) was used to determine the optimal number of principal components. This plot illustrates the cumulative proportion of variance explained by each additional principal component. We observed that the first eight principal components accounted for 90% of the total variance in the data, which informed our decision to retain these eight components for subsequent clustering analysis.
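The scaling and component-selection logic can be illustrated as follows. This is a minimal sketch that assumes z-score standardization (the text says only that the data were scaled) and uses NumPy. The function names and the random toy matrix are hypothetical and unrelated to the study's data.

```python
import numpy as np


def cumulative_explained_variance(X: np.ndarray) -> np.ndarray:
    """Standardize columns, then return cumulative explained-variance ratios."""
    # z-score scaling so that every variable has mean 0 and unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # eigenvalues of the covariance matrix are the principal-component variances
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    return np.cumsum(eigvals) / eigvals.sum()


def components_for(X: np.ndarray, threshold: float = 0.90) -> int:
    """Smallest number of components whose cumulative ratio reaches threshold."""
    cum = cumulative_explained_variance(X)
    return int(min(np.searchsorted(cum, threshold) + 1, len(cum)))


# Hypothetical toy matrix: three independent columns plus one redundant column.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
toy = np.column_stack([base, base[:, 0] * 2.0 + 1.0])

cum = cumulative_explained_variance(toy)
k = components_for(toy)  # at most 3, because the fourth column adds no variance
```

The same rule applied to the study's scaled variables would correspond to reading the 90% line off the cumulative explained variance plot.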
To identify the optimal number of clusters, we employed the silhouette method. The silhouette plot (Figure 2b) evaluates the separation distance between the resulting clusters, helping to determine the number of clusters that best fit the data. The silhouette analysis indicated that five clusters provided the best separation, as evidenced by the highest silhouette score at this point.
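For readers unfamiliar with the silhouette score, the sketch below spells out its definition: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the score is (b - a) / max(a, b), averaged over all points. Euclidean distance and the toy points are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient using Euclidean distance."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores: List[float] = []
    for i, p in enumerate(points):
        # mean distance from p to the members of each cluster, excluding p itself
        mean_dist: Dict[int, float] = {}
        for c in clusters:
            members = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(math.dist(p, q) for q in members) / len(members)
        own = labels[i]
        if own not in mean_dist:  # singleton cluster: score is defined as 0
            scores.append(0.0)
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy points forming two well-separated groups.
toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = silhouette_score(toy, [0, 0, 0, 1, 1, 1])  # labels match the two groups
bad = silhouette_score(toy, [0, 1, 0, 1, 0, 1])   # labels cut across the groups
```

Selecting the number of clusters then amounts to repeating the clustering for several candidate values and keeping the one with the highest mean score.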
Further details on clustering analysis, its interpretation, and implication for policymakers are provided in the subsequent sections.
4. Results
Both K-Means and hierarchical clustering were conducted on the processed and scaled data. The results from K-Means clustering are summarized in Table 3. In this table, the mean of each variable used in this study is calculated for each cluster produced by the K-Means algorithm, so that values of the same variable can be compared across clusters. As the table illustrates, for some variables such as SO-VH and HE-MV (except in Cluster 4), there are no significant differences across clusters: regardless of the cluster to which a county belongs, the mean of these variables does not vary substantially. On the other hand, variables such as PO-RE, HE-WI, SO-HE, DE-BP, and DE-HP differ significantly across clusters, which illustrates the need to evaluate further how targeted health campaigns should differ when addressing counties in different clusters.
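The two steps behind such a table, assigning counties to clusters and then averaging each variable within a cluster, can be sketched as below. This is a bare-bones version of Lloyd's algorithm with caller-supplied starting centroids, an assumption made to keep the example deterministic. Library implementations typically use randomized initialization, and the toy points are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], init: Sequence[Point], max_iter: int = 100) -> List[int]:
    """Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [tuple(c) for c in init]
    labels: List[int] = []
    for _ in range(max_iter):
        # assignment step: nearest centroid by Euclidean distance
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_means(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, Point]:
    """Per-cluster mean of every variable (one row of a cluster-profile table)."""
    means: Dict[int, Point] = {}
    for k in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == k]
        means[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return means


# Hypothetical, already-scaled toy points forming two obvious groups.
toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = kmeans(toy, init=[toy[0], toy[3]])
profile = cluster_means(toy, labels)
```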
Pairwise association analyses were also performed to better interpret the relationships between pairs of variables. Figure 3 illustrates the pairwise relationship among the main healthcare-related variables and political preference, where counties are separated by different colors corresponding to their cluster numbers. A more detailed visual is provided in Appendix A.2. As can be observed, some clusters such as Clusters 2 and 5 are distinctly separated from one another in most subplots. Additional information regarding the interpretation and insights is provided in Section 5.
Additionally, the dendrogram in Figure 4, obtained from hierarchical clustering using Ward’s method [45], illustrates common socio-economic profiles and vaccination behaviors. This visualization provides insights into the hierarchical relationships and similarities between counties based on the selected socio-economic and health variables. The dendrogram’s branches represent clusters of counties, with the height of each branch indicating the level of dissimilarity between them. The lower branches correspond to more similar clusters, while higher branches indicate greater dissimilarity. The colors differentiate the five distinct clusters identified through the hierarchical clustering process. This hierarchical structure helps to understand the nested grouping of counties, revealing how smaller, more similar clusters combine to form larger, more diverse groups. By examining the dendrogram, one can infer the relative similarity of counties within and between clusters, aiding in the interpretation of socio-economic and health disparities across regions.
In hierarchical clustering, the dendrogram provides a visual representation of the clustering process, illustrating how data points (in this study, counties) are grouped based on their similarities. Each branch in the dendrogram indicates a cluster, with the length of the branch representing the dissimilarity between clusters. The topmost branches in the dendrogram, often shown in a distinct color (such as blue in our case), represent the broadest groupings of data points. These top branches merge clusters at a higher hierarchical level before reaching the specified number of final clusters.
In our analysis, we specified five clusters, and these are visualized by the colored branches (orange, green, red, purple, and blue) in the dendrogram. The different colors help distinguish the distinct clusters formed at the specified level. The top blue branches represent the initial broad groupings, which then split into more refined clusters as we move down the hierarchy. This hierarchical structure allows us to see both the broad and detailed relationships between counties based on their socio-economic and health-related characteristics, providing a comprehensive view of the clustering process.
Following the clustering analysis using K-Means and hierarchical clustering, several metrics were used to evaluate the performance of each model. The metrics and the evaluation results are provided in Appendix A.3, which gives the rationale for proceeding with the K-Means clustering results as the superior model for our analysis. Therefore, all the following illustrations and analyses use the K-Means clustering results.
We were additionally interested in identifying geospatial patterns across the United States based on the clusters formed. Displaying the counties across the five clusters on the U.S. map yielded intriguing results. Figure 5 illustrates the geospatial distribution of counties across the contiguous United States, categorized by their K-Means cluster assignments. Each color on the map corresponds to a different cluster, allowing for a visual comparison of regional patterns in vaccine hesitancy and other socio-economic and health-related variables.
The map reveals distinct regional trends in cluster distribution. For instance, Cluster 5, predominantly located in the southern United States, exhibits higher vaccine hesitancy rates, lower socio-economic status, and higher percentages of minority populations. This suggests that public health interventions in these areas need to address specific socio-economic barriers and cultural factors to improve vaccination rates and health outcomes.
Conversely, clusters in the Northeast and West Coast regions tend to have higher socio-economic status, lower vaccine hesitancy, and better overall health outcomes. These regions may benefit from continued public health education and targeted efforts to maintain high vaccination rates. The geospatial distribution underscores the importance of tailoring public health strategies to regional contexts, considering the unique socio-economic and demographic characteristics that influence health behaviors and outcomes.
Additionally, the heatmap presented in Figure 6 illustrates the scaled values of key socio-economic and health variables across the clusters identified through K-Means clustering. By scaling the variables between 0 and 1, the heatmap provides a normalized view of the data, enabling a straightforward comparison of relative values within each cluster. In this heatmap, darker shades of blue represent higher values, highlighting areas where certain socio-economic or health characteristics are more pronounced. For instance, clusters with darker shades for MMR vaccination coverage (HE-MV) indicate regions with better immunization rates. Conversely, clusters with darker shades for vaccine hesitancy rates in December 2021 (HE-VH) highlight regions with significant resistance to vaccination. The heatmap effectively captures the heterogeneity in socio-economic and health profiles across different clusters. It reveals how certain clusters might have higher median household incomes, lower percentages of uninsured populations, or different racial compositions, which are critical factors influencing public health outcomes. This visualization aids in identifying target areas for intervention and allows policymakers to tailor their strategies to the unique needs of each cluster, ultimately contributing to more effective public health responses.
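The 0 to 1 scaling behind such a heatmap is commonly done with min-max normalization, x' = (x - min) / (max - min), applied to each variable separately. The sketch below assumes that this is the scaling used and that it is applied across the five cluster means of each variable. The numbers are hypothetical and are not the values reported in the study.

```python
from typing import Dict, List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant variable: no spread to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical cluster means (one value per cluster); NOT the study's results.
cluster_profile: Dict[str, List[float]] = {
    "HE-VH": [0.20, 0.35, 0.10, 0.25, 0.40],
    "SO-MI": [52000.0, 61000.0, 88000.0, 47000.0, 43000.0],
}
scaled = {name: min_max_scale(vals) for name, vals in cluster_profile.items()}
```

After scaling, income measured in dollars and hesitancy measured as a proportion share one color scale, which is what makes the shades comparable across rows of the heatmap.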
Multiple Regression Analysis Results
The regression analysis was performed to identify the key socio-economic and demographic factors that significantly influence vaccine hesitancy across different clusters. Understanding these relationships can help policymakers and public health officials design targeted interventions to increase vaccination uptake and manage pandemic responses more effectively.
The regression analysis follows Equation (1) and uses a multiple linear regression model with vaccine hesitancy in December 2021 as the dependent variable. The independent variables are those listed in Table 1, with the exception of the September 2021 health-related variables HE-VS and HE-CS, which were excluded because of their high correlation with the corresponding December variables and to avoid multicollinearity in the multiple regression analysis. Each cluster’s data were analyzed separately to account for the unique socio-economic and demographic characteristics within each group. It is important to note that the multiple regression analysis in this study focuses on within-cluster analysis to better evaluate the relationship among variables in each of the groups of counties.
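A within-cluster regression of this kind can be sketched as an ordinary least squares fit repeated for each cluster label. The snippet below is an illustration under stated assumptions rather than the study's code: it uses NumPy's least-squares solver, the helper names `fit_ols` and `fit_per_cluster` are hypothetical, and the toy data are noiseless synthetic values with a different linear rule in each of two clusters.

```python
from typing import Dict

import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares; returns [intercept, slope_1, ..., slope_p]."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def fit_per_cluster(X: np.ndarray, y: np.ndarray, labels: np.ndarray) -> Dict[int, np.ndarray]:
    """Fit a separate regression on the rows belonging to each cluster."""
    return {
        int(c): fit_ols(X[labels == c], y[labels == c])
        for c in np.unique(labels)
    }


# Hypothetical synthetic data: two predictors, two clusters, no noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
labels = np.repeat([0, 1], 20)
y = np.where(
    labels == 0,
    1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1],   # rule used for cluster 0
    -1.0 + 0.5 * X[:, 0] + 4.0 * X[:, 1],  # rule used for cluster 1
)
coefs = fit_per_cluster(X, y, labels)
```

Because each cluster receives its own coefficient vector, the same predictor can have a different sign or magnitude from one cluster to another, which is the reason for fitting within clusters rather than on the pooled data.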
The regression results provided in Table 4 indicate that different variables significantly influence vaccine hesitancy in each cluster. For instance, in Cluster 1, the MMR vaccination coverage, COVID-19 death rate, and percentage of Republican voters were significant predictors of vaccine hesitancy. Higher MMR vaccination coverage was associated with lower vaccine hesitancy, while higher COVID-19 death rates and a higher percentage of Republican voters were associated with increased vaccine hesitancy.
In Cluster 2, higher education levels and median household income were significant factors reducing vaccine hesitancy. This cluster also showed that racial composition, specifically the percentage of Black population, had a significant positive association with vaccine hesitancy, indicating higher hesitancy rates among these communities. Cluster 3 highlighted the importance of socio-economic status, with median household income and education levels being significant predictors. Political affiliation, again, played a crucial role, with areas having higher Republican affiliations showing increased vaccine hesitancy.
The findings from the regression analysis underscore the multi-faceted nature of vaccine hesitancy. Socio-economic factors such as education and income, along with demographic characteristics like race and political affiliation, play critical roles in shaping vaccination behaviors. Public health strategies must, therefore, be multi-faceted and tailored to address these diverse factors. For instance, interventions in areas with high Republican affiliation might focus on bipartisan messaging to reduce vaccine hesitancy. In contrast, regions with significant minority populations may benefit from community-specific outreach programs that address cultural and historical factors contributing to hesitancy.
Overall, the regression analysis provides valuable insights into the determinants of vaccine hesitancy across different socio-economic contexts. These findings can inform targeted public health interventions, ensuring that efforts to increase vaccine uptake are both effective and equitable.
5. Discussion
The results of our clustering analysis offer several important insights into the socio-economic factors influencing COVID-19 vaccine hesitancy across U.S. counties. By examining the profiles of the identified clusters, we can better understand the complex interplay between vaccination behaviors and various socio-economic variables. This discussion section delves into the implications of our findings, the limitations of the study, and potential directions for future research projects.
5.1. Insights and Implications
The analysis reveals that socio-economic factors play a significant role in shaping vaccine hesitancy behaviors. Counties with higher median household incomes, higher education levels, and more diverse racial compositions generally exhibited lower vaccine hesitancy rates. This underscores the importance of socio-economic stability in facilitating vaccine uptake. Public health initiatives targeting low-income and low-education areas could benefit from addressing these socio-economic barriers to improve vaccination rates. The geographic distribution of the clusters across the contiguous United States reveals significant insights into the socio-political and demographic characteristics influencing COVID-19 vaccine hesitancy rates and case rates.
Cluster 5, predominantly located in the U.S. South, corresponds to counties with high vaccine hesitancy rates (approximately 47.81%) and moderate COVID-19 case rates (approximately 16,424 for December 2021). These counties also exhibit higher Black population percentages (approximately 35.99%) and moderate socio-economic status. This region’s vaccination uptake could be influenced by public health efforts targeted at minority populations and the presence of robust community health programs.
Cluster 3 is mainly found in large urban centers such as Boston, New York City, Miami, Houston, San Francisco, and Los Angeles. Counties in this cluster generally show low vaccine hesitancy rates (approximately 24.43%) and moderate COVID-19 case rates (approximately 13,584 for December 2021). The socio-economic status in these areas is higher, with significant percentages of higher education (approximately 66.49%) and median household incomes. The lower Republican affiliation in these counties (approximately 34.83%) might suggest a correlation between political ideology and lower vaccine hesitancy, which has been observed in various studies. The presence of higher Black populations (approximately 15.10%) in these areas also indicates that socio-economic disparities and access to healthcare could play a significant role in vaccination and case rates.
Clusters 1 and 2 are primarily situated in traditionally Democratic and liberal areas, including parts of the Northeast, West Coast, and some urban centers. Cluster 1, with moderate vaccine hesitancy rates (approximately 45.75%) and case rates, is characterized by balanced racial compositions and moderate socio-economic status. Cluster 2, also found in Democratic regions, shows slightly lower vaccine hesitancy rates (approximately 44.96%) and case rates but stands out with a higher socio-economic status, including significant percentages of higher education (approximately 49.13%) and median household incomes.
These clusters underscore the importance of socio-economic factors and political climates in shaping public health outcomes. The higher socio-economic status in these areas likely contributes to better access to healthcare resources and public health information, influencing vaccination behaviors and outcomes.
Overall, the spatial analysis of the clusters highlights the complex interplay between political affiliation, socio-economic status, and public health responses to the COVID-19 pandemic. The distinct regional patterns suggest that targeted public health strategies considering local socio-political and economic contexts are crucial for improving vaccination uptake and managing COVID-19 case rates effectively.
Our analysis highlights the significant role that socio-economic factors play in shaping vaccine hesitancy. The identified clusters reveal distinct patterns, with higher socio-economic status generally correlating with lower vaccine hesitancy. For instance, counties in Cluster 2, characterized by higher education levels and median household incomes, exhibit lower vaccine hesitancy rates. Conversely, Cluster 5, with the lowest socio-economic status, shows the highest vaccine hesitancy rates. This suggests that improving socio-economic conditions could be a key strategy in reducing vaccine hesitancy.
The geographical distribution of clusters also underscores regional disparities. Clusters with higher vaccine hesitancy rates are often found in areas with higher proportions of Republican voters, suggesting that political affiliation is strongly associated with vaccination behaviors. Public health campaigns must consider these political and regional differences to design more effective interventions.
5.1.1. Racial and Ethnic Disparities
Our study highlights substantial racial and ethnic disparities in vaccine hesitancy rates. Clusters with higher percentages of Black and Hispanic populations showed varying vaccination behaviors. For example, Cluster 3, characterized by a significant Black population (15.10%) and higher socio-economic status, exhibited the lowest vaccine hesitancy rates (24.43%) but also faced moderate COVID-19 case rates. This indicates that, while vaccine hesitancy might be lower in some demographics, other factors such as healthcare access and underlying health conditions may still pose challenges. Tailored public health messages and interventions that address specific community needs are essential.
5.1.2. Political Affiliation
Political affiliation emerged as a crucial factor influencing vaccine hesitancy rates. Counties with a higher percentage of Republican voters tended to have higher vaccine hesitancy rates and higher COVID-19 case rates. For instance, Cluster 1 and Cluster 2, with significant Republican affiliations (68.13% and 66.45%, respectively), exhibited moderate to high vaccine hesitancy rates (45.75% and 44.96%). This finding aligns with previous research indicating that political beliefs significantly impact health behaviors and attitudes towards vaccination. Public health campaigns need to consider these political disparities and strive to develop bipartisan messaging that can resonate across different political landscapes.
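The association described above can be checked with a simple correlation between Republican vote share and vaccine hesitancy. The following is a minimal sketch only: the five value pairs are hypothetical placeholders, not the study's data, and `pearson_r` is an illustrative helper rather than the method used in this work. A correlation computed on aggregated units describes counties or clusters, not individual voters.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for illustration only (not taken from the study).
republican_pct = [30.0, 45.0, 55.0, 65.0, 70.0]
hesitancy_pct = [24.0, 33.0, 40.0, 45.0, 47.0]

r = pearson_r(republican_pct, hesitancy_pct)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would be consistent with the pattern reported here, although it would not by itself establish that political affiliation causes hesitancy.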
5.1.3. Clusters and Public Health Interventions
The distinct profiles of the identified clusters provide valuable information for designing targeted public health interventions. For instance, Cluster 1, which exhibits moderate vaccine hesitancy rates (45.75%) and a relatively high socio-economic status, can serve as a model for best practices in addressing vaccine hesitancy through educational campaigns and improving access to vaccines. In contrast, Cluster 5, with the highest vaccine hesitancy rates (47.05%) and lowest socio-economic status, requires urgent and targeted interventions. Public health strategies in these areas should focus on addressing socio-economic barriers and enhancing community engagement to improve vaccine uptake.
Overall, the analysis emphasizes the need for tailored public health strategies that consider the socio-economic and political context of different regions. By understanding the unique characteristics of each cluster, policymakers can implement more effective interventions to reduce vaccine hesitancy and manage public health responses more efficiently.
5.1.4. Final Thoughts
A nationwide U.S. cluster analysis is necessary in the post-pandemic era to identify regions that are at risk for low vaccine uptake in future vaccination campaigns. Understanding the socio-economic and demographic factors that contributed to vaccine hesitancy during the COVID-19 pandemic can help public health officials design targeted interventions for future health crises. By recognizing these patterns, we can develop strategies to address vaccine hesitancy more effectively and ensure higher vaccination rates across the country.
The findings from our study provide valuable insights into the socio-economic and demographic factors that influence vaccine hesitancy. These insights can be used to inform public health policies and vaccination campaigns in the post-pandemic era. For example, areas identified as high-risk for vaccine hesitancy can be targeted with tailored public health messages and resources to improve vaccine uptake. Additionally, understanding the impact of political affiliation on vaccination behaviors can help in developing bipartisan public health campaigns that resonate across different political landscapes.
The findings from this work may also generalize to other vaccination situations if the same clustering techniques and variable selection criteria are applied. The socio-economic and demographic factors that influence vaccine hesitancy are likely to be relevant for other vaccines as well. By using a similar approach, public health officials can identify regions at risk for low uptake of other vaccines and design targeted interventions to improve vaccination rates. This generalizability makes our study a valuable tool for addressing vaccine hesitancy beyond the COVID-19 pandemic.
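To illustrate how such a workflow might be reused for another vaccine, the sketch below standardizes a small feature table and groups the rows with a plain k-means procedure. It rests on several assumptions: k-means is used here only as a stand-in, since this section does not name the clustering algorithm; the feature columns (median income in thousands of dollars, percentage with higher education, percentage hesitant) and all values are hypothetical; and the deterministic seeding with the first k rows is a simplification for reproducibility.

```python
import math

def standardize(rows):
    """Z-score each column so that no single variable dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm, seeded with the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Hypothetical county rows: [median income ($1000s), % higher education, % hesitant]
county_features = [
    [72.0, 60.0, 22.0],
    [38.0, 25.0, 48.0],
    [68.0, 55.0, 26.0],
    [41.0, 28.0, 45.0],
    [75.0, 63.0, 20.0],
    [36.0, 22.0, 50.0],
]

labels = kmeans(standardize(county_features), k=2)
print(labels)
```

In practice the number of clusters and the variables included would need to be re-validated for each new vaccine rather than carried over unchanged.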
5.2. Limitations
5.2.1. Data Limitations
While the study leverages comprehensive data from multiple sources, there are inherent limitations. The accuracy and completeness of the data can vary, and missing values were handled by removing incomplete rows, which could introduce bias. Additionally, the data are static and reflect a specific time period; therefore, temporal changes in vaccine hesitancy rates and socio-economic factors are not captured. Future research could benefit from longitudinal data to observe how these factors evolve over time.
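One way to gauge whether removing incomplete rows introduces bias is to compare the retained and dropped rows on a variable that is observed for both. The sketch below is illustrative only: the records, field names, and values are hypothetical, and `split_complete` and `mean_observed` are helper names introduced for this example.

```python
from statistics import mean

# Hypothetical county records; None marks a missing value.
counties = [
    {"fips": "00001", "median_income": 72000, "hesitancy_pct": 25.0},
    {"fips": "00002", "median_income": 41000, "hesitancy_pct": None},
    {"fips": "00003", "median_income": 65000, "hesitancy_pct": 30.0},
    {"fips": "00004", "median_income": 38000, "hesitancy_pct": None},
    {"fips": "00005", "median_income": 55000, "hesitancy_pct": 40.0},
]

def split_complete(rows):
    """Separate rows with no missing values from rows with at least one."""
    kept = [r for r in rows if all(v is not None for v in r.values())]
    dropped = [r for r in rows if any(v is None for v in r.values())]
    return kept, dropped

def mean_observed(rows, key):
    """Mean of a field over the rows where it is present, or None if absent everywhere."""
    values = [r[key] for r in rows if r[key] is not None]
    return mean(values) if values else None

kept, dropped = split_complete(counties)
income_gap = mean_observed(kept, "median_income") - mean_observed(dropped, "median_income")
print(f"kept={len(kept)}, dropped={len(dropped)}, income gap={income_gap}")
```

A large gap, as in this made-up example, would suggest that the dropped counties differ systematically from the retained ones, in which case imputation or a sensitivity analysis might be preferable to deletion.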
5.2.2. Generalizability
The findings are specific to the U.S. context and may not be directly applicable to other countries with different socio-economic and political landscapes. Comparative studies across countries could provide broader insights into global vaccination behaviors. Understanding how different health policies and cultural contexts affect vaccine hesitancy would help generalize the findings and inform international public health strategies.