1. Introduction
In a world grappling with pressing environmental challenges, the issue of environmental injustice has garnered heightened attention. Environmental injustice, often manifested as disproportionate exposure to environmental hazards or lack of access to ecosystem services for socially vulnerable communities [1,2], underscores the urgent need to identify and rectify disparities in access to a clean and healthy environment. However, despite ongoing efforts to address environmental injustices, biased institutions and historic exclusionary practices continue to perpetuate a climate of environmental inequity across the globe [1,2,3,4]. Moreover, conventional tools for detecting these injustices exhibit limitations, falling short of providing a comprehensive and timely understanding of disparities. These challenges stem from a reliance on traditional data sources and methodologies, which may fail to capture the intricate spatial and social dynamics that contribute to environmental inequalities in a timely manner [5]. In the United States, federal mapping tools like EJScreen [6] and state tools like New Jersey’s EJMAP [7] draw on infrequently collected Census and environmental datasets to assess current environmental injustice levels. By relying on outdated data, these tools risk leading policymakers to misallocate program resources that could otherwise be directed to mitigate emerging inequities.
As an example, the city of Newark, New Jersey is well known as a community of color experiencing environmental injustice, due in large part to historical discrimination and the prevalence of hazardous facilities [8]. The data required to identify this city as one experiencing environmental injustice are generated by state and federal entities that monitor known hazards. However, if a new hazard were to arise, community members would be required to attract the attention of these governing entities before action would be taken. In the case of Newark, lead contamination in drinking water was emerging as a concern for residents, but remediation actions were not taken until enough data were collected and shared to clearly identify the problem [9]. At the same time, data would not be collected by state or federal agencies until there was a clear reason to investigate. Even once data collection has begun, testing for lead contamination is a notoriously resource-intensive and geographically narrow process [10], leading to a slow response that prolongs exposure. As a result, a reliance on traditional static measures for environmental injustice creates a false sense of security at a high level, despite the growing presence of an environmental hazard. This study aims to address these shortcomings by investigating the integration of remote sensing and social sensing big data, allowing for the creation of more relevant, farther-reaching environmental injustice detection tools.
In seeking to develop a tool centered around environmental justice, we must first establish a foundational understanding of the subject. In practice, environmental justice is often measured by the levels of injustice present, defined by a disproportionate exposure to environmental burdens or hindrance of access to ecosystem services for socially vulnerable populations [11,12,13,14,15]. Social vulnerability refers to the negative impact sociocultural identity, socioeconomic status, and physical capabilities can have on an individual or community’s ability to respond to perturbations, often as a result of historic marginalization [16]. When considering the economic, social, and political resources individuals rely on when preparing for and responding to catastrophic events such as natural disasters, it is clear that those who historically received fewer resources or were actively encumbered are at a further disadvantage from the outset [16,17,18]. This phenomenon and environmental justice broadly are relevant at both the individual and community level [16,19].
To identify and measure environmental injustice, researchers traditionally draw upon existing datasets collected by outside sources or set out to collect new data. On the one hand, relying on existing datasets provides a larger quantity of information collected over a greater area at a lower cost, often at the expense of temporal relevance [20,21,22,23,24]. On the other hand, producing new datasets allows researchers to control the frequency, methodology, and overarching intention of data collection, provided they have the resources required to commit to collection efforts. Collecting new datasets also offers an additional opportunity to explore more complex environmental justice topics, such as recognition and representation justice [2,13,25,26,27]. These data collection limitations are universal in research, but are particularly salient in the realm of environmental justice, as delayed identification of injustices may result in prolonged suffering.
In this context, the field of remote sensing has emerged as a powerful ally in environmental justice research. The ability of remote sensing technology to capture high-resolution, spatially explicit data in near-real time has significantly enhanced our capacity to map environmental hazards and resources and assess their distribution across landscapes. This technology has bolstered the identification of potential hotspots of environmental injustice, shedding light on areas burdened by air pollution [28,29], natural disasters [30], or compromised ecosystem services [3], particularly through measures like the normalized difference vegetation index (NDVI) [21,27,31,32]. This application is well conveyed in a review by Kshetri et al. [33], examining four academic publications that leveraged remote sensing imagery to measure deforestation in the Global South. In all four cases, the authors used satellite imagery to trace deforestation rates over time and compared results to estimates made by the entities responsible for the damage. Each case study showed that vegetation loss was greater than what had been reported, leading to legal cases against the responsible party which ultimately resulted in reparations. In this way, remote sensing played a pivotal role in facilitating environmental justice. Similarly, Kolosna and Spurlock [3] used remote sensing imagery to compare the distribution of urban tree cover and socially vulnerable communities, with particular attention to socio-political decision-making mechanisms. Results from this analysis showed a clear inequitable distribution of urban tree cover that, despite the prevalence of tree maintenance ordinances, was not acknowledged by local policy mechanisms.
However, while remote sensing has proven invaluable, existing approaches often lack the depth necessary to fully capture the complexities of environmental injustice. The conventional reliance on physical measurements and surface-level analyses risks overlooking the human experiences and perceptions of affected communities, inadvertently neglecting the socio-economic factors that intertwine with environmental disparities. It is against this backdrop that the potential integration of remote sensing imagery and big data such as social media assumes a novel and promising role. By integrating the spatial insights provided by remote sensing with the experiential narratives shared on social media platforms, an unprecedented opportunity arises to bridge the gap between objective environmental data and community-driven perspectives.
This emergence of social media platforms, particularly as a data source for researchers, is a relatively recent phenomenon which has created a new hub for socially sensed big data [34,35]. This term, big data, refers to large quantities of information that are often produced quickly and consistently over large geographic extents [36]. While still in its early stages of development, scholars have begun to refer to big data like social media as a new form of remote sensing data, labeled social sensing data, with individuals serving as the ultimate “sensors” [37,38,39]. These socially sensed datasets have been utilized for environmental justice investigations before, drawing from pictures posted online to social media [22], locations associated with cellphone use [36], and general social media application activity [40,41,42]. Authors Xu, Jiang, Li, Zhang, Zhao, Abbar, and González [36], for example, analyzed exposure to particulate matter by incorporating cellphone data into traditional pollution modelling strategies to bolster analytical capabilities. In this case, location information allowed the authors to infer individual home, work, and commute information, in turn providing a foundation on which to build a model for vehicle usage and subsequent particulate matter emissions.
Independently, remotely sensed information such as satellite imagery and socially sensed data can be used to establish a foundation for environmental justice investigations. However, the amalgamation of these datasets offers the potential to not only pinpoint environmental factors, but also to capture the human dimensions of vulnerability and resilience. This integrative approach, while a departure from traditional methods, carries the potential to inform a more comprehensive and robust understanding of environmental injustice in near-real time if necessary. By leveraging this interaction, this approach seeks to open the door to a future in which we may tap into the collective wisdom of communities, amplify their voices, and uncover hidden dimensions of environmental disparities that could reshape our strategies for equitable policy formulation and advocacy.
Previous research has leveraged the synergistic relationship between remote sensing imagery and socially sensed big data. Wang et al., for example, used the interactions between imagery and socially sensed data to interpolate missing information in their poverty analysis [43]. However, to the best of our knowledge, this intersection is yet to be adequately explored in the field of environmental justice. In recognition of this untapped potential, this study aims to discern the feasibility, opportunities, and implications of applying remote sensing imagery and social media data together in an environmental injustice investigation. More specifically, we seek to answer the following research questions: (1) Is there a discernible relationship between environmental justice factors inferred from remotely sensed earth observations, traditional governmental sources, and social sensors? (2) How can we best model this relationship using Twitter data, social vulnerability measures, and environmental factors? (3) What are the potential consequences of leaning on these remote sensing datasets to draw conclusions about environmental justice?
We hypothesize that a lack of ecosystem services and prevalence of socially vulnerable populations and environmental hazards will have a positive relationship with Tweets using environmental justice terms, demonstrating an awareness of injustices. To test this hypothesis, we conduct our analysis in two parts. First, we conduct a broader analysis, focused on building a model in which socially sensed environmental justice can be compared to environmental factors, including data drawn from remotely sensed imagery, and social vulnerability. We analyze the geographic relationship between the frequency of Tweets with environmental justice terms and environmental justice factors at the Block Group and Tract level using ordinary least squares (OLS) regressions and spatial autoregressive (SAR) models. Independent variables for blue space and green space are drawn from Landsat-8 imagery following an index selection process, allowing us to select the measure which best fits the model. We calculate our dependent variables as the frequency with which geolocated Tweets with environmental justice words appeared within each Block Group and Tract. These final Tweet counts are weighted by area and normalized using the inverse normal transformation (INT) technique [44].
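The area weighting and normalization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a rank-based inverse normal transformation with the Blom offset (c = 3/8), which is one common variant and may differ from the one in [44], and the tract counts and areas are hypothetical values.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def inverse_normal_transform(values, c=3 / 8):
    """Rank-based INT: map each rank r to the normal quantile of (r - c) / (n - 2c + 1).

    The offset c = 3/8 (Blom) is an assumption made for this sketch.
    """
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]


# Hypothetical tract-level Tweet counts and tract areas (km^2), for illustration only.
tweet_counts = [0, 0, 3, 12, 1, 7]
tract_area_km2 = [2.0, 5.5, 1.2, 0.8, 9.0, 1.5]

# Area weighting: Tweets per square kilometre, so large tracts are not favoured.
density = [count / area for count, area in zip(tweet_counts, tract_area_km2)]
int_scores = inverse_normal_transform(density)
```

Because the transformation depends only on ranks, tracts with zero Tweets all receive the same tied score. This is consistent with the later observation that a large share of zero-count units can still distort a model after transformation.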
Second, we further characterize the relationship between socially sensed big data and environmental factors by investigating Tweets during the state’s largest wildfire of the year. MODIS imagery is used to define the extent of wildfire smoke exposure within cloud structures and compare Tweets falling within its boundary to those outside. Air quality data and local news reports are then used to corroborate the extent and impact of smoke exposure. Finally, Tweets are reviewed for all three days of the fire to determine if, and to what extent, the topic of air quality or smoke is discussed each day.
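Separating Tweets inside the smoke boundary from those outside it amounts to a point-in-polygon test. The sketch below uses a plain ray-casting routine as an illustration. The boundary vertices and Tweet coordinates are hypothetical, longitude and latitude are treated as planar coordinates for simplicity, and a GIS package would normally perform this step.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    `polygon` is a list of (lon, lat) vertices in order; the last vertex
    connects back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed by
        # a horizontal ray cast from the point toward the east.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical digitized smoke boundary and geolocated Tweets.
smoke_boundary = [(-74.6, 39.8), (-74.0, 39.8), (-73.9, 41.0), (-74.5, 41.0)]
tweets = [
    {"id": 1, "lon": -74.2, "lat": 40.7},
    {"id": 2, "lon": -75.1, "lat": 40.2},
    {"id": 3, "lon": -74.3, "lat": 41.3},
]

inside_ids = [
    t["id"] for t in tweets if point_in_polygon(t["lon"], t["lat"], smoke_boundary)
]
outside_ids = [t["id"] for t in tweets if t["id"] not in inside_ids]
```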
The remainder of this article is broken into the following parts: Section 2 outlines the data sources, methodologies, and the overall research process used in this study. Section 3 outlines the statistical results of the analyses before Section 4 draws conclusions based on the findings. We close with a further discussion surrounding the implication of the study and its limitations.
3. Results
3.1. Environmental Justice Awareness
We first present the results of our broader analysis, investigating the relationship between socially sensed environmental justice awareness, remotely sensed imagery, and environmental justice factors through an OLS and SAR model.
We sought to conduct our analysis at two resolutions, Tract and Block Group. After examining the linear relationship between independent variables and the INT transformed environmental justice Tweet counts, the Block Group level was deemed unfit for analysis. Despite the transformation, the high number of Block Groups with zero environmental justice Tweets highly skewed the model, resulting in insignificant linear relationships. At the Tract level, all assumptions were met after introducing quadratic terms for several variables to achieve linearity. As such, the remainder of our analysis focuses on the Tract level. Additionally, the indexes for green and blue space selected for our model were the NDMI and MNDWI, respectively. R-squared and AIC values suggested the NDMI performed better than the NDVI by a very small margin, while MNDWI was clearly the best fit for our model.
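For reference, the three spectral indexes compared here are all normalized differences of two bands. The sketch below uses their standard definitions; the Landsat 8 OLI band numbers in the comments (B3 green, B4 red, B5 near-infrared, B6 shortwave infrared 1) are the usual designations, and the reflectance values are hypothetical pixels chosen only to show the expected signs.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), or None where the denominator is zero."""
    total = band_a + band_b
    return None if total == 0 else (band_a - band_b) / total


def spectral_indexes(green, red, nir, swir1):
    """Compute NDVI, NDMI, and MNDWI from surface reflectance values.

    For Landsat 8 OLI: green = B3, red = B4, nir = B5, swir1 = B6.
    """
    return {
        "NDVI": normalized_difference(nir, red),      # vegetation greenness
        "NDMI": normalized_difference(nir, swir1),    # vegetation moisture
        "MNDWI": normalized_difference(green, swir1), # open water
    }


# Hypothetical reflectance values for two illustrative pixels.
vegetated = spectral_indexes(green=0.08, red=0.05, nir=0.45, swir1=0.20)
open_water = spectral_indexes(green=0.06, red=0.04, nir=0.02, swir1=0.01)
```

In this example the vegetated pixel yields high NDVI and NDMI with a negative MNDWI, while the water pixel shows the reverse pattern.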
After reviewing the distribution of Tweets with environmental justice terms, it also becomes apparent that our reliance on point data to estimate the geographic source of Tweets has introduced the modifiable areal unit problem (MAUP). This bias is the result of the nature of Census boundaries, where areal units like Tracts and Block Groups are drawn to create a relatively equal distribution of population estimates, resulting in areas of high population density displaying smaller Census polygons. This is clearly observed in our study area, where high-density urban areas in the east have smaller Block Groups and Tracts, while low-density areas in the west have larger Block Groups and Tracts. As a result, although the highest number of Tweets are present in the eastern part of our study area, this density is lost when joining to Census polygons due to the relatively small size of the Tracts and Block Groups. Instead, it appears as if the distribution of Tweets is rather scattered, with large Tracts and Block Groups containing higher values. We intend for our area weighting of the dependent variables prior to the INT transformation to help control this bias.
With these details in mind, we turn to our Tract level OLS model’s results (Table 5). First, we note that our control variable, urban levels, demonstrates a p-value above 0.05. Examining the social vulnerability variables, the results show all coefficient estimates are negative. The variables for Black or African American, Hispanic or Latino, and households with an individual over 65 years old are of particular interest, with p-values below our 0.05 alpha. The p-value for education below high school graduate approaches this alpha level, but remains slightly above it. Looking next to the relationships between remote sensing variables, we see only the green space (NDMI) variable demonstrates a statistically significant p-value with our social sensing dependent variable. Notably, this is also one of only two environmental variables demonstrating a significant p-value and a negative coefficient, suggesting that as green spaces decrease, environmental justice Tweets increase. On the other hand, the p-value for the blue space (MNDWI) variable does not fall below the alpha. This may indicate a lack of concern or habitual ignorance, meaning residents do not often associate water bodies with environmental benefits. Finally, examining the additional environmental variables, AFV stations was the only measure with a p-value below the 0.05 alpha, with transit stations falling just above this threshold. Contaminated sites proximity, PM2.5, and flood zone area are all not statistically significant, suggesting that public awareness of exposure to potential contaminants, and the association of flood risk with environmental injustice, do not present strong ties in our investigation. Similar to blue space, we attribute this to a possible habitual ignorance, suggesting residents often are not aware of the proximity to contaminated sites, flood risk, or air pollution.
The spatial effects in these models provide further insights. We tested our dependent variable for spatial autocorrelation in R using Moran’s I. We constructed a row-standardized neighborhood list using the nb2listw function from the ‘spdep’ package. We calculated a Moran’s I value of 0.2510 with a p-value near zero, suggesting significant positive global spatial autocorrelation is present in the Tweet distribution. Examining the local Moran plot for our model, we further observe a positive trend in local spatial autocorrelation, suggesting clusters of high values surrounded by high values. As discussed, when spatial autocorrelation is detected, the regression assumption of data independence is violated and should be addressed to draw more appropriate conclusions. We used Lagrange multiplier (LM) tests to select between a spatial lag or error specification, as is commonly practiced [78]. The LM results point to a spatial error model (SEM) as the most appropriate for our analysis. This suggests the spatial dependency present in our model stems primarily from the model’s error term, pointing to spatial autocorrelation in factors not included in our analysis.
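To make the statistic concrete, the sketch below computes global Moran's I with row-standardized weights on a toy chain of six areal units. It is only an illustration of the formula, not a substitute for the R workflow reported above: the neighbor structure and values are hypothetical, and no permutation or analytical p-value is computed.

```python
def row_standardize(neighbors):
    """Turn a neighbor list into row-standardized weights (each row sums to 1)."""
    return {
        i: {j: 1.0 / len(js) for j in js}
        for i, js in neighbors.items()
        if js
    }


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i ** 2)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(w for row in weights.values() for w in row.values())
    cross = sum(
        w * z[i] * z[j] for i, row in weights.items() for j, w in row.items()
    )
    return (n / s0) * cross / sum(d * d for d in z)


# Hypothetical chain of six areal units; each unit neighbors the adjacent ones.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
weights = row_standardize(neighbors)

clustered = [1, 1, 1, 5, 5, 5]    # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]  # neighbors always differ

i_clustered = morans_i(clustered, weights)      # positive autocorrelation
i_alternating = morans_i(alternating, weights)  # negative autocorrelation
```

A positive value, as in the clustered case, indicates that similar values tend to be neighbors, which is the pattern the reported statistic of 0.2510 suggests for the Tweet distribution.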
The SEM results (Table 6) show this approach appears to improve upon the standard OLS. The AIC for the OLS is 2312.2 compared to 2293.4 for the SEM. As the SEM AIC value is lower, we conclude that controlling for spatial autocorrelation has improved the estimation of the relationship between our independent and dependent variables. Looking more closely, the SEM shows similar results to the OLS in several aspects, with some notable differences. The sign of the coefficient estimates for statistically significant variables did not change, suggesting that both models captured the essential relationships between environmental justice awareness and the social and environmental factors. Examining the social variables, the coefficient magnitude of nearly every variable increased or shifted only slightly. The variable for households with an individual over 65 was the only notable exception, with its coefficient moving closer to zero to a greater extent than any other variable. Along this same line, the p-value of nearly all variables decreased or stayed approximately the same. Median household income in particular now has a p-value below the alpha. The exception is again the variable for households with an individual over 65, whose p-value increased more than tenfold while still maintaining marginal significance at a level just below the alpha. This suggests spatial dependency played a non-negligible role in the relationship between this variable and environmental justice Tweets. Controlling for this phenomenon has reduced the impact of age in our model.
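As a rough way to read the AIC comparison reported above, the difference between the two values can be converted into a relative likelihood. The sketch below is an illustrative calculation on the two reported AIC values only; the interpretation rests on a commonly cited rule of thumb rather than on anything stated in the source.

```python
import math

# AIC values as reported for the two models.
aic_ols = 2312.2
aic_sem = 2293.4

# Difference in AIC relative to the better (lower-AIC) model.
delta_aic = aic_ols - aic_sem

# Relative likelihood of the OLS model compared with the SEM: exp(-delta / 2).
relative_likelihood = math.exp(-delta_aic / 2)

print(f"Delta AIC = {delta_aic:.1f}")
print(f"Relative likelihood of OLS vs. SEM = {relative_likelihood:.1e}")
```

A difference of 18.8 is well beyond the gap of about 10 that is often treated as strong evidence in favor of the lower-AIC model.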
Turning to the remote earth observation and additional environmental variables, the observable changes to our model are relatively modest. For most variables, p-values changed only slightly. Notable exceptions in this case include the control variable urban area, whose p-value increased substantially, and transit stations, whose p-value dropped below the alpha. Along this same line, coefficient estimates remained virtually the same in most cases, with only slight variation between models. Overall, our results suggest the SEM has refined our model and provided some clarification regarding the relationships between environmental justice Tweets and our independent variables. While upholding most of the initial statistical relationships revealed by the OLS model, the spatial model provides a more robust calibration that is better suited for our data.
3.2. Impact of Smoke on Tweets
Next, we present results from our pointed investigation, focusing on scrutinizing socially sensed data produced throughout a natural disaster event.
Our analysis of these data during the Spring Hill Wildfire was intended to show how discussions on Twitter shifted spatially and thematically during a period of poor air quality stemming from a natural disaster. Utilizing the short-wave infrared and near-infrared bands of the MODIS imagery, we are able to clearly identify the path of smoke travelling north from the wildfire, passing primarily through the center and east of our study area. The extent and impact of the smoke is confirmed by the PM2.5 data interpolated for the same day, with the areas falling within and directly adjacent to the smoke boundary demonstrating the highest concentrations. News reports further corroborate these data, with some indicating that smoke could be seen and smelled as far north as Bergen County (Figure 1), located in the northeast corner of our study area [79,80,81]. One report stated that air quality in Newark (Essex County, the eastern part of our study area) maintained unsafe levels over a 24 h period beginning in the evening on 30 March [79].
A total of 53,006 Tweets were drawn from 30 March to 1 April 2019, but only 2398 had geographic coordinates and fell within our study area. Of these Tweets, 629 appeared within the smoke boundary digitized using the MODIS imagery. However, reviewing these Tweets showed little to no discussion of smoke, air quality, or the fire itself. In fact, in the whole study area, the word fire appeared 12 times, air appeared 3 times, and smoke appeared 3 times, but the content of these Tweets showed that none of them related to air quality or the Spring Hill Wildfire in any way. When including Tweets surrounding the study area, a total of 15,807 Tweets with geographic coordinates appear, but a similar trend emerges, with fire appearing 49 times, air appearing 28 times, and smoke appearing 27 times. Again, a manual review shows that no Tweets appear to reference the wildfire or air quality specifically, with users instead sharing sentiments like “Spring is in the air”. We finally returned to the original set of 53,006 Tweets, many of which appeared outside of the study area or had no coordinate data. In this expanded dataset, the word fire appeared 172 times, air appeared 170 times, and smoke appeared 93 times, but manual investigation revealed only three Tweets relating to the fire or its smoke. These Tweets each mentioned the smoke disrupting visibility specifically, and one mentioned the Spring Hill Fire by name. No other Tweet’s content could be attributed to the subject.
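The keyword screening followed by manual review can be sketched as below. This is a hedged illustration rather than the authors' pipeline: the matching rule (whole words, case-insensitive) is an assumption, and the example Tweets are invented apart from the phrase quoted above.

```python
import re
from collections import Counter

KEYWORDS = ("fire", "air", "smoke")
# Whole-word, case-insensitive matching, so "Fairly" does not count as "air".
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def keyword_hits(tweets):
    """Count keyword occurrences and collect matching Tweets for manual review."""
    counts = Counter()
    flagged = []
    for text in tweets:
        found = [match.lower() for match in PATTERN.findall(text)]
        if found:
            counts.update(found)
            flagged.append(text)
    return counts, flagged


# Hypothetical Tweet texts for illustration.
tweets = [
    "Spring is in the air",
    "Can barely see down the street through all this smoke",
    "This mixtape is fire",
    "Stuck in traffic again",
    "Fairly quiet day at the office",
]

counts, flagged = keyword_hits(tweets)
```

Here three Tweets are flagged, but only the second plausibly concerns smoke exposure. A keyword match alone says little about relevance, which is why the counts are followed by a manual read of every flagged Tweet.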
Aligning with our broader analysis results, this further suggests that the general public’s awareness of air quality, especially as a result of a short-term event like the Spring Hill Wildfire, often does not stir many social sensing responses, at least on the Twitter platform. Short-term events, while having significant impact on environmental quality, especially for vulnerable communities, might not align well with the quick “mentioning” nature of social sensing platforms. Still, considering social sensing platforms’ wide reach and coverage, policymakers or environmental agencies might benefit from taking advantage of these data sources to purposefully advocate and disseminate information through the platforms, particularly during environmental events. This sort of activity seems to be lacking at present.
4. Discussion
To review, we intend for this study to answer the following research questions: (1) Is there a discernible relationship between environmental justice factors inferred from remotely sensed earth observations, traditional governmental sources, and social sensors? (2) How can we best model this relationship using Twitter data, social vulnerability measures, and environmental factors? (3) What are the potential consequences of leaning on these remote sensing datasets to draw conclusions about environmental justice? We will explore each of these questions in the discussion below.
4.1. Relationships between Environmental Justice, Remotely Sensed Imagery, and Social Sensing
First, we review our broader analysis, examining the relationship between remotely sensed earth observations, additional environmental factors, socially vulnerable populations, and socially sensed environmental justice activity. Given that the SEM specification provides a better fit for our data, we will focus our discussion on its results.
Beginning with the relationships between remote sensing datasets, we found that the MNDWI did not appear statistically significant, but the NDMI measure exhibits a significant negative relationship with the environmental justice awareness proxy of Tweet counts. While our investigation of spectral indexes showed that all vegetation measures demonstrated statistical significance in our model, NDMI was the best fit, narrowly surpassing even the popularly used NDVI. We contend that this ubiquitous statistical significance among vegetation indexes helps to answer our first research question by demonstrating a measurable relationship between social media big data and vegetation derived from remote sensing imagery. We have further found that, in the case of our own study, the NDMI acts as the measure of best fit for this modeled relationship, addressing research question two. However, there is room for improvement in this model, as the relationship between factors is complex and imperfectly reflects environmental justice realities. We will unpack this idea below in order to answer our third research question.
It is perhaps fitting that vegetation measures have a complex relationship with environmental justice. On the one hand, greenery is a popular topic culturally, politically, and socially. Broadly speaking, it plays an important role in environmental justice due to the suggested positive mental and physical health impacts [82], services provided by public open space [83], and recorded inequitable distribution [23,84,85], each of which was exacerbated during the global COVID-19 pandemic. With this in mind, we would expect ecosystem services at large to be a topic of popular discussion, justifying a higher volume of environmental justice Tweets. On the other hand, ecosystem services are not perfectly modeled by spectral indexes. For example, the popularly used NDVI [58,86,87,88,89] is not always the most appropriate measure (as is the case in our own model) and, similarly, is not a perfect proxy for evaluating ecosystem services.
Previous researchers have noted that indexes like the NDVI can be effective at communicating the presence and intensity of vegetation, but do not inherently provide information about the type of vegetation detected, the vegetation’s health, or the ecosystem service being provided as a result [3,57,86]. To this end, the details of the relationship between environmental justice and an NDVI cannot be reasonably assumed without a deeper investigation. In a recent study by Schwarz, Berland and Herrmann [86], greenery captured over the years by an NDVI in Toledo, Ohio was often associated with overgrown vegetation and unkempt lawns emerging at abandoned sites. In this case, socially disadvantaged populations were positively correlated with increases in housing vacancy rates and NDVI values. This meant that vegetation was higher in marginalized communities, but further investigation showed this vegetation was in the form of overgrown yards and unwanted weeds, neither of which would provide the ecosystem services often associated with exposure to greenery.
We are not implying that this relationship is the case for our own study, but rather that the role spectral indexes play in indicating environmental justice or injustice is not perfectly modeled when relying solely on remotely sensed imagery. In fact, we contend that our results demonstrate the potential for pairing traditional remotely sensed earth observation datasets with social sensing information. Our study suggests that there is an opportunity to mitigate inaccuracies and biases emerging from spectral indexes like the NDMI and MNDWI by pairing measures with observations made by individuals on the ground. Whereas the extent of ecosystem services provided by vegetation cannot be extrapolated from our NDMI, it can be paired with social sensing sources like social media to identify areas where environmental justice discussions overlap. In our particular study area, this manifests in the negative relationship observed between greenery and environmental justice awareness, a pattern that is especially indicative in highly urbanized areas. Less access to or presence of surrounding green spaces prompts people to discuss environmental justice to a greater extent, suggesting a potential pattern of environmental injustice when examining the NDMI distribution in urbanized neighborhoods.
This synergistic relationship between social sensing and remotely sensed imagery is further reflected in our pointed investigation of smoke and wildfire impact on Twitter activity. The use of MODIS imagery to identify the path of smoke effectively outlined areas experiencing the densest coverage, as corroborated by PM2.5 data interpolated for the same day. However, no Tweets anywhere within the study area appear to mention the Spring Hill Wildfire or the impacts on local air quality. Perhaps more surprisingly, this trend persists even when expanding Tweet criteria to include neighboring areas and Tweets without geographic data. News reports from the first and second day of the fire expressed that the smell of smoke was present even in New York City [90], but our investigation fails to capture much of this sentiment, even when including Tweets in those areas. We consider two theories that may explain this lack of Twitter data.
First, this lack of discussion on environmental conditions in the face of smoke, wildfire, and reduced air quality aligns with the results from our broader analysis, which showed that flood risk, contaminated sites, and particulate matter were not statistically significant variables in our model. In other words, our analysis of environmental justice awareness showed that the risk of natural disaster and exposure to contamination failed to generate a social sensing response on Twitter. It is perhaps understandable, then, that a wildfire event that resulted in exposure to contamination in the form of smoke failed to trigger a social sensing response. This idea is further echoed by the findings of a study by Xu et al. on perceptions of air quality in Beijing [91]. The authors interviewed individuals who had lived in the community for at least two years and found that 41 out of 43 residents recognized the air was polluted. However, 35 interviewees also expressed that they felt slight or no concern regarding the negative impacts of their surrounding air quality. The reasons provided for those feeling a lack of concern included reports that individuals felt powerless to make a meaningful change, that they were not experiencing immediate health impacts, and that there were simply more pressing concerns such as food security and housing costs, alongside several other explanations. In this case, community members demonstrated a knowledge of an existing environmental hazard, but generally did not deem the risk of concern.
It is also possible that the short-lived, dynamic nature of the event may have reduced social sensing responses. As factors like wind, rain, and fuel change, the dispersion and impact of smoke would follow suit, potentially getting better and worse each hour. Our analysis examined all three days of the Spring Hill Fire, but assuming smoke conditions were changing along with these characteristics, it is possible the negative air quality did not persist for long enough to warrant a response from Twitter users. Alternatively, sentiment expressed over Twitter regarding environmental qualities during the wildfire may in fact exist, but is simply not captured due to technical and methodological constraints. For example, the nature of the Twitter API is such that only Tweets from public accounts are accessible in order to maintain the privacy of users. As a result, an unknown number of Tweets from individuals with private accounts are omitted from our analysis. Similarly, the medium with which users discuss topics like air quality may not be explicitly conveyed through text, but rather a combination of images and emojis. In this way, a Tweet may acknowledge air quality or wildfire in a manner that would not be detected in our textual analysis despite its relevance. More complex expressions such as sarcasm and metaphors are also harder to detect and may result in missed Tweets.
Given that less than 0.01% of the 53,006 public Tweets textually referenced the environmental impacts of the wildfire, it is perhaps fair to say that it is unlikely these technical shortcomings would drastically change our analysis. In any case, our results clearly point to a disconnect between Twitter data and smoke exposure. These insights are important in the context of exposure to poor air quality and disaster risk, but nonetheless demonstrate a connection between remote sensing imagery and social media data.
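To make the scale of this figure concrete, the following is a minimal sketch of a text-only keyword filter and the share it yields. The keyword list and sample Tweets are hypothetical and are not the search terms or data used in this study. It also shows why an emoji-only Tweet goes undetected, and that a share below 0.01% of 53,006 Tweets corresponds to at most five Tweets.

```python
import math
import re

# Hypothetical keyword list for illustration; not the study's actual terms.
SMOKE_TERMS = ("smoke", "wildfire", "air quality", "haze")
PATTERN = re.compile("|".join(re.escape(t) for t in SMOKE_TERMS), re.IGNORECASE)

def matching_share(tweets):
    """Return (number of matching Tweets, share of all Tweets in percent)."""
    hits = sum(1 for text in tweets if PATTERN.search(text))
    share = 100.0 * hits / len(tweets) if tweets else 0.0
    return hits, share

def max_hits_below(total, percent):
    """Largest whole number of Tweets strictly below `percent` % of `total`."""
    return math.ceil(total * percent / 100.0) - 1

# Made-up sample: the emoji-only Tweet is relevant but is not matched by text.
sample = ["Smoke everywhere in Newark today", "Great game last night", "🔥😷"]
hits, share = matching_share(sample)

# Upper bound on matching Tweets implied by "less than 0.01% of 53,006".
upper_bound = max_hits_below(53006, 0.01)  # 5
```

A purely textual filter of this kind is simple and transparent, but it inherits the limitations noted above: images, emojis, sarcasm, and metaphor are not captured.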
4.2. Big Data, Additional Environmental Factors, and Social Vulnerability
We turn next to our additional environmental factors, allowing us to better contextualize the relationship between socially sensed environmental justice and traditional measures.
First, the urbanization control variable (urban land use land cover percentage) appears to be positively related to environmental justice Tweets, but the relationship is not statistically significant in either model. Considering the highly urbanized landscape of northern New Jersey, this result is not entirely unexpected. Because urbanization levels in the study area are quite high, the variation in urbanization level, expressed as the land use land cover percentage of impervious surfaces, might not be sufficient to provide the statistical power needed to explain the variation in Tweet counts relating to environmental justice.
The other environmental factors, including PM2.5, flood zone, and proximity to contaminated sites, do not show statistically significant relationships with the environmental justice Tweet counts. We argue that this is likely because these environmental factors are less tangible. Indeed, compared to greenery, which is immediately tangible, particulate matter in the air, percentage of flood zone, access to waterbodies, or proximity to contaminated sites is much harder to sense directly. Twitter, as one of many social sensing platforms, is therefore likely to be less sensitive to these factors in daily Tweets.
Regarding the negative relationship between AFV fueling stations and environmental justice Tweets, our results suggest that as AFV fueling stations grow nearer, the number of Tweets containing environmental justice terms goes up. In other words, a greater presence of investment in sustainability, captured as AFV fueling stations, is associated with a greater awareness of environmental justice levels. We argue that this relationship may be the result of an awareness of environmental factors generally in these communities. Whether this investment is the result of government, business, or community advocacy, it is fair to say that the presence of AFV fueling stations stems from support at some level. As a result, this interest and subsequent presence of AFVs on community roads likely fosters a sense of environmental consciousness [
92]. It is reasonable to assume that this consciousness would be reflected on social media, captured in our analysis as individuals discussing environmental justice topics.
Additionally, the explanatory variable for transit stations is a particularly nuanced measure, because transit stations have competing impacts on communities. Community members near transit stations benefit from the additional mobility and reductions in air pollutants in the long term [
93], but risk disproportionate exposure to harmful pollutants in the short term. Moreover, the concentration of public transit stations might also point to a relatively dense and more socioeconomically vulnerable community [
94,
95,
96]. Our model suggests that, despite the complex relationship between transit stations and their communities, a higher concentration of public transit stations poses a tangible and sensible signal that heightens residents’ environmental justice awareness. This is demonstrated by the significant relationship between the transit station variable and the count of environmental justice awareness Tweets.
At this point, we have established that there is an apparent relationship between obviously sensible environmental factors (greenery, AFV fueling stations, and density of transit stations) and the discussion of environmental justice on Twitter. However, our model results do not suggest that there is awareness of environmental injustice among socially vulnerable populations. Instead, the modeled results (
Table 6) show that, as the proportion of the population that is Black or African American, is Hispanic or Latino, lives in a household with someone over 65 years old, has a disability, or has below a high-school-level education increases, the discussion of environmental justice on Twitter decreases. Median household income is the only variable to suggest otherwise, with a negative coefficient value, meaning that as household income decreases, the number of Tweets using environmental justice terms increases.
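The sign logic described here can be illustrated with a short sketch. It assumes, purely for illustration, a count model with a log link (for example, a Poisson or negative binomial regression), under which a coefficient translates into a multiplicative change in the expected Tweet count. The model form and the coefficient values below are assumptions and are not taken from Table 6.

```python
import math

def percent_change_in_expected_count(beta, delta=1.0):
    """Percent change in the expected count when a predictor changes by `delta`,
    assuming a log-link count model: E[count] = exp(intercept + beta * x)."""
    return (math.exp(beta * delta) - 1.0) * 100.0

# Hypothetical coefficients, chosen only to show the direction of the effect.
beta_vulnerable_share = -0.05   # negative: higher share, fewer expected Tweets
beta_median_income = -0.02      # negative: lower income, more expected Tweets

# A one-unit increase in a vulnerable population share lowers the expected count.
change_share_up = percent_change_in_expected_count(beta_vulnerable_share, delta=1.0)

# A one-unit decrease in income raises the expected count, because the
# negative coefficient is multiplied by a negative change.
change_income_down = percent_change_in_expected_count(beta_median_income, delta=-1.0)
```

Under this assumed form, a negative coefficient on a population share and a negative coefficient on income point in opposite directions with respect to vulnerability, which is why median household income stands apart from the other social variables.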
Of these social factors, the coefficients for Black or African American, Hispanic or Latino, households with an individual over 65, and median income show statistical significance (
Table 6). Of particular note, our visual exploration of each variable in ArcGIS Pro shows that the proportions of the population that are Black or Hispanic are highest in the urban centers in the east of our study area, such as Newark, Paterson, and cities in Hudson County. As discussed, Newark in particular is a well-known environmental injustice community [
8]. As such, it would be expected that these known exposures to environmental injustices would correspond with an increase in environmental justice terms on Twitter. The negative coefficient estimates suggest otherwise.
We theorize there are two possible explanations for the negative relationships observed between the social factors and environmental justice Tweets. On one hand, these relationships may mean that there is a genuine lack of awareness or acknowledgement among these communities that environmental injustice is occurring. This may be the result of extended exposure to hazards and limited access to resources leading to a normalization of the experiences. In other words, individuals that are exposed to hazards or kept from resources adjust to the point of no longer recognizing or acknowledging their own plight, hence forming a habitual ignorance of their environmental injustice. In the case of communities like Newark that have been experiencing environmental injustice for decades, it is reasonable to assume individuals who grew up in these places would view their experiences as normal. If nothing else, community members would likely stop discussing negative things in their community on Twitter daily after experiencing them for their entire life. In either scenario, we would expect to see a lack of environmental justice Tweets.
As mentioned above, the result of Xu et al.’s study on public perception of air quality offers a similar explanation, as residents demonstrated a lack of concern for the subject as a whole despite knowledge of the risks [
91]. Interviewees expressed a sort of apathy, considering air quality a lower concern in most cases. It is possible we are witnessing a similar phenomenon in our own study, wherein individuals living in environmental justice communities like Newark and Paterson recognize the existence of injustice, but do not express concern. Whether they feel a sense of powerlessness, are prioritizing other concerns, or simply believe the factors are not worthy of attention, if residents feel the injustices they suffer are not a major concern, they would likely not Tweet about them.
On the other hand, the negative relationship we have observed in our model may also be a symptom of the MAUP. We see this phenomenon occur particularly in the highly urbanized zones in the east of our study area, which as mentioned also contain the highest proportions of people of color in our study. While we attempted to mitigate the impact of the MAUP by weighting counts by area, this did not correct for Tracts and Block Groups that were assigned zero Tweets despite the high volume of points in these urban areas. This bias may have resulted in an undercounting of Tracts and Block Groups with socially vulnerable populations, leading to a negative relationship between these variables and environmental justice Tweet count. With either explanation of these relationships, the result is the same—the voices of socially vulnerable populations are underrepresented in environmental justice discourse occurring on Twitter. This result merits further investigation to facilitate a more in-depth analysis and understanding of communities that are particularly vulnerable.
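The limitation of area weighting described above can be sketched as follows. The example assumes the weighting is a simple count-per-area ratio, which may differ from the exact procedure used in this study, and the unit names, areas, and counts are made up for illustration.

```python
def tweets_per_area(counts, areas_km2):
    """Area-weighted Tweet density for each enumeration unit.

    Units absent from `counts` are treated as having zero Tweets.
    """
    return {unit: counts.get(unit, 0) / areas_km2[unit] for unit in areas_km2}

# Hypothetical units: two small urban tracts and one large suburban tract.
areas = {"tract_A": 0.5, "tract_B": 0.5, "tract_C": 10.0}

# tract_B sits in a dense urban area but was assigned no geotagged Tweets.
counts = {"tract_A": 12, "tract_C": 12}

density = tweets_per_area(counts, areas)
# tract_A and tract_C have equal counts but very different densities,
# so weighting adjusts for unit size. tract_B stays at zero regardless
# of its area, because zero divided by any area is still zero.
```

In this sketch, weighting rescales nonzero counts but cannot recover signal for a unit that received no Tweets, which is consistent with the undercounting concern for small, dense urban units.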
5. Conclusions
In an era characterized by rapid technological innovation and a growing urgency to address environmental inequities, the convergence of big data analytics and remote sensing offers an unprecedented opportunity to unravel the complexities of environmental injustice faster than ever before. At this time, environmental justice is an important concern that is increasingly placed at the top of policymakers’ agendas, but initiatives and policies remain reactive in nature. The result is often a case of ‘too little, too late,’ leaving marginalized populations displaced or living in unsafe conditions for extended periods of time. Community-based science offers an alternative, allowing those experiencing the injustices in their daily lives to catalog and report the data themselves [
24,
97,
98,
99], but these efforts are narrow in scope and unfairly place the burden of proof on residents. Big data sources such as social sensing information derived from Twitter and remote sensing imagery may yet serve a dual purpose in this context, acting as an early warning system for injustices and a corroborating source for community scientists seeking to raise the alarm.
Within this context, this study ventures into unexplored terrain, where the realms of remote sensing imagery and social media data intersect, seeking to untangle the intricate relationships between environmental hazards, ecosystem services, and social vulnerability. Our analysis indicates that there is a negative relationship between the number of Tweets utilizing environmental justice terminology and the presence of ecosystem services in the form of green spaces (most effectively captured by the NDMI), suggesting a synergy between the datasets and a broad awareness of injustice. However, there is simultaneously a negative relationship between socially vulnerable populations and Tweets with environmental justice words. This suggests that, generally, there is discussion on Twitter about injustice when resources are not present, but the voices of vulnerable populations are often unaccounted for, very likely as a result of urban bias and a habitual lack of awareness or concern for injustices. This latter theory is echoed in the results of our case study, demonstrating a lack of discussion on Twitter during the state’s largest wildfire of the year. Overall, our findings suggest that a meaningful relationship is present between social sensing and remote earth observation data in the context of environmental justice, but capitalizing on this intersection may perpetuate inequities if precautionary measures are not taken.
Our research is not without its limitations. First, data and computational constraints limited the number of Tweets that could be drawn for this study and required data aggregation. With superior hardware, additional data, and sufficient knowledge of machine learning techniques, there may have been an opportunity to parse all words across a longer timeframe to fit a more precise model that could operate beyond the limitations of aggregation. Second, we recognize that the niche, dynamic social media landscape presented by Twitter likely introduces inequities into our model. We utilize Tweets containing environmental justice terms as a proxy for community discussion, assuming that this platform hosts honest conversations on lived experiences by a wide variety of community members. In reality, social media in general is likely to generate a wide variety of discussions, but predominantly by young, technology-literate individuals with downtime and access to a smartphone or computer. In fact, some third-party reports have shown that around 62% of users are below the age of 34 [
100] and over 60% of Twitter users identify as male [
101].
Along this same line, the accuracy of coordinates collected for Tweets is relatively unclear. Twitter’s API website explains that coordinates come from GPS-enabled devices and should represent exact location, but can be assigned in some circumstances [
102]. Content produced on Twitter also varies significantly from user to user, and these habits are impacted by trending topics, world events, and even ownership of the platform itself, each of which is often unexpected and unpredictable. Our model attempts to capture a snapshot of Twitter activity and make assumptions based on the observed trends in that time period, but future trends are likely to differ in frequency, topic, and impact. Our exploration nonetheless suggests that there is great potential in social sensing. For this reason, future researchers might consider utilizing other socially sensed big data sources as an alternative or supplement to social media. Sources such as internet search engines may be particularly well suited for this purpose, with tools like Google Trends being used to capture different facets of spatial human dynamics through online search query frequency [
103,
104,
105]. Although search engines offer opportunities to capture information generated by a more diverse userbase through their intentional information-gathering efforts, as with all data sources, they present their own challenges, specifically the ambiguity associated with the broad concept of environmental justice. For this reason, our current study did not incorporate search engine data, though an immediate next step will be to validate the potential of search engine data in future environmental justice studies, as awareness of environmental justice is rapidly growing.
Third, exploring environmental justice through remote sensing imagery analysis and Tweets is an endeavor that will vary geographically and temporally. Although it is a subject that is relevant across the globe, environmental justice is highly contextually dependent [
2], and as such, discussion (online and offline) on the subject will vary by location and time period. While New Jersey’s diversity and legal recognition of environmental justice might make for an ideal case study, the extent to which this subject is broached in other states likely varies, requiring a more careful examination.
Despite these limitations, we believe our research takes a vital step towards critically investigating environmental injustice in the era of remote sensing and big data. We expect the extensive nature of these data, paired with emerging remote sensing data acquisition technologies (drone and hyperspectral images in particular) and advanced analytical methodologies (such as a Bayesian spatiotemporal analytical framework [
37]), may provide novel techniques for future research capturing information on these subjects. We simultaneously urge researchers to consider the equity implications stemming from big data in all its forms, particularly regarding sources that are popular in research but not analyzed here, such as Google Trends [
106]. Overall, this interdisciplinary exploration not only signifies a leap forward in remote sensing science, but also heralds a new era of comprehensive insights that extend beyond the boundaries of traditional research methodologies. Our hope is that these advancements might pave the way for a future in which environmental justice is a priority in more than just name.