Article

Assessment of Systematic Errors in Mapping Electricity Access Using Night-Time Lights: A Case Study of Rwanda and Kenya

by Tunmise Raji 1, Jay Taneja 2,3,* and Nathaniel Williams 1,3

1 INSYST Lab, Golisano Institute for Sustainability, Rochester Institute of Technology, Rochester, NY 14623, USA
2 STIMA Lab, Department of Electrical and Computer Engineering, University of Massachusetts-Amherst, Amherst, MA 01003, USA
3 Kigali Collaborative Research Center (KCRC), Kigali Innovation City, Bumbogo BP 6150, Kigali, Rwanda
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(19), 3561; https://doi.org/10.3390/rs16193561
Submission received: 15 August 2024 / Revised: 24 September 2024 / Accepted: 24 September 2024 / Published: 25 September 2024
(This article belongs to the Special Issue Nighttime Light Remote Sensing Products for Urban Applications)

Abstract

Remotely sensed nighttime light data have become vital for electrification mapping in data-scarce regions. However, uncertainty persists regarding the veracity of these electrification maps. This study investigates how characteristics of electrified areas influence their detectability using nighttime lights. Using a dataset comprising the locations, installation dates, and electricity purchase histories of thousands of electric meters and transformers from utilities in Rwanda and Kenya, we present a systematic error assessment of electrification maps produced with nighttime lights. Descriptive analysis provides empirical evidence that the likelihood of correctly identifying an electrified nighttime light pixel increases with the time since electrification, the number of meters within the pixel, and the total annual electricity purchase of the meters in the pixel. We compared the performance of models trained on nighttime light data at various temporal aggregations (annual, quarterly, monthly, and daily) and found that monthly aggregation yielded the best results. Additionally, we investigate the transferability of electrification models across locations. Our findings reveal that models trained on data from Rwanda transfer well to Kenya, and vice versa, with balanced accuracies differing by less than 5% when additional data from the test location are included in the training set. Models developed with data from the centralized grid in East Africa were also found to be useful for detecting areas electrified with off-grid systems in West Africa. This research provides valuable insight into the characterization of sources of nighttime lights and their utility for mapping electrification.

1. Introduction

Access to electricity is an important catalyst for development and correlates strongly with various economic and development indicators. Economic benefits such as increased income for households and small- and medium-sized enterprises (SMEs), reduced unemployment in rural communities, and a broader range of economic opportunities have all been attributed to the stimulating effect of electrification on economic activity [1,2,3]. Access to electricity has also been found to provide important social benefits, such as advancing gender equality and women's empowerment [4,5], improving education [5,6], and reducing indoor air pollution [7,8].
Harnessing the potential of electrification has emerged as a crucial objective for numerous developing countries and development finance institutions, as evidenced by the recent publication of electrification plans by several nations [9,10]. As the drive toward universal access to electricity intensifies, accurately mapping the extent of electrification has become increasingly important. Such mapping serves primarily to track electrification progress. For instance, despite significant investment in the sector, access tracking has revealed that the world is not on course to achieve universal access to affordable, reliable, sustainable, and modern energy by 2030, as targeted by Sustainable Development Goal 7 (SDG7) [11]. Electrification mapping is also needed to identify treated and control areas when assessing the impact of electrification [12,13] and to develop national electrification plans [14,15,16]. Nightlight imagery lends itself well to these activities because of its high temporal resolution (nightly cadence), global coverage, and extensive historical record (from 1992 to the present).
The conventional approach to determining electrification rates and mapping the extent of electrification has relied on grid data from governments, as in Ratledge et al. [12], and on periodic surveys such as the Demographic and Health Surveys (DHS) [17], the Living Standards Measurement Study (LSMS) [18], the Multiple Indicator Cluster Surveys (MICS) [19], and the World Health Survey (WHS) [20]. However, multitemporal data on the locations of electricity grid assets are often unavailable or difficult to obtain from governments, while surveys are expensive and time-consuming. In recent years, nightlight emissions have emerged as a cost-effective alternative for mapping electrification.
Although originally designed to detect moonlit clouds for weather applications [21], these remotely sensed data have proven remarkably versatile and have been successfully employed in various applications, including assessing trends in socioeconomic indicators [22,23,24], mapping and assessing the impact of conflicts and disasters [25,26,27,28], delimiting urban boundaries [29,30], and estimating population [31]. In the context of electrification, nighttime light (NTL) has been used to estimate electrification rates [32,33], detect rural electrification [34], predict the path of the grid [35], and estimate electricity consumption levels [36,37,38].
Mapping electrification with NTL fundamentally involves classifying NTL pixels as electrified or unelectrified on the basis of their radiance. However, this is complicated by the fact that the data are affected by clouds and atmospheric effects (e.g., aerosols, water vapor, and ozone), and are sensitive to light that does not come from electric lighting, such as biomass burning, stray light, and gas flares [39]. Approaches developed in the literature to address these challenges include (1) setting a radiance threshold above which an NTL pixel is considered electrified [33,35,36], (2) training supervised learning models that classify NTL pixels containing power system assets such as transformers as electrified and those without such assets as unelectrified [40], and (3) estimating the statistical confidence that a settlement is electrified by computing the number of nights on which the settlement is brighter than the background [41].
Each of these approaches has limitations. For the first, it is challenging to determine an appropriate threshold, and studies disagree on the value to use: Ru et al. [33] used a threshold of 0 μW·cm⁻²·sr⁻¹, while other studies have used thresholds ranging from 0.1 μW·cm⁻²·sr⁻¹ to 0.35 μW·cm⁻²·sr⁻¹ [35,36]. For the second, although machine learning can automatically determine the thresholds that separate electrified from unelectrified areas, it remains unclear which temporal aggregation technique is best for processing the NTL images. Annual [33], monthly [36], and daily [40,41] aggregations have all been used, yet no study has compared their performance.
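To make the threshold sensitivity concrete, the sketch below applies the three threshold values cited above to a handful of hypothetical pixel radiances (assumed to be in the same units as the thresholds). It illustrates approach (1) only and is not a reproduction of any cited study's pipeline; the function name and radiance values are invented for illustration.

```python
import numpy as np


def classify_by_threshold(radiance, threshold):
    """Label pixels electrified (1) when radiance is strictly above the threshold, else 0."""
    return (np.asarray(radiance, dtype=float) > threshold).astype(int)


# Hypothetical radiances of five dim pixels, in the same units as the thresholds.
radiance = [0.05, 0.12, 0.20, 0.30, 0.50]

for threshold in (0.0, 0.1, 0.35):
    labels = classify_by_threshold(radiance, threshold)
    print(threshold, labels.tolist(), int(labels.sum()))
# 0.0 [1, 1, 1, 1, 1] 5
# 0.1 [0, 1, 1, 1, 1] 4
# 0.35 [0, 0, 0, 0, 1] 1
```

With these made-up values, the number of pixels labeled electrified falls from five to one across the range of thresholds used in the literature.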
While studies using NTL to track electrification often assess the performance of their models with metrics such as the Area Under the Curve of the Receiver Operating Characteristic (AUC-ROC) and the F1 score, an aspect that has not been sufficiently addressed in the literature is the systematic assessment of the misclassifications that can be expected from these electrification maps. For instance, Correa et al. [40] reported that the AUC-ROC for machine learning techniques such as Random Forest, MLP, and XGBoost was around 0.77 for the electrification mapping task. However, it remains unclear how decisions like the temporal aggregation technique and the characteristics of electrified pixels such as average electricity consumption and time since electrification affect their detection.
Systematic errors refer to consistent deviations of a measurement from its true value in a particular direction [42]. We apply this concept to evaluate how both image processing techniques and the conditions at the location to be mapped influence the accuracy of electrification access mapping with NTL. Some important questions that remain unanswered and that a systematic error assessment can address include:
  • Which temporal aggregation technique of NTL is best for electrification mapping?
  • How do characteristics of an electrified NTL pixel such as the number of electrified structures, and the average electricity consumption of electrified structures in the pixel impact its likelihood of being correctly classified as electrified?
  • How do electrification maps developed from ground-truth data from one location generalize to another location?
Such an assessment not only sheds light on the detection limit of this technique but can also improve confidence in the electrification maps it produces and help identify potential sources of error. This problem was previously approached in two studies [34,43] that assessed the feasibility of using NTL to detect rural electrification in Senegal, Mali, and Vietnam. They found that NTL correlates weakly with electricity consumption and that every additional 70 streetlights or 270 electrified structures produced only a one-point increase in NTL radiance measurements, which suggests that NTL may not be suitable for detecting electrification in sparsely populated areas. However, these studies used NTL images from the Operational Linescan System onboard the U.S. Air Force Defense Meteorological Satellite Program (DMSP-OLS) [44].
In this study, we systematically assess errors in machine learning models for electrification mapping using NTL data from the Visible Infrared Imaging Radiometer Suite Day/Night Band (VIIRS-DNB) onboard the Suomi National Polar-orbiting Partnership (Suomi NPP) satellite, which offers better spatial resolution, radiometric quantization, calibration, and geolocation accuracy than DMSP-OLS [45]. We explore how model development decisions affect performance, including whether electrified pixels are identified with transformer or meter locations and which temporal aggregation method, from daily to annual, is used. Our analysis also evaluates how characteristics of electrified pixels, such as average annual electricity consumption and years since electrification, affect classification accuracy. Lastly, we assess the transferability of electrification mapping models between Rwanda and Kenya and their effectiveness in detecting communities electrified by off-grid systems. Although the study focuses primarily on on-grid electrification, which remains the main mode of electricity access in sub-Saharan Africa (SSA) [46], we also discuss the challenges of mapping areas powered by off-grid systems.

2. Data and Methods

This section begins with a brief overview of the study areas, then details the datasets used in this research, and concludes by describing the models used for the systematic error assessment. Figure 1 gives an overview of the data processing and analyses performed. In summary, NTL data at various temporal aggregations were combined with the locations of electricity assets and with uninhabited areas identified from a land use land cover product to create the training data. These data were used to train a random forest model that detects the electrification status of an NTL pixel, separately for each temporal aggregation method. The model's predictions were then combined with the characteristics of electrified pixels to analyze the factors that increase misclassification and to assess the model's generalization to unseen locations.

2.1. Study Areas

The focus of this study is Rwanda and Kenya, both located in East Africa. Rwanda lies approximately two degrees south of the equator and is bordered by Uganda, Tanzania, Burundi, and the Democratic Republic of the Congo. Administratively, Rwanda is divided into four provinces, which are further subdivided into 30 districts. Kenya straddles the equator and is bordered by Ethiopia to the north, Somalia to the east, Tanzania to the south, Uganda to the west, and South Sudan to the northwest. Under its 2010 constitution, Kenya is divided into 47 counties.
Kenya and Rwanda both experience a significant disparity in electricity access between urban and rural areas. As of 2022, both countries had an urban electricity access rate of 98.0%. However, their rural access rates differ markedly: 65.6% in Kenya and 38.2% in Rwanda [47]. This urban-rural divide highlights the ongoing challenges in expanding electrification to less developed areas in SSA.

2.2. Nighttime Light Data

One of the main sources of NTL composites used in research is the Earth Observation Group (EOG), which provides monthly and annual NTL composites at 15 arc-second resolution, approximately 500 m at the Equator [48]. Monthly composites are generated by averaging daily VIIRS-DNB data that have been filtered to remove sunlit, moonlit, stray-light-affected, lightning, and cloudy pixels [39,49]. These are further processed into annual composites, which are considered cleaner because taking the median of the monthly data minimizes outliers and background noise, effectively suppressing anomalies such as biomass burning and other ephemeral NTL sources [49]. However, there are concerns that these composites might omit dimly lit areas, such as rural communities in developing regions [30,50]. Studies have reported that radiance from biomass burning can exceed 2 nW/cm²/sr [30], while radiance from electrified areas in developing regions can vary from 1 to 10 nW/cm²/sr [50], so the two sources overlap and are difficult to separate on radiance alone.
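The effect of the median step can be illustrated with a toy example. The sketch below assumes a single pixel with twelve hypothetical monthly radiances, one of which contains a transient spike standing in for biomass burning, and compares a simple mean against the median-of-months rule described above. EOG's actual annual processing involves further filtering that is not reproduced here.

```python
import numpy as np

# Hypothetical monthly radiances (nW/cm²/sr) for one dimly lit pixel; the
# March value stands in for a transient source such as biomass burning.
monthly = np.array([1.2, 1.1, 9.5, 1.3, 1.0, 1.2, 1.1, 1.4, 1.2, 1.3, 1.1, 1.2])

annual_mean = monthly.mean()        # pulled upward by the single spike
annual_median = np.median(monthly)  # median of months, as described for annual composites

print(f"mean={annual_mean:.2f}, median={annual_median:.2f}")  # mean=1.88, median=1.20
```

The single spike raises the mean by more than half, while the median stays at the pixel's typical level.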
In this study, we compare the performance of annual composites, quarterly composites (each obtained by averaging three monthly composites), monthly composites, and daily images for detecting electrified pixels in nightlight imagery. Because time series are often better represented by descriptors of their distribution, autocorrelation, stationarity, and entropy than by the raw data [51], the model trained on daily images used features including the mean, variance, kurtosis, skewness, median, standard deviation, standard error of the mean, first quartile, third quartile, interquartile range, maximum, and minimum of the daily NTL images.
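A minimal sketch of how the quarterly composites and the daily summary features could be computed for one pixel, assuming the twelve monthly composites are ordered January to December and the daily series uses NaN for nights removed by quality filtering. The function names are hypothetical, and conventions such as the sample (ddof=1) variance and population-moment skewness and excess kurtosis are assumptions, since the study does not specify them.

```python
import numpy as np


def quarterly_composites(monthly):
    """Average consecutive triples of monthly composites (Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec)."""
    monthly = np.asarray(monthly, dtype=float)
    if monthly.shape != (12,):
        raise ValueError("expected 12 monthly composites")
    return monthly.reshape(4, 3).mean(axis=1)


def daily_features(daily):
    """Summary statistics of one pixel's daily radiance series; NaN nights are dropped."""
    x = np.asarray(daily, dtype=float)
    x = x[~np.isnan(x)]
    mean = x.mean()
    centred = x - mean
    m2 = np.mean(centred**2)  # population second central moment
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    std = x.std(ddof=1)  # sample standard deviation (assumed convention)
    return {
        "mean": mean,
        "variance": x.var(ddof=1),
        "std": std,
        "sem": std / np.sqrt(x.size),  # standard error of the mean
        "skewness": np.mean(centred**3) / m2**1.5,
        "kurtosis": np.mean(centred**4) / m2**2 - 3.0,  # excess kurtosis
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "max": x.max(),
        "min": x.min(),
    }
```

Each daily-image pixel then contributes one fixed-length feature vector regardless of how many nights survived filtering.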
As an alternative to manually extracting the statistical features listed above from the daily images, we also extracted features automatically with TSFRESH (Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests), a Python package for time series feature extraction [52]. The package implements 63 time series characterization methods, which together yield about 800 features; only the 428 features found to be statistically significant were used in the model.
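TSFRESH retains a feature only if a hypothesis test finds it relevant to the target while controlling the false discovery rate across all candidate features. As a simplified stand-in for that selection step, the sketch below implements the classic Benjamini-Hochberg procedure on a vector of per-feature p-values. TSFRESH's own relevance tests and its exact multiple-testing variant are not reproduced, and the function name and p-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(p_values, fdr_level=0.05):
    """Boolean mask of features kept by the Benjamini-Hochberg procedure at the given FDR."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)  # ranks p-values from smallest to largest
    thresholds = fdr_level * np.arange(1, m + 1) / m  # k * q / m for rank k
    passes = p[order] <= thresholds
    keep = np.zeros(m, dtype=bool)
    if passes.any():
        k = np.nonzero(passes)[0].max()  # largest rank whose p-value passes
        keep[order[: k + 1]] = True  # keep every feature up to that rank
    return keep


# Hypothetical p-values, one per candidate feature.
p_vals = [0.205, 0.001, 0.06, 0.008]
print(benjamini_hochberg(p_vals).tolist())  # [False, True, False, True]
```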

2.3. Datasets of the Locations and Characteristics of Electrified Pixels

Three datasets containing information on electricity meters were provided by the Rwanda Energy Group. The first dataset includes the serial numbers and corresponding installation dates for about 777,000 electricity meters. The second dataset provides the serial numbers and locations of approximately 672,000 electricity meters. Lastly, the third dataset includes the serial numbers and electricity purchase records for about 789,000 electricity meters, spanning the years 2011 to 2020.
Merging the first two datasets on the meters' serial numbers yielded a combined dataset with the location and installation date of about 430,000 electricity meters. The spatial distributions of the first dataset (serial number and installation date) and of the merged dataset (serial number, installation date, and location) are shown in Figure A1 in the appendix. Next, we merged these data with the electricity purchase records, again using the serial number as a unique identifier, to obtain a dataset containing the location, installation date, and electricity purchase information for about 430,000 electricity meters. Electricity consumption records were available for every meter with location and installation date information, and these approximately 430,000 meters were distributed across all 30 districts in Rwanda. See Table A1 in the appendix for the distribution of electricity meters across districts.
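The merge described above amounts to an inner join on the meter serial number. A toy sketch in plain Python follows; the serial numbers, coordinates (latitude, longitude), and purchase values are hypothetical, and purchase amounts are assumed to be in kWh.

```python
from datetime import date


def merge_meter_records(install_dates, locations, purchases):
    """Inner-join three meter tables keyed by serial number.

    Only serial numbers present in all three tables are kept, mirroring the
    two successive merges described in the text.
    """
    common = install_dates.keys() & locations.keys() & purchases.keys()
    return {
        serial: {
            "installed": install_dates[serial],
            "location": locations[serial],
            "purchases": purchases[serial],
        }
        for serial in sorted(common)
    }


# Hypothetical toy tables (serial -> value); the real tables hold hundreds of thousands of meters.
install_dates = {"M001": date(2015, 3, 1), "M002": date(2018, 7, 9), "M003": date(2019, 1, 5)}
locations = {"M001": (-1.95, 30.06), "M003": (-2.60, 29.74), "M004": (-1.50, 29.63)}
purchases = {"M001": [(2016, 120.0), (2017, 150.0)], "M003": [(2019, 80.0)], "M005": [(2020, 60.0)]}

merged = merge_meter_records(install_dates, locations, purchases)
print(sorted(merged))  # ['M001', 'M003']
```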
Additionally, the Rwanda Energy Group supplied data on about 8000 transformers, detailing their locations, installation dates, and primary and secondary voltage levels. We classified a transformer as step-down if its primary voltage was higher than its secondary voltage, which identified roughly 1900 step-down transformers. Only these step-down transformers were used to identify electrified pixels, as they serve as reliable markers of electricity access and consumption within an area. In contrast, the presence of a step-up transformer does not necessarily indicate that electricity is available or used nearby, which makes such transformers less relevant to this study's focus on mapping electrification. The locations and installation dates of about 57,000 step-down transformers in Kenya were also obtained from Kenya Power and Lighting Company (KPLC). The step-down transformers used in both countries have voltage levels of 11/0.4 kV.
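The step-down rule is a one-line filter. The record layout and the field names below (`primary_kv`, `secondary_kv`) are hypothetical, not the utility's actual schema.

```python
def step_down_transformers(transformers):
    """Keep transformers whose primary voltage exceeds their secondary voltage."""
    return [t for t in transformers if t["primary_kv"] > t["secondary_kv"]]


# Hypothetical records; field names are illustrative only.
transformers = [
    {"id": "T1", "primary_kv": 11.0, "secondary_kv": 0.4},
    {"id": "T2", "primary_kv": 0.4, "secondary_kv": 11.0},  # step-up: excluded
    {"id": "T3", "primary_kv": 11.0, "secondary_kv": 11.0},  # equal voltages: excluded
]
print([t["id"] for t in step_down_transformers(transformers)])  # ['T1']
```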

2.4. Curating the Training Data

In this section, we discuss our approach to identifying the electrification status of NTL pixels. Electrified pixels in this study refer to pixels that contain built-up structures with access to electricity, while unelectrified pixels are those that either do not contain any built-up structure or contain built-up structures without access to electricity.

2.4.1. Identifying Electrified Pixels

Electrified pixels were identified as pixels that contain electricity infrastructure assets, such as transformers and meters. The spatial distribution of the transformer locations in both countries is depicted in Figure 2. The approximately 430,000 electricity meters and 1900 transformer locations in Rwanda correspond to approximately 18,100 and 1200 unique NTL pixels, respectively, while the approximately 57,000 transformer locations in Kenya correspond to 46,800 unique NTL pixels. These datasets enabled us to accurately identify electrified NTL pixels in both countries.
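Mapping asset locations to NTL pixels amounts to snapping each coordinate to the 15 arc-second grid and counting distinct cells. The sketch below assumes a grid described only by its top-left corner (`lat_top`, `lon_left`); these parameters and the helper names are hypothetical, and in practice the raster's own georeferencing should be used.

```python
import math

PIXEL_DEG = 15 / 3600  # 15 arc-seconds expressed in degrees


def pixel_index(lat, lon, lat_top, lon_left):
    """Row/column of the 15 arc-second cell containing (lat, lon), given the grid's top-left corner."""
    row = math.floor((lat_top - lat) / PIXEL_DEG)
    col = math.floor((lon - lon_left) / PIXEL_DEG)
    return row, col


def unique_pixels(points, lat_top, lon_left):
    """Set of distinct grid cells occupied by a collection of (lat, lon) points."""
    return {pixel_index(lat, lon, lat_top, lon_left) for lat, lon in points}


# Three hypothetical meters: the first two fall in the same cell.
meters = [(-0.001, 0.001), (-0.002, 0.003), (-0.010, 0.001)]
print(sorted(unique_pixels(meters, lat_top=0.0, lon_left=0.0)))  # [(0, 0), (2, 0)]
```

This many-to-one mapping is why roughly 430,000 meters collapse to about 18,100 pixels.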

2.4.2. Unelectrified Location Data

In a previous study, unelectrified pixels were identified by assuming that pixels lacking electricity assets, such as transformers, were unelectrified [40]. However, due to incomplete transformer and meter data, and the absence of information on off-grid electricity assets like minigrids and captive power projects, our approach deviates from this. We adopt the methodology proposed by Min et al. [41], where unelectrified pixels were taken to be those in uninhabited areas or areas devoid of buildings. A similar approach was used to determine background noise levels in Falchetta et al. [36].
Land Use Land Cover (LULC) products and Human Settlement Layers are two remote sensing products commonly used to detect uninhabited areas. To implement the approach described above, it is crucial to accurately classify non-residential built-up areas as built-up to prevent the misidentification of electrified pixels as unelectrified. For instance, non-residential built-up areas such as roads often have streetlights which are detectable in NTL images, as noted by Min and Gaba [43].
We selected the remote sensing product for identifying uninhabited areas by evaluating how well two commonly used products, the Global Human Settlement Layer (GHSL) [53,54] and ESRI Land Use Land Cover (ESRI-LULC) [55], detect roads in Rwanda. The latest version of GHSL was produced for the 2018 epoch at 10 m resolution from Landsat and Sentinel-2 images; we use the version of this product that contains both residential and non-residential built-up surfaces. ESRI-LULC is produced annually from 2017 to 2022 from Sentinel-2 composites at 10 m resolution and assigns global land surfaces to one of 10 classes, including built-up, trees, and water. We used the 2018 product to align with the GHSL epoch. The shapefile of all national and district roads in Rwanda was obtained from the Rwanda Transport Development Agency.
Next, we assessed how the roads were classified in the GHSL and ESRI-LULC products. Note that within the ESRI-LULC product, the ‘built-up’ class (class 7) signifies areas of settlement, whereas all other classes are treated as non-settlement zones. Table 1 shows that the ESRI-LULC product detects roads considerably more accurately than the GHSL product, so we use ESRI-LULC in this study. Unelectrified NTL pixels are defined as those containing trees (forests) or water bodies, as identified in the 2020 version of the ESRI-LULC product. See Figure A2 in the appendix for a map of the LULC product.
Due to the significant spatial resolution mismatch between the NTL images (≈500 m) and the LULC product (10 m), we ensured that the geographic area covered by each unelectrified NTL pixel was entirely covered by either the water or forest class in the LULC product. In Figure 3, we provide an illustration of the selection of electrified and unelectrified NTL pixels based on the description above. Lastly, Figure 4 shows the spatial distribution of the electrified and unelectrified pixels in Rwanda. As expected, we observe that the majority of the unelectrified locations are clustered around the national parks [56] and lakes [57], especially around Lake Kivu and Nyungwe National Park in the west of the country.
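The full-coverage rule can be sketched as a block reduction: an NTL pixel is eligible only if every 10 m LULC cell inside it is water or trees. This sketch assumes the two grids are already aligned so that each NTL pixel covers an integer number of LULC cells (here 50 × 50, i.e., a nominal 500 m pixel), which real 15 arc-second pixels and a 10 m projected grid do not satisfy without reprojection. It also assumes ESRI-LULC codes 1 for water and 2 for trees; only class 7 (built-up) is stated in the text. The function name is hypothetical.

```python
import numpy as np

WATER, TREES, BUILT_UP = 1, 2, 7  # BUILT_UP = 7 per the text; WATER and TREES codes are assumed


def unelectrified_mask(lulc, block, allowed=(WATER, TREES)):
    """True for each coarse pixel whose block x block fine cells are all in `allowed` classes.

    Assumes the fine grid is aligned with the coarse grid; trailing rows/cols that
    do not fill a whole block are dropped.
    """
    rows, cols = lulc.shape[0] // block, lulc.shape[1] // block
    trimmed = lulc[: rows * block, : cols * block]
    blocks = trimmed.reshape(rows, block, cols, block)  # axes: (row, i, col, j)
    return np.isin(blocks, allowed).all(axis=(1, 3))


# Hypothetical 1 km x 1 km patch of 10 m cells -> 2 x 2 coarse pixels of 50 x 50 cells each.
lulc = np.full((100, 100), TREES)
lulc[60:, :50] = WATER  # lake in the lower-left pixel: still eligible
lulc[10, 60] = BUILT_UP  # one built-up cell disqualifies the upper-right pixel
print(unelectrified_mask(lulc, block=50))
# [[ True False]
#  [ True  True]]
```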

2.4.3. Obtaining the Characteristics of Electrified Pixels

The electrification date of each electrified pixel was taken to be the earliest installation date among the meters located in that NTL pixel. We also recorded the number of meters in each pixel. For each meter, we calculated the annual electricity purchase by summing its purchase records within each year; for meters with purchase histories spanning multiple years, we averaged these annual totals to obtain the meter's average annual electricity consumption (which we approximate by its electricity purchases). Finally, for each electrified pixel, we computed the total annual electricity consumption by summing the average annual electricity consumption of all meters in that pixel.
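The per-pixel aggregation above can be sketched in plain Python. The tuple layout, the helper names, and the assumption that purchase amounts are in kWh are illustrative; the guard for meters without purchase records is a defensive addition, since the text reports that every merged meter had records.

```python
from collections import defaultdict
from datetime import date


def average_annual_purchase(purchases):
    """Sum a meter's purchases within each year, then average across those years."""
    per_year = defaultdict(float)
    for year, amount in purchases:
        per_year[year] += amount
    # Defensive guard for meters without records (none are reported in the merged data).
    return sum(per_year.values()) / len(per_year) if per_year else 0.0


def pixel_characteristics(meters):
    """Aggregate (pixel_id, install_date, purchases) tuples into per-pixel characteristics."""
    stats = {}
    for pixel, installed, purchases in meters:
        s = stats.setdefault(pixel, {"electrified_on": installed, "n_meters": 0, "annual_kwh": 0.0})
        s["electrified_on"] = min(s["electrified_on"], installed)  # earliest installation
        s["n_meters"] += 1
        s["annual_kwh"] += average_annual_purchase(purchases)
    return stats


# Hypothetical meters: two in pixel "p1", one in pixel "p2"; amounts assumed to be kWh.
meters = [
    ("p1", date(2016, 5, 1), [(2017, 100.0), (2017, 50.0), (2018, 250.0)]),
    ("p1", date(2014, 2, 1), [(2019, 100.0)]),
    ("p2", date(2019, 1, 1), [(2020, 30.0)]),
]
stats = pixel_characteristics(meters)
print(stats["p1"])  # {'electrified_on': datetime.date(2014, 2, 1), 'n_meters': 2, 'annual_kwh': 300.0}
```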

2.5. Training Machine Learning Model

The NTL pixels were classified as electrified or unelectrified using random forest, a versatile machine learning algorithm that has been applied to a variety of remote sensing tasks. A random forest classifier is an ensemble of decision tree classifiers, each trained on a bootstrap sample of the training data [58]. The training samples left out of a given tree's bootstrap sample, referred to as out-of-bag samples, are used to estimate how well that tree generalizes. To classify an input, each decision tree predicts a class, and the most common prediction across trees is taken as the forest's prediction.
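The two ingredients named above, bootstrap sampling with out-of-bag samples and majority voting, can be sketched in a few lines of NumPy. This is a hypothetical illustration only: it omits tree growing entirely, and some implementations, such as scikit-learn's `RandomForestClassifier`, combine trees by averaging their predicted class probabilities rather than counting hard votes.

```python
import numpy as np


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample; return (in-bag indices, out-of-bag indices)."""
    in_bag = rng.integers(0, n_samples, size=n_samples)  # sampled with replacement
    out_of_bag = np.setdiff1d(np.arange(n_samples), in_bag)  # never drawn for this tree
    return in_bag, out_of_bag


def majority_vote(tree_predictions):
    """Combine 0/1 predictions of shape (n_trees, n_samples) by majority vote (ties -> 0)."""
    votes = np.asarray(tree_predictions)
    return (votes.mean(axis=0) > 0.5).astype(int)


rng = np.random.default_rng(0)
in_bag, oob = bootstrap_split(1000, rng)
print(f"out-of-bag share: {oob.size / 1000:.2f}")  # roughly 1/e, about 0.37
print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 1, 0]]))  # [1 1 0]
```

On average about 37% of samples are out-of-bag for each tree, which is what makes the internal error estimate possible without a separate hold-out set.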
Random forest has been effective for remote sensing classification and regression problems such as land cover classification [59,60,61], land surface temperature estimation [62,63,64], and tree cover mapping [65]. Random forest was chosen as the machine learning model for this study based on its proven performance in previous research on electrification mapping with NTL [40]. Random forest models are particularly advantageous because they mitigate overfitting (each tree is trained on a different subset of the training data), are robust to multicollinearity, high data dimensionality, noise, and outliers, and provide internal estimates of variable importance [66,67].
Two sets of training data were created for the experiments conducted, each consisting of a set of features (radiance values of the NTL pixels) and a ground truth classification of the pixel (electrified or unelectrified).
  • First Set: This dataset was used to investigate the impact of temporal aggregation techniques. It includes:
    • Annual Composites: One annual radiance feature for each NTL pixel.
    • Quarterly Composites: Four features, one for each quarter.
    • Monthly Composites: Twelve features, one for each month.
    • Daily NTL Images: Statistical moments such as the mean and median or, alternatively, the 428 features derived from the “TSFRESH” package.
    For this set, all electrified pixels were identified using meter locations to ensure a fair comparison across different temporal aggregations.
  • Second Set: This dataset was created using annual, quarterly, and monthly composites, but with two versions for each temporal aggregation technique: one identifying electrified locations using meters and the other using transformers.
In total, ten random forest models were developed: four for the first set of data and six for the second set.
After creating the training datasets, we tuned the hyperparameters of the random forest models using grid search with five-fold cross-validation. The following hyperparameters were tuned: the number of decision trees (varied from 200 to 2000), the maximum tree depth (10 to 100 in increments of 10), the number of features to consider when looking for the best split at each node, the minimum number of samples required to split a node, and the minimum number of samples required at each leaf node. After identifying the optimal hyperparameters, we retrained the model and evaluated its performance using 10-fold cross-validation to estimate its generalization capability. NTL data from 2020 were used because 2020 is the latest meter installation year in our dataset.
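A minimal scikit-learn sketch of the tuning and evaluation procedure, under several assumptions: the step of 200 for the number of trees, the candidate values for `max_features`, `min_samples_split`, and `min_samples_leaf`, the use of balanced accuracy as the grid-search criterion, and stratified shuffled folds are illustrative choices that the study does not report. The helper names and the synthetic data are hypothetical. Because the same pixels are used for tuning and for the 10-fold evaluation, this is not nested cross-validation, and the resulting scores may be somewhat optimistic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate


def documented_param_grid():
    """Search space from the text; values marked 'assumed' are not reported in the study."""
    return {
        "n_estimators": list(range(200, 2001, 200)),  # 200 to 2000 (step of 200 assumed)
        "max_depth": list(range(10, 101, 10)),  # 10 to 100 in increments of 10
        "max_features": ["sqrt", "log2"],  # assumed candidates
        "min_samples_split": [2, 5, 10],  # assumed candidates
        "min_samples_leaf": [1, 2, 4],  # assumed candidates
    }


def tune_and_evaluate(X, y, param_grid, seed=0):
    """Five-fold grid search, then 10-fold cross-validation of the best configuration."""
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid,
        scoring="balanced_accuracy",  # assumed selection criterion
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
    )
    search.fit(X, y)
    best_model = RandomForestClassifier(random_state=seed, **search.best_params_)
    scores = cross_validate(
        best_model,
        X,
        y,
        scoring=["balanced_accuracy", "roc_auc", "f1"],
        cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=seed),
    )
    return search.best_params_, scores


# Synthetic stand-in data: 200 "pixels" with 3 features, labels driven by the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
best_params, scores = tune_and_evaluate(X, y, {"n_estimators": [20], "max_depth": [3, 5]})
print(best_params, scores["test_balanced_accuracy"].mean())
```

With the candidate values assumed here, the full grid has 1800 combinations, so one five-fold search fits 9000 forests; a small grid, as in the usage lines, is convenient for trying the code.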

Performance Metrics

We evaluated the performance of the models using 10 metrics, namely accuracy, balanced accuracy, precision, recall (or sensitivity), specificity, the Area Under the Curve of the Receiver Operating Characteristic (AUC-ROC), the Matthews Correlation Coefficient (MCC), and three variants of the $F_\beta$ score. The appendix contains the expressions for all the metrics. Balanced accuracy is preferred to plain accuracy because it is less sensitive to class imbalance; it is defined in Equation (1), where TP, FN, TN, and FP are the true positives, false negatives, true negatives, and false positives, respectively. The $F_\beta$ score allows different weights to be assigned to precision and recall, as shown in Equation (2), where β sets the weight of recall relative to precision (β > 1 favors recall, β < 1 favors precision).
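$$\text{Balanced Accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right) \tag{1}$$

$$F_\beta = \frac{1 + \beta^2}{\dfrac{\beta^2}{\text{Recall}} + \dfrac{1}{\text{Precision}}}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP} \tag{2}$$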
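Equations (1) and (2) translate directly into code. The sketch below, with hypothetical function names and confusion counts, computes both from raw counts; it does not guard against zero denominators (for example, a fold with no predicted positives).

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Equation (1): mean of recall (true positive rate) and specificity (true negative rate)."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def f_beta(tp, fn, fp, beta=1.0):
    """Equation (2): weighted harmonic mean of precision and recall."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return (1 + beta**2) / (beta**2 / recall + 1 / precision)


# Hypothetical confusion counts for one cross-validation fold.
tp, fn, tn, fp = 40, 10, 45, 5
print(round(balanced_accuracy(tp, fn, tn, fp), 3))  # 0.85
print(round(f_beta(tp, fn, fp, beta=1), 3))  # 0.842
print(round(f_beta(tp, fn, fp, beta=2), 3))  # 0.816
```

Here recall is 0.8 and precision is 8/9, so raising β from 1 to 2 pulls the score toward the lower recall.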

2.6. Systematic Error Assessment and Generalization

The first part of the systematic error assessment compared the performance of models trained on different temporal aggregations (annual, quarterly, monthly, and daily) and on different sources of ground-truth data (electricity meters and transformers). Next, we systematically assessed the errors in the models by using descriptive statistics to explore the characteristics of correctly classified and misclassified pixels in each fold of the cross-validation. This analysis was conducted on the model trained on meter locations in Rwanda, since it was the only dataset with enough information to characterize each NTL pixel. We investigated how characteristics such as the time elapsed since electrification, the number of electricity meters in a pixel, and the average annual electricity consumption in a pixel affect the performance of the electrification mapping model. Lastly, we explored two types of generalization: (1) of models trained on data from one country to another country, and (2) of models trained on one type of ground-truth data to another.
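One simple way to tabulate such descriptive statistics is the share of electrified pixels that the model correctly classifies within bins of a characteristic, for example, the number of meters per pixel. The sketch below is a hypothetical helper with made-up inputs; the bin edges used in the study are not reproduced here. Bins are closed on the left and open on the right, and values outside the outer edges are ignored.

```python
import numpy as np


def detection_rate_by_bin(characteristic, correctly_classified, bin_edges):
    """Fraction of electrified pixels correctly classified in each [edge_i, edge_i+1) bin.

    Empty bins return NaN; values outside the outer edges are ignored.
    """
    values = np.asarray(characteristic, dtype=float)
    hits = np.asarray(correctly_classified, dtype=bool)
    edges = np.asarray(bin_edges, dtype=float)
    bin_of = np.searchsorted(edges, values, side="right") - 1  # bin index per pixel
    rates = []
    for b in range(len(edges) - 1):
        in_bin = bin_of == b
        rates.append(float(hits[in_bin].mean()) if in_bin.any() else float("nan"))
    return rates


# Hypothetical meters-per-pixel counts and whether each electrified pixel was detected.
meters_per_pixel = [1, 2, 3, 10, 12, 30, 40]
detected = [False, False, True, True, True, True, True]
print(detection_rate_by_bin(meters_per_pixel, detected, bin_edges=[0, 5, 20, 50]))
# [0.3333333333333333, 1.0, 1.0]
```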

3. Results

In this section, we start by presenting the performance of the electrification models trained on different NTL composites and on meter and transformer locations in Rwanda before investigating the impact of pixel characteristics. After this, we assess the generalization capabilities of the models.

3.1. Comparison of Temporal Aggregation Techniques

Researchers aiming to map electrification with NTL often have to decide which temporal aggregation technique or source of ground-truth data to use. While annual composites may be easier to work with, there is a concern that as the aggregation becomes coarser, from daily to annual, dimly lit pixels and pixels that are not consistently lit throughout the year may become harder to detect. This section examines the impact of the temporal aggregation technique on the performance of the electrification mapping model.
Figure 5 shows the performance of the model trained on meter locations in Rwanda for each temporal aggregation technique. Each boxplot illustrates the distribution of a metric across the cross-validation folds. Here, β in $F_\beta$ was set to 1 because the numbers of electrified and unelectrified pixels in the training dataset were about the same. Our analysis revealed a consistent trend: as the temporal resolution increased from annual to monthly composites, scores increased across all performance metrics. This aligns with our expectations, since the finer temporal resolution of monthly composites should allow dimly lit or sparsely populated areas to be detected more readily. The appendix shows that the trend holds for the remaining six metrics.
The additional processing involved in generating quarterly and annual composites from monthly data may cause some loss of information and thus degrade the model's performance to some extent. For instance, events such as bush burning, which occur at a monthly scale and may affect the electrification model, can be detected with monthly data but may be harder to identify once averaged into quarterly or annual composites. However, the trend did not extend to daily inputs. We attribute this departure to the inherent noise in daily readings, which persists even after the necessary corrections are applied; potential sources of this noise include scattered moonlight, wildfires, fishing boats, and vehicles. This result holds for both techniques used to process the daily images: manual feature extraction and automated feature extraction with “TSFRESH”.

3.2. Comparison of Performance with Meter and Transformer Locations

Another decision to be made when developing NTL-based models for mapping electrification is which electricity asset to use to identify electrified locations. A previous study used transformer locations to identify electrified NTL pixels in Kenya [68]. In this section, we take advantage of our dataset, which contains the locations of both transformers and meters in Rwanda, to determine which of these electricity assets is better suited for identifying electrified locations in NTL imagery.
Figure 6 compares the model’s performance when electrified pixels are identified from transformer locations versus meter locations. Because the two approaches yield different numbers of electrified pixels, in this section we use performance metrics that are less sensitive to class imbalance, such as balanced accuracy, the Area Under the Receiver Operating Characteristic curve (AUC ROC), and Fβ scores, enabling a more reliable comparison of the model’s performance. The appendix contains the results for the other six metrics.
We observe from Figure 6 that using meter locations yields better performance than using transformer locations across the balanced accuracy, precision, recall, and F1 score metrics. A likely reason is that the dataset based on transformer locations contains fewer electrified pixels, giving the model less information to learn from.
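Labeling pixels from asset locations amounts to snapping each meter or transformer point to the NTL grid and marking any pixel that contains at least one asset as electrified. The sketch below assumes a regular latitude/longitude grid of 15 arc-second cells (the nominal VIIRS-DNB composite grid) with a hypothetical upper-left origin; the coordinates are made up. It also returns the per-pixel asset count, the quantity analyzed in Section 3.4.

```python
import math

# Assumed grid: 15 arc-second cells, origin (LON0, LAT0) at the upper-left corner.
PIXEL_DEG = 15 / 3600
LON0, LAT0 = 29.0, -1.0  # hypothetical origin covering the example points

def pixel_index(lon, lat, size=PIXEL_DEG):
    """Return the (row, col) of the grid cell containing a point."""
    col = math.floor((lon - LON0) / size)
    row = math.floor((LAT0 - lat) / size)  # rows increase southward
    return row, col

def electrified_pixels(assets):
    """Map each pixel containing at least one asset to its asset count."""
    counts = {}
    for lon, lat in assets:
        key = pixel_index(lon, lat)
        counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical (lon, lat) coordinates: three meters, one transformer.
meters = [(30.06, -1.951), (30.0601, -1.9511), (30.101, -1.901)]
transformers = [(30.06, -1.951)]
print(electrified_pixels(meters))
print(electrified_pixels(transformers))
```

Because meters far outnumber transformers, the same procedure yields more electrified pixels from meters than from transformers, consistent with the smaller transformer-based training set noted above.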

3.3. Impact of Time Lapse since Electrification

There are several reasons why newly electrified pixels may be difficult to detect with NTL. One is that there may be a time lag between when a potential electricity user gets connected and when they are able to use electricity. For instance, a case study in Sri Lanka reports a lag of 5 to 7 years between the provision of electricity access and the formation of new businesses [69]. Also, the number of connected households or businesses and the average electricity demand in a newly connected community tend to increase over time as people become aware of the benefits that electricity access can provide [70]. Lastly, as the economic conditions of newly electrified areas improve due to the economic stimulus that electrification can provide, the use of household lightbulbs and streetlights that are detectable by NTL sensors can be expected to increase gradually over time.
In Figure 7, we express the number of misclassified electrified pixels in a particular year as a percentage of the total number of pixels in the test dataset that became electrified during that specific year (we only present results obtained from the model trained on monthly composites to improve the readability of the figures, since it has been shown to give the best performance). This error analysis corroborates our hypothesis that newly electrified pixels are more difficult to detect with NTL. We observed a decline in the likelihood of misclassification as the number of years since electrification increased.
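The quantity plotted in Figure 7 can be computed by grouping electrified test pixels by their electrification year. A minimal sketch, assuming each test record is a hypothetical (year, predicted_electrified) pair; the records below are invented for illustration:

```python
from collections import Counter

def misclassification_rate_by_year(records):
    """Percentage of electrified test pixels, per electrification year,
    that the model predicted as unelectrified."""
    total, missed = Counter(), Counter()
    for year, predicted_electrified in records:
        total[year] += 1
        if not predicted_electrified:
            missed[year] += 1
    return {year: 100 * missed[year] / total[year] for year in sorted(total)}

# Hypothetical test records: (electrification year, model prediction).
records = [(2015, True), (2015, True),
           (2019, True), (2019, False),
           (2020, False), (2020, False), (2020, False), (2020, True)]
print(misclassification_rate_by_year(records))
# {2015: 0.0, 2019: 50.0, 2020: 75.0}
```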

3.4. Impact of Number of Meters

Here, we examine the relationship between the number of meters within a pixel and its likelihood of being correctly classified as electrified. Our analysis reveals an interesting pattern in the distribution of meters in the full test data and in the data containing only misclassified electrified pixels. In Figure 8, we observe a more right-skewed distribution of meters in the test data than in the error data (misclassified pixels), indicating that pixels with fewer meters are more likely to be misclassified. This is also depicted in Figure 9, where we show that the percentage of correctly classified electrified pixels increases as the number of meters in a pixel increases. It is worth highlighting that no electrified pixel with more than 140 meters was misclassified as unelectrified in this study.
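The per-bin percentages in Figure 9 follow from binning electrified test pixels by meter count. A sketch assuming fixed-width bins of 20 meters (a hypothetical choice; the bin edges used in the figure may differ) and invented example pixels:

```python
def accuracy_by_meter_bin(pixels, bin_width=20):
    """pixels: iterable of (n_meters, correctly_classified) for electrified pixels.

    Returns {(lo, hi): % correctly classified} for half-open bins [lo, hi).
    """
    stats = {}  # bin lower edge -> (correct count, total count)
    for n_meters, correct in pixels:
        lo = (n_meters // bin_width) * bin_width
        hits, count = stats.get(lo, (0, 0))
        stats[lo] = (hits + int(correct), count + 1)
    return {(lo, lo + bin_width): 100 * hits / count
            for lo, (hits, count) in sorted(stats.items())}

# Hypothetical pixels: (number of meters, correctly classified?).
pixels = [(3, False), (5, True), (12, True), (25, True), (30, True), (150, True)]
print(accuracy_by_meter_bin(pixels))
```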

3.5. Impact of Electricity Consumption

In this section, we examine the relationship between the electricity consumption within a pixel and its detectability. The electricity consumption in a pixel is calculated as the sum of the average annual electricity consumption across all meters within that pixel. In line with the findings observed with the number of meters in a pixel, our analysis reveals a comparable pattern when considering the annual electricity purchases of pixels. Figure 10 demonstrates that pixels with higher annual electricity purchases exhibit a lower likelihood of misclassification. Interestingly, we observe a steady increase in accuracy up to the 120–140 kWh consumption bin, after which it plateaus. This plateau may suggest that beyond a certain threshold, further increases in electricity consumption no longer correlate strongly with outdoor lighting, which is primarily captured by NTL imagery [34]. It is likely that after the initial few kWh, electricity is increasingly used for indoor activities rather than outdoor lighting, leading to a diminishing correlation between electricity consumption and NTL radiance. However, it is important to note that there are fewer pixels consuming more than 120 kWh per year, resulting in higher variance in error rates for these higher consumption bins due to the smaller sample size.
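The pixel-level consumption described above (the sum, over the meters in a pixel, of each meter’s average annual purchase) can be computed from a purchase log. A sketch assuming a hypothetical record layout (pixel_id, meter_id, year, kWh) and assuming each meter’s average is taken over the years in which it has purchases:

```python
from collections import defaultdict
from statistics import mean

def pixel_consumption(purchases):
    """purchases: iterable of (pixel_id, meter_id, year, kwh) records.

    Each meter's purchases are totaled per year and averaged across its
    years of data; a pixel's consumption is the sum of its meters' averages.
    """
    annual = defaultdict(float)            # (pixel, meter, year) -> kWh
    for pixel, meter, year, kwh in purchases:
        annual[(pixel, meter, year)] += kwh
    per_meter = defaultdict(list)          # (pixel, meter) -> annual totals
    for (pixel, meter, _year), kwh in annual.items():
        per_meter[(pixel, meter)].append(kwh)
    totals = defaultdict(float)            # pixel -> summed meter averages
    for (pixel, _meter), yearly in per_meter.items():
        totals[pixel] += mean(yearly)
    return dict(totals)

# Hypothetical purchase records.
log = [("A", "m1", 2019, 40), ("A", "m1", 2019, 20), ("A", "m1", 2020, 80),
       ("A", "m2", 2020, 50), ("B", "m3", 2020, 10)]
print(pixel_consumption(log))  # {'A': 120.0, 'B': 10.0}
```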

3.6. Feature Importance

Next, we rank these characteristics by their importance to the likelihood of correctly classifying a pixel. To do this, for each fold, we subset the test data to contain only electrified examples. Then, we label each example ‘correct’ or ‘incorrect’ depending on whether it was correctly classified. Finally, we train a random forest model to predict this label, using the characteristics of the examples, such as the year of electrification, the number of meters in the pixel, and the sum of the average annual electricity purchased by meters in the pixel, as features.
We then assess the importance of each feature with the Mean Decrease in Impurity (MDI) and Drop-Column feature importance techniques. The MDI technique measures the importance of a feature by how much it reduces impurity when used in tree splits. The impurity reduction is computed for each split on that feature and then averaged over all trees in the random forest model [71]. The Drop-Column technique first establishes a baseline performance with all features, then excludes one feature, retrains the model, and recalculates its performance. The importance of a feature is the difference between the baseline score and the score obtained without that feature [72]. Both methods identified the total average annual electricity purchase of all meters in each pixel as the most important feature for the correct classification of an electrified pixel. The result of the MDI technique is shown in Figure 11.
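Drop-Column importance is model-agnostic: it only needs a way to refit a model and score it. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the random forest used in this study (with scikit-learn, a RandomForestClassifier could be refit instead, and its feature_importances_ attribute provides impurity-based MDI scores). Feature names and data are hypothetical; note that, unlike a random forest, a nearest-centroid model is sensitive to feature scale.

```python
from statistics import mean

def fit_nearest_centroid(X, y):
    """Stand-in classifier (not a random forest): predicts the class whose
    feature-wise mean is closest in squared Euclidean distance."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def predict(x):
        def dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
        return min(centroids, key=dist)

    return predict

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def drop_column(rows, j):
    """Copy of rows with feature j removed."""
    return [r[:j] + r[j + 1:] for r in rows]

def drop_column_importance(X_tr, y_tr, X_te, y_te, names, fit=fit_nearest_centroid):
    """Importance = baseline score minus score after dropping a feature and refitting."""
    baseline = accuracy(fit(X_tr, y_tr), X_te, y_te)
    return {
        name: baseline - accuracy(fit(drop_column(X_tr, j), y_tr),
                                  drop_column(X_te, j), y_te)
        for j, name in enumerate(names)
    }

# Hypothetical features: [noise, annual kWh]; label 1 = correctly classified.
X_tr = [[0, 10], [2, 12], [1, 100], [3, 110]]
y_tr = [0, 0, 1, 1]
X_te = [[3, 11], [0, 105]]
y_te = [0, 1]
print(drop_column_importance(X_tr, y_tr, X_te, y_te, ["noise", "kwh"]))
# {'noise': 0.0, 'kwh': 1.0}
```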

3.7. Generalization Assessment

3.7.1. Generalization to Unseen Grid Connected Areas

Given the scarcity of ground-truth data on electrification status in sub-Saharan Africa, we took advantage of our unique dataset, which includes data from Kenya and Rwanda, to explore the transferability of electrification models trained on data from one country to another country. To allow a fair comparison, electrified locations were identified only from transformer locations in both countries. Our approach first trains a model on the complete dataset from one country (Country A) and evaluates its performance on another country (Country B). In each subsequent iteration, we added a larger portion of the data from Country B to the Country A data, in increments of 10%, and trained a model on the expanded dataset. We then assessed the model’s performance using only the remaining data from Country B to avoid data leakage. Since a complete year’s worth of NTL data has been available from the VIIRS-DNB instrument since 2013, we also explored the impact of using data from more than one year for the classification; that is, we compared the performance of models trained on data from 2020 only with that of models trained on data from 2013 to 2020. The results of these analyses are presented in Figure 12.
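The incremental protocol above can be expressed as a split generator. In this sketch (a hypothetical helper, not the study’s code), Country B is shuffled once with a fixed seed so that each training set is a superset of the previous one; whether the study used nested or independently drawn increments is an assumption. The loop stops at 90% so that some Country B data always remain for testing.

```python
import random

def transfer_splits(country_a, country_b, step=0.1, seed=0):
    """Yield (fraction, train, test) for the incremental-transfer protocol.

    All of Country A is always in the training set; a growing fraction of
    Country B (0%, 10%, ..., 90%) is added, and evaluation uses only the
    Country B samples that were not added, so no test sample is trained on.
    """
    b = list(country_b)
    random.Random(seed).shuffle(b)      # one fixed shuffle -> nested increments
    n_steps = round(1 / step)
    for k in range(n_steps):            # k = n_steps (100%) would leave no test data
        frac = k / n_steps
        n_added = round(frac * len(b))
        yield frac, list(country_a) + b[:n_added], b[n_added:]

# Hypothetical sample IDs standing in for labeled NTL pixels.
a = [f"a{i}" for i in range(5)]
b = [f"b{i}" for i in range(20)]
for frac, train, test in transfer_splits(a, b):
    print(f"{frac:.0%} of B added: {len(train)} train, {len(test)} test")
```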
Our first observation was that the baseline performance of the models (that is, when the model is tested and trained on data from the same country) varies significantly between countries. Additionally, models developed with data spanning 8 years outperformed those developed with data from only one year. Specifically, the models achieved balanced accuracies of 98% and 84% in Kenya and Rwanda, respectively, when trained on NTL data from 2013 to 2020. Conversely, they achieved accuracies of 94% and 72% in Kenya and Rwanda, respectively, when trained solely on NTL data from 2020.
The differences in performance between countries may be due to variations in data composition and country-specific characteristics. For instance, in the Kenya dataset, the most recent transformer installation dates back to 2017, whereas in Rwanda, the latest installation was conducted in 2020. This disparity can impact the model’s performance, as earlier results in this paper have shown that the time elapsed since electrification can influence the accuracy of classifications. Another reason for this could be the correlation between electricity consumption and installation date.
Fobi et al. [70] showed that newly connected electricity users consume less electricity than older customers. Furthermore, the difference between the average radiance of electrified and unelectrified pixels appears to be higher in Kenya than in Rwanda, likely due to the fact that the average annual electricity consumption in Kenya is more than double that of Rwanda [73]. The impact of these site-specific characteristics is that it is easier for the model to correctly identify the electrification status of NTL pixels in Kenya than in Rwanda.
Secondly, we observe that a model trained solely on data from one country experiences only a slight degradation in performance when tested on data from another country. For instance, a model trained on Kenyan data achieves a balanced accuracy of approximately 80% when tested on Rwandan data, while a model trained and tested exclusively on Rwandan data achieves an 84% balanced accuracy. Similarly, a model trained on Rwandan data achieves a balanced accuracy of approximately 95% when tested on Kenyan data, while a model trained and tested solely on Kenyan data achieves a 98% balanced accuracy. This observation suggests not only that models trained on data from one country may generalize reasonably well to others, but also that model performance depends strongly on how separable electrified and unelectrified pixels are in the country of interest; for the reasons discussed above, this separation is easier in Kenya than in Rwanda.
Additionally, we noted that incorporating additional data from the test location into the training data does not lead to significant improvements in the models’ performance scores. This remains consistent regardless of the NTL data used for model training, whether based on one year or multiple years’ worth of data and regardless of the performance metric used, as detailed in the appendix. This finding suggests that location-specific data may not be necessary when developing electrification mapping models based on NTL.

3.7.2. Generalization to Unseen Off-Grid Connected Areas

We also investigated the applicability of a model trained on data collected from grid-connected assets in one country for detecting off-grid electricity assets in other countries. We leveraged the West African mini-grids location dataset, as provided by the ECOWAS Center for Renewable Energy and Energy Efficiency (ECREEE) [74]. This database contains details such as the location, system configuration, and system capacity of 392 minigrids in West Africa, the spatial distribution of which is shown in Figure 13. Only 341 of these minigrids were operational as of 2020. We trained a random forest model using the combination of the datasets used in the previous section. This comprises electrified 2020 NTL pixels identified with transformer locations in Rwanda and Kenya and unelectrified 2020 NTL pixels identified from forest and water bodies in the countries.
Subsequently, we assessed the model’s performance in identifying operational minigrids within the ECREEE data. Since the test dataset contains only the locations of minigrids (that is, electrified locations), we needed to obtain the locations of unelectrified pixels to calculate metrics beyond the detection rate (the complement of the error of omission). We obtained unelectrified locations using the approach described earlier, where unelectrified NTL pixels were identified from the 2020 ESRI-LULC product. Since the minigrids span 15 countries, we randomly selected 341 NTL pixels (to match the number of operational minigrids) containing only water from an area near the center of the geographical extent of the minigrids in the ECREEE dataset.
Additionally, since the ECREEE database also includes details of the distribution grid in West Africa and the locations of communities that are currently unelectrified and expected to be electrified with mini-grids by 2030, we explored an alternative method to identify unelectrified NTL pixels. This method involves identifying pixels that contain these communities to be electrified but do not include the distribution network. Table A2 in the appendix provides details on the number of installed and operational mini-grids in each ECOWAS member state and the area from which we identified the unelectrified pixels. Figure 14 shows the result of the assessment for both techniques of identifying unelectrified areas.
The model achieved a balanced accuracy between 63% and 75%, depending on how unelectrified locations were identified, and a sensitivity of 67%, meaning that 227 of the 341 operational minigrids were successfully detected. These balanced accuracies are lower than those obtained when detecting transformers and meters in Rwanda (see Figure 6), possibly because the average illumination levels of on-grid electricity assets (transformers and meters) are significantly higher than those of off-grid electricity assets (minigrids), as presented in Figure 15.
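For reference, the detection rate quoted above is the sensitivity TP/(TP + FN), and balanced accuracy is the mean of sensitivity and specificity, so the specificity obtained with each choice of unelectrified pixels directly shifts the balanced accuracy. A minimal sketch (function name illustrative) that reproduces the 227-of-341 figure:

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity TP/(TP + FN) and specificity TN/(TN + FP)."""
    sensitivity = tp / (tp + fn)   # detection rate = 1 - omission error rate
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

detected, operational = 227, 341
sensitivity = detected / operational
print(f"{sensitivity:.1%}")  # 66.6%, reported as 67%
```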
The observed variation in specificity in Figure 14 suggests that identifying unelectrified areas using LULC products resulted in fewer false positives compared to the approach of determining electrification status based on the presence of a distribution network in NTL pixels. This discrepancy is likely attributable to potential incompleteness or gaps in the distribution network data, which were utilized to identify unelectrified areas by their absence within NTL pixels. It may also stem from the possibility that areas designated for electrification by 2030 (also used to identify unelectrified areas) may already have alternative sources of electricity, such as diesel generators, thereby potentially confounding the electrification model’s predictions.
Given that the ECREEE dataset includes details on the capacity and configuration of the minigrids, we investigated how the capacity of a minigrid can impact its detectability with NTL. We standardized the generating capacities of mini-grids, which incorporate diverse sources with varying capacities, by calculating both the maximum and cumulative generating source capacities in kilowatts (kW). This normalization was achieved by assuming a conservative power factor of 0.8 when converting generating assets rated in kilovolt-amperes (kVA) to their kW equivalents, reflecting a scenario where loads are predominantly inductive. For example, in the case of a minigrid equipped with a 5 kWp photovoltaic system and a 10 kVA diesel generator, the maximum system size would be 8 kW (0.8 × 10 kVA), and the cumulative capacity would be 13 kW (5 kW + 8 kW). Our findings revealed that the capacity of the generating source significantly influenced the likelihood of successfully detecting the minigrids. In particular, the average maximum and cumulative generating source capacity of undetected mini-grids was 39 and 46 kW, respectively, while that of the detected mini-grids was notably higher at 72 and 105 kW, respectively. It should be noted that the findings were robust and not sensitive to variations in the assumed power factor.
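The capacity normalization can be written as a short function. This sketch (hypothetical function names) applies the 0.8 power factor to kVA-rated sources, treats kW and kWp ratings as kW, and reproduces the worked example above; the power_factor argument makes it straightforward to repeat the sensitivity check on the assumed power factor.

```python
def to_kw(rating, unit, power_factor=0.8):
    """Convert one generating-source rating to kW.

    kVA ratings are scaled by the assumed power factor; kW and kWp
    ratings are taken as-is.
    """
    unit = unit.lower()
    if unit == "kva":
        return rating * power_factor
    if unit in ("kw", "kwp"):
        return float(rating)
    raise ValueError(f"unsupported unit: {unit}")

def capacity_summary(sources, power_factor=0.8):
    """Return (maximum, cumulative) generating capacity in kW."""
    kw = [to_kw(rating, unit, power_factor) for rating, unit in sources]
    return max(kw), sum(kw)

# Worked example from the text: 5 kWp PV plus a 10 kVA diesel generator.
print(capacity_summary([(5, "kWp"), (10, "kVA")]))  # (8.0, 13.0)
```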

4. Discussion

In this section, we discuss the practical applications of the methods developed and implemented for mapping electrification. We also highlight several caveats that must be considered when interpreting these maps.

4.1. Applications of the Proposed Method of Mapping Electrification

This study implements a method for mapping electrification, specifically applicable to regions with limited data. Mapping electrification in these regions is important to a range of stakeholders, including governments and international development organizations. Traditional methods of obtaining this information have typically involved surveys, which can be resource-intensive, and the collection of data directly from governmental sources, which can be challenging. In contrast, our proposed approach uses freely available NTL data. The method achieves an F1 score as high as 89% (using monthly NTL composites) and a balanced accuracy of about 87%, demonstrating its effectiveness in monitoring electricity infrastructure expansion. Considering recent calls to address inconsistencies in reported electrification statistics in SSA [75], the methods proposed in this study can be used to validate electrification estimates and supplement incomplete data.
Beyond obtaining electrification information for a particular year, developing multitemporal electrification maps to determine exactly when a community got access to electricity is perhaps even more difficult. However, this information is important for researchers as they attempt to assess the impact of electrification, which requires identifying treated and control units. The proposed approach can facilitate the development of this map since NTL is available for the entire globe at a relatively high cadence even after temporal compositing. This is particularly important with ex-post impact assessments where the assessment is being conducted years after an intervention has been deployed.
To demonstrate the utility of the proposed methods for impact assessments, Figure 16 presents a multitemporal electrification map of Rwanda, developed using NTL data spanning from 2014 to 2023 processed with the methods described above. The map illustrates the expansion of electricity infrastructure, initially concentrated in Kigali in 2014, extending to the greater Kigali area by 2020. Additionally, it captures the development of road infrastructure with their associated streetlights, particularly the expansion of new roads in the southwestern part of the country around 2020.
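One simple way to derive the electrification year that such impact assessments need is to take, for each pixel, the first year from which it is classified as electrified in every subsequent annual map. The persistence rule below is an illustrative assumption, not necessarily the procedure used to produce Figure 16; it discards isolated detections that later revert to unelectrified.

```python
def electrification_year(status_by_year):
    """status_by_year: {year: predicted_electrified} for one pixel.

    Returns the first year from which the pixel stays electrified through
    the last available year, or None if it is unelectrified in the final year.
    """
    first = None
    for year in sorted(status_by_year):
        if status_by_year[year]:
            if first is None:
                first = year
        else:
            first = None  # reverted: discard the earlier detection
    return first

# Hypothetical annual predictions for one pixel.
pixel = {2014: False, 2015: True, 2016: False, 2017: True, 2018: True}
print(electrification_year(pixel))  # 2017
```

Pixels with a year inside the study window would be candidate treated units, while pixels that return None would be candidate controls.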

4.2. Implications of Findings from Systematic Error Assessment

The findings of this study have significant implications for the application of NTL data in electrification mapping. By elucidating the relationships between model performance and the characteristics of NTL pixels, this study contributes to a deeper understanding of the key factors that influence the successful detection of electrified and unelectrified areas using NTL data.
A key implication of the findings of this paper is that caution should be exercised when using NTL data to map electrification. Several studies have employed NTL to monitor progress in SDG7 [32,76]. In particular, in [32], the authors compared electrification rates derived from NTL data for certain countries with rates reported by the International Energy Agency (IEA). The IEA estimates were based on administrative data acquired from the Energy Ministries of each respective country [77]. Discrepancies in electrification rate estimates were identified for some of these countries, notably Thailand, China, and Cuba.
Our research sheds light on potential reasons for these disparities, such as population density in specific locations, electricity consumption levels, and the aggregation method of NTL data, among others. Consequently, we advocate for a cautious interpretation of electrification maps derived from NTL data, taking into account their limitations and suitable applications. Furthermore, this study provides empirical evidence that using NTL for mapping electrification may not be effective for newly electrified areas with few meters, because the time elapsed since electrification and the number of meters in a pixel are two key characteristics shown here to affect the likelihood of correctly classifying an NTL pixel. Our assessment of the model’s performance in detecting off-grid systems, which was less effective than in detecting on-grid systems, supports the notion that this approach may miss some off-grid systems.
Lastly, the generalization assessment conducted in this study indicates minimal changes in model performance when additional location-specific data are added to the training dataset. This implies that location-specific data may not be necessary for developing electrification maps using the proposed method. Although the study focuses on Kenya and Rwanda, this approach, and our observations and conclusions based on the performance of the model, should be applicable to other developing countries in need of accurate electrification maps, given that NTL data are globally available and can be refined to exclude non-electrification light sources.

4.3. Limitations and Future Work

One notable limitation of the electricity assets datasets is their lack of comprehensiveness in that they do not include all the electricity meters and transformers in Rwanda and Kenya. Consequently, NTL pixels devoid of meters or transformers cannot definitively be classified as areas without electricity access. We approached this challenge by obtaining the location of unelectrified pixels from a LULC data product. This method for detecting unelectrified NTL pixels is limited to only a certain type of unelectrified NTL pixels, that is, those that do not contain any built-up structure, excluding those that contain built-up structures but do not have access to electricity.
Moreover, although the accuracy of the ESRI 2020 LULC used in this study was reported by its producer to be about 85% [55], its accuracy has been assessed in specific locations such as Syria, Morocco, and Vietnam to range from 73% to 84% [78,79,80]. However, compared to other 10 m LULC datasets such as Dynamic World and WorldCover, the ESRI-LULC product was found to have the highest accuracy (75%) when tested on ground-truth data with global extent [81]. We account for potential misclassifications in the LULC product by manually inspecting, in Google Maps, high-radiance pixels that the LULC product identified as unelectrified, and excluding those that contain built-up areas.
Furthermore, as highlighted in Section 4.2, this approach for mapping electrification may miss recently deployed off-grid systems with few connections and limited electricity consumption, factors that we have shown can impact the detectability of an electrified location. We propose that future work explore alternative sources of data to detect these off-grid systems. For instance, solar photovoltaic arrays, commonly used in off-grid electrification in SSA, have been shown to be detectable with high-resolution aerial imagery [82]. Combining NTL (for on-grid systems) with daytime images (for off-grid systems) could improve the accuracy of electrification maps.
Another set of limitations involves factors that can impact outdoor lighting. For instance, the year of interest for our study, 2020, coincided with the COVID-19 pandemic, during which economic activities significantly declined in many areas of the world, leading to decreased external lighting [83]. This may have made it more difficult to detect electrified areas. On the other hand, spurious light sources such as vehicle headlights could cause non-electrified locations to be incorrectly identified as electrified. While we believe that averaging performed when creating monthly and annual composites may mitigate some of these issues, it is still important to acknowledge them as potential sources of misclassifications.
Lastly, the VIIRS-DNB instrument was not originally designed for mapping electrification, which necessitates the consideration of several caveats when it is applied for this purpose. The interpretation of radiance values requires attention to the spectral characteristics of light sources since different spectral compositions can produce varying radiance values, even with similar power outputs. Also, the instrument’s spectral sensitivity, ranging from about 500 nm to about 900 nm, is unlikely to include blue light emissions (380 nm to 500 nm) common in light-emitting diode (LED) lamps, potentially resulting in inaccuracies when detecting urban lighting with NTL and biasing the mapping of illuminated areas. With the ongoing shift from traditional lighting technologies such as High-Pressure Sodium (HPS) and fluorescent lamps to LEDs [84], these inaccuracies can be expected to increase. Future work will evaluate the impact of these shifts in lighting technology on electrification mapping with NTL imagery.

5. Conclusions

In this study, we conducted a comprehensive systematic error assessment of nighttime light-derived machine learning-based electrification models. Our findings reveal several key factors that influence the model’s performance. These factors include the time elapsed since electrification, the density of electricity assets (such as meters and transformers) within nighttime light pixels, and the electricity purchase history associated with the meters in those pixels. We demonstrated that when developing this model, the locations of electricity meters are more effective than transformer locations in accurately identifying electrified pixels within nighttime imagery, particularly in regions with low electricity adoption rates like sub-Saharan Africa. We also showed that aggregating nighttime light data at the monthly level appears to be the optimal aggregation technique for electrification mapping. Our results also indicate that the model exhibits high transferability to unseen areas and performs reasonably well in detecting both off-grid and on-grid communities. We validated this by testing the model trained on data from Kenya on Rwanda, and vice versa, and by testing the model trained with on-grid data on off-grid data.
Overall, this research provides valuable insights into the performance and factors influencing the accuracy of nighttime light-derived machine learning electrification models. These findings have important implications for improving electrification assessments and planning efforts, particularly in regions with limited access to reliable electricity data. Based on the performance of the developed models, we identify two innovative applications for nighttime light-derived electrification maps: first, for identifying treated and control units when assessing the impact of electrification interventions, and second, for filling gaps in national-level electrification statistics.

Author Contributions

Conceptualization, T.R., J.T. and N.W.; methodology, T.R., J.T. and N.W.; software, T.R.; formal analysis, T.R.; investigation, T.R., J.T. and N.W.; writing—original draft preparation, T.R.; writing—review and editing, T.R., J.T. and N.W.; supervision, J.T. and N.W.; funding acquisition, J.T. and N.W. All authors have read and agreed to the published version of the manuscript.

Funding

We acknowledge the generous support of the Rockefeller Foundation, grant number 2021 PNC 010.

Data Availability Statement

The transformer and meter locations presented in this article are not readily available because of their sensitive nature. Nighttime light data can be obtained from the Earth Observation Group (https://payneinstitute.mines.edu/eog-2/viirs/, accessed 6 August 2024). Land Use and Land Cover maps (ESRI and GHSL) can be downloaded from their respective providers (https://data.jrc.ec.europa.eu/dataset/9f06f36f-4b11-47ec-abb0-4f8b7b1d72ea, accessed 6 August 2024; https://livingatlas.arcgis.com/en/home/, accessed 6 August 2024). The shapefile containing details on all national and district roads in Rwanda can be accessed from the Rwanda Transport Development Agency (https://www.rtda.gov.rw, accessed 23 November 2023).

Acknowledgments

The authors thank REG and KPLC for providing access to the data without which this study would not have been possible. We also appreciate Zeal Shah for his assistance in the processing of the daily nighttime light images.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Supplementary Data

In this section, we describe the ground-truth data used in our study. This includes the spatial distribution of the dataset with only location information, as well as the datasets containing both location and installation dates. We also show the geographical coverage of the forest and water classes within the Land Use Land Cover (LULC) product that was used to identify unelectrified locations.

Appendix A.1. Distribution of Electricity Meter Data

Table A1 presents the counts of electricity meters, categorized by those with location information and those with both location and installation date information, across each of Rwanda’s districts. Subsequently, we acquired corresponding electricity purchase data for the roughly 430,000 meters with both location and installation date information. Figure A1 displays the spatial distribution of these datasets, revealing coverage across all 30 districts in Rwanda. This suggests that the merging process is unlikely to introduce substantial spatial bias into the electrification access modeling.
Table A1. Table showing the number of electricity meters in each of the 30 districts in Rwanda.
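| # | District | Number of Meters with Location Data | Number of Meters with Location and Installation Date Data |
| 1 | Bugesera | 23,403 | 22,360 |
| 2 | Burera | 14,187 | 4854 |
| 3 | Gakenke | 13,361 | 3646 |
| 4 | Gasabo | 79,043 | 77,637 |
| 5 | Gatsibo | 17,667 | 7343 |
| 6 | Gicumbi | 17,481 | 14,393 |
| 7 | Gisagara | 9005 | 1958 |
| 8 | Huye | 17,498 | 15,947 |
| 9 | Kamonyi | 15,516 | 14,717 |
| 10 | Karongi | 12,496 | 11,023 |
| 11 | Kayonza | 18,716 | 4378 |
| 12 | Kicukiro | 50,965 | 50,209 |
| 13 | Kirehe | 17,969 | 30 |
| 14 | Muhanga | 16,971 | 16,231 |
| 15 | Musanze | 27,343 | 26,274 |
| 16 | Ngoma | 16,282 | 39 |
| 17 | Ngororero | 12,333 | 78 |
| 18 | Nyabihu | 13,109 | 337 |
| 19 | Nyagatare | 27,703 | 22,777 |
| 20 | Nyamagabe | 9656 | 33 |
| 21 | Nyamasheke | 19,931 | 5442 |
| 22 | Nyanza | 12,089 | 100 |
| 23 | Nyarugenge | 38,602 | 38,158 |
| 24 | Nyaruguru | 9408 | 43 |
| 25 | Rubavu | 32,867 | 29,891 |
| 26 | Ruhango | 18,895 | 303 |
| 27 | Rulindo | 12,782 | 12,213 |
| 28 | Rusizi | 28,461 | 25,347 |
| 29 | Rutsiro | 10,827 | 3806 |
| 30 | Rwamagana | 21,856 | 21,069 |
| | Total | 636,422 | 430,636 |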
Figure A1. Map presenting the location of meters in the dataset with only location information (red) and in the dataset with both location and installation date information (blue). Both datasets cover all 30 districts in Rwanda as shown in Table A1.

Appendix A.2. Identification of Unelectrified Locations

Figure A2 shows the ESRI 2020 Land Use Land Cover (LULC) product used to identify unelectrified locations in Rwanda. To create this map, we downloaded the two LULC tiles (35M and 36M) covering Rwanda from the Living Atlas database [85]. Subsequently, we merged and cropped these tiles to match Rwanda’s geographical extent. The forest and water classes from this LULC product were used to identify unelectrified locations. Similarly, Figure A3 shows the region of West Africa used for identifying unelectrified NTL pixels, which were then used to supplement ECREEE’s minigrid location dataset in order to assess the electrification access mapping model’s performance in detecting areas electrified with off-grid systems. The LULC tile shown in the figure is the 30P tile from the Living Atlas database, which was visually determined to be the most central tile in the geographical area covered by the minigrids.
Figure A2. The ESRI 2020 land use land cover (LULC) map. Note that most of the water bodies and forests in Rwanda are in the western and south-western parts of the country, indicating that most of the unelectrified locations selected for the study are concentrated in this area.
Figure A3. This map shows the 15 countries with minigrids in the ECREEE minigrid data. We overlay the ESRI 2020 LULC tile over the countries to show the area from which we identified the unelectrified NTL pixels.
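A minimal sketch of how the forest and water classes of a fine LULC grid could be turned into unelectrified NTL pixels is given below. It is illustrative only and rests on stated assumptions. First, the class codes (1 = water, 2 = trees) follow our reading of the ESRI 2020 10-class legend and should be checked against the product documentation. Second, the LULC grid is assumed to be already resampled so that each NTL pixel covers an exact square block of LULC cells; in reality, 10 m LULC cells do not nest exactly inside roughly 15 arc-second NTL pixels. Third, the `min_fraction` threshold is a hypothetical parameter, not a value reported in this study.

```python
from typing import List

# Assumed ESRI 2020 LULC class codes; verify against the product legend.
WATER = 1
TREES = 2
UNELECTRIFIED_CLASSES = {WATER, TREES}


def unelectrified_mask(lulc: List[List[int]], block: int,
                       min_fraction: float = 1.0) -> List[List[bool]]:
    """Flag coarse NTL pixels dominated by water/forest LULC cells.

    Each NTL pixel is assumed to cover an exact block x block window of LULC
    cells. A pixel is flagged as unelectrified when the share of water or
    forest cells in its window is at least `min_fraction` (hypothetical).
    """
    rows, cols = len(lulc), len(lulc[0])
    if rows % block or cols % block:
        raise ValueError("LULC grid dimensions must be multiples of block")
    mask: List[List[bool]] = []
    for r0 in range(0, rows, block):
        mask_row: List[bool] = []
        for c0 in range(0, cols, block):
            window = [lulc[r][c]
                      for r in range(r0, r0 + block)
                      for c in range(c0, c0 + block)]
            share = sum(v in UNELECTRIFIED_CLASSES for v in window) / len(window)
            mask_row.append(share >= min_fraction)
        mask.append(mask_row)
    return mask


# Hypothetical 4 x 4 LULC grid aggregated into 2 x 2 NTL pixels.
lulc = [
    [1, 1, 7, 7],
    [1, 2, 7, 5],
    [2, 2, 5, 5],
    [2, 2, 5, 1],
]
print(unelectrified_mask(lulc, block=2))  # [[True, False], [True, False]]
```

Requiring the whole window to be water or forest (the default `min_fraction=1.0`) is the most conservative choice; lowering the threshold admits more candidate pixels at the risk of including settlements at forest or lake edges.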

Appendix A.3. Location of Minigrids

Table A2 below provides the number of installed and operational minigrids in each ECOWAS member state. We observe that although 15 countries are represented in the dataset, the 51 non-operational minigrids are located only in Liberia (1) and Sierra Leone (50).
Table A2. Table showing the number of minigrids in each ECOWAS member state.
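| # | Country | Installed Minigrids | Operational Minigrids |
|---|---|---|---|
| 1 | Benin | 7 | 7 |
| 2 | Burkina Faso | 5 | 5 |
| 3 | Cabo Verde | 8 | 8 |
| 4 | Cote d’Ivoire | 7 | 7 |
| 5 | Gambia | 1 | 1 |
| 6 | Ghana | 5 | 5 |
| 7 | Guinea | 3 | 3 |
| 8 | Guinea Bissau | 2 | 2 |
| 9 | Liberia | 15 | 14 |
| 10 | Mali | 77 | 77 |
| 11 | Niger | 13 | 13 |
| 12 | Nigeria | 18 | 18 |
| 13 | Senegal | 173 | 173 |
| 14 | Sierra Leone | 54 | 4 |
| 15 | Togo | 4 | 4 |
| | Total | 392 | 341 |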
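The claim that non-operational minigrids occur only in Liberia and Sierra Leone can be checked arithmetically by differencing the two columns of Table A2. The snippet below only transcribes the table; it is a consistency check, not part of the study's workflow.

```python
# Installed and operational minigrid counts transcribed from Table A2.
MINIGRIDS = {
    "Benin": (7, 7),
    "Burkina Faso": (5, 5),
    "Cabo Verde": (8, 8),
    "Cote d'Ivoire": (7, 7),
    "Gambia": (1, 1),
    "Ghana": (5, 5),
    "Guinea": (3, 3),
    "Guinea Bissau": (2, 2),
    "Liberia": (15, 14),
    "Mali": (77, 77),
    "Niger": (13, 13),
    "Nigeria": (18, 18),
    "Senegal": (173, 173),
    "Sierra Leone": (54, 4),
    "Togo": (4, 4),
}


def non_operational(counts):
    """Return {country: installed - operational} for countries with a shortfall."""
    return {c: inst - op for c, (inst, op) in counts.items() if inst > op}


installed_total = sum(inst for inst, _ in MINIGRIDS.values())
operational_total = sum(op for _, op in MINIGRIDS.values())

print(non_operational(MINIGRIDS))  # {'Liberia': 1, 'Sierra Leone': 50}
print(installed_total, operational_total)  # 392 341
```

The column sums reproduce the table's totals, which also confirms that the flattened Sierra Leone row reads as 54 installed and 4 operational.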

Appendix B. Supplementary Methods

In this section, we present the metrics calculated for each of the electrification mapping models developed in this study. A set of 10 metrics was used to evaluate model performance. The equations below define these metrics, where TP, FN, TN, FP, AUC-ROC, and MCC denote True Positives, False Negatives, True Negatives, False Positives, Area Under the Receiver Operating Characteristic Curve, and Matthews Correlation Coefficient, respectively.
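$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Balanced Accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$F_\beta = \frac{1 + \beta^2}{\dfrac{\beta^2}{\text{Sensitivity (Recall)}} + \dfrac{1}{\text{Precision}}}, \quad \beta \in \{0.5, 1, 2\}$$
$$\text{AUC-ROC} = \int_0^1 \text{ROC}(t)\,dt$$
where ROC(t) is the true-positive rate (sensitivity) at false-positive rate t = 1 − Specificity.
$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$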
Balanced accuracy and MCC were employed to address the challenge of data imbalance, which can impact traditional accuracy measures [86,87].
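The definitions above translate directly into code. The sketch below is a plain-Python illustration, not the study's evaluation code. Returning 0.0 when a denominator is zero is an assumed convention; the F-beta score is computed in the algebraically equivalent form (1 + β²)·P·R / (β²·P + R) so that a zero recall or precision does not raise; and AUC-ROC is obtained from its rank (Mann-Whitney) formulation, the probability that a randomly chosen positive outscores a randomly chosen negative, which equals the area under the ROC curve.

```python
import math
from typing import Dict, Sequence


def _safe_div(num: float, den: float) -> float:
    # Assumed convention: return 0.0 instead of raising on a zero denominator
    # (e.g. a fold with no predicted positives).
    return num / den if den else 0.0


def threshold_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Confusion-matrix metrics as defined in Appendix B."""
    sensitivity = _safe_div(tp, tp + fn)  # recall, true-positive rate
    specificity = _safe_div(tn, tn + fp)  # true-negative rate
    precision = _safe_div(tp, tp + fp)
    metrics = {
        "accuracy": _safe_div(tp + tn, tp + tn + fp + fn),
        "balanced_accuracy": 0.5 * (sensitivity + specificity),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
    }
    for beta in (0.5, 1.0, 2.0):
        # Equivalent to (1 + b^2) / (b^2 / recall + 1 / precision).
        b2 = beta ** 2
        metrics[f"f{beta:g}"] = _safe_div((1 + b2) * precision * sensitivity,
                                          b2 * precision + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    metrics["mcc"] = _safe_div(tp * tn - fp * fn, denom)
    return metrics


def auc_roc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC-ROC as P(positive score > negative score), ties counted as 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return _safe_div(wins, len(scores_pos) * len(scores_neg))


m = threshold_metrics(tp=40, fn=10, tn=30, fp=20)
print(round(m["balanced_accuracy"], 3), round(m["f1"], 3), round(m["mcc"], 3))
print(auc_roc([0.9, 0.8], [0.1, 0.8]))  # 0.875
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for illustration; for large test sets a sort-based rank computation gives the same value faster.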

Appendix C. Supplementary Results

Appendix C.1. Comparison of Temporal Aggregation Techniques

Figure A4 below shows how the 10 performance metrics vary between the annual, quarterly, monthly, and daily temporal aggregation techniques. The monthly aggregation technique gave the best performance on all metrics.
Figure A4. Boxplots showing the distributions of the 10-fold cross-validation results for each of the 10 metrics used to assess the performance of the electrification models developed from annual, quarterly, monthly, and daily nightlight composites. Notice that monthly composites (highlighted in green) outperform all other composites across all metrics. Specificity and MCC had the lowest values; hence, a different y-axis range was used to plot them.
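The four temporal aggregations compared above can be illustrated on a toy daily radiance series. This is a hypothetical sketch, not the study's pipeline: it assumes each period is summarized by the arithmetic mean of its valid daily observations, with `None` marking a missing (for example, cloud-masked) day, whereas the composites used in the study may have been produced differently (for example, with additional screening of outliers). The names `PERIOD_KEYS` and `aggregate` are illustrative.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Callable, Dict, Hashable, List, Optional, Tuple

# A daily series of (date, radiance) pairs; None marks a missing day.
Series = List[Tuple[date, Optional[float]]]

# Functions mapping a calendar date to the period it is aggregated into.
PERIOD_KEYS: Dict[str, Callable[[date], Hashable]] = {
    "daily": lambda d: d,
    "monthly": lambda d: (d.year, d.month),
    "quarterly": lambda d: (d.year, (d.month - 1) // 3 + 1),
    "annual": lambda d: d.year,
}


def aggregate(series: Series, period: str) -> Dict[Hashable, float]:
    """Mean of the valid daily radiances in each period, ordered by period.

    Periods without any valid observation are omitted from the result.
    """
    key = PERIOD_KEYS[period]
    buckets: Dict[Hashable, List[float]] = defaultdict(list)
    for day, value in series:
        if value is not None:
            buckets[key(day)].append(value)
    return {k: mean(v) for k, v in sorted(buckets.items())}


series = [
    (date(2020, 1, 1), 1.0),
    (date(2020, 1, 2), 3.0),
    (date(2020, 2, 1), None),
    (date(2020, 4, 1), 5.0),
]
print(aggregate(series, "monthly"))
print(aggregate(series, "quarterly"))
```

Coarser periods average over more observations, which fills gaps and damps noise but also hides short-term variation; the comparison in Figure A4 is, in effect, a test of where that trade-off works best for classification.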

Appendix C.2. Comparison of Performance with Meter and Transformer Locations

Figure A5 below compares, across all 10 metrics, the performance of the electrification mapping model when meter locations versus transformer locations are used to identify electrified NTL pixels. Identifying electrified NTL pixels with meter locations gave the best performance across all metrics except Specificity.
Figure A5. Boxplots showing the distributions of the 10-fold cross-validation results for each of the 10 metrics used to assess the performance of the electrification models developed from transformer locations and meter locations. Using meter locations to identify electrified NTL pixels gave the best performance across most metrics.
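Labelling an NTL pixel as electrified from point locations (meters or transformers) amounts to binning coordinates onto the NTL grid and counting points per pixel; the same count is the "number of meters within a pixel" characteristic used in the error analysis. A minimal sketch, assuming a regular latitude/longitude grid with 15 arc-second pixels (the nominal resolution of the VIIRS VNL products) anchored at a north-west origin. The origin, the function names, and the example coordinates are hypothetical, not taken from the study's data.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

PIXEL_DEG = 15 / 3600  # 15 arc-seconds expressed in degrees


def pixel_index(lon: float, lat: float, origin_lon: float, origin_lat: float,
                res: float = PIXEL_DEG) -> Tuple[int, int]:
    """Return (row, col) of the pixel containing a point.

    The origin is the grid's north-west corner, so rows increase southwards
    and columns increase eastwards.
    """
    row = math.floor((origin_lat - lat) / res)
    col = math.floor((lon - origin_lon) / res)
    return row, col


def points_per_pixel(points: Iterable[Tuple[float, float]], origin_lon: float,
                     origin_lat: float) -> Counter:
    """Count (lon, lat) points per pixel; a count >= 1 marks an electrified pixel."""
    return Counter(pixel_index(lon, lat, origin_lon, origin_lat)
                   for lon, lat in points)


# Hypothetical meter coordinates (lon, lat) and grid origin.
meters = [(29.001, -1.001), (29.002, -1.003), (29.006, -1.001)]
counts = points_per_pixel(meters, origin_lon=29.0, origin_lat=-1.0)
print(counts)
```

The same routine applies to transformer locations; the difference reported in Figure A5 comes from which points are binned, not from the binning itself.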

Appendix C.3. Generalization to Unseen Grid-Connected Areas

In this section, we provide the results of generalizing the model trained on one country to the other. Figure A6 shows the accuracy, balanced accuracy, sensitivity, specificity, and precision, while Figure A7 shows the F1, F0.5, F2, AUC-ROC, and MCC. We observe that the models' performance scores do not improve significantly when additional data from the test location are incorporated into the training data.
Figure A6. Accuracy, balanced accuracy, sensitivity, specificity, and precision of the model trained on data from one country and tested on the other.
Figure A7. F1, F0.5, F2, AUC-ROC, and MCC of the model trained on data from one country and tested on the other.
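The cross-country experiment, in which a growing share of target-country samples is added to the source-country training data, can be sketched as a splitting routine. This is an illustrative assumption about the protocol rather than the authors' code: here the target-country samples that are not moved into training form the test set, and the function name `transfer_split`, the fraction, and the seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def transfer_split(source: Sequence[T], target: Sequence[T], fraction: float,
                   seed: int = 0) -> Tuple[List[T], List[T]]:
    """Build (train, test) for a cross-country transfer experiment.

    All source-country samples go into training, together with a random
    `fraction` of the target-country samples; the remaining target samples
    form the test set, so no test sample is seen during training.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    rng = random.Random(seed)
    shuffled = list(target)
    rng.shuffle(shuffled)
    n_added = int(round(fraction * len(shuffled)))
    train = list(source) + shuffled[:n_added]
    test = shuffled[n_added:]
    return train, test


# Hypothetical labelled pixels: Rwanda as source, Kenya as target.
rwanda = [("RW", i) for i in range(5)]
kenya = [("KE", i) for i in range(10)]
train, test = transfer_split(rwanda, kenya, fraction=0.2)
print(len(train), len(test))
```

Holding out only the target samples that were not added keeps the test set disjoint from training, so any gain from added target data is not an artifact of evaluating on training samples.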

References

  1. Barnes, D.; Foley, G. Rural Electrification in the Developing World: A Summary of Lessons from Successful Programs; Technical Report; World Bank: Washington, DC, USA, 2004.
  2. Kirubi, C.; Jacobson, A.; Kammen, D.M.; Mills, A. Community-Based Electric Micro-Grids Can Contribute to Rural Development: Evidence from Kenya. World Dev. 2009, 37, 1208–1221.
  3. Dinkelman, T. The Effects of Rural Electrification on Employment: New Evidence from South Africa. Am. Econ. Rev. 2011, 101, 3078–3108.
  4. Burney, J.; Alaofè, H.; Naylor, R.; Taren, D. Impact of a rural solar electrification project on the level and structure of women’s empowerment. Environ. Res. Lett. 2017, 12, 095007.
  5. Daka, K.R.; Ballet, J. Children’s education and home electrification: A case study in northwestern Madagascar. Energy Policy 2011, 39, 2866–2874.
  6. Akram, V. Causality Between Access to Electricity and Education: Evidence From BRICS Countries. Energy Res. Lett. 2022, 3, 1–6.
  7. Barron, M.; Torero, M. Household electrification and indoor air pollution. J. Environ. Econ. Manag. 2017, 86, 81–92.
  8. Olanrele, I.A.; Lawal, A.I.; Dahunsi, S.O.; Babajide, A.A.; Iseolorunkanmi, J.O. The impact of access to electricity on education and health sectors in Nigeria’s rural communities. Entrep. Sustain. Issues 2020, 7, 3016–3035.
  9. Ministry of Water, Irrigation, and Energy. National Electrification Program 2.0: Integrated Planning for Universal Access; World Bank: Washington, DC, USA, 2019.
  10. Ministry of Energy. Kenya National Electrification Strategy: Key Highlights; Ministry of Energy: Nairobi, Kenya, 2018.
  11. IEA; IRENA; UNSD; World Bank; WHO. Tracking SDG 7: The Energy Progress Report 2023; Technical Report; IEA: Paris, France; IRENA: Masdar City, United Arab Emirates; UNSD: New York, NY, USA; World Bank: Washington, DC, USA; WHO: Geneva, Switzerland, 2023.
  12. Ratledge, N.; Cadamuro, G.; De La Cuesta, B.; Stigler, M.; Burke, M. Using machine learning to assess the livelihood impact of electricity access. Nature 2022, 611, 491–495.
  13. Lenz, L.; Munyehirwe, A.; Peters, J.; Sievert, M. Does Large-Scale Infrastructure Investment Alleviate Poverty? Impacts of Rwanda’s Electricity Access Roll-Out Program. World Dev. 2017, 89, 88–110.
  14. Korkovelos, A.; Khavari, B.; Sahlberg, A.; Howells, M.; Arderne, C. The Role of Open Access Data in Geospatial Electrification Planning and the Achievement of SDG7. An OnSSET-Based Case Study for Malawi. Energies 2019, 12, 1395.
  15. Mentis, D.; Howells, M.; Rogner, H.; Korkovelos, A.; Arderne, C.; Zepeda, E.; Siyal, S.; Taliotis, C.; Bazilian, M.; De Roo, A.; et al. Lighting the World: The first application of an open source, spatial electrification tool (OnSSET) on Sub-Saharan Africa. Environ. Res. Lett. 2017, 12, 085003.
  16. Moksnes, N.; Korkovelos, A.; Mentis, D.; Howells, M. Electrification pathways for Kenya–linking spatial electrification analysis and medium to long term energy planning. Environ. Res. Lett. 2017, 12, 095008.
  17. USAID. The DHS Program; USAID: Washington, DC, USA, 2023.
  18. World Bank. Living Standards Measurement Study; World Bank: Washington, DC, USA, 2023.
  19. UNICEF. Multiple Indicator Cluster Surveys; UNICEF: New York, NY, USA, 2023.
  20. WHO. World Health Survey; WHO: Geneva, Switzerland, 2023.
  21. Miller, S.D.; Noh, Y.; Grasso, L.D.; Seaman, C.J.; Ignatov, A.; Heidinger, A.K.; Nam, S.; Line, W.E.; Petrenko, B. A Physical Basis for the Overstatement of Low Clouds at Night by Conventional Satellite Infrared-Based Imaging Radiometer Bi-Spectral Techniques. Earth Space Sci. 2022, 9, e2021EA002137.
  22. Proville, J.; Zavala-Araiza, D.; Wagner, G. Night-time lights: A global, long term look at links to socio-economic trends. PLoS ONE 2017, 12, e0174610.
  23. Elvidge, C.D.; Baugh, K.E.; Kihn, E.A.; Kroehl, H.W.; Davis, E.R.; Davis, C.W. Relation between satellite observed visible-near infrared emissions, population, economic activity and electric power consumption. Int. J. Remote Sens. 1997, 18, 1373–1379.
  24. Li, X.; Ge, L.; Chen, X. Detecting Zimbabwe’s Decadal Economic Decline Using Nighttime Light Imagery. Remote Sens. 2013, 5, 4551–4570.
  25. Shah, Z.; Hsu, F.C.; Elvidge, C.D.; Taneja, J. Mapping Disasters & Tracking Recovery in Conflict Zones Using Nighttime Lights. In Proceedings of the 2020 IEEE Global Humanitarian Technology Conference (GHTC), Seattle, WA, USA, 29 October–1 November 2020; pp. 1–8.
  26. Li, X.; Zhang, R.; Huang, C.; Li, D. Detecting 2014 Northern Iraq Insurgency using night-time light imagery. Int. J. Remote Sens. 2015, 36, 3446–3458.
  27. Ajmar, A.; Arco, E.; Eusebio, A. The VIIRS Nighttime Lights average annual global dataset: Exploratory and brisk trend analysis on three different domains. In Proceedings of the 2022 IEEE 21st Mediterranean Electrotechnical Conference (MELECON), Palermo, Italy, 14–16 June 2022; pp. 454–459.
  28. Shah, Z.; Carvallo, J.P.; Hsu, F.C.; Taneja, J. The inequitable distribution of power interruptions during the 2021 Texas winter storm Uri. Environ. Res. Infrastruct. Sustain. 2023, 3, 025011.
  29. Henderson, M.; Yeh, E.T.; Gong, P.; Elvidge, C.; Baugh, K. Validation of urban boundaries derived from global night-time satellite imagery. Int. J. Remote Sens. 2003, 24, 595–609.
  30. Yuan, X.; Jia, L.; Menenti, M.; Zhou, J.; Chen, Q. Filtering the NPP-VIIRS Nighttime Light Data for Improved Detection of Settlements in Africa. Remote Sens. 2019, 11, 3002.
  31. Stathakis, D.; Baltas, P. Seasonal population estimates based on night-time lights. Comput. Environ. Urban Syst. 2018, 68, 133–141.
  32. Elvidge, C.D.; Baugh, K.E.; Sutton, P.C.; Bhaduri, B.; Tuttle, B.T.; Ghosh, T.; Ziskin, D.; Erwin, E.H. Who’s in the Dark-Satellite Based Estimates of Electrification Rates. In Urban Remote Sensing; Yang, X., Ed.; John Wiley & Sons, Ltd.: Chichester, UK, 2011; pp. 211–224.
  33. Ru, Y.; Li, X.; Belay, W.A. Tracking Spatiotemporal Patterns of Rwanda’s Electrification Using Multi-Temporal VIIRS Nighttime Light Imagery. Remote Sens. 2022, 14, 4397.
  34. Min, B.; Gaba, K.M.; Sarr, O.F.; Agalassou, A. Detection of rural electrification in Africa using DMSP-OLS night lights imagery. Int. J. Remote Sens. 2013, 34, 8118–8141.
  35. Arderne, C.; Zorn, C.; Nicolas, C.; Koks, E.E. Predictive mapping of the global power system using open data. Sci. Data 2020, 7, 19.
  36. Falchetta, G.; Pachauri, S.; Parkinson, S.; Byers, E. A high-resolution gridded dataset to assess electrification in sub-Saharan Africa. Sci. Data 2019, 6, 110.
  37. Bhattarai, D.; Lucieer, A.; Lovell, H.; Aryal, J. Remote sensing of night-time lights and electricity consumption: A systematic literature review and meta-analysis. Geogr. Compass 2023, 17, e12684.
  38. Xie, Y.; Weng, Q. Detecting urban-scale dynamics of electricity consumption at Chinese cities using time-series DMSP-OLS (Defense Meteorological Satellite Program-Operational Linescan System) nighttime light imageries. Energy 2016, 100, 177–189.
  39. Elvidge, C.D.; Baugh, K.; Zhizhin, M.; Hsu, F.C.; Ghosh, T. VIIRS night-time lights. Int. J. Remote Sens. 2017, 38, 5860–5879.
  40. Correa, S.; Shah, Z.; Wu, Y.; Kohlhase, S.; Raisin, P.; Gaihre, N.R.; Modi, V.; Taneja, J. PowerScour: Tracking electrified settlements using satellite data. In Proceedings of the 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Boston, MA, USA, 9–10 November 2022; pp. 139–148.
  41. Min, B.; O’Keeffe, Z. HREA Electricity Data. 2021.
  42. Hibbert, D.B. Systematic errors in analytical measurement results. J. Chromatogr. A 2007, 1158, 25–32.
  43. Min, B.; Gaba, K. Tracking Electrification in Vietnam Using Nighttime Lights. Remote Sens. 2014, 6, 9511–9529.
  44. Elvidge, C.D.; Imhoff, M.L.; Baugh, K.E.; Hobson, V.R.; Nelson, I.; Safran, J.; Dietz, J.B.; Tuttle, B.T. Night-time lights of the world: 1994–1995. ISPRS J. Photogramm. Remote Sens. 2001, 56, 81–99.
  45. Miller, S.D.; Straka, W.; Mills, S.P.; Elvidge, C.D.; Lee, T.F.; Solbrig, J.; Walther, A.; Heidinger, A.K.; Weiss, S.C. Illuminating the Capabilities of the Suomi National Polar-Orbiting Partnership (NPP) Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band. Remote Sens. 2013, 5, 6717–6766.
  46. Rawn, B.; Louie, H. Planning for Electrification: On- and Off-Grid Considerations in Sub-Saharan Africa. IDS Bull. 2017, 48, 9–27.
  47. World Bank. World Bank Open Data; World Bank: Washington, DC, USA, 2022.
  48. Earth Observation Group. VIIRS Nighttime Light. Available online: https://eogdata.mines.edu/products/vnl/ (accessed on 10 August 2024).
  49. Elvidge, C.D.; Zhizhin, M.; Ghosh, T.; Hsu, F.C.; Taneja, J. Annual Time Series of Global VIIRS Nighttime Lights Derived from Monthly Averages: 2012 to 2019. Remote Sens. 2021, 13, 922.
  50. Elvidge, C.D.; Hsu, F.C.; Zhizhin, M.; Ghosh, T.; Taneja, J.; Bazilian, M. Indicators of Electric Power Instability from Satellite Observed Nighttime Lights. Remote Sens. 2020, 12, 3194.
  51. Fulcher, B.D. Feature-based time-series analysis. In Feature Engineering for Machine Learning and Data Analytics; CRC Press: Boca Raton, FL, USA, 2017.
  52. Christ, M.; Braun, N.; Neuffer, J.; Kempa-Liehr, A.W. Time Series Feature Extraction on basis of Scalable Hypothesis tests (tsfresh—A Python package). Neurocomputing 2018, 307, 72–77.
  53. Pesaresi, M.; Politis, P. GHS-BUILT-S R2023A—GHS Built-Up Surface Grid, Derived from Sentinel2 Composite and Landsat, Multitemporal (1975–2030); European Commission, Joint Research Centre (JRC): Ispra, Italy, 2023.
  54. Melchiorri, M. The global human settlement layer sets a new standard for global urban data reporting with the urban centre database. Front. Environ. Sci. 2022, 10, 1003862.
  55. Karra, K.; Kontgis, C.; Statman-Weil, Z.; Mazzariello, J.C.; Mathis, M.; Brumby, S.P. Global land use / land cover with Sentinel 2 and deep learning. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 4704–4707.
  56. National Institute of Statistics Rwanda. The Rwanda GeoPortal—Rwanda National Parks; National Institute of Statistics Rwanda: Kigali, Rwanda, 2022.
  57. National Institute of Statistics Rwanda. Humanitarian Data Exchange—Rwanda Water Bodies (Lakes); National Institute of Statistics Rwanda: Kigali, Rwanda, 2018.
  58. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  59. Mpakairi, K.S.; Dube, T.; Sibanda, M.; Mutanga, O. Fine-scale characterization of irrigated and rainfed croplands at national scale using multi-source data, random forest, and deep learning algorithms. ISPRS J. Photogramm. Remote Sens. 2023, 204, 117–130.
  60. Nguyen, H.T.T.; Doan, T.M.; Radeloff, V. Applying Random Forest Classification to Map Land Use/Land Cover Using Landsat 8 OLI. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, XLII-3/W4, 363–367.
  61. Jin, Y.; Liu, X.; Chen, Y.; Liang, X. Land-cover mapping using Random Forest classification and incorporating NDVI time-series and texture: A case study of central Shandong. Int. J. Remote Sens. 2018, 39, 8703–8723.
  62. Wang, X.; Zhong, L.; Ma, Y. Estimation of 30 m land surface temperatures over the entire Tibetan Plateau based on Landsat-7 ETM+ data and machine learning methods. Int. J. Digit. Earth 2022, 15, 1038–1055.
  63. Xu, S.; Cheng, J.; Zhang, Q. A Random Forest-Based Data Fusion Method for Obtaining All-Weather Land Surface Temperature with High Spatial Resolution. Remote Sens. 2021, 13, 2211.
  64. Mpakairi, K.S.; Muvengwi, J. Night-time lights and their influence on summer night land surface temperature in two urban cities of Zimbabwe: A geospatial perspective. Urban Clim. 2019, 29, 100468.
  65. Karlson, M.; Ostwald, M.; Reese, H.; Sanou, J.; Tankoano, B.; Mattsson, E. Mapping Tree Canopy Cover and Aboveground Biomass in Sudano-Sahelian Woodlands Using Landsat 8 and Random Forest. Remote Sens. 2015, 7, 10017–10041.
  66. Sheykhmousa, M.; Mahdianpari, M.; Ghanbari, H.; Mohammadimanesh, F.; Ghamisi, P.; Homayouni, S. Support Vector Machine Versus Random Forest for Remote Sensing Image Classification: A Meta-Analysis and Systematic Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 6308–6325.
  67. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
  68. Correa, S.; Shah, Z.; Taneja, J. This Little Light of Mine: Electricity Access Mapping Using Night-time Light Data. In Proceedings of the Twelfth ACM International Conference on Future Energy Systems, Virtual Event, Italy, 28 June–2 July 2021; pp. 254–258.
  69. Fishbein, R.E. Survey of Productive Uses of Electricity in Rural Areas; Technical Report; World Bank Group: Washington, DC, USA, 2003.
  70. Fobi, S.; Deshpande, V.; Ondiek, S.; Modi, V.; Taneja, J. A longitudinal study of electricity consumption growth in Kenya. Energy Policy 2018, 123, 569–578.
  71. Nembrini, S.; König, I.R.; Wright, M.N. The revival of the Gini importance? Bioinformatics 2018, 34, 3711–3718.
  72. Parr, T.; Turgutlu, K.; Csiszar, C.; Howard, J. Beware Default Random Forest Importances. 2018. Available online: https://explained.ai/rf-importance/ (accessed on 10 August 2024).
  73. Bimenyimana, S.; Asemota, G.N.O.; Li, L. The State of the Power Sector in Rwanda: A Progressive Sector with Ambitious Targets. Front. Energy Res. 2018, 6, 68.
  74. ECREEE. ECOWREX GeoNetwork Catalog; ECREEE: Praia, Cape Verde, 2019.
  75. Hirmer, S.; Tomei, J.; Yang, P.; Leonard, A.; Trotter, P.; Millot, A.; Egli, F.; van Dam, K.; Beltramo, A.; Stringer, M. Inconsistent measurement calls into question progress on electrification in sub-Saharan Africa. Nat. Energy 2024, 9, 1046–1050.
  76. Doll, C.N.; Pachauri, S. Estimating rural populations without access to electricity in developing countries through night-time light satellite imagery. Energy Policy 2010, 38, 5661–5670.
  77. IEA. SDG7: Data and Projections; Technical Report; International Energy Agency: Paris, France, 2020.
  78. Chaaban, F.; El Khattabi, J.; Darwishe, H. Accuracy Assessment of ESA WorldCover 2020 and ESRI 2020 Land Cover Maps for a Region in Syria. J. Geovis. Spat. Anal. 2022, 6, 31.
  79. Huan, V.D. Accuracy assessment of land use land cover LULC 2020 (ESRI) data in Con Dao island, Ba Ria – Vung Tau province, Vietnam. IOP Conf. Ser. Earth Environ. Sci. 2022, 1028, 012010.
  80. Chemchaoui, A.; Brhadda, N.; Alaoui, H.I.; Souad, E.G.; Bouchra, E.A.; Rabea, Z. Accuracy assessment and uncertainty of the 2020 10-meter resolution land use land cover maps at local scale. Case: Talassemtane national park, Morocco. Preprint, 2023; in review.
  81. Venter, Z.S.; Barton, D.N.; Chakraborty, T.; Simensen, T.; Singh, G. Global 10 m Land Use Land Cover Datasets: A Comparison of Dynamic World, World Cover and Esri Land Cover. Remote Sens. 2022, 14, 4101.
  82. Malof, J.M.; Bradbury, K.; Collins, L.M.; Newell, R.G. Automatic detection of solar photovoltaic arrays in high resolution aerial imagery. Appl. Energy 2016, 183, 229–240.
  83. Xu, G.; Xiu, T.; Li, X.; Liang, X.; Jiao, L. Lockdown induced night-time light dynamics during the COVID-19 epidemic in global megacities. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102421.
  84. Zissis, G.; Bertoldi, P.; Serrenho, T. Update on the Status of LED-Lighting World Market Since 2018. EUR 30500; Publications Office of the European Union: Luxembourg, 2021; ISBN 978-92-76-27244-1.
  85. ESRI. ESRI Land Cover; ESRI: Redlands, CA, USA, 2023.
  86. Baldi, P.; Brunak, S.; Chauvin, Y.; Andersen, C.A.F.; Nielsen, H. Assessing the accuracy of prediction algorithms for classification: An overview. Bioinformatics 2000, 16, 412–424.
  87. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6.
Figure 1. Overview of the steps used to assess the errors in nighttime lights-based electrification mapping models.
Figure 2. Map showing the location of transformers in Rwanda (left) and Kenya (right).
Figure 3. An illustration of the selection of electrified and unelectrified NTL pixels.
Figure 4. Map showing the location of electrified and unelectrified pixels in Rwanda. We observe that most of the locations selected as unelectrified are concentrated in lakes and national parks.
Figure 5. Boxplots showing the distributions of the performance of the electrification models developed from annual, quarterly, monthly, and daily nightlight composites with 10-fold cross-validation. Notice that monthly composites (highlighted in green) outperform all other composites across the four metrics.
Figure 6. Boxplots comparing the 10-fold cross-validated performance of models trained on meter locations with those trained on transformer locations. Note that using meter locations to identify electrified pixels led to improved models irrespective of the composites used.
Figure 7. Time series boxplots showing the percentage of misclassified pixels as a function of the year the pixels were electrified. The secondary y-axis shows the total number of pixels electrified in each year while each boxplot shows the distribution of the 10-fold cross-validation result.
Figure 8. Distribution of the counts of meters per pixel in the test and error data. The x-axis is in log scale because the number of meters per pixel spans a wide range. Notice that the percentage of meters in the first three bins is significantly less in the test data than in the error data.
Figure 9. Bar chart showing error rates as a function of the number of meters in a pixel. The secondary y-axis shows the corresponding number of electrified pixels for each meter count bin, while the error bars show the 95% confidence interval (assuming a t-distribution) of the mean of the cross-validated error rates. Note that bins with fewer pixels may show higher variance in error rates due to smaller sample sizes, which reduces the reliability of the reported averages.
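The error bars in Figure 9 (and likewise Figure 10) are 95% confidence intervals of the mean cross-validated error rate under a t-distribution. With 10 folds there are 9 degrees of freedom, and the two-sided 95% critical value of Student's t with 9 degrees of freedom is about 2.262. A minimal sketch of that interval for one bin, assuming the input is the bin's error rate in each of the 10 folds (misclassified electrified pixels divided by electrified pixels in the bin); the function name and the example rates are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple

T_975_DF9 = 2.262  # two-sided 95% critical value of Student's t, 9 degrees of freedom


def mean_ci_10fold(fold_error_rates: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) of a t-based 95% CI over 10 fold-level rates."""
    if len(fold_error_rates) != 10:
        raise ValueError("the critical value is hard-coded for 10 folds")
    m = mean(fold_error_rates)
    # Standard error of the mean uses the sample standard deviation (n - 1).
    half_width = T_975_DF9 * stdev(fold_error_rates) / math.sqrt(len(fold_error_rates))
    return m, m - half_width, m + half_width


# Hypothetical per-fold error rates for one meter-count bin.
rates = [0.1, 0.3] * 5
print(mean_ci_10fold(rates))
```

Because the standard error shrinks with the square root of the number of folds but the per-fold rates themselves become noisy when a bin holds few pixels, sparsely populated bins tend to produce the wide intervals that the caption warns about.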
Figure 10. Bar chart showing the error rates (primary y-axis) as a function of the total average annual electricity consumption of meters in a pixel. The secondary y-axis shows the corresponding number of electrified pixels for each electricity consumption bin, while the error bars show the 95% confidence interval (assuming a t-distribution) of the mean of the cross-validated error rates.
Figure 11. Ranking of the importance of the characteristics of electrified pixels for correct classification with the mean decrease in impurity technique. The sum of the average annual electricity purchased by meters in a pixel was found to be the most important of the three characteristics examined that determine the correct classification of a pixel.
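Mean decrease in impurity (MDI) credits each feature with the impurity reduction achieved by every split that uses it, weighted by the share of training samples reaching that split, then sums over all trees of the forest and normalizes. The building block is the weighted Gini decrease of a single split, sketched below in plain Python with hypothetical labels; it is not the study's model. In practice, scikit-learn's random forest classifier exposes impurity-based importances as `feature_importances_`.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity_decrease(parent: Sequence[int], left: Sequence[int],
                            right: Sequence[int], n_total: int) -> float:
    """Weighted Gini decrease of one split, the quantity MDI accumulates.

    The decrease is weighted by the fraction of all training samples
    (n_total) that reach the parent node.
    """
    n = len(parent)
    weighted_children = (len(left) * gini(left) + len(right) * gini(right)) / n
    return (n / n_total) * (gini(parent) - weighted_children)


# Hypothetical node holding half of 8 training samples, split perfectly.
print(split_impurity_decrease([0, 0, 1, 1], [0, 0], [1, 1], n_total=8))  # 0.25
```

Because MDI favors features with many possible split points and is computed on training data, the references on Gini importance cited in this article discuss its biases; permutation importance is a common cross-check.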
Figure 12. Transferability of models trained on data from one country to the other. The scatter plot shows the results of the 10-fold cross-validation. The x-axis indicates how much data from the target country was added to the source-country data to form the training set. The text boxes present balanced accuracies when the models are trained and tested on data from the same location, for models trained with 2020 NTL data only and with 2013 to 2020 NTL data. Notice that using NTL data from 2013 to 2020 increased the balanced accuracy when generalizing to unseen locations by at least 5% over what was achieved with only 2020 NTL data. Also, the balanced accuracy was about 16% higher when the models were tested on Kenya's data than on Rwanda's data.
Figure 13. Map showing the location of the 392 minigrids contained in the ECREEE dataset.
Figure 14. Chart showing the performance metrics when using a model trained on transformer locations in Rwanda and Kenya to detect minigrids in West Africa.
Figure 15. Boxplot showing the distribution of radiances in 2020 NTL pixels containing transformer locations and minigrids. The illumination levels at minigrid locations are much lower than those observed at transformer locations. Note: the right-skewed radiance distribution pulls the mean outside the interquartile range, and outliers have been removed from the boxplot for improved visualization.
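The note in Figure 15 about the mean falling outside the interquartile range follows from right skew: a few very bright pixels raise the mean without moving the quartiles much. The hypothetical radiance values below (not data from the study) illustrate this with Python's `statistics` module.

```python
import statistics

# Hypothetical right-skewed radiances: mostly dim pixels plus two bright ones.
radiances = [0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5, 0.6, 3.0, 25.0]

# Quartile cut points (statistics.quantiles uses the "exclusive" method by default).
q1, median, q3 = statistics.quantiles(radiances, n=4)
mean = statistics.fmean(radiances)

# The long right tail drags the mean above the third quartile,
# so it lies outside the Q1-to-Q3 box of a boxplot.
mean_outside_iqr = not (q1 <= mean <= q3)
print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, mean={mean:.2f}")
```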
Figure 16. Multitemporal electrification map showing the expansion of electricity access in Rwanda. Note that the classifications for each year were processed such that once an area is identified as electrified, it is considered electrified in all subsequent years.
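The persistence rule described for Figure 16 (once a pixel is classified as electrified, it stays electrified in all later years) amounts to a running logical OR over the yearly classifications. The sketch below assumes each year's map is a flat list of 0/1 pixel labels; the function name `enforce_persistence` and the four-pixel example maps are illustrative assumptions, not the authors' code or data.

```python
from itertools import accumulate


def enforce_persistence(yearly_maps):
    """Make per-pixel electrification monotone over time.

    yearly_maps: dict mapping year -> list of 0/1 labels, one per pixel.
    Returns a dict with the same years in which a pixel, once labeled 1,
    remains 1 in every later year.
    """
    years = sorted(yearly_maps)
    # Running element-wise OR: each year inherits every earlier detection.
    running = accumulate(
        (yearly_maps[y] for y in years),
        lambda prev, cur: [p | c for p, c in zip(prev, cur)],
    )
    return dict(zip(years, running))


# Hypothetical raw yearly classifications for four pixels.
raw_maps = {
    2014: [0, 1, 0, 0],
    2015: [1, 0, 0, 0],  # pixel 2 drops out, e.g. a dim year
    2016: [0, 0, 1, 0],
}
persistent = enforce_persistence(raw_maps)
```

Applying the rule this way suppresses year-to-year flicker in dim pixels at the cost of never recording a loss of access.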
Table 1. Detection of road areas by LULC products.
| Road Classification | ESRI LULC | GHSL |
| --- | --- | --- |
| Settlement | 65% | 34% |
| Non-Settlement | 35% | 66% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Raji, T.; Taneja, J.; Williams, N. Assessment of Systematic Errors in Mapping Electricity Access Using Night-Time Lights: A Case Study of Rwanda and Kenya. Remote Sens. 2024, 16, 3561. https://doi.org/10.3390/rs16193561

AMA Style

Raji T, Taneja J, Williams N. Assessment of Systematic Errors in Mapping Electricity Access Using Night-Time Lights: A Case Study of Rwanda and Kenya. Remote Sensing. 2024; 16(19):3561. https://doi.org/10.3390/rs16193561

Chicago/Turabian Style

Raji, Tunmise, Jay Taneja, and Nathaniel Williams. 2024. "Assessment of Systematic Errors in Mapping Electricity Access Using Night-Time Lights: A Case Study of Rwanda and Kenya" Remote Sensing 16, no. 19: 3561. https://doi.org/10.3390/rs16193561

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.