Article
Peer-Review Record

FEPVNet: A Network with Adaptive Strategies for Cross-Scale Mapping of Photovoltaic Panels from Multi-Source Images

Remote Sens. 2023, 15(9), 2469; https://doi.org/10.3390/rs15092469
by Buyu Su 1,2,3, Xiaoping Du 1,2,*, Haowei Mu 4, Chen Xu 1,2, Xuecao Li 5, Fang Chen 1,2 and Xiaonan Luo 3
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 27 March 2023 / Revised: 27 April 2023 / Accepted: 5 May 2023 / Published: 8 May 2023

Round 1

Reviewer 1 Report

This manuscript proposed a network named FEPVNet, which embeds high-pass and low-pass filters and Polarized Self-Attention into a High-Resolution Network (HRNet) to improve its capabilities in noise resistance and adaptive feature extraction, ultimately enhancing photovoltaic extraction accuracy.

 

The paper is well written. The relevant background of current photovoltaic extraction methods is clearly introduced, and the innovation of the proposed method is described in detail. By incorporating high-pass and low-pass filters, Polarized Self-Attention, and a data migration strategy into HRNet, FEPVNet significantly improves the accuracy and adaptive capability of photovoltaic extraction. I recommend acceptance after minor revision.

 

Some minor issues:

1. Abbreviations in the abstract cannot be used directly in the main text. For example, the PV in the second paragraph of the first page. The rules on abbreviations in academic papers need to be observed separately in the abstract and in the main text.

2. There are some minor problems with the writing, especially the use of prepositions. Please check the whole text carefully.

Author Response

Reviewer #1:

This manuscript proposed a network named FEPVNet, which embeds high-pass and low-pass filters and Polarized Self-Attention into a High-Resolution Network (HRNet) to improve its capabilities in noise resistance and adaptive feature extraction, ultimately enhancing photovoltaic extraction accuracy.

The paper is well written. The relevant background of current photovoltaic extraction methods is clearly introduced, and the innovation of the proposed method is described in detail. By incorporating high-pass and low-pass filters, Polarized Self-Attention, and a data migration strategy into HRNet, FEPVNet significantly improves the accuracy and adaptive capability of photovoltaic extraction. I recommend acceptance after minor revision.

Response: We thank the reviewer for taking the time to review our manuscript, appreciate the positive comments, and have improved the manuscript accordingly.

# Comment 1-1:

Abbreviations in the abstract cannot be used directly in the main text. For example, the PV in the second paragraph of the first page. The rules on abbreviations in academic papers need to be observed separately in the abstract and in the main text.

Response: Thank you very much for your suggestion! We have corrected all abbreviations in the main text as follows:

“According to the International Energy Agency's (IEA) sustainability program, the number of photovoltaic (PV) plants will increase rapidly.” (page 1, line 41-42)

“Therefore, we selected HRNet as the base model and embedded Canny, Median filter, and Polarized Self-Attention (PSA) to design an adaptive FEPVNet.” (page 2, line 85-87)

“In addition, the Polarized Self-Attention-Residual (PAR), Single Depthwise Separable (SDS) Residual, and Double Depthwise Separable (DDS) Residual blocks were constructed to replace the standard residual blocks at different stages of the HRNet main network.” (page 4-5, line 152-155)

“1. An SDS residual block, as shown in Figure 7(c), where two normal convolutions are replaced by a depthwise convolution and a pointwise convolution; (page 8, line 275-276)

2. A DDS residual block, as shown in Figure 7(d), where two depthwise separable convolutions are used to replace two normal convolutions.” (page 8, line 277-278)
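As background for the SDS and DDS blocks above, the parameter saving from replacing a standard convolution with a depthwise separable one (depthwise plus pointwise) can be sketched in a few lines. The channel and kernel sizes below are illustrative assumptions, not values from the manuscript.

```python
# Hypothetical illustration of the parameter saving behind depthwise
# separable convolutions; channel/kernel sizes are assumed for illustration.

def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv followed by a pointwise 1x1 conv."""
    return k * k * c_in + c_in * c_out

c_in, c_out, k = 64, 64, 3
standard = conv_params(c_in, c_out, k)        # 36864 weights
separable = separable_params(c_in, c_out, k)  # 4672 weights
print(standard, separable, round(standard / separable, 1))  # 36864 4672 7.9
```

This roughly 8x reduction is consistent with the smaller Params/Flops the letter later reports for the depthwise-separable variants.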

 

# Comment 1-2:

There are some minor problems with the writing, especially the use of prepositions. Please check the whole text carefully.

Response: Thank you very much for your suggestion! We carefully reviewed the entire manuscript to check and correct both the prepositions and other grammar problems.

We provide some examples of sentences we have modified as follows:

“The global demand for energy is facing significant challenges and uncertainties, manifested by the decrease in fossil energy reserves and rising prices [1].” (page 1, line 35-36)

“(c) comparing the PV extraction ability of different models on the Sentinel-2 dataset.” (page 3, line 130-131)

Author Response File: Author Response.docx

Reviewer 2 Report

1. It is necessary to clarify what "zoom 14 and 16" means in Google images, it is understood to be a scaling treatment on the spatial resolution of the image. Why such images are used and how they are used.

2. It is not indicated which bands are used in Sentinel-2 images, and whether they are combined.

3. It would be appropriate to indicate on which server it is possible to download the satellite images used Geofen-2, and in general for all images.

4. Some cited references are not related to the study.

Author Response

Reviewer #2:

# Comment 2-1:

It is necessary to clarify what "zoom 14 and 16" means in Google images, it is understood to be a scaling treatment on the spatial resolution of the image. Why such images are used and how they are used.

Response: Thank you very much for your suggestion and question! The description of the Google images used in the manuscript is based on the official documentation of the Google Maps Platform (https://developers.google.com/maps/documentation/maps-static/start?hl=zh-cn), which describes the different resolution levels of the images. For example, zoom 14 corresponds to Google images with a resolution of 10 meters, while zoom 16 corresponds to Google images with a resolution of 2 meters. Using these data, we developed a data migration strategy: first, we trained the FEPVNet model on Sentinel-2 images, and then we trained the model on Google images to obtain photovoltaic features at different resolutions.

We have modified the description of these images in Section 2 (Dataset).

“To construct the cross-scale network model, four types of images are required: Sentinel-2 image at 10m resolution, which is available for download via Google Earth Engine (GEE), Google-14 (i.e., zoom level is 14) image at 10m resolution, Google-16 (i.e., zoom level is 16) image at 2m resolution, all of which can be downloaded through the Google Images API, and Gaofen-2 image at 2m resolution which can be downloaded from the Data Sharing Website of Aerospace Information Research Institute, Chinese Academy of Sciences.” (page 3, line 98-103)

# Comment 2-2:

It is not indicated which bands are used in Sentinel-2 images, and whether they are combined.

Response: We apologize for not giving sufficient information about the images we used in this manuscript. In this study, we used RGB composite of Sentinel-2 images with bands (4, 3, 2).

We modified the band information as follows:

“The sample images of Sentinel-2 we used consist of three bands: red (B4), green (B3), and blue (B2), while the sample label images are grayscale images. These data were cut into 1024 × 1024 pixels, forming four datasets with properties shown in Table 1.” (page 3, line 111-114)

# Comment 2-3:

It would be appropriate to indicate on which server it is possible to download the satellite images used Geofen-2, and in general for all images.

Response: The Gaofen-2 image used in this research was obtained from the data-sharing website of the Aerospace Information Research Institute of the Chinese Academy of Sciences (http://ids.ceode.ac.cn/gfds/query), which requires registration and approval. Sentinel-2 data was downloaded from Google Earth Engine, and Google data was downloaded through Google Images API.

# Comment 2-4:

Some cited are not related to the study.

Response: We are sorry for our carelessness, and we have carefully proofread the full reference list. The references were updated, and incorrect entries were corrected.

Author Response File: Author Response.docx

Reviewer 3 Report

The authors proposed a composite strategy for PV panel segmentation that addresses multi-scale issues and focuses on detail features. The results of the study demonstrate that they selected the best-performing HRNet-based framework.

However, I have the following main concerns:

- Introduction: Further elaboration is needed on the rationale behind the authors' proposal of a strategy based on HRNet. Are there any research studies or experimental results that demonstrate the superiority of HRNet-based methods over other deep learning-based approaches?

- Dataset: Did the authors take into account very high-resolution (VHR) satellite images at a half-meter level, such as those from the WorldView series? If so, why were they not included in the study?

- Data for DL: Could the authors please clarify why there are no validation and test sets available for Google-14/16, and why there are no training sets for Gaofen-2 in the DL data used in the study?

- Methodology: Could the authors provide more information on the process used to select the best-performing FEPVNet in section 3.1 of the methodology?

- Results: In the introduction section, the authors assert that DL-based methods have achieved success in object detection and segmentation, which I concur with. However, why wasn't the proposed method compared with any SOTAs?

Author Response

Reviewer #3:

The authors proposed a composite strategy for PV panel segmentation that addresses multi-scale issues and focuses on detail features. The results of the study demonstrate that they selected the best-performing HRNet-based framework.

Response: Thank you for taking the time to read and review our manuscript and thank you for your positive comments!

# Comment 3-1:

Introduction: Further elaboration is needed on the rationale behind the authors' proposal of a strategy based on HRNet. Are there any research studies or experimental results that demonstrate the superiority of HRNet-based methods over other deep learning-based approaches?

Response: Thank you for your suggestion. We have further improved the introduction section of the manuscript to explain why we chose the HRNet-based strategy in detail.

To illustrate the superiority of HRNet, we added more references to the introduction section:

“This study examined the current mainstream CNN models. Many researchers have compared U-Net, DeepLabv3+, PSPNet, and HRNet models on the PASCAL VOC 2012 dataset, and HRNet achieved the best performance [38,39]. Therefore, we selected HRNet as the base model and embedded Canny, Median filter, and Polarized Self-Attention (PSA) to design an adaptive FEPVNet.” (page 2, line 82-87)

# Comment 3-2:

Dataset: Did the authors take into account very high-resolution (VHR) satellite images at a half-meter level, such as those from the WorldView series? If so, why were they not included in the study?

Response: Thank you for your questions! In our study, we considered remote sensing images of different resolutions but did not include Very High-Resolution (VHR) satellite images such as the WorldView series, because we did not have access to WorldView images, especially at a very large scale.

We introduced the data and its usage in Section 2 (Dataset), as follows:

“To construct the cross-scale network model, four types of images are required: Sentinel-2 image at 10m resolution, which is available for download via Google Earth Engine (GEE), Google-14 (i.e., zoom level is 14) image at 10m resolution, Google-16 (i.e., zoom level is 16) image at 2m resolution, all of which can be downloaded through the Google Images API, and Gaofen-2 image at 2m resolution which can be downloaded from the Data Sharing Website of Aerospace Information Research Institute, Chinese Academy of Sciences. Therefore, we first validated the FEPVNet performance using the Sentinel-2 images, then constructed three data migration strategies using the Sentinel-2 and Google images, and finally completed the PV extraction from the Gaofen-2 images.” (page 3, line 98-106)

# Comment 3-3:

Data for DL: Could the authors please clarify why there are no validation and test sets available for Google-14/16, and why there are no training sets for Gaofen-2 in the DL data used in the study?

Response: Thank you for your question. We apologize for the unclear description of the images used in this manuscript. Directly extracting photovoltaic panels from Gaofen-2 images with a model trained on Sentinel-2 images results in poor performance. To address this issue, we proposed a data migration strategy that uses Google images to transfer the Sentinel-2-based model to Gaofen-2 images. We used Google images as training data, while Gaofen-2 images served as the validation and test datasets. Therefore, no validation or test datasets were required for the Google images, and no training dataset was required for the Gaofen-2 images.

We have provided detailed description in Section 2 (Dataset), as follows:

“The dataset was divided into three parts: a training set, a validation set, and a test set. The results were poor when training the model on Sentinel-2 images and directly extracting PV from Gaofen-2 images. Therefore, we considered combining multiple PV features to complete the transfer of the Sentinel-2 model. We aimed to utilize Sentinel-2 and Google images of different resolutions to perform cross-scale PV extraction on Gaofen-2 imagery without using Gaofen-2 imagery to train the model. As a result, only the training set of Google images was needed. For model evaluation, we used Gaofen-2 imagery as the validation set and test set. Thus, the Google images did not require validation and test sets, and the Gaofen-2 imagery did not need a training set.” (page 3, line 114-123)

And we explained the construction of our model using data migration strategies in section 3.2, as follows:

“To enhance the generalizability of the Sentinel-2 image model to multi-source images and facilitate its migration to high-resolution image models, we compared the four methods illustrated in Figure 1(d), which include three image migration strategies. These methods consist of training the model with the Sentinel-2 dataset alone, mixing Sentinel-2 images with Google-14 images in a 1:1 ratio, mixing Sentinel-2 images with Google-16 images in a 1:1 ratio, and mixing Sentinel-2 images with Google-14 and Google-16 images in a 1:1:2 ratio to form the training dataset. We then used these methods to extract PV from the Gaofen-2 image.” (page 9, line 295-303)
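The 1:1:2 mixing described in the quoted passage can be sketched as follows. This is a minimal illustration assuming each dataset is a list of (image, label) path pairs; the file names, sample counts, and the helper `mix_datasets` are hypothetical, not the authors' code.

```python
# Sketch of a 1:1:2 dataset-mixing strategy (Sentinel-2 : Google-14 : Google-16).
# All dataset contents and names below are illustrative assumptions.
import random

def mix_datasets(sentinel2, google14, google16, seed=0):
    """Combine samples in a 1:1:2 ratio and shuffle so batches mix sources."""
    n = min(len(sentinel2), len(google14), len(google16) // 2)
    mixed = sentinel2[:n] + google14[:n] + google16[:2 * n]
    random.Random(seed).shuffle(mixed)
    return mixed

s2 = [(f"s2_{i}.png", f"s2_lbl_{i}.png") for i in range(100)]
g14 = [(f"g14_{i}.png", f"g14_lbl_{i}.png") for i in range(100)]
g16 = [(f"g16_{i}.png", f"g16_lbl_{i}.png") for i in range(200)]
train = mix_datasets(s2, g14, g16)
print(len(train))  # 400 samples: 100 + 100 + 200
```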

# Comment 3-4:

Methodology: Could the authors provide more information on the process used to select the best-performing FEPVNet in section 3.1 of the methodology?

Response: Thank you for your suggestion. In selecting the best-performing FEPVNet, we used multiple metrics, including accuracy, recall, and F1-score. We conducted extensive experiments on different improved models to determine the optimal position of each improvement component and combined them to construct FEPVNet.

We provided a detailed description in section 3.1 on how we determined the structure of FEPVNet to achieve better accuracy.

“Several modifications were made to improve the HRNet model, including adding high-low pass filtering, polarized parallel attention, and depthwise separable convolution. Four different stem networks were constructed: LG_stem, which combines Laplacian and Gaussian filters, SG_stem, which combines Sobel and Gaussian filters, CG_stem, which combines Canny and Gaussian filters, and CM_stem, which combines Canny and Median filters. In addition, the Polarized Self-Attention-Residual (PAR), single depthwise separable (SDS) residual and Double Depthwise Separable (DDS) Residual blocks were constructed to replace the standard residual blocks at different stages of the HRNet main network. The performance of these modules was evaluated on Sentinel-2 images in terms of efficiency, Precision, Recall, F1-score, and Intersection over Union (IoU) to determine the best configuration for our model.” (page 4-5, line 147-157)
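For reference, the metrics named in this passage (Precision, Recall, F1-score, IoU) can be computed for binary PV masks as in the generic sketch below. It uses flat 0/1 lists as a stand-in for label rasters and is not the authors' evaluation code.

```python
# Generic segmentation metrics for binary masks, illustrated on flat 0/1 lists.

def segmentation_metrics(pred, truth):
    """Precision, Recall, F1, and IoU from pixel-wise TP/FP/FN counts."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0  # intersection / union
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}

m = segmentation_metrics([1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(m)  # precision = recall = f1 = 2/3, iou = 0.5
```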

# Comment 3-5:

Results: In the introduction section, the authors assert that DL-based methods have achieved success in object detection and segmentation, which I concur with. However, why wasn't the proposed method compared with any SOTAs?

Response: We did consider the SOTA methods that are widely used in the field of object detection and segmentation, and compared them with the U-Net and HRNet models in our research. However, our focus was on proposing a new composite strategy to better address the issue of multi-scale and detail features. Therefore, we placed emphasis on validating the performance of our proposed method. We will consider comprehensive comparison with SOTAs in the future.

We conducted experiments on each improvement component in the 4.1 ablation experiment section to determine the optimal structure of FEPVNet. In section 4.2, we compared the performance of U-Net, HRNet, FEPVNet, and FESPVNet. Both of our proposed models, FEPVNet and its lightweight version FESPVNet, outperformed U-Net and HRNet. In order to verify the performance of FEPVNet in cross-regional PV extraction, we completed cross-validation experiments between HRNet and FEPVNet in section 4.3, further demonstrating the performance of FEPVNet.

Furthermore, because directly applying the Sentinel-2-based models to extract PV from Gaofen-2 images at different scales performed poorly, we evaluated the PV extraction results of HRNet and FEPVNet on Gaofen-2 images using our proposed data migration strategy. This experiment demonstrates that the combination of FEPVNet and the data migration strategy enables cross-scale extraction of PV panels from multi-source images.

Author Response File: Author Response.docx

Round 2

Reviewer 3 Report

Thanks for the authors’ responses to my concerns. In my opinion, extensive effort is required for a qualified publication. I understand that the authors try to propose a strategy combination for solving multi-scale and detailed feature problems.

 

- Response 3-1: Gaps in proposing the composite method

The authors insist that they focus on settling the "issue of multi-scale and detail features" (Response 3-5), and claim that there is no SOTA on this issue. Then how do they know that the baseline HRNet could potentially solve the "new" situation? To my understanding, the extended contents "HRNet achieved the best performance" (line 89) are limited to filling the gap. Besides, the logical deduction behind the proposal of the adaptive FEPVNet that embeds Canny, the median filter, and PSA is unconvincing.

 

- Response 3-2. VHR data

It is acceptable to not include half-meter level satellite imagery in this research. I still recommend them to explore further in their future work. To my knowledge, several satellite companies (i.e., MAXAR) encourage connections for academic research purposes.

 

- Response 3-3: clear enough

 

- Response 3-4: clear enough

 

- Response 3-5: Comparison with SOTAs is lacking

To my understanding, the authors try to propose a scheme to map PV panels. Then without comparing it with SOTAs, it is not acceptable to prove that this method is outstanding.

Author Response

Reviewer #3:

Thanks for the authors’ responses to my concerns. In my opinion, extensive effort is required for a qualified publication. I understand that the authors try to propose a strategy combination for solving multi-scale and detailed feature problems.

Response: Thank you for your positive comments and valuable suggestions on our work. We aim to propose an effective solution to the cross-scale mapping problem of PV panels. We have demonstrated through extensive experiments that the proposed method improves the accuracy of PV panel extraction by introducing three modules (i.e., Canny, the Median filter, and PSA) into HRNet, and achieves better accuracy across different regions (i.e., China, the US) and scales (i.e., 10 m, 2 m). We have carefully considered your suggestions and improved the manuscript accordingly.

# Response 3-1-1: Gaps in proposing the composite method

The authors insist that they focus on settling the "issue of multi-scale and detail features" (Response 3-5), and claim that there is no SOTA on this issue. Then how do they know that the baseline HRNet could potentially solve the "new" situation? To my understanding, the extended contents "HRNet achieved the best performance" (line 89) are limited to filling the gap.

Response: Thank you for pointing this out. We are very sorry that we did not explain Response 3-5 well in the previous letter. In fact, HRNet is a SOTA convolutional neural network for semantic segmentation according to the literature (Wang et al. 2021; Sun et al. 2019), and we had already conducted a comparative analysis between HRNet and our model, FEPVNet, in the manuscript.

The experiments showed that Adaptive FEPVNet outperformed benchmark methods such as HRNet due to its superior ability to extract boundaries between adjacent PV panels, as demonstrated in Figure 10. As a result, FEPVNet achieved higher evaluation metrics (shown in Table 4), surpassing that of the SOTA model HRNet in both study regions.

Table 4. Evaluation metrics for different main body networks.

| Region | Model | Recall | Precision | F1-score | IoU | Params | Flops |
|--------|-------|--------|-----------|----------|-----|--------|-------|
| China | U-Net | 0.4174 | 0.5316 | 0.4676 | 0.3052 | 31054344 | 64914029 |
| China | HRNet | 0.9052 | 0.9489 | 0.9265 | 0.8631 | 65847122 | 374.51G |
| China | FEPVNet | 0.9309 | 0.9493 | 0.9400 | 0.8868 | 65939858 | 376.34G |
| China | SwinTransformer | 0.9309 | 0.9460 | 0.9384 | 0.8840 | 59,830,000 | 936.71G |
| China | FESPVNet | 0.9246 | 0.9503 | 0.9373 | 0.8820 | 26066258 | 253.77G |
| US | U-Net | 0.8717 | 0.6224 | 0.7262 | 0.5702 | 31054344 | 64914029 |
| US | HRNet | 0.9521 | 0.9595 | 0.9558 | 0.9153 | 65847122 | 374.51G |
| US | FEPVNet | 0.9641 | 0.9695 | 0.9668 | 0.9358 | 65939858 | 376.34G |
| US | SwinTransformer | 0.9591 | 0.9726 | 0.9658 | 0.9339 | 59,830,000 | 936.71G |
| US | FESPVNet | 0.9567 | 0.9679 | 0.9623 | 0.9273 | 26066258 | 253.77G |

 

Figure 10. Prediction results for China and the US in different network models. Note: The prediction results of U-Net, HRNet, SwinTransformer, FESPVNet, and FEPVNet in the China and US regions are shown. (page 14-15, line 396-400)

 

To further demonstrate the performance of FEPVNet, we conducted cross-validation using FEPVNet and HRNet on Sentinel-2 datasets from two different regions. According to the evaluation metrics in Table 5 and the prediction results in Figure 11, FEPVNet outperformed HRNet in extracting PV panels from different regions.

“Table 5. Area comparison predictive evaluation metrics.

| Region | Model | Recall | Precision | F1-score | IoU |
|--------|-------|--------|-----------|----------|-----|
| China | HRNet_US | 0.3755 | 0.9372 | 0.5362 | 0.3663 |
| China | FEPVNet_US | 0.4645 | 0.9539 | 0.6248 | 0.4544 |
| US | HRNet_China | 0.8288 | 0.4869 | 0.6134 | 0.4424 |
| US | FEPVNet_China | 0.6872 | 0.6221 | 0.6530 | 0.4848 |

Figure 11. Cross-validation results. Note: The prediction results of HRNet and FEPVNet for the China and US regions with different weight parameters are shown, respectively.” (page 16, line 421-424)

The cross-scale extraction of PV panels was compared among four methods, using the three data migration strategies proposed in this study. The results in both Table 6 and Figure 12 show that FEPVNet with data migration strategies was more efficient for cross-scale PV panel extraction than the SOTA model.

“Table 6. Evaluation metrics for model migration prediction results.

| Model | Strategy | Recall | Precision | F1-score | IoU |
|-------|----------|--------|-----------|----------|-----|
| HRNet | Sentinel-2 | 0.2620 | 0.9216 | 0.4084 | 0.2563 |
| HRNet | Sentinel-2 + Google-14 | 0.3346 | 0.9036 | 0.4884 | 0.3231 |
| HRNet | Sentinel-2 + Google-16 | 0.8940 | 0.9162 | 0.9050 | 0.8265 |
| HRNet | Sentinel-2 + Google-14 + Google-16 | 0.8889 | 0.9269 | 0.9075 | 0.8308 |
| FEPVNet | Sentinel-2 | 0.2883 | 0.9083 | 0.4377 | 0.2801 |
| FEPVNet | Sentinel-2 + Google-14 | 0.6681 | 0.8724 | 0.7567 | 0.6086 |
| FEPVNet | Sentinel-2 + Google-16 | 0.8864 | 0.9437 | 0.9142 | 0.8419 |
| FEPVNet | Sentinel-2 + Google-14 + Google-16 | 0.9084 | 0.9192 | 0.9138 | 0.8413 |

Figure 12. Prediction results of two models with different data migration strategies. Note: The prediction results of the HRNet and FEPVNet models are shown for the four migration strategies on Gaofen-2 data, respectively.” (page 17-18, line 446-450)

# Response 3-1-2: Besides, the logical deduction on the proposal of the adaptive FEPVNet that embeds Canny, the median filter, and PSA is unconvincing.

Response: The Canny operator, Median filter, and PSA embedded in the adaptive FEPVNet greatly improve PV extraction performance because they capture more details of PV panels in the remote sensing images. Moreover, PSA makes the network focus more on photovoltaic panel features and reduces the influence of background features, so we used it to construct the PAR module and achieved ideal performance.
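As a generic illustration of the Median filter's noise-suppression role described above (not the authors' implementation), a 3x3 median filter replaces each interior pixel with the median of its neighborhood, removing isolated salt noise while preserving edges better than averaging:

```python
# Minimal 3x3 median filter on a 2D grid; border pixels are left unfiltered
# for brevity. A purely illustrative sketch, not the authors' code.
from statistics import median

def median_filter_3x3(img):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # copy; borders stay as-is
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out

# A single salt-noise spike (255) in a flat region is replaced by the
# neighborhood median (10).
img = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
print(median_filter_3x3(img)[1][1])  # 10
```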

The experimental results in Table 2, Table 3, and Figure 8 support this reasoning.

“Table 2. Evaluation metrics for different stem networks in China and the US.

| Region | Model | Recall | Precision | F1-score | IoU |
|--------|-------|--------|-----------|----------|-----|
| China | stem | 0.9052 | 0.9489 | 0.9265 | 0.8631 |
| China | LG_stem | 0.8965 | 0.9336 | 0.9147 | 0.8428 |
| China | SG_stem | 0.8830 | 0.9057 | 0.8942 | 0.8087 |
| China | CG_stem | 0.9065 | 0.9226 | 0.9145 | 0.8425 |
| China | CM_stem | 0.9315 | 0.9472 | 0.9393 | 0.8856 |
| US | stem | 0.9521 | 0.9595 | 0.9558 | 0.9153 |
| US | LG_stem | 0.9498 | 0.9564 | 0.9531 | 0.9105 |
| US | SG_stem | 0.9444 | 0.9443 | 0.9444 | 0.8946 |
| US | CG_stem | 0.9541 | 0.9700 | 0.9620 | 0.9268 |
| US | CM_stem | 0.9619 | 0.9691 | 0.9655 | 0.9333 |

Figure 8. Predicted results for different stem networks in China and the US. Note: For the adaptive improvement of the stem network, we completed five stem network comparison experiments: the original stem, LG_stem combining Laplacian and Gaussian, SG_stem combining Sobel and Gaussian, CG_stem combining Canny and Gaussian, and CM_stem combining Canny and Median.” (page 10-11, line 343-349)

“Table 3. Evaluation metrics for different main body networks.

| Region | Model | Recall | Precision | F1-score | IoU | Params | Flops |
|--------|-------|--------|-----------|----------|-----|--------|-------|
| China | PAR_stage2 | 0.9306 | 0.9423 | 0.9364 | 0.8805 | 65939858 | 376.34G |
| China | PAR_stage3 | 0.9241 | 0.9372 | 0.9306 | 0.8702 | 67400786 | 385.47G |
| China | PAR_stage4 | 0.9281 | 0.9481 | 0.9380 | 0.8833 | 70555922 | 385.45G |
| China | DDS_stage2 | 0.9202 | 0.9369 | 0.9285 | 0.8665 | 65120210 | 355.52G |
| China | DDS_stage3 | 0.9247 | 0.9149 | 0.9198 | 0.8515 | 53557586 | 260.13G |
| China | DDS_stage4 | 0.9217 | 0.9265 | 0.9241 | 0.8590 | 28401362 | 259.82G |
| China | SDS_stage2 | 0.9215 | 0.9366 | 0.9290 | 0.8675 | 65068946 | 354.14G |
| China | SDS_stage3 | 0.9147 | 0.9388 | 0.9266 | 0.8633 | 52735058 | 252.09G |
| China | SDS_stage4 | 0.9277 | 0.9380 | 0.9328 | 0.8742 | 25973522 | 251.93G |
| US | PAR_stage2 | 0.9422 | 0.9655 | 0.9537 | 0.9116 | 65939858 | 376.34G |
| US | PAR_stage3 | 0.9318 | 0.9605 | 0.9459 | 0.8975 | 67400786 | 385.47G |
| US | PAR_stage4 | 0.9467 | 0.9615 | 0.9540 | 0.9122 | 70555922 | 385.45G |
| US | DDS_stage2 | 0.9409 | 0.9650 | 0.9528 | 0.9099 | 65120210 | 355.52G |
| US | DDS_stage3 | 0.9361 | 0.9617 | 0.9487 | 0.9025 | 53557586 | 260.13G |
| US | DDS_stage4 | 0.9410 | 0.9582 | 0.9495 | 0.9039 | 28401362 | 259.82G |
| US | SDS_stage2 | 0.9439 | 0.9609 | 0.9523 | 0.9090 | 65068946 | 354.14G |
| US | SDS_stage3 | 0.9417 | 0.9690 | 0.9552 | 0.9142 | 52735058 | 252.09G |
| US | SDS_stage4 | 0.9412 | 0.9633 | 0.9521 | 0.9087 | 25973522 | 251.93G |

” (page 12-13, line 372-373)

# Response 3-2. VHR data

It is acceptable to not include half-meter level satellite imagery in this research. I still recommend them to explore further in their future work. To my knowledge, several satellite companies (i.e., MAXAR) encourage connections for academic research purposes.

Response: Thank you very much for your suggestions; we quite agree with you! It is very important to use high-resolution images in future research, and more and more data resources are becoming available for academic purposes.

# Response 3-5: Comparison with SOTAs is lacking

To my understanding, the authors try to propose a scheme to map PV panels. Then without comparing it with SOTAs, it is not acceptable to prove that this method is outstanding.

Response: Thank you for your comment! We are very sorry that we did not explain this well in the previous letter. In Response 3-1, we explained that HRNet is one of the SOTA models and that we compared it with our model. Furthermore, following your comment, we have added one more SOTA model, SwinTransformer, to the manuscript for the comparative analysis shown in Table 4 and Figure 10.


 

Author Response File: Author Response.docx
