Peer-Review Record

Intersectional Study of the Gender Gap in STEM through the Identification of Missing Datasets about Women: A Multisided Problem

Appl. Sci. 2022, 12(12), 5813; https://doi.org/10.3390/app12125813
by Genoveva Vargas-Solar
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5: Anonymous
Submission received: 1 April 2022 / Revised: 31 May 2022 / Accepted: 2 June 2022 / Published: 8 June 2022
(This article belongs to the Special Issue Women in Artificial intelligence (AI))

Round 1

Reviewer 1 Report

It is well known that data and indicators are needed to analyse situations and look for improvements. The paper discusses the absence of data for analysing the role of women in CS. The introduction starts with some old predictions, while many more recent predictions exist that should be used instead, for instance the Davos reports. For different sections of the paper, it would be useful to refer to “She Figures” (the annual publications of the European Commission).

Over the last 10 years, actions to give visibility to women in computing have grown, more intensively over the last 5 years. The paper mentions some international organizations. Sections 2.1 and 2.2 mix a discussion of the importance of the role of women in AI and DS with the lack of disaggregated data. These sections should be reorganised.

When talking about awarded women, the first sentence (line 157) starts with a rather old reference (from 2002). By the way, reference 30 and reference 4 are the same.

The comments in lines 170-174 are not entirely correct. The author should look into the origin of the name “calculators” given to the first women in computing to understand the history. The text may come from reference 34, but it is not correct: some films do a good job of showing the contributions of women.

When making the list of laureates, why did the author not check up to the most recent year? For instance, there is now one more laureate of the ACM Edgar F. Codd Innovations Award.

For the discussion of the Mexican case, the author uses data from 2014. Why? The most recent data are from 2020. Moreover, it would be very useful also to have data for men, to better understand the numbers in lines 250 to 257.

Again, in line 277, why are the data from 2016?

What is the meaning of “limited” in line 303?

Section 3.1 ends with some data on CS in Mexico. From which year?

Line 440: a reference for the “studies” in the US would be useful.

What is the goal of Appendix A? What are the criteria for including a woman in that appendix? For instance, the women who received the Turing Award are not there.

Sometimes the references seem inappropriate. For instance, in line 97 we would expect a reference to the 2018 WEF report instead of a reference from 2011.

Lapsus in reference 25: “retirado”

Author Response

Reviewer 1: (Remarks in blue, Answers in Black)

It is well-known that you need data and indicators to analyse situations and look for improvements. The paper discusses the absence of data for analysing the role of women in CS.

Answer:

  • The introduction starts with some old predictions, while many more recent predictions exist that should be used instead, for instance the Davos reports. For different sections of the paper, it would be useful to refer to “She Figures” (the annual publications of the European Commission).

I added content to the first paragraph of the introduction about predictions from the UN, which has carried out studies in Latin America and the Caribbean. Since the focus of the new version is on missing data, I present these sources as examples of existing data that need to be complemented with more.

  • Over the last 10 years, actions to give visibility to women in computing have grown, more intensively over the last 5 years. The paper mentions some international organizations. Sections 2.1 and 2.2 mix a discussion of the importance of the role of women in AI and DS with the lack of disaggregated data. These sections should be reorganised.

I completely reorganized the paper and tried to be clear about women in STEM and CS, taking AI and DS as examples of promising CS sub-disciplines that are opening up the labour market.

  • When talking about awarded women, the first sentence (line 157) starts with a rather old reference (from 2002). By the way, reference 30 and reference 4 are the same.

I fixed the repeated reference. I do not see why using a reference from 2002 is problematic if it helps support the argument. Nevertheless, I revised the sentence, although I kept the “old” reference.

  • The comments in lines 170-174 are not entirely correct. The author should look into the origin of the name “calculators” given to the first women in computing to understand the history. The text may come from reference 34, but it is not correct: some films do a good job of showing the contributions of women.

My objective in referring to the term “calculator” is to show that women's contributions have been regarded as mere clerical work (“paper dragging”), even though women played a relevant role in the projects they participated in. Despite the importance of their contributions, they were not regarded as project leaders, and they remained invisible for a long time. I make this argument in the paper. I am not criticizing the films that highlight the role of women in these projects; rather, I am pointing out that women's labour in CS has been diminished throughout history.

  • When making the list of laureates, why did the author not check up to the most recent year? For instance, there is now one more laureate of the ACM Edgar F. Codd Innovations Award.

In the new version, I included information about awards up to 2021, the latest year available on the award websites.

  • For the discussion of the Mexican case, the author uses data from 2014. Why? The most recent data are from 2020. Moreover, it would be very useful also to have data for men, to better understand the numbers in lines 250 to 257.

I updated the numbers with official CONACyT statistics from 2020, some even including information from 2022. I did not explicitly compare men and women, but since I used percentages, the situation of men is easy to deduce.

  • Again, in line 277, why are the data from 2016?

I updated the information with the data published on the AMC website as of March 2022, and I included a figure with the statistics for disciplines classified according to the AMC vocabulary (“Exact Sciences” and “Natural Sciences”).

  • Section 3.1 ends with some data on CS in Mexico. From which year?

I specified the year and compared two periods to show how the numbers have changed, particularly after gender balance policies were applied.

  • What is the meaning of “limited” in line 303?

I revised the sentence to indicate that few women participated in the structure of the Scientific and Technological Consultative Forum that coordinated the development of the National Science and Technology 2006-2012.

  • Line 440: a reference for the “studies” in the US would be useful.

I revised my references for these studies; they now include more recent data, from 2020.

  • What is the goal of Appendix A? What are the criteria for including a woman in that appendix? For instance, the women who received the Turing Award are not there.

The female Turing Award laureates are listed in Section 2.3. The appendix lists important contributions by women, regardless of whether their authors have received awards. I added an introductory text to the appendix to emphasize this approach. Since I have revised the paper to focus on missing data, the point is that data about women's contributions are missing, which gives the impression that women have not been active actors in this science. The appendix draws attention to contributions by women that may be unknown even to the scientific community “educated” in CS.

  • Sometimes the references seem inappropriate. For instance, in line 97 we would expect a reference to the 2018 WEF report instead of a reference from 2011.

I went through the references, making sure that I was citing the right source.

  • Lapsus in reference 25: “retirado”

Solved

Reviewer 2 Report

The paper entitled “Analysing gender gap in STEM: the absence of data” focuses on the problem of missing data for analyzing and exhibiting the role of women in Artificial Intelligence (AI) and Data Science (DS). It uses a dataset collected in Mexico with information about computer science from the National System of Researchers. The dataset is incomplete because not all women in CS are recognized by the system. It investigates how the unbalanced situation in countries like Mexico creates economic loss and discusses the impact of gender imbalance on the economy.

In general, the topic investigated in this paper is interesting, although the paper lacks scientific soundness. However, the paper is well written, and the discussion seems reasonable. Having worked on the same topic before, I agree that missing values are very common in real-life applications. Collecting data is not always easy and can lead to missing values in the data. Unfortunately, missing values may hide the correct answers underlying the data. They can also reduce the performance of algorithms. Simply discarding missing data is not a reasonable practice, because valuable information can be lost and inferential power can be compromised. In addition, deleting observations with missing values can leave very few observations in the data when many predictive variables contain missing values. I like the way the authors investigate different methods to find the optimal one for the experimental dataset. The authors should revise the paper to further improve its quality before I vote for acceptance. My comments are as follows:

- In the Introduction, briefly highlight the impact of missing values in real-life applications. In addition, introduce several recent works that consider missing datasets like yours; good examples are [https://doi.org/10.1016/j.ins.2021.07.039] and [https://doi.org/10.1016/j.ins.2021.04.076].
- In section 3, I suggest authors insert the information of missing values in the dataset such as the number of missing features, number of missing values, types of features…
- In section 3, I suggest the authors create a workflow that shows all the steps taken in this research, to help readers easily understand what the author has proposed. Also in this section, add a table of the notations used in the paper.
- I suggest the authors insert a subsection or a paragraph discussing several methods for analyzing the gender gap in STEM. These methods could be applied in future work on the use case dataset, to give some insight into how this situation can be analyzed with high interpretability using data analysis methods such as clustering and regression. For that purpose, the author may refer to these works in the discussion: [https://doi.org/10.1007/978-981-15-1209-4_1], [https://doi.org/10.3390/app12010072].

- Carefully proofread the paper to fix all typos and grammar mistakes.

Author Response

Reviewer 2: (Remarks in blue, Answers in black)

The paper entitled “Analysing gender gap in STEM: the absence of data” focuses on the problem of missing data for analyzing and exhibiting the role of women in Artificial Intelligence (AI) and Data Science (DS). It uses a dataset collected in Mexico with information about computer science from the National System of Researchers. The dataset is incomplete because not all women in CS are recognized by the system. It investigates how the unbalanced situation in countries like Mexico creates economic loss and discusses the impact of gender imbalance on the economy.

In general, the topic investigated in this paper is interesting, although the paper lacks scientific soundness. However, the paper is well written, and the discussion seems reasonable. Having worked on the same topic before, I agree that missing values are very common in real-life applications. Collecting data is not always easy and can lead to missing values in the data. Unfortunately, missing values may hide the correct answers underlying the data. They can also reduce the performance of algorithms. Simply discarding missing data is not a reasonable practice, because valuable information can be lost and inferential power can be compromised. In addition, deleting observations with missing values can leave very few observations in the data when many predictive variables contain missing values. I like the way the authors investigate different methods to find the optimal one for the experimental dataset. The authors should revise the paper to further improve its quality before I vote for acceptance. My comments are as follows:

- In the Introduction, briefly highlight the impact of missing values in real-life applications. In addition, introduce several recent works that consider missing datasets like yours; good examples are [https://doi.org/10.1016/j.ins.2021.07.039] and [https://doi.org/10.1016/j.ins.2021.04.076].

I did not use the suggested references, since I could not identify their relation to the notion of missing data in the paper, which refers to data that have not been collected. I added the following paragraph to the introduction to state and clarify the paper's view of missing data:

“We focus on missing data instead of analysing the gender gap with available datasets. Such studies have already been done, but they remain partial. Missing data combined with data analytics models can lead to conclusions, but not to a full understanding of the problem. Yet, to address gender gap problems efficiently, we believe that we need to identify which data are missing.”

- In section 3, I suggest the authors create a workflow that shows all the steps taken in this research, to help readers easily understand what the author has proposed. Also in this section, add a table of the notations used in the paper.

To avoid breaking the flow of the text with the table of acronyms, I decided to add an appendix with this information (cf. Appendix B).

- I suggest the authors insert a subsection or a paragraph discussing several methods for analyzing the gender gap in STEM. These methods could be applied in future work on the use case dataset, to give some insight into how this situation can be analyzed with high interpretability using data analysis methods such as clustering and regression. For that purpose, the author may refer to these works in the discussion: [https://doi.org/10.1007/978-981-15-1209-4_1], [https://doi.org/10.3390/app12010072].

Since the focus of the paper is to identify missing data that could be used to build intersectional datasets providing more representative observations of the gender gap, I added references in the introduction to works that have attempted to measure the gender gap but that I consider partial, in the sense that they only use data observing the problem from a single or reduced perspective (without context). My point is that the issue lies neither in the data science process (data preparation, cleaning, engineering) nor in the analytics models applied to extract knowledge. My point is that without intersectional data, which are often missing from available datasets, the gender issue cannot be modelled and understood in its full complexity.

- Carefully proofread the paper to fix all typos and grammar mistakes.

I have used Grammarly premium for proofreading the text and improving English.

 

Reviewer 3 Report

The topic presented in the paper is really very interesting. However, the paper is not well written and cannot be published in its current form. The main issue is that the paper does not reflect what is stated in the abstract and the title: “the problem of missing data for analyzing and exhibiting the role of woman”. There are important issues that should be addressed:

 

 

  • The introduction is too general and is not related to what the paper presents. In fact, I suggest deleting lines 11 to 13 and lines 17 to 44. You should be talking about “the problem of missing data for analyzing and exhibiting the role of woman”.
  • Section 2 should be merged into the introduction; I believe this part is what corresponds to the paper's introduction. The paragraph in lines 84-86 should not be a bullet point; it should be a normal paragraph.
  • Subsection 2.1 should be reduced and added to the introduction.
  • Subsections 2.2 and 2.3 should be elaborated a bit more and presented as the main section of the paper (Section 2). How are we going to prove that missing data is really an important issue? Do you have any way of measuring the absence of data?
  • Section 3 should be an in-depth analysis of the Mexican case in which the author applies the methodology presented in Section 2. Figure 1 is not relevant.
  • Section 4 is again not related to the paper topic.

Author Response

Reviewer 3 (Remarks in blue, Answers in black)

The topic presented in the paper is really very interesting. However, the paper is not well written and cannot be published in its current form. The main issue is that the paper does not reflect what is stated in the abstract and the title: “the problem of missing data for analyzing and exhibiting the role of woman”. There are important issues that should be addressed:

  • The introduction is too general and is not related to what the paper presents. In fact, I suggest deleting lines 11 to 13 and lines 17 to 44. You should be talking about “the problem of missing data for analyzing and exhibiting the role of woman”.
  • Section 2 should be merged into the introduction; I believe this part is what corresponds to the paper's introduction. The paragraph in lines 84-86 should not be a bullet point; it should be a normal paragraph.
  • Subsection 2.1 should be reduced and added to the introduction.
  • Subsections 2.2 and 2.3 should be elaborated a bit more and presented as the main section of the paper (Section 2). How are we going to prove that missing data is really an important issue? Do you have any way of measuring the absence of data?
  • Section 3 should be an in-depth analysis of the Mexican case in which the author applies the methodology presented in Section 2. Figure 1 is not relevant.
  • Section 4 is again not related to the paper topic.

Answer: I have completely revised the paper to improve the logical order of ideas, with a focus on missing data. I adopted a non-quantitative approach that exhibits the types of missing data, rather than their volume, in order to understand different perspectives on the gender gap in STEM labour. Nonetheless, I tried to provide a critical discussion of specific types of missing data and, often, of the implications of this issue for the way women develop careers, are granted awards, and are included in history. In many cases, I also tried to discuss the implications of missing data for the specific case of Mexico and sometimes for the global south. I removed Figure 1, as I believe the text does not need it.

 

Reviewer 4 Report

The author discusses the hot topic of the gender gap in STEM disciplines. I am aware that this topic has many levels, from STEM education in primary and secondary school up to the university level and then professional careers in STEM areas. It is not easy to cover all of them in one manuscript. However, it would be great to read at least a few sentences answering my comments and questions below.

It would be good if the author could mention whether she has also performed research on education in STEM. There are countries where the ratio of female to male students in STEM at the university level is not balanced.

Another issue is the glass ceiling for women's careers. What is the author's opinion?

Author Response

Reviewer 4 (Remarks in blue, Answers in black)

The author discusses the hot topic of the gender gap in STEM disciplines. I am aware that this topic has many levels, from STEM education in primary and secondary school up to the university level and then professional careers in STEM areas. It is not easy to cover all of them in one manuscript. However, it would be great to read at least a few sentences answering my comments and questions below.

  • It would be good if the author could mention whether she has also performed research on education in STEM. There are countries where the ratio of female to male students in STEM at the university level is not balanced.
  • Another issue is the glass ceiling for women's careers. What is the author's opinion?

Answer: I have added Section 3.5, which discusses missing data about the labour conditions of women in academia. I briefly discuss the glass ceiling, pointing out that missing data about labour conditions prevent us from understanding how women's professional careers in STEM evolve. My discussion of the share of women in the SNI and among award recipients is, in a way, a discussion about breaking the glass ceiling. Since the paper focuses on labour rather than on education, I did not devote a section to university projects worldwide that collect data about STEM students, their choices, their retention, and their eventual decision to pursue graduate studies. Nonetheless, I briefly touch on the topic in the third paragraph of the conclusions. I have completely revised the paper to improve the logical order of ideas, with a focus on missing data. I adopted a non-quantitative approach that exhibits the types of missing data, rather than their volume, in order to understand different perspectives on the gender gap in STEM labour.

Reviewer 5 Report

The paper is well-written, motivated, and organized.

It discusses the issue of missing data when it comes to studying gender bias in computer science, particularly in specialized disciplines like artificial intelligence and data science. The study's findings highlight the harmful impact on society and the economy of a lack of balance in the computer science professional market.

Two minor typos:

Line 26: bodies, resulting in avoidable deaths for women and children” [4]. With a more diverse (remove quotes)

Line 419: statistics. The National Centre for Women & Information Technology (NCWIT). argues (Center and remove the dot)

Author Response

Reviewer 5 (Remarks in blue, Answers in black)

Two minor typos:

Line 26: bodies, resulting in avoidable deaths for women and children” [4]. With a more diverse (remove quotes)

Line 419: statistics. The National Centre for Women & Information Technology (NCWIT). argues (Center and remove the dot)

Answer: I have fixed the typos highlighted in your review, and I also proofread the whole paper with Grammarly to avoid typos and English problems.

 

Round 2

Reviewer 1 Report

The new version of the paper is much better than the previous one. The focus is clearly specified and developed with the new structure of sections.

I expected the blue text to be the new text and the black text to be the text of the first version. This is not the case: a significant part of the black text is also new. Therefore, the colour distinction was not very useful.

The author has added some "Missing datasets" numbered 1, 3 and 4. Where is dataset 2?

I repeat a comment: I still do not understand (lines 170-171 in this version, 97 in the previous one) the reference to a paper by Bobbitt-Zeher (2011) while talking about the WEF 2018 report.

 

Author Response

Reviewer 1: 
The author has added some "Missing datasets" numbered 1, 3 and 4. Where is dataset 2? 
Answer:
Thank you very much for the comments, which have helped make the paper more solid. I realized that there was indeed a problem with a LaTeX command that hid the text of Missing datasets 2. The text now appears in the current version.
 
I repeat a comment: I still do not understand (lines 170-171 in this version, 97 in the previous one) the reference to a paper by Bobbitt-Zeher (2011) while talking about the WEF 2018 report. 
 
I had not understood the issue you raised in the previous review. I have fixed the problem: the text now cites the WEF 2018 report, and I reformulated the other “important” arguments I wanted to discuss, which are supported by Bobbitt-Zeher (2011).
 
Finally, note that I have highlighted the modifications in red. You will also find more red text corresponding to requests made by Reviewer 2.

Reviewer 2 Report

I have read the revision; it seems to me that the quality of the paper has not improved significantly. The authors have not taken all of the reviewers' comments into account to further improve the scientific soundness of the paper.

Although the purpose of the paper is to analyze the impact of gender imbalance in STEM, no data mining, data-driven approach, or machine learning model has actually been used for this purpose. Thus, the observations in this paper may be subjective. The authors should apply qualitative or quantitative methods to find answers about the gender gap in CS.

The Introduction is too long, but the motivation and contribution of the paper are not explained clearly.

No workflow can be found in the paper although the authors declared that it has been included in Appendix B.

No formulations, definitions, or hypotheses can be found in the paper. The authors should use data-driven methods to analyze the relevant case study datasets. As an example, you can refer to this paper [https://doi.org/10.3389/feduc.2019.00060] to see how they test their hypothesis.

Author Response

Reviewer 2:

I have read the revision; it seems to me that the quality of the paper has not been improved significantly. The authors have not considered all comments from reviewers to further improve the Scientific Soundness of the paper.

Answer:

Thank you very much for the comments, which have helped make the paper more solid. I read your comments and suggestions thoroughly and included them as far as possible, considering that the work and results stem from specific hypotheses that are somewhat difficult to revisit a posteriori. I did, however, go back to the study and add work to integrate your suggestions.

N.B. Note that I highlighted the modifications in red.

Although the purpose of the paper is to analyze the impact of gender imbalance in STEM, in fact, no data mining, data-driven approach, or machine learning models have been used for such purposes. Thus, the observations in this paper may be subjective.

The work claims that it is not pertinent to apply DM, ML, or AI methods to answer the question “where are women in STEM?” because datasets are missing. Existing datasets give too incomplete and partial a view of the contribution of women in STEM. Our study highlights the problem by focusing on the case of Computing Science, within which the AI and DS subdisciplines are also considered, DS being treated as a Computing Science subdiscipline. Since there is almost no numerical evidence (if such evidence is even possible) about missing datasets, the study provides strong conjectures supported by argumentative evidence. Following your suggestion, I adopted a qualitative perspective to formulate the problem and complete the study.

The authors should apply qualitative or quantitative methods to find answers about the gender gap in CS.

I had not presented the methodology used to perform the study, which is qualitative, applying grounded theory and case-study methods. The point is that the objective of the work is not to measure the gender gap. The objective is to argue that no quantitative study seems possible while datasets are missing; otherwise, the measures risk being too general and failing to provide representative insight into the absence of women in STEM, which goes beyond defining a gap. Ultimately, our work claims that the gap, regardless of its size, stems from the fact that history, censuses, and other tools highlight too small a set of women and forget those who are not visible in the media or who suffer the effects of epistemic violence because their location of enunciation is the global south.

You can find arguments about these claims, especially in the introduction and section 2 of the new version of the paper.

The Introduction is too long, but the motivation and contribution of the paper are not explained clearly.

I agree that it is a bit long. Some statements seem important to give context and to motivate the study and its underlying idea, which is not aligned with the quantitative studies carried out to measure the gender gap. Following your suggestion, and to mitigate the length, I organized the content into subsections that highlight the hypothesis, the research questions, the methodology, and the contributions.

No workflow can be found in the paper although the authors declared that it has been included in Appendix B.

I believe I stated that the acronyms and a clarification of how the list of female contributions was chosen had been added, and I explained that the phases of the study were enumerated in the text. In any case, this new version includes a workflow showing how data collection driven by a qualitative approach can be considered a preliminary phase to the data preparation and data analytics phases of a data science experiment. It is presented in a figure and described in Section 2.

No formulations, definitions, or hypotheses can be found in the paper. The authors should use data-driven methods to analyze the relevant case study datasets. As an example, you can refer to this paper [https://doi.org/10.3389/feduc.2019.00060] to see how they test their hypothesis.

In the introduction, I state a hypothesis, research questions (RQ1-3), and the methodology. I then describe the methodology in Section 2, which shows the development of a grounded theory strategy and answers RQ1-2. Section 3 then provides a use case that answers RQ3. Again, data-driven methods are not suited to determining which datasets are missing. The paper you suggest, which I cite to position the study I propose, proposes a gender gap index (see the Context and Methodology section in the introduction). Identifying missing datasets has required searching for data. Still, there is no baseline or quantitative reference for determining the minimum set of datasets to consider: which datasets, taken together, provide a complete view of women's contribution to a group of sciences? Many studies have addressed the gender gap index. Still, the scientific community has trouble readily identifying women's scientific contributions, in history and today, in both the global north and the global south.

Reviewer 3 Report

There is something missing in reference to Missing datasets 2: in the article, the author goes from Missing datasets 1 to Missing datasets 3.

Author Response

Reviewer 3

There is something missing in reference to Missing datasets 2:
In the article the author goes from Missing datasets 1: to Missing datasets 3:

Answer:

Thank you very much for the comments, which have helped make the paper more solid. I realized that there was indeed a problem with a LaTeX command that hid the text of Missing datasets 2. The text now appears in the current version.

I have highlighted the modifications in red. You will also find more red text corresponding to requests made by Reviewer 2.
