Article
Peer-Review Record

Towards an ELSA Curriculum for Data Scientists

AI 2024, 5(2), 504-515; https://doi.org/10.3390/ai5020025
by Maria Christoforaki 1,* and Oya Deniz Beyan 1,2
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4:
Reviewer 5: Anonymous
Submission received: 22 December 2023 / Revised: 19 February 2024 / Accepted: 27 March 2024 / Published: 11 April 2024
(This article belongs to the Special Issue Standards and Ethics in AI)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study proposes a framework for an ELSA program for data scientists. This topic is essential in this era of rapid development of artificial intelligence. The entire ELSA curriculum framework is clearly described in the article.

Although the article mentions that ethical and societal KUs, legal KUs, and technical renderings KUs are designed into the curriculum, there is a lack of relevant research literature or theories to support the basis of these course content designs.

Author Response

Dear reviewer,

 

Thank you for your time and effort in reading and assessing our paper and providing us with helpful remarks and suggestions that will considerably improve its quality.

Regarding the decision to structure the Curriculum horizontally into ethical and societal, legal, and technical renderings strands, the rationale is the following:

From the literature review of the existing courses, we identified that the subjects could be classified according to this scheme. While the reviewed approaches did not follow any specific classification, they more or less included the topics mentioned above (some covered all of them, others only some).

We recognize that this rationale was not apparent in the first submitted version of the manuscript, so we added an introductory paragraph in section 3 and elaborated more on the subject in subsection 3.2 (lines 212-217 of the newly submitted version). Additionally, we inserted a reference to our previous work, where the topics of the already surveyed instruction courses are presented. We hope that the reader can now better understand the reason for the horizontal structure of the curriculum teaching scope.

 

Reviewer 2 Report

Comments and Suggestions for Authors

In line 307 after the parentheses, I think the text should be changed (for some hours a week or for example) and the (or) should be deleted because it doesn't make sense (for some hours a week for example) or it should become ((for some hours a week), or for example in an intensive one-week).

I think it's an important piece of work that will be of great interest to the scientific community. The only thing I would like to point out is that the courses should not only focus on beginners in data analysis, as you mention in line 310, but also on experts who do not have specific knowledge of e.g. legal issues. 

Author Response

Dear reviewer,

Thank you for your time and effort in reading and assessing our paper and providing us with helpful remarks and suggestions that will considerably improve its quality. We followed both your suggestions and altered the respective segments.

 

Specifically:

  1. We followed your suggestions about l. 307 of the initially submitted manuscript.
  2. We expanded section 4 (new title: Discussion - Implementation Strategies and Limitations) to include implementation proposals for experienced data scientists and for data scientists who work in specific application domains (l. 402-425 of the revised manuscript). We also added a final paragraph to the same section, underlining that this is a work in progress. Since the Curriculum is developed in the framework of a research project, such issues will be elaborated further in future deliverables, and the ensuing implementation suggestions will be published with the second and final version of the Curriculum at the end of the project. We sincerely hope that we will be given the chance to work on the Curriculum and its implementation in a future project so that we can publish more concrete results.

Reviewer 3 Report

Comments and Suggestions for Authors

The article proposes an Ethical, Legal and Societal Aspects (ELSA) curriculum that equips data scientists to incorporate ELSA into their data science / mining processes. ELSA is indeed a challenge not only in AI, but in any project dealing with data.

Although section 2.2 provides an overview of what universities are doing, it is mainly based on an ACM survey from 2018 and a technical survey conducted in 2020. Considering the dynamics and evolution in this field, and since the authors are proposing a curriculum for data scientists, it would be mandatory to assess the current state of the art to set the ground for their proposal. It is critical to better shape the problem and identify the current limitations: why are current curricula not adequate, and how will the authors' proposal handle these limitations?

Section three presents the curriculum, organized by knowledge units (KUs). The authors used CRISP-DM to position these KUs in a data mining process, which is adequate to understand where ELSA contributions are relevant for the data science process.

In my opinion, in order to motivate this curriculum and compare it with other alternatives, it is critical to: (1) identify the challenges / risks / threats in the Ethical, Legal and Societal dimensions for data science projects; (2) identify how these risks can be controlled and what capabilities are required to manage them. If this is clearly done, the proposed curriculum can be much better grounded, defining learning outcomes that will contribute to addressing the above-mentioned challenges.

The fact that data science projects are multidisciplinary also requires further discussion. What level of ELSA expertise should be required of data scientists?

In addition, I consider that this manuscript has problems with regard to the methodology. The authors must better define the problem and explain why current curricula are not adequate or need to be improved. Then, they should position their proposal, define objectives for this solution, and explain how their proposal will be evaluated. How does it compare to other solutions? How can we compare the learning outcomes of a data scientist following this curriculum with those of data scientists following other curricula?

With a better positioning for the methodology, it will be very important to provide an improved discussion of the results before the conclusion.

In summary, I see the merit of this manuscript and I think that it has the potential to be an interesting publication. However, in its current state, it lacks the methodology, grounding and motivation needed to clarify the proposed curriculum and, in particular, how it compares to current curricula and how we can evaluate them.

Comments on the Quality of English Language

The manuscript is well written.

Author Response

Dear reviewer,

Thank you for your time and effort in reading and assessing our paper, as well as providing us with helpful remarks and suggestions that will considerably improve its quality. We are very grateful that you provided us with such detailed comments, and we tried to address all the points that you made and followed your suggestions. We hope that you will find the new manuscript version more adequate.

 

Specifically:

  1. State-of-the-art review: The literature review is indeed limited to papers up to 2020. This is because the work presented here was done in the framework of a research project, and the literature review was due in 2021. The first version of the Curriculum proposal appeared at the end of 2023. The respective deliverables are published in Zenodo and accessible to all. They are cited in the Supplementary Materials section of the paper and referenced in the corresponding paragraphs so that the interested reader can acquire the relevant information. However, as you have noticed, we failed to state this as a limitation. So, we have restructured section 2 by adding an introductory paragraph that explains the framework in which this work was developed.
  2. Regarding identifying the ELSA challenges, indeed, they are not stated directly. We have expanded the introduction (section 1) to refer to the ELSA challenges that data science applications pose, offer some examples of legal and ethical measures developed to mitigate them, and explain how the education of data scientists fits into that framework.
  3. The multidisciplinary nature of data science projects must also be more prominent. Indeed, it is only briefly discussed in the conclusions, mainly due to space considerations, since our main target was to present the Curriculum proposal. However, following your suggestions, we underlined the multidisciplinary nature of data science applications in a newly added introductory paragraph in section 3 (l. 176-183 of the revised manuscript). We also point out that the Curriculum was developed in a research project that is itself multidisciplinary, since it includes, apart from computer scientists, ethical and legal experts who provided us with input stemming from their work on the development of the technical demonstrators (l. 416-425).
  4. Finally, we would like to add that this is still a work in progress. We have presented only the first version of the curriculum and are gathering feedback for the next and final version (within the project framework, at least). We have already conducted two workshops and a survey to collect the said feedback, and we are moving to bring this proposal to a broader audience. Part of this endeavour is the current paper. We expanded section 4 (revised title: Discussion - Implementation Strategies and Limitations) in order to make this framework more obvious and discuss the current limitations, as well as to describe our suggestions regarding the future Curriculum implementation and evaluation (neither of which is in the scope of our current funding).

 

Reviewer 4 Report

Comments and Suggestions for Authors

1- number of used references should be added.

2- conclusions should be presented by bullet points. 

 

Comments on the Quality of English Language

The text should be moderately edited.

Author Response

Dear reviewer,

 

Thank you for your time and effort in reading and assessing our paper, as well as for providing us with helpful remarks and suggestions that will considerably improve its quality. We addressed both of your points; we hope we succeeded in both; however, any oversight from our side will be corrected immediately, so please provide us with more feedback if that is the case.

 

Reviewer 5 Report

Comments and Suggestions for Authors

In this manuscript, an ELSA curriculum is proposed for data scientists aiming to enhance the communication between them and their adherence to the various ethics standards. The manuscript presents new information which can be useful to the readers. However, in order to increase its quality, the following comments need to be addressed:
-At line 16, please provide the description of the abbreviation "CRISP-DM" near its first occurrence.
-It seems that the abstract is divided in three paragraphs. The authors are kindly requested to ensure that this complies with the formatting guidelines of the journal.
-At line 84, maybe this could be numbered as subtitle.
-At line 124, please provide the description of the abbreviation "IP".
-The authors should make a specific presentation of how exactly the proposed tool can be applied in a real-world scenario, as well as the gains for the users, at least in a qualitative way. To enhance the practical applicability of the research I recommend being more specific in your descriptions about the application of the methodology in the aforementioned real-world scenario. Please add relevant descriptions in the manuscript.
-The authors use the CRISP-DM methodology for the proposed ELSA curriculum. If another data-mining methodology is used instead of CRISP-DM, does it significantly alter the proposed ELSA curriculum? The authors are kindly requested to add a relevant description in the manuscript.
-Can the KUs be presented graphically in some way? The authors are highly encouraged to add some schematic diagrams in the manuscript which explain the proposed methodology.
-The numbering of Figure 2A suggests that there should exist another Figure 2B. The authors are kindly requested to check if Figure 2B should be there.
-The description of the proposed ELSA curriculum needs to be more concrete and with the minimum number of abstractions. The authors are kindly requested to add descriptions in the manuscript concerning the above.

Comments on the Quality of English Language

-The language of the manuscript should generally be improved in terms of grammar and syntax. Some errors which have been identified in the manuscript are listed below, but please be aware that this list is not exhaustive:
--At lines 14-15, the text should be rephrased as follows: "...science workflow. ELSA should not be seen as an impediment or a superfluous artefact, but rather as an integral part of the Data Science Project Lifecycle.". The same for lines 134-135
--At line 39, replace "as us it is presented" with "as it is presented"
--Lines 296-299 have been repeated in the caption of Figure 2A.

Author Response

Dear reviewer,

Thank you for your time and effort in reading and assessing our paper and providing us with detailed, insightful, and helpful remarks and suggestions that will considerably improve its quality.

We are very grateful that you provided us with such detailed comments on the format, the content and the language issues of the manuscript; we tried to follow your suggestions and address all your points. We hope that you find that our revisions resulted in a better manuscript.

Regarding specific points (the lines correspond to the initial manuscript, not the revised one):

  1. At line 16, we expanded the acronym as required.
  2. Abstract in one paragraph: we conformed to the journal guidelines.
  3. At line 84: numbering added.
  4. At line 124: abbreviation expanded.
  5. The proposed Curriculum is still a work in progress. We have presented only the first version of the Curriculum and are gathering feedback for the next and final version (within the project framework, at least). We have already conducted two workshops and a survey to collect the said feedback, and we are moving to bring this proposal to a broader audience. Part of this endeavour is the current paper. The feedback we gathered also includes suggestions on implementing the Curriculum (duration, form, tutors, evaluation, etc.). We would also like to add that the project deliverable presenting the Curriculum (available in Zenodo; see Supplementary Materials) also discusses using the demonstrators developed in the project as use cases for the Curriculum. The FAIR Data Spaces project is a multidisciplinary project that includes a work package on ethical and legal issues regarding the development and use of the demonstrators. The use cases are not included in the current paper due to space considerations and because, since the project is still ongoing, the results of the ethical and legal participants are not yet final. We based our use cases on the already available material. However, we aim to be more specific on the implementation in the final curriculum version, due at the end of 2024. We have expanded the discussion section to include all the above points.
  6. Use of the CRISP-DM model compared to other workflow models: we included a paragraph (l. 245-253 of the revised manuscript) where we elaborate on this issue. Specifically, we discuss the use of CRISP-DM compared to KDD and SEMMA, adding relevant paper citations that debate the issue.
  7. The graphical representation of the KUs exists in the supplementary material. We tried to incorporate a graphical representation; however, the result was suboptimal from a legibility point of view. Since, unfortunately, we do not have any support from a visualization perspective, we opted to leave this figure out and direct the reader to the respective publication in the supplementary materials and the references to our deliverable publication. Since, due to space considerations, we do not offer a detailed description of the KUs in the paper either, we thought that the reference to the extended work would be sufficient. This is also why we do not offer a more concrete description of the KUs; we thought it would make the paper a bit cumbersome, while the interested reader can find this kind of description in the supplementary materials published openly in Zenodo. As mentioned in point 5, this Curriculum is a work in progress; many issues are not yet consolidated. We have expanded the discussion section to make these limitations more prominent.
  8. Figure 2A: Corrected; this mistake was due to the layout problems we discussed in the previous point since Figure 2B was meant to be the detailed KU description.
  9. Comments on the Quality of English Language: We are very grateful for the comments and followed them all.

Round 2

Reviewer 5 Report

Comments and Suggestions for Authors

The reviewer would like to thank the authors for addressing all comments in their manuscript in a diligent way and for providing detailed descriptions and enhancements to the manuscript.
The reviewer believes that the manuscript is substantially improved and it is acceptable for publication in its current form.
The reviewer would like to wish every success to the authors in their future research efforts.
