Article

Five Guiding Principles to Make Jupyter Notebooks Fit for Earth Observation Data Education

1
Laboratory for Climatology and Remote Sensing (LCRS), Faculty of Geography, University of Marburg, 35037 Marburg, Germany
2
European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT), 64295 Darmstadt, Germany
3
Meteorological Environmental Earth Observation (MEEO) s.r.l., 44121 Ferrara, Italy
4
European Centre for Medium-Range Weather Forecasts (ECMWF), Reading RG2 9AX, UK
5
Department of Mathematics and Computer Science, University of Marburg, 35037 Marburg, Germany
*
Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(14), 3359; https://doi.org/10.3390/rs14143359
Submission received: 17 May 2022 / Revised: 7 July 2022 / Accepted: 10 July 2022 / Published: 12 July 2022
(This article belongs to the Collection Teaching and Learning in Remote Sensing)

Abstract

There is a growing demand to train Earth Observation (EO) data users in how to access and use existing and upcoming data. Computational notebooks, interactive web applications that combine text, code and computational output, are a promising tool for data-related training. Here, we present the Learning Tool for Python (LTPy), which is a training course (based on Jupyter notebooks) on atmospheric composition data. LTPy consists of more than 70 notebooks and has taught over 1000 EO data users so far, whose feedback is overall positive. We adapted five guiding principles from different fields (mainly scientific computing and Jupyter notebook research) to make the Jupyter notebooks more educational and reusable. The Jupyter notebooks developed (i) follow the literate programming paradigm with an average text/code ratio of about 3, (ii) use instructional design elements to improve navigation and user experience, (iii) modularize functions to follow best practices for scientific computing, (iv) leverage the wider Jupyter ecosystem to make content accessible and (v) aim to be reproducible. We see two areas for future developments: first, to collect feedback and evaluate whether the instructional design elements proposed meet their objective; and second, to develop tools that automate the implementation of best practices.

1. Introduction

There is a growing gap between the amount of open Earth Observation (EO) data produced every day and the ability of users to find, access and process the data. Growing data volumes, data discovery and a limited processing capacity are major challenges faced by users of EO data, and these problems hinder data uptake and use [1]. Hence, there is a growing need to inform EO data users about existing and upcoming data products, as well as to teach them how to access and work with them. This aspect will gain importance in the future, as many new developments and missions are in the pipeline. During the second phase of the Copernicus program, the launches of six high-priority candidate missions are planned by 2027 [2,3]. The European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) is preparing the launch of the next generation of its geostationary (Meteosat Third Generation) and polar-orbiting (Metop-Second Generation (SG)) satellites [4], and NASA’s Landsat program continues with the launch of Landsat-9 in September 2021 [5], followed by Landsat NeXt in 2029 [6].
EO training is developed and offered by different entities, including academia, commercial entities and key organizations in the field such as the National Aeronautics and Space Administration (NASA) in the US or the European Space Agency (ESA) and EUMETSAT in Europe. EO training offered through academia (category 1) provides a general introduction and formal training in basic and advanced concepts of remote sensing, geographic information sciences (GIS) and machine learning (ML) [7,8]. This category additionally teaches foundational technical skills such as data management and programming. Complementary to the EO training offered through academia is the training offered by commercial entities or key organizations (such as EUMETSAT) in the field (category 2). Training offered through this category is often specialized and tailored to activities and thematic application areas, new developments or data which the respective data organization or company offers [8]. Academia, commercial entities and key organizations also often collaborate on education-related projects, e.g., to bring EO and remote sensing to schools [9,10], or by developing digital learning resources [11].
The first category is aimed at undergraduate and postgraduate students. While introductory courses teach foundational skills in remote sensing, GIS or ML, advanced courses focus on applying these foundational skills and put them in the context of specific thematic applications. The target group of the second category (commercial entities and key organizations in the field) is highly diverse, where learners show a wide range of data, thematic and technical literacy. They may be EO ‘expert’ users (professionals in the EO domain or students at PhD and postdoc levels) who are interested in learning about upcoming satellite missions and data products, as well as EO practitioners who have just started using EO data products for a specific application and who are interested in learning about available or upcoming data products and how to use them [8].
Unlike academic courses, which often span 10–15 weeks of weekly classes, training formats offered by commercial entities or key organizations are more diverse and range from short courses (up to 1.5 h) to multi-day workshops or training schools as well as Massive Open Online Courses (MOOCs). Compared to an academic course structure, such formats pose additional challenges. First, instructor time is limited. Additionally, there is limited time to set up the working environment, which often requires the download and installation of software and packages on different operating systems [12,13]. Hence, a flexible solution is required where training participants have access to a working environment during the training but can reproduce the same environment locally afterwards. Second, expertise related to data, programming language and thematic applications is highly diverse. Hence, the training needs to cater for different levels of data, thematic and programming literacy.
Since 2020, the COVID-19 pandemic has greatly accelerated the need for online learning [14,15]. Hence, the training offered by both categories needs to offer a high degree of flexibility in terms of how the course can be taught (online vs. more traditional, in-person classroom settings), and the training material needs to be well-organized and easy to follow so that course participants can also use the material after the training in a self-paced manner.
One tool that can address some of the challenges listed is the computational notebook, an interactive browser- or web-based application that combines code, computational output, explanatory text and multimedia resources in a single document [16,17]. Computational notebooks have become popular programming environments that facilitate collaboration, interactive development and reproducibility and are, in particular, valued for rapid prototyping, data exploration and training [18]. There are dozens of different notebook systems available [19], but Jupyter stands out in its use across many disciplines [20,21,22], including bio- and health-informatics [12,23,24], general data science [13,25,26] and, more recently, Earth Observation research [27,28]. In 2021, more than 10 million Jupyter notebooks were available on the code-sharing platform GitHub [17,29]. While notebooks increasingly form part of complex end-to-end workflows and web applications that involve data ingest, computing, collaboration and distribution [19,24], the majority are used for research experimentation, development of machine-learning pipelines and education [18]. Distinct advantages of Jupyter notebooks include their support of dozens of programming languages and their entire ecosystem of open-source subprojects, e.g., to make Jupyter notebooks shareable and accessible. ‘nbviewer’, for example, offers an option to share a static HTML version of a computational notebook [29]. Binder allows users to turn a collection of computational notebooks hosted on GitHub into executable notebooks [16,29,30]. In the background, Binder utilizes Kubernetes and JupyterHub to turn computational notebooks into interactive computational environments that can be accessed and used concurrently by multiple users.
While computational notebooks are a common choice for collaborative research and for training and teaching data-intensive science, quantitative analyses of notebooks on GitHub [18,19,20,21] as well as empirical research among data scientists and researchers [18,22] have identified unique challenges and pitfalls when using notebooks. One of the biggest disadvantages is that the code cells, despite being linearly arranged, can be executed in any order, which in turn fosters poor coding practices [16,19,22]. Pimentel et al. (2019) and Chattopadhyay et al. (2020) identified challenges in making notebooks reproducible and reusable [20,22]. In the study of Pimentel et al. (2019), only one out of four notebooks hosted on GitHub could be re-executed, and only 4% produced the same results in the end [20]. Rule et al. (2018) identified a tension between exploration and explanation in notebooks. Annotations are not evenly distributed within a notebook and do not reach the objective of well-described computational narratives [18,21].
Such findings have been the motivation to define best practices and recommendations on how to write and share Jupyter notebooks [31], how to make them reproducible [21], how to foster collaboration with notebooks [32] and how to use notebooks in academic classrooms [33]. Most of these best practices focus on using Jupyter notebooks in a (data) science context, but not necessarily using them for education.
In this article, we introduce the ‘Learning Tool for Python (LTPy) on Atmospheric composition data’. LTPy is a series of well-documented step-by-step computational notebooks on different open satellite-, model- and ground-based data related to atmospheric composition developed by EUMETSAT. We applied five guiding principles to make these notebooks more educational and reusable. The principles are founded in recognized best practices from the fields of scientific computing and Jupyter notebook research and have been selected based on their applicability for training and capacity-building. For each principle, we share a practical example of how it was implemented in the LTPy course.
This paper has the following outline: Section 2 introduces the Learning Tool for Python (LTPy) on atmospheric composition and explains its objective and target audience, as well as the course outline and structure, in detail. Section 3 highlights five guiding principles for using Jupyter notebooks for Earth Observation data education. Section 4 puts our approach and results into a wider context, and Section 5 offers concluding remarks and an outlook.

2. Learning Tool for Python (LTPy) on Atmospheric Composition

2.1. Overview, Aim and Target Audience

LTPy is a Python-based series of Jupyter notebooks on different open satellite-, model- and ground-based data on atmospheric composition developed by EUMETSAT. The objectives of the LTPy course are threefold. First, it provides a general overview of different satellite-, model- and ground-based data on atmospheric composition to facilitate data uptake and use. Second, it provides code examples and well-annotated workflows on how to load, process and visualize these data. Third, it provides examples grouped in thematic application areas, e.g., fire monitoring, air-quality monitoring or stratospheric ozone.
LTPy has been developed continuously since 2019 and consists of two parts: (i) the main course and (ii) thematic modules (see Table 1). The main course consists of more than 50 Jupyter notebooks. The course outline is aligned with a typical data analysis workflow and includes notebooks on data access, data exploration, case studies and exercises (see Figure 1). Thematic modules are self-contained collections of notebooks related to a specific application area, such as dust monitoring and forecasting. The thematic module on dust monitoring and forecasting consists of 22 notebooks and is divided into two sections. The first section provides an overview of different types of data for dust monitoring and forecasting. The second section is hands-on, where exercises and assignments guide learners gradually through the analysis of a real-life dust event. In this way, training participants learn both the advantages and the limitations of different datasets. Table A1 in Appendix A provides an overview of data introduced in the LTPy main course, while Table A2 gives an overview of data introduced in the thematic module.
The LTPy course is for learners who have a basic understanding of Earth Observation data and remote sensing but need an overview of data available and a practical introduction on how the data can be accessed, processed and visualized. The course is designed to accommodate learners with different levels of data and coding literacy, from beginners to more advanced users of Python.

2.2. Hosting and Accessibility

The main course is available on a public code repository [34], and the thematic module on dust monitoring and forecasting is accessible in the form of an online book (http://dustbook.ltpy.adamplatform.eu/ (accessed on 16 May 2022)) (see Figure 2). Additionally, the notebooks are available through a hosted JupyterHub-based training platform (LTPy main course: https://ltpy.adamplatform.eu/ (accessed on 16 May 2022); thematic module on dust: https://dust.ltpy.adamplatform.eu (accessed on 16 May 2022)), where the required Python environment, package dependencies and data are already available. Course participants are asked to register and create a free account under https://login.ltpy.adamplatform.eu/ (accessed on 16 May 2022).
Hosted JupyterHub-based training environments provide great flexibility for course participants as well as course providers. Instead of preparing the programming environment on their local machines, training course participants can directly start with the content-based training. For us, as course providers, the hosted option has the advantage that server resources can flexibly be adapted for each training activity, depending on the expected number of participants. With a regular setup, up to 50 concurrent users can access the platform. However, for some larger training events, server resources were adjusted to be able to host up to 100 concurrent users.

2.3. Training Activities and Feedback

Since the start of the Learning Tool for Python on atmospheric composition activities, the Jupyter notebooks have been used in a total of 16 in-person and online training events, and over 1000 learners have been reached (see Table 2). There were three main types of training conducted: (i) training schools, (ii) thematic expert workshops and (iii) short courses. Training schools and thematic expert workshops usually have a duration of several days, while short courses aim to give a lightweight introduction to a specific topic and are usually 1 to 1.5 h long.
We collected general feedback on the training via a poll conducted with the event audience engagement platform Slido, asking the following four questions:
  • Q1: Would you recommend the training to a friend or colleague? (Rating from 1 to 5)
  • Q2: What did you specifically enjoy or find useful in this session? (Open response)
  • Q3: What should we do differently next time? (Open response)
  • Q4: What type of information would help you in using a specific dataset? (Multiple choice question)
The overall feedback on the training material was positive. Over 90% (103 responses) would either recommend or highly recommend the training to a colleague or friend (Q1) (see Figure A1 in Appendix A). Course participants specifically highlighted three aspects as useful (Q2) (see Table A3 in Appendix A): (i) the practical part of the training activities, including the introduction to Jupyter notebooks; (ii) the introduction to atmospheric composition topics and data; and (iii) the overall course structure, content and organization. A little more than half (54%, n = 57) specifically enjoyed the practical part of the training activities, and two out of five learners (40%, n = 36) highlighted Jupyter notebooks as particularly useful. A total of 92 participants provided suggestions for what could be improved (Q3) (see Table A4 in Appendix A). By far the most frequent suggestion (n = 27) was to extend the practical training part to include daily assignments and to start workflows from downloading datasets. In total, 196 participants provided a response to Q4 (see Figure A2 in Appendix A). The two options that would help more than half of the respondents were (i) the provision of examples for basic processing, visualisation and analysis (59%) and (ii) training activities, e.g., short courses, webinars or recorded videos (55%).

3. Guiding Principles for Using Jupyter Notebooks for EO Data Education

During the development of the LTPy notebooks, we applied a set of five guiding principles that helped us to make the notebooks more educational, to cater for different levels of data, thematic and programming expertise and to improve overall navigation to make the training material applicable for instructor-led trainings as well as for self-paced learning. This section outlines these guiding principles.

3.1. Leverage the Literate Programming Paradigm to Make Jupyter Notebooks Educational

The literate programming paradigm is not new and goes back to 1984, when Donald Knuth shared his idea to enrich code with natural language to explain its logic [35]. Several decades later, notebooks became an enabler of the literate paradigm and were designed to facilitate the construction and sharing of computational narratives [29]. However, several studies which explore the use of Jupyter notebooks among researchers and data scientists discovered that descriptive text in notebooks is unevenly distributed, with most text at the beginning and hardly any text at the end, and that notebooks often resemble a collection of loose scripts more than a computational narrative. Further, the cell proportion in most notebooks is skewed towards more cells with code than cells with markdown [18,21]. The importance of annotations and descriptive text, as well as dividing workflows into shorter subsections, increases when notebooks are used for educational purposes [29]. To see if the notebooks of the LTPy course meet the objective to be educational and well documented, for each notebook we calculated the total number of cells, the number of markdown cells and the number of code cells and computed the ratio of markdown to code cells (see Appendix A Table A5 for notebooks from the main course and Appendix A Table A6 for notebooks from the thematic module). The total number of cells, on average, is 74 for the main course and 55 for the thematic module (Table 3). However, if the exercise notebooks of the thematic module, which tend to have fewer cells, are not considered, then the total number of cells on average would be 64. The average ratio of markdown vs. code cells is 2.8 for the main course and 3.2 for the thematic module (Table 3). This means that the LTPy notebooks have, on average, about three times more markdown cells with descriptive text than cells with code. Figure 3 shows the frequency distribution per markdown/code ratio category.
More than half (56%, n = 41) of the notebooks have a markdown/code ratio of 3. Another 21% (n = 15) of the notebooks have a markdown/code ratio of 4. Exercise notebooks tend to have a higher markdown/code ratio than data exploration notebooks or case studies (Table 3).
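The cell statistics described above can be derived directly from a notebook file, because an .ipynb file is a JSON document with a list of cells that each carry a `cell_type`. The following is a minimal sketch of such a count, not the script used for the LTPy analysis; the function names and the in-memory example notebook are illustrative assumptions.

```python
import json


def load_notebook(path):
    """Read a notebook file (.ipynb files are JSON documents)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def cell_counts(notebook):
    """Count total, markdown and code cells and the markdown/code ratio."""
    cells = notebook.get("cells", [])
    n_markdown = sum(1 for cell in cells if cell.get("cell_type") == "markdown")
    n_code = sum(1 for cell in cells if cell.get("cell_type") == "code")
    # A notebook without code cells has no defined ratio.
    ratio = n_markdown / n_code if n_code else float("nan")
    return {"total": len(cells), "markdown": n_markdown, "code": n_code, "ratio": ratio}


# Hypothetical in-memory notebook; a real file would be read with
# load_notebook("some_notebook.ipynb").
example_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Load the data"]},
        {"cell_type": "markdown", "metadata": {}, "source": ["What the next cell does."]},
        {"cell_type": "code", "metadata": {}, "source": ["x = 1"],
         "execution_count": 1, "outputs": []},
        {"cell_type": "markdown", "metadata": {}, "source": ["Interpretation of the result."]},
    ],
}

counts = cell_counts(example_notebook)
print(f"{counts['markdown']} markdown / {counts['code']} code cells, ratio {counts['ratio']:.1f}")
```

Counting cells rather than words keeps the measure simple, but it treats a one-line markdown cell the same as a long explanation, so the ratio is only a rough indicator of how well a notebook is documented.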

3.2. Use of Instructional Design Elements

Throughout the course, we leveraged a combination of HTML/Markdown elements that serve as instructional design elements (see Figure 4 and Table 4). These elements improve the look and feel and overall navigation of the course as well as the navigation within a notebook. Each notebook has a navigation pane on top (see Figure 5, box 2). ‘Alert boxes’ help to highlight and cross-reference any prerequisites or related notebooks (see Figure 5, box 1). With the help of anchor links, each notebook has an outline section (table of contents) that allows learners to easily navigate to sub-sections within a notebook (see Figure 5, box 4). Additionally, we make use of backticks (`…`) to highlight important sections or code in the text of a markdown cell. Text enclosed in backticks is rendered in a monospaced font (Courier New) and highlighted in grey (see Figure 5, box 3). With these elements, we believe that the notebooks are useful for different training modalities: they serve as training material for an instructor-led course, but are also easy to follow and navigate when used for self-paced learning.
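As an illustration of the elements described above, the sketch below generates the Markdown/HTML source for an outline with anchor links, a section heading with a matching anchor and an alert box. It is one possible implementation under stated assumptions: the helper names are hypothetical, the explicit `<a id="…">` anchors are a design choice of this sketch, and the `alert alert-block alert-info` classes are Bootstrap-style class names that are assumed to be styled by the notebook frontend; the exact markup used in LTPy is documented in Table 4, not here.

```python
def slugify(title):
    """Turn a section title into an anchor id (lowercase, hyphen-separated)."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return "-".join(cleaned.split())


def outline_markdown(section_titles):
    """Build a Markdown list whose entries link to the section anchors."""
    return "\n".join(f"* [{title}](#{slugify(title)})" for title in section_titles)


def section_heading(title, level=2):
    """Build a heading preceded by an explicit HTML anchor the outline can target."""
    return f'<a id="{slugify(title)}"></a>\n{"#" * level} {title}'


def alert_box(text, kind="info"):
    """Wrap text in an alert <div>; the class names are an assumption (see lead-in)."""
    return f'<div class="alert alert-block alert-{kind}">\n{text}\n</div>'


# Hypothetical section titles of a data exploration notebook.
sections = ["Load and browse the data", "Create a geographical subset", "Visualize the data"]

print(alert_box("<b>Prerequisite:</b> complete the data access notebook first."))
print(outline_markdown(sections))
print(section_heading(sections[0]))
```

Each printed string would be pasted into (or written to) a markdown cell. Generating the outline from the same list of titles as the headings keeps links and anchors in sync when sections are renamed.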

3.3. Follow Best Practices for Scientific Computing

When Jupyter notebooks are used in an educational context, they should not only be conceptualized to teach a specific topic but should also set a good example by following and implementing best practices for scientific computing [31,36]. Several studies, however, reveal that common practices in notebooks contravene the best practices for scientific computing, such as an out-of-order execution of notebook cells [20], poor-quality code [37] or code duplication [38]. While not all best practices defined by Wilson et al. (2014) [36] are directly applicable for an educational context, some should always be followed, such as placing imports at the beginning of a notebook, making code style and formatting consistent, using meaningful names for variables and modularizing content.
An essential part of the LTPy course is a ‘functions’ notebook, which is a collection of 14 pre-defined functions that support the learner with data loading, pre-processing and visualization. This modularization of repetitive code further helps to cater for diverse levels of coding and programming among training course participants. Learners with no or basic Python knowledge learn Python by applying these functions, where only keyword arguments (kwargs) have to be provided. Learners with more Python experience can examine the functions in a separate notebook or build their own functions or code routines in the notebook. External Python scripts or notebooks can be loaded from another notebook with the Jupyter magic command %run (see Table 4 and Figure 6). Once the magic command has been executed, the functions can be applied. The command ‘?function_name’ opens the docstring of the function, which provides learners with a short description of what the function does and the keyword arguments required.
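To make the pattern concrete, the sketch below shows what one such pre-defined function could look like: the learner supplies only keyword arguments, and the docstring carries the description that `?function_name` would display. The function `select_in_bbox`, its arguments and the sample observations are hypothetical and are not taken from the LTPy ‘functions’ notebook.

```python
def select_in_bbox(points, *, lat_min, lat_max, lon_min, lon_max):
    """Return the (lat, lon, value) tuples that fall inside a bounding box.

    Keyword arguments
    -----------------
    lat_min, lat_max : float
        Southern and northern latitude bounds in degrees.
    lon_min, lon_max : float
        Western and eastern longitude bounds in degrees.
    """
    return [
        (lat, lon, value)
        for lat, lon, value in points
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    ]


# In a notebook, the helper notebook would first be loaded with the magic
#   %run ./functions.ipynb        (the file name is an assumption)
# and the docstring displayed with
#   ?select_in_bbox
# Outside IPython, the same text is available through help() or __doc__.
observations = [(50.8, 8.8, 1.2), (41.9, 12.5, 0.7), (64.1, -21.9, 0.3)]
central_europe = select_in_bbox(
    observations, lat_min=45, lat_max=55, lon_min=5, lon_max=15
)
print(central_europe)
print(select_in_bbox.__doc__.splitlines()[0])
```

Making the bounds keyword-only (the bare `*` in the signature) forces calls to name each argument, which keeps notebook cells readable for learners who have not yet looked at the function body.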

3.4. Take Advantage of the Full Jupyter Ecosystem to Make Content Accessible

A code-sharing platform such as GitHub or GitLab is a common way to share and provide access to notebooks and also offers a built-in rendering of notebooks [18]. However, the built-in rendering is not ideal, and, for notebooks with an educational purpose, we recommend leveraging the wider ecosystem of Project Jupyter subprojects [12] and making the notebooks available as both static and executable content. ‘nbviewer’, for example, allows you to paste a link to a notebook into a webpage, and it returns a nicely rendered version that can be shared with a unique link. With the help of the Jupyterbook project [39], interactive online books can be built based on computational notebooks. We chose to build a Jupyterbook for the thematic module on dust monitoring and forecasting and linked it with a JupyterHub-based training platform in the backend. In this way, learners can browse through the Jupyterbook like they browse through a website. At the top of each Jupyterbook subpage, they can follow a link that opens the executable notebook directly on the JupyterHub-based training platform (see Figure 2). Besides sharing notebooks statically, there are different platforms that offer free compute resources to execute and run notebooks. For example, Binder [30] or Kaggle make it possible to open a (collection of) notebook(s) in an executable environment. These are great options if the dependencies of the training environment are not too complex and the data can be retrieved programmatically and are not too large in volume. On the other hand, a self-hosted JupyterHub server offers great flexibility in setting up the training environment and can already include all Python packages and data [13,16]. This helps to mitigate the challenge of the fragmented landscape of EO data access systems [1].
The data of the LTPy course are hosted on more than five different data-access systems, and the access modalities range from manually downloading data to downloading data programmatically with an API. Making the data available already on the training platform and providing instructions on how to access data for self-paced learning gives educators flexibility to use the training material for different training formats. We have further discovered that, especially for training participants from regions with limited internet connectivity, the download of large volumes of data is a challenge, but accessing the Jupyter-based training platform is often manageable. On the other hand, a self-hosted JupyterHub server shifts system operation and maintenance responsibility to the training provider, which is often a considerable investment of time and resources [13].

3.5. Aim for Reproducibility

The importance of reproducibility increases for notebooks with an educational purpose, especially if the notebooks are used for self-paced learning. To achieve a minimum standard of reproducibility, machine- and human-readable instructions for the data, computing environment and dependencies are required [31]. Pimentel et al. (2019, 2021) discovered that the reproducibility of notebooks on GitHub is far from ideal and, as a result, defined eight best practices to improve the reproducibility of notebooks. Some are in line with best practices for scientific computing, e.g., (i) abstracting code into functions, classes and modules, and (ii) putting imports at the beginning. Another two practices, (i) paying attention to the bottom of the notebook and (ii) restarting the kernel and running all cells, can easily be achieved during the cleaning process of a notebook. In the LTPy course, we further record dependencies, including the package version, in an environment.yml file and provide the data as a zipped tar file for download from an FTP server. However, the growing data volumes of Earth Observation data make this challenging. The data folder for the LTPy main course, for example, contains more than 80 GB of data. We acknowledge that making notebooks reproducible is associated with a considerable amount of extra effort and time investment, but aiming for reproducibility, especially when notebooks are used for training and capacity-building, greatly increases the usability of the notebooks.
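Recording dependencies with their versions, as described above, amounts to writing a small text file. The sketch below renders a conda-style environment.yml from a dictionary of pinned versions. It is an illustration only: the environment name, the channel and the package versions are placeholder assumptions and do not reflect the actual LTPy environment.

```python
def render_environment_yml(name, pinned, channels=("conda-forge",)):
    """Render a conda-style environment.yml with exact version pins."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    # Sorting keeps the file stable between runs, which gives cleaner diffs.
    lines += [f"  - {package}={version}" for package, version in sorted(pinned.items())]
    return "\n".join(lines) + "\n"


# Placeholder versions for illustration; a real course would pin the
# versions that the notebooks were last tested with.
pinned_versions = {"python": "3.9", "xarray": "0.19.0", "numpy": "1.21.2"}

environment_text = render_environment_yml("ltpy-example", pinned_versions)
print(environment_text)
```

Pinning exact versions trades convenience for stability: learners get the environment the notebooks were tested with, but the pins need to be refreshed whenever the course material is updated.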

4. Discussion

In this paper, we share our experience with using Jupyter notebooks in Earth Observation data education. We adapted five guiding principles, mainly from fields of scientific computing and Jupyter notebooks research, to make the notebooks more educational, reusable and fit for teaching and capacity building. In the following section, we will discuss our reasoning as to why we believe that the five guiding principles introduced in this manuscript are fundamental for EO data education.
Leverage the literate programming paradigm: The literate programming paradigm [35] is fundamental for educational notebooks. Currently, the potential for developing rich computational narratives has not been fully realized, even though Jupyter notebooks make this easier than ever [18]. One reason for this may be the considerable time needed to clean up a notebook and to rearrange code and markdown cells into a coherent workflow and narrative. The results from Rule et al. (2018), who analyzed more than one million notebooks hosted on GitHub, revealed that one out of four notebooks had no text at all, and those that had text had hardly more than 1000 words. The results from Quaranta et al. (2022) show a median text/code ratio of 0.4 (7/19), indicating that the ~1000 notebooks analyzed in the study had 2.7 times more code than text [32]. In comparison, the text/code ratio of the LTPy notebooks seems exceptional. Their text/code ratio is roughly reversed, with an average text/code ratio of 3. Even though there are no defined minimum standards for notebooks, we suggest that a text/code ratio of at least two (twice as many text cells as code cells) could be an indication that a notebook follows the literate programming paradigm and meets the criteria of being ‘educational’.
Use of instructional design elements: This guiding principle originates mainly from the need to make the training material coherent and easy to navigate for both self-paced learning and during an instructor-led training session. We are not aware of any minimum standards concerning instructional design elements in Jupyter notebooks that have been defined so far. Hence, this manuscript can be considered as a starting point to share one approach and practical implementation examples on how UX design features can be leveraged to improve the look and feel, quality and overall navigation of notebooks. An important next step would be to collect feedback on whether the instructional design patterns implemented in the LTPy course fulfil their objective.
Take advantage of the full Project Jupyter ecosystem to make content accessible: Johnson (2020) describes the benefits and pitfalls of using Jupyter notebooks in an academic classroom and highlights, for example, that the client–server model upon which Jupyter notebooks are based is often already a hurdle for novice users [33]. Accessing and processing EO data through browser- or web-based interfaces is a ‘paradigm change’ for EO data users, especially as the majority still process large volumes of EO data locally [1,40]. Hence, conducting training based on Project Jupyter tools not only supports EO data users in their current needs, but also helps familiarize them with future data processing interfaces. The web interface JupyterLab, in particular, is often seen as the ‘next-generation’ user interface for conducting data science [16,17]. We believe that this will also be the case for Earth Observation data science. Many Earth Observation platforms and EO cloud-based services already offer a JupyterLab web interface to access and process EO data. Hence, computational notebooks can be considered as a facilitator for the paradigm ‘bringing users to the data’ and the transition to cloud-based services for Big Earth data [40,41].
Follow best practices for scientific computing: Using Jupyter notebooks in an educational context amplifies the responsibility to develop high-quality notebooks that follow best practices for scientific computing [36] and for the use of Jupyter notebooks. Despite an exponential growth of Jupyter notebooks on GitHub in recent years, most notebooks fall short in terms of quality and adherence to common best practices. One reason could be that best practices and practical implementation examples are not well known and that the pitfalls of Jupyter notebooks are not well understood. Even though not all best practices for scientific computing and the use of Jupyter notebooks are directly applicable to an educational context, the two we consider fundamental are the modularization of content and importing libraries at the beginning. The latter is easy to implement with no additional effort and seems to be adopted in most cases [32]. Modularizing content, e.g., functions, can be time-consuming, especially when the functions have to be applicable to different types of EO data products. In our experience, the time investment pays off: from an educational perspective, the modularization of functions not only follows best practices for scientific computing, but also helps to cater for different levels of coding and programming expertise.
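Whether a notebook imports its libraries at the beginning can be checked automatically. The sketch below is one possible check and rests on two assumptions that are ours and not from the cited studies: ‘at the beginning’ is interpreted as ‘only in the first code cell’, and lines starting with % or ! (IPython magics and shell escapes) are ignored because they are not valid Python syntax. The function names are hypothetical.

```python
import ast


def cells_with_imports(code_cells):
    """Return the indices of code cells that contain top-level imports."""
    hits = []
    for index, source in enumerate(code_cells):
        # Drop IPython magics and shell escapes before parsing (assumption:
        # they occupy whole lines and are not nested inside Python blocks).
        python_lines = [
            line for line in source.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        tree = ast.parse("\n".join(python_lines))
        if any(isinstance(node, (ast.Import, ast.ImportFrom)) for node in tree.body):
            hits.append(index)
    return hits


def imports_at_beginning(code_cells):
    """True if imports occur in no code cell other than the first one."""
    return all(index == 0 for index in cells_with_imports(code_cells))


# Hypothetical code cells of two notebooks (sources only, nothing is executed).
tidy = ["import os\nimport json", "x = 1", "%run ./functions.ipynb\ny = 2"]
untidy = ["x = 1", "import os\ny = 2"]

print(imports_at_beginning(tidy))    # True
print(imports_at_beginning(untidy))  # False
```

The code is only parsed and never executed, so the check also works for notebooks whose dependencies are not installed.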
Aim for reproducibility: Being reproducible is a big claim, which we do not dare to make. For this reason, as a guiding principle, we advocate ‘aiming’ for reproducibility when developing Jupyter notebooks in an educational context. The minimum reproducibility measures that should be followed in an educational context are to re-run notebooks from top to bottom, to remove any empty cells and to make the processing environment, including the data, reproducible. There seems to be a misconception that content is automatically reproducible when it is made available as a Jupyter notebook. The studies by Pimentel et al. (2019, 2021), however, revealed that only a quarter of the notebooks available on GitHub can be re-executed [20,21]. In the study by Quaranta et al. (2022), a surprisingly high percentage of the notebooks analyzed on Kaggle had a linear execution order [32]. The authors argue that this best practice is enforced by Kaggle, where the ‘execute all’ command is part of the commit process. In the Jupyter ecosystem, this process needs to be invoked manually. One suggestion for future developments is to integrate best practices automatically in the ecosystem. For example, when a notebook is committed to a code sharing platform, the removal of empty cells and the execution of all cells from top to bottom could be invoked automatically.
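Two of the minimum measures named above, removing empty cells and verifying a top-to-bottom run, can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not an existing Jupyter feature: the function names are hypothetical, and ‘linear execution order’ is interpreted strictly as every code cell having been executed exactly once, in order, in a fresh kernel (execution counts 1, 2, 3, ...).

```python
def remove_empty_cells(notebook):
    """Return a copy of the notebook without cells whose source is blank."""
    kept = [
        cell for cell in notebook.get("cells", [])
        # "source" may be a string or a list of strings in the .ipynb format.
        if "".join(cell.get("source", "")).strip()
    ]
    return {**notebook, "cells": kept}


def has_linear_execution_order(notebook):
    """True if the code cells carry the execution counts 1, 2, ..., n."""
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    return counts == list(range(1, len(counts) + 1))


# Hypothetical notebook: one markdown cell, one empty and never executed
# code cell, and two code cells that were run in order.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": [], "execution_count": None},
        {"cell_type": "code", "source": ["a = 1"], "execution_count": 1},
        {"cell_type": "code", "source": ["b = a + 1"], "execution_count": 2},
    ]
}

cleaned = remove_empty_cells(notebook)
print(len(cleaned["cells"]))                 # 3
print(has_linear_execution_order(notebook))  # False (empty cell never ran)
print(has_linear_execution_order(cleaned))   # True
```

A check of this kind only inspects stored metadata; it does not replace actually re-executing the notebook in a clean environment.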

5. Conclusions and Future Outlook

The increase in open Earth Observation data and the emergence of cloud-based services require stronger efforts in training EO data users to ensure continued EO data uptake and use. By sharing our experience and guiding principles on how to make notebooks more educational, we hope to contribute to increasing the overall quality and usability of Jupyter notebooks, especially when these are used in an educational context. Current best practices focus on the general use of notebooks among data scientists, but more research and practical examples from the community on how to make notebooks more educational are required. We see two areas of future development as relevant. First, we need a better understanding of notebook users, the benefits they seek and the challenges they face when starting to use notebooks, especially in application domains where the use of Jupyter notebooks is growing, such as Earth Observation research. Further, we need an evaluation of whether the instructional design elements shared in this manuscript meet their purpose of improving navigation through the training material. Second, we believe that time is the main reason why best practices related to scientific computing or reproducibility are often not followed. Some best practices could be enforced automatically, e.g., by developing a Jupyter extension that re-executes a notebook from top to bottom and removes empty code cells before it is committed.

Author Contributions

Conceptualization, J.W. and F.F.; writing—original draft preparation, J.W.; writing—review and editing, J.W., F.F., S.M., S.S., B.S. and J.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work has partially been funded under the framework of EUMETSAT/Copernicus contract No. 19/218240. Open Access funding provided by the Open Access Publication Fund of Philipps-Universität Marburg with support of the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation).

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available.

Acknowledgments

We thank all the colleagues at EUMETSAT and participants at various training events whose feedback fundamentally helped to improve the LTPy course.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Overview of satellite- and model-based products featured in the LTPy main course.
Satellite-based products

| Satellite | Instrument | Data product(s) |
|---|---|---|
| Metop series (consists of three satellites: Metop-A, -B, -C) | Global Ozone Monitoring Experiment-2 (GOME-2) | Absorbing Aerosol Index (AAI); Absorbing Aerosol Height (AAH); Tropospheric Nitrogen Dioxide (NO2) |
| Metop series (consists of three satellites: Metop-A, -B, -C) | Infrared Atmospheric Sounding Interferometer (IASI) | Carbon Monoxide; Ammonia (NH3) |
| Sentinel-5 Precursor (Sentinel-5P) | TROPOspheric Monitoring Instrument (TROPOMI) | Carbon Monoxide (CO); Ultraviolet Aerosol Index (UVAI) |
| Sentinel-3 | Ocean and Land Colour Instrument (OLCI) | Level 1 RGB True- and False-Color composites |
| Sentinel-3 | Sea and Land Surface Temperature Radiometer (SLSTR) | Fire Radiative Power (FRP); Aerosol Optical Depth (AOD) |
| Metop series (consists of three satellites: Metop-A, -B, -C) | GOME-2, IASI, Advanced Very High Resolution Radiometer (AVHRR) | Polar Multi-Sensor Aerosol Optical Properties (PMAp) Aerosol Optical Depth (AOD) |

Model-based products

| Operational service | Product type | Data variable(s) |
|---|---|---|
| Copernicus Atmosphere Monitoring Service (CAMS) | Global Fire Assimilation System (GFAS) | Fire Radiative Power (FRP) |
| Copernicus Atmosphere Monitoring Service (CAMS) | Global reanalysis (EAC4) | Organic Matter Aerosol Optical Depth (OMAOD); Ozone (O3) |
| Copernicus Atmosphere Monitoring Service (CAMS) | Global atmospheric composition forecasts | Dust Aerosol Optical Depth |
| Copernicus Atmosphere Monitoring Service (CAMS) | European air quality forecasts | Dust concentration; Nitrogen Dioxide (NO2) |
| Copernicus Emergency Management Service (CEMS) | Global ECMWF Fire Forecast (GEFF) | Fire Weather Index (FWI) |
Table A2. Overview of satellite-, model- and ground-based products featured in the thematic module on dust monitoring and forecasting.
Satellite-based products

| Satellite | Instrument | Data product(s) |
|---|---|---|
| Metop series (consists of three satellites: Metop-A, -B, -C) | Global Ozone Monitoring Experiment-2 (GOME-2) | Absorbing Aerosol Index (AAI) |
| Sentinel-5 Precursor (Sentinel-5P) | TROPOspheric Monitoring Instrument (TROPOMI) | Ultraviolet Aerosol Index (UVAI) |
| Metop series (consists of three satellites: Metop-A, -B, -C) | GOME-2, IASI, Advanced Very High Resolution Radiometer (AVHRR) | Polar Multi-Sensor Aerosol Optical Properties (PMAp) Aerosol Optical Depth (AOD) |
| Meteosat Second Generation | Spinning Enhanced Visible and InfraRed Imager (SEVIRI) | Level 1 RGB True- and False-Color composites |
| Terra and Aqua | Moderate Resolution Imaging Spectroradiometer (MODIS) | Level 1 RGB True- and False-Color composites; Aerosol Optical Depth (AOD) |

Model-based products

| Operational service | Product type | Data variable(s) |
|---|---|---|
| Copernicus Atmosphere Monitoring Service (CAMS) | Global atmospheric composition forecasts | Dust Aerosol Optical Depth |
| Copernicus Atmosphere Monitoring Service (CAMS) | European air quality forecasts | Dust concentration |
| Barcelona Dust Regional Center | Regional dust forecasts for Northern Africa, Middle East and Europe | Dust Aerosol Optical Depth |

Ground-based products

| Data service | Ground-based sensor | Data variable(s) |
|---|---|---|
| Aerosol Robotic NETwork | Sun photometer | Aerosol Optical Depth; Angstrom Exponent Coefficient |
| European Aerosol Research Lidar Network (EARLINET) | Lidar | Vertical backscatter profile |
| European Environment Agency (EEA) Air Quality Data | Particulate Matter Sensor | Particulate Matter 2.5 (PM2.5); Particulate Matter 10 (PM10) |
Figure A1. Rating by training participants (n = 111) whether they would recommend the training to a friend/colleague. Rating levels were from 1 (low) to 5 (high). Bars indicate absolute numbers, labels on top indicate relative frequencies in %.
Table A3. Categories built based on the open response question ‘What did you specifically enjoy or find useful in the session?’ (n = 105). Note: some responses could be categorized into two categories.
Category: Practical session in general, including introduction to Jupyter notebooks (n = 57; 54.3%)
Example responses:
- I enjoyed the step by step demonstration and follow up session. The instructors were able to go through all the questions and gave us hints for the functions and plotting.
- Clear, well organized and very useful training environment.
- Systematic step-wise process to get a visualized output with clear working code and functions
- The potentials that Jupyter offers.
- The practical session. Very useful to go through already available code.

Category: Introduction to atmospheric composition thematic and data, including scientific lectures (n = 36; 34.3%)
Example responses:
- It has been a new concept for me as I am not an atmospheric scientist but I do have a remote sensing background from environmental sciences.
- Great presentations on variety of perspectives and useful cases—inspiring.
- It was quite enlightening. Good to know about the variety of datasets available with regards to air pollution monitoring.
- I found the whole topic very interesting and important regarding my research.

Category: Course structure, content and organization (n = 23; 21.9%)
Example responses:
- I enjoyed the presentation and the content.
- All the information and discussions were very good.
Table A4. Categories of responses to the question ‘What should we do differently next time?’ (n = 92). Categories on the left have four or more mentions, with the first category (Extension of practical training part) being by far the most mentioned suggestion (n = 27). Others had four or five mentions. Categories on the right were mentioned less than four times but are considered as useful suggestions.
Categories with Four or More Mentions | Categories with Less than Four Mentions
Extension of the practical training, including daily assignments and start workflows from downloading datasets
More breaks during the practical training, including more time for introducing the Jupyter-based training platform and explaining Python code principles
Preference for in-person training
More scientific depth and interconnections of the atmospheric composition topic
Provide more references to books, articles and data documentation
Provide presentations and practical training content in beforehand
More time for Q&A and provide written answers to questions
Provide a space for offline discussions after the training course
Offer separate introduction to Python/Jupyter notebooks/xarray
Inform about the level of difficulty of the practical part in beforehand
Provide more trainers for practical part
Figure A2. Responses to the question ‘What type of information would help you in using a specific dataset?’ (n = 196). Bars indicate absolute numbers, labels on the right indicate relative frequencies in %.
Table A5. Overview of notebooks that are part of the LTPy main course and, for each notebook, the number of cells in total, number of markdown cells, number of code cells and the markdown/code ratio are listed.
| No. | Notebook Title | No. of Cells (Total) | No. of Cells (Markdown) | No. of Cells (Code) | Ratio (Markdown/Code) |
|---|---|---|---|---|---|
| Section I—Data access | | | | | |
| 1.1 | Atmospheric composition data—Overview and data access | 95 | 93 | 2 | 46.5 |
| 1.2 | WEkEO Harmonized data access (HDA) API | 55 | 40 | 15 | 2.7 |
| Section II—Data exploration | | | | | |
| 2.1.1 | AC SAF Metop-A GOME-2—Tropospheric NO2—Level 2—Load and browse | 52 | 40 | 12 | 3.3 |
| 2.1.2 | AC SAF Metop-A/B GOME-2—Tropospheric NO2—Level 2—Pre-process | 72 | 52 | 20 | 2.6 |
| 2.1.3 | AC SAF Metop-A/B GOME-2—Tropospheric NO2—Level 3—Load and browse | 38 | 29 | 9 | 3.2 |
| 2.1.4 | AC SAF Metop-A/B/C GOME-2—Absorbing Aerosol Index—Level 3—Load and browse | 54 | 39 | 15 | 2.6 |
| 2.1.5 | AC SAF Metop-A/B/C GOME-2—Absorbing Aerosol Height—Level 2—Load and browse | 59 | 46 | 13 | 3.5 |
| 2.2.1 | Polar Multi-Sensor Aerosol Optical Properties (PMAp)—Aerosol Optical Depth—Level 2—Load and browse | 41 | 32 | 9 | 3.6 |
| 2.3.1 | Metop-A/B IASI—Ammonia (NH3)—Level 2—Load and browse | 55 | 40 | 15 | 2.7 |
| 2.3.2 | Metop-A/B IASI—Carbon Monoxide—Level 2—Load and browse | 68 | 48 | 20 | 2.4 |
| 2.4.1 | Sentinel-5P TROPOMI—Carbon Monoxide—Level 2—Load and browse | 47 | 35 | 12 | 2.9 |
| 2.4.2 | Sentinel-5P TROPOMI—Ultraviolet Aerosol Index—Level 2—Load and browse | 40 | 32 | 8 | 4.0 |
| 2.5.1 | Sentinel-3 OLCI—Radiances—Level 1—Load and browse | 77 | 52 | 25 | 2.1 |
| 2.5.2 | Sentinel-3 SLSTR NRT—Fire Radiative Power (FRP)—Level 2—Load and browse | 84 | 56 | 28 | 2.0 |
| 2.5.3 | Sentinel-3 SLSTR NRT—Aerosol Optical Depth (AOD)—Level 2—Load and browse | 31 | 25 | 6 | 4.2 |
| 2.6.1 | CAMS Global reanalysis (EAC4)—Organic Matter Aerosol Optical Depth—Load and browse | 61 | 42 | 19 | 2.2 |
| 2.6.2 | CAMS Global Fire Assimilation System (GFAS)—Fire Radiative Power—Load and browse | 33 | 25 | 8 | 3.1 |
| 2.6.3 | CAMS Global Forecast—Dust Aerosol Optical Depth—Load and browse | 60 | 43 | 17 | 2.5 |
| 2.6.4 | CAMS European air quality forecast—Dust Concentration—Load and browse | 64 | 46 | 18 | 2.6 |
| 2.6.5 | European air quality forecast—Nitrogen Dioxide—Load and browse | 63 | 45 | 18 | 2.5 |
| 2.7.1 | CEMS Global ECMWF Fire Forecast—Fire Weather Index—Load and browse | 87 | 63 | 24 | 2.6 |
| 2.7.2 | CEMS Global ECMWF Fire Forecast—Fire Weather Index—Harmonized Danger Classes | 36 | 29 | 7 | 4.1 |
| 2.7.3 | CEMS Global ECMWF Fire Forecast—Fire Weather Index—Custom Danger Classes | 61 | 41 | 20 | 2.1 |
| Section III—Case studies | | | | | |
| 3.1.0 | Case study—Siberian fires 2021—Multi-data | 110 | 77 | 33 | 2.3 |
| 3.1.1 | Case study—Amazon fires 2019—Multi-data | 73 | 50 | 23 | 2.2 |
| 3.1.2 | Case study—Siberian fires 2019—Multi-data | 119 | 82 | 37 | 2.2 |
| 3.1.3 | Case study—Californian fires 2020—Multi-data | 160 | 113 | 47 | 2.4 |
| 3.1.4 | Case study—Chernobyl fires 2020—Sentinel-3 SLSTR NRT—Fire Radiative Power | 82 | 56 | 26 | 2.2 |
| 3.1.5 | Case study—Californian fires 2020—Sentinel-3 SLSTR NRT—Fire Radiative Power | 81 | 55 | 26 | 2.1 |
| 3.1.6 | Case study—Californian fires 2020—Sentinel-3 SLSTR NRT—Aerosol Optical Depth | 31 | 25 | 6 | 4.2 |
| 3.1.7 | Case study—Indonesian fires 2015—Multi-data | 148 | 106 | 42 | 2.5 |
| 3.1.8 | Case study—Indonesian fires 2020—Multi-data | 93 | 67 | 26 | 2.6 |
| 3.1.9 | Case study—Portugal fires 2020—Multi-data | 205 | 145 | 60 | 2.4 |
| 3.2.1 | Case study—Map and time-series analysis—AC SAF Metop-A/B GOME-2—Tropospheric NO2 | 43 | 33 | 10 | 3.3 |
| 3.2.2 | Case study—Produce gridded dataset—AC SAF Metop-A/B GOME-2—Tropospheric NO2 | 71 | 51 | 20 | 2.6 |
| 3.2.3 | Case study—Create an anomaly map—Europe—AC SAF Metop-A/B GOME-2—Tropospheric NO2 | 122 | 83 | 39 | 2.1 |
| 3.2.4 | Case study—Time-series analysis—Europe—AC SAF Metop-A/B GOME-2—Tropospheric NO2 | 63 | 45 | 18 | 2.5 |
| 3.2.5 | Case study—Create an anomaly map—Europe—Sentinel-5P TROPOMI—Tropospheric NO2 | 57 | 41 | 16 | 2.6 |
| 3.2.6 | Case study—Time-series analysis—Europe—Sentinel-5P TROPOMI—Tropospheric NO2 | 58 | 44 | 14 | 3.1 |
| 3.3.1 | Case study—Antarctic ozone hole 2019—Multi-data | 76 | 51 | 25 | 2.0 |
| 3.3.2 | Case study—Antarctic ozone hole 2019—CAMS animation | 31 | 24 | 7 | 3.4 |
| 3.3.3 | Case study—Antarctic ozone hole 2020—AC SAF Metop-A/B/C GOME-2 Level 2 | 86 | 63 | 23 | 2.7 |
| 3.3.4 | Case study—Arctic ozone hole 2020—Metop-A/B/C IASI Level 2 | 84 | 63 | 21 | 3.0 |
| 3.3.5 | Case study—Arctic ozone hole 2020—CAMS Reanalysis (EAC4) | 36 | 28 | 8 | 3.5 |
| Section IV—Exercises | | | | | |
| 4.1.1 | Exercise—Sentinel-5P TROPOMI—Carbon Monoxide—Level 2 | 82 | 62 | 20 | 3.0 |
| 4.2.1 | Exercise—Sentinel-3 OLCI—Radiances—Level 1 | 97 | 73 | 24 | 3.0 |
| 4.2.2 | Exercise—Sentinel-3 SLSTR NRT—Fire Radiative Power | 85 | 63 | 22 | 3.0 |
| 4.2.3 | Exercise—Sentinel-3 SLSTR NRT—Aerosol Optical Depth | 54 | 43 | 11 | 4.0 |
| 4.3.1 | Exercise—CAMS Global Reanalysis (EAC4)—Total Column Carbon Monoxide | 59 | 47 | 12 | 4.0 |
| 4.4.1 | Exercise—AC SAF Metop-A/B/C GOME-2—Ozone | 125 | 95 | 30 | 3.0 |
| 4.4.2 | Exercise—Metop-A/B/C IASI—Ozone | 107 | 80 | 27 | 3.0 |
Table A6. Overview of notebooks that are part of the thematic module on dust monitoring and forecasting and for each notebook the number of cells in total, number of markdown cells, number of code cells and the markdown/code ratio is listed.
| No. | Notebook Title | No. of Cells (Total) | No. of Cells (Markdown) | No. of Cells (Code) | Ratio (Markdown/Code) |
|---|---|---|---|---|---|
| Section I—Observations—Satellite data | | | | | |
| 1.1.1 | Meteosat Second Generation SEVIRI—True color and dust RGB—Level 1 | 101 | 74 | 27 | 2.7 |
| 1.1.2 | MODIS—True color and dust RGB—Level 1B | 80 | 60 | 20 | 3.0 |
| 1.1.3 | Sentinel-5P TROPOMI—Ultraviolet Aerosol Index—Level 2 | 48 | 36 | 12 | 3.0 |
| 1.1.4 | MODIS—10 km aerosol product—Level 2 | 49 | 38 | 11 | 3.5 |
| 1.1.5 | Polar Multi-Sensor Aerosol Optical Properties (PMAp) Product—Aerosol Optical Depth—Level 2 | 50 | 39 | 11 | 3.5 |
| 1.1.6 | Metop-ABC GOME-2—Absorbing Aerosol Index—Level 3 | 64 | 49 | 15 | 3.3 |
| Section I—Observations—Ground-based data | | | | | |
| 1.2.1 | AERONET—AErosol RObotic NETwork | 67 | 53 | 14 | 3.8 |
| 1.2.2 | European Aerosol Research Lidar Network (EARLINET)—Backscatter profiles | 37 | 30 | 7 | 4.3 |
| 1.2.3 | European Environment Agency Air Quality Data | 48 | 36 | 12 | 3.0 |
| Section II—Model forecasts | | | | | |
| 2.1 | CAMS global atmospheric composition forecast—Dust Aerosol Optical Depth | 74 | 53 | 21 | 2.5 |
| 2.2 | CAMS European Air Quality Forecasts—Dust concentration | 74 | 54 | 20 | 2.7 |
| 2.3 | WMO SDS-WAS MONARCH—Dust Forecast | 46 | 34 | 12 | 2.8 |
| Section III—Practical case study | | | | | |
| 3.1 | Exercise 01 | 21 | 17 | 4 | 4.3 |
| 3.1 | Exercise 01—Solution | 77 | 56 | 21 | 2.7 |
| 3.2 | Exercise 02 | 30 | 23 | 7 | 3.3 |
| 3.2 | Exercise 02—Solution | 73 | 52 | 21 | 2.5 |
| 3.3 | Exercise 03 | 31 | 24 | 7 | 3.4 |
| 3.3 | Exercise 03—Solution | 66 | 49 | 17 | 2.9 |
| 3.4 | Exercise 04 | 28 | 22 | 6 | 3.7 |
| 3.4 | Exercise 04—Solution | 76 | 54 | 22 | 2.5 |
| 3.5 | Exercise 05 | 21 | 17 | 4 | 4.3 |
| 3.5 | Exercise 05—Solution | 57 | 43 | 14 | 3.1 |

References

  1. Wagemann, J.; Siemen, S.; Seeger, B.; Bendix, J. Users of Open Big Earth Data—An Analysis of the Current State. Comput. Geosci. 2021, 157, 104916.
  2. Price Waterhouse Coopers (PWC). Main Trends and Challenges in the Space Sector; PWC: Neuilly-sur-Seine, France, 2020.
  3. Hebden, S. Plans for a New Wave of European Satellites. 2020.
  4. European Organisation for the Exploitation of Meteorological Satellites Meteosat Series|EUMETSAT. Available online: https://www.eumetsat.int/our-satellites/meteosat-series?sjid=future (accessed on 12 February 2022).
  5. Masek, J.G.; Wulder, M.A.; Markham, B.; McCorkel, J.; Crawford, C.J.; Storey, J.; Jenstrom, D.T. Landsat 9: Empowering Open Science and Applications through Continuity. Remote Sens. Environ. 2020, 248, 111968.
  6. National Aeronautics and Space Administration Landsat NeXt|Landsat Science. Available online: https://landsat.gsfc.nasa.gov/satellites/landsat-next/ (accessed on 12 February 2022).
  7. Bernd, A.; Braun, D.; Ortmann, A.; Ulloa-Torrealba, Y.Z.; Wohlfart, C.; Bell, A. More than Counting Pixels—Perspectives on the Importance of Remote Sensing Training in Ecology and Conservation. Remote Sens. Ecol. Conserv. 2017, 3, 38–47.
  8. Miguel-Lago, M. Towards an Innovative Strategy for Skills Development and Capacity Building in the Space Geoinformation Sector Supporting Copernicus User Uptake: Deliverable 1.6—Space/Geospatial Sector Skills Strategy; EO4GEO: Genova, Italy, 2019.
  9. Hodam, H.; Rienow, A.; Jürgens, C. Bringing Earth Observation to Schools with Digital Integrated Learning Environments. Remote Sens. 2020, 12, 345.
  10. European Space Agency ESA—European Space Education Resource Office. Available online: https://www.esa.int/Education/Teachers_Corner/European_Space_Education_Resource_Office (accessed on 16 May 2022).
  11. Friedrich Schiller Universität Jena. Welcome to EO College—EO College. Available online: https://eo-college.org/welcome (accessed on 16 May 2022).
  12. Davies, A.; Hooley, F.; Causey-Freeman, P.; Eleftheriou, I.; Moulton, G. Using Interactive Digital Notebooks for Bioscience and Informatics Education. PLoS Comput. Biol. 2020, 16, e1008326.
  13. Kim, B.; Henke, G. Easy-to-Use Cloud Computing for Teaching Data Science. J. Stat. Data Sci. Educ. 2021, 29, S103–S111.
  14. Bauer, T.; Immitzer, M.; Mansberger, R.; Vuolo, F.; Márkus, B.; Wojtaszek, M.V.; Földváry, L.; Szablowska-Midor, A.; Kozak, J.; Oliveira, I.; et al. The Making of a Joint E-Learning Platform for Remote Sensing Education: Experiences and Lessons Learned. Remote Sens. 2021, 13, 1718.
  15. Maggioni, V.; Girotto, M.; Habib, E.; Gallagher, M.A. Building an Online Learning Module for Satellite Remote Sensing Applications in Hydrologic Science. Remote Sens. 2020, 12, 3009.
  16. Perkel, J.M. Why Jupyter Is Data Scientists’ Computational Notebook of Choice. Nature 2018, 563, 145–146.
  17. Perkel, J.M. Ten Computer Codes That Transformed Science. Nature 2021, 589, 344–348.
  18. Rule, A.; Tabard, A.; Hollan, J.D. Exploration and Explanation in Computational Notebooks. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, Montreal, QC, Canada, 21–26 April 2018; ACM: Montreal, QC, Canada, 2018; pp. 1–12.
  19. Lau, S.; Drosos, I.; Markel, J.M.; Guo, P.J. The Design Space of Computational Notebooks: An Analysis of 60 Systems in Academia and Industry. In Proceedings of the 2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Dunedin, New Zealand, 10–14 August 2020; IEEE: Dunedin, New Zealand, 2020; pp. 1–11.
  20. Pimentel, J.F.; Murta, L.; Braganholo, V.; Freire, J. A Large-Scale Study About Quality and Reproducibility of Jupyter Notebooks. In Proceedings of the 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), Montreal, QC, Canada, 25–31 May 2019; IEEE: Montreal, QC, Canada, 2019; pp. 507–517.
  21. Pimentel, J.F.; Murta, L.; Braganholo, V.; Freire, J. Understanding and Improving the Quality and Reproducibility of Jupyter Notebooks. Empir. Softw. Eng. 2021, 26, 65.
  22. Chattopadhyay, S.; Prasad, I.; Henley, A.Z.; Sarma, A.; Barik, T. What’s Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 25–30 April 2020; ACM: Honolulu, HI, USA, 2020; pp. 1–12.
  23. Engelberger, F.; Galaz-Davison, P.; Bravo, G.; Rivera, M.; Ramírez-Sarmiento, C.A. Developing and Implementing Cloud-Based Tutorials That Combine Bioinformatics Software, Interactive Coding, and Visualization Exercises for Distance Learning on Structural Bioinformatics. J. Chem. Educ. 2021, 98, 1801–1807.
  24. Clarke, D.J.B.; Jeon, M.; Stein, D.J.; Moiseyev, N.; Kropiwnicki, E.; Dai, C.; Xie, Z.; Wojciechowicz, M.L.; Litz, S.; Hom, J.; et al. Appyters: Turning Jupyter Notebooks into Data-Driven Web Apps. Patterns 2021, 2, 100213.
  25. Lasser, J.; Manik, D.; Silbersdorff, A.; Säfken, B.; Kneib, T. Introductory Data Science across Disciplines, Using Python, Case Studies, and Industry Consulting Projects. Teach. Stat. 2021, 43, S190–S200.
  26. Boscoe, B.M.; Pasquetto, I.V.; Golshan, M.S.; Borgman, C.L. Using the Jupyter Notebook as a Tool for Open Science: An Empirical Study. In Proceedings of the 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Toronto, ON, Canada, 19–23 June 2017.
  27. Camara, G.S.; Camboim, S.P.; Bravo, J.V.M. Using Jupyter Notebooks for Viewing and Analysing Geospatial Data: Two Examples for Emotional Maps and Education Data. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2021, XLVI-4/W2-2021, 17–24.
  28. Committee on Earth Observation Satellites. Jupyter Notebooks for Capacity Development Webinar|CEOS. Available online: https://ceos.org/meetings/jupyter-notebooks-for-capacity-development-webinar/ (accessed on 10 February 2022).
  29. Granger, B.E.; Perez, F. Jupyter: Thinking and Storytelling with Code and Data. Comput. Sci. Eng. 2021, 23, 7–14.
  30. Jupyter, P.; Bussonnier, M.; Forde, J.; Freeman, J.; Granger, B.; Head, T.; Holdgraf, C.; Kelley, K.; Nalvarte, G.; Osheroff, A.; et al. Binder 2.0—Reproducible, Interactive, Sharable Environments for Science at Scale. In Proceedings of the 17th Python in Science Conference (SciPy 2018), Austin, TX, USA, 9–15 July 2018; pp. 113–120.
  31. Rule, A.; Birmingham, A.; Zuniga, C.; Altintas, I.; Huang, S.-C.; Knight, R.; Moshiri, N.; Nguyen, M.H.; Rosenthal, S.B.; Pérez, F.; et al. Ten Simple Rules for Writing and Sharing Computational Analyses in Jupyter Notebooks. PLoS Comput. Biol. 2019, 15, e1007007.
  32. Quaranta, L.; Calefato, F.; Lanubile, F. Eliciting Best Practices for Collaboration with Computational Notebooks. Proc. ACM Hum. Comput. Interact. 2022, 6, 1–41.
  33. Johnson, J.W. Benefits and Pitfalls of Jupyter Notebooks in the Classroom. In Proceedings of the 21st Annual Conference on Information Technology Education, Virtual, 7–9 October 2020; ACM: New York, NY, USA, 2020; pp. 32–37.
  34. Wagemann, J.; Szeto, S.; Mantovani, S.; Fierli, F. Learning Tool for Python on Atmospheric Composition. J. Open Source Educ. 2022. under review.
  35. Knuth, D.E. Literate Programming. Comput. J. 1984, 27, 97–111.
  36. Wilson, G.; Aruliah, D.A.; Brown, C.T.; Chue Hong, N.P.; Davis, M.; Guy, R.T.; Haddock, S.H.D.; Huff, K.D.; Mitchell, I.M.; Plumbley, M.D.; et al. Best Practices for Scientific Computing. PLoS Biol. 2014, 12, e1001745.
  37. Wang, J.; Kuo, T.; Li, L.; Zeller, A. Assessing and Restoring Reproducibility of Jupyter Notebooks. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, Virtual, 21–25 December 2020; pp. 138–149.
  38. Koenzen, A.P.; Ernst, N.A.; Storey, M.-A.D. Code Duplication and Reuse in Jupyter Notebooks. In Proceedings of the 2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Dunedin, New Zealand, 10–14 August 2020; IEEE: Dunedin, New Zealand, 2020; pp. 1–9.
  39. Executable Books Community. Jupyter Book; Zenodo/CERN: Geneva, Switzerland, 2020.
  40. Wagemann, J.; Siemen, S.; Seeger, B.; Bendix, J. A User Perspective on Future Cloud-Based Services for Big Earth Data. Int. J. Digit. Earth 2021, 14, 1758–1774.
  41. Echterhoff, J.; Wagemann, J.; Lieberman, J. Earth Observation Cloud Platform Concept Development Study Report; Open Geospatial Consortium, Inc.: Arlington, VA, USA, 2021.
Figure 1. Overview of Learning Tool for Python main course: general course outline (blue) and the modalities of access (yellow). The notebooks are organized in notebooks on data access, data exploration, case studies and exercises. Learners can either clone the content onto their local machines from a publicly accessible GitLab repository or directly access the JupyterHub-based training platform under https://ltpy.adamplatform.eu (accessed on 16 May 2022). The course starts with the index.ipynb notebook, which introduces learners to the course outline and modality. A notebook with 14 pre-defined functions helps to modularize code and to cater for different levels in programming and coding.
Figure 2. Screenshot of the Jupyterbook on dust aerosol detection, monitoring and forecasting, which can be accessed via https://dust.trainhub.eumetsat.int/ (accessed on 16 May 2022). The example shows a subpage introducing the Meteosat Second Generation SEVIRI Level 1B data. Each data subpage has on top a link to the executable notebook on a JupyterHub-based training platform and info boxes on ‘basic facts’ and ‘How to access the data’.
Figure 3. Frequency plot of number of notebooks per markdown/code ratio category. The ratio category describes the ratio of number of markdown cells vs. number of code cells. For example, a ratio = 3 means that the respective notebooks have three times more annotated cells than code cells. Top: frequency distribution of the notebooks that are part of the LTPy main course (n = 50). Bottom: frequency distribution of notebooks that are part of the thematic module on dust monitoring and forecasting (n = 22). The LTPy main course has one outlier (a notebook with mainly descriptive text on how to access different data) with a ratio of 46.5. This outlier has been removed from the plot.
Figure 4. Example of one case study notebook of the LTPy main course, with several instructional design elements integrated: header, navigation pane, alert boxes (course section and prerequisites), introduction section and notebook outline with anchor links.
Figure 5. Overview of HTML/Markdown commands, which were used as instructional design elements, and their rendered output: (i) alert boxes, (ii) navigation pane, (iii) highlighting text as code and (iv) anchor links.
Figure 6. Magic command (%run) to load content, e.g., functions, from an external script or notebook. Once the command has been executed, the functions can be used in the notebook. The docstring of a function can be displayed by placing a question mark in front of the function name.
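The mechanism described in the caption of Figure 6 relies on IPython magics (%run and the ? prefix), which are not available in a plain Python interpreter. As a rough analogue only, the sketch below uses the standard-library runpy.run_path to execute an external script and collect its functions, and inspect.getdoc to read a docstring. The script content, the file name functions.py and the function kelvin_to_celsius are hypothetical and are not taken from LTPy.

```python
import inspect
import runpy
import tempfile
from pathlib import Path

# Hypothetical helper script, standing in for an external functions file.
SCRIPT = '''
def kelvin_to_celsius(value):
    """Convert a temperature from kelvin to degrees Celsius."""
    return value - 273.15
'''


def load_functions(script_path):
    """Execute a Python script and return its top-level names as a dict.

    Plain-Python analogue (an assumption, not the LTPy approach) of the
    IPython command `%run ./functions.py`.
    """
    return runpy.run_path(str(script_path))


# Write the hypothetical script to a temporary folder and load it.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "functions.py"
    path.write_text(SCRIPT, encoding="utf-8")
    namespace = load_functions(path)

# The loaded function can now be used like a locally defined one.
kelvin_to_celsius = namespace["kelvin_to_celsius"]

# Analogue of `?kelvin_to_celsius` in IPython: read the docstring.
doc = inspect.getdoc(kelvin_to_celsius)
print(doc)
```

Inside a Jupyter notebook, the two magic commands remain the shorter option; the sketch only illustrates what they do: execute a file and expose its documented functions.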
Table 1. Overview of the Learning Tool for Python components.
| Component | Links | Number of Notebooks |
|---|---|---|
| LTPy main course | GitLab repository: https://gitlab.eumetsat.int/eumetlab/atmosphere/atmosphere (accessed on 16 May 2022); hosted JupyterHub-based training platform: https://ltpy.adamplatform.eu/ (accessed on 16 May 2022) | 51 |
| Thematic module on dust monitoring and forecasting | Jupyterbook: https://dust.trainhub.eumetsat.int/ (accessed on 16 May 2022); hosted JupyterHub-based training platform: https://dust.ltpy.adamplatform.eu (accessed on 16 May 2022) | 22 |
Table 2. Overview of training events, including numbers of training participants for each type of training.
| Type of Training | Number of Events | Number of Participants |
|---|---|---|
| Training schools | 6 | 553 |
| Thematic expert workshops | 2 | 130 |
| Short courses | 8 | 402 |
| Total | 16 | 1085 |
Table 3. Average number of total, markdown and code cells for each category of the LTPy main course and of the thematic module. Each entry is summarized from Table A5 and Table A6 in Appendix A.
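| Course | Category | No. of Cells (Total) | No. of Cells (Markdown) | No. of Cells (Code) | Ratio (Markdown/Code) |
|---|---|---|---|---|---|
| Main course | Section I: Data access (n = 1) * | 55 | 40 | 15 | 2.7 |
| Main course | Section II: Data exploration (n = 21) | 56.4 | 41 | 15.4 | 2.9 |
| Main course | Section III: Case studies (n = 21) | 87.1 | 62 | 25.1 | 2.7 |
| Main course | Section IV: Exercises (n = 7) | 87 | 66.1 | 20.6 | 3.3 |
| Main course | Total (n = 50) | 73.5 | 53.3 | 20.2 | 2.8 |
| Thematic module | Data exploration (n = 12) | 61.5 | 46.3 | 15.2 | 3.2 |
| Thematic module | Exercises (n = 5) | 27.5 | 21.5 | 6 | 3.7 |
| Thematic module | Exercise solutions (n = 5) | 73 | 52.8 | 20.2 | 2.6 |
| Thematic module | Total (n = 22) | 55.4 | 41.5 | 13.9 | 3.2 |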
* The notebook on data access systems (1.1 in Table A5) is an outlier, with a markdown/code ratio of 46.5. This notebook is not included in the calculation of the LTPy main course’s total number of cells (total, markdown and code) and ratio. For this reason, n = 50, even though the course consists of 51 notebooks.
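The markdown/code ratio reported in Table 3 can be derived from the cell types stored in a notebook file. The following is a minimal sketch, assuming notebooks in the nbformat v4 JSON layout (a top-level "cells" list whose entries carry a "cell_type" field); the function names are illustrative, and the source does not state which tooling the authors used to count cells.

```python
import json


def markdown_code_ratio(notebook):
    """Return (number of markdown cells) / (number of code cells).

    `notebook` is a dict in nbformat v4 layout. Raw cells are ignored.
    """
    cells = notebook.get("cells", [])
    n_markdown = sum(1 for cell in cells if cell.get("cell_type") == "markdown")
    n_code = sum(1 for cell in cells if cell.get("cell_type") == "code")
    if n_code == 0:
        raise ValueError("notebook has no code cells; the ratio is undefined")
    return n_markdown / n_code


def ratio_from_file(path):
    """Read an .ipynb file from disk and compute its markdown/code ratio."""
    with open(path, encoding="utf-8") as handle:
        return markdown_code_ratio(json.load(handle))


# Toy notebook (illustrative only): three markdown cells and one code cell.
toy_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "markdown", "source": ["Introduction"]},
        {"cell_type": "code", "source": ["print('hello')"]},
        {"cell_type": "markdown", "source": ["Discussion"]},
    ]
}

toy_ratio = markdown_code_ratio(toy_notebook)
print(toy_ratio)
```

Note that the Ratio column in Table 3 does not always equal the quotient of the averaged cell counts (for Section II, 41/15.4 ≈ 2.7 versus a reported 2.9). This would be consistent with averaging per-notebook ratios, although the source does not state how the column was computed.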
Table 4. Overview of useful HTML/Markdown commands, including their functionality and the objective they fulfill.
| Type | Command | Functionality | Objective |
|---|---|---|---|
| Modularization | %run ./functions.ipynb | Loads content, e.g., functions, hosted in another notebook or Python script | Cater for different programming literacies; teach best practices for scientific computing |
| Modularization | ?function_name | Opens the docstring of the function | Cater for different programming literacies; teach best practices for scientific computing |
| Navigation pane | <span style="float:right;"> <a href="./test.ipynb"> Next notebook >> </a> </span> | Leverages HTML to align items to the right | Improve overall course navigation |
| Anchor links | <a id="example1"></a> followed on the next line by # Heading 1 | Assigns an ID to each section in the notebook | Improve navigation within a notebook |
| Anchor links | [Heading 1](#example1) | References the assigned ID in the module outline on top | Improve navigation within a notebook |
| Alert boxes | <div class="alert alert-block alert-success"> Add content </div> | Colorizes markdown cells. Options for alert box colors: alert-success (green), alert-danger (red), alert-info (blue), alert-warning (yellow) | Improve overall course navigation |
| Text highlighting as code | `this text shall be highlighted` | Highlights the text within the backticks as code (changes font to Courier New and colors the text in grey) | Highlight specific sections/code in text |
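The HTML/Markdown snippets in Table 4 are typed by hand into markdown cells. When many notebooks share the same elements, the snippets could also be generated programmatically. The sketch below shows one possible way; the helper names (alert_box, anchored_heading, outline) are hypothetical and are not part of LTPy or of any Jupyter library. They only assemble the strings listed in the table.

```python
ALERT_KINDS = {"success", "danger", "info", "warning"}


def alert_box(content, kind="success"):
    """Wrap text in the alert-box <div> listed in Table 4."""
    if kind not in ALERT_KINDS:
        raise ValueError(f"unknown alert kind: {kind!r}")
    return f'<div class="alert alert-block alert-{kind}">\n{content}\n</div>'


def anchored_heading(title, anchor_id, level=1):
    """Return an anchor tag followed by a markdown heading."""
    return f'<a id="{anchor_id}"></a>\n{"#" * level} {title}'


def outline(sections):
    """Build a bulleted outline linking to each (title, anchor_id) pair."""
    return "\n".join(f"* [{title}](#{anchor_id})" for title, anchor_id in sections)


# Illustrative section list for one notebook.
sections = [("Heading 1", "example1"), ("Heading 2", "example2")]

outline_text = outline(sections)
heading_text = anchored_heading("Heading 1", "example1")
box_text = alert_box("Add content", kind="info")

print(outline_text)
print(heading_text)
print(box_text)
```

Each returned string can be pasted into a markdown cell or written into a notebook file; the rendered result depends on the front end supporting the alert CSS classes, as the classic Jupyter notebook interface does.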
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Wagemann, J.; Fierli, F.; Mantovani, S.; Siemen, S.; Seeger, B.; Bendix, J. Five Guiding Principles to Make Jupyter Notebooks Fit for Earth Observation Data Education. Remote Sens. 2022, 14, 3359. https://doi.org/10.3390/rs14143359

AMA Style

Wagemann J, Fierli F, Mantovani S, Siemen S, Seeger B, Bendix J. Five Guiding Principles to Make Jupyter Notebooks Fit for Earth Observation Data Education. Remote Sensing. 2022; 14(14):3359. https://doi.org/10.3390/rs14143359

Chicago/Turabian Style

Wagemann, Julia, Federico Fierli, Simone Mantovani, Stephan Siemen, Bernhard Seeger, and Jörg Bendix. 2022. "Five Guiding Principles to Make Jupyter Notebooks Fit for Earth Observation Data Education" Remote Sensing 14, no. 14: 3359. https://doi.org/10.3390/rs14143359

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.