1. Introduction
Contemporary education faces a changing landscape shaped by rapid technological advancement, with cutting-edge technologies, particularly artificial intelligence (AI), attracting the interest of students, educators, and researchers (
Farrokhnia et al., 2023). A subfield of AI called generative AI (GenAI) is devoted to creating new content, such as text, images, audio, videos, and even code. “The term generative refers to AI’s ability to produce novel outputs rather than merely replicating, sorting, processing, or analyzing given inputs” (
Chan & Colloton, 2024, p. 9). One well-known tool in this area is ChatGPT, a transformer-based language model developed by OpenAI. Through natural language processing, it can generate human-like text, respond to queries, and summarize information, producing conversations that closely resemble human discourse. Trained on vast amounts of data, its capacity to handle complex language patterns has led to its widespread adoption. ChatGPT is a breakthrough in AI technology because of its accessibility to the general public, ease of use, and human-like responses based on user prompts (
Chan & Colloton, 2024;
ElSayary, 2023;
Okulu & Muslu, 2024).
Several studies have identified opportunities for leveraging ChatGPT in education. For teachers, ChatGPT can assist with lesson planning, the creation of teaching materials, and the development of assessment tools, such as quizzes and rubrics, tailored to students’ varying academic levels (
Clark et al., 2024;
Mai et al., 2024;
Zhang & Tur, 2024). These features save teachers time and help reduce burnout (
Hashem et al., 2024). Additionally, ChatGPT can provide instant feedback on student assignments, ranging from essays to problem-solving tasks, thereby improving workflow efficiency (
Farrokhnia et al., 2023). Despite its benefits, ChatGPT has limitations and raises concerns. For instance, its lack of contextual understanding can result in recommendations that are either too simplistic or overly complex (
Farrokhnia et al., 2023). Moreover, issues of academic integrity have emerged, raising ethical concerns about cheating and over-reliance on AI tools (
Farrelly & Baker, 2023). Bias in AI outputs and the risk of perpetuating stereotypes also pose challenges, particularly when the training data lacks representation of diverse populations (
Avraamidou, 2024;
Halaweh, 2023). Lastly, ChatGPT may produce “hallucinations” or incorrect information, underscoring the importance of human oversight (
Exintaris et al., 2023;
Mishra et al., 2023).
In its call for responsible GenAI use in education,
UNESCO (
2023, p. 15) highlights the problem of “unexplainable models used to generate outputs”. This means that, even though the overall approach of GenAI models may be understandable, “the specific models’ parameters, including their weights, are not transparent or easily inspected” (
Blonder & Feldman-Maggor, 2024, p. 5), particularly in explaining “how artificial intelligence systems make decisions, what data they use, and why they produce specific results” (
Blonder & Feldman-Maggor, 2024). Because these GenAI systems are opaque, teachers face the challenge of not knowing how outputs are generated. Therefore, when using GenAI tools as teaching assistants, it is imperative for teachers to critically evaluate the outputs based on their subject-related knowledge, pedagogical knowledge, and experience. This is necessary to prevent inaccuracies in scientific content and to ensure alignment with pedagogical theories, the curriculum, and student characteristics (
Feldman-Maggor et al., 2025). Consequently, teachers’ content knowledge and pedagogical knowledge provide a strong foundation for critically evaluating GenAI outputs and ensuring their responsible application in education. In this context, pedagogical content knowledge (PCK) could offer a framework for engaging with GenAI tools effectively (
Feldman-Maggor et al., 2025).
Several researchers have emphasized ChatGPT’s potential to assist teachers with tasks such as lesson planning and generating educational materials (
Cooper, 2023;
Moundridou et al., 2024;
Okulu & Muslu, 2024). However, while ChatGPT offers promising capabilities, evidence of its effectiveness in designing science lesson plans that align with research-based recommendations in Science Education remains limited. A critical issue is ensuring that AI-generated lesson plans are pedagogically sound and appropriate for specific educational contexts, such as primary school Science Education.
This study aims to address this gap by exploring ChatGPT-generated lesson plans through the lens of pedagogical content knowledge (PCK) (
Otto & Everett, 2013;
Shulman, 1986). The research question guiding this study is as follows: What are the characteristics of lesson plans created by ChatGPT in terms of PCK? Specifically, we investigate how different prompt engineering strategies and PCK-informed prompts influence the characteristics of ChatGPT-generated lesson plans. We hypothesize that incorporating PCK elements in prompts, using layer prompts (
Atlas, 2023), and providing reference texts to ChatGPT (
Blonder & Feldman-Maggor, 2024) will influence the characteristics of the generated lesson plans.
In the following sections, we discuss key topics related to research on ChatGPT and science lesson planning, the importance of prompt engineering as a critical skill when interacting with ChatGPT, and the PCK framework.
1.1. ChatGPT for Science Lesson Planning
In this section, we present findings from the literature on the use of ChatGPT in science lesson planning.
Cooper (
2023), based on a self-study methodology, investigated the ways educators could utilize ChatGPT in their science pedagogy, particularly for lesson planning. Regarding science lesson planning, the researcher instructed ChatGPT to create a teaching unit using the 5E model, specifically designed to challenge students with a strong understanding of renewable and non-renewable energy sources at a Year 7 level. Additionally, the chatbot was tasked with providing support and scaffolding for students struggling with the material. The researcher found that ChatGPT’s lesson plan mostly aligned with the 5E model, incorporating activities such as students sharing prior knowledge, engaging in group work, and participating in classroom debates as well as in self-assessment activities. While ChatGPT can be useful for generating ideas and serving as a starting point, the researcher emphasized that teachers should critically evaluate and adjust the output to fit the specific educational context, including students’ needs, school profile, curriculum, and available resources (
Cooper, 2023). However, the researcher only used a single prompt to create the lesson plan and did not engage in further conversation with ChatGPT for refinements.
Hashem et al. (
2024) investigated the effectiveness of ChatGPT as a teacher assistant to reduce workload and prevent burnout. The researchers, based on an exploratory research design, tested ChatGPT’s contribution to lesson planning for English, Science, and Math. They used several prompts to instruct ChatGPT in generating lesson plans. The lesson plans were analyzed based on the 5E instructional model, using a checklist created by the researchers that included descriptions of the phases of this model. In addition, qualitative feedback was recorded for aspects of the lesson plans that were either acceptable or required improvement. The initial prompt asked ChatGPT to design a lesson plan for an eighth-grade class focusing on square roots and cube roots within a 45 min class period. The researchers identified areas for improvement in student engagement, the use of visual aids, and the lack of strategies for addressing students’ misconceptions. In response, they instructed the model to revise the lesson plan, incorporating enhancements to align with the 5E instructional model. The updated plan adopted a more student-centered approach, integrating group work, real-world examples, and visual aids. Subsequently, the researchers asked ChatGPT to design a lesson plan for a tenth-grade chemistry class on decomposition reactions, to be completed within a 45 min period. Although no explicit instructions were given to use the 5E model, ChatGPT generated a lesson plan aligned with the 5E model, demonstrating its adaptability by recalling and integrating elements from the previous conversation into its responses. The authors concluded that “through the input of information on curriculum, learning objectives, learning theories, instructional models, and student requirements, ChatGPT can swiftly generate lesson plans and educational materials of high quality…” (
Hashem et al., 2024, p. 18). Additionally, the researchers noted that teachers can provide ChatGPT with specific lesson planning templates, enabling it to generate materials aligned with those guidelines. However, they emphasized that the initial prompt alone was insufficient for producing a high-quality lesson plan. The quality improved only after providing a series of more specific prompts. This finding suggests that the effectiveness of lesson plans generated by ChatGPT is highly dependent on the clarity and specificity of the prompts given. Therefore, teachers should craft thoughtful, task-specific prompts to achieve optimal results, engaging in a feedback loop to improve ChatGPT’s accuracy and relevance (
Hashem et al., 2024).
ElSayary (
2023) explored teachers’ perceptions of using ChatGPT as a supportive tool in teaching and learning through a survey of 40 teachers (grades 6–12), primarily from STEM fields, most of whom had received training on integrating ChatGPT into education. Seven teachers also participated in interviews. The findings revealed that lesson planning was the most prominent area where teachers found ChatGPT beneficial, a conclusion that was further reinforced by the interview responses. Teachers shared a range of experiences with using ChatGPT in lesson planning. Some highlighted positive outcomes, noting that the tool helped generate assessments aligned with learning objectives and provided useful examples of best practices to incorporate into their lessons. Others appreciated how ChatGPT saved time by handling routine tasks, which allowed them to focus more on individualized instruction, feedback, and assessment. They also found its ability to brainstorm ideas and provide lesson structures particularly useful. However, some teachers emphasized that ChatGPT should only be used as a supplementary tool, not a replacement for thoughtful planning and assessment. They stressed the importance of carefully reviewing and evaluating the content generated by the tool to ensure it aligns with instructional goals and objectives (
ElSayary, 2023).
1.2. Prompt Engineering
The quality of responses generated by GenAI tools is closely related to the input provided by the user (
Moundridou et al., 2024). Prompt engineering, according to
UNESCO (
2023, p. 11), “…refers to the processes and techniques for composing input to produce GenAI output that more closely resembles the user’s desired intent”. It is regarded as part of the technological dimension of the technological pedagogical content knowledge (TPACK) framework (
Feldman-Maggor et al., 2025). The TPACK framework emphasizes that teachers require specific types of knowledge to use technology creatively and effectively in their teaching (
Mishra et al., 2023). For educators to use GenAI tools effectively, prompt engineering is a critical skill (
Moundridou et al., 2024). Several suggestions have been proposed for prompt engineering.
UNESCO (
2023, p. 12) emphasizes that prompts are most effective when they “articulate a coherent chain of reasoning centered on a particular problem or a chain of thought in a logical order”. Additionally, using clear and accessible language, providing relevant examples to guide the desired responses, and offering sufficient context are essential for generating meaningful outputs. It is also important to refine and iterate prompts to improve the relevance of responses. Prioritizing ethical issues is necessary, avoiding prompts that may generate inappropriate, biased, or harmful content (
UNESCO, 2023).
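As a hypothetical illustration of these recommendations, compare a vague prompt with a refined one that adds context, an example, and a clear chain of reasoning; both prompts are our own and are not drawn from UNESCO (2023).

```python
# Illustrative contrast between a vague prompt and one refined along the
# recommendations above; both examples are our own.
vague_prompt = "Make a science lesson about floating."

refined_prompt = (
    "I teach a 5th-grade class of 25 students for one 60-minute period. "   # context
    "Design a lesson plan on floating and sinking that first elicits "
    "students' prior ideas, then lets them test objects in water, and only "
    "then introduces density qualitatively. "                                # chain of reasoning
    "For example, include an activity where students predict whether a "
    "heavy wooden block will float. "                                        # guiding example
    "Use clear, simple language suitable for 10-year-olds."                  # accessible language
)
```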
Atlas (
2023) also highlights useful strategies, such as instructing the model to respond from a specific persona or perspective, which can help generate answers more relevant to the purpose and audience. Another effective approach is the use of layer prompts, where step-by-step instructions guide the model. For instance, one might begin by requesting a summary of an article, followed by additional prompts to refine the summary based on specific factors, helping the model better align with the intended goal. In addition, the matrix method suggests first asking the model about a topic, and then following up with a question about how a different topic connects to it (
Atlas, 2023;
Okulu & Muslu, 2024). Furthermore,
OpenAI (
n.d.) outlines six key strategies, each paired with related tactics—practical ideas to implement—that can help improve the quality of outputs from large language models such as ChatGPT-4o. Firstly, similar to previous papers, OpenAI proposes writing clear instructions, including tactics such as adding details to prompts to generate more relevant answers, asking the model to adopt a specific persona, using delimiters to clearly indicate distinct parts of the input, providing examples, and specifying the desired length of the output. The second strategy, which has not been mentioned in previous papers, involves providing reference text. Given that language models can sometimes provide confident but incorrect answers, using reference texts helps the model answer with fewer fabrications. This is particularly important in the field of education, as supplying the model with trusted educational sources allows it to generate responses grounded in reliable information. However, this approach does not guarantee that the output will always be accurate, and human evaluation remains essential. This approach seems promising, as it resembles a technique used by
Blonder and Feldman-Maggor (
2024). In their study, the researchers asked ChatGPT to create a lesson on a chemistry topic. The initial output was incomplete, as it did not include the necessary mathematical and graphical representations. The results improved after a few interactions with ChatGPT, during which the researchers provided specific prompts instructing it to consider equations and graphs. Another improvement was achieved by providing ChatGPT with a PDF file of a scientific paper related to the lesson, which further enhanced the output (
Blonder & Feldman-Maggor, 2024). Additional strategies include breaking down complex tasks into simpler subtasks, allowing the model time to “think” before generating a response, using external tools, and systematically testing changes by evaluating the model’s outputs (
OpenAI, n.d.).
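To make these strategies concrete, the following minimal sketch combines three of them (persona, delimiters, and reference text) using the openai Python package. This is our illustration, not a procedure from the studies cited above; the file name, model choice, and prompt wording are hypothetical placeholders.

```python
# Minimal sketch: persona, delimiters, and reference text with the openai package.
# Assumes OPENAI_API_KEY is set; the file name and model are illustrative.
from openai import OpenAI

client = OpenAI()

# Hypothetical reference text, e.g., an excerpt from a trusted educational source.
reference_text = open("floating_sinking_excerpt.txt", encoding="utf-8").read()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # Persona: ask the model to answer from a specific perspective.
        {"role": "system",
         "content": "You are an experienced primary school science teacher."},
        # Delimiters (triple quotes) mark the reference text off from the task,
        # encouraging the model to ground its answer in the supplied source.
        {"role": "user",
         "content": (
             "Using only the reference text delimited by triple quotes, draft a "
             "60-minute 5th-grade lesson plan on floating and sinking, "
             "structured around the 5E instructional model.\n"
             f'"""{reference_text}"""'
         )},
    ],
)
print(response.choices[0].message.content)
```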
Moreover,
Nazari and Saadi (
2024) proposed a ChatGPT prompt development formula consisting of two levels: components and elements. The components include Task, Context, and Instructions (TCI), while the elements encompass Role, Audience, Tone, Examples, and Limits (RATEL). According to the researchers, this formula can aid in developing prompts tailored to specific needs, fostering creativity and personalization and reducing the need for post-processing (
Nazari & Saadi, 2024). Nonetheless, they emphasize the importance of critically evaluating all ChatGPT responses (
Nazari & Saadi, 2024). Similarly, from our perspective, ChatGPT’s outputs in educational contexts require careful critical evaluation by educators, as the process is not always straightforward. Achieving the desired results often requires multiple iterations of a prompt, with a strong emphasis on human oversight throughout the process (
UNESCO, 2023), including the use of the PCK framework to critically evaluate ChatGPT’s outputs (
Feldman-Maggor et al., 2025).
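As a concrete, hypothetical illustration of the TCI/RATEL formula, the sketch below assembles the three components (Task, Context, Instructions) and five elements (Role, Audience, Tone, Examples, Limits) into a single prompt; the wording of each field is our own example, not taken from Nazari and Saadi (2024).

```python
# Hypothetical TCI/RATEL prompt assembly; every field value is illustrative.
prompt_fields = {
    "Role": "Act as an experienced primary school science teacher.",
    "Task": "Design a lesson plan on floating and sinking.",
    "Context": "A 5th-grade class of 25 students; one 60-minute period; "
               "hands-on materials are available.",
    "Instructions": "Structure the lesson around the 5E model and include an "
                    "activity that elicits students' misconceptions.",
    "Audience": "Write for the teacher, with student-facing wording for activities.",
    "Tone": "Clear and practical.",
    "Examples": "Model the elicitation task on predict-observe-explain activities.",
    "Limits": "Keep the treatment of density qualitative; avoid formulas.",
}

# Join the labeled fields into one prompt string ready to paste into ChatGPT.
prompt = "\n".join(f"{label}: {text}" for label, text in prompt_fields.items())
print(prompt)
```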
1.3. Pedagogical Content Knowledge
PCK was introduced as an academic construct by
Shulman (
1986) as teacher-specific professional knowledge (
Chaitidou et al., 2018), emphasizing that a teacher’s professional expertise is an amalgam of content knowledge and pedagogical knowledge (
Shulman, 1986). As Shulman explained,
“Within the category of pedagogical content knowledge I include, for the most regularly taught topics in one’s subject area, the most useful forms of representation of those ideas, the most powerful analogies, illustrations, examples, explanations and demonstrations—in a word the ways of representing and formulating the subject that make it comprehensible to others… an understanding of what makes the learning of specific topics easy or difficult: the conceptions and preconceptions that students of different ages and backgrounds bring with them to the learning of those most frequently taught topics and lessons… Here, research on teaching and on learning coincide most closely. The study of student misconceptions and their influence on subsequent learning has been among the most fertile topics for cognitive research.”
(pp. 9–10)
In the same vein, recent studies, approaching PCK as “the knowledge of, reasoning behind, and planning for teaching a particular topic in a particular way for a particular purpose to particular students for enhanced student outcomes” (
Gess-Newsome, 2015, p. 36), emphasize that the core of PCK lies in pedagogical knowledge, content knowledge, knowledge of students, knowledge of curriculum, and knowledge of assessment (
Großmann & Krüger, 2024).
Otto and Everett (
2013) developed an innovative teaching strategy for introducing PCK concepts to pre-service elementary teachers by employing a three-circle Venn diagram for science lesson planning. The diagram featured overlapping circles representing Pedagogy, Content, and Context, with PCK depicted as the intersection of all three components. The Venn diagram highlighted three main components of PCK: Pedagogy, which refers to the primary teaching strategies chosen in the lesson plan; Context, which addresses the classroom and school environment; and Content, which refers to the learning objectives for the science topic. It also underscored the interplay between each pair of components. The Pedagogy/Context intersection focused on specific strategies to ensure all students in the classroom are effectively reached. The Pedagogy/Content intersection dealt with aligning appropriate teaching strategies with the Content being taught. Meanwhile, the Content/Context intersection concentrated on capturing students’ conceptions of the topic. Finally, the Venn diagram illustrated the overall integrative nature of PCK, summarizing how these components fit together into an effective science lesson. The Venn diagram was used by the pre-service teachers as an easily remembered graphic organizer that helped them design effective science lesson plans (
Otto & Everett, 2013).
Lesson planning is an essential aspect of teachers’ professional competence, involving the creation of lesson plans as well as the description and justification of their pedagogical decisions based on their PCK (
Großmann & Krüger, 2024;
Zaragoza et al., 2024). In the era of GenAI, with tools like ChatGPT capable of producing lesson plans, PCK appears to serve as a valuable framework for enhancing responsible interactions with ChatGPT when used to assist in designing lesson plans. For example,
Feldman-Maggor et al. (
2025) demonstrated how a teacher used their PCK to craft effective prompts and identify inaccuracies in ChatGPT-generated content. Specifically, in the context of teaching the differences between molecular and ionic materials, a chemistry teacher engaged in an iterative dialogue with ChatGPT, seeking strategies for teaching this concept. The teacher’s PCK was essential in evaluating ChatGPT-generated responses. The teacher identified misconceptions that ChatGPT failed to address and provided prompts to refine the model’s output. The teacher also noticed that ChatGPT did not always generate responses with accurate chemical writing and recognized the need to correct them before presenting the information to students. Therefore, we argue that, on the one hand, PCK provides a knowledge base for creating prompts to guide ChatGPT in lesson planning, while on the other hand, it can be used to critically evaluate and justify the pedagogical soundness of the outputs, i.e., the lesson plans.
3. Results
In this section, we present our results, following the seven categories that were identified: Pedagogy, Content, Context, Pedagogy/Content, Pedagogy/Context, Content/Context, and PCK.
3.1. Content
The category of Content was informed by two subcategories identified during the coding process: declarative knowledge and procedural knowledge.
Regarding declarative knowledge, across the interactions, the concept of density and its role in floating and sinking phenomena was included in the learning objectives. Examples of these learning objectives were as follows: “Understand the concept of density and how it determines whether an object floats or sinks” (first interaction); “Students will use the concept of density and density comparisons to predict floating or sinking in water” (second interaction); “Understand that density, not weight or size, determines whether an object floats or sinks” and “Correct misconceptions, such as heavier objects always sink” (third interaction); and “Address common misconceptions, such as the belief that heavier objects always sink” and “Use the concept of density to predict whether objects will float or sink in water” (fourth interaction).
As for procedural knowledge, the integration of procedural learning goals varied between the interactions. Even though the prompt in the first interaction did not include any reference to procedural knowledge, ChatGPT provided a learning goal related to carrying out investigations: “Predict and test whether various objects will float or sink”. The other three interactions included prompts that explicitly addressed aspects of procedural knowledge, and ChatGPT provided answers related to the control of variables strategy. For instance, the second interaction suggested that “Students will practice controlling variables in an experiment”, while the third proposed the following: “Practice scientific skills, such as controlling variables and recording data”. In the fourth interaction, where a related scientific paper was uploaded to ChatGPT, the learning objective for the control of variables strategy was explicitly connected to the phenomena of floating and sinking: “Practice the control of variables strategy to investigate how specific factors influence floating and sinking”. Additionally, a notable difference was observed in the third interaction, where a quantitative aspect of measurement was introduced, specifically the calculation of density: “Calculate the density of objects using their mass and volume”.
3.2. Context
The category of Context consisted of three subcategories observed in the data: the focus of the lesson plans on the students’ grade level, the time constraints, and the resources needed to implement the lesson plan.
Regarding the students’ grade level, the prompts provided in all four interactions specified a particular grade of primary school students. In each interaction, ChatGPT’s responses explicitly referred to this grade level using slightly different wording, such as “Grade Level: 5th Grade” (first interaction) and “Grade: 5th Primary” (third interaction).
Concerning time constraints, the prompts provided did not specify a time limit for the lesson plans. In the first three interactions, ChatGPT suggested implementing the lesson plans within 60 min, as indicated by responses such as “Duration: 60 min” (third interaction). However, in the fourth interaction, after a reference paper about teaching floating and sinking was provided to ChatGPT, the suggested duration doubled to “Duration: 2 × 60 min sessions”. Furthermore, after a follow-up conversation that included additional prompts, the duration was further increased to 180 min: “Duration: 3 × 60 min sessions”. Notably, the same prompts were used in both the third and fourth interactions, but the duration increased only in the fourth interaction, where the primary difference was the provision of the reference paper to ChatGPT.
In relation to the resources needed to implement the lesson plan, all four interactions included specific lists of materials required, both for conducting experimental activities and for helping students represent the concepts. Examples of these resources included the following:
“Materials: transparent water containers; objects of different materials, shapes, and sizes (e.g., metal spoon, wooden block, plastic bottle cap, rock, hollow ball); modeling clay; digital scale; ruler; worksheets with visuals and minimal text; visual aids (e.g., ‘dots-in-a-box’ diagrams to represent density); access to a digital simulation of floating and sinking (optional); chart paper and markers.”
(Fourth interaction)
3.3. Pedagogy
Regarding Pedagogy, the 5E instructional model was consistently used to structure the lesson into distinct phases, each with a specific purpose. The lesson plans appropriately included the following phases: Engage, Explore, Explain, Elaborate, and Evaluate. In most answers, the activities provided were aligned with the purpose of each phase. However, some differences were noted across the interactions. In particular, it was observed that although the Engage phase is primarily aimed at enhancing engagement, eliciting prior knowledge, and informally identifying misconceptions related to the content taught, some ChatGPT answers suggested providing students with explanations related to the content. This approach is inconsistent with the purpose of the Engage phase in the 5E model. For example, ChatGPT suggested introducing density as “how tightly packed the material in an object is… The ball is large but not tightly packed (low density). The key is small but tightly packed (high density)”. This approach, which provides an explanation during the Engage phase and prior to the Exploration phase, seems to align more closely with a confirmatory inquiry model, where the theory is presented first and then tested. This contrasts with a guided inquiry approach, where students explore materials and tools, conduct experiments, and use evidence to construct their own explanations, as outlined in the 5E instructional model (
Bybee, 2014).
3.4. Pedagogy/Content
The Pedagogy/Content overlap refers to the alignment of appropriate teaching strategies with the content. Five subcategories were identified: a qualitative approach, a quantitative approach, a density-based approach, a buoyancy-based approach, and a control of variables strategy.
The first, second, and third interactions incorporated both a quantitative approach, which was dominant, and a qualitative approach. Concerning the quantitative approach, these interactions included quantitative explanations of density, which were based on the mathematical formula d = m/v. For example, in the first interaction, ChatGPT suggested using this mathematical calculation to explain the phenomena of floating and sinking: “Use simple math (e.g., ‘Density = Mass ÷ Volume’) to explain why objects float if their density is less than the density of water (~1 g/cm³)” (first interaction).
The second and third interactions further suggested that students calculate the density of several objects and predict if they float or sink.
“Provide materials to measure mass (scale) and volume (graduated cylinder or water displacement method). Assign groups to calculate the density of one or two objects. Use the formula: Density = Mass/Volume. Have students predict whether the objects will float or sink based on their calculated densities. Test their predictions.”
(Second interaction)
“Explain that water has a density of approximately 1 g/cm³: Objects with a density less than 1 g/cm³ float. Objects with a density greater than 1 g/cm³ sink. Show how to measure mass using a scale and volume using water displacement or geometric calculations (for regular shapes)… Hands-On Mini Activity: Demonstrate with a small object (e.g., a rubber ball): Measure its mass on a scale. Measure its volume using water displacement (submerge it in a graduated cylinder and record the difference). Calculate density and determine if it will float or sink.”
(Third interaction)
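For readers unfamiliar with the rule these excerpts apply, a brief worked example (with illustrative numbers of our own, not drawn from the transcripts): a wooden block of mass 60 g and volume 100 cm³ has

$$d = \frac{m}{V} = \frac{60\ \mathrm{g}}{100\ \mathrm{cm}^3} = 0.6\ \mathrm{g/cm^3} < 1\ \mathrm{g/cm^3},$$

so it floats, whereas a metal key of mass 12 g and volume 1.5 cm³ has $d = 8\ \mathrm{g/cm^3} > 1\ \mathrm{g/cm^3}$ and sinks.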
Regarding the qualitative approach, the first, second, and third interactions included suggestions to explain density in a qualitative way. These suggestions relied on short, simple explanations, such as: “Mini-Lecture (with visuals): Define density as the amount of ‘stuff’ (mass) packed into a given space (volume)” (second interaction). During the third interaction, ChatGPT was explicitly asked to adopt a qualitative approach. While the revised lesson plan did adopt a qualitative approach and was based on the idea of being “light for its size”, the examples provided were not based on visuals or representations that could facilitate causal reasoning for explaining floating and sinking.
“Explain (15 min) Objective: Reinforce the relationship between density and floating/sinking in a qualitative manner. Teacher Explanation: What is density? Use relatable examples: “Think of a loaf of bread versus a brick of clay. The clay is tightly packed and heavy for its size, so it sinks. The bread is full of air pockets and light for its size, so it floats.” Explain that objects float when their material is less dense than water and sink when their material is more dense than water. Group Discussion: Look at recorded observations. Ask: “What do the floating objects have in common?” “What do the sinking objects have in common?” Address Misconceptions: Clarify that weight and size do not determine floating or sinking—density does.”
(Third interaction)
In the fourth interaction, where a reference paper on teaching floating and sinking was provided to ChatGPT, the approach adopted was qualitative, utilizing a specific representation from the literature known as the “dots-in-a-box” model. ChatGPT suggested an activity aimed at developing a conceptual understanding of density and its role in floating and sinking. The activity involved visualizing density by illustrating dots within a cube, with a greater number of dots representing higher density. This approach could facilitate the development of causal relational reasoning.
“Explain. Objective: Develop conceptual understanding of density and its role in floating/sinking. Activity: Visualizing Density. Use the “dots-in-a-box” representation to explain density qualitatively. Heavier for its size: Show dots in a cube, with more dots representing higher density. Explain the causal relational FS rule: If an object’s density is less than water’s, it floats; otherwise, it sinks.”
(Fourth interaction)
In addition, when prompted to describe in detail an activity for teaching the concepts of floating, sinking, and density to primary school students, ChatGPT’s response included specific examples for qualitatively explaining these phenomena.
“Step 3: Introduction to Density with Visual Aids Present the concept of density using “dots-in-a-box” diagrams: Use a dense material like metal to show dots tightly packed together. Use wood or plastic to show dots more spread out. Explain that density is the relationship between how much “stuff” (mass) an object has in a given space (volume). Relate density to floating and sinking: Objects less dense than water float. Objects denser than water sink. Step 4: Connecting Observations to the Concept Lead a class discussion to connect their observations to density… Discuss why a metal spoon sinks while a large wooden block floats, despite the block being bigger (density vs. weight). Highlight the causal rule from the uploaded paper: If an object’s density is less than water’s density, it floats; otherwise, it sinks. Step 5: Reflect and Explain: Have students complete the worksheet with questions such as: Why do you think some objects floated and others sank?… What have you learned about how density explains floating and sinking? Encourage students to share their answers and reasoning with the class.”
(Fourth interaction)
With regard to the distinction between density-based and buoyancy-based explanations, the former uses an elimination-of-variables approach, focusing on identifying the factors that influence floating and sinking phenomena to derive a rule for predicting whether an object will float or sink. In contrast, buoyancy-based explanations rely on an equilibrium mechanism to explain how an object floats. In the first, second, and third interactions, ChatGPT proposed both density-based and buoyancy-based explanations. For instance, it suggested the following: “Emphasize the importance of both density and buoyancy in understanding floating and sinking” (third interaction). Moreover, within buoyancy-based explanations, Archimedes’ Principle was proposed as a concept to be introduced: “Introduce the idea of Archimedes’ Principle for advanced students” (first interaction). Regarding density-based explanations, these included both the quantitative approach, which involved the mathematical calculation of density to determine whether objects float or sink, and the qualitative approach, which used the “dots-in-a-box” model, as presented in the previous paragraph.
Concerning the control of variables strategy, this was evident in the second, third, and fourth interactions, appearing only after it was explicitly included as an instructional goal in the initial prompt of each interaction. However, the suggested activities related to the control of variables strategy varied in accuracy. Some were characterized as less accurate, such as when ChatGPT proposed brief activities that lacked structured instructions to guide students in controlling variables. For instance, “Use an example (e.g., a clay ball) to demonstrate controlling variables. Flatten the ball into a disc and retest. Ask: Does changing the shape affect whether it floats or sinks? Why?” (second interaction).
A more accurate suggestion was provided by ChatGPT in the fourth interaction, after a specific prompt asked ChatGPT to describe an activity that explicitly teaches students how to test whether a specific variable influences floating or sinking—ensuring that only this variable is changed, with a focus on size.
“Activity: Testing the Effect of Size on Floating or Sinking Objective: Teach students how to systematically test the effect of size on floating or sinking by changing only the size of an object while keeping other variables (e.g., material, shape, and mass per unit volume) constant. Materials: Transparent container filled with water Clay or modeling material with a uniform density, Digital scale (to ensure consistency in mass per unit volume), Ruler (for measuring dimensions), Worksheets for recording observations, Visual aids to illustrate the relationship between size, volume, and density. Procedure: Step 1: Introduction Begin with a class discussion: Does the size of an object influence whether it floats or sinks? Why or why not? Explain the importance of testing one variable at a time and keeping all other variables controlled. Introduce the variable to be tested: size. Step 2: Preparing for the Experiment Control Variables: Use the same material for all objects (e.g., clay or modeling material with uniform density). Keep the shape of the objects constant (e.g., all objects shaped as spheres). Ensure the material’s density remains constant. Independent Variable: Change only the size of the object (e.g., small, medium, and large spheres). Dependent Variable: Observe whether the object floats or sinks. Step 3: Conducting the Experiment Provide students with clay and a digital scale. Guide students to: Roll the clay into three spheres of different sizes (e.g., a small, medium, and large sphere)…”
(Fourth interaction)
3.5. Pedagogy/Context
The Pedagogy/Context overlap focuses on the specific strategies required to reach all students in the classroom effectively. For this category, four subcategories were identified in the lesson plans: specific strategies to address students’ misconceptions, strategies that facilitate differentiation, different modes of representation that facilitate differentiation, and resources needed for the activities.
Concerning the specific strategies to address students’ misconceptions, the first and second interactions did not include activities explicitly suggested by ChatGPT to target specific misconceptions. Such activities only emerged after prompts were provided in the third and fourth interactions, explicitly requesting the following: “Describe in detail an activity aimed at helping students address a specific misconception about floating and sinking”. In response, ChatGPT proposed activities that directly targeted specific misconceptions, offering strategies to help students refine their understanding of floating and sinking phenomena. Specifically, in the third and fourth interactions, ChatGPT suggested experimental activities designed to challenge the misconception that an object’s weight determines whether it floats or sinks. These activities guided students to recognize a conflict between their initial beliefs and the observed behavior of objects during experimentation.
“Activity: Investigating the Misconception “Heavier Objects Always Sink” Title: “Does Weight Determine Floating or Sinking?” Objective: To help students confront and address the misconception that heavier objects always sink, by engaging them in a controlled, inquiry-based activity… Use the scale to measure and record the weight of each object. Write the weights on a chart for everyone to see. Testing in Water: One by one, place the objects into the container of water. Observe whether each object floats or sinks. Record the results next to the corresponding predictions and weights… Discuss observations as a class. Prompt students to identify patterns: “Did the heaviest object always sink?” “Did the lightest object always float?… Introduce the concept of density without fully explaining it yet: “What other factors besides weight might be influencing floating or sinking?”… This will challenge their existing misconception and create cognitive dissonance, motivating them to seek a deeper understanding of the role of density. Key Notes: This activity focuses solely on helping students confront their misconception by comparing their predictions to actual outcomes. It does not yet provide explanations or introduce the concept of density in detail, leaving space for follow-up lessons to address these concepts.”
(Third interaction)
Regarding strategies that facilitate differentiation, relevant suggestions were only provided during the third and fourth interactions. This occurred after a specific prompt explicitly requested strategies to support all students in the classroom, particularly those struggling with reading. In response, ChatGPT proposed several approaches to fostering an inclusive learning environment, relating to flexible grouping, learning stations, and digital technology.
Flexible grouping was proposed, allowing students to collaborate with peers of varying abilities, thereby supporting students with reading difficulties.
“Pair struggling readers with peers for collaborative experiments and discussions. Why It Helps: Students can learn from their peers’ explanations and teamwork reduces individual pressure. Implementation: Assign specific roles (e.g., recorder, tester) to ensure participation without relying on reading-heavy tasks. Encourage discussion about observations in their own words.”
(Third interaction)
“Incorporate Group Activities Collaborative Experiments: Pair students with reading difficulties with peers who can assist in completing written components or interpreting instructions. Discussion Prompts: Facilitate group discussions where students verbally share their predictions and observations instead of relying solely on written responses.”
(Fourth interaction)
Learning stations were also proposed, where students could use a collection of materials to explore the science topic at their own pace. For example, in the fourth interaction, ChatGPT suggested the following: “Water Play Stations: Set up small water play stations where students can test objects without pressure. Encourage exploration and experimentation” (fourth interaction).
Additionally, ChatGPT recommended incorporating digital technology to present interactive and accessible content, ensuring that students with reading challenges could fully engage with the lesson. For instance, it suggested the following: “Provide text-to-speech options for any required reading materials” (third interaction) and “Audio Support: Provide audio explanations or interactive video lessons for students to follow along” (fourth interaction).
Moreover, ChatGPT suggested using different modes of representation to support various learning styles (auditory, visual, and kinesthetic). For example, it proposed the following:
“Use simple graphics to explain density as “how tightly packed” material is, avoiding complex text.”
(Third interaction)
“Use videos, animations, or interactive simulations to illustrate concepts. Why It Helps: These tools can visually represent abstract ideas like density and buoyancy, removing the reading barrier. Implementation: Show a short video explaining why objects float or sink (e.g., comparing objects of different densities).”
(Third interaction)
“Kinesthetic Activities: Allow students to act out density concepts, such as pretending to be “tight-packed dots” for dense materials or “spread-out dots” for less dense materials.”
(Fourth interaction)
“Visual Aids: Use diagrams or animations to explain density, such as “dots-in-a-box” to represent mass relative to volume. Show that dense materials have tightly packed dots, while less dense materials have spread-out dots.”
(Fourth interaction)
Concerning resources needed for the activities, all four interactions included suggestions for providing specific physical materials such as wooden blocks, weighing scales, and water containers.
“Divide students into small groups. Provide each group with materials: a clear container of water, a variety of objects (e.g., metal key, wooden block, clay ball, plastic lid), and a worksheet for recording predictions and observations.”
(Second interaction)
Only in the fourth interaction was a digital simulation proposed as a resource to help students investigate floating and sinking.
“Simulation: Use a digital tool or simulation to explore objects floating in various liquids (e.g., oil, glycerin). Students predict outcomes using the density comparison rule and confirm using simulations.”
(Fourth interaction)
3.6. Content/Context
The Content/Context overlap focuses on activities that provide teachers with insights into their students’ prior knowledge and misconceptions about a science topic. This category includes two subcategories: simple/short activities to elicit students’ prior knowledge/misconceptions and structured activities to identify students’ prior knowledge/misconceptions.
Regarding the simple activity subcategory, these activities rely on short discussions rather than structured tasks. For instance, in the second interaction, ChatGPT proposed the following:
“Begin with a demonstration: Hold up a variety of objects (e.g., a plastic bottle, a coin, a sponge, and a stone). Ask students to predict whether each object will float or sink when placed in a clear container of water. Discussion: Why do you think some objects float while others sink? Does the size or shape of the object matter?”
(Second interaction)
Similarly, in the third interaction, ChatGPT suggested the following:
“Capture students’ attention and uncover pre-existing misconceptions. Activity: Begin with a demonstration. Hold up a variety of objects (e.g., a sponge, a coin, a wooden block, and a small sealed bottle). Ask students: ‘Will this float or sink? Why?’ Students make predictions and briefly explain their reasoning. Discussion Prompts: ‘Why do you think some objects float and others sink?’ ‘Does the size, shape, or weight of an object matter?’ Purpose: This activity reveals misconceptions such as ‘heavier objects always sink’ or ‘bigger objects always float’.”
(Third interaction)
Concerning the structured activity subcategory, activities proposed by ChatGPT to identify students’ prior knowledge and misconceptions emerged only after a specific prompt was provided during the third and fourth interactions. The prompt explicitly requested the following: “Describe in detail an activity to identify students’ misconceptions about floating and sinking without including any steps aimed at addressing or correcting these misconceptions”. ChatGPT’s response included structured activities involving worksheets and concept maps that challenged students to make predictions and explain their reasoning about whether various objects would float or sink in water. The activity incorporated an experimental procedure using objects made of different materials and shapes. It also included questions explicitly designed to address common misconceptions about floating and sinking. Students were encouraged to discuss disagreements within their groups and record differing viewpoints. An example from the fourth interaction is as follows:
“Activity: Identifying Students’ Misconceptions About Floating and Sinking Objective: To identify students’ prior knowledge and misconceptions about the principles of floating, sinking, and density. Materials: Transparent container filled with water; Objects of various materials and shapes (e.g., wooden block, metal spoon, plastic bottle cap, inflated balloon, rock); A worksheet with open-ended questions; Chart paper and markers; Procedure: Introduction and Predictions: Present the transparent water container and a variety of objects. Ask students to individually predict whether each object will float or sink and to provide a reason for their prediction. Record their predictions on the worksheet… Have students share their predictions and reasoning with their group members… Testing Objects: In their groups, students test each object by placing it in the water. Ask students to observe and record the outcomes of each test on their worksheet… Facilitate a class discussion where groups explain why they think certain objects floated or sank… Concept Mapping: Provide chart paper and markers to each group. Ask groups to create a concept map linking factors they believe affect floating and sinking…”
(Fourth interaction)
3.7. PCK
This category includes assessments proposed by ChatGPT to inform teachers about the effectiveness of the lesson plans in addressing students’ declarative and procedural knowledge. More assessment suggestions focused on declarative knowledge, while fewer targeted procedural knowledge.
“Evaluate (5 min) Objective: Assess understanding of concepts and skills. Quick Assessment: Pose reflection questions: What factors determine if an object floats or sinks? How does density relate to floating and sinking?”
(Declarative knowledge, second interaction)
“Evaluate Objective: Assess understanding through practical application. Assessment Task: Provide scenarios involving floating and sinking (e.g., predicting outcomes for new objects or liquids). Students explain their reasoning using the causal relational FS rule and density concepts. Reflection: Students write or discuss what they found most surprising and how their understanding changed.”
(Declarative knowledge, fourth interaction)
“Assessment Rubric: Inquiry Skills: Did students formulate and test predictions?… Experimental Skills: Did students control variables effectively?”
(Procedural knowledge, second interaction)
In addition, in this category, we summarized how all components fit together to form an effective lesson. This involves synthesizing aspects already coded under Pedagogy, Content, Context, and their overlaps to determine whether the Pedagogy and Content are appropriate for the specific Context, i.e., the students’ grade level.
Specifically, in the Pedagogy/Content category, both a qualitative and a quantitative approach were proposed for teaching floating and sinking. However, when considering the Context, particularly the grade level of the primary school students targeted, the qualitative approach appears to be more appropriate, based on the relevant literature (
Zoupidis et al., 2021). Notably, only in the fourth interaction, where a specific reference paper was provided to ChatGPT, was the “dots-in-a-box” model used. This model could help students develop causal relational reasoning in a qualitative way by comparing the densities of an object and a liquid to determine whether the object would float or sink, e.g., “Activity: Visualizing Density Use ‘dots-in-a-box’ diagrams: Dense materials (e.g., metal) have tightly packed dots. Less dense materials (e.g., wood) have widely spaced dots. Explain that an object’s ability to float or sink depends on its density relative to water.” (Fourth interaction).
In addition, the Pedagogy/Content category included activities that focused on either a density-based approach or a buoyancy-based approach. When considering the Context, specifically the grade level of primary school students, the literature suggests adopting a density-based approach within the framework of the elimination of variables approach. For example, “Students will learn to: Change only one variable (size) while controlling others (material, shape, density). Understand that floating or sinking is determined by an object’s density relative to water, not size alone.” (Fourth interaction). This approach is considered more appropriate for the students’ grade level, as the buoyancy-based approach requires the understanding of several intermediate concepts, such as the forces of buoyancy and gravity, which are more complex to teach (
Zoupidis et al., 2021).
Table 2 summarizes the main findings, illustrating how different interactions with ChatGPT influenced the characteristics of lesson plans in terms of PCK aspects.
4. Discussion
This study addresses the need for a responsible integration of generative artificial intelligence in education, particularly in the context of science lesson planning. Although prior research has examined the potential of ChatGPT for generating lesson plans (
Cooper, 2023;
Moundridou et al., 2024;
Okulu & Muslu, 2024), limited attention has been given to evaluating these outputs through established pedagogical frameworks. To fill this gap, this study adopts a sound theoretical framework rooted in Science Education, specifically PCK (
Chaitidou et al., 2018;
Otto & Everett, 2013). PCK provides a lens for examining the integration of content knowledge, pedagogical knowledge, and context knowledge, as well as their overlaps that guide effective teaching. Additionally, prompt engineering, which is a key component of TPACK in the age of AI (
Feldman-Maggor et al., 2025), plays a crucial role in enhancing the quality of ChatGPT-generated outputs. Generating effective lesson plans requires multiple iterations of a prompt, with careful refinement and strong human oversight throughout the process (
UNESCO, 2023). In this study, PCK served a dual purpose: it guided the design of prompts used to interact with ChatGPT and provided the framework for analyzing the resulting lesson plans. Four distinct interactions with ChatGPT were conducted to design the lesson plans. The first and second interactions were based on a single prompt, differing in the PCK aspects included. The third and fourth interactions, by contrast, involved layer prompts, where step-by-step instructions guided the model based on detailed PCK-aligned elements. The key distinction between the two was that, in the fourth interaction, a scientific paper was provided, and ChatGPT was instructed to base its responses on the information from that source.
Our results show that incorporating PCK elements and layer prompts can enhance the quality of ChatGPT-generated lesson plans. Layer prompts enhanced the alignment of ChatGPT’s responses with PCK. During the third interaction, for instance, prompts specifically requested strategies to address misconceptions about floating and sinking, a well-documented challenge in Science Education. The resulting lesson plan included structured activities that aimed to identify misconceptions as well as proposed ways to address them. Similarly, the fourth interaction demonstrated how the inclusion of reference materials improved the accuracy of ChatGPT’s outputs. For example, the “dots-in-a-box” model suggested in the fourth lesson plan provided a qualitative way to approach floating–sinking and density, aligning well with research-based suggestions (
Zoupidis et al., 2021).
The overlap of Pedagogy/Content was particularly evident in the analysis. ChatGPT’s early responses often relied on a quantitative approach, such as using the formula d = m/v, which, while accurate, may not be suitable for primary school students (
Zoupidis et al., 2021). However, layer prompts as well as providing ChatGPT with reference text helped guide the model toward more developmentally appropriate, qualitative strategies, such as visual representations. The fourth interaction, in particular, demonstrated how detailed guidance could shift ChatGPT’s focus to more age-appropriate suggestions.
The Pedagogy/Context overlap revealed that layer prompts improved ChatGPT’s ability to generate effective strategies for addressing student misconceptions and learning diversity. While initial interactions lacked specific activities for tackling misconceptions, later prompts elicited inquiry-based tasks, such as challenging the belief that “heavier objects always sink” (
Zoupidis et al., 2021). Differentiation strategies also emerged, including flexible grouping, learning stations, and digital tools to reach all students in the classroom (
Tobin & Tippett, 2014;
Tomlinson, 2001).
The Content/Context overlap revealed ChatGPT’s ability to design activities for uncovering students’ prior knowledge and misconceptions. Without detailed prompts, the model suggested generic activities. However, when guided by layer prompts, it generated more structured and targeted activities, such as concept mapping, which are better suited for addressing misconceptions (
Yin et al., 2014). This finding supports previous research that emphasizes the need for clear and iterative prompts to produce more contextually relevant and pedagogically sound AI outputs (
Blonder & Feldman-Maggor, 2024).
Despite these affordances, challenges emerged. ChatGPT’s responses varied in depth and accuracy. One notable issue was related to assessment strategies, as most of ChatGPT’s suggestions focused primarily on summative assessments for declarative knowledge, rather than incorporating formative assessments that provide feedback to support student learning and inform teaching adjustments (
Yin et al., 2014). This limitation may be attributed to the fact that our prompts did not explicitly instruct the model to generate formative assessment strategies. Another issue was that ChatGPT generated lesson plans that incorporated both qualitative and quantitative approaches for teaching floating and sinking. However, considering the context, particularly the grade level of the primary school students, the literature recommends the qualitative approach over the quantitative one (
Zoupidis et al., 2021). Additionally, without specific guidance, ChatGPT occasionally introduced advanced concepts, such as buoyancy-based explanations and Archimedes’ Principle, which may not be appropriate for primary school learners (
Zoupidis et al., 2021). This is a crucial point, as the literature highlights that generative AI tools can produce inaccurate information (
Exintaris et al., 2023;
Mishra et al., 2023). Therefore, teachers need to be trained to be aware of these limitations. Understanding how these tools are developed (
Mishra et al., 2023), along with the disconnect between how GenAI models “appear” to understand the text they generate and the reality that they lack true comprehension of language and the real world (
UNESCO, 2023), is essential if teachers are to recognize these tools’ tendency to generate inaccurate information. Familiarity with these constraints may help teachers critically evaluate AI-generated content in educational settings (
Mishra et al., 2023), ensuring it is appropriate for their students and aligned with the curriculum (
Blonder & Feldman-Maggor, 2024;
UNESCO, 2023).
Recommendations
Based on the findings, we propose guidelines for the responsible use of ChatGPT to support teachers in co-designing lesson plans for Science Education (
Figure 2).
Figure 2 illustrates a teacher–generative AI co-design process (
UNESCO, 2023), where educators guide ChatGPT by creating prompts grounded in PCK and iteratively refining them to improve the quality of the generated lesson plans. The teacher begins by designing prompts that integrate key components of PCK. Among prompt engineering strategies, which constitute a TPACK aspect (
Feldman-Maggor et al., 2025), the approach identified as effective in this study is the use of layer prompts in a conversational manner. These prompts include clear, step-by-step instructions that integrate PCK elements. Additionally, we propose providing ChatGPT with reference texts, such as scientific papers, relevant to the content being taught. ChatGPT processes the prompt and generates an initial output, which is then critically evaluated by the teacher using their PCK. Through an iterative process, the teacher refines the prompts to address specific aspects, such as misconceptions or the alignment of activities with the students’ grade level. The teacher then determines which activities in the lesson plans are suitable for instruction and makes further adjustments to tailor the content to their students’ specific context, resulting in the final lesson plan.
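Schematically, this loop can be summarized in the sketch below. It is a conceptual outline of Figure 2 under our own naming, not an implementation from the study: send_to_chatgpt and teacher_review are hypothetical stand-ins for the chat interaction and the teacher’s PCK-based evaluation.

```python
# Conceptual sketch of the teacher-GenAI co-design loop in Figure 2.
# All names are hypothetical; the chat interface itself is abstracted away.
from typing import Callable, Optional

def co_design_lesson_plan(
    layer_prompts: list[str],
    send_to_chatgpt: Callable[[str], str],
    teacher_review: Callable[[str], Optional[str]],
) -> str:
    """Walk through PCK-grounded layer prompts; after each output, the teacher
    either returns a refinement prompt or None to accept the current draft."""
    draft = ""
    for prompt in layer_prompts:             # step-by-step, PCK-informed instructions
        draft = send_to_chatgpt(prompt)
        refinement = teacher_review(draft)   # critical evaluation using PCK
        while refinement is not None:        # e.g., "address the misconception
            draft = send_to_chatgpt(refinement)  # that heavier objects always sink"
            refinement = teacher_review(draft)
    return draft  # final plan, still to be tailored to the class context
```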
Further research could focus on designing activities within the context of a university course, following these guidelines, to train primary school teacher students in the responsible use of GenAI for science lesson planning through the lens of PCK. This includes investigating how primary school teacher students utilize ChatGPT in science lesson planning after being introduced to prompt engineering strategies and the PCK framework, combining multiple data sources such as ChatGPT-assisted lesson plans created by teacher students and insights from focus group discussions.