1. Introduction
Contemporary education faces a changing landscape shaped by rapid technological advancement, with cutting-edge technologies, particularly artificial intelligence (AI), attracting the interest of students, educators, and researchers (
Farrokhnia et al., 2023). A subfield of AI called generative AI (GenAI) is devoted to creating new content, such as text, images, audio, videos, and even code. “The term generative refers to AI’s ability to produce novel outputs rather than merely replicating, sorting, processing, or analyzing given inputs” (
Chan & Colloton, 2024, p. 9). One well-known tool in this area is ChatGPT, a transformer-based language model developed by OpenAI. Through natural language processing, it can generate human-like text, respond to queries, and summarize information, producing conversations that closely resemble human discourse. Trained on vast amounts of data, its capacity to handle complex language patterns has led to its widespread adoption. ChatGPT is a breakthrough in AI technology because of its accessibility to the general public, ease of use, and human-like responses based on user prompts (
Chan & Colloton, 2024;
ElSayary, 2023;
Okulu & Muslu, 2024).
Several studies have identified opportunities for leveraging ChatGPT in education. For teachers, ChatGPT can assist with lesson planning, the creation of teaching materials, and the development of assessment tools, such as quizzes and rubrics, tailored to students’ varying academic levels (
Clark et al., 2024;
Mai et al., 2024;
Zhang & Tur, 2024). These features save teachers time and help reduce burnout (
Hashem et al., 2024). Additionally, ChatGPT can provide instant feedback on student assignments, ranging from essays to problem-solving tasks, thereby improving workflow efficiency (
Farrokhnia et al., 2023). Despite its benefits, ChatGPT has limitations and raises concerns. For instance, its lack of contextual understanding can result in recommendations that are either too simplistic or overly complex (
Farrokhnia et al., 2023). Moreover, issues of academic integrity have emerged, raising ethical concerns about cheating and over-reliance on AI tools (
Farrelly & Baker, 2023). Bias in AI outputs and the risk of perpetuating stereotypes also pose challenges, particularly when the training data lacks representation of diverse populations (
Avraamidou, 2024;
Halaweh, 2023). Lastly, ChatGPT may produce “hallucinations” or incorrect information, underscoring the importance of human oversight (
Exintaris et al., 2023;
Mishra et al., 2023).
In its call for responsible GenAI use in education,
UNESCO (
2023, p. 15) highlights the problem of “unexplainable models used to generate outputs”. This means that, even though the overall approach of GenAI models may be understandable, “the specific models’ parameters, including their weights, are not transparent or easily inspected” (
Blonder & Feldman-Maggor, 2024, p. 5), particularly in explaining “how artificial intelligence systems make decisions, what data they use, and why they produce specific results” (
Blonder & Feldman-Maggor, 2024). Because these GenAI systems are opaque, teachers face the challenge of not knowing how outputs are generated. Therefore, when using GenAI tools as teaching assistants, it is imperative for teachers to critically evaluate the outputs based on their subject-related knowledge, pedagogical knowledge, and experience. This is necessary to prevent inaccuracies in scientific content and to ensure alignment with pedagogical theories, the curriculum, and student characteristics (
Feldman-Maggor et al., 2025). Consequently, teachers’ content knowledge and pedagogical knowledge provide a strong foundation for critically evaluating GenAI outputs and ensuring their responsible application in education. In this context, pedagogical content knowledge (PCK) could offer a framework for engaging with GenAI tools effectively (
Feldman-Maggor et al., 2025).
Several researchers have emphasized ChatGPT’s potential to assist teachers with tasks such as lesson planning and generating educational materials (
Cooper, 2023;
Moundridou et al., 2024;
Okulu & Muslu, 2024). However, while ChatGPT offers promising capabilities, evidence of its effectiveness in designing science lesson plans that align with research-based recommendations in Science Education remains limited. A critical issue is ensuring that AI-generated lesson plans are pedagogically sound and appropriate for specific educational contexts, such as primary school Science Education.
This study aims to address this gap by exploring ChatGPT-generated lesson plans through the lens of pedagogical content knowledge (PCK) (
Otto & Everett, 2013;
Shulman, 1986). The research question guiding this study is as follows: What are the characteristics of lesson plans created by ChatGPT in terms of PCK? Specifically, we investigate how different prompt engineering strategies and PCK-informed prompts influence the characteristics of ChatGPT-generated lesson plans. We hypothesize that incorporating PCK elements in prompts, using layer prompts (
Atlas, 2023), and providing reference texts to ChatGPT (
Blonder & Feldman-Maggor, 2024) will influence the characteristics of the generated lesson plans.
In the following sections, we discuss key topics related to research on ChatGPT and science lesson planning, the importance of prompt engineering as a critical skill when interacting with ChatGPT, and the PCK framework.
1.1. ChatGPT for Science Lesson Planning
In this section, we present findings from the literature on the use of ChatGPT in science lesson planning.
Cooper (
2023), based on a self-study methodology, investigated the ways educators could utilize ChatGPT in their science pedagogy, particularly for lesson planning. Regarding science lesson planning, the researcher instructed ChatGPT to create a teaching unit using the 5E model, specifically designed to challenge students with a strong understanding of renewable and non-renewable energy sources at a Year 7 level. Additionally, the chatbot was tasked with providing support and scaffolding for students struggling with the material. The researcher found that ChatGPT’s lesson plan mostly aligned with the 5E model, incorporating activities such as students sharing prior knowledge, engaging in group work, and participating in classroom debates as well as in self-assessment activities. While ChatGPT can be useful for generating ideas and serving as a starting point, the researcher emphasized that teachers should critically evaluate and adjust the output to fit the specific educational context, including students’ needs, school profile, curriculum, and available resources (
Cooper, 2023). However, the researcher only used a single prompt to create the lesson plan and did not engage in further conversation with ChatGPT for refinements.
Hashem et al. (
2024) investigated the effectiveness of ChatGPT as a teacher assistant to reduce workload and prevent burnout. The researchers, based on an exploratory research design, tested ChatGPT’s contribution to lesson planning for English, Science, and Math. They used several prompts to instruct ChatGPT in generating lesson plans. The lesson plans were analyzed based on the 5E instructional model, using a checklist created by the researchers that included descriptions of the phases of this model. In addition, qualitative feedback was recorded for aspects of the lesson plans that were either acceptable or required improvement. The initial prompt asked ChatGPT to design a lesson plan for an eighth-grade class focusing on square roots and cube roots within a 45 min class period. The researchers identified areas for improvement in student engagement, the use of visual aids, and the lack of strategies for addressing students’ misconceptions. In response, they instructed the model to revise the lesson plan, incorporating enhancements to align with the 5E instructional model. The updated plan adopted a more student-centered approach, integrating group work, real-world examples, and visual aids. Subsequently, the researchers asked ChatGPT to design a lesson plan for a tenth-grade chemistry class on decomposition reactions, to be completed within a 45 min period. Although no explicit instructions were given to use the 5E model, ChatGPT generated a lesson plan aligned with the 5E model, demonstrating its adaptability by recalling and integrating elements from the previous conversation into its responses. The authors concluded that “through the input of information on curriculum, learning objectives, learning theories, instructional models, and student requirements, ChatGPT can swiftly generate lesson plans and educational materials of high quality…” (
Hashem et al., 2024, p. 18). Additionally, the researchers noted that teachers can provide ChatGPT with specific lesson planning templates, enabling it to generate materials aligned with those guidelines. However, they emphasized that the initial prompt alone was insufficient for producing a high-quality lesson plan. The quality improved only after providing a series of more specific prompts. This finding suggests that the effectiveness of lesson plans generated by ChatGPT is highly dependent on the clarity and specificity of the prompts given. Therefore, teachers should craft thoughtful, task-specific prompts to achieve optimal results, engaging in a feedback loop to improve ChatGPT’s accuracy and relevance (
Hashem et al., 2024).
ElSayary (
2023) explored teachers’ perceptions of using ChatGPT as a supportive tool in teaching and learning through a survey of 40 teachers (grades 6–12), primarily from STEM fields, most of whom had received training on integrating ChatGPT into education. Seven teachers also participated in interviews. The findings revealed that lesson planning was the most prominent area where teachers found ChatGPT beneficial, a conclusion that was further reinforced by the interview responses. Teachers shared a range of experiences with using ChatGPT in lesson planning. Some highlighted positive outcomes, noting that the tool helped generate assessments aligned with learning objectives and provided useful examples of best practices to incorporate into their lessons. Others appreciated how ChatGPT saved time by handling routine tasks, which allowed them to focus more on individualized instruction, feedback, and assessment. They also found its ability to brainstorm ideas and provide lesson structures particularly useful. However, some teachers emphasized that ChatGPT should only be used as a supplementary tool, not a replacement for thoughtful planning and assessment. They stressed the importance of carefully reviewing and evaluating the content generated by the tool to ensure it aligns with instructional goals and objectives (
ElSayary, 2023).
1.2. Prompt Engineering
The quality of responses generated by GenAI tools is closely related to the input provided by the user (
Moundridou et al., 2024). Prompt engineering, according to
UNESCO (
2023, p. 11), “…refers to the processes and techniques for composing input to produce GenAI output that more closely resembles the user’s desired intent”. It is regarded as part of the technological dimension of the technological pedagogical content knowledge (TPACK) framework (
Feldman-Maggor et al., 2025). The TPACK framework emphasizes that teachers require specific types of knowledge to use technology creatively and effectively in their teaching (
Mishra et al., 2023). For educators to use GenAI tools effectively, prompt engineering is a critical skill (
Moundridou et al., 2024). Several suggestions have been proposed for prompt engineering.
UNESCO (
2023, p. 12) emphasizes that prompts are most effective when they “articulate a coherent chain of reasoning centered on a particular problem or a chain of thought in a logical order”. Additionally, using clear and accessible language, providing relevant examples to guide the desired responses, and offering sufficient context are essential for generating meaningful outputs. It is also important to refine and iterate prompts to improve the relevance of responses. Prioritizing ethical issues is necessary, avoiding prompts that may generate inappropriate, biased, or harmful content (
UNESCO, 2023).
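As a hypothetical illustration of these recommendations, compare a vague prompt with a refined one that adds context, an example, and a clear chain of reasoning; both prompts are our own and are not drawn from UNESCO (2023).

```python
# Illustrative contrast between a vague prompt and one refined along the
# recommendations above; both examples are our own.
vague_prompt = "Make a science lesson about floating."

refined_prompt = (
    "I teach a 5th-grade class of 25 students for one 60-minute period. "   # context
    "Design a lesson plan on floating and sinking that first elicits "
    "students' prior ideas, then lets them test objects in water, and only "
    "then introduces density qualitatively. "                                # chain of reasoning
    "For example, include an activity where students predict whether a "
    "heavy wooden block will float. "                                        # guiding example
    "Use clear, simple language suitable for 10-year-olds."                  # accessible language
)
```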
Atlas (
2023) also highlights useful strategies, such as instructing the model to respond from a specific persona or perspective, which can help generate answers more relevant to the purpose and audience. Another effective approach is the use of layer prompts, where step-by-step instructions guide the model. For instance, one might begin by requesting a summary of an article, followed by additional prompts to refine the summary based on specific factors, helping the model better align with the intended goal. In addition, the matrix method suggests first asking the model about a topic, and then following up with a question about how a different topic connects to it (
Atlas, 2023;
Okulu & Muslu, 2024). Furthermore,
OpenAI (
n.d.) outlines six key strategies, each paired with related tactics—practical ideas to implement—that can help improve the quality of outputs from large language models such as ChatGPT-4o. Firstly, similar to previous papers, OpenAI proposes writing clear instructions, including tactics such as adding details to prompts to generate more relevant answers, asking the model to adopt a specific persona, using delimiters to clearly indicate distinct parts of the input, providing examples, and specifying the desired length of the output. The second strategy, which has not been mentioned in previous papers, involves providing reference text. Given that language models can sometimes provide confident but incorrect answers, using reference texts helps the model answer with fewer fabrications. This is particularly important in the field of education, as supplying the model with trusted educational sources allows it to generate responses grounded in reliable information. However, this approach does not guarantee that the output will always be accurate, and human evaluation remains essential. This approach seems promising, as it resembles a technique used by
Blonder and Feldman-Maggor (
2024). In their study, the researchers asked ChatGPT to create a lesson on a chemistry topic. The initial output was incomplete, as it did not include the necessary mathematical and graphical representations. The results improved after a few interactions with ChatGPT, during which the researchers provided specific prompts instructing it to consider equations and graphs. Another improvement was achieved by providing ChatGPT with a PDF file of a scientific paper related to the lesson, which further enhanced the output (
Blonder & Feldman-Maggor, 2024). Additional strategies include breaking down complex tasks into simpler subtasks, allowing the model time to “think” before generating a response, using external tools, and systematically testing changes by evaluating the model’s outputs (
OpenAI, n.d.).
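To make these strategies concrete, the following minimal sketch combines three of them (persona, delimiters, and reference text) using the openai Python package. This is our illustration, not a procedure from the studies cited above; the file name, model choice, and prompt wording are hypothetical placeholders.

```python
# Minimal sketch: persona, delimiters, and reference text with the openai package.
# Assumes OPENAI_API_KEY is set; the file name and model are illustrative.
from openai import OpenAI

client = OpenAI()

# Hypothetical reference text, e.g., an excerpt from a trusted educational source.
reference_text = open("floating_sinking_excerpt.txt", encoding="utf-8").read()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # Persona: ask the model to answer from a specific perspective.
        {"role": "system",
         "content": "You are an experienced primary school science teacher."},
        # Delimiters (triple quotes) mark the reference text off from the task,
        # encouraging the model to ground its answer in the supplied source.
        {"role": "user",
         "content": (
             "Using only the reference text delimited by triple quotes, draft a "
             "60-minute 5th-grade lesson plan on floating and sinking, "
             "structured around the 5E instructional model.\n"
             f'"""{reference_text}"""'
         )},
    ],
)
print(response.choices[0].message.content)
```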
Moreover,
Nazari and Saadi (
2024) proposed a ChatGPT prompt development formula consisting of two levels: components and elements. The components include Task, Context, and Instructions (TCI), while the elements encompass Role, Audience, Tone, Examples, and Limits (RATEL). According to the researchers, this formula can aid in developing prompts tailored to specific needs, fostering creativity and personalization and reducing the need for post-processing (
Nazari & Saadi, 2024). Nonetheless, they emphasize the importance of critically evaluating all ChatGPT responses (
Nazari & Saadi, 2024). Similarly, from our perspective, ChatGPT’s outputs in educational contexts require careful critical evaluation by educators, as the process is not always straightforward. Achieving the desired results often requires multiple iterations of a prompt, with a strong emphasis on human oversight throughout the process (
UNESCO, 2023), including the use of the PCK framework to critically evaluate ChatGPT’s outputs (
Feldman-Maggor et al., 2025).
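As a concrete, hypothetical illustration of the TCI/RATEL formula, the sketch below assembles the three components (Task, Context, Instructions) and five elements (Role, Audience, Tone, Examples, Limits) into a single prompt; the wording of each field is our own example, not taken from Nazari and Saadi (2024).

```python
# Hypothetical TCI/RATEL prompt assembly; every field value is illustrative.
prompt_fields = {
    "Role": "Act as an experienced primary school science teacher.",
    "Task": "Design a lesson plan on floating and sinking.",
    "Context": "A 5th-grade class of 25 students; one 60-minute period; "
               "hands-on materials are available.",
    "Instructions": "Structure the lesson around the 5E model and include an "
                    "activity that elicits students' misconceptions.",
    "Audience": "Write for the teacher, with student-facing wording for activities.",
    "Tone": "Clear and practical.",
    "Examples": "Model the elicitation task on predict-observe-explain activities.",
    "Limits": "Keep the treatment of density qualitative; avoid formulas.",
}

# Join the labeled fields into one prompt string ready to paste into ChatGPT.
prompt = "\n".join(f"{label}: {text}" for label, text in prompt_fields.items())
print(prompt)
```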
1.3. Pedagogical Content Knowledge
PCK was introduced as an academic construct by
Shulman (
1986) as teacher-specific professional knowledge (
Chaitidou et al., 2018), emphasizing that a teacher’s professional expertise is an amalgam of content knowledge and pedagogical knowledge (
Shulman, 1986). As Shulman explained,
“Within the category of pedagogical content knowledge I include, for the most regularly taught topics in one’s subject area, the most useful forms of representation of those ideas, the most powerful analogies, illustrations, examples, explanations and demonstrations—in a word the ways of representing and formulating the subject that make it comprehensible to others… an understanding of what makes the learning of specific topics easy or difficult: the conceptions and preconceptions that students of different ages and backgrounds bring with them to the learning of those most frequently taught topics and lessons… Here, research on teaching and on learning coincide most closely. The study of student misconceptions and their influence on subsequent learning has been among the most fertile topics for cognitive research.”
(pp. 9–10)
In the same vein, recent studies, approaching PCK as “the knowledge of, reasoning behind, and planning for teaching a particular topic in a particular way for a particular purpose to particular students for enhanced student outcomes” (
Gess-Newsome, 2015, p. 36), emphasize that the core of PCK lies in pedagogical knowledge, content knowledge, knowledge of students, knowledge of curriculum, and knowledge of assessment (
Großmann & Krüger, 2024).
Otto and Everett (
2013) developed an innovative teaching strategy for introducing PCK concepts to pre-service elementary teachers by employing a three-circle Venn diagram for science lesson planning. The diagram featured overlapping circles representing Pedagogy, Content, and Context, with PCK depicted as the intersection of all three components. The Venn diagram highlighted three main components of PCK: Pedagogy, which refers to the primary teaching strategies chosen in the lesson plan; Context, which addresses the classroom and school environment; and Content, which refers to the learning objectives for the science topic. It also underscored the interplay between each pair of components. The Pedagogy/Context intersection focused on specific strategies to ensure all students in the classroom are effectively reached. The Pedagogy/Content intersection dealt with aligning appropriate teaching strategies with the Content being taught. Meanwhile, the Content/Context intersection concentrated on capturing students’ conceptions of the topic. Finally, the Venn diagram illustrated the overall integrative nature of PCK, summarizing how these components fit together into an effective science lesson. The Venn diagram was used by the pre-service teachers as an easily remembered graphic organizer that helped them design effective science lesson plans (
Otto & Everett, 2013).
Lesson planning is an essential aspect of teachers’ professional competence, involving the creation of lesson plans as well as the description and justification of their pedagogical decisions based on their PCK (
Großmann & Krüger, 2024;
Zaragoza et al., 2024). In the era of GenAI, with tools like ChatGPT capable of producing lesson plans, PCK appears to serve as a valuable framework for enhancing responsible interactions with ChatGPT when used to assist in designing lesson plans. For example,
Feldman-Maggor et al. (
2025) demonstrated how a teacher used their PCK to craft effective prompts and identify inaccuracies in ChatGPT-generated content. Specifically, in the context of teaching the differences between molecular and ionic materials, a chemistry teacher engaged in an iterative dialogue with ChatGPT, seeking strategies for teaching this concept. The teacher’s PCK was essential in evaluating ChatGPT-generated responses. The teacher identified misconceptions that ChatGPT failed to address and provided prompts to refine the model’s output. The teacher also noticed that ChatGPT did not always generate responses with accurate chemical writing and recognized the need to correct them before presenting the information to students. Therefore, we argue that, on the one hand, PCK provides a knowledge base for creating prompts to guide ChatGPT in lesson planning, while on the other hand, it can be used to critically evaluate and justify the pedagogical soundness of the outputs, i.e., the lesson plans.
3. Results
In this section, we present our results, following the seven categories that were identified: Pedagogy, Content, Context, Pedagogy/Content, Pedagogy/Context, Content/Context, and PCK.
3.1. Content
The category of Content was informed by two subcategories identified during the coding process: declarative knowledge and procedural knowledge.
Regarding declarative knowledge, across the interactions, the concept of density and its role in floating and sinking phenomena was included in the learning objectives. Examples of these learning objectives were as follows: “Understand the concept of density and how it determines whether an object floats or sinks” (first interaction); “Students will use the concept of density and density comparisons to predict floating or sinking in water” (second interaction); “Understand that density, not weight or size, determines whether an object floats or sinks” and “Correct misconceptions, such as heavier objects always sink” (third interaction); and “Address common misconceptions, such as the belief that heavier objects always sink” and “Use the concept of density to predict whether objects will float or sink in water” (fourth interaction).
As for procedural knowledge, the integration of procedural learning goals varied between the interactions. Even though the prompt in the first interaction did not include any reference to procedural knowledge, ChatGPT provided a learning goal related to carrying out investigations: “Predict and test whether various objects will float or sink”. The other three interactions included prompts that explicitly addressed aspects of procedural knowledge, and ChatGPT provided answers related to the control of variables strategy. For instance, the second interaction suggested that “Students will practice controlling variables in an experiment”, while the third proposed the following: “Practice scientific skills, such as controlling variables and recording data”. In the fourth interaction, where a related scientific paper was uploaded to ChatGPT, the learning objective for the control of variables strategy was explicitly connected to the phenomena of floating and sinking: “Practice the control of variables strategy to investigate how specific factors influence floating and sinking”. Additionally, a notable difference was observed in the third interaction, where a quantitative aspect of measurement was introduced, specifically the calculation of density: “Calculate the density of objects using their mass and volume”.
3.2. Context
The category of Context consisted of three subcategories observed in the data: the focus of the lesson plans on the students’ grade level, the time constraints, and the resources needed to implement the lesson plan.
Regarding the students’ grade level, the prompts provided in all four interactions specified a particular grade of primary school students. In each interaction, ChatGPT’s responses explicitly referred to this grade level using slightly different wording, such as “Grade Level: 5th Grade” (first interaction) and “Grade: 5th Primary” (third interaction).
Concerning time constraints, the prompts provided did not specify a time limit for the lesson plans. In the first three interactions, ChatGPT suggested implementing the lesson plans within 60 min, as indicated by responses such as “Duration: 60 min” (third interaction). However, in the fourth interaction, after a reference paper about teaching floating and sinking was provided to ChatGPT, the suggested duration doubled to “Duration: 2 × 60 min sessions”. Furthermore, after a follow-up conversation that included additional prompts, the duration was further increased to 180 min: “Duration: 3 × 60 min sessions”. Notably, the same prompts were used in both the third and fourth interactions, but the duration increased only in the fourth interaction, where the primary difference was the provision of the reference paper to ChatGPT.
In relation to the resources needed to implement the lesson plan, all four interactions included specific lists of materials required, both for conducting experimental activities and for helping students represent the concepts. Examples of these resources included the following:
“Materials: transparent water containers; objects of different materials, shapes, and sizes (e.g., metal spoon, wooden block, plastic bottle cap, rock, hollow ball); modeling clay; digital scale; ruler; worksheets with visuals and minimal text; visual aids (e.g., ‘dots-in-a-box’ diagrams to represent density); access to a digital simulation of floating and sinking (optional); chart paper and markers.”
(Fourth interaction)
3.3. Pedagogy
Regarding Pedagogy, the 5E instructional model was consistently used to structure the lesson into distinct phases, each with a specific purpose. The lesson plans appropriately included the following phases: Engage, Explore, Explain, Elaborate, and Evaluate. In most answers, the activities provided were aligned with the purpose of each phase. However, some differences were noted across the interactions. In particular, it was observed that although the Engage phase is primarily aimed at enhancing engagement, eliciting prior knowledge, and informally identifying misconceptions related to the content taught, some ChatGPT answers suggested providing students with explanations related to the content. This approach is inconsistent with the purpose of the Engage phase in the 5E model. For example, ChatGPT suggested introducing density as “how tightly packed the material in an object is… The ball is large but not tightly packed (low density). The key is small but tightly packed (high density)”. This approach, which provides an explanation during the Engage phase and prior to the Exploration phase, seems to align more closely with a confirmatory inquiry model, where the theory is presented first and then tested. This contrasts with a guided inquiry approach, where students explore materials and tools, conduct experiments, and use evidence to construct their own explanations, as outlined in the 5E instructional model (
Bybee, 2014).
3.4. Pedagogy/Content
The Pedagogy/Content overlap refers to the alignment of appropriate teaching strategies with the content. Five subcategories were identified: a qualitative approach, a quantitative approach, a density-based approach, a buoyancy-based approach, and a control of variables strategy.
The first, second, and third interactions incorporated both a quantitative approach, which was dominant, and a qualitative approach. Concerning the quantitative approach, these interactions included quantitative explanations of density, which were based on the mathematical formula d = m/v. For example, in the first interaction, ChatGPT suggested using this mathematical calculation to explain the phenomena of floating and sinking: “Use simple math (e.g., ‘Density = Mass ÷ Volume’) to explain why objects float if their density is less than the density of water (~1 g/cm³)” (first interaction).
The second and third interactions further suggested that students calculate the density of several objects and predict if they float or sink.
“Provide materials to measure mass (scale) and volume (graduated cylinder or water displacement method). Assign groups to calculate the density of one or two objects. Use the formula: Density = Mass/Volume. Have students predict whether the objects will float or sink based on their calculated densities. Test their predictions.”
(Second interaction)
“Explain that water has a density of approximately 1 g/cm³: Objects with a density less than 1 g/cm³ float. Objects with a density greater than 1 g/cm³ sink. Show how to measure mass using a scale and volume using water displacement or geometric calculations (for regular shapes)… Hands-On Mini Activity: Demonstrate with a small object (e.g., a rubber ball): Measure its mass on a scale. Measure its volume using water displacement (submerge it in a graduated cylinder and record the difference). Calculate density and determine if it will float or sink.”
(Third interaction)
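For readers unfamiliar with the rule these excerpts apply, a brief worked example (with illustrative numbers of our own, not drawn from the transcripts): a wooden block of mass 60 g and volume 100 cm³ has

$$d = \frac{m}{V} = \frac{60\ \mathrm{g}}{100\ \mathrm{cm}^3} = 0.6\ \mathrm{g/cm^3} < 1\ \mathrm{g/cm^3},$$

so it floats, whereas a metal key of mass 12 g and volume 1.5 cm³ has $d = 8\ \mathrm{g/cm^3} > 1\ \mathrm{g/cm^3}$ and sinks.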
Regarding the qualitative approach, the first, second, and third interactions included suggestions to explain density in a qualitative way. These suggestions relied on short, simple explanations, such as: “Mini-Lecture (with visuals): Define density as the amount of ‘stuff’ (mass) packed into a given space (volume)” (second interaction). During the third interaction, ChatGPT was explicitly asked to adopt a qualitative approach. While the revised lesson plan did adopt a qualitative approach and was based on the idea of being “light for its size”, the examples provided were not based on visuals or representations that could facilitate causal reasoning for explaining floating and sinking.
“Explain (15 min) Objective: Reinforce the relationship between density and floating/sinking in a qualitative manner. Teacher Explanation: What is density? Use relatable examples: “Think of a loaf of bread versus a brick of clay. The clay is tightly packed and heavy for its size, so it sinks. The bread is full of air pockets and light for its size, so it floats.” Explain that objects float when their material is less dense than water and sink when their material is more dense than water. Group Discussion: Look at recorded observations. Ask: “What do the floating objects have in common?” “What do the sinking objects have in common?” Address Misconceptions: Clarify that weight and size do not determine floating or sinking—density does.”
(Third interaction)
In the fourth interaction, where a reference paper on teaching floating and sinking was provided to ChatGPT, the approach adopted was qualitative, utilizing a specific representation from the literature known as the “dots-in-a-box” model. ChatGPT suggested an activity aimed at developing a conceptual understanding of density and its role in floating and sinking. The activity involved visualizing density by illustrating dots within a cube, with a greater number of dots representing higher density. This approach could facilitate the development of causal relational reasoning.
“Explain. Objective: Develop conceptual understanding of density and its role in floating/sinking. Activity: Visualizing Density. Use the “dots-in-a-box” representation to explain density qualitatively. Heavier for its size: Show dots in a cube, with more dots representing higher density. Explain the causal relational FS rule: If an object’s density is less than water’s, it floats; otherwise, it sinks.”
(Fourth interaction)
In addition, when prompted to describe in detail an activity for teaching the concepts of floating, sinking, and density to primary school students, ChatGPT’s response included specific examples for qualitatively explaining these phenomena.
“Step 3: Introduction to Density with Visual Aids Present the concept of density using “dots-in-a-box” diagrams: Use a dense material like metal to show dots tightly packed together. Use wood or plastic to show dots more spread out. Explain that density is the relationship between how much “stuff” (mass) an object has in a given space (volume). Relate density to floating and sinking: Objects less dense than water float. Objects denser than water sink. Step 4: Connecting Observations to the Concept Lead a class discussion to connect their observations to density… Discuss why a metal spoon sinks while a large wooden block floats, despite the block being bigger (density vs. weight). Highlight the causal rule from the uploaded paper: If an object’s density is less than water’s density, it floats; otherwise, it sinks. Step 5: Reflect and Explain: Have students complete the worksheet with questions such as: Why do you think some objects floated and others sank?… What have you learned about how density explains floating and sinking? Encourage students to share their answers and reasoning with the class.”
(Fourth interaction)
With regard to the distinction between density-based and buoyancy-based explanations, the former uses an elimination-of-variables approach, focusing on identifying the factors that influence floating and sinking phenomena to derive a rule for predicting whether an object will float or sink. In contrast, buoyancy-based explanations rely on an equilibrium mechanism to explain how an object floats. In the first, second, and third interactions, ChatGPT proposed both density-based and buoyancy-based explanations. For instance, it suggested the following: “Emphasize the importance of both density and buoyancy in understanding floating and sinking” (third interaction). Moreover, within buoyancy-based explanations, Archimedes’ Principle was proposed as a concept to be introduced: “Introduce the idea of Archimedes’ Principle for advanced students” (first interaction). Regarding density-based explanations, these included both the quantitative approach, which involved the mathematical calculation of density to determine whether objects float or sink, and the qualitative approach, which used the “dots-in-a-box” model, as presented in the previous paragraph.
Concerning the control of variables strategy, this was evident in the second, third, and fourth interactions, appearing only after it was explicitly included as an instructional goal in the initial prompt of each interaction. However, the suggested activities related to the control of variables strategy varied in accuracy. Some were characterized as less accurate, such as when ChatGPT proposed brief activities that lacked structured instructions to guide students in controlling variables. For instance, “Use an example (e.g., a clay ball) to demonstrate controlling variables. Flatten the ball into a disc and retest. Ask: Does changing the shape affect whether it floats or sinks? Why?” (second interaction).
A more accurate suggestion was provided by ChatGPT in the fourth interaction, after a specific prompt asked ChatGPT to describe an activity that explicitly teaches students how to test whether a specific variable influences floating or sinking—ensuring that only this variable is changed, with a focus on size.
“Activity: Testing the Effect of Size on Floating or Sinking Objective: Teach students how to systematically test the effect of size on floating or sinking by changing only the size of an object while keeping other variables (e.g., material, shape, and mass per unit volume) constant. Materials: Transparent container filled with water Clay or modeling material with a uniform density, Digital scale (to ensure consistency in mass per unit volume), Ruler (for measuring dimensions), Worksheets for recording observations, Visual aids to illustrate the relationship between size, volume, and density. Procedure: Step 1: Introduction Begin with a class discussion: Does the size of an object influence whether it floats or sinks? Why or why not? Explain the importance of testing one variable at a time and keeping all other variables controlled. Introduce the variable to be tested: size. Step 2: Preparing for the Experiment Control Variables: Use the same material for all objects (e.g., clay or modeling material with uniform density). Keep the shape of the objects constant (e.g., all objects shaped as spheres). Ensure the material’s density remains constant. Independent Variable: Change only the size of the object (e.g., small, medium, and large spheres). Dependent Variable: Observe whether the object floats or sinks. Step 3: Conducting the Experiment Provide students with clay and a digital scale. Guide students to: Roll the clay into three spheres of different sizes (e.g., a small, medium, and large sphere)…”
(Fourth interaction)
3.5. Pedagogy/Context
The Pedagogy/Context overlap focuses on the specific strategies required to reach all students in the classroom effectively. For this category, four subcategories were identified in the lesson plans: specific strategies to address students’ misconceptions, strategies that facilitate differentiation, different modes of representation that facilitate differentiation, and resources needed for the activities.
Concerning the specific strategies to address students’ misconceptions, the first and second interactions did not include activities explicitly suggested by ChatGPT to target specific misconceptions. Such activities only emerged after prompts were provided in the third and fourth interactions, explicitly requesting the following: “Describe in detail an activity aimed at helping students address a specific misconception about floating and sinking”. In response, ChatGPT proposed activities that directly targeted specific misconceptions, offering strategies to help students refine their understanding of floating and sinking phenomena. Specifically, in the third and fourth interactions, ChatGPT suggested experimental activities designed to challenge the misconception that an object’s weight determines whether it floats or sinks. These activities guided students to recognize a conflict between their initial beliefs and the observed behavior of objects during experimentation.
“Activity: Investigating the Misconception “Heavier Objects Always Sink” Title: “Does Weight Determine Floating or Sinking?” Objective: To help students confront and address the misconception that heavier objects always sink, by engaging them in a controlled, inquiry-based activity… Use the scale to measure and record the weight of each object. Write the weights on a chart for everyone to see. Testing in Water: One by one, place the objects into the container of water. Observe whether each object floats or sinks. Record the results next to the corresponding predictions and weights… Discuss observations as a class. Prompt students to identify patterns: “Did the heaviest object always sink?” “Did the lightest object always float?… Introduce the concept of density without fully explaining it yet: “What other factors besides weight might be influencing floating or sinking?”… This will challenge their existing misconception and create cognitive dissonance, motivating them to seek a deeper understanding of the role of density. Key Notes: This activity focuses solely on helping students confront their misconception by comparing their predictions to actual outcomes. It does not yet provide explanations or introduce the concept of density in detail, leaving space for follow-up lessons to address these concepts.”
(Third interaction)
Regarding strategies that facilitate differentiation, relevant suggestions were only provided during the third and fourth interactions. This occurred after a specific prompt explicitly requested strategies to support all students in the classroom, particularly those struggling with reading. In response, ChatGPT proposed several approaches to fostering an inclusive learning environment, relating to flexible grouping, learning stations, and digital technology.
Flexible grouping was proposed, allowing students to collaborate with peers of varying abilities, thereby supporting students with reading difficulties.
“Pair struggling readers with peers for collaborative experiments and discussions. Why It Helps: Students can learn from their peers’ explanations and teamwork reduces individual pressure. Implementation: Assign specific roles (e.g., recorder, tester) to ensure participation without relying on reading-heavy tasks. Encourage discussion about observations in their own words.”
(Third interaction)
“Incorporate Group Activities Collaborative Experiments: Pair students with reading difficulties with peers who can assist in completing written components or interpreting instructions. Discussion Prompts: Facilitate group discussions where students verbally share their predictions and observations instead of relying solely on written responses.”
(Fourth interaction)
Learning stations were also proposed, where students could use a collection of materials to explore the science topic at their own pace. For example, in the fourth interaction, ChatGPT suggested the following: “Water Play Stations: Set up small water play stations where students can test objects without pressure. Encourage exploration and experimentation” (fourth interaction).
Additionally, ChatGPT recommended incorporating digital technology to present interactive and accessible content, ensuring that students with reading challenges could fully engage with the lesson. For instance, it suggested the following: “Provide text-to-speech options for any required reading materials” (third interaction) and “Audio Support: Provide audio explanations or interactive video lessons for students to follow along” (fourth interaction).
Moreover, ChatGPT suggested using different modes of representation to support various learning styles (auditory, visual, and kinesthetic). For example, it proposed the following:
“Use simple graphics to explain density as “how tightly packed” material is, avoiding complex text.”
(Third interaction)
“Use videos, animations, or interactive simulations to illustrate concepts. Why It Helps: These tools can visually represent abstract ideas like density and buoyancy, removing the reading barrier. Implementation: Show a short video explaining why objects float or sink (e.g., comparing objects of different densities).”
(Third interaction)
“Kinesthetic Activities: Allow students to act out density concepts, such as pretending to be “tight-packed dots” for dense materials or “spread-out dots” for less dense materials.”
(Fourth interaction)
“Visual Aids: Use diagrams or animations to explain density, such as “dots-in-a-box” to represent mass relative to volume. Show that dense materials have tightly packed dots, while less dense materials have spread-out dots.”
(Fourth interaction)
Concerning resources needed for the activities, all four interactions included suggestions for providing specific physical materials such as wooden blocks, weighing scales, and water containers.
“Divide students into small groups. Provide each group with materials: a clear container of water, a variety of objects (e.g., metal key, wooden block, clay ball, plastic lid), and a worksheet for recording predictions and observations.”
(Second interaction)
Only in the fourth interaction was a digital simulation proposed as a resource to help students investigate floating and sinking.
“Simulation: Use a digital tool or simulation to explore objects floating in various liquids (e.g., oil, glycerin). Students predict outcomes using the density comparison rule and confirm using simulations.”
(Fourth interaction)
3.6. Content/Context
The Content/Context overlap focuses on activities that provide teachers with insights into their students’ prior knowledge and misconceptions about a science topic. This category includes two subcategories: simple/short activities to elicit students’ prior knowledge/misconceptions and structured activities to identify students’ prior knowledge/misconceptions.
Regarding the simple activity subcategory, these activities rely on short discussions rather than structured tasks. For instance, in the second interaction, ChatGPT proposed the following:
“Begin with a demonstration: Hold up a variety of objects (e.g., a plastic bottle, a coin, a sponge, and a stone). Ask students to predict whether each object will float or sink when placed in a clear container of water. Discussion: Why do you think some objects float while others sink? Does the size or shape of the object matter?”
(Second interaction)
Similarly, in the third interaction, ChatGPT suggested the following:
“Capture students’ attention and uncover pre-existing misconceptions. Activity: Begin with a demonstration. Hold up a variety of objects (e.g., a sponge, a coin, a wooden block, and a small sealed bottle). Ask students: ‘Will this float or sink? Why?’ Students make predictions and briefly explain their reasoning. Discussion Prompts: ‘Why do you think some objects float and others sink?’ ‘Does the size, shape, or weight of an object matter?’ Purpose: This activity reveals misconceptions such as ‘heavier objects always sink’ or ‘bigger objects always float’.”
(Third interaction)
Concerning the structured activity subcategory, activities proposed by ChatGPT to identify students’ prior knowledge and misconceptions emerged only after a specific prompt was provided during the third and fourth interactions. The prompt explicitly requested the following: “Describe in detail an activity to identify students’ misconceptions about floating and sinking without including any steps aimed at addressing or correcting these misconceptions”. ChatGPT’s response included structured activities involving worksheets and concept maps that challenged students to make predictions and explain their reasoning about whether various objects would float or sink in water. The activity incorporated an experimental procedure using objects made of different materials and shapes. It also included questions explicitly designed to address common misconceptions about floating and sinking. Students were encouraged to discuss disagreements within their groups and record differing viewpoints. An example from the fourth interaction is as follows:
“Activity: Identifying Students’ Misconceptions About Floating and Sinking Objective: To identify students’ prior knowledge and misconceptions about the principles of floating, sinking, and density. Materials: Transparent container filled with water; Objects of various materials and shapes (e.g., wooden block, metal spoon, plastic bottle cap, inflated balloon, rock); A worksheet with open-ended questions; Chart paper and markers; Procedure: Introduction and Predictions: Present the transparent water container and a variety of objects. Ask students to individually predict whether each object will float or sink and to provide a reason for their prediction. Record their predictions on the worksheet… Have students share their predictions and reasoning with their group members… Testing Objects: In their groups, students test each object by placing it in the water. Ask students to observe and record the outcomes of each test on their worksheet… Facilitate a class discussion where groups explain why they think certain objects floated or sank… Concept Mapping: Provide chart paper and markers to each group. Ask groups to create a concept map linking factors they believe affect floating and sinking…”
(Fourth interaction)
3.7. PCK
This category includes assessments proposed by ChatGPT to inform teachers about the effectiveness of the lesson plans in addressing students’ declarative and procedural knowledge. More assessment suggestions focused on declarative knowledge, while fewer targeted procedural knowledge.
“Evaluate (5 min) Objective: Assess understanding of concepts and skills. Quick Assessment: Pose reflection questions: What factors determine if an object floats or sinks? How does density relate to floating and sinking?”
(Declarative knowledge, second interaction)
“Evaluate Objective: Assess understanding through practical application. Assessment Task: Provide scenarios involving floating and sinking (e.g., predicting outcomes for new objects or liquids). Students explain their reasoning using the causal relational FS rule and density concepts. Reflection: Students write or discuss what they found most surprising and how their understanding changed.”
(Declarative knowledge, fourth interaction)
“Assessment Rubric: Inquiry Skills: Did students formulate and test predictions?… Experimental Skills: Did students control variables effectively?”
(Procedural knowledge, second interaction)
In addition, in this category, we summarized how all components fit together to form an effective lesson. This involves synthesizing aspects already coded under Pedagogy, Content, Context, and their overlaps to determine whether the Pedagogy and Content are appropriate for the specific Context, i.e., the students’ grade level.
Specifically, in the Pedagogy/Content category, both a qualitative and a quantitative approach were proposed for teaching floating and sinking. However, when considering the Context, particularly the grade level of the primary school students targeted, the qualitative approach appears to be more appropriate, based on the relevant literature (
Zoupidis et al., 2021). Notably, only in the fourth interaction, where a specific reference paper was provided to ChatGPT, was the “dots-in-a-box” model used. This model could help students develop causal relational reasoning in a qualitative way by comparing the densities of an object and a liquid to determine whether the object would float or sink, e.g., “Activity: Visualizing Density Use ‘dots-in-a-box’ diagrams: Dense materials (e.g., metal) have tightly packed dots. Less dense materials (e.g., wood) have widely spaced dots. Explain that an object’s ability to float or sink depends on its density relative to water.” (Fourth interaction).
In addition, the Pedagogy/Content category included activities that focused on either a density-based approach or a buoyancy-based approach. When considering the Context, specifically the grade level of primary school students, the literature suggests adopting a density-based approach within the framework of the elimination of variables approach. For example, “Students will learn to: Change only one variable (size) while controlling others (material, shape, density). Understand that floating or sinking is determined by an object’s density relative to water, not size alone.” (Fourth interaction). This approach is considered more appropriate for the students’ grade level, as the buoyancy-based approach requires the understanding of several intermediate concepts, such as the forces of buoyancy and gravity, which are more complex to teach (
Zoupidis et al., 2021).
Table 2 summarizes the main findings, illustrating how different interactions with ChatGPT influenced the characteristics of lesson plans in terms of PCK aspects.
4. Discussion
This study addresses the need for a responsible integration of generative artificial intelligence in education, particularly in the context of science lesson planning. Although prior research has examined the potential of ChatGPT for generating lesson plans (
Cooper, 2023;
Moundridou et al., 2024;
Okulu & Muslu, 2024), limited attention has been given to evaluating these outputs through established pedagogical frameworks. To fill this gap, this study adopts a sound theoretical framework rooted in Science Education, specifically PCK (
Chaitidou et al., 2018;
Otto & Everett, 2013). PCK provides a lens for examining the integration of content knowledge, pedagogical knowledge, and context knowledge, as well as their overlaps that guide effective teaching. Additionally, prompt engineering, which is a key component of TPACK in the age of AI (
Feldman-Maggor et al., 2025), plays a crucial role in enhancing the quality of ChatGPT-generated outputs. Generating effective lesson plans requires multiple iterations of a prompt, with careful refinement and strong human oversight throughout the process (
UNESCO, 2023). In this study, PCK served a dual purpose: it guided the design of prompts used to interact with ChatGPT and provided the framework for analyzing the resulting lesson plans. Four distinct interactions with ChatGPT were conducted to design the lesson plans. The first and second interactions were based on a single prompt, differing in the PCK aspects included. The third and fourth interactions, by contrast, involved layer prompts, where step-by-step instructions guided the model based on detailed PCK-aligned elements. The key distinction between the two was that, in the fourth interaction, a scientific paper was provided, and ChatGPT was instructed to base its responses on the information from that source.
Our results show that incorporating PCK elements and layer prompts can enhance the quality of ChatGPT-generated lesson plans. Layer prompts enhanced the alignment of ChatGPT’s responses with PCK. During the third interaction, for instance, prompts specifically requested strategies to address misconceptions about floating and sinking, a well-documented challenge in Science Education. The resulting lesson plan included structured activities that aimed to identify misconceptions as well as proposed ways to address them. Similarly, the fourth interaction demonstrated how the inclusion of reference materials improved the accuracy of ChatGPT’s outputs. For example, the “dots-in-a-box” model suggested in the fourth lesson plan provided a qualitative way to approach floating–sinking and density, aligning well with research-based suggestions (
Zoupidis et al., 2021).
The overlap of Pedagogy/Content was particularly evident in the analysis. ChatGPT’s early responses often relied on a quantitative approach, such as using the formula d = m/v, which, while accurate, may not be suitable for primary school students (
Zoupidis et al., 2021). However, layer prompts as well as providing ChatGPT with reference text helped guide the model toward more developmentally appropriate, qualitative strategies, such as visual representations. The fourth interaction, in particular, demonstrated how detailed guidance could shift ChatGPT’s focus to more age-appropriate suggestions.
The Pedagogy/Context overlap revealed that layer prompts improved ChatGPT’s ability to generate effective strategies for addressing student misconceptions and learning diversity. While initial interactions lacked specific activities for tackling misconceptions, later prompts elicited inquiry-based tasks, such as challenging the belief that “heavier objects always sink” (
Zoupidis et al., 2021). Differentiation strategies also emerged, including flexible grouping, learning stations, and digital tools to reach all students in the classroom (
Tobin & Tippett, 2014;
Tomlinson, 2001).
The Content/Context overlap revealed ChatGPT’s ability to design activities for uncovering students’ prior knowledge and misconceptions. Without detailed prompts, the model suggested generic activities. However, when guided by layer prompts, it generated more structured and targeted activities, such as concept mapping, which are better suited for addressing misconceptions (
Yin et al., 2014). This finding supports previous research that emphasizes the need for clear and iterative prompts to produce more contextually relevant and pedagogically sound AI outputs (
Blonder & Feldman-Maggor, 2024).
Despite these affordances, challenges emerged. ChatGPT’s responses varied in depth and accuracy. One notable issue was related to assessment strategies, as most of ChatGPT’s suggestions focused primarily on summative assessments for declarative knowledge, rather than incorporating formative assessments that provide feedback to support student learning and inform teaching adjustments (
Yin et al., 2014). This limitation may be attributed to the fact that our prompts did not explicitly instruct the model to generate formative assessment strategies. Another issue was that ChatGPT generated lesson plans that incorporated both qualitative and quantitative approaches for teaching floating and sinking. However, considering the context, particularly the grade level of the primary school students, the literature recommends the qualitative approach over the quantitative one (
Zoupidis et al., 2021). Additionally, without specific guidance, ChatGPT occasionally introduced advanced concepts, such as buoyancy-based explanations and Archimedes’ Principle, which may not be appropriate for primary school learners (
Zoupidis et al., 2021). This is a crucial point, as the literature highlights that generative AI tools can produce inaccurate information (
Exintaris et al., 2023;
Mishra et al., 2023). Therefore, teachers need to be trained to be aware of these limitations. Understanding how these tools are developed (
Mishra et al., 2023), along with the disconnect between how GenAI models “appear” to understand the text they generate and the reality that they lack true comprehension of language and the real world (
UNESCO, 2023), is essential if teachers are to recognize these tools’ tendency to generate inaccurate information. Familiarity with these constraints may help teachers critically evaluate AI-generated content in educational settings (
Mishra et al., 2023), ensuring it is appropriate for their students and aligned with the curriculum (
Blonder & Feldman-Maggor, 2024;
UNESCO, 2023).
Recommendations
Based on the findings, we propose guidelines for the responsible use of ChatGPT to support teachers in co-designing lesson plans for Science Education (
Figure 2).
Figure 2 illustrates a teacher–generative AI co-design process (
UNESCO, 2023), where educators guide ChatGPT by creating prompts grounded in PCK and iteratively refining them to improve the quality of the generated lesson plans. The teacher begins by designing prompts that integrate key components of PCK. Among prompt engineering strategies, which constitute a TPACK aspect (
Feldman-Maggor et al., 2025), the approach identified as effective in this study is the use of layer prompts in a conversational manner. These prompts include clear, step-by-step instructions that integrate PCK elements. Additionally, we propose providing ChatGPT with reference texts, such as scientific papers, relevant to the content being taught. ChatGPT processes the prompt and generates an initial output, which is then critically evaluated by the teacher using their PCK. Through an iterative process, the teacher refines the prompts to address specific aspects, such as misconceptions or the alignment of activities with the students’ grade level. The teacher then determines which activities in the lesson plans are suitable for instruction and makes further adjustments to tailor the content to their students’ specific context, resulting in the final lesson plan.
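Schematically, this loop can be summarized in the sketch below. It is a conceptual outline of Figure 2 under our own naming, not an implementation from the study: send_to_chatgpt and teacher_review are hypothetical stand-ins for the chat interaction and the teacher’s PCK-based evaluation.

```python
# Conceptual sketch of the teacher-GenAI co-design loop in Figure 2.
# All names are hypothetical; the chat interface itself is abstracted away.
from typing import Callable, Optional

def co_design_lesson_plan(
    layer_prompts: list[str],
    send_to_chatgpt: Callable[[str], str],
    teacher_review: Callable[[str], Optional[str]],
) -> str:
    """Walk through PCK-grounded layer prompts; after each output, the teacher
    either returns a refinement prompt or None to accept the current draft."""
    draft = ""
    for prompt in layer_prompts:             # step-by-step, PCK-informed instructions
        draft = send_to_chatgpt(prompt)
        refinement = teacher_review(draft)   # critical evaluation using PCK
        while refinement is not None:        # e.g., "address the misconception
            draft = send_to_chatgpt(refinement)  # that heavier objects always sink"
            refinement = teacher_review(draft)
    return draft  # final plan, still to be tailored to the class context
```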
Further research could focus on designing activities within the context of a university course, following these guidelines, to train primary school teacher students in the responsible use of GenAI for science lesson planning through the lens of PCK. This includes investigating how primary school teacher students utilize ChatGPT in science lesson planning after being introduced to prompt engineering strategies and the PCK framework, combining multiple data sources such as ChatGPT-assisted lesson plans created by teacher students and insights from focus group discussions.