1. Introduction
Large language models (LLMs) are text-generating AI systems whose use is spreading across several areas of research and development. The literature about real implementations of LLMs in everyday activities started to appear in 2019, following the release of GPT-2, a precursor of the models behind ChatGPT [
1,
2]. However, most of the publications and systematic reviews about LLMs and LLM-based tools started to be available in 2023, as witnessed in databases such as SCOPUS and WOS [
3,
4,
5,
6,
7,
8] (PRISMA3).
The advent of LLMs can be considered a revolution at both the educational and professional levels. University courses must therefore approach LLMs correctly so that students, as future engineering professionals, are able to handle these new technologies properly and conscientiously and give research, development, and innovation an effective boost. As engineering educators at different levels (undergraduate, postgraduate, etc.), we have been introducing LLMs in our courses since 2021 from both theoretical and practical points of view [
9,
10]. Although the help provided by LLMs has been evident from the beginning, our approach was initially empirical because adoption guidelines and best practices were not available. In the last three years, meaningful information has been generated and made available. Although the most-cited systematic reviews of the LLM literature cover domains such as medicine [
5,
11], industry or robotics [
7,
12], and education [
8,
13], reviews focusing on LLMs’ involvement in undergraduate/postgraduate engineering education do not yet seem to have appeared in the literature. Furthermore, suggestions and guidelines for best practices in involving LLMs in everyday educational activities are still missing. These gaps prompted us to analyze the literature while focusing only on this domain. The investigation spanned several dimensions, from the areas that involve LLMs to the engineering disciplines in which they are most present, as well as how this involvement takes place and which LLM-based tools are used, if any.
Under these premises, the initial research question—RQ0—was defined as follows.
RQ0. What is the current status of LLMs’ involvement in engineering education?
The term “involvement” was carefully chosen, as was the generality of RQ0, in order to capture as much as possible of the presence of LLMs in the engineering education activities reported in the literature. Using narrower terms such as “implementation” or “adoption”, or defining a more focused question, would have unnecessarily restricted the scope of the review a priori. We decided to answer RQ0 through a focused, systematic review [
14]. Moreover, given the availability of guidelines and checklists for making systematic reviews as rigorous and replicable as possible, we mapped this systematic review to the PRISMA checklist [
15]. For this reason, labels of the form “PRISMAn” appear throughout the article; each refers to item n of the PRISMA checklist, as reported in
Appendix A.
Once we defined the general research question (RQ0), we built queries for the selection of articles from the literature, collected them, developed and applied inclusion/exclusion criteria, read the articles, and analyzed the data. The results provide a clear overview of the current involvement of LLMs in engineering education. These results, in turn, will help define structured ways to involve LLMs and to measure their effectiveness as time progresses (PRISMA4). The Discussion section also offers practical suggestions for involving LLMs in engineering education activities.
This article opens with the Materials and Methods section, which describes the research background and approach. The activities conducted as part of the systematic review are described in the next section. Then, the Results section reports the review’s outcomes, and the following Discussion section analyzes them critically and offers suggestions about the use of the research results in undergraduate and postgraduate engineering courses. The conclusion, which also contains some research perspectives, closes the study.
3. Systematic Review
As mentioned before, this systematic review was planned by following the PRISMA checklist. This helped define the scope of the review, identify key research questions, establish inclusion and exclusion criteria, process data, and formulate the outcomes. This review is not registered (PRISMA24a). Regarding the risk of bias, the considerations leading to both the first search and the subsequent adoption of the exclusion criteria were objective and robust enough to keep the risk as low as possible (PRISMA11). Regarding the protocol, the precise references to the PRISMA checklist in the different sections of the study document how the research was conducted and make it replicable by other researchers and practitioners (PRISMA24b, PRISMA24c).
Two researchers took part in the review activities. They screened the records independently using Microsoft Excel spreadsheets for data analysis. At the end of their work, they compared the results and generated the research outcomes (PRISMA9, PRISMA13a).
The selection/evaluation of articles occurred as follows. Two databases, SCOPUS and IEEEX, were searched on 6 March 2024. The SCOPUS database was searched using the following query:
(TITLE-ABS-KEY ((chatgpt OR bard OR gemini OR "large language models" OR llms) AND engineering AND education) AND LANGUAGE (english))
This query returned 202 papers. The IEEEX database was searched using the following query:
("All Metadata":ChatGPT OR "All Metadata":Bard OR "All Metadata":GEMINI OR "All Metadata":Llms OR "All Metadata":"large language models") AND ("All Metadata":engineering) AND ("All Metadata":education)
In this case, the results consisted of 168 papers. Thus, the total number of papers selected from the two databases was 370 (PRISMA6; PRISMA7). By eliminating 39 duplicates, the number of papers dropped to 331, which was the starting point for the following activities. These papers were numbered in order to code them, and this coding is used hereafter.
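The merge-and-deduplicate step described above was carried out by the authors in spreadsheets; as a rough illustration only, it could be scripted along the following lines. The matching key (DOI when present, otherwise a normalized title) and all sample records are assumptions made for this sketch, not the procedure or data used in the review.

```python
import re

def record_key(record):
    """Return a matching key: the DOI if present, else a normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    # Lowercase the title and collapse punctuation/whitespace so that
    # trivially different spellings of the same title compare equal.
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return ("title", title)

def merge_and_deduplicate(*sources):
    """Merge several record lists, keeping the first copy of each key."""
    seen = set()
    unique = []
    for source in sources:
        for record in source:
            key = record_key(record)
            if key not in seen:
                seen.add(key)
                unique.append(record)
    return unique

# Hypothetical records, for illustration only.
scopus = [
    {"title": "ChatGPT in a Software Engineering Course", "doi": "10.0000/example.1"},
    {"title": "LLMs for Circuit Analysis Homework", "doi": ""},
]
ieee = [
    {"title": "ChatGPT in a software engineering course.", "doi": "10.0000/EXAMPLE.1"},
    {"title": "LLMs for circuit analysis homework", "doi": None},
    {"title": "Bard as a Tutor in Chemical Engineering", "doi": ""},
]

merged = merge_and_deduplicate(scopus, ieee)
print(len(scopus) + len(ieee), "records ->", len(merged), "unique")
```

A key of this kind misses duplicates where only one copy carries a DOI, so a manual check of near-identical titles would still be advisable.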
Before progressing to the next stage, the content analysis, the first exclusion criteria were implemented (PRISMA5). First, from the pool of 331 papers, those published prior to 2018 were excluded, as this was the year in which the first GPT-family LLMs, the precursors of tools such as ChatGPT, appeared. This step reduced the number of papers to 319. Additionally, papers categorized as “conference reviews”, “books”, and “editorials” were removed, resulting in a total of 306 papers for subsequent analysis.
A first analysis was performed on these 306 papers. It concerned the countries of the authors’ affiliations. The aim was to gain an insight into the geographical distribution of the involvement of LLMs in engineering education activities worldwide at the time of the database query. The results showed the prevalence of the USA (73 affiliations), followed by China (44), India (25), Germany (20), and the United Kingdom (19). Many other countries followed with smaller numbers, showing that the coverage was geographically broad.
Figure 1 shows the worldwide coverage.
Next, these 306 papers underwent an initial screening based on their titles, abstracts, and keywords (authors’ keywords, indexed keywords, or IEEE terms). This reading led to the definition of the second exclusion criterion. Papers that were not deemed to be focused on the theme posed in RQ0, namely, the use of LLMs in engineering education, were discarded, thus reducing the number of papers to 151 (PRISMA16b).
The reading of the titles, abstracts, and keywords of the 151 papers helped refine the initial research question by distributing the interest over several topics, which are called research dimensions (RDs) here.
Figure 2 shows the eight RDs considered in the research.
These eight RDs, which were as orthogonal as possible to each other, had the following characteristics.
RD1—WHO. This refers to the actors involving LLMs in engineering education activities. Examples thereof are students, educators, or any other stakeholders.
RD2—HOW. This dimension represents the ways in which LLMs are involved. Examples thereof—grouped as reference activities—are tests of use, case studies, use method proposals, etc.
RD3—WHY. This describes the reasons/goals for the involvement of LLMs. Examples thereof span from the enhancement of understanding to the enrichment of problem solving, teaching improvement, etc.
RD4—HOW MUCH. Papers could report qualitative/quantitative evaluations of the involvement of LLMs in tasks or activities in engineering education. Examples thereof are a qualitatively measured low impact, a quantitatively measured high impact, etc.
RD5—WHAT. Since more LLM-based tools are made available day by day, this dimension allows the description of those that are involved paper by paper, if any. Examples thereof are ChatGPT, Bard/Gemini, etc.
RD6—WHERE. This dimension represents the domains of engineering education in which the involvement of LLMs takes place. Examples thereof are software engineering, mechanical engineering, chemical engineering, etc.
RD7—WHEN. It is important to highlight the moment of the educative path at which the involvement of LLMs takes place. This dimension allows this to be expressed. Examples thereof are undergraduate courses, postgraduate courses, etc.
RD8—PROS/CONS. Some papers are quite clear about the advantages and drawbacks of the involvement of LLMs. Examples of PROS are enhanced understanding, adoption of real-world examples and practical applications, etc. Examples of CONS are confusing and contradictory answers, inaccuracies in responses, ethical concerns, etc.
A first important consequence of defining the RDs is that they allow the general research question proposed in the introduction (RQ0) to be refined. The RDs can be logically combined to obtain research questions whose answers better represent the state of the art of the involvement of LLMs in engineering education. These considerations led to three more precise and focused research questions, which were developed by mixing “primary” dimensions (RD1 to RD4) with “secondary” dimensions (RD5 to RD8) (see
Figure 2). The reason for this classification will be made clear in the following.
The first new research question (RQ1) investigated the interactions between people and LLMs. This RQ was based on RD1—WHO, RD2—HOW, RD7—WHEN, and RD8—PROS/CONS. RQ1 was the following:
RQ1. Are the roles and duties of people clear regarding the involvement of LLMs in engineering education?
The second new research question (RQ2) referred to the engineering domains of the involvement of LLMs and their possible influences on the modalities of this involvement. RQ2 was based on RD2—HOW, RD6—WHERE, and RD7—WHEN.
RQ2. Is there evidence of relationships between engineering disciplines and the ways that LLMs are involved in related educational activities?
Finally, the third new research question (RQ3) focused on LLM-based tools. It dealt with possible suggestions for their adoption in educational activities. RQ3 was based on RD4—HOW MUCH, RD5—WHAT, RD7—WHEN, and RD8—PROS/CONS.
RQ3. Can clear indications of which LLM-based tools should be involved in order to improve the effectiveness of education activities and impact measurements be obtained?
These new RQs will help formulate suggestions for the improvement of current educational activities.
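The relationship between the research questions and the research dimensions can be written down as a small lookup structure. The sketch below encodes only what the text states (the eight RDs, the primary/secondary split, and the RDs on which each RQ is based); the variable and function names are illustrative choices, not part of the review protocol.

```python
# Research dimensions and their hierarchy, as defined in the text.
PRIMARY = {"RD1": "WHO", "RD2": "HOW", "RD3": "WHY", "RD4": "HOW MUCH"}
SECONDARY = {"RD5": "WHAT", "RD6": "WHERE", "RD7": "WHEN", "RD8": "PROS/CONS"}

# Dimensions on which each refined research question is based.
RQ_DIMENSIONS = {
    "RQ1": ["RD1", "RD2", "RD7", "RD8"],
    "RQ2": ["RD2", "RD6", "RD7"],
    "RQ3": ["RD4", "RD5", "RD7", "RD8"],
}

def split_by_hierarchy(dimensions):
    """Split a list of RD codes into (primary, secondary) sublists."""
    primary = [d for d in dimensions if d in PRIMARY]
    secondary = [d for d in dimensions if d in SECONDARY]
    return primary, secondary

for rq, dims in RQ_DIMENSIONS.items():
    primary, secondary = split_by_hierarchy(dims)
    print(rq, "primary:", primary, "secondary:", secondary)

# Dimensions that are not referenced by any of the three questions.
used = {d for dims in RQ_DIMENSIONS.values() for d in dims}
unused = sorted((set(PRIMARY) | set(SECONDARY)) - used)
print("not referenced by any RQ:", unused)
```

Written this way, it is easy to check that every question mixes at least one primary and one secondary dimension, and to see which dimensions a question leaves out.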
Before starting to read the full text of the papers, a third set of exclusion criteria was implemented in order to narrow the focus to specific topics (PRISMA5; PRISMA8). The first criterion referred to the relationships among the search terms used in the queries. It aimed to exclude papers where the terms appeared in the title, abstract, and/or keywords but with a meaning outside the research scope. For example, both “ChatGPT” and “engineering education” appeared in one of the papers, but the work focused on ways of detecting and managing cases of plagiarism, a topic not covered here; therefore, that paper was excluded. Moreover, in order to filter the selected papers even more closely against the research objective, a hierarchy was defined for the RDs. The WHO (RD1), HOW (RD2), WHY (RD3), and HOW MUCH (RD4) dimensions were considered primary. This decision was based on the authors’ experience as researchers and educators, as well as on specific considerations about the eight RDs. For example, knowing who (RD1) involves LLMs in education and how (RD2) this happens was considered fundamental in order to understand the state of the art and list practical suggestions for improving educational activities. In contrast, the other four RDs, WHAT (RD5), WHERE (RD6), WHEN (RD7), and PROS/CONS (RD8), were considered secondary. For example, knowing where (RD6) the involvement occurs and when (RD7) it happens was considered important but not at the same level as the first four RDs. Consequently, papers that did not present clear references to the four primary dimensions were excluded. Once these exclusion criteria were applied, 20 of the 151 papers remained.
Table 1 contains the titles of these papers, along with the numerical codes used to represent them throughout this research.
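The rule that a paper is retained only if it refers clearly to all four primary dimensions can be expressed as a simple filter. The following is a minimal sketch under stated assumptions: the paper identifiers and coded values are hypothetical, and "clear reference" is reduced here to "a non-empty coded entry", whereas in the review this judgment was made by the researchers while reading.

```python
PRIMARY_DIMENSIONS = ("RD1", "RD2", "RD3", "RD4")

def covers_primary_dimensions(coding):
    """True if the coding has a non-empty entry for every primary RD."""
    return all(coding.get(rd) for rd in PRIMARY_DIMENSIONS)

# Hypothetical codings; paper IDs and values are illustrative only.
codings = {
    "paper-A": {"RD1": "students", "RD2": "test of use",
                "RD3": "problem solving", "RD4": "qualitative",
                "RD6": "software engineering"},
    "paper-B": {"RD1": "educators", "RD2": "method proposal",
                "RD5": "ChatGPT"},
    "paper-C": {"RD1": "students", "RD2": "case study",
                "RD3": "", "RD4": "quantitative"},
}

retained = [pid for pid, coding in codings.items()
            if covers_primary_dimensions(coding)]
print(retained)
```

Secondary dimensions are deliberately ignored by the filter: a paper may lack them entirely and still be retained.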
Before describing the next research activities,
Figure 3 depicts a flow diagram summarizing the search and selection process that led to the dataset used in the research, from the use of the queries to select the 370 records from the databases to the selection of the final 20 papers (PRISMA16a).
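The counts reported in the text for each stage of the selection can be checked with a few lines of arithmetic. The sketch below uses only the numbers given above; the stage labels are paraphrased and the helper name is an illustrative choice.

```python
# Record counts reported in the text at each stage of the selection.
stages = [
    ("records retrieved (SCOPUS 202 + IEEEX 168)", 202 + 168),
    ("after removing duplicates", 331),
    ("after excluding papers prior to 2018", 319),
    ("after excluding conference reviews, books, editorials", 306),
    ("after title/abstract/keyword screening", 151),
    ("after third set of exclusion criteria", 20),
]

def exclusions(stages):
    """Return (label, remaining, removed_at_this_stage) for each stage."""
    rows = []
    previous = None
    for label, remaining in stages:
        removed = 0 if previous is None else previous - remaining
        rows.append((label, remaining, removed))
        previous = remaining
    return rows

for label, remaining, removed in exclusions(stages):
    print(f"{remaining:>4}  (-{removed:<3})  {label}")
```

The per-stage removals (39, 12, 13, 155, and 131 records) add up to the 350 records separating the 370 retrieved from the 20 retained.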
These papers were then read carefully to look for correspondences to the RDs (PRISMA10a). They were published in 2023 (17 papers) and 2024 (3 papers) and were evenly split between conference proceedings (10 papers) and indexed scientific journals (10 papers). They mainly described experiences related to engineering education in the IT field, in particular software engineering courses (9 papers), followed by electrical/electronic engineering (4) and chemical engineering (3); very few works related to other engineering fields (4). The papers primarily aimed to understand the influence that GenAI tools, mainly ChatGPT, can have in educational settings. Eleven papers also investigated the opinions of different users through questionnaires. In particular, many papers discussed the possibility of using LLMs for exercises related to programming and code production, either comparing the situation before and after the introduction of new LLM-based tools or assessing how reliably the tools solved exercises assigned during classes. Some papers also evaluated the reliability and correctness of the solutions obtained. Some papers, primarily those related to non-computer-science subjects, evaluated the use of LLM-based tools for text production and in-depth exploration of topics of interest (essay production), assessing the reliability of the information obtained. Several papers presented, in different ways, the potential advantages and disadvantages of introducing and using these tools. In some cases, observations came from both students’ and educators’ perspectives.
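As a quick consistency check, each breakdown of the 20 papers given above (by year, by venue, and by engineering field) should sum to the same total. The sketch below uses only the counts reported in the text; the dictionary keys are shortened labels chosen for illustration.

```python
TOTAL = 20

# Breakdowns of the 20 selected papers, as reported in the text.
breakdowns = {
    "year": {"2023": 17, "2024": 3},
    "venue": {"conference proceedings": 10, "journals": 10},
    "field": {"software engineering": 9, "electrical/electronic": 4,
              "chemical": 3, "other": 4},
}

def shares(counts):
    """Convert absolute counts into rounded percentages of their total."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}

for name, counts in breakdowns.items():
    assert sum(counts.values()) == TOTAL, f"{name} does not sum to {TOTAL}"
    print(name, shares(counts))
```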
The following section maps the eight RDs to the characteristics of the 20 papers in detail (PRISMA17).
4. Results
The results of the review were as follows (PRISMA23a). Based on the comprehensive data collected from the analysis of the 20 selected papers (
Table 1) and summarized in
Table 2,
Table 3,
Table 4,
Table 5,
Table 6,
Table 7,
Table 8,
Table 9 and
Table 10 in relation to the eight research dimensions (RDs), the following reflections provide deeper insights into research questions RQ1 to RQ3, which were posed in order to investigate the current involvement of LLMs in engineering education.
Referring to RQ1, “Are the roles and duties of people clear regarding the involvement of LLMs in engineering education?”, four dimensions were involved.
RD1—WHO: This indicated that both students and educators were involved in LLM activities, with varying degrees of direct and indirect participation. For example, as shown in
Table 2, papers 4, 20, 36, 95, 98, 131, 135, 145, 153, 166, 167, and 262 depicted the active participation of students in coding activities or assignments during lessons or at home. This showed the direct involvement of students in LLM activities.
RD2—HOW: This described the types of activities involving LLMs, such as tests of use and method proposals, but did not directly specify the roles and duties. For example, looking at
Table 3, papers such as 2, 4, 20, 44, 94, 98, 111, 126, 130, 135, 153, 166, 167, 180, and 230 proposed methods of use and guidelines for LLM tools, suggesting roles for educators in implementing these methods within their courses.
RD7—WHEN: This specified the moment in the educational path at which LLM involvement occurs, as it can influence the clarity of roles and duties at different stages of education. For example, referring to
Table 8, papers such as 4, 36, 94, 95, 98, 126, 130, 131, 135, 145, 153, 166, 167, 180, 230, and 262 focused on undergraduate-level experiences, indicating the moment in the educational path at which LLM involvement occurs.
RD8—PROS/CONS: This provided insight into the advantages and disadvantages associated with LLM involvement, as they indirectly reflect roles and duties, such as confusion due to contradictory answers. For example, as shown in
Table 10, papers such as 4 and 98 reported difficulties in handling code errors as a disadvantage of LLM involvement, which may indicate unclear roles in overseeing LLMs’ use.
In summary, the analysis revealed a spectrum of participation levels among students and educators in LLM activities, with some papers depicting direct engagement in coding exercises or assignments, while others portrayed indirect involvement through methodological guidance or advisories. These findings underscored the complexity of roles and responsibilities within the context of engineering education, suggesting a need for clearer delineation and communication of duties to optimize the integration of LLMs into educational practices.
Regarding RQ2, “Is there evidence of relationships between engineering disciplines and the ways that LLMs are involved in related educational activities?”, three dimensions were involved.
RD2—HOW: This indicated the types of activities involving LLMs across different engineering disciplines, revealing potential patterns in their utilization. For instance, the papers listed in
Table 3, such as 44, 98, 126, 130, 131, 135, 145, 153, 166, 230, and 262, primarily focused on tests of the use of LLM-based tools, while papers such as 2, 4, 20, 44, 94, 98, 111, 126, and 230 proposed methodological approaches, indicating the ways in which LLMs were involved across different engineering disciplines.
RD6—WHERE: This identified the engineering domains where the involvement of LLMs took place, as this could influence the types of activities observed. For example, in
Table 7, we observed that software engineering/computer science papers (e.g., papers 2, 4, 44, 95, 98, 130, 135, 180, and 262) predominantly involved LLM activities.
RD7—WHEN: This specified the moment in the educational path at which LLM involvement occurs, as this could also influence the types of activities observed across engineering disciplines. For example, referring to
Table 8, papers such as 4, 36, 94, 95, 98, 126, 130, 131, 135, 145, 153, 166, 167, 180, 230, and 262 focused on undergraduate-level experiences, showing the timing of LLM involvement in engineering education.
To synthesize the findings, discernible patterns emerged regarding the utilization of LLMs across various engineering disciplines, with some disciplines predominantly emphasizing tests of use or method proposals, while others prioritized case studies or project development. These trends suggest that the specific focus of educational activities within each discipline influences the ways that LLMs are incorporated, highlighting the importance of tailoring LLM integration strategies to discipline-specific needs and objectives.
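The kind of cross-reading between dimensions used for RQ2 amounts to intersecting sets of paper codes. The sketch below uses the codes quoted in the text for RD2 and RD6; because those lists are introduced with "such as", they may not be exhaustive, so the overlaps shown are illustrative rather than a restatement of the tables. The variable and function names are assumptions of this sketch.

```python
# Paper codes as quoted in the text for RQ2 (lists introduced with
# "such as", so they may not be exhaustive).
tests_of_use = {44, 98, 126, 130, 131, 135, 145, 153, 166, 230, 262}
method_proposals = {2, 4, 20, 44, 94, 98, 111, 126, 230}
software_engineering = {2, 4, 44, 95, 98, 130, 135, 180, 262}

def overlap(activity, discipline):
    """Return the sorted paper codes that appear in both sets."""
    return sorted(activity & discipline)

print("tests of use in software engineering:",
      overlap(tests_of_use, software_engineering))
print("method proposals in software engineering:",
      overlap(method_proposals, software_engineering))
```

The same intersection can be repeated for any pair of dimensions, for example RD6 against RD7, to look for discipline-specific patterns.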
Finally, concerning RQ3, “Can clear indications of which LLM-based tools should be involved in order to improve the effectiveness of education activities and impact measurements be obtained?”, four dimensions were involved.
RD4—HOW MUCH: This examined the evaluations of the impact of the involvement of LLMs, providing insight into the effectiveness of different tools. For example, papers such as 2, 4, 20, 36, 44, 94, 95, 98, 111, 126, 130, 131, 145, 153, 167, 180, 230, and 262, which are listed in
Table 5, provide qualitative evaluations of LLM involvement, offering insights into the effectiveness of LLM-based tools.
RD5—WHAT: This describes the specific LLM-based tools used, as this can inform decisions on tool selection for improving educational activities. For example, in
Table 6, we see that ChatGPT—particularly versions 3 and 3.5—was the predominant LLM-based tool used in the analyzed papers (e.g., papers 2, 4, 20, 36, 44, 94, 95, 98, 111, 126, 130, 131, 135, 145, 153, 166, 167, 180, 230, and 262).
RD7—WHEN: This specified the moment in the educational path at which LLM involvement occurred, as it could influence the effectiveness and impact of LLM-based tools at different stages of education. For example, referring to
Table 8, papers such as 20, 36, 98, and 230 focused on postgraduate-level experiences, indicating the timing of LLM involvement for impact measurement at different educational levels.
RD8—PROS/CONS: This considered the advantages and disadvantages associated with LLM involvement, providing insights into which aspects of LLMs contributed to effectiveness and impact measurement. For instance, papers such as 2, 20, 44, 94, 126, 131, 153, 166, 167, and 262, which are listed in
Table 9, outlined advantages such as enhanced understanding and engagement, providing indications on the effectiveness of certain aspects of LLM involvement.
In conclusion, the analysis yielded insights into the effectiveness of specific LLM-based tools for enhancing educational activities and measuring impact, with certain tools demonstrating advantages such as enhanced student engagement, improved problem-solving abilities, and increased task performance. These findings offer valuable guidance for educators and policymakers seeking to optimize educational outcomes through informed selection and implementation of LLM-based tools, emphasizing the importance of considering both the pedagogical context and the desired educational objectives.
5. Discussion
As the first point of discussion, in order to follow the PRISMA checklist as closely as possible, it is worth stating that the quality of evidence in the studies included in the review ranged from “very low” to “high”, depending on several factors. For example, mainly in papers belonging to conference proceedings, the descriptions of the experiences were rather brief because of the small number of pages allowed; therefore, in this case, the quality of evidence should be considered “very low” or “low”. In contrast, papers published in journals are usually more complete and detailed, so their quality can be considered “high” (PRISMA23b).
Some limitations of the review can be highlighted as well. First, since the domain where the research took place is rapidly evolving, the outcome risks being outdated in the near future. Indeed, this outcome is valid as of the date of the queries (6 March 2024). Moreover, the literature depicts the situation at the time the papers were written; therefore, we can assume a delay of several months with respect to the current situation, which is a long time considering the rapid evolution of the AI field. By the time this paper is read, some issues highlighted by this research might have been solved in new versions of LLMs. Finally, because the spread of LLMs is so recent, there is an unavoidable shortage in the information available, in the variety of LLM-based tools involved (just one, up to now), and in the variety of engineering disciplines where LLMs are currently involved (PRISMA23c).
Moreover, this research allowed the identification of some gaps in engineering education and in the involvement of LLM-based tools within courses; these gaps are where further research is needed. For example, current research lacks insight into the development and evaluation of specific pedagogical approaches to engaging LLMs in engineering education activities. There are few detailed examples of the integration of LLM-based tools in different engineering disciplines and course levels. In addition, there are no examples of evaluations of the impact of the involvement of LLM-based tools on student engagement, participation, and interaction in engineering courses. There are also few papers that explore the potential of LLM-based tools to personalize and tailor learning experiences for individual students in engineering courses or to help educators make the best use of these tools.
The results of this review have possibilities for practical adoption, and they suggest future research directions. As the proposal of practical suggestions for putting LLMs into practice in engineering education was one of the goals of this research, as claimed in the abstract, the following text focuses on this (PRISMA23d).
Table 11 lists suggestions for improving the effectiveness of involving LLMs in engineering education while ensuring a responsible and ethical approach. Although each suggestion derives from a specific RQ, as is easily recognizable, this information was deemed unnecessary for an educator who uses these suggestions to improve their educational activities; thus, it does not appear in the list.
By implementing these suggestions, educators can enhance their activities in engineering education by leveraging LLMs as valuable tools for facilitating learning, promoting engagement, and achieving educational objectives effectively.
6. Conclusions
The research described in this study aimed to systematically review the existing literature on the involvement of LLMs in engineering education, with a focus on how to improve educational activities at different levels, involving different actors in different engineering domains, and with the LLM-based tools that are made available as time progresses. Despite the relatively small number of papers analyzed, which was noted as a limitation, interesting results were obtained. Although LLMs became widely available only a few years ago, the material collected here made it possible to list some practical suggestions that we were the first to put into practice in our undergraduate and postgraduate courses.
Both the limitations of the research and the gaps highlighted by the systematic review, as described in the Discussion section, provide valuable insights into potential areas for future exploration.
Regarding the limitations of this research, to prevent obsolescence and support the updating of outcomes, we suggest evaluating emerging LLM tools across disciplines to understand their efficacy and limitations. Creating dedicated repositories for LLM-based tools could help address information shortages. Identifying new engineering disciplines for LLM applications is crucial, along with assessing their impacts. Longitudinal research studies can be conducted to investigate the long-term impact of integrating LLMs and LLM-based tools into engineering education on student learning outcomes, career readiness, and post-graduation success by tracking students’ academic performance, professional achievements, and attitudes toward AI over time. Following the PRISMA checklist in this research was also intended to make the review robust and replicable so that similar reviews can be performed to keep the outcomes up to date with the evolution of the AI field.
Considering the gaps, further studies could examine in greater depth the teaching strategies, learning activities, and assessment methods that effectively leverage LLMs to improve student learning outcomes. Special attention could be given to describing the design of curricular modules or assignments that incorporate LLM-based tools to support various engineering disciplines. Future work could also consider how LLM-based activities influence student motivation, collaboration, and peer learning experiences within the classroom. Finally, it is also important to address the professional development needs of engineering educators so that they can effectively integrate LLM-based tools into their teaching practices, and to enhance their pedagogical competence and confidence in using AI technologies by providing training, resources, and technical support.
From a general research perspective, fostering collaboration among engineering education researchers, AI experts, instructional designers, and industry practitioners could facilitate interdisciplinary approaches to exploring the potential of LLM-based tools in engineering education. This could lead to innovative solutions addressing complex challenges and opportunities at the intersection of AI and engineering pedagogy. Moreover, incorporating user feedback is essential for improving the usability of LLM-based tools. Finally, investigating the biases, privacy concerns, and societal impacts of LLM adoption is imperative for ethical and responsible deployment.