Article

The Applications of AI Tools in the Fields of Weather and Climate—Selected Examples

by
Agnieszka Krzyżewska
Department of Hydrology and Climatology, Institute of Earth and Environmental Sciences, Maria Curie Skłodowska University, 20-718 Lublin, Poland
Atmosphere 2025, 16(5), 490; https://doi.org/10.3390/atmos16050490
Submission received: 9 March 2025 / Revised: 3 April 2025 / Accepted: 16 April 2025 / Published: 23 April 2025
(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)

Abstract
Large language models (LLMs) based on artificial intelligence have found applications across various sectors—including medicine, education, science, literature, and marketing. Although they offer considerable opportunities, their limitations also raise important concerns. This study evaluates several AI tools in the context of meteorology and climatology. The tools examined include ChatGPT o3-mini, o1, 4.o, 4.0; Gemini Advanced 1.5 and 2.0; Copilot; Perplexity; DataAnalyst; Consensus; ScholarGPT; SciSpace; Claude; and DeepSeek. The evaluation tasks comprised cloud recognition and classification from photographs, gap-filling in literature reviews, map creation based on provided datasets, comparative interpretation of maps, and archival data retrieval from line graphs converted to numerical data. Each task was rated on a 0–5 scale. Conducted between February 2024 and February 2025, the study found that ChatGPT o3-mini excelled in cloud classification; ChatGPT4.o and ScholarGPT produced high-quality maps; Claude 3.5 Sonnet and SciSpace provided the most detailed map descriptions; and Consensus and ChatGPT o1 were the most effective for literature review support. However, all tools performed poorly in archival data retrieval, with Claude 3.5 Sonnet yielding the smallest errors. Overall, substantial progress was observed over the study period.

Graphical Abstract

1. Introduction

The emergence of publicly available AI tools, notably ChatGPT3.5 released in November 2022, has unveiled a myriad of promising—and at times concerning—prospects for AI applications [1]. Nearly three years later, ChatGPT and other LLMs (e.g., Google’s Gemini, Microsoft’s Copilot) are widely utilized across diverse fields, including scientific research [2,3,4,5,6]. Nevertheless, while these models serve as valuable assistants, their inherent limitations and associated ethical implications—particularly in the realm of scientific publishing—remain critical concerns [7,8,9,10].
In meteorology and climatology, AI tools are increasingly employed to enhance weather prediction, process extensive datasets, and evaluate climate risks. Over the past few years, artificial intelligence has supported a wide range of applications, including medium-range weather prediction, satellite data assimilation, and severe weather event modeling. For instance, 3D neural networks have significantly improved global weather forecasts by directly predicting meteorological variables at high spatiotemporal resolution [11]. Neural networks have also been used for improved initial field estimation [12], ensemble prediction refinement in tropical cyclone forecasting [13], and fully AI-based assimilation systems [14]. Recent studies indicate that machine learning techniques, especially deep learning, can improve the accuracy of numerical weather prediction (NWP) models [2], and AI methods have proven effective in downscaling global climate models (GCMs) to regional and local scales [3]. These developments highlight the growing potential of AI-driven tools in operational meteorology, although challenges in generalization, interpretability, and geospatial transferability remain.
Moreover, LLMs have shown potential for interpreting climate data by aiding literature reviews, model validation, and climate scenario comparisons [15]. Dedicated applications such as ChatClimate (https://www.chatclimate.ai/) exemplify efforts to enhance the accessibility of climate science for policymakers and the public [2]. Similar studies also focus on using such powerful tools to support scientists, policymakers, students, and vulnerable populations by providing explanations and assisting with model parameterization, data interpretation, and scenario evaluation [3,15,16,17]. However, the efficacy of these tools depends on their ability to generate accurate, context-aware responses—a challenge given current limitations in reasoning and factual precision [16]. In addition, trustworthy AI frameworks emphasize the ongoing need for human oversight and explainability in meteorological applications [18,19]. While the field continues to evolve, most studies focus on either low-level data modeling or physics-informed prediction systems. In contrast, this study offers a novel perspective by evaluating how general-purpose large language models (LLMs) interpret and engage with atmospheric knowledge in tasks relevant to both education and practical forecasting.
Although large language models were originally developed for conversational tasks, their applications are increasingly extending into critical fields such as science and medicine. Therefore, it is crucial to assess their capabilities and limitations, which is the primary objective of this research. In this study, several LLMs were assigned a range of tasks with potential future applications, and their accuracy was assessed. The tasks encompassed cloud recognition and classification, literature review support, map creation and interpretation, and archival data retrieval. The aim was to determine which AI models excel in specific applications and to identify areas requiring further improvement. Overall, the study provides valuable insights into the potential integration of AI in climate science and documents their development over a one-year period.

2. Materials and Methods

Currently, a diverse array of AI tools exists, each based on different types and versions of large language models. These tools are continually updated and modified, and are available in both free and subscription-based formats. They offer a range of functionalities—including processing images, data files, and text. While some of these tools can access and search online resources, others are designed without internet connectivity. The quality of their outputs varies considerably, ranging from highly accurate and factually robust information to overgeneralized responses that contain numerous errors and a significant level of “hallucination”. Furthermore, both the performance and availability of these options have evolved over time. All of these AI tools were treated as black-box systems; internal model architectures such as transformer attention mechanisms or super-resolution modules were not directly evaluated in this study.

2.1. AI Tools Used in This Research

  • ChatGPT 4.0, 4.o, o1, and o3 (both mini and mini-high) [20] by OpenAI [21]. ChatGPT is a conversational AI model developed by OpenAI that processes text, data files, and images. The earlier model, ChatGPT3.5, noted for only 30–40% accuracy in answering scientific queries [22], is no longer available.
  • Gemini Advanced 1.5 and 2.0 [23] by Google [24]. Gemini Advanced represents Google’s next-generation AI system, developed as an evolution from earlier experimental models (often associated with Bard). It supports multimodal inputs such as text, data files, and images.
  • Copilot [25] by Microsoft [26]. Microsoft Copilot is an AI-powered assistant integrated within the Bing search engine in Microsoft Edge, released in February 2023 [27]. The academic version used in this research supports image uploads, facilitating enhanced search and productivity features.
  • PerplexityAI [28]. Perplexity AI is a conversational search engine designed to deliver concise and clear answers using advanced natural language processing. Its free version initially did not support image uploads but now allows up to 10 images per day [29].
  • Claude [30]. Claude is a large language model developed by Anthropic. It is available in several variants—including Sonnet, Haiku, and Opus [31]. In this study, the free Claude 3.5 Sonnet model was used, with an experimental analysis tool.
  • DataAnalyst [32]. DataAnalyst is integrated within ChatGPT’s toolset (often referred to as Advanced Data Analysis or the Code Interpreter) and is designed to help analyze and visualize data across various file types.
  • Consensus [33]. Created by Consensus.app [34], Consensus is an AI-powered academic search engine designed to make scientific knowledge more accessible.
  • SciSpace [35]. SciSpace, originally launched as Typeset in 2015, is designed to accelerate the research process and facilitate the quick discovery of academic information [35].
  • Scholar GPT [36]. Created by awesomegts.ai [37], it provides access to academic resources such as Google Scholar, PubMed, JSTOR, and arXiv.
  • Academic Assistant Pro [38]. Created by @gpt_boost [39], it is designed as a professional academic assistant with a scholarly approach and is integrated as part of ChatGPT.
  • DeepSeek R1 [40], created by Hangzhou DeepSeek Artificial Intelligence Basic Technology Research [41].
The tools selected in this study—ChatGPT, Gemini, Claude, and Copilot—were chosen based on their high accessibility and popularity at the time of data collection (February–April 2024). These models were publicly available via user-friendly interfaces and were widely adopted across general-purpose applications, including scientific, medical, and educational use cases [42,43,44].

2.2. The Testing Procedure of AI Tools

  • All AI tools were provided with identical prompts and input data (including data files and images).
  • A diverse set of tasks was administered, comprising:
    Cloud recognition using images sourced from both the WMO Cloud Atlas and a proprietary cloud database;
    Supplementing gaps in literature reviews concerning humid heat waves;
    Generating maps from a provided data file;
    Comparing and interpreting two maps (provided as images); and
    Retrieving archival numerical data from line graphs uploaded as images.
These tasks were repeated (when possible) in February 2025 to assess the progress of the tools and incorporate the most current data.
  • Each outcome was evaluated on a subjective 0–5 scale in terms of accuracy, precision, and overall correctness, where 0 denotes an entirely incorrect response and 5 a fully correct one. For longer responses, one point was added for each correct component; for shorter responses, one point was deducted for every error. For example, if the correct answer was Cumulus humilis and Altocumulus stratiformis translucidus perlucidus but the response provided was “Cirrocumulus stratiformis undulatus, Cumulus humilis”, the score was 3 (one point each for “stratiformis”, “cumulus”, and “humilis”). In another case, if the correct answer was Stratus undulatus and the response was Stratus nebulosus undulatus, one point was deducted for the extraneous “nebulosus”, resulting in a score of 4. A schematic sketch of this rubric is given after this list.
  • The research was conducted between February 2024 and February 2025. During this period, ChatGPT 4.0 was upgraded to ChatGPT 4.o, and ChatGPT o1 and o3 were subsequently released, while Google Bard was rebranded as Gemini Advanced (initially version 1.5 and later 2.0). All of these systems were utilized in their paid versions. At the outset, Consensus ranked first in the “Research and Analysis” category; by the conclusion of the study, SciSpace had ascended to the top position, with Consensus ranking second. Currently (March 2025), ScholarGPT has reached the top position. Notably, several LLMs that previously did not support image or data file inputs have since been updated to include these functionalities.
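For transparency, the component-based rubric described above can be expressed schematically as follows. This is a minimal illustrative sketch only; the actual scoring in this study was performed manually, and the term sets below simply reproduce the worked examples from the text.

```python
# Minimal illustrative sketch of the 0-5 component-based rubric described above.
# The actual scoring was done manually; the term sets are examples from the text.

def score_long_answer(correct_terms: set, answer_terms: set, max_score: int = 5) -> int:
    """Longer responses: one point per correct classification component, capped at 5."""
    return min(max_score, len(correct_terms & answer_terms))

def score_short_answer(correct_terms: set, answer_terms: set, max_score: int = 5) -> int:
    """Shorter responses: start at 5 and deduct one point per missing or extraneous term."""
    errors = len(answer_terms - correct_terms) + len(correct_terms - answer_terms)
    return max(0, max_score - errors)

# Example 1: correct = Cumulus humilis + Altocumulus stratiformis translucidus perlucidus,
# answer = "Cirrocumulus stratiformis undulatus, Cumulus humilis" -> 3 points.
print(score_long_answer(
    {"cumulus", "humilis", "altocumulus", "stratiformis", "translucidus", "perlucidus"},
    {"cirrocumulus", "stratiformis", "undulatus", "cumulus", "humilis"}))   # 3

# Example 2: correct = Stratus undulatus, answer = Stratus nebulosus undulatus -> 4 points.
print(score_short_answer({"stratus", "undulatus"},
                         {"stratus", "nebulosus", "undulatus"}))            # 4
```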

3. Results

3.1. Cloud Recognition

Cloud recognition presents a significant challenge, even for human observers; therefore, it is of considerable interest to assess how non-human systems—specifically, large language models (LLMs) equipped with image recognition capabilities—manage this task. In the initial phase of the study (May 2024), ChatGPT 4.o, Gemini Advanced, and Copilot were chosen for evaluation because they supported photo uploads at that time. Subsequently, in February 2025, the study was updated to include additional AI tools—namely Claude, Perplexity, Consensus, and SciSpace—as well as newer iterations of ChatGPT, Gemini, and Copilot. During the first phase, nine distinct cloud photographs were obtained from the official WMO Cloud Atlas [45] and presented to the LLMs with the prompt: “Can you recognize clouds in that photo? Please classify them according to the latest WMO classification system, detailing genera, species, varieties, and any special cloud types” (Table 1). This task was then repeated in February 2025 with the expanded set of AI tools (Table 1). The results, comprising the cloud images, the responses from each AI system, their respective scores, and the correct classifications provided by WMO, are summarized in Table 1.
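All prompting in this study was carried out interactively through the tools’ own interfaces. Purely as an illustration of how such an experiment could be scripted for larger image sets, the sketch below shows one way to submit the same prompt and an image through OpenAI’s Python SDK; the model name and image path are placeholder assumptions, not what was used here.

```python
# Hypothetical sketch: sending the cloud-classification prompt plus an image via
# the OpenAI Python SDK. The study itself used interactive interfaces; "gpt-4o"
# and "cloud.jpg" are placeholder assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("cloud.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

prompt = ("Can you recognize clouds in that photo? Please classify them according "
          "to the latest WMO classification system, detailing genera, species, "
          "varieties, and any special cloud types")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```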
In the cloud recognition task, the latest version of ChatGPT achieved the highest average score (4.0), while the previous ChatGPT version (o1) performed reasonably well, with an average score of 3.7. Both the Copilot Academic version and Claude obtained third-place scores of 3.1, whereas the earliest version of Copilot (available via Bing) delivered the poorest performance, with an average score of 1.7 and the highest standard deviation (SD) of 1.9 and root mean square error (RMSE) of 3.8. Generally, Gemini and ChatGPT accurately identified the cloud genera; however, they occasionally misclassified clouds that appear visually similar (e.g., cirrocumulus versus stratocumulus, altocumulus versus stratocumulus, and stratus versus altostratus). Notably, the earliest version of ChatGPT appears to have utilized an outdated WMO classification—lacking categories such as asperatus and homogenitus—whereas Gemini employed the updated classification system. Differences in species and varieties were frequently observed, with these details often omitted by Copilot and occasionally skipped or incorrectly assigned by ChatGPT. It is important to note that the high scores may be partially attributable to the fact that images of clouds, along with their descriptions, are readily available on the WMO website [45]. To address this potential bias, all AI tools were subsequently evaluated using previously unpublished cloud photographs from a private collection—captured with a standard smartphone—using the same prompt as in phase 1. The results are presented in Table 2.
In the analysis of cloud photographs from a private collection, the highest average score (3.9) was achieved by the latest version of ChatGPT o3-mini. The next best performers—sharing second place—were the Gemini 2.0 and Copilot Academic versions, both with average scores of 3.3, while the earliest version of Copilot (accessible via Bing) recorded the lowest performance. Notably, ChatGPT 4.o utilized an updated classification system that included a “homogenitus” special cloud and erroneously introduced the term “rectus”, which is not recognized as a valid classification item. In contrast, Gemini Advanced indicated that “contrail is not classified by the WMO system”, and its newer version even “invented” a new cloud type, termed “cirrostratomutatus”. Copilot (Bing) most frequently confused cloud genera and demonstrated significant limitations in identifying species, varieties, and accessory clouds, with a mean score of 2.0, standard deviation (SD) of 2.1, and root mean square error (RMSE) of 3.6. Meanwhile, Consensus and SciSpace achieved average scores of 3.0 and 3.1, respectively, and Claude and Perplexity scored 2.4 and 2.3. Overall, it is evident that newer versions of LLMs tend to outperform older ones in cloud recognition tasks, suggesting that models currently scoring lower may improve in future updates.
To evaluate the performance of each AI tool in cloud type classification, precision, recall, and F1 scores were calculated based on true positive (TP), false positive (FP), false negative (FN), and true negative (TN) values across 18 cloud images. The best-performing tool was ChatGPT o3-mini, which achieved an F1 score of 0.83, indicating both high accuracy and consistency in identifying relevant cloud types (Figure 1). It showed the highest recall (0.85) and precision (0.81) among all models, reflecting its ability to detect most relevant cloud labels while making relatively few incorrect guesses.
Other tools, such as Gemini 2.0 and ChatGPT o1, also demonstrated solid performance, with F1 scores of 0.76 and 0.75, respectively. While Gemini had high recall (0.85), it exhibited slightly lower precision (0.68), suggesting a tendency to overpredict. In contrast, the Copilot Academic version and Claude 3.5 Sonnet showed more balanced, yet modest, results, with F1 scores of 0.67 and 0.65. These tools exhibited greater difficulty in precisely identifying cloud types, as evidenced by higher false positive and false negative counts. Overall, the results suggest that while all models were capable of basic cloud classification, only a few exhibited strong reliability across multiple classes.
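For reference, the precision, recall, and F1 values above follow the standard definitions computed from aggregated confusion-matrix counts; a minimal sketch is given below. The TP/FP/FN numbers are illustrative placeholders (chosen so that the output roughly reproduces the values reported for the best tool), not the study's actual tallies.

```python
# Minimal sketch: precision, recall, and F1 from aggregated confusion-matrix counts.
# The TP/FP/FN values are illustrative placeholders, not the study's actual counts.

def precision_recall_f1(tp: int, fp: int, fn: int):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=34, fp=8, fn=6)
print(f"precision={p:.2f}, recall={r:.2f}, F1={f1:.2f}")
```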
To better understand the classification behavior of each AI tool across specific cloud types, an error rate matrix (Table 3) was calculated based on confusion matrix components (TP, FP, FN) for each model and cloud class. The error rate was defined as the proportion of incorrect classifications—including both false positives and false negatives—relative to all active predictions per class. The analysis revealed substantial differences between tools. ChatGPT o3-mini demonstrated the most consistent performance, achieving the lowest average error rate across all cloud types. In contrast, tools like Claude 3.5 Sonnet and Gemini 1.5 exhibited more frequent misclassifications, particularly for complex or less frequent cloud types such as cumulonimbus and cirrocumulus, where error rates often reached 1.0. Overall, cloud types like altostratus and altocumulus were predicted with relatively higher reliability, while the identification of cumulonimbus and cirrus types proved more challenging across all models.
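Read literally, the per-class error rate underlying Table 3 corresponds to the ratio below (an interpretation of the stated definition; the exact counting convention used for Table 3 is the author's):

\[ \text{error rate} = \frac{FP + FN}{TP + FP + FN} \]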

3.2. Maps Interpretation

If LLMs can effectively recognize clouds in photographs, it stands to reason that they might also interpret and analyze patterns on maps. Theoretically, this task should be less challenging due to the maps’ limited color palette and clearly defined linear features, as opposed to the variability found in photographic imagery. For this experiment, AI tools that support image uploads were tasked with comparing two maps (Figure 2). They were prompted with: “Please write a paragraph comparing these two maps”, without any additional context regarding what the maps depict. The complete responses are documented in Table A1 in Appendix A. Higher scores for this task were assigned to outputs that included detailed geographic references, accurate temperature values, and region-specific summaries, reflecting deeper understanding of the provided maps.
The best performance (Table 4) was observed for Bard (by Google), which is no longer available and has since been replaced by the more advanced Gemini model. Bard’s output included precise temperature values, the percentage of the area exhibiting uniform temperature ranges, and supplementary information related to climate change. Comparable results were obtained using SciSpace and Claude 3.5 Sonnet. Both versions of ChatGPT (4.0 and 4.o), as well as DataAnalyst, delivered accurate interpretations of the data, correctly reporting numerical temperature values and drawing appropriate conclusions. In contrast, Gemini Advanced and Copilot produced more generalized descriptions, although their interpretations of the temperature data remained correct.
These results suggest that while nearly all tools were able to recognize general temperature trends, only a few—including Bard, Claude 3.5 Sonnet, and SciSpace—produced precise, region-specific interpretations. ChatGPT and Consensus delivered accurate conclusions but varied in descriptive detail. Overall, the ability to extract numerical values and regional patterns appears limited to only a few of the tested models.

3.3. Maps Creation

In meteorology and climatology, weather maps serve as fundamental tools for scientific analysis. The spatial distribution of weather and climate data facilitates the identification of regional patterns and dependencies. Generating maps using specialized software (e.g., GIS applications such as ArcGIS or QGIS) or by writing code in R or Python can be time-consuming. Even with the assistance of ChatGPT, the code produced is not always error-free [3].
Given that researchers often need to create many maps before selecting a few for publication, an important question arises: can large language models be used to quickly generate working drafts of maps?
The AI-generated map tasks in this study were exploratory in nature and focused on assessing whether the tools could produce basic, working visualizations from descriptive meteorological prompts. The evaluation did not include formal cartographic criteria such as geographic projection accuracy, exact color fidelity, or legend formatting. This was a deliberate choice, as the aim was not to assess publication-ready map quality, but rather to observe whether AI models could interpret geospatial instructions and create functional visual outputs for internal or educational use. Accordingly, the generated maps were not scored, and their analysis is provided in descriptive form.
Initially, only three LLM platforms supported the upload of Excel data files: ChatGPT (versions 4.0 and 4.o), DataAnalyst, and Gemini. The Excel file contained four columns: Station (the name of the weather station), longitude, latitude, and MEAN (mean air temperature for 2001–2023, as detailed in Table A2). Among these LLMs, only ChatGPT 4.o successfully generated a map depicting the spatial distribution of mean air temperature (see Figure 3), whereas the other models were unable to produce a map.
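For context, the conventional scripted alternative that the AI tools were effectively asked to shortcut amounts to a few lines of plotting code. The sketch below is a minimal, hypothetical draft using pandas and matplotlib; the file name is a placeholder, the column names follow Table A2, and a publication-quality map would normally add country borders and rivers via a GIS library such as cartopy or geopandas.

```python
# Minimal sketch of a draft station-temperature map from the Excel file described
# above (columns: STATION, lon, lat, MEAN). The file name is a placeholder; a
# final map would add borders/rivers with cartopy or geopandas.
import pandas as pd
import matplotlib.pyplot as plt

stations = pd.read_excel("mean_air_temperature_2001_2023.xlsx")

fig, ax = plt.subplots(figsize=(7, 7))
sc = ax.scatter(stations["lon"], stations["lat"],
                c=stations["MEAN"], cmap="RdYlBu_r", s=80)
for _, row in stations.iterrows():
    # Station labels placed below the bubbles.
    ax.annotate(row["STATION"], (row["lon"], row["lat"]),
                xytext=(0, -10), textcoords="offset points",
                ha="center", fontsize=6)
fig.colorbar(sc, ax=ax, label="Mean air temperature 2001-2023 (°C)")
ax.set_xlabel("Longitude (°E)")
ax.set_ylabel("Latitude (°N)")
ax.set_title("Mean annual air temperature at Polish stations (draft)")
plt.show()
```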
The experiment was repeated in February 2025 with the following outcomes:
  • Gemini 2.0 Flash (the only version at that time accepting file uploads rather than solely images) failed to produce a coherent map; instead, it generated an image in which the station names were densely aggregated and overlapping.
  • ChatGPT o3-mini-high generated Python code but was unable to render an actual map.
  • DataAnalyst reported that it could not detect the file, despite the file being clearly visible in the preview.
  • Claude 3.5 Sonnet (operating in experimental mode) produced a grey, irregular shape with only a few cities labeled and lacking an accompanying legend.
  • Consensus generated an output resembling a scatterplot that used “x” markers instead of bubbles; while the color coding and station positions were accurate, the map appeared disproportionately stretched in the vertical dimension.
  • Perplexity was incapable of producing any graphical output.
The generated map underwent several refinements, including bolding of country borders, the addition of a frame, and replacing “x” markers with bubbles. Despite multiple attempts, however, positioning the station names below the bubbles was unsuccessful. The map clearly indicates that Wrocław exhibits the highest annual temperature (10.1 °C), while Zakopane registers the lowest (6.5 °C). The color scale is appropriately calibrated to the data, and the spatial pattern—where southwestern Poland is the warmest and northeastern Poland the coldest—is effectively represented.
In summary, only ChatGPT 4.o and ScholarGPT succeeded in generating readable and informative weather maps from structured datasets. Other tools either failed to produce a map altogether or generated outputs lacking interpretability and spatial coherence. This highlights the variability in AI tools’ capabilities for geospatial visualization tasks.

3.4. Archive Data Retrieval

In many cases, archival climate or weather data are available solely in printed or published formats—such as official reports—that present the data in tables or graphs. While optical character recognition (OCR) can effectively extract numerical data from tables, a challenge arises when the data are represented exclusively as a line graph. This situation is further complicated if the original underlying data are lost, leaving only a digital image of the graph, as is the case in Figure 4. Given that LLMs have demonstrated the ability to accurately recognize cloud patterns from photographs, one might expect that extracting data from a simpler graphical representation—with a limited color palette and clearly defined lines—should be comparatively easier.
In Figure 4, a screen capture from our University website displays the current weather in the city center of Lublin, southeastern Poland. The second chart from the top depicts the air temperature over the period 18–19 July 2024. This chart was saved as a standard graphical file and was uploaded to several LLMs capable of processing images, including ChatGPT 4.o, Gemini, Copilot, and DataAnalyst. Later, in February 2025, additional models—namely ChatGPT o3-mini-high, Claude 3.5 Sonnet, PerplexityAI, Consensus, and SciSpace—also processed the file.
The results were not satisfactory—none of the tested LLMs produced outputs within a 0.2 °C difference from the original data (see Figure 5). The closest performance was achieved by Claude 3.5 Sonnet, which exhibited an average deviation of 0.4 °C, calculated as the mean absolute difference between the original and generated data. DataAnalyst produced a flat trend line, while Gemini yielded a similar result, with a slightly improved fit at the beginning and end of the series. ChatGPT 4.o and Copilot produced results that deviated even further from the original data. These results indicate a consistent weakness across all tested models in retrieving numerical data from graphical sources. Despite their success in recognizing visual elements like clouds or maps, none of the tools could accurately translate curved line graphs into data points. This underscores a significant limitation in their current ability to interpret visual data.
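The deviations quoted above are mean absolute differences between the original series and each tool's retrieved values; a minimal sketch of that comparison is shown below, with illustrative numbers rather than the actual Lublin record.

```python
# Minimal sketch: mean absolute deviation between the original temperature series
# and a series retrieved by an AI tool from the graph image. Values are
# illustrative placeholders, not the actual Lublin data.
original  = [21.4, 20.8, 20.1, 19.7, 19.5, 20.3, 22.0, 23.6]
retrieved = [21.0, 20.5, 20.0, 19.8, 19.0, 20.0, 22.5, 23.0]

mad = sum(abs(o - r) for o, r in zip(original, retrieved)) / len(original)
print(f"Mean absolute deviation: {mad:.2f} °C")
```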

3.5. Filling the Gaps in Literature Review

Writing a literature review is often one of the most time-consuming tasks in preparing a scientific paper. While completely outsourcing this task to an LLM and claiming full credit would be unethical, employing AI tools to assist in the search and summarization of relevant publications could significantly streamline the process, allowing researchers to focus on the most pertinent studies for their topic. For instance, ChatGPT3.5 proved inadequate for this purpose due to its high propensity for hallucinations, including fabricating nonexistent papers and conflating author names, publication years, and journal titles [22].
In this study, 11 selected AI tools were tasked with identifying the five most important papers regarding humid heat waves. No specific criteria were provided regarding what constituted “importance”—whether it referred to the earliest works, the most recent studies, or those with the highest citation counts—and the task also required the inclusion of bibliographic details. The tools evaluated included ChatGPT4.0, ChatGPT4.o, ChatGPT o1-preview, ChatGPT o3-mini-high, Academic Assistant Pro, Scholar GPT, Consensus, Gemini Advanced, Copilot, and PerplexityAI. Scoring in this task was based on the number of relevant articles retrieved on the topic of humid heat waves, with preference given to sources from high-impact, peer-reviewed journals and accurate bibliographies. As summarized in Table 5, the highest scores were achieved by Consensus and ChatGPT o1-preview, both of which returned five publications directly related to humid heat waves, along with succinct summaries and complete bibliographies. ChatGPT4.o received a slightly lower score, as its selection was less comprehensive than that of Consensus. In other cases, the outputs were suboptimal: some tools provided fewer than five publications, omitted bibliographic details, or returned links to newspaper articles instead of peer-reviewed scientific journals.
Overall, the most effective tools for identifying and summarizing scientific literature were Consensus and ChatGPT o1-preview. They not only identified highly relevant sources but also provided bibliographic metadata. Other models performed inconsistently, often returning incomplete or off-topic results. This suggests that while AI tools can aid in literature reviews, the results vary widely by platform and version.

4. Discussion

The application of large language models (LLMs) in scientific research is not a novel concept; numerous studies have demonstrated their utility across fields such as medicine [4,5], biology and stem cells research [46], chemistry [6], hydrology and Earth sciences [2,3,11,12,13,14,47], language learning [48], education [8,49], and many others. In these studies, ChatGPT and similar LLMs have been highlighted as powerful tools, albeit with certain potential risks and limitations. While these models excel in generating, debugging, and explaining computer code [3], they may struggle to grasp more complex, underlying scientific issues—particularly in their earlier versions [22]. Nonetheless, given the rapid and inevitable progress in AI, it is essential to continuously evaluate the capabilities of these evolving systems.
Newer versions of LLMs are not limited to processing text; they are now capable of handling graphical and numerical data as well. For example, the cloud recognition task demonstrates that ChatGPT o3-mini can accurately identify and classify clouds—assigning types, genera, species, and additional details in accordance with the latest WMO classification system. Notably, many of the errors made by these models mirror common human mistakes; in my experience as an academic instructor, students frequently confuse similar cloud types, such as Ac with Cc and St with As. With further training, such AI systems could potentially be deployed to automatically recognize and classify clouds from sources like surveillance cameras or social media, particularly in regions not covered by standard weather stations. This capability could facilitate the early detection of hazardous cloud formations, enabling early warnings that help mitigate the impacts of severe events such as thunderstorms, tornadoes, or tropical cyclones.
As weather and climate scientists, we often rely on maps for data verification, and a rapid draft map can considerably accelerate the research process. In September 2024, only ChatGPT 4.o managed to generate a map by producing Python code that users could review and modify to meet their specific needs—a useful feature for researchers with limited coding expertise. Subsequent versions of ChatGPT and ScholarGPT have improved upon this, producing maps that include additional details such as rivers, water bodies, and country boundaries. The interactive nature of these tools enables rapid adjustments to map elements (e.g., the map type, legend positioning, color scale, and border or river delineations), making them valuable for quick data verification, although the resulting maps may not be suitable as final publication products. Moreover, these tools hold significant potential in educational settings, enabling students to create maps based on their own collected or calculated data.
Beyond cloud recognition, LLMs are also capable of interpreting map content. Although the descriptions they generate vary in precision and depth—particularly in terms of numerical details, legend interpretation, and regional differences—all models were able to detect patterns of temperature increase and establish connections to climate change. With further refinement and training, AI tools may even reveal previously unrecognized patterns, aid in the screening of automatically generated maps for errors, and pre-select maps of particular interest to researchers.
However, in the task of archival data retrieval, all the tested tools performed poorly. Even though these models can recognize intricate cloud patterns and detailed maps, they struggle to extract precise numerical data from a simple three-color line graph—specifically, they cannot accurately pinpoint values at the intersection of a curved temperature line and an x–y Cartesian coordinate system. This limitation is presumably due to the fact that the models were primarily trained on cloud images and published maps rather than on graphs. This represents a significant area for future improvement.
While this study provides a broad assessment of publicly available large language models across meteorological tasks, it is important to recognize its limitations. Specifically, advanced AI architectures such as self-attention, multi-headed attention, and image super-resolution—commonly employed in cutting-edge deep learning research—were not explicitly tested here. Instead, the evaluation focused on widely accessible AI tools as black-box systems, without deconstructing their internal mechanisms. As a result, the performance comparisons reflect end-user capabilities rather than an in-depth assessment of modern neural network structures. Future work could involve controlled experimentation with fine-tuned transformer models or custom-trained architectures to further explore their utility in climate and weather-related applications.
Another important limitation of this study lies in the treatment of AI tools as autonomous analytical systems. While the evaluated models demonstrate impressive capabilities in processing meteorological information, their outputs should not be interpreted as substitutes for expert human analysis. Numerous studies have emphasized the irreplaceable role of human expertise in interpreting complex geoscientific data—particularly in contexts involving uncertainty, regional variability, and hazard assessment [18,19,50,51]. Just as AI-assisted workflows in fields such as archaeology still rely on human contextual judgment, meteorology and climatology similarly demand a collaborative human–AI framework. The limitations of generic AI tools in capturing nuanced domain-specific reasoning further highlight the necessity for human-in-the-loop systems in operational and research settings.
A further consideration is the geographical scope of the study, which includes tasks that are geographically anchored in Poland, such as interpreting and creating climate maps and retrieving historical data from Polish meteorological services. While these elements are region-specific, the underlying functions—like map reading or archival data retrieval from charts—are common to many countries. Moreover, other tasks in this study, such as cloud classification or improving literature reviews regarding humid heat waves, are not tied to any specific location. As such, although some caution is advised in applying these results globally, I believe the observed performance trends remain relevant beyond the Polish context.
Science remains one of the most critical areas for human progress, offering both tremendous benefits—such as the development of new medicines—and risks, including the potential misuse of research for creating new poisons [7]. Given the vast and continually growing body of scientific literature, AI tools can be invaluable in helping researchers pre-select relevant papers, thereby streamlining the review process. However, these tools should be used only as a supplement to traditional research methods, as they may sometimes omit important studies or inadvertently exclude older publications that are less accessible.

5. Summary and Conclusions

This study evaluated a range of AI tools across several scientific applications in meteorology and climatology, yielding insights into their strengths and limitations.
  • Cloud recognition—Among the tested models, ChatGPT o3-mini achieved the highest performance for both internet-sourced and privately collected cloud images, with an average score approaching 4 out of 5. ChatGPT o1 and Gemini Advanced 2.0 ranked as the second best, while early versions of Copilot performed the poorest—although its Academic version showed marked improvement (average score increasing from 1.8 to 3.2). Common errors across models included the misclassification of visually similar cloud types and occasional confusion between species and genera. Notably, LLMs generally performed better with internet-sourced images than with those from a private collection.
  • Map creation from the data—ChatGPT 4.o emerged as the most efficient tool in generating maps from Excel datasets. Although it produced a simple initial map, subsequent iterations incorporated improved details—such as rivers, water bodies, and clearly defined country borders—enhancing the overall quality. In contrast, neither Gemini nor Copilot managed to generate maps effectively from the given coordinates and values. ScholarGPT produced maps comparable to that of ChatGPT 4.o.
  • Map interpretation—In interpreting maps, the highest precision was initially demonstrated by Bard (no longer available), with subsequent satisfactory performance observed for Claude 3.5 Sonnet and SciSpace. Both versions of ChatGPT (4.0 and 4.o), along with Consensus and DataAnalyst, delivered accurate interpretations of temperature patterns, although the level of descriptive detail varied. Generally, all models correctly identified the trends and regional differences, even if the granularity of their descriptions differed.
  • Filling the gaps in literature review—When tasked with retrieving and summarizing significant scientific publications on humid heat waves, Consensus and ChatGPT o1-preview excelled by providing five highly relevant papers, complete with succinct summaries and full bibliographies. ChatGPT 4.o and ChatGPT o3-mini-high followed in performance levels, whereas other tools occasionally returned incomplete lists or non-peer-reviewed sources. Compared to the early version of ChatGPT 3.5—which frequently hallucinated details [17]—the current models consistently delivered accurate bibliographic information, representing a substantial improvement.
  • Archive data retrieval—All evaluated AI tools struggled with extracting accurate numerical data from graphical representations. Claude 3.5 Sonnet produced the closest results, with an average deviation of approximately 0.4 °C; Gemini 2.0 and ChatGPT o1 followed, with errors of 1.3 °C and 1.5 °C, respectively. This limitation underscores a significant gap in current LLM capabilities regarding the precise extraction of data from simple line graphs.
The capabilities and accessibility of AI tools are rapidly evolving: Bard (by Google) is no longer available, for example, while the recently introduced DeepSeek shows promising performance in supporting literature reviews. Constant research and testing are therefore needed to determine both the possibilities and the limitations of newly created AI tools, together with the ethical aspects of their application. The limitations of AI in the weather and climate sciences are still present, but over the year of testing, tremendous development and improvement were observed. Current AI tools fail at extracting numerical data from graphs and charts but perform quite well in recognizing and classifying clouds. While they can generate and interpret weather maps, their accuracy is insufficient for expert-level analysis [8]. This creates a need for systematic evaluation of AI capabilities in meteorological and climate applications. The results indicate that while these systems offer significant potential in areas such as cloud classification, map interpretation, and literature review support, their performance remains inconsistent and varies widely between models and tasks. Tools such as ChatGPT o3-mini and Gemini 2.0 demonstrated relatively strong performance for classification tasks, while Consensus and SciSpace showed promise in bibliographic analysis and source interpretation.
However, limitations are also evident. The inability of most tools to accurately extract numerical data from graphical sources highlights a major gap in their current capabilities. Similarly, the inconsistent quality of AI-generated maps and frequent errors in historical data retrieval suggest that while these systems can assist with meteorological analysis, they are not yet ready to operate autonomously in high-stakes or precision-critical environments. These results align with those from previous research noting the challenges of applying general-purpose LLMs to scientific domains [44,50].
The findings suggest that AI tools can serve as assistive technologies—supporting early-stage analysis, educational tasks, or exploratory map generation—rather than as replacements for domain experts. Future research should focus on improving the ability of AI models to interpret structured geoscientific data, such as time series, gridded weather data, and ensemble model outputs. There is also a need for dedicated evaluation benchmarks that reflect domain-specific complexity, such as WMO classification systems or geospatial metadata interpretation.
Finally, as LLMs continue to evolve rapidly, it is essential to establish standardized protocols for evaluating their performance over time. Repeated testing across model updates and platforms will help determine whether these tools can achieve the consistency and reliability needed for broader adoption in meteorology and climate science.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new datasets were created. A few generated maps and mostly incorrect datasets retrieved from the scanned graph are presented in the paper.

Acknowledgments

The author would like to thank Si-Yu Yu from the Department of Architecture at National Cheng Kung University, Taiwan, for sharing the idea that AI cloud recognition could be applied to recognize clouds from outdoor video cameras. Also, the author would like to thank Dorota Matuszko for her help with the proper classification of cloud photos from private collections used in this paper.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A

Table A1. The full answers given by different AI models in interpreting and comparing two maps. Prompt: Please write a paragraph comparing these two maps.
AI tool and score (0–5), followed by the full response (accuracy, correctness, precision, etc.):
ChatGPT4.o, score 4: The two maps displayed illustrate the average annual air temperature across Poland for two different time periods: 1961–1990 and 2011–2020. The left map, representing the period 1961–1990, shows a predominantly cooler climate, with average temperatures ranging from 7.0 °C to 8.5 °C. This cooler trend is especially evident in the northern and northeastern regions, where temperatures are as low as 7.0 °C. In contrast, the right map, representing the period 2011–2020, shows a significant warming across the entire country, with average temperatures rising to between 8.5 °C and 10.0 °C or higher. The most noticeable increase in temperature occurs in the western and southwestern regions, where temperatures now exceed 9.0 °C, and in some areas, even approach 11.0 °C. This stark contrast between the two periods underscores the rapid warming that Poland has experienced over recent decades, reflecting broader global climate change trends. The shift towards warmer temperatures has significant implications for Poland’s climate, agriculture, and ecosystems.
ChatGPT4.0, score 4: The two maps in the image represent the average annual air temperature in Poland across two different periods, 1961–1990 and 2011–2020, as indicated by the labels. On the left, the map for 1961–1990 shows a cooler climate, with an average temperature of 7.5 °C. This is visualized by a dominance of greens and yellows, indicating cooler temperatures. On the right, the map for 2011–2020 illustrates a significant warming with an average temperature of 9.1 °C. This period is represented by warmer colors like reds and deep oranges, indicating higher temperatures. The shift from predominantly green and yellow to red and orange between the two maps highlights a clear trend of rising temperatures over the fifty-year span. This comparison suggests a notable increase in temperatures, possibly reflecting broader climate change trends affecting the region.
Bard (March 2024), score 5: As can be seen from the maps, the average annual temperature in Poland has increased by approximately 1.6 °C over the past 30 years. This warming trend is most pronounced in the western and central parts of the country, where the average temperature increase has been greater than 2 °C.
The warming trend is evident in the distribution of temperature ranges across the two maps. In the 1961–1990 period, the majority of Poland (approximately 70%) had an average annual temperature between 6 °C and 8 °C. However, in the 2011–2020 period, this temperature range only accounted for approximately 40% of the country. In contrast, the percentage of Poland with an average annual temperature above 8 °C has increased from 20% to 40% over the same period.
The warming trend in Poland is consistent with the global trend of climate change. The Intergovernmental Panel on Climate Change (IPCC) has concluded that it is extremely likely that human activities caused more than half of the observed increase in global mean surface temperature from 1951 to 2010.
The warming trend in Poland is expected to continue in the future. The IPCC’s Sixth Assessment Report, released in 2022, projects that global average surface temperatures are expected to rise by 0.9 to 3.6 degrees Celsius (1.6 to 6.5 degrees Fahrenheit) by the end of the 21st century. This rise is expected to be unevenly distributed across the globe, with some regions experiencing more warming than others.
The projected temperature rise in Poland will have a range of impacts on the environment, economy, and society. These impacts include:
  • Changes in precipitation patterns
  • Increased risk of floods and landslides
  • Changes in agricultural production
  • Spread of pests and diseases
  • Increased risk of heat waves
  • In order to mitigate the impacts of climate change in Poland, it is essential to reduce greenhouse gas emissions. This can be achieved through a range of measures, including:
  • Transitioning to renewable energy sources
  • Improving energy efficiency
  • Changing land use practices
  • By taking action now, we can reduce the impacts of climate change and protect the future of Poland and the planet.
Gemini Advanced (August 2024), score 3: The two maps visually depict the change in average annual air temperature in Poland between two time periods: 1961–1990 and 2011–2020. The map on the left, representing the earlier period, shows a predominance of cooler colors, particularly yellow and light green, indicating lower average temperatures. In contrast, the map on the right, depicting the more recent period, is dominated by warmer colors, notably orange and red, signifying a significant increase in average temperatures across Poland. This stark color shift between the two maps clearly illustrates a warming trend in the country’s climate over the three decades.
DataAnalyst, score 4: These two maps present a striking visual comparison of the average annual air temperature in Poland across two different time periods: 1961–1990 and 2011–2020. The left map, representing the earlier period, shows a cooler color palette, dominated by greens and yellows, indicating average temperatures mainly between 7.5 °C and 8.5 °C, with the overall average for the period marked at 7.5 °C. In contrast, the right map, corresponding to the later period, is illustrated with a much warmer color scheme, with reds and deep oranges predominating, showing average temperatures frequently above 9 °C, some areas even nearing 10 °C, with the period average notably higher at 9.1 °C. This comparative visualization underscores a significant increase in average temperatures over the decades, highlighting a possible warming trend in Poland’s climate.
Copilot, score 3: The two maps provided depict the average annual air temperature in Poland for two distinct periods: 1961–1990 and 2011–2020. The map for the period 1961–1990 is predominantly green and yellow, indicating a cooler climate with an average temperature of 7.5 °C. In contrast, the map for the period 2011–2020 is dominated by red hues, signifying a warmer climate with an increased average temperature of 9.1 °C. This comparison highlights a noticeable increase in temperature over the decades, which could be indicative of climate change impacts. The color scale on the right side of the maps provides a visual guide for interpreting the temperatures represented by each color, ranging from −1 °C (purple) to 13 °C (dark red).
ChatGPT o3-mini-high, score 3: These two maps illustrate a clear increase in the average annual air temperature in Poland when comparing the period 1961–1990 (left) with 2011–2020 (right). The left map, representing 1961–1990, predominantly features cooler shades of green and yellow, indicating average temperatures around 7.5 °C. In contrast, the right map for 2011–2020 is dominated by warmer hues of red and orange, reflecting an average temperature closer to 9.1 °C. The overall shift from greens and yellows to reds and oranges demonstrates that nearly every region of Poland has experienced a noticeable rise in mean annual temperatures, underscoring the warming trend over the last few decades.
Claude 3.7 Sonnet (February 2025), score 5: The image shows a stark comparison of average annual air temperatures in Poland between two time periods: 1961–1990 and 2011–2020. The left map displays a national average of 7.5 °C during 1961–1990, with a predominance of green and yellow colors indicating cooler temperatures, particularly in the northern and eastern regions. The right map reveals a significant warming trend, with the national average rising to 9.1 °C in 2011–2020—an increase of 1.6 °C. This more recent map is dominated by red and orange hues across most of the country, showing that areas that were previously cooler have experienced substantial warming. The temperature gradient is preserved, with the southwestern and central parts of Poland remaining the warmest in both periods, while the mountainous southern regions and northeastern areas remain relatively cooler. However, the overall shift toward higher temperatures is unmistakable, with virtually the entire country experiencing temperatures above 8 °C in the recent period, compared to much more varied temperatures in the earlier timeframe.
Perplexity (February 2025), score 3: The two maps illustrate a clear increase in average annual air temperature in Poland between the periods of 1961–1990 and 2011–2020. The first map, representing 1961–1990, displays a cooler temperature range, predominantly in shades of green and yellow, with an average of 7.5 °C. In contrast, the second map, covering 2011–2020, is significantly warmer, with most of the country depicted in shades of orange and red, indicating higher temperatures and an overall average of 9.1 °C. This shift towards warmer colors across the entire country demonstrates a notable warming trend over the decades.
Consensus (February 2025), score 4: The two maps illustrate the average annual air temperature in Poland for two time periods: 1961–1990 and 2011–2020. The left map, representing 1961–1990, shows predominantly green and yellow shades, indicating lower temperatures, with an overall average of 7.5 °C. In contrast, the right map, for 2011–2020, is dominated by red and orange shades, reflecting significantly higher temperatures, with an average of 9.1 °C. This shift suggests a clear warming trend over time, with nearly all regions experiencing a temperature increase. The most noticeable warming is seen in central and western Poland, where temperatures have risen substantially. The change in color distribution between the maps highlights the impact of climate change, leading to a much warmer climate across the country.
SciSpace (February 2025): The two maps compare the average annual air temperature in Poland for two different periods: 1961–1990 and 2011–2020. The left map, representing 1961–1990, shows predominantly green and yellow shades, indicating lower temperatures, with a national average of 7.5 °C. Cooler areas, particularly in the northeast and mountainous regions in the south, have temperatures around 6–7 °C, while the western and southwestern parts of the country reach up to 8.5 °C. In contrast, the right map, depicting 2011–2020, is dominated by red and orange shades, reflecting a significant temperature increase, with an average of 9.1 °C. The warming trend is evident across all regions, with the previously cooler northeast now reaching around 8.0–8.5 °C, and the western and southwestern regions exceeding 9.5 °C. This striking difference visually represents climate change’s impact, illustrating a clear rise in temperatures over time. The change suggests a significant warming trend over the past decades, likely linked to global climate change and its regional effects in Poland.
Table A2. Data of mean air temperature in Poland in the 21st century (2001–2023), which was used to create a map in ChatGPT4.o.
STATION | lon | lat | MEAN (°C)
BIAŁYSTOK | 23.16228131 | 53.10725902 | 7.9
BIELSKO-BIAŁA | 19.00233792 | 49.80670759 | 9.2
CHOJNICE | 17.53263102 | 53.71520088 | 8.4
CZĘSTOCHOWA | 19.09154253 | 50.81183699 | 9.0
ELBLĄG | 19.54354719 | 54.22319596 | 8.4
GDAŃSK * | 18.69794452 | 54.39977735 | 9.1
GORZÓW WLKP | 15.27725927 | 52.74122952 | 9.7
HEL | 18.81191661 | 54.60360674 | 9.1
JELENIA GÓRA | 15.78881302 | 50.90025271 | 8.3
KALISZ | 18.08092209 | 51.78115133 | 9.5
KATOWICE | 19.03274124 | 50.24056754 | 9.2
KĘTRZYN | 21.36665992 | 54.06718789 | 8.3
KIELCE | 20.69221032 | 50.81047827 | 8.5
KŁODZKO | 16.61422539 | 50.43689126 | 8.4
KOŁOBRZEG | 15.38898512 | 54.15844166 | 9.2
KOSZALIN | 16.15516727 | 54.20454235 | 9.1
KOZIENICE | 21.54362149 | 51.56478486 | 9.0
KRAKÓW | 19.79488375 | 50.07770664 | 9.2
KROSNO | 21.76917439 | 49.70673792 | 8.9
ŁEBA | 17.53481034 | 54.7536764 | 8.7
LĘBORK | 17.75684947 | 54.55302536 | 8.1
LEGNICA | 16.2076622 | 51.19251811 | 9.9
LESKO | 22.34169786 | 49.46647133 | 8.4
LESZNO | 16.53472666 | 51.83554951 | 9.5
ŁÓDŹ | 19.38717768 | 51.71833379 | 9.1
LUBLIN | 22.39311272 | 51.21674542 | 8.6
MIKOŁAJKI | 21.58957414 | 53.78914349 | 8.3
MŁAWA | 20.36109933 | 53.10416919 | 8.5
NOWY SĄCZ | 20.68859999 | 49.62713962 | 9.1
OLSZTYN | 20.42135085 | 53.76857766 | 8.3
OPOLE | 17.96889643 | 50.62698659 | 9.9
PIŁA | 16.74724155 | 53.13054695 | 9.1
PŁOCK | 19.7257651 | 52.58842634 | 8.8
POZNAŃ | 16.83460455 | 52.41654744 | 9.7
RACIBÓRZ | 18.1908525 | 50.06104685 | 9.2
RZESZÓW | 22.04210518 | 50.11069995 | 9.2
SANDOMIERZ | 21.71592165 | 50.69662778 | 8.6
SIEDLCE | 22.24477454 | 52.18097875 | 8.6
SŁUBICE | 14.61959024 | 52.3485205 | 9.8
SULEJÓW | 19.86642913 | 51.35333644 | 8.6
SUWAŁKI | 22.94887812 | 54.13084797 | 7.5
SZCZECIN | 14.62272692 | 53.39526877 | 9.8
TARNÓW | 20.98394047 | 50.02985837 | 9.6
TERESPOL | 23.62198314 | 52.07866005 | 8.6
TORUŃ | 18.5954476 | 53.04208108 | 9.2
USTKA | 16.85409678 | 54.58830633 | 9.1
WARSZAWA | 20.96109105 | 52.16284951 | 9.4
WIELUŃ | 18.55670645 | 51.21022173 | 9.4
WŁODAWA | 23.52951731 | 51.55346759 | 8.7
WROCŁAW | 16.89989337 | 51.10319256 | 10.1
ZAKOPANE | 19.96031756 | 49.29382054 | 6.5
ZIELONA GÓRA | 15.52465645 | 51.92992702 | 9.7
* This station was moved to another location. The data included here are from both locations, but this series is NOT homogeneous. The data are used to show the possibilities of AI tools, not to draw conclusions about air temperature in Poland.

References

  1. Haleem, A.; Javaid, M.; Singh, R.P. An Era of ChatGPT as a Significant Futuristic Support Tool: A Study on Features, Abilities, and Challenges. BenchCouncil Trans. Benchmarks Stand. Eval. 2022, 2, 100089.
  2. Vaghefi, S.A.; Stammbach, D.; Muccione, V.; Bingler, J.; Ni, J.; Kraus, M.; Allen, S.; Colesanti-Senni, C.; Wekhof, T.; Schimanski, T.; et al. ChatClimate: Grounding Conversational AI in Climate Science. Commun. Earth Environ. 2023, 4, 480.
  3. Foroumandi, E.; Moradkhani, H.; Sanchez-Vila, X.; Singha, K.; Castelletti, A.; Destouni, G. ChatGPT in Hydrology and Earth Sciences: Opportunities, Prospects, and Concerns. Water Resour. Res. 2023, 59, e2023WR036288.
  4. Grünebaum, A.; Chervenak, J.; Pollet, S.L.; Katz, A.; Chervenak, F.A. The Exciting Potential for ChatGPT in Obstetrics and Gynecology. Am. J. Obstet. Gynecol. 2023, 228, 696–705.
  5. Raile, P. The Usefulness of ChatGPT for Psychotherapists and Patients. Humanit. Soc. Sci. Commun. 2024, 11, 47.
  6. Castro Nascimento, C.M.; Pimentel, A.S. Do Large Language Models Understand Chemistry? A Conversation with ChatGPT. J. Chem. Inf. Model. 2023, 63, 1649–1655.
  7. Urbina, F.; Lentzos, F.; Invernizzi, C.; Ekins, S. Dual Use of Artificial-Intelligence-Powered Drug Discovery. Nat. Mach. Intell. 2022, 4, 189–191.
  8. Sarker, D.; Eiaz-Ur-Rahman, A.F.M.; Sakib, A.R.; Terano, H.J.R.; Rahman, M.M. ChatGPT’s Applications in Higher Education: Unmasking Opportunities and Challenges. J. Educ. Manag. Dev. Stud. 2023, 3, 37–47.
  9. Thorp, H.H. ChatGPT Is Fun, but Not an Author. Science 2023, 379, 313.
  10. Watkins, R. Guidance for Researchers and Peer-Reviewers on the Ethical Use of Large Language Models (LLMs) in Scientific Research Workflows. AI Ethics 2023, 4, 969–974.
  11. Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; Tian, Q. Accurate Medium-Range Global Weather Forecasting with 3D Neural Networks. Nature 2023, 619, 533–538.
  12. Wang, W.; Zhang, J.; Su, Q.; Chai, X.; Lu, J.; Ni, W.; Duan, B.; Ren, K. Accurate Initial Field Estimation for Weather Forecasting with a Variational Constrained Neural Network. NPJ Clim. Atmos. Sci. 2024, 7, 223.
  13. Pu, J.; Mu, M.; Feng, J.; Zhong, X.; Li, H. A Fast Physics-Based Perturbation Generator of Machine Learning Weather Model for Efficient Ensemble Forecasts of Tropical Cyclone Track. NPJ Clim. Atmos. Sci. 2025, 8, 128.
  14. Maddy, E.S.; Boukabara, S.A.; Iturbide-Sanchez, F. Assessing the Feasibility of an NWP Satellite Data Assimilation System Entirely Based on AI Techniques. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 9828–9845.
  15. Biswas, S.S. Potential Use of Chat GPT in Global Warming. Ann. Biomed. Eng. 2023, 51, 3–4.
  16. Li, X.; Jiang, Y.; Mostafavi, A. AI-Assisted Protective Action: Study of ChatGPT as an Information Source for a Population Facing Climate Hazards. arXiv 2023, arXiv:2304.06124.
  17. Salekpay, F.; van den Bergh, J.; Savin, I. Comparing Advice on Climate Policy between Academic Experts and ChatGPT. Ecol. Econ. 2024, 226, 108352.
  18. McGovern, A.; Ebert-Uphoff, I.; Barnes, E.A.; Bostrom, A.; Cains, M.G.; Davis, P.; Demuth, J.L.; Diochnos, D.I.; Fagg, A.H.; Tissot, P.; et al. The NSF AI Institute for Research on Trustworthy AI for Weather, Climate, and Coastal Oceanography (AI2ES). Bull. Am. Meteorol. Soc. 2022, 103, E1658–E1668.
  19. Bostrom, A.; Demuth, J.L.; Wirz, C.D.; Cains, M.G.; Schumacher, A.; Madlambayan, D.; Bansal, A.S.; Bearth, A.; Chase, R.; Crosman, K.M.; et al. Trust and Trustworthy Artificial Intelligence: A Research Agenda for AI in the Environmental Sciences. Risk Anal. 2024, 44, 1498–1513.
  19. Bostrom, A.; Demuth, J.L.; Wirz, C.D.; Cains, M.G.; Schumacher, A.; Madlambayan, D.; Bansal, A.S.; Bearth, A.; Chase, R.; Crosman, K.M.; et al. Trust and Trustworthy Artificial Intelligence: A Research Agenda for AI in the Environmental Sciences. Risk Anal. 2024, 44, 1498–1513. [Google Scholar] [CrossRef]
  20. OpenAI ChatGPT. Available online: https://chatgpt.com/ (accessed on 1 February 2024).
  21. OpenAI. Available online: https://openai.com/ (accessed on 19 June 2024).
  22. Krzyżewska, A. Climate Change in Poland—The Assessment of the Conversation with ChatGPT. Misc. Geogr. 2024, 28, 5–12. [Google Scholar] [CrossRef]
  23. Google Gemini Advanced. Available online: https://gemini.google.com/app (accessed on 3 April 2025).
  24. Google. Available online: https://about.google (accessed on 3 April 2025).
  25. Microsoft Copilot. Available online: https://copilot.microsoft.com (accessed on 3 April 2025).
  26. Microsoft. Available online: https://www.microsoft.com (accessed on 3 April 2025).
  27. Microsoft Blog Microsoft. Available online: https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web (accessed on 10 September 2024).
  28. PerplexityAI. Available online: https://www.perplexity.ai (accessed on 10 September 2024).
  29. Perplexity AI Raises $73.6M in Funding Round Led by Nvidia, Bezos, Now Valued at $522M. Available online: https://www.businesstoday.in/entrepreneurship/start-up/story/perplexity-ai-raises-736m-in-funding-round-led-by-nvidia-bezos-now-valued-at-522m-412130-2024-01-06 (accessed on 10 September 2024).
  30. Anthropic. Available online: https://claude.ai/ (accessed on 18 February 2025).
  31. Whitney, L. Anthropic’s Claude 3 Chatbot Claims to Outperform ChatGPT, Gemini. Available online: https://www.zdnet.com/article/anthropics-claude-3-chatbot-claims-to-outperform-chatgpt-gemini/ (accessed on 18 February 2025).
  32. DataAnalyst. Available online: https://chatgpt.com/g/g-HMNcP6w7d-data-analyst (accessed on 3 April 2025).
  33. Consensus. Available online: https://chatgpt.com/g/g-bo0FiWLY7-consensus (accessed on 3 April 2025).
  34. ConsensusApp. Available online: https://consensus.app/home/about-us/ (accessed on 10 September 2024).
  35. SciSpace. Available online: https://typeset.io/ (accessed on 18 February 2025).
  36. ScholarGPT. Available online: https://chatgpt.com/g/g-kZ0eYXlJe-scholar-gpt (accessed on 18 February 2025).
  37. Awesomegpts.Ai. Available online: https://awesomegpt.vip/ (accessed on 18 February 2025).
  38. Academic Assistant Pro. Available online: https://chatgpt.com/g/g-UubB08D4y-academic-assistant-pro (accessed on 3 April 2025).
  39. @gpt_boost. Available online: https://github.com/ai-boost (accessed on 18 February 2025).
  40. Hangzhou DeepSeek Artificial Intelligence Basic Technology Research DeepSeek (Hangzhou, China). Available online: https://chat.deepseek.com/ (accessed on 18 February 2025).
  41. Hangzhou DeepSeek Artificial Intelligence Basic Technology Research. Bloomberg.com. Available online: https://www.bloomberg.com/profile/company/2544189D:CH (accessed on 18 February 2025).
  42. Khanifar, J. Evaluating AI-Generated Responses from Different Chatbots to Soil Science-Related Questions. Soil. Adv. 2025, 3, 100034. [Google Scholar] [CrossRef]
  43. Cascella, M.; Semeraro, F.; Montomoli, J.; Bellini, V.; Piazza, O.; Bignami, E. The Breakthrough of Large Language Models Release for Medical Applications: 1-Year Timeline and Perspectives. J. Med. Syst. 2024, 48, 22. [Google Scholar] [CrossRef]
  44. Şahin, M.F.; Doğan, Ç.; Topkaç, E.C.; Şeramet, S.; Tuncer, F.B.; Yazıcı, C.M. Which Current Chatbot Is More Competent in Urological Theoretical Knowledge? A Comparative Analysis by the European Board of Urology in-Service Assessment. World J. Urol. 2025, 43, 116. [Google Scholar] [CrossRef]
  45. WMO WMO Cloud Atlas. Available online: https://cloudatlas.wmo.int/en/search-image-gallery.html (accessed on 19 June 2024).
  46. Cahan, P.; Treutlein, B. A Conversation with ChatGPT on the Role of Computational Systems Biology in Stem Cell Research. Stem Cell Rep. 2023, 18, 1–2. [Google Scholar] [CrossRef]
  47. An, H.; Li, X.; Huang, Y.; Wang, W.; Wu, Y.; Liu, L.; Ling, W.; Li, W.; Zhao, H.; Lu, D.; et al. A New ChatGPT-Empowered, Easy-to-Use Machine Learning Paradigm for Environmental Science. Eco-Environ. Health 2024, 3, 131–136. [Google Scholar] [CrossRef] [PubMed]
  48. Lai, V.D.; Ngo, N.T.; Veyseh, A.P.B.; Man, H.; Dernoncourt, F.; Bui, T.; Nguyen, T.H. ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning. In Findings of the Association for Computational Linguistics: EMNLP 2023; Association for Computational Linguistics: Singapore, 2023; pp. 13171–13189. [Google Scholar] [CrossRef]
  49. Cooper, G. Examining Science Education in ChatGPT: An Exploratory Study of Generative Artificial Intelligence. J. Sci. Educ. Technol. 2023, 32, 444–452. [Google Scholar] [CrossRef]
  50. Casini, L.; Marchetti, N.; Montanucci, A.; Orrù, V.; Roccetti, M. A Human–AI Collaboration Workflow for Archaeological Sites Detection. Sci. Rep. 2023, 13, 8699. [Google Scholar] [CrossRef]
  51. McGovern, A.; Demuth, J.; Bostrom, A.; Wirz, C.D.; Tissot, P.E.; Cains, M.G.; Musgrave, K.D. The Value of Convergence Research for Developing Trustworthy AI for Weather, Climate, and Ocean Hazards. NPJ Nat. Hazards 2024, 1, 13. [Google Scholar] [CrossRef]
Figure 1. Precision, recall, and F1 score by AI tool in the classification of cloud types.
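The precision, recall, and F1 values in Figure 1 can be reproduced from an evaluation sheet that lists, for every photograph, the genera in the WMO reference answer and the genera named by a tool. The sketch below is a minimal, micro-averaged version of that calculation; the two example label sets are placeholders, not the data behind the figure.

```python
# Minimal sketch: micro-averaged precision, recall and F1 for cloud-genus recognition.
# The example label sets below are placeholders, not the evaluation data behind Figure 1.
true_labels = [{"Cu", "Ac"}, {"Ci"}, {"Sc"}]   # genera in the WMO reference answer per photo
pred_labels = [{"Cu", "Cc"}, {"Ci"}, {"St"}]   # genera named by the evaluated AI tool per photo

tp = fp = fn = 0
for truth, pred in zip(true_labels, pred_labels):
    tp += len(truth & pred)   # genera correctly identified
    fp += len(pred - truth)   # genera named although not present
    fn += len(truth - pred)   # genera present but missed

precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
print(f"precision={precision:.2f}  recall={recall:.2f}  F1={f1:.2f}")
```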
Figure 2. Maps of air temperature change in Poland (source: naukaoklimacie.pl, elaboration: Piotr Djaków, based on IMGW-PIB data).
Figure 3. Maps of mean annual UTCI (a,b) and air temperature (c,d) created by ChatGPT4.0, ChatGPT4.o, and ScholarGPT. Map (a) was created in February 2024 by ChatGPT4.0; map (b) in June 2024; map (c) in September 2024 (both by ChatGPT4.o); map (d) in February 2025 by ScholarGPT. Prompt: This is the file with the names of meteorological stations in Poland, their longitude and latitude and mean air temperature. Can you draw a bubble map of Poland based on ‘MEAN’ column? Add a legend to the map and names of the stations below the points.
Figure 4. Meteorological parameters during 18–19 July 2024 in Lublin, Poland (from top to bottom: wind speed [m/s], air temperature [°C], heat index [°C], relative humidity [%], station-level pressure [hPa], precipitation [mm]). The air temperature chart used in this study is marked with a red box. Prompt: This is the chart of some meteorological data. The second chart from the top is air temperature in degrees Celsius in days 18–19 July 2024. At the bottom you have axis with the hours. Can you give me exact values of the air temperature for each hour? Start with 10.00, than 11.00 and end with 8:00.
Figure 5. The comparison of the results of archival data retrieval by different AI tools. None of the tested tools was able to accurately retrieve meteorological data from an archived image.
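The mismatch summarized in Figure 5 can be quantified by comparing the hourly values returned by each tool with the observed record, for example with mean absolute error, root-mean-square error, and bias. The short sketch below illustrates that comparison with placeholder arrays; it is not the measured Lublin series.

```python
# Minimal sketch: error metrics for hourly temperatures digitized from a chart image.
# Both arrays are illustrative placeholders, not the observed Lublin record.
import numpy as np

observed = np.array([28.4, 29.1, 30.0, 31.2, 32.0, 31.5])   # station measurements (°C)
retrieved = np.array([28.0, 29.5, 30.5, 30.8, 32.4, 31.0])  # values read off the chart by a tool (°C)

errors = retrieved - observed
mae = np.mean(np.abs(errors))          # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # root-mean-square error
bias = errors.mean()                   # systematic over- or underestimation
print(f"MAE = {mae:.2f} °C, RMSE = {rmse:.2f} °C, bias = {bias:+.2f} °C")
```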
Table 1. (a) Cloud recognition from WMO Cloud Atlas [45]. Prompt: Can you recognize clouds in that photo? Please classify them according to the newest WMO classification system with details about genera, species, varieties, special clouds, etc. (May 2024 and February 2025 for ChatGPT-o1 in green, o3 in orange, Gemini 2.0 Exp 1206 in brown, Copilot Academic version in blue). (b) Cloud recognition from WMO Cloud Atlas [45]. Prompt: Can you recognize clouds in that photo? Please classify them according to the newest WMO classification system with details about genera, species, varieties, special clouds, etc. (February 2025).
(a)
No. | Image | ChatGPT 4.o/o1/o3-Mini | Gemini Advanced/2.0 Exp-1206 | Copilot/Copilot Acad (February 2025) | Correct Answer (WMO)
1. [Image] Cirrocumulus stratiformis undulatus
Cumulus humilis
/Cumulus humilis, Altocumulus stratiformis translucidus perlucidus/Cumulus humilis and Altocumulus stratiformis translucidus perlucidus
Cirrocumulus stratiformis undulatus
Altocumulus stratiformis translucidus perlucidus
Cumulus humilis, mediocris
/Cirrocumulus stratiformis undulatus, cumulus humilis, cumulus mediocris
Cirrocumulus stratiformis
/Cumulus humilis, Cirrocumulus stratiformis undulatus
Cumulus humilis and altocumulus stratiformis translucidus perlucidus
Assessment (1–5): 3/5/5 | 4/4 | 1/3
2. [Image] Cirrus fibratus radiatus, cirrus uncinus,
contrail cirrus/
Cirrus homogenitus, cirrus homomutatus
/Cirrus homogenitus, cirrus homomutatus
Cirrus homogenitus spissatus
Contrail cirrus
/Cirrus homogenitus, cirrus homomutatus
Cirrocumulus stratiformis
/Cirrus homogenitus (contrails)
Cirrus homogenitus (contrails) and cirrus homomutatus
Assessment (1–5): 4/5/5 | 4/5 | 0/4
3. [Image] Altostratus undulatus/altostratus undulatus opacus asperitas/
Altostratus stratiformis or possibly stratocumulus stratiformis with undulatus and asperitas
Altocumulus stratiformis undulatus asperitas/
Stratocumulus stratiformis undulatus
Stratocumulus undulatus
/Altostratus undulatus * radiatus
* incorrectly labeled as species—it is a variety
Stratocumulus stratiformis opacus asperitas
Assessment (1–5): 0/2/4 | 2/4 | 1/0
4. [Image] Altocumulus stratiformis undulatus/
Altocumulus (stratiformis) perlucidus opacus/
Altocumulus stratiformis perlucidus undulatus
Altocumulus stratiformis translucidus undulatus
/Altocumulus stratiformis translucidus undulatus
Stratocumulus stratiformis
/Altocumulus stratiformis undulatus
Altocumulus stratiformis translucidus perlucidus undulatus
Assessment (1–5): 3/4/4.5 | 4/4 | 1/3
5. [Image] Nimbostratus/
Nimbostratus (pannus)
/Nimbostratus with pannus
Stratus nebulosus opacus
/Nimbostratus praecipitatio with pannus
Stratocumulus stratiformis
/Stratus opacus *
* incorrectly labeled as species—it is a variety
Nimbostratus praecipitatio cumulonimbogenitus with stratus fractus of wet weather
Assessment (1–5): 1/2/2 | 1/3 | 0/1
6. [Image] Cumulonimbus calvus,
cumulus congestus
/Cumulonimbus capillatus incus praecipitatio
/Cumulonimbus capillatus (incus) with praecipitatio
Cumulonimbus calvus praecipitatio with possible virga and pileus
Cumulus mediocris or cumulus congestus.
/Cumulonimbus capillatus with praecipitatio and incus
Cumulus mediocris, cumulus congestus with pileus
Cumulonimbus
/Cumulonimbus calvus
Cumulonimbus capillatus incus praecipitatio
Assessment (1–5): 2/5/5 | 3/4 | 1/1
7. [Image] Stratus nebulosus
/Stratus (nebulosus) opacus
/Stratus nebulosus with opacus
Stratus nebulosus undulatus
/Stratus nebulosus opacus
Stratus
/Stratus
Stratus undulatus
Assessment (1–5): 3/2/2 | 4/2 | 2/2
8. [Image] Stratus nebulosus
/Altostratus translucidus
/Altostratus translucidus
Altostratus translucidus
/Altostratus translucidus
Stratus
/Stratus opacus
Altostratus translucidus
Assessment (1–5): 0/5/5 | 5/5 | 0/2
9. [Image] Cirrostratus fibratus,
22° halo
/Cirrostratus (nebulosus) translucidus, with a 22° halo
/Cirrostratus nebulosus, accompanied by a 22° halo
Cirrus fibratus radiatus
halo around the sun is formed by cirrostratus clouds that are not clearly visible in this photo
/Cirrostratus nebulosus with 22° halo
Cirrus fibratus, cirrus floccus, cirrus spissatus
Cirrus
22° halo.
/Halo on ice crystals in the atmosphere with cumulus humilis
Cirrostratus nebulosus and 22° halo
Assessment (1–5): 4/5/5 | 3/4 | 2/1
MEAN: 3.0/3.7/4.0 | 3.0/3.4 | 1.7/3.1
SD: 1.3/1.3/1.6 | 1.6/1.2 | 1.9/1.5
MSE: 5.6/3.3/3.2 | 6.2/3.8 | 14.2/5.7
RMSE: 2.4/1.8/1.8 | 2.5/1.9 | 3.8/2.4
CI: ±1.0/1.0/1.2 | ±1.2/1.0 | ±1.4/1.2
(b)
No. | Claude 3.5 Sonnet | Perplexity | Consensus | SciSpace | Correct Answer (WMO)
1. Altocumulus perlucidus * translucidus,
Cumulus humilis radiatus
* incorrectly labeled as species—it is a variety
Altocumulus stratiformis perlucidus with possible undulatus,
cumulus humilis
Cirrocumulus stratiformis undulatus,
Cumulus mediocris/humilis *
* incorrectly labeled as species—it is a variety
Cirrocumulus stratiformis undulatus,
Cumulus mediocris/humilis *
* incorrectly labeled as species—it is a variety
Cumulus humilis, altocumulus stratiformis translucidus perlucidus
Assessment (1–5): 4 | 4 | 3 | 3
2. Cirrus homogenitus | Homogenitus or contrails, homomutatus (no genre for human-made clouds) | Cirrus homogenitus,
cirrostratus nebulosus
Cirrus homogenitus (contrails), fibratus | Cirrus homogenitus (contrails) and cirrus homomutatus
Assessment (1–5): 4 | 3 | 2 | 2
3. Altostratus translucidus * undulatus fluctus
* incorrectly labeled as species—it is a variety
Altostratus translucidus *, undulatus, radiatus
* incorrectly labeled as species—it is a variety
Altostratus or altocumulus undulatus asperitas | Altostratus or stratocumulus undulatus *, asperitas **
* incorrectly labeled as species—it is a variety
** incorrectly labeled as variety—it is a supplementary feature
Stratocumulus stratiformis opacus asperitas
Assessment (1–5): 0 | 0 | 1 | 2
4. Altostratus translucidus * undulatus
* incorrectly labeled as species—it is a variety
Altocumulus stratiformis undulatus | Stratocumulus stratiformis undulatus perlucidus | Stratocumulus or altocumulus stratiformis undulatus | Altocumulus stratiformis translucidus perlucidus undulatus
Assessment (1–5): 3 | 3 | 3 | 3
5. Stratus nebulosus opacus | Nimbostratus
undulatus praecipitatio
Nimbostratus and stratus fractus | Nimbostratus pannus | Nimbostratus praecipitatio cumulonimbogenitus with stratus fractus of wet weather
Assessment (1–5): 1 | 2 | 3 | 1
6. Cumulonimbus capillatus (incus) with praecipitatio and pannus | Cumulonimbus calvus,
or cumulonimbus capillatus praecipitatio with possible arcus
Cumulonimbus capillatus, possibly calvus, with pannus and virga or praecipitatio | Cumulonimbus calvus incus praecipitatio pannus | Cumulonimbus capillatus incus praecipitatio
Assessment (1–5): 5 | 4 | 4 | 3
7. Stratus nebulosus opacus | Stratus nebulosus | Stratus nebulosus | Stratus or stratocumulus stratiformis opacus | Stratus undulatus
Assessment (1–5): 2 | 2 | 2 | 2
8. Altostratus translucidus | Stratus nebulosus opacus | Stratus nebulosus or altostratus nebulosus | Altostratus opacus translucidus | Altostratus translucidus
Assessment (1–5): 5 | 0 | 2 | 4
9. Cirrostratus fibratus or nebulosus translucidus with 22° halo | Cirrostratus nebulosus undulatus with halo | Cirrostratus nebulosus with 22° halo | Cirrostratus fibratus translucidus with 22° halo | Cirrostratus nebulosus and 22° halo
Assessment (1–5): 4 | 4 | 5 | 4
MEAN: 3.1 | 2.4 | 2.8 | 2.7
SD: 1.8 | 1.6 | 1.2 | 1.0
MSE: 6.3 | 8.8 | 6.2 | 6.3
RMSE: 2.5 | 3.0 | 2.5 | 2.5
CI: ±1.4 | ±1.2 | ±0.9 | ±0.8
* incorrectly labeled as species—it is a variety. ** incorrectly labeled as variety—it is a supplementary feature.
Table 2. (a) Clouds photographed by smartphone from the private collection. Prompt: Can you recognize clouds in that photo? Please classify them according to the newest WMO classification system with details about genera, species, varieties, special clouds, etc. (May 2024, update: February 2025 for ChatGPT-o1 in green, o3 in orange, Gemini 2.0 Exp 1206 in brown, Copilot Academic version in blue). (b) Clouds photographed by smartphone from the private collection. Prompt: Can you recognize clouds in that photo? Please classify them according to the newest WMO classification system with details about genera, species, varieties, special clouds, etc. (February 2025).
(a)
No. | Image | ChatGPT 4.o/o1/o3-Mini | Gemini Advanced/2.0 Exp-1206 | Copilot/Copilot Acad (February 2025) | Correct Answer
1. [Image] Cirrus fibratus radiatus/
Cirrus fibratus, uncinus
/Cirrus fibratus, uncinus with a parhelion
Cirrus fibratus, intortus/
Cirrus fibratus, intortus, radiatus
Cirrus spissatus fibratus duplicatus mamma
Cirrostratus fibratus duplicatus nebulosus
Cirrus fibratus
/Cirrus fibratus, uncinus, intortus
Cirrus spissatus, fibratus
Assessment (1–5): 4/4/4 | 4/3 | 3/4
2. [Image] Stratus nebulosus/
Stratus nebulosus opacus/
Stratus nebulosus opacus
Stratus nebulosus opacus/
Stratus nebulosus fractus
Stratocumulus/
Stratus
Stratocumulus nebulosus opacus
Assessment (1–5): 1/2/2 | 2/1 | 2/0
3. [Image] Stratus nebulosus/
Stratus nebulosus opacus/
Stratus nebulosus opacus
Stratus nebulosus opacus/
Stratus nebulosus
Stratocumulus/
Stratus
Stratus nebulosus opacus
Assessment (1–5): 4/5/5 | 5/4 | 0/2
4. [Image] Cirrus fibratus radiatus contrail
/Cirrus fibratus, intortus, uncinus
Cirrocumulus floccus or stratiformis,
cirrus homogenitus
/Cirrocumulus stratiformis floccus, cirrus homogenitus
Cirrus fibratus intortus
/Cirrus fibratus intortus
Cirrocumulus stratiformis undulatus
cirrus homogenitus
Stratiformis undulatus/
Cirrus fibratus, with contrails (cirrus/cirrocumulus homogenitus), cirrocumulus stratiformis
Cirrocumulus
Cirrus fibratus intortus
homogenitus
Assessment (1–5): 4/4/3 | 3/4 | 0/4
5. [Image] Altostratus opacus
undulatus/
Altostratus opacus undulatus
/Altostratus opacus undulatus
Altocumulus stratiformis undulatus/
Stratocumulus stratiformis translucidus undulatus,
altocumulus stratiformis lacunosus
Stratocumulus stratiformis undulatus
/Stratocumulus stratiformis undulatus
Stratocumulus stratiformis opacus undulatus
Assessment (1–5): 2/2/5 | 2/3 | 4/4
6. [Image] Cirrostratus fibratus nebulosus with halo/
Cirrostratus nebulosus translucidus, with 22° halo/
Cirrostratus nebulosus with a 22° halo
Cirrostratus nebulosus with halo
/Cirrostratus fibratus nebulosus,
cirrostratus cirrostratomutatus, with halo
Cirrostratus fibratus nebulosus with halo
/Cirrostratus nebulosus with halo
Cirrostratus nebulosus with halo
Assessment (1–5): 4/5/5 | 5/4 | 4/5
7. [Image] Cirrus homogenitus rectus
/Cirrocumulus
homogenitus
/Cirrus homogenitus
Contrail (not classified under WMO system)
/Cirrus homogenitus.
Altocumulus stratiformis perlucidus
/Cirrus fibratus contrail
Cirrus homogenitus
Assessment (1–5): 4/4/5 | 1/5 | 0/4
8. [Image] Cumulus mediocris/
Cumulus congestus/
Cumulus mediocris
Cumulus congestus/
Cumulus congestus,
cumulus fractus, cumulus mediocris
Cumulus mediocris or congestus humilis/
Cumulus congestus
Cumulus humilis
Assessment (1–5): 4/4/5 | 4/4 | 4/4
9. [Image] Altocumulus stratiformis perlucidus
Cirrus or cirrostratus
/Stratocumulus stratiformis opacus with mamma/
Stratocumulus castellanus opacus
Altocumulus stratiformis undulatus perlucidus
cirrus fibratus/
Stratocumulus stratiformis, with translucidus, opacus, and undulatus
Altocumulus stratiformis, lacunosus
Cumulus humilis radiatus pileus pannus
/Stratocumulus stratiformis, altocumulus stratiformis perlucidus,
cirrus fibratus
Stratocumulus stratiformis, floccus
Assessment (1–5): 1/2/1 | 1/2 | 0/2
MEAN: 3.1/3.2/3.9 | 3.0/3.3 | 2.0/3.3
SD: 1.4/1.5/1.5 | 1.6/1.2 | 2.1/1.7
MSE: 5.2/5.1/3.3 | 6.2/4.1 | 12.8/15.2
RMSE: 2.3/2.3/1.8 | 2.5/2.0 | 3.6/2.3
CI: ±1.0/1.1/1.2 | ±1.2/0.9 | ±1.6/1.3
(b)
No. | Claude 3.5 Sonnet | Perplexity | Consensus | SciSpace | Correct Answer
1. Cirrus fibratus intortus | Cirrus fibratus radiatus | Cirrus fibratus,
cirrus uncinus,
cirrostratus nebulosus with parhelion
Cirrus and cirrostratus,
fibratus,
radiatus,
parhelion,
Cirrus spissatus, fibratus
Assessment (1–5): 4 | 4 | 4 | 4
2. Stratus | Stratus nebulosus undulatus | Nimbostratus and stratus nebulosus | Nimbostratus opacus pannus | Stratocumulus nebulosus opacus
Assessment (1–5): 0 | 1 | 1 | 1
3. Stratus nebulosus opacus | Stratus nebulosus undulatus | Stratus nebulosus | Stratus nebulosus opacus | Stratus nebulosus opacus
Assessment (1–5): 5 | 4 | 4 | 5
4. Cirrus fibratus with contrails
cirrocumulus
Cirrocumulus
stratiformis undulatus
Cirrus fibratus and uncinus, contrail
Cirrocumulus
Cirrus fibratus
radiatus
homogenitus
Cirrocumulus
Cirrus fibratus intortus
homogenitus
Assessment (1–5): 3 | 1 | 3 | 3
5. Cirrus fibratus
Altocumulus translucidus
Altostratus opacus
undulatus
Altostratus and altocumulus undulatus | Altostratus and altocumulus stratiformis undulatus | Stratocumulus stratiformis opacus undulatus
Assessment (1–5): 0 | 1 | 1 | 2
6. Cirrostratus fibratus with halo | Cirrostratus nebulosus undulatus with halo | Cirrostratus nebulosus with halo | Cirrostratus fibratus translucidus with 22° halo | Cirrostratus nebulosus with halo
Assessment (1–5): 4 | 3 | 5 | 3
7. Contrail | Contrail,
homogenitus
Cirrus homogenitus | Cirrus homogenitus | Cirrus homogenitus
Assessment (1–5): 1 | 2 | 5 | 5
8. Cumulus congestus | Cumulus mediocris and altocumulus stratiformis opacus | Cumulus congestus | Cumulus congestus | Cumulus humilis
Assessment (1–5): 4 | 4 | 4 | 4
9. Altocumulus stratiformis translucidus perlucidus | Altocumulus lenticularis *, stratiformis, undulatus, altostratus translucidus *
* as species, should be variety
Cumulonimbus capillatus mammatus | Altocumulus mammatus | Stratocumulus stratiformis, floccus
Assessment (1–5): 1 | 1 | 0 | 1
MEAN: 2.4 | 2.3 | 3.0 | 3.1
SD: 1.9 | 1.4 | 1.9 | 1.5
MSE: 9.9 | 8.9 | 7.1 | 5.7
RMSE: 3.1 | 3.0 | 2.7 | 2.4
CI: ±1.5 | ±1.1 | ±1.4 | ±1.2
* incorrectly labeled as species—it is a variety.
Table 3. Error rate matrix for cloud type classification across all AI tools. Values represent the proportion of incorrect classifications (false positives and false negatives) relative to the total decisions made for each cloud type. Lower values indicate better performance. The intensity of the red background indicates a higher error. The final column shows the mean error rate for each model across all cloud types. A computational sketch of this error rate follows the matrix.
AI Tool | Cc | Cs | Ci | Ac | As | Ns | Sc | St | Cu | Cb | MEAN
ChatGPT 4.o | 1.0 | 0.3 | 0.2 | 0.7 | 1.0 | 0.0 | 1.0 | 0.8 | 0.3 | 0.0 | 0.5
ChatGPT o-1 | 0.5 | 0.0 | 0.3 | 0.0 | 0.7 | 0.0 | 0.8 | 0.8 | 0.0 | 0.0 | 0.3
ChatGPT o3-mini | 0.0 | 0.0 | 0.0 | 0.0 | 0.7 | 0.0 | 0.5 | 0.8 | 0.0 | 0.0 | 0.2
Claude 3.5 Sonnet | 0.0 | 0.0 | 0.4 | 0.8 | 0.7 | 1.0 | 1.0 | 0.5 | 0.0 | 0.0 | 0.4
Consensus | 0.5 | 0.5 | 0.0 | 1.0 | 0.7 | 0.5 | 1.0 | 0.6 | 0.0 | 0.5 | 0.5
Copilot | 1.0 | 0.5 | 0.8 | 1.0 | 1.0 | 1.0 | 0.6 | 0.7 | 0.7 | 0.0 | 0.7
Copilot Acad. | 0.5 | 0.5 | 0.2 | 0.7 | 1.0 | 1.0 | 0.5 | 0.6 | 0.3 | 0.0 | 0.5
Gemini 2.0 | 0.5 | 0.3 | 0.2 | 0.8 | 0.0 | 0.0 | 0.3 | 0.8 | 0.3 | 0.0 | 0.3
Gemini Adv. | 1.0 | 0.0 | 0.5 | 0.6 | 0.0 | 1.0 | 1.0 | 0.5 | 0.3 | 0.0 | 0.5
Perplexity | 0.0 | 0.0 | 0.8 | 0.5 | 1.0 | 0.0 | 1.0 | 0.8 | 0.0 | 0.0 | 0.4
SciSpace | 1.0 | 0.3 | 0.0 | 0.8 | 0.7 | 0.5 | 0.8 | 0.7 | 0.0 | 1.0 | 0.6
MEAN | 0.5 | 0.2 | 0.3 | 0.6 | 0.7 | 0.5 | 0.8 | 0.7 | 0.2 | 0.1 | 0.5
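As a worked illustration of the error rate defined in the Table 3 caption, the sketch below computes (false positives + false negatives) divided by all decisions for each genus. It assumes that "total decisions" means the sum of true-positive, false-positive, and false-negative decisions for that genus, and the counts are placeholders rather than the study's tallies.

```python
# Minimal sketch of the Table 3 error rate per cloud genus:
# (false positives + false negatives) / total decisions for that genus.
# Counts are placeholders; "total decisions" is assumed to be TP + FP + FN.
decisions = {
    # genus: (true_positives, false_positives, false_negatives)
    "Cc": (1, 1, 1),
    "Ci": (4, 0, 1),
    "St": (1, 2, 2),
}

rates = {}
for genus, (tp, fp, fn) in decisions.items():
    total = tp + fp + fn
    rates[genus] = (fp + fn) / total if total else 0.0
    print(f"{genus}: error rate = {rates[genus]:.1f}")

print(f"MEAN = {sum(rates.values()) / len(rates):.1f}")
```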
Table 4. The results of testing AI tools for map interpretation. Prompt: Please write a paragraph comparing these two maps.
AI Tool | Result (Accuracy, Correctness, Precision, etc.) | Scores
ChatGPT4.o | Correct interpretation with precise numerical values and regional description. Accurate conclusions. | 4
ChatGPT4.0 | Correct interpretation with general numerical values and accurate conclusions. | 4
Bard (March 2024) | Correct interpretation with precise numerical values and percentages of temperature ranges. An abundance of additional information about climate change. | 5
Gemini Advanced (August 2024) | Very general description with accurate conclusions. | 3
DataAnalyst | Correct interpretation with correct numerical values and accurate conclusions. | 4
Copilot | Very general description with accurate conclusions. | 3
ChatGPT o3-mini-high (February 2025) | Very general description with accurate conclusions. | 3
Claude 3.5 Sonnet (February 2025) | Detailed description with a focus on regional changes and accurate conclusions. | 5
Perplexity AI | Very general description with accurate conclusions. | 3
Consensus (February 2025) | General description with some focus on regional changes and accurate conclusions. | 4
SciSpace (February 2025) | Detailed description with a focus on regional changes and accurate conclusions. | 5
Table 5. The results of testing AI regarding filling the gaps in literature review. Prompt: Hello, I am climate scientist and I am starting my research on humid heat waves. Can you find me 5 most important papers about humid heat waves and briefly summarize them ? Add bibliography.
AI Tool | Result (Accuracy, Correctness, Precision, etc.) | Scores
ChatGPT o1-preview | Five scientific papers, all of them relevant and important, with summaries and a bibliography present. | 5
ChatGPT4.o | Five papers, bibliography present, missing the most important examples. | 4
ChatGPT4.0 | Four papers, no bibliography, missing the most important examples. | 3
Academic Assistant Pro | Two papers, most important examples, good summaries of the most important findings. | 3
Scholar GPT | Four papers, went off-topic, missing the most important examples, no bibliography. | 2
Consensus | Most accurate and precise: five papers, indeed the most important ones. | 5
Gemini Advanced | Three papers, one most important example, two news articles; bibliography present. | 1
Copilot (www.bing.com) | Three papers, one most important example, one on heat waves (not humid), one without access (even though the paper was available for free). | 2
Perplexity AI | Three papers, two most important examples, bibliography present. | 2
DeepSeek | Five papers, one most important example, bibliography entry after each description, most from www.nature.com. | 3
SciSpace | Five papers from 2024, two of them important (not the most important). | 2
ChatGPT o3-mini-high | Five papers, four of them important, one not about heat waves but about humidity. | 4
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
