Communication

Automated MRI Video Analysis for Pediatric Neuro-Oncology: An Experimental Approach

by Artur Fabijan 1,*,†, Agnieszka Zawadzka-Fabijan 2,†, Robert Fabijan 3, Krzysztof Zakrzewski 1, Emilia Nowosławska 1, Róża Kosińska 1 and Bartosz Polis 1
1 Department of Neurosurgery, Polish Mother’s Memorial Hospital Research Institute, 93-338 Lodz, Poland
2 Department of Rehabilitation Medicine, Faculty of Health Sciences, Medical University of Lodz, 90-419 Lodz, Poland
3 Independent Researcher, Luton LU2 0GS, UK
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2024, 14(18), 8323; https://doi.org/10.3390/app14188323
Submission received: 5 August 2024 / Revised: 13 September 2024 / Accepted: 14 September 2024 / Published: 15 September 2024

Featured Application

This study explores the potential of two popular open-source AI models, ChatGPT 4o (omni) and Gemini Pro, to analyze MRI video sequences depicting a pediatric brain tumor. We aimed to evaluate whether these AI models can accurately identify and analyze the content of MRI videos showing a medulloblastoma in sagittal and coronal planes. Our findings revealed that while Gemini Pro correctly identified the video as an MRI, it did not attempt a detailed analysis, deferring to medical specialists. Conversely, ChatGPT 4o performed some image analysis but failed to recognize the video content as MRI. Both models struggled with tumor identification, suggesting that further improvements and specialized training are needed for these AI models to effectively support medical diagnostics.

Abstract

Over the past year, there has been a significant rise in interest in the application of open-source artificial intelligence models (OSAIM) in the field of medicine. An increasing number of studies focus on evaluating the capabilities of these models in image analysis, including magnetic resonance imaging (MRI). This study aimed to investigate whether two of the most popular open-source AI models, namely ChatGPT 4o and Gemini Pro, can analyze MRI video sequences with single-phase contrast in sagittal and frontal projections, depicting a posterior fossa tumor corresponding to a medulloblastoma in a child. The study utilized video files from single-phase contrast-enhanced head MRI in two planes (frontal and sagittal) of a child diagnosed with a posterior fossa tumor, type medulloblastoma, confirmed by histopathological examination. Each model was separately provided with the video file, first in the sagittal plane, analyzing three different sets of commands from the most general to the most specific. The same procedure was applied to the video file in the frontal plane. The Gemini Pro model did not conduct a detailed analysis of the pathological change but correctly identified the content of the video file, indicating it was a brain MRI, and suggested that a specialist in the field should perform the evaluation. Conversely, ChatGPT 4o conducted image analysis but failed to recognize that the content was MRI. The attempts to detect the lesion were random and varied depending on the plane. These models could not accurately identify the video content or indicate the area of the neoplastic change, even after applying detailed queries. The results suggest that despite their widespread use in various fields, these models require further improvements and specialized training to effectively support medical diagnostics.

1. Introduction

Over the past year, there has been growing interest in using open-source artificial intelligence models (OSAIM) in medicine [1]. Increasingly, research is focusing on evaluating these models’ capabilities in image analysis, including magnetic resonance imaging (MRI) [2].
Following the release of the fourth-generation GPT model by OpenAI, there has been a significant increase in interest within the scientific sector. Research mainly focuses on the potential of ChatGPT 4 in solving medical problems. This model demonstrates the ability to analyze medical tests [3] and support education for students [4] and patients [5], and its applications as a diagnostic and therapeutic tool are widely studied [6,7].
Recently, AI models have begun to be developed for analyzing video content, not just images. An example is ChatGPT 4 and the competing Google model, Gemini Pro. Conversational agents using large language models (LLM) offer a new way of interacting with visual data, extending their capabilities to video analysis. One such model is Video-ChatGPT, which combines a video-adapted visual encoder with an LLM [8].
Integrating artificial intelligence (AI) into brain MRI analysis has significantly enhanced diagnostic accuracy and improved patient outcomes. Deep learning algorithms, trained on large MRI datasets, are now capable of precisely identifying and localizing brain tumors [9,10]. AI tools, such as SubtleMR™, further enhance MRI image resolution, enabling the detection of changes as small as 4 mm [11]. Additionally, self-supervised text and vision models facilitate the detection of abnormalities in previously unreported MRI scans, automating tasks such as selection, diagnosis, and assessment of treatment responses. This automation supports radiologists by reducing errors and improving reporting efficiency [12]. Importantly, not all MRI sequences require imaging in the frontal, sagittal, and transverse planes. For example, Chen et al. demonstrated that automating the determination of MRI scanning planes, particularly the mid-sagittal plane (MSP), can expedite diagnosis. Their proposed algorithm determines the MSP by analyzing the lines separating the brain hemispheres in frontal and transverse images. Tested on 100 images, the algorithm’s results aligned closely with those obtained manually by MRI operators [13].
In addition to specialized AI models for brain MRI analysis, commercial models like ChatGPT are also being studied. Kozel et al. [14] evaluated ChatGPT 3.5 and 4 for their ability to accurately diagnose, propose treatment options, and create treatment plans for brain tumors in neuro-oncology cases. Their findings showed that ChatGPT 4 outperformed ChatGPT 3.5 in both diagnosis and treatment planning, demonstrating the potential of commercial AI models in medical applications.
Brain tumors in children represent a significant global health concern, with an estimated incidence of 3 per 100,000 children annually [15]. These tumors are the most common solid tumors in children, with certain types, such as medulloblastoma, germ cell tumors, and ependymoma, occurring slightly more frequently in boys. The majority of pediatric brain tumors are infratentorial, affecting the cerebellum, brainstem, and fourth ventricle across all pediatric age groups [16]. MRI is a highly effective imaging method for detecting and diagnosing brain tumors in children, characterized by high sensitivity and specificity. Conventional brain MRI includes T1-weighted sequences before and after contrast administration, T2-weighted, FLAIR (fluid-attenuated inversion recovery), and diffusion sequences. High-resolution 3D imaging has become standard in T1-weighted imaging, which can be reformatted into various planes. Advanced techniques such as compressed sensing and parallel imaging allow for obtaining high-resolution images with thin slices within a reasonable scan time [17].
The study aimed to investigate whether two of the most popular open-source artificial intelligence models (OSAIMs), ChatGPT 4o (omni) and Gemini Pro, could analyze a video file containing a single-phase contrast-enhanced MRI image in sagittal and frontal projections, depicting a posterior fossa tumor corresponding to medulloblastoma in a child. While most existing research focuses on AI models for static image analysis in MRI, particularly in tumor detection [2,9,14], this study explores the application of these models in dynamic video-based MRI analysis, addressing an area that remains underexplored.
Unlike previous studies that focus on AI’s role in static image interpretation or diagnostic tasks in neuro-oncology [14], this research emphasizes the challenges and opportunities in applying AI models to sequential video data. This shift to video-based analysis introduces additional complexities such as motion tracking and identifying temporal patterns, areas where AI applications are still evolving.
Furthermore, by evaluating the performance of ChatGPT 4o and Gemini Pro across different MRI planes (sagittal and frontal), this study provides a comparative framework that highlights the need for more specialized AI training for distinct imaging modalities. While previous work has shown promising results in static radiology, this research pushes the boundaries by assessing AI’s ability to handle dynamic MRI sequences, contributing new insights into AI’s role in complex diagnostic tasks in pediatric neuro-oncology.
To this end, we separately implemented the video in sagittal and frontal sequences using three different sets of questions (prompts), from the most general to the most specific. We posed the following hypotheses:
Hypothesis 1.
Both models can recognize the content of the video.
Hypothesis 2.
Both models can correctly identify the area of the neoplastic change.

2. Materials and Methods

The study was conducted as part of the scientific activities of the Polish Mother’s Memorial Hospital Research Institute. The study utilized video files generated from a single-phase contrast-enhanced head MRI in two planes (frontal and sagittal) of a child diagnosed with a posterior fossa tumor, specifically a medulloblastoma, which was confirmed by histopathological examination. The video images were obtained from a 9-year-old child and were dated 24 April 2024. The bioethics committee determined that the study did not require ethical approval, as it did not affect the therapeutic and diagnostic procedures for the patient. All recordings used in the study were anonymized, and the guardians provided written consent for the use of the MRI recordings for research purposes.
Inclusion Criteria: MRI images of a child’s head, enhanced with single-phase contrast, showing a brain tumor located in the posterior cranial fossa. The images must consist of at least 200 slices to produce a video file approximately 9–10 s long. Only images free from motion, technical issues, or other artifacts that could hinder interpretation are included. The images must be of sufficient quality to clearly visualize the tumor and surrounding brain structures. Additionally, the images should not show any other central nervous system (CNS) pathologies.
Exclusion Criteria: MRI images other than those of the brain; images showing tumors located outside the posterior cranial fossa; MRI images showing additional CNS pathologies, such as vascular malformations, strokes, or demyelinating changes; low-quality MRI images that make interpretation difficult; images with too few slices to generate a video of adequate length; and MRI images with motion, technical, or other artifacts that could impede interpretation.

2.1. Human Evaluation

The MRI structures were independently identified and marked by two neurosurgery specialists (B.P., E.N.). The image series meeting the technical requirements was selected. The recordings were generated using RadiAnt software (version 2023.1).

2.2. Preparation of Radiological Material for Analysis

The study utilized radiological material in the form of an MRI recording with dimensions of 384 × 384 pixels. The image quality was set at 90, with a frame rate of 20 frames per second and a duration of 9 s. The data rate and total bit rate were 5787 kbps. The file format was AVI.
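As a consistency check, the stated acquisition parameters can be cross-checked against each other: at 20 frames per second, a 9 s clip corresponds to 180 frames, matching the frame count reported for the sagittal plane in Section 3.4.2. A minimal sketch (illustrative only; this is not the RadiAnt export pipeline):

```python
# Sanity check of the stated video parameters, based solely on the
# figures reported in this section.
WIDTH, HEIGHT = 384, 384   # pixel dimensions of the MRI recording
FPS = 20                   # frames per second
DURATION_S = 9             # clip length in seconds
BITRATE_KBPS = 5787        # total bit rate

total_frames = FPS * DURATION_S                        # expected frame count
approx_size_mb = BITRATE_KBPS * DURATION_S / 8 / 1000  # rough AVI size

print(total_frames)              # 180, consistent with Section 3.4.2
print(round(approx_size_mb, 2))  # ~6.51 MB
```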

2.3. AI System Evaluation Methodology

Two AI models, ChatGPT 4o and Gemini Pro, were used for the study. These models were chosen because they are the most widely studied models in medical research and offer the most powerful computational capabilities among OSAIMs. Each model was separately provided with the video file, first in the sagittal plane, and three different sets of commands, from the most general to the most specific, were tested. The same procedure was applied to the video file in the frontal plane.

2.3.1. Aim and Method of Question Gradation

  • Preparation of AI Models: The strategy of question gradation allowed for the assessment of the AI models’ ability to recognize and analyze medical recordings at different levels of detail. The questions began with very general ones to check if the model could understand and identify the content of the video material. The questions then became more specific to examine the models’ analytical capabilities in more detail;
  • Gradation of Difficulty: Starting with general questions, researchers could first establish if the model had a basic ability to interpret the image. Subsequently, more detailed questions allowed for a more thorough examination of how well the model could identify specific pathological features. This approach minimizes the risk of overlooking significant errors in the models’ performance at an early stage.

2.3.2. Set of Questions 1: General Questions

  • Could you analyze this video?
  • Do I need a more detailed scientific analysis of what you observe in the attached video?
  • Are you able to recognize what this video contains?
Rationale: These questions are very general and aim to determine whether the model can interpret the video in a basic way. Such questions test the models’ ability to process and generate initial analyses without specific instructions. These are preliminary questions intended to attain a general impression of the AI models’ capabilities.

2.3.3. Set of Questions 2: More Specific Questions

  • Are you able to recognize what this video contains?
  • What does the attached video show?
Rationale: These questions, though still quite general, begin to focus the models’ attention on more specific elements of the video content. They test the models’ ability to recognize more concrete information, such as identifying structures and potential pathologies, without yet providing detailed information on what is expected of the models.

2.3.4. Set of Questions 3: Detailed Questions

  • I am uploading a video file of a brain MRI with contrast showing a tumor. Are you able to analyze this video and identify the pathological change?
Rationale: This question is the most detailed and targeted. It informs the AI model of the specific context of the image (brain MRI with contrast showing a tumor) and requires pathological analysis. Such a question tests the AI models’ ability to not only recognize general image features but also identify specific pathological changes.
Gradating the questions from general to specific is crucial in assessing AI models’ abilities to interpret medical images at different levels of detail (Figure 1). The initial sets of questions allow for a general assessment of the models’ capacity to process and interpret images, while the subsequent sets of questions enable a more thorough analysis and evaluation of the model’s ability to identify specific pathological features. This approach is systematic and allows for a comprehensive evaluation of the AI models’ performance in a medical context.
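For reproducibility, the three question sets can be represented as a single ordered structure. The sketch below encodes the prompts listed above verbatim; the helper `gradated_prompts` is hypothetical and was not part of the study's tooling:

```python
# The three question sets used in the study, ordered from most general
# to most specific (wording taken verbatim from Sections 2.3.2-2.3.4).
PROMPT_SETS = {
    1: [  # general questions
        "Could you analyze this video?",
        "Do I need a more detailed scientific analysis of what you observe in the attached video?",
        "Are you able to recognize what this video contains?",
    ],
    2: [  # more specific questions
        "Are you able to recognize what this video contains?",
        "What does the attached video show?",
    ],
    3: [  # detailed question
        "I am uploading a video file of a brain MRI with contrast showing a tumor. "
        "Are you able to analyze this video and identify the pathological change?",
    ],
}

def gradated_prompts():
    """Yield (set_number, prompt) pairs in order of increasing specificity."""
    for set_no in sorted(PROMPT_SETS):
        for prompt in PROMPT_SETS[set_no]:
            yield set_no, prompt

print(sum(1 for _ in gradated_prompts()))  # 6 prompts in total
```

Iterating in this order mirrors the gradation strategy: each plane's video is submitted with set 1 first, and only then with the more specific sets.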

3. Results

Figure 2 shows a single-phase contrast-enhanced MRI scan of the head in the frontal and sagittal planes, revealing a medulloblastoma located in the posterior cranial fossa.

3.1. Results of the AI Model Analysis

3.1.1. Gemini Pro

Despite using three sets of questions, Gemini Pro did not attempt to analyze the MRI, regardless of the plane or the question set applied (Figure 3). However, despite not evaluating the video material, it recognized that the content was an MRI study of the brain.

3.1.2. ChatGPT 4o

Table 1 presents a detailed summary of the differences and similarities in AI model responses to the first set of questions introduced to ChatGPT 4o. This comparison highlights the general characteristics of the videos, the analysis methods applied, the specificity of the analysis, and the identified elements within each plane. Additionally, it outlines the advanced techniques used, the initial results of the analyses, and the proposed next steps for further processing. This table serves as a key reference for understanding how ChatGPT 4o differentiates its approach based on the frontal and sagittal planes in medical video processing.

3.2. Summary and Tabulation of Differences and Similarities in AI Models’ Responses to the First Set of Questions

Figure 4 and Figure 5 illustrate the analyses performed by ChatGPT 4o on the frontal and sagittal planes, respectively, in response to the first set of questions. Figure 4 shows the results of motion detection and contour analysis for the frontal plane, highlighting regions of significant change between frames. Figure 5 presents the segmentation and annotation detection for the sagittal plane, with green contours marking areas of interest and red lines indicating detected annotations, such as line segments.

3.2.1. Similarities

  • Initial Analysis Methodology
    • In both planes, the analysis began with the extraction of key frames from the video;
    • In both cases, image analysis methods such as contrast, edge detection, and motion analysis were suggested;
  • Advanced Techniques
    • In both planes, the use of object detection algorithms was suggested, although technical limitations in accessing the necessary libraries and model files were encountered;
  • Analysis Steps
    • In both planes, motion analysis and edge detection were conducted to identify regions of interest.

3.2.2. Differences

  • Detail of Initial Characterization
    • Frontal Plane: Detailed video properties were provided (FPS, number of frames, duration, and dimensions);
    • Sagittal Plane: Detailed video properties were not provided;
  • Specificity of Analysis
    • Frontal Plane: A wide range of analysis methods was suggested (frame-by-frame analysis, object detection, motion analysis, and image processing techniques);
    • Sagittal Plane: The focus was on image content analysis, brightness and contrast, object detection, and temporal analysis;
  • Results of Initial Analysis
    • Frontal Plane: The results indicated regions of motion and contour analysis;
    • Sagittal Plane: The results included brightness and contrast analysis and edge detection;
  • Recognition of Video Content:
    • Frontal Plane: The model could not clearly recognize the medical context, suggesting motion and contour analysis;
    • Sagittal Plane: The model suggested a medical imaging context and tumor detection based on context and image analysis.
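The motion-analysis and edge-detection steps that ChatGPT 4o reported in both planes can be illustrated with a minimal NumPy sketch. The model's actual internal tooling is not disclosed, so the functions below (`motion_map`, `sobel_edges`) are illustrative assumptions only:

```python
import numpy as np

def motion_map(prev_frame: np.ndarray, next_frame: np.ndarray,
               thresh: float = 25.0) -> np.ndarray:
    """Binary mask of pixels whose intensity changed between two frames."""
    diff = np.abs(next_frame.astype(float) - prev_frame.astype(float))
    return diff > thresh

def sobel_edges(frame: np.ndarray) -> np.ndarray:
    """Gradient magnitude via 3x3 Sobel kernels (a simple edge detector)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    f = frame.astype(float)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    for i in range(1, f.shape[0] - 1):
        for j in range(1, f.shape[1] - 1):
            patch = f[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return np.hypot(gx, gy)

# Two tiny synthetic "frames": a bright square that shifts by one pixel.
a = np.zeros((16, 16)); a[4:8, 4:8] = 255
b = np.zeros((16, 16)); b[5:9, 5:9] = 255
moved = motion_map(a, b)    # True where intensity changed between frames
edges = sobel_edges(a)      # high magnitude along the square's boundary
print(moved.any(), edges.max() > 0)  # True True
```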

3.3. Summary and Tabulation of Differences and Similarities in AI Models’ Responses to the Second Set of Questions

The results from the second set of questions highlight the differences and similarities in how ChatGPT 4o processes video data in the frontal and sagittal planes. Table 2 provides a comparative overview of video content recognition, frame extraction, and frame-by-frame analysis between these two planes, showcasing the distinct approaches and their impact on the interpretation of video content. This comparison emphasizes the variance in detail and the next steps proposed for further analysis in each plane.

3.3.1. Similarities

  • Analysis Methodology
    • In both planes, frame extraction is suggested as the fundamental step for analyzing video content;
  • Basic Video Data
    • In both planes, detailed information on the number of frames, frames per second (FPS), and video duration is provided;
  • Readiness for Further Analysis
    • In both planes, the model is prepared for further analysis based on additional details or specific questions about the video content.

3.3.2. Differences

  • Direct Video Analysis
    • In the frontal plane, the model indicates an inability to directly play or analyze the video, suggesting the need for additional information or specific queries. In the sagittal plane, the model undertakes direct analysis by extracting frames and providing key information about the video;
  • Frame Extraction
    • In the frontal plane, only the first frame is extracted without details about its content. In the sagittal plane, 10 frames are extracted at equal intervals, with a description of each frame’s content;
  • Frame Content Description
    • The frontal plane lacks a detailed description of the first frame, while the sagittal plane provides detailed descriptions of each of the 10 extracted frames, including scene context, perspective, motion, and interactions.
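The sagittal-plane behavior of extracting 10 frames at equal intervals can be sketched as follows. The helper `equally_spaced_indices` is an illustrative reconstruction, not the model's actual code, and assumes the 180-frame clip described in Section 3.4.2:

```python
# Selecting N frames at equal intervals from a clip of known length,
# as in the sagittal-plane analysis (10 frames from a 180-frame video).
def equally_spaced_indices(total_frames: int, n: int) -> list:
    """Return n frame indices spread evenly across [0, total_frames - 1]."""
    if n == 1:
        return [0]
    step = (total_frames - 1) / (n - 1)
    return [round(i * step) for i in range(n)]

print(equally_spaced_indices(180, 10))
# [0, 20, 40, 60, 80, 99, 119, 139, 159, 179]
```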

3.4. Summary and Tabulation of Differences and Similarities in AI Models’ Responses to the Third Set of Questions

The third set of questions focuses on ChatGPT 4o’s ability to identify potential tumor regions in video data from the frontal and sagittal planes. Table 3 presents a comparison of video content recognition, frame extraction, analysis methods, and identified elements, with an emphasis on tumor detection. The table highlights differences in frame extraction and the specific contours used to mark potential tumor areas, providing insights into the distinct approaches applied in each plane for detecting pathological changes.
Figure 6 and Figure 7 illustrate ChatGPT 4o’s analysis of tumor detection in the frontal and sagittal planes, respectively. Figure 6 shows the detection of two potential tumor regions in the frontal plane, marked with green contours. Figure 7 highlights areas of interest in the sagittal plane across three selected frames, with red rectangles indicating regions of potential abnormal enhancement, suggesting the presence of a tumor.

3.4.1. Similarities

  • Analysis Methodology
    • In both planes, the analysis began with frame extraction, and further frame analysis was suggested to identify pathological changes, with an emphasis on identifying tumor regions;
  • Identified Elements
    • In both planes, potential tumor areas are identified and marked in the video frames;
  • Next Steps:
    • In both planes, the model suggests further frame analysis or focusing on more detailed aspects of the analysis, including the need for additional clinical or radiological information.

3.4.2. Differences

  • Extent of Frame Extraction
    • In the frontal plane, only the first 10 frames are extracted, potentially limiting the full video sequence analysis. In the sagittal plane, all 180 frames are extracted, allowing for a more comprehensive video sequence analysis;
  • Marking Methodology
    • In the frontal plane, potential tumor regions are marked with green contours. In the sagittal plane, potential tumor regions are marked with red rectangles;
  • Detail of Analysis
    • The frontal plane suggests further frame analysis without providing details on the needed clinical information, while the sagittal plane clearly suggests the need for additional clinical or radiological information for a more precise analysis.
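The two marking styles reported above (green contours in the frontal plane, red rectangles in the sagittal plane) can be illustrated with a minimal NumPy sketch that draws colored outlines on an RGB frame. The coordinates and the helper `draw_rect_outline` are hypothetical:

```python
import numpy as np

def draw_rect_outline(img: np.ndarray, top: int, left: int,
                      bottom: int, right: int, color: tuple) -> None:
    """Draw a 1-pixel rectangle outline on an RGB image, in place."""
    img[top, left:right + 1] = color
    img[bottom, left:right + 1] = color
    img[top:bottom + 1, left] = color
    img[top:bottom + 1, right] = color

frame = np.zeros((64, 64, 3), dtype=np.uint8)          # stand-in MRI frame
draw_rect_outline(frame, 10, 10, 30, 30, (255, 0, 0))  # red rectangle (sagittal style)
draw_rect_outline(frame, 40, 40, 55, 55, (0, 255, 0))  # green outline (frontal style)

print(int(frame[10, 10, 0]), int(frame[40, 40, 1]))  # 255 255
```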

4. Discussion

Our study only partially confirmed the hypotheses regarding the capabilities of AI models like ChatGPT 4o and Gemini Pro in recognizing MRI video content and correctly identifying the region of the neoplastic change. We conducted an analysis using three sets of questions with varying degrees of specificity, from very general to more detailed. The question sets aimed to assess whether the models could interpret and analyze MRI brain videos depicting a posterior fossa tumor. The first set of questions included general inquiries about the video content, the second contained more specific questions, and the third focused on detailed pathological analysis.
The analysis of the models’ responses showed that, with general question sets, only the Gemini Pro model could correctly recognize the video content, unlike the ChatGPT 4o model. For more specific questions, neither model provided correct answers, suggesting that the models lacked sufficient ability to identify detailed pathological features in the examined material. The most challenging task was the most specific question, which indicated that the video contained a brain MRI with contrast and a neoplastic change. The ChatGPT 4o model could not accurately identify the region of the neoplastic change. The responses were random and unrelated to the actual video content. Gemini Pro did not attempt any analysis.

4.1. Comprehensive Summary

The study evaluated AI models, ChatGPT 4o and Gemini Pro, in analyzing single-phase contrast-enhanced MRI videos in frontal and sagittal planes of a child’s head, focusing on identifying a medulloblastoma tumor. In the frontal plane, the first set of questions revealed that the model provided detailed video properties and suggested various analysis methods, identifying regions of motion and contours. The second set indicated that the model recognized the need for additional information for detailed analysis. The third set demonstrated the model’s capability to identify potential tumor regions in the initial frames. In the sagittal plane, the first set showed the model’s ability to provide detailed video properties and suggest analysis methods like frame extraction and contrast analysis. The second set detailed the model’s extraction of key frames and description of each. The third set highlighted the model’s identification of potential tumor regions, emphasizing the need for additional clinical information.
In both planes, ChatGPT 4o demonstrated the ability to analyze videos by extracting key frames and employing techniques such as object detection, motion analysis, and edge detection. It was able to proceed with further analysis when provided with additional details or specific queries about the video content. However, the potential tumor areas it marked in the video frames did not correspond to the actual lesion, indicating challenges in detecting pathological regions.
Differences emerged in frame extraction: in the frontal plane, fewer frames were extracted, limiting the scope of sequence analysis, while in the sagittal plane, more frames were extracted, enabling a more comprehensive review. The marking methods also varied, with potential tumor regions highlighted by green contours in the frontal plane and red rectangles in the sagittal plane. Moreover, the frontal-plane analysis was more general, with less detailed clinical input, whereas the sagittal-plane analysis included more specific suggestions for additional clinical information.
The integration of AI technologies, particularly ChatGPT, into medical imaging has shown significant promise, as evidenced by several recent studies. For instance, Sultan et al. highlighted ChatGPT-4’s capabilities in ultrasound imaging, emphasizing its potential to reduce operator dependency and improve diagnostic accuracy in fields where expertise is limited. However, while ChatGPT-4 demonstrated strengths in ultrasound segmentation tasks, our findings underscore its limitations in handling more complex MRI video sequences, particularly in identifying specific pathological regions. This comparison suggests the need for specialized training and model refinement for different imaging modalities [18]. Similarly, Lee et al. examined ChatGPT’s performance in answering questions about MRI physics, revealing its difficulties in managing context-dependent and clinically complex tasks. Both our study and theirs indicate ChatGPT’s proficiency in basic radiological concepts but highlight its challenges with advanced image interpretation, particularly in dynamic video content [19]. Moreover, Arachchige et al. explored ChatGPT’s role in diagnostic imaging, reporting similar limitations such as the fabrication of findings and the necessity for human oversight. This aligns with our observation of ChatGPT 4o’s struggle to accurately identify neoplastic changes in MRI videos, further emphasizing the need for improved AI models in clinical tasks [20]. Lastly, Rawas et al. (2024) investigated a CNN and transformer-based approach that incorporated ChatGPT for better feature interpretation in MRI tumor detection. This study, similar to ours, highlighted the potential of integrating NLP with deep learning to enhance medical imaging analysis [21]. While both studies underscore the promise of AI tools, they also point to the critical need for continued refinement and development to meet the demands of complex medical data and ensure accurate diagnostics in clinical practice.
In summary, while both AI models showed promising capabilities in video analysis and image processing, they require significant improvements and specialized training to achieve accurate medical diagnostics. The comparison of their performance across different question sets and planes underscores their current limitations and highlights areas for further development in medical imaging applications.

4.2. Potential Reasons for Failure

  • Lack of Medical Specialization in the Models: ChatGPT 4o and Gemini Pro are general large language models (LLMs) not specifically designed or trained to analyze medical video images. Although they are proficient in processing textual information and demonstrate some medical knowledge [22], their lack of specialization in medical image analysis results in low accuracy and inconsistency in interpreting MRI data. It is recommended that specialized medical LLMs, trained on authoritative, human-validated medical databases, be used to provide greater accuracy and completeness in medical fields [23]. Evaluations of ChatGPT 4o’s performance on medical licensing exams show high proficiency in handling textual and visual questions, meeting passing criteria but demonstrating limitations in clinical assessment and prioritization [24]. Additionally, a study comparing LLMs with healthcare-specific NLP tools found that ChatGPT 4 performed similarly in some tasks but less accurately in others, highlighting the need for task-specific evaluation before implementing LLMs in medical contexts [25];
  • Technical Limitations in Video Processing: The technical limitations in video processing of ChatGPT 4o and Gemini Pro arise primarily from their focus on text and static image analysis and lack the advanced visual analysis algorithms necessary for complex MRI sequences [26,27]. Although ChatGPT excels in natural language processing tasks, it encounters challenges in tasks such as summarization and commonsense reasoning [27]. Additionally, the explainability of results in language models like ChatGPT poses significant challenges, hindering their use in sensitive applications [28]. Integrating video analysis with text-based language analysis remains a technological hurdle that has not been effectively addressed, underscoring the need for further development and optimization of AI models like ChatGPT [27]. Overcoming these limitations is crucial for enhancing the models’ capabilities for more comprehensive and integrated data analysis across different modalities;
  • Limited Ability to Understand Medical Context: Although these models have some medical knowledge, their ability to understand specific clinical contexts is limited. The inability to correctly identify the tumor in the video material may stem from their incapacity to apply medical context in analyzing MRI images. Research articles provide insights into the performance of AI models like ChatGPT in medical contexts. While ChatGPT 4o demonstrated high proficiency in handling text- and image-based questions on the Japanese Medical Licensing Examination (JMLE) [20], it is noted that the model had difficulties with clinical assessment and prioritization, indicating limitations in applying medical context in certain task scenarios. Similarly, a study evaluating ChatGPT’s ability to diagnose keratinocyte tumors found that although ChatGPT 4 improved diagnostic accuracy compared to ChatGPT 3.5, it still had limitations in specific tumor identification [29]. Additionally, a study comparing ChatGPT with Google in diagnosing rare rheumatologic diseases highlighted ChatGPT’s comparable diagnostic effectiveness but with significantly reduced query execution time, suggesting its practicality in clinical settings [30]. These findings collectively suggest that while AI models like ChatGPT possess medical knowledge, their ability to understand specific clinical contexts, particularly in tasks such as identifying tumors in MRI images, may still be limited, necessitating further improvements to enhance accuracy and reliability [31,32];
  • Issues with Access to Sufficient Training Data: To analyze medical images effectively, AI models must be trained on large, diverse medical datasets. It is possible that the data used to train ChatGPT 4o and Gemini Pro did not include enough cases of contrast-enhanced brain MRI, leaving them unable to analyze such material correctly;
  • Mismatch with Diagnostic Tasks: ChatGPT 4o and Gemini Pro were not originally designed for diagnostic tasks. Gemini Pro, in particular, may have built-in restrictions on medical analysis, which could explain its refusal to conduct the analysis and its responses suggesting a lack of diagnostic capability.
These results suggest that the current versions of AI models such as ChatGPT 4o and Gemini Pro have limited capability to analyze MRI video content and cannot meet the demands of advanced diagnostic questions. Their inability to correctly interpret and localize pathological changes indicates that further research and model improvements are needed before they can be used effectively in medical practice.
The conducted study had several significant limitations that could have impacted the results and data interpretation:
  • Lack of Specialized Training for Models: ChatGPT 4o and Gemini Pro are general language models not specifically trained for analyzing medical video images. Their application in a medical context is thus limited, which significantly affects their ability to recognize MRI video content and identify pathological changes;
  • Limited Number of Cases: The study was based on analyzing only one MRI video case of a child with a brain tumor. A larger number of diverse cases could provide more representative data and better assess the models’ ability to analyze different types of neoplastic changes;
  • Specificity of the Selected Material: The research material came from a single patient, which may limit the generalizability of the results. Brain tumors can vary depending on many factors, such as patient age, tumor location, or histopathological type, which could affect the AI models’ analysis results;
  • Lack of Comparison with Other Tools: The study did not include a comparison with other specialized AI tools designed for medical image analysis. Such a comparison could provide valuable insights into the relative effectiveness of ChatGPT 4o and Gemini Pro compared to tools specifically created for MRI image analysis;
  • Technical Limitations: The AI models may have technical limitations in video processing which, combined with the material’s limited frame rate and image quality, could have affected their ability to analyze it correctly.

5. Conclusions

The study demonstrated that the current versions of general-purpose AI models such as ChatGPT 4o and Gemini Pro have limited capability to analyze head MRI video depicting a posterior fossa tumor. Neither model could correctly recognize the video content or indicate the neoplastic region, even when asked detailed questions. The results suggest that despite their widespread use in various fields, these models require further improvements and specialized training before they can effectively support medical diagnostics. Study limitations, such as the small number of cases analyzed and the lack of specialized training for the models, highlight the need for further research using larger datasets and comparisons with dedicated medical image analysis tools.

Author Contributions

Conceptualization, A.F., R.F. and A.Z.-F.; methodology, A.F., R.F., A.Z.-F. and B.P.; investigation, A.F., B.P., A.Z.-F. and R.F.; data curation, A.F., R.K. and B.P.; writing—original draft preparation, A.F., A.Z.-F., B.P., R.K. and R.F.; writing—review and editing, A.F., A.Z.-F., R.F., B.P., R.K., E.N. and K.Z.; supervision, B.P., K.Z. and E.N.; funding acquisition, A.F., E.N., K.Z. and B.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

Many thanks are extended to Agnieszka Strzała for linguistic proofreading and to Robert Fabijan for the substantive support in the field of AI and for assistance in designing the presented study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Uppalapati, V.K.; Nag, D.S. A Comparative Analysis of AI Models in Complex Medical Decision-Making Scenarios: Evaluating ChatGPT, Claude AI, Bard, and Perplexity. Cureus 2024, 16, e52485.
  2. Waisberg, E.; Ong, J.; Masalkhi, M.; Zaman, N.; Sarker, P.; Lee, A.G.; Tavakkoli, A. GPT-4 and medical image analysis: Strengths, weaknesses and future directions. J. Med. Artif. Intell. 2023, 6, 29.
  3. Zong, H.; Li, J.; Wu, E.; Wu, R.; Lu, J.; Shen, B. Performance of ChatGPT on Chinese national medical licensing examinations: A five-year examination evaluation study for physicians, pharmacists and nurses. BMC Med. Educ. 2024, 24, 143.
  4. Saravia-Rojas, M.Á.; Camarena-Fonseca, A.R.; León-Manco, R.; Geng-Vivanco, R. Artificial intelligence: ChatGPT as a disruptive didactic strategy in dental education. J. Dent. Educ. 2024, 88, 872–876.
  5. Pradhan, F.; Fiedler, A.; Samson, K.; Olivera-Martinez, M.; Manatsathit, W.; Peeraphatdit, T. Artificial intelligence compared with human-derived patient educational materials on cirrhosis. Hepatol. Commun. 2024, 8, e0367.
  6. Masalkhi, M.; Ong, J.; Waisberg, E.; Lee, A.G. Google DeepMind’s Gemini AI versus ChatGPT: A comparative analysis in ophthalmology. Eye 2024, 38, 1412–1417.
  7. Maniaci, A.; Fakhry, N.; Chiesa-Estomba, C.; Lechien, J.R.; Lavalle, S. Synergizing ChatGPT and general AI for enhanced medical diagnostic processes in head and neck imaging. Eur. Arch. Otorhinolaryngol. 2024, 281, 3297–3298.
  8. Maaz, M.; Rasheed, H.; Khan, S.; Khan, F.S. Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models. arXiv 2024, arXiv:2306.05424.
  9. Reddy Kalli, V.D. Creating an AI-powered platform for neurosurgery alongside a usability examination: Progressing towards minimally invasive robotics. J. Artif. Intell. Gen. Sci. (JAIGS) 2024, 3, 256–268.
  10. Dip, S.S.; Rahman, M.H.; Islam, N.; Arafat, M.E.; Bhowmick, P.K.; Yousuf, M.A. Enhancing Brain Tumor Classification in MRI: Leveraging Deep Convolutional Neural Networks for Improved Accuracy. Int. J. Inf. Technol. Comput. Sci. 2024, 16, 12–21.
  11. Lemaire, R.; Raboutet, C.; Leleu, T.; Jaudet, C.; Dessoude, L.; Missohou, F.; Poirier, Y.; Deslandes, P.Y.; Lechervy, A.; Lacroix, J.; et al. Artificial intelligence solution to accelerate the acquisition of MRI images: Impact on the therapeutic care in oncology in radiology and radiotherapy departments. Cancer Radiother. 2024, 28, 251–257.
  12. Wood, D.; Guilhem, E.; Kafiabadi, S.; Al Busaidi, A.; Hammam, A.; Mansoor, N.; Townend, M.; Agarwal, S.; Wei, Y.; et al. Automated Brain Abnormality Detection using a Self-Supervised Text-Vision Framework. Authorea 2024, 2, 1.
  13. Chen, H.; Xu, Q.; Zhang, L.; Kiraly, A.P.; Novak, C.L. Automated definition of mid-sagittal planes for MRI brain scans. In Medical Imaging 2007: Image Processing; SPIE: Bellingham, WA, USA, 2007.
  14. Kozel, G.; Gurses, M.E.; Gecici, N.N.; Gökalp, E.; Bahadir, S.; Merenzon, M.A.; Shah, A.H.; Komotar, R.J.; Ivan, M.E. Chat-GPT on brain tumors: An examination of Artificial Intelligence/Machine Learning’s ability to provide diagnoses and treatment plans for example neuro-oncology cases. Clin. Neurol. Neurosurg. 2024, 239, 108238.
  15. Abbas, A.A.; Shitran, R.; Dagash, H.T.; Khalil, M.A.; Abdulrazzaq, R. Prevalence of Pediatric brain tumor in children from a tertiary neurosurgical center, during a period from 2010 to 2018 in Baghdad, Iraq. Ann. Trop. Med. Public Health 2021, 24, 315–321.
  16. Elgamal, E.A.; Mohamed, R.M. Pediatric Brain Tumors. In Clinical Child Neurology; Salih, M.A., Ed.; Springer: Cham, Switzerland, 2020; pp. 1033–1068.
  17. Jaju, A.; Yeom, K.W.; Ryan, M.E. MR Imaging of Pediatric Brain Tumors. Diagnostics 2022, 12, 961.
  18. Sultan, L.R.; Mohamed, M.K.; Andronikou, S. ChatGPT-4: A Breakthrough in Ultrasound Image Analysis. Radiol. Adv. 2024, 1, umae006.
  19. Lee, K.-H.; Lee, R.-W. ChatGPT’s Accuracy on Magnetic Resonance Imaging Basics: Characteristics and Limitations Depending on the Question Type. Diagnostics 2024, 14, 171.
  20. Perera Molligoda Arachchige, A.S. Empowering Radiology: The Transformative Role of ChatGPT. Clin. Radiol. 2023, 78, 851–855.
  21. Rawas, S.; Tafran, C.; AlSaeed, D. ChatGPT-powered Deep Learning: Elevating Brain Tumor Detection in MRI Scans. Appl. Comput. Inform. 2024, 1–13.
  22. Yan, Z.; Liu, J.; Shuang, L.; Xu, D.; Yang, Y.; Wang, H.; Mao, J.; Tseng, H.; Chang, T.; Chen, Y.; et al. Large Language Models (LLMs) vs. Specialist Doctors: A Comparative Study on Health Information in Specific Medical Domains. J. Med. Internet Res. 2024, preprint.
  23. Wang, S. Beyond ChatGPT: It Is Time to Focus More on Specialized Medical LLMs. J. Endourol. 2024, 1–9.
  24. Miyazaki, Y.; Hata, M.; Omori, H.; Hirashima, A.; Nakagawa, Y.; Etō, M.; Takahashi, S.; Ikeda, M. Performance and Errors of ChatGPT-4o on the Japanese Medical Licensing Examination: Solving All Questions Including Images with Over 90% Accuracy. JMIR Med. Educ. 2024, preprint.
  25. Rough, K.; Feng, H.; Milligan, P.B.; Tombini, F.; Kwon, T.; El Abidine, K.Z.; Mack, C.; Hughes, B. How well it works: Benchmarking performance of GPT models on medical natural language processing tasks. medRxiv 2024.
  26. Patil, P.; Kulkarni, K.; Sharma, P. Algorithmic Issues, Challenges, and Theoretical Concerns of ChatGPT. In Applications, Challenges, and the Future of ChatGPT; Sharma, P., Jyotiyana, M., Senthil Kumar, A.V., Eds.; IGI Global: Hershey, PA, USA, 2024; Chapter 3; pp. 56–74.
  27. Wu, Y. Evaluating ChatGPT: Strengths and Limitations in NLP Problem Solving. Highl. Sci. Eng. Technol. 2024, 94, 319–325.
  28. Arnold, T. Herausforderungen in der Forschung: Mangelnde Reproduzierbarkeit und Erklärbarkeit. In KI:Text: Diskurse über KI-Textgeneratoren; Schreiber, G., Ohly, L., Eds.; De Gruyter: Berlin, Germany; Boston, MA, USA, 2024; pp. 67–80.
  29. Yang, K.; Zeb, L.; Bae, S.; Pavlidakey, P.G. Diagnostic Accuracy of ChatGPT for Textbook Descriptions of Epidermal Tumors: An Exploratory Study. Am. J. Dermatopathol. 2024, 46, 632–634.
  30. Lasnier-Siron, J. POS0749 Respective Performances of ChatGPT and Google for the Diagnosis of Rare Diseases in Rheumatology. Ann. Rheum. Dis. 2024, 83, 1115–1116.
  31. Holland, A.; Lorenz, W.; Cavanaugh, J.; Ayuso, S.; Scarola, G.; Jorgensen, L.; Kercher, K.; Smart, N.; Fischer, J.; Janiset, J.; et al. ChatGPT, MD: A Pilot Study Utilizing Large Language Models to Write Medical Abstracts. Br. J. Surg. 2024, 111 (Suppl. S5), znae122-039.
  32. Li, K.C.; Bu, Z.J.; Shahjalal, M.; He, B.X.; Zhuang, Z.F.; Li, C.; Liu, J.P.; Wang, B.; Liu, Z.L. Performance of ChatGPT on Chinese Master’s Degree Entrance Examination in Clinical Medicine. PLoS ONE 2024, 19, e0301702.
Figure 1. Progression of the study questions from general to specific, with the rationale for each.
Figure 2. Sagittal plane MRI scan (A) and frontal plane MRI scan (B) show a multifocal cystic lesion with a solid component, mainly in the medial part, peripherally in the left cerebellar hemisphere. The lesion measures approximately 53 × 44 × 40 mm (SD × AP × CC) and adheres to the inner plate of the occipital bone, causing its thinning, and to the cerebellar tentorium. The solid part of the lesion shows fairly uniform contrast enhancement.
Figure 3. A fragment of Gemini Pro’s response regarding the analyzed MRI video material.
Figure 4. Analysis performed by ChatGPT 4o for the frontal plane. The results of the motion detection and contour analysis indicate the following: 1. Contours in frames (left side): The green contours highlight regions where significant changes or movements occurred between consecutive frames; these regions could correspond to moving objects, changing light conditions, or other dynamic elements in the video. 2. Thresholded difference (right side): The binary images show the areas where the differences between frames exceed a set threshold; white areas represent significant changes, while black areas indicate little to no change.
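The thresholded-difference step described in the caption above can be sketched in a few lines. This is an illustrative reconstruction, not the code ChatGPT 4o actually ran: plain NumPy stands in for the OpenCV calls (`cv2.absdiff`, `cv2.threshold`, `cv2.findContours`) reported in Table 1, and two synthetic frames stand in for the MRI video.

```python
import numpy as np

def motion_mask(prev_frame, frame, thresh=30):
    """Binary mask of pixels that changed significantly between two
    consecutive grayscale frames (the 'thresholded difference')."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return np.where(diff > thresh, 255, 0).astype(np.uint8)

def changed_bbox(mask):
    """Bounding box (y0, x0, y1, x1) of all changed pixels, or None.
    A contour finder such as cv2.findContours would outline each changed
    region separately; a single bounding box is the simplest stand-in."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())

# Two synthetic 1024 x 1024 frames: a bright square shifts 20 px to the
# right, emulating slice-to-slice change in an MRI cine loop.
prev_frame = np.zeros((1024, 1024), dtype=np.uint8)
frame = np.zeros((1024, 1024), dtype=np.uint8)
prev_frame[100:200, 100:200] = 255
frame[100:200, 120:220] = 255

mask = motion_mask(prev_frame, frame)
bbox = changed_bbox(mask)  # covers the two vertical strips that differ
```

Note that such a mask flags any intensity change, which is why this approach cannot by itself distinguish a tumor from ordinary frame-to-frame variation.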
Figure 5. Analysis performed by ChatGPT 4o for the sagittal plane. The frames with segmentation and annotations show the following: Segmentation: Green contours highlight potential areas of interest in the images. Annotation Detection: Red lines indicate detected line segments, which may correspond to annotations such as arrows.
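The contour segmentation shown in the caption above can be approximated with a standard edge detector. The sketch below is a dependency-free stand-in: a Sobel gradient magnitude implemented with NumPy slicing, where the actual analysis reportedly used OpenCV; the annotation (line-segment) detection, e.g., via a Hough transform, is omitted for brevity.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude of a grayscale image via 3x3 Sobel kernels,
    computed with array slicing instead of an OpenCV/SciPy convolution.
    Output is 2 pixels smaller in each dimension (no border padding)."""
    f = img.astype(np.float32)
    gx = (f[:-2, 2:] - f[:-2, :-2]
          + 2.0 * (f[1:-1, 2:] - f[1:-1, :-2])
          + f[2:, 2:] - f[2:, :-2])
    gy = (f[2:, :-2] - f[:-2, :-2]
          + 2.0 * (f[2:, 1:-1] - f[:-2, 1:-1])
          + f[2:, 2:] - f[:-2, 2:])
    return np.sqrt(gx ** 2 + gy ** 2)

def edge_mask(img, thresh=200.0):
    """Binary edge map: candidate contours of 'areas of interest'."""
    return (sobel_magnitude(img) > thresh).astype(np.uint8)

# Synthetic frame: a uniform bright lesion-like square on a dark field.
img = np.zeros((64, 64), dtype=np.uint8)
img[20:40, 20:40] = 200

edges = edge_mask(img)  # fires on the square's border, not its interior
```

Edges like these delimit any high-contrast structure, which illustrates why the model's "potential areas of interest" need not correspond to pathology.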
Figure 6. ChatGPT 4o’s response regarding tumor detection in the frontal plane. The analysis of the first frame indicates two potential tumor regions, which are highlighted with green contours.
Figure 7. ChatGPT 4o’s response regarding tumor detection in the sagittal plane. Potential areas of interest in the three selected frames are highlighted. The red rectangles indicate regions where there might be abnormal enhancement, suggesting the presence of a tumor.
Table 1. Comparison of analysis methods and outcomes between the frontal and sagittal planes for video processing in medical imaging.
| Criterion | Frontal Plane | Sagittal Plane |
| --- | --- | --- |
| General Characteristics | Video properties: 20 FPS, 200 frames, 10 s, 1024 × 1024 pixels | Detailed video properties not provided |
| Analysis Methods | Frame-by-frame, object detection, motion analysis, image processing | Frame extraction, image content analysis, contrast and brightness analysis, object detection, temporal analysis |
| Analysis Specificity | Enhancements (contrast, edge detection) | Edge detection, regions of interest based on brightness and contrast |
| Identified Elements | Contours, significant changes, motion regions | Significant features, structural elements, potential regions of interest |
| Advanced Techniques | Object detection (OpenCV), contour analysis, motion detection | Object detection (OpenCV), edge detection |
| Initial Analysis Results | Motion regions, contour analysis | Brightness and contrast variations, edge detection results |
| Next Steps | Tracking motion, object recognition | Motion tracking, detailed region analysis |
Table 2. Comparison of video content recognition, frame extraction, and analysis between the frontal and sagittal planes, highlighting the distinct approaches used for processing video data and conducting frame-by-frame analysis, and how each approach affects the recognition and interpretation of content in the respective anatomical views.
| Criterion | Frontal Plane | Sagittal Plane |
| --- | --- | --- |
| Video Content Recognition | I cannot directly play or analyze the video. Please provide more details or specific queries. | I can analyze video content starting with frame extraction and key information. |
| Detailed Video Data | 200 frames, 20 FPS, 10 s | 180 frames, 20 FPS, 9 s |
| Frame Extraction | Extracted first frame, no detailed content provided. | Extraction of 10 frames at equal intervals, with detailed description of each frame. |
| Frame Content Description | No detailed description of the first frame. | Detailed description of each of the 10 extracted frames, focusing on scene context, perspective, movement, and interactions. |
| Next Steps | Request for further information or specific queries. | Suggested further analysis or detailed video content information based on frames. |
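The frame counts, frame rates, and equal-interval extraction figures reported above are straightforward to reproduce. The helper names below are illustrative only; actual frame extraction would read the file with OpenCV's `cv2.VideoCapture` rather than computing indices on synthetic metadata.

```python
def video_duration_s(n_frames, fps):
    """Duration implied by a frame count and frame rate."""
    return n_frames / fps

def equal_interval_indices(n_frames, n_samples):
    """Indices of n_samples frames taken at equal intervals,
    starting from frame 0 (as in the sagittal-plane analysis)."""
    step = n_frames // n_samples
    return [i * step for i in range(n_samples)]

# Figures reported for the two planes:
frontal_duration = video_duration_s(200, 20)    # 200 frames at 20 FPS
sagittal_duration = video_duration_s(180, 20)   # 180 frames at 20 FPS
indices = equal_interval_indices(180, 10)       # every 18th frame
```

Note that the frontal-plane analysis instead took the first ten frames at a fixed step of 10 (frames 0, 10, …, 90), a different sampling strategy from the equal-interval one sketched here.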
Table 3. Comparison of video content recognition, frame extraction, analysis methods, identified elements, and next steps between the frontal and sagittal planes, with a focus on identifying potential tumor regions.
| Criterion | Frontal Plane | Sagittal Plane |
| --- | --- | --- |
| Video Content Recognition | Yes, analysis is possible, but detailed medical analysis should be performed by a specialist. | Yes, analysis is possible with a focus on identifying pathological changes, especially tumors. |
| Frame Extraction | Extracted the first 10 frames: Frame 0, 10, 20, 30, 40, 50, 60, 70, 80, and 90. | Extracted all 180 frames from the video. |
| Analysis Methods | Frame analysis to identify potential tumor areas, green contours. | Frame analysis to identify pathological changes, especially tumors, with red contours. |
| Identified Elements | Potential tumor regions marked with green contours. | Potential tumor regions marked with red rectangles. |
| Next Steps | Further analysis of more frames or focus on a specific aspect of the analysis. | Suggested more detailed analysis based on clinical information or radiological markers. |