Search Results (3)

Search Parameters:
Keywords = Moondream2

26 pages, 5178 KB  
Article
Estimating Age and Sex from Dental Panoramic Radiographs Using Neural Networks and Vision–Language Models
by Salem Shamsul Alam, Nabila Rashid, Tasfia Azrin Faiza, Saif Ahmed, Rifat Ahmed Hassan, James Dudley and Taseef Hasan Farook
Oral 2025, 5(1), 3; https://doi.org/10.3390/oral5010003 - 8 Jan 2025
Cited by 3 | Viewed by 3766
Abstract
Purpose: To compare multiple deep learning models for estimating age and sex from dental panoramic radiographs and to identify the most successful model for each task. Methods: A dataset of 437 panoramic radiographs was divided into training, validation, and testing sets. Random oversampling was used to balance the class distributions in the training data and address the class imbalance in sex and age. The models studied were neural network models (CNN, VGG16, VGG19, ResNet50, ResNet101, ResNet152, MobileNet, DenseNet121, DenseNet169) and vision–language models (Vision Transformer and Moondream2). Binary classification models were built for sex classification, while regression models were developed for age estimation. Sex classification was evaluated using precision, recall, F1 score, accuracy, area under the curve (AUC), and a confusion matrix. Age regression was evaluated using mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), R2, and mean absolute percentage error (MAPE). Results: In sex classification, the neural network models achieved accuracies of 85% and an AUC of 0.85, while Moondream2 had much lower accuracy (49%) and AUC (0.48). DenseNet169 performed best for age regression, with an R2 of 0.57 and an MAE of 7.07. Across sex classes, the CNN achieved the highest precision, recall, and F1 score for both males and females. The Vision Transformer, which specialises in identifying objects in images, performed weakly on dental panoramic radiographs, with an inference time of 4.5 s per image. Conclusions: The CNN and DenseNet169 were the most effective models for sex classification and age regression, respectively, outperforming the other models at estimating sex and age from dental panoramic radiographs.
(This article belongs to the Special Issue Artificial Intelligence in Oral Medicine: Advancements and Challenges)
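As a concrete illustration of the two task heads the abstract describes, here is a minimal sketch: a DenseNet169 backbone with either a binary sex-classification head or an age-regression head, compiled with the metrics the study reports. The input size, dropout rate, and optimizer are assumptions for illustration, not the authors' configuration.

```python
# Minimal sketch (not the authors' code): DenseNet169 backbone with a
# sex-classification or age-regression head, per the abstract's setup.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(task: str = "sex") -> tf.keras.Model:
    base = tf.keras.applications.DenseNet169(
        include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dropout(0.3)(x)  # assumed regularization, not from the paper
    if task == "sex":
        # Binary classification: evaluated with accuracy, AUC, precision/recall.
        out = layers.Dense(1, activation="sigmoid")(x)
        loss, metrics = "binary_crossentropy", ["accuracy", tf.keras.metrics.AUC()]
    else:
        # Age regression: evaluated with MSE, MAE, RMSE, R2, MAPE.
        out = layers.Dense(1, activation="linear")(x)
        loss, metrics = "mse", ["mae", "mape"]
    model = models.Model(base.input, out)
    model.compile(optimizer="adam", loss=loss, metrics=metrics)
    return model
```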

19 pages, 30513 KB  
Article
From Detection to Action: A Multimodal AI Framework for Traffic Incident Response
by Afaq Ahmed, Muhammad Farhan, Hassan Eesaar, Kil To Chong and Hilal Tayara
Drones 2024, 8(12), 741; https://doi.org/10.3390/drones8120741 - 9 Dec 2024
Cited by 9 | Viewed by 5406
Abstract
With the rising incidence of traffic accidents and growing environmental concerns, the demand for advanced systems to ensure traffic and environmental safety has become increasingly urgent. This paper introduces an automated highway safety management framework that integrates computer vision and natural language processing for real-time monitoring, analysis, and reporting of traffic incidents. The system not only identifies accidents but also aids in coordinating emergency responses, such as dispatching ambulances, fire services, and police, while simultaneously managing traffic flow. The approach begins with the creation of a diverse highway accident dataset, combining public datasets with drone and CCTV footage. YOLOv11s is retrained on this dataset to enable real-time detection of critical traffic elements and anomalies, such as collisions and fires. A vision–language model (VLM), Moondream2, is employed to generate detailed scene descriptions, which are further refined by a large language model (LLM), GPT-4 Turbo, to produce concise incident reports and actionable suggestions. These reports are automatically sent to relevant authorities, ensuring a prompt and effective response. The system's effectiveness is validated through the analysis of diverse accident videos and zero-shot simulation testing within the Webots environment. The results highlight the potential of combining drone and CCTV imagery with AI-driven methodologies to improve traffic management and enhance public safety. Future work will include refining detection models, expanding dataset diversity, and deploying the framework in real-world scenarios using live drone and CCTV feeds. This study lays the groundwork for scalable and reliable solutions to address critical traffic safety challenges.
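The detect → describe → report pipeline the abstract outlines could be wired together roughly as follows. This is an assumption-laden sketch, not the authors' code: the fine-tuned weights file ("yolo11s-highway.pt") and the prompts are hypothetical, and the Moondream2 question-answering interface shown here varies between model revisions.

```python
# Minimal sketch of the detect -> describe -> report pipeline from the abstract.
from ultralytics import YOLO
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image
from openai import OpenAI

detector = YOLO("yolo11s-highway.pt")  # hypothetical fine-tuned YOLOv11s weights
vlm = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2", trust_remote_code=True)
vlm_tok = AutoTokenizer.from_pretrained("vikhyatk/moondream2")
llm = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def process_frame(path: str) -> str:
    # 1. Real-time detection of traffic anomalies (collisions, fires, ...).
    detections = detector(path)[0]
    labels = [detector.names[int(box.cls)] for box in detections.boxes]
    # 2. VLM scene description via Moondream2's image-QA interface.
    image = Image.open(path)
    scene = vlm.answer_question(
        vlm.encode_image(image), "Describe this traffic scene.", vlm_tok)
    # 3. LLM condenses detections + description into an incident report.
    report = llm.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content":
                   f"Detections: {labels}. Scene: {scene}. "
                   "Write a concise incident report with response suggestions."}])
    return report.choices[0].message.content
```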

16 pages, 3101 KB  
Article
Using Multimodal Foundation Models for Detecting Fake Images on the Internet with Explanations
by Vishnu S. Pendyala and Ashwin Chintalapati
Future Internet 2024, 16(12), 432; https://doi.org/10.3390/fi16120432 - 21 Nov 2024
Cited by 2 | Viewed by 3005
Abstract
Generative AI and multimodal foundation models have fueled a proliferation of fake content on the Internet. This paper investigates whether foundation models help detect, and thereby contain, the spread of fake images. Detecting fake images is a formidable challenge owing to its visual nature and the intricate analysis it requires. This paper details experiments using four multimodal foundation models, LLaVA, CLIP, Moondream2, and Gemini 1.5 Flash, to detect fake images. Explainable AI techniques such as Local Interpretable Model-Agnostic Explanations (LIME) and removal-based explanations are used to gain insights into the detection process. The dataset comprised real images and fake images generated by the generative AI tool Midjourney. Results show that the models can achieve up to 69% accuracy in detecting fake images in an intuitively explainable way, as confirmed by multiple techniques and metrics.
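To make the evaluation setup concrete, here is a minimal sketch of one of the combinations the abstract names: zero-shot real-vs-fake scoring with CLIP, explained with LIME. The text prompts, CLIP checkpoint, and input file are illustrative assumptions, not the paper's protocol.

```python
# Minimal sketch: CLIP zero-shot real-vs-fake scoring, explained with LIME.
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from lime import lime_image

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
PROMPTS = ["a real photograph", "an AI-generated fake image"]  # assumed prompts

def classify(batch: np.ndarray) -> np.ndarray:
    """Return [P(real), P(fake)] per image; LIME expects this batch signature."""
    images = [Image.fromarray(img.astype(np.uint8)) for img in batch]
    inputs = processor(text=PROMPTS, images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    return probs.numpy()

image = np.array(Image.open("suspect.jpg").convert("RGB"))  # hypothetical input
explanation = lime_image.LimeImageExplainer().explain_instance(
    image, classify, top_labels=2, num_samples=500)
# explanation.get_image_and_mask(...) highlights the superpixels behind the verdict.
```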
