Article

Open-Vocabulary Predictive World Models from Sensor Observations

1 Graduate School of Informatics, Nagoya University, Nagoya 464-8603, Japan
2 Independent Researcher, Lausanne 1005, Switzerland
3 Department of Electrical, Electronic and Computer Engineering, Gifu University, Gifu 501-1112, Japan
4 TIER IV, Nagoya 450-6610, Japan
* Author to whom correspondence should be addressed.
Sensors 2024, 24(14), 4735; https://doi.org/10.3390/s24144735
Submission received: 8 June 2024 / Revised: 16 July 2024 / Accepted: 19 July 2024 / Published: 21 July 2024
(This article belongs to the Collection Robotics and 3D Computer Vision)

Abstract

Cognitive scientists believe that adaptable intelligent agents like humans perform spatial reasoning tasks through learned causal mental simulation. The problem of learning these simulations is called predictive world modeling. We present the first framework for learning an open-vocabulary predictive world model (OV-PWM) from sensor observations. The model is implemented as a hierarchical variational autoencoder (HVAE) capable of predicting diverse and accurate fully observed environments from accumulated partial observations. We show that the OV-PWM can model high-dimensional embedding maps of latent compositional embeddings that represent sets of overlapping semantics, inferable by sufficient similarity inference. The OV-PWM simplifies the prior two-stage closed-set PWM approach into a single-stage end-to-end learning method. CARLA simulator experiments show that the OV-PWM can learn compact latent representations and generate diverse and accurate worlds with fine details such as road markings, achieving 69 mIoU over six query semantics on an urban evaluation sequence. We propose the OV-PWM as a versatile continual learning paradigm for providing spatio-semantic memory and learned internal simulation capabilities to future general-purpose mobile robots.
Keywords: world models; open-vocabulary semantics; generative models; BEV generation; continual learning; self-supervised learning; mobile robots; autonomous driving
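As a rough illustration of the similarity-based open-vocabulary querying and mIoU evaluation mentioned in the abstract, the minimal PyTorch sketch below thresholds the cosine similarity between a bird's-eye-view embedding map and a text-query embedding, then scores the resulting masks against ground truth. The tensor shapes, the query_semantic_map and mean_iou helpers, and the 0.5 threshold are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def query_semantic_map(embedding_map: torch.Tensor,
                       query_embedding: torch.Tensor,
                       threshold: float = 0.5) -> torch.Tensor:
    """Return a binary mask for one open-vocabulary semantic query.

    embedding_map:   (C, H, W) map of latent embeddings, e.g. a decoded
                     bird's-eye-view output of a predictive world model.
    query_embedding: (C,) embedding of a text query such as "road marking"
                     from a vision-language encoder (assumption).
    threshold:       similarity cutoff; the value here is illustrative.
    """
    # Normalize both sides so the dot product equals cosine similarity.
    emb = F.normalize(embedding_map, dim=0)
    q = F.normalize(query_embedding, dim=0)

    # Per-cell similarity between the map embedding and the query.
    similarity = torch.einsum("chw,c->hw", emb, q)

    # Cells above the cutoff are assigned to the queried semantic.
    return similarity > threshold


def mean_iou(pred_masks, gt_masks) -> float:
    """Mean IoU over a set of query semantics (lists of bool H x W masks)."""
    ious = []
    for pred, gt in zip(pred_masks, gt_masks):
        inter = (pred & gt).sum().item()
        union = (pred | gt).sum().item()
        ious.append(inter / union if union > 0 else 1.0)
    return sum(ious) / len(ious)
```

In this framing, querying a new semantic reduces to encoding a new text prompt rather than retraining a classifier, which is what distinguishes the open-vocabulary representation from a closed-set one.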

Share and Cite

MDPI and ACS Style

Karlsson, R.; Asfandiyarov, R.; Carballo, A.; Fujii, K.; Ohtani, K.; Takeda, K. Open-Vocabulary Predictive World Models from Sensor Observations. Sensors 2024, 24, 4735. https://doi.org/10.3390/s24144735

AMA Style

Karlsson R, Asfandiyarov R, Carballo A, Fujii K, Ohtani K, Takeda K. Open-Vocabulary Predictive World Models from Sensor Observations. Sensors. 2024; 24(14):4735. https://doi.org/10.3390/s24144735

Chicago/Turabian Style

Karlsson, Robin, Ruslan Asfandiyarov, Alexander Carballo, Keisuke Fujii, Kento Ohtani, and Kazuya Takeda. 2024. "Open-Vocabulary Predictive World Models from Sensor Observations" Sensors 24, no. 14: 4735. https://doi.org/10.3390/s24144735

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
