An Open-Source Probabilistic Programming System for Data Generation and Safety in AI-Based Autonomy

CVPR 2024 Tutorial


Autonomous systems, such as self-driving cars or intelligent robots, increasingly operate in complex, stochastic environments where they dynamically interact with multiple entities (human and robot). There is a need to formally model and generate such environments in simulation, for use cases that span synthetic training data generation and rigorous evaluation of safety. We provide an in-depth tutorial on Scenic, a simulator-agnostic probabilistic programming language for modeling complex multi-agent physical environments with stochasticity and spatio-temporal constraints. Scenic has been used in a variety of domains such as self-driving, aviation, indoor robotics, multi-agent systems, and augmented/virtual reality. Using Scenic and associated open-source tools, one can (1) model and sample from distributions with spatial and temporal constraints, (2) generate synthetic data in a controlled, programmatic fashion to train and test machine learning components, (3) reason about the safety of AI-enabled autonomous systems, (4) automatically find edge cases, (5) debug and root-cause failures of AI components, including perception components, and (6) bridge the sim-to-real gap in autonomous system design. The tutorial is hands-on: it covers the basics of Scenic and its applications, how to create Scenic programs and build your own applications on top of Scenic, and how to interface the language with your simulator/renderer of choice. For more information on Scenic, please visit the website:

Flexible Scenario / Data Generation Across Different Domains

Autonomous Vehicles



Reinforcement Learning

Augmented Reality


Sanjit Seshia

University of California, Berkeley

Daniel Fremont

University of California, Santa Cruz

Edward Kim

University of California, Berkeley


Edward Kim

University of California, Berkeley

Jinkyu Kim

Korea University, South Korea

Necmiye Ozay

University of Michigan, Ann Arbor

Parasara Sridhar Duggirala

University of North Carolina, Chapel Hill

Hazem Torfah

Chalmers University of Technology, Sweden

Kimin Lee

School of AI, KAIST, South Korea

Marcell Vazquez-Chanlatte

Nissan's Alliance Innovation Lab, Silicon Valley


Today's autonomous systems rely heavily on machine learning components trained on large amounts of data. Even so, it is expensive to collect relevant data and to test these systems in the real world in a manner that both captures typical data distributions and covers edge cases. Therefore, simulators are widely adopted in the robotics and computer vision communities to train, test, and debug autonomous and semi-autonomous systems. However, working directly with simulators can be too low-level and problem-specific. To support the design lifecycle of autonomous/semi-autonomous systems, one needs to raise the level of abstraction above individual simulators and provide a formal framework for world modeling. Such a world model can help reason about the safety of a system and facilitate data generation and sim-to-real validation, as well as help to interpret, validate, share, or re-use training and test scenarios across the community.

The objective of this tutorial is to introduce Scenic, an open-source, domain-specific probabilistic programming language for world modeling that addresses the above needs. Scenic is designed to model and generate interactive (or reactive), multi-agent scenarios in a manner portable to any simulator. In Scenic, users can precisely model the stochastic environment in which an autonomous/semi-autonomous system operates, perform a variety of design and analysis tasks on that model, and communicate the result as an interpretable program. Scenic has a variety of demonstrated use cases, including synthetic data generation, data augmentation, debugging, retraining, and redesign of perception components, sim-to-real validation, testing the safety of autonomous systems both in simulation and in the real world, training reinforcement learning agents in multiplayer settings, and more. To achieve these goals, Scenic has been designed to be (i) intuitive to learn, (ii) probabilistic, to capture the uncertainty and stochasticity of the real world, (iii) simulator-agnostic, and (iv) open-source and in the public domain, so that external members can contribute.
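
To give a rough feel for what "modeling a stochastic environment with constraints" means, here is a minimal sketch in plain Python. It is not Scenic syntax and does not use Scenic's API; the `Car` class, the `sample_scene` function, the road dimensions, and the constraint are all hypothetical names and values chosen for illustration. The sketch places two cars at random on a road strip and uses simple rejection sampling (one straightforward way to enforce a hard constraint, assumed here only for the example) to keep scenes where the other car is ahead of the ego car and at least a minimum distance away.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Car:
    x: float        # lateral position in metres (hypothetical road frame)
    y: float        # longitudinal position in metres
    heading: float  # radians; 0 means facing along +y


def sample_scene(rng, min_gap=5.0, max_tries=1000):
    """Draw an (ego, other) pair by rejection sampling.

    Prior: both cars are placed uniformly on a 7 m x 100 m road strip,
    and the other car's heading is perturbed by up to 10 degrees.
    Hard constraint: the other car is ahead of the ego car and at
    least `min_gap` metres away from it.
    """
    for _ in range(max_tries):
        ego = Car(rng.uniform(0.0, 7.0), rng.uniform(0.0, 100.0), 0.0)
        other = Car(
            rng.uniform(0.0, 7.0),
            rng.uniform(0.0, 100.0),
            math.radians(rng.uniform(-10.0, 10.0)),
        )
        gap = math.hypot(other.x - ego.x, other.y - ego.y)
        if other.y > ego.y and gap >= min_gap:
            return ego, other  # sample satisfies the constraint; accept it
    raise RuntimeError("no scene satisfied the constraints")


rng = random.Random(0)  # fixed seed so the run is repeatable
scenes = [sample_scene(rng) for _ in range(3)]
for ego, other in scenes:
    print(f"ego=({ego.x:.1f}, {ego.y:.1f})  other=({other.x:.1f}, {other.y:.1f})")
```

Each call returns one concrete scene drawn from the constrained distribution. A scenario description language lets the user state the same prior and constraint declaratively instead of writing the sampling loop by hand.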

Details & Schedule

Time: June 17th, 9:00 AM-12:00 PM

Time zone: PDT (Pacific Daylight Time)
Location: Seattle Convention Center, Arch 307-308

Here are the slides used in the tutorial: Tutorial Slides (PDF)

For details, please contact Edward Kim.
Website template by Oriane Siméoni from the Object localization for free CVPR '23 tutorial.
Last updated: 10th of June 2024