CPS25 - Scenic: An Open-Source Probabilistic Programming System for Data Generation and Safety in AI-Based Autonomy

Abstract

Autonomous systems, such as self-driving cars or intelligent robots, increasingly operate in complex, stochastic environments where they dynamically interact with multiple entities (human and robot). There is a need to formally model and generate such environments in simulation, for use cases that span synthetic training data generation and rigorous evaluation of safety. This tutorial provides an in-depth introduction to Scenic, a simulator-agnostic probabilistic programming language for modeling complex multi-agent physical environments with stochasticity and spatio-temporal constraints. Scenic has been used in a variety of domains such as self-driving, aviation, indoor robotics, multi-agent systems, and augmented/virtual reality. Using Scenic and associated open-source tools, one can (1) model and sample from distributions with spatial and temporal constraints, (2) generate synthetic data in a controlled, programmatic fashion to train and test machine learning components, (3) reason about the safety of AI-enabled autonomous systems, (4) automatically find edge cases, (5) debug and root-cause failures of AI components, including perception, and (6) bridge the sim-to-real gap in autonomous system design. The hands-on sessions cover the basics of Scenic and its applications, how to create Scenic programs and your own new applications on top of Scenic, and how to interface the language to your simulator/renderer of choice. For more information on Scenic, please visit the website: https://scenic-lang.org.

Flexible Scenario / Data Generation Across Different Domains

Autonomous Vehicles

Robotics

Aviation

Reinforcement Learning

Augmented Reality

Speakers

Sanjit Seshia

University of California, Berkeley

Daniel Fremont

University of California, Santa Cruz

Edward Kim

University of California, Berkeley

Overview

Today's cyber-physical systems (CPS) rely heavily on the use of machine learning components trained on large amounts of data. Even so, it is expensive to collect relevant data and test these systems in the real world in a manner that captures typical data distributions and also covers edge cases. Therefore, simulators are widely adopted across communities, such as robotics, machine learning, reinforcement learning, and computer vision, to train, test, and debug autonomous and semi-autonomous systems. However, working directly with simulators can be too low-level and problem-specific. To support the design life cycle of autonomous/semi-autonomous systems, one needs to raise the level of abstraction above individual simulators and provide a formal framework for world modeling. Such a world model can help reason about the safety of a system and facilitate data generation and sim-to-real validation, as well as help to interpret, validate, share, or re-use training and test scenarios across the community.

The objective of this tutorial is to introduce Scenic, an open-source, domain-specific probabilistic programming language for world modeling that addresses the above needs. Scenic is designed to model and generate interactive (or reactive), multi-agent scenarios in a manner portable to any simulator. In Scenic, users can precisely model a stochastic environment in which an autonomous/semi-autonomous system operates, can perform a variety of design and analysis tasks, and can communicate them as interpretable programs. Scenic has a variety of demonstrated use cases, including synthetic data generation; data augmentation; debugging, retraining, and redesign of perception, behavior prediction, and planner components; sim-to-real validation; testing the safety of autonomous systems both in simulation and in the real world; training reinforcement learning agents in multi-player settings; and more. To achieve these goals, Scenic has been designed to be (i) intuitive to learn, (ii) probabilistic to capture the uncertainty and stochasticity in the real world, (iii) simulator-agnostic, and (iv) open-source and in the public domain so that external members can contribute.
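To make the idea of a probabilistic world model with spatial constraints concrete, here is a minimal sketch in plain Python. It is not Scenic syntax and does not use the Scenic API: the `Car` class, the `sample_scene` function, the value ranges, and the 5 m gap are all hypothetical choices made for illustration. It draws random two-car scenes and uses simple rejection sampling to keep only those that satisfy a hard constraint.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Car:
    """Hypothetical scene object; not a Scenic class."""
    x: float        # position along the road, in metres
    y: float        # lateral offset from the lane centre, in metres
    heading: float  # radians, 0 means aligned with the road


def sample_scene(rng, min_gap=5.0, max_tries=1000):
    """Draw an (ego, other) pair, rejecting draws that violate the constraint."""
    for _ in range(max_tries):
        ego = Car(x=0.0, y=rng.uniform(-1.0, 1.0), heading=0.0)
        other = Car(
            x=rng.uniform(0.0, 30.0),
            y=rng.uniform(-1.0, 1.0),
            heading=rng.gauss(0.0, math.radians(5)),
        )
        # Hard spatial constraint: the other car is at least min_gap ahead.
        if other.x - ego.x >= min_gap:
            return ego, other
    raise RuntimeError("no scene satisfied the constraint")


# A fixed seed makes the generated scenes reproducible.
rng = random.Random(0)
scenes = [sample_scene(rng) for _ in range(100)]
```

Each element of `scenes` is one concrete scene that could be handed to a simulator. A language such as Scenic lets the same intent (random placement plus hard requirements) be stated declaratively rather than as hand-written sampling loops.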

Details & Schedule

Time zone: PDT (Pacific Daylight Time)

2:00 - 2:30 - Opening Remarks and Motivation for the Tutorial: On the challenges of achieving safe AI-based autonomy and generating and curating data to support the design life cycles of (semi-)autonomous systems.
2:30 - 3:10 - Introduction to the Scenic language: We will use a range of examples to illustrate the various features of Scenic and their use cases.
3:10 - 3:30 - Hands-On Programming with Scenic (Part 1): We will provide Colab notebooks in which attendees can model and generate scenarios, with time for questions about the Scenic language.
- Notebook 1: Static Scenarios
- Notebook 2: Dynamic Scenarios
3:30 - 4:00 - Break
4:00 - 4:30 - Applications of Scenic (Part 1): We will cover how to test perception, behavior prediction, and planning components or the full autopilot stack in simulation with Scenic using its support toolkit called VERIFAI.
4:30 - 5:00 - Applications of Scenic (Part 2): We will cover how to collect desired sensor data (e.g. RGB, LiDAR) and labels (e.g. segmentation, 3D bounding boxes) using Scenic, and how to perform sim-to-real validation.
5:00 - 5:20 - Hands-On Programming with Scenic (Part 2)
5:20 - 5:30 - Ongoing and Future Research Directions: We will discuss novel applications of Scenic, including to mixed reality, and integrations of Scenic with large language models.

CPS 2025