CVPR24 Tutorial - Scenic: An Open-Source Probabilistic Programming System for Data Generation and Safety in AI-Based Autonomy

Abstract

Autonomous systems, such as self-driving cars or intelligent robots, increasingly operate in complex, stochastic environments where they dynamically interact with multiple entities (human and robot). There is a need to formally model and generate such environments in simulation, for use cases that span synthetic training data generation and rigorous evaluation of safety. This tutorial provides an in-depth introduction to Scenic, a simulator-agnostic probabilistic programming language for modeling complex, multi-agent physical environments with stochasticity and spatio-temporal constraints. Scenic has been used in a variety of domains such as self-driving, aviation, indoor robotics, multi-agent systems, and augmented/virtual reality. Using Scenic and associated open-source tools, one can (1) model and sample from distributions with spatial and temporal constraints, (2) generate synthetic data in a controlled, programmatic fashion to train and test machine learning components, (3) reason about the safety of AI-enabled autonomous systems, (4) automatically find edge cases, (5) debug and root-cause failures of AI components, including perception components, and (6) bridge the sim-to-real gap in autonomous system design. We will provide a hands-on tutorial on the basics of Scenic and its applications, how to create Scenic programs and your own new applications on top of Scenic, and how to interface the language with your simulator/renderer of choice. For more information on Scenic, please visit the website: https://scenic-lang.org.

Flexible Scenario / Data Generation Across Different Domains

Autonomous Vehicles

Robotics

Aviation

Reinforcement Learning

Augmented Reality

Speakers

Sanjit Seshia

University of California, Berkeley

Daniel Fremont

University of California, Santa Cruz

Edward Kim

University of California, Berkeley

Organizers

Edward Kim

University of California, Berkeley

Jinkyu Kim

Korea University, South Korea

Necmiye Ozay

University of Michigan, Ann Arbor

Parasara Sridhar Duggirala

University of North Carolina, Chapel Hill

Hazem Torfah

Chalmers University of Technology, Sweden

Kimin Lee

School of AI, KAIST, South Korea

Marcell Vazquez-Chanlatte

Nissan's Alliance Innovation Lab, Silicon Valley

Overview

Today's autonomous systems rely heavily on the use of machine learning components trained on large amounts of data. Even so, it is expensive to collect relevant data and test these systems in the real world in a manner that captures typical data distributions and also covers edge cases. Therefore, simulators are widely adopted in the robotics and computer vision community to train, test, and debug autonomous and semi-autonomous systems. However, working directly with simulators can be too low-level and problem-specific. To support the design lifecycle of autonomous/semi-autonomous systems, one needs to raise the level of abstraction above individual simulators and provide a formal framework for world modeling. Such a world model can help reason about the safety of a system and facilitate data generation and sim-to-real validation, as well as help to interpret, validate, share, or re-use training and test scenarios across the community.

The objective of this tutorial is to introduce Scenic, an open-source, domain-specific probabilistic programming language for world modeling that addresses the above needs. Scenic is designed to model and generate interactive (or reactive), multi-agent scenarios in a manner portable to any simulator. In Scenic, users can precisely model the stochastic environment in which an autonomous/semi-autonomous system operates, perform a variety of design and analysis tasks, and communicate their models as interpretable programs. Scenic has a variety of demonstrated use cases, including synthetic data generation, data augmentation, debugging, retraining, and redesigning perception components, sim-to-real validation, testing the safety of autonomous systems both in simulation and in the real world, training reinforcement learning agents in multiplayer settings, and more. To achieve these goals, Scenic has been designed to be (i) intuitive to learn, (ii) probabilistic, to capture the uncertainty and stochasticity of the real world, (iii) simulator-agnostic, and (iv) open-source and publicly available, so that external members can contribute.
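To make the idea of a scenario as a constrained distribution concrete, here is a minimal Python sketch. It is not Scenic syntax and does not use Scenic's API: the function `sample_scene`, the numeric ranges, and the lane-width constraint are all hypothetical, chosen only for illustration. Random choices define a distribution over scenes, and a hard spatial constraint is enforced by rejecting samples that violate it.

```python
import math
import random


def sample_scene(rng, max_tries=1000):
    """Draw one scene: an ego car at the origin and one other car ahead of it.

    Hypothetical stand-in for a scenario program. All ranges and the
    constraint below are illustrative assumptions, not taken from Scenic.
    """
    for _ in range(max_tries):
        # Random placement of the other car relative to the ego (meters).
        distance = rng.uniform(5.0, 30.0)
        bearing = math.radians(rng.uniform(-45.0, 45.0))
        x = distance * math.sin(bearing)  # lateral offset from the ego
        y = distance * math.cos(bearing)  # longitudinal offset from the ego
        # Noisy heading of the other car, in degrees.
        heading = rng.gauss(0.0, 10.0)

        # Hard spatial constraint: the other car must lie within 3.5 m
        # laterally of the ego. Samples that violate it are rejected.
        if abs(x) <= 3.5:
            return {
                "ego": {"x": 0.0, "y": 0.0, "heading": 0.0},
                "other": {"x": x, "y": y, "heading": heading},
            }
    raise RuntimeError("no scene satisfied the constraint")


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    for _ in range(3):
        print(sample_scene(rng))
```

Each call returns a different concrete scene drawn from the same description, which is the property that makes a single scenario program usable for both data generation and testing.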

Details & Schedule

Time: June 17th, 9:00 AM-12:00 PM

Time zone: PDT (Pacific Daylight Time)
Location: Seattle Convention Center, Arch 307-308

Here are the slides used in the tutorial: Tutorial Slides (PDF)

09:00 - 09:30 - Opening Remarks and Motivation for the Tutorial: On the challenges of achieving safe AI-based autonomy and of generating and curating data to support the design life cycles of (semi-)autonomous systems.
09:30 - 10:45 - Introduction to the Scenic language with hands-on programming: We will provide a tutorial on how to model dynamic, interactive, multi-agent scenarios with hands-on programming in Google Colab notebooks.
- Notebook 1: Static Scenarios
- Notebook 2: Dynamic Scenarios
10:45 - 11:00 - Coffee Break
11:00 - 11:25 - Applications of Scenic (Part 1): We will cover how to test perception, behavior prediction, and planning components, or the full autopilot stack, in simulation with Scenic using its supporting toolkit, VerifAI.
11:25 - 11:50 - Applications of Scenic (Part 2): We will cover how to collect desired sensor data (e.g. RGB, LiDAR) and labels (e.g. segmentation, 3D bounding boxes) using Scenic, and how to perform sim-to-real validation.
11:50 - 12:00 - Ongoing and Future Research Directions: We will discuss novel applications of Scenic, including mixed reality, and integrations of Scenic with large language models.
