For autonomous vehicles to safely share the road with human drivers, autonomous vehicles must abide by specific "road rules" that human drivers have agreed to follow. "Road rules" include rules that drivers are required to follow by law – such as the requirement that vehicles stop at red lights – as well as more subtle social rules – such as the implicit designation of fast lanes on the highway. In this paper, we provide empirical evidence that suggests that – instead of hard-coding road rules into self-driving algorithms – a scalable alternative may be to design multi-agent environments in which road rules emerge as optimal solutions to the problem of maximizing traffic flow. We analyze what ingredients in driving environments cause the emergence of these road rules and find that two crucial factors are noisy perception and agents’ spatial density. We provide qualitative and quantitative evidence of the emergence of seven social driving behaviors, ranging from obeying traffic signals to following lanes, all of which emerge from training agents to drive quickly to destinations without colliding. Our results add empirical support for the social road rules that countries worldwide have agreed on for safe, efficient driving.
Overview
Our main contributions in this paper are:
- We define a multi-agent driving environment in which agents equipped with noisy LiDAR sensors are rewarded for reaching a given destination as quickly as possible without colliding with other agents, and show that agents trained in this environment learn road rules that mimic those common in human driving systems.
- We analyze which choices in the definition of the MDP lead to the emergence of these road rules and find that the most important factors are perception noise and the spatial density of agents in the driving environment.
- We release a suite of 2D driving environments with the intention of stimulating interest within the MARL community in solving fundamental self-driving problems.
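The ingredients named in the first contribution (noisy LiDAR, a reward for fast arrival, a penalty for collisions) can be sketched roughly as follows. This is a minimal illustration, not the released environment: the function names, the sensor range, the reward constants, and the reading of "LiDAR noise" as independently dropping each ray with a fixed probability are all assumptions made for the example.

```python
import random

MAX_RANGE = 50.0  # hypothetical sensor range; a dropped ray reports this value


def noisy_lidar(ranges, drop_prob, rng):
    """Return a copy of `ranges` with each ray dropped with probability `drop_prob`.

    Assumption: a dropped ray looks like "nothing seen" and reports MAX_RANGE.
    """
    return [MAX_RANGE if rng.random() < drop_prob else r for r in ranges]


def step_reward(reached_goal, collided,
                time_penalty=0.01, goal_bonus=1.0, collision_penalty=1.0):
    """Hypothetical per-step reward: pay a small cost every step, a bonus on
    arrival, and a penalty on collision. The constants are placeholders."""
    reward = -time_penalty
    if collided:
        reward -= collision_penalty
    if reached_goal:
        reward += goal_bonus
    return reward
```

With a reward of this shape, every extra step costs the agent something, so stopping is only worthwhile when it avoids the larger collision penalty.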
Emergent Social Driving Rules
1. Stopping at a Traffic Signal
In a 4-way intersection, agents learn to obey traffic signals to safely navigate to the opposite road in minimum time.
Note that the agents observe only a ternary value representing the traffic light’s state, not its color. To make the visualizations, we visually inspect rollouts for each converged policy to find a permutation of the ternary states that aligns with human red/yellow/green traffic light conventions.
Lidar Noise = 0%
Lidar Noise = 25%
Lidar Noise = 50%
Lidar Noise = 75%
Transfer from Synthetic Map to a Real World Map
In this experiment we show that policies trained on the synthetic intersection above transfer to real-world intersections found in the nuScenes dataset.
2. Emergence of Lanes
When the agents are trained in an environment containing 4 agents, they follow a consistent lane until they cross the intersection. This is a consequence of the starting configuration, which prevents agents from colliding with other agents after having crossed the intersection.
Lidar Noise = 25%
Lidar Noise = 50%
Lidar Noise = 75%
Lidar Noise = 0%
Lidar Noise = 100%
When the number of agents during training is increased, the agents tend to follow their lanes consistently until they reach their destination. Additionally, with 8 agents we see the formation of multi-lane tracks. Qualitatively, agents starting from the left side of the road tend to take the lane closer to the center, while those starting on the right side take the extreme right lane. This also allows smoother traffic flow.
Number of Training Agents = 4
Number of Training Agents = 8
Number of Training Agents = 8
Spatial Locations of the Agents when trained on 8 agent environment
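One way to produce a spatial-location plot like the one captioned above is to bin agent positions from many rollouts into a 2D occupancy grid; lanes then show up as bands of high counts. The sketch below is an illustration only: the function name, the grid layout, and the assumption that positions are available as (x, y) pairs are hypothetical and not taken from the released code.

```python
def occupancy_grid(positions, x_range, y_range, nx, ny):
    """Count how many (x, y) positions fall into each cell of an ny-by-nx grid.

    Positions outside the given ranges are ignored. grid[j][i] is the count
    for column i (along x) and row j (along y).
    """
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * nx for _ in range(ny)]
    for x, y in positions:
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue
        # min() guards against a float ratio rounding up to the last edge.
        i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        grid[j][i] += 1
    return grid
```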
3. Fast Lanes on a Highway
We observe that, depending on how fast the agents are, they choose to travel along the left-hand side or the right-hand side of the road. This is similar to the fast lanes found on highways. (Darker shades denote faster cars.)
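The relationship described above between an agent's speed and the side of the road it chooses could be quantified with a simple correlation between speed and lateral offset from the road's center line. This is a sketch of one possible measurement, not the analysis used in the paper; the function name and the choice of Pearson correlation are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)
```

Here `xs` would hold each agent's average speed and `ys` its average signed lateral offset; a coefficient far from zero would indicate that faster agents consistently favor one side.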
4. Stopping at a Crosswalk
The agents detect the pedestrians (small green boxes) walking along the crosswalk and slow down as they approach it.
5. Communication
We denote the signals sent by the agents with the colors of the agents. In the left video, where there is no perception noise, the agents' signals are not correlated with their actions or heading. In the right video, agents that turn right tend to be colored black, while those turning left or going straight are colored white.
Lidar Noise = 0%
Lidar Noise = 25%
Lidar Noise = 50%
Lidar Noise = 100%
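The claim that signals become correlated with turning direction under perception noise could be checked by tabulating how often each signal is emitted for each maneuver. The sketch below assumes rollouts have been reduced to (action, signal) pairs; that data format and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def signal_given_action(records):
    """Estimate P(signal | action) from an iterable of (action, signal) pairs.

    Returns a nested dict: {action: {signal: fraction}}.
    """
    counts = defaultdict(Counter)
    for action, signal in records:
        counts[action][signal] += 1
    return {
        action: {s: c / sum(counter.values()) for s, c in counter.items()}
        for action, counter in counts.items()
    }
```

If the signals carry no information, the rows for "right", "left", and "straight" look alike; if they do, one signal dominates for a given maneuver.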
6. Rollouts on nuScenes
In these experiments, we attempt to show that agents learn to maintain a minimum distance between themselves as a function of their relative velocity. Additionally, we observe the emergence of right of way, where the agent that arrives first at the intersection gets to leave it first.
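The right-of-way behavior described above amounts to first-in, first-out ordering at the intersection. One hedged way to measure it is to compute the fraction of agent pairs whose exit order matches their arrival order. The function name and the (enter_time, exit_time) input format are assumptions for illustration, not part of the released environments.

```python
from itertools import combinations


def fifo_fraction(events):
    """Fraction of agent pairs whose exit order matches their entry order.

    `events` is a list of (enter_time, exit_time) tuples, one per agent.
    Pairs with tied entry or exit times count as inconsistent.
    Returns None when there are fewer than two agents.
    """
    pairs = list(combinations(events, 2))
    if not pairs:
        return None
    consistent = sum(
        1
        for (enter_a, exit_a), (enter_b, exit_b) in pairs
        if (enter_a - enter_b) * (exit_a - exit_b) > 0
    )
    return consistent / len(pairs)
```

A value close to 1 would indicate that agents arriving first nearly always leave first.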