Idea:
- Actions taken by the autonomous car affect human drivers' responses, and these effects can be leveraged for planning.
- Approximate the human as an optimal driver whose reward function is acquired through inverse reinforcement learning.
Motivation:
- Current autonomous cars drive defensively.
- Plan more efficient and communicative behaviors for autonomous cars.
Assumptions:
- Two-car system: one autonomous (robot) car and one human-driven car.
Method:
At every iteration the robot uses MPC: it computes a finite sequence of actions that maximizes its reward (Eqn 5), executes only the first action, then replans.
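The receding-horizon loop can be sketched as follows. This is a minimal stand-in, not the paper's implementation: the 1-D dynamics and reward are made up for illustration, and `scipy` replaces the repo's optimizer.

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-ins for the paper's dynamics and robot reward (hypothetical).
def dynamics(x, u):
    return x + u  # single-integrator state update

def robot_reward(x, u):
    return -(x - 1.0) ** 2 - 0.1 * u ** 2  # reach x = 1, penalize effort

def mpc_step(x0, horizon=5):
    """Optimize a finite sequence of actions, return only the first one."""
    def neg_total_reward(u_seq):
        x, total = x0, 0.0
        for u in u_seq:
            x = dynamics(x, u)
            total += robot_reward(x, u)
        return -total  # minimize negative reward = maximize reward

    res = minimize(neg_total_reward, np.zeros(horizon), method="L-BFGS-B")
    return res.x[0]

# Receding horizon: replan at every step, execute only the first action.
x = 0.0
for _ in range(10):
    x = dynamics(x, mpc_step(x))
```

After a few replanning steps the state settles near the goal, even though only the first action of each plan is ever executed.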
Compute the human's response u^{*}_H by optimizing the human's reward: u^{*}_H(x, u_R) = argmax_{u_H} r_H(x, u_R, u_H).
Learn r_H with inverse reinforcement learning: a separate optimization that maximizes the likelihood of human demonstrations under the reward model.
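The demonstration-likelihood idea can be sketched in a maximum-entropy style over a discretized action set. Everything here is a toy assumption (the features, the demonstrations, and the action grid are all made up); the paper itself works with continuous trajectories.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

# Reward linear in features: r_H(u) = w . phi(u). All values are toy.
candidates = np.linspace(-1.0, 1.0, 21)  # discretized human actions

def phi(u):
    # Hypothetical features: control effort, distance from u = 0.5.
    return np.array([-u ** 2, -(u - 0.5) ** 2])

PHI = np.array([phi(u) for u in candidates])
demos = [0.5, 0.4, 0.6]  # made-up demonstrated human actions

def neg_log_likelihood(w):
    scores = PHI @ w           # reward of each candidate action
    log_z = logsumexp(scores)  # softmax normalizer over candidates
    return -sum(phi(u) @ w - log_z for u in demos)

# Fit w by maximizing the probability of the demonstrations.
w_hat = minimize(neg_log_likelihood, np.zeros(2), method="L-BFGS-B").x
```

Because the demonstrations cluster around u = 0.5, the fitted weight on the "distance from 0.5" feature comes out positive, i.e. the learned reward prefers actions the human actually took.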
Implementation:
Use Theano to compute the Jacobian and Hessian symbolically, and use L-BFGS to optimize Eqn 5 (code: https://github.com/dsadigh/driving-interactions/blob/master/utils.py).
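As a stand-in for the repo's symbolic Theano derivatives, the sketch below feeds an exact, hand-derived gradient to L-BFGS; the quadratic reward is made up purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical quadratic reward: r(u) = -0.5 u^T A u + b^T u.
A = np.diag([1.0, 4.0])    # made-up curvature
b = np.array([1.0, -2.0])  # made-up linear term

def neg_reward(u):
    return 0.5 * u @ A @ u - b @ u

def neg_reward_grad(u):
    # Exact gradient, the analogue of the symbolic Jacobian from Theano.
    return A @ u - b

res = minimize(neg_reward, np.zeros(2), jac=neg_reward_grad,
               method="L-BFGS-B")
# The optimum solves A u = b, i.e. u = [1.0, -0.5].
```

Supplying the exact gradient (`jac=`) avoids scipy's finite-difference approximation, which is the same reason the repo builds derivatives symbolically.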