Idea:
- Actions taken by the autonomous car affect human drivers' responses, and these effects can be leveraged for planning.
- Approximate the human as an optimal driver whose reward function is acquired through inverse reinforcement learning.
Motivation:
- Current autonomous cars drive defensively.
- Plan more efficient and communicative behaviors for autonomous cars.
Assumptions:
- Two-car system: one autonomous (robot) car and one human-driven car.
Method:
At every iteration the robot uses MPC: it computes a finite sequence of actions that maximizes its reward (Eqn 5), executes only the first action, then replans.
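The receding-horizon loop can be sketched as follows. This is a minimal stand-in, not the paper's implementation: the 1-D dynamics and reward are made up for illustration, and `scipy` replaces the repo's optimizer.

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-ins for the paper's dynamics and robot reward (hypothetical).
def dynamics(x, u):
    return x + u  # single-integrator state update

def robot_reward(x, u):
    return -(x - 1.0) ** 2 - 0.1 * u ** 2  # reach x = 1, penalize effort

def mpc_step(x0, horizon=5):
    """Optimize a finite sequence of actions, return only the first one."""
    def neg_total_reward(u_seq):
        x, total = x0, 0.0
        for u in u_seq:
            x = dynamics(x, u)
            total += robot_reward(x, u)
        return -total  # minimize negative reward = maximize reward

    res = minimize(neg_total_reward, np.zeros(horizon), method="L-BFGS-B")
    return res.x[0]

# Receding horizon: replan at every step, execute only the first action.
x = 0.0
for _ in range(10):
    x = dynamics(x, mpc_step(x))
```

After a few replanning steps the state settles near the goal, even though only the first action of each plan is ever executed.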
Compute the human's response u^{*}_H by optimizing the human's reward: u^{*}_H(x, u_R) = argmax_{u_H} r_H(x, u_R, u_H).
Learn r_H with inverse reinforcement learning: a separate optimization that maximizes the likelihood of human demonstrations under the reward model.
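The demonstration-likelihood idea can be sketched in a maximum-entropy style over a discretized action set. Everything here is a toy assumption (the features, the demonstrations, and the action grid are all made up); the paper itself works with continuous trajectories.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

# Reward linear in features: r_H(u) = w . phi(u). All values are toy.
candidates = np.linspace(-1.0, 1.0, 21)  # discretized human actions

def phi(u):
    # Hypothetical features: control effort, distance from u = 0.5.
    return np.array([-u ** 2, -(u - 0.5) ** 2])

PHI = np.array([phi(u) for u in candidates])
demos = [0.5, 0.4, 0.6]  # made-up demonstrated human actions

def neg_log_likelihood(w):
    scores = PHI @ w           # reward of each candidate action
    log_z = logsumexp(scores)  # softmax normalizer over candidates
    return -sum(phi(u) @ w - log_z for u in demos)

# Fit w by maximizing the probability of the demonstrations.
w_hat = minimize(neg_log_likelihood, np.zeros(2), method="L-BFGS-B").x
```

Because the demonstrations cluster around u = 0.5, the fitted weight on the "distance from 0.5" feature comes out positive, i.e. the learned reward prefers actions the human actually took.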
Implementation:
Use Theano to compute the Jacobian and Hessian symbolically, and use L-BFGS to optimize Eqn 5 (code: https://github.com/dsadigh/driving-interactions/blob/master/utils.py).
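As a stand-in for the repo's symbolic Theano derivatives, the sketch below feeds an exact, hand-derived gradient to L-BFGS; the quadratic reward is made up purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical quadratic reward: r(u) = -0.5 u^T A u + b^T u.
A = np.diag([1.0, 4.0])    # made-up curvature
b = np.array([1.0, -2.0])  # made-up linear term

def neg_reward(u):
    return 0.5 * u @ A @ u - b @ u

def neg_reward_grad(u):
    # Exact gradient, the analogue of the symbolic Jacobian from Theano.
    return A @ u - b

res = minimize(neg_reward, np.zeros(2), jac=neg_reward_grad,
               method="L-BFGS-B")
# The optimum solves A u = b, i.e. u = [1.0, -0.5].
```

Supplying the exact gradient (`jac=`) avoids scipy's finite-difference approximation, which is the same reason the repo builds derivatives symbolically.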