Formalizing Human-Robot Mutual Adaptation: A Bounded Memory Model
Idea: the robot reasons about how the human may change their strategy, based on a model of human adaptation.
Motivation:
- Previous work does not use a model of human adaptation that would let the robot actively influence the human's actions.
- Previous work does not reason about human adaptation throughout the interaction. Compare with the Intention-Aware Motion Planning paper.
Assumptions
- Defining m_r and m_h requires a specific collaborative task (rotating a table, as in the paper).
Preliminaries
- f maps Q x A_r x A_h into modes M: {0,1}.
- Q = X_state x H_k: at time step i, the state is ultimately represented by the modes m_{h}^{i} and m_{r}^{i}, computed over a history of length k.
Method
Modelling (adaptable human): Human adaptability a is accounted for via the transition function P: Q to P(Q). At state q, the human chooses the action specified by m_r with probability a, or the action specified by m_h with probability 1-a.
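The choice rule above can be sketched in a few lines of Python. This is only an illustration of the "follow m_r with probability a, else m_h" rule, not the paper's implementation; the function names and the encoding of modes as integers are assumptions made for the example.

```python
import random


def human_mode_distribution(m_h, m_r, adaptability):
    """Return {mode: probability} for the mode the human follows next.

    Hypothetical helper: modes are plain hashable labels (e.g. 0 and 1).
    """
    if not 0.0 <= adaptability <= 1.0:
        raise ValueError("adaptability must lie in [0, 1]")
    dist = {}
    # With probability a the human adopts the robot's mode m_r ...
    dist[m_r] = dist.get(m_r, 0.0) + adaptability
    # ... and with probability 1 - a keeps their own mode m_h.
    dist[m_h] = dist.get(m_h, 0.0) + (1.0 - adaptability)
    return dist


def sample_human_mode(m_h, m_r, adaptability, rng):
    """Draw one mode from the distribution above using the given RNG."""
    return m_r if rng.random() < adaptability else m_h
```

When m_h and m_r coincide the two branches merge into a single mode with probability 1, so the human's choice carries no information about a in that case.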
Planning (adaptable robot):
- S: X x Y; X: X_world x M^{k} x M^{k}. M is the mode of the robot/human; Y is the partially observable variable, the adaptability.
- The human policy $\pi_h$ comes from the BAM model. It outputs m_h or m_r.
- The belief is over the unobservable variable a. The MOMDP therefore maps V(Q, b(a)) to a robot action a_r. The BAM model then gives the human action a_h, and Eqn. 3 (below) is used to update the belief.
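To make the belief update concrete, here is a generic Bayes-rule sketch over a discrete grid of adaptability values. It is not claimed to match the paper's Eqn. 3; it assumes the observation is simply which mode the human followed, and the name `update_belief` and the dictionary representation of the belief are hypothetical.

```python
def update_belief(belief, m_h, m_r, observed_mode):
    """Bayes update of a discrete belief {adaptability: probability}.

    Assumed likelihood: the human follows m_r with probability a and
    m_h with probability 1 - a, as in the adaptation model.
    """
    posterior = {}
    for a, prior in belief.items():
        likelihood = 0.0
        if observed_mode == m_r:
            likelihood += a
        if observed_mode == m_h:
            likelihood += 1.0 - a
        posterior[a] = prior * likelihood
    total = sum(posterior.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the belief")
    # Normalize so the posterior sums to 1.
    return {a: p / total for a, p in posterior.items()}
```

For example, starting from a uniform belief over a in {0.25, 0.75} and observing the human switch to m_r shifts mass toward the higher adaptability value; observing a mode when m_h equals m_r leaves the belief unchanged.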