End-to-End Robotic Reinforcement Learning without Reward Engineering
Motivation: Manually engineered rewards defeat the purpose of end-to-end learning
Idea:
- A human periodically labels the queries, and these labels supervise reward training
- Train a classifier to predict reward from high-dimensional input (pixels)
- Run RL with the classifier's output as the reward; the states the policy visits serve as negative samples for training the classifier
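
The three steps above can be sketched as a toy loop. This is a minimal illustration under stated assumptions, not the paper's implementation: states are 2-D points instead of pixels, the classifier is a class-balanced logistic regression on quadratic features instead of a CNN, a cross-entropy-method update stands in for the RL algorithm, and a few success examples are assumed to be available up front. The names `GOAL`, `human_label`, `features`, `fit_classifier`, `reward`, and `run` are all hypothetical.

```python
import numpy as np

GOAL = np.array([1.0, 1.0])  # hypothetical goal, known only to the simulated human


def human_label(state):
    """Stand-in for the human labeler: 1.0 if the queried state is a success."""
    return float(np.linalg.norm(state - GOAL) < 0.3)


def features(states):
    """Quadratic features plus bias; a real system would use a CNN on pixels."""
    s = np.atleast_2d(states)
    return np.hstack([s, s ** 2, np.ones((len(s), 1))])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_classifier(pos, neg, steps=300, lr=0.1):
    """Class-balanced logistic regression: success examples vs. policy samples."""
    X = features(np.vstack([pos, neg]))
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    # Weight both classes equally so the many negatives do not swamp the positives.
    wts = np.where(y == 1, 0.5 / len(pos), 0.5 / len(neg))
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (wts * (sigmoid(X @ w) - y))
    return w


def reward(w, states):
    """Learned reward: the classifier's success probability for each state."""
    return sigmoid(features(states) @ w)


def run(iterations=15, batch=32, n_queries=3, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: a handful of success examples exist before training starts.
    pos = GOAL + 0.1 * rng.standard_normal((5, 2))
    neg = np.empty((0, 2))
    mean = np.zeros(2)  # toy "policy": a Gaussian over states
    for _ in range(iterations):
        # Policy rollouts (states clipped to a bounded toy workspace).
        states = np.clip(mean + 0.4 * rng.standard_normal((batch, 2)), -2.0, 2.0)
        # Train the classifier, treating the current rollouts as negatives.
        w = fit_classifier(pos, np.vstack([neg, states]))
        r = reward(w, states)
        top = np.argsort(r)[::-1]
        # Query the human on the states the classifier rates highest.
        queried = top[:n_queries]
        labels = np.array([human_label(states[i]) for i in queried])
        is_pos = np.zeros(batch, dtype=bool)
        is_pos[queried[labels == 1]] = True
        pos = np.vstack([pos, states[is_pos]])
        neg = np.vstack([neg, states[~is_pos]])
        # Stand-in for the RL update: move the policy toward high-reward samples.
        mean = states[top[: batch // 4]].mean(axis=0)
    return w, mean, len(pos)


w, mean, n_pos = run()
print("policy mean:", mean, "| positives collected:", n_pos)
```

Querying only the highest-scoring states is one possible selection rule, chosen here because those are the states where a wrong reward would mislead the policy most; other query strategies would fit the same loop.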