reinforcement_learning-1

End-to-End Robotic Reinforcement Learning without Reward Engineering

Motivation: Manually engineered rewards defeat the purpose of end-to-end learning
Idea:

  1. A human periodically labels queries; these labels supervise reward training
  2. Train a classifier to predict reward from high-dimensional input (pixels)
  3. Run RL using the reward from step 2; the states RL visits provide negative samples for the classifier in step 2
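
The three steps above can be sketched as a toy loop. This is only an illustration under loud assumptions, not the paper's implementation: observations are 2-D points instead of pixels, the classifier is a small logistic regression, a simple hill-climbing update stands in for a real RL algorithm, and a distance check stands in for the human. The names GOAL, human_label, features, train_classifier, learned_reward and run_loop are all hypothetical, as is the handful of initial success examples used to start training.

```python
import numpy as np

GOAL = np.array([1.0, 1.0])  # hypothetical task: reach this point


def human_label(state):
    """Stand-in for the human labeler: 1 if the queried state is a success."""
    return int(np.linalg.norm(state - GOAL) < 0.3)


def features(obs):
    """Quadratic features plus a bias term (a toy substitute for pixels)."""
    obs = np.atleast_2d(obs)
    return np.hstack([obs, obs ** 2, np.ones((len(obs), 1))])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def train_classifier(pos, neg, lr=0.5, steps=200):
    """Step 2: logistic regression, successes vs. policy-visited states."""
    x = np.vstack([features(pos), features(neg)])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        w -= lr * x.T @ (sigmoid(x @ w) - y) / len(y)
    return w


def learned_reward(w, obs):
    """Reward = classifier's predicted probability of success."""
    return sigmoid(features(obs) @ w)


def run_loop(iterations=30, batch=32, n_queries=2, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: a few success examples are available up front.
    positives = GOAL + 0.05 * rng.standard_normal((5, 2))
    negatives = np.empty((0, 2))
    mu = np.zeros(2)  # toy "policy": a Gaussian over states centred at mu
    w = None
    for _ in range(iterations):
        # Step 3a: states visited by the policy become negative samples.
        states = np.clip(mu + 0.3 * rng.standard_normal((batch, 2)), -2, 2)
        negatives = np.vstack([negatives, states])
        # Step 2: retrain the reward classifier.
        w = train_classifier(positives, negatives)
        rewards = learned_reward(w, states)
        # Step 1: query the human on the highest-reward states only.
        top = np.argsort(rewards)[-n_queries:]
        labels = np.array([human_label(s) for s in states[top]])
        positives = np.vstack([positives, states[top][labels == 1]])
        # Step 3b: improve the policy using the learned reward
        # (hill climbing here, standing in for an actual RL update).
        mu = mu + 0.5 * (states[np.argmax(rewards)] - mu)
    return mu, w


if __name__ == "__main__":
    final_mu, _ = run_loop()
    print("final policy mean:", final_mu)
```

Querying only the highest-reward states keeps the number of human labels small (n_queries per iteration in this sketch), while every other visited state is used as a negative without being labeled.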