reinforcement_learning-1

Posted on 2020-03-16 in Papers

End-to-End Robotic Reinforcement Learning without Reward Engineering

Motivation: Hand-engineering the reward defeats the purpose of end-to-end learning.
Idea:

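1. A human periodically labels queried states as success or failure; these labels supervise the training of the reward classifier.
2. Train a classifier that predicts the reward (whether the task has succeeded) directly from high-dimensional observations (pixels).
3. Run RL: the states the policy visits serve as negative samples for the classifier in step 2, and the policy is trained on the reward that classifier predicts.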
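The three steps above can be sketched as one small loop. This is a toy under stated assumptions, not the paper's implementation: the "robot" is a hypothetical 6-state chain whose one-hot observations stand in for pixels, `human_label` is a hypothetical oracle replacing the human, the reward classifier is plain logistic regression, and tabular Q-learning stands in for the actual RL algorithm. Seeding the classifier with one success example and querying the visited state the classifier rates highest are also assumptions of the sketch.

```python
import math
import random

N_STATES = 6         # hypothetical toy task: a 1-D chain of positions 0..5
GOAL = N_STATES - 1  # the state the (simulated) human calls a success
HORIZON = 10         # steps per episode

def observe(state):
    """One-hot vector: a stand-in for a high-dimensional pixel observation."""
    return [1.0 if i == state else 0.0 for i in range(N_STATES)]

def env_step(state, action):
    """Action 1 moves right, action 0 moves left; positions are clipped to the chain."""
    return max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))

def human_label(state):
    """Hypothetical oracle standing in for the human answering a query."""
    return state == GOAL

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class RewardClassifier:
    """Logistic regression on observations; its success probability is the reward."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def prob(self, x):
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def train(self, positives, negatives, rng, steps=100, lr=1.0, batch=8):
        # Balanced mini-batches: equal numbers of success and policy-generated examples.
        for _ in range(steps):
            data = [(rng.choice(positives), 1.0) for _ in range(batch)]
            data += [(rng.choice(negatives), 0.0) for _ in range(batch)]
            grad_w, grad_b = [0.0] * len(self.w), 0.0
            for x, y in data:
                err = self.prob(x) - y  # d(binary cross-entropy)/d(logit)
                grad_b += err
                for i, xi in enumerate(x):
                    grad_w[i] += err * xi
            n = len(data)
            self.b -= lr * grad_b / n
            self.w = [wi - lr * gi / n for wi, gi in zip(self.w, grad_w)]

def run(iterations=40, query_every=5, eps=0.3, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    clf = RewardClassifier(N_STATES)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # tabular Q-values, q[state][action]
    positives = [observe(GOAL)]  # assumption: seeded with one success example
    negatives = []
    n_queries = 0
    for it in range(iterations):
        # Step 3: roll out an epsilon-greedy policy; the reward comes from the classifier.
        state, visited = 0, []
        for _ in range(HORIZON):
            if rng.random() < eps:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] >= q[state][1] else 1
            nxt = env_step(state, action)
            reward = clf.prob(observe(nxt))
            q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
            visited.append(nxt)
            state = nxt
        # Every state the policy visited becomes a negative example for the classifier.
        negatives.extend(observe(s) for s in visited)
        # Step 1: periodically ask the "human" about the visited state rated most successful.
        if it % query_every == 0:
            queried = max(visited, key=lambda s: clf.prob(observe(s)))
            n_queries += 1
            if human_label(queried):
                positives.append(observe(queried))
        # Step 2: retrain the reward classifier on labeled positives vs. policy negatives.
        clf.train(positives, negatives, rng)
    return clf, q, n_queries

if __name__ == "__main__":
    clf, q, n_queries = run()
    print(n_queries)  # 8
    print(clf.prob(observe(GOAL)) > clf.prob(observe(0)))
```

Sampling balanced batches keeps the handful of human-labeled positives from being swamped by the ever-growing pool of policy negatives; that balancing is a design choice of this sketch rather than something stated in the notes above.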