Deep Reinforcement Learning

USC Media Coverage of Our Work!

USC Robotics Open House

Our demo on exhibition at the USC Robotics Open House. My colleagues and I presented “Robust Grasping via Human Adversarial” to visitors and explained the motivation behind the algorithm. The demo is implemented in a customized simulation environment built on the MuJoCo physics engine and supports real-time human interaction. Throughout the day, we gave visitors the opportunity to apply perturbations to objects via keyboard and mouse, and we showed that the manipulator’s grasping skill and robustness both increase over time.

Robust Grasping with Adversary

In reinforcement learning research that involves robotics, MuJoCo + gym is more popular than Gazebo. However, the mujoco-py package released by OpenAI does not offer the full flexibility of the original MuJoCo C++ API. In a recent project, I extended mujoco-py==1.5.0 with support for: (1) interactive manipulation, as provided by simulate in MuJoCo, written in Cython; and (2) force visualization similar to deepmind-control, but with support for headless rendering. The code is available at https://github.

Review DDPG

Deterministic policy gradient (DPG) is a variation of A2C, but it is off-policy. In A2C, the actor estimates a stochastic policy, either as a probability distribution over discrete actions or as the parameters of a normal distribution. DPG also belongs to the A2C family, but its policy is deterministic. This makes it possible to apply the chain rule through the critic and update the actor so that it maximizes the Q-value. DPG has two components. The first is the actor. In a continuous action domain, each action is a real number, so the actor network takes the state as input and outputs N values, one for each action dimension.
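The chain-rule update described above can be illustrated with a small sketch. This is not the post's implementation: the single-layer tanh actor, the hand-written `toy_critic` (a fixed quadratic standing in for a learned Q-function), and all names and numbers below are assumptions chosen so the example runs with the standard library only.

```python
import math
import random

def actor(W, s):
    """Deterministic policy: one tanh-squashed value per action dimension."""
    return [math.tanh(sum(w * x for w, x in zip(row, s))) for row in W]

def toy_critic(a, target):
    """Hypothetical stand-in for a learned Q(s, a); largest when a == target."""
    return -sum((ai - ti) ** 2 for ai, ti in zip(a, target))

def toy_critic_grad_a(a, target):
    """Gradient of the toy critic with respect to the action, dQ/da."""
    return [-2.0 * (ai - ti) for ai, ti in zip(a, target)]

def actor_update(W, s, target, lr=0.1):
    """One gradient-ascent step on Q(s, actor(s)) using the chain rule."""
    a = actor(W, s)
    dq_da = toy_critic_grad_a(a, target)
    new_W = []
    for i, row in enumerate(W):
        # dQ/dW[i][j] = dQ/da[i] * da[i]/dz[i] * dz[i]/dW[i][j]
        #             = dq_da[i] * (1 - a[i]**2) * s[j]
        scale = dq_da[i] * (1.0 - a[i] ** 2)
        new_W.append([w + lr * scale * x for w, x in zip(row, s)])
    return new_W

random.seed(0)
state = [0.5, -1.0, 0.25]   # assumed 3-dimensional state
target = [0.3, -0.6]        # action the toy critic prefers (N = 2)
W = [[random.uniform(-0.1, 0.1) for _ in state] for _ in target]

q_before = toy_critic(actor(W, state), target)
for _ in range(200):
    W = actor_update(W, state, target)
q_after = toy_critic(actor(W, state), target)
print(q_before, q_after)
```

Because the policy outputs an action directly instead of sampling one, the gradient of Q can flow through the action into the actor's weights. In a full DDPG setup the critic would itself be a trained network rather than the fixed function assumed here.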

Language guided visual 3D indoor navigation

Group Seminar Presentation on RL introduction