Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient
A Baxter robot learned to insert a cylinder, dangling from a string, into a tube using only camera images. In this work, I investigated whether a policy can be learned faster when exploratory actions are generated by a trajectory optimizer. The optimizer used a deep Q-network as its objective function.
IROS 2019
Deep Reinforcement Learning
Optimization
Model-Based Exploration
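
The idea in the summary above, a trajectory optimizer that uses a Q-function as its objective to propose exploratory actions, can be illustrated with a small sketch. Everything in it is an assumption made for illustration and not the method from the paper: the linear latent dynamics (`A`, `B`), the quadratic stand-in for the learned Q-network (`q_value`), the choice to sum Q-values along the rollout, and the finite-difference gradient ascent in `optimize_actions` are all hypothetical.

```python
import numpy as np

# Hypothetical stand-ins for learned models: toy linear latent dynamics
# and a quadratic "critic" that prefers latent states near Z_GOAL.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Z_GOAL = np.array([1.0, 0.0])


def dynamics(z, a):
    """Predict the next latent state from state z and action a."""
    return A @ z + B @ a


def q_value(z, a):
    """Toy Q-function: closeness to the goal minus a small action cost."""
    return -np.sum((z - Z_GOAL) ** 2) - 0.01 * np.sum(a ** 2)


def trajectory_objective(z0, actions):
    """Sum of Q-values along a rollout of the latent dynamics (an assumed objective)."""
    z, total = z0, 0.0
    for a in actions:
        total += q_value(z, a)
        z = dynamics(z, a)
    return total


def optimize_actions(z0, horizon=10, steps=200, lr=0.05, eps=1e-4):
    """Gradient ascent on the action sequence using central finite differences."""
    actions = np.zeros((horizon, 1))
    for _ in range(steps):
        grad = np.zeros_like(actions)
        for idx in np.ndindex(actions.shape):
            bump = np.zeros_like(actions)
            bump[idx] = eps
            grad[idx] = (
                trajectory_objective(z0, actions + bump)
                - trajectory_objective(z0, actions - bump)
            ) / (2 * eps)
        # Keep actions inside an assumed box constraint of [-1, 1].
        actions = np.clip(actions + lr * grad, -1.0, 1.0)
    return actions


z0 = np.zeros(2)
plan = optimize_actions(z0)
# Only the first action of the optimized plan would be executed for exploration.
exploratory_action = plan[0]
```

In a real system the dynamics and Q-function would be neural networks and the gradient would come from automatic differentiation; finite differences are used here only to keep the sketch dependency-free.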