Droids given ‘one shot’ lessons to stack blocks
In the demonstration, a human wears a virtual reality headset to stack a series of colored blocks in an imaginary world. The robot then copies what the person did in the VR simulation after seeing it once, creating a tower of blocks in the same order. The software that the robot learns from is split into two neural networks: one for vision and the other for imitation.
First, the vision component takes an input from the robot’s camera to gauge the positions of the different objects. It takes a lot of training to achieve this. Hundreds of thousands of simulated images of the objects – in various configurations with different lighting and textures – are shown to the robot.
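As a rough illustration of how such varied training scenes might be generated, here is a minimal Python sketch. It is not OpenAI's code: the texture names, value ranges, and the functions random_scene and make_dataset are all hypothetical, and it only produces scene descriptions with position labels rather than rendered images.

```python
import random

# Hypothetical texture names; the article does not list the real ones.
TEXTURES = ["wood", "metal", "checker", "noise"]

def random_scene(num_blocks, rng):
    """Describe one simulated scene with randomized lighting, textures and layout."""
    blocks = [
        {
            "position": (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)),
            "texture": rng.choice(TEXTURES),
        }
        for _ in range(num_blocks)
    ]
    # The lighting range is an arbitrary assumption for illustration.
    return {"light_intensity": rng.uniform(0.2, 1.5), "blocks": blocks}

def make_dataset(num_scenes, num_blocks, seed=0):
    """Pair each scene with the block positions a vision network would learn to predict."""
    rng = random.Random(seed)
    scenes = [random_scene(num_blocks, rng) for _ in range(num_scenes)]
    return [(scene, [b["position"] for b in scene["blocks"]]) for scene in scenes]

dataset = make_dataset(num_scenes=1000, num_blocks=4)
```

In a real pipeline each scene description would be handed to a renderer, and the resulting image, not the description, would be the network's input.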
Second, the imitation network processes the simulated demonstration to predict what steps need to be taken to replicate the actions the robot has seen. The blocks aren’t necessarily in the same starting positions seen in the demo, meaning that the robot has to generalize and perform the task in a new setting.
It may be trivial for humans, but it’s challenging for robots. Thousands of training examples need to be fed into the network for each task. During training, it is given the robot arm’s full trajectory from one demonstration of a task, together with a single observation from a second demonstration of the same task in a slightly different environment.
The one-shot imitation learning algorithm learns to predict what actions were taken to produce the result seen in the second demonstration. It does this by examining all the movements taken in the first demonstration. It learns the similarities between both examples, even if the demos aren’t identically laid out.
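One way to picture a single training example is sketched below in Python. It assumes, and this is an assumption rather than something spelled out here, that the network sees the first demonstration in full plus one sampled moment from the second, and is asked to predict the action taken at that moment. The Step and TrainingExample types, the make_example helper, and the action names are hypothetical.

```python
import random
from typing import List, NamedTuple, Tuple

class Step(NamedTuple):
    observation: Tuple[float, ...]  # e.g. block or gripper coordinates
    action: str                     # what the demonstrator did next

class TrainingExample(NamedTuple):
    demonstration: List[Step]       # first demo, given in full
    observation: Tuple[float, ...]  # one moment from the second demo
    target_action: str              # action the network should predict

def make_example(demo_a, demo_b, rng):
    """Pair all of demo_a with one randomly chosen step of demo_b."""
    step = rng.choice(demo_b)
    return TrainingExample(demo_a, step.observation, step.action)

# Two made-up demos of the same task with slightly different layouts.
demo_a = [Step((0.0, 0.0), "reach"), Step((0.5, 0.0), "grasp"), Step((0.5, 0.5), "place")]
demo_b = [Step((0.2, 0.1), "reach"), Step((0.6, 0.1), "grasp"), Step((0.6, 0.6), "place")]

example = make_example(demo_a, demo_b, random.Random(0))
```

Repeating this over many pairs of demos would yield the thousands of supervised examples the network needs for each task.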
After enough training, the robot can learn to imitate the human demonstrator in VR even though it hasn’t encountered the exact same task during training.
A process called “soft attention” makes the imitation network focus on the steps taken using the relevant block in the block stacking challenge, as well as keeping track of the locations of all the other blocks.
OpenAI says this allows the robot to adapt to “work with demonstrations of variable length,” “imitate longer trajectories,” and “stack blocks into a configuration that has more blocks than any demonstration in its training data.”
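For readers curious about the mechanics, soft attention in general boils down to a weighted average whose weights come from a softmax over similarity scores. The Python sketch below shows that generic recipe only; it is not OpenAI's network, and the block features, positions, and query values are invented for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(query, keys, values):
    """Blend `values` using weights derived from query-key dot products."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    blended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, blended

# Hypothetical example: three blocks, each with a feature key and an (x, y) position.
block_keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
block_positions = [[0.1, 0.2], [0.4, 0.9], [0.7, 0.3]]
query = [4.0, 0.0]  # a query that most resembles the first block's key

weights, focus = soft_attention(query, block_keys, block_positions)
```

Because every block keeps a non-zero weight, the output leans towards the relevant block without discarding the others, and the same function works for any number of blocks.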
To recover from mistakes during the imitation stage, the robot first has to learn what problems it might face.
Researchers did this by injecting noise into the “scripted policy,” a strategy that teaches the robot how to stack the blocks in order. It could then learn how to recover when things went wrong. It’s a critical step – “without injecting the noise, the policy learned by the imitation network would usually fail to complete the stacking task,” OpenAI explained in a blog post.
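The idea can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the one-dimensional world, the scripted_policy and collect_demo functions, the noise probability, and the choice to record the expert's intended action as the label while a random action is occasionally executed. OpenAI's actual setup is not described at this level of detail.

```python
import random

def scripted_policy(position, target):
    """Hypothetical scripted expert: step one unit towards the target."""
    if position < target:
        return 1
    if position > target:
        return -1
    return 0

def collect_demo(start, target, noise_prob, rng, max_steps=50):
    """Roll out the expert, sometimes executing a random action instead.

    Each record stores the state and the expert's intended action, so the
    data includes off-course states paired with the correcting move.
    """
    position, records = start, []
    for _ in range(max_steps):
        intended = scripted_policy(position, target)
        records.append((position, intended))
        if intended == 0:  # target reached, demo is over
            break
        noisy = rng.random() < noise_prob
        executed = rng.choice([-1, 1]) if noisy else intended
        position += executed
    return records

rng = random.Random(0)
clean = collect_demo(start=0, target=5, noise_prob=0.0, rng=rng)
noisy = collect_demo(start=0, target=5, noise_prob=0.3, rng=rng)
```

The clean run only ever visits states on the ideal path, while the noisy run can wander off it and still records what the expert would do from there, which is the kind of recovery data the imitation network would otherwise never see.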
The researchers would like to explore how the robot would behave with household items instead of square blocks. The goal is to eventually build a general-purpose robot that can help around the house and perform chores such as setting chairs around a table.
At the moment, the robot can stack blocks in “tens of seconds,” Wojciech Zaremba, a co-founder and researcher at OpenAI, told The Register. “We’re running it slowly because it makes it easier to work with and around. We could easily run it a few times faster, but more research is needed to reach human performance.” ®