Breadth vs Depth | Khai Nguyen

This is my project within the course 16-813: Introduction to Robot Learning at Carnegie Mellon University in Fall 2023 semester.

Breadth vs Depth: Benchmarking Generalist and Specialist Policies in Robot Agility Learning | Python, PyTorch, IssacGym, Unitree Go1 |

We introduce a benchmark for learning robot agility on low-priced robots. Our study assesses both specialist and generalist policies, enabling robots to (a) traverse challenging terrain by (b) walking robustly, (c, d) climbing up and down high obstacles, (e) leaping over long gaps, and (f) crossing narrow beams only using one-side legs.

Abstract—The motivation for our project stems from the growing interest and importance of agile robots in various fields, including entertainment, search and rescue operations, and everyday tasks. As robots become more integrated into human environments, their ability to navigate complex and dynamic environments becomes crucial. Understanding whether a generalist or specialist approach is more effective in robot agility has direct implications for real-world applications. For instance, a generalist robot may be more versatile in handling different types of terrain commonly found in various environments, while a specialist robot may excel in specific scenarios. Knowing which approach is more effective can guide the development of robots for specific tasks or environments. By addressing these motivations, our project can contribute valuable insights to the field of robotics and help advance athletic intelligence that enables navigating and interacting with complex environments effectively.

Stay tuned for more updates!

Go1 robot is able to traverse challenging terrain: steps, gaps, narrow beams

Approaches

Our research lays the foundation for a comprehensive benchmark in evaluating diverse learning-based approaches for robot agility. We introduce three distinct baselines: i) specialist policies that acquire individual skills through on-policy reinforcement learning; ii) a hierarchical structure featuring a policy/skill selector that learns to choose the appropriate behavior among these specialists; and iii) a true generalist policy that simultaneously learns to handle all tasks. Additionally, we explore the policy/skill selector’s performance under two training paradigms: on-policy reinforcement learning and imitation learning based on an oracle policy.

We seek to answer these questions:

Does a specialist generalize to new terrains, e.g., can the robot traverse gap terrains with learned climbing skills?
Does the generalist benefit from learning multiple skills, e.g., can climbing and leaping skills help each other while learned simultaneously?
Does the hierachical structure outperform the generalist, e.g., should we instead leverage modular approaches – a high-level task/skill selector and low-level skills?

The hierachical structure features a policy/skill selector that learns to choose the appropriate behavior among these specialists. This can be trained via on-policy reinforcement learning or imitation learning based on an oracle policy.