Torque-limited simple pendulum: A toolkit for getting familiar with control algorithms in underactuated robotics

Summary
There are many, wildly different approaches to robotic control. Underactuated robots are systems for which it is not possible to dictate arbitrary accelerations to all joints. Hence, a controller cannot be used to override the system dynamics and force the system onto a desired trajectory, as is often done in classical control techniques. A torque-limited pendulum is arguably the simplest underactuated robotic system and thus a suitable system to study, test, and benchmark different controllers.
This repository describes the hardware (computer-aided design (CAD) models, bill of materials (BOM), etc.) required to build a physical pendulum system and provides the software (Unified Robot Description Format (URDF) models, simulation, and controllers) to control it. It offers a setup for studying established and novel control methods and targets students and researchers interested in underactuation.

Statement of need
This repository is designed to be used in education and research. It lowers the entry barrier for studying underactuation in real systems, which is often overlooked in conventional robotics courses. With this software package, students who want to learn about robotics, optimal control, or reinforcement learning can gain hands-on experience with hardware and software for robot control. The dual approach of describing both hardware and software, together with the broad spectrum of control methods, distinguishes this package from similar software such as stable-baselines3 (Raffin et al., 2021), OpenAI Gym (Brockman et al., 2016), and Drake (Tedrake & the Drake Development Team, 2019). Results from real experiments are provided to ensure reproducibility and to evaluate novel control methods.

Background
This project provides an easily accessible plant for the pendulum dynamics, which is built up from scratch and uses only standard libraries. The plant can be passed to a simulator object, which is capable of integrating the equations of motion and thus simulating the pendulum's motion forward in time. The simulator can perform Euler and Runge-Kutta integration and can also visualize the motion in a matplotlib animation. Furthermore, it is possible to interface a controller with the simulator, which sends a control signal in the form of a torque τ to the motor.
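The following minimal sketch illustrates this structure: a plant function, interchangeable Euler/Runge-Kutta integrators, and a controller callback. The class interfaces in the repository may differ, and the parameter values and the simplified dynamics (Coulomb friction omitted) are assumptions for illustration.

```python
import numpy as np

def pendulum_rhs(t, x, tau, m=0.5, l=0.5, b=0.1, g=9.81):
    # State x = [theta, theta_dot]; simplified dynamics with point-mass
    # inertia I = m*l**2 and viscous damping only (see the equation of
    # motion in the Pendulum Dynamics section for the full model).
    theta, theta_dot = x
    theta_ddot = (tau - b * theta_dot - m * g * l * np.sin(theta)) / (m * l**2)
    return np.array([theta_dot, theta_ddot])

def euler_step(f, t, x, tau, dt):
    return x + dt * f(t, x, tau)

def rk4_step(f, t, x, tau, dt):
    # Classical Runge-Kutta; the torque is held constant over the step
    k1 = f(t, x, tau)
    k2 = f(t + dt / 2, x + dt / 2 * k1, tau)
    k3 = f(t + dt / 2, x + dt / 2 * k2, tau)
    k4 = f(t + dt, x + dt * k3, tau)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def simulate(controller, x0, t_final, dt=0.01, step=rk4_step):
    # controller(t, x) -> tau plays the role of the interfaced controller
    t, x, trajectory = 0.0, np.array(x0, dtype=float), []
    while t < t_final:
        tau = controller(t, x)
        x = step(pendulum_rhs, t, x, tau, dt)
        t += dt
        trajectory.append((t, x.copy(), tau))
    return trajectory
```

A free swing from a perturbed initial state can then be simulated with, e.g., `simulate(lambda t, x: 0.0, [0.1, 0.0], t_final=5.0)`.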
The pendulum has a stable fixed point (downward configuration) and an unstable fixed point (upward configuration). A challenge from the control point of view is to swing the pendulum up to the unstable fixed point and stabilize it in that state while respecting the torque limit. The pendulum (Figure 1) is constructed by mounting a motor to a fixed frame, attaching a rod to the motor, and attaching a weight to the other end of the rod. The motor used in this setup is the AK80-6 actuator from T-Motor (CubeMars, 2021), which is a quasi-direct drive with a gear ratio of 6:1 and a peak torque of 12 Nm at the output shaft.

Electrical Setup
The schematic below (Figure 2) displays the electrical setup of the testbench. A main PC is connected to a motor controller board (CubeMars_AK_V1.1, see CubeMars (2021)) mounted on the actuator. The communication takes place on a CAN bus with a maximum signal frequency of 1 Mbit/s using the 'classical' CAN protocol. Furthermore, a USB-to-CAN interface is needed if the main PC does not feature a PCI CAN card. The actuator requires an input voltage of 24 V and draws up to 24 A at peak torque. The power supply in our test setup is the EA-PS 9032-40 from Elektro-Automatik. The capacitor filters back-EMF coming from the actuator and protects the power supply from high voltage peaks. An emergency stop button serves as an additional safety measure.
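On Linux, such a setup can be driven from Python with the python-can package. The sketch below is a hypothetical illustration: the motor ID (0x01) and the enable payload are assumptions and must be taken from the AK80-6 documentation.

```python
import can  # python-can, talking to a SocketCAN interface

# The 1 Mbit/s bitrate is configured when bringing the interface up, e.g.:
#   sudo ip link set can0 up type can bitrate 1000000
bus = can.Bus(interface="socketcan", channel="can0")

# Hypothetical enable command: T-Motor MIT-mode actuators are typically
# switched into control mode with a fixed 8-byte frame (check the manual
# for the exact arbitration ID and payload of your motor).
enable = can.Message(arbitration_id=0x01,
                     data=[0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFC],
                     is_extended_id=False)
bus.send(enable)
bus.shutdown()
```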

Pendulum Dynamics
The motion of the pendulum is described by the following equation of motion:

I θ̈ + b θ̇ + c_f sign(θ̇) + m g l sin(θ) = τ

where

• θ, θ̇, θ̈ are the angular displacement, angular velocity, and angular acceleration of the pendulum in the counter-clockwise direction. θ = 0 means the pendulum is at its stable fixed point (i.e., hanging down).
• I is the inertia of the pendulum. For a point mass: I = m l².
• m is the mass, l the length of the pendulum, b the viscous damping coefficient, c_f the Coulomb friction coefficient, g the gravitational acceleration, and τ the torque applied by the motor.

We provide a pendulum plant model that can be used for computing trajectories and policies without the actual hardware, for simulating the execution of a controller, as well as for simulating the system's response during real-time control. Additionally, a system identification method is implemented that can reliably estimate the unknown pendulum parameters of the real setup.
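Since the equation of motion is linear in the parameters I, b, c_f, and m·l, these can be estimated from recorded torque and motion data with ordinary least squares. The sketch below illustrates the idea; the identification routine in the repository may additionally filter or weight the data.

```python
import numpy as np

def identify_parameters(theta, theta_dot, theta_ddot, tau, g=9.81):
    """Least-squares estimate of [I, b, c_f, m*l] from recorded samples."""
    # One regressor row per sample:
    #   I*theta_ddot + b*theta_dot + c_f*sign(theta_dot) + (m*l)*g*sin(theta) = tau
    Phi = np.column_stack([theta_ddot,
                           theta_dot,
                           np.sign(theta_dot),
                           g * np.sin(theta)])
    params, *_ = np.linalg.lstsq(Phi, tau, rcond=None)
    return params  # [I, b, c_f, m*l]
```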

Control Methods
The swing-up with a limited motor torque τ serves as a benchmark for various control algorithms. If the torque limit is set low enough, the pendulum is no longer able to move directly up to the unstable fixed point; instead, it has to swing back and forth to build up energy in the system. The control methods currently implemented in this library (see also Figure 3) can be grouped into four categories:

Trajectory optimization tries to find a trajectory of control inputs and states that is feasible for the system while minimizing a cost function. The cost function can, for example, include terms that drive the system to a desired goal state and penalize the usage of high torques (see the sketch after this list). The following trajectory optimization algorithms are implemented:

• Direct Collocation (Hargraves & Paris, 1987)
• Iterative Linear Quadratic Regulator (iLQR) (Weiwei & Todorov, 2004)
• Feasibility-driven Differential Dynamic Programming (FDDP) (Mastalli et al., 2020)

The optimization is done with a simulation of the pendulum dynamics.
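A typical quadratic running cost for the swing-up could look as follows; the weights and goal state are illustrative assumptions, not the exact values used in the repository.

```python
import numpy as np

x_goal = np.array([np.pi, 0.0])  # upright fixed point: theta = pi, theta_dot = 0
Q = np.diag([10.0, 1.0])         # weight on deviation from the goal state
R = 0.1                          # weight penalizing high torques

def running_cost(x, u):
    dx = x - x_goal
    return dx @ Q @ dx + R * u**2
```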
Reinforcement Learning (RL) can be used to learn a policy on the state space of the robot, which can then be used to control the robot. The simple pendulum can be formulated as an RL problem with two continuous inputs and one continuous output. Similar to the cost function in trajectory optimization, the policy is trained with a reward function. The following RL algorithms are implemented:

• Soft Actor Critic (SAC) (Haarnoja et al., 2018)
• Deep Deterministic Policy Gradient (DDPG) (Lillicrap et al., 2019)

Both methods are model-free, i.e., they treat the dynamics of the system as a black box. Currently, learning is possible in the simulation environment (see the sketch below).
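Training such a policy takes only a few lines with stable-baselines3. For a self-contained illustration, the sketch uses Gymnasium's built-in pendulum task as a stand-in for the repository's own simulation environment.

```python
import gymnasium as gym
from stable_baselines3 import SAC

# Stand-in environment; the repository wraps its own pendulum simulator
env = gym.make("Pendulum-v1")

model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50_000)

# The trained policy maps states to torques and can be queried at control time
obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
```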
Trajectory-based Controllers act on a precomputed trajectory and ensure that the system follows the trajectory properly. The trajectory-based controllers implemented in this project are:

• Feedforward torque
• Proportional-integral-derivative (PID) control
• Time-varying Linear Quadratic Regulator (TVLQR)
• Model Predictive Control (MPC) with iLQR
The feedforward and PID controllers are model-independent, while the TVLQR and iLQR MPC controllers utilize knowledge about the pendulum model. In contrast to the others, the iLQR MPC controller re-optimizes over a predefined horizon at every timestep.
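As an illustration, a minimal position-tracking PID controller could look as follows; the gains and timestep are placeholder values that would be tuned on the real system.

```python
class PIDController:
    """Tracks a desired joint angle along a precomputed trajectory."""

    def __init__(self, Kp=10.0, Ki=0.1, Kd=1.0, dt=0.01):
        self.Kp, self.Ki, self.Kd, self.dt = Kp, Ki, Kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def get_control(self, theta_des, theta):
        error = theta_des - theta
        self.integral += error * self.dt                  # accumulated error
        derivative = (error - self.prev_error) / self.dt  # error rate
        self.prev_error = error
        return (self.Kp * error
                + self.Ki * self.integral
                + self.Kd * derivative)
```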
Policy-based Controllers take the state of the system as input and output a control signal. In contrast to trajectory optimization, these controllers do not compute just a single trajectory. Instead, they react to the current state of the pendulum and can therefore cope with perturbations during execution. The following policy-based controllers are implemented:

• Energy Shaping
• Linear Quadratic Regulator (LQR)
• Gravity Compensation

All of these controllers utilize model knowledge. Additionally, the control policies obtained by one of the RL methods fall into the category of policy-based control. A sketch of the energy shaping idea is given below.
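The energy shaping controller pumps energy into the system until the pendulum reaches the energy level of the upright equilibrium; near the top, one would hand over to the LQR controller. The parameters and gain in this sketch are assumed values.

```python
import numpy as np

def energy_shaping_control(theta, theta_dot,
                           m=0.5, l=0.5, k=1.0, g=9.81, tau_max=2.0):
    I = m * l**2                                            # point-mass inertia
    E = 0.5 * I * theta_dot**2 - m * g * l * np.cos(theta)  # current energy
    E_des = m * g * l                                       # energy at the upright fixed point
    # Inject (or remove) energy along the direction of motion
    tau = -k * theta_dot * (E - E_des)
    return np.clip(tau, -tau_max, tau_max)                  # respect the torque limit
```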
The implementations of direct collocation and TVLQR make use of Drake (Tedrake & the Drake Development Team, 2019), iLQR makes use of either the symbolic library of Drake or SymPy, FDDP makes use of Crocoddyl (Mastalli et al., 2020), SAC uses stable-baselines3 (Raffin et al., 2021), and DDPG is implemented in TensorFlow (Abadi et al., 2016). The other methods use only standard libraries. This repository is designed to welcome contributions in the form of novel optimization methods, controllers, and learning algorithms to extend this list.
To get an understanding of the functionality of the implemented controllers, they can be visualized in the pendulum's state space. Example visualizations of the energy shaping controller and the policy learned with DDPG are shown in Figure 4. The controllers are benchmarked according to several criteria (see Figure 5), among them:

• Insensitivity: The pendulum parameters (mass, length, friction, inertia) are modified without using this knowledge in the controller.
• Reduced torque limit: The minimal torque limit with which the controller is still able to swing up the pendulum.
The results shown in Figure 5 are the average of 100 repetitions for every controller and criterion. In the case of consistency, robustness, and insensitivity, the percentage refers to the ratio of successful swing-up motions out of the 100 repetitions. Trajectory optimization (iLQR, direct collocation, FDDP) produces smooth trajectories that swing up the pendulum relatively quickly, but it requires a trajectory-following control loop (PID, TVLQR) to become more consistent, robust, and insensitive. This can become a problem for large deviations from the nominal trajectory. RL policies perform well on consistency, robustness, and insensitivity, and are able to perform fast swing-up motions. Their drawback is that their output can fluctuate, which can result in rougher motions. The model predictive iLQR controller shows an overall good performance but has the disadvantage that it is comparatively slow due to the optimization at every timestep. The energy shaping plus LQR controller, despite its simplicity, shows very satisfying results in all benchmark categories.