Nano Lie Theory For Roboticist (Work in Progress)

A Tutorial on Micro Lie theory with simple examples

Author:
Benjamin Wong

Contents


Background
Primer
Motivating Examples
Manifold
Exp and Log
\(\oplus\) and \(\ominus\)
References

Background


This tutorial provides some informal interpretation of A micro Lie theory for state estimation in robotics. Here, we emphasize on application and we will provide various examples for better understanding. You are strongly recommended to read the original paper before proceeding. The notations used in this article will be consistent to the paper.

In short, we want to use a tiny portion of Lie theory to handle representation of rotation in both control and state estimation. By transforming between the tangent space and manifold, we can effectively parameterize coordinate transformations; handle symmetries; define uncertainty; and define derivatives.

Primer


This section briefly explains the undergraduate level basics of rigid transformation before introducing Lie theory. Those who are familiar can skip to the next section

Transformation Matrix

A matrix can be think of as an operator that moves a point into a different point in the same space. These points are represented in the form of a vector. Each entry of the vector represents the contribution of each standard basis vector.

\(v =\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = c_1e_1+c_2e_2+c_3e_3 \)

The columns of a transformation matrix tells us where each basis gets mapped to

\(\begin{bmatrix} \mid & \mid & \mid \\ b_1 & b_2 & b_3\\ \mid & \mid & \mid \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = b_1 \), \(\begin{bmatrix} \mid & \mid & \mid \\ b_1 & b_2 & b_3\\ \mid & \mid & \mid \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = b_2 \), \(\begin{bmatrix} \mid & \mid & \mid \\ b_1 & b_2 & b_3\\ \mid & \mid & \mid \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = b_3 \)

Combining the two, the matrix transformation can be think of applying the weighted sum to the transformed basis.

\(\begin{bmatrix} \mid & \mid & \mid \\ v_1 & v_2 & v_3\\ \mid & \mid & \mid \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = v_1c_1+v_2c_2+v_3c_3 \)

All matrix transformation can be think of as a combination of the following 5 geometric operations:
1. scale 2.rotate 3.shear 4.reflect 5.projection

Rotation Matrix

In 2 dimensions, the rotation matrix is often introduced as

\(R = \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix} \)

because it has an intuitive geometric interpretation where \(\theta\) is the counter-clockwise angle of rotation. Later we will show how generalizing this form into higher dimension will be problemetic. It is more useful to define it by its fundamental proeprties. For the matrix to be non-scaling and non-shearing, the column vectors of the matrix has to be orthogonormal i.e.

\(R = \begin{bmatrix} \mid & \mid &\\ v_1 & v_2 & \cdots \\ \mid & \mid & \end{bmatrix} \)

\( v_{i}\cdot v_{i} = 1 \)

\( v_{i}\cdot v_{j} = 0 \text{ if } i\neq j \)

Or more compactly,

\(R^TR = RR^T = I \)

In addition, to exclude reflection, the determinant of the matrix has to be positive 1

\( \text{det}(R) = 1\)

Side note, this means that the matrix has even numbers of negative eigenvalues, which also means that any pair of reflection forms a rotation

Rigid Transformation Matrix

A rigid transformation is a combination of rotation and translation, both preserve the shape of the set of points after the transformation. It can be written in the form of

\( x_2 = Rx_1 + t \)

where \(R\) is the rotaion matrix, \(x\) is point, and \(t\) is the translation. Translation is not a linear operation because the zero element is not preserved. However it is an affine operation, which we can augment to a linear form by introducing the homogenous coordinate.

\( \begin{bmatrix} x_2 \\1 \end{bmatrix}= \begin{bmatrix}R & t\\ 0 & 1\end{bmatrix}\begin{bmatrix} x_1 \\1 \end{bmatrix} \)

Essentially, the translation is performed by lifting the point to the homogenous coordinate, and then perform a shearing transformation on an extra dimention.

Reference Frame

Instead of transforming vectors, the transformation matrix can be treated as a transformation between reference frame. The column vectors is the basis of the observed frame in the global coordinate. This can be used to represent a sequence of robot's poses over time, or a tree of relative position of the links of a robot arm.

More importantly, applying the transformation matrix to a vector can be considered a change of basis from the observed frame to the global frame.

Motivating Examples


If you are wonder why do we bother learning Lie theory, this section will give you a glimpse on the trouble we face when dealing with rotations. If you are already struggling with rotation, you can skip this part promptly.

Example 1: Planar Rotation

For the first example, consider a simple proportional control on 2D rotation \(SO(2)\). We have a reference angle \(\theta_{ref}\) we wish the robot to rotate to; and the current angle of the robot \(\theta_{t} \). we implement a naive linear proportional controller

\( u_{t} = K(\theta_{ref} - \theta_{t}) \)

\( \theta_{t+1} = \theta_{t} + u_{t} \)

lets say \(\theta_{ref} = \pi/4\), \(\theta_{0} = 0\), \(K=0.05\)

this worked as expected.

Now try \(\theta_{ref} = \frac{3}{2}\pi\)

Not the smartest way to do it, it's a lot more efficient if the robot rotates clockwise to the reference pose

Even worst, see \(\theta_{ref} = 2\pi\), we know \(2\pi\) is exactly the same as \(0\) for rotation, but for our controller:

the naive controller has no idea about the cyclic nature of rotation, it's simply trying to drive \(0\) toward \(2\pi\) on the real number line, worse, any number larger than \(2\pi\) causes a huge difference and causes a proportionally large input to the robot. The same problem exists in state estimation. Consider two sensors measuring an angle, one measured \(5^{\circ}\), another measured \(355^{\circ}\), our intuition tells us the mean should be \(0\), because both are 5 degrees away from 0, one from each sides. But if we naively get the average between the two numbers, we get \(180^{\circ}\), the exact opposite of what we want.

Example 2: 3D rotation

The 2D rotation can be handled by some smart combinations of modulus and min, but things get more complicated in 3D. Lets say we represent the rotation in a matrix form

\(R_{1}, R_{2} \in SO(3) \)

\(SO\) stands for special orthogonal group, \((3)\) denotes it is in dimension 3. Matrices in \(SO(3)\) are \(3\times 3\). For a valid rotation, the columns of the matrix has to be orthogonal to each other; all columns are normalized, and the determinant of the matrix is \(+1\), Plainly speaking, the matrix does not stretch, skew, or reflect any vector when it is applied to the vector. Now we want to get an average between two rotation matrices \(R_1, R_2\). We can naively do that arithmetically

\( \bar{R} = \frac{1}{2}(R_{1}+R_{2}\))

The problem is that the result is not guaranteed to be a valid rotation matrix. This problem is demonstrated by this youtube video.

Rotation in 3D has 3 degrees of freedom. With this, we can decompose the rotation matrix into 3 2D rotation matrices, each corresponds to rotation around the x, y, z axes. These matrices are parameterized by the angle of rotation row, pitch ,and yaw respectively. This is called the Euler angles representation. Doing arithmetic in euler angles guarantees us to maintain valid rotation. Euler angles are notoriously problematic. For instance, gimbal lock, where the matrix loses a degree of freedom due to two of the rotation axes lining up. And more relevantly, interpolating two sets of euler angles does not create the most efficient path between the two poses, as shown in this video. This means that when we fuse two sources of measurements, the weighted average can have a pose that is completely different from any of the two measurements.

Lie theory provides us a unified method to tackle all these problems at once, regardless of the dimension, and extends beyond rotations.

Manifold


The most important concepts we need to understand in order to apply Lie theory are manifold and tangent space. Formally, a manifold is a set of elements that satisfies

The above properties defined a group . In addition, if the group locally resembles Euclidean space, that is, the local neighborhood of any point can be continuously mapped to \(\mathbb{R}^n\) without creating holes, and the mapping is bijective, then we call it a manifold. A lie group is a manifold that is differentiable, namely, there are no sharp corners on the manifold.

It is helpful to think of a hypersurface embedded in a higher dimensional space in order to "visualize" the operations. For example, unit complex number living in a 2D plane \(x+yi\), but constrainted by the relation \(x^2+y^2=1\), making it a 1D manifold (a circle). Similarly 2D rotation matrices are 2x2 matrix \(\mathbb{R}^{2\times 2}\), but with only 1 degree of freedom, making it a 1D surface embedded in \(\mathbb{R}^{2\times 2}\) space. Refer back to figure 1 of micro Lie theory for better visualization.

A tangent space is a linear hyper plane tangents to any point on the manifold. The tangent space at the identity is called Lie algebra. The tangent space is useful because it is a vector space, with all vectors corresponds to a point on the manifold, and the dimension of the vector equal to the degrees of freedom of the manifold. With that, we can perform meaningful With that, we can perform meaningful operations like addition, subtraction, and calculus, in their usual vector definition. In robotic context, the Lie algebra can be think of as the global coordinate; and the tangent spaces are the local coordinates anywhere in the workspace. This is useful because any measurements the robot takes, or any action the robot performs are, most of the time, local to the robot's frame at the time of execution instead of the global frame.

Example:


\(\theta\) is the tangent space of the Lie group \(SO(2)\),

\(\begin{bmatrix} \cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta) \end{bmatrix} \rightarrow \theta \)

unlike the rotation matrices themselves, we can add and subtract the angles

\( \theta'=\theta+d\theta\)

and we can put the result back to the manifold

\( \theta' \rightarrow \begin{bmatrix} \cos(\theta') & -\sin(\theta')\\ \sin(\theta') & \cos(\theta') \end{bmatrix} \)

Note that the tangent vector can be extended to infinity, but the manifold representation is bounded. This can be visualized by wrapping the vector around the circular manifold. The vector wraps around on the surface and never escape. Using this property, we can handle the rotational symmetry by wrapping an angle on to the manifold, and find the tangent space representation of the corresponding element. This gives us the angle wrapping formula

\(\theta = \text{atan2}(\text{sin}(\theta) ,\text{cos}(\theta)) \)

The new \(\theta\) will always be in the range \((-\pi, \pi]\)

Exp and Log


These transformation between the manifold and the tangent space are called the log and exp operators. The log(\(\cdot\)) map takes a vector on the tangent space and "wrap" it onto the manifold, returning the group representation. The line formed on the surface is called the geodesic, which is the shortest path between the element at the origin and the target element returned by the log map. The exp(\(\cdot\)) map takes a group element and unwrap it on to the Lie algebra.

The exp operator and log operator of rotation matrices are the matrix exponential and matrix logarithm. For our \(SO(2)\) example,

\( \exp\left( \begin{bmatrix} 0 & -\theta\\ \theta &0 \end{bmatrix} \right) = \begin{bmatrix} \cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta) \end{bmatrix} \)

\( \log\left( \begin{bmatrix} \cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta) \end{bmatrix} \right) = \begin{bmatrix} 0 & -\theta\\ \theta &0 \end{bmatrix} \)

The matrix log does not return \(\theta\) directly as promised. While the tangent space is a vector space, the elements are not necessarily a cartesian vector. However, they can be represented by a cartesian vector as linear combination of the base elements of the space. For this example, since 2D rotation has 1 degree of freedom, there is only one basis

\( \begin{bmatrix} 0 & -\theta\\ \theta &0 \end{bmatrix} = \theta \begin{bmatrix} 0 & -1\\ 1 &0 \end{bmatrix}=\theta \mathbf{e} \)

This is more clear in 3D rotation.

\( \log\left( \mathbf{R} \right) = \begin{bmatrix} 0 & -\omega_z & \omega_y\\ \omega_z & 0 & -\omega_x\\ -\omega_y & \omega_x & 0 \end{bmatrix} = \omega_x\begin{bmatrix} 0 &0 & 0\\ 0 & 0 & -1\\ 0 & 1 & 0 \end{bmatrix} + \omega_y\begin{bmatrix} 0 &0 & 1\\ 0 & 0 & 0\\ -1 & 0 & 0 \end{bmatrix} + \omega_z\begin{bmatrix} 0 & -1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}, \quad \mathbf{R} \in SO(3)\)

The matrix

\([\omega]_{\times} =\begin{bmatrix} 0 & -\omega_z & \omega_y\\ \omega_z & 0 & -\omega_x\\ -\omega_y & \omega_x & 0 \end{bmatrix}\)

,

is called the cross-product matrix. It's the Lie algebra of \(SO(2)\), denoted by \(\mathfrak{so}(3)\), The cartesian vector

\( \mathbb{\omega} = \begin{bmatrix} \omega_x\\ \omega_y\\ \omega_z \end{bmatrix} \)

is the axis-angle representation of the rotation, where the direction of the vector is the axis of rotation, and the magnitude of the vector is the angle of rotation. Sometimes explicitly written as \(\theta \hat{u} \)

The operation of turning a tangent space representation to a cartesian vector is called Vee , denoted by \((\cdot)^\vee\). The inverse is hat, denoted by \((\cdot)^\wedge\). A generic cartesian representation is denoted as \(\tau\), so the tangent space representation is \( \tau^\wedge \), and \((\tau^\wedge)^\vee = \tau\). For the \(SO(3)\) example,

\( \omega^\wedge = [\omega]_\times \)

\( ([\omega]_\times)^\vee = \omega \)

In general, we want to operate on the cartesian representation of the tangent space. So the exp and log, vee and hat are chained together to skip the non-cartesian representation, namely, the capitalized \(\text{Log}(\cdot)\) and \(\text{Exp}(\cdot)\). Where

\( \text{Exp}(\tau) \doteq \exp(\tau^\wedge) \)

\( \text{Log}(\mathcal{X}) \doteq \log(\mathcal{X})^\vee \)

for example, in \(SO(2)\)

\( \text{Log}(R) = \text{arctan2}(r_{21},r_{11}) \)

\( \text{Exp}(\theta) =\begin{bmatrix} \cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta) \end{bmatrix} \)

\(\oplus\) and \(\ominus\)


The \(\oplus\) and \(\ominus\) operators allow us to perform increments and finding difference on the elements of the manifold directly. In robotic context, the goal is to keep all the state representation on the manifold, and incorporate vector information such as input and observation.

\( \text{Exp}(\theta) =\begin{bmatrix} \cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta) \end{bmatrix} \)

References


The Banana Distribution Is Gaussian: A Localization Study with Exponential Coordinates. In: Robotics. MIT Press; 2013. doi:10.7551/mitpress/9816.003.0039