Linear Operators
Now let us look at how we can start putting together an artificial neuron in our new setting. We have an input manifold [math]M[/math] and an output manifold [math]N[/math] that are both homogeneous spaces of a Lie group [math]G[/math]. Our input data is a function on [math]M[/math], say [math]f \in X = \iident{B}(M)[/math], and we are expected to output a function on [math]N[/math], say an element of [math]Y = \iident{B}(N)[/math]. Recall that the set of bounded functions [math]\iident{B}(M)[/math] is a Banach space under the supremum norm (a.k.a. the [math]\infty[/math]-norm or the uniform norm) given by [math]\| f \|_\infty := \sup_{p \in M} |f(p)|[/math]. The first part of a discrete artificial neuron was a linear operator [math]A:\mathbb{R}^n \to \mathbb{R}^m[/math], given by:
[math](A \boldsymbol{x})_i = \sum_{j=1}^n A_{ij}\, x_j .[/math]
Its analogue in the continuous setting is an integral operator [math]A:\mathbb{R}^M \to \mathbb{R}^N[/math] of the form:
[math](Af)(q) = \int_M k_A(p,q)\, f(p)\, dp ,[/math]
where the function [math]k_A:M \times N \to \mathbb{R}[/math] is called the operator's kernel.
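Discretizing the integral operator turns it back into a matrix-vector product: sampling the kernel and the input on a grid gives a Riemann sum. A minimal numpy sketch; the domain [math][0,1][/math], the kernel [math]k(p,q) = pq[/math] and the input [math]f = 1[/math] are illustrative choices with a known analytic answer, not taken from the text:

```python
import numpy as np

# Discretize M = N = [0, 1] with n sample points.
n = 1000
p = np.linspace(0.0, 1.0, n)       # sample points of M (and of N)
dp = p[1] - p[0]                   # Riemann-sum cell width

K = np.outer(p, p)                 # K[i, j] = k(p_i, q_j) = p_i * q_j
f = np.ones(n)

# (A f)(q_j) = integral over M of k(p, q_j) f(p) dp  ~  sum_i K[i, j] f[i] dp
Af = K.T @ f * dp

# Analytically (A f)(q) = q / 2, so the discretization error is small.
print(np.max(np.abs(Af - p / 2)))
```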
Technically, for the Lebesgue integral in \ref{eq:integral_operator} to exist, the integrand needs to be measurable. We will not be dealing with non-measurable functions, and you may assume that when we say function we mean measurable function. If the concept of measurable functions is new to you, you may ignore the issue.
In this framework, instead of training the matrix [math]A[/math], we will train the kernel [math]k_A[/math]. In practice we cannot train a continuous function, so training the kernel will come down to either training a discretization of [math]k_A[/math] or training the parameters of some parameterization of it. To make progress we still need to specify how we are going to integrate on a homogeneous space.
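Training the parameters of a kernel parameterization can be sketched as follows. Everything here is an illustrative assumption, not a prescription from the text: the Gaussian parameterization with trainable width [math]s[/math], the target generated with [math]s = 0.5[/math], and the finite-difference gradient step:

```python
import numpy as np

p = np.linspace(-3.0, 3.0, 201)
dp = p[1] - p[0]

def kernel(s):
    # Hypothetical parameterized kernel kappa_s(p) = exp(-p^2 / s^2);
    # the width s plays the role of the trainable parameter.
    return np.exp(-p**2 / s**2)

def apply_A(s, f):
    # Toy integral operator evaluated at a single output point:
    # (A f)(q0) ~ sum_p kappa_s(p) f(p) dp
    return np.sum(kernel(s) * f) * dp

f = np.cos(p)               # a fixed input signal
target = apply_A(0.5, f)    # pretend the data was generated with s = 0.5

def loss(s):
    return (apply_A(s, f) - target) ** 2

# One finite-difference gradient-descent step on the kernel parameter.
s, lr, eps = 1.0, 0.5, 1e-5
grad = (loss(s + eps) - loss(s - eps)) / (2 * eps)
s_new = s - lr * grad
print(loss(s), loss(s_new))  # the loss decreases
```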
Integration
Integration on [math]\mathbb{R}^n[/math] has the desirable property that it is translation invariant: for all [math]\boldsymbol{y} \in \mathbb{R}^n[/math] and integrable functions [math]f:\mathbb{R}^n \to \mathbb{R}[/math] we have
[math]\int_{\mathbb{R}^n} f(\boldsymbol{x} + \boldsymbol{y})\, d\boldsymbol{x} = \int_{\mathbb{R}^n} f(\boldsymbol{x})\, d\boldsymbol{x} .[/math]
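This invariance is easy to check numerically with a Riemann sum; the Gaussian integrand and the shift below are arbitrary illustrative choices:

```python
import numpy as np

# Riemann-sum check of translation invariance of the Lebesgue integral:
# integral of f(x + y) dx equals integral of f(x) dx for integrable f.
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
f = lambda u: np.exp(-u**2)

y = 1.7                      # an arbitrary shift
lhs = np.sum(f(x + y)) * dx
rhs = np.sum(f(x)) * dx
print(lhs, rhs)              # both approximate sqrt(pi)
```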
Ideally we would want integration on a homogeneous space [math]M[/math] of a Lie group [math]G[/math] to behave similarly, namely for all [math]g \in G[/math] we would like:
[math]\int_M f(g \cdot p)\, dp = \int_M f(p)\, dp[/math]
for some Radon measure [math]\mu_M[/math] on [math]M[/math].
Recall that measures are the generalization of concepts such as length, volume, mass, probability etc. A measure assigns a non-negative real number to subsets of a space in such a way that it behaves similarly to the aforementioned concepts. A Radon measure on a Hausdorff topological space is a measure that plays well with the topology of the space (defined for open and closed sets, finite on compact sets, etc.). The Lebesgue measure is the translation invariant Radon measure on [math]\mathbb{R}^n[/math] and coincides with our less general notion of the length/area/volume of subsets of [math]\mathbb{R}^n[/math]. Integration on [math]\mathbb{R}^n[/math] such as in \ref{eq:translation_invariant_integral} implicitly uses the Lebesgue measure and so is translation invariant. For a comprehensive introduction to measure theory see [1]. For the purpose of this course it is sufficient to think about a measure as measuring the volume of a subset.
This imposes a condition on the measure [math]\mu_M[/math], namely: for all measurable subsets [math]S[/math] of [math]M[/math] and [math]g \in G[/math] we require
[math]\mu_M(g \cdot S) = \mu_M(S) .[/math]
In other words we would need a (non-zero) group invariant measure to get the desired integral. These G-invariant measures, or just invariant measures, do not always exist. In some cases we can still obtain a covariant measure, which is a measure that satisfies
[math]\mu_M(g \cdot S) = \chi(g)\, \mu_M(S) ,[/math]
where [math]\chi:G \to \mathbb{R}_{>0}[/math] is a character of [math]G[/math].
A multiplicative character or linear character or simply character of a Lie group [math]G[/math] is a continuous homomorphism from the group to the multiplicative group of positive real numbers, i.e. [math]\chi:G \to \mathbb{R}_{>0}[/math] so that:
[math]\chi(g_1 g_2) = \chi(g_1)\, \chi(g_2) \quad \text{for all } g_1, g_2 \in G .[/math]
The function [math]\chi[/math] needs to be a character since by \ref{eq:covariant_measure} we have:
[math]\chi(g_1 g_2)\, \mu_M(S) = \mu_M\big((g_1 g_2) \cdot S\big) = \mu_M\big(g_1 \cdot (g_2 \cdot S)\big) = \chi(g_1)\, \mu_M(g_2 \cdot S) = \chi(g_1)\, \chi(g_2)\, \mu_M(S)[/math]
for all [math]g_1,g_2 \in G[/math] and all measurable [math]S \subset M[/math].
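A concrete covariant (but not invariant) situation is the scaling group [math]G = \mathbb{R}_{>0}[/math] acting on [math]\mathbb{R}^2[/math]: scaling a set by [math]s[/math] multiplies its Lebesgue measure by [math]\chi(s) = s^2[/math], and this [math]\chi[/math] is indeed multiplicative. A grid-counting sketch; the disk [math]S[/math] and the specific scale factors are illustrative choices, not from the text:

```python
import numpy as np

def area(radius, n=1201, extent=2.0):
    # Grid-count estimate of the Lebesgue measure of a disk of given radius.
    xs = np.linspace(-extent, extent, n)
    X, Y = np.meshgrid(xs, xs)
    cell = (xs[1] - xs[0]) ** 2
    return np.sum(X**2 + Y**2 <= radius**2) * cell

s1, s2 = 1.3, 1.2
mu_S = area(1.0)                 # roughly pi
# Covariance: mu(s . S) = chi(s) mu(S) with character chi(s) = s^2 ...
print(area(s1) / mu_S)           # roughly s1**2 = 1.69
# ... and chi is multiplicative: chi(s1 * s2) = chi(s1) * chi(s2).
print(area(s1 * s2) / mu_S)      # roughly (s1 * s2)**2
```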
If we integrate with respect to a G-invariant measure we say we have a G-invariant integral, or just an invariant integral. If the measure is covariant with a character [math]\chi[/math] we say we have a [math]\chi[/math]-covariant integral, or just a covariant integral.
Let [math]M[/math] be a homogeneous space of a Lie group [math]G[/math]. We say the integral [math]\int_M \ldots dp[/math] (using some Radon measure on [math]M[/math]) is covariant with respect to [math]G[/math] if there exists a character [math]\chi_M[/math] of [math]G[/math] so that
[math]\int_M f(g \cdot p)\, dp = \chi_M(g)^{-1} \int_M f(p)\, dp[/math]
for all [math]g \in G[/math] and all integrable [math]f[/math].
Integration is always with respect to some measure. If we are integrating with respect to the measure [math]\mu[/math] then for the sake of completeness we should write
[math]\int_M f(p)\, d\mu(p) .[/math]
If the homogeneous space is [math]G[/math] itself then an invariant measure is called the (left) Haar measure on [math]G[/math] (named after the Hungarian mathematician Alfréd Haar). We can say the Haar measure since Haar measures are unique up to multiplication with a constant, and they always exist (see [2], Ch. 2.7). Hence when integrating on the group itself we always have a Haar measure [math]\mu_G[/math] so that the following equality holds:
[math]\int_G f(h\, g)\, dg = \int_G f(g)\, dg \quad \text{for all } h \in G ,[/math]
where we abbreviated [math]dg := d\mu_G(g)[/math]. We also call this invariant integral on the group the (left) Haar integral. Not all homogeneous spaces admit a covariant integral, but those in which we are interested all do. Going forward we will assume that all homogeneous spaces that we consider admit a covariant integral, so that we can always use the equality from the definition above.
Example [[math]G=\iident{SE}(2)[/math] and [math]M=\mathbb{R}^2[/math]] In the case we are most interested in, namely [math]G=\iident{SE}(2)[/math] and [math]M=\mathbb{R}^2[/math], we are fortunate that the Lebesgue measure on [math]\mathbb{R}^2[/math] is invariant with respect to [math]G[/math]. This is intuitively easy to understand: the area of a subset of [math]\mathbb{R}^2[/math] is invariant under both translation and rotation.
Example [Haar measure on [math]\iident{SE}(2)[/math]] The Haar measure on [math]\iident{SE}(2)[/math] also conveniently coincides with the Lebesgue measure on [math]\mathbb{R}^2 \times [0,2\pi)[/math] when using the parameterization of [math]\iident{SE}(2)[/math] introduced earlier. Indeed, let [math]g=(\boldsymbol{x}_1,\theta_1)[/math] and [math]h=(\boldsymbol{x}_2,\theta_2)[/math], then:
[math]g\, h = (\boldsymbol{x}_1 + R(\theta_1)\, \boldsymbol{x}_2,\ \theta_1 + \theta_2 \bmod 2\pi) .[/math]
When we change variables to [math](\boldsymbol{x}_3,\theta_3)=(\boldsymbol{x}_1,\theta_1)^{-1} (\boldsymbol{x}_2,\theta_2) = (R(-\theta_1)(\boldsymbol{x}_2 - \boldsymbol{x}_1),\, \theta_2 - \theta_1)[/math] we obtain the following Jacobian matrix:
[math]\frac{\partial (\boldsymbol{x}_3, \theta_3)}{\partial (\boldsymbol{x}_2, \theta_2)} = \begin{pmatrix} R(-\theta_1) & \boldsymbol{0} \\ \boldsymbol{0}^T & 1 \end{pmatrix} ,[/math]
which has determinant [math]1[/math]. Consequently, the Haar integral (up to a multiplicative constant) on [math]\iident{SE}(2)[/math] can be calculated as:
[math]\int_G f(g)\, dg = \int_0^{2\pi} \int_{\mathbb{R}^2} f(\boldsymbol{x}, \theta)\, d\boldsymbol{x}\, d\theta .[/math]
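The left invariance of this integral can be checked numerically on a truncated grid; the spatially decaying test function [math]f[/math] and the group element [math]h[/math] below are arbitrary illustrative choices:

```python
import numpy as np

# Grid on (a truncation of) SE(2) = R^2 x [0, 2*pi).
xs = np.linspace(-6.0, 6.0, 121)
thetas = np.linspace(0.0, 2 * np.pi, 36, endpoint=False)
dx, dth = xs[1] - xs[0], thetas[1] - thetas[0]
X, Y, TH = np.meshgrid(xs, xs, thetas, indexing="ij")

def f(x, y, th):
    # A test function that decays spatially, so truncating R^2 is harmless.
    return np.exp(-(x**2 + y**2)) * (1.5 + np.cos(th))

# A fixed group element h = (t, alpha) acting on the left:
# h . g = (t + R(alpha) x_g, alpha + theta_g).
t1, t2, alpha = 0.8, -0.5, 1.1
Xh = t1 + np.cos(alpha) * X - np.sin(alpha) * Y
Yh = t2 + np.sin(alpha) * X + np.cos(alpha) * Y

lhs = np.sum(f(Xh, Yh, (alpha + TH) % (2 * np.pi))) * dx * dx * dth
rhs = np.sum(f(X, Y, TH)) * dx * dx * dth
print(lhs, rhs)   # equal up to discretization error
```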
Equivariant Linear Operators
Of course the objective of this chapter is building equivariant operators, so when is an integral operator \ref{eq:integral_operator} equivariant? Equivariance means that
for all [math]g \in G[/math] and [math]f \in \iident{B}(M)[/math] or equivalently
This extra condition on [math]A[/math] will naturally impose some restrictions on the kernel of the operator as the following lemma shows.
Let [math]M[/math] and [math]N[/math] be homogeneous spaces of a Lie group [math]G[/math] so that [math]M[/math] admits a covariant integral with character [math]\chi_M[/math]. Let [math]A[/math] be an integral operator \ref{eq:integral_operator} from [math]\iident{C}(M) \cap \iident{B}(M)[/math] to [math]\iident{C}(N) \cap \iident{B}(N)[/math] with a kernel [math]k_A \in \iident{C}(M \times N)[/math]. Then
Moreover [math]A[/math] is bounded (and so continuous) in the supremum norm if
“[math]\Rightarrow[/math]” Assuming [math]A[/math] to be equivariant, take an arbitrary [math]g \in G[/math] and [math]f \in \iident{C}(M) \cap \iident{B}(M)[/math] and substitute the definition of the group representation and [math]A[/math] in \ref{eq:equivariant_A_2} to find
Fix [math]q \in N[/math], let [math]F(p):=k_A(g \cdot p,\, g \cdot q)\, f(p)[/math], and observe that
Boundedness of [math]A[/math] follows from
The condition on the kernel \ref{eq:kernel_boundedness_requirement} is partially redundant with the symmetry requirement as the following lemma shows.
In the same setting as the previous lemma: if the kernel [math]k_A \in \iident{C}(M \times N)[/math] satisfies the symmetry \ref{eq:equivariant_kernel_symmetry} and condition \ref{eq:kernel_boundedness_requirement} then
Since [math]N[/math] is a homogeneous space, for all [math]q_1,q_2 \in N[/math] there exists a [math]g \in G[/math] so that [math]q_1 = g \cdot q_2[/math], and then
The condition on the kernel from the lemma above can be exploited to express it as a function on [math]M[/math] instead of [math]M \times N[/math]. If we fix a [math]q_0 \in N[/math] and for all [math]q \in N[/math] we choose a [math]g_q \in G_{q_0,q}[/math] (i.e. so that [math]g_q \cdot q_0=q[/math]) then by \ref{eq:equivariant_kernel_symmetry} we have
which fixes the second input of [math]k_A[/math]. Consequently we can encode all the information of our kernel in a function defined only on [math]M[/math]: the reduced kernel [math]\kappa_A(p) := k_A(p,q_0)[/math]. This reduced kernel still has some restrictions placed on it for the resulting operator to be equivariant, as the following theorem makes precise.
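In a discrete translation setting this reduction is the familiar fact that a translation-equivariant matrix is determined by a single row. A sketch on the cyclic group [math]\mathbb{Z}_n[/math] as a discrete stand-in (the random [math]\kappa_A[/math] is illustrative):

```python
import numpy as np

n = 8
rng = np.random.default_rng(2)
kappa = rng.standard_normal(n)    # reduced kernel on Z_n, with q0 = 0

# Rebuild the full two-argument kernel k(p, q) = kappa(g_q^{-1} . p),
# which on Z_n reads k[p, q] = kappa[(p - q) mod n]: a circulant matrix.
K = np.array([[kappa[(p - q) % n] for q in range(n)] for p in range(n)])

f = rng.standard_normal(n)
Af = K.T @ f                      # (A f)(q) = sum_p k(p, q) f(p)

# Equivariance check: translating the input translates the output.
t = 3
lhs = K.T @ np.roll(f, t)
rhs = np.roll(Af, t)
print(np.allclose(lhs, rhs))      # True
```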
Let [math]M[/math] and [math]N[/math] be homogeneous spaces of a Lie group [math]G[/math] so that [math]M[/math] admits a covariant integral with respect to a character [math]\chi_M[/math] of [math]G[/math]. Fix a [math]q_0 \in N[/math] and let [math]\kappa_A \in \iident{C}(M) \cap \iident{L}^1(M)[/math] be compatible, i.e. have the property that
Then the operator [math]A[/math] defined by
Conversely every equivariant integral operator with a kernel [math]k_A \in \iident{C}(M\times N)[/math] and with [math]k_A(\,\cdot\,,q) \in \iident{L}^1(M)[/math] for some [math]q \in N[/math] is of this form.
“[math]\Rightarrow[/math]” Assume we have a [math]\kappa_A \in \iident{C}(M) \cap \iident{L}^1(M)[/math] that satisfies \ref{eq:kernel_compatibility}. Define [math]k_A \in \iident{C}(M \times N)[/math] by
“[math]\Leftarrow[/math]” Assume we have an equivariant linear operator [math]A[/math] with kernel [math]k_A \in \iident{C}(M \times N)[/math]. We pick a fixed [math]q_0 \in N[/math] and define [math]\kappa_A \in \iident{C}(M)[/math] by
The theorem above is at the core of group equivariant CNNs since it allows us to generalize the familiar convolution operation present in CNNs to general linear operators that are equivariant with respect to a group of choice.
Example [Group convolution]
Let [math]G=M=N[/math] be some Lie group. A Lie group always admits a Haar integral, so we have a trivial character [math]\chi=1[/math]. As reference element we choose the unit element [math]e[/math], though any group element would do. Then [math]G_e = \{ e \}[/math] and [math]G_{e,g}=\{ g \}[/math] are both trivial. Hence we have no symmetry condition on the kernel. Any [math]\kappa_A \in \iident{C}(G) \cap \iident{L}^1(G)[/math] defines a linear operator [math]A: \iident{C}(G) \cap \iident{B}(G) \to \iident{C}(G) \cap \iident{B}(G)[/math] by
[math](Af)(g) = \int_G \kappa_A(g^{-1} h)\, f(h)\, dh .[/math]
We also call this operation group cross-correlation and denote it as
[math](\kappa_A \star_G f)(g) := \int_G \kappa_A(g^{-1} h)\, f(h)\, dh .[/math]
As in the familiar [math]\mathbb{R}^n[/math] setting, group cross-correlation is closely related to group convolution, which is defined as
[math](\check{\kappa} *_G f)(g) := \int_G \check{\kappa}(h^{-1} g)\, f(h)\, dh .[/math]
We leave relating the two kernels [math]\kappa[/math] and [math]\check{\kappa}[/math] as an exercise: when is [math]\kappa \star_G f = \check{\kappa} *_G f[/math]?
As in the [math]\mathbb{R}^n[/math] case, when we talk about group convolution we mean both group cross-correlation and group convolution since they are interchangeable.
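As a sanity check, we can realize group cross-correlation in the simplest possible setting: the cyclic group [math]\mathbb{Z}_n[/math], a discrete stand-in where the Haar integral is just a sum and [math]g^{-1}h[/math] becomes [math](h-g) \bmod n[/math]. Equivariance then says that shifting the input shifts the output; the random kernel and signal are illustrative:

```python
import numpy as np

n = 12
rng = np.random.default_rng(0)
kappa = rng.standard_normal(n)    # kernel on the group
f = rng.standard_normal(n)        # signal on the group

def cross_correlate(kappa, f):
    # (kappa star f)[g] = sum_h kappa[g^{-1} h] f[h],
    # with g^{-1} h = (h - g) mod n on the cyclic group Z_n.
    return np.array([sum(kappa[(h - g) % n] * f[h] for h in range(n))
                     for g in range(n)])

def shift(f, t):
    # Left regular representation of Z_n: (rho(t) f)[h] = f[(h - t) mod n].
    return np.roll(f, t)

t = 5
lhs = cross_correlate(kappa, shift(f, t))
rhs = shift(cross_correlate(kappa, f), t)
print(np.allclose(lhs, rhs))      # True: the operator is equivariant
```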
Example [Rotation-translation equivariance in [math]\mathbb{R}^2[/math]]
Let [math]G=\iident{SE}(2)= \mathbb{R}^2 \rtimes \iident{SO}(2)[/math] and [math]M=N=\mathbb{R}^2[/math].
The Lebesgue measure on [math]\mathbb{R}^2[/math] is rotation-translation invariant so we have a G-invariant integral on [math]\mathbb{R}^2[/math].
Choose [math]\boldsymbol{y}_0=\boldsymbol{0}[/math] as the reference element then [math]G_{\boldsymbol{y}_0}=\left\{ (\boldsymbol{0},\, R(\theta)) \in G \ \middle\vert\ \theta \in [0,2\pi) \right\}[/math] is the stabilizer of [math]\boldsymbol{y}_0[/math].
A kernel [math]\kappa_A[/math] on [math]\mathbb{R}^2[/math] is then compatible if
[math]\kappa_A(R(\theta)\, \boldsymbol{x}) = \kappa_A(\boldsymbol{x}) \quad \text{for all } \theta \in [0,2\pi) \text{ and } \boldsymbol{x} \in \mathbb{R}^2 ,[/math]
i.e. [math]\kappa_A[/math] needs to be radially symmetric. Admittedly, we could have figured that out without building up the whole equivariance framework. But the next section will show how we can use the equivariance framework to sidestep the severe restriction imposed on the allowable kernels here.
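We can verify this numerically for quarter turns, which map a square grid to itself exactly: convolving with a radially symmetric kernel commutes with rotating the input by 90 degrees. The direct "same"-padding convolution below is a minimal sketch with an illustrative Gaussian stencil and random input:

```python
import numpy as np

def conv2d_same(f, k):
    # Direct 2D correlation with zero padding ("same" output size);
    # for a symmetric kernel this coincides with convolution.
    r = k.shape[0] // 2
    fp = np.pad(f, r)
    out = np.zeros_like(f)
    for a in range(k.shape[0]):
        for b in range(k.shape[1]):
            out += k[a, b] * fp[a:a + f.shape[0], b:b + f.shape[1]]
    return out

# A radially symmetric kernel sampled on a 7x7 stencil.
u = np.arange(-3, 4)
U, V = np.meshgrid(u, u)
k = np.exp(-(U**2 + V**2) / 4.0)

rng = np.random.default_rng(1)
f = rng.standard_normal((16, 16))

# Quarter-turn rotations map the grid to itself, so for a radial kernel
# the operator commutes with them exactly (up to float round-off).
lhs = conv2d_same(np.rot90(f), k)
rhs = np.rot90(conv2d_same(f, k))
print(np.allclose(lhs, rhs))   # True
```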
General references
Smets, Bart M. N. (2024). "Mathematics of Neural Networks". arXiv:2403.04807 [cs.LG].