Linear Operators

[math] \newcommand{\ul}{\mathbf} \newcommand{\symbf}{\bm} \newcommand\subsetap{\mathrel{\overset{\makebox[0pt]{\mbox{\normalfont\tiny\sffamily ap.}}}{\rule{0pt}{.8ex}\smash{\subset}}}} \newcommand{\rident}[1]{\mathrm{#1}} \newcommand{\iident}[1]{\mathit{#1}} \newcommand{\wip}{\emoji{construction}} \newcommand{\pointright}{\emoji{backhand-index-pointing-right-light-skin-tone}} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\Arg}{Arg} \DeclareMathOperator*{\Var}{Var} \DeclareMathOperator*{\dom}{dom} \DeclareMathOperator{\Div}{div} \DeclareMathOperator{\morph}{\scalebox{0.7}{\ensuremath\square}} \DeclareMathOperator*{\esssup}{ess\,sup} \DeclareMathOperator{\Int}{Int} \DeclareMathOperator{\Cl}{Cl} \DeclareMathOperator{\id}{id} \DeclareMathOperator{\diam}{diam} \DeclareMathOperator{\supp}{supp} \DeclareMathOperator{\arctantwo}{arctan2} \DeclareMathOperator{\relu}{ReLU} \newcommand{\mathds}{\mathbb}[/math]


Now let us look at how we can start putting together an artificial neuron in our new setting. We have an input manifold [math]M[/math] and an output manifold [math]N[/math] that are both homogeneous spaces of a Lie group [math]G[/math]. Our input data is a function on [math]M[/math], say [math]f \in X = \iident{B}(M)[/math], and we are expected to output a function on [math]N[/math], say an element of [math]Y = \iident{B}(N)[/math]. Recall that the set of bounded functions [math]\iident{B}(M)[/math] is a Banach space under the supremum norm (a.k.a. the [math]\infty[/math]-norm or the uniform norm) given by [math]\| f \|_\infty := \sup_{p \in M} |f(p)|[/math]. The first part of a discrete artificial neuron was a linear operator [math]A:\mathbb{R}^n \to \mathbb{R}^m[/math], given by:

[[math]] \begin{equation*} \left( A \boldsymbol{x} \right)_i = \sum_{j} (A)_{ij} x_j. \end{equation*} [[/math]]

Its analogue in the continuous setting is an integral operator [math]A:\mathbb{R}^M \to \mathbb{R}^N[/math] of the form:

[[math]] \begin{equation} \label{eq:integral_operator} \left( A f \right)(q) = \int_M k_A (p,q) \, f(p) \, dp , \end{equation} [[/math]]

where the function [math]k_A:M \times N \to \mathbb{R}[/math] is called the operator's kernel.

[Measurable functions]

Technically, for the Lebesgue integral in \ref{eq:integral_operator} to exist, the integrand needs to be measurable. We will not be dealing with non-measurable functions, and you may assume that when we say function we mean measurable function. If the concept of measurable functions is new to you, you may ignore the issue.

In this framework, instead of training the matrix [math]A[/math], we will train the kernel [math]k_A[/math]. In practice we cannot train a continuous function, so training the kernel comes down to either training a discretization of [math]k_A[/math] or training the parameters of some parameterization of [math]k_A[/math]. To make progress, we still need to specify how to integrate on a homogeneous space.
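To make the discretization option concrete, here is a minimal sketch. It is not the course's implementation. It samples a kernel on uniform midpoint grids and absorbs the quadrature weight into a matrix, so that the integral operator becomes an ordinary matrix-vector product, just like the discrete neuron above. The function name `discretize_integral_operator` is hypothetical, and the choice [math]M=N=[0,1][/math] with a midpoint rule is an assumption that ignores any group structure for now. In the discretized approach, the matrix entries are what would be trained.

```python
import numpy as np

def discretize_integral_operator(kernel, grid_m, grid_n):
    """Matrix approximating (A f)(q) = int_M k_A(p, q) f(p) dp.

    Assumes grid_m is a uniform midpoint grid, so each sample p_j carries the
    quadrature weight dp = grid spacing (midpoint rule).
    """
    dp = grid_m[1] - grid_m[0]
    # With the default 'xy' indexing: P[i, j] = grid_m[j] and Q[i, j] = grid_n[i].
    P, Q = np.meshgrid(grid_m, grid_n)
    # Entry (i, j) approximates k_A(p_j, q_i) dp, so (matrix @ f_samples)[i] ~ (A f)(q_i).
    return kernel(P, Q) * dp

# Usage: M = N = [0, 1] with k_A(p, q) = p * q and f = 1, so (A f)(q) = q / 2 exactly.
n = 100
grid = (np.arange(n) + 0.5) / n          # midpoints of n equal cells of [0, 1]
A = discretize_integral_operator(lambda p, q: p * q, grid, grid)
Af = A @ np.ones(n)                       # matches grid / 2 up to rounding
```

The midpoint rule integrates this linear integrand exactly, which keeps the check independent of the grid size.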


Integration

Integration on [math]\mathbb{R}^n[/math] has the desirable property that it is translation invariant: for all [math]\boldsymbol{y} \in \mathbb{R}^n[/math] and integrable functions [math]f:\mathbb{R}^n \to \mathbb{R}[/math] we have

[[math]] \begin{equation} \label{eq:translation_invariant_integral} \int_{\mathbb{R}^n} f(\boldsymbol{x}-\boldsymbol{y}) \, d\boldsymbol{x} = \int_{\mathbb{R}^n} f(\boldsymbol{x}) \, d\boldsymbol{x} . \end{equation} [[/math]]

Ideally we would want integration on a homogeneous space [math]M[/math] of a Lie group [math]G[/math] to behave similarly, namely for all [math]g \in G[/math] we would like:

[[math]] \begin{equation} \label{eq:group_invariant_integral} \int_M \left( g \cdot f \right)(p) \, d\mu_M(p) := \int_M f(g^{-1} \cdot p) \, d\mu_M(p) = \int_M f(p) \, d\mu_M(p), \end{equation} [[/math]]

for some Radon measure [math]\mu_M[/math] on [math]M[/math].

[Measures]

Recall that measures are a generalization of concepts such as length, volume, mass, probability etc. A measure assigns a non-negative real number to subsets of a space in such a way that it behaves similarly to the aforementioned concepts. A Radon measure on a Hausdorff topological space is a measure that plays well with the topology of the space (defined for open and closed sets, finite on compact sets, etc.). The Lebesgue measure is the translation invariant Radon measure on [math]\mathbb{R}^n[/math] that assigns measure 1 to the unit cube; it coincides with our less general notion of the length/area/volume of subsets of [math]\mathbb{R}^n[/math]. Integration on [math]\mathbb{R}^n[/math] such as in \ref{eq:translation_invariant_integral} implicitly uses the Lebesgue measure and so is translation invariant. For a comprehensive introduction to measure theory see [1]. For the purpose of this course it is sufficient to think about a measure as measuring the volume of a subset.

This imposes a condition on the measure [math]\mu_M[/math], namely: for all measurable subsets [math]S[/math] of [math]M[/math] and [math]g \in G[/math] we require

[[math]] \begin{equation} \label{eq:invariant_measure} \mu_M(g\cdot S)=\mu_M(S). \end{equation} [[/math]]

In other words, we need a (non-zero) G-invariant measure to obtain the desired integral. Such G-invariant measures, or just invariant measures, do not always exist. In some cases we can still obtain a covariant measure, which is a measure that satisfies

[[math]] \begin{equation} \label{eq:covariant_measure} \mu(g \cdot S)=\chi(g)\, \mu(S) , \end{equation} [[/math]]

where [math]\chi:G \to \mathbb{R}_{ \gt 0}[/math] is a character of [math]G[/math].

Definition (Character)

A multiplicative character or linear character or simply character of a Lie group [math]G[/math] is a continuous homomorphism from the group to the multiplicative group of positive real numbers, i.e. [math]\chi:G \to \mathbb{R}_{ \gt 0}[/math] so that:

[[math]] \begin{equation*} \chi(g_1 g_2) = \chi(g_1)\, \chi(g_2) \qquad \forall g_1,g_2 \in G. \end{equation*} [[/math]]

The function [math]\chi[/math] needs to be a character since by \ref{eq:covariant_measure} we have:

[[math]] \begin{equation*} \chi(g_1 g_2)\, \mu(S) = \mu(g_1 g_2 \cdot S) = \mu(g_1 \cdot (g_2 \cdot S)) = \chi(g_1) \, \mu(g_2 \cdot S) = \chi(g_1) \, \chi(g_2) \, \mu(S) , \end{equation*} [[/math]]

for all [math]g_1,g_2 \in G[/math] and all measurable [math]S \subset M[/math].

If we integrate with respect to a G-invariant measure, we say we have a G-invariant integral, or just an invariant integral. If the measure is covariant with a character [math]\chi[/math], we say we have a [math]\chi[/math]-covariant integral, or just a covariant integral.

Definition (Covariant integral)

Let [math]M[/math] be a homogeneous space of a Lie group [math]G[/math]. We say the integral [math]\int_M \ldots dp[/math] (using some Radon measure on [math]M[/math]) is covariant with respect to [math]G[/math] if there exists a character [math]\chi_M[/math] of [math]G[/math] so that

[[math]] \begin{equation*} \int_M \left( g \cdot f \right) (p) \, dp = \chi_M(g) \, \int_M f(p) \, dp \end{equation*} [[/math]]
for all [math]g \in G[/math] and all [math]f:M \to \mathbb{R}[/math] for which the integral exists. In the special case that [math]\chi_M \equiv 1[/math] we say the integral is invariant.
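The definition can be checked numerically in a simple case. The following is a standard fact rather than something taken from the text. Let [math]G=\mathbb{R}_{ \gt 0}[/math] be the multiplicative group acting on [math]M=(0,\infty)[/math] by dilations [math]s \cdot p = s\,p[/math]. Substituting [math]p = s\,u[/math] gives [math]\int_0^\infty f(p/s)\, dp = s \int_0^\infty f(u)\, du[/math], so the Lebesgue integral is covariant with the character [math]\chi(s) = s[/math] rather than invariant. The sketch below estimates [math]\chi[/math] as a ratio of integrals and checks that it is multiplicative. The test function, the truncation of [math](0,\infty)[/math] and the trapezoidal rule are all assumptions.

```python
import numpy as np

# Truncated quadrature grid for M = (0, inf); the test function decays fast enough for this.
P = np.linspace(0.0, 400.0, 400_001)

def integrate(values):
    """Trapezoidal rule on the grid P (hand-written to avoid NumPy version differences)."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(P)) / 2)

def act(s, f):
    """Dilation action on functions: (s . f)(p) = f(s^{-1} . p) = f(p / s)."""
    return lambda p: f(p / s)

def bump(p):
    return p * np.exp(-p)                 # hypothetical test function with integral 1

def chi(s):
    """Estimate chi(s) = int (s . bump) / int bump, as predicted by the covariance identity."""
    return integrate(act(s, bump)(P)) / integrate(bump(P))

# chi(2.0) ~ 2.0 and chi(2.0 * 3.0) ~ chi(2.0) * chi(3.0), i.e. chi behaves like a character.
```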

[Abuse of notation]

Integration is always with respect to some measure. If we are integrating with respect to the measure [math]\mu[/math] then for the sake of completeness we should write

[[math]] \begin{equation*} \int_M \ldots\ d\mu(p) . \end{equation*} [[/math]]
Since we only ever consider one measure on each space we integrate over, we abbreviate [math]dp \equiv d\mu(p)[/math] for brevity.

If the homogeneous space is [math]G[/math] itself, then an invariant measure is called a (left) Haar measure on [math]G[/math] (named after the Hungarian mathematician Alfréd Haar). We can speak of the Haar measure since Haar measures always exist and are unique up to multiplication by a positive constant (see [2, Ch. 2.7]). Hence, when integrating on the group itself, we can always use a Haar measure [math]\mu_G[/math] so that the following equality holds

[[math]] \begin{equation} \label{eq:invariant_group_integral} \int_G \left( h \cdot f \right)(g) \, dg = \int_G f(g) \, dg \qquad \forall h \in G , \end{equation} [[/math]]

where we abbreviated [math]dg := d\mu_G(g)[/math]. We also call this invariant integral on the group the (left) Haar integral. Not all homogeneous spaces admit a covariant integral, but those in which we are interested all do. Going forward we will assume that all homogeneous spaces that we consider admit a covariant integral, so that we can always use the equality from the Definition (Covariant integral).

Example [[math]G=\iident{SE}(2)[/math] and [math]M=\mathbb{R}^2[/math]] In the case we are most interested in, namely [math]G=\iident{SE}(2)[/math] and [math]M=\mathbb{R}^2[/math], we are fortunate that the Lebesgue measure on [math]\mathbb{R}^2[/math] is invariant with respect to [math]G[/math]. This is intuitively easy to understand: the area of a subset of [math]\mathbb{R}^2[/math] is invariant under both translation and rotation.

Example [Haar measure on [math]\iident{SE}(2)[/math]] The Haar measure on [math]\iident{SE}(2)[/math] also conveniently coincides with the Lebesgue measure on [math]\mathbb{R}^2 \times [0,2\pi)[/math] when [math]\iident{SE}(2)[/math] is parameterized by pairs [math](\boldsymbol{x},\theta)[/math] as in the earlier example. Indeed, let [math]g=(\boldsymbol{x}_1,\theta_1)[/math] and [math]h=(\boldsymbol{x}_2,\theta_2)[/math]; then:

[[math]] \begin{equation*} \int_{\mathbb{R}^2} \int_{0}^{2\pi} \left( (\boldsymbol{x}_1,\theta_1) \cdot f \right) (\boldsymbol{x}_2,\theta_2) \, d\theta_2 \,d\boldsymbol{x}_2 = \int_{\mathbb{R}^2} \int_{0}^{2\pi} f \left( (\boldsymbol{x}_1,\theta_1)^{-1} (\boldsymbol{x}_2,\theta_2) \right) \, d\theta_2 \,d\boldsymbol{x}_2 . \end{equation*} [[/math]]

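When we change variables to [math](\boldsymbol{x}_3,\theta_3)=(\boldsymbol{x}_1,\theta_1)^{-1} (\boldsymbol{x}_2,\theta_2)[/math], i.e. [math](\boldsymbol{x}_2,\theta_2) = (\boldsymbol{x}_1,\theta_1)(\boldsymbol{x}_3,\theta_3) = (\boldsymbol{x}_1 + R(\theta_1)\boldsymbol{x}_3,\, \theta_1 + \theta_3)[/math], we obtain the following Jacobian matrix: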

[[math]] \begin{equation*} \frac{\partial(x_2^1,x_2^2,\theta_2)}{\partial(x_3^1,x_3^2,\theta_3)} = \begin{pmatrix} \cos\theta_1 & -\sin\theta_1 & 0 \\ \sin\theta_1 & \cos\theta_1 & 0 \\ 0 & 0 & 1 \end{pmatrix} , \end{equation*} [[/math]]

which has determinant [math]1[/math]. Consequently, the Haar integral (up to a multiplicative constant) on [math]\iident{SE}(2)[/math] can be calculated as:

[[math]] \begin{equation} \int_{\iident{SE}(2)} f(g) \, dg = \int_{\mathbb{R}^2} \int_0^{2\pi} f(\boldsymbol{x},\theta) \, d\theta \, d\boldsymbol{x} . \end{equation} [[/math]]
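The Jacobian computation can be double-checked numerically. The sketch below assumes that elements of [math]\iident{SE}(2)[/math] are stored as arrays [math](x, y, \theta)[/math] with the product [math](\boldsymbol{x}_1,\theta_1)(\boldsymbol{x}_3,\theta_3) = (\boldsymbol{x}_1 + R(\theta_1)\boldsymbol{x}_3,\, \theta_1+\theta_3)[/math], which is the group law that produces the matrix above. The helper names are hypothetical. Left multiplication is affine in its argument, so central differences recover the Jacobian up to rounding.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix R(theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def se2_mul(g, h):
    """SE(2) product (x_g, th_g)(x_h, th_h) = (x_g + R(th_g) x_h, th_g + th_h); elements are [x, y, theta]."""
    xy = g[:2] + rot(g[2]) @ h[:2]
    return np.array([xy[0], xy[1], g[2] + h[2]])

def jacobian_of_left_mul(g, h, eps=1e-6):
    """Central finite-difference Jacobian of h -> g h at h, i.e. d(x_2, theta_2) / d(x_3, theta_3)."""
    jac = np.zeros((3, 3))
    for k in range(3):
        dh = np.zeros(3)
        dh[k] = eps
        jac[:, k] = (se2_mul(g, h + dh) - se2_mul(g, h - dh)) / (2 * eps)
    return jac

g1 = np.array([1.0, -2.0, 0.7])           # (x_1, theta_1)
h3 = np.array([0.3, 0.5, 2.0])            # point (x_3, theta_3) at which we differentiate
J = jacobian_of_left_mul(g1, h3)
# J matches [[R(theta_1), 0], [0, 1]], so det(J) = 1 up to rounding.
```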


Equivariant Linear Operators

Of course the objective of this chapter is building equivariant operators, so when is an integral operator \ref{eq:integral_operator} equivariant? Equivariance means that

[[math]] \begin{equation*} A (g \cdot f) = g\cdot(A f) \end{equation*} [[/math]]

for all [math]g \in G[/math] and [math]f \in \iident{B}(M)[/math] or equivalently

[[math]] \begin{equation} \label{eq:equivariant_A_2} g^{-1} \cdot A (g \cdot f) = A f . \end{equation} [[/math]]

This extra condition on [math]A[/math] will naturally impose some restrictions on the kernel of the operator as the following lemma shows.

Lemma (Equivariant linear operators)

Let [math]M[/math] and [math]N[/math] be homogeneous spaces of a Lie group [math]G[/math] so that [math]M[/math] admits a covariant integral with character [math]\chi_M[/math]. Let [math]A[/math] be an integral operator \ref{eq:integral_operator} from [math]\iident{C}(M) \cap \iident{B}(M)[/math] to [math]\iident{C}(N) \cap \iident{B}(N)[/math] with a kernel [math]k_A \in \iident{C}(M \times N)[/math]. Then

[[math]] \begin{equation*} A(g\cdot f) = g \cdot (A f) \end{equation*} [[/math]]
for all [math]g \in G[/math] and [math]f \in \iident{C}(M) \cap \iident{B}(M)[/math] if and only if

[[math]] \begin{equation} \label{eq:equivariant_kernel_symmetry} \chi_M(g) \, k_A(g \cdot p,g \cdot q) = k_A(p,q) \end{equation} [[/math]]
for all [math]g \in G[/math], [math]p \in M[/math] and [math]q \in N[/math].

Moreover [math]A[/math] is bounded (and so continuous) in the supremum norm if

[[math]] \begin{equation} \label{eq:kernel_boundedness_requirement} \sup_{q \in N} \int_{M} |k_A(p,q)| dp \lt \infty . \end{equation} [[/math]]


Proof

“[math]\Rightarrow[/math]” Assuming [math]A[/math] to be equivariant, take an arbitrary [math]g \in G[/math] and [math]f \in \iident{C}(M) \cap \iident{B}(M)[/math] and substitute the definitions of the group representation and of [math]A[/math] into \ref{eq:equivariant_A_2} to find

[[math]] \begin{equation} \label{eq:ELO_1} \int_M k_A(p,g \cdot q) \, f (g^{-1} \cdot p) \, dp = \int_M k_A(p,q) \, f(p) \,dp \end{equation} [[/math]]
for all [math]q \in N[/math].

Fix [math]q \in N[/math], let [math]F(p):=k_A(g \cdot p, g \cdot q) f(p)[/math] and observe that

[[math]] \begin{equation*} (g \cdot F)(p) = k_A(g \cdot g^{-1} \cdot p, g \cdot q) f(g^{-1} \cdot p) = k_A(p,g \cdot q) \, f (g^{-1} \cdot p) , \end{equation*} [[/math]]
which is the left integrand from \ref{eq:ELO_1}. Since we have assumed covariant integration, we can use the Definition (Covariant integral) to obtain

[[math]] \begin{equation*} \int_M \left( g \cdot F \right) (p) \, dp = \chi_M(g) \, \int_M F(p) \, dp . \end{equation*} [[/math]]
Applying this to \ref{eq:ELO_1} we find

[[math]] \begin{equation} \label{eq:ELO_2} \chi_M(g) \int_M k_A(g \cdot p,g \cdot q) \, f (p) \, dp = \int_M k_A(p,q) \, f(p) \,dp . \end{equation} [[/math]]
Since [math]f \in \iident{C}(M) \cap \iident{B}(M)[/math] was arbitrary and both [math]p \mapsto k_A(p,q)[/math] and [math]p \mapsto k_A(g \cdot p,g \cdot q)[/math] are continuous, it follows that

[[math]] \begin{equation*} \chi_M(g) \, k_A(g \cdot p,g \cdot q) = k_A(p,q) \end{equation*} [[/math]]
for all [math]p \in M[/math]. Since [math]g \in G[/math] and [math]q \in N[/math] were arbitrary, \ref{eq:equivariant_kernel_symmetry} holds for all [math]g \in G[/math], [math]p \in M[/math] and [math]q \in N[/math].

“[math]\Leftarrow[/math]” Assume that [math]\chi_M(g) \, k_A(g \cdot p,g \cdot q)=k_A(p,q)[/math] for all [math]g \in G[/math], [math]p \in M[/math] and [math]q \in N[/math]. Then \ref{eq:ELO_2} follows for any choice of [math]f \in \iident{C}(M) \cap \iident{B}(M)[/math], [math]g \in G[/math] and [math]q \in N[/math]. Applying the covariant integral identity in the other direction yields \ref{eq:ELO_1}, which implies \ref{eq:equivariant_A_2} since [math]q \in N[/math] is arbitrary. The function [math]f[/math] and the group element [math]g[/math] were also chosen arbitrarily, so \ref{eq:equivariant_A_2} holds for all [math]f \in \iident{C}(M) \cap \iident{B}(M)[/math] and [math]g \in G[/math].

Boundedness of [math]A[/math] follows from

[[math]] \begin{align*} \left\Vert A f \right\Vert_{\infty} &= \sup_{q \in N} \left| \int_M k_A(p,q) \, f(p) dp \right| \\ &\leq \sup_{q \in N} \int_M | k_A(p,q) | \, |f(p)| dp \\ &\leq \left\Vert f \right\Vert_\infty \cdot\ \sup_{q \in N} \int_M | k_A(p,q) | dp \\ & \overset{\ref{eq:kernel_boundedness_requirement}}{ \lt } \infty . \end{align*} [[/math]]
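To see the kernel symmetry at work with a non-trivial character, consider the following illustration, which is not taken from the text. Let [math]M=N=(0,\infty)[/math] carry the dilation action of [math]G=\mathbb{R}_{ \gt 0}[/math], [math]s\cdot p = s\,p[/math]. Substituting [math]p = s\,u[/math] shows that the Lebesgue integral on [math](0,\infty)[/math] is covariant with [math]\chi_M(s)=s[/math]. Any kernel of the form [math]k_A(p,q) = \varphi(p/q)/q[/math] satisfies [math]s\, k_A(s p, s q) = k_A(p,q)[/math], so by the lemma the resulting operator should commute with dilations. The sketch below checks [math]A(s\cdot f)(q) = (Af)(q/s) = (s\cdot(Af))(q)[/math] numerically. The profile [math]\varphi(u) = u e^{-u}[/math], the test input and the truncated trapezoidal rule are assumptions.

```python
import numpy as np

P = np.linspace(0.0, 60.0, 600_001)        # truncated quadrature grid on M = (0, inf)

def integrate(values):
    """Trapezoidal rule on the grid P (hand-written to avoid NumPy version differences)."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(P)) / 2)

def phi(u):
    return u * np.exp(-u)                   # hypothetical integrable profile on (0, inf)

def kernel(p, q):
    """k_A(p, q) = phi(p / q) / q, which satisfies s * k_A(s p, s q) = k_A(p, q)."""
    return phi(p / q) / q

def apply_A(f, q):
    """(A f)(q) = int_0^inf k_A(p, q) f(p) dp, approximated on the grid P."""
    return integrate(kernel(P, q) * f(P))

def act(s, f):
    """Dilation action on functions: (s . f)(p) = f(p / s)."""
    return lambda p: f(p / s)

def bump(p):
    return np.exp(-(p - 1.0) ** 2)          # hypothetical bounded continuous input

s = 2.0
lhs = [apply_A(act(s, bump), q) for q in (0.5, 1.0, 3.0)]   # A(s . f) evaluated at q
rhs = [apply_A(bump, q / s) for q in (0.5, 1.0, 3.0)]       # (s . (A f))(q) = (A f)(q / s)
```

Both lists agree up to quadrature error. The factor [math]1/q[/math] in the kernel is exactly what compensates for [math]\chi_M(s)=s[/math].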

The condition on the kernel \ref{eq:kernel_boundedness_requirement} is partially redundant with the symmetry requirement as the following lemma shows.

Lemma

In the same setting as the Lemma (Equivariant linear operators), if the kernel [math]k_A \in \iident{C}(M \times N)[/math] satisfies the symmetry \ref{eq:equivariant_kernel_symmetry} and condition \ref{eq:kernel_boundedness_requirement}, then

[[math]] \begin{equation*} \left\Vert k_A(\ \cdot\ ,q_1) \right\Vert_{L^1(M)} = \left\Vert k_A(\ \cdot\ ,q_2) \right\Vert_{L^1(M)} \end{equation*} [[/math]]
for all [math]q_1,q_2 \in N[/math].


Proof

Since [math]N[/math] is a homogeneous space, for all [math]q_1,q_2 \in N[/math] there exists a [math]g \in G[/math] so that [math]q_1 = g \cdot q_2[/math]. Then

[[math]] \begin{align*} \int_M \left| k_A(p,q_1) \right| dp &= \int_M \left| k_A(p,g \cdot q_2) \right| dp \\ &= \int_M \left| k_A(g \cdot g^{-1} \cdot p, g \cdot q_2) \right| dp \\ &\overset{\ref{eq:equivariant_kernel_symmetry}}{=} \frac{1}{\chi_M(g)} \int_M \left| k_A(g^{-1} \cdot p, q_2) \right| dp \\ &\overset{\text{cov. int.}}{=} \frac{\chi_M(g)}{\chi_M(g)} \int_M \left| k_A(p, q_2) \right| dp \\ &= \int_M \left| k_A(p,q_2) \right| dp . \end{align*} [[/math]]
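Here is a concrete instance, given as an illustration that is not from the text. On [math]M=N=(0,\infty)[/math] with the dilation group, consider the kernel [math]k_A(p,q)=\varphi(p/q)/q[/math]. The substitution [math]p = q u[/math] gives [math]\int_0^\infty |k_A(p,q)|\,dp = \int_0^\infty |\varphi(u)|\,du[/math] for every [math]q[/math], in line with the lemma. The numerical check below assumes the hypothetical sign-changing profile [math]\varphi(u) = (1-u)e^{-u}[/math], whose [math]L^1[/math] norm is [math]2/e[/math], and uses a truncated trapezoidal rule.

```python
import numpy as np

P = np.linspace(0.0, 200.0, 400_001)      # truncated quadrature grid on (0, inf)

def phi(u):
    return (1.0 - u) * np.exp(-u)          # hypothetical sign-changing profile

def l1_norm_of_column(q):
    """Trapezoidal estimate of int_0^inf |k_A(p, q)| dp for k_A(p, q) = phi(p / q) / q."""
    values = np.abs(phi(P / q) / q)
    return float(np.sum((values[1:] + values[:-1]) * np.diff(P)) / 2)

norms = [l1_norm_of_column(q) for q in (0.5, 1.0, 3.0)]   # all close to 2 / e
```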

The symmetry condition \ref{eq:equivariant_kernel_symmetry} from the Lemma (Equivariant linear operators) can be exploited to express the kernel as a function on [math]M[/math] instead of [math]M \times N[/math]. If we fix a [math]q_0 \in N[/math] and for all [math]q \in N[/math] we choose a [math]g_q \in G_{q_0,q}[/math] (i.e. so that [math]g_q \cdot q_0=q[/math]), then by \ref{eq:equivariant_kernel_symmetry} we have

[[math]] \begin{equation*} k_A(p,q) = \chi_M(g_q^{-1}) \ k_A(g_q^{-1} \cdot p, g_q^{-1} \cdot q) = \chi_M(g_q^{-1}) \ k_A(g_q^{-1} \cdot p, q_0) , \end{equation*} [[/math]]

which fixes the second input of [math]k_A[/math]. Consequently, all the information in the kernel is contained in the function [math]\kappa_A(p) := k_A(p,q_0)[/math], which lives on [math]M[/math] alone. This reduced kernel [math]\kappa_A[/math] must still satisfy some restrictions for the resulting operator to be equivariant, as the following theorem makes precise.

Theorem (Equivariant linear operators)

Let [math]M[/math] and [math]N[/math] be homogeneous spaces of a Lie group [math]G[/math] so that [math]M[/math] admits a covariant integral with respect to a character [math]\chi_M[/math] of [math]G[/math]. Fix a [math]q_0 \in N[/math] and let [math]\kappa_A \in \iident{C}(M) \cap \iident{L}^1(M)[/math] be compatible, i.e. have the property that

[[math]] \begin{equation} \label{eq:kernel_compatibility} \forall h \in G_{q_0} : h \cdot \kappa_A = \chi_M (h) \, \kappa_A. \end{equation} [[/math]]

Then the operator [math]A[/math] defined by

[[math]] \begin{equation*} (Af)(q) := \frac{1}{\chi_M(g_q)} \int_M (g_q \cdot \kappa_A) (p) \, f(p) \, dp \end{equation*} [[/math]]
where for all [math]q \in N[/math] we can choose any [math]g_q[/math] so that [math]g_q \cdot q_0 = q[/math], is a well defined bounded linear operator from [math]\iident{C}(M) \cap \iident{B}(M)[/math] to [math]\iident{C}(N) \cap \iident{B}(N)[/math] that is equivariant with respect to [math]G[/math].

Conversely every equivariant integral operator with a kernel [math]k_A \in \iident{C}(M\times N)[/math] and with [math]k_A(\,\cdot\,,q) \in \iident{L}^1(M)[/math] for some [math]q \in N[/math] is of this form.


Proof

“[math]\Rightarrow[/math]” Assume we have a [math]\kappa_A \in \iident{C}(M) \cap \iident{L}^1(M)[/math] that satisfies \ref{eq:kernel_compatibility}. Define [math]k_A \in \iident{C}(M \times N)[/math] by

[[math]] \begin{equation*} k_A(p,q) := \frac{1}{\chi_M(g_q)} (g_q \cdot \kappa_A)(p). \end{equation*} [[/math]]
Then [math]k_A[/math] is well defined, since it does not depend on the choice of [math]g_q[/math] for a given [math]q \in N[/math]. If [math]g_q'[/math] is another group element with [math]g_q' \cdot q_0 = q[/math], then there exists an [math]h \in G_{q_0}[/math] so that [math]g_q' = g_q h[/math], and we can check that [math]k_A[/math] is invariant under the choice of [math]h \in G_{q_0}[/math]:

[[math]] \begin{align*} \frac{1}{\chi_M(g_q h)} (g_q \cdot h \cdot \kappa_A)(p) = \frac{\chi_M(h)}{\chi_M(g_q) \chi_M(h)} (g_q \cdot \kappa_A)(p) = \frac{1}{\chi_M(g_q)} (g_q \cdot \kappa_A)(p) . \end{align*} [[/math]]
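The kernel [math]k_A[/math] also satisfies the symmetry requirement \ref{eq:equivariant_kernel_symmetry} from the Lemma (Equivariant linear operators). In the second step we use that, by well-definedness, we may choose [math]g_{(g\cdot q)} = g\, g_q[/math]: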

[[math]] \begin{align*} \chi_M(g) \, k_A(g \cdot p,g \cdot q) &= \chi_M(g) \, \frac{1}{\chi_M(g_{(g\cdot q)})} (g_{(g\cdot q)} \cdot \kappa_A)(g \cdot p) \\ &= \chi_M(g) \, \frac{1}{\chi_M(g g_{q})} (g \cdot g_{q} \cdot \kappa_A)(g \cdot p) \\ &= \frac{\chi_M(g)}{\chi_M(g)\chi_M(g_{q})} (g_{q} \cdot \kappa_A)(g^{-1}g \cdot p) \\ &= \frac{1}{\chi_M(g_q)} (g_q \cdot \kappa_A)(p) \\ &= k_A(p,q) . \end{align*} [[/math]]
By the lemma on the equality of the [math]L^1(M)[/math] norms of [math]k_A(\,\cdot\,,q)[/math], we have

[[math]] \begin{equation*} \sup_{q \in N} \int_{M} |k_A(p,q)| dp = \left\Vert k_A(\,\cdot\,,q_0) \right\Vert_{L^1(M)} = \left\Vert \kappa_A \right\Vert_{L^1(M)} \lt \infty . \end{equation*} [[/math]]
Consequently, [math]k_A[/math] also satisfies \ref{eq:kernel_boundedness_requirement}, and [math]A[/math] is a bounded equivariant linear operator by the Lemma (Equivariant linear operators).

“[math]\Leftarrow[/math]” Assuming we have an equivariant linear operator [math]A[/math] with kernel [math]k_A \in \iident{C}(M \times N)[/math], we pick a fixed [math]q_0 \in N[/math] and define [math]\kappa_A \in \iident{C}(M)[/math] by

[[math]] \begin{equation*} \kappa_A(p) := k_A(p,q_0) . \end{equation*} [[/math]]
This reduced kernel [math]\kappa_A[/math] satisfies the compatibility condition \ref{eq:kernel_compatibility} since if [math]h \in G_{q_0}[/math] then

[[math]] \begin{align*} (h \cdot \kappa_A)(p) &= k_A(h^{-1} \cdot p, q_0) \\ &= k_A(h^{-1} \cdot p, h^{-1} \cdot q_0) \\ &= \chi_M(h) \, k_A(p, q_0) \\ &= \chi_M(h) \, \kappa_A(p) . \end{align*} [[/math]]
Since we required [math]k_A(\,\cdot\,,q) \in \iident{L}^1(M)[/math] for some [math]q \in N[/math], we can apply the lemma on the equality of the [math]L^1(M)[/math] norms to find

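[[math]] \begin{equation*} \left\Vert \kappa_A \right\Vert_{L^1(M)} = \left\Vert k_A(\ \cdot\ ,q_0) \right\Vert_{L^1(M)} = \left\Vert k_A(\ \cdot\ ,q) \right\Vert_{L^1(M)} \lt \infty. \end{equation*} [[/math]]

Finally, by the computation preceding the theorem, [math]k_A(p,q) = \chi_M(g_q^{-1}) \, k_A(g_q^{-1} \cdot p, q_0) = \frac{1}{\chi_M(g_q)} (g_q \cdot \kappa_A)(p)[/math] for all [math]p \in M[/math] and [math]q \in N[/math], so [math]A[/math] is indeed of the stated form.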
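A finite analogue makes the theorem tangible. As an illustration not taken from the text, let [math]G[/math] be the dihedral group of a regular [math]n[/math]-gon acting on its vertices [math]M=N=\mathbb{Z}_n[/math], equipped with the counting measure. This measure is invariant, so [math]\chi_M \equiv 1[/math]. With [math]q_0 = 0[/math], the stabilizer [math]G_{q_0}[/math] consists of the identity and the reflection [math]p \mapsto -p[/math], so compatibility means [math]\kappa_A(-p)=\kappa_A(p)[/math]. We may take [math]g_q[/math] to be the rotation [math]p \mapsto p+q[/math]. The sketch below builds [math]A[/math] from [math]\kappa_A[/math] and checks equivariance. The helper names are hypothetical, and encoding a group element as a pair [math](a,e)[/math] acting by [math]p \mapsto e\,p + a[/math] is an assumption.

```python
import numpy as np

n = 7  # vertices of a regular n-gon, M = N = Z_n

def act_function(a, e, f):
    """(g . f)(p) = f(g^{-1} . p) for g = (a, e): p -> e*p + a (mod n), so g^{-1} . p = e*(p - a)."""
    p = np.arange(n)
    return f[(e * (p - a)) % n]

def operator_from_reduced_kernel(kappa):
    """Matrix of A with row q equal to p -> k_A(p, q) = (g_q . kappa)(p), g_q = rotation by q.

    The counting measure is invariant (chi_M = 1), so no 1/chi_M(g_q) factor is needed.
    """
    return np.stack([act_function(q, 1, kappa) for q in range(n)])

kappa_sym = np.array([3.0, 1.0, -2.0, 0.5, 0.5, -2.0, 1.0])   # kappa(-p) = kappa(p): compatible
A = operator_from_reduced_kernel(kappa_sym)
f = np.random.default_rng(0).normal(size=n)
# For every group element (a, e): A @ act_function(a, e, f) equals act_function(a, e, A @ f).
```

Choosing the reflection [math]p \mapsto q - p[/math] as [math]g_q[/math] gives the same rows, which reflects the well-definedness step of the proof. A spike kernel such as `np.eye(n)[1]` is not compatible. The resulting operator still commutes with rotations but no longer with reflections.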

The Theorem (Equivariant linear operators) is at the core of group equivariant CNNs, since it allows us to generalize the familiar convolution operation present in CNNs to general linear operators that are equivariant with respect to a group of choice.

Example [Group convolution]

Let [math]G=M=N[/math] be some Lie group. A Lie group always admits a Haar integral, so the character is trivial: [math]\chi \equiv 1[/math]. As reference element we choose the unit element [math]e[/math], though any group element would do. Then the stabilizer [math]G_g = \{ e \}[/math] is trivial and [math]G_{e,g}=\{ g \}[/math] is a singleton. Hence the compatibility condition imposes no restriction on the kernel, and any [math]\kappa_A \in C(G) \cap L^1(G)[/math] defines a bounded equivariant linear operator [math]A: C(G) \cap B(G) \to C(G) \cap B(G)[/math] by

[[math]] \begin{equation*} (Af)(h) = \int_G (h \cdot \kappa_A)(g) \, f(g) \, dg = \int_G \kappa_A (h^{-1} g) \, f(g) \, dg \end{equation*} [[/math]]

We also call this operation group cross-correlation and denote it as

[[math]] \begin{equation*} (\kappa \star_G f)(h) := \int_G (h \cdot \kappa)(g) \, f(g) \, dg . \end{equation*} [[/math]]

As in the familiar [math]\mathbb{R}^n[/math] setting, group cross-correlation is closely related to group convolution, which is defined as

[[math]] \begin{equation*} (\check{\kappa} *_G f)(h) := \int_G \check{\kappa} (g^{-1} h) \, f(g) \, dg . \end{equation*} [[/math]]

We leave relating the two kernels [math]\kappa[/math] and [math]\check{\kappa}[/math] as an exercise: when is [math]\kappa \star_G f = \check{\kappa} *_G f[/math]?


As in the [math]\mathbb{R}^n[/math] case, when we talk about group convolution we mean both group cross-correlation and group convolution since they are interchangeable.
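For a finite group the Haar measure is the counting measure, so the integrals become sums. As an illustration not taken from the text, the sketch below implements the group cross-correlation [math]\star_G[/math] on the non-abelian group [math]S_3[/math] of permutations of three symbols. It then checks the equivariance [math]\kappa \star_G (u \cdot f) = u \cdot (\kappa \star_G f)[/math] that the theorem guarantees. The helper names and the random test data are hypothetical.

```python
import random
from itertools import permutations

G = list(permutations(range(3)))  # the 6 elements of S_3 as tuples

def mul(a, b):
    """Group product (a b)(i) = a(b(i)), i.e. composition of permutations."""
    return tuple(a[b[i]] for i in range(len(b)))

def inv(a):
    """Inverse permutation: if a sends i to a[i], the inverse sends a[i] back to i."""
    out = [0] * len(a)
    for i, ai in enumerate(a):
        out[ai] = i
    return tuple(out)

def act(u, f):
    """Left regular representation on functions stored as dicts: (u . f)(g) = f(u^{-1} g)."""
    return {g: f[mul(inv(u), g)] for g in G}

def cross_correlate(kappa, f):
    """(kappa star_G f)(h) = sum_g kappa(h^{-1} g) f(g), using the counting (Haar) measure."""
    return {h: sum(kappa[mul(inv(h), g)] * f[g] for g in G) for h in G}

random.seed(0)
kappa = {g: random.uniform(-1, 1) for g in G}
f = {g: random.uniform(-1, 1) for g in G}
u = G[3]
lhs = cross_correlate(kappa, act(u, f))     # kappa star (u . f)
rhs = act(u, cross_correlate(kappa, f))     # u . (kappa star f), equal up to rounding
```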

Example [Rotation-translation equivariance in [math]\mathbb{R}^2[/math]]


Let [math]G=\iident{SE}(2)= \mathbb{R}^2 \rtimes \iident{SO}(2)[/math] and [math]M=N=\mathbb{R}^2[/math]. The Lebesgue measure on [math]\mathbb{R}^2[/math] is rotation-translation invariant so we have a G-invariant integral on [math]\mathbb{R}^2[/math]. Choose [math]\boldsymbol{y}_0=\boldsymbol{0}[/math] as the reference element then [math]G_{\boldsymbol{y}_0}=\left\{ (\boldsymbol{0},\, R(\theta)) \in G \ \middle\vert\ \theta \in [0,2\pi) \right\}[/math] is the stabilizer of [math]\boldsymbol{y}_0[/math]. A kernel [math]\kappa_A[/math] on [math]\mathbb{R}^2[/math] is then compatible if

[[math]] \begin{equation*} (\boldsymbol{0},\, R(\theta)) \cdot \kappa_A = \kappa_A \qquad \forall \theta \in [0,2\pi) , \end{equation*} [[/math]]

i.e. [math]\kappa_A[/math] needs to be radially symmetric. Admittedly, we could have figured that out without building up the whole equivariance framework. But the next section will show how we can use the equivariance framework to overcome the severe restriction that is imposed on the allowable kernels here.
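The radial-symmetry requirement can be seen in a discrete stand-in. This is an assumption, not the continuous setting of the example: [math]\mathbb{R}^2[/math] is replaced by a periodic [math]n \times n[/math] grid, and [math]\iident{SE}(2)[/math] is replaced by the grid translations together with rotations by multiples of [math]90^\circ[/math] about the origin. The sketch below, with hypothetical helper names, applies [math](Af)(\boldsymbol{q}) = \sum_{\boldsymbol{p}} \kappa_A(\boldsymbol{p}-\boldsymbol{q})\, f(\boldsymbol{p})[/math], which is translation equivariant by construction. It then compares a rotation-invariant Gaussian kernel with an off-centre spike.

```python
import numpy as np

n = 9                                          # periodic n x n grid standing in for R^2
idx = np.arange(n)
I, J = np.meshgrid(idx, idx, indexing="ij")    # I[i, j] = i, J[i, j] = j

def rotate(f):
    """(R . f)(p) = f(R^{-1} p) for the 90-degree rotation R(i, j) = (-j, i) about the origin."""
    return f[J, (-I) % n]

def cross_correlate(kappa, f):
    """(A f)(q) = sum_p kappa(p - q) f(p) on Z_n x Z_n; np.roll shifts kappa so entry p holds kappa(p - q)."""
    out = np.empty_like(f)
    for q0 in range(n):
        for q1 in range(n):
            out[q0, q1] = np.sum(np.roll(kappa, (q0, q1), axis=(0, 1)) * f)
    return out

d = np.minimum(idx, n - idx)                   # periodic distance of each coordinate to 0
kappa_radial = np.exp(-(d[I] ** 2 + d[J] ** 2) / 2.0)   # invariant under the rotation R
kappa_bad = np.zeros((n, n))
kappa_bad[1, 0] = 1.0                          # an off-centre spike: not rotation invariant
f = np.random.default_rng(1).normal(size=(n, n))
```

With `kappa_radial` the operator commutes with `rotate`. With `kappa_bad` it does not, although both kernels give translation-equivariant operators.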

General references

Smets, Bart M. N. (2024). "Mathematics of Neural Networks". arXiv:2403.04807 [cs.LG].

References

  1. Tao, Terence (2011). "An Introduction to Measure Theory". American Mathematical Society.
  2. Federer, Herbert (2014). "Geometric Measure Theory". Springer.