|
|
Line 45: |
Line 45: |
| \left( A f \right)(q) | | \left( A f \right)(q) |
| = | | = |
| \int_M k_A (p,q) \, f(p) \, \d p | | \int_M k_A (p,q) \, f(p) \, dp |
| , | | , |
| \end{equation} | | \end{equation} |
Line 144: |
Line 144: |
| If we integrate with respect to a G-invariant measure we say we have a G-invariant integral, or just an ''invariant integral'', if the measure is covariant with a character <math>\chi</math> we say we have a <math>\chi</math>-covariant integral or just ''covariant integral''. | | If we integrate with respect to a G-invariant measure we say we have a G-invariant integral, or just an ''invariant integral'', if the measure is covariant with a character <math>\chi</math> we say we have a <math>\chi</math>-covariant integral or just ''covariant integral''. |
| {{defncard|label=Covariant integral|id=def:covariant_integral| | | {{defncard|label=Covariant integral|id=def:covariant_integral| |
| Let <math>M</math> be a homogeneous space of a Lie group <math>G</math>, we say the integral <math>\int_M \ldots \d p</math> (using some Radon measure on <math>M</math>) is ''covariant'' with respect to <math>G</math> if there exists a character <math>\chi_M</math> of <math>G</math> so that | | Let <math>M</math> be a homogeneous space of a Lie group <math>G</math>, we say the integral <math>\int_M \ldots dp</math> (using some Radon measure on <math>M</math>) is ''covariant'' with respect to <math>G</math> if there exists a character <math>\chi_M</math> of <math>G</math> so that |
|
| |
|
| <math display="block"> | | <math display="block"> |
| \begin{equation*} | | \begin{equation*} |
| \int_M \left( g \cdot f \right) (p) \, \d p | | \int_M \left( g \cdot f \right) (p) \, dp |
| = | | = |
| \chi_M(g) \, \int_M f(p) \, \d p | | \chi_M(g) \, \int_M f(p) \, dp |
| \end{equation*} | | \end{equation*} |
| </math> | | </math> |
Line 161: |
Line 161: |
| <math display="block"> | | <math display="block"> |
| \begin{equation*} | | \begin{equation*} |
| \int_M \ldots\ \d \mu(p) | | \int_M \ldots\ d\mu(p) |
| . | | . |
| \end{equation*} | | \end{equation*} |
| </math> | | </math> |
| But since we only ever consider one measure per space we integrate over and for the sake of brevity we abbreviate <math>\d p \equiv \d\mu(p)</math>. | | But since we only ever consider one measure per space we integrate over and for the sake of brevity we abbreviate <math>dp \equiv \d\mu(p)</math>. |
| }} | | }} |
| If the homogeneous space is <math>G</math> itself then an invariant measure is called the (left) ''Haar measure'' on <math>G</math> (named after the Hungarian mathematician Alfréd Haar). We can say ''the'' Haar measure since Haar measures are unique up to multiplication with a constant and always exist (see <ref name="federer2014geometric"></ref>{{rp|at=Ch. 2.7}}). | | If the homogeneous space is <math>G</math> itself then an invariant measure is called the (left) ''Haar measure'' on <math>G</math> (named after the Hungarian mathematician Alfréd Haar). We can say ''the'' Haar measure since Haar measures are unique up to multiplication with a constant and always exist (see <ref name="federer2014geometric"></ref>{{rp|at=Ch. 2.7}}). |
Line 174: |
Line 174: |
| \begin{equation} | | \begin{equation} |
| \label{eq:invariant_group_integral} | | \label{eq:invariant_group_integral} |
| \int_G \left( h \cdot f \right)(g) \, \d g | | \int_G \left( h \cdot f \right)(g) \, dg |
| = | | = |
| \int_G f(g) \, \d g | | \int_G f(g) \, dg |
| \qquad | | \qquad |
| \forall h \in G | | \forall h \in G |
Line 182: |
Line 182: |
| \end{equation} | | \end{equation} |
| </math> | | </math> |
| where we abbreviated <math>\d g := \d\mu_G(g)</math>. | | where we abbreviated <math>dg := \d\mu_G(g)</math>. |
| We also call this invariant integral on the group the (left) ''Haar integral''. | | We also call this invariant integral on the group the (left) ''Haar integral''. |
| Not all homogeneous spaces admit a covariant integral but those in which we are interested all do. | | Not all homogeneous spaces admit a covariant integral but those in which we are interested all do. |
Line 230: |
Line 230: |
| <math display="block"> | | <math display="block"> |
| \begin{equation} | | \begin{equation} |
| \int_{\iident{SE}(2)} f(g) \, \d g | | \int_{\iident{SE}(2)} f(g) \, dg |
| = | | = |
| \int_{\mathbb{R}^2} | | \int_{\mathbb{R}^2} |
Line 292: |
Line 292: |
| \begin{equation} | | \begin{equation} |
| \label{eq:kernel_boundedness_requirement} | | \label{eq:kernel_boundedness_requirement} |
| \sup_{q \in N} \int_{M} |k_A(p,q)| \d p < \infty | | \sup_{q \in N} \int_{M} |k_A(p,q)| dp < \infty |
| . | | . |
| \end{equation} | | \end{equation} |
Line 305: |
Line 305: |
| \begin{equation} | | \begin{equation} |
| \label{eq:ELO_1} | | \label{eq:ELO_1} |
| \int_M k_A(p,g \cdot q) \, f (g^{-1} \cdot p) \, \d p | | \int_M k_A(p,g \cdot q) \, f (g^{-1} \cdot p) \, dp |
| = | | = |
| \int_M k_A(p,q) \, f(p) \,\d p | | \int_M k_A(p,q) \, f(p) \,dp |
| \end{equation} | | \end{equation} |
| </math> | | </math> |
Line 329: |
Line 329: |
| <math display="block"> | | <math display="block"> |
| \begin{equation*} | | \begin{equation*} |
| \int_M \left( g \cdot F \right) (p) \, \d p | | \int_M \left( g \cdot F \right) (p) \, dp |
| = | | = |
| \chi_M(g) \, \int_M F(p) \, \d p | | \chi_M(g) \, \int_M F(p) \, dp |
| . | | . |
| \end{equation*} | | \end{equation*} |
Line 342: |
Line 342: |
| \label{eq:ELO_2} | | \label{eq:ELO_2} |
| \chi_M(g) | | \chi_M(g) |
| \int_M k_A(g \cdot p,g \cdot q) \, f (p) \, \d p | | \int_M k_A(g \cdot p,g \cdot q) \, f (p) \, dp |
| = | | = |
| \int_M k_A(p,q) \, f(p) \,\d p | | \int_M k_A(p,q) \, f(p) \,dp |
| . | | . |
| \end{equation} | | \end{equation} |
Line 372: |
Line 372: |
| \sup_{q \in N} | | \sup_{q \in N} |
| \left| | | \left| |
| \int_M k_A(p,q) \, f(p) \d p | | \int_M k_A(p,q) \, f(p) dp |
| \right| | | \right| |
| \\ | | \\ |
Line 378: |
Line 378: |
| \sup_{q \in N} | | \sup_{q \in N} |
| \int_M | | \int_M |
| | k_A(p,q) | \, |f(p)| \d p | | | k_A(p,q) | \, |f(p)| dp |
| \\ | | \\ |
| &\leq | | &\leq |
Line 384: |
Line 384: |
| \cdot\ | | \cdot\ |
| \sup_{q \in N} | | \sup_{q \in N} |
| \int_M | k_A(p,q) | \d p | | \int_M | k_A(p,q) | dp |
| \\ | | \\ |
| & | | & |
Line 391: |
Line 391: |
| \end{align*} | | \end{align*} |
| </math>}} | | </math>}} |
| | |
| The condition on the kernel \ref{eq:kernel_boundedness_requirement} is partially redundant with the symmetry requirement as the following lemma shows. | | The condition on the kernel \ref{eq:kernel_boundedness_requirement} is partially redundant with the symmetry requirement as the following lemma shows. |
| {{proofcard|Lemma|lem:kernel_L1|In the same setting as Lemma [[#lem:equivariant_integral_operator |lemma]]. If the kernel <math>k_A \in \iident{C}(M \times N)</math> satisfies the symmetry \ref{eq:equivariant_kernel_symmetry} and condition \ref{eq:kernel_boundedness_requirement} then | | {{proofcard|Lemma|lem:kernel_L1|In the same setting as Lemma [[#lem:equivariant_integral_operator |lemma]]. If the kernel <math>k_A \in \iident{C}(M \times N)</math> satisfies the symmetry \ref{eq:equivariant_kernel_symmetry} and condition \ref{eq:kernel_boundedness_requirement} then |
Line 406: |
Line 407: |
| <math display="block"> | | <math display="block"> |
| \begin{align*} | | \begin{align*} |
| \int_M \left| k_A(p,q_1) \right| \d p | | \int_M \left| k_A(p,q_1) \right| dp |
| &= | | &= |
| \int_M \left| k_A(p,g \cdot q_2) \right| \d p | | \int_M \left| k_A(p,g \cdot q_2) \right| dp |
| \\ | | \\ |
| &= | | &= |
| \int_M \left| k_A(g \cdot g^{-1} \cdot p, g \cdot q_2) \right| \d p | | \int_M \left| k_A(g \cdot g^{-1} \cdot p, g \cdot q_2) \right| dp |
| \\ | | \\ |
| {\scriptsize \ref{eq:equivariant_kernel_symmetry}} | | {\scriptsize \ref{eq:equivariant_kernel_symmetry}} |
| &= | | &= |
| \frac{1}{\chi_M(g)} | | \frac{1}{\chi_M(g)} |
| \int_M \left| k_A(g^{-1} \cdot p, q_2) \right| \d p | | \int_M \left| k_A(g^{-1} \cdot p, q_2) \right| dp |
| \\ | | \\ |
| \text{\scriptsize (Def.~[[#def:covariant_integral |definition]])} | | \text{\scriptsize (Def.~[[#def:covariant_integral |definition]])} |
| &= | | &= |
| \frac{\chi_M(g)}{\chi_M(g)} | | \frac{\chi_M(g)}{\chi_M(g)} |
| \int_M \left| k_A(p, q_2) \right| \d p | | \int_M \left| k_A(p, q_2) \right| dp |
| \\ | | \\ |
| &= | | &= |
| \int_M \left| k_A(p,q_2) \right| \d p | | \int_M \left| k_A(p,q_2) \right| dp |
| . | | . |
| \end{align*} | | \end{align*} |
Line 463: |
Line 464: |
| <math display="block"> | | <math display="block"> |
| \begin{equation*} | | \begin{equation*} |
| (Af)(q) := \frac{1}{\chi_M(g_q)} \int_M (g_q \cdot \kappa_A) (p) \, f(p) \, \d p | | (Af)(q) := \frac{1}{\chi_M(g_q)} \int_M (g_q \cdot \kappa_A) (p) \, f(p) \, dp |
| \end{equation*} | | \end{equation*} |
| </math> | | </math> |
Line 518: |
Line 519: |
| <math display="block"> | | <math display="block"> |
| \begin{equation*} | | \begin{equation*} |
| \sup_{q \in N} \int_{M} |k_A(p,q)| \d p | | \sup_{q \in N} \int_{M} |k_A(p,q)| dp |
| = | | = |
| \left\Vert k_A(\,\cdot\,,q_0) \right\Vert_{L^1(M)} | | \left\Vert k_A(\,\cdot\,,q_0) \right\Vert_{L^1(M)} |
Line 572: |
Line 573: |
| </math>}} | | </math>}} |
| Theorem [[#thm:equivariant_linear_operators |theorem]] is at the core of group equivariant CNNs since it allows us to generalize the familiar convolution operation present in CNNs to general linear operators that are equivariant with respect to a group of choice. | | Theorem [[#thm:equivariant_linear_operators |theorem]] is at the core of group equivariant CNNs since it allows us to generalize the familiar convolution operation present in CNNs to general linear operators that are equivariant with respect to a group of choice. |
| \iffalse
| |
|
| |
| ===<span id="subsection:equivariant_linear_operators"></span>Equivariant Linear Operators===
| |
| Of course the objective of this chapter is building equivariant operators, so when is an integral operator equivariant?
| |
| Let <math>M</math> and <math>N</math> be homogeneous spaces of a Lie group <math>G</math> so that <math>M</math> admits a covariant integral with character <math>\chi_M</math>.
| |
| Let <math>A</math> be an integral operator \ref{eq:integral_operator} from <math>\iident{B}(M)</math> to <math>\iident{B}(N)</math> with a kernel <math>k_A \in \iident{B}(M \times N)</math> and assume that <math>\forall q \in N:\, k(\,\cdot\,,q)</math> is compactly supported.
| |
| Let us check what conditions the kernel has to satisfy for <math>A</math> to be equivariant with respect to <math>G</math>.
| |
|
| |
| <math display="block">
| |
| \begin{align*}
| |
| && A (g \cdot f) &= g\cdot(A f)
| |
| \qquad
| |
| &&\forall g \in G, \forall f \in \iident{B}(M),
| |
| \\
| |
| &\Leftrightarrow&
| |
| g^{-1} \cdot A( g \cdot f)
| |
| &=
| |
| A f
| |
| \intertext{(definitions of the group representations and $A$)}
| |
| &\Leftrightarrow&
| |
| \int_M k_A(r,g \cdot q) \, f (g^{-1} \cdot r) \, \d r
| |
| &=
| |
| \int_M k_A(p,q) \, f(p) \,\d p
| |
| &&
| |
| \forall q \in N,
| |
| \intertext{(change of variable on the left <math>p=g^{-1} \cdot r</math>)}
| |
| &\Leftrightarrow&
| |
| \int_M k_A(g \cdot p,g \cdot q) \, f (p) \, \d (g\cdot p)
| |
| &=
| |
| \int_M k_A(p,q) \, f(p) \,\d p
| |
| &&
| |
| \intertext{(covariant integral on <math>M</math> with character <math>\chi_M</math> [[#def:character |definition]])}
| |
| &\Leftrightarrow&
| |
| \chi_M(g) \int_M k_A(g \cdot p,g \cdot q) \, f (p) \, \d p
| |
| &=
| |
| \int_M k_A(p,q) \, f(p) \,\d p
| |
| &&
| |
| \\
| |
| &\Leftrightarrow&
| |
| \chi_M(g) \, k_A(g \cdot p, g \cdot q)
| |
| &=
| |
| k_A(p,q)
| |
| && \textrm{a.e.}
| |
| \end{align*}
| |
| </math>
| |
|
| |
|
| |
| The above equality only needs to hold almost everywhere but is obviously satisfied if we let it hold everywhere.
| |
| We can summarize that <math>A</math> is equivariant when
| |
|
| |
| <span id{{=}}"eq:equivariant_kernel"/>
| |
| <math display="block">
| |
| \begin{equation}
| |
| \label{eq:equivariant_kernel}
| |
| \chi_M(g) \, k_A (g \cdot p, g \cdot q)= k_A(p, q) \qquad\forall p \in M, q \in N, g \in G
| |
| .
| |
| \end{equation}
| |
| </math>
| |
|
| |
|
| |
| Now, since the action on <math>N</math> is transitive we can fix a reference element <math>q_0 \in N</math> and write the kernel as
| |
|
| |
| <math display="block">
| |
| \begin{equation*}
| |
| k_A(p,\, q) = k_A (p,\, g_q \cdot q_0)
| |
| \qquad
| |
| \forall g_q \in G_{q_0,q}
| |
| ,
| |
| \end{equation*}
| |
| </math>
| |
| with <math>G_{q_0,q}</math> per [[guide:Ebe14069e2#eq:homogeneous_set |equation]].
| |
| Applying the symmetry we just found in \ref{eq:equivariant_kernel} yields
| |
|
| |
| <math display="block">
| |
| \begin{equation*}
| |
| k_A (p,\, q)
| |
| =
| |
| \chi_M(g_q^{-1}) \, k_A(g_q^{-1} \cdot p, \, q_0)
| |
| \qquad
| |
| \forall g_q \in G_{q_0,q}
| |
| ,
| |
| \end{equation*}
| |
| </math>
| |
| which fixes the second argument of the kernel to <math>q_0</math>.
| |
| So we could define a simplified kernel
| |
|
| |
| <math display="block">
| |
| \begin{equation*}
| |
| \kappa_A(p)
| |
| :=
| |
| k_A(p,q_0)
| |
| \end{equation*}
| |
| </math>
| |
| that is bounded and compactly supported on <math>M</math> since <math>p \mapsto k_A(p,q)</math> was compactly supported for all <math>q</math>.
| |
| Then we can express the original kernel as
| |
|
| |
| <math display="block">
| |
| \begin{equation*}
| |
| k_A(p,q)
| |
| =
| |
| \chi_M(g_q^{-1}) \, \kappa_A(g_q^{-1} \cdot p)
| |
| =
| |
| \frac{(g_q \cdot \kappa_A) (p)}{\chi_M(g_q)}
| |
| .
| |
| \end{equation*}
| |
| </math>
| |
| Since <math>g_q</math> is not unique, to be well defined we need that for all <math>g_q,g_q' \in G_{q_0,q}</math> the following to hold
| |
|
| |
| <math display="block">
| |
| \begin{equation*}
| |
| \frac{ ( g_q \cdot \kappa_A ) (p)}{\chi_M(g_q)}
| |
| =
| |
| \frac{ ( g_q' \cdot \kappa_A ) (p)}{\chi_M(g_q')}
| |
| .
| |
| \end{equation*}
| |
| </math>
| |
| From [[guide:Ebe14069e2#eq:homogeneous_equivalence |equation]] we know that this amounts to
| |
|
| |
| <math display="block">
| |
| \begin{align*}
| |
| &&
| |
| \frac{(g_q \cdot \kappa_A)(p)}{\chi_M(g_q)}
| |
| &=
| |
| \frac{(g_q h \cdot \kappa_A)(p)}{\chi_M(g_q h)}
| |
| \qquad
| |
| &\forall h \in G_{q_0},
| |
| \\[5pt]
| |
| &\Leftrightarrow&
| |
| \frac{(g_q \cdot \kappa_A)(p)}{\chi_M(g_q)}
| |
| &=
| |
| \frac{(g_q \cdot (h \cdot \kappa_A))(p)}{\chi_M(g_q) \chi_M(h)},
| |
| \\[5pt]
| |
| &\Leftrightarrow&
| |
| (h \cdot \kappa_A) (p)
| |
| &=
| |
| \chi_M(h) \, \kappa_A(p)
| |
| .
| |
| \end{align*}
| |
| </math>
| |
| This condition on <math>\kappa_A</math> holds when <math>k_A</math> satisfies the equivariance requirement \ref{eq:equivariant_kernel}:
| |
|
| |
| <math display="block">
| |
| \begin{equation*}
| |
| (h \cdot \kappa_A)(p)
| |
| =
| |
| \kappa_A(h^{-1} \cdot p)
| |
| =
| |
| k_A(h^{-1} \cdot p, q_0)
| |
| =
| |
| \frac{k_A(p,h \cdot q_0)}{ \chi_M(h^{-1})}
| |
| =
| |
| \chi_M(h) \, k_A(p,q_0)
| |
| =
| |
| \chi_M(h) \, \kappa_A(p),
| |
| \end{equation*}
| |
| </math>
| |
| for any <math>h \in G_{q_0}</math>.
| |
| So now we can rewrite the operator <math>A</math> in terms of the simplified kernel <math>\kappa_A</math>:
| |
|
| |
| <math display="block">
| |
| \begin{equation*}
| |
| (Af)(q)
| |
| =
| |
| \frac{1}{\chi_M(g_q)}
| |
| \int_M
| |
| \left( g_q \cdot \kappa_A \right) (p) \, f(p) \, \d p
| |
| \quad
| |
| \text{for any} \ g_q \in G_{q_0,q}.
| |
| \end{equation*}
| |
| </math>
| |
| As long as <math>(h \cdot \kappa_A)(p)= \chi_M(h) \, \kappa_A(p)</math> for all <math>h \in G_{q_0}</math> this operator is well defined. We say that the kernel <math>\kappa_A</math> is <math>(N,G,q_0)</math>-compatible or just ''compatible'' with the output space <math>N</math>. If the kernel is bounded and compactly supported then <math>A</math> is a bounded linear operator from <math>\iident{B}(M)</math> to <math>\iident{B}(N)</math> under the supremum norm.
| |
| We can check that all this work has indeed resulted in an equivariant operator. Let <math>q \in N</math> then
| |
|
| |
| <math display="block">
| |
| \begin{align*}
| |
| \left( A(g \cdot f) \right)(q)
| |
| &\overset{\ref{eq:integral_operator}}{=}
| |
| \frac{1}{\chi_M(g_q)}
| |
| \int_M
| |
| \left( g_q \cdot \kappa_A \right) (p) \, \left( g \cdot f \right)(p) \, \d p
| |
| \\[5pt]
| |
| &=
| |
| \frac{1}{\chi_M(g_q)}
| |
| \int_M
| |
| \left(g \cdot g^{-1} \cdot g_q \cdot \kappa_A \right) (p)
| |
| \,
| |
| \left( g \cdot f \right)(p)
| |
| \, \d p
| |
| \\[5pt]
| |
| &\overset{[[#def:covariant_integral |definition]]}{=}
| |
| \frac{\chi_M(g)}{\chi_M(g_q)}
| |
| \int_M
| |
| \left(g^{-1} \cdot g_q \cdot \kappa_A \right) (p)
| |
| \,
| |
| f (p)
| |
| \, \d p
| |
| \\[5pt]
| |
| &=
| |
| \frac{1}{\chi_M(g^{-1} g_q)}
| |
| \int_M
| |
| \left((g^{-1} g_q) \cdot \kappa_A \right) (p)
| |
| \,
| |
| f (p)
| |
| \, \d p
| |
| \intertext{(if $g_q \in G_{q_0,q}$ then <math>g^{-1} g_q \in G_{q_0,(g^{-1}\cdot q)}</math> so we write <math>g^{-1} g_q=g_{(g^{-1} \cdot q)}</math>)}
| |
| &=
| |
| \frac{1}{\chi_M \left( g_{(g^{-1} \cdot q)} \right)}
| |
| \int_M
| |
| \left(g_{(g^{-1}\cdot q)} \cdot \kappa_A \right) (p)
| |
| \,
| |
| f (p)
| |
| \, \d p
| |
| \\[5pt]
| |
| &=
| |
| (Af)(g^{-1} \cdot q)
| |
| \\[5pt]
| |
| &=
| |
| (g \cdot (Af))(q),
| |
| \end{align*}
| |
| </math>
| |
| since <math>q</math> was arbitrary <math>A</math> is equivariant with respect to <math>G</math>.
| |
| We can summarize this result in the following theorem.
| |
| {{proofcard|Theorem (Equivariant Linear Operators)|thm:equivariant_linear_operators|Let <math>M</math> and <math>N</math> be homogeneous spaces of a Lie group <math>G</math> so that <math>M</math> admits a covariant integral with respect to a character <math>\chi</math> of <math>G</math>.
| |
| Fix a <math>q_0 \in N</math> and let <math>\kappa_A \in \iident{B}(M)</math> have compact support and be <math>(N,G,q_0)</math>-compatible, i.e
| |
|
| |
| <math display="block">
| |
| \begin{equation*}
| |
| \forall h \in G_{q_0}
| |
| :
| |
| h \cdot \kappa_A = \chi(h) \, \kappa_A.
| |
| \end{equation*}
| |
| </math>
| |
|
| |
| Then the operator <math>A</math> defined by
| |
|
| |
|
| <math display="block">
| |
| \begin{equation*}
| |
| (Af)(q) := \frac{1}{\chi(g_q)} \int_M (g_q \cdot \kappa_A) (p) \, f(p) \, \d p
| |
| \end{equation*}
| |
| </math>
| |
| where for all <math>q \in N</math> we can choose any <math>g_q</math> so that <math>g_q \cdot q_0 = q</math>, is a well defined bounded linear operator from <math>\iident{B}(M)</math> to <math>\iident{B}(N)</math> that is equivariant with respect to <math>G</math>.
| |
| Moreover every equivariant integral operator \ref{eq:integral_operator} with a kernel <math>k_A \in \iident{B}(M\times N)</math> and with <math>k_A(\,\cdot\,,q)</math> having compact support and being continuous for all <math>q \in N</math> is of this form.
| |
| |The preceding derivations serve as proof that all equivariant integral operators \ref{eq:integral_operator} (with the given conditions on the kernel) are of the specified form and so that <math>A</math> is an equivariant linear operator.
| |
| We will just verify that the operator defined in the theorem is bounded in the supremum norm.
| |
| \setlength{\jot}{2.0ex}
| |
|
| |
| <math display="block">
| |
| \begin{align*}
| |
| \left\| Af \right\|_\infty
| |
| &=
| |
| \sup_{q \in N}\
| |
| \left|
| |
| \frac{1}{\chi(g_q)} \int_M (g_q \cdot \kappa_A) (p) \, f(p) \, \d p
| |
| \right|
| |
| \\
| |
| &=
| |
| \sup_{g \in G}\
| |
| \left|
| |
| \frac{1}{\chi(g)} \int_M (g \cdot \kappa_A) (p) \, f(p) \, \d p
| |
| \right|
| |
| \\
| |
| &=
| |
| \sup_{g \in G}\
| |
| \left|
| |
| \frac{1}{\chi(g)} \int_{\supp(g \cdot \kappa_A)} (g \cdot \kappa_A) (p) \, f(p) \, \d p
| |
| \right|
| |
| \\
| |
| &\leq
| |
| \sup_{g \in G}\
| |
| \frac{1}{\chi(g)} \int_{\supp(g \cdot \kappa_A)}
| |
| \left|
| |
| (g \cdot \kappa_A) (p) \, f(p)
| |
| \right|
| |
| \, \d p
| |
| \\
| |
| &\leq
| |
| \sup_{g \in G}\
| |
| \frac{\mu(\supp(g\cdot \kappa_A))}{\chi(g)}
| |
| \| g \cdot \kappa_A \|_\infty
| |
| \,
| |
| \| f \|_\infty.
| |
| \end{align*}
| |
| </math>
| |
| Since our group representation on <math>\iident{B}(M)</math> corresponds to the group action on <math>M</math> it does not affect the codomain of the function. Hence we have that <math>\| g \cdot \kappa_A \|_\infty = \| \kappa_A \|_\infty</math> and <math>\supp(g\cdot \kappa_A)=g\cdot\supp(\kappa_A)</math>.
| |
| Since we have a covariant measure we find
| |
|
| |
| <math display="block">
| |
| \begin{equation*}
| |
| \mu(\supp(g\cdot \kappa_A))
| |
| =
| |
| \mu(g\cdot\supp(\kappa_A))
| |
| =
| |
| \chi(g) \, \mu(\supp(\kappa_A)).
| |
| \end{equation*}
| |
| </math>
| |
| Going back to our norm:
| |
|
| |
| <math display="block">
| |
| \begin{align*}
| |
| \left\| Af \right\|_\infty
| |
| &\leq
| |
| \sup_{g \in G}\
| |
| \frac{\chi(g) \, \mu(\supp(\kappa_A))}{\chi(g)}
| |
| \| \kappa_A \|_\infty
| |
| \,
| |
| \| f \|_\infty.
| |
| \\
| |
| &=
| |
| \mu(\supp(\kappa_A))
| |
| \,
| |
| \| \kappa_A \|_\infty
| |
| \,
| |
| \| f \|_\infty,
| |
| \\
| |
| &\Downarrow
| |
| \\
| |
| \frac{\left\| Af \right\|_\infty}{\| f \|_\infty}
| |
| &\leq
| |
| \mu(\supp(\kappa_A))
| |
| \,
| |
| \| \kappa_A \|_\infty
| |
| < \infty
| |
| ,
| |
| \end{align*}
| |
| </math>
| |
| since <math>\kappa_A</math> is bounded and compactly supported (and <math>\mu</math> is a Radon measure, so the measure of a compact set is finite).
| |
| Consequently <math>A</math> is bounded.}}
| |
| '''Example'''
| |
| [Traditional convolution]
| |
| Let <math>G=(\mathbb{R}^n,+)</math> and <math>M = N = \mathbb{R}^n</math>.
| |
| Let <math>G</math> act on <math>M=N</math> by <math>\boldsymbol{g} \cdot \boldsymbol{x} = \boldsymbol{x} + \boldsymbol{g}</math>, then the induced representation on functions on <math>M</math> is <math>(\boldsymbol{g} \cdot f) (\boldsymbol{x})=f(\boldsymbol{g}^{-1}\cdot \boldsymbol{x})=f(\boldsymbol{x}-\boldsymbol{g})</math>.
| |
|
| |
| We know integration on <math>\mathbb{R}^n</math> is translation invariant, so we have a trivial character <math>\chi_M \equiv 1</math>.
| |
|
| |
| Choose as reference element of <math>N</math> the origin <math>\boldsymbol{y}_0 = \boldsymbol{0}</math>.
| |
| Then <math>G_{\boldsymbol{y}_0} = \{ e \} = \{ \boldsymbol{0} \}</math> and <math>G_{\boldsymbol{y}_0,\boldsymbol{y}} = \{ \boldsymbol{y} \}</math>.
| |
| Hence the compatibility requirement on the kernel is trivial, any <math>\kappa_A \in B_c(M)</math> will yield an equivariant operator.
| |
|
| |
| <math display="block">
| |
| \begin{align*}
| |
| (Af)(\boldsymbol{y})
| |
| &=
| |
| \int_{\mathbb{R}^n} (g_{\boldsymbol{y}} \cdot \kappa_A)(\boldsymbol{x}) \, f(\boldsymbol{x}) \, \d\boldsymbol{x}
| |
| \\
| |
| &=
| |
| \int_{\mathbb{R}^n} (\boldsymbol{y} \cdot \kappa_A)(\boldsymbol{x}) \, f(\boldsymbol{x}) \, \d\boldsymbol{x}
| |
| \\
| |
| &=
| |
| \int_{\mathbb{R}^n} \kappa_A( \boldsymbol{y}^{-1} \cdot \boldsymbol{x}) \, f(\boldsymbol{x}) \, \d\boldsymbol{x}
| |
| \\
| |
| &=
| |
| \int_{\mathbb{R}^n} \kappa_A( \boldsymbol{x} - \boldsymbol{y} ) \, f(\boldsymbol{x}) \, \d\boldsymbol{x}
| |
| \\
| |
| &=
| |
| (\kappa_A \star f)(\boldsymbol{y})
| |
| \\
| |
| &=
| |
| (\check{\kappa}_A * f)(\boldsymbol{y}).
| |
| \end{align*}
| |
| </math>
| |
| where <math>\check{\kappa}_A</math> is the reflected kernel <math>\check{\kappa}_A(\boldsymbol{x})=\kappa_A(\boldsymbol{-x})</math>.
| |
| So we see that what we know as convolution (or cross-correlation) is really just a special case of this larger notion of equivariant linear operators.
| |
| In this context we can look at a traditional CNN as being a G-CNN where the translation group was (implicitly) chosen as the Lie group.
| |
|
| |
| \fi
| |
| <span id="example:group_convolution"/> | | <span id="example:group_convolution"/> |
| '''Example''' | | '''Example''' |
Line 952: |
Line 589: |
| (Af)(h) | | (Af)(h) |
| = | | = |
| \int_G (h \cdot \kappa_A)(g) \, f(g) \, \d g | | \int_G (h \cdot \kappa_A)(g) \, f(g) \, dg |
| = | | = |
| \int_G \kappa_A (h^{-1} g) \, f(g) \, \d g | | \int_G \kappa_A (h^{-1} g) \, f(g) \, dg |
| \end{equation*} | | \end{equation*} |
| </math> | | </math> |
Line 963: |
Line 600: |
| (\kappa \star_G f)(h) | | (\kappa \star_G f)(h) |
| := | | := |
| \int_G (h \cdot \kappa)(g) \, f(g) \, \d g | | \int_G (h \cdot \kappa)(g) \, f(g) \, dg |
| . | | . |
| \end{equation*} | | \end{equation*} |
Line 973: |
Line 610: |
| (\check{\kappa} *_G f)(h) | | (\check{\kappa} *_G f)(h) |
| := | | := |
| \int_G \check{\kappa} (g^{-1} h) \, f(g) \, \d g | | \int_G \check{\kappa} (g^{-1} h) \, f(g) \, dg |
| . | | . |
| \end{equation*} | | \end{equation*} |
Line 1,003: |
Line 640: |
| Now, we could have figured that out without building up the whole equivariance framework. | | Now, we could have figured that out without building up the whole equivariance framework. |
| But the next section will show how we can use the equivariance framework to step over the severe restriction that is imposed on the allowable kernels here. | | But the next section will show how we can use the equivariance framework to step over the severe restriction that is imposed on the allowable kernels here. |
|
| |
|
| |
|
| ==General references== | | ==General references== |
[math]
\newcommand{\ul}{\mathbf}
\newcommand{\symbf}{\bm}
\newcommand\subsetap{\mathrel{\overset{\makebox[0pt]{\mbox{\normalfont\tiny\sffamily ap.}}}{\rule{0pt}{.8ex}\smash{\subset}}}}
\newcommand{\rident}[1]{\mathrm{#1}}
\newcommand{\iident}[1]{\mathit{#1}}
\newcommand{\wip}{\emoji{construction}}
\newcommand{\pointright}{\emoji{backhand-index-pointing-right-light-skin-tone}}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator*{\Arg}{Arg}
\DeclareMathOperator*{\Var}{Var}
\DeclareMathOperator*{\dom}{dom}
\DeclareMathOperator{\Div}{div}
\DeclareMathOperator{\morph}{\scalebox{0.7}{\ensuremath\square}}
\DeclareMathOperator*{\esssup}{ess\,sup}
\DeclareMathOperator{\Int}{Int}
\DeclareMathOperator{\Cl}{Cl}
\DeclareMathOperator{\id}{id}
\DeclareMathOperator{\diam}{diam}
\DeclareMathOperator{\supp}{supp}
\DeclareMathOperator{\arctantwo}{arctan2}
\DeclareMathOperator{\relu}{ReLU}
\newcommand{\mathds}{\mathbb}[/math]
Now let us look at how we can start putting together an artificial neuron in our new setting.
We have an input manifold [math]M[/math] and an output manifold [math]N[/math] that are both homogeneous spaces of a Lie group [math]G[/math].
Our input data is a function on [math]M[/math], say [math]f \in X = \iident{B}(M)[/math] and we are expected to output a function on [math]N[/math], say an element of [math]Y = \iident{B}(N)[/math].
Recall that the set of bounded functions [math]\iident{B}(M)[/math] is a Banach space under the supremum norm (a.k.a. the [math]\infty[/math]-norm or the uniform norm) given by [math]\| f \|_\infty := \sup_{p \in M} |f(p)|[/math].
The first part of a discrete artificial neuron was a linear operator [math]A:\mathbb{R}^n \to \mathbb{R}^m[/math], given by:
[[math]]
\begin{equation*}
\left( A \boldsymbol{x} \right)_i
=
\sum_{j} (A)_{ij} x_j.
\end{equation*}
[[/math]]
Its analogue in the continuous setting is an integral operator [math]A:\mathbb{R}^M \to \mathbb{R}^N[/math] of the form:
[[math]]
\begin{equation}
\label{eq:integral_operator}
\left( A f \right)(q)
=
\int_M k_A (p,q) \, f(p) \, dp
,
\end{equation}
[[/math]]
where the function [math]k_A:M \times N \to \mathbb{R}[/math] is called the operator's kernel.
[Measurable functions]
Technically for the Lebesgue integral in \ref{eq:integral_operator} to exist the integrand needs to be measurable.
We will not be dealing with non-measurable functions and you may assume that when we say function we mean measurable function.
If the concept of measurable functions is new to you, you may ignore the issue.
In this framework, instead of training the matrix [math]A[/math], we will train the kernel [math]k_A[/math]. In practice we cannot train a continuous function so training the kernel will come down to either training a discretization or training the parameters of some parameterization of [math]k_A[/math].
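To make the discretization option concrete, here is a minimal sketch, not taken from the original text, that approximates the integral operator by a Riemann sum. The domain [math]M = N = [0,1][/math], the midpoint grid, the kernel [math]k_A(p,q) = p\,q[/math], the input [math]f \equiv 1[/math] and the helper name `apply_integral_operator` are all illustrative assumptions, chosen only because the exact result [math](Af)(q) = q/2[/math] is easy to verify.

```python
def apply_integral_operator(kernel, f, grid_in, grid_out, dp):
    """Approximate (A f)(q) = integral of k_A(p, q) f(p) dp by a Riemann sum."""
    return [sum(kernel(p, q) * f(p) for p in grid_in) * dp for q in grid_out]

# Assumed setup: M = N = [0, 1], sampled at the midpoints of n equal cells.
n = 200
dp = 1.0 / n
grid = [(i + 0.5) * dp for i in range(n)]

# Illustrative kernel and input with a known answer:
# k_A(p, q) = p * q and f(p) = 1 give (A f)(q) = q / 2.
kernel = lambda p, q: p * q
f = lambda p: 1.0

# Sampled values of A f on the output grid.
Af = apply_integral_operator(kernel, f, grid, grid, dp)
```

In a trainable setting the values `kernel(p, q)` on the grid, or the parameters of a parameterized kernel, would play the role that the matrix entries [math](A)_{ij}[/math] play in the discrete neuron.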
Now we still need to specify how we are going to integrate on a homogeneous space to make progress.
Integration
Integration on [math]\mathbb{R}^n[/math] has the desirable property that it is translation invariant: for all [math]\boldsymbol{y} \in \mathbb{R}^n[/math] and integrable functions [math]f:\mathbb{R}^n \to \mathbb{R}[/math] we have
[[math]]
\begin{equation}
\label{eq:translation_invariant_integral}
\int_{\mathbb{R}^n} f(\boldsymbol{x}-\boldsymbol{y}) \, d\boldsymbol{x}
=
\int_{\mathbb{R}^n} f(\boldsymbol{x}) \, d\boldsymbol{x}
.
\end{equation}
[[/math]]
Ideally we would want integration on a homogeneous space [math]M[/math] of a Lie group [math]G[/math] to behave similarly, namely for all [math]g \in G[/math] we would like:
[[math]]
\begin{equation}
\label{eq:group_invariant_integral}
\int_M \left( g \cdot f \right)(p) \, d\mu_M(p)
:=
\int_M f(g^{-1} \cdot p) \, d\mu_M(p)
=
\int_M f(p) \, d\mu_M(p),
\end{equation}
[[/math]]
for some Radon measure [math]\mu_M[/math] on [math]M[/math].
[Measures]
Recall that measures are the generalization of concepts such as length, volume, mass, probability etc. A measure assigns a non-negative real number to subsets of a space in such a way that it behaves similarly to the aforementioned concepts.
A Radon measure on a Hausdorff topological space is a measure that plays well with the topology of the space (defined for open and closed sets, finite on compact sets, etc.).
The Lebesgue measure is the translation invariant Radon measure on [math]\mathbb{R}^n[/math] and coincides with our less general notion of the length/area/volume of subsets of [math]\mathbb{R}^n[/math].
Integration on [math]\mathbb{R}^n[/math] such as in \ref{eq:translation_invariant_integral} implicitly uses the Lebesgue measure and so is translation invariant.
For a comprehensive introduction to measure theory see [1]. For the purpose of this course it is sufficient to think about a measure as measuring the volume of a subset.
This imposes a condition on the measure [math]\mu_M[/math], namely: for all measurable subsets [math]S[/math] of [math]M[/math] and [math]g \in G[/math] we require
[[math]]
\begin{equation}
\label{eq:invariant_measure}
\mu_M(g\cdot S)=\mu_M(S).
\end{equation}
[[/math]]
In other words we would need a (non-zero) group invariant measure to get the desired integral.
These G-invariant measures, or just invariant measures, do not always exist. In some cases we can still obtain a covariant measure, which is a measure that satisfies
[[math]]
\begin{equation}
\label{eq:covariant_measure}
\mu(g \cdot S)=\chi(g)\, \mu(S)
,
\end{equation}
[[/math]]
where [math]\chi:G \to \mathbb{R}^+[/math] is a character of [math]G[/math].
A multiplicative character or linear character or simply character of a Lie group [math]G[/math] is a continuous homomorphism from the group to the multiplicative group of positive real numbers, i.e. [math]\chi:G \to \mathbb{R}_{ \gt 0}[/math] so that:
[[math]]
\begin{equation*}
\chi(g_1 g_2) = \chi(g_1)\, \chi(g_2) \qquad \forall g_1,g_2 \in G.
\end{equation*}
[[/math]]
The function [math]\chi[/math] needs to be a character since by \ref{eq:covariant_measure} we have:
[[math]]
\begin{equation*}
\chi(g_1 g_2)\, \mu(S)
=
\mu(g_1 g_2 \cdot S)
=
\mu(g_1 \cdot (g_2 \cdot S))
=
\chi(g_1) \, \mu(g_2 \cdot S)
=
\chi(g_1) \, \chi(g_2) \, \mu(S)
,
\end{equation*}
[[/math]]
for all [math]g_1,g_2 \in G[/math] and all measurable [math]S \subset M[/math].
If we integrate with respect to a G-invariant measure we say we have a G-invariant integral, or just an invariant integral; if the measure is covariant with a character [math]\chi[/math] we say we have a [math]\chi[/math]-covariant integral, or just a covariant integral.
Let [math]M[/math] be a homogeneous space of a Lie group [math]G[/math]. We say the integral [math]\int_M \ldots dp[/math] (using some Radon measure on [math]M[/math]) is covariant with respect to [math]G[/math] if there exists a character [math]\chi_M[/math] of [math]G[/math] so that
[[math]]
\begin{equation*}
\int_M \left( g \cdot f \right) (p) \, dp
=
\chi_M(g) \, \int_M f(p) \, dp
\end{equation*}
[[/math]]
for all [math]g \in G[/math] and all [math]f:M \to \mathbb{R}[/math] for which the integral exists.
In the special case that [math]\chi_M \equiv 1[/math] we say the integral is invariant.
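A numerical sanity check of the defining equality may help here. The sketch below assumes a hypothetical example that is not from the text: [math]G[/math] consists of the maps [math]p \mapsto a p + b[/math] on [math]M=\mathbb{R}^2[/math] with [math]a \gt 0[/math], integration uses the Lebesgue measure, and the expected character is [math]\chi_M(a,b)=a^2[/math]. The integral is approximated by a midpoint rule on a truncated square, so the equality only holds up to discretization error.

```python
import math

def f(x, y):
    """Rapidly decaying test function on R^2 (an arbitrary choice)."""
    return math.exp(-(x * x + y * y)) * (1.0 + 0.5 * x)

def act_on_function(g, func):
    """(g . func)(p) = func(g^{-1} . p) for g = (a, (b1, b2)) with g . p = a p + b."""
    a, (b1, b2) = g
    return lambda x, y: func((x - b1) / a, (y - b2) / a)

def integrate(func, half_width=12.0, step=0.1):
    """Midpoint rule on the square [-half_width, half_width]^2."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * step
        for j in range(n):
            y = -half_width + (j + 0.5) * step
            total += func(x, y)
    return total * step * step

g = (1.5, (0.7, -0.4))
lhs = integrate(act_on_function(g, f))   # integral of g . f
rhs = g[0] ** 2 * integrate(f)           # chi_M(g) times the integral of f
gap = abs(lhs - rhs)
```

For this smooth, rapidly decaying integrand the two sides should agree to many digits; a scaling factor of [math]a=1[/math] recovers the translation invariant case.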
[Abuse of notation]
Integration is always with respect to some measure.
If we are integrating with respect to the measure [math]\mu[/math] then for the sake of completeness we should write
[[math]]
\begin{equation*}
\int_M \ldots\ d\mu(p)
.
\end{equation*}
[[/math]]
Since we only ever consider one measure per space that we integrate over, we abbreviate [math]dp \equiv d\mu(p)[/math] for the sake of brevity.
If the homogeneous space is [math]G[/math] itself then an invariant measure is called the (left) Haar measure on [math]G[/math] (named after the Hungarian mathematician Alfréd Haar). We can say the Haar measure since Haar measures are unique up to multiplication with a constant and always exist (see [2](Ch. 2.7)).
Hence when integrating on the group itself we can always have a Haar measure [math]\mu_G[/math] so that the following equality holds
[[math]]
\begin{equation}
\label{eq:invariant_group_integral}
\int_G \left( h \cdot f \right)(g) \, dg
=
\int_G f(g) \, dg
\qquad
\forall h \in G
,
\end{equation}
[[/math]]
where we abbreviated [math]dg := d\mu_G(g)[/math].
We also call this invariant integral on the group the (left) Haar integral.
Not all homogeneous spaces admit a covariant integral but those in which we are interested all do.
Going forward we will assume that all homogeneous spaces that we consider admit a covariant integral, so that we can always use the defining equality of the covariant integral.
Example
[[math]G=\iident{SE}(2)[/math] and [math]M=\mathbb{R}^2[/math]]
In the case we are most interested in, namely [math]G=\iident{SE}(2)[/math] and [math]M=\mathbb{R}^2[/math], we are fortunate that the Lebesgue measure on [math]\mathbb{R}^2[/math] is invariant with respect to [math]G[/math].
This is intuitively easy to understand: the area of a subset of [math]\mathbb{R}^2[/math] is invariant under both translation and rotation.
Example
[Haar measure on [math]\iident{SE}(2)[/math]]
The Haar measure on [math]\iident{SE}(2)[/math] also conveniently coincides with the Lebesgue measure on [math]\mathbb{R}^2 \times [0,2\pi)[/math] when using the parameterization [math](\boldsymbol{x},\theta)[/math] of [math]\iident{SE}(2)[/math] from the earlier example.
Indeed, let [math]g=(\boldsymbol{x}_1,\theta_1)[/math] and [math]h=(\boldsymbol{x}_2,\theta_2)[/math], then:
[[math]]
\begin{equation*}
\int_{\mathbb{R}^2} \int_{0}^{2\pi}
\left( (\boldsymbol{x}_1,\theta_1) \cdot f \right) (\boldsymbol{x}_2,\theta_2)
\, d\theta_2 \, d\boldsymbol{x}_2
=
\int_{\mathbb{R}^2} \int_{0}^{2\pi}
f \left( (\boldsymbol{x}_1,\theta_1)^{-1} (\boldsymbol{x}_2,\theta_2) \right)
\, d\theta_2 \, d\boldsymbol{x}_2
.
\end{equation*}
[[/math]]
When we change variables to [math](\boldsymbol{x}_3,\theta_3)=(\boldsymbol{x}_1,\theta_1)^{-1} (\boldsymbol{x}_2,\theta_2)[/math] we obtain the following Jacobian matrix:
[[math]]
\begin{equation*}
\frac{\partial(x_2^1,x_2^2,\theta_2)}{\partial(x_3^1,x_3^2,\theta_3)}
=
\begin{pmatrix}
\cos\theta_1 & -\sin\theta_1 & 0
\\
\sin\theta_1 & \cos\theta_1 & 0
\\
0 & 0 & 1
\end{pmatrix}
,
\end{equation*}
[[/math]]
which has determinant [math]1[/math].
Consequently, the Haar integral (up to a multiplicative constant) on [math]\iident{SE}(2)[/math] can be calculated as:
[[math]]
\begin{equation}
\int_{\iident{SE}(2)} f(g) \, dg
=
\int_{\mathbb{R}^2}
\int_0^{2\pi}
f(\boldsymbol{x},\theta)
\,
d\theta
\, d\boldsymbol{x}
.
\end{equation}
[[/math]]
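The left invariance of this integral can also be checked numerically. The sketch below assumes the usual group law [math](\boldsymbol{x}_1,\theta_1)(\boldsymbol{x}_2,\theta_2) = (\boldsymbol{x}_1 + R(\theta_1)\boldsymbol{x}_2,\ \theta_1+\theta_2)[/math] for the parameterization, an arbitrary test function, and a midpoint rule on a truncated domain; none of the names come from the text.

```python
import math

def f(x, y, theta):
    """Test function on SE(2): decays in (x, y) and is 2*pi-periodic in theta."""
    return math.exp(-(x * x + 2.0 * y * y)) * (2.0 + x * math.cos(theta))

def left_translate(g, func):
    """(g . func)(h) = func(g^{-1} h) for g = (x1, y1, theta1)."""
    x1, y1, t1 = g
    c, s = math.cos(t1), math.sin(t1)

    def shifted(x2, y2, t2):
        dx, dy = x2 - x1, y2 - y1
        # g^{-1} h = (R(-theta1) (x2 - x1), theta2 - theta1)
        return func(c * dx + s * dy, -s * dx + c * dy, t2 - t1)

    return shifted

def haar_integral(func, half_width=8.0, step=0.25, n_theta=16):
    """Lebesgue integral over [-half_width, half_width]^2 x [0, 2*pi), discretized."""
    n = int(round(2 * half_width / step))
    d_theta = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * step
        for j in range(n):
            y = -half_width + (j + 0.5) * step
            for k in range(n_theta):
                total += func(x, y, k * d_theta)
    return total * step * step * d_theta

g = (1.0, -0.5, 0.9)
original = haar_integral(f)
translated = haar_integral(left_translate(g, f))
gap = abs(original - translated)
```

The two values should agree up to discretization and truncation error, which reflects the unit Jacobian determinant computed above.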
Equivariant Linear Operators
Of course the objective of this chapter is building equivariant operators, so when is an integral operator \ref{eq:integral_operator} equivariant?
Equivariance means that
[[math]]
\begin{equation*}
A (g \cdot f) = g\cdot(A f)
\end{equation*}
[[/math]]
for all [math]g \in G[/math] and [math]f \in \iident{B}(M)[/math] or equivalently
[[math]]
\begin{equation}
\label{eq:equivariant_A_2}
g^{-1} \cdot A (g \cdot f) = A f
.
\end{equation}
[[/math]]
This extra condition on [math]A[/math] will naturally impose some restrictions on the kernel of the operator as the following lemma shows.
Let [math]M[/math] and [math]N[/math] be homogeneous spaces of a Lie group [math]G[/math] so that [math]M[/math] admits a covariant integral with character [math]\chi_M[/math].
Let [math]A[/math] be an integral operator \ref{eq:integral_operator} from [math]\iident{C}(M) \cap \iident{B}(M)[/math] to [math]\iident{C}(N) \cap \iident{B}(N)[/math] with a kernel [math]k_A \in \iident{C}(M \times N)[/math].
Then
[[math]]
\begin{equation*}
A(g\cdot f) = g \cdot (A f)
\end{equation*}
[[/math]]
for all
[math]g \in G[/math] and
[math]f \in \iident{C}(M) \cap \iident{B}(M)[/math] if and only if
[[math]]
\begin{equation}
\label{eq:equivariant_kernel_symmetry}
\chi_M(g) \, k_A(g \cdot p,g \cdot q)
=
k_A(p,q)
\end{equation}
[[/math]]
for all
[math]g \in G[/math],
[math]p \in M[/math] and
[math]q \in N[/math].
Moreover [math]A[/math] is bounded (and so continuous) in the supremum norm if
[[math]]
\begin{equation}
\label{eq:kernel_boundedness_requirement}
\sup_{q \in N} \int_{M} |k_A(p,q)| dp \lt \infty
.
\end{equation}
[[/math]]
Proof.
“[math]\Rightarrow[/math]”
Assuming [math]A[/math] to be equivariant, take an arbitrary [math]g \in G[/math] and [math]f \in \iident{C}(M) \cap \iident{B}(M)[/math] and substitute the definition of the group representation and [math]A[/math] in \ref{eq:equivariant_A_2} to find
[[math]]
\begin{equation}
\label{eq:ELO_1}
\int_M k_A(p,g \cdot q) \, f (g^{-1} \cdot p) \, dp
=
\int_M k_A(p,q) \, f(p) \,dp
\end{equation}
[[/math]]
for all
[math]q \in N[/math].
Fix [math]q \in N[/math] and let [math]F(p):=k_A(g \cdot p, g \cdot q) f(p)[/math] then observe that
[[math]]
\begin{equation*}
(g \cdot F)(p)
=
k_A(g \cdot g^{-1} \cdot p, g \cdot q) f(g^{-1} \cdot p)
=
k_A(p,g \cdot q) \, f (g^{-1} \cdot p)
,
\end{equation*}
[[/math]]
which is the left integrand from \ref{eq:ELO_1}.
Since we have assumed covariant integration, the defining equality of the covariant integral gives
[[math]]
\begin{equation*}
\int_M \left( g \cdot F \right) (p) \, dp
=
\chi_M(g) \, \int_M F(p) \, dp
.
\end{equation*}
[[/math]]
Applying this to \ref{eq:ELO_1} we find
[[math]]
\begin{equation}
\label{eq:ELO_2}
\chi_M(g)
\int_M k_A(g \cdot p,g \cdot q) \, f (p) \, dp
=
\int_M k_A(p,q) \, f(p) \,dp
.
\end{equation}
[[/math]]
Since [math]f[/math] was arbitrary and [math]p \mapsto k_A(p,q)[/math] is continuous it follows that
[[math]]
\begin{equation*}
\chi_M(g) \, k_A(g \cdot p,g \cdot q)
=
k_A(p,q)
\end{equation*}
[[/math]]
for all
[math]p \in M[/math].
“[math]\Leftarrow[/math]”
Assuming [math]\chi_M(g) \, k_A(g \cdot p,g \cdot q)=k_A(p,q)[/math] for all [math]g \in G[/math], [math]p \in M[/math] and [math]q \in N[/math], then \ref{eq:ELO_2} follows for any choice of [math]f \in \iident{C}(M) \cap \iident{B}(M)[/math], [math]g \in G[/math] and [math]q \in N[/math].
Applying the covariant integral in the other direction yields \ref{eq:ELO_1}, which implies \ref{eq:equivariant_A_2} since [math]q \in N[/math] is arbitrary.
The function [math]f[/math] and group element [math]g[/math] were also chosen arbitrarily, so \ref{eq:equivariant_A_2} holds for all [math]f \in \iident{C}(M) \cap \iident{B}(M)[/math] and [math]g \in G[/math].
Boundedness of [math]A[/math] follows from
[[math]]
\begin{align*}
\left\Vert A f \right\Vert_{\infty}
&=
\sup_{q \in N}
\left|
\int_M k_A(p,q) \, f(p) dp
\right|
\\
&\leq
\sup_{q \in N}
\int_M
| k_A(p,q) | \, |f(p)| dp
\\
&\leq
\left\Vert f \right\Vert_\infty
\cdot\
\sup_{q \in N}
\int_M | k_A(p,q) | dp
\\
&
\overset{\ref{eq:kernel_boundedness_requirement}}{ \lt } \infty
.
\end{align*}
[[/math]]
■
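To see the lemma at work with a non-trivial character, consider the following sketch. It assumes a hypothetical example that does not appear in the text: [math]G=\mathbb{R}_{\gt 0}[/math] acts on [math]M=N=(0,\infty)[/math] by multiplication, the Lebesgue measure gives [math]\chi_M(a)=a[/math], and the kernel is [math]k_A(p,q)=\varphi(p/q)/q[/math] for an arbitrary profile [math]\varphi[/math]. The integrals are approximated with a midpoint rule on a truncated interval.

```python
import math

def phi(t):
    """Arbitrary integrable profile on (0, inf)."""
    return t * math.exp(-t * t)

def k(p, q):
    """Assumed kernel on M x N; it satisfies a * k(a p, a q) = k(p, q)."""
    return phi(p / q) / q

def f(p):
    """Bounded continuous test function on (0, inf)."""
    return p * math.exp(-p)

def apply_A(func, q, upper=20.0, step=0.001):
    """(A func)(q) = integral over p of k(p, q) func(p), midpoint rule on (0, upper)."""
    n = int(round(upper / step))
    return step * sum(k((i + 0.5) * step, q) * func((i + 0.5) * step) for i in range(n))

a, q = 1.7, 0.8

def g_dot_f(p):
    """(g . f)(p) = f(g^{-1} . p) for the dilation g = a."""
    return f(p / a)

lhs = apply_A(g_dot_f, q)    # (A (g . f))(q)
rhs = apply_A(f, q / a)      # (g . (A f))(q) = (A f)(g^{-1} . q)
equivariance_gap = abs(lhs - rhs)

# Pointwise check of chi_M(g) k_A(g . p, g . q) = k_A(p, q) at a sample point
symmetry_gap = abs(a * k(a * 0.3, a * q) - k(0.3, q))
```

Dropping the factor [math]1/q[/math] from the kernel breaks the symmetry condition, and the two sides then differ by the factor [math]\chi_M(a)[/math].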
The condition on the kernel \ref{eq:kernel_boundedness_requirement} is partially redundant with the symmetry requirement as the following lemma shows.
In the same setting as the preceding lemma, if the kernel [math]k_A \in \iident{C}(M \times N)[/math] satisfies the symmetry \ref{eq:equivariant_kernel_symmetry} and condition \ref{eq:kernel_boundedness_requirement} then
[[math]]
\begin{equation*}
\left\Vert k_A(\ \cdot\ ,q_1) \right\Vert_{L^1(M)}
=
\left\Vert k_A(\ \cdot\ ,q_2) \right\Vert_{L^1(M)}
\end{equation*}
[[/math]]
for all
[math]q_1,q_2 \in N[/math].
Proof.
Since [math]N[/math] is a homogeneous space then for all [math]q_1,q_2 \in N[/math] there exists a [math]g \in G[/math] so that [math]q_1 = g \cdot q_2[/math], then
[[math]]
\begin{align*}
\int_M \left| k_A(p,q_1) \right| dp
&=
\int_M \left| k_A(p,g \cdot q_2) \right| dp
\\
&=
\int_M \left| k_A(g \cdot g^{-1} \cdot p, g \cdot q_2) \right| dp
\\
{\scriptsize \ref{eq:equivariant_kernel_symmetry}}
&=
\frac{1}{\chi_M(g)}
\int_M \left| k_A(g^{-1} \cdot p, q_2) \right| dp
\\
\text{\scriptsize (covariant integral)}
&=
\frac{\chi_M(g)}{\chi_M(g)}
\int_M \left| k_A(p, q_2) \right| dp
\\
&=
\int_M \left| k_A(p,q_2) \right| dp
.
\end{align*}
[[/math]]
■
The symmetry condition \ref{eq:equivariant_kernel_symmetry} on the kernel can be exploited to express it as a function on [math]M[/math] instead of [math]M \times N[/math].
If we fix a [math]q_0 \in N[/math] and for all [math]q \in N[/math] we choose a [math]g_q \in G_{q_0,q}[/math] (i.e. so that [math]g_q \cdot q_0=q[/math]) then by \ref{eq:equivariant_kernel_symmetry} we have
[[math]]
\begin{equation*}
k_A(p,q)
=
\chi_M(g_q^{-1}) \ k_A(g_q^{-1} \cdot p, g_q^{-1} \cdot q)
=
\chi_M(g_q^{-1}) \ k_A(g_q^{-1} \cdot p, q_0)
,
\end{equation*}
[[/math]]
which fixes the second input of [math]k_A[/math].
Consequently we could contain all the information of our kernel in a function that exists only on [math]M[/math] as [math]\kappa_A(p) := k_A(p,q_0)[/math].
This reduced kernel [math]\kappa_A[/math] still has some restrictions placed on it for the resulting operator to be equivariant, as the following theorem makes precise.
Let [math]M[/math] and [math]N[/math] be homogeneous spaces of a Lie group [math]G[/math] so that [math]M[/math] admits a covariant integral with respect to a character [math]\chi_M[/math] of [math]G[/math].
Fix a [math]q_0 \in N[/math] and let [math]\kappa_A \in \iident{C}(M) \cap \iident{L}^1(M)[/math] be compatible, i.e. have the property that
[[math]]
\begin{equation}
\label{eq:kernel_compatibility}
\forall h \in G_{q_0}
:
h \cdot \kappa_A = \chi_M (h) \, \kappa_A.
\end{equation}
[[/math]]
Then the operator [math]A[/math] defined by
[[math]]
\begin{equation*}
(Af)(q) := \frac{1}{\chi_M(g_q)} \int_M (g_q \cdot \kappa_A) (p) \, f(p) \, dp
\end{equation*}
[[/math]]
where for all [math]q \in N[/math] we can choose any [math]g_q[/math] so that [math]g_q \cdot q_0 = q[/math], is a well defined bounded linear operator from [math]\iident{C}(M) \cap \iident{B}(M)[/math] to [math]\iident{C}(N) \cap \iident{B}(N)[/math] that is equivariant with respect to [math]G[/math].
Conversely every equivariant integral operator with a kernel [math]k_A \in \iident{C}(M\times N)[/math] and with [math]k_A(\,\cdot\,,q) \in \iident{L}^1(M)[/math] for some [math]q \in N[/math] is of this form.
Proof.
“[math]\Rightarrow[/math]”
Assume we have a [math]\kappa_A \in \iident{C}(M) \cap \iident{L}^1(M)[/math] that satisfies \ref{eq:kernel_compatibility}.
Define [math]k_A \in \iident{C}(M \times N)[/math] by
[[math]]
\begin{equation*}
k_A(p,q) := \frac{1}{\chi_M(g_q)} (g_q \cdot \kappa_A)(p).
\end{equation*}
[[/math]]
Then [math]k_A[/math] is well defined since it does not depend on the choice of [math]g_q[/math] for a given [math]q \in N[/math].
If [math]g_q'[/math] is another group element with [math]g_q' \cdot q_0 = q[/math] then there exists an [math]h \in G_{q_0}[/math] so that [math]g_q' = g_q h[/math]. Using the compatibility condition \ref{eq:kernel_compatibility} we can check that [math]k_A[/math] is invariant under the choice of [math]h \in G_{q_0}[/math]:
[[math]]
\begin{align*}
\frac{1}{\chi_M(g_q h)} (g_q \cdot h \cdot \kappa_A)(p)
=
\frac{\chi_M(h)}{\chi_M(g_q) \chi_M(h)} (g_q \cdot \kappa_A)(p)
=
\frac{1}{\chi_M(g_q)} (g_q \cdot \kappa_A)(p)
.
\end{align*}
[[/math]]
The kernel [math]k_A[/math] also satisfies the symmetry requirement \ref{eq:equivariant_kernel_symmetry}, where we use that [math]g \, g_q[/math] is a valid choice for [math]g_{(g \cdot q)}[/math]:
[[math]]
\begin{align*}
\chi_M(g) \, k_A(g \cdot p,g \cdot q)
&=
\chi_M(g) \, \frac{1}{\chi_M(g_{(g\cdot q)})} (g_{(g\cdot q)} \cdot \kappa_A)(g \cdot p)
\\
&=
\chi_M(g) \, \frac{1}{\chi_M(g g_{q})} (g \cdot g_{q} \cdot \kappa_A)(g \cdot p)
\\
&=
\frac{\chi_M(g)}{\chi_M(g)\chi_M(g_{q})} (g_{q} \cdot \kappa_A)(g^{-1}g \cdot p)
\\
&=
\frac{1}{\chi_M(g_q)} (g_q \cdot \kappa_A)(p)
\\
&=
k_A(p,q)
.
\end{align*}
[[/math]]
By the preceding lemma on the [math]L^1[/math] norm of the kernel we have
[[math]]
\begin{equation*}
\sup_{q \in N} \int_{M} |k_A(p,q)| dp
=
\left\Vert k_A(\,\cdot\,,q_0) \right\Vert_{L^1(M)}
=
\left\Vert \kappa_A \right\Vert_{L^1(M)}
\lt \infty
.
\end{equation*}
[[/math]]
Consequently, [math]A[/math] also satisfies \ref{eq:kernel_boundedness_requirement} and is a bounded equivariant linear operator by the lemma that characterizes equivariance through \ref{eq:equivariant_kernel_symmetry}.
“[math]\Leftarrow[/math]”
Assuming we have an equivariant linear operator [math]A[/math] with kernel [math]k_A \in \iident{C}(M \times N)[/math], we pick a fixed [math]q_0 \in N[/math] and define [math]\kappa_A \in \iident{C}(M)[/math] by
[[math]]
\begin{equation*}
\kappa_A(p) := k_A(p,q_0)
.
\end{equation*}
[[/math]]
This reduced kernel [math]\kappa_A[/math] satisfies the compatibility condition \ref{eq:kernel_compatibility} since if [math]h \in G_{q_0}[/math] then
[[math]]
\begin{align*}
(h \cdot \kappa_A)(p)
&=
k_A(h^{-1} \cdot p, q_0)
\\
&=
k_A(h^{-1} \cdot p, h^{-1} \cdot q_0)
\\
&=
\chi_M(h) \, k_A(p, q_0)
\\
&=
\chi_M(h) \, \kappa_A(p)
.
\end{align*}
[[/math]]
Since we required [math]k_A(\,\cdot\,,q) \in \iident{L}^1(M)[/math] for some [math]q \in N[/math], we apply the lemma on the [math]L^1[/math] norm of the kernel to find
[[math]]
\begin{equation*}
\left\Vert \kappa_A \right\Vert_{L^1(M)}
=
\left\Vert k_A(\ \cdot\ ,q_0) \right\Vert_{L^1(M)}
=
\left\Vert k_A(\ \cdot\ ,q) \right\Vert_{L^1(M)}
\lt
\infty.
\end{equation*}
[[/math]]
■
The theorem above is at the core of group equivariant CNNs since it allows us to generalize the familiar convolution operation present in CNNs to general linear operators that are equivariant with respect to a group of choice.
Example
[Group convolution]
Let [math]G=M=N[/math] be some Lie group.
A Lie group always admits a Haar integral, so we have a trivial character [math]\chi=1[/math].
As reference element we obviously choose the unit element [math]e[/math], though any group element would do.
Then [math]G_g = \{ e \}[/math] and [math]G_{e,g}=\{ g \}[/math] are both trivial.
Hence we have no symmetry condition on the kernel.
Any [math]\kappa_A \in C(G) \cap L^1(G)[/math] defines a linear operator [math]A: C(G) \cap B(G) \to C(G) \cap B(G)[/math] by
[[math]]
\begin{equation*}
(Af)(h)
=
\int_G (h \cdot \kappa_A)(g) \, f(g) \, dg
=
\int_G \kappa_A (h^{-1} g) \, f(g) \, dg
.
\end{equation*}
[[/math]]
We also call this operation group cross-correlation and denote it as
[[math]]
\begin{equation*}
(\kappa \star_G f)(h)
:=
\int_G (h \cdot \kappa)(g) \, f(g) \, dg
.
\end{equation*}
[[/math]]
As in the familiar [math]\mathbb{R}^n[/math] setting, group cross-correlation is closely related to group convolution, which is defined as
[[math]]
\begin{equation*}
(\check{\kappa} *_G f)(h)
:=
\int_G \check{\kappa} (g^{-1} h) \, f(g) \, dg
.
\end{equation*}
[[/math]]
We leave relating the two kernels [math]\kappa[/math] and [math]\check{\kappa}[/math] as an exercise: when is [math]\kappa \star_G f = \check{\kappa} *_G f[/math]?
As in the [math]\mathbb{R}^n[/math] case, when we talk about group convolution we mean both group cross-correlation and group convolution since they are interchangeable.
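The equivariance of group cross-correlation can be verified exactly on a finite group, where the counting measure plays the role of the Haar measure and the integral becomes a sum. The sketch below uses the permutation group [math]S_3[/math] as a discrete stand-in for a Lie group; the kernel and signal values are arbitrary and all names are illustrative.

```python
from itertools import permutations

# Elements of S_3 as tuples: the permutation s maps i to s[i].
G = list(permutations(range(3)))

def mul(s, t):
    """Group product (s t)(i) = s(t(i))."""
    return tuple(s[t[i]] for i in range(3))

def inv(s):
    """Inverse permutation."""
    out = [0] * 3
    for i, si in enumerate(s):
        out[si] = i
    return tuple(out)

def act(u, func):
    """(u . func)(g) = func(u^{-1} g); functions on G are stored as dicts."""
    return {g: func[mul(inv(u), g)] for g in G}

def cross_correlate(kappa, func):
    """(kappa star f)(h) = sum over g of kappa(h^{-1} g) f(g)."""
    return {h: sum(kappa[mul(inv(h), g)] * func[g] for g in G) for h in G}

kappa = {g: n * n - 3 for n, g in enumerate(G)}   # arbitrary integer kernel
f = {g: 2 * n + 1 for n, g in enumerate(G)}       # arbitrary integer signal

# A (u . f) = u . (A f) for every group element u, checked with exact integers
equivariant = all(
    cross_correlate(kappa, act(u, f)) == act(u, cross_correlate(kappa, f))
    for u in G
)
```

Since the stabilizer of the reference element is trivial, no constraint was placed on `kappa`, in line with the absence of a symmetry condition noted above.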
Example
[Rotation-translation equivariance in [math]\mathbb{R}^2[/math]]
Let [math]G=\iident{SE}(2)= \mathbb{R}^2 \rtimes \iident{SO}(2)[/math] and [math]M=N=\mathbb{R}^2[/math].
The Lebesgue measure on [math]\mathbb{R}^2[/math] is rotation-translation invariant so we have a G-invariant integral on [math]\mathbb{R}^2[/math].
Choose [math]\boldsymbol{y}_0=\boldsymbol{0}[/math] as the reference element then [math]G_{\boldsymbol{y}_0}=\left\{ (\boldsymbol{0},\, R(\theta)) \in G \ \middle\vert\ \theta \in [0,2\pi) \right\}[/math] is the stabilizer of [math]\boldsymbol{y}_0[/math].
A kernel [math]\kappa_A[/math] on [math]\mathbb{R}^2[/math] is then compatible if
[[math]]
\begin{equation*}
(\boldsymbol{0},\, R(\theta)) \cdot \kappa_A = \kappa_A
\qquad
\forall \theta \in [0,2\pi)
,
\end{equation*}
[[/math]]
i.e. [math]\kappa_A[/math] needs to be radially symmetric.
Now, we could have figured that out without building up the whole equivariance framework.
But the next section will show how we can use the equivariance framework to step over the severe restriction that is imposed on the allowable kernels here.
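The effect of the radial symmetry condition can be observed on a pixel grid. The following sketch is a discrete approximation under stated assumptions: images are small square arrays, only rotations by 90 degrees about the image centre are tested (these map the grid to itself), and cross-correlation uses zero padding. The kernels and the image are arbitrary choices for illustration.

```python
def rot90(img):
    """Rotate a square image (list of rows) by 90 degrees about its centre."""
    n = len(img)
    return [[img[j][n - 1 - i] for j in range(n)] for i in range(n)]

def correlate(kernel, img):
    """Zero-padded cross-correlation with a centred (2r+1) x (2r+1) kernel."""
    n, r = len(img), len(kernel) // 2
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < n and 0 <= jj < n:
                        out[i][j] += kernel[di + r][dj + r] * img[ii][jj]
    return out

# Kernel that depends only on the squared distance to the centre
radial = [[4 - (di * di + dj * dj) for dj in (-1, 0, 1)] for di in (-1, 0, 1)]
# Kernel without any rotational symmetry
skewed = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
image = [[(3 * i + 5 * j * j) % 7 for j in range(5)] for i in range(5)]

# Rotating then correlating versus correlating then rotating
radial_ok = correlate(radial, rot90(image)) == rot90(correlate(radial, image))
skewed_ok = correlate(skewed, rot90(image)) == rot90(correlate(skewed, image))
```

Here `radial_ok` holds while `skewed_ok` fails, which illustrates how restrictive the compatibility condition is for kernels on [math]\mathbb{R}^2[/math].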
General references
Smets, Bart M. N. (2024). "Mathematics of Neural Networks". arXiv:2403.04807 [cs.LG].
References
- [1] Entry missing in the source (citation key: tao2011introduction).
- [2] Entry missing in the source (citation key: federer2014geometric).