Tropical Operators
We previously generalized the idea of a convolution to that of equivariant linear operators in a Lie group setting, essentially generalizing the domain of our functions. But we can go further by looking at the codomain and generalizing what we mean by `linear'. Linear and non-linear are generally thought of as an absolute dichotomy. Another way of thinking about a linear function or operator is as a map that preserves the algebraic properties of the field of real numbers. But could we not look at maps that preserve the structure of other algebraic structures rather than the field of real numbers? We could in principle develop this idea using any algebraic structure, but to be useful in the context of neural networks we want the underlying set to be numeric and the operations to be able to be performed by a computer. Additionally, restricting ourselves to fields would be very limiting, a more congenial algebraic structure is the semiring. Semirings turn up in many places, indeed the first mathematical structure we learn about, the set of natural numbers [math]\mathbb{N}[/math], is a semiring. Semirings arise in many areas of mathematics such as functional analysis, topology, graphs, combinatorics, optimization, etc. See [1] for an extensive survey of the applications of semirings.
Semirings
A semiring is an algebraic structure in which we can add and multiply elements, but in which neither subtraction nor division are necessarily possible.
A semiring is a set [math]R[/math] equipped with two binary operations [math]\oplus[/math] and [math]\odot[/math], called addition and multiplication, such that
- addition and multiplication are associative,
- addition is commutative,
- addition has an identity element [math]\mathbb{0}[/math],
- multiplication has an identity element [math]\mathbb{1}[/math],
- multiplication distributes over addition:
[[math]] \begin{equation*} \begin{split} a \odot (b \oplus c) = a \odot b \oplus a \odot c, \\ (a \oplus b) \odot c = a \odot c \oplus b \odot c, \end{split} \end{equation*} [[/math]]
- multiplication by [math]\mathbb{0}[/math] annihilates:
[[math]] \begin{equation*} \mathbb{0} \odot a = a \odot \mathbb{0} = \mathbb{0}. \end{equation*} [[/math]]
Additionally if [math]a \oplus a = a[/math] we say the semiring is idempotent and if [math]a \odot b = b \odot a[/math] we say the semiring is commutative or abelian.
Just like with standard multiplication it is conventional to abbreviate [math]a b \equiv a \odot b[/math] if there can be no confusion. We also let [math]\odot[/math] take precedence over [math]\oplus[/math], i.e. [math]a \odot b \oplus c = (a \odot b) \oplus c[/math].
Some works refer to semirings as rigs, from rings `without negatives', hence the missing `n'.
Example [Real linear semiring]
The real numbers form a commutative semiring [math](\mathbb{R},+, \,\cdot\,)[/math] under standard addition and multiplication. Naturally all fields and rings are also semirings.
Example [Viterbi semiring]
The Viterbi semiring is given by the unit interval [math][0,1][/math] and the operations
The additive unit is [math]0[/math] and multiplicative unit is [math]1[/math]. This semiring is both idempotent and commutative. Note that there exists no element [math](-a)[/math] so that [math]a \oplus (-a) = \mathbb{0}=0[/math] and no element [math]a^{-1}[/math] so that [math]a \odot a^{-1} = \mathbb{1}=1[/math]. So neither subtraction nor division is possible in this semiring.
Example [Log semiring]
The log semiring is given by the set [math]\mathbb{R} \cup \{ -\infty,\,+\infty\}[/math] and the operations
The additive unit is [math]-\infty[/math] and the multiplicative unit is [math]0[/math].
The central object of study in linear algebra is the vector space over a field (usually [math]\mathbb{R}[/math] or [math]\mathbb{C}[/math]). The analogue to the vector space in our generalized setting is the semimodule.
Let [math](R,\oplus,\odot)[/math] be a semiring. A left [math]R[/math]-semimodule or semimodule over [math]R[/math] is a commutative monoid [math](M,+)[/math] with additive identity [math]0_M[/math] and a map [math]R \times M \to M[/math] denoted by [math](a,m) \mapsto a m[/math], called scalar multiplication, so that for all [math]a,b \in R[/math] and [math]m,m' \in M[/math] the following hold:
- [math](a \odot b) m = a (b m)[/math],
- [math]a(m+m')=a m + a m'[/math],
- [math](a \oplus b)m=a m + b m[/math],
- [math]\mathbb{1} m = m[/math],
- [math]a 0_M = 0_M = \mathbb{0} m[/math].
Naturally every semiring is a semimodule over itself just like every field is a vector space over itself.
Example
Let [math]R[/math] be a semiring and [math]S[/math] a non-empty set, then [math]R^S[/math] (the set of all functions [math]S \to R[/math]) is a semimodule with addition and scalar multiplication defined element-wise: let [math]f,f' \in R^S[/math] and [math]r \in R[/math] then
for all [math]s \in S[/math]. Here we used [math]\oplus[/math] and [math]\odot[/math] for the operations in the semimodule as well since they correspond with the [math]\oplus[/math] and [math]\odot[/math] operations in the semiring. The additive identity of the semimodule is the constant function [math]s \mapsto \mathbb{0}[/math]. This generalizes the prototypical vector space [math]\mathbb{R}^n[/math] to the semimodule [math]R^n[/math] using the notational convention [math]n\equiv\{1,\ldots,n\}[/math]. For non-finite [math]S[/math] we get the generalization of function vector spaces to function semimodules.
Example
Let [math]R=([0,1],\max,\,\cdot\,)[/math] be the Viterbi semiring from Example example and let [math]S[/math] be a manifold. Let [math]f,f': S \to [0,1][/math] and define addition and scalar multiplication as
then the functions from [math]S[/math] to [math][0,1][/math] form a semimodule.
Now we can formulate a generalization of linear maps in the form of semiring homomorphisms.
Let [math]R[/math] be a semiring and let [math]X[/math] and [math]Y[/math] be semimodules over [math]R[/math]. Then a map [math]A:X \to Y[/math] is an [math]R[/math]-homomorphic map or an [math]R[/math]-homomorphism if for all [math]a,b \in R[/math] and [math]f,f' \in X[/math]:
Just like with the definition of linearity the single condition above is equivalent to the following two conditions:
Under this definition we can understand linear as meaning homomorphic with respect to the real linear semigroup [math](\mathbb{R},+,\,\cdot\,)[/math].
But now, instead of just considering linear maps we can pick another semigroup and consider homomorphisms with respect to this semigroup.
This allows us to construct equivariant semimodule homomorphic operators in the same fashion as we did with equivariant linear operators in section Equivariant Linear Operators.
We will develop such a class of equivariant operators for a particular choice of semiring.
Tropical Semiring
The max tropical semiring or max-plus algebra or simply tropical semiring is the semiring [math]\mathbb{R}_{\max}:=\left(\mathbb{R} \cup \{-\infty\} ,\oplus,\odot \right)[/math], with the operations
Since [math]a \oplus a = \max \{ a, a \} = a[/math] and [math]a \odot b = a+b=b+a = b \odot a[/math] the tropical semiring is both idempotent and commutative. By definition we set [math]a + (-\infty) := -\infty[/math] so that we satisfy the annihilation property [math]a \odot \mathbb{0} = \mathbb{0} \odot a = \mathbb{0}[/math].
The tropical semiring can alternatively be defined as [math](\mathbb{R} \cup \{ +\infty \}, \min, +)[/math], which is then also called the min tropical semiring or min-plus algebra. But we observe that one is isomorphic to the other via negation so we make a style choice and go with the max version.
Example [ReLU]
Recall that the rectified linear unit is defined as [math]x \mapsto \{ x,\, 0\}[/math] for [math]x \in \mathbb{R}[/math]. In the tropical setting we can write this as [math]x \mapsto x \oplus 0[/math] for [math]x \in \mathbb{R}_{\max}[/math], hence the ReLU may not be a linear or affine function but it is tropically affine. So we can think about a typical ReLU neural network as really alternating operations from two distinct semirings.
We based our construction of equivariant linear operators on integration. When you consider an integral as a limit of a sum of products we can see how we could define a type of integration with respect to another semigroup. Recall that Riemannian integration is defined in terms of Darboux sums over ever smaller partitions of the underlying space. Let [math]M[/math] be a manifold with some Radon measure [math]\mu[/math] and [math](R,\oplus,\odot)[/math] a semiring. Let [math]P[/math] be a partition of [math]M[/math] and define [math]\mu(P) := \sup_{S \in P} \mu(S)[/math]. Then we can generalize a Darboux sum

where [math]p_S \in M[/math] is any arbitrary point in the partition element [math]S[/math] and assuming that for all [math]\varepsilon \gt 0[/math] we can find a partition [math]P[/math] so that [math]\mu(P) \lt \varepsilon[/math]. Filling in the tropical semiring operations we obtain:
([math]\mu(S) \to 0[/math] since [math]\mu(P) \to 0[/math])
(Assuming [math]f[/math] is such that the limit exists and is unique)
Lebesgue integration can be generalized similarly. This would yield the same [math]\sup_{p \in M} f(p)[/math] formula but for all measurable functions that are bounded from above rather then all the functions for which the Darboux sum has a unique limit. We will not detail the Lebesgue construction (see [2] for that) but proceed with the [math]\sup_{p \in M} f(p)[/math] formula as our definition of tropical integral.
Classic example of a function that is Lebesgue integrable but not Riemann integrable is the indicator function of the rational numbers [math]\mathbb{1}_{\mathbb{Q}}[/math]. The limit \ref{eq:lim_max_f} does not exist depending on our choice of points [math]p_S[/math] but in the Lebesgue sense we simply get [math]\sup_{x \in \mathbb{R}} \mathbb{1}_{\mathbb{Q}}(x)=1[/math] as expected.
Let [math]M[/math] be a manifold, then we define the set of measurable [math]\mathbb{R}_{\max}[/math]-valued functions that are bounded from above as
Clearly BA([math]M[/math]) is a tropical semimodule (i.e. a semimodule with respect to the tropical semiring) under pointwise addition and multiplication:
for all [math]a \in \mathbb{R}_{\max}[/math], [math]f,f' \in \iident{BA}(M)[/math] and all [math]p \in M[/math]. The functions in BA([math]M[/math]) are exactly those for which the tropical integral exists.
Let [math]M[/math] be a manifold then we call the mapping [math]\iident{BA}(M) \to \mathbb{R}_{\max}[/math] defined by
We can easily check that the tropical integral is a tropical map (i.e. a semimodule homomorphism with respect to the tropical semiring) from [math]\iident{BA}(M)[/math] to [math]\mathbb{R}_{\max}[/math] in the same way that the (linear) integral is a linear map from the integrable functions to [math]\mathbb{R}[/math]. For all [math]a,b \in \mathbb{R}_{\max}[/math] and [math]f,f' \in \iident{BA}(M)[/math] we have:
Let [math]M[/math] be a homogeneous space of the Lie group [math]G[/math], let `[math]\cdot[/math]' denote both the action of [math]G[/math] on [math]M[/math] and the corresponding representation on functions on [math]M[/math]. Since this representation does not affect the codomain of the functions it does not change the supremum, hence the tropical integral is always invariant:
So now we have developed an alternative notion of linearity and an alternative to the (linear) invariant integral. We will use these elements to construct a new type of equivariant operator in the same manner as we did before with linear operators.
Equivariant Tropical Operators
The starting point for our equivariant linear operators was the integral operator from equation. We obtain the tropical analogue simply by replacing the linear semiring operations:
We can see that if [math]k_T \in \iident{BA}(M \times N)[/math] and [math]f \in \iident{BA}(M)[/math] then [math]Tf \in \iident{BA}(N)[/math]. It is straightforward to verify that [math]T[/math] is a tropical operator from [math]\iident{BA}(N)[/math] to [math]\iident{BA}(N)[/math]. Now we can proceed in the exact same way as in section Equivariant Linear Operators to find out when [math]T[/math] is equivariant.
Let [math]M[/math] and [math]N[/math] be homogeneous spaces of a Lie group [math]G[/math]. Let [math]T[/math] be a tropical integral operator from [math]\iident{BA}(M)[/math] to [math]\iident{BA}(N)[/math] with a kernel [math]k_T \in \iident{BA}(M \times N)[/math]. Then
First we show that [math]T f \in \iident{BA}(N)[/math]:
“[math]\Rightarrow[/math]” Assuming [math]T[/math] to be equivariant, take an arbitrary [math]g \in G[/math] and [math]f \in \iident{BA}(M)[/math] and substitute the definition of the group representation and [math]T[/math] in
“[math]\Leftarrow[/math]”
Assuming [math]k_T(g \cdot p,g \cdot q)=k_T(p,q)[/math] for all [math]g \in G[/math], [math]p \in M[/math] and [math]q \in N[/math] then \ref{eq:TLO_2} follows for any choice of [math]f \in \iident{BA}(M)[/math], [math]g \in G[/math] and [math]q \in N[/math]. Substituting the invariant tropical integral the other way yields \ref{eq:TLO_1}, which implies the equivariance
Now we can use the same strategy as we did in the linear case and use the symmetry \ref{eq:tropical_equivariant_kernel_symmetry} to fix the second argument of the kernel [math]k_T[/math]. Pick a [math]q_0 \in N[/math] then for all [math]q \in N[/math] there exists at least one [math]g_q \in G_{q_0}[/math], consequently
for all [math]p \in M[/math] and [math]q \in N[/math]. From which we can define a reduced kernel [math]\kappa_T \in \iident{BA}(M)[/math] as
that still allows us to recover the full kernel by
This whole construction in the end gives us the same type of operator as in Theorem theorem but with the linear operations switched for tropical operations:

We summarize this result in the following theorem.
Let [math]M[/math] and [math]N[/math] be homogeneous spaces of a Lie group [math]G[/math]. Fix a [math]q_0 \in N[/math] and let [math]\kappa_T \in \iident{BA}(M)[/math] be compatible, i.e
Then the operator [math]T[/math] defined by
Conversely every tropical integral operator \ref{eq:tropical_integral_operator} with a kernel in [math]\iident{BA}(M \times N)[/math] that is equivariant is of this form.
“[math]\Rightarrow[/math]”
Assume we have a [math]\kappa_T \in \iident{BA}(M)[/math] that is compatible. Define
“[math]\Leftarrow[/math]”
Assume we have a tropical integral operator [math]T[/math], then by Lemma lemma we have a kernel [math]k_T \in \iident{BA}(M \times N)[/math] that satisfies \ref{eq:tropical_equivariant_kernel_symmetry}. Fix a [math]q_0 \in N[/math] and define
Example [Max pooling] Let [math]G=(\mathbb{R}^n,+)[/math] be the translation group and [math]M=N=\mathbb{R}^n[/math]. Pick a subset [math]S \subset \mathbb{R}^n[/math] and define
Then the corresponding operator [math]T[/math] equals
so at each point [math]\boldsymbol{y}[/math] the output equals the supremum of the input in the subset [math]\boldsymbol{y}+S[/math]. Think of this a continous version the shift-invariant max pooling operation usually seen in CNNs. In this light we can see that max pooling is a tropical operator.
Example [Tropical convolution] Let [math]G=M=N[/math] be some Lie group, then we call the equivariant tropical operation tropical convolution or morphological convolution, which we denote as
The name morphological convolution comes from the field of grayscale morphology where these types of operators have previously been used for image processing applications, see [3]. An example use of these types of operations in neural networks can be found in [4].
Example [Pointwise ReLU] Let [math]G=M=N[/math] be some Lie group and let [math]f \in \iident{BA}(M)[/math]. Define a kernel
Then applying the corresponding operator to [math]f[/math] gives
We have not said anything about the boundedness of [math]T[/math] in Theorem theorem since it is not clear what metric or norm we should consider on the function space BA.
In the following lemma we detail some reasonable conditions under which [math]T[/math] will be bounded in the supremum norm.
In the setting of Theorem theorem: if [math]f \in \iident{B}(M)[/math] and [math]\sup_{p \in M} \kappa_T(p)=0[/math] then [math]Tf \in \iident{B}(N)[/math] and [math]T[/math] is a bounded operator from [math]\iident{B}(M)[/math] to [math]\iident{B}(N)[/math] in the supremum norm.
Show ProofWe have seen how we can use semirings, and the tropical semiring in particular, to develop classes of equivariant operators other than linear. From the examples we saw we can conclude that many operators currently in use in neural networks (particularly the ReLU and max pooling activation functions) are special cases of tropical or tropically affine operators and fit well inside the equivariant semimodule homomorphism framework.
General references
Smets, Bart M. N. (2024). "Mathematics of Neural Networks". arXiv:2403.04807 [cs.LG].
References
- Cite error: Invalid
<ref>
tag; no text was provided for refs namedgolan1999semirings
- Cite error: Invalid
<ref>
tag; no text was provided for refs namedkolokoltsov1997idempotent
- Cite error: Invalid
<ref>
tag; no text was provided for refs namedwikipedia2021morphology
- Cite error: Invalid
<ref>
tag; no text was provided for refs namedsmets2021pdebased