Norm maximizers



3a. Critical points

We have seen in the previous chapter that the set [math]Y_N=M_N(\pm1)\cap\sqrt{N}O_N[/math] formed by the [math]N\times N[/math] Hadamard matrices can be located inside [math]\sqrt{N}O_N[/math] by using analytic techniques, and more precisely variations of the following result:

Theorem

Given a matrix [math]H\in\sqrt{N}O_N[/math] we have:

  • [math]||H||_p\leq N^{2/p}[/math] for [math]p\in[1,2)[/math], with equality precisely when [math]H[/math] is Hadamard.
  • [math]||H||_p\geq N^{2/p}[/math] for [math]p\in(2,\infty][/math], with equality precisely when [math]H[/math] is Hadamard.


Show Proof

This is something that we know from chapter 2, in a rescaled formulation. Consider indeed the [math]p[/math]-norm on [math]\sqrt{N}O_N[/math], which at [math]p\in[1,\infty)[/math] is given by:

[[math]] ||H||_p=\left(\sum_{ij}|H_{ij}|^p\right)^{1/p} [[/math]]


We have then [math]||H||_2=N[/math], and by using this, together with the Jensen inequality for [math]\psi(x)=x^{p/2}[/math], or simply the Hölder inequality for the norms, we obtain the results. As for the case [math]p=\infty[/math], this follows by letting [math]p\to\infty[/math], or directly via Cauchy-Schwarz.
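
As a quick numerical illustration of the above inequalities, here is a small sketch in Python, assuming that the numpy library is available; the matrices used, and the helper function p_norm, are ours, chosen for illustration purposes.

```python
import numpy as np

N = 4

# A rescaled Hadamard matrix: the Walsh matrix W_4, which has +-1 entries,
# and so lies in sqrt(N) O_N as it is
W2 = np.array([[1, 1], [1, -1]])
H = np.kron(W2, W2)

# A generic element of sqrt(N) O_N, coming from a random orthogonal matrix
G = np.sqrt(N) * np.linalg.qr(np.random.randn(N, N))[0]

def p_norm(M, p):
    return (np.abs(M) ** p).sum() ** (1 / p)

for p in [1, 1.5, 3, 4]:
    print(p, round(p_norm(H, p), 4), round(p_norm(G, p), 4), round(N ** (2 / p), 4))
# For p < 2 the p-norm is at most N^(2/p), with equality for H;
# for p > 2 it is at least N^(2/p), again with equality for H.
```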

Once again following the material in chapter 2, we have seen there that a nice result can be obtained along these lines at [math]N=3[/math] and [math]p=1[/math]. To be more precise, the maximizers of the 1-norm on [math]\sqrt{3}O_3[/math] are the following matrix, and its Hadamard conjugates:

[[math]] K_3=\frac{1}{\sqrt{3}}\begin{pmatrix} -1&2&2\\ 2&-1&2\\ 2&2&-1 \end{pmatrix} [[/math]]


In general, however, computing the maximizers of the [math]1[/math]-norm on [math]\sqrt{N}O_N[/math] remains a difficult question. So, based on the above, let us formulate the following definition:

Definition

A matrix [math]H\in\sqrt{N}O_N[/math] is called:

  • Almost Hadamard, if it locally maximizes the [math]1[/math]-norm on [math]\sqrt{N}O_N[/math].
  • Optimal almost Hadamard, if it maximizes the [math]1[/math]-norm on [math]\sqrt{N}O_N[/math].

More generally, we can talk about [math]p[/math]-almost Hadamard matrices, exactly in the same way, at any [math]p\in[1,\infty]-\{2\}[/math], by using the results in Theorem 3.1. When a matrix [math]H\in\sqrt{N}O_N[/math] is almost Hadamard at any [math]p[/math], we call it “absolute almost Hadamard”. We will see in what follows that, while the study of the optimal almost Hadamard matrices remains something quite difficult, in the general almost Hadamard setting there are many interesting things to be done, and some nice theory to be developed.


Needless to say, all this is motivated by the lack of Hadamard matrices at [math]N \gt 2[/math], [math]N\notin4\mathbb N[/math]. However, we will see that our theory is quite interesting even at values [math]N\in4\mathbb N[/math]. Finally, let us mention that the almost Hadamard matrices have a long history, going back to the 2010 paper [1], then to the 2012 paper [2], with the theory of such matrices having been further developed throughout the 2010s, in the series of papers [3], [4], [5], [6], [7]. We will try to explain here the basics of this theory.


In order to get started, let us study the local maximizers of the 1-norm on [math]\sqrt{N}O_N[/math]. It is technically convenient here to rescale by [math]1/\sqrt{N}[/math], and work instead over the orthogonal group [math]O_N[/math], by using the tools available there. Following [1], we first have:

Theorem

If [math]U\in O_N[/math] locally maximizes the [math]1[/math]-norm, then

[[math]] U_{ij}\neq 0 [[/math]]
must hold for any [math]i,j[/math].


Show Proof

Assume by contradiction that [math]U[/math] has a 0 entry. By permuting the rows we can assume that this 0 entry is in the first row, having under it a nonzero entry in the second row. We denote by [math]U_1,\ldots,U_N[/math] the rows of [math]U[/math]. By permuting the columns we can assume that we have a block decomposition of the following type:

[[math]] \begin{pmatrix}U_1\\ U_2\end{pmatrix} =\begin{pmatrix} 0&0&Y&A&B\\ 0&X&0&C&D \end{pmatrix} [[/math]]


Here [math]X,Y,A,B,C,D[/math] are certain vectors with nonzero entries, with [math]A,B,C,D[/math] chosen such that each entry of [math]A[/math] has the same sign as the corresponding entry of [math]C[/math], and each entry of [math]B[/math] has sign opposite to the sign of the corresponding entry of [math]D[/math]. Now for [math]t \gt 0[/math] small consider the matrix [math]U^t[/math] obtained by rotating by an angle [math]t[/math] the first two rows of [math]U[/math]. In row notation, this matrix is given by the following formula:

[[math]] U^t =\begin{pmatrix} \cos t&\sin t\\ -\sin t&\cos t\\ &&1\\ &&&\ddots\\ &&&&1\end{pmatrix} \begin{pmatrix} U_1\\ U_2\\ U_3\\ \vdots\\ U_N \end{pmatrix} =\begin{pmatrix} \cos t\cdot U_1+\sin t\cdot U_2\\ -\sin t\cdot U_1+\cos t\cdot U_2\\ U_3\\ \vdots\\ U_N \end{pmatrix} [[/math]]


We make the convention that the lower-case letters denote the 1-norms of the corresponding upper-case vectors. According to the above sign conventions, we have:

[[math]] \begin{eqnarray*} ||U^t||_1 &=&||\cos t\cdot U_1+\sin t\cdot U_2||_1+||-\sin t\cdot U_1+\cos t\cdot U_2||_1+\sum_{i=3}^Nu_i\\ &=&(\cos t+\sin t)(x+y+b+c)+(\cos t-\sin t)(a+d)+\sum_{i=3}^Nu_i\\ &=&||U||_1+(\cos t+\sin t-1)(x+y+b+c)+(\cos t-\sin t-1)(a+d) \end{eqnarray*} [[/math]]


By using [math]\sin t=t+O(t^2)[/math] and [math]\cos t=1+O(t^2)[/math] we obtain:

[[math]] \begin{eqnarray*} ||U^t||_1 &=&||U||_1+t(x+y+b+c)-t(a+d)+O(t^2)\\ &=&||U||_1+t(x+y+b+c-a-d)+O(t^2) \end{eqnarray*} [[/math]]


In order to conclude, we have to prove that [math]U[/math] cannot be a local maximizer of the [math]1[/math]-norm. This will basically follow by comparing the norm of [math]U[/math] with the norm of [math]U^t[/math], for small values of [math]t[/math]. However, since in the above computation it was technically convenient to assume [math]t \gt 0[/math], we actually have three cases:


Case 1: [math]b+c \gt a+d[/math]. Here for [math]t \gt 0[/math] small enough the above formula shows that we have [math]||U^t||_1 \gt ||U||_1[/math], and we are done.


Case 2: [math]b+c=a+d[/math]. Here we use the fact that [math]X[/math] is not null, which gives [math]x \gt 0[/math]. Once again for [math]t \gt 0[/math] small enough we have [math]||U^t||_1 \gt ||U||_1[/math], and we are done.


Case 3: [math]b+c \lt a+d[/math]. In this case we can interchange the first two rows of [math]U[/math] and restart the whole procedure: we fall in Case 1, and we are done again.
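
As an illustration of the rotation trick used in the above proof, here is a small numerical sketch, assuming numpy; the matrix [math]U[/math] below, which has several zero entries, is ours, chosen for illustration.

```python
import numpy as np

# An orthogonal matrix with zero entries: rotating its first two rows
# strictly increases the 1-norm, so it cannot be a local maximizer
U = np.array([[0., 1., 0.],
              [1., 0., 0.],
              [0., 0., 1.]])

def rotated(U, t):
    R = np.eye(len(U))
    R[:2, :2] = [[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]]
    return R @ U

for t in [0.0, 0.05, 0.1, np.pi / 4]:
    print(round(t, 3), round(np.abs(rotated(U, t)).sum(), 4))
# Here the 1-norm equals 2(cos t + sin t) + 1, which increases for small t > 0
```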

Let us study now the critical points. It is convenient here to talk about more general [math]p[/math]-norms, or even more general functions of the quantities [math]|U_{ij}|[/math], because this will lead to some interesting combinatorics. Following [1], [4], we have the following result:

Theorem

Consider a differentiable function [math]\varphi:[0,\infty)\to\mathbb R[/math]. An orthogonal matrix having nonzero entries, [math]U\in O_N^*[/math], is then a critical point of the function

[[math]] F(U)=\sum_{ij}\varphi(|U_{ij}|) [[/math]]
precisely when the matrix [math]WU^t[/math] is symmetric, where:

[[math]] W_{ij}={\rm sgn}(U_{ij})\varphi'(|U_{ij}|) [[/math]]
In particular, for [math]F(U)=||U||_1[/math] we need [math]SU^t[/math] to be symmetric, where [math]S_{ij}={\rm sgn}(U_{ij})[/math].


Show Proof

We regard [math]O_N[/math] as a real algebraic manifold, with coordinates [math]U_{ij}[/math]. This manifold consists by definition of the zeroes of the following polynomials:

[[math]] A_{ij}=\sum_kU_{ik}U_{jk}-\delta_{ij} [[/math]]


Since [math]O_N[/math] is smooth, and so is a differential manifold in the usual sense, it follows from the general theory of Lagrange multipliers that a given matrix [math]U\in O_N[/math] is a critical point of [math]F[/math] precisely when the following condition is satisfied:

[[math]] dF\in span(dA_{ij}) [[/math]]


Regarding the space [math]span(dA_{ij})[/math], this consists of the following quantities:

[[math]] \begin{eqnarray*} \sum_{ij}M_{ij}dA_{ij} &=&\sum_{ijk}M_{ij}(U_{ik}dU_{jk}+U_{jk}dU_{ik})\\ &=&\sum_{jk}(M^tU)_{jk}dU_{jk}+\sum_{ik}(MU)_{ik}dU_{ik}\\ &=&\sum_{ij}(M^tU)_{ij}dU_{ij}+\sum_{ij}(MU)_{ij}dU_{ij} \end{eqnarray*} [[/math]]


In order to compute [math]dF[/math], observe first that, with [math]S_{ij}=sgn(U_{ij})[/math], we have:

[[math]] d|U_{ij}| =d\sqrt{U_{ij}^2} =\frac{U_{ij}dU_{ij}}{|U_{ij}|} =S_{ij}dU_{ij} [[/math]]


Now let us set, as in the statement:

[[math]] W_{ij}=sgn(U_{ij})\varphi'(|U_{ij}|) [[/math]]


In terms of these variables, we obtain:

[[math]] dF =\sum_{ij}d\left(\varphi(|U_{ij}|)\right) =\sum_{ij}\varphi'(|U_{ij}|)d|U_{ij}| =\sum_{ij}W_{ij}dU_{ij} [[/math]]


We conclude that [math]U\in O_N[/math] is a critical point of [math]F[/math] if and only if there exists a matrix [math]M\in M_N(\mathbb R)[/math] such that the following two conditions are satisfied:

[[math]] W=M^tU\quad,\quad W=MU [[/math]]


Now observe that these two equations can be written as follows:

[[math]] M^t=WU^t\quad,\quad M=WU^t [[/math]]


Thus, the matrix [math]WU^t[/math] must be symmetric, as claimed.
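
As a quick check on this criterion, here is a small numerical sketch, assuming numpy, for the matrix [math]U=K_3/\sqrt{3}[/math] from the beginning of this chapter, verifying that [math]SU^t[/math] is indeed symmetric.

```python
import numpy as np

# The matrix U = K_3 / sqrt(3), rescaled so as to lie in O_3
U = np.array([[-1., 2., 2.],
              [ 2.,-1., 2.],
              [ 2., 2.,-1.]]) / 3

S = np.sign(U)                                 # the sign matrix S_ij = sgn(U_ij)
print(np.allclose(U @ U.T, np.eye(3)))         # U is orthogonal
print(np.allclose(S @ U.T, (S @ U.T).T))       # S U^t is symmetric: critical point of the 1-norm
```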

In order to process the above result, we can use the following notion:

Definition

Given [math]U\in O_N[/math], we consider its “color decomposition”

[[math]] U=\sum_{r \gt 0}rU_r [[/math]]
with [math]U_r\in M_N(-1,0,1)[/math] containing the sign components at [math]r \gt 0[/math], and we call [math]U[/math]:

  • Semi-balanced, if [math]U_rU^t[/math] and [math]U^tU_r[/math], with [math]r \gt 0[/math], are all symmetric.
  • Balanced, if [math]U_rU_s^t[/math] and [math]U_r^tU_s[/math], with [math]r,s \gt 0[/math], are all symmetric.

These conditions are quite natural, because for an orthogonal matrix [math]U\in O_N[/math], the relations [math]UU^t=U^tU=1[/math] translate as follows, in terms of the color decomposition:

[[math]] \sum_{r \gt 0}rU_rU^t=\sum_{r \gt 0}rU^tU_r=1 [[/math]]

[[math]] \sum_{r,s \gt 0}rsU_rU_s^t=\sum_{r,s \gt 0}rsU_r^tU_s=1 [[/math]]


Thus, our balancing conditions express the fact that the various components of the above sums are all symmetric. Now back to our critical point questions, we have:

Theorem

For a matrix [math]U\in O_N^*[/math], the following are equivalent:

  • [math]U[/math] is a critical point of [math]F(U)=\sum_{ij}\varphi(|U_{ij}|)[/math], for any [math]\varphi:[0,\infty)\to\mathbb R[/math].
  • [math]U[/math] is a critical point of all the [math]p[/math]-norms, with [math]p\in[1,\infty)[/math].
  • [math]U[/math] is semi-balanced, in the above sense.


Show Proof

We use the critical point criterion found in Theorem 3.4. In terms of the color decomposition, the matrix constructed there is given by:

[[math]] \begin{eqnarray*} (WU^t)_{ij} &=&\sum_k{\rm sgn}(U_{ik})\varphi'(|U_{ik}|)U_{jk}\\ &=&\sum_{r \gt 0}\varphi'(r)\sum_{k,|U_{ik}|=r}{\rm sgn}(U_{ik})U_{jk}\\ &=&\sum_{r \gt 0}\varphi'(r)\sum_k(U_r)_{ik}U_{jk}\\ &=&\sum_{r \gt 0}\varphi'(r)(U_rU^t)_{ij} \end{eqnarray*} [[/math]]


Thus we have the following formula:

[[math]] WU^t=\sum_{r \gt 0}\varphi'(r)U_rU^t [[/math]]


Now when the function [math]\varphi:[0,\infty)\to\mathbb R[/math] varies, either as an arbitrary differentiable function, or as a power function [math]\varphi(x)=x^p[/math] with [math]p\in[1,\infty)[/math], the coefficients [math]\varphi'(r)[/math] can take arbitrary values, so the individual components of this sum must all be symmetric, and this leads to the conclusion in the statement.
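
Here is a small numerical sketch of the semi-balancing condition, assuming numpy; the helpers color_components and is_semi_balanced are ours, and the test matrix is again [math]K_3/\sqrt{3}[/math], which has two color components.

```python
import numpy as np

def color_components(U, tol=1e-10):
    """The color decomposition U = sum_r r U_r, returned as a dictionary {r: U_r}."""
    vals = np.unique(np.round(np.abs(U[np.abs(U) > tol]), 10))
    return {r: np.where(np.isclose(np.abs(U), r), np.sign(U), 0) for r in vals}

def is_semi_balanced(U):
    comps = color_components(U)
    return all(np.allclose(Ur @ U.T, (Ur @ U.T).T) and
               np.allclose(U.T @ Ur, (U.T @ Ur).T) for Ur in comps.values())

U = np.array([[-1., 2., 2.], [2., -1., 2.], [2., 2., -1.]]) / 3
print(sorted(color_components(U)))     # two colors, r = 1/3 and r = 2/3
print(is_semi_balanced(U))             # True: U is a critical point of all the p-norms
```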

In practice now, most of the known examples of semi-balanced matrices are actually balanced, so we will investigate instead this latter class of matrices. Following [4], we have the following collection of simple facts, regarding such matrices:

Theorem

The class of balanced matrices is as follows:

  • It contains the matrices [math]U=H/\sqrt{N}[/math], with [math]H\in M_N(\pm1)[/math] Hadamard.
  • It is stable under transposition.
  • It is stable under taking tensor products.
  • It is stable under Hadamard equivalence.
  • It contains the matrix [math]V_N=\frac{1}{N}(2\mathbb I_N-N1_N)[/math], where [math]\mathbb I_N[/math] is the all-[math]1[/math] matrix.


Show Proof

All these results are elementary, the proof being as follows:


(1) Here [math]U\in O_N[/math] follows from the Hadamard condition, and since there is only one color component, namely [math]U_{1/\sqrt{N}}=H[/math], the balancing condition is satisfied as well.


(2) Assuming that [math]U=\sum_{r \gt 0}rU_r[/math] is the color decomposition of a given matrix [math]U\in O_N[/math], the color decomposition of the transposed matrix [math]U^t[/math] is as follows:

[[math]] U^t=\sum_{r \gt 0}rU_r^t [[/math]]


It follows that if [math]U[/math] is balanced, so is the transposed matrix [math]U^t[/math].


(3) Assuming that [math]U=\sum_{r \gt 0}rU_r[/math] and [math]V=\sum_{s \gt 0}sV_s[/math] are the color decompositions of two given orthogonal matrices [math]U,V[/math], we have:

[[math]] U\otimes V =\sum_{r,s \gt 0}rs\cdot U_r\otimes V_s =\sum_{p \gt 0}p\sum_{p=rs}U_r\otimes V_s [[/math]]


Thus the color components of [math]W=U\otimes V[/math] are the following matrices:

[[math]] W_p=\sum_{p=rs}U_r\otimes V_s [[/math]]


It follows that if [math]U,V[/math] are both balanced, then so is [math]W=U\otimes V[/math].


(4) We recall that the Hadamard equivalence consists in permuting rows and columns, and switching signs on rows and columns. Since all these operations correspond to certain conjugations at the level of the matrices [math]U_rU_s^t,U_r^tU_s[/math], we obtain the result.


(5) The matrix in the statement, which goes back to [2], is as follows:

[[math]] V_N=\frac{1}{N} \begin{pmatrix} 2-N&2&\ldots&2\\ 2&2-N&\ldots&2\\ \ldots&\ldots&\ldots&\ldots\\ 2&2&\ldots&2-N \end{pmatrix} [[/math]]


Observe that this matrix is indeed orthogonal, its rows being of norm one, and pairwise orthogonal. The color components of this matrix being [math]V_{2/N}=\mathbb I_N-1_N[/math] and [math]V_{(N-2)/N}=-1_N[/math], with all the products between these components being symmetric, it follows that this matrix is balanced as well, as claimed.
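
Here is a quick numerical verification of some of these assertions, assuming numpy; the helper is_balanced below is ours, and we test the matrix [math]V_N[/math] from (5), together with a tensor product, as in (3).

```python
import numpy as np

def color_components(U, tol=1e-10):
    vals = np.unique(np.round(np.abs(U[np.abs(U) > tol]), 10))
    return [np.where(np.isclose(np.abs(U), r), np.sign(U), 0) for r in vals]

def is_balanced(U):
    comps = color_components(U)
    return all(np.allclose(A @ B.T, (A @ B.T).T) and np.allclose(A.T @ B, (A.T @ B).T)
               for A in comps for B in comps)

N = 5
V = (2 * np.ones((N, N)) - N * np.eye(N)) / N    # the matrix V_N from assertion (5)
print(np.allclose(V @ V.T, np.eye(N)))           # V_N is orthogonal
print(is_balanced(V))                            # V_N is balanced
print(is_balanced(np.kron(V, V)))                # tensor products of balanced matrices stay balanced
```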

Let us look now more in detail at the matrix [math]V_N[/math] from the above statement, and at the matrices having similar properties. Following [2], let us start our study with:

Definition

An [math](a,b,c)[/math] pattern is a matrix [math]M\in M_N(0,1)[/math], with [math]N=a+2b+c[/math], such that any two rows look as follows,

[[math]] \begin{matrix} 0\ldots 0&0\ldots 0&1\ldots 1&1\ldots 1\\ \underbrace{0\ldots 0}_a&\underbrace{1\ldots 1}_b&\underbrace{0\ldots 0}_b&\underbrace{1\ldots 1}_c \end{matrix} [[/math]]
up to a permutation of the columns.

As explained in [2], there are many interesting examples of [math](a,b,c)[/math] patterns, coming from the balanced incomplete block designs (BIBD), and all these examples can produce two-entry orthogonal matrices, by replacing the [math]0,1[/math] entries with suitable numbers [math]x,y[/math]. For more on BIBD and design theory, we refer to Colbourn-Dinitz [8] or Stinson [9].


Now back to the matrix [math]V_N[/math] from Theorem 3.7 (5), observe that this matrix comes from a [math](0,1,N-2)[/math] pattern. And also, independently of this, this matrix has the remarkable property of being at the same time circulant and symmetric. We have in fact:

Theorem

The following matrices are balanced:

  • The orthogonal matrices coming from [math](a,b,c)[/math] patterns.
  • The orthogonal matrices which are circulant and symmetric.


Show Proof

These observations basically go back to [2], the proofs being as follows:


(1) If we denote by [math]P,Q\in M_N(0,1)[/math] the matrices describing the positions of the [math]0,1[/math] entries inside the pattern, then we have the following formulae:

[[math]] \begin{eqnarray*} PP^t=P^tP&=&a\mathbb I_N+b1_N\\ QQ^t=Q^tQ&=&c\mathbb I_N+b1_N\\ PQ^t=P^tQ=QP^t=Q^tP&=&b\mathbb I_N-b1_N \end{eqnarray*} [[/math]]


Since all these matrices are symmetric, [math]U[/math] is balanced, as claimed.


(2) Assume that [math]U\in O_N[/math] is circulant, [math]U_{ij}=\gamma_{j-i}[/math], and in addition symmetric, which means [math]\gamma_i=\gamma_{-i}[/math]. Consider the following sets, which must satisfy [math]D_r=-D_r[/math]:

[[math]] D_r=\{k:|\gamma_k|=r\} [[/math]]


In terms of these sets, we have the following formula:

[[math]] \begin{eqnarray*} (U_rU_s^t)_{ij} &=&\sum_k(U_r)_{ik}(U_s)_{jk}\\ &=&\sum_k\delta_{|\gamma_{k-i}|,r}\,{\rm sgn}(\gamma_{k-i})\cdot\delta_{|\gamma_{k-j}|,s}\,{\rm sgn}(\gamma_{k-j})\\ &=&\sum_{k\in(D_r+i)\cap(D_s+j)}{\rm sgn}(\gamma_{k-i})\,{\rm sgn}(\gamma_{k-j}) \end{eqnarray*} [[/math]]


With [math]k=i+j-m[/math] we obtain, by using [math]D_r=-D_r[/math], and then [math]\gamma_i=\gamma_{-i}[/math]:

[[math]] \begin{eqnarray*} (U_rU_s^t)_{ij} &=&\sum_{m\in(-D_r+j)\cap(-D_s+i)}{\rm sgn}(\gamma_{j-m})\,{\rm sgn}(\gamma_{i-m})\\ &=&\sum_{m\in(D_r+j)\cap(D_s+i)}{\rm sgn}(\gamma_{j-m})\,{\rm sgn}(\gamma_{i-m})\\ &=&\sum_{m\in(D_r+j)\cap(D_s+i)}{\rm sgn}(\gamma_{m-j})\,{\rm sgn}(\gamma_{m-i}) \end{eqnarray*} [[/math]]


Now by interchanging [math]i\leftrightarrow j[/math], and with [math]m\to k[/math], this formula becomes:

[[math]] (U_rU_s^t)_{ji}=\sum_{k\in(D_r+i)\cap(D_s+j)}{\rm sgn}(\gamma_{k-i})\,{\rm sgn}(\gamma_{k-j}) [[/math]]


By comparing with the previous formula, we deduce that the matrix [math]U_rU_s^t[/math] is symmetric, as claimed. The proof for [math]U_r^tU_s[/math] is similar.

As a conclusion to all this, the study of the critical points of the various [math]p[/math]-norms on [math]O_N[/math] has led us into the class of balanced matrices, which looks like an interesting class, which is waiting to be further investigated. We will be back to this.

3b. Second derivatives

Let us get now into analytic questions. As in Theorem 3.4, it is convenient to do the computations in a general framework, with a function as follows:

[[math]] F(U)=\sum_{ij}\psi(U_{ij}^2) [[/math]]


Consider the following function, depending on [math]t \gt 0[/math] small:

[[math]] f(t) =F(Ue^{tA}) =\sum_{ij}\psi\left((Ue^{tA})_{ij}^2\right) [[/math]]


Here [math]U\in O_N[/math] is an arbitrary orthogonal matrix, and [math]A\in M_N(\mathbb R)[/math] is assumed to be antisymmetric, [math]A^t=-A[/math], with this latter assumption needed for having [math]e^A\in O_N[/math]. Let us first compute the derivative of [math]f[/math]. Following [4], we have the following result:

Proposition

We have the following formula,

[[math]] f'(t)=2\sum_{ij}\psi'((Ue^{tA})_{ij}^2)(UAe^{tA})_{ij}(Ue^{tA})_{ij} [[/math]]
valid for any [math]U\in O_N[/math], and any [math]A\in M_N(\mathbb R)[/math] antisymmetric.


Show Proof

The matrices [math]U,e^{tA}[/math] being both orthogonal, we have:

[[math]] \begin{eqnarray*} (Ue^{tA})_{ij}^2 &=&(Ue^{tA})_{ij}((Ue^{tA})^t)_{ji}\\ &=&(Ue^{tA})_{ij}(e^{tA^t}U^t)_{ji}\\ &=&(Ue^{tA})_{ij}(e^{-tA}U^t)_{ji} \end{eqnarray*} [[/math]]


We can now differentiate our function [math]f[/math], and by using once again the orthogonality of the matrices [math]U,e^{tA}[/math], along with the formula [math]A^t=-A[/math], we obtain:

[[math]] \begin{eqnarray*} f'(t) &=&\sum_{ij}\psi'((Ue^{tA})_{ij}^2)\left[(UAe^{tA})_{ij}(e^{-tA}U^t)_{ji}-(Ue^{tA})_{ij}(e^{-tA}AU^t)_{ji}\right]\\ &=&\sum_{ij}\psi'((Ue^{tA})_{ij}^2)\left[(UAe^{tA})_{ij}((e^{-tA}U^t)^t)_{ij}-(Ue^{tA})_{ij}((e^{-tA}AU^t)^t)_{ij}\right]\\ &=&\sum_{ij}\psi'((Ue^{tA})_{ij}^2)\left[(UAe^{tA})_{ij}(Ue^{tA})_{ij}+(Ue^{tA})_{ij}(UAe^{tA})_{ij}\right] \end{eqnarray*} [[/math]]


But this gives the formula in the statement, and we are done.
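
As a check on this computation, here is a small numerical sketch comparing the above formula for [math]f'(t)[/math] with a finite difference, assuming that numpy and scipy are available; the function [math]\psi(x)=x^2[/math] used below, and the various random choices, are ours, for illustration.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
N = 4
U = np.linalg.qr(rng.standard_normal((N, N)))[0]     # a random orthogonal matrix
B = rng.standard_normal((N, N))
A = B - B.T                                          # a random antisymmetric matrix

psi  = lambda x: x ** 2       # here F(U) = sum_ij U_ij^4; any differentiable psi would do
dpsi = lambda x: 2 * x

def f(t):
    return psi((U @ expm(t * A)) ** 2).sum()

def fprime(t):                # the formula from the above proposition
    Ut, UAt = U @ expm(t * A), U @ A @ expm(t * A)
    return 2 * (dpsi(Ut ** 2) * UAt * Ut).sum()

h = 1e-5
print(round((f(h) - f(-h)) / (2 * h), 6), round(fprime(0.0), 6))   # the two numbers agree
```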

Before computing the second derivative, let us evaluate [math]f'(0)[/math]. In terms of the color decomposition [math]U=\sum_{r \gt 0}rU_r[/math] of our matrix, the result is:

Proposition

We have the following formula,

[[math]] f'(0)=2\sum_{r \gt 0}r\psi'(r^2)Tr(U_r^tUA) [[/math]]
where the matrices [math]U_r\in M_N(-1,0,1)[/math] are the color components of [math]U[/math].


Show Proof

We use the formula in Proposition 3.10. At [math]t=0[/math], we obtain:

[[math]] f'(0)=2\sum_{ij}\psi'(U_{ij}^2)(UA)_{ij}U_{ij} [[/math]]


Consider now the color decomposition of [math]U[/math]. We have the following formulae:

[[math]] \begin{eqnarray*} U_{ij}=\sum_{r \gt 0}r(U_r)_{ij} &\implies&U_{ij}^2=\sum_{r \gt 0}r^2|(U_r)_{ij}|\\ &\implies&\psi'(U_{ij}^2)=\sum_{r \gt 0}\psi'(r^2)|(U_r)_{ij}| \end{eqnarray*} [[/math]]


Now by getting back to the above formula of [math]f'(0)[/math], we obtain:

[[math]] f'(0)=2\sum_{r \gt 0}\psi'(r^2)\sum_{ij}(UA)_{ij}U_{ij}|(U_r)_{ij}| [[/math]]


Our claim now is that we have the following formula:

[[math]] U_{ij}|(U_r)_{ij}|=r(U_r)_{ij} [[/math]]


Indeed, in the case [math]|U_{ij}|\neq r[/math] this formula reads [math]U_{ij}\cdot 0=r\cdot 0[/math], which is true, and in the case [math]|U_{ij}|=r[/math] this formula reads [math]rS_{ij}\cdot 1=r\cdot S_{ij}[/math], which is once again true. Thus:

[[math]] f'(0)=2\sum_{r \gt 0}r\psi'(r^2)\sum_{ij}(UA)_{ij}(U_r)_{ij} [[/math]]


But this gives the formula in the statement, and we are done.

Let us compute now the second derivative. The result here is as follows:

Proposition

We have the following formula,

[[math]] \begin{eqnarray*} f''(0) &=&4\sum_{ij}\psi''(U_{ij}^2)\left[(UA)_{ij}U_{ij}\right]^2\\ &&+2\sum_{ij}\psi'(U_{ij}^2)\left[(UA^2)_{ij}U_{ij}\right]\\ &&+2\sum_{ij}\psi'(U_{ij}^2)(UA)_{ij}^2 \end{eqnarray*} [[/math]]
valid for any [math]U\in O_N[/math], and any [math]A\in M_N(\mathbb R)[/math] antisymmetric.


Show Proof

We use the formula in Proposition 3.10, namely:

[[math]] f'(t)=2\sum_{ij}\psi'((Ue^{tA})_{ij}^2)(UAe^{tA})_{ij}(Ue^{tA})_{ij} [[/math]]


Since the term on the right, or rather its double, appears as the derivative of the quantity [math](Ue^{tA})_{ij}^2[/math], when differentiating a second time, we obtain:

[[math]] \begin{eqnarray*} f''(t) &=&4\sum_{ij}\psi''((Ue^{tA})_{ij}^2)\left[(UAe^{tA})_{ij}(Ue^{tA})_{ij}\right]^2\\ &&+2\sum_{ij}\psi'((Ue^{tA})_{ij}^2)\left[(UAe^{tA})_{ij}(Ue^{tA})_{ij}\right]' \end{eqnarray*} [[/math]]


In order to compute now the missing derivative, observe that we have:

[[math]] \left[(UAe^{tA})_{ij}(Ue^{tA})_{ij}\right]' =(UA^2e^{tA})_{ij}(Ue^{tA})_{ij}+(UAe^{tA})_{ij}^2 [[/math]]


Summing up, we have obtained the following formula:

[[math]] \begin{eqnarray*} f''(t) &=&4\sum_{ij}\psi''((Ue^{tA})_{ij}^2)\left[(UAe^{tA})_{ij}(Ue^{tA})_{ij}\right]^2\\ &&+2\sum_{ij}\psi'((Ue^{tA})_{ij}^2)\left[(UA^2e^{tA})_{ij}(Ue^{tA})_{ij}\right]\\ &&+2\sum_{ij}\psi'((Ue^{tA})_{ij}^2)(UAe^{tA})_{ij}^2 \end{eqnarray*} [[/math]]


But at [math]t=0[/math] this gives the formula in the statement, and we are done.

For the function [math]\psi(x)=\sqrt{x}[/math], corresponding to the functional [math]F(U)=||U||_1[/math], there are some simplifications, that we will work out now in detail. First, we have:

Proposition

For the function [math]F(U)=||U||_1[/math] we have the formula

[[math]] f''(0)=Tr(S^tUA^2) [[/math]]
valid for any antisymmetric matrix [math]A[/math], where [math]S_{ij}={\rm sgn}(U_{ij})[/math].


Show Proof

We use the formula in Proposition 3.12, with the following data:

[[math]] \psi(x)=\sqrt{x}\quad,\quad \psi'(x)=\frac{1}{2\sqrt{x}}\quad,\quad \psi''(x)=-\frac{1}{4x\sqrt{x}} [[/math]]


We therefore obtain the following formula:

[[math]] \begin{eqnarray*} f''(0) &=&-\sum_{ij}\frac{\left[(UA)_{ij}U_{ij}\right]^2}{|U_{ij}|^3} +\sum_{ij}\frac{(UA^2)_{ij}U_{ij}}{|U_{ij}|} +\sum_{ij}\frac{(UA)_{ij}^2}{|U_{ij}|}\\ &=&-\sum_{ij}\frac{(UA)_{ij}^2}{|U_{ij}|} +\sum_{ij}(UA^2)_{ij}S_{ij} +\sum_{ij}\frac{(UA)_{ij}^2}{|U_{ij}|}\\ &=&\sum_{ij}(UA^2)_{ij}S_{ij} \end{eqnarray*} [[/math]]


But this gives the formula in the statement, and we are done.

We are therefore led to the following result, from [4], regarding the 1-norm:

Theorem

A matrix [math]U\in O_N[/math] locally maximizes the [math]1[/math]-norm on [math]O_N[/math] precisely when the following conditions are satisfied:

  • The matrix [math]U[/math] has nonzero entries, [math]U\in O_N^*[/math].
  • The matrix [math]X=S^tU[/math] is symmetric, where [math]S_{ij}={\rm sgn}(U_{ij})[/math].
  • We have [math]Tr(XA^2)\leq0[/math], for any antisymmetric matrix [math]A\in M_N(\mathbb R)[/math].


Show Proof

This follows from the results that we have, with (1), (2), (3) coming respectively from Theorem 3.3, Theorem 3.4 and Proposition 3.13.

In order to further improve the above result, we will need:

Proposition

For a symmetric matrix [math]X\in M_N(\mathbb R)[/math], the following are equivalent:

  • [math]Tr(XA^2)\leq0[/math], for any antisymmetric matrix [math]A[/math].
  • The sum of the two smallest eigenvalues of [math]X[/math] is positive.


Show Proof

Consider the following vector, which is antisymmetric:

[[math]] a=\sum_{ij}A_{ij}e_i\otimes e_j [[/math]]


In terms of this vector, we have the following formula:

[[math]] \begin{eqnarray*} Tr(XA^2) &=& \lt X,A^2 \gt \\ &=&- \lt AX,A \gt \\ &=&- \lt a,(1\otimes X)a \gt \end{eqnarray*} [[/math]]


Thus the condition (1) is equivalent to [math]P(1\otimes X)P[/math] being positive, with [math]P[/math] being the orthogonal projection on the antisymmetric subspace in [math]\mathbb R^N\otimes\mathbb R^N[/math]. Now observe that for any two eigenvectors [math]x_i \perp x_j[/math] of [math]X[/math], with eigenvalues [math]\lambda_i, \lambda_j[/math], we have:

[[math]] \begin{eqnarray*} P(1\otimes X)P(x_i\otimes x_j-x_j\otimes x_i) &=&P(\lambda_j x_i\otimes x_j-\lambda_i x_j\otimes x_i)\\ &=&\frac{\lambda_i +\lambda_j}{2}(x_i\otimes x_j-x_j\otimes x_i) \end{eqnarray*} [[/math]]


Thus, we are led to the conclusion in the statement.
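
The key computation in the above proof, namely that [math]Tr(XA^2)=-(\lambda_i+\lambda_j)[/math] for the elementary antisymmetric matrix built from two eigenvectors of [math]X[/math], can be checked numerically as follows, assuming numpy; the test matrices are ours, for illustration.

```python
import numpy as np

def bottom_pair_trace(X):
    """Tr(X A^2), for A built from the eigenvectors of the two smallest eigenvalues of X."""
    lam, vec = np.linalg.eigh(X)               # eigenvalues in increasing order
    x1, x2 = vec[:, 0], vec[:, 1]
    A = np.outer(x1, x2) - np.outer(x2, x1)    # antisymmetric
    return np.trace(X @ A @ A), -(lam[0] + lam[1])

X = np.diag([1.0, 2.0, 3.0, 4.0])     # two smallest eigenvalues sum to 3 > 0
Y = np.diag([-3.0, 1.0, 2.0, 5.0])    # two smallest eigenvalues sum to -2 < 0
print(bottom_pair_trace(X))           # (-3.0, -3.0): consistent with condition (1)
print(bottom_pair_trace(Y))           # ( 2.0,  2.0): positive, so condition (1) fails for Y
```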

Following [4], we can now formulate a final result on the subject, which improves some previous findings from [1], and from [2], as follows:

Theorem

A matrix [math]U\in O_N[/math] locally maximizes the [math]1[/math]-norm on [math]O_N[/math] precisely when it has nonzero entries, and when the following matrix, with [math]S_{ij}={\rm sgn}(U_{ij})[/math],

[[math]] X=S^tU [[/math]]
is symmetric, and the sum of its two smallest eigenvalues is positive.


Show Proof

This follows indeed from our main result so far, Theorem 3.14, by taking into account the positivity criterion from Proposition 3.15.

In terms of the almost Hadamard matrices, as introduced in Definition 3.2, as rescaled versions of the above matrices, the above result reformulates as follows:

Theorem

The almost Hadamard matrices are the matrices [math]H\in\sqrt{N}O_N[/math] having nonzero entries, and which are such that the following matrix, with [math]S_{ij}={\rm sgn}(H_{ij})[/math],

[[math]] X=S^tH [[/math]]
is symmetric, and the sum of its two smallest eigenvalues is positive.


Show Proof

This is a reformulation of Theorem 3.16, by rescaling everything by [math]\sqrt{N}[/math], so as to reach the objects axiomatized in Definition 3.2.
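
The above criterion can be checked by a computer; here is a minimal sketch in Python, assuming numpy, with the function is_almost_hadamard being ours, tested on the [math]N=5[/math] analogue of the matrix [math]K_3[/math] from the beginning of this chapter.

```python
import numpy as np

def is_almost_hadamard(H, tol=1e-9):
    """The criterion from the above theorem, for a matrix H in sqrt(N) O_N."""
    N = H.shape[0]
    if not np.allclose(H @ H.T, N * np.eye(N)):   # H must lie in sqrt(N) O_N
        return False
    if np.any(np.abs(H) < tol):                   # nonzero entries
        return False
    S = np.sign(H)
    X = S.T @ H
    if not np.allclose(X, X.T):                   # X = S^t H must be symmetric
        return False
    lam = np.sort(np.linalg.eigvalsh(X))
    return lam[0] + lam[1] > -tol                 # sum of the two smallest eigenvalues

N = 5
K = (2 * np.ones((N, N)) - N * np.eye(N)) / np.sqrt(N)   # the N = 5 analogue of K_3
print(is_almost_hadamard(K))                              # True
```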

Regarding now the examples of such matrices, which can be useful for various reasons, especially at values [math]N\notin4\mathbb N[/math], there are many of them, and we will discuss them gradually, in what follows. To start with, we have the following general result, from [1], [2]:

Theorem

The class of almost Hadamard matrices has the following properties:

  • It contains all the Hadamard matrices.
  • It is stable under transposition.
  • It is stable under taking tensor products.
  • It is stable under Hadamard equivalence.
  • It contains the matrix [math]K_N=\frac{1}{\sqrt{N}}(2\mathbb I_N-N1_N)[/math].


Show Proof

All the assertions are clear from what we have, as follows:


(1) This follows either from Theorem 3.1, which shows that Hadamard implies almost Hadamard, without any need for further computations, or from the fact that if [math]H[/math] is Hadamard then [math]U=H/\sqrt{N}[/math] is orthogonal, and [math]SU^t=HU^t=\sqrt{N}1_N[/math] is positive.


(2) This follows either from definitions, because the transposition operation preserves the local maximizers of the 1-norm, or from Theorem 3.17.


(3) For a tensor product of almost Hadamard matrices [math]H=H'\otimes H''[/math] we have [math]U=U'\otimes U''[/math] and [math]S=S'\otimes S''[/math], so that [math]U[/math] is orthogonal and [math]SU^t[/math] is symmetric, with the sum of the two smallest eigenvalues being positive, as claimed.


(4) This follows either from definitions, because the Hadamard equivalence preserves the local maximizers of the 1-norm, or from Theorem 3.17.


(5) We know from Theorem 3.7 that the matrix [math]U=K_N/\sqrt{N}[/math] is orthogonal. Also, we have [math]S=\mathbb I_N-21_N[/math], and so [math]SU^t[/math] is positive, because with [math]J_N=\mathbb I_N/N[/math] we have:

[[math]] \begin{eqnarray*} SU^t &=&(NJ_N-21_N)(2J_N-1_N)\\ &=&(N-2)J_N+2(1_N-J_N) \end{eqnarray*} [[/math]]


Thus, we are led to the conclusion in the statement.

Observe the similarity between the above result and Theorem 3.7, which was about the balanced matrices. However, these two statements, even when properly rescaled, either both on [math]O_N[/math] or both on [math]\sqrt{N}O_N[/math], do not exactly cover the same class of matrices. Based on this analogy, however, we can look for explicit examples of almost Hadamard matrices by taking some inspiration from the main examples of balanced matrices, from Theorem 3.9. We will discuss this in the remainder of this chapter.

3c. Circulant matrices

We have two classes of matrices to be investigated, generalizing the matrix [math]K_N[/math] from Theorem 3.18, namely the circulant matrices, and the 2-entry matrices. Following the work in [2], let us start with the circulant matrices. We let [math]F\in U_N[/math] be the normalized Fourier matrix, given by [math]F_{ij}=w^{ij}/\sqrt{N}[/math], where [math]w=e^{2\pi i/N}[/math]. Also, we make the convention that associated to any vector [math]\alpha\in\mathbb C^N[/math] is the following diagonal matrix:

[[math]] \alpha'= \begin{pmatrix} \alpha_0\\ &\ddots\\ &&\alpha_{N-1} \end{pmatrix} [[/math]]


With these conventions, we have the following well-known result:

Proposition

For a matrix [math]H\in M_N(\mathbb C)[/math], the following are equivalent:

  • [math]H[/math] is circulant, i.e. [math]H_{ij}=\gamma_{j-i}[/math], for a certain vector [math]\gamma\in\mathbb C^N[/math].
  • [math]H[/math] is Fourier-diagonal, i.e. [math]H=FDF^*[/math], with [math]D\in M_N(\mathbb C)[/math] diagonal.

In addition, if so is the case, then with [math]D=\sqrt{N}\alpha'[/math] we have [math]\gamma=F\alpha[/math].


Show Proof

(1)[math]\implies[/math](2) The matrix [math]D=F^*HF[/math] is indeed diagonal, given by:

[[math]] D_{ij}=\frac{1}{N}\sum_{kl}w^{jl-ik}\gamma_{l-k}=\delta_{ij}\sum_rw^{jr}\gamma_r [[/math]]

(2)[math]\implies[/math](1) The matrix [math]H=FDF^*[/math] is indeed circulant, given by:

[[math]] H_{ij}=\sum_kF_{ik}D_{kk}\bar{F}_{jk}=\frac{1}{N}\sum_kw^{(i-j)k}D_{kk} [[/math]]


Finally, the last assertion is clear from the above formula of [math]H_{ij}[/math].
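
Here is a quick numerical check of this circulant/Fourier-diagonal correspondence, assuming numpy; the vector [math]\gamma[/math] below is a random one, chosen for illustration.

```python
import numpy as np

N = 6
rng = np.random.default_rng(1)
gamma = rng.standard_normal(N)

# The circulant matrix H_ij = gamma_{j-i}, and the Fourier matrix F_ij = w^{ij}/sqrt(N)
H = np.array([[gamma[(j - i) % N] for j in range(N)] for i in range(N)])
w = np.exp(2j * np.pi / N)
F = np.array([[w ** (i * j) for j in range(N)] for i in range(N)]) / np.sqrt(N)

D = F.conj().T @ H @ F
print(np.allclose(D, np.diag(np.diag(D))))   # True: F diagonalizes any circulant matrix
```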

Let us investigate now the circulant orthogonal matrices. We have:

Proposition

For a matrix [math]U\in M_N(\mathbb C)[/math], the following are equivalent:

  • [math]U[/math] is orthogonal and circulant.
  • [math]U=F\alpha'F^*[/math] with [math]\alpha\in\mathbb T^N[/math] satisfying [math]\bar{\alpha}_i=\alpha_{-i}[/math] for any [math]i[/math].


Show Proof

We will use many times the fact that given a vector [math]\alpha\in\mathbb C^N[/math], the vector [math]\gamma=F\alpha[/math] is real if and only if the following happens, for any [math]i[/math]:

[[math]] \bar{\alpha}_i=\alpha_{-i} [[/math]]


This follows indeed from [math]\overline{F\alpha}=F\tilde{\alpha}[/math], with [math]\tilde{\alpha}_i=\bar{\alpha}_{-i}[/math].


(1)[math]\implies[/math](2) Write [math]U_{ij}=\gamma_{j-i}[/math] with [math]\gamma\in\mathbb R^N[/math]. By using Proposition 3.19 we can write [math]U=F\alpha'F^*[/math], with [math]\alpha'[/math] diagonal. Now since [math]U=F\alpha'F^*[/math] is unitary, so is [math]\alpha'[/math], so we must have [math]\alpha\in\mathbb T^N[/math]. Finally, since [math]\gamma[/math] is real we have [math]\bar{\alpha}_i=\alpha_{-i}[/math], and we are done.


(2)[math]\implies[/math](1) We know from Proposition 3.19 that [math]U[/math] is circulant. Also, from [math]\alpha\in\mathbb T^N[/math] we obtain that [math]\alpha'[/math] is unitary, and so must be [math]U[/math]. Finally, since we have [math]\bar{\alpha}_i=\alpha_{-i}[/math], the vector [math]\gamma=F\alpha[/math] is real, and hence we have [math]U\in M_N(\mathbb R)[/math], which finishes the proof.

Let us discuss now the almost Hadamard case. First, in the usual Hadamard case, the known examples and the corresponding [math]\alpha[/math]-vectors are as follows:

Proposition

The known circulant Hadamard matrices, namely

[[math]] \pm\begin{pmatrix} -1\!\!&\!\!1\!\!&\!\!1\!\!&\!\!1\\ 1\!\!&\!\!-1\!\!&\!\!1\!\!&\!\!1\\ 1\!\!&\!\!1\!\!&\!\!-1\!\!&\!\!1\\ 1\!\!&\!\!1\!\!&\!\!1\!\!&\!\!-1 \end{pmatrix}\qquad,\qquad \pm\begin{pmatrix} 1\!\!&\!\!-1\!\!&\!\!1\!\!&\!\!1\\ 1\!\!&\!\!1\!\!&\!\!-1\!\!&\!\!1\\ 1\!\!&\!\!1\!\!&\!\!1\!\!&\!\!-1\\ -1\!\!&\!\!1\!\!&\!\!1\!\!&\!\!1 \end{pmatrix} [[/math]]

[[math]] \pm\begin{pmatrix} 1\!\!&\!\!1\!\!&\!\!-1\!\!&\!\!1\\ 1\!\!&\!\!1\!\!&\!\!1\!\!&\!\!-1\\ -1\!\!&\!\!1\!\!&\!\!1\!\!&\!\!1\\ 1\!\!&\!\!-1\!\!&\!\!1\!\!&\!\!1 \end{pmatrix}\qquad,\qquad \pm\begin{pmatrix} 1\!\!&\!\!1\!\!&\!\!1\!\!&\!\!-1\\ -1\!\!&\!\!1\!\!&\!\!1\!\!&\!\!1\\ 1\!\!&\!\!-1\!\!&\!\!1\!\!&\!\!1\\ 1\!\!&\!\!1\!\!&\!\!-1\!\!&\!\!1 \end{pmatrix} [[/math]]
come respectively from the following [math]\alpha[/math] vectors, via the above construction:

[[math]] \pm(1,-1,-1,-1)\qquad,\qquad\pm(1,-i,1,i) [[/math]]

[[math]] \pm(1,1,-1,1)\qquad\ \ \ \ ,\ \qquad\pm(1,i,1,-i) [[/math]]


Show Proof

At [math]N=4[/math] the conjugate of the Fourier matrix is given by:

[[math]] F^*=\frac{1}{2}\begin{pmatrix} 1&1&1&1\\ 1&-i&-1&i\\ 1&-1&1&-1\\ 1&i&-1&-i \end{pmatrix} [[/math]]


Thus the vectors [math]\alpha=F^*\gamma[/math] are indeed those in the statement.

Following [2], we have the following generalization of the above matrices:

Proposition

If [math]q^N=1[/math] then the vector

[[math]] \alpha=\pm(1,-q,-q^2,\ldots,-q^{N-1}) [[/math]]
produces an almost Hadamard matrix, equivalent to [math]K_N=\frac{1}{\sqrt{N}}(2\mathbb I_N-N1_N)[/math].


Show Proof

Observe first that these matrices generalize those in Proposition 3.21. Indeed, at [math]N=4[/math] the choices for [math]q[/math] are [math]1,i,-1,-i[/math], and this gives the above [math]\alpha[/math]-vectors. Assume that the [math]\pm[/math] sign in the statement is [math]+[/math]. With [math]q=w^r[/math], we have:

[[math]] \begin{eqnarray*} \sqrt{N}\gamma_i &=&\sum_{k=0}^{N-1}w^{ik}\alpha_k\\ &=&1-\sum_{k=1}^{N-1}w^{(i+r)k}\\ &=&2-\sum_{k=0}^{N-1}w^{(i+r)k}\\ &=&2-\delta_{i,-r}N \end{eqnarray*} [[/math]]


In terms of the standard long cycle [math](C_N)_{ij}=\delta_{i+1,j}[/math], we obtain:

[[math]] H=\frac{1}{\sqrt{N}}(2\mathbb I_N-NC_N^{-r}) [[/math]]


Thus [math]H[/math] is equivalent to [math]K_N[/math], and by Theorem 3.18, it is almost Hadamard.
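
Here is a small numerical verification of this construction, assuming numpy; the choices [math]N=7[/math] and [math]r=3[/math] below are ours, for illustration.

```python
import numpy as np

N, r = 7, 3
w = np.exp(2j * np.pi / N)
alpha = np.array([1.0 + 0j] + [-(w ** r) ** k for k in range(1, N)])

# gamma = F alpha, that is, sqrt(N) gamma_i = sum_k w^{ik} alpha_k
gamma = np.array([(w ** (i * np.arange(N)) * alpha).sum() for i in range(N)]).real / np.sqrt(N)

H = np.array([[gamma[(j - i) % N] for j in range(N)] for i in range(N)])
K = (2 * np.ones((N, N)) - N * np.roll(np.eye(N), -r, axis=1)) / np.sqrt(N)   # (2 I_N - N C_N^{-r})/sqrt(N)

print(np.allclose(H, K))                      # H is the cyclically shifted version of K_N
print(np.allclose(H @ H.T, N * np.eye(N)))    # and lies in sqrt(N) O_N
```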

In general, the construction of circulant almost Hadamard matrices is quite a tricky problem. At the abstract level, we have the following result, from [2]:

Proposition

A circulant matrix [math]H\in M_N(\mathbb R^*)[/math], written [math]H_{ij}=\gamma_{j-i}[/math], is almost Hadamard provided that the following conditions are satisfied:

  • The vector [math]\alpha=F^*\gamma[/math] satisfies [math]\alpha\in\mathbb T^N[/math].
  • With [math]\varepsilon={\rm sgn}(\gamma)[/math], [math]\rho_i=\sum_r\varepsilon_r\gamma_{i+r}[/math] and [math]\nu=F^*\rho[/math], we have [math]\nu \gt 0[/math].

In addition, if so is the case, then [math]\bar{\alpha}_i=\alpha_{-i}[/math], [math]\rho_i=\rho_{-i}[/math] and [math]\nu_i=\nu_{-i}[/math] for any [math]i[/math].


Show Proof

We know from Theorem 3.17 that our matrix [math]H[/math] is almost Hadamard if the matrix [math]U=H/\sqrt{N}[/math] is orthogonal and [math]SU^t \gt 0[/math], where [math]S_{ij}={\rm sgn}(U_{ij})[/math], this positivity being stronger than the eigenvalue condition there. By Proposition 3.19 the orthogonality of [math]U[/math] is equivalent to the condition (1). Regarding now the condition [math]SU^t \gt 0[/math], this is equivalent to [math]S^tU \gt 0[/math]. But, with [math]k=i-r[/math], we have:

[[math]] \begin{eqnarray*} (S^tH)_{ij} &=&\sum_kS_{ki}H_{kj}\\ &=&\sum_k\varepsilon_{i-k}\gamma_{j-k}\\ &=&\sum_r\varepsilon_r\gamma_{j-i+r}\\ &=&\rho_{j-i} \end{eqnarray*} [[/math]]


Thus [math]S^tU[/math] is circulant, with [math]\rho/\sqrt{N}[/math] as first row. From Proposition 3.19 we get [math]S^tU=FLF^*[/math] with [math]L=\nu'[/math] and [math]\nu=F^*\rho[/math], so [math]S^tU \gt 0[/math] iff [math]\nu \gt 0[/math], which is the condition (2). Finally, the assertions about [math]\alpha,\nu[/math] follow from the fact that the vectors [math]F\alpha,F\nu[/math] are real. As for the assertion about [math]\rho[/math], this follows from the fact that [math]S^tU[/math] is symmetric.

Here are now the main examples of such matrices, once again following [2]:

Theorem

For [math]N[/math] odd the following matrix is almost Hadamard,

[[math]] L_N=\frac{1}{\sqrt{N}} \begin{pmatrix} 1&-\cos^{-1}\frac{\pi}{N}&\cos^{-1}\frac{2\pi}{N}&\ldots\ldots&\cos^{-1}\frac{(N-1)\pi}{N}\\ \cos^{-1}\frac{(N-1)\pi}{N}&1&-\cos^{-1}\frac{\pi}{N}&\ldots\ldots&-\cos^{-1}\frac{(N-2)\pi}{N}\\ \vdots&\vdots&\vdots&&\vdots\\ \vdots&\vdots&\vdots&&\vdots\\ -\cos^{-1}\frac{\pi}{N}&\cos^{-1}\frac{2\pi}{N}&-\cos^{-1}\frac{3\pi}{N}&\ldots\ldots&1 \end{pmatrix} [[/math]]
and comes from an [math]\alpha[/math]-vector having all entries equal to [math]1[/math] or [math]-1[/math].


Show Proof

Write [math]N=2n+1[/math], and consider the following vector:

[[math]] \alpha_i=\begin{cases} (-1)^{n+i}&{\rm for }\ i=0,1,\ldots,n\\ (-1)^{n+i+1}&{\rm for}\ i=n+1,\ldots,2n \end{cases} [[/math]]


Let us first prove that [math](L_N)_{ij}=\gamma_{j-i}[/math], where [math]\gamma=F\alpha[/math]. With [math]w=e^{2\pi i/N}[/math] we have:

[[math]] \begin{eqnarray*} \sqrt{N}\gamma_i &=&\sum_{j=0}^{2n}w^{ij}\alpha_j\\ &=&\sum_{j=0}^n(-1)^{n+j}w^{ij}+\sum_{j=1}^n(-1)^{n+(N-j)+1}w^{i(N-j)} \end{eqnarray*} [[/math]]


Now since [math]N[/math] is odd, and since [math]w^N=1[/math], we obtain:

[[math]] \begin{eqnarray*} \sqrt{N}\gamma_i &=&\sum_{j=0}^n(-1)^{n+j}w^{ij}+\sum_{j=1}^n(-1)^{n-j}w^{-ij}\\ &=&\sum_{j=-n}^n(-1)^{n+j}w^{ij} \end{eqnarray*} [[/math]]


By computing the sum on the right, with [math]\xi=e^{\pi i/N}[/math] we get, as claimed:

[[math]] \begin{eqnarray*} \sqrt{N}\gamma_i &=&\frac{2w^{-ni}}{1+w^i}\\ &=&\frac{2\xi^{-2ni}}{1+\xi^{2i}}\\ &=&\frac{2\xi^{-Ni}}{\xi^{-i}+\xi^i}\\ &=&(-1)^i\cos^{-1}\frac{i\pi}{N} \end{eqnarray*} [[/math]]


In order to prove now that [math]L_N[/math] is almost Hadamard, we use Proposition 3.23. Since the sign vector is simply [math]\varepsilon=(-1)^n\alpha[/math], the vector [math]\rho_i=\sum_r\varepsilon_r\gamma_{i+r}[/math] is given by:

[[math]] \begin{eqnarray*} \sqrt{N}\rho_i &=&(-1)^n\sum_{r=0}^{2n}\alpha_r\sum_{j=-n}^n(-1)^{n+j}w^{(i+r)j}\\ &=&\sum_{j=-n}^n(-1)^jw^{ij}\sum_{r=0}^{2n}\alpha_rw^{rj} \end{eqnarray*} [[/math]]


Now since the last sum on the right is [math](\sqrt{N}F\alpha)_j=\sqrt{N}\gamma_j[/math], we obtain:

[[math]] \begin{eqnarray*} \rho_i &=&\sum_{j=-n}^n(-1)^jw^{ij}\gamma_j\\ &=&\frac{1}{\sqrt{N}}\sum_{j=-n}^n(-1)^jw^{ij}\sum_{k=-n}^n(-1)^{n+k}w^{jk} \end{eqnarray*} [[/math]]


Thus we have the following formula:

[[math]] \rho_i=\frac{(-1)^n}{\sqrt{N}}\sum_{j=-n}^n\sum_{k=-n}^n(-1)^{j+k}w^{(i+k)j} [[/math]]


Let us compute now the vector [math]\nu=F^*\rho[/math]. We have:

[[math]] \begin{eqnarray*} \nu_l &=&\frac{1}{\sqrt{N}}\sum_{i=0}^{2n}w^{-il}\rho_i\\ &=&\frac{(-1)^n}{N}\sum_{j=-n}^n\sum_{k=-n}^n(-1)^{j+k}w^{jk}\sum_{i=0}^{2n}w^{i(j-l)} \end{eqnarray*} [[/math]]


The sum on the right is [math]N\delta_{jl}[/math], with both [math]j,l[/math] taken modulo [math]N[/math], so it is equal to [math]N\delta_{jL}[/math], where [math]L=l[/math] for [math]l\leq n[/math], and [math]L=l-N[/math] for [math]l \gt n[/math]. We obtain:

[[math]] \begin{eqnarray*} \nu_l &=&(-1)^n\sum_{k=-n}^n(-1)^{L+k}w^{Lk}\\ &=&(-1)^{n+L}\sum_{k=-n}^n(-w^L)^k \end{eqnarray*} [[/math]]


By summing the series as before, we obtain the following formula:

[[math]] \begin{eqnarray*} \nu_l &=&(-1)^{n+L}\frac{2(-w^L)^{-n}}{1+w^L}\\ &=&(-1)^L\frac{2w^{-nL}}{1+w^L} \end{eqnarray*} [[/math]]


In terms of the variable [math]\xi=e^{\pi i/N}[/math], we obtain the following formula:

[[math]] \begin{eqnarray*} \nu_l &=&(-1)^L\frac{2\xi^{-2nL}}{1+\xi^{2L}}\\ &=&(-1)^L\frac{2\xi^{-NL}}{\xi^{-L}+\xi^L}\\ &=&\cos^{-1}\frac{L\pi}{N} \end{eqnarray*} [[/math]]


Now since [math]L\in[-n,n][/math], all the entries of [math]\nu[/math] are positive, and we are done.
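
Here is a numerical verification of this result, via the criterion from Theorem 3.17, assuming numpy; the value [math]N=9[/math] below is ours, for illustration, any odd [math]N[/math] doing the job.

```python
import numpy as np

N = 9                                  # any odd N works here
gamma = np.array([(-1) ** i / np.cos(i * np.pi / N) for i in range(N)]) / np.sqrt(N)
L = np.array([[gamma[(j - i) % N] for j in range(N)] for i in range(N)])   # the matrix L_N

S = np.sign(L)
X = S.T @ L
lam = np.sort(np.linalg.eigvalsh(X))

print(np.allclose(L @ L.T, N * np.eye(N)))   # L_N lies in sqrt(N) O_N
print(np.allclose(X, X.T))                   # S^t L_N is symmetric
print(lam[0] + lam[1] > 0)                   # sum of the two smallest eigenvalues is positive
```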

At the level of examples now, at [math]N=3[/math] we obtain the matrix [math]L_3=-K_3[/math]:

[[math]] L_3=\frac{1}{\sqrt{3}}\begin{pmatrix} 1&-2&-2\\ -2&1&-2\\ -2&-2&1 \end{pmatrix} [[/math]]


At [math]N=5[/math] we obtain the following matrix, with [math]x=-\cos^{-1}\frac{\pi}{5}[/math], [math]y=\cos^{-1}\frac{2\pi}{5}[/math]:

[[math]] L_5=\frac{1}{\sqrt{5}}\begin{pmatrix} 1&x&y&y&x\\ x&1&x&y&y\\ y&x&1&x&y\\ y&y&x&1&x\\ x&y&y&x&1 \end{pmatrix} [[/math]]


For further examples of matrices of this type, and for a discussion of their 1-norms, which happen quite often to be optimal, or almost, we refer to [2].

3d. Block designs

Let us study now the almost Hadamard matrices having two entries, [math]H\in M_N(x,y)[/math], with [math]x,y\in\mathbb R[/math]. These are related to design theory, so let us start with:

Definition

A filled [math](a,b,c)[/math] pattern is a matrix [math]M\in M_N(x,y)[/math], with [math]N=a+2b+c[/math], such that any two rows look as follows, up to a permutation of columns:

[[math]] \begin{matrix} x\ldots x&x\ldots x&y\ldots y&y\ldots y\\ \underbrace{x\ldots x}_a&\underbrace{y\ldots y}_b&\underbrace{x\ldots x}_b&\underbrace{y\ldots y}_c \end{matrix} [[/math]]
When the entries [math]x,y[/math] are the numbers [math]0,1[/math], we say that we have an [math](a,b,c)[/math] pattern.

There are many interesting examples of patterns coming from block designs, that we can use in order to construct almost Hadamard matrices. Let us begin with:

Definition

A [math](v,k,\lambda)[/math] symmetric balanced incomplete block design is a collection [math]B[/math] of subsets of a set [math]X[/math], called blocks, with the following properties:

  • [math]|X|=|B|=v[/math].
  • Each block contains exactly [math]k[/math] points from [math]X[/math].
  • Each pair of distinct points is contained in exactly [math]\lambda[/math] blocks of [math]B[/math].

This is a standard definition in design theory, and for more we refer to Colbourn-Dinitz [8] and Stinson [9]. In relation with our linear algebra questions, we will be interested in the incidence matrix of such a block design, which is the [math]v\times v[/math] matrix given by:

[[math]] M_{bx}=\begin{cases} 1&\text{if }x\in b\\ 0&\text{if }x\notin b \end{cases} [[/math]]


The connection between designs and patterns comes from:

Proposition

If [math]N=a+2b+c[/math] then the incidence matrix of any [math](N,a+b,a)[/math] symmetric balanced incomplete block design is an [math](a,b,c)[/math] pattern.


Show Proof

Let us replace the [math]0-1[/math] values in the incidence matrix [math]M[/math] by abstract [math]x-y[/math] values. Then each row of [math]M[/math] contains [math]a+b[/math] copies of [math]x[/math] and [math]b+c[/math] copies of [math]y[/math], and since every pair of distinct blocks intersects in exactly [math]a[/math] points, we see that every pair of rows has exactly [math]a[/math] variables [math]x[/math] in matching positions, so that [math]M[/math] is an [math](a,b,c)[/math] pattern.

As a first example for all this, consider the Fano plane, which is the simplest instance of “discrete geometry”, consisting of 7 points and 7 lines, as follows:

[[math]] \xymatrix@R=9pt@C=10pt{ &&&&\bullet\ar@{-}[ddddd]\\ &&&&&&&\\ &&&&&&&\\ &&&&\ar@{-}@/^/[drr]\\ &&\bullet\ar@{-}[uuuurr]\ar@{-}@/^/[urr]\ar@{-}@/_/[dd]&&&&\bullet\ar@{-}[uuuull]&&\\ &&&&\bullet\ar@{-}[urr]\ar@{-}[ull]&&&&\\ &&\ar@{-}@/_/[drr]&&&&\ar@{-}@/^/[dll]\ar@{-}@/_/[uu]&&&\\ \bullet\ar@{-}[uuurr]\ar@{-}[rrrr]\ar@{-}[uurrrr]&&&&\bullet\ar@{-}[rrrr]\ar@{-}[uu]&&&&\bullet\ar@{-}[uuull]\ar@{-}[uullll] } [[/math]]


Here the circle in the middle is by definition a line, and with this convention, the basic axioms of elementary geometry are satisfied, in the sense that any two points determine a line, and any two lines determine a point. Which is something really beautiful.


Now observe that the sets [math]X,B[/math] of points and lines of the Fano plane form a [math](7,3,1)[/math] block design, corresponding to the following filled [math](1,2,2)[/math] pattern:

[[math]] I_7=\begin{pmatrix} x&x&y&y&y&x&y\\ y&x&x&y&y&y&x\\ x&y&x&x&y&y&y\\ y&x&y&x&x&y&y\\ y&y&x&y&x&x&y\\ y&y&y&x&y&x&x\\ x&y&y&y&x&y&x \end{pmatrix} [[/math]]


In order to construct now more general examples, along the same lines, observe that the Fano plane is the projective plane over the finite field [math]\mathbb F_2=\{0,1\}[/math]. The same method works with [math]\mathbb F_2[/math] replaced by an arbitrary finite field [math]\mathbb F_q[/math], and we have:

Proposition

Assume that [math]q=p^k[/math] is a prime power. Then the point-line incidence matrix of the projective plane over [math]\mathbb F_q[/math] is a [math](1,q,q^2-q)[/math] pattern.


Show Proof

The sets [math]X,B[/math] of points and lines of the projective plane over [math]\mathbb F_q[/math] are indeed known to form a [math](q^2+q+1,q+1,1)[/math] block design, and this gives the result.

There are many other interesting examples of block designs giving rise to patterns, via Proposition 3.27. For instance the Paley biplane, which is a famous object in combinatorics, is a [math](11,5,2)[/math] block design, giving rise to a [math](2,3,3)[/math] pattern. See [2].


Let us discuss now the problem of associating real values to the symbols [math]x,y[/math] in an [math](a,b,c)[/math] pattern such that the resulting matrix [math]U(x,y)[/math] is orthogonal. We have:

Proposition

Given [math]a,b,c\in\mathbb N[/math], there exists an orthogonal matrix having pattern [math](a,b,c)[/math] iff [math]b^2\geq ac[/math]. In this case the solutions are [math]U(x,y)[/math] and [math]-U(x,y)[/math], where

[[math]] x=-\frac{t}{\sqrt{b}(t+1)}\quad,\quad y=\frac{1}{\sqrt{b}(t+1)} [[/math]]
with [math]t=(b\pm\sqrt{b^2-ac})/a[/math] being one of the solutions of [math]at^2-2bt+c=0[/math].


Show Proof

Consider a filled [math](a,b,c)[/math] pattern [math]U\in M_N(x,y)[/math], as in Definition 3.25. In order for this matrix [math]U[/math] to be orthogonal, the following conditions must be satisfied:

[[math]] ax^2+2bxy+cy^2=0 [[/math]]

[[math]] (a+b)x^2+(b+c)y^2=1 [[/math]]


The first condition, coming from the orthogonality of rows, tells us that [math]t=-x/y[/math] must be the variable in the statement. As for the second condition, this becomes:

[[math]] \begin{eqnarray*} y^2 &=&\frac{1}{(a+b)t^2+(b+c)}\\ &=&\frac{1}{(at^2+c)+(bt^2+b)}\\ &=&\frac{1}{2bt+bt^2+b}\\ &=&\frac{1}{b(t+1)^2} \end{eqnarray*} [[/math]]


This gives the above formula of [math]y[/math], and hence the formula of [math]x=-ty[/math] as well.
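
Here is a quick numerical check of these formulae, assuming numpy; the pattern [math](a,b,c)=(1,2,2)[/math] used below, which is that of the Fano plane, is chosen for illustration.

```python
import numpy as np

def pattern_values(a, b, c):
    """The values (x, y) from the above proposition, for an (a,b,c) pattern with b^2 >= ac."""
    t = (b + np.sqrt(b ** 2 - a * c)) / a
    x = -t / (np.sqrt(b) * (t + 1))
    y = 1 / (np.sqrt(b) * (t + 1))
    return x, y

a, b, c = 1, 2, 2                    # the Fano plane pattern, N = a + 2b + c = 7
x, y = pattern_values(a, b, c)

# The two orthogonality equations appearing in the proof
print(np.isclose(a * x**2 + 2 * b * x * y + c * y**2, 0))   # distinct rows are orthogonal
print(np.isclose((a + b) * x**2 + (b + c) * y**2, 1))       # each row has norm one
```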

Next in line, following [4], [2], we have the following result:

Proposition

Let [math]U=U(x,y)[/math] be orthogonal, corresponding to an [math](a,b,c)[/math] pattern. Then [math]H=\sqrt{N}U[/math] is almost Hadamard if:

[[math]] (N(a-b)+2b)|x|+(N(c-b)+2b)|y|\geq 0 [[/math]]


Show Proof

Let [math]S_{ij}={\rm sgn}(U_{ij})[/math]. Since any row of [math]U[/math] consists of [math]a+b[/math] copies of [math]x[/math] and [math]b+c[/math] copies of [math]y[/math], we have:

[[math]] (SU^t)_{ii}=\sum_k{\rm sgn}(U_{ik})U_{ik}=(a+b)|x|+(b+c)|y| [[/math]]


Regarding now [math](SU^t)_{ij}[/math] with [math]i\neq j[/math], we can assume in the computation that the [math]i[/math]-th and [math]j[/math]-th row of [math]U[/math] are exactly those pictured in Definition 3.25. Thus:

[[math]] \begin{eqnarray*} (SU^t)_{ij} &=&\sum_k{\rm sgn}(U_{ik})U_{jk}\\ &=&a\,{\rm sgn}(x)x+b\,{\rm sgn}(x)y+b\,{\rm sgn}(y)x+c\,{\rm sgn}(y)y\\ &=&a|x|-b|y|-b|x|+c|y|\\ &=&(a-b)|x|+(c-b)|y| \end{eqnarray*} [[/math]]


We obtain the following formula for the matrix [math]SU^t[/math] itself, with [math]J_N=\mathbb I_N/N[/math]:

[[math]] \begin{eqnarray*} SU^t &=&2b(|x|+|y|)1_N+((a-b)|x|+(c-b)|y|)NJ_N\\ &=&2b(|x|+|y|)(1_N-J_N)+\left((N(a-b)+2b)|x|+(N(c-b)+2b)|y|\right)J_N \end{eqnarray*} [[/math]]


Now since the matrices [math]1_N-J_N,J_N[/math] are orthogonal projections, we have [math]SU^t \gt 0[/math] if and only if the coefficients of these matrices in the above expression are both positive. Since the coefficient of [math]1_N-J_N[/math] is clearly positive, the condition left is:

[[math]] (N(a-b)+2b)|x|+(N(c-b)+2b)|y|\geq 0 [[/math]]


So, we have obtained the condition in the statement, and we are done.

Once again following [4], [2], we have the following result:

Theorem

Assume that [math]a,b,c\in\mathbb N[/math] satisfy [math]c\geq a[/math] and [math]b(b-1)=ac[/math], and consider the [math](a,b,c)[/math] pattern [math]U=U(x,y)[/math], where:

[[math]] x=\frac{a+(1-a-b)\sqrt{b}}{Na}\quad,\quad y=\frac{b+(a+b)\sqrt{b}}{Nb} [[/math]]
Then [math]H=\sqrt{N}U[/math] is an almost Hadamard matrix.


Show Proof

We have [math]b^2-ac=b[/math], so Proposition 3.29 applies, and shows that with [math]t=(b-\sqrt{b})/a[/math] we have an orthogonal matrix [math]U=U(x,y)[/math], where:

[[math]] x=-\frac{t}{\sqrt{b}(t+1)}\quad,\quad y=\frac{1}{\sqrt{b}(t+1)} [[/math]]


But this gives the formulae of [math]x,y[/math] in the statement. Now, observe that we have:

[[math]] \begin{eqnarray*} N(a-b)+2b &=&(a+2b+c)(a-b)+2b\\ &=&a^2+ab-2b^2+ac-bc+2b\\ &=&a^2+ab-ac-bc\\ &=&(a-c)(a+b) \end{eqnarray*} [[/math]]


Similarly, we have the following formula:

[[math]] N(c-b)+2b=(c-a)(c+b) [[/math]]


Thus the quantity in Proposition 3.30 is [math]Ky[/math], with:

[[math]] \begin{eqnarray*} K &=&(a-c)(a+b)t+(c-a)(c+b)\\ &=&(c-a)(c+b-(a+b)t)\\ &=&\frac{c-a}{a}(ac+ab-(a+b)(b-\sqrt{b}))\\ &=&\frac{c-a}{a}((ac-b^2)+(a+b)\sqrt{b})\\ &=&\frac{c-a}{a}((a+b)\sqrt{b}-b) \end{eqnarray*} [[/math]]


Since this quantity is positive, Proposition 3.30 applies and gives the result.

As a main application, we have the following result, also from [4], [2]:

Theorem

Assume that [math]q=p^k[/math] is a prime power. Then the matrix [math]I_N\in M_N(x,y)[/math], where [math]N=q^2+q+1[/math] and

[[math]] x=\frac{1-q\sqrt{q}}{\sqrt{N}}\quad,\quad y=\frac{q+(q+1)\sqrt{q}}{q\sqrt{N}} [[/math]]
having [math](1,q,q^2-q)[/math] pattern coming from the point-line incidence of the projective plane over [math]\mathbb F_q[/math] is an almost Hadamard matrix.


Show Proof

Indeed, the conditions [math]c\geq a[/math] and [math]b(b-1)=ac[/math] in Theorem 3.31 are satisfied, and the variables constructed there are [math]x'=x/\sqrt{N}[/math] and [math]y'=y/\sqrt{N}[/math].
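
Here is a numerical verification at [math]q=2[/math], assuming numpy, with one explicit writing of the Fano plane incidence matrix, matching the [math]x[/math] entries of the matrix [math]I_7[/math] pictured before; the check uses the criterion of Theorem 3.17.

```python
import numpy as np

# Point-line incidence of the Fano plane (q = 2), with the 1 entries placed
# as the x entries of the matrix I_7 pictured before
P = np.array([[1,1,0,0,0,1,0],
              [0,1,1,0,0,0,1],
              [1,0,1,1,0,0,0],
              [0,1,0,1,1,0,0],
              [0,0,1,0,1,1,0],
              [0,0,0,1,0,1,1],
              [1,0,0,0,1,0,1]])

q = 2
N = q * q + q + 1
x = (1 - q * np.sqrt(q)) / np.sqrt(N)
y = (q + (q + 1) * np.sqrt(q)) / (q * np.sqrt(N))

H = np.where(P == 1, x, y)        # fill the pattern with the two values x, y
S = np.sign(H)
X = S.T @ H
lam = np.sort(np.linalg.eigvalsh(X))

print(np.allclose(H @ H.T, N * np.eye(N)))   # H lies in sqrt(N) O_N
print(np.allclose(X, X.T))                   # S^t H is symmetric
print(lam[0] + lam[1] > 0)                   # almost Hadamard, by the criterion of Theorem 3.17
```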

We refer to [4], [2] for more on such matrices, including examples and norm numerics, in relation with the optimization question for the 1-norm. In what concerns us, we will be back to this in chapter 12 below, with a similar discussion in the complex case.

General references

Banica, Teo (2024). "Invitation to Hadamard matrices". arXiv:1910.06911 [math.CO].

References

  1. T. Banica, B. Collins and J.M. Schlenker, On orthogonal matrices maximizing the 1-norm, Indiana Univ. Math. J. 59 (2010), 839--856.
  2. T. Banica, I. Nechita and K. Życzkowski, Almost Hadamard matrices: general theory and examples, Open Syst. Inf. Dyn. 19 (2012), 1--26.
  3. T. Banica and I. Nechita, Almost Hadamard matrices: the case of arbitrary exponents, Discrete Appl. Math. 161 (2013), 2367--2379.
  4. T. Banica and I. Nechita, Almost Hadamard matrices with complex entries, Adv. Oper. Theory 3 (2018), 149--189.
  5. T. Banica, I. Nechita and J.M. Schlenker, Analytic aspects of the circulant Hadamard conjecture, Ann. Math. Blaise Pascal 21 (2014), 25--59.
  6. T. Banica, I. Nechita and J.M. Schlenker, Submatrices of Hadamard matrices: complementation results, Electron. J. Linear Algebra 27 (2014), 197--212.
  7. M.T. Mohan, On some p-almost Hadamard matrices, Oper. Matrices 13 (2019), 253--281.
  8. C.J. Colbourn and J.H. Dinitz, Handbook of combinatorial designs, CRC Press (2007).
  9. D.R. Stinson, Combinatorial designs: constructions and analysis, Springer-Verlag (2006).