Notes of Maryna Viazovska’s Hadamard lectures 23-04-2019

Universal optimality of {E_8} and Leech lattices

A set of points in Euclidean {d}-space, energy is a sum of contributions from each pair, which are functions of the distance. What are minimizers?

1. Potential and energy

Fix a potential function {p:(0,+\infty)\rightarrow{\mathbb R}}. The potential {p}-energy of a finite set {C\subset{\mathbb R}^d} is

\displaystyle  \sum_{x,\,y\in C,\,x\not=y}p(|x-y|).

1.1. Configurations

Motivated by cristallography, we extend the definition to a class of infinite sets.

A point configuration is a non-empty discrete subset of {{\mathbb R}^d} (every closed ball {B_r^d(x)} contains only finitely many points of {C}). Its lower {p}-energy is

\displaystyle  E_p(C):=\liminf_{r\rightarrow+\infty}\frac{1}{|C\cap B_r^d(0)|}\sum_{x,\,y\in C\cap B_r^d(0),\,x\not=y}p(|x-y|).

If the limit exists, we call it the {p}-energy of {C}.

1.2. Density

We say that a configuration {C} has density {\rho} if

\displaystyle  \liminf_{r\rightarrow+\infty}\frac{1}{|C\cap B_r^d(0)|}=\rho.

Example. Let {\Lambda\subset{\mathbb R}^d} be a lattice. The density of {\Lambda} is

\displaystyle  \frac{1}{vol({\mathbb R}^d/\Lambda)}.

Furthermore, the {p}-energy of {\Lambda} is

\displaystyle  \sum_{x\in\Lambda\setminus\{0\}} p(|x|).

Example. Periodic configurations. Fix a lattice {\Lambda}. A configuration is {\Lambda}-periodic if {C+\ell=C} for every {\ell\in\Lambda}. Such a configuration has density

\displaystyle  \frac{|C/\Lambda|}{vol({\mathbb R}^d/\Lambda)}

and {p}-energy

\displaystyle  \frac{1}{|C/\Lambda|}\left(\sum_{x,\,y\in C/\Lambda,\,x\not=y}\sum_{\ell\in\Lambda} p(|x-y+\ell|)\right)+\sum_{\ell\in\Lambda\setminus\{0\}} p(|\ell|).

Our goal is to minimize {p}-energy among configurations of fixed density {\rho}. We say that {C} minimizes {p}-energy (or {C} is a ground state for {p}) if its {p}-energy exists and every configuration in {{\mathbb R}^d} with equal density has lower {p}-energy at least {E_p(C)}.

It sometimes happens that several different potentials admit the same ground states.

2. Universal optimality

How do the ground states depend on {p}?

2.1. Example: Gaussian core model in {{\mathbb R}^3}

Fix {\alpha>0}, let

\displaystyle  p_\alpha(r)=e^{-\alpha r^2}.

Changing {\alpha} turns out to be equivalent to changing density {\rho}.

Experiments show that for small densities, the fcc (face centered cubic) lattice wins. For large densities, the bcc (body centered cubic) lattice wins. In 1979, F. Stillinger observed that phase coexistence wins, i.e. fcc in a half space and bcc in the other, with suitable densities, does better than either fcc and bcc.

2.2. Completely monotonic functions

In the sequel, we shall stick to completely monotonic potentials. Say a function {g} is completely monotonic if

  1. infinitely differentiable,
  2. derivatives satisfy {(-1)^k g^{(k)}\geq 0}.

Gaussian kernel, power laws are completely monotonic. Yukawa potential probably not.

Definition 1 (Cohn, Kumar 2007) Say a configuration in {{\mathbb R}^d} with density {\rho>0} is universally optimal if it minimizes {p}-energy for every potential {p} ofthe form {p(r)=g(r^2)}, where {g} is completely monotonic.

Theorem 2 (Bernstein) Every completely monotonic function {g:(0,+\infty)\rightarrow{\mathbb R}} can be written as an integral

\displaystyle  g(r)=\int_0^\infty e^{-\alpha r}\,d\mu(\alpha)

for some positive measure {\mu} on {(0,+\infty)}.

Corollary 3 {C} is universally optimal {\iff} {C} minimizes all energies defined by Gaussian core models.

2.3. Results

Theorem 4 (Cohn, Kumar 2007) {{\mathbb Z}\subset{\mathbb R}} is universally optimal.

They conjectured that the {A_2} lattice in {{\mathbb R}^2}, the {E_8} lattice in {{\mathbb R}^8} and the Leeach lattice

The main result of the present lectures is

Theorem 5 (Cohn, Kumar, Miller, Radchenko, Viazovska 2019) {E_8} and the Leech lattice are universally optimal.

{A_2} is still open.

The sphere packing theorem follows by a limiting argument.

3. Strategy of the proof

The key are linear programming bounds. These amount to relax the problem: replace it by a series of convex problems on functions on {{\mathbb R}^d}. It becomes infinite dimensional. Fourier interpolation enters the scene. This reduces the problem to the positivity of a function of 2 variables. The last step is handled numerically.

3.1. Linear programming bounds

A Schwarz function on {{\mathbb R}^d} is

  1. infinitely differentiable,
  2. each derivative decays faster than any inverse polynomial.

Our choice of normalization of Fourier transform is

\displaystyle  \hat f(y)=\int_{{\mathbb R}^d}f(x)e^{-2\pi i c\cdot y}\,dx.

The following linear programming formulation is due to Cohn and Kumar 2007.

Proposition 6 Let {p} be a nonnegative potential. Let {f} be a Schwarz function such that

  1. {f(x)\leq p(|x|)}.
  2. {\hat f\geq \geq 0}.

Then every configuration of {{\mathbb R}^d} with density {\rho} has lower {p}-energy at least {\rho \hat f(0)-f(0)}.

This reduces the problem to finding a clever radial Schwartz function {f}. Note that the assumptions on {f} are convex.

3.2. Proof of Proposition 6

I explain the case of periodic configurations. Let {C} be a {\Lambda}-periodic configuration. Then

\displaystyle  E_p(C)=\frac{1}{|C/\Lambda|}\left(\sum_{x,\,y\in C/\Lambda,\,x\not=y}\sum_{\ell\in\Lambda^*} p(|x-y+\ell|)\right)+\sum_{\ell\in\Lambda\setminus\{0\}} p(|\ell|)

\displaystyle \geq \frac{1}{|C/\Lambda|}\left(\sum_{x,\,y\in C/\Lambda,\,x\not=y}\sum_{\ell\in\Lambda^*} f(x-y+\ell)\right)+\sum_{\ell\in\Lambda\setminus\{0\}} f(\ell)

\displaystyle =\frac{1}{|C/\Lambda|}\left(\sum_{x,\,y\in C/\Lambda}\frac{1}{vol({\mathbb R}^d/\Lambda)}\sum_{\mu\in\Lambda^*} \hat f(\mu)e^{2\pi i(x-y)\cdot\mu}\right)-f(0)

\displaystyle =\frac{1}{|C/\Lambda|}\sum_{\mu\in\Lambda^*} \hat f(\mu)|\sum_{x\in C/\Lambda} e^{2\pi ix\cdot\mu}|^2-f(0)|

\displaystyle \geq \frac{1}{|C/\Lambda|}\frac{1}{vol({\mathbb R}^d/\Lambda)}\hat f(0)|C/\Lambda|^2-f(0)

\displaystyle =\rho \hat f(0)-f(0).

The general case (nonperiodic) was proved by Cohn and Courcy-Ireland. Every configuration can be viewed as a limit of periodic ones.

3.3. When can we hope for a sharp bound?

Did we lose much? Suppose that a lattice {\Lambda} minimizes {p}-energy for some nonnegative potential {p}. Suppose that this can be proven by linear programming using a Schwartz function {f}. There is no loss in the above inequalities {\iff}

  1. {f(x)=p(|x|)} on {\Lambda\setminus\{0\}}.
  2. {\hat f=0} on {\Lambda^*\setminus\{0\}}.

We see that firqt derivatives must satisfy the above requirements as well. Can we recnstruct a Sxhwartz function from these conditions?

3.4. Fourier interpolation

Theorem 7 Let {(d,n_0)=(8,1)} or {(24,2)}. Then any radial Schwartz function is uniquely determined by the values of

\displaystyle  f(\sqrt{2n}),\quad f'(\sqrt{2n}),\quad \hat f(\sqrt{2n}),\quad \hat f'(\sqrt{2n})

for integer {n\geq n_0}.

More precisely, there exists an interpolation basis {a_n}, {b_n}, {\tilde a_n}, {\tilde b_n}, consisting of radial Schwartz functions, such that for every radial Schwartz function {f} and every {x\in {\mathbb R}^d},

\displaystyle  f(x)=\sum_{n=n_0}^{\infty} (f(\sqrt{2n})a_n(x)+f'(\sqrt{2n})b_n(x)+\hat f(\sqrt{2n})\tilde a_n(x)+\hat f'(\sqrt{2n})\tilde b_n(x)),

where the series converges absolutely. Here are properties the interpolating basis.

  1. \displaystyle  a_m(\sqrt{2n})=\delta_{mn},\quad a_m'(\sqrt{2n})=0,\quad \hat a_m(\sqrt{2n})=0,\quad \hat a_m'(\sqrt{2n})=0,

  2. \displaystyle  b_m(\sqrt{2n})=0,\quad b_m'(\sqrt{2n})=\delta_{mn},\quad \hat b_m(\sqrt{2n})=0,\quad \hat b_m'(\sqrt{2n})=0,

Denote by {\Lambda_d} the {E_8} and Leech lattices respectively. Note that the shortest vector of {\Lambda_d}, {d=8} or {24}, is {\sqrt{2n_0(d)}}.

3.5. Construction of the optimal auxiliary functions

The only possible auxiliary function that could prove a sharp bound for {\Lambda_d}, {d=8,24} (among radial Schwartz functions) would be the following one,

\displaystyle  f(x)=\sum_{n=n_0}^{\infty}p(\sqrt{2n})a_n(x)+\sum_{n=n_0}^{\infty}p'(\sqrt{2n})b_n(x).

Recall that, in addition to equality, one must prove inequalities

  1. {f(x)\leq p(|x|)} on {\Lambda\setminus\{0\}}.
  2. {\hat f\geq 0} on {\Lambda^*\setminus\{0\}}.

Therefore it suffices to check that

  1. {f(x)\leq p(|x|)} on {{\mathbb R}^d\setminus\{0\}}.
  2. {\hat f\geq 0} on {{\mathbb R}^d}.

Here, {p} is a Gaussian.

3.6. Generating functions


\displaystyle  F(\tau,x)=\sum_{n=n_0}^{\infty}a_n(x)e^{2\pi i n\tau}+2\pi i n\tau\sqrt{2n}b_n(x)e^{2\pi i n\tau}.


\displaystyle  \tilde F(\tau,x)=\sum_{n=n_0}^{\infty}\tilde a_n(x)e^{2\pi i n\tau}+2\pi i n\tau\sqrt{2n}\tilde b_n(x)e^{2\pi i n\tau}.

Substitute {\tau=i\alpha}. Then the inequalities

\displaystyle  F(i\alpha,x)\leq e^{-\pi \alpha|x|^2}

for all {\alpha>0} and {x\in{\mathbb R}^d} and

\displaystyle  \hat F(i\alpha,y)\geq 0

for all {\alpha>0} and {y\in{\mathbb R}^d}, imply the universal optimality of {\Lambda_d}.

Apply the interpolation formula (IF) above to {p_\tau(x):=e^{-\pi i \tau |x|^2}}. It is equivalent to

\displaystyle  e^{\pi i \tau |x|^2}=F(\tau,x)+\tau^{-d/2}\tilde F(-\frac{1}{\tau},x).

We secretly know that

\displaystyle  F(\tau+1,x)-2F(\tau,x)+F(\tau-1,x)=0


\displaystyle  \tilde F(\tau+1,x)-2\tilde F(\tau,x)+\tilde F(\tau-1,x)=0.

Note that the condition {\hat F(i\alpha,y\geq 0} implies a control on the growth:

\displaystyle  F(i\alpha,x)\leq e^{-\pi \alpha|x|^2}.

Indeed, using the functional equations, we get {\hat F=\tilde F} and

\displaystyle  F(i\alpha,x)=e^{-\pi \tau |x|^2}-(i\alpha)^{-d/2}\tilde F(-\frac{1}{i\alpha},x)= e^{-\pi \tau |x|^2}-\alpha^{-d/2}\hat F(-\frac{1}{i\alpha},x)\leq e^{-\pi \alpha|x|^2}.

4. Relating the Fourier interpolation formula with functional equations

4.1. Summary of previous episodes

Recall our goal:

Theorem 8 (Cohn, Kumar, Miller, Radchenko, Viazovska 2019) Let {(d,n_0)=(8,1)} or {(24,2)}. There exists an interpolation basis {a_n}, {b_n}, {\tilde a_n}, {\tilde b_n} for integer {n\geq n_0}, consisting of radial Schwartz functions, such that for every radial Schwartz function {f} and every {x\in {\mathbb R}^d},

\displaystyle  f(x)=\sum_{n=n_0}^{\infty} (f(\sqrt{2n})a_n(x)+f'(\sqrt{2n})b_n(x)+\hat f(\sqrt{2n})\tilde a_n(x)+\hat f'(\sqrt{2n})\tilde b_n(x)),

where the series converges absolutely.

Denote by

\displaystyle  S({\mathbb N}):=\{(x_n)_{n=1}^{\infty}\;|\; x_n=o(n^{-k})\,\forall k>0\}.

Theorem 9 (Cohn, Kumar, Miller, Radchenko, Viazovska 2019) Consider the maps

\displaystyle  \Psi:S_{rad}({\mathbb R}^d)\rightarrow S({\mathbb N})^4,\quad f\mapsto (f(\sqrt{2n}),f'(\sqrt{2n}),\hat f(\sqrt{2n}),\hat f'(\sqrt{2n})


\displaystyle  \Phi:S({\mathbb N})^4\rightarrow S_{rad}({\mathbb R}^d),

\displaystyle  (\alpha_n,\beta_n,\tilde\alpha_n,\tilde\beta_n)\mapsto f(x):=\sum_{n=n_0}^{\infty} (\alpha_n a_n(x)+\beta_n b_n(x)+\tilde \alpha_n\tilde a_n(x)+\tilde\beta_n \tilde b_n(x)).

These maps are isomorphisms and inverses of each other.

Again, we form the generating series

\displaystyle  F(\tau,x)=\sum_{n=n_0}^{\infty}a_n(x)e^{2\pi i n\tau}+2\pi i n\tau\sqrt{2n}b_n(x)e^{2\pi i n\tau}.


\displaystyle  \tilde F(\tau,x)=\sum_{n=n_0}^{\infty}\tilde a_n(x)e^{2\pi i n\tau}+2\pi i n\tau\sqrt{2n}\tilde b_n(x)e^{2\pi i n\tau}.

The interpolation formula implies the following functional equations.

\displaystyle  F(\tau,x)+(i/\tau)^{d/2}\tilde F(-1/\tau,x)=e^{\pi i \tau |x|^2},

\displaystyle  F(\tau+1,x)-2F(\tau,x)+F(\tau-1,x)=0,


\displaystyle  \tilde F(\tau+1,x)-2\tilde F(\tau,x)+\tilde F(\tau-1,x)=0.

4.2. Reversed procedure: functional equations imply interpolation formula

Conversely, we shall now check that the functional equations imply the interpolation formula.

Notations. For a radial Schwartz function {f}, expressed as {f(x)=f_0(|x|)}, let {Df(x)=f'_0(|x|)}.

Consider seminorms

\displaystyle  \|f\|^{rad}_{k,\ell}=\sup_{x\in {\mathbb R}^d}|x|^k|D^\ell f(x).

Lemma 10 The complex Gaussians {x\mapsto e^{\pi i \tau|x|^2}} span a dense subspace in {S_{rad}({\mathbb R}^d)}. For any {y>0}, the same is true if we consider only complex Gaussians with {\Im m(\tau)=y}.

Therefore the interpolation formula needs be proven only for complex Gaussians.

Here comes the converse statement.

Theorem 11 Suppose there exist functions {F}, {\tilde F:H\times{\mathbb R}^d\rightarrow {\mathbb C}} such that

  1. {F} and {\tilde F} are holomorphic in {\tau\in H=} upper half plane.
  2. {F} and {\tilde F} are radial in {x}.
  3. For all {k,\ell\in{\mathbb N}},

    \displaystyle  \|F_\tau\|^{rad}_{k,\ell}\leq\alpha_{k,\ell}\Im m(\tau)^{-\beta_{k,\ell}}+\gamma_{k,\ell}|\tau|^{\delta_{k,\ell}}.

    \displaystyle  \|\tilde F_\tau\|^{rad}_{k,\ell}\leq\alpha_{k,\ell}\Im m(\tau)^{-\beta_{k,\ell}}+\gamma_{k,\ell}|\tau|^{\delta_{k,\ell}}.

    In the special case {(k,\ell)=(0,0)}, we make a stronger assumption

    \displaystyle  \|F_\tau\|+\|\tilde F_\tau\|_\infty \le \alpha_{0,0}\Im m(\tau)^{-\beta_{0,0}}

    for {-1\leq\Re e(\tau)\leq 1}, with {\beta_{0,0}>0}.

  4. {F} and {\tilde F} satisfy the three functional equations above.

Then {F} and {\tilde F} have expansions of the form above (with {n_0=1}), for some radial Schwartz functions {a_n,b_n,\tilde a_n, \tilde b_n}. Moreover, for every radial Schwartz function {f}, the interpolation formula

\displaystyle  f(x)=\sum_{n=n_0}^{\infty} (f(\sqrt{2n})a_n(x)+f'(\sqrt{2n})b_n(x)+\hat f(\sqrt{2n})\tilde a_n(x)+\hat f'(\sqrt{2n})\tilde b_n(x)),

holds. Finally, for every {k,\ell\in{\mathbb N}}, the radial seminorms {\|a_n\|^{rad}_{k,\ell}}, …, grow at most polynomially in {n}. The n=1 functions {a_1},… vanish if and only if {F} and {\tilde F} are {o(e^{-2\pi \Im(\tau)})} in the strip {-1\leq\Re e(\tau)\leq 1} with {x} fixed.

4.3. Proof of Theorem 11

Step 1. As functions of {\tau}, {F(\tau+1,x)-F(\tau,x)} is holomorphic, invariant under translation by 1 and goes to {0} as {\Im m(\tau)} tends to {0}. Therefore it can be viewed as a function (of {e^{2\pi i\tau}}) on the disk, hence it admits an expansion in powers of {e^{2\pi i\tau}}, with coefficients which we denote by {2\pi i\sqrt{2n}b_n(x)}.

Same argument with {F(\tau,x)-\tau(F(\tau+1,x)-F(\tau,x))} provides the {A_n(x)} coefficients. The same for {\tilde F}.

Step 2. The functions {a_n},… are radial Schwartz functions.

Express {a_n(x)} as a Fourier coefficient,

\displaystyle  a_n(x)=\int_{iy}^{1+iy}(F(\tau,x)-\tau(F(\tau+1,x)-F(\tau,x)))e^{-2\pi in\tau}\,d\tau

and idem for {b_n}. Part (3)of the assumptions, we see that the radial seminorms of {a_n}… are finite. To show that they grow polynomially, we take {y=1/n} in the integral. This moderate growth implies that the interpolation series converges absolutely for every radial Schwartz function {f}. This defines a continuous linear functional on {S_{rad}({\mathbb R}^d)}.

Step 3. Proof of the interpolation formula.

Fix {x} and consider the linear functional

\displaystyle  \lambda(f)=\sum_{n=n_0}^{\infty} (f(\sqrt{2n})a_n(x)+f'(\sqrt{2n})b_n(x)+\hat f(\sqrt{2n})\tilde a_n(x)+\hat f'(\sqrt{2n})\tilde b_n(x))-f(x).

By density, we need merely prove that {\lambda} vanishes on complex Gaussians. This is equivalent to the first functional equation for {F} and {\tilde F}. This concludes the proof of Theorem 11.

Remark. Suppose that {H} and {\tilde H} are solutions of the homogeneous equations (i.e. removing right hand sides)

\displaystyle  H(\tau)+(i/\tau)^{d/2}\tilde H(-1/\tau)=0,

\displaystyle  H(\tau +1)-2H(\tau)+H(\tau-1)=0,

\displaystyle  \tilde H(\tau +1)-2\tilde H(\tau)+\tilde H(\tau-1)=0.


\displaystyle  H(\tau)=\sum_{n=1}^\infty (c_n+2\pi i \tau\sqrt{2n}d_n)e^{2\pi i n\tau},

\displaystyle  \tilde H(\tau)=\sum_{n=1}^\infty (\tilde c_n+2\pi i \tau\sqrt{2n}\tilde d_n)e^{2\pi i n\tau},

The proof of Theorem 11 shows that for any radial Schwartz function {f},

\displaystyle  \sum_{n=n_0}^\infty (f(\sqrt{2n})c_n+f'(\sqrt{2n})d_n+\hat f(\sqrt{2n})\tilde c_n+\hat f'(\sqrt{2n})\tilde d_n)=0.

This shows that the orthogonal of the image of {\Psi} is the finite dimensional space of solutions of the set of homogeneous functional equations.

5. Construction of solutions of functional equations: preliminary steps

How to construct {F} and {\tilde F}?

5.1. {PSL_2({\mathbb Z})} invariance

Remember that the group {PSL_2({\mathbb Z})} acts on the upper half plane {H}.

Definition 12 Let {f:H\rightarrow{\mathbb C}} be a function, let {k\in 2{\mathbb Z}} be an even integer. Fix an element {\gamma\in PSL_2({\mathbb Z})}. The slash operator is defined by

\displaystyle  (f|_k\gamma)(\tau):=(c\tau+d)^{-k}f(\gamma(\tau)),

where {c\tau+d} is the denominator of {\gamma(\tau)}.

It has the following property: {f|_k \gamma_1\gamma_2=(f|_k\gamma_1)|_k\gamma_2}.

Notation. Let {R={\mathbb C}[PSL_2({\mathbb Z})]} denote the group algebra (with the right action of the group). The slash operator extends to {R} by linearity.

Definition 13 A function on {H} has moderate growth if there exist {\alpha,\beta,\gamma>0} such that

\displaystyle  |F(\tau)|\leq\alpha(\Im m(\tau)^{-\beta}+|\tau|^\gamma).

The space of holomorphic functions with moderate growth is denoted by {P}.

{P} is invariant under the slash operator.

Notation. The involution {S} and the translation by {1} {T} are the usual generators for {PSL_2({\mathbb Z})}, with relators {S^2} and {(ST)^3}.

The functional equation for {F} and {\tilde F} can be written

\displaystyle  F|_{d/2}(T-2+T^{-1})=0,

\displaystyle  \tilde F|_{d/2}(T-2+T^{-1})=0,

\displaystyle  F+\tilde F|_{d/2}S=e^{\pi i\tau|x|^2}.

We use the last equation to express {\tilde F} in terms of {F} and reduce to a set of 2 equations with only one unknown function,

\displaystyle  F|_{d/2}(T-2+T^{-1})=0,

\displaystyle  F|_{d/2}S(T-2+T^{-1})=e^{\pi i\tau|x|^2}|_{d/2}S(T-2+T^{-1}).

We want to solve this equation in {P}, and eventually find a unique solution.

5.2. The ideal associated to functional equations

In the group algebra {R}, consider the right ideal

\displaystyle  I:=(T-2+T^{-1})R+S(T-2+T^{-1})R.

It has the following special properties.

  1. {I} has complex codimension {6} in {R}.
  2. {I} is the free {R}-module generated by {(T-2+T^{-1})} and {S(T-2+T^{-1})}.
  3. The right action of {PSL_2({\mathbb Z})} on the {6}-dimensional complex vectorspace {I\setminus R} has “polynomial growth”.

The last statement means that in a basis of {I\setminus R}, the absolute values of the {36} matrix coefficients of an element {\gamma\in PSL_2({\mathbb Z})} are bounded above by powers of the matrix coefficients of {\gamma}. It follows from the following description of the representation {\sigma}. Let {\rho_3} denote the restriction to {SL_2({\mathbb Z})} of the {3}-dimensional representation {S^2} of {SL_2({\mathbb R})}. Let {\rho_2} denote the {2}-dimensional representation of {SL_2({\mathbb Z})} such that the kernel of {\rho_2} is the congruence subgroup {\Gamma_2} (its image has {6} elements, it is isomorphic to the dihedral group {D_3}). Let {\vec v:SL_2({\mathbb Z})\rightarrow{\mathbb Z}^2} denote the cocycle (with respect to {\rho_2}) such that {\vec v(S)=(0,0)} and {\vec v(T)=(1,-1)}. This defines an affine action, therefore a morphism to {Aff_2({\mathbb C})<GL_3({\mathbb C})}. Up to conjugacy, the {6}-dimensional representation {\sigma} is the direct sum of this morphism and {\rho_3}.

5.3. Next episode

Tomorrow, we shall solve the homogeneous functional equations (using some classical theory of modular forms), then solve the nonhomogeneous functional equations in the form

\displaystyle  F(\tau,x)=\int_0^\infty K(\tau,z)e^{\pi iz |x|^2}\,dz,

where {K} is meromorphic on {H\times H}, with special requirements on residues at poles (which are the fixed points of {PSL_2({\mathbb Z})}). It satisfies

\displaystyle  K(\tau,z)|_{d/2}A=0

(with respect to the {\tau} variable) for all {A\in I}.

6. Solving the functional equations

Definition 14 Let {J} be a right-ideal of the group ring {R}. For an even integer {k\in 2{\mathbb Z}}, denote by

\displaystyle  Ann_k(J,P)=\{f\in P\,|\,f|_k \gamma=0\,\forall\gamma\in J\}.

We shall describe {Ann_k(J,P)} in terms of modular forms.

6.1. Brief overview of classical modular forms

Definition 15 Let {\Gamma} be a discrete finite covolume subgroup of {SL_2({\mathbb R})}. A (holomorphic) modular form of weight {k} is a function {f\in P} such that {f|_k \gamma=f} for all {\gamma\in\Gamma}. They form a finite dimensional space {M_k(\Gamma)}.

We also define the infinite dimensional space {M^!_k(\Gamma)} of weakly holomorphic modular forms by admitting poles near cusps.

We are interested in the cases {\Gamma=SL_2({\mathbb Z})} and {\Gamma=\Gamma(2)}, the principal congruence subgroup at level {2}.

6.2. Modular forms for {SL_2({\mathbb Z})}

Eisenstein series are defined for {k\geq 4} by

\displaystyle  E_k(z):=\frac{1}{2\zeta(k)}\sum_{(m,n)\in{\mathbb Z}^2\setminus\{(0,0)\}}(mz+n)^{-k}.

Its Fourier expansion is

\displaystyle  E_k(z)=1-\frac{2k}{B_k}\sum_{n=1}^\infty G_{k-1}(n)e^{2\pi i nz},

Then, as an algebra, the direct sum of all {M_k} is freely generated by {E_4} and {E_6}.

Ramanujan’s cusp form of weight {12} is

\displaystyle  \Delta=\frac{1}{1728}(E_4^3-E_6^2)=q\prod_{n=1}^\infty (1-q^n)^{24}.

{\Delta} does not vanish on {H}, its inverse is a weakly holomorphic modular form of weight {-12}. {\Delta} vanishes at all cusps. Another famous weakly modular form is the {j}-modular invariant {j=\Delta^{-1}E_4^3}. It is a Hauptmodul for the modular surface {X(1)=PSL_2({\mathbb Z})\setminus H}.

{M^!(SL_2({\mathbb Z}))} is generated by {E_4,E_6} and {\Delta^{-1}}.

One can also define Eisenstein series of weight {2}, but not by the series, by the following formula instead,

\displaystyle  E_2=1-24\sum_{n=1}^\infty \sigma_1(n)q^n=\frac{\Delta'}{2\pi i\Delta}.

It satisfies {E_2(z+1)=E_2(z)} and

\displaystyle  E_2(-1/z)=z^2E_2(z)-\frac{6iz}{\pi}.

It is called a quasimodular form, and more generally, one calls quasimodular form all elements of the algebra generated by {E_2}, {E_4} and {E_6}.

6.3. Modular forms for {\Gamma(2)}

We start with Jacobi theta functions

\displaystyle  \theta_{00}(z)=\sum_{n\in{\mathbb Z}}e^{\pi i n^2 z},\quad \theta_{01}(z)=\sum_{n\in{\mathbb Z}}(-1)^n e^{\pi i n^2 z},\quad \theta_{10}(z)=\sum_{n\in{\mathbb Z}}e^{\pi i (n+\frac{1}{2})^2 z}.

These are modular forms of weight {1/2}, we shall merely need their powers

\displaystyle  U:=\theta_{00}^4,\quad V:=\theta_{01}^4,\quad W:=\theta_{10}^4,

which are honest modular forms. They satisfy the Jacobi identity,

\displaystyle  U=V+W.

Then the modular ring {M(\Gamma(2))} is generated by {V} and {W}, the weakly modular ring {M^!(\Gamma(2))} is generated by {V,W} and {\Delta^{-1}}.

The modular {\lambda}-function is {\lambda:=\frac{V}{U}}. It takes its values in {{\mathbb C}\setminus\{0,1\}}.

It is a Hauptmodul of the modular curve {X(2)=\Gamma(2)\setminus H}, i.e. it generates its function field over {{\mathbb C}}.

The fundamental domain of {X(2)} is the union of two adjacent fundamental domains of {X(1)}. The function {\lambda} descends to an isomorphism of {X(2)} with {{\mathbb C}\setminus\{0,1\}}, extending to a compactification by {\lambda(\infty)=\infty}, {\lambda(0)=1} and {\lambda(1)=\infty}.

One denotes by {\mathcal{L}:=\log\lambda} and {\mathcal{L}_S=\log(\lambda(-1/z))}.

6.4. Role of ideal {I}

We introduce two other ideals

\displaystyle  I_+=(S+1)R+(T-1)^2R,\quad I_-=(S-1)R+(T-1)^2R.

They have codimension {3} in {R}, their sum is {R} and their intersection is {I}. An alternate description is

\displaystyle  I_\pm=\{r\in R\,|\,(S-\pm 1)r\in T\}.

It follows that

\displaystyle  Ann_k(I,P)=Ann_k(I_+,P)\oplus Ann_k(I_-,P).

We shall treat both pieces in different manners, the first in terms of quasimodular forms, the second in terms of logarithmic derivatives of the {\lambda}-function.

Proposition 16 Let {k} be an even integer. Then

\displaystyle  Ann_k(I_+,P)=\phi_2 M_{k-2}(SL_2({\mathbb Z}))+\phi_0 M_k(SL_2({\mathbb Z})) +\phi_{-2}M_{k+2}(SL_2({\mathbb Z})),


\displaystyle  \phi_2(\tau)=\tau E_2^2-\frac{6i}{\pi}E_2,\quad \phi_0=\tau E_2-\frac{3i}{\pi},\quad \phi_{-2}=\tau.

It follows that

\displaystyle  \mathrm{dim}Ann_k(I_+,P)=\max\{0,\lceil \frac{k}{4}\rceil+1\}.

Proposition 17 Let {k} be an even integer. Then

\displaystyle  Ann_k(I_-,P)=\psi_4 M_{k-4}(SL_2({\mathbb Z}))+\psi_2 M_{k-2}(SL_2({\mathbb Z})) +\phi_{0}M_{k}(SL_2({\mathbb Z})),


\displaystyle  \psi_4(\tau)=\xi_4\mathcal{L}+(\xi_4|_4 S)\mathcal{L}_S,\quad \psi_2=\xi_2\mathcal{L}+(\xi_2|_2 S)\mathcal{L}_S,\quad \psi_0=1,

and {\xi_4=U^2+W^2-2V^2}, {\xi_2=U+W}.

It follows that

\displaystyle  \mathrm{dim}Ann_k(I_-,P)=\max\{0,\lceil \frac{k-2}{4}\rceil+1\}.

Comment. It is fortunate that solutions can be expressed in modular terms. It is folklore that the symmetric square representation of {SL_2({\mathbb Z})} is related to modular forms. It is not so folklore for the affine representation. However, it is not that dramatic: we knew a priori that the annihilators were finite dimensional. In case of need, we could have used numerical calculations to describe them.

6.5. Solving the inhomogeneous equation

We shall use again the fundamental domain for {\Gamma(2)},

\displaystyle  D:=\{z\in h\,|\,-1<\Re e(z)<1,\,|z-\frac{1}{2}|>\frac{1}{2},\,|z+\frac{1}{2}|>\frac{1}{2}\}.

Here are premiminary technical points.

Proposition 18 Let {h_1,h_2} be continuous functions on {H}.

  1. Analytic continuation. Assume {h_1,h_2} are holomorphic. Let {O\subset H} be an open neighborhood of {\bar D}. Let {f:O\rightarrow{\mathbb C}} be a holomorphic function which satisfies the following transformation laws:
    1. {f|_k (T-1)^2=h_1},
    2. {f|_k S(T-1)^2=h_2}

    whenever both sides are defined. Then {f} extends to a holomorphic function on {H} which satisfies the same equations (a) and (b).

  2. Propagation of moderate growth bounds. Suppose that {f} is a continuous function on {H} which satisfies equations (a) and (b). Assume that {f} has moderate growth on {D}, and that {h_1} and {h_2} have moderate growth. Then {f} has moderate growth on {H}.

Now I describe our ansatz. We look for solutions in the form

\displaystyle  F(\tau,x)=e^{\pi i|x|^2\tau} +4\sin(\pi|x|^2/2)^2\int_{0}^{\infty} K(\tau,it)e^{-\pi|x|^2 t}\,dt.

Theorem 19 For dimensions {d=8,24}, there exist unique meromorphic functions {K=K^{(d)}} on {H\times H} satisfying the following properties:

  1. For a fixed {z\in H}, the poles of {K(\tau,z)} in {\tau} are all simple and contained in the {SL_2({\mathbb Z})}-orbit of {z}.
  2. The functional equations hold,

    \displaystyle  K(\tau,z)|_{d/2} (T-1)^2=0,

    \displaystyle  K(\tau,z)|_{d/2} S(T-1)^2=0,

  3. For {z\in H} and {A\in R},

    \displaystyle  Res_{\tau=z}(K|_{d/2} A)=-\frac{1}{2\pi}\Phi(A),

    where {\Phi:R/I\rightarrow {\mathbb C}} is the linear map defined by its values on the basis {\{1,T,TS,S,ST,STS\}} as follows,

    \displaystyle  \Phi(1)=0,\quad \Phi(T)=1,\quad\Phi(TS)=0,\quad \Phi(S)=0,\quad\Phi(ST)=0,\quad\Phi(STS)=0.

  4. The functions

    \displaystyle  K^{(8)}(\tau,z)(j(\tau)-j(z))\Delta(\tau)\Delta(z),\quad K^{(24)}(\tau,z)(j(\tau)-j(z))\Delta(\tau)\Delta^2(z)

    are in the class {P} both as functions of {\tau} and {z}. Moreover

    \displaystyle  K^{(8)}(\tau,z)=O(|\tau e^{2\pi i\tau}|),\quad \tau^{-4} K^{(8)}(-1/\tau,z)=O(|\tau e^{2\pi i\tau}|),

    \displaystyle  K^{(24)}(\tau,z)=O(|\tau e^{4\pi i\tau}|),\quad \tau^{-12} K^{(24)}(-1/\tau,z)=O(|\tau e^{4\pi i\tau}|),

    as {\Im m(\tau)} tends to infinity.

The residue constraint is found by plugging in the answatz, switching contours and fitting residues.

6.6. Idea of the proof of Theorem 19

The proof consists in providing explicit formulae for these functions. Assume that a solution {K} exists.

The group ring {R} comes equipped with an involution {\ast} such that {(\gamma)^\ast=\gamma^{-1}} for group elements. The slash operation, when performed on the {z} variable, relates to the usual slash operation.

Lemma 20 Given matrices {M,N\in SL_2({\mathbb Z})},

\displaystyle  res_{\tau=Nz} K|^z_{2-d/2} M=res_{\tau=z} K|^\tau_{d/2} NM^\ast.

We introduce one more ideal,

\displaystyle  \tilde I:=\{M\in R\,|\,\phi(NM^\ast)=0\,\forall N\in R\}.

Fix {M\in \tilde I}. For all {\in SL_2({\mathbb Z})},

\displaystyle  res_{\tau=Nz} K|^z_{2-d/2} M=res_{\tau=z} K|^\tau_{d/2} NM^\ast=\frac{1}{2\pi}\Phi(NM^\ast)=0.

Therefore {K|^z_{2-d/2} M} is holomorphic in the {\tau} variable. Growth conditions imply that {K|^z_{2-d/2} M} belongs to {P}, hence to {Ann_{d/2}(I,P)} as a function of {\tau}. We also knowthat this function vanishes at cusps. This implies that {K|^z_{2-d/2} M=0} for all {M\in\tilde I}.

Now we can describe the spaces {Ann_k(\tilde I,P)}. We use the fact that

\displaystyle  K^{(d)}(\tau,z)(j(\tau)-j(z))\Delta(\tau)\Delta(z)^{m(d)}\in Ann_k(I,P)\otimes Ann_k(\tilde I,P)

for suitable {k}, this gives us an explicit formula for {K^{(d)}} in terms of Eisenstein series.

For {\tau\in D} and {x\in {\mathbb R}^d}, {|x|>\sqrt{2n_0-2}} (this is required for the integral to converge), remember we set

\displaystyle  F(\tau,x)=e^{\pi i|x|^2\tau} +4\sin(\pi|x|^2/2)^2\int_{0}^{\infty} K(\tau,it)e^{-\pi|x|^2 t}\,dt.

Lemma 21 The function {\tau\mapsto F(\tau,x)} extends to a holomorphic function on an open neighborhood of {D} in {H}, and satisfies

\displaystyle  F|_k (T-1)^2=0,\quad F|_k S(T-1)^2=e^{\pi\tau|x|^2}|_{d/2}S(T-1)^2.

Posted in Course | Tagged | Leave a comment

Notes of Misha Gromov’s course on scalar curvature, lectures 2 and 3 01-03-2019

These are notes of Misha’s third lecture. I missed Misha’s second lecture. Thanks to Daniel Peres for his notes, part of which appear here, since Misha repeated and continued certain topics during the third lecture.

1. Topological results

Goal: why no {Sc>0} metrics on even dimensional tori.

1.1. Dirac method

The Dirac operator {D} acts on spinors. The space of spinors splits into {\mathcal{S}=\mathcal{S}^+\oplus \mathcal{S}^-}, {D} exchanges both summands (this is fortunate, since the full {D} is selfadjoint). Gelfand observed that the index of Fredholm operators is invariant under deformations. This raised the question of finding formulae for the index, which was done by Atiyah and Singer in 1963.

On the torus, the formula gives index {0}, which seems useless. However, there is a kernel, formed of parallel spinors. When deforming the bundle by twisting it with a flat bundle, these harmonic spinors survive, though not parallel any more. This is forced by the index theorem for families, key to Lustig’s result.

The argument extends to spin manifolds having a map of nonzero degree onto a torus. This does not apply to the direct sum the connected sum of 4-torus and complex projective plane, which is not spin (and which indeed admits {Sc>0}).

A further extension. Start with a torus with a nontrivial vectorbundle whose curvature is concentrated. Take a high order covering of the torus, transplant the bundle and arrange that curvature gets uniformly small. Alternatively, transplant the bundle on Euclidean space, then pull it back using Euclidean dilations, so that its curvature decreases. Twisting makes a suitable index nonzero. In both cases, in Lichnerowicz’ formula, the contribution of the curvature of the twisting bundle become negligible, hence the method applies and shows nonexistence of {Sc>0} (resp. {Sc} bounded below). Note that the behaviours of harmonic spinors in both constructions are very different.

Next we use the language of {C^*}-algebras. This is natural in view of the connection with Novikov conjecture (the signature of a generic fiber of a map {X^{n+4k}\rightarrow Z^n} is an invariant of {X} if {Z} is aspherical; the conjecture depends only on the fundamental group of {Z}). Given group {\pi}, introduce the group ring (with involution) {{\mathbb C}[\pi]} and the Witt group of this ring (hermitian forms mod indefinite ones). It is almost never zero. {C^*}-algebras allow for a longitudinal version for foliated manifolds (Connes).

Mischenko proved that if {X} has nonpositive curvature, then its fundamental group satisfies Novikov conjecture. The same ideas can be used to rule out {Sc>0}. One cannot use finite dimensional representations. Instead, one uses a nontrivial bundle, one obtains a bundle over universal covering {\tilde X} whose curvature is uniformly small as follows: one uses the contracting map {\tilde X\rightarrow {\mathbb R}^n} .

1.2. Minimal surfaces method

Minimal surfaces can be used to prove these and even stronger results. Schick showed that there examples where all Dirac based methods fail but Schoen-Yau applies. Here are samples of recent results.

Theorem 1 For a manifold with boundary {X} with {Sc\geq n(n-1)}, a map to the sphere collapsing the boundary to one point with nonzero degree cannot have {Lip\leq 1/2}.

Theorem 2 Consider map {f:X\rightarrow Y} such that the fundamental class of the image is nonzero in homology of {Y}. Assume {Y} has nonpositive sectional curvature. Then {X} cannot have {Sc>0}.

Proof. For simplicity, I assume that {\pi_1(Y)} is residually finite, which allows to assume that {f} is an embedding. By Poincare duality, let {Z} be a submanifold with nonzero intersection with {f(X)}. This produces a nonzero degree map of {Y\times Z} to {S^n}. Up to rescaling {Z}, its curvature does not affect the scalar curvature of the product. Now, either pull back a nontrivial bundle from the sphere and use Dirac operator, or use minimal surfaces.

Minimal surfaces and Dirac operators allow seemingly similar proofs, but a serious difference: Dirac method requires map which need not be contracting, merely area contracting. Seems that the geometry behind is different.

1.3. Back to Schoen-Yau’s initial argument

Schoen-Yau in dimension 3. In a Riemannian 3-torus with {Sc> 0}, pick a 2-dimensional homology class. Pick an area minimizing surface {Y} in it. It cannot be a sphere, so it has nonpositive Euler characteristic. Push it along its normal {\nu}, using {\psi\nu}, {\psi} a function on {Y}. Area cannot decrease, so second variation of area must be nonnegative,

\displaystyle  \int_{Y}|d\psi|^2+\frac{1}{2}(Sc(Y)-Sc(X)-|II|^2+MeanCurv)|\psi|^2\geq 0.

Here, {MeanCurv=0}. Pick {\psi=1}, get {\int_{Y}Sc(Y)>0}, contradicting Gauss-Bonnet.

Kazdan-Warner. On a closed manifold, assume operator

\displaystyle  -\Delta+\frac{1}{q_n}Sc

is positive. Then there is a conformally equivalent metric with {Sc>0}. Indeed, take first eigenfunction {f}, which is positive, and use a power of {f} as a conformal change. Essential point is that {q_n>1/2}.

Schoen-Yau. Take minimal hypersurface. From the formula of second variation of area above, we see that it is smaller than

\displaystyle  -\Delta+\frac{1}{2}Sc.

People knew the formula with Ricci curvature, but it can be expressed as the difference between ambient and induced scalar curvatures.

1.4. Singularities

Minimizing hypersurfaces can be singular. This happens in dimension {8}: there is a product of spheres {Z=S^3\times S^3\subset S^7} such that the cone over {Z} is a minimizing hypersurface of {{\mathbb R}^8}. Counterexample to Hilbert’s problem on the analyticity of solutions of elliptic PDEs?

A series of results initiated by Federer and Simons shows that singularities of minimizing hypersurfaces have codimension {8}. (At that time, Fet lost his job due to conflict with authorities, he made money by translating under cover, I signed his translations. It happened that I signed the translation of Simons’ paper on regularity of minimizing cones, although I did not catch a word of it). Schoen-Yau’s recent paper seems to overcome this difficulty, but not always. Lohkamp has an alternative approach, too complicated for me.

I explain how to eliminate singularities in dimension {8}. They are unstable in families. Idea: one can embed only countably many disjoint Y’s in the plane. We know singularities are isolated. Claim: in a family of nested minimal hypersurfaces, only countably many can be singular.

2. Geometric results

2.1. Manifolds with boundary

Either free boundary or fixed boundary. The new trick is to allow an extra term in the area functional.

In {X}, let {\Omega} be a domain with boundary {Y}. Let {\mu(\Omega)} denote a function on domains, e.g. a measure. Consider the functional

\displaystyle  Y\mapsto vol(Y)-\mu(\Omega).

Assume hypersurface {Y} is a minimizer. On {Y\times{\mathbb R}}, use metric {dx^2+f^q dt^2} where {f} is the first eigenfunction of {-\Delta+\frac{1}{2}Sc}. This has positive scalar curvature and is invariant under translation in the {{\mathbb R}} factor. Go on inductively as Schoen-Yau did. In the end, get translation invariant metric with {Sc>0} on {{\mathbb R}^n}, contradiction. This is more flexible than Kazdan-Warner’s trick, since one does not change the metric on {Y}.

Where is the {\mu} functional used? It allows to produce 1-parameter families of minimizers which fill in {X}.

Theorem 3 Let {X=T^n\times[-1,1]}. Assume that {Sc(X)\geq n(n-1)}. Then distance between boundaries is {\leq 2\pi/n}.

This is sharp. The same is true if torus is replaced with a manifold which does not admit {Sc>0}, provided Lohkamp’s argument is correct.

Proof. Use the same symmetrization: put a minimal hypersurface in {X}, and so on.

Corollary. Lower bound on principal curvatures of en embedded torus in the unit Euclidean ball.

Remark. The argument does not extend to a product of 2-spheres. Very strange that this purely Euclidean corollary cannot be proved directly.

Another corollary. Consider a cube with {Sc\geq n(n-1)}. The minimum distance between pairs of opposite faces is {\leq 2/\sqrt{n}}.

2.2. Sharp results

I believe that all topological consequences of {Sc>0} can be obtained via minimal surfaces only. On the other hand, for geometric consequences, there is probably some extra power in the Dirac method.

Theorem 4 Let {X} have {Sc\geq 0} and boundary {Y} with {MeanCurv\geq n-1}. Assume there is a smooth distance decreasing map from {Y} to {S^{n-1}}. Then map is either an isometry or is contractible.

The study of geometric consequences of {Sc>0} began with the following

Theorem 5 (Ciriza-Llarull) Let {X^n\rightarrow S^n} be an area-decreasing map. If {X} is spin and has {Sc\geq n(n-1)}, map is either isometric or contractible.

Goette-Semmelmann extended this to the case when the target is a manifold with positive curvature operator.

Llarull’s arguments uses a nontrivial bundle on the sphere. What is the optimal bundle on the sphere from the point of view of curvature? The {\mathcal{S}^+} spin bundle.

Here is a polyhedral variant. It reduces to the closed smooth case by doubling and smoothing metrics.

Theorem 6 (Corollary of a theorem I stated last time) Consider a Euclidean polyhedron all of whose dihedral angles are {\leq \pi/2} (there are not so many of them). Cannot reduce dihedral angles while keeping {MeanCurv\geq 0}.

Can one avoid smoothing, i.e. perform the Dirac method directly for singular metrics? In the smoothing process, the Dirac operator does not converge, even its spectrum does not seem to converge.

Is this result connected to stability of spatial section of Schwarzschild universe ?

Posted in Course | Leave a comment

Notes of Misha Gromov’s course on scalar curvature 18-02-2019

1. Scalar curvature

Easy: convex hypersurfaces in Euclidean spaces. Obscure: general manifolds.

Mediators between the two:

– symplectic geometry,

– gauge theory in dimension 3,

– scalar curvature.

Each case has its own algebra attached, which seems to come from physics:

– Elliptic equations in 2 variables.

– Connections and equations associated to them.

– Clifford algebra and Dirac operators.

Level of sophistication seems to be higher in scalar curvature.

The more we study {Sc>0}, the more we realize that we badly understand even convex sets.

1.1. A property of the Euclidean ball

I will give an example, a simple question. Euclidean hypersurfaces have curvature (second fundamental form). Assume a hypersurface {Y} sits in the unit ball. For instance, {Y} could be a product of round spheres of radius {1/\sqrt(m)} (if there are {m} factors). We see that curvature increases with {m}, consequence of Pythagorean theorem. We want to break this rule: is this increase really unavoidable? What is the optimal embedding of a (topological) manifold {Y} in the ball, where we want to minimize the maximal principal curvature? It turns out that only scalar curvature helps us yet.

Theorem 1 If {Y} admits no metric with {Sc>0}, then {max curv(Y} embedded in unit ball{)} is at least const.{n}.

For the torus, this is nearly sharp in codimension 1.

For higher codimension, a weaker lower bound holds, with a sophisticated proof. It relies on a result by Lohkamp whose proof has not been published in full detail.

Theorem 1 applies to exotic spheres in dimensions {8n+1}, {8n+2} as well. Unclear wether this is sharp or not.

An elementary argument yielding the weaker bound const.{\sqrt{n}} will be given below.

So we do not fully understand the ball.

2. Definitions

2.1. Axiomatic

Not so clear what scalar curvature should be. It is a function, subject to the following axioms.

– It should be additive under Pythagorean sums,

\displaystyle  Sc(X_1\times X_2,g_1\oplus g_2)=Sc(X_1,g_1)+Sc(X_2,g_2).

– It should scale like a curvature: if distances are doubled, scalar curvature should be divided by 4.

– In coordinates, {Sc} should be linear in second derivatives.

– The expression for {Sc} does not depend on choice of coordinates.

– The scalar curvature of the 2-sphere should be 2.

These axioms uniquely determine {Sc}.

2.2. Volumes

We can interpret the assumption {Sc(X_1) < Sc(X_2)} as follows: for small enough balls, volume of equal radius balls are larger in {X_1} than in {X_2}.

Unfortunately, this does not integrate: no global volume bounds.

2.3. Integral mean curvatures of spheres

Infinitesimal interpretation works the same. Does not integrate either. Appealing since when radius tends to infinity this should converge to the mass of general relativity.

{Sc(S^n)=n(n-1)} is forced by additivity. Does anything remain in infinite dimensions? Mysterious.

3. A bit of history

3.1. Index theorem

In 1962, Atiyah-Singer published the index theorem. The next year, Lichnerowicz deduced that there exist manifolds which do not admit {Sc>0}. The example is a Kummer surface {z_0^2 +z_1^2 +z_2^2 +z_3^2 =0} in {\mathbb{C}P^3}. Lichnerowicz used the formula

\displaystyle  D^2=\nabla^2+(1/4)Sc,

where {D} is the Dirac operator, {\nabla^2} is a positiver self-adjoint operator. (Possibly, Schrödinger already had a version of this inequality, in a paper in German I was unable to decipher).

In 1972 Atiyah-Singer had a version of index theorem mod 2, Hitchin used it to show that certain exotic spheres do not admit {Sc>0}.

The logic of both arguments is the same: topology forces the existence of harmonic spinors (kernel of {D}), forbidden by Lichnerowicz’s formula. The proof tells nothing geometric about the metric.

3.2. Minimal hypersurfaces

Schoen-Yau introduced minimal hypersurfaces. {Sc>0} is the only known ground yet where Dirac operator and minimal hypersurfaces interact.

Since Riemann, we know that for every Riemannian metric, near every point, there is a Euclidean metric {g_0} such that {g-g_0=O(\epsilon^2)}, which is a priori unexpected. Therefore to define principal curvatures of hypersurfaces in Riemannian manifolds, one can replace the ambient metric with a Euclidean metric. Note that second fundamental form only depends on the affine structure (visible in Newton’s first law).

Consider offsets {Y_t} of {Y} (i.e. equidistant hypersurfaces). The induced metrics {h_t} satisfy

\displaystyle  \frac{\partial h_t}{\partial t}=A_t,

where {A_t} is the second fundamental formula. Let {A^*_t} denote the symmetric operator expressing quadratic form {A_t} in terms of the metric. In turn,

\displaystyle  \frac{\partial A^*_t}{\partial t}=-(A^*_t)^2+B_t,

where {B_t} comes from the ambient curvature.

Challenge: deduce this formula from the computation for one single example, the circle in the plane, by functoriality. I do not know how to do this.

Gauss’ theorema egregium relates ambient and intrinsic curvature:

\displaystyle  \kappa(Y)=\kappa(X)+\lambda_1\lambda_2,

where {\lambda_i} are principal curvatures. For scalar curvature, this implies

\displaystyle  Sc(Y)=Sc(X)+\sum_{i\not=j} \lambda_i\lambda_j.

Challenge: deduce this formula from the computation for one family of examples, the spheres in Euclidean spaces, by functoriality. Idem, I do not know how.

3.3. Proof of the lower bound on max curv for hypersurfaces in a ball

Our goal is the lower bound

\displaystyle  \max \lambda_i \ge \mathrm{const.}\sqrt{n},

when hypersurface {Y} in unit ball does not admit positive scalar curvature. E.g. a torus.

Note that the unit {n}-ball projectively embeds into the unit {n}-sphere with small metric distorsion. So we can assume that Y is embedded in the {n}-sphere. The consequence of Gauss’ theorema egregium gives

\displaystyle  Sc(Y)=(\sum \lambda_i)^2-(\sum\lambda_i^2)+n(n-1).

At a point where {Sc(Y)\le 0},

\displaystyle  \sum_{i=1}^{n-1}\lambda_i^2\ge (\sum \lambda_i)^2+n(n-1)\geq n(n-1),


\displaystyle  \max\lambda_i\geq \sqrt{n}.

4. Sign issues

Why do I focus on {Sc>\sigma} and not on {Sc<\sigma} ? Here are theorems showing that these conditions behave dramatically differently.

Theorem 2 The subspace {G+} of Riemannian metrics with {Sc\ge 0} is {C^0} closed, whereas {G-} is dense.

The second fact is due to Lohkamp. The first uses Dirac operator. A more recent proof uses Ricci flow (Bamler).

This is reminiscent of an analogous result in symplectic geometry (due to Eliashberg) which has been a clue that symplectic geometry could develop in a meaningful topological theory.

5. Analogy with mean curvature

The mean curvature of a hypersurface {Y} is {Mean=\sum\lambda_i}. Rotate a highly oscillating curve along an axis, get a surface that can be approximated by surfaces with positive mean curvature. So positive mean curvature does not imply convex at all (beware that immersed is far less restrictive that embedded in this context).

Consider boundaries of tubular neighborhoods of piecewise smooth subsets of codimension 2 (can have boundary). Often they have positive mean curvature. Scalar curvature is positive except where pieces meet together.

Question: can one deform a hyperplane on a compact set while keeping {Mean\geq 0} ?

Answer is no: if it were, one could get {Sc\ge 0} and {Sc\not=0} on a torus. Does there exist a more elementary argument for {Mean \geq 0} ? I think so.

One can connect narrow convex sets with tiny tubes so that {Mean \geq 0}. One can produce large {Mean\geq 0} perturbations of a round codimension 1 sphere.

Theorem 3 In a {Mean \ge n-1} hypersurface, let {Z} be a hypersurface bounding a {Mean\geq 0} manifold with boundary {Y_0}. Then there exists an other filling {Y_1} of {Z} contained in the 1-neighborhood of {Z}.

Proof is short. Assume no such filling exists. Then, by Poincare duality, there exists a 1-cycle in {\mathbb{R}^n \setminus Z+1} (1-neighborhood of {Z}) which intersects all fillings. Let a unit ball move along that cycle. When it touches {Y_0} (you can assume that this happens on the right side, by topology), the maximum principle applies.

We need the following fact: If {X} has {sect curv\geq \kappa} and {Y} has {sect curv \leq \kappa}, then every distance decreasing map of a subset {A} of {X} to {Y} extends to distance decreasing map {X} to {Y} (Lang-Schroeder). Not hard once you know the trick.

Note the role of topology in the argument. This is pervasive whenever dealing with mean or scalar curvature (even the Dirac operator arguments emphasize the role of the fundamental cycle defined by a closed manifold in various theories).

An alternate (kind of dual) form of the theorem. Let {\Delta}, {\Delta'} be simplices in Euclidean space. If all dihedral angles of {\Delta} are less than the corresponding angle in {\Delta'}, then {\Delta} and {\Delta'} are isometric.

Adiprasito claims that this holds more generally, using results from Hodge theory.

Posted in Course | Leave a comment

Notes of Leonid Polterovich’ Orsay lecture 03-04-2018

Persistence modules and barcodes in symplectic geometry and spectral geometry

1. Hamiltonian diffeomorphisms

Arnold:“Symplectic topology has the same relation to ordinary topology as Hamiltonian systems have to general dynamical systems”.

Already surfaces are difficult examples.

Hofer’s length on Hamiltonian diffeomorphism groups {Ham}. A path is determined by a path of normalized functions {F_t}. Its length is

\displaystyle \int_0^1 \|F_t\|_\infty \,dt.

This defines a kind of biinvariant Finsler structure on {Ham}. The corresponding distance is nondegenerate (Hofer, Polterovich, Lalonde-McDuff). It is essentially the only one (Buhovsky-Ostrover). Existence of this metric is remarkable. It is remiscent of commutator norms on finitely generated groups.

Autonomous Hamiltonian diffeos (generated by time-independent Hamiltonians {F}) correspond to 1-parameter subgroups of {Ham}. They admit roots of any degree. T-Such flows conserve energy. They are geodesics in Hofer’s metric (but not the only geodesics).

{Ham} has interesting algebraic properties. It is algebraically simple. Let {Powers_k} be the set of Hamiltonian diffeos admitting a root of order {k}. Is {Powers_k} metrically dense in {Ham} ? I.e. is

\displaystyle  p_k(M):=\sup_{\phi\in Ham} d(\phi,Powers_k)

finite? We conjecture that this is never the case.

Theorem 1 (Polterovich-Shelukin) Let {\Sigma} be a closed surface of genus {\geq 4}. Let {M} be an arbitrary closed manifold with {\pi_2(M)=0}. Then, for {k} large,

\displaystyle  p_k(\Sigma\times M)=+\infty.

This has been improved since by Jun Zhang, Polterovich-Shelukin-Stojisavljevic.

Our tools are Floer theory and persistence modules and their barcodes.

2. An example

In 2 dimensions, autonomous Hamiltonian flows are integrable, i.e. deterministic. Thus we look for chaotic Ham diffeos. Like in an eggbeater, combine two integrable diffeos performing mere shear motions, but on intersecting annuli. As soon as 1992, physicist Franjione-Ottini studied such linked twist maps. A parameter {\lambda} (strength of shear motion) is introduced. As {\lambda} tends to {+\infty}, we show that the distance of resulting diffeo {\phi_\lambda} to {Power_k} tends to infinity.

We study periodic orbits in special free homotopy classes. Handles are needed to separate periodic. Our example fails on the 2-sphere, as shown by Khanevsky.

Our invariant survives stabilization by dimension: product with identity does the job.

3. Motivation

3.1. Dynamics

In dynamics, it has been known for a long time that vectorfields generate few diffeomorphisms (Palis, Brin 1973). In {Ham}, non autonomous Ham. diffeos contain a {C^\infty}-dense open set (Salamon-Zehnder, Ginzburg-Gurel), for symplectically aspherical manifolds. Our methods upgrade {C^\infty} open to Hofer-open, and make it quantitative.

3.2. Coarse geometry of {Ham}

Polterovich-Rosen: a {C^\infty}-generic Hamiltonian generates a nondistorted 1-parameter subgroup, distance to identity grows linearly.

In fact, the only other known behaviour is boundedness.

When I proved that {Ham} has infinite diameter (for surfaces), Misha Kapovitch asked me wether {Ham} did lie in a bounded neighborhood of a quasigeodesic. Our main theorem shows that this does not happen for {\Sigma\times M}.

3.3. Milnor’s constraint

In 1983, Milnor observed that if a diffeo is a square, {\phi=\psi\circ\psi}, the number of primitive geometrically distinct 2-periodic orbits is even. Indeed, {\psi} induces a {{\mathbb Z}/2} action of such orbits.

Find more restrictions on powers.

4. Barcodes

Edelsbrunner, Harer, Carlsson, in the context of topological data analysis. Has developped into a very abstract subject.

A barcode is a finite collection of intervals {I_j} with multplicities {m_j}. The bottleneck distance is defined as follows. Erase intervals of length {<\delta} and match the remaining intervals up to error {\delta}. Then infimize {\delta}.

Given a field {\mathbb{F}}, a persistence module is a pair {(V,\pi)} of finite dimensional {\mathbb{F}}-vectorspaces {V_t}, {t\in{\mathbb R}}, and maps {\pi_{st}:V_s\rightarrow V_t}, {V_s=0} for {s << 0}. Commuting diagrams. One assumes regularity: for all but a finite number of jump points in {{\mathbb R}}, {\pi_{st}} are isomorphisms, together with semicontinuity at jump points.

Interval module is the tautological 1-dimensional persistence module supported on an interval.

Structure theorem: every persistence module is associated with a unique barcode, as the sum of intervals modules of the intervals of the barcode.

4.1. Original example: Morse theory

{X} closed manifold, {f:X\rightarrow{\mathbb R}} Morse function. Persistence module is

\displaystyle V_t(f):=H_*(\{f<t\},\mathbb{F}).

Inclusions induce persistence morphisms.

The following statement is a not that easy theorem.

Robustness: The map {(C^\infty(X),\|.\|_\infty)\rightarrow (\{barcodes\}, d_{bot})} is Lipschitz.

It follows that one can define critical points of merely continuous functions.

4.2. Morse homology

I need be more specific with the homology theory I use.

Let {f} be a Morse function on {X} and {\rho} a generic Riemannian metric. The Morse complex {C_t} is spanned by the critical points of {f} with value {f(x)<t}. The differential counts the number {n(x,y)} of gradient lines of {f} connecting critical points,

\displaystyle  dx=\sum n(x,y)y.

4.3. Floer theory

Born in 1988. In symplectic topology, the role of {X} is played by infinite dimensional manifold {LM} of contractible loops {z:S^1\rightarrow M}. Given a 1-periodic Hamiltonian {F(x,t)}, define action functional

\displaystyle  A_F(z)=\int_0^1 F(z(t),t)\,dt-\int_D\omega,

where {D} is an arbitrary disk spanning {z}.

The original action functional {\int_D\omega} has only one critical point of infinite index and coindex. The perturbation {A_F} is much more interesting, since its critical points correspond to 1-periodic orbits of the Hamiltonian flow generated by {F}.

The solutions of the gradient equation are pseudoholomorphic cylinders. Therefore (Gromov 1985) they constitute a Fredholm problem. Although the gradient equation has no local (in time) solutions, the boundary value problem of gradient connections of critical points is well-posed. Therefore, one gets a well-defined complex, and homology groups {HF}, this is Floer homology.

Under certain asumptions (asperical, atoroidal,…), the corresponding persistence module {V=HF(\{A_F<s\})} depends only on the time 1 map {\phi\in Ham(M,\omega)}.

One can extend the construction to noncontractible loops.

Theorem 2 (Polterovich-Shelukin) The map

\displaystyle (Ham,d_{Hofer})\rightarrow(\{barcodes\},d_{bot})

is Lipschitz.

Hence Lipschitz functions on barcodes yield numerical invariants of Ham diffeos. Powers give rise to {{\mathbb Z}/p} representations on persistence modules. A Floer-Novikov variant has been developed by Usher-Zhang.

4.4. Representations on persistence modules

Since {\phi \phi^2 \phi^{-1}=\phi^2}, {\phi} acts on the Floer homology of {\phi^2}, this is a {{\mathbb Z}/2} action {T}. Define {L(\phi)\subset HF(\phi^2)} be the {-1}-eigenspace of {T}. There is a corresponding persistence module. If {\phi=\psi^2}, then {\psi} induces action {S} on {L\psi^2)}. Then {S^2=T=-1} on it. Thus the multiplicity of each bar in {L(\psi^2} is even. This is reminiscent of Milnor’s constraint.

Observation. Distance to full squares is controlled by stable multiplicity, i.e. parity of the dimension of eigenspaces.

5. Barcodes for eigenfunctions on surfaces

Joint work with Iosif Polterovich and Vukasin-Stojisavljevic. Elaborates on 2006 work with Misha Sodin.

5.1. Oscillation

Question. Given a closed oriented surface, and a smooth Morse function {f} on {M}, how can one define the oscillation of {f}?

The Banach indicatrix is defined as follows. Let {\beta(c,f)} denote the number of connected components in {f^{-1}(c)}. Let

\displaystyle  I(f)=\int_{\mathbb R} \beta(c,f)\,dc.

Goes back to Kronrod and Vitushkin in the 1950’s. In the 1980’s, Yomdin rediscovered it, with the idea that if derivatives of {f} are not too large, {I(f)} should be small.

Define the total length of the finite part {B_*(f)} of the barcode as

\displaystyle  \Phi(f):=\sum_{J\in B_*(f)}length(J).

The following is an easy fact from surface topology.

Theorem 3 (PPS 2018)

\displaystyle  \max f -\min f +\Phi(f)\leq I(f).

5.2. Example

Consider the Reeb graph of {f} (space of connected components of fibers). {f} descends to a function on it. Then {f} is equal to the total variation of {f} on its Reeb graph. The difference with {\max f -\min f} can be seen on the barcode.

6. Results on eigenfunctions

Main question. Bound oscillation via analytic properties of functions. Fix a Riemannian metric on surface {M}. Let {\Delta} be the Laplacian. For {\lambda>0}, consider

\displaystyle  F_\lambda=\{f\in C^\infty(M)\,;\, \|f\|_2=1,\,\|\Delta f\|_2\leq \lambda\}.

The study of the topology of eigenfunctions of the Laplacian has a long history. Richard Courant showed that the number of nodal domains (connected components of {\phi\not=0}) is {\leq n+1} for the {n}-eigenfunction.

Theorem 4 (Polterovtch-Sodin 2006) For {f\in F_\lambda},

\displaystyle  I(f)\leq k_g(\lambda+1).

Corollary 5

\displaystyle  \max f-\min f+\Phi(f)\leq k_g(\lambda+1).

6.1. Remarks

1. For Euclidean domains, for Dirichlet boundary values, Alexandrov-Backelman-Pucci-Cabre 1995 show that

\displaystyle  \max f -\min f \leq Const.\,\|\Delta f\|_2.

2. On the square 2-torus, {f_n(x,y)=\sin(nx)\sin(ny)} satisfy {\|\Delta\|_2 \sim n^2}, and barcodes {B_*(f_n)} have {\sim n^2} bars of length {\sim 1}. This is sharp.

3. Say a critical value {\alpha} of {f} is {\delta}-significant if it is an endpoint of a bar of length {\geq \delta}. From our theorem, it follows that

Corollary 6 If {f\in F_\lambda}, the number of {\delta}-significant critical values of {f} is {\leq k_g(\lambda+1)}.

4. Nicolaescu has shown that the expectation of the number of critical points of random linear combinations of eigenfunctions is {\sim\lambda}.

6.2. Example


\displaystyle  f_i(x)=\frac{1}{\sqrt{\pi(2+N_i^{-4})}}(1+\frac{1}{N_i^2}sin(N_i x))

has a number of critical points that tends to infinity, whereas for every fixed {\delta>0}, the number of {\delta}-significant critical values stays bounded.

7. Approximation theory

Let {f}, {g} be smooth functions on {M}. In {C^0}-norm, what is the best approximation of {f} by functions of the form {g\circ \phi}, {\phi\in Diff(M)}?

Let {B(f)} be the barcode of {H_*(\{f<t\})}. Then {B(f)} is Diff-invariant. By robustness theorem,

\displaystyle  d_{bot}(B(f),B(g))=d_{bot}(B(f),B(g\circ\phi))\leq\|f-g\circ \phi\|_\infty.

Thus we get a lower bound from barcodes.

7.1. Example

Question. Given a smooth function {f} on the 2-sphere, find optimal approximation of {f} by a function with 2 critical points.

If {f} has 2 index 1 critical values {a<b}, the lower bound {d_{bot}(\mathbb{F}(a,b),\emptyset)=\frac{b-a}{2}} is sharp.

7.2. Approximation with eigenfunctions and their images by changes of variables

From our theorem, it follows that

Corollary 7 Let {f} be a Morse function on {M}. Assume that {\Phi(f)} is large compared to {\lambda}. Then

\displaystyle  dist_{C^0}(f,F_\lambda\circ Diff(M))\geq \frac{\Phi(f)}{2|B_*(f)|},

which, in turn, is the half of the average bar length.

8. Proof

Our goal is to prove that {\|f\|_2=1}, {\|\Delta f\|_2\leq \lambda} imply that {I(f)\leq k_g(\lambda+1)}.

I do all computations on the square torus, for simplicity. Let {H} denote the Hessian of function {f}.

Equip the unit tangent bundle {ST^2} with the obvious metric {dx^2+dy^2+d\phi^2}. Look at a connected component {\gamma} of a regular fiber {f^{-1}(c)}, parametrized by arclength. Let {\tilde\gamma} be its lift to {ST^2} via its field of normals. Since {\tilde\gamma} is not contractible, its length is {\geq 1}.


\displaystyle  length(\tilde\gamma)\leq \int_\gamma \sqrt{1+\frac{|H|^2}{|\nabla f|^2}}\,dt,

the total length of the lift {\widetilde{f^{-1}(c)}} is {\geq\beta(c,f)}, hence

\displaystyle  \begin{array}{rcl}  I(f)&\leq &\int_{\mathbb R} \beta(c,f)\,dc\leq\int_{\mathbb R} L(c)\,dc\\ &\leq&\int_{\mathbb R} \int_\gamma \sqrt{1+\frac{|H|^2}{|\nabla f|^2}}\,dt\\ &\leq& k(\int_{M}(|\nabla f|^2+|H|^2))^{1/2}\\ &\leq & k(\|f\|_2+\|\Delta f\|_2), \end{array}

by coarea formula, Cauchy-Schwartz (plus Bochner-Lichnerowicz in the general case).

Posted in seminar | Leave a comment

Notes of Sebastien Bubeck’s course 13-03-2018

Bandit regret optimization

Blackboard course. Slides available on the PGMO website. Video available in a few weeks.

1. Introduction to regret analysis

1.1. Framework

At each time step {t}, the player selects an action {i_t\in[n]} at random, according to a distribution {p_t\in\Delta_n} which is independent of the past. Simultaneously, the adversary selects a loss {l_t:[n]\rightarrow[0,1]}. The loss suffered by the player is {l_t(i_t)}. Full information would mean that {l_t\in {\mathbb R}^n} is observed. Bandit means that only {l_t(i_t)} is observed.

The benchmark, to which other strategies will be compared, is

\displaystyle \begin{array}{rcl} i^*_t=\mathrm{arg}\min_{i\in[n]} \mathop{\mathbb E}(\sum_{i=1}^n l_t(i)). \end{array}

My loss {L_T} must be compared with the optimal loss {L^*_T},

\displaystyle \begin{array}{rcl} L_T=\mathop{\mathbb E}(\sum_{t=1}^T l_t(i_t)),\quad L^*_T=\mathop{\mathbb E}(\sum_{t=1}^T l_t(i^*_t)). \end{array}

The regret is the difference {R_T=L_T-L^*_T}.

The machine learning mindset is that a single picture does not help much, but a large dataset of actions allows for a good strategy. So {[n]} is a large set, eventually with some structure.

Think of an oblivious adversary, like nature. Adversary merely means that we cannot make any stochastic assumption about it. For instance {i^*} does not represent what a clever adversary would have done when player applies a constant strategy.

Interpret {R_T/T} as the rate at which {i^*} is identified. How large should one take {T} in order that {R_T/T<\epsilon}?

If there is a good action, we intend to find it. If there is none,…

1.2. Some applications

Add placement. The player runs a website. Action = choose an add. Adversary is a potential customer, who clicks on the add or not. Best

Packet routing. Graph with source and target nodes. Action = choose a path. Huge set of actions, but with a lot of structure (distance between paths).

Hyperparameter optimization.

AI for games, i.e. efficient tree search. In AlphaGo, a determinist game, randomness was introduced to improve exploration of the set of actions.

1.3. Overview of results

\textbf{Full information with finite actions} There,

  1. {R_T\geq\Omega(\sqrt{T\log n})} for pure noise (case when the past gives no information).
  2. There exists an algorithm which achieves {R_T\leq O(\sqrt{T\log n})}.

I will give two proofs, an algorithmic one, and a more conceptual one where {\log n} appears as an entropy.

\textbf{Easy data} I.e. when there is more signal in the data than pure noise. Then {R_T\leq O(\sqrt{L^*_T\log n})}.

\textbf{Bandit with finite action} Then {R_T=\Theta(\sqrt{Tn})}. Bad news: in absence of structure, strategy is slow.

\textbf{Large and structured action set} This uses deeper mathematics (Bernoulli convolution, geometry of log-concave measures).

\textbf{Better benchmarks}

2. Full information

2.1. An algorithm: multiplicative weights

Rediscovered many times. See Vovk 1990, Freund-Shapira 1996, Littlestone-Warmuth 1994.

Assume oblivious adversary, and {l_{i,t}:=l_t(i)\in\{ 0,1 \}}.

Idea: keep a weight {w_{i,t}\in{\mathbb R}_+} for each action {i}. Update

\displaystyle \begin{array}{rcl} w_{i,t+1}=(1-\eta l_{i,t})w_{i,t}. \end{array}

Play from {p_t=\frac{w_t}{\|w_t\|_1}}.

Key intuition: Say {i^*} is correct at round {t}, then {w_{i^*,t}} remains and others shrink, so {p_{t+1}} gets closer to {\delta_{i^*}}. Otherwise, {i^*} pays for it a fixed factor {1-\eta}.

Theorem 1

\displaystyle \begin{array}{rcl} L_T< (1+\eta)L^*_T+\frac{\log\eta}{\eta}. \end{array}

Optimizing in {\eta} yields

\displaystyle \begin{array}{rcl} R_T\leq 2\sqrt{L^*_T \log n}. \end{array}

Proof Liapunov function based argument. Consider {u_t=\|w_t\|_1}. Then

\displaystyle \begin{array}{rcl} u_{t+1}&=&\sum_i (1-\eta l_{i,t})w_{i,t}\\ &=&u_t(1-\eta\langle p_t,l_t\rangle)\\ &\leq&u_t e^{-\eta\langle p_t,l_t\rangle}, \end{array}


\displaystyle \begin{array}{rcl} u_t\leq u_0 e^{-\eta L_t}, \end{array}

since {L_t=\mathop{\mathbb E}(\sum_t\langle p_t,l_t\rangle)}. Note that {u_0\leq n}. On the other hand,

\displaystyle \begin{array}{rcl} w_{i,T+1}=(1-\eta)^{L_{i,T}}, \end{array}


\displaystyle \begin{array}{rcl} (1-\eta)^{L^*_T}\leq n e^{-\eta L_T}, \end{array}

\displaystyle \begin{array}{rcl} L_T\leq (1+\eta)L^*_T +\frac{\log\eta}{\eta}, \end{array}

as announced. End of proof.

2.2. Lower bound (pure noise scenario)

Theorem 2 For every algorithm, there exists numbers {l_1,\ldots,l_T\in[-T,T]^n} such that

\displaystyle \begin{array}{rcl} R_T\geq \sqrt{2T\log n}(1+o(1)). \end{array}

Proof. Probabilistic method. A lower bound on expected regret (with random losses) is given.

Pick {l_{i,t}} at random from a Rademacher distribution ({\pm 1} uniformaly independently). Then {\mathop{\mathbb E}(L_T)=0},

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(L^*_T)=\mathop{\mathbb E}\max_{i\in[n]}\sum_t l_{i,t}\sim \sqrt{2T\log n} \end{array}

according to the central limit theorem. Indeed, the max of {T} independent gaussians has size {\sqrt{T}}. End of proof.

2.3. A more principled approach

Based on Abernethy-Warmuth 2008, Bubeck-Dehel-Koren-Peres 2015.

Say players picks {a_t\in\mathcal{A}} and adversary picks {l_1,\ldots,l_T\in\mathcal{L}^T}.

A deterministic strategy is given by a map {\underline{a_t}:\mathcal{L}^{t-1}\rightarrow\mathcal{A}}.

A bandit is a map {\underline{a_t}:[0,1]^{t-1}\rightarrow\mathcal{A}}.

A randomized strategy is picking at random a deterministic strategy {\Delta(\underline{\mathcal{A}})}, where {\underline{\mathcal{A}}} is the set of deterministic strategies. Kuhn’s theorem

The minimax regret consists in finding the best strategy, achieving

\displaystyle \begin{array}{rcl} \inf_{\mu\in \Delta(\underline{\mathcal{A}})}\sup_{\underline{l}\in\mathcal{L}^T}\mathop{\mathbb E}_{\underline{a}\sim\mu}(R_T(\underline{a},\underline{l})) =\inf_\mu \sup_{\nu\in\Delta(\mathcal{L}^T)}\mathop{\mathbb E}_{\mu,\nu}(R_T(\underline{a},\underline{l})). \end{array}

Sion’s minimax theorem allows to swap inf and sup: let {\phi:X\times Y\rightarrow{\mathbb R}} be convex in {x} and concave in {y}, bicontinuous, {X} compact. Then

\displaystyle \begin{array}{rcl} \inf_X \sup_Y \phi(x,y)=\sup_Y\inf_X \phi(x,y). \end{array}

Now we are in Bayesian setting, with {\nu} a prior distribution on {(l_1,\ldots,l_T)}. In particular, we know the distribution of {i^*}.

Denote by {\mathop{\mathbb E}_t:=\mathop{\mathbb E}[\cdot|l_1,\ldots,l_{t-1}]}. By convexity of the simplex, we choose to play {p_t:=\mathop{\mathbb E}_t \delta_{i^*}}, a Doob martingale. In other words, {p_t(i)=\mathop{\mathbb P}(i^*=i|l_1,\ldots,l_{t-1})}. Let us analyze it.

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(\sum_t \langle p_t-\delta_{i^*},l_t \rangle) &=&\mathop{\mathbb E}(\sum_t \langle p_t-p_{t+1},l_t \rangle)\\ &\leq&\mathop{\mathbb E}(\sum_t\|p_t-p_{t+1}\|_1). \end{array}

Each move of a martingale costs entropy. This is encoded in the following folklore inequality,

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(\sum_t\|p_t-p_{t+1}\|_1^2)\leq 2\log n. \end{array}

Assuming this bound, use Cauchy-Schwarz,

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(\sum_t\|p_t-p_{t+1}\|_1)\leq \sqrt{T\mathop{\mathbb E}(\sum_t\|p_t-p_{t+1}\|_1^2)}\leq \sqrt{2T\log n}, \end{array}

which is the optimal bound. End of proof.

The bandit case will be more subtle, since one will not be allowed to condition with respect to {l_1,\ldots,l_T}.

2.4. Proof of the folklore inequality

Recall the entropy of a finite distribution {H(p)=\sum_i -p_i \log p_i}. The relative entropy is

\displaystyle \begin{array}{rcl} Re(p,q)=-\sum_i p_i \log q_i -H(p). \end{array}

Given finite random variables {X} and {Y}, the entropy of {X} knowing {Y} is

\displaystyle \begin{array}{rcl} H(X|Y)=\sum_y p_y(y)H(p_{x|y}). \end{array}

The mutual information means how useful is {Y} when predicting {X},

\displaystyle \begin{array}{rcl} I(X,Y)=H(X)-H(X|Y)=\sum_y p_y(y) Re(p_{x|y},p_x) \end{array}

It is symmetric. Pinsker ‘s inequality states that

\displaystyle \begin{array}{rcl} \frac{1}{2}\|p-q\|_1\leq \sqrt{Re(p,q)}. \end{array}

Proof of the folklore inequality. Apply Pinsker’s inequality,

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(\frac{1}{2}\sum_t \|p_t-p_{t+1}\|_1^2)\leq \mathop{\mathbb E}(\sum_t Re(p_{t+1},p_t)). \end{array}


\displaystyle \begin{array}{rcl} \mathop{\mathbb E}_t(Re(p_{t+1},p_t))&=&\mathop{\mathbb E}_t(Re(p^t_{i^*|l_t},p_t))\\ &=&I_t(i^*,l_t)\\ &=&H_t(i^*)-H_t(i^*|l_t)\\ &=&H_t(i^*)-\mathop{\mathbb E}_t(H_{t+1}(t^*)). \end{array}


\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(\frac{1}{2}\sum_t \|p_t-p_{t+1}\|_1^2)\leq H_1(i^*). \end{array}

3. Bandit analysis

Based on Russo-Van Roy 2014.

Back to the instantaneous regret {r_t=\mathop{\mathbb E}_t\langle p_t-\delta_{i^*},l_t\rangle}. We used

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(\sum_t r_t)=\mathop{\mathbb E}(\sum_t\langle p_t-p_{t+1},l_t\rangle. \end{array}

This is not true any more in the bandit setting, where the filtration is much smaller. Nevertheless, we hope that a weaker inequality would suffice,

\displaystyle \begin{array}{rcl} r_t\leq \sqrt{I_t(i^*,(i_t,l_t(i_t)))}. \end{array}

The next argument goes back to the very first paper on bandit theory, Thompson Sampling 1933. Bandit is a trade off between exploration and exploitation. For future benefit, it is necessary to explore more.

Denote {\bar l_t(i)=\mathop{\mathbb E}_t(l_t(i))}, {\bar l_t(i,j)=\mathop{\mathbb E}_t(l_t(i)|i^*=j)}. Then

\displaystyle \begin{array}{rcl} r_t=\mathop{\mathbb E}_t\langle p_t-\delta_{i^*},l_t\rangle &=&\sum_i p_t(i)(\bar l_t(i)-\bar l_t(i,i)). \end{array}

On the other hand,

\displaystyle \begin{array}{rcl} I_t(i^*,(i_t,l_t(i_t))) &=&H_t(i_t,l_t(i_t))-H_t((i_t,l_t(i_t))|i^*)\\ &\geq& H_t(l_t(i_t)|i_t)-H_t(l_t(i_t)|i^*,i_t)\\ &=&\sum_i p_t(i) I_t(l_t(i)|i^*)\\ &=&\sum_i p_t(i) Re(l_t(i)|i^*,l_t(i))\\ &=&\sum_{i,j} p_t(i)p_t(j) Re(l_t(i)|i^*=j,l_t(i))\\ &\geq&\sum_{i,j} p_t(i)p_t(j) (\bar l_t(i)-\bar l_t(i,j))^2, \end{array}

where the last step is Pinsker’s inequality, plus the fact that {l_t(i)\in[-1,1]}. Cauchy-Schwarz inequality gives

\displaystyle \begin{array}{rcl} r_t\leq \sqrt{n I_t(i^*,(i_t,l_t(i_t)))}=\sqrt{n}\sqrt{H_t(i^*)-H_{t+1}(i^*)}. \end{array}

Plugging this in the argument of previous section, one gets

Theorem 3 In the bandit setting,

\displaystyle \begin{array}{rcl} \inf \sup R_T\leq 2\sqrt{Tn\log n}. \end{array}

The {\log n} term can be removed.

On the lower bound side, the adversary is picked at random: one is {Ber(\frac{1}{2}-\epsilon)}, the others being {Ber(\frac{1}{2})}. One needs {4/\epsilon^2} samples per arm, whence {\frac{1}{\epsilon^2}=\frac{T}{n}}, regret is {\sim \epsilon T\sim \sqrt{nT}}.

4. A primer on bandit linear optimization

Now {\mathcal{A}\subset {\mathbb R}^n}, {l_t\in{\mathbb R}^n}, and the loss of playing {a_{i_t}} is the inner product {a_{i_t}\cdot l_t}. Naively, one would assume that regret is {\sim\sqrt{T|\mathcal{A}|}}.

Theorem 4

\displaystyle \begin{array}{rcl} \inf \sup\mathop{\mathbb E} R_T \leq \sqrt{Tn\log|\mathcal{A}|}. \end{array}

This is best possible.

If {\mathcal{A}} is infinite but included in the unit ball of {{\mathbb R}^n} for some norm. Then

Theorem 5

\displaystyle \begin{array}{rcl} \inf \sup\mathop{\mathbb E} R_T =\Theta(T^{1-1/q}), \end{array}

where {q} is the cotype of the chosen norm on {{\mathbb R}^n}.

4.1. Proof of theorem

The last application of Cauchy-Schwarz needs be replaced by something smarter. Here

\displaystyle \begin{array}{rcl} \bar l_t(i)=\mathop{\mathbb E}_t(l_t(i))=\bar l_t\cdot a_i,\quad \bar l_t(i,j)=\bar l_t^j\cdot a_i. \end{array}

Introduce the (huge) {|\mathcal{A}|\times |\mathcal{A}|} matrix

\displaystyle \begin{array}{rcl} M_{i,j}=\sqrt{p_t(i)p_t(j)}(\bar l_t-\bar l_t^j)\cdot a_i . \end{array}

One needs relate the trace of this matrix to its Frobenius norm. In general,

\displaystyle \begin{array}{rcl} trace(M)=\sum \lambda_i(M)\leq\sqrt{rank(M)(\sum_i \lambda_i^2(M)}=\sqrt{rank(M)}\|M\|_F. \end{array}

Thus we win by replacing the size {|\mathcal{A}|} with the rank of matrix {M}. Here {M=AB} where {A} has size {n}, hence {M} has rank {\leq n}.

5. Mirror descent and online decision making

We focus on online linear optimization.

Data: a compact body {K\subset{\mathbb R}^n}. At each time step {t\in[T]}, the player picks {x_t\in K} and the adversary picks {l_t\in{\mathbb R}^n} (previously, {K} was the simplex) such that {|l_t\cdot x|\leq 1} for all {x\in K}. My regret is

\displaystyle \begin{array}{rcl} R_T=\max_{x\in K} \sum_t l_t\cdot(x_t -x). \end{array}

5.1. Stability as a algorithm design principle

We estimate

\displaystyle \begin{array}{rcl} R_T\leq \sum_t\|x_t-x_{t+1}\|+\sum_t l_t\cdot(x_{t+1}-x) \end{array}

The first term is a movement cost. Call the second term the one look ahead regret. The algorithm will find a trade-off between the two terms.

5.2. Gradient descent

This is {x_{t+1}=x_t-\eta l_t}. In a more invariant manner,

\displaystyle \begin{array}{rcl} x_{t+1}=\mathrm{arg}\min_{x\in{\mathbb R}^n} l_t\cdot x+\frac{1}{2\eta}\|x_t -x\|_2^2. \end{array}

Note the {\ell^2} norm is not well suited to our problem, which involves an {\ell^2} norm. Assuming that {x_1=0}, a third interpretation, in the spirit of regularization, is

\displaystyle \begin{array}{rcl} x_{t+1}=\mathrm{arg}\min_{x\in{\mathbb R}^n} \sum_{s=1}^t l_s\cdot x+\frac{1}{2\eta}\|x\|_2^2. \end{array}

We try to avoid overfit, i.e. control entropy. Hence we replace {\ell^2} norm with entropy. Also, instead of inner product, duality should be used.

5.3. Mirror descent

Initially due to Nemirovskii, see book, by Nemirovskii and Yudin, 1983.

Le {\phi:\mathcal{D}\supset K\rightarrow{\mathbb R}} be a convex function. Write

\displaystyle \begin{array}{rcl} D_\phi(x,y)=\phi(x)-\phi(y)-\nabla\phi(y)\cdot(x-y). \end{array}

Introduce the Fenchel dual function on the dual space,

\displaystyle \begin{array}{rcl} \phi^*(\theta)=\sup_{x\in{\mathbb R}^n}\theta(x)-\phi(x). \end{array}

Under mild assumptions, {\phi^{**}=\phi}.

Given {x_t\in K}, {\nabla\phi(x_t)} belongs to the dual space. Move it in the direction of {l_t}, get {\nabla\phi(x_t)-\eta l_t}. Get back to the primal space by applying {\nabla\phi^*}, get

\displaystyle \begin{array}{rcl} z_{t+1}:=\nabla\phi^*(\nabla\phi(x_t)-\eta l_t). \end{array}

Maybe this does not belong to {K}. Thus project it to {K} (in Euclidean distance). In other words,

\displaystyle \begin{array}{rcl} x_{t+1}=\mathrm{arg}\min_{x\in K}D_\phi(x,\nabla\phi^*(\nabla\phi(x_t)-\eta l_t). \end{array}

5.4. Analysis of the one look ahead term

For intuition, I explain the continuous setting. Then {t\in {\mathbb R}}, {x_t} is defined using {l_s} for all {s\in[0,t]}. Hence the one look ahead term becomes

\displaystyle \begin{array}{rcl} \int_0^T l_t \cdot(x(t)-t)\,dt \end{array}

whereas the movement cost becomes {\int_0^T \|x'(t)\|\,dt}. The continuous time mirror descent is

\displaystyle \begin{array}{rcl} x_{t+\epsilon}=\mathrm{arg}\min_{x\in K} D_\phi(x,\nabla\phi^*(\nabla\phi(x(t))-\epsilon\eta l(t)). \end{array}

The constrained optimality condition at a point {x\in\partial K} involves covectors {\theta} in the normal cone

\displaystyle \begin{array}{rcl} N_K(x)=\{\theta\,;\,\forall y\in K,\,\theta\cdot (y-x)\leq 0\}. \end{array}

It states that {-\nabla f(x^*)\in N_K(x^*)}. The gradient of {x\mapsto D_\phi(x,\nabla\phi^*(\nabla\phi(x(t))-\epsilon\eta l(t))} is {\nabla\phi(x)-\nabla\phi(x(t))+\epsilon\eta l(t)}. Hence the minimizer {x_{t+\epsilon}} satisfies

\displaystyle \begin{array}{rcl} \nabla\phi(x_{t+\epsilon})-\nabla\phi(x(t))+\epsilon\eta l(t)\in -N_K(x_{t+\epsilon}), \end{array}

hence the differential inclusion

\displaystyle \begin{array}{rcl} \nabla^2 \phi(x(t))x'(t)+\eta l(t)\in N_K(x(t)). \end{array}

Theorem 6 (Bubeck-Cohen-Lee…2017) If {l} is continuous, {\phi} is semicontinuous and {\nabla^2\phi} is continuous, the differential inclusion has a unique solution.

As a Liapunov function, we use

\displaystyle \begin{array}{rcl} D_\phi(y,x(t))=\phi(y)-\phi(x(t))-\nabla\phi(x(t))\cdot(y-x(t)). \end{array}

Let us compute the time derivative

\displaystyle \begin{array}{rcl} \partial_t D_\phi(y,x(t))&=&-\nabla\phi(x(t))\cdot x'(t)-\nabla\phi(x(t))\cdot(-x'(t))-\nabla^2\phi(x(t))x'(t)\cdot(y-x(t))\\ &=&(\eta l(t)+\lambda(t))\cdot(y-x(t))\\ &\leq&\eta l(t)\cdot(y-x(t)) \end{array}

since the Lagrange multiplier {\lambda(t)} is nonpositive on the normal cone. Integrating over time, we get

Lemma 7

\displaystyle \begin{array}{rcl} \int l(t)\cdot(x(t)-y)\,dt\leq \frac{1}{\eta}D_\phi(y,x(t)). \end{array}

Recall the multiplicative weights bound

\displaystyle \begin{array}{rcl} R_T\leq \frac{\log\eta}{\eta}+\eta L^*_T. \end{array}

We have obtained the estimate corresponding to {\frac{\log\eta}{\eta}}. Next, me move to the other term.

5.5. Movement analysis

Nothing general, here {K=\Delta_n} is the simplex equipped with the {\ell^1} norm. We hope for an estimate

\displaystyle \begin{array}{rcl} \|x't)\|_1\leq\eta l(t)\cdot y. \end{array}

A stronger estimate would be

\displaystyle \begin{array}{rcl} \|x't)\|_1\leq\eta l(t)\cdot x(t). \end{array}

Here, {x'(t)=\nabla^2\phi(x(t))^{-1}(\eta l(t)+\lambda(t))}. We may choose a suitable convex function {\phi}. To simplify matters, let us ignore {\lambda(t)} and assume {\phi(x)=\sum_i\phi(x_i)}. Then

\displaystyle \begin{array}{rcl} \|\nabla^2\phi(x(t))^{-1}(l(t))\|_1=\sum_i\frac{l_i(t)}{\phi''(x_i)} \end{array}

which is indeed estimated by {l(t)\cdot x(t)} provided we choose {\phi(x)=x\log x}. With this choice, the dynamics becomes

\displaystyle \begin{array}{rcl} x'_i(t)=-x_i(t)(\eta l_i(t)+\mu(t), \end{array}

where {\mu} is such that {\sum_i x'_i(t)=0}. This proves the wanted estimate.

5.6. Discrete time analysis

We expect an extra additive error of {\partial_t^2 D_\phi(y,x(t))} which, up to a term in {\lambda(t)}, equals

\displaystyle \begin{array}{rcl} \partial_t \eta l(t)\cdot(y-x(t))&=&-\eta l(t)\cdot x'(t)\\ &=&\eta^2 l(t)\cdot \nabla^2\phi(x(t))^{-1}l(t)\\ &=&\eta^2\|l(t)\|^2_{x(t),*}. \end{array}

In the last expression, the local norm is defined as follows. The square root of the Hessian of {\phi} defines a Riemannian metric on {K} (which blows up near the boundary).

Theorem 8 Assume that {\phi} is well-conditionned in the sense that if {\nabla\phi(y_t)\in[\nabla\phi(x_t)-\eta l_t,\nabla\phi(x_t)]}, then

\displaystyle \begin{array}{rcl} \nabla^2\phi(y_t)\geq c\nabla^2\phi(x(t)). \end{array}

Then the regret is bounded above by

\displaystyle \begin{array}{rcl} \sum_t l_t\cdot(x_t -y)\leq\frac{1}{\eta}D_\phi(y,x_1)+\frac{2\eta}{c}\sum_t \|l_t\|^2_{x_t,*}. \end{array}

Proof. Recall that {\nabla\phi(z_{t+1})=\nabla\phi(x_t)-\eta l_t} and

\displaystyle \begin{array}{rcl} x_{t+1}=\mathrm{arg}\min_{x\in K}D_\phi(x,z_{t+1}). \end{array}

We must estimate

\displaystyle \begin{array}{rcl} \eta l_t\cdot(x_t-y)&=&\nabla\phi(x_t)\cdot(x_t-y)-\nabla\phi(z_{t+1})\cdot (x_t-x_{t+1})-\nabla\phi(z_{t+1})\cdot(x_{t+1}-y). \end{array}

The last term is {\geq -\nabla\phi(x_{t+1})\cdot(x_{t+1}-y)} by optimality. On the other hand,

\displaystyle \begin{array}{rcl} D_\phi(y,x_{t})-D_\phi(y,x_{t+1}) &=&\phi(x_t)+\phi(x_{t+1})-\nabla\phi(x_t)\cdot(y-x(t))+\nabla\phi(x_{t+1})\cdot(y-x_{t+1}). \end{array}

Hence, using convexity of {\phi},

\displaystyle \begin{array}{rcl} \eta l_t\cdot(x_t-y)&\leq&D_\phi(y,x_t)-D_\phi(y,x_{t+1})+D_\phi(x_t,z_{t+1}). \end{array}

By the magic of Fenchel transform,

\displaystyle \begin{array}{rcl} D_\phi(x_t,z_{t+1})=D_{\phi^*}(\nabla\phi(z_{t+1}),\nabla\phi(x_t)). \end{array}

In turn, this last term is of the order of {\|\eta l_t\|^2_{x_t,*}}. Summing up yields the Theorem.

5.7. Revisiting multiplicative weights with mirror descent

Take {\phi(x)=\sum x_i\log x_i}, whose Hessian is the diagonal quadratic form with eigenvalues {1/x_i}. The Riemannian metric is {\|h\|^2_{x,*}=\sum_i x_i h_i^2}. Then

\displaystyle \begin{array}{rcl} \nabla\phi(x_t)-\eta l_t=\nabla\phi(z_{t+1})\iff z_{i,t+1}=x_{i,t}e^{-\eta l_{i,t}}. \end{array}


\displaystyle \begin{array}{rcl} x_{i,t+1}=\frac{x_{i,t}e^{-\eta l_{i,t}}}{\|x_{i,t}e^{-\eta l_{i,t}}\|_1}. \end{array}

Corollary 9 If {l_{i,t}\in[0,1]}, then well-conditionning holds with {c=1}.

The Theorem yields

\displaystyle \begin{array}{rcl} \sum_t l_t\cdot(x_t-y)\leq \frac{1}{\eta}D_\phi(y,x_1) \end{array}

5.8. Propensity score estimate for bandits

Again, we do not observe the full loss function, and need an unbiaised estimator for it. We look for {\hat l_t\in{\mathbb R}^n} depending on {l_t(i_t)} where {i_t} is pick according to {p_t}. Start with putting to {0} everything which has not been observed,

\displaystyle \begin{array}{rcl} \hat l_t(i)=\frac{l_t(i)1_{\{i=i_t\}}}{p_t(i)}. \end{array}

Then run mirror descent with {\hat l_t} instead of {l_t}, get the Exp3 algorithm of Auer-Cesa-Bianchi-Freund-Schapiro 1995).

Theorem 10 Exp3 satisfies

\displaystyle \begin{array}{rcl} R_T \leq \frac{\log(\eta)}{\eta}+\eta nT\leq \sqrt{Tn\log n}. \end{array}


\displaystyle \begin{array}{rcl} \mathop{\mathbb E}\langle p_t,\hat l_t^2 \rangle &\leq&\mathop{\mathbb E}\sum_i p_t(i)\frac{1_{\{i=i_t\}}}{p_t(i)^2}\\ &=&\mathop{\mathbb E}\sum_i\frac{1_{\{i=i_t\}}}{p_t(i)}=n. \end{array}

Note that the variance is finite only beause local norms are used.

5.9. More refined regularization

Change function {\phi} to make variance smaller. Negentropy gives {\nabla^2 \phi(x)^{-1}=diag(x_i)}. Instead, {\phi(x)=\sum x_i^{1-\epsilon}} gives {\nabla^2 \phi(x)^{-1}=diag(x_i^{1=\epsilon})}. Then

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}\sum_i p_t(i)^{1+\epsilon}\frac{1_{\{i=i_t\}}}{p_t(i)^2} =\sum_i p_t(i)^\epsilon\leq n^{1-\epsilon}, \end{array}

which is slightly better. On the other hand,

\displaystyle \begin{array}{rcl} -\phi(x)=\sum_i x_i^{1-\epsilon}\leq n^\epsilon. \end{array}

Plugging this in the Theorem gives regret bound {R_T\leq\sqrt{nT}} (Audibert-Bubeck 2009).

In view of easy data bounds, pick {\phi(x)=-\sum_i \log x_i} (called the log barrier). Then {\nabla^2 \phi(x)^{-1}=diag(x_i^2)}. The local norms cancel, yielding

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}\|\hat l_t\|^2_{p_t,*}=\langle p_t,l_t^2 \rangle \leq \langle p_t,l_t \rangle. \end{array}

Plugged in the Theorem, this gives regret bound {R_T\leq\sqrt{nL_T^* \log T}}.

5.10. Metrical task systems

Based on Borodin-Linial-Saks 1982. One moves among states, with a task attached to each state. At each time step, observe {l_t:[n]\rightarrow {\mathbb R}_+}. Update state {i_t\in[n]} to {i_{t+1}}. Pay movement plus task, {d(i_t,i_{t+1})+l_t(i_{t+1})}.

It is hopeless to try to do as well as the best guy. The ibjective is to do it up to a multiplicative constant. So he regret is replaced with a competitive ratio

\displaystyle \begin{array}{rcl} \frac{\sum_t d(i_t,i_{t+1})+l_t(i_{t+1})}{\sum_t d(i^*_t,i^*_{t+1})+l_t(i^*_{t+1})}, \end{array}

where {i^*_1,\ldots,i^*_T} are the optimizers.

If {i_t} is picked according to distribution {p_t}, then {\mathop{\mathbb E} d(i_t,i_{t+1})=W_1(p_t,p_{t+1})} is Wasserstein distance, and {\mathop{\mathbb E}(l_t(i_{t+1}))=\langle p_{t+1},l_t \rangle} is a one look ahead term.

Example. If all distances are equal to 1, {W_1} is the {\ell^1} distance on distributions.

Our convex function {\phi} should be {L}-Lipschitz with respect to the Wasserstein distance.

Theorem 11 (Bubeck-Cohen-Lee… 2018) The competitive ratio of mirror descent is bounded above by {L} times the terme arising from mouvement cost analysis.

Theorem 12 (Borodin-Linial-Saks 1982) For the trivial distance, the competitive ratio is at least {\log n}.

Conjecture. This is true for any metric.

My intuition is that the metric can only help us. Furthermore, there is an avatar of entropy, called multiscale entropy, associated with any metric. Usual entropy is well suited only to the trivial distance.

Posted in Course | Leave a comment

Notes of Journee francilienne d’accueil des post-doctorants en mathematiques

Journee francilienne d’accueil des post-doctorants en mathematiques

1. Eleonora di Nezza: Special metrics in Kähler geometry

I work in Kähler geometry. It has to do with complex manifolds.

Let us start with the uniformization theorem. It implies that compact Riemann surfaces (complex 1-manifolds) admits metrics of constant curvature, the sign of which is determined by the sign of Euler characteristic. This is tremendously useful in the study of the space of Riemann surfaces.

In higher dimensions, do complex manifolds admit special metrics? Kähler metrics are Hermitian metrics which satisfy an extra first order intergability condition: they osculate a flat metric one order more that a general hermitian metric. This is expressed via the imaginary part of the Hermitian metric, which is a differential 2-form {\omega}: {\omega} has to be closed. Alternatively, two connections are associated to a Hermitian metrics, the Riemannian Levi-Civita connection, and the Chern connection. The metric is Kähler iff these two connections coincide.

Nevertheless, a given complex manifold admits many Kähler metrics. To single a preferred one, we require that its Ricci curvature be constant. We call them Kähler-Einstein metrics, since constancy of Ricci curvature is the content of Einstein’s equations for vacuum with a cosmological consant.

Not every complex manifold admits KE metrics. Indeed, Ricci curvature, viewed a differential 2-form, is a representative of the first Chern class of the (complex) tangent bundle. In the KE case, Ricci curvature is proportional to the Kähler form. For a complex manifold, the fact that the first Chern class can be represented by a closed {(1,1)}-form which is positive definite (resp. zero, resp. negative definite) is rather restrictive. Let us assume that this condition is satisfied. Then Calabi showed that KE equations are equivalent to a scalar second order nonlinear PDE of complex Monge-Ampère type. The problem splits into 3 different cases. The negative and zero cases were solved in the late 1970’s. The positive case is very recent (Chen-Donaldson-Sun, Tian), after decades of co siderable work by a large community.

The present trend is to extend the theory to singular varieties. Indeed, the classification program (“Minimal Model Program”) requires to handle singular varieties. The unknown in the complex Monge-Ampère equation is then a real function defined on the complement of a fixed divisor (i.e. complex mildly singular hypersurface).

Guedj-Zeriahi 2007: existence and uniqueness (up to an additive constant) of a weak solution. With Lu (then in Orsay), I proved in 2014 that their solution is smooth in the complement of the divisor. This is analysis, dealing with a degenerate equation. The french school (Boucksom, Essidieux, Guedj, Zeriahi) has introduced a pluripotential theory which is very efficient for this problem.

2. Claudio Llosa Isenrich : Kähler groups from branched covers of elliptic curves

A complex manifold is a real manifold with a preferred atlas ro open subsets of {{\mathbb C}^n} where changes of charts are holomorphic diffeomorphisms. A Kähler manifold is a complex mnifol with a Hermitian metric whose imaginary part, a differential 2-form, is closed.

The most important examples are smooth complex submanifolds of complex projective space (the space of lines in {{\mathbb C}^{n+1}}). They are defined as zero sets of homogeneous polynomials in {n+1} variables.

The fundamental group of a topological space is the set of homotopy classes of based loops. Loops can be concatenated, which gives rise to a group structure. This is the most ancient, and still the most important, algebraic invariant of topology.

Say a group is Kähler if it is isomorphic to the fundamental group of a compact Kähler manifold. In the 1950’s, Serre raised the question of which groups are Ka\”hler. Notes that every (finitely presented) groups is isomorphic to the fundamental group of a compact complex manifold.

Here are examples. Even rank free abelian groups {{\mathbb Z}^{2n}} are Kähler. Surface groups (fundamental groups of compact surfaces) are Kähler. Serre showed that all finite groups are Kähler. The class of Kähler groups is able under taking direct products and finite index subgroups.

Here are restrictions. The first Betti number of a Kähler is even (this rules out even rank free abelian groups, free groups).

My contribution is a construction of new examples of Kähler groups, insped by Dimca-Papadima-Suciu 2009. Let {E={\mathbb C}/{\mathbb Z}^2} be a complex torus (also called a complex elliptic curve). For {r\geq 3}, consider branched covers of {E}, {f_i:S_i\rightarrow E}, {1\leq i\leq r}, where {S_i} is a Riemann surface. For instance, cut {E} open along several slits, take several copies and glue them together along slits in a suitable combinatorial pattern. Since {E} has a commutative group structure, one can take the sum of these maps, and get a map

\displaystyle  \begin{array}{rcl}  h=\sum_{i=1}^r f_i:\prod_{i=1}^r S_i \rightarrow E. \end{array}

This is a holomorphic map. A generic fiber {f^{-1}(p)} is smooth, its fundamental group is Kähler, by construction. It is the kernel of the homomorphism induced by {h} on fundamental groups. I show that if {h} is surjective on fundamental groups, the group enjoys an interesting finiteness property (which I will not define today): it is {\mathcal{F}_{r-1}} but not {\mathcal{F}_{r}}. No previous example was known.

Peng: higher dimensions? Unclear.

3. Cyril Marzouk: Scaling limits of large random combinatorial structures: Brownian objects

Motivation from quantum gravity: how to sample a random metric on the 2-sphere? Possibly easier: how to sample a random function?

Approximate answer: look at random walks on {{\mathbb Z}}. The endpoints obeys a binomial law, which converges to a Gaussian law. In fact (Donsker), the whole path converges in law to Brownian motion, a probability measure on real functions on {{\mathbb R}_+}.

Closer to random metrics: random quadrangulations, also known as planar maps. The graph distance is thought of as a metric on the sphere. The Gromov-Hausdorff distance makes such planar maps into a Polish topological space.

Le Gall and Miermont (independantly): a uniformly random quadrangulation with {n} faces, renormalized by {n^{1/4}}, converges to a random metric space in Gromov-Hausdorff distance. It is homeomorphic to the 2-sphere, but it has Hausdorff dimension 4.

Enrich the model by admitting faces with variable (even) number of edges, and specifiy the number {n_i} of faces of degree {2i}. Assume {n_i/n} converges to {p(i)}, {\sum_i in_i/n} converges to {\sum_i ip(i)}, and {\sum_i i^2 n_i/n} converges to {\sum_i i^2p(i)}. Then I show that a uniformly random such map converges to the same random metric space (slightly rescaled).

Le Gall’s method relies on Schaeffer’s encoding of quadrangulations with rooted trees carrying integers at vertices. It was known earlier (Aldous) that the uniformly random tree with {n+1} vertices, rescaled by {n^{1/2}}, converges to the Brownian tree. I merely explain where the constants come from. A finite tree can be viewed as a walk on {{\mathbb Z}}. The depth of the tree is the maximal distance reached by the walk. Since random walks are well understood, depths of random trees are known.

4. Jie Lin: Special values of {L}-functions and a conjecture of Deligne

Considering sums of inverse squares, cubes,… leads to Riemann’s zeta function. Euler showed that {\zeta(2)=\pi^2/6}. More generally, {\zeta(2m)} is commensurable to {\pi^{2m}}. To show this, one extends {\zeta} meromorphically to {{\mathbb C}}, and gets a functional equation, which relates {\zeta(2m)} to {\zeta(1-2m)}, which is a rational number (one says that {2m} is critical, since {\zeta} is holomorphic both at {2m} and {1-2m}).

For odd integers, one must compute residues, which is much harder.

A Dirichlet {L}-function is a weighted inverse power sums, where weights are characters mod {N}. The above theorem extends to {L}-functions. Example: {N}

Adèles. Euler’s product formula for {\zeta} suggests formulae involving all prime numbers, i.e. all absolute values on {{\mathbb Q}}. The ring of adeles is the (restricted) product of all {p}-adic fields (completion of {{\mathbb Q}} for the {p}-adic absolute value), and the reals. The idèles are the units of adele ring. The idea of a {L}-function extends to Hecker characters, i.e. continuous characters of {{\mathbb Q}^\times \setminus\mathbb{A}^\times}.

A Hecke character is a one-dimensional representation of {Gl_1(\mathbb{A})}, we must consider more generaly automorphic representations of {Gl_N(\mathbb{A})}. Motives are an even wider generalization introduced by Grothendieck. These are geometric objects, like elliptic curves. Analytic continuation of {L}-functions in this generality is a conjecture by Hasse-Weil. In 1979, Deligne conjectured the values of {L}-functions for critical integers.

Langlands’ program relates these 3 types of objects, motives, automorphic representations and Galois representations. This contains the Taniyama-Shimura-Weil conjecture whose solution is used in the solution of Fermat’s last problem.

With coauthors, I have translated Deligne conjecture in automorphic representation terms.

5. Dena Kazerani : Symmetry, from hyperbolic systems to Green-Naghdi models

I work in fluid mechanics of incompressible flows. When viscous forces are low, Navier-Stokes equations simplify to Euler equations. A difficulty arises since the domain occupied by the fluid evolves.

I focus today on shallow water, this is the Green-Naghdi model (1976): one assumes that the fluid is irrotational, that the ground is planar and horizontal, vertical velocity depends linearly on the vertical variable, horizontal velocity does not depend on vertical velocity. Then the unknown of the equation, defined on a fixed planar domain, are the horizontal velocity and the water height. One gets a hyperbolic Saint-Venant system with extra dispersive terms.

Hyperbolic systems often have symmetry: Lax-entropy pairs, Godunov structures. It is the case for the Saint-Venant system. The symmetry is expressed by changes of variables involving matrices

\displaystyle  \begin{array}{rcl}  \partial_t U+\partial_x F(U)=0. \end{array}

We need to generalize matrices to operators on Banach spaces. We give a general definition of Godunov structure, extend the classical results (equivalence with existence of Lax-entropy pairs). We show that Green-Naghdi’s model has this property.

We use this symmetry to establish well-posedness of the equations: global existence, asymptotic stability.

Di Nezza: the definitions make sense only for smooth solutions, is this a problem? Yes, singular solutions behave differently (shocks).

Posted in Uncategorized | Leave a comment

Notes of Gang Tian’s Orsay lecture 28-06-2017

Existence of conic Kaehler-Einstein metrics

Joint work with Feng Wang, Zhejiang university.

A log-Fano manifold is the date of a compact Kaehler manifold {M}, a divisor with normal crossings {D=\sum(1-\beta_i)D_i} such that the line bundle

\displaystyle  \begin{array}{rcl}  L=K_M^{-1}+\sum(1-\beta_i)D_i \end{array}

is positive.

A metric {\omega} is a conic Kaehler-Einstein metric if it is smooth Kaehler in {M\setminus |D|} and for every point {p\in|D|} where {D} is defined by {z_1\cdots z_d=0} in some coordinates, {\omega} is equivalent (between two multiplicative constants) to the model cone metric

\displaystyle  \begin{array}{rcl}  \omega_{cone}=i(\sum_{i\leq d} \frac{dz_i\wedge d\bar z_i}{|z_i|^{2(1-\beta_i)}}+\sum_{i>d} dz_i\wedge d\bar z_i). \end{array}

Say that {\omega} is conic Kaehler-Einstein if

\displaystyle  \begin{array}{rcl}  Ric(\omega)=\omega+2\pi\sum(1-\beta_i)[D_i]. \end{array}

1. Necessary conditions

Berman 2016: If {(M,D)} admits a conic KE metric with {[\omega]=2\pi c_1(L)}, then {(M,D)} is log-K-stable.

Log-K-stability is defined as follows.

A special degeneration {(\mathcal{X},\mathcal{D},\mathcal{L})} of {M,D,L)} is a 1-parameter family of log-pairs, consisting of

  1. A normal log-pair {(\mathcal{X},\mathcal{D})} with a {{\mathbb C}^*}-equivariant map {\pi:(\mathcal{X},\mathcal{D})\rightarrow{\mathbb C}},
  2. {\mathcal{L}} is an equivariant {\pi}-ample {{\mathbb Q}}-line bundle.
  3. {\mathcal{X}_t,\mathcal{D}_t,\mathcal{L}_t} is isomorphic to {(M,D,L)} for every {t\not=0}.

There is a natural compactification {(\overline{\mathcal{X}},\overline{\mathcal{D}},\overline{\mathcal{L}})} of {(\mathcal{X},\mathcal{D},\mathcal{L})} that maps to {{\mathbb C} P^1}. Defined number

\displaystyle  \begin{array}{rcl}  w(\mathcal{X},\mathcal{D},\mathcal{L})=\frac{n\overline{\mathcal{L}}^{n+1}+(n+1)\overline{\mathcal{L}}^n(K_{\mathcal{X}|{\mathbb C} P^1}+\overline{\mathcal{D}})}{(n+1)L^n}. \end{array}

If the central fiber is a log-Fano variety {(M_0,D_0)} embedded in {{\mathbb C} P^N} by {H^0(M_0,L^\ell_{|M_0})}, then {w(\mathcal{X},\mathcal{D},\mathcal{L})} can be interpreted as a Futaki invariant.

Say that {(M,D)} is log-K-semistable if for any special degeneration {(\mathcal{X},\mathcal{D},\mathcal{L})} has {w(\mathcal{X},\mathcal{D},\mathcal{L})\geq 0}. Say that {(M,D)} is log-K-stable if for any special degeneration {(\mathcal{X},\mathcal{D},\mathcal{L})} has {w(\mathcal{X},\mathcal{D},\mathcal{L})\geq 0} and equality holds only for the trivial degeneration {(\mathcal{\mathcal{X},\mathcal{D},\mathcal{L}})=(M,D,L)\times{\mathbb C}}.

2. The result

Theorem 1 If {(M,D)} is log-K-stable, the there exists a conic KE metric {\omega} with {[\omega]=2\pi c_1(L)}.

Many special cases were known, as consequences of existence of KE metrics on smooth closed manifolds. For instance when {D} is a multiple of {K_M^{-1}}.

3. Motivation

We are interested in {{\mathbb Q}}-Fano varieties {X}. Assume {X} admits a resolution {\mu:M\rightarrow X} such that {K_M=\mu^* K_X+\sum a_i E_i}, {a_i\in(-1,0]}. For small enough {\epsilon\in{\mathbb Q}}, define

\displaystyle  \begin{array}{rcl}  L_\epsilon=\pi^*K_X-\sum \epsilon E_i. \end{array}

If there exists a KE metric {\omega} on {X}, then {\mu^*\omega} is a degenerate conic KE metric on {M} with conic angles {2\pi b_i} along {E_i}. We expect that there exist conic KE metrics {\omega_\epsilon} on {(M,\sum (1-b_i-\epsilon)E_i)} with {[\omega_\epsilon]=2\pi c_1(L_\epsilon)}, which Gromov-Hausdorff converge to {(X,\omega)} as {\epsilon\rightarrow 0}.

We think that we are now able to prove the following. If {X} is a K-stable {{\mathbb Q}}-Fano variety. Then it admits a generalized KE metric in the above sense.

4. Proof

Many steps are similar to the smooth case. Pick a large integer {\lambda} such that {\lambda L} has a smooth divisor {E}. We use a continuity method, solving

\displaystyle  \begin{array}{rcl}  Ric(\omega_t)=t\omega_t+\frac{1-t}{\lambda}[E]+\sum(1-\beta_i)[D_i], \end{array}

{t\in[0,1]}. The set {I} of {t} such that a solution exists is easily shown to be non-empty (it contains 0) and open. Is it closed? The key point is a {C^0} estimate. It follows from a “partial {C^0}-estimate” and log-K-stability. In turn, this follows from an {L^2}-estimate and compactness a la Cheeger-Colding-Tian.

4.1. Smoothing conical KE metrics

Say that {\omega} has a K-approximation if there exist Kaehler metrics {\omega_i=\omega+i\partial\bar\partial\phi_i} in the same cohomology class such that

  • {\phi_i\rightarrow 0} uniformly on {M} and smoothly outside {|D|},
  • {Ric(\omega_i)\geq K\omega_i},
  • {(M,\omega_i)\rightarrow (M,\omega)} in Gromov-Hausdorff topology.

We show that if {Aut^0(M,D)=\{1\}} and if for all {i},

\displaystyle  \begin{array}{rcl}  (1-K_i)L+(1-\beta_i)D_i\geq 0 \end{array}

for some {K_i\leq 1}, then {\omega} has a K-approximation where {K=\sum(K_i-1)+1}.

We solve a modified equation with an extra term involving {K_i}‘s. For this, we use the variational approach by Boucksom-Eyssidieux-Guedj-Zeriahi and results of Darwan-Robinstein, Guenancia-Paun.

4.2. Extend B. Wang-Tian’s results to conic case

5. Work in progress

To handle {{\mathbb Q}}-Fano varieties, we need to extend Cheeger-Colding to conic cases.

Posted in Workshop lecture | Leave a comment