Notes of Sebastien Bubeck’s course 13-03-2018

Bandit regret optimization

Blackboard course. Slides available on the PGMO website. Video available in a few weeks.

1. Introduction to regret analysis

1.1. Framework

At each time step {t}, the player selects an action {i_t\in[n]} at random, according to a distribution {p_t\in\Delta_n} (the randomness used in the draw being independent of the past). Simultaneously, the adversary selects a loss function {l_t:[n]\rightarrow[0,1]}. The loss suffered by the player is {l_t(i_t)}. Full information means that the whole vector {l_t\in {\mathbb R}^n} is observed. Bandit means that only {l_t(i_t)} is observed.

The benchmark, to which other strategies will be compared, is

\displaystyle \begin{array}{rcl} i^*=\mathrm{arg}\min_{i\in[n]} \mathop{\mathbb E}(\sum_{t=1}^T l_t(i)). \end{array}

My loss {L_T} must be compared with the optimal loss {L^*_T},

\displaystyle \begin{array}{rcl} L_T=\mathop{\mathbb E}(\sum_{t=1}^T l_t(i_t)),\quad L^*_T=\mathop{\mathbb E}(\sum_{t=1}^T l_t(i^*)). \end{array}

The regret is the difference {R_T=L_T-L^*_T}.

The machine learning mindset is that a single picture does not help much, but a large dataset of actions allows for a good strategy. So {[n]} is a large set, possibly with some structure.

Think of an oblivious adversary, like nature. Adversary merely means that we cannot make any stochastic assumption about it. For instance, {i^*} does not represent what a clever adversary would have done had the player applied a constant strategy.

Interpret {R_T/T} as the rate at which {i^*} is identified. How large should one take {T} in order that {R_T/T<\epsilon}?

If there is a good action, we intend to find it. If there is none,…

1.2. Some applications

Ad placement. The player runs a website. Action = choose an ad. The adversary is a potential customer, who clicks on the ad or not. Best…

Packet routing. Graph with source and target nodes. Action = choose a path. Huge set of actions, but with a lot of structure (distance between paths).

Hyperparameter optimization.

AI for games, i.e. efficient tree search. In AlphaGo, although Go is a deterministic game, randomness was introduced to improve exploration of the set of actions.

1.3. Overview of results

\textbf{Full information with finite actions} In this setting,

  1. {R_T\geq\Omega(\sqrt{T\log n})} for pure noise (case when the past gives no information).
  2. There exists an algorithm which achieves {R_T\leq O(\sqrt{T\log n})}.

I will give two proofs, an algorithmic one, and a more conceptual one where {\log n} appears as an entropy.

\textbf{Easy data} I.e. when there is more signal in the data than pure noise. Then {R_T\leq O(\sqrt{L^*_T\log n})}.

\textbf{Bandit with finite action set} Then {R_T=\Theta(\sqrt{Tn})}. Bad news: in the absence of structure, learning is slow.

\textbf{Large and structured action set} This uses deeper mathematics (Bernoulli convolution, geometry of log-concave measures).

\textbf{Better benchmarks}

2. Full information

2.1. An algorithm: multiplicative weights

Rediscovered many times. See Vovk 1990, Freund-Schapire 1996, Littlestone-Warmuth 1994.

Assume oblivious adversary, and {l_{i,t}:=l_t(i)\in\{ 0,1 \}}.

Idea: keep a weight {w_{i,t}\in{\mathbb R}_+} for each action {i}. Update

\displaystyle \begin{array}{rcl} w_{i,t+1}=(1-\eta l_{i,t})w_{i,t}. \end{array}

Play from {p_t=\frac{w_t}{\|w_t\|_1}}.

Key intuition: say {i^*} suffers no loss at round {t}; then {w_{i^*,t}} is unchanged while the other weights shrink, so {p_{t+1}} gets closer to {\delta_{i^*}}. Otherwise, {i^*} pays a fixed multiplicative factor {1-\eta}.
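For concreteness, here is a minimal sketch of the update in Python (the names and array-based interface are ours; losses are assumed to lie in {[0,1]}):

```python
import numpy as np

def multiplicative_weights(losses, eta):
    """Multiplicative weights on a (T, n) array of losses.

    Returns the sequence of distributions p_1, ..., p_T played.
    Sketch only: eta must be tuned, e.g. eta ~ sqrt(log(n) / L*_T).
    """
    T, n = losses.shape
    w = np.ones(n)                        # w_{i,1} = 1
    played = []
    for t in range(T):
        p = w / w.sum()                   # p_t = w_t / ||w_t||_1
        played.append(p)
        w = w * (1 - eta * losses[t])     # w_{i,t+1} = (1 - eta l_{i,t}) w_{i,t}
    return np.array(played)
```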

Theorem 1

\displaystyle \begin{array}{rcl} L_T< (1+\eta)L^*_T+\frac{\log n}{\eta}. \end{array}

Optimizing in {\eta} (take {\eta\sim\sqrt{\log n/L^*_T}}) yields

\displaystyle \begin{array}{rcl} R_T\leq 2\sqrt{L^*_T \log n}. \end{array}

Proof. A Liapunov function based argument. Consider {u_t=\|w_t\|_1}. Then

\displaystyle \begin{array}{rcl} u_{t+1}&=&\sum_i (1-\eta l_{i,t})w_{i,t}\\ &=&u_t(1-\eta\langle p_t,l_t\rangle)\\ &\leq&u_t e^{-\eta\langle p_t,l_t\rangle}, \end{array}


hence, iterating,

\displaystyle \begin{array}{rcl} u_t\leq u_0 e^{-\eta L_t}, \end{array}

since {L_t=\mathop{\mathbb E}(\sum_{s\leq t}\langle p_s,l_s\rangle)}. Note that {u_0\leq n}. On the other hand,

\displaystyle \begin{array}{rcl} w_{i,T+1}=(1-\eta)^{L_{i,T}},\quad L_{i,T}:=\sum_{t\leq T} l_{i,t}, \end{array}


hence, taking {i=i^*} and using {w_{i^*,T+1}\leq u_{T+1}},

\displaystyle \begin{array}{rcl} (1-\eta)^{L^*_T}\leq n e^{-\eta L_T}, \end{array}

and, taking logarithms and using {-\log(1-\eta)\leq \eta+\eta^2} for {\eta\leq 1/2},

\displaystyle \begin{array}{rcl} L_T\leq (1+\eta)L^*_T +\frac{\log n}{\eta}, \end{array}

as announced. End of proof.

2.2. Lower bound (pure noise scenario)

Theorem 2 For every algorithm, there exist losses {l_1,\ldots,l_T\in[-1,1]^n} such that

\displaystyle \begin{array}{rcl} R_T\geq \sqrt{2T\log n}(1+o(1)). \end{array}

Proof. Probabilistic method. A lower bound on expected regret (with random losses) is given.

Pick {l_{i,t}} at random from a Rademacher distribution ({\pm 1} uniformly, independently). Then {\mathop{\mathbb E}(L_T)=0} whatever the strategy, whereas

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(R_T)=\mathop{\mathbb E}\max_{i\in[n]}\sum_t (-l_{i,t})\sim \sqrt{2T\log n} \end{array}

according to the central limit theorem: each {\sum_t l_{i,t}} is approximately a centered gaussian with standard deviation {\sqrt{T}}, and the max of {n} independent such gaussians has size {\sqrt{2T\log n}}. End of proof.
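The {\sqrt{2T\log n}} prediction is easy to check by simulation; a quick Monte Carlo sketch (sample sizes arbitrary; the constant {2} is approached slowly as {n} grows):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, trials = 10_000, 100, 100

maxima = []
for _ in range(trials):
    # One run of the Rademacher adversary; the player's expected loss is 0,
    # so the regret is E max_i sum_t (-l_{i,t}) = E max_i sum_t l_{i,t} by symmetry.
    sums = rng.choice([-1, 1], size=(T, n)).sum(axis=0)
    maxima.append(sums.max())
print(np.mean(maxima))               # empirical E max_i sum_t l_{i,t}
print(np.sqrt(2 * T * np.log(n)))    # asymptotic prediction sqrt(2 T log n)
```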

2.3. A more principled approach

Based on Abernethy-Warmuth 2008, Bubeck-Dekel-Koren-Peres 2015.

Say the player picks {a_t\in\mathcal{A}} and the adversary picks {(l_1,\ldots,l_T)\in\mathcal{L}^T}.

A deterministic strategy is given by a map {\underline{a_t}:\mathcal{L}^{t-1}\rightarrow\mathcal{A}}.

A bandit strategy is a map {\underline{a_t}:[0,1]^{t-1}\rightarrow\mathcal{A}} (only past observed losses are available).

A randomized strategy is an element of {\Delta(\underline{\mathcal{A}})}, i.e. a deterministic strategy picked at random, where {\underline{\mathcal{A}}} is the set of deterministic strategies (by Kuhn's theorem, for games with perfect recall this is equivalent to randomizing afresh at each round).

The minimax regret consists in finding the best strategy, achieving

\displaystyle \begin{array}{rcl} \inf_{\mu\in \Delta(\underline{\mathcal{A}})}\sup_{\underline{l}\in\mathcal{L}^T}\mathop{\mathbb E}_{\underline{a}\sim\mu}(R_T(\underline{a},\underline{l})) =\inf_\mu \sup_{\nu\in\Delta(\mathcal{L}^T)}\mathop{\mathbb E}_{\mu,\nu}(R_T(\underline{a},\underline{l})). \end{array}

Sion's minimax theorem allows one to swap inf and sup: let {\phi:X\times Y\rightarrow{\mathbb R}} be convex in {x} and concave in {y}, bicontinuous, with {X,Y} convex and {X} compact. Then

\displaystyle \begin{array}{rcl} \inf_X \sup_Y \phi(x,y)=\sup_Y\inf_X \phi(x,y). \end{array}

Now we are in Bayesian setting, with {\nu} a prior distribution on {(l_1,\ldots,l_T)}. In particular, we know the distribution of {i^*}.

Denote by {\mathop{\mathbb E}_t:=\mathop{\mathbb E}[\cdot|l_1,\ldots,l_{t-1}]} the conditional expectation given the past. The simplex being convex, {p_t:=\mathop{\mathbb E}_t \delta_{i^*}} is a valid play; it is a Doob martingale. In other words, {p_t(i)=\mathop{\mathbb P}(i^*=i|l_1,\ldots,l_{t-1})}. Let us analyze it.

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(\sum_t \langle p_t-\delta_{i^*},l_t \rangle) &=&\mathop{\mathbb E}(\sum_t \langle p_t-p_{t+1},l_t \rangle)\\ &\leq&\mathop{\mathbb E}(\sum_t\|p_t-p_{t+1}\|_1). \end{array}

Each move of a martingale costs entropy. This is encoded in the following folklore inequality,

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(\sum_t\|p_t-p_{t+1}\|_1^2)\leq 2\log n. \end{array}

Assuming this bound, use Cauchy-Schwarz,

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(\sum_t\|p_t-p_{t+1}\|_1)\leq \sqrt{T\mathop{\mathbb E}(\sum_t\|p_t-p_{t+1}\|_1^2)}\leq \sqrt{2T\log n}, \end{array}

which is the optimal bound. End of proof.
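For intuition, here is a toy sketch of this Bayesian strategy when the prior is uniform on finitely many loss scenarios (all names are ours, and the true sequence is assumed to be one of the scenarios):

```python
import numpy as np

def doob_martingale_play(scenarios, observed):
    """Compute p_t(i) = P(i* = i | l_1, ..., l_{t-1}).

    scenarios : (K, T, n) array, K equally likely loss sequences (the prior nu)
    observed  : (s, n) array of the s = t-1 loss vectors seen so far
    """
    K, T, n = scenarios.shape
    s = observed.shape[0]
    # Posterior on scenarios: keep those consistent with the observations.
    alive = np.all(scenarios[:, :s, :] == observed, axis=(1, 2))
    # i* (best action in hindsight) for each surviving scenario.
    best = scenarios[alive].sum(axis=1).argmin(axis=1)
    # p_t = E_t delta_{i*}: the posterior distribution of i*.
    return np.bincount(best, minlength=n) / alive.sum()
```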

The bandit case will be more subtle, since one is not allowed to condition with respect to the full past losses {l_1,\ldots,l_{t-1}}, only with respect to the observed values {l_s(i_s)}, {s<t}.

2.4. Proof of the folklore inequality

Recall the entropy of a finite distribution {H(p)=\sum_i -p_i \log p_i}. The relative entropy is

\displaystyle \begin{array}{rcl} Re(p,q)=-\sum_i p_i \log q_i -H(p). \end{array}

Given finite random variables {X} and {Y}, the entropy of {X} knowing {Y} is

\displaystyle \begin{array}{rcl} H(X|Y)=\sum_y p_y(y)H(p_{x|y}). \end{array}

The mutual information means how useful is {Y} when predicting {X},

\displaystyle \begin{array}{rcl} I(X,Y)=H(X)-H(X|Y)=\sum_y p_y(y) Re(p_{x|y},p_x) \end{array}

It is symmetric. Pinsker's inequality states that

\displaystyle \begin{array}{rcl} \frac{1}{2}\|p-q\|_1^2\leq Re(p,q). \end{array}
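A quick numerical sanity check of this inequality, with distributions drawn at random:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
kl = np.sum(p * np.log(p / q))    # Re(p, q), the relative entropy
tv = np.abs(p - q).sum()          # ||p - q||_1
assert 0.5 * tv**2 <= kl          # Pinsker: (1/2)||p-q||_1^2 <= Re(p,q)
```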

Proof of the folklore inequality. Apply Pinsker’s inequality,

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(\frac{1}{2}\sum_t \|p_t-p_{t+1}\|_1^2)\leq \mathop{\mathbb E}(\sum_t Re(p_{t+1},p_t)). \end{array}


\displaystyle \begin{array}{rcl} \mathop{\mathbb E}_t(Re(p_{t+1},p_t))&=&\mathop{\mathbb E}_t(Re(p^t_{i^*|l_t},p_t))\\ &=&I_t(i^*,l_t)\\ &=&H_t(i^*)-H_t(i^*|l_t)\\ &=&H_t(i^*)-\mathop{\mathbb E}_t(H_{t+1}(i^*)). \end{array}


Summing the telescoping series,

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(\frac{1}{2}\sum_t \|p_t-p_{t+1}\|_1^2)\leq H_1(i^*)\leq\log n. \end{array}

3. Bandit analysis

Based on Russo-Van Roy 2014.

Back to the instantaneous regret {r_t=\mathop{\mathbb E}_t\langle p_t-\delta_{i^*},l_t\rangle}. We used

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}(\sum_t r_t)=\mathop{\mathbb E}(\sum_t\langle p_t-p_{t+1},l_t\rangle). \end{array}

This is not true any more in the bandit setting, where the filtration is much smaller. Nevertheless, we hope that a weaker inequality would suffice,

\displaystyle \begin{array}{rcl} r_t\leq \sqrt{I_t(i^*,(i_t,l_t(i_t)))}. \end{array}

The next argument goes back to the very first paper on bandit theory, Thompson 1933, which introduced what is now called Thompson sampling. Bandit is a trade-off between exploration and exploitation: for future benefit, it is necessary to explore more.
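For reference, here is the standard instantiation of Thompson sampling for Bernoulli losses with Beta priors; a sketch of the algorithm analyzed below (playing {i_t\sim p_t}, the posterior law of {i^*}), not of the information-theoretic analysis itself:

```python
import numpy as np

def thompson_sampling(pull, n, T, rng=None):
    """Thompson sampling for an n-armed Bernoulli bandit.

    pull(i) returns the loss of arm i in {0, 1}.  Drawing one posterior
    sample per arm and playing the best one selects arm i with probability
    P(i* = i | past), i.e. i_t ~ p_t.
    """
    rng = rng or np.random.default_rng()
    a = np.ones(n)   # Beta posterior: 1 + number of rewards (loss 0)
    b = np.ones(n)   # Beta posterior: 1 + number of losses  (loss 1)
    for _ in range(T):
        theta = rng.beta(a, b)       # sampled mean reward of each arm
        i = int(np.argmax(theta))    # equivalently, argmin of sampled loss
        loss = pull(i)
        a[i] += 1 - loss
        b[i] += loss
    return a, b
```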

Denote {\bar l_t(i)=\mathop{\mathbb E}_t(l_t(i))}, {\bar l_t(i,j)=\mathop{\mathbb E}_t(l_t(i)|i^*=j)}. Then

\displaystyle \begin{array}{rcl} r_t=\mathop{\mathbb E}_t\langle p_t-\delta_{i^*},l_t\rangle &=&\sum_i p_t(i)(\bar l_t(i)-\bar l_t(i,i)). \end{array}

On the other hand,

\displaystyle \begin{array}{rcl} I_t(i^*,(i_t,l_t(i_t))) &=&H_t(i_t,l_t(i_t))-H_t((i_t,l_t(i_t))|i^*)\\ &\geq& H_t(l_t(i_t)|i_t)-H_t(l_t(i_t)|i^*,i_t)\\ &=&\sum_i p_t(i) I_t(i^*,l_t(i))\\ &=&\sum_{i,j} p_t(i)p_t(j) Re(\mathcal{L}_t(l_t(i)|i^*=j),\mathcal{L}_t(l_t(i)))\\ &\geq&\sum_{i,j} p_t(i)p_t(j) (\bar l_t(i)-\bar l_t(i,j))^2, \end{array}

where {\mathcal{L}_t} denotes conditional law given the past, and the last step is Pinsker's inequality, plus the fact that losses are bounded. The Cauchy-Schwarz inequality gives

\displaystyle \begin{array}{rcl} r_t\leq \sqrt{n I_t(i^*,(i_t,l_t(i_t)))}=\sqrt{n}\sqrt{H_t(i^*)-\mathop{\mathbb E}_t(H_{t+1}(i^*))}. \end{array}

Plugging this in the argument of previous section, one gets

Theorem 3 In the bandit setting,

\displaystyle \begin{array}{rcl} \inf \sup R_T\leq 2\sqrt{Tn\log n}. \end{array}

The {\log n} term can be removed.

On the lower bound side, the adversary is picked at random: one arm is {Ber(\frac{1}{2}-\epsilon)}, the others being {Ber(\frac{1}{2})}. One needs of order {1/\epsilon^2} samples per arm to detect the good one; with {T/n} samples per arm this forces {\frac{1}{\epsilon^2}=\frac{T}{n}}, and the regret is {\sim \epsilon T\sim \sqrt{nT}}.

4. A primer on bandit linear optimization

Now {\mathcal{A}\subset {\mathbb R}^n}, {l_t\in{\mathbb R}^n}, and the loss of playing {a_{i_t}} is the inner product {a_{i_t}\cdot l_t}. Naively, one would expect a regret {\sim\sqrt{T|\mathcal{A}|}}.

Theorem 4

\displaystyle \begin{array}{rcl} \inf \sup\mathop{\mathbb E} R_T \leq \sqrt{Tn\log|\mathcal{A}|}. \end{array}

This is best possible.

Suppose now that {\mathcal{A}} is infinite, but included in the unit ball of some norm on {{\mathbb R}^n}. Then

Theorem 5

\displaystyle \begin{array}{rcl} \inf \sup\mathop{\mathbb E} R_T =\Theta(T^{1-1/q}), \end{array}

where {q} is the cotype of the chosen norm on {{\mathbb R}^n}.

4.1. Proof of theorem

The last application of Cauchy-Schwarz needs to be replaced by something smarter. Here

\displaystyle \begin{array}{rcl} \bar l_t(i)=\mathop{\mathbb E}_t(l_t(i))=\bar l_t\cdot a_i,\quad \bar l_t(i,j)=\bar l_t^j\cdot a_i,\quad \bar l_t^j:=\mathop{\mathbb E}_t(l_t|i^*=j). \end{array}

Introduce the (huge) {|\mathcal{A}|\times |\mathcal{A}|} matrix

\displaystyle \begin{array}{rcl} M_{i,j}=\sqrt{p_t(i)p_t(j)}(\bar l_t-\bar l_t^j)\cdot a_i . \end{array}

One needs to relate the trace of this matrix to its Frobenius norm. In general,

\displaystyle \begin{array}{rcl} trace(M)=\sum_i \lambda_i(M)\leq\sqrt{rank(M)\sum_i \lambda_i^2(M)}\leq\sqrt{rank(M)}\,\|M\|_F. \end{array}

Thus we win by replacing the size {|\mathcal{A}|} with the rank of the matrix {M}. Here {M=AB}, where the inner dimension is {n}, hence {M} has rank {\leq n}.
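A quick numerical check of the trace inequality on a random symmetric low-rank matrix (sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 3))
M = A @ A.T                          # symmetric, rank <= 3
lhs = np.trace(M)
rhs = np.sqrt(np.linalg.matrix_rank(M)) * np.linalg.norm(M, 'fro')
assert lhs <= rhs + 1e-9             # trace(M) <= sqrt(rank) * ||M||_F
```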

5. Mirror descent and online decision making

We focus on online linear optimization.

Data: a compact body {K\subset{\mathbb R}^n}. At each time step {t\in[T]}, the player picks {x_t\in K} and the adversary picks {l_t\in{\mathbb R}^n} (previously, {K} was the simplex) such that {|l_t\cdot x|\leq 1} for all {x\in K}. My regret is

\displaystyle \begin{array}{rcl} R_T=\max_{x\in K} \sum_t l_t\cdot(x_t -x). \end{array}

5.1. Stability as an algorithm design principle

We estimate

\displaystyle \begin{array}{rcl} R_T\leq \sum_t\|x_t-x_{t+1}\|+\sum_t l_t\cdot(x_{t+1}-x) \end{array}

The first term is a movement cost. Call the second term the one look ahead regret. The algorithm will find a trade-off between the two terms.

5.2. Gradient descent

This is {x_{t+1}=x_t-\eta l_t}. In a more invariant manner,

\displaystyle \begin{array}{rcl} x_{t+1}=\mathrm{arg}\min_{x\in{\mathbb R}^n} l_t\cdot x+\frac{1}{2\eta}\|x_t -x\|_2^2. \end{array}

Note the {\ell^2} norm is not well suited to our problem, which may involve a different norm (for instance {\ell^1} when {K} is the simplex). Assuming that {x_1=0}, a third interpretation, in the spirit of regularization, is

\displaystyle \begin{array}{rcl} x_{t+1}=\mathrm{arg}\min_{x\in{\mathbb R}^n} \sum_{s=1}^t l_s\cdot x+\frac{1}{2\eta}\|x\|_2^2. \end{array}

We try to avoid overfitting, i.e. control entropy. Hence we replace the {\ell^2} norm with entropy. Also, instead of the inner product, duality should be used.

5.3. Mirror descent

Initially due to Nemirovskii, see book, by Nemirovskii and Yudin, 1983.

Let {\phi:\mathcal{D}\supset K\rightarrow{\mathbb R}} be a convex function. Write

\displaystyle \begin{array}{rcl} D_\phi(x,y)=\phi(x)-\phi(y)-\nabla\phi(y)\cdot(x-y). \end{array}

Introduce the Fenchel dual function on the dual space,

\displaystyle \begin{array}{rcl} \phi^*(\theta)=\sup_{x\in{\mathbb R}^n}\theta(x)-\phi(x). \end{array}

Under mild assumptions, {\phi^{**}=\phi}.

Given {x_t\in K}, {\nabla\phi(x_t)} belongs to the dual space. Move it in the direction of {l_t}, get {\nabla\phi(x_t)-\eta l_t}. Get back to the primal space by applying {\nabla\phi^*}, get

\displaystyle \begin{array}{rcl} z_{t+1}:=\nabla\phi^*(\nabla\phi(x_t)-\eta l_t). \end{array}

Maybe this does not belong to {K}. Thus project it to {K}, with respect to the Bregman divergence {D_\phi}. In other words,

\displaystyle \begin{array}{rcl} x_{t+1}=\mathrm{arg}\min_{x\in K}D_\phi(x,\nabla\phi^*(\nabla\phi(x_t)-\eta l_t)). \end{array}
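On the simplex with the entropy {\phi(x)=\sum_i x_i\log x_i}, all maps are explicit and the Bregman projection onto {\Delta_n} is a renormalization, so one step reads as follows (a sketch; as recovered in section 5.7 below, this is exactly exponential weights):

```python
import numpy as np

def mirror_descent_step(x, l, eta):
    """One mirror descent step on the simplex with entropic phi.

    grad phi(x) = 1 + log x and grad phi*(theta) = exp(theta - 1), so
    grad phi*(grad phi(x) - eta * l) = x * exp(-eta * l); the Bregman
    projection of z onto the simplex is z / z.sum().
    """
    z = x * np.exp(-eta * l)
    return z / z.sum()
```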

5.4. Analysis of the one look ahead term

For intuition, I explain the continuous setting. Then {t\in {\mathbb R}}, {x_t} is defined using {l_s} for all {s\in[0,t]}. Hence the one look ahead term becomes

\displaystyle \begin{array}{rcl} \int_0^T l(t) \cdot(x(t)-y)\,dt \end{array}

whereas the movement cost becomes {\int_0^T \|x'(t)\|\,dt}. The continuous time mirror descent is

\displaystyle \begin{array}{rcl} x_{t+\epsilon}=\mathrm{arg}\min_{x\in K} D_\phi(x,\nabla\phi^*(\nabla\phi(x(t))-\epsilon\eta l(t))). \end{array}

The constrained optimality condition at a point {x\in\partial K} involves covectors {\theta} in the normal cone

\displaystyle \begin{array}{rcl} N_K(x)=\{\theta\,;\,\forall y\in K,\,\theta\cdot (y-x)\leq 0\}. \end{array}

It states that {-\nabla f(x^*)\in N_K(x^*)}. The gradient of {x\mapsto D_\phi(x,\nabla\phi^*(\nabla\phi(x(t))-\epsilon\eta l(t)))} is {\nabla\phi(x)-\nabla\phi(x(t))+\epsilon\eta l(t)}. Hence the minimizer {x_{t+\epsilon}} satisfies

\displaystyle \begin{array}{rcl} \nabla\phi(x_{t+\epsilon})-\nabla\phi(x(t))+\epsilon\eta l(t)\in -N_K(x_{t+\epsilon}), \end{array}

hence the differential inclusion

\displaystyle \begin{array}{rcl} \nabla^2 \phi(x(t))x'(t)+\eta l(t)\in -N_K(x(t)). \end{array}

Theorem 6 (Bubeck-Cohen-Lee…2017) If {l} is continuous, {\phi} is semicontinuous and {\nabla^2\phi} is continuous, the differential inclusion has a unique solution.

As a Liapunov function, we use

\displaystyle \begin{array}{rcl} D_\phi(y,x(t))=\phi(y)-\phi(x(t))-\nabla\phi(x(t))\cdot(y-x(t)). \end{array}

Let us compute the time derivative

\displaystyle \begin{array}{rcl} \partial_t D_\phi(y,x(t))&=&-\nabla\phi(x(t))\cdot x'(t)-\nabla\phi(x(t))\cdot(-x'(t))-\nabla^2\phi(x(t))x'(t)\cdot(y-x(t))\\ &=&(\eta l(t)+\lambda(t))\cdot(y-x(t))\\ &\leq&\eta l(t)\cdot(y-x(t)) \end{array}

since the Lagrange multiplier {\lambda(t)} lies in the normal cone {N_K(x(t))}, so that {\lambda(t)\cdot(y-x(t))\leq 0} for {y\in K}. Integrating over time, we get

Lemma 7

\displaystyle \begin{array}{rcl} \int_0^T l(t)\cdot(x(t)-y)\,dt\leq \frac{1}{\eta}D_\phi(y,x(0)). \end{array}

Recall the multiplicative weights bound

\displaystyle \begin{array}{rcl} R_T\leq \frac{\log n}{\eta}+\eta L^*_T. \end{array}

We have obtained the estimate corresponding to {\frac{\log n}{\eta}}. Next, we move to the other term.

5.5. Movement analysis

Nothing general here: {K=\Delta_n} is the simplex, equipped with the {\ell^1} norm. We hope for an estimate

\displaystyle \begin{array}{rcl} \|x'(t)\|_1\leq\eta\, l(t)\cdot y. \end{array}

A stronger estimate would be

\displaystyle \begin{array}{rcl} \|x'(t)\|_1\leq\eta\, l(t)\cdot x(t). \end{array}

Here, {x'(t)=-\nabla^2\phi(x(t))^{-1}(\eta l(t)+\lambda(t))}. We may choose a suitable convex function {\phi}. To simplify matters, let us ignore {\lambda(t)} and assume {\phi(x)=\sum_i\phi(x_i)} (abusing notation). Then

\displaystyle \begin{array}{rcl} \|\nabla^2\phi(x(t))^{-1}(l(t))\|_1=\sum_i\frac{l_i(t)}{\phi''(x_i)} \end{array}

which is indeed estimated by {l(t)\cdot x(t)} provided we choose {\phi(x)=x\log x}. With this choice, the dynamics becomes

\displaystyle \begin{array}{rcl} x'_i(t)=-x_i(t)(\eta l_i(t)+\mu(t)), \end{array}

where {\mu(t)} is such that {\sum_i x'_i(t)=0}, i.e. {\mu(t)=-\eta\langle x(t),l(t)\rangle}: a replicator-type dynamics. Then {\|x'(t)\|_1\leq 2\eta\, l(t)\cdot x(t)}, which is the wanted estimate up to a factor {2}.

5.6. Discrete time analysis

We expect an extra additive error of {\partial_t^2 D_\phi(y,x(t))} which, up to a term in {\lambda(t)}, equals

\displaystyle \begin{array}{rcl} \partial_t \eta l(t)\cdot(y-x(t))&=&-\eta l(t)\cdot x'(t)\\ &=&\eta^2 l(t)\cdot \nabla^2\phi(x(t))^{-1}l(t)\\ &=&\eta^2\|l(t)\|^2_{x(t),*}. \end{array}

In the last expression, the local norm is defined as follows: the Hessian of {\phi} defines a Riemannian metric on {K} (which blows up near the boundary), and {\|\cdot\|_{x,*}} is the dual norm at {x}.

Theorem 8 Assume that {\phi} is well-conditioned in the sense that if {\nabla\phi(y_t)\in[\nabla\phi(x_t)-\eta l_t,\nabla\phi(x_t)]}, then

\displaystyle \begin{array}{rcl} \nabla^2\phi(y_t)\geq c\,\nabla^2\phi(x_t). \end{array}

Then the regret is bounded above by

\displaystyle \begin{array}{rcl} \sum_t l_t\cdot(x_t -y)\leq\frac{1}{\eta}D_\phi(y,x_1)+\frac{2\eta}{c}\sum_t \|l_t\|^2_{x_t,*}. \end{array}

Proof. Recall that {\nabla\phi(z_{t+1})=\nabla\phi(x_t)-\eta l_t} and

\displaystyle \begin{array}{rcl} x_{t+1}=\mathrm{arg}\min_{x\in K}D_\phi(x,z_{t+1}). \end{array}

We must estimate

\displaystyle \begin{array}{rcl} \eta l_t\cdot(x_t-y)&=&\nabla\phi(x_t)\cdot(x_t-y)-\nabla\phi(z_{t+1})\cdot (x_t-x_{t+1})-\nabla\phi(z_{t+1})\cdot(x_{t+1}-y). \end{array}

The last term is {\leq -\nabla\phi(x_{t+1})\cdot(x_{t+1}-y)} by optimality of {x_{t+1}}. On the other hand,

\displaystyle \begin{array}{rcl} D_\phi(y,x_{t})-D_\phi(y,x_{t+1}) &=&\phi(x_{t+1})-\phi(x_t)-\nabla\phi(x_t)\cdot(y-x_t)+\nabla\phi(x_{t+1})\cdot(y-x_{t+1}). \end{array}

Hence, using convexity of {\phi},

\displaystyle \begin{array}{rcl} \eta l_t\cdot(x_t-y)&\leq&D_\phi(y,x_t)-D_\phi(y,x_{t+1})+D_\phi(x_t,z_{t+1}). \end{array}

By the magic of Fenchel transform,

\displaystyle \begin{array}{rcl} D_\phi(x_t,z_{t+1})=D_{\phi^*}(\nabla\phi(z_{t+1}),\nabla\phi(x_t)). \end{array}

In turn, this last term is of the order of {\|\eta l_t\|^2_{x_t,*}}. Summing up yields the Theorem.

5.7. Revisiting multiplicative weights with mirror descent

Take {\phi(x)=\sum x_i\log x_i}, whose Hessian is the diagonal quadratic form with eigenvalues {1/x_i}. The dual local norm is {\|h\|^2_{x,*}=\sum_i x_i h_i^2}. Then

\displaystyle \begin{array}{rcl} \nabla\phi(x_t)-\eta l_t=\nabla\phi(z_{t+1})\iff z_{i,t+1}=x_{i,t}e^{-\eta l_{i,t}}. \end{array}


\displaystyle \begin{array}{rcl} x_{i,t+1}=\frac{x_{i,t}e^{-\eta l_{i,t}}}{\sum_j x_{j,t}e^{-\eta l_{j,t}}}. \end{array}

Corollary 9 If {l_{i,t}\in[0,1]}, then well-conditioning holds with {c=1}.

The Theorem yields

\displaystyle \begin{array}{rcl} \sum_t l_t\cdot(x_t-y)\leq \frac{1}{\eta}D_\phi(y,x_1)+2\eta\sum_t \langle x_t,l_t^2\rangle\leq\frac{\log n}{\eta}+2\eta T \end{array}

for {x_1} uniform, recovering the multiplicative weights bound.

5.8. Propensity score estimate for bandits

Again, we do not observe the full loss vector, and need an unbiased estimator of it. We look for {\hat l_t\in{\mathbb R}^n} depending only on {l_t(i_t)}, where {i_t} is picked according to {p_t}. Start by setting to {0} every coordinate which has not been observed:

\displaystyle \begin{array}{rcl} \hat l_t(i)=\frac{l_t(i)1_{\{i=i_t\}}}{p_t(i)}. \end{array}

Then run mirror descent with {\hat l_t} instead of {l_t}; one gets the Exp3 algorithm of Auer-Cesa-Bianchi-Freund-Schapire (1995).
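A minimal Exp3 sketch, combining the exponential weights update with the propensity score estimator (interface names are ours; take {\eta\sim\sqrt{\log n/(nT)}}):

```python
import numpy as np

def exp3(pull, n, T, eta, rng=None):
    """Exp3: exponential weights fed with importance-weighted loss estimates.

    pull(i) returns the loss l_t(i) in [0, 1]; only this entry is observed.
    """
    rng = rng or np.random.default_rng()
    w = np.ones(n)
    for _ in range(T):
        p = w / w.sum()
        i = rng.choice(n, p=p)
        l_hat = np.zeros(n)
        l_hat[i] = pull(i) / p[i]    # unbiased: E_t[l_hat(j)] = l_t(j)
        w *= np.exp(-eta * l_hat)
    return w
```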

Theorem 10 Exp3 satisfies

\displaystyle \begin{array}{rcl} R_T \leq \frac{\log n}{\eta}+\eta nT\leq 2\sqrt{Tn\log n}. \end{array}


Indeed, the variance term is controlled by

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}\langle p_t,\hat l_t^2 \rangle &\leq&\mathop{\mathbb E}\sum_i p_t(i)\frac{1_{\{i=i_t\}}}{p_t(i)^2}\\ &=&\mathop{\mathbb E}\sum_i\frac{1_{\{i=i_t\}}}{p_t(i)}=n. \end{array}

Note that the variance is finite only because local norms are used.

5.9. More refined regularization

Change the function {\phi} to make the variance smaller. Negentropy gives {\nabla^2 \phi(x)^{-1}=diag(x_i)}. Instead, {\phi(x)=-\sum x_i^{1-\epsilon}} gives {\nabla^2 \phi(x)^{-1}=diag(x_i^{1+\epsilon})}, up to a constant factor. Then

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}\sum_i p_t(i)^{1+\epsilon}\frac{1_{\{i=i_t\}}}{p_t(i)^2} =\sum_i p_t(i)^\epsilon\leq n^{1-\epsilon}, \end{array}

which is slightly better. On the other hand,

\displaystyle \begin{array}{rcl} -\phi(x)=\sum_i x_i^{1-\epsilon}\leq n^\epsilon. \end{array}

Plugging this in the Theorem gives regret bound {R_T\leq\sqrt{nT}} (Audibert-Bubeck 2009).

In view of easy data bounds, pick {\phi(x)=-\sum_i \log x_i} (called the log barrier). Then {\nabla^2 \phi(x)^{-1}=diag(x_i^2)}. The local norms cancel, yielding

\displaystyle \begin{array}{rcl} \mathop{\mathbb E}\|\hat l_t\|^2_{p_t,*}=\langle p_t,l_t^2 \rangle \leq \langle p_t,l_t \rangle. \end{array}

Plugged in the Theorem, this gives regret bound {R_T\leq\sqrt{nL_T^* \log T}}.

5.10. Metrical task systems

Based on Borodin-Linial-Saks 1992. One moves among states, with a task attached to each state. At each time step, observe {l_t:[n]\rightarrow {\mathbb R}_+}. Update state {i_t\in[n]} to {i_{t+1}}. Pay movement plus task: {d(i_t,i_{t+1})+l_t(i_{t+1})}.

It is hopeless to try to do as well as the best competitor; the objective is to match it up to a multiplicative constant. So the regret is replaced with the competitive ratio

\displaystyle \begin{array}{rcl} \frac{\sum_t d(i_t,i_{t+1})+l_t(i_{t+1})}{\sum_t d(i^*_t,i^*_{t+1})+l_t(i^*_{t+1})}, \end{array}

where {i^*_1,\ldots,i^*_T} are the optimizers.

If {i_t} is picked according to distribution {p_t}, then {\mathop{\mathbb E} d(i_t,i_{t+1})=W_1(p_t,p_{t+1})} is Wasserstein distance, and {\mathop{\mathbb E}(l_t(i_{t+1}))=\langle p_{t+1},l_t \rangle} is a one look ahead term.

Example. If all distances are equal to 1, {W_1} is the total variation distance, i.e. half the {\ell^1} distance on distributions.

Our convex function {\phi} should be {L}-Lipschitz with respect to the Wasserstein distance.

Theorem 11 (Bubeck-Cohen-Lee… 2018) The competitive ratio of mirror descent is bounded above by {L} times the term arising from the movement cost analysis.

Theorem 12 (Borodin-Linial-Saks 1992) For the trivial distance, the competitive ratio is at least {\log n}.

Conjecture. This is true for any metric.

My intuition is that the metric can only help us. Furthermore, there is an avatar of entropy, called multiscale entropy, associated with any metric. Usual entropy is well suited only to the trivial distance.


Notes of Journee francilienne d’accueil des post-doctorants en mathematiques

Journee francilienne d’accueil des post-doctorants en mathematiques

1. Eleonora di Nezza: Special metrics in Kähler geometry

I work in Kähler geometry. It has to do with complex manifolds.

Let us start with the uniformization theorem. It implies that compact Riemann surfaces (complex 1-manifolds) admit metrics of constant curvature, the sign of which is determined by the sign of the Euler characteristic. This is tremendously useful in the study of the space of Riemann surfaces.

In higher dimensions, do complex manifolds admit special metrics? Kähler metrics are Hermitian metrics which satisfy an extra first order integrability condition: they osculate a flat metric to one order more than a general Hermitian metric. This is expressed via the imaginary part of the Hermitian metric, which is a differential 2-form {\omega}: {\omega} has to be closed. Alternatively, two connections are associated to a Hermitian metric, the Riemannian Levi-Civita connection and the Chern connection. The metric is Kähler iff these two connections coincide.

Nevertheless, a given complex manifold admits many Kähler metrics. To single out a preferred one, we require that its Ricci curvature be constant. We call these Kähler-Einstein metrics, since constancy of Ricci curvature is the content of Einstein's equations for vacuum with a cosmological constant.

Not every complex manifold admits KE metrics. Indeed, Ricci curvature, viewed as a differential 2-form, is a representative of the first Chern class of the (complex) tangent bundle. In the KE case, Ricci curvature is proportional to the Kähler form. For a complex manifold, the fact that the first Chern class can be represented by a closed {(1,1)}-form which is positive definite (resp. zero, resp. negative definite) is rather restrictive. Let us assume that this condition is satisfied. Then Calabi showed that the KE equations are equivalent to a scalar second order nonlinear PDE of complex Monge-Ampère type. The problem splits into 3 different cases. The negative and zero cases were solved in the late 1970's. The positive case is very recent (Chen-Donaldson-Sun, Tian), after decades of considerable work by a large community.

The present trend is to extend the theory to singular varieties. Indeed, the classification program ("Minimal Model Program") requires handling singular varieties. The unknown in the complex Monge-Ampère equation is then a real function defined on the complement of a fixed divisor (i.e. a mildly singular complex hypersurface).

Guedj-Zeriahi 2007: existence and uniqueness (up to an additive constant) of a weak solution. With Lu (then in Orsay), I proved in 2014 that their solution is smooth in the complement of the divisor. This is analysis, dealing with a degenerate equation. The French school (Boucksom, Eyssidieux, Guedj, Zeriahi) has introduced a pluripotential theory which is very efficient for this problem.

2. Claudio Llosa Isenrich : Kähler groups from branched covers of elliptic curves

A complex manifold is a real manifold with a preferred atlas of open subsets of {{\mathbb C}^n} where changes of charts are holomorphic diffeomorphisms. A Kähler manifold is a complex manifold with a Hermitian metric whose imaginary part, a differential 2-form, is closed.

The most important examples are smooth complex submanifolds of complex projective space (the space of lines in {{\mathbb C}^{n+1}}). They are defined as zero sets of homogeneous polynomials in {n+1} variables.

The fundamental group of a topological space is the set of homotopy classes of based loops. Loops can be concatenated, which gives rise to a group structure. This is the most ancient, and still the most important, algebraic invariant of topology.

Say a group is Kähler if it is isomorphic to the fundamental group of a compact Kähler manifold. In the 1950's, Serre raised the question of which groups are Kähler. Note that every finitely presented group is isomorphic to the fundamental group of a compact complex manifold.

Here are examples. Even rank free abelian groups {{\mathbb Z}^{2n}} are Kähler. Surface groups (fundamental groups of compact surfaces) are Kähler. Serre showed that all finite groups are Kähler. The class of Kähler groups is stable under taking direct products and finite index subgroups.

Here are restrictions. The first Betti number of a Kähler group is even (this rules out odd rank free abelian groups and free groups).

My contribution is a construction of new examples of Kähler groups, inspired by Dimca-Papadima-Suciu 2009. Let {E={\mathbb C}/{\mathbb Z}^2} be a complex torus (also called a complex elliptic curve). For {r\geq 3}, consider branched covers of {E}, {f_i:S_i\rightarrow E}, {1\leq i\leq r}, where {S_i} is a Riemann surface. For instance, cut {E} open along several slits, take several copies and glue them together along the slits in a suitable combinatorial pattern. Since {E} has a commutative group structure, one can take the sum of these maps, and get a map

\displaystyle  \begin{array}{rcl}  h=\sum_{i=1}^r f_i:\prod_{i=1}^r S_i \rightarrow E. \end{array}

This is a holomorphic map. A generic fiber {h^{-1}(p)} is smooth, and its fundamental group is Kähler, by construction. It is the kernel of the homomorphism induced by {h} on fundamental groups. I show that if {h} is surjective on fundamental groups, the group enjoys an interesting finiteness property (which I will not define today): it is {\mathcal{F}_{r-1}} but not {\mathcal{F}_{r}}. No previous example was known.

Peng: higher dimensions? Unclear.

3. Cyril Marzouk: Scaling limits of large random combinatorial structures: Brownian objects

Motivation from quantum gravity: how to sample a random metric on the 2-sphere? Possibly easier: how to sample a random function?

Approximate answer: look at random walks on {{\mathbb Z}}. The endpoint obeys a binomial law, which converges to a Gaussian law. In fact (Donsker), the whole path converges in law to Brownian motion, a probability measure on the space of continuous real functions on {{\mathbb R}_+}.
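A sketch of the rescaling behind Donsker's theorem (parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100_000
path = np.cumsum(rng.choice([-1, 1], size=T))
# Donsker rescaling: time by T, space by sqrt(T); the rescaled path
# t/T -> path[t] / sqrt(T) converges in law to Brownian motion on [0, 1].
print(path[-1] / np.sqrt(T))   # endpoint: approximately N(0, 1) by the CLT
```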

Closer to random metrics: random quadrangulations, a special class of planar maps. The graph distance is thought of as a metric on the sphere. The Gromov-Hausdorff distance makes the set of such planar maps into a Polish space.

Le Gall and Miermont (independently): a uniformly random quadrangulation with {n} faces, with distances renormalized by {n^{1/4}}, converges to a random metric space in the Gromov-Hausdorff sense. It is homeomorphic to the 2-sphere, but it has Hausdorff dimension 4.

Enrich the model by admitting faces with variable (even) numbers of edges, and specify the number {n_i} of faces of degree {2i}. Assume {n_i/n} converges to {p(i)}, {\sum_i in_i/n} converges to {\sum_i ip(i)}, and {\sum_i i^2 n_i/n} converges to {\sum_i i^2p(i)}. Then I show that a uniformly random such map converges to the same random metric space (slightly rescaled).

Le Gall’s method relies on Schaeffer’s encoding of quadrangulations with rooted trees carrying integers at vertices. It was known earlier (Aldous) that the uniformly random tree with {n+1} vertices, rescaled by {n^{1/2}}, converges to the Brownian tree. I merely explain where the constants come from. A finite tree can be viewed as a walk on {{\mathbb Z}}. The depth of the tree is the maximal distance reached by the walk. Since random walks are well understood, depths of random trees are known.

4. Jie Lin: Special values of {L}-functions and a conjecture of Deligne

Considering sums of inverse squares, cubes, … leads to Riemann's zeta function. Euler showed that {\zeta(2)=\pi^2/6}. More generally, {\zeta(2m)} is a rational multiple of {\pi^{2m}}. To show this, one extends {\zeta} meromorphically to {{\mathbb C}} and gets a functional equation, which relates {\zeta(2m)} to {\zeta(1-2m)}, which is a rational number (one says that {2m} is critical, since {\zeta} is holomorphic both at {2m} and {1-2m}).
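These rational ratios can be checked symbolically (assuming sympy is available):

```python
from sympy import zeta, pi

# Euler: zeta(2) = pi^2/6; more generally zeta(2m) / pi^(2m) is rational.
assert zeta(2) == pi**2 / 6
assert zeta(4) == pi**4 / 90
```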

For odd integers, one must compute residues, which is much harder.

A Dirichlet {L}-function is a weighted inverse power sum, where the weights are characters mod {N}. The above theorem extends to such {L}-functions. Example: {N}…

Adèles. Euler's product formula for {\zeta} suggests formulae involving all prime numbers, i.e. all absolute values on {{\mathbb Q}}. The ring of adèles is the (restricted) product of all {p}-adic fields (completions of {{\mathbb Q}} for the {p}-adic absolute values) and the reals. The idèles are the units of the adèle ring. The idea of an {L}-function extends to Hecke characters, i.e. continuous characters of {{\mathbb Q}^\times \setminus\mathbb{A}^\times}.

A Hecke character is a one-dimensional representation of {Gl_1(\mathbb{A})}; we must consider more generally automorphic representations of {Gl_N(\mathbb{A})}. Motives are an even wider generalization, introduced by Grothendieck. These are geometric objects, like elliptic curves. Analytic continuation of {L}-functions in this generality is a conjecture of Hasse-Weil. In 1979, Deligne conjectured the values of {L}-functions at critical integers.

Langlands' program relates these 3 types of objects: motives, automorphic representations and Galois representations. This contains the Taniyama-Shimura-Weil conjecture, whose solution is used in the proof of Fermat's last theorem.

With coauthors, I have translated the Deligne conjecture into automorphic representation terms.

5. Dena Kazerani : Symmetry, from hyperbolic systems to Green-Naghdi models

I work in fluid mechanics of incompressible flows. When viscous forces are low, Navier-Stokes equations simplify to Euler equations. A difficulty arises since the domain occupied by the fluid evolves.

I focus today on shallow water; this is the Green-Naghdi model (1976): one assumes that the fluid is irrotational, that the ground is planar and horizontal, that the vertical velocity depends linearly on the vertical variable, and that the horizontal velocity does not depend on the vertical variable. Then the unknowns of the equation, defined on a fixed planar domain, are the horizontal velocity and the water height. One gets a hyperbolic Saint-Venant system with extra dispersive terms.

Hyperbolic systems often have symmetry: Lax entropy pairs, Godunov structures. It is the case for the Saint-Venant system. The symmetry is expressed by changes of variables involving matrices, for systems of the form

\displaystyle  \begin{array}{rcl}  \partial_t U+\partial_x F(U)=0. \end{array}

We need to generalize matrices to operators on Banach spaces. We give a general definition of Godunov structure, extend the classical results (equivalence with existence of Lax-entropy pairs). We show that Green-Naghdi’s model has this property.

We use this symmetry to establish well-posedness of the equations: global existence, asymptotic stability.

Di Nezza: the definitions make sense only for smooth solutions, is this a problem? Yes, singular solutions behave differently (shocks).


Notes of Gang Tian’s Orsay lecture 28-06-2017

Existence of conic Kaehler-Einstein metrics

Joint work with Feng Wang, Zhejiang university.

A log-Fano manifold is the data of a compact Kaehler manifold {M} and a divisor with normal crossings {D=\sum(1-\beta_i)D_i} such that the line bundle

\displaystyle  \begin{array}{rcl}  L=-(K_M+\sum(1-\beta_i)D_i) \end{array}

is positive.

A metric {\omega} is a conic Kaehler-Einstein metric if it is smooth Kaehler in {M\setminus |D|} and for every point {p\in|D|} where {D} is defined by {z_1\cdots z_d=0} in some coordinates, {\omega} is equivalent (between two multiplicative constants) to the model cone metric

\displaystyle  \begin{array}{rcl}  \omega_{cone}=i(\sum_{i\leq d} \frac{dz_i\wedge d\bar z_i}{|z_i|^{2(1-\beta_i)}}+\sum_{i>d} dz_i\wedge d\bar z_i). \end{array}

Say that {\omega} is conic Kaehler-Einstein if

\displaystyle  \begin{array}{rcl}  Ric(\omega)=\omega+2\pi\sum(1-\beta_i)[D_i]. \end{array}

1. Necessary conditions

Berman 2016: If {(M,D)} admits a conic KE metric with {[\omega]=2\pi c_1(L)}, then {(M,D)} is log-K-stable.

Log-K-stability is defined as follows.

A special degeneration {(\mathcal{X},\mathcal{D},\mathcal{L})} of {(M,D,L)} is a 1-parameter family of log-pairs, consisting of

  1. A normal log-pair {(\mathcal{X},\mathcal{D})} with a {{\mathbb C}^*}-equivariant map {\pi:(\mathcal{X},\mathcal{D})\rightarrow{\mathbb C}},
  2. {\mathcal{L}} is an equivariant {\pi}-ample {{\mathbb Q}}-line bundle.
  3. {(\mathcal{X}_t,\mathcal{D}_t,\mathcal{L}_t)} is isomorphic to {(M,D,L)} for every {t\not=0}.

There is a natural compactification {(\overline{\mathcal{X}},\overline{\mathcal{D}},\overline{\mathcal{L}})} of {(\mathcal{X},\mathcal{D},\mathcal{L})} that maps to {{\mathbb C} P^1}. Define the number

\displaystyle  \begin{array}{rcl}  w(\mathcal{X},\mathcal{D},\mathcal{L})=\frac{n\overline{\mathcal{L}}^{n+1}+(n+1)\overline{\mathcal{L}}^n(K_{\overline{\mathcal{X}}/{\mathbb C} P^1}+\overline{\mathcal{D}})}{(n+1)L^n}. \end{array}

If the central fiber is a log-Fano variety {(M_0,D_0)} embedded in {{\mathbb C} P^N} by {H^0(M_0,L^\ell_{|M_0})}, then {w(\mathcal{X},\mathcal{D},\mathcal{L})} can be interpreted as a Futaki invariant.

Say that {(M,D)} is log-K-semistable if every special degeneration {(\mathcal{X},\mathcal{D},\mathcal{L})} has {w(\mathcal{X},\mathcal{D},\mathcal{L})\geq 0}. Say that {(M,D)} is log-K-stable if every special degeneration has {w(\mathcal{X},\mathcal{D},\mathcal{L})\geq 0}, with equality only for the trivial degeneration {(\mathcal{X},\mathcal{D},\mathcal{L})=(M,D,L)\times{\mathbb C}}.

2. The result

Theorem 1 If {(M,D)} is log-K-stable, then there exists a conic KE metric {\omega} with {[\omega]=2\pi c_1(L)}.

Many special cases were known, as consequences of existence of KE metrics on smooth closed manifolds. For instance when {D} is a multiple of {K_M^{-1}}.

3. Motivation

We are interested in {{\mathbb Q}}-Fano varieties {X}. Assume {X} admits a resolution {\mu:M\rightarrow X} such that {K_M=\mu^* K_X+\sum a_i E_i}, {a_i\in(-1,0]}. For small enough {\epsilon\in{\mathbb Q}}, define

\displaystyle  \begin{array}{rcl}  L_\epsilon=\mu^*K_X^{-1}-\sum \epsilon E_i. \end{array}

If there exists a KE metric {\omega} on {X}, then {\mu^*\omega} is a degenerate conic KE metric on {M} with conic angles {2\pi b_i} along {E_i}. We expect that there exist conic KE metrics {\omega_\epsilon} on {(M,\sum (1-b_i-\epsilon)E_i)} with {[\omega_\epsilon]=2\pi c_1(L_\epsilon)}, which Gromov-Hausdorff converge to {(X,\omega)} as {\epsilon\rightarrow 0}.

We think that we are now able to prove the following: if {X} is a K-stable {{\mathbb Q}}-Fano variety, then it admits a generalized KE metric in the above sense.

4. Proof

Many steps are similar to the smooth case. Pick a large integer {\lambda} such that {\lambda L} has a smooth divisor {E}. We use a continuity method, solving

\displaystyle  \begin{array}{rcl}  Ric(\omega_t)=t\omega_t+\frac{1-t}{\lambda}[E]+\sum(1-\beta_i)[D_i], \end{array}

{t\in[0,1]}. The set {I} of {t} such that a solution exists is easily shown to be non-empty (it contains {0}) and open. Is it closed? The key point is a {C^0} estimate. It follows from a "partial {C^0}-estimate" and log-K-stability. In turn, this follows from an {L^2}-estimate and compactness à la Cheeger-Colding-Tian.

4.1. Smoothing conical KE metrics

Say that {\omega} has a K-approximation if there exist Kaehler metrics {\omega_i=\omega+i\partial\bar\partial\phi_i} in the same cohomology class such that

  • {\phi_i\rightarrow 0} uniformly on {M} and smoothly outside {|D|},
  • {Ric(\omega_i)\geq K\omega_i},
  • {(M,\omega_i)\rightarrow (M,\omega)} in Gromov-Hausdorff topology.

We show that if {Aut^0(M,D)=\{1\}} and if for all {i},

\displaystyle  \begin{array}{rcl}  (1-K_i)L+(1-\beta_i)D_i\geq 0 \end{array}

for some {K_i\leq 1}, then {\omega} has a K-approximation where {K=\sum(K_i-1)+1}.

We solve a modified equation with an extra term involving the {K_i}'s. For this, we use the variational approach of Boucksom-Eyssidieux-Guedj-Zeriahi and results of Darvas-Rubinstein and Guenancia-Paun.

4.2. Extend B. Wang-Tian’s results to conic case

5. Work in progress

To handle {{\mathbb Q}}-Fano varieties, we need to extend Cheeger-Colding to conic cases.


Notes of Oana Ivanovici’s Orsay lecture 28-06-2017

Geometry and analysis of waves in manifolds with boundary

The wave-front set is a subset of the cotangent bundle whose projection is the singular support. In Euclidean space, in all dimensions, it travels at constant speed along straight lines (Fermat, …, Hörmander).

In general Riemannian manifolds without boundary, it travels along geodesics as long as time stays less than the injectivity radius (Duistermaat-Hörmander).

We impose Dirichlet boundary conditions. Then transverse waves reflect according to Snell's law of reflection (Chazarain). What about tangencies? Assume the obstacle is convex. Do waves propagate into the shadow?

Melrose-Taylor 1975: if the boundary is {C^\infty}, no smooth singularities in the shadow region. However, analytic singularities occur.

Inside strictly convex domains, waves reflect a large number of times. The wave shrinks in size between two reflections, it refocusses, therefore its maximum increases. Caustics appear, together with swallowtail and cusp singularities.

In the non-convex case, especially if infinite order tangencies occur, one does not even know what the continuation of a ray should be (Taylor 1976).

1. Dispersive estimates

It is a measurement of the decay of amplitude of waves due to spreading out while energy is conserved.

In {{\mathbb R}^d}, after a high frequency cut-off around frequency {\lambda}, the maximum amplitude decays like {\lambda^{(d+1)/2}t^{-(d-1)/2}}. Indeed, the wave is concentrated in an annulus of width {1/\lambda}. The same holds in Riemannian manifolds without boundary.

In the presence of boundary, propagation of singularities has brought results in the 1980’s. Later on, people have tried a reduction to the boundary-less case with a Lipschitz metric: this requires no assumptions on the boundary, but ignores reflection and its refocussing effect.

1.1. Within convex domains

Theorem 1 (Ivanovici-Lascar-Lebeau-Planchon 2017) For strictly convex domains, dispersion is in

\displaystyle  \begin{array}{rcl}  \lambda^d\min\{1,(\lambda t)^{-\frac{d-1}{2}+\frac{1}{4}}\}. \end{array}

This follows from a detailed description of the wave-front, including swallow-tails. It takes into account infinitely many reflections. It is sharp.

1.2. Outside convex obstacles

The Poisson spot. This is a place where diffracted light waves interfere. It is in the shadow area, but much more light concentrates there. Its existence was confirmed experimentally by Arago, settling a debate launched by Poisson, who did not believe in the wave description of light. It should exist if one believes in Fermat's principle that light rays follow geodesics, including those which creep along the boundary surface (Keller's conjecture). In 1994, Hargé and Lebeau proved that, when light creeps along the boundary, it decays like {e^{-\lambda^{1/3}}}.

Theorem 2 (Ivanovici-Lebeau 2017) For strictly convex obstacles,

  1. if {d=3}, dispersion estimates hold like in {{\mathbb R}^3},
  2. if {d\geq 4}, they fail at the Poisson spot.

The reason is that a {(d-2)}-dimensional surface of rays lights up the Poisson spot.


Notes of Erlend Grong’s Orsay lecture 27-06-2017

Asymptotic expansions of holonomy

Joint with Pierre Pansu.

1. Motivation

Given a connection on a principal bundle {G\rightarrow P\rightarrow M}, holonomy along a based loop {\gamma} of {M} is an element of {G} resulting from lifting {\gamma} horizontally to {P}. We look for an expression {F(\gamma)\in\mathfrak{g}} such that {\exp(F(\gamma))} is a good approximation of the holonomy when {\gamma} is short,

\displaystyle  \begin{array}{rcl}  hol(\gamma)=\exp(F(\gamma)+O(\ell(\gamma)^r)). \end{array}

We want {F(\gamma)} to be simpler to compute than the holonomy, and to be related to curvature.

Hatton-Choset: motion of a snake with two joints. {M=S^1\times S^1}, {G=SE(2)}. Experimentalists have been led to choose the Coulomb gauge, and for {F(\gamma)} the integral, over a disk spanning {\gamma}, of the curvature expressed in the Coulomb gauge.

In this practical example, motions are tangent to a sub-bundle of the tangent bundle of {M}. Hence our interest in expansions which are particularly efficient on such curves. We call this setting sub-Riemannian.

Sub-Riemannian curvature is not easy to define. The obvious approach of using adapted connections on the tangent bundle is not illuminating.

2. Results

  1. Asymptotic, gauge-free formula in Euclidean space.
  2. Riemannian case not that different.
  3. Sub-Riemannian case suggests a notion of curvature.
  4. For certain sub-Riemannian structures,

2.1. Euclidean case

Dilations define radial fillings {disk(\gamma)} of loops. Use the radial gauge (the frame is parallel along rays through the origin); it turns out to be optimal. Using the radial gauge, integrate the curvature over the radial filling. This defines

\displaystyle F(\gamma)=\int_{\mathrm{disk}(\gamma)}\Omega.

Say a differential form {\omega} has weight {\geq m} if its dilates {\delta_t^*\omega} are {O(t^m)}. Use the radial gauge to define weights of forms on {P}.

Theorem 1 If the curvature has weight {m}, then

\displaystyle  \begin{array}{rcl}  hol(\gamma)=\exp(F(\gamma)+O(\ell(\gamma)^{2m})). \end{array}

Furthermore, one can expand {F(\gamma)} in terms of the Taylor expansion of the curvature.

Since curvature has weight at least 2, one gets a 4-th order approximation.

2.2. Sub-Riemannian case

The flat sub-Riemannian case corresponds to Carnot groups, i.e. a Lie group whose Lie algebra has a gradation

\displaystyle  \begin{array}{rcl}  \mathfrak{n}=\mathfrak{n}_1\oplus\cdots\oplus \mathfrak{n}_r \end{array}

and is generated by {\mathfrak{n}_1}. Example: Heisenberg group.

Fix a norm on {\mathfrak{n}_1}. Left translates of {\mathfrak{n}_1} define a sub-Riemannian metric, for which the dilations {\delta_t} (acting by {t^i} on {\mathfrak{n}_i}) are homothetic.

According to Le Donne, sub-Riemannian Carnot groups are characterized as the only locally compact homogeneous geodesic metric spaces admitting homothetic homeomorphisms.

Carnot groups come with a left-invariant horizontal basis, we pick a connection on the tangent bundle which makes it parallel. It has torsion. We combine it with the principal bundle connection to define iterated covariant derivatives of curvature. We organize them according to weights adapted to the Lie algebra grading. The above theorem extends.

2.3. Horizontal holonomy

Since we are interested only in holonomy along horizontal loops, we have the freedom to change the connection outside the horizontal subbundle.

Chitour-Grong-Jean-Kokkonen: using this freedom, there are choices which minimize the curvature in the sense that as many components as possible vanish identically. This tends to increase the weight of curvature.

Example: on the 3-dimensional Heisenberg group, the preferred connection has curvature which vanishes on the horizontal distribution, hence has weight {\geq 3} instead of 2. The above theorem provides a 6-th order expansion, whose terms can be computed algebraically.

More generally, on free {k}-step nilpotent Lie groups, the curvature of a preferred connection has weight at least {k+1}, whence a {(2k+2)}-th order expansion whose terms are linear in the curvature (in fact, in the preferred curvature).

We expect to use it to refine the Euclidean expansion.

3. Question

What does this give in case of the two-joints snake? Requires to push computations further.


Notes of Karen Vogtmann’s second Cambridge lecture 23-06-2017

The borders of Outer Space

Joint work with Kai-Uwe Bux and Peter Smillie.

1. Duality groups

I am interested in Poincaré duality. For a group {\pi}, assume {M=B\pi} is a smooth {d}-manifold; then

\displaystyle  \begin{array}{rcl}  H^k_c(\tilde M)=H_{d-k}(\tilde M). \end{array}

Bieri-Eckmann observed that it suffices that {\Gamma} acts freely cocompactly on a contractible space {X} whose compactly supported cohomology vanishes in all degrees but {d}, and {H_c^d(X)} is torsion free. Then {\Gamma} is a duality group.

If the action is merely proper and cocompact, {\Gamma} is a virtual duality group. Borel-Serre used this for lattices. Bestvina-Feighn used this to show that {Out(F_n)} is a virtual duality group. Mapping class groups also act on a contractible space.

To achieve cocompactness, Borel-Serre added to the symmetric space copies of Euclidean space forming a rational Euclidean building. The resulting bordification of the quotient is a manifold, with boundary homotopy equivalent to a wedge of spheres (Solomon-Tits). Instead, Grayson constructed an invariant cocompact subset of symmetric space.

Grayson's work was used by Bartels-Lueck-Reich-Rueping to prove the Farrell-Jones conjecture for {Sl(n,{\mathbb Z})}.

For {Out(F_n)}, Bestvina-Feighn defined a bordification too. We proceed differently, following Grayson: we produce an invariant retract {J_n\subset X_n}. It is much easier and gives more information on the boundary.


{X_n} is the space of metric graphs (without separating edges) with homotopy markings. It is made of simplices with edge-lengths as coordinates. {J_n} is obtained by chopping off some of their corners.

A core graph is a subgraph such that, when one shrinks it, one gets out of Outer Space. Only corners facing core graphs need be chopped off.

The boundary appears as a union of contractible walls (every intersection of walls is contractible).


Notes of Grigori Avramidi’s Cambridge lecture 23-06-2017

Topology of ends of nonpositively curved manifolds

Joint work with T. Nguyen Pham.

I am interested in complete Riemannian manifolds with curvature in {[-1,0]}, and finite volume.

Example. Product of two hyperbolic surfaces. The end is homeomorphic to {N\times[0,+\infty)}, with some extra structure: {N} is made of two pieces.

More generally, for locally symmetric spaces of noncompact type, lifts of ends are homeomorphic to {N\times[0,+\infty)}, with {N} a wedge of spheres. This description goes back to Borel-Serre.

1. Thick-thin decomposition

Gromov-Schroeder: assume there are no arbitrarily small geodesic loops. Then the thin part is homeomorphic to {N\times[0,+\infty)}, with {N} a closed manifold.

The condition is necessary. Gromov gives an example of a nonpositively curved infinite type graph manifold of finite volume.

Theorem 1 (Avramidi-Nguyen Pham) Under the same assumptions, any map of a polyhedron to the thin part of the universal cover {\tilde M} can be homotoped within the thin part into a map to an {\lfloor \frac{n}{2}\rfloor}-dimensional complex, {n=dim(M)}.


  1. If {n\leq 5}, each component of the thin part is aspherical and has locally free fundamental group.
  2. {H^k(B\Gamma,{\mathbb Z} \Gamma)=0} for all {k<\frac{n}{2}}.
  3. {dim(B\Gamma)\geq \frac{n}{2}}.

2. Proof

Maximizing the angle under which two visual boundary points are seen gives Tits distance, and the corresponding path metric {Td}.

In the universal cover, the thin part is the set of points moved less than {\epsilon} away by some deck transformation in {\Gamma}. Isometries are either hyperbolic (minimal displacement is achieved) or parabolic (infimal displacement is 0). Parabolic isometries have a nonempty fixed-point set at infinity. At each point {x}, the subgroup generated by isometries moving {x} no more than {\epsilon} is virtually nilpotent, hence virtually has a common fixed point at infinity. This allows one to define a discontinuous projection to infinity. The point is to show that the image has dimension {<\lfloor \frac{n}{2}\rfloor}.

2.1. Busemann simplices

If {h_0} and {h_1} are Busemann functions, {t_0h_0+t_1h_1} need not be a Busemann function again, but on each sphere there is a unique point where it achieves its minimum, and this point depends in a Lipschitz manner on {t_0,t_1}. This defines an arc in the Tits boundary, hence simplices {\sigma}. We claim that

\displaystyle  \begin{array}{rcl}  hom-dim(Stab(\sigma))+dim(image(\sigma))\leq n-1. \end{array}
