Chernoff's inequality

In probability theory, Chernoff's inequality bounds the tail of a probability distribution: it gives an upper bound on the probability that a random variable exceeds a fixed value. It is also known as the Chernoff bound. It is named after the mathematician Herman Chernoff.

It is comparable to Markov's inequality, but gives a bound that decreases exponentially.

Statements

There are many statements of the inequality, and many special cases.

General case

Let $X$ be a real random variable whose moment-generating function satisfies

\phi (t)=\mathbb {E} [e^{tX}]<+\infty ,

Then^[1], for every $a\geq 0$,

{\text{if }}t>0,\,\mathbb {P} \left(X\geq a\right)\leq e^{-ta}\mathbb {E} [e^{tX}]

and

{\text{if }}t<0,\,\mathbb {P} \left(X\leq -a\right)\leq e^{ta}\mathbb {E} [e^{tX}].
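As an illustration of how the free parameter $t$ affects the bound, the following minimal sketch evaluates the right-hand side for a standard normal variable. The choice of distribution is an assumption made for the example only; it relies on the known moment-generating function $\mathbb {E} [e^{tX}]=e^{t^{2}/2}$, and the function names are hypothetical.

```python
import math

def chernoff_bound_gaussian(a: float, t: float) -> float:
    """Right-hand side e^{-ta} E[e^{tX}] for X ~ N(0, 1), where E[e^{tX}] = e^{t^2/2}."""
    return math.exp(-t * a + t * t / 2.0)

def gaussian_tail(a: float) -> float:
    """Exact P(X >= a) for X ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

a = 2.0  # illustrative threshold, not taken from the article
for t in (0.5, 1.0, 2.0, 3.0):
    bound = chernoff_bound_gaussian(a, t)
    print(f"t={t}: bound={bound:.4f}, exact tail={gaussian_tail(a):.4f}")
```

In this Gaussian example the exponent $-ta+t^{2}/2$ is smallest at $t=a$, which gives the bound $e^{-a^{2}/2}$; every other positive $t$ still gives a valid but weaker bound.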

With symmetric variables and zero expectation

Let $X_{1},X_{2},\dots ,X_{n}$ be independent random variables such that $\mathbb {E} [X_{i}]=0$ and $\left|X_{i}\right|\leq 1$ for all $i$. Set $X=\sum _{i=1}^{n}X_{i}$ and let $\sigma ^{2}$ denote the variance of $X$.

Then, for all $0\leq k\leq 2\sigma$,

\mathbb {P} (X\geq k\sigma )\leq e^{-k^{2}/4}

as well as

\mathbb {P} (-X\geq k\sigma )\leq e^{-k^{2}/4},

and therefore also

\mathbb {P} (\left|X\right|\geq k\sigma )\leq 2e^{-k^{2}/4}.
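The statement can be checked numerically on a simple case. The sketch below assumes, purely for illustration, that each $X_{i}$ is a fair random sign (values $\pm 1$ with probability 1/2), so that $\sigma ^{2}=n$ and the tail can be computed exactly from the binomial distribution. The function name is hypothetical.

```python
import math

def rademacher_sum_tail(n: int, threshold: float) -> float:
    """Exact P(X >= threshold), where X is a sum of n independent fair +/-1 signs."""
    # X = 2B - n with B ~ Binomial(n, 1/2), so X >= threshold iff B >= (n + threshold) / 2.
    b_min = max(0, math.ceil((n + threshold) / 2.0))
    return sum(math.comb(n, b) for b in range(b_min, n + 1)) / 2.0 ** n

n = 100                # illustrative sample size
sigma = math.sqrt(n)   # Var(X) = n because each sign has variance 1
for k in (0.5, 1.0, 2.0, 3.0):   # all values satisfy 0 <= k <= 2 * sigma
    exact = rademacher_sum_tail(n, k * sigma)
    bound = math.exp(-k * k / 4.0)
    print(f"k={k}: exact tail={exact:.5f}, bound={bound:.5f}")
```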

With symmetric Boolean variables

Let $X_{1},X_{2},\dots ,X_{n}$ be independent Boolean random variables (i.e. taking values in $\{0,1\}$) with the same expectation $p$. Then, $\forall \varepsilon >0$,

\mathbb {P} \left({\frac {1}{n}}\sum _{i=1}^{n}X_{i}>p+\varepsilon \right)\leq e^{-2\varepsilon ^{2}n},

and

\mathbb {P} \left({\frac {1}{n}}\sum _{i=1}^{n}X_{i}<p-\varepsilon \right)\leq e^{-2\varepsilon ^{2}n}.
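Since the sum of $n$ such variables follows a binomial distribution, the first inequality can be compared with the exact tail. The following sketch does this for a few sample sizes; the values of $p$, $\varepsilon$ and $n$ are arbitrary choices for the example, and the function name is hypothetical.

```python
import math

def binomial_upper_tail(n: int, p: float, eps: float) -> float:
    """Exact P(mean of n Bernoulli(p) draws > p + eps), summed from the binomial pmf."""
    j_min = math.floor(n * (p + eps)) + 1  # smallest count j with j / n > p + eps
    return sum(
        math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)
        for j in range(j_min, n + 1)
    )

p, eps = 0.3, 0.1  # illustrative values, not taken from the article
for n in (10, 100, 500):
    exact = binomial_upper_tail(n, p, eps)
    bound = math.exp(-2.0 * eps * eps * n)
    print(f"n={n}: exact tail={exact:.3e}, bound={bound:.3e}")
```

The bound does not depend on $p$, which makes it convenient but sometimes loose.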

Proof

There are several ways to prove these inequalities^[2].

General case

Proof

For the first inequality, $\forall a\geq 0,~\forall t\geq 0$,

{\begin{aligned}\mathrm {e} ^{t(X-a)}&\geq {1}_{\{X\geq a\}}\\\Rightarrow E\left[\mathrm {e} ^{t(X-a)}\right]&\geq P(X\geq a)\\\Rightarrow E\left[\mathrm {e} ^{tX}\right]\mathrm {e} ^{-ta}&\geq P(X\geq a).\\\end{aligned}}

Hence,

{\begin{aligned}P(X\geq a)&\leq e^{-(ta-\ln(\phi (t)))},\end{aligned}}

and, since this holds for all $t\geq 0$, we obtain

{\begin{aligned}P(X\geq a)&\leq \inf _{t\geq 0}\ \mathrm {e} ^{-(ta-\ln(\phi (t)))}\\&=\mathrm {e} ^{-\sup _{t\geq 0}\{ta-\ln(\phi (t))\}}\\&=\mathrm {e} ^{-h(a)},\end{aligned}}

where $h(a)=\sup _{t\geq 0}\{ta-\ln(\phi (t))\}$.
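The supremum over $t\geq 0$ in the exponent can be approximated numerically when no closed form is at hand. The sketch below is an illustration under assumptions: it takes $X$ to be a Poisson variable with parameter `lam` (an example distribution not discussed in this article, with $\ln \phi (t)=\lambda (e^{t}-1)$), and uses a ternary search, which is valid here because $t\mapsto ta-\ln \phi (t)$ is concave. The function names and the search interval `t_max` are hypothetical choices.

```python
import math

def log_mgf_poisson(t: float, lam: float) -> float:
    """ln phi(t) for X ~ Poisson(lam): ln E[e^{tX}] = lam * (e^t - 1)."""
    return lam * (math.exp(t) - 1.0)

def rate_function(a: float, lam: float, t_max: float = 20.0, iters: int = 200) -> float:
    """Approximate sup over t in [0, t_max] of t*a - ln phi(t) by ternary search."""
    lo, hi = 0.0, t_max

    def objective(t: float) -> float:
        return t * a - log_mgf_poisson(t, lam)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1  # the maximiser lies to the right of m1
        else:
            hi = m2  # the maximiser lies to the left of m2
    return objective((lo + hi) / 2.0)

lam, a = 3.0, 6.0  # illustrative values
h = rate_function(a, lam)
print(f"h(a) ~ {h:.6f}, Chernoff bound on P(X >= a) ~ {math.exp(-h):.6f}")
```

For this Poisson example with $a\geq \lambda$, the supremum is attained at $t=\ln(a/\lambda )$ and equals $a\ln(a/\lambda )-a+\lambda$, which gives a way to check the numerical result.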

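For the second inequality, $\forall a\geq 0,~\forall t\leq 0$,

{\begin{aligned}\mathrm {e} ^{t(X+a)}&\geq {1}_{\{X\leq -a\}}\\\Rightarrow P(X\leq -a)&\leq E\left[\mathrm {e} ^{t(X+a)}\right]\\&=\mathrm {e} ^{ta}\mathrm {e} ^{\ln(\phi (t))}\\&=\mathrm {e} ^{-(-ta-\ln(\phi (t)))},\end{aligned}}

so, as before, taking the infimum over $t\leq 0$,

P(X\leq -a)\leq \mathrm {e} ^{-\sup _{t\leq 0}\{-ta-\ln(\phi (t))\}}=\mathrm {e} ^{-h_{-X}(a)},

where $h_{-X}(a)=\sup _{s\geq 0}\{sa-\ln(\mathbb {E} [\mathrm {e} ^{-sX}])\}$ is the same quantity as in the first inequality, computed for $-X$ (substitute $s=-t$).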

With symmetric Boolean variables

Proof

For the first inequality, set $Z_{i}=X_{i}-p$ and ${\overline {Z}}_{n}={\frac {1}{n}}\sum _{i=1}^{n}Z_{i}$, and write $Z=X-p$ where $X$ follows a Bernoulli distribution with parameter $p$. By Chernoff's inequality applied to ${\overline {Z}}_{n}$,

{\begin{aligned}P({\frac {1}{n}}\sum _{i=1}^{n}X_{i}\geq p+\epsilon )&=P({\overline {Z}}_{n}\geq \epsilon )\\&\leq \mathrm {e} ^{-h_{{\overline {Z}}_{n}}(\epsilon )}.\end{aligned}}

Now $h_{{\overline {Z}}_{n}}(\epsilon )=\sup _{t\geq 0}\{\epsilon t-\ln(E[\mathrm {e} ^{t{\overline {Z}}_{n}}])\}=nh_{Z}(\epsilon )$. Indeed, since the $\{X_{i}\}_{i\in [\![1,n]\!]}$ are i.i.d., so are the $\{Z_{i}\}_{i\in [\![1,n]\!]}$, and therefore

{\begin{aligned}E[\mathrm {e} ^{t{\overline {Z}}_{n}}]&=\prod _{i=1}^{n}E[\mathrm {e} ^{{\frac {t}{n}}Z_{i}}]\\&=E[\mathrm {e} ^{{\frac {t}{n}}Z}]^{n}.\end{aligned}}

Hence,

{\begin{aligned}h_{{\overline {Z}}_{n}}(\epsilon )&=\sup _{t\geq 0}\{\epsilon t-\ln(E[\mathrm {e} ^{t{\overline {Z}}_{n}}])\}\\&=\sup _{t\geq 0}\{\epsilon t-n\ln(E[\mathrm {e} ^{{\frac {t}{n}}Z}])\}\\&=n\sup _{t\geq 0}\{\epsilon {\frac {t}{n}}-\ln(E[\mathrm {e} ^{{\frac {t}{n}}Z}])\}\\&=nh_{Z}(\epsilon ).\end{aligned}}

Therefore,

{\begin{aligned}P({\frac {1}{n}}\sum _{i=1}^{n}X_{i}\geq p+\epsilon )&\leq \mathrm {e} ^{-n\sup _{t\geq 0}\{\epsilon t-\ln(E[\mathrm {e} ^{tZ}])\}}\\&=\mathrm {e} ^{n\inf _{t\geq 0}\{\ln(E[\mathrm {e} ^{tZ}])-\epsilon t\}}\\&\leq \mathrm {e} ^{n(\ln(E[\mathrm {e} ^{tZ}])-\epsilon t)}\quad ({\text{for all }}t\geq 0).\end{aligned}}

Note that $E[\mathrm {e} ^{tZ}]=\mathrm {e} ^{-pt}E[\mathrm {e} ^{tX}]=\mathrm {e} ^{-pt}(1-p+p\mathrm {e} ^{t})$.
Hence, $\forall t\geq 0$,

{\begin{aligned}\ln(E[\mathrm {e} ^{tZ}])-\epsilon t&=\ln(1-p+p\mathrm {e} ^{t})-(\epsilon +p)t\\&=\Psi (t)-\epsilon t,\end{aligned}}

where $\forall t\in \mathbb {R} ,~\Psi (t)=-pt+\ln(1-p+p\mathrm {e} ^{t})$.
In order to apply the Taylor–Lagrange formula to second order, we compute the first and second derivatives of $\Psi$:

{\begin{aligned}\forall t\in \mathbb {R} ,~\Psi ^{'}(t)&=-p+{\frac {p\mathrm {e} ^{t}}{1-p+p\mathrm {e} ^{t}}}\\\Psi ^{''}(t)&={\frac {(1-p)p\mathrm {e} ^{t}}{(1-p+p\mathrm {e} ^{t})^{2}}}\\&={\frac {\alpha \beta }{(\alpha +\beta )^{2}}}\\&\leq {\frac {1}{4}},\end{aligned}}

with $\alpha =1-p,~\beta =p\mathrm {e} ^{t}$. The bound $\Psi ^{''}(t)\leq {\frac {1}{4}}$ holds because $(\alpha -\beta )^{2}=\alpha ^{2}+\beta ^{2}-2\alpha \beta \geq 0$ gives $2\alpha \beta \leq \alpha ^{2}+\beta ^{2}$, and hence $(\alpha +\beta )^{2}=\alpha ^{2}+\beta ^{2}+2\alpha \beta \geq 4\alpha \beta$.

Hence, since $\Psi (0)=\Psi ^{'}(0)=0$, by the Taylor–Lagrange formula, $\forall t\in \mathbb {R}$,

{\begin{aligned}\Psi (t)&=\Psi (0)+t\Psi ^{'}(0)+{\frac {t^{2}}{2}}\Psi ^{''}(\theta t)\\&\leq {\frac {t^{2}}{8}},\end{aligned}}

for some $\theta \in [0,1]$.
Therefore, $\forall t\geq 0$,

{\begin{aligned}P({\frac {1}{n}}\sum _{i=1}^{n}X_{i}\geq p+\epsilon )&\leq \mathrm {e} ^{n(\ln(E[\mathrm {e} ^{tZ}])-\epsilon t)}\\&\leq \mathrm {e} ^{n({\frac {t^{2}}{8}}-\epsilon t)}.\end{aligned}}
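The key estimate used in this step, $\Psi (t)\leq {\frac {t^{2}}{8}}$ for all real $t$, can be sanity-checked on a grid. The following sketch is only a numerical illustration, not a proof; the grid of values for $p$ and $t$ and the function names are arbitrary choices made for the example.

```python
import math

def psi(t: float, p: float) -> float:
    """Psi(t) = -p*t + ln(1 - p + p*e^t), the log moment-generating function of Z = X - p."""
    return -p * t + math.log(1.0 - p + p * math.exp(t))

def max_gap(ps, ts) -> float:
    """Largest value of Psi(t) - t^2/8 over the grid (non-positive up to rounding)."""
    return max(psi(t, p) - t * t / 8.0 for p in ps for t in ts)

ps = [i / 20.0 for i in range(1, 20)]      # p = 0.05, 0.10, ..., 0.95
ts = [j / 10.0 for j in range(-100, 101)]  # t = -10.0, -9.9, ..., 10.0
print(f"max of Psi(t) - t^2/8 over the grid: {max_gap(ps, ts):.3e}")
```

The gap is closest to zero for $p$ near 1/2 and $t$ near 0, which is where the bound $\Psi ^{''}\leq {\frac {1}{4}}$ is tight.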

Let $g(t)={\frac {t^{2}}{8}}-\epsilon t$ for $t\geq 0$. Then $g^{'}(t)={\frac {t}{4}}-\epsilon$, so $g$ attains its minimum at $t=4\epsilon$.
Thus, $\forall \epsilon >0$,

{\begin{aligned}P({\frac {1}{n}}\sum _{i=1}^{n}X_{i}\geq p+\epsilon )&\leq \mathrm {e} ^{n({\frac {16\epsilon ^{2}}{8}}-4\epsilon ^{2})}\\&\leq \mathrm {e} ^{-2n\epsilon ^{2}}.\end{aligned}}
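To see how much is lost when $\Psi (t)$ is replaced by ${\frac {t^{2}}{8}}$, one can evaluate both exponents at $t=4\epsilon$. The sketch below does this for arbitrary illustrative values of $p$, $\epsilon$ and $n$; the function names are hypothetical.

```python
import math

def exponent_before_surrogate(t: float, p: float, eps: float) -> float:
    """ln E[e^{tZ}] - eps*t for Z = X - p with X ~ Bernoulli(p)."""
    return -p * t + math.log(1.0 - p + p * math.exp(t)) - eps * t

def quadratic_surrogate(t: float, eps: float) -> float:
    """g(t) = t^2/8 - eps*t, the upper bound on the exponent used in the proof."""
    return t * t / 8.0 - eps * t

p, eps, n = 0.3, 0.1, 200  # illustrative values, not taken from the article
t_star = 4.0 * eps         # minimiser of g
sharper = math.exp(n * exponent_before_surrogate(t_star, p, eps))
final = math.exp(n * quadratic_surrogate(t_star, eps))  # equals e^{-2 n eps^2}
print(f"bound before using Psi(t) <= t^2/8: {sharper:.4e}")
print(f"final bound e^(-2 n eps^2):         {final:.4e}")
```

The first value is never larger than the second, so the final inequality trades some sharpness for a formula that does not depend on $p$.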

For the second inequality, $\forall \epsilon >0$,

{\begin{aligned}P({\frac {1}{n}}\sum _{i=1}^{n}X_{i}\leq p-\epsilon )&=P({\overline {Z}}_{n}\leq -\epsilon )\\&=P(-{\overline {Z}}_{n}\geq \epsilon )\\&\leq \mathrm {e} ^{-h_{-{\overline {Z}}_{n}}(\epsilon )}\quad {\text{by Chernoff's inequality}}\\&=\mathrm {e} ^{-nh_{-Z}(\epsilon )}\\&=\mathrm {e} ^{n\inf _{t\geq 0}\{\ln(E[\mathrm {e} ^{-tZ}])-\epsilon t\}}\\&\leq \mathrm {e} ^{n(\ln(E[\mathrm {e} ^{-tZ}])-\epsilon t)}\quad ({\text{for all }}t\geq 0).\end{aligned}}

Note that, $\forall t\geq 0$,

{\begin{aligned}E[\mathrm {e} ^{-tZ}]&=\mathrm {e} ^{pt}E[\mathrm {e} ^{-tX}]\\&=\mathrm {e} ^{pt}(1-p+p\mathrm {e} ^{-t})\\\Rightarrow \ln(E[\mathrm {e} ^{-tZ}])&=pt+\ln(1-p+p\mathrm {e} ^{-t})\\&=\Psi (-t)\\&\leq {\frac {t^{2}}{8}}.\end{aligned}}

Therefore, $\forall \epsilon >0,~\forall t\geq 0$,

{\begin{aligned}P({\frac {1}{n}}\sum _{i=1}^{n}X_{i}\leq p-\epsilon )&\leq \mathrm {e} ^{n({\frac {t^{2}}{8}}-\epsilon t)}\\&\leq \mathrm {e} ^{-2n\epsilon ^{2}},\end{aligned}}

by an argument similar to the one used to prove the first inequality (taking $t=4\epsilon$).

Applications

These inequalities are widely used in theoretical computer science, in particular in computational complexity theory and in the analysis of algorithms, where they are used to prove results about randomized algorithms.
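One typical use, sketched here under stated assumptions, is error reduction by majority vote. Suppose a randomized algorithm answers a yes/no question correctly with probability $p={\frac {1}{2}}+\varepsilon$ on each independent run. The majority of $n$ runs is wrong only if the fraction of correct runs is at most ${\frac {1}{2}}=p-\varepsilon$, which by the Boolean case above has probability at most $e^{-2\varepsilon ^{2}n}$. The function name and the scenario are illustrative and not taken from a specific source.

```python
import math

def repetitions_needed(eps: float, delta: float) -> int:
    """Smallest n with e^{-2 eps^2 n} <= delta, i.e. n >= ln(1/delta) / (2 eps^2)."""
    return math.ceil(math.log(1.0 / delta) / (2.0 * eps * eps))

eps, delta = 0.1, 0.01  # per-run advantage and target failure probability (illustrative)
n = repetitions_needed(eps, delta)
print(f"{n} runs suffice; bound on failure probability: {math.exp(-2.0 * eps * eps * n):.5f}")
```

The number of runs grows only logarithmically in $1/\delta$, which is why the exponential form of the bound matters in this setting.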

See also large deviations theory.

Extensions

Interesting generalizations exist for random matrices, known as matrix Chernoff bounds^[3].

References

[1] Brémaud 2009, p. 184
[2] Wolfgang Mulzer, "Five Proofs of Chernoff's Bound with Applications", Bulletin of the EATCS, no. 124, February 2018.
[3] Joel A. Tropp, "User-friendly tail bounds for sums of random matrices", Foundations of Computational Mathematics, vol. 12, no. 4, 2012, pp. 389-434

See also

Bibliography

This article is partially or entirely derived from the English Wikipedia article titled "Chernoff's inequality" (see the list of authors).
Kirill Levchenko (UCSD), Chernoff bound

Pierre Brémaud, Initiation aux Probabilités : et aux chaînes de Markov, Springer Science & Business Media, 2009, 311 p. (ISBN 978-3-540-31421-9)
