



Izvestiya of Saratov University. Mathematics. Mechanics. Informatics, 2022, vol. 22, iss. 2, pp. 233-240

https://mmi.sgu.ru https://doi.org/10.18500/1816-9791-2022-22-2-233-240

Article

What scientific folklore knows about the distances between the most popular distributions

M. Y. Kelbert¹, Yu. M. Suhov²,³,⁴

¹Higher School of Economics - National Research University, 20 Myasnitskaya St., Moscow 101000, Russia

²University of Cambridge, The Old Schools, Trinity Ln, Cambridge CB2 1TN, UK
³University of Pennsylvania, 201 Old Main, State College, PA 16802, USA

⁴Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 build. 1 Bolshoy Karetny per., Moscow 127051, Russia

Mark Y. Kelbert, [email protected], https://orcid.org/0000-0002-3952-2012, AuthorID: 1137288
Yurii M. Suhov, [email protected], AuthorID: 1131362

Abstract. We present a number of upper and lower bounds for the total variation distances between the most popular probability distributions. In particular, some estimates of the total variation distances between one-dimensional Gaussian distributions, between two Poisson distributions, between two binomial distributions, between a binomial and a Poisson distribution, and also between two negative binomial distributions are given. The Kolmogorov - Smirnov distance is also presented.

Keywords: probability distribution, variation distance, Pinsker's inequality, Le Cam's inequalities, distances between distributions

For citation: Kelbert M. Y., Suhov Yu. M. What scientific folklore knows about the distances between the most popular distributions. Izvestiya of Saratov University. Mathematics. Mechanics. Informatics, 2022, vol. 22, iss. 2, pp. 233-240. https://doi.org/10.18500/1816-9791-2022-22-2-233-240

This is an open access article distributed under the terms of Creative Commons Attribution 4.0 International License (CC-BY 4.0)


Introduction

A tale that becomes folklore is one that is passed down and whispered around. The second half of the word, lore, comes from Old English lar, i.e. 'instruction'. Different bounds for the distances between the most popular probability distributions (see [1]) appear in many problems of applied probability. Unfortunately, the available textbooks and reference books do not present them in a systematic way. In this short note, we make an attempt to fill this gap.

Let us recall that for probability measures $P$, $Q$ with densities $p, q$ with respect to a common reference measure $\mu$, the total variation distance is

$$TV(P, Q) = \frac12\int |p - q|\,d\mu = \sup_{A}|P(A) - Q(A)|.$$

Let us recall the coupling characterization of the total variation distance. For two distributions $P$ and $Q$, a pair $(X, Y)$ of random variables defined on the same probability space is called a coupling for $P$ and $Q$ if $X \sim P$ and $Y \sim Q$.

One of the useful facts is that there exists a coupling $(X, Y)$ such that $P(X \neq Y) = TV(P, Q)$. Therefore, for any function $f$, we have $P(f(X) \neq f(Y)) \le TV(P, Q)$, with equality when $f$ is injective.
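For discrete distributions this contraction property is easy to check numerically. The following minimal sketch (in Python with NumPy; the two probability vectors and the map $f$ are arbitrary illustrative choices, not taken from the paper) computes $TV(P, Q)$ and the distance between the pushforward laws under a non-injective map:

```python
import numpy as np

# Two probability vectors on {0, 1, 2, 3}
p = np.array([0.1, 0.4, 0.3, 0.2])
q = np.array([0.25, 0.25, 0.25, 0.25])

# TV(P, Q) = (1/2) * sum_k |p_k - q_k|
tv = 0.5 * np.abs(p - q).sum()

# A non-injective map f: {0,1,2,3} -> {0,1} and the pushforward pmfs
f = np.array([0, 1, 1, 0])
pf = np.array([p[f == v].sum() for v in (0, 1)])
qf = np.array([q[f == v].sum() for v in (0, 1)])
tv_f = 0.5 * np.abs(pf - qf).sum()

print(tv, tv_f, tv_f <= tv + 1e-12)  # TV(f(X), f(Y)) <= TV(X, Y)
```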

1. Gaussian distributions

The total variation distance between one-dimensional Gaussian distributions,

$$\tau = \tau(X_1, X_2) = TV\big(N(\mu_1, \sigma_1^2), N(\mu_2, \sigma_2^2)\big),$$

depends on the parameters $\Delta = |\delta|$, where $\delta = \mu_1 - \mu_2$, and $\sigma_1^2, \sigma_2^2$. The following two-sided bound holds:

$$\frac{1}{200}\,\min\!\left[1,\ \max\!\left(\frac{|\sigma_1^2 - \sigma_2^2|}{\min[\sigma_1^2, \sigma_2^2]},\ \frac{40\,\Delta}{\min[\sigma_1, \sigma_2]}\right)\right] \;\le\; \tau \;\le\; \frac{3\,|\sigma_1^2 - \sigma_2^2|}{2\max[\sigma_1^2, \sigma_2^2]} + \frac{\Delta}{2\max[\sigma_1, \sigma_2]}.$$

In the case $\sigma_1^2 = \sigma_2^2 = \sigma^2$ the following identity holds: $\tau = 2\Phi\big(\Delta/(2\sigma)\big) - 1$, where $\Phi$ is the standard normal distribution function.
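A numerical sanity check of these formulas is straightforward. The sketch below (Python with NumPy/SciPy; the parameter values are arbitrary choices) integrates $\frac12|p - q|$ by quadrature and compares the result with the equal-variance identity and with the two-sided bound above:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def tv_gauss(mu1, s1, mu2, s2):
    """TV(N(mu1, s1^2), N(mu2, s2^2)) via numerical integration of (1/2)|p - q|."""
    f = lambda x: 0.5 * abs(norm.pdf(x, mu1, s1) - norm.pdf(x, mu2, s2))
    lo = min(mu1 - 12 * s1, mu2 - 12 * s2)
    hi = max(mu1 + 12 * s1, mu2 + 12 * s2)
    return quad(f, lo, hi, limit=300)[0]

# Equal variances: tv equals 2*Phi(Delta/(2*sigma)) - 1
mu1, mu2, s = 0.0, 1.0, 2.0
print(tv_gauss(mu1, s, mu2, s), 2 * norm.cdf(abs(mu1 - mu2) / (2 * s)) - 1)

# Different variances: tv must lie inside the two-sided bound
mu1, s1, mu2, s2 = 0.0, 1.0, 0.5, 1.4
D = abs(mu1 - mu2)
low = (1 / 200) * min(1, max(abs(s1**2 - s2**2) / min(s1, s2)**2,
                             40 * D / min(s1, s2)))
upp = 3 * abs(s1**2 - s2**2) / (2 * max(s1, s2)**2) + D / (2 * max(s1, s2))
print(low <= tv_gauss(mu1, s1, mu2, s2) <= upp)
```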

1.1. Pinsker's inequality

In the general case, the upper bound is a version of Pinsker's inequality [2] for $\tau(X_1, X_2) = TV(X_1, X_2)$:

$$\tau(X_1, X_2) \le \min\big\{1,\ \sqrt{KL(P_{X_1}\|P_{X_2})/2}\big\}, \qquad (1)$$

where

$$KL(P_{X_1}\|P_{X_2}) = \frac12\left(\frac{\sigma_1^2}{\sigma_2^2} - 1 + \frac{\delta^2}{\sigma_2^2} - \ln\frac{\sigma_1^2}{\sigma_2^2}\right).$$

In the multidimensional Gaussian case,

$$KL(P_{X_1}\|P_{X_2}) = \frac12\left(\mathrm{tr}\big(\Sigma_2^{-1}\Sigma_1 - I\big) + \delta^T\Sigma_2^{-1}\delta - \ln\det\big(\Sigma_1\Sigma_2^{-1}\big)\right).$$
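As a quick illustration (a Python sketch with arbitrary parameters): for equal variances the formula above reduces to $KL = \delta^2/(2\sigma^2)$, and the exact total variation $2\Phi(\Delta/(2\sigma)) - 1$ can be compared with the Pinsker bound (1):

```python
import numpy as np
from scipy.stats import norm

mu1, mu2, sigma = 0.0, 0.5, 1.0
delta = mu1 - mu2

kl = 0.5 * (delta / sigma) ** 2                      # KL for equal variances
tv_exact = 2 * norm.cdf(abs(delta) / (2 * sigma)) - 1
tv_pinsker = min(1.0, np.sqrt(kl / 2))               # bound (1)
print(tv_exact, tv_pinsker, tv_exact <= tv_pinsker)
```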

Let us prove Pinsker's inequality (1). We need the following bound:

$$(x - 1)^2 \le \left(\frac{4}{3} + \frac{2x}{3}\right)\phi(x), \qquad \phi(x) = x\ln x - x + 1. \qquad (2)$$

If $P$ and $Q$ are mutually singular, then $KL = \infty$ and Pinsker's inequality holds trivially. Assume now that $P$ is absolutely continuous with respect to $Q$. In view of (2) and the Cauchy - Schwarz inequality,

$$\tau(X, Y) = \frac12\int |p - q| = \frac12\int q\left|\frac{p}{q} - 1\right|\mathbf{1}\{q > 0\} \le \frac12\left(\int q\left(\frac43 + \frac{2p}{3q}\right)\mathbf{1}\{q > 0\}\right)^{1/2}\left(\int q\,\phi\Big(\frac{p}{q}\Big)\mathbf{1}\{q > 0\}\right)^{1/2} =$$

$$= \frac{\sqrt2}{2}\left(\int p\ln\frac{p}{q}\,\mathbf{1}\{q > 0\}\right)^{1/2} = \big(KL(P\|Q)/2\big)^{1/2}.$$

To check (2), define $g(x) = (x - 1)^2 - \left(\frac43 + \frac{2x}{3}\right)\phi(x)$. Then $g(1) = g'(1) = 0$ and $g''(x) = -\dfrac{4\phi(x)}{3x} \le 0$. Hence, by Taylor's formula with an intermediate point $\xi$ between $1$ and $x$,

$$g(x) = g(1) + g'(1)(x - 1) + \frac12 g''(\xi)(x - 1)^2 = -\frac{2\phi(\xi)}{3\xi}(x - 1)^2 \le 0.$$

Remark. Mark S. Pinsker was invited to be the Shannon Lecturer at the 1979 IEEE International Symposium on Information Theory, but could not obtain permission at that time to travel to the symposium. However, he was officially recognized by the IEEE Information Theory Society as the 1979 Shannon Award recipient.

1.2. Le Cam's inequalities

Le Cam's inequalities were presented in [3] for the Hellinger distance defined by

$$\eta(X, Y) = \frac{1}{\sqrt2}\left(\int\Big(\sqrt{p_X(u)} - \sqrt{p_Y(u)}\Big)^2 du\right)^{1/2}$$

as follows:

$$\eta(X, Y)^2 \le \tau(X, Y) \le \eta(X, Y)\big(2 - \eta(X, Y)^2\big)^{1/2}. \qquad (3)$$

For one-dimensional Gaussian distributions we get

$$\eta(X, Y)^2 = 1 - \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2}}\;e^{-\frac{\delta^2}{4(\sigma_1^2 + \sigma_2^2)}}.$$
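The sandwich (3) is easy to verify numerically. The following sketch (Python with arbitrary parameters) uses the closed form for $\eta^2$ above and quadrature for the exact total variation:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 1.5

# Squared Hellinger distance for one-dimensional Gaussians (closed form above)
h2 = 1 - np.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) * \
     np.exp(-(mu1 - mu2)**2 / (4 * (s1**2 + s2**2)))

# Exact TV by quadrature
tv = quad(lambda x: 0.5 * abs(norm.pdf(x, mu1, s1) - norm.pdf(x, mu2, s2)),
          -20, 20, limit=300)[0]

eta = np.sqrt(h2)
print(h2 <= tv <= eta * np.sqrt(2 - h2))  # Le Cam's inequalities (3)
```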

Let us present the proof of Le Cam's inequalities (3). From $\tau(X, Y) = \frac12\int|p - q| = 1 - \int\min[p, q]$ and $\min[p, q] \le \sqrt{pq}$, it follows that $\tau(X, Y) \ge 1 - \int\sqrt{pq} = \eta^2(X, Y)$. Next, $\int\min[p, q] + \int\max[p, q] = 2$. Therefore, by the Cauchy - Schwarz inequality we get

$$\left(\int\sqrt{pq}\right)^2 = \left(\int\sqrt{\min[p, q]\max[p, q]}\right)^2 \le \int\min[p, q]\int\max[p, q] = \int\min[p, q]\left(2 - \int\min[p, q]\right).$$

Hence, it follows from

$$\big(1 - \eta(X, Y)^2\big)^2 \le \big(1 - \tau(X, Y)\big)\big(1 + \tau(X, Y)\big)$$

that

$$\tau(X, Y) \le \eta(X, Y)\big(2 - \eta(X, Y)^2\big)^{1/2}.$$

2. Poisson and binomial distributions

2.1. Two Poisson distributions

Let $X_i$ be Poisson distributed random variables, i.e. $X_i \sim \mathrm{Po}(\lambda_i)$, where $0 < \lambda_1 < \lambda_2$. Then the distance between two Poisson distributions is

$$\tau(X_1, X_2) = \int_{\lambda_1}^{\lambda_2} P\big(N(u) = I - 1\big)\,du \le \min\left[\lambda_2 - \lambda_1,\ \sqrt{2/e}\,\big(\sqrt{\lambda_2} - \sqrt{\lambda_1}\big)\right],$$

where $N(u) \sim \mathrm{Po}(u)$. Here $\lceil\lambda_1\rceil \le I \le \lceil\lambda_2\rceil$, and

$$I = I(\lambda_1, \lambda_2) = \big\lceil(\lambda_2 - \lambda_1)\big(\ln(\lambda_2/\lambda_1)\big)^{-1}\big\rceil.$$
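Both the integral identity and the upper bound can be checked directly (a Python sketch; the $\lambda$ values are arbitrary). Note that scipy's poisson.pmf accepts a non-integer rate, so the integrand $P(N(u) = I - 1)$ is available as a function of $u$:

```python
import numpy as np
from scipy.stats import poisson
from scipy.integrate import quad

lam1, lam2 = 3.0, 5.0
ks = np.arange(0, 200)
tv_direct = 0.5 * np.abs(poisson.pmf(ks, lam1) - poisson.pmf(ks, lam2)).sum()

I = int(np.ceil((lam2 - lam1) / np.log(lam2 / lam1)))        # crossing index
tv_integral = quad(lambda u: poisson.pmf(I - 1, u), lam1, lam2)[0]

bound = min(lam2 - lam1, np.sqrt(2 / np.e) * (np.sqrt(lam2) - np.sqrt(lam1)))
print(tv_direct, tv_integral, tv_direct <= bound)
```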

2.2. Distances between binomial distributions

Let $X_i$ be drawn from binomial distributions, i.e. $X_i \sim \mathrm{Bin}(n, p_i)$, $0 < p_1 < p_2 < 1$. Then the distance between two binomial distributions is equal to

$$\tau(X_1, X_2) = n\int_{p_1}^{p_2} P\big(S_{n-1}(u) = I - 1\big)\,du \le \frac{\sqrt{e}\ \psi(p_2 - p_1)}{2\big(1 - \psi(p_2 - p_1)\big)^2},$$

where $S_{n-1}(u) \sim \mathrm{Bin}(n - 1, u)$ and $\psi(x) = x\sqrt{\dfrac{n + 1}{2\,p_2(1 - p_2)}}$ (the upper bound is meaningful when $\psi(p_2 - p_1) < 1$). Finally, define

$$I = \left\lceil \frac{n\ln\big(\frac{1 - p_1}{1 - p_2}\big)}{\ln\big(\frac{p_2}{p_1}\big) + \ln\big(\frac{1 - p_1}{1 - p_2}\big)} \right\rceil$$

with $\lceil np_1\rceil \le I \le \lceil np_2\rceil$.
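The integral identity can be verified in the same way (a Python sketch with arbitrary $n$, $p_1$, $p_2$); the crossing index $I$ is computed from the formula above:

```python
import numpy as np
from scipy.stats import binom
from scipy.integrate import quad

n, p1, p2 = 30, 0.3, 0.4
ks = np.arange(0, n + 1)
tv_direct = 0.5 * np.abs(binom.pmf(ks, n, p1) - binom.pmf(ks, n, p2)).sum()

# Crossing index of the two pmfs
I = int(np.ceil(n * np.log((1 - p1) / (1 - p2)) /
                (np.log(p2 / p1) + np.log((1 - p1) / (1 - p2)))))

tv_integral = n * quad(lambda u: binom.pmf(I - 1, n - 1, u), p1, p2)[0]
print(tv_direct, tv_integral)
```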

2.3. Distance between binomial and Poisson distributions

Let $X \sim \mathrm{Bin}(n, p)$ and $Y \sim \mathrm{Po}(np)$, $0 < np < 2 - \sqrt2$. Then

$$\tau(X, Y) = np\big[(1 - p)^{n-1} - e^{-np}\big].$$

For the sum of Bernoulli random variables $S_n = \sum_{j=1}^{n} X_j$ with $P(X_j = 1) = p_j$ we have

$$\tau(S_n, Y_n) = \frac12\sum_{k=0}^{\infty}\Big|P(S_n = k) - \frac{\lambda_n^k}{k!}e^{-\lambda_n}\Big| \le \sum_{i=1}^{n} p_i^2,$$

where $Y_n \sim \mathrm{Po}(\lambda_n)$, $\lambda_n = p_1 + p_2 + \ldots + p_n$ [4]. A stronger result: for $X_i \sim \mathrm{Bernoulli}(p_i)$ and $Y_i \sim \mathrm{Po}(\lambda_i = p_i)$ there exists a coupling such that

$$\tau(X_i, Y_i) \le P(X_i \neq Y_i) = p_i\big(1 - e^{-p_i}\big).$$
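Le Cam's inequality is easy to test by exact convolution of the Bernoulli laws (a Python sketch; the vector of $p_i$ is an arbitrary choice). The Poisson mass beyond the support of $S_n$ is added explicitly:

```python
import numpy as np
from scipy.stats import poisson

p = np.array([0.1, 0.05, 0.2, 0.15, 0.08])

# Exact pmf of S_n = sum of independent Bernoulli(p_i), by sequential convolution
pmf = np.array([1.0])
for pi in p:
    pmf = np.convolve(pmf, [1 - pi, pi])

lam = p.sum()
ks = np.arange(len(pmf))
# TV: include the Poisson mass above the maximal value of S_n
tv = 0.5 * (np.abs(pmf - poisson.pmf(ks, lam)).sum() + poisson.sf(ks[-1], lam))
print(tv, (p**2).sum(), tv <= (p**2).sum())
```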

2.4. Distance between negative binomial distributions

Let $X_i$ be drawn from negative binomial distributions, i.e. $X_i \sim \mathrm{NegBin}(m, p_i)$, $0 < p_1 < p_2 < 1$. Then

$$\tau(X_1, X_2) = (m + I - 1)\int_{p_1}^{p_2} P\big(S_{m+I-2}(u) = m - 1\big)\,du,$$

where $S_n(u) \sim \mathrm{Bin}(n, u)$ and

$$I = \left\lceil \frac{m\ln(p_2/p_1)}{\ln\big(\frac{1 - p_1}{1 - p_2}\big)} \right\rceil$$

with $\Big\lceil m\dfrac{1 - p_2}{p_2}\Big\rceil \le I \le \Big\lceil m\dfrac{1 - p_1}{p_1}\Big\rceil$.
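A numerical check (Python sketch, arbitrary $m$, $p_1$, $p_2$); scipy's nbinom.pmf(k, m, p) is the probability of $k$ failures before the $m$-th success, matching the parametrization used here:

```python
import numpy as np
from scipy.stats import nbinom, binom
from scipy.integrate import quad

m, p1, p2 = 4, 0.4, 0.5
ks = np.arange(0, 500)
tv_direct = 0.5 * np.abs(nbinom.pmf(ks, m, p1) - nbinom.pmf(ks, m, p2)).sum()

I = int(np.ceil(m * np.log(p2 / p1) / np.log((1 - p1) / (1 - p2))))
tv_integral = (m + I - 1) * quad(lambda u: binom.pmf(m - 1, m + I - 2, u),
                                 p1, p2)[0]
print(tv_direct, tv_integral)
```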

3. Multidimensional Gaussian distributions

In the case of multidimensional Gaussian distributions the distance is

$$\tau = TV\big(N(\mu_1, \Sigma_1), N(\mu_2, \Sigma_2)\big),$$

where $\Sigma_1, \Sigma_2$ are positive-definite.

Let $\delta = \mu_1 - \mu_2$ and let $\Pi$ be a $d \times (d - 1)$ matrix whose columns form a basis for the subspace orthogonal to $\delta$. Let $\lambda_1, \ldots, \lambda_{d-1}$ denote the eigenvalues of the matrix $(\Pi^T\Sigma_1\Pi)^{-1}\Pi^T\Sigma_2\Pi - I_{d-1}$ and $\Lambda = \big(\sum_i \lambda_i^2\big)^{1/2}$. If $\mu_1 \neq \mu_2$ then

$$\frac{1}{200}\min[1, \varphi(\delta, \Sigma_1, \Sigma_2)] \le \tau \le \frac{9}{2}\min[1, \varphi(\delta, \Sigma_1, \Sigma_2)], \qquad (4)$$

where

$$\varphi(\delta, \Sigma_1, \Sigma_2) = \max\left[\frac{|\delta^T(\Sigma_1 - \Sigma_2)\delta|}{\delta^T\Sigma_1\delta},\ \frac{\delta^T\delta}{\sqrt{\delta^T\Sigma_1\delta}},\ \Lambda\right].$$

In the case of equal means $\mu_1 = \mu_2$ the bound (4) is simplified as follows:

$$\frac{1}{100}\min[1, \Lambda] \le \tau \le \frac{3}{2}\min[1, \Lambda].$$

Here $\Lambda = \big(\sum_{i=1}^{d}\lambda_i^2\big)^{1/2}$, where $\lambda_1, \ldots, \lambda_d$ are the eigenvalues of $\Sigma_1^{-1}\Sigma_2 - I_d$ for positive-definite $\Sigma_1, \Sigma_2$. In the case $\Sigma_1 = \Sigma_2 = \Sigma$ the following equality holds: $\tau = 2\Phi\big(\|\Sigma^{-1/2}\delta\|/2\big) - 1$. Let us present below a sketch of the proof, cf. [5].

Let $X_i \sim N(\mu_i, \Sigma_i)$, $i = 1, 2$. Without loss of generality we can assume that $\Sigma_1, \Sigma_2$ are positive definite, as

$$TV\big(N(0, \Sigma_1), N(0, \Sigma_2)\big) = TV\big(N(0, \Pi^T\Sigma_1\Pi), N(0, \Pi^T\Sigma_2\Pi)\big),$$

where $\Pi$ is a $d \times r$ matrix whose columns form an orthonormal basis for the range of $\Sigma_1 + \Sigma_2$. Denote $u = (\mu_1 + \mu_2)/2$, $\delta = \mu_1 - \mu_2$, and decompose every $w \in \mathbb{R}^d$ as

$$w = u + f_1(w)\,\delta + f_2(w), \qquad f_2(w)^T\delta = 0.$$

Then

$$\max\big[TV(f_1(X_1), f_1(X_2)),\ TV(f_2(X_1), f_2(X_2))\big] \le TV(X_1, X_2) \le TV(f_1(X_1), f_1(X_2)) + TV(f_2(X_1), f_2(X_2)).$$

All the components are Gaussian, and

$$f_1(X_1) \sim N\Big(\tfrac12, \frac{\delta^T\Sigma_1\delta}{\|\delta\|^4}\Big), \quad f_1(X_2) \sim N\Big(-\tfrac12, \frac{\delta^T\Sigma_2\delta}{\|\delta\|^4}\Big), \quad f_2(X_1) \sim N(0, P\Sigma_1 P), \quad f_2(X_2) \sim N(0, P\Sigma_2 P), \quad P = I_d - \frac{\delta\delta^T}{\|\delta\|^2}.$$

We claim that

$$\frac{1}{200}\min\left[1,\ \max\left(\frac{|\delta^T(\Sigma_1 - \Sigma_2)\delta|}{2\,\delta^T\Sigma_1\delta},\ \frac{40\,\delta^T\delta}{\sqrt{\delta^T\Sigma_1\delta}}\right)\right] \le TV\big(f_1(X_1), f_1(X_2)\big) \le \frac{3\,|\delta^T(\Sigma_1 - \Sigma_2)\delta|}{2\,\delta^T\Sigma_1\delta} + \frac{\delta^T\delta}{2\sqrt{\delta^T\Sigma_1\delta}}.$$

Then

$$\frac{1}{100}\min[1, \Lambda] \le TV\big(f_2(X_1), f_2(X_2)\big) \le \frac{3}{2}\,\Lambda,$$

where $\Lambda = \big(\sum_{i=1}^{d-1}\lambda_i^2\big)^{1/2}$ and $\lambda_i$ are the eigenvalues of $(\Pi^T\Sigma_1\Pi)^{-1}\Pi^T\Sigma_2\Pi - I_{d-1}$.

Here we present only the proof of the upper bound. Let $d = 1$ and $\sigma_2 \le \sigma_1$. Then for $x = \sigma_1^2/\sigma_2^2$ we have $x - 1 - \ln x \le (x - 1)^2$ and, by Pinsker's inequality,

$$TV\big(N(\mu_1, \sigma_1^2), N(\mu_2, \sigma_2^2)\big) \le \frac12\left(\frac{\sigma_1^2}{\sigma_2^2} - 1 - \ln\frac{\sigma_1^2}{\sigma_2^2} + \frac{\Delta^2}{\sigma_2^2}\right)^{1/2} \le \frac12\left(\Big(\frac{\sigma_1^2}{\sigma_2^2} - 1\Big)^2 + \frac{\Delta^2}{\sigma_2^2}\right)^{1/2} \le \frac{|\sigma_1^2 - \sigma_2^2|}{2\sigma_2^2} + \frac{\Delta}{2\sigma_2}.$$

For $d > 1$, by Pinsker's inequality, one gets the upper bound in the case $\mu_1 = \mu_2 = 0$: if $\lambda_i > -\frac23$,

$$4\,TV\big(N(0, \Sigma_1), N(0, \Sigma_2)\big)^2 \le \sum_{i=1}^{d}\big(\lambda_i - \ln(1 + \lambda_i)\big) \le \sum_{i=1}^{d}\lambda_i^2 = \Lambda^2.$$
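For the equal-means bound, a Monte Carlo sanity check is possible, since $TV(P, Q) = E_P\big[(1 - q(X)/p(X))_+\big]$. The following Python sketch (with arbitrary diagonal covariances and sample size) estimates $\tau$ and compares it with $\frac{1}{100}\min[1, \Lambda]$ and $\frac{3}{2}\min[1, \Lambda]$:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(0)
d = 3
S1 = np.diag([1.0, 1.0, 1.0])
S2 = np.diag([1.2, 0.9, 1.1])

lam = np.linalg.eigvals(np.linalg.inv(S1) @ S2 - np.eye(d)).real
Lam = np.sqrt((lam ** 2).sum())

# Monte Carlo: TV = E_P[(1 - q(X)/p(X))_+] for X ~ P = N(0, S1)
X = rng.multivariate_normal(np.zeros(d), S1, size=200_000)
ratio = mvn.pdf(X, np.zeros(d), S2) / mvn.pdf(X, np.zeros(d), S1)
tv_mc = np.mean(np.clip(1 - ratio, 0.0, None))

print(Lam, tv_mc, 0.01 * min(1, Lam) <= tv_mc <= 1.5 * min(1, Lam))
```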

4. Kolmogorov - Smirnov distance

The Kolmogorov - Smirnov distance (defined only for probability measures on $\mathbb{R}$) is

$$\mathrm{Kolm}(P, Q) := \sup_{x}\big|P((-\infty, x)) - Q((-\infty, x))\big|.$$

We have

$$\mathrm{Kolm}(P, Q) \le TV(P, Q).$$

Suppose $X \sim P$, $Y \sim Q$ are two random variables, and $Y$ has a density with respect to the Lebesgue measure bounded by a constant $C$. Then

$$\mathrm{Kolm}(P, Q) \le 2\sqrt{C\,\mathrm{Wass}_1(P, Q)},$$

where $\mathrm{Wass}_1(P, Q) = \inf\big[E|X - Y| : X \sim P,\ Y \sim Q\big]$.
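For two one-dimensional Gaussians all quantities involved are computable (a Python sketch with arbitrary parameters): the Kolmogorov distance as the sup of the CDF difference on a grid, $\mathrm{Wass}_1$ as the integral of $|F_P - F_Q|$, and $C$ as the sup of the bounded density:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu1, mu2, s = 0.0, 0.3, 1.0

xs = np.linspace(-10, 10, 4001)
kolm = np.max(np.abs(norm.cdf(xs, mu1, s) - norm.cdf(xs, mu2, s)))

# In one dimension Wass_1(P, Q) = int |F_P(x) - F_Q(x)| dx
w1 = quad(lambda x: abs(norm.cdf(x, mu1, s) - norm.cdf(x, mu2, s)), -10, 10)[0]

C = 1 / (s * np.sqrt(2 * np.pi))  # sup of the N(mu2, s^2) density
print(kolm, 2 * np.sqrt(C * w1), kolm <= 2 * np.sqrt(C * w1))
```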

Let $N(t) \sim \mathrm{Po}(t)$. Then, via integration by parts,

$$P\big(N(t) \le n\big) = \sum_{k=0}^{n} e^{-t}\frac{t^k}{k!} = \int_t^{\infty} e^{-u}\frac{u^n}{n!}\,du = \int_t^{\infty} P\big(N(u) = n\big)\,du.$$

Hence,

$$\mathrm{Kolm}(X_1, X_2) = \tau(X_1, X_2) = P(X_2 \ge I) - P(X_1 \ge I) = P(X_1 \le I - 1) - P(X_2 \le I - 1) = \int_{\lambda_1}^{\lambda_2} P\big(N(u) = I - 1\big)\,du,$$

where $I = \min[k \in \mathbb{Z}_+ : f(k) > 1]$ and $f(k) = \dfrac{P(X_2 = k)}{P(X_1 = k)} = \Big(\dfrac{\lambda_2}{\lambda_1}\Big)^k e^{-(\lambda_2 - \lambda_1)}$.

Conclusion

This short review discusses only the most popular and well-known inequalities. Other interesting cases, e.g. the total variation distance between a binomial distribution and a Gaussian distribution with matching parameters, deserve special attention. Also, applications of these bounds to various problems of mathematical statistics, including classification theory and machine learning algorithms, form a rich field in a state of active development.

References

1. Suhov Yu., Kelbert M. Probability and Statistics by Example. Vol. I. Basic Probability and Statistics. 2nd ed. Cambridge, UK, Cambridge University Press, 2014. 470 p. https://doi.org/10.1017/CBO9781139087773

2. Pinsker M. Information and Information Stability of Random Variables and Processes. San Francisco, USA, Holden-Day Inc., 1964. 243 p.

3. Le Cam L. Asymptotic Methods in Statistical Decision Theory. Springer Series in Statistics. New York, NY, Springer, 1986. 742 p. https://doi.org/10.1007/978-1-4612-4946-7

4. Le Cam L. An approximation theorem for the Poisson binomial distribution. Pacific Journal of Mathematics, 1960, vol. 10, no. 4, pp. 1181-1197. https://doi.org/10.2140/pjm.1960.10.1181

5. Devroye L., Mehrabian A., Reddad T. The total variation distance between high-dimensional Gaussians. arXiv:1810.08693v5, 2020, pp. 1-12.

Received 25.11.2021 / Accepted 27.12.2021 / Published 31.05.2022
