Известия Саратовского университета. Новая серия. Серия: Математика. Механика. Информатика. 2022. Т. 22, вып. 2. С. 233-240
Izvestiya of Saratov University. Mathematics. Mechanics. Informatics, 2022, vol. 22, iss. 2, pp. 233-240
https://mmi.sgu.ru https://doi.org/10.18500/1816-9791-2022-22-2-233-240
Article
What scientific folklore knows about the distances between the most popular distributions
M. Y. Kelbert1, Yu. M. Suhov2,3,4
1Higher School of Economics - National Research University, 20 Myasnitskaya St., Moscow 101000, Russia
2University of Cambridge, The Old Schools, Trinity Ln, Cambridge CB2 1TN, UK 3University of Pennsylvania, 201 Old Main, State College, PA 16802, USA
4Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevich Institute), 19 build. 1 Bolshoy Karetny per., Moscow 127051, Russia
Mark Y. Kelbert, [email protected], https://orcid.org/0000-0002-3952-2012, AuthorID: 1137288 Yurii M. Suhov, [email protected], AuthorID: 1131362
Abstract. We present a number of upper and lower bounds for the total variation distances between the most popular probability distributions. In particular, some estimates of the total variation distances between one-dimensional Gaussian distributions, between two Poisson distributions, between two binomial distributions, between a binomial and a Poisson distribution, and also between two negative binomial distributions are given. The Kolmogorov - Smirnov distance is also discussed.
Keywords: probability distribution, variation distance, Pinsker's inequality, Le Cam's inequalities, distances between distributions
For citation: Kelbert M. Y., Suhov Yu. M. What scientific folklore knows about the distances between the most popular distributions. Izvestiya of Saratov University. Mathematics. Mechanics. Informatics, 2022, vol. 22, iss. 2, pp. 233-240. https://doi.org/10.18500/1816-9791-2022-22-2-233-240
This is an open access article distributed under the terms of Creative Commons Attribution 4.0 International License (CC-BY 4.0)
Introduction
A tale that becomes folklore is one that is passed down and whispered around. The second half of the word, lore, comes from Old English lar, i.e. 'instruction'. Different bounds for the distances between the most popular probability distributions (see [1]) appear in many problems of applied probability. Unfortunately, the available textbooks and reference books do not present them in a systematic way. In this short note, we make an attempt to fill this gap.
Let us recall that for probability measures $P$, $Q$ with densities $p$, $q$ with respect to a common reference measure $\mu$, the total variation distance is
$$TV(P,Q) = \frac12\int |p - q|\,d\mu = \sup_{A}\big|P(A) - Q(A)\big|.$$
Let us also recall the coupling characterization of the total variation distance. For two distributions $P$ and $Q$, a pair $(X,Y)$ of random variables defined on the same probability space is called a coupling for $P$ and $Q$ if $X \sim P$ and $Y \sim Q$.
One of the useful facts is that there exists a coupling $(X,Y)$ such that $P(X \ne Y) = TV(P,Q)$. Therefore, for this coupling and any function $f$ we have $P(f(X) \ne f(Y)) \le TV(P,Q)$, with equality when $f$ is invertible.
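As a minimal numerical illustration (a Python/NumPy sketch; the two distributions are arbitrary toy examples), one can check that the diagonal mass of the maximal coupling reproduces the total variation distance:

```python
import numpy as np

# Two toy distributions on {0, 1, 2, 3}.
p = np.array([0.4, 0.3, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.3, 0.4])

# Total variation distance: TV(P, Q) = (1/2) * sum_k |p_k - q_k|.
tv = 0.5 * np.abs(p - q).sum()

# The maximal coupling places mass min(p_k, q_k) on the diagonal,
# so that P(X != Y) = 1 - sum_k min(p_k, q_k) = TV(P, Q).
prob_not_equal = 1.0 - np.minimum(p, q).sum()
print(tv, prob_not_equal)  # both equal 0.4
```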
1. Gaussian distributions
The total variation distance between one-dimensional Gaussian distributions
$$\tau = \tau(X_1,X_2) = TV\big(N(\mu_1,\sigma_1^2),\, N(\mu_2,\sigma_2^2)\big)$$
depends on the parameters $\Delta = |\delta|$, where $\delta = \mu_1 - \mu_2$, and $\sigma_1^2$, $\sigma_2^2$:
$$\frac{1}{200}\,\min\!\left[1,\ \max\!\left[\frac{|\sigma_1^2-\sigma_2^2|}{\min[\sigma_1^2,\sigma_2^2]},\ \frac{40\,\Delta}{\min[\sigma_1,\sigma_2]}\right]\right] \;\le\; \tau \;\le\; \frac{3\,|\sigma_1^2-\sigma_2^2|}{2\max[\sigma_1^2,\sigma_2^2]} + \frac{\Delta}{2\max[\sigma_1,\sigma_2]}.$$
In the case $\sigma_1^2 = \sigma_2^2 = \sigma^2$ the following identity holds: $\tau = 2\Phi\big(\Delta/(2\sigma)\big) - 1$, where $\Phi$ is the standard normal distribution function.
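This identity is easy to check numerically; the following sketch (SciPy-based, with arbitrary illustrative parameters) computes the left-hand side by quadrature:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu1, mu2, sigma = 0.0, 1.3, 0.8   # illustrative parameters, equal variances
delta = mu1 - mu2

# TV as (1/2) * integral of |p1(x) - p2(x)| dx.
tv_numeric, _ = quad(lambda x: 0.5 * abs(norm.pdf(x, mu1, sigma) - norm.pdf(x, mu2, sigma)),
                     -np.inf, np.inf)
tv_closed = 2 * norm.cdf(abs(delta) / (2 * sigma)) - 1
print(tv_numeric, tv_closed)      # agree up to quadrature accuracy
```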
1.1. Pinsker's inequality
In the general case, the upper bound is a version of Pinsker's inequality [2] for $\tau(X_1,X_2) = TV(X_1,X_2)$:
$$\tau(X_1,X_2) \le \min\Big\{1,\ \sqrt{KL(P_{X_1}\|P_{X_2})/2}\Big\}, \qquad (1)$$
where
$$KL(P_{X_1}\|P_{X_2}) = \frac12\left(\frac{\sigma_1^2}{\sigma_2^2} - 1 + \frac{\delta^2}{\sigma_2^2} - \ln\frac{\sigma_1^2}{\sigma_2^2}\right).$$
In the multidimensional Gaussian case
$$KL(P_{X_1}\|P_{X_2}) = \frac12\left(\mathrm{tr}\big(\Sigma_2^{-1}\Sigma_1 - I\big) + \delta^T\Sigma_2^{-1}\delta - \ln\det\big(\Sigma_1\Sigma_2^{-1}\big)\right).$$
Let us prove Pinsker's inequality (1). We need the following bound:
$$|x - 1| \le \sqrt{\Big(\tfrac43 + \tfrac{2x}{3}\Big)\varphi(x)}, \qquad \varphi(x) = x\ln x - x + 1. \qquad (2)$$
If $P$ is not absolutely continuous with respect to $Q$, then $KL(P\|Q) = \infty$ and Pinsker's inequality holds trivially. Assume now that $P \ll Q$. In view of (2) and the Cauchy - Schwarz inequality,
$$\tau(X,Y) = \frac12\int|p - q| = \frac12\int q\,\Big|\frac{p}{q} - 1\Big|\,\mathbf{1}_{\{q>0\}} \le \frac12\left(\int q\Big(\frac43 + \frac{2p}{3q}\Big)\mathbf{1}_{\{q>0\}}\right)^{1/2}\left(\int q\,\varphi\Big(\frac{p}{q}\Big)\mathbf{1}_{\{q>0\}}\right)^{1/2} =$$
$$= \frac12\left(2\int p\,\ln\frac{p}{q}\,\mathbf{1}_{\{q>0\}}\right)^{1/2} = \big(KL(P\|Q)/2\big)^{1/2}.$$
To check (2), define $g(x) = (x-1)^2 - \big(\tfrac43 + \tfrac{2x}{3}\big)\varphi(x)$. Then $g(1) = g'(1) = 0$ and $g''(x) = -\dfrac{4\varphi(x)}{3x} \le 0$. Hence, for some $\xi$ between $1$ and $x$,
$$g(x) = g(1) + g'(1)(x-1) + \frac12 g''(\xi)(x-1)^2 = -\frac{2\varphi(\xi)}{3\xi}(x-1)^2 \le 0.$$
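A short numerical sanity check of (1) (a SciPy sketch; the Gaussian parameters are arbitrary) compares the exact total variation distance with the Pinsker bound:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu1, s1, mu2, s2 = 0.0, 1.0, 0.5, 1.5   # illustrative parameters
delta = mu1 - mu2

tv, _ = quad(lambda x: 0.5 * abs(norm.pdf(x, mu1, s1) - norm.pdf(x, mu2, s2)),
             -np.inf, np.inf)

# KL(P_{X1} || P_{X2}) for one-dimensional Gaussians, as in the text.
kl = 0.5 * (s1**2 / s2**2 - 1 + delta**2 / s2**2 - np.log(s1**2 / s2**2))
pinsker = min(1.0, np.sqrt(kl / 2))
print(tv, pinsker)                       # TV does not exceed the Pinsker bound
```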
Remark. Mark S. Pinsker was invited to be the Shannon Lecturer at the 1979 IEEE International Symposium on Information Theory, but could not obtain permission at that time to travel to the symposium. However, he was officially recognized by the IEEE Information Theory Society as the 1979 Shannon Award recipient.
1.2. Le Cam's inequalities
Le Cam's inequalities were presented in [3] for the Hellinger distance defined by
$$\eta(X,Y) = \frac{1}{\sqrt2}\left(\int\big(\sqrt{p_X(u)} - \sqrt{p_Y(u)}\big)^2\,du\right)^{1/2}$$
as follows:
$$\eta(X,Y)^2 \le \tau(X,Y) \le \eta(X,Y)\big(2 - \eta(X,Y)^2\big)^{1/2}. \qquad (3)$$
For one-dimensional Gaussian distributions we get
$$\eta(X,Y)^2 = 1 - \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}}\,\exp\left(-\frac{\delta^2}{4(\sigma_1^2+\sigma_2^2)}\right).$$
Let us present the proof of Le Cam's inequalities (3).
From $\tau(X,Y) = \frac12\int|p - q| = 1 - \int\min[p,q]$ and $\min[p,q] \le \sqrt{pq}$, it follows that $\tau(X,Y) \ge 1 - \int\sqrt{pq} = \eta(X,Y)^2$. Next, $\int\min[p,q] + \int\max[p,q] = 2$. Therefore, by the Cauchy - Schwarz inequality we get
$$\left(\int\sqrt{pq}\right)^2 = \left(\int\sqrt{\min[p,q]\,\max[p,q]}\right)^2 \le \int\min[p,q]\int\max[p,q] = \int\min[p,q]\left(2 - \int\min[p,q]\right).$$
Hence, it follows from
$$\big(1 - \eta(X,Y)^2\big)^2 \le \big(1 - \tau(X,Y)\big)\big(1 + \tau(X,Y)\big)$$
that
$$\tau(X,Y) \le \eta(X,Y)\big(2 - \eta(X,Y)^2\big)^{1/2}.$$
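The following sketch (SciPy, with arbitrary illustrative parameters) evaluates the Hellinger distance by its Gaussian closed form and checks both sides of (3):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0   # illustrative parameters
delta = mu1 - mu2

# Squared Hellinger distance for one-dimensional Gaussians (closed form above).
eta2 = 1 - np.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) * np.exp(-delta**2 / (4 * (s1**2 + s2**2)))
eta = np.sqrt(eta2)

tau, _ = quad(lambda x: 0.5 * abs(norm.pdf(x, mu1, s1) - norm.pdf(x, mu2, s2)),
              -np.inf, np.inf)
print(eta2, tau, eta * np.sqrt(2 - eta2))  # eta^2 <= tau <= eta * sqrt(2 - eta^2)
```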
2. Poisson and binomial distributions
2.1. Two Poisson distributions
Let $X_i$, $i = 1, 2$, be Poisson random variables, i.e. $X_i \sim \mathrm{Po}(\lambda_i)$, where $0 < \lambda_1 < \lambda_2$. Then the distance between the two Poisson distributions is
$$\tau(X_1,X_2) = \int_{\lambda_1}^{\lambda_2} P\big(N(u) = l - 1\big)\,du \le \min\left[\lambda_2 - \lambda_1,\ \sqrt{\tfrac{2}{e}}\big(\sqrt{\lambda_2} - \sqrt{\lambda_1}\big)\right],$$
where $N(u) \sim \mathrm{Po}(u)$. Here $\lceil\lambda_1\rceil \le l \le \lceil\lambda_2\rceil$, and
$$l = l(\lambda_1,\lambda_2) = \Big\lceil(\lambda_2 - \lambda_1)\big(\ln(\lambda_2/\lambda_1)\big)^{-1}\Big\rceil.$$
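A numerical sketch (SciPy; the values of the $\lambda$'s are arbitrary) comparing the direct computation of the total variation distance with the integral representation and the upper bound above:

```python
import numpy as np
from scipy.stats import poisson
from scipy.integrate import quad

lam1, lam2 = 3.0, 4.5                       # illustrative parameters
ks = np.arange(0, 200)                      # truncation; the neglected tail mass is negligible
tv = 0.5 * np.abs(poisson.pmf(ks, lam1) - poisson.pmf(ks, lam2)).sum()

l = int(np.ceil((lam2 - lam1) / np.log(lam2 / lam1)))
tv_integral, _ = quad(lambda u: poisson.pmf(l - 1, u), lam1, lam2)

bound = min(lam2 - lam1, np.sqrt(2 / np.e) * (np.sqrt(lam2) - np.sqrt(lam1)))
print(tv, tv_integral, bound)               # tv = tv_integral <= bound
```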
2.2. Distances between binomial distributions
Let $X_i$ be drawn from binomial distributions, i.e. $X_i \sim \mathrm{Bin}(n,p_i)$, $0 < p_1 < p_2 < 1$. Then the distance between the two binomial distributions is equal to
$$\tau(X_1,X_2) = n\int_{p_1}^{p_2} P\big(S_{n-1}(u) = l - 1\big)\,du \le \frac{\sqrt{e}\,\psi(p_2 - p_1)}{2\big(1 - \psi(p_2 - p_1)\big)^2},$$
where $S_{n-1}(u) \sim \mathrm{Bin}(n-1,u)$ and $\psi(x) = x\sqrt{\dfrac{n+2}{2\,p_2(1 - p_2)}}$. Finally, define
$$l = \left\lceil\frac{-\,n\ln\!\Big(1 - \dfrac{p_2 - p_1}{1 - p_1}\Big)}{\ln\!\Big(1 + \dfrac{p_2 - p_1}{p_1}\Big) - \ln\!\Big(1 - \dfrac{p_2 - p_1}{1 - p_1}\Big)}\right\rceil$$
with $\lceil np_1\rceil \le l \le \lceil np_2\rceil$.
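The exact integral representation can be verified directly (a SciPy sketch with arbitrary $n$, $p_1$, $p_2$; $l$ is computed by the formula above):

```python
import numpy as np
from scipy.stats import binom
from scipy.integrate import quad

n, p1, p2 = 30, 0.3, 0.4                    # illustrative parameters
ks = np.arange(0, n + 1)
tv = 0.5 * np.abs(binom.pmf(ks, n, p1) - binom.pmf(ks, n, p2)).sum()

x = p2 - p1
l = int(np.ceil(-n * np.log(1 - x / (1 - p1)) /
                (np.log(1 + x / p1) - np.log(1 - x / (1 - p1)))))
tv_integral, _ = quad(lambda u: n * binom.pmf(l - 1, n - 1, u), p1, p2)
print(l, tv, tv_integral)                   # the direct and integral values of TV coincide
```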
2.3. Distance between binomial and Poisson distributions
Let $X \sim \mathrm{Bin}(n,p)$ and $Y \sim \mathrm{Po}(np)$, $0 < np < 2 - \sqrt2$; then
$$\tau(X,Y) = np\left[(1 - p)^{n-1} - e^{-np}\right].$$
For the sum of independent Bernoulli random variables $S_n = \sum_{j=1}^{n} X_j$ with $P(X_j = 1) = p_j$ we have
$$\tau(S_n, Y_n) = \frac12\sum_{k\ge 0}\left|P(S_n = k) - \frac{\lambda_n^k}{k!}e^{-\lambda_n}\right| \le \sum_{i=1}^{n} p_i^2,$$
where $Y_n \sim \mathrm{Po}(\lambda_n)$, $\lambda_n = p_1 + p_2 + \ldots + p_n$ [4]. A stronger result: for $X_i \sim \mathrm{Bernoulli}(p_i)$ and $Y_i \sim \mathrm{Po}(\lambda_i = p_i)$ there exists a coupling such that
$$\tau(X_i, Y_i) = P(X_i \ne Y_i) = p_i\big(1 - e^{-p_i}\big).$$
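Le Cam's bound is easy to test against the exact Poisson-binomial distribution; the following sketch (NumPy/SciPy, with an arbitrary vector of success probabilities) builds the law of $S_n$ by convolution:

```python
import numpy as np
from scipy.stats import poisson

p = np.array([0.05, 0.1, 0.02, 0.08, 0.12, 0.03])   # illustrative p_i
lam = p.sum()

# Distribution of S_n = sum of independent Bernoulli(p_i), built by repeated convolution.
pmf = np.array([1.0])
for pi in p:
    pmf = np.convolve(pmf, [1 - pi, pi])

ks = np.arange(len(pmf))
# TV must also count the Poisson mass above n, where S_n has none.
tv = 0.5 * np.abs(pmf - poisson.pmf(ks, lam)).sum() + 0.5 * (1 - poisson.cdf(len(pmf) - 1, lam))
print(tv, (p**2).sum())                              # tv <= sum of p_i^2
```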
2.4. Distance between negative binomial distributions
Let $X_i$ be drawn from negative binomial distributions, i.e. $X_i \sim \mathrm{NegBin}(m,p_i)$, $0 < p_1 < p_2 < 1$. Then
$$\tau(X_1,X_2) = (m + l - 1)\int_{p_1}^{p_2} P\big(S_{m+l-2}(u) = m - 1\big)\,du,$$
where $S_n(u) \sim \mathrm{Bin}(n,u)$ and
$$l = \left\lceil\frac{m\,\ln\!\Big(1 + \dfrac{p_2 - p_1}{p_1}\Big)}{-\ln\!\Big(1 - \dfrac{p_2 - p_1}{1 - p_1}\Big)}\right\rceil$$
with $\Big\lceil m\dfrac{1-p_2}{p_2}\Big\rceil \le l \le \Big\lceil m\dfrac{1-p_1}{p_1}\Big\rceil$.
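As before, the integral representation can be checked numerically (a SciPy sketch with arbitrary $m$, $p_1$, $p_2$; scipy's nbinom counts failures before the $m$-th success):

```python
import numpy as np
from scipy.stats import nbinom, binom
from scipy.integrate import quad

m, p1, p2 = 5, 0.4, 0.5                     # illustrative parameters
ks = np.arange(0, 500)                      # truncation; the neglected tail mass is negligible
tv = 0.5 * np.abs(nbinom.pmf(ks, m, p1) - nbinom.pmf(ks, m, p2)).sum()

# l as above; note ln(1 + (p2-p1)/p1) = ln(p2/p1) and -ln(1 - (p2-p1)/(1-p1)) = ln((1-p1)/(1-p2)).
l = int(np.ceil(m * np.log(p2 / p1) / np.log((1 - p1) / (1 - p2))))
tv_integral, _ = quad(lambda u: (m + l - 1) * binom.pmf(m - 1, m + l - 2, u), p1, p2)
print(l, tv, tv_integral)                   # the two values of TV coincide
```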
3. Multidimensional Gaussian distributions
In the case of multidimensional Gaussian distributions the distance is
$$\tau = TV\big(N(\mu_1, \Sigma_1),\, N(\mu_2, \Sigma_2)\big),$$
where $\Sigma_1$, $\Sigma_2$ are positive-definite.
Let $\delta = \mu_1 - \mu_2$ and let $\Pi$ be a $d \times (d-1)$ matrix whose columns form a basis for the subspace orthogonal to $\delta$. Let $\lambda_1, \ldots, \lambda_{d-1}$ denote the eigenvalues of the matrix $(\Pi^T\Sigma_1\Pi)^{-1}\Pi^T\Sigma_2\Pi - I_{d-1}$ and $\Lambda = \big(\sum_{i=1}^{d-1}\lambda_i^2\big)^{1/2}$. If $\mu_1 \ne \mu_2$ then
$$\frac{1}{200}\,\min\big[1,\ \varphi(\delta,\Sigma_1,\Sigma_2)\big] \;\le\; \tau \;\le\; \frac{9}{2}\,\min\big[1,\ \varphi(\delta,\Sigma_1,\Sigma_2)\big], \qquad (4)$$
where
$$\varphi(\delta,\Sigma_1,\Sigma_2) = \max\left[\frac{|\delta^T(\Sigma_1 - \Sigma_2)\delta|}{\delta^T\Sigma_1\delta},\ \Lambda,\ \frac{\langle\delta,\delta\rangle}{\sqrt{\delta^T\Sigma_1\delta}}\right].$$
In the case of equal means $\mu_1 = \mu_2$ the bound (4) is simplified as follows:
$$\frac{1}{100}\,\min[1, \Lambda] \;\le\; \tau \;\le\; \frac{3}{2}\,\min[1, \Lambda].$$
Here $\Lambda = \big(\sum_{i=1}^{d}\lambda_i^2\big)^{1/2}$, where $\lambda_1, \ldots, \lambda_d$ are the eigenvalues of $\Sigma_1^{-1}\Sigma_2 - I_d$ for positive-definite $\Sigma_1$, $\Sigma_2$. In the case $\Sigma_1 = \Sigma_2 = \Sigma$ the following equality holds: $\tau = 2\Phi\big(\|\Sigma^{-1/2}\delta\|/2\big) - 1$. Let us present below a sketch of the proof, cf. [5].
Let $X_i \sim N(\mu_i, \Sigma_i)$, $i = 1, 2$. Without loss of generality we can assume that $\Sigma_1$, $\Sigma_2$ are positive-definite, since
$$TV\big(N(0,\Sigma_1), N(0,\Sigma_2)\big) = TV\big(N(0, \Pi^T\Sigma_1\Pi),\, N(0, \Pi^T\Sigma_2\Pi)\big),$$
where $\Pi$ is a $d \times r$ matrix whose columns form an orthonormal basis for $\mathrm{range}(\Sigma_{1,2})$. Denote $u = (\mu_1 + \mu_2)/2$, $\delta = \mu_1 - \mu_2$ and decompose every $w \in \mathbb{R}^d$ as
$$w = u + f_1(w)\,\delta + f_2(w), \qquad f_2(w)^T\delta = 0.$$
Then
$$\max\big[TV(f_1(X_1), f_1(X_2)),\ TV(f_2(X_1), f_2(X_2))\big] \le TV(X_1,X_2) \le TV(f_1(X_1), f_1(X_2)) + TV(f_2(X_1), f_2(X_2)).$$
All the components are Gaussian: $f_1(X_1) \sim N\Big(\tfrac12, \dfrac{\delta^T\Sigma_1\delta}{\langle\delta,\delta\rangle^2}\Big)$, $f_1(X_2) \sim N\Big(-\tfrac12, \dfrac{\delta^T\Sigma_2\delta}{\langle\delta,\delta\rangle^2}\Big)$, $f_2(X_1) \sim N(0, P\Sigma_1 P)$, $f_2(X_2) \sim N(0, P\Sigma_2 P)$, where $P = I_d - \dfrac{\delta\delta^T}{\langle\delta,\delta\rangle}$. We claim that
$$\frac{1}{200}\,\min\!\left[1,\ \max\!\left[\frac{|\delta^T(\Sigma_1 - \Sigma_2)\delta|}{2\,\delta^T\Sigma_1\delta},\ \frac{40\,\langle\delta,\delta\rangle}{\sqrt{\delta^T\Sigma_1\delta}}\right]\right] \le TV\big(f_1(X_1), f_1(X_2)\big) \le \frac{3\,|\delta^T(\Sigma_1 - \Sigma_2)\delta|}{2\,\delta^T\Sigma_1\delta} + \frac{\langle\delta,\delta\rangle}{2\sqrt{\delta^T\Sigma_1\delta}}.$$
Then
$$\frac{1}{100}\,\min[1, \Lambda] \le TV\big(f_2(X_1), f_2(X_2)\big) \le \frac{3}{2}\,\Lambda,$$
where $\Lambda = \big(\sum_{i}\lambda_i^2\big)^{1/2}$ and $\lambda_i$ are the eigenvalues of $(\Pi^T\Sigma_1\Pi)^{-1}\Pi^T\Sigma_2\Pi - I_{d-1}$, as defined above.
Here we present only the proof of the upper bound. Let $d = 1$ and $\sigma_2 \le \sigma_1$. Then for $x = \sigma_1^2/\sigma_2^2$ we have $x - 1 - \ln x \le (x-1)^2$ and, by Pinsker's inequality,
$$TV\big(N(\mu_1,\sigma_1^2), N(\mu_2,\sigma_2^2)\big) \le \frac12\left(\frac{\sigma_1^2}{\sigma_2^2} - 1 - \ln\frac{\sigma_1^2}{\sigma_2^2} + \frac{\Delta^2}{\sigma_2^2}\right)^{1/2} \le \frac12\left(\Big(\frac{\sigma_1^2}{\sigma_2^2} - 1\Big)^2 + \frac{\Delta^2}{\sigma_2^2}\right)^{1/2} \le \frac{|\sigma_1^2 - \sigma_2^2|}{2\sigma_2^2} + \frac{\Delta}{2\sigma_2}.$$
For $d > 1$, by Pinsker's inequality, one gets the upper bound in the case $\mu_1 = \mu_2 = 0$: if $\lambda_i > -\tfrac23$, then
$$4\,TV\big(N(0,\Sigma_1), N(0,\Sigma_2)\big)^2 \le \sum_{i=1}^{d}\big(\lambda_i - \ln(1 + \lambda_i)\big) \le \sum_{i=1}^{d}\lambda_i^2 = \Lambda^2.$$
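For the equal-covariance identity stated above, a simple Monte Carlo sketch (SciPy, arbitrary illustrative parameters) uses the representation $TV(P,Q) = \mathbb{E}_P\big[(1 - q(X)/p(X))_+\big]$:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

mu1, mu2 = np.array([0.0, 0.0]), np.array([1.0, 0.5])   # illustrative parameters
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])

P = multivariate_normal(mu1, Sigma)
Q = multivariate_normal(mu2, Sigma)
X = P.rvs(size=200_000, random_state=0)
tv_mc = np.mean(np.maximum(0.0, 1.0 - Q.pdf(X) / P.pdf(X)))

delta = mu1 - mu2
d = np.sqrt(delta @ np.linalg.solve(Sigma, delta))       # ||Sigma^{-1/2} delta||
print(tv_mc, 2 * norm.cdf(d / 2) - 1)                    # agree up to Monte Carlo error
```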
4. Kolmogorov - Smirnov distance
The Kolmogorov - Smirnov distance (defined only for probability measures on $\mathbb{R}$) is
$$\mathrm{Kolm}(P,Q) := \sup_{x}\big|P((-\infty,x)) - Q((-\infty,x))\big|.$$
We have
$$\mathrm{Kolm}(P,Q) \le TV(P,Q).$$
Suppose $X \sim P$, $Y \sim Q$ are two random variables and $Y$ has a density with respect to the Lebesgue measure bounded by a constant $C$. Then
$$\mathrm{Kolm}(P,Q) \le \sqrt{2C\,\mathrm{Wass}_1(P,Q)},$$
where $\mathrm{Wass}_1(P,Q) = \inf\big[\mathbb{E}|X - Y| : X \sim P,\ Y \sim Q\big]$.
Let $N(t) \sim \mathrm{Po}(t)$; then, via integration by parts,
$$P\big(N(t) \le n\big) = \sum_{k=0}^{n} e^{-t}\frac{t^k}{k!} = \int_t^{\infty} e^{-u}\frac{u^n}{n!}\,du = \int_t^{\infty} P\big(N(u) = n\big)\,du.$$
Hence, for $X_i \sim \mathrm{Po}(\lambda_i)$ with $0 < \lambda_1 < \lambda_2$,
$$\mathrm{Kolm}(X_1,X_2) = \tau(X_1,X_2) = P(X_2 \ge l) - P(X_1 \ge l) = P(X_1 \le l - 1) - P(X_2 \le l - 1) = \int_{\lambda_1}^{\lambda_2} P\big(N(u) = l - 1\big)\,du,$$
where $l = \min\big[k \in \mathbb{Z}_+ : f(k) > 1\big]$ and $f(k) = P(X_2 = k)/P(X_1 = k)$.
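A direct check (SciPy, arbitrary values of the $\lambda$'s) that the Kolmogorov - Smirnov and total variation distances coincide for two Poisson laws:

```python
import numpy as np
from scipy.stats import poisson

lam1, lam2 = 3.0, 4.5                       # illustrative parameters
ks = np.arange(0, 200)
pmf1, pmf2 = poisson.pmf(ks, lam1), poisson.pmf(ks, lam2)

tv = 0.5 * np.abs(pmf1 - pmf2).sum()
kolm = np.max(np.abs(np.cumsum(pmf1) - np.cumsum(pmf2)))
print(kolm, tv)                             # equal for the Poisson family; Kolm <= TV in general
```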
Conclusion
This short review discusses only the most popular and well-known inequalities. Other interesting cases, e.g. the total variation distance between a binomial distribution and a Gaussian distribution with matching parameters, deserve special attention. Applications of these bounds to various problems of mathematical statistics, including classification theory and machine learning algorithms, also form a rich and actively developing field.
References
1. Suhov Yu., Kelbert M. Probability and Statistics by Example. Vol. I. Basic Probability and Statistics. 2nd ed. Cambridge, UK, Cambridge University Press, 2014. 470 p. https: //doi.org/10.1017/CBO9781139087773
2. Pinsker M. Information and Information Stability of Random Variables and Processes. San Francisco, USA, Holden-Day Inc., 1964. 243 p.
3. Le Cam L. Asymptotic Methods in Statistical Decision Theory. Springer Series in Statistics. New York, NY, Springer, 1986. 742 p. https://doi.org/10.1007/978-1-4612-4946-7
4. Le Cam L. An approximation theorem for the Poisson binomial distribution. Pacific Journal of Mathematics, 1960, vol. 10, no. 4, pp. 1181-1197. https://doi.org/10.2140/pjm.1960.10. 1181
5. Devroye L., Mehrabian A., Reddad T. The total variation distance between high-dimensional Gaussians. arXiv, 2020, arXiv:1810.08693v5, pp. 1-12.
Поступила в редакцию / Received 25.11.2021 Принята к публикации / Accepted 27.12.2021 Опубликована / Published 31.05.2022