
On consistency of Bayesian parameter estimators for a class of ergodic Markov models

A.I. Nurieva

A.Yu. Veretennikov

National Research University Higher School of Economics [email protected]

Institute for Information Transmission Problems [email protected]

Abstract

The consistency of Bayesian parameter estimation is shown for a class of ergodic discrete Markov chains. J.L. Doob's method, proposed earlier for the i.i.d. situation, is used. The result may be useful in reliability theory for models with unknown parameters, in risk management in financial mathematics, and in other applications.

Keywords: Bayesian estimator; consistency; ergodic Markov chain

MSC2020: 62F12; 62F15; 62M05

1. Introduction

Parameter estimation plays a significant, and in some cases possibly even a crucial, role in quite a few applications, such as reliability theory for models with unknown parameters (see [4, chapter 3]), extreme value theory for Markov processes, risk management in financial mathematics, and others. In the asymptotic sense, one of the basic desirable properties of any estimator in the long run is its consistency, weak or strong, as it shows that the estimate is close to the "true" parameter if the classical setting is accepted. Similarly, in the Bayesian setting consistency means literally the same, namely convergence to the sampled value of the parameter, even though there is no such thing as a "true parameter value", because the parameter is sampled from the prior distribution. Also, as is well known, Bayesian estimators often work well in the classical setting, too, assuming some fictitious prior distribution for the parameter is chosen.

In this paper the problem of strong consistency is tackled for a certain class of Markov models in the Bayesian setting, and, as was already mentioned, in the classical situation with a fixed nonrandom "true" parameter value. Assume that there is a family of distributions {P_θ} parameterised by some variable θ ∈ Θ, where Θ ⊂ R^m is a given parametric space. Any estimator is a measurable function of the observations, or, a bit more generally, a mapping from the space of outcomes Ω, say, to the space (R^m, B(R^m)) which is Borel measurable with respect to the sigma-algebra of the observations F^X; here B(R^m) is the Borel sigma-algebra in R^m.

In the Bayesian setting it is assumed that there is some prior distribution for θ on the set Θ; the latter is usually a topological space, and in this paper it will be assumed that Θ is a domain in R^m which is not necessarily bounded. S.N. Bernstein and R. von Mises were the first to establish consistency and the first steps towards the asymptotic normality of the Bayesian estimator for some particular i.i.d. cases, see [1, Chapter IV, p.271], [18, pp. 188-192]. The general theory of asymptotic normality was developed later by Le Cam [13] and Ibragimov and Khasminsky [5]; for more recent results see, for example, [12], [15]. Another direction related to the problem was the asymptotic singularity of measures for large observation samples, based on martingale theory and developed in [6, 7, 8, 14, 17], among others. Naturally, asymptotic normality requires more restrictive assumptions. On the other hand, "just" consistency may often be used for constructing more efficient estimators by certain modifications. Also, in a situation where the conditions for asymptotic normality are not met, it may be even more desirable to know whether the applied estimator is consistent. Hence, it makes sense to separate the studies of sufficient conditions for these two properties, asymptotic normality and consistency.

In this paper the approach offered for i.i.d. observations in [3] is used, adjusted for a class of Markovian models. An important point in [3] was a Strong Law of Large Numbers for the sample distribution functions (d.f. in what follows) as the number of observations tends to infinity. Also essential was the assumption that the theoretical d.f. are different for different parameters. In this paper discrete densities on a finite or countable state space are used. This restriction does not look crucial and likely may be relaxed. At the level of ideas, the closest to this study is the paper [17], where the earlier basic results from [6, 7, 8] are applied precisely to the problem of parameter estimators' consistency. However, formally the conditions for this property in [17] and in what follows are different. Also, in a way, this paper is based on a simpler background than that in [6, 7, 8, 17].

The paper consists of this Introduction, the Setting, the Main result (Theorem 4), Auxiliary results, and the Proof of Theorem 4.

2. The setting

Let {X_t} be a homogeneous Markov chain (MC) in discrete time T = {0, 1, ...} with a finite or countable (denumerable) state space X ⊂ R^1 (it will be clear in what follows why it is convenient to work in R^1: it is not a restriction, but it may be desirable that the elements of the state space be linearly ordered). The transition probabilities are denoted by

p_{ij}(s, t) = P(X_t = j | X_s = i) = P(X_{t-s} = j | X_0 = i) = p_{ij}(t - s) for s ≤ t,

and let P(t) = (p_{ij}(t)) be the transition probability matrix over time t; furthermore, all of them will depend on a parameter θ. The notion of ergodicity of an MC is not uniquely determined in the literature; in the present paper we understand it as follows.

Definition 1. A homogeneous MC (X_n, n = 0, 1, ...) is called ergodic if there exists a limiting invariant probability measure μ which does not depend on the initial distribution - say, μ_0 - and to which there is convergence in total variation for each μ_0:

lim_{t→∞} ||μ_{μ_0, ·}(t) − μ||_TV = 0,    (1)

where μ_{μ_0, j}(t) = P_{μ_0}(X_t = j). Recall that the total variation metric, or distance, is given by the formula

||μ − ν||_TV := 2 sup_{A ∈ F(X)} (μ(A) − ν(A)).

As was said, the transition probabilities depend on a parameter, and the problem under consideration is the estimation of this parameter given observations on the time interval [1, n], where n → ∞. It is assumed that θ ∈ Θ ⊂ R^m; Θ is a domain, not necessarily bounded. Naturally, the stationary measure, generally speaking, also depends on θ: denote it from now on by μ_θ(dx) and note that under the assumption of convergence (1) it is necessarily unique. We will need the extended process Y_n = (X_n, X_{n+1}), which is also an MC on the state space X × X. The symbol μ_θ(dx, dx') will denote the stationary measure for the MC (Y_n); it is easy to see that such an invariant measure does exist. Assume that the functions p_θ(·, ·) are Borel measurable with respect to the variable θ. Then, due to the ergodicity (see (1)), the invariant probabilities are also Borel measurable in θ. Following Doob's approach, suppose that a (weak) Law of Large Numbers (LLN) holds true for the MC (X_n) with respect to the corresponding measure P_θ, for each θ. In this case, the LLN is also valid for the MC (Y_n), where Y_n := (X_n, X_{n+1}). It is easy to see that these two conditions - the LLN for the MC (X_n) and for the MC (Y_n) - are equivalent. Hence, the following assumption will be accepted in what follows.

Assumption 2. It is assumed that for each θ ∈ Θ and any measurable A, B the convergence in probability P_θ holds true,

(1/T) Σ_{s=0}^{T−1} 1(X_s ∈ A, X_{s+1} ∈ B) → μ_θ(A × B), T → ∞.

This assumption is equivalent to the condition

(1/T) Σ_{s=0}^{T−1} g(Y_s) − ∫ g(y) μ_θ(dy) → 0, T → ∞,

for any bounded measurable function g(y), where y = (x, x').
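For a finite state space, Assumption 2 is easy to illustrate by simulation. The following is a minimal sketch, not part of the paper's argument: a hypothetical two-state chain with p_00 = 0.9, p_11 = 0.8 has invariant distribution (2/3, 1/3), so the invariant pair mass of {0} × {0} equals (2/3) · 0.9 = 0.6, and the relative frequency of the transition (0, 0) along a long trajectory should approach this value.

```python
import random

random.seed(1)

# hypothetical two-state transition matrix p_ij (not from the paper)
P = {0: [0.9, 0.1], 1: [0.2, 0.8]}

def step(i):
    # one transition of the chain from state i
    return 0 if random.random() < P[i][0] else 1

n = 200_000
x, pair_counts = 0, {}
for t in range(n):
    y = step(x)
    pair_counts[(x, y)] = pair_counts.get((x, y), 0) + 1
    x = y

# empirical frequency of the pair (0, 0); invariant pair mass is pi_0 * p_00 = 0.6
freq_00 = pair_counts[(0, 0)] / n
print(freq_00)
```

With a trajectory of this length the frequency agrees with 0.6 to about two decimal places, in line with the assumed LLN for the pair chain (Y_n).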

Let us collect the comments made earlier in the form of a proposition.

Proposition 3. Under the assumptions made above the following statements hold:

1. If (Xn) is a homogeneous MC then Yn is also a homogeneous MC.

2. If the MC (Xn) is ergodic then the MC (Yn) is also ergodic, and vice versa.

Note that, as usual, all sigma-algebras in the text are regarded as completed with respect to the corresponding probability measures.

3. Main result

The Bayesian setting assumes that the parameter θ is random; let it have a prior probability distribution Q on Θ. Recall that here Θ is a domain in R^m, not necessarily bounded. It is assumed that

E|θ| < ∞. (2)

Any estimator of the parameter given observations is represented by some Borel measurable function θ̂_n = θ̂_n(X_1, ..., X_n). As is well known (cf., for example, [2, chapter 19]), there exists a Borel measurable function φ_n such that the Bayesian estimator reads

E(θ | X_1, ..., X_n) = φ_n(X_1, ..., X_n).

So, the statistic θ̂_n := φ_n(X_1, ..., X_n) = E(θ | X_1, ..., X_n) is necessarily (F^X_n, B(R^m))-measurable; hence, it is also (F^X_∞, B(R^m))-measurable, and it is measurable with respect to the pair of σ-algebras (B(X)^n, B(R^m)), ∀n ∈ N, where B(X) is the set of all subsets of the state space X, that is, B(X) = 2^X. Recall that a pointwise limit of measurable functions is also measurable.
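For a prior concentrated on finitely many parameter values, the posterior mean E(θ | X_1, ..., X_n) can be computed explicitly from the likelihood of the observed transitions. Below is a minimal illustrative sketch; the parameterisation of the transition matrix and the two candidate values of θ are hypothetical, not taken from the paper.

```python
import math
import random

random.seed(2)

def transition_matrix(theta):
    # hypothetical parameterisation: p_01 = theta, p_10 = 0.3 fixed
    return [[1 - theta, theta], [0.3, 0.7]]

def simulate(theta, n):
    # sample a trajectory X_0, ..., X_n of the chain with parameter theta
    P = transition_matrix(theta)
    x, path = 0, [0]
    for _ in range(n):
        x = 0 if random.random() < P[x][0] else 1
        path.append(x)
    return path

def posterior_mean(path, thetas, prior):
    # E(theta | X_1, ..., X_n) for a discrete prior Q on the given candidates
    logliks = []
    for th in thetas:
        P = transition_matrix(th)
        ll = sum(math.log(P[i][j]) for i, j in zip(path, path[1:]))
        logliks.append(ll)
    m = max(logliks)
    w = [q * math.exp(l - m) for q, l in zip(prior, logliks)]
    return sum(th * wi for th, wi in zip(thetas, w)) / sum(w)

path = simulate(0.2, 2000)
est = posterior_mean(path, [0.2, 0.6], [0.5, 0.5])
print(est)
```

With 2000 observations the posterior mass concentrates overwhelmingly on the sampled value 0.2, which is the consistency phenomenon studied in this paper.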

Theorem 4. Let the following conditions be satisfied:

1. Transition probability matrices of the MC (X_n) for different values of θ are different; that is, for any θ ≠ θ' there exist i, j such that p^θ_{ij} ≠ p^{θ'}_{ij}.

2. The MC (Y_n) is ergodic for each θ under the measure P_θ in the sense of Definition 1, and the (weak) LLN holds for the process Y for each θ in the sense of Assumption 2.

Then there is a convergence

θ̂_n → θ, n → ∞, P-a.s. (3)

Here, as usual in the Bayesian setting,

P(dθ, dω) = Q(dθ) P_θ(dω).

Remark 5. Recall that similar results under different conditions were established in [17, Theorems 1, 2]. Formally, those conditions in [17] may or may not be applicable in our situation, because the assumption of absolute continuity for the projection measures on the sigma-algebra F^X_n for any two values of the parameter is not made here, see [17, Theorem 1, condition (C)] and [17, Theorem 2, condition (b)]. In the Markovian examples in [7, §13] a condition similar to [17, Theorem 1, condition (C)] was assumed as well, see theorem 22, condition (b). In the present paper such a condition is neither assumed, nor does it follow from the other assumptions. Intuitively, the lack of absolute continuity should only help consistency; nevertheless, even if so, it apparently does require some calculus. In any case, the proof of Theorem 4 in what follows does not distinguish between the cases tackled in [17] and the cases not covered by that cited paper.

Remark 6. As in the setting of Doob in [3], this result may also be used in the classical setting where θ is not random and there exists a unique "true" parameter value. For that, an artificial prior density should be introduced on Θ which must be everywhere positive. Then, as in [3], the analogous assertion will hold true about almost sure convergence of the artificial Bayesian estimator under the product measure on Θ × X^∞.

In particular, what is usually highlighted about the Bernstein-von Mises theorem is that if the measure Q has a density q(θ) which is everywhere positive, then convergence of the Bayesian estimator towards θ will take place almost everywhere in Θ with respect to the Lebesgue measure. Actually, it suffices for this property that the measure Q be absolutely continuous with respect to the latter. However, in either case there is no way to know for which particular values of θ this convergence is valid and for which it may fail; it may only be claimed that the set of "bad" values of θ with no convergence has measure zero.

4. Auxiliary results

Let us define the sample distribution function

F_N(x, x') := (1/N) Σ_{t=0}^{N−1} 1(X_t ≤ x, X_{t+1} ≤ x').

Denote by S = {F(x, x'), x, x' ∈ R} the space of all functions of two variables (x, x') with the following properties:

1. 0 ≤ F(x, x') ≤ 1 for each x, x' ∈ R.

2. If x ≤ z, x' ≤ z', then F(x, x') ≤ F(z, z') (monotonicity).

3. For each x, x' ∈ R,

lim_{z↓x, z'↓x'} F(z, z') = F(x, x').

4. For each x, x' ∈ R there exists a limit

lim_{z↑x, z'↑x'} F(z, z') =: F(x, x')−.

(NB: Actually, the latter notation will not be used in what follows; it is just an analogue of the one-dimensional property of "lag" - possessing "limites a gauche" - for the one-dimensional case. Respectively, property 3 is the analogue of "cad" - being "continue a droite" - for a function of one variable.)

5.

lim_{z↑+∞, z'↑+∞} F(z, z') = 1.

6.

lim_{z∧z'↓−∞} F(z, z') = 0.

In fact, in the situation under consideration we deal with some proper subset of the set of all distribution functions of two variables, because in our setting all the corresponding measures on R^2 have atoms. However, all we need is that this more general space of distribution functions with a certain metric is a Polish space, and this will be guaranteed by Proposition 16 in what follows.

Denote by Σ(S) the sigma-algebra on S generated by all finite cylinders, i.e.,

Σ(S) := σ(F ∈ S : F(x_1, x'_1) ≤ a_1, ..., F(x_n, x'_n) ≤ a_n)

for any (x_1, x'_1), ..., (x_n, x'_n) ∈ R^2 and a_1, ..., a_n ∈ [0, 1].

Note that the distribution function of any two-dimensional random vector belongs to the space S, and all sample d.f. F_N belong to this space, too.
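That each sample d.f. F_N lies in S is easy to see numerically. A small sketch (with a hypothetical two-state chain, not from the paper) evaluates F_N at the lattice points and checks properties 1, 2, and 5; on a state space bounded above, property 5 reduces to F_N evaluated at the maximal states being exactly 1.

```python
import random

random.seed(3)

# sample path of a hypothetical two-state chain
P = [[0.9, 0.1], [0.2, 0.8]]
x, xs = 0, [0]
for _ in range(1000):
    x = 0 if random.random() < P[x][0] else 1
    xs.append(x)

def F_N(x_, xp):
    # sample d.f. of the pairs (X_t, X_{t+1})
    N = len(xs) - 1
    return sum(1 for t in range(N) if xs[t] <= x_ and xs[t + 1] <= xp) / N

# properties 1 (range), 2 (monotonicity) and 5 (limit equals 1) at lattice points
assert 0.0 <= F_N(0, 0) <= F_N(1, 0) <= F_N(1, 1) == 1.0
print(F_N(0, 0), F_N(1, 1))
```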

Lemma 7. The function F_N : Ω → S is a measurable map with respect to the corresponding pair of sigma-algebras (F^X_N; Σ(S)).

Proof. The proof is elementary and is shown here only for the convenience of the reader. Indeed, for any couple (x, x') the mapping

F_N(x, x') := (1/N) Σ_{t=0}^{N−1} 1(X_t ≤ x, X_{t+1} ≤ x')

is measurable as a function of ω, being a finite sum of random variables (indicators), which may be expressed by the relation

(ω : F_N(x, x') ≤ a) ∈ F^X_N for any a ∈ R.

Then, for any finite sets of (x_1, x'_1), ..., (x_n, x'_n) and a_1, ..., a_n we have

(ω : F_N(x_1, x'_1) ≤ a_1, ..., F_N(x_n, x'_n) ≤ a_n) ∈ F^X_N,

by the definition of a sigma-algebra. Therefore, F_N(·, ·) as a function of ω is, indeed, (F^X_N; Σ(S))-measurable, as required. ■

In the next lemma it is assumed that the distribution of Y_0 = (X_0, X_1) is invariant. In this case its distribution function is denoted by F^θ(x, x'); recall that due to ergodicity it is unique. It may be represented by the formula

F^θ(x, x') = Σ_{i ≤ x} p^θ_inv(i) Σ_{j ≤ x'} p^θ_{ij},    (4)

where, in turn, p^θ_inv(·) are the (unique) invariant probabilities of the MC (X_n) with respect to the probability measure P_θ, whose d.f. F^θ(x) = Σ_{i ≤ x} p^θ_inv(i) is simultaneously the limiting distribution function for the (X_n).

Lemma 8. Under the assumption that all transition probabilities p^θ_{ij}, i, j ∈ X, are Borel measurable in θ, the invariant distribution function F^θ(x, x') is Borel measurable in θ for each pair (x, x').

Proof. Indeed, the invariant probabilities p^θ_inv(i), i ∈ X, are measurable in θ as limits of measurable n-step transition probabilities. So the "double" invariant probabilities p^θ_inv(i) p^θ_{ij}, i, j ∈ X, also have the same property. Hence, the theoretical d.f.

P_{x_0}(X_n ≤ x, X_{n+1} ≤ x') = Σ_{i ≤ x} p^θ_{x_0 i}(n) Σ_{j ≤ x'} p^θ_{ij}

is clearly Borel measurable in θ, too. So is its limit as n → ∞, which equals F^θ(x, x'), as required. ■

Let us recall the Levy-Doob theorem on convergence of conditional expectations.

Proposition 9 (see, e.g., [11, Theorem 4.3.10]). Let E|ξ| < ∞ and let F_n, n = 0, 1, ..., be an increasing sequence of σ-algebras, F_n ⊂ F_{n+1}, and let F_∞ be the minimal σ-algebra which contains all F_n, that is, F_∞ = ⋁_n F_n (the minimal sigma-algebra generated by all F_n). Then

lim_{n→∞} E(ξ | F_n) = E(ξ | F_∞), a.s.,

and

lim_{n→∞} E|E(ξ | F_∞) − E(ξ | F_n)| = 0.

In our setting, due to Proposition 9 we have

lim_{n→∞} E(θ | X_1, ..., X_n) = lim_{n→∞} φ_n(X_1, ..., X_n) = E(θ | F^X_∞) a.s.

This implies that the limit in the left hand side of the latter double equality is F^X_∞-measurable.

Lemma 10. Assume that the transition probability matrices are different for different parameter values, that is, θ ≠ θ' implies that there exist i, j such that p^θ_{ij} ≠ p^{θ'}_{ij}. Then the mapping θ ↦ F^θ(j, j'), j, j' ∈ X, is one-to-one. Moreover, the mapping

G : θ ↦ F^θ(x, x'), x, x' ∈ R,    (5)

is also one-to-one.

Proof. The proof follows from the formula (4). Indeed, if for θ ≠ θ' the one-dimensional invariant distribution functions F^θ(·) and F^{θ'}(·) are different, then the two-dimensional ones are different, too. If for some pair θ ≠ θ' the one-dimensional d.f. coincide, F^θ(·) = F^{θ'}(·), then the two-dimensional ones are still different due to the formula (4) and by virtue of the distinguishability assumption on the transition probabilities for different parameter values. The same property for the mapping G follows straightforwardly. ■

Further, due to the assumed LLN the following convergence of relative frequencies holds,

(1/n) Σ_{t=0}^{n−1} 1(X_t ≤ j) →^{P_θ} F^θ(j) = E^θ_inv 1(X_0 ≤ j), n → ∞,

where E^θ_inv is expectation with respect to the corresponding invariant measure. A similar convergence holds true for two-dimensional relative frequencies,

(1/n) Σ_{t=0}^{n−1} 1(X_t ≤ j, X_{t+1} ≤ j') →^{P_θ} F^θ(j, j') = E^θ_inv 1(X_0 ≤ j, X_1 ≤ j'), n → ∞.

Since the two-dimensional invariant d.f. F^θ(·, ·) are different for any two different parameter values, the value θ is uniquely determined by the infinite trajectory of observations X = (X_n, n = 1, ...). In other words, the mapping θ ↦ F^θ(·, ·) is one-to-one. This mapping is measurable due to the LLN and because the limit of measurable mappings is also measurable. Moreover, as follows from Proposition 13 (see below; it is not linked to this lemma), the inverse mapping is also measurable.
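The identifiability behind Lemma 10 can be illustrated numerically: two distinct transition matrices yield distinct invariant pair d.f. F^θ(·, ·). A minimal sketch under two hypothetical 2×2 matrices (power iteration is used for the invariant distribution; these matrices are illustrative, not from the paper):

```python
def stationary(P, iters=500):
    # power iteration pi <- pi P for the invariant distribution of a two-state chain
    pi = [0.5, 0.5]
    for _ in range(iters):
        pi = [pi[0] * P[0][0] + pi[1] * P[1][0],
              pi[0] * P[0][1] + pi[1] * P[1][1]]
    return pi

def pair_df(P):
    # invariant two-dimensional d.f.: F(j, j') = sum_{i<=j} pi_i * sum_{k<=j'} p_ik
    pi = stationary(P)
    return {(j, jp): sum(pi[i] * sum(P[i][k] for k in range(jp + 1))
                         for i in range(j + 1))
            for j in (0, 1) for jp in (0, 1)}

F_a = pair_df([[0.9, 0.1], [0.2, 0.8]])  # hypothetical theta
F_b = pair_df([[0.8, 0.2], [0.2, 0.8]])  # hypothetical theta'
gap = max(abs(F_a[k] - F_b[k]) for k in F_a)
print(gap)
```

Here the two pair d.f. already differ at the point (0, 0), in agreement with the one-to-one property of the mapping G.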

Let us recall some further definitions; this is necessary because one of them is not standard in most areas of mathematics (see definition 12 in what follows).

Definition 11. Borel measurable sets in a Polish (and, more generally, in any topological) space X are the sets of the minimal σ-algebra B(X) of subsets of X which contains all open subsets of X.

Definition 12. Let X, Y be Borel measurable sets in Polish spaces X, Y, respectively. The mapping f : X → Y is called:

1. Borel iff its graph Γ_f = {(x, y) : x ∈ X, f(x) = y} is a Borel set in the space X × Y;

2. B-measurable iff the image of any Borel set from the space Y under the inverse mapping f^{−1} is a Borel set in X.

Note that the "usual" definition of a Borel function in the majority of areas of mathematics coincides with item 2 of definition 12.

The next result may be found in [10, Theorem 2.4.3] (we only state the part of this theorem which will be used in what follows).

Proposition 13 ([10, Theorem 2.4.3]). Let X, Y be Borel sets in Polish spaces and let f : X → Y be some mapping. Then:

1. If f is Borel measurable, then the images of all Borel sets from Y under the inverse mapping f^{−1} are also Borel, so that the mapping f is B-measurable;

2. Vice versa, if f is B-measurable then it is a Borel function.

Corollary 14. If the mapping f is Borel measurable and one-to-one, then its inverse f^{−1} is B-measurable and, hence, also Borel in the sense of definition 12.

In order to apply Proposition 13 in the proof of our main result in the next section, let us show that both the proposition and its Corollary 14 are applicable to the mapping G (see (5)).

Lemma 15. Under the assumptions of Theorem 4 the mapping G^{−1} is Borel and B-measurable.

Proof. Firstly, the mapping G : θ ↦ F^θ(·, ·) is B-measurable in the sense of definition 12. Indeed, the element F^θ(·, ·) is a limit in probability P_θ of the sequence of functions E^θ F_n(·, ·), which are all B-measurable in θ; therefore, so is their limit.

Secondly, according to Lemma 10, the mapping G is one-to-one; hence, so is its inverse G^{−1}. The claim of Lemma 15 now follows from Corollary 14. ■

Further, it is desirable that the parametric space Θ and the space of invariant distribution functions be complete and separable metric spaces. It is trivial for Θ ⊂ R^m with the Euclidean metric; for the space of "double" distribution functions a suitable metric should be chosen, which is, of course, not unique. To each distribution function there corresponds a probability distribution on R^2. Let us agree that the distance between two distribution functions is defined as a distance between their corresponding measures, and let us choose Prokhorov's metric d_P(ν_1, ν_2) for them: if the α-neighbourhood of a set A ⊂ R^2 is denoted by

A^α := {a = (a_1, a_2) ∈ R^2 : d(a, A) < α} for A ≠ ∅, and ∅^α := ∅ ∀α > 0,

then the distance between probability measures ν_1, ν_2 on R^2 is defined by the formula

d_P(ν_1, ν_2) := inf{α > 0 : ν_2(A) ≤ ν_1(A^α) + α & ν_1(A) ≤ ν_2(A^α) + α, ∀A ∈ B(R^2)}.

The same formula provides the distance between two distribution functions, namely, as the distance d_P(·, ·) between the corresponding measures on R^2.

Proposition 16 ([16, Lemma 1.4]). Let a metric space be complete and separable. Then the space of probability measures on it with the Prokhorov metric is also complete and separable.
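For finite discrete measures on R^2, the Prokhorov distance above can be computed by brute force: it suffices to test the defining inequalities on subsets of the atoms, and since both conditions are monotone in α, bisection applies. A minimal sketch, not used anywhere in the paper's argument:

```python
import itertools
import math

def prokhorov(mu, nu, iters=40):
    # mu, nu: finite discrete probability measures on R^2, as dicts point -> mass
    pts = sorted(set(mu) | set(nu))

    def d(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def mass(m, A):
        return sum(m.get(p, 0.0) for p in A)

    def mass_near(m, A, alpha):
        # m(A^alpha), with the strict inequality d(a, A) < alpha as in the definition
        if not A:
            return 0.0
        return sum(w for p, w in m.items() if min(d(p, a) for a in A) < alpha)

    def ok(alpha):
        # check both defining inequalities over all subsets of the atoms
        for r in range(len(pts) + 1):
            for A in itertools.combinations(pts, r):
                if mass(mu, A) > mass_near(nu, A, alpha) + alpha:
                    return False
                if mass(nu, A) > mass_near(mu, A, alpha) + alpha:
                    return False
        return True

    lo = 0.0
    hi = 1.0 + max((d(p, q) for p in pts for q in pts), default=0.0)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if ok(mid):
            hi = mid
        else:
            lo = mid
    return hi

# two point masses at Euclidean distance 0.5: the Prokhorov distance equals 0.5
print(prokhorov({(0.0, 0.0): 1.0}, {(0.5, 0.0): 1.0}))
```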

5. Proof of theorem 4

Proof. By virtue of the Levy-Doob theorem (see Proposition 9) we have

θ̂_n = E(θ | X_1, ..., X_n) = E(θ | F^X_n) → E(θ | F^X_∞) =: θ̂_∞, n → ∞, P-a.s.    (6)

Due to its definition, the random variable θ̂_∞ is F^X_∞-measurable; being a conditional expectation, it is a Bayesian estimator of θ constructed upon the infinite sequence of observations X_1, X_2, ... For the proof of the theorem, it suffices to establish the equality

θ̂_∞ = θ, P-a.s.    (7)

The basis for this equality is the fact that θ is uniquely determined by the infinite sequence of observations, due to the assumed LLN and because of the one-to-one correspondence between θ and the invariant distribution of the pair (X_0, X_1). Let us provide more rigorous considerations related, in particular, to the measurability. By virtue of the LLN assumptions, we have

F^X_{N−1} ∋ F_N(x) = (1/N) Σ_{t=0}^{N−1} 1(X_t ≤ x) →^{P_θ} F^θ(x) = E^θ_inv 1(X_0 ≤ x),

and also

F^X_N ∋ F_N(x, x') = (1/N) Σ_{t=0}^{N−1} 1(X_t ≤ x, X_{t+1} ≤ x') →^{P_θ} F^θ(x, x') = E^θ_inv 1(X_0 ≤ x, X_1 ≤ x').

The random variable F_N(x, x') is (F^X_N, B(R))-measurable for any pair (x, x') ∈ R^2. According to Lemma 7, the mapping F_N(·, ·) is (F^X_N, Σ(S))-measurable; hence, it is a random variable in the space of distribution functions.

Now, according to Lemma 10, the following equality holds true,

θ = G^{−1}(F^θ(·, ·)), P-a.s.

By virtue of Lemma 15, the mapping G^{−1} is Borel and B-measurable. Therefore, the random variable G^{−1}(F^θ(·, ·)) is F^X_∞-measurable.

Therefore, by virtue of (6),

θ̂_n → E(θ | F^X_∞) = E(G^{−1}(F^θ(·, ·)) | F^X_∞) = G^{−1}(F^θ(·, ·)) = θ, P-a.s.

This means that (7) holds true, which, together with (6), implies the desired convergence (3). Theorem 4 is proved. ■

Acknowledgements

For both authors this study is supported by the Russian Foundation for Basic Research, grant 20-01-00575a. Theorem 4 and Lemma 8 are established by the first author; Lemmata 7 and 10 are proved by the second author.

References

[1] S.N. Bernstein, The theory of probability, 1927 (In Russian). (Chapter IV, p.271)

[2] A.A. Borovkov, Mathematical statistics, CRC Press, 1999 (The first edition in Russian 1984).

[3] J.L. Doob, Application of the theory of martingales, in: B. Locker, Doob at Lyon, Electr. Journal of History of Probability and Statistics, June 2009, 1-28. https://eudml.org/doc/130498

[4] B.V. Gnedenko, Yu.K. Belyaev, A.D. Solovyev, Mathematical methods of reliability theory, Academic Press, 2014 (The first edition in Russian 1965, Eng. translation: AP 1969).

[5] I.A. Ibragimov, R.Z. Khasminsky, Statistical Estimation. Asymptotic Theory. Springer, 1981.

[6] Yu.M. Kabanov, R.Sh. Liptser, A.N. Shiryaev, Absolute continuity and singularity of locally absolutely continuous probability distributions. I, Math. USSR-Sb., 35:5 (1979), 631-680 https://doi.org/10.1070/SM1979v035n05ABEH001615

[7] Yu.M. Kabanov, R.Sh. Liptser, A.N. Shiryaev, Absolute continuity and singularity of locally absolutely continuous probability distributions. II Sb. Math. 1980, 36(1), 31-58 https://doi.org/10.1070/SM1980v036n01ABEH001760

[8] Yu.M. Kabanov, R.Sh. Liptser, A.N. Shiryaev, On the question of absolute continuity and singularity of probability measures, Math. USSR-Sb., 33:2 (1977), 203-221. https://doi.org/10.1070/SM1977v033n02ABEH002421

[9] S. Kakutani, On equivalence of infinite product measures, Ann. Math., 49, No 1 (1948), 214-224 https://doi.org/10.2307/1969123

[10] V.G. Kanovei, V.A. Lyubetsky, The modern set theory: Borel and projective sets, MCNMO, Moscow, 2010 (In Russian) https://elibrary.ru/item.asp?id=19462694

[11] N.V. Krylov, Introduction to the theory of random processes, AMS, Providence, Rhode Island, 2002.

[12] Yu.A. Kutoyants, Statistical Inference for Ergodic Diffusion Processes, Springer, London, 2004. DOI: 10.1007/978-1-4471-3866-2

[13] L. Le Cam, On some asymptotic properties of maximum likelihood estimates and related Bayes estimates, Univ. California Publ. Stat., 1953,1, 277-330. http://mi.mathnet.ru/mat131

[14] R.Sh. Liptser, F. Pukelsheim, A.N. Shiryaev, Necessary and sufficient conditions for contiguity and entire asymptotic separation of probability measures. Russ. Math. Surv. 37, No. 6,107-136 (1982). https://doi.org/10.1070/RM1982v037n06ABEH004025

[15] M. Panov, V. Spokoiny, Finite sample Bernstein - von Mises theorem for semiparametric problems, Bayesian Analysis, 2015, 10 (3), 665-710. DOI: 10.1214/14-BA926

[16] Yu.V. Prokhorov, Convergence of Random Processes and Limit Theorems in Probability Theory, Theory Probab. Appl. 1(2) (1956), 157-214. https://doi.org/10.1137/1101016

[17] A.I. Yashin, On Consistency of Bayesian Parameter Estimation, Problems Inform. Transmission, 17:1 (1981), 42-49. http://mi.mathnet.ru/ppi1381

[18] R. von Mises, Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik. Leipzig & Wien, Franz Deuticke, 1931. (pp. 188-192)
