Historical Aspects of Science and Technology

track laying, several kilometres of track were laid across the river ice. As a result, right up to March, trains carried the materials needed for construction and other freight over the temporary railway track built on the ice section. During the Great Patriotic War of 1941-1945 this experience was applied in building the Ice Road of Life, which connected besieged Leningrad with the mainland. Between 4,000 and 6,000 people a day were evacuated from the besieged city over the ice route. While the road was in operation, from 22 November 1941 to 30 March 1943, more than 600 thousand tonnes of food and other cargo were delivered to Leningrad [7, p. 653].

Conclusion

To achieve a high technical standard for the Murmansk Railway, the engineers who built it drew on advanced work by Russian and foreign specialists and proposed and put into practice new technical solutions. The work was carried out at the highest level, and the subsequent operation of the Murmansk line showed how competently the managerial and engineering staff had solved the problems of building the railway. One of the participants in the construction of the Great Northern Route, PIIPS graduate and railway engineer B. V. Sabanin, emphasized: "The work accomplished is gigantic, and Russia has the right to be proud of its labour before the whole world" [6, p. 12].

References

1. Люди дела. Вклад железнодорожников в социально-экономическое развитие России / В. В. Агафонов и др. ; ред. В. В. Фортунатов. - М. : ГОУ «Учебно-методический центр по образованию на железнодорожном транспорте», 2007. - 292 с. - ISBN 978-5-89035-406-8.
2. Список личного состава Министерства путей сообщения. Центральные и местные учреждения. - Пг., 1915.
3. От Волхова до Онего / А. Браиловский // На рубеже. - 1964. - № 3. - С. 97-103.
4. Российский государственный исторический архив (РГИА). - Ф. 350, оп. 30, д. 32. - Л. 1-26.
5. Красное колесо / А. И. Солженицын // Собр. соч. в 10-ти т. Т. 6. - М. : Воениздат, 1994. - 432 с.
6. По поводу достройки Мурманской железной дороги / Б. В. Сабанин // Техника и экономика путей сообщения. - 1920. - № 10. - С. 12-27.
7. 1418 дней войны. - М. : Московский рабочий, 1985. - 527 с.
Problems of Higher Education
UDC 519.234
M. M. Lutsenko, N. V. Shadrintseva
Petersburg State Transport University
RELIABILITY OF TEST SCORE
ISSN 1815-588X. Известия ПГУПС
2012/1
In this paper we develop several game models of testing and determine their reliability, that is, the probability of correctly assessing the person being tested (hereinafter the "Examinee"), together with the optimal decision function and the worst a priori distribution. The models build on the authors' earlier results [1]-[3] and on concepts of statistical decision theory, whose main object is a statistical game between Nature and the Statistician. The problems are solved in MS Excel for tests of 10 items. The reliability of test scores is found without any assumption about the a priori distribution of the Examinee's level of knowledge. In many important cases the reliability of the assessment turns out to be very low.
Keywords: educational testing, antagonistic game, statistical game, randomized decision function, worst a priori distribution.
Introduction
The main objective of any test is to measure the test-takers' knowledge, skills or aptitudes or, in many other problems, to classify them. This objective becomes a matter of high priority when administrative decisions, such as issuing a certificate of education or enrolment in an educational institution, are taken on the basis of the test results. There is an extensive literature on test theory (see [4]-[6]), but all of these works rely, directly or indirectly, on interval estimation of the parameter of a binomial (hypergeometric) distribution. Moreover, all assessments of the reliability of testing assume that the knowledge level of an Examinee is normally distributed, and it is difficult to agree with this assumption. When students' knowledge is checked by tests, an essential part of the learning process is devoted to preparing for the test. If the students know the form of the test as well as the subjects and types of tasks, they can shape their knowledge so that objective assessment is impeded. A conflict situation (a game) therefore arises. Its participants are an Examinee, who wants to spend as little time as possible preparing for the test and to get the highest score, and a decision maker (a Statistician), who has to assess the level of knowledge as accurately as possible.
Reliability in classical test theory is (indirectly) an estimate of the error one would expect if a student took a hypothetical parallel test. In generalizability theory it is an estimate of the difference between the "universe" score and the score on any particular test. In our model the educational test is a measurement instrument, and we want to find its accuracy or reliability: the probability of the correct assessment of an Examinee. This approach was developed by the authors in [1]-[3]; it differs from the one considered in the literature [4]-[6], where values of the item response function are estimated.
1 Classification and Assessment
Let us formulate the objective of assessing a student's knowledge level more precisely. Assume that, according to the test results, a group of pupils is divided into $N$ subgroups, and the level of a pupil's knowledge is determined by the number of the subgroup into which the pupil falls. The subgroup number can be selected by different methods, for example by the number of correct responses to the test tasks or, when the tasks have different "weights", by the number of score points earned for the tasks solved. Let $X_\theta$ denote the number of the subgroup into which a pupil with knowledge level $\theta$ (the type of the Examinee) falls. Thus the Statistician, observing a value $x$ of the random variable $X_\theta$, has to assess the type $\theta$ of the Examinee.
Let us reduce the classification problem to a problem of statistical estimation. For this purpose we introduce the following notation. We denote the finite set of possible levels of a pupil's knowledge by $\Theta = \{\theta_1, \theta_2, \ldots, \theta_m\}$ (the set of parameters); the set of values of the random variable $X_\theta$ by $X = \{x_1, x_2, \ldots, x_N\}$ (the set of observations); and the family of distributions of $X_\theta$ on the set $X$ by $\{P_\theta(x)\}_{\theta \in \Theta}$. Thus $P_\theta(x)$ is the probability that the test score $X_\theta$ of a pupil equals $x$ if his level of knowledge equals $\theta$. We denote the set of acceptable grades of a pupil's knowledge by $D = \{d_1, d_2, \ldots, d_n\}$ (the set of decisions), and the Statistician's decision when the random variable $X_\theta$ takes the value $x$ by $\delta(x)$. The function $\delta: X \to D$ is called a decision function. We denote the set of decision functions by $\mathcal{D} = D^X$. Every decision function can be represented as a vector $\delta = (\delta_1, \delta_2, \ldots, \delta_N)$ with $\delta_k = \delta(x_k) \in D$.
In this notation the Statistician, observing the value of the random variable $X_\theta$ with an unknown value of the parameter $\theta$, should make the decision $\delta(X_\theta) \in D$ that gives the most accurate estimate of $\theta$; in other words, he should find a decision function $\delta$ whose value is closest to $\theta$.
Statistics considers two groups of estimates: point and interval estimates. To construct a point estimate one must know the Statistician's losses when the estimate of the unknown parameter $\theta$ is incorrect, i.e. the Statistician's loss function $L(\theta, d)$. Unfortunately, the data of the problem make it hard to construct a loss function of this kind that is convex in the variable $d$.
To construct an interval estimate we associate with each grade $d \in D$ a subset of knowledge levels $\Theta(d) \subset \Theta$ that is acceptable for this decision. From the given family of subsets $\{\Theta(d)\}_{d \in D}$ we construct the payoff function of the Statistician as follows:

$$h(d, \theta) = \mathbf{1}_{\Theta(d)}(\theta) = \begin{cases} 1, & \text{if } \theta \in \Theta(d), \\ 0, & \text{if } \theta \notin \Theta(d). \end{cases}$$

Thus the payoff of the Statistician equals one only when he has assessed the Examinee correctly, that is, when the type $\theta$ of the Examinee belongs to the set of types $\Theta(d)$ acceptable for the given decision $d \in D$.
Let us fix a family of acceptable intervals $\{\Theta(d)\}_{d \in D}$. Each decision function $\delta: X \to D$ generates a family of confidence intervals $\{\Theta(\delta(x))\}_{x \in X}$. For each parameter $\theta$ and decision function $\delta$ let us find the probability that the family of confidence intervals $\{\Theta(\delta(x))\}_{x \in X}$ covers the unknown parameter $\theta$. For this purpose we use the law of total probability:

$$P(\theta \in \Theta(\delta(X_\theta))) = \sum_{x \in X} P(X_\theta = x) \cdot P(\theta \in \Theta(\delta(x)) \mid X_\theta = x).$$

Using the notation introduced above for the function $h$ and the family of distributions $\{P_\theta(x)\}_{\theta \in \Theta}$, we get:

$$P(\theta \in \Theta(\delta(X_\theta))) = \sum_{x \in X} P_\theta(x) \cdot h(\delta(x), \theta) = H(\delta, \theta).$$

Let us call the function $H(\delta, \theta)$ a success function, by analogy with the Wald risk function.
The smallest probability that the family of confidence intervals $\{\Theta(\delta(x))\}_{x \in X}$ generated by the decision function $\delta$ covers the unknown parameter $\theta$ is called the confidence probability of this family (of the decision function $\delta$), i.e.

$$\gamma = \gamma(\delta) = \min_{\theta \in \Theta} P(\theta \in \Theta(\delta(X_\theta))).$$

The aim of the Statistician is to find a decision function $\delta$ (in other words, a family of confidence intervals) for which the confidence probability $\gamma$ is maximal.
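The success function and the confidence probability can be evaluated directly for any finite model. The following Python sketch does this for a hypothetical two-level model; the distributions `P`, the acceptable sets `ACCEPT` and the decision function `delta` are illustrative assumptions and are not data from the paper.

```python
# Toy model (illustrative assumption): two knowledge levels, three test scores.
# P[theta][x] is the probability P_theta(x) of observing score x at level theta.
P = {
    "low": [0.7, 0.2, 0.1],
    "high": [0.1, 0.3, 0.6],
}
# ACCEPT[d] is the set Theta(d) of knowledge levels acceptable for decision d.
ACCEPT = {"fail": {"low"}, "pass": {"high"}}


def h(d, theta):
    """Payoff h(d, theta): 1 if theta lies in Theta(d), otherwise 0."""
    return 1 if theta in ACCEPT[d] else 0


def success(delta, theta):
    """Success function H(delta, theta) = sum_x P_theta(x) * h(delta(x), theta)."""
    return sum(p * h(delta[x], theta) for x, p in enumerate(P[theta]))


def confidence(delta):
    """Confidence probability gamma(delta): the smallest success over all theta."""
    return min(success(delta, theta) for theta in P)


delta = ["fail", "fail", "pass"]  # decisions for the scores 0, 1, 2
for theta in P:
    print(theta, success(delta, theta))
print("gamma =", confidence(delta))
```

In this toy model the decision function is right more often for the "low" level than for the "high" level, so the confidence probability is governed by the "high" level alone.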
On the other hand, let us assume that the parameter $\theta$ is itself a random variable with a known distribution $\nu$, i.e. the Statistician observes a random variable $X_\nu$ with distribution

$$P(X_\nu = x) = \int_\Theta P_\theta(x)\, d\nu(\theta).$$

The distribution $\nu$ is called the a priori distribution, the random variable $X_\nu$ the a posteriori random variable, and its distribution the a posteriori distribution.
The weighted mean of the success function for a given decision function $\delta$ and a priori distribution $\nu$ equals

$$H(\nu, \delta) = \int_\Theta H(\delta, \theta)\, d\nu(\theta) = \sum_{x \in X} \int_\Theta P_\theta(x) \cdot h(\delta(x), \theta)\, d\nu(\theta).$$
It is equal to the probability that the unknown parameter $\theta$ falls into the confidence interval generated by the decision function $\delta$ if the parameter has the known distribution $\nu$. A function $\delta_\nu$ maximizing $H(\nu, \delta)$ is called a Bayesian decision (Bayesian decision function) with respect to the distribution $\nu$, and the value attained is called the Bayesian success for the a priori distribution $\nu$, i.e. the Bayesian success equals

$$H(\nu, \delta_\nu) = \max_{\delta} H(\nu, \delta).$$

The Bayesian decision function $\delta_\nu$ generates the family of intervals for which the average probability of coverage is maximal for the given a priori distribution $\nu$ of the parameter $\theta$.
In our case the set $\mathcal{D}$ is the $N$-fold Cartesian power of the set $D$, and the function $\varphi(\delta) = \varphi(\delta_1, \delta_2, \ldots, \delta_N) = H(\nu, \delta)$ is separable in the variables $\delta_1, \delta_2, \ldots, \delta_N$. Hence for the Bayesian success we obtain:

$$H(\nu, \delta_\nu) = \sum_{k=1}^{N} \max_{\delta_k \in D} \left[ \int_\Theta P_\theta(x_k) \cdot h(\delta_k, \theta)\, d\nu(\theta) \right], \quad \delta_k = \delta(x_k).$$

In other words, the value of the Bayesian decision function $\delta_\nu$ at a point $x_k$ maximizes the summand that depends only on $\delta_k$. Consequently, the values of the function $\delta_\nu$ can be found at every point independently of its values at other points.
The worst distribution for the Statistician is the a priori distribution $\nu^*$ of the parameter $\theta$ for which the Bayesian success is minimal. In this case the minimal Bayesian success equals

$$H(\nu^*, \delta_{\nu^*}) = \min_{\nu} \max_{\delta} H(\nu, \delta).$$

The a priori distribution $\nu^*$ at which this minimum is attained is called the worst a priori distribution.
2 Testing as a statistical game
For the classification problem considered above we construct the statistical game "Testing" $\Gamma = \langle \mathcal{D}, \Theta, H \rangle$ between the Statistician and Nature. In this game the set of the Statistician's strategies $\mathcal{D} = D^X$ is the space of decision functions, the set of Nature's strategies $\Theta$ is the parameter set, and the payoff function has the following form:

$$H(\delta, \theta) = \sum_{x \in X} P_\theta(x) \cdot h(\delta(x), \theta), \quad \delta \in \mathcal{D} = D^X.$$

The Statistician (player 1) wants to increase the confidence probability $H(\delta, \theta)$, while Nature wants to get the best mark, i.e. to distort the result of the exam.
The lower value of the game $\Gamma$ equals the maximum confidence level that the Statistician can guarantee regardless of the actions of Nature:

$$\underline{v} = \max_{\delta \in \mathcal{D}} \min_{\theta \in \Theta} H(\delta, \theta) = \max_{\delta \in \mathcal{D}} \gamma(\delta).$$

The decision function $\delta^*$ at which the maximum is attained generates the optimal family of confidence intervals. Note that the upper value of the game $\Gamma$ equals one, i.e.

$$\overline{v} = \min_{\theta \in \Theta} \max_{\delta \in \mathcal{D}} H(\delta, \theta) = 1.$$
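For a very small model the lower and upper values in pure strategies can be found by enumerating all decision functions. The Python sketch below does this for a hypothetical model with two knowledge levels and three scores; the numbers in `P` and `ACCEPT` are illustrative assumptions, and brute-force enumeration is used only because the model is tiny.

```python
from itertools import product

# Toy model (illustrative assumption): P[theta][x] = P_theta(x).
P = {
    "low": [0.7, 0.2, 0.1],
    "high": [0.1, 0.3, 0.6],
}
# ACCEPT[d] is the set Theta(d) of levels acceptable for decision d.
ACCEPT = {"fail": {"low"}, "pass": {"high"}}
DECISIONS = list(ACCEPT)
N_OBS = 3  # number of possible observations


def success(delta, theta):
    """H(delta, theta): probability that the decision covers theta."""
    return sum(p for x, p in enumerate(P[theta]) if theta in ACCEPT[delta[x]])


def all_decision_functions():
    """Every pure decision function delta: X -> D as a tuple."""
    return product(DECISIONS, repeat=N_OBS)


def lower_value():
    """max over delta of min over theta of H(delta, theta)."""
    return max(
        ((min(success(d, t) for t in P), d) for d in all_decision_functions()),
        key=lambda pair: pair[0],
    )


def upper_value():
    """min over theta of max over delta of H(delta, theta)."""
    return min(max(success(d, t) for d in all_decision_functions()) for t in P)


gamma_star, delta_star = lower_value()
print("lower value:", gamma_star, "attained at", delta_star)
print("upper value:", upper_value())
```

The upper value is one because a constant decision function always covers any single fixed level, while no single pure decision function covers both levels with certainty.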
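Since the upper value of the game $\Gamma$ is greater than the lower one, we shall look for the solution of the game in mixed strategies. We denote by $\overline{\mathcal{D}}$, $\overline{\Theta}$ the spaces of probability measures (distributions) defined on the respective sets and containing all the degenerate measures. Then the payoff function of the mixed extension $\overline{\Gamma} = \langle \overline{\mathcal{D}}, \overline{\Theta}, \overline{H} \rangle$ of the game $\Gamma$ is

$$\overline{H}(\mu, \nu) = \int_{\mathcal{D} \times \Theta} H(\delta, \theta)\, d\mu(\delta)\, d\nu(\theta).$$

If $\mu^*$, $\nu^*$ are degenerate measures with supports $\delta^*$, $\theta^*$ respectively, then we can write

$$\overline{H}(\delta^*, \nu) = \overline{H}(\mu^*, \nu), \quad \overline{H}(\mu, \theta^*) = \overline{H}(\mu, \nu^*), \quad \overline{H}(\delta^*, \theta^*) = \overline{H}(\mu^*, \nu^*) = H(\delta^*, \theta^*).$$

Mixed strategies (probability measures) $\mu \in \overline{\mathcal{D}}$, $\nu \in \overline{\Theta}$ assign probabilities to the pure strategies of the players and allow the players to select pure strategies at random. The payoff function $\overline{H}(\mu, \nu)$ is the expectation of the payoff function $H(\delta, \theta)$ when the players use their mixed strategies $\mu \in \overline{\mathcal{D}}$, $\nu \in \overline{\Theta}$ respectively.
The solution of the statistical game $\Gamma = \langle \mathcal{D}, \Theta, H \rangle$ in mixed strategies is a solution of the game $\overline{\Gamma} = \langle \overline{\mathcal{D}}, \overline{\Theta}, \overline{H} \rangle$, that is, a triple $(\mu^*, \nu^*, v)$ for which the following inequalities hold:

$$\overline{H}(\mu, \nu^*) \le v \le \overline{H}(\mu^*, \nu) \quad \text{for any } \mu \in \overline{\mathcal{D}},\ \nu \in \overline{\Theta}.$$

It is easy to prove that in these inequalities we can restrict ourselves to degenerate measures $\mu$, $\nu$. This means that it is sufficient to verify the following inequalities:

$$\overline{H}(\delta, \nu^*) \le v \le \overline{H}(\mu^*, \theta) \quad \text{for any } \delta \in \mathcal{D},\ \theta \in \Theta.$$

These two inequalities are equivalent to the following equalities:

$$v = \min_{\nu} \max_{\delta} \overline{H}(\delta, \nu) = \max_{\mu} \min_{\theta} \overline{H}(\mu, \theta).$$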
Thus the optimal strategy $\mu^*$ of player 1 is a randomized decision function for which the probability of covering the unknown parameter $\theta$ is the greatest. The optimal strategy of Nature (the worst a priori distribution) is a distribution for which the Bayesian decision function is the least effective.
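Because the Bayesian success is separable, the Bayesian decision function for a given a priori distribution can be computed one observation at a time, and the worst a priori distribution can then be approximated by scanning over priors. The Python sketch below illustrates this on a hypothetical two-level model; the data in `P` and `ACCEPT` and the grid step of 0.01 are assumptions made for the illustration only.

```python
# Toy model (illustrative assumption): P[theta][x] = P_theta(x).
P = {
    "low": [0.7, 0.2, 0.1],
    "high": [0.1, 0.3, 0.6],
}
# ACCEPT[d] is the set Theta(d) of levels acceptable for decision d.
ACCEPT = {"fail": {"low"}, "pass": {"high"}}
N_OBS = 3


def bayes_decision(prior):
    """Return the Bayesian decision function and the Bayesian success.

    For each observation x the decision maximizes
    sum over theta in Theta(d) of prior[theta] * P_theta(x).
    """
    delta, value = [], 0.0
    for x in range(N_OBS):
        weight = {
            d: sum(prior[t] * P[t][x] for t in ACCEPT[d]) for d in ACCEPT
        }
        best = max(weight, key=weight.get)
        delta.append(best)
        value += weight[best]
    return delta, value


def worst_prior_on_grid(steps=100):
    """Scan priors (p, 1 - p) on a grid and return the least favourable one."""
    candidates = []
    for k in range(steps + 1):
        p = k / steps
        _, value = bayes_decision({"low": p, "high": 1 - p})
        candidates.append((value, p))
    return min(candidates)


print(bayes_decision({"low": 0.5, "high": 0.5}))
print(worst_prior_on_grid())
```

A grid scan is practical only for two or three knowledge levels; for larger parameter sets the linear programming formulation is the natural tool.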
3 Solution of finite statistical games
It is well known that the Statistician can use mixed strategies of the form $\mu = (\mu_1, \mu_2, \ldots, \mu_N)$, where $N = |X|$ is the number of elements of the set $X$ and $\mu_k$, $k = 1, \ldots, N$, are probability measures on the decision set $D$.
If the sets $X$, $D$, $\Theta$ are finite and have $N$, $n$, $m$ elements respectively, then $\Gamma$ is a matrix game with a payoff matrix of size $Nn \times m$. We denote the elements of the matrix $B$ by $b_{ij} = h(d_j, \theta_i)$, $i = 1, \ldots, m$; $j = 1, \ldots, n$; the nonzero elements of the diagonal matrix $\Lambda_k$ by $\lambda_{ki} = P_{\theta_i}(x_k)$, $i = 1, \ldots, m$; $k = 1, \ldots, N$; the randomized decision function by $\mu = (\mu_1, \mu_2, \ldots, \mu_N)$ with $\mu_k = (\mu_{kj})$, $j = 1, \ldots, n$; $k = 1, \ldots, N$; the vector of the a priori distribution of the parameter $\theta$ by $\nu = (\nu_1, \nu_2, \ldots, \nu_m)$; and the column of $m$ ones by $\mathbf{1}_m$.
To solve the matrix game we construct a pair of dual linear programming problems. From the solutions of the first and the second problems we find the best randomized decision function $\mu = (\mu_1, \mu_2, \ldots, \mu_N)$ and the worst a priori distribution $\nu$. The common value of the two problems is the value of the game $\Gamma$. The value of the statistical game $\Gamma$ is the probability that every student is assessed correctly, i.e. that his type is determined correctly.

Primal problem:

$$v \to \max, \qquad \sum_{k=1}^{N} \Lambda_k B \mu_k \ge v \mathbf{1}_m, \qquad \sum_{j=1}^{n} \mu_{kj} = 1, \qquad \mu_{kj} \ge 0, \quad k = 1, \ldots, N;\ j = 1, \ldots, n.$$

Dual problem:

$$v = \sum_{k=1}^{N} u_k \to \min, \qquad \nu^{T} \Lambda_k B \le u_k \mathbf{1}_n^{T}, \quad k = 1, \ldots, N, \qquad \sum_{i=1}^{m} \nu_i = 1, \qquad \nu_i \ge 0.$$

There are many methods for solving linear programming problems. The dynamic method developed for statistical games with threshold payoff functions would be the most convenient here, see [2, 3]. In the cases considered, however, the statistical game can be solved with the standard tools of MS Excel. Although the latter often does not give the exact solution, it always gives acceptable solutions of the problems together with upper and lower estimates of the value of the matrix game.
4 Solutions of testing games
Here we consider examples of testing games and interpret their solutions.
Example 1. According to the results of testing, a group of students is divided into 10 subgroups $X = \{\Delta_0, \Delta_1, \Delta_2, \ldots, \Delta_9\}$ (the space of observations). It is necessary to divide the original group into four classes so that the first class consists only of excellent students, the second only of good students, and the third and the fourth only of fair and poor students respectively.
We denote the set of types of students (the space of parameters) by $\Theta = \{\text{excellent}, \text{good}, \text{fair}, \text{poor}\}$ and the probability that a student of type $\theta$ belongs to subgroup $x \in X$ by $P_\theta(x)$. We suppose that these probabilities are known and given in Table 1.
Suppose that subgroup $\Delta_i$ consists of the students who solved from $10i\,\%$ to $10(i+1)\,\%$ of the test items correctly. Then the data of Table 1 are interpreted as follows. Excellent students solve over 90 % of the test items with probability 0.9 and from 80 % to 90 % with probability 0.1. Good students solve from 80 % to 90 % of the tasks with probability 0.8, and so on. Poor students solve less than 40 % of the test tasks with probability 0.95.
Table 1 is therefore compiled so that different types of students are well separated from each other.
We denote the Statistician's decision set by $D = \{\text{excellent}, \text{good}, \text{fair}, \text{poor}\}$. Thus the set of parameters and the set of decisions coincide, i.e. $D = \Theta$. The payoff function is given by the following formula:

$$h(d, \theta) = \begin{cases} 1, & \text{if } \theta = d, \\ 0, & \text{if } \theta \ne d. \end{cases}$$
In other words, the Statistician wins one unit if he identifies the level of a student correctly. Hence with every decision $d$ we associate an interval that consists of the single point $\theta = d$. The elements of the set of decision functions $\mathcal{D} = D^X$ are vectors $d = (d_0, d_1, \ldots, d_9)$ whose coordinate $d_k$ is the decision that the Statistician makes if he observes $\Delta_k$ ($k = 0, \ldots, 9$). The expected payoff when the decision function $d = (d_0, d_1, \ldots, d_9)$ is used has the following form:

$$H(d, \theta) = \sum_{i=0}^{9} P_\theta(\Delta_i)\, h(d_i, \theta).$$

Here $\theta$ is the type of a student.
Thus we have constructed a statistical game $\Gamma = \langle \mathcal{D}, \Theta, H \rangle$ whose components are defined above. It is a matrix game with a $40 \times 4$ matrix that can be solved using MS Excel.
We now give the solution of this game. The value of the game $\Gamma$ equals 0.900. This means that the Statistician assesses the knowledge level correctly for only 90 % of examinees. The randomized decision function of the Statistician $\mu = (\mu_0, \mu_1, \ldots, \mu_9)$ has the following form. An examinee is regarded as an excellent student if he solved over 90 % of the test tasks correctly ($\mu_9$ = excellent with probability 1); as a good student if he solved from 70 % to 80 % of the test tasks ($\mu_8 = \mu_7$ = good with probability 1); as a fair student if he solved from 40 % to 60 % of the test tasks ($\mu_6 = \mu_5 = \mu_4$ = fair with probability 1); and as a poor student if he solved less than 30 % of the test tasks ($\mu_2 = \mu_1 = \mu_0$ = poor with probability 1).
If an examinee solved 30 % of the test tasks, we regard him as a fair or a poor student with equal probabilities ($\mu_3$ = fair with probability 0.5 and $\mu_3$ = poor with probability 0.5).
Table 2 gives an optimal strategy of Nature (a recommendation for students). Thus, if the group of examinees contains 17.5 % excellent students and 27.5 % each of good, fair and poor ones, then the Statistician assesses the knowledge level correctly in only 90 % of cases.
Example 2. Assume that the test consists of 10 items and the Statistician makes a decision based on the test results. The space of observations $X$ consists of the 11 numbers from 0 to 10 (the number of solved tasks). The probability $\theta$ of a correct answer to one test item is the knowledge level of a student. Suppose that the set of parameters $\Theta = \{0.95; 0.85; 0.75; 0.65; 0.55; 0.45; 0.35; 0.25; 0.15; 0.05\}$ contains all the possible knowledge levels of students.
Then the probability of $x$ correct answers to the items is given by the Bernoulli formula

$$P_\theta(x) = \binom{10}{x} \theta^{x} (1 - \theta)^{10 - x}, \quad x = 0, \ldots, 10.$$
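The score distribution given by the Bernoulli formula is straightforward to tabulate. A minimal Python sketch follows; the function name `score_distribution` is an illustrative choice, not notation from the paper.

```python
from math import comb


def score_distribution(theta, n_items=10):
    """Return [P_theta(0), ..., P_theta(n_items)] for a binomial test model.

    theta is the probability of answering a single item correctly.
    """
    return [
        comb(n_items, x) * theta**x * (1 - theta) ** (n_items - x)
        for x in range(n_items + 1)
    ]


# Distribution of the number of correct answers for each knowledge level.
THETAS = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
for theta in THETAS:
    row = score_distribution(theta)
    print(theta, [round(p, 3) for p in row])
```

Each row is one of the distributions $P_\theta$ that define the likelihood part of the payoff matrix in the examples with 10 test items.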
To assess the student's knowledge level the Statistician has the following four grades: $D = \{\text{excellent}, \text{good}, \text{fair}, \text{poor}\}$.
A student is regarded as excellent if his knowledge level is between 85 % and 95 %, and as good if it is between 55 % and 75 %. If the level of the student's knowledge is 45 % or 35 %, he is regarded as fair. In all other cases we regard him as poor. We then construct the statistical game $\Gamma = \langle \mathcal{D}, \Theta, H \rangle$ and solve it in mixed strategies. The payoff matrix of this game has size $44 \times 10$. Unfortunately, MS Excel does not allow us to solve the two linear programming problems exactly. We do, however, obtain upper and lower bounds for the game value as well as the randomized decision function and the worst a priori distribution of the parameter $\theta$. The calculations give the lower bound 0.519 and the upper bound 0.562 for the game value.
Tables 3 and 4 contain the optimal strategies of the players. The columns of Table 3 give the probabilities with which the Statistician makes each decision depending on his observation.
Thus we assess the student's knowledge level correctly with a probability that lies between 0.52 and 0.56. Consequently, approximately 50 % of the Statistician's decisions about the level of a student's knowledge are wrong.
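Upper and lower bounds of this kind can also be obtained without a linear programming solver. The Python sketch below uses fictitious play (the Brown-Robinson iteration), which is a different technique from the MS Excel computation used in the paper: at each round the Statistician plays the Bayesian decision function against Nature's empirical distribution, and Nature answers with the knowledge level that is assessed worst so far. The function and variable names and the number of rounds are illustrative assumptions.

```python
from math import comb

THETAS = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
# Grade that is correct for each theta, following the rule of Example 2.
TRUE_GRADE = ["excellent"] * 2 + ["good"] * 3 + ["fair"] * 2 + ["poor"] * 3
GRADES = ["excellent", "good", "fair", "poor"]
N_ITEMS = 10

# P[i][x] = probability of x correct answers at knowledge level THETAS[i].
P = [
    [comb(N_ITEMS, x) * t**x * (1 - t) ** (N_ITEMS - x) for x in range(N_ITEMS + 1)]
    for t in THETAS
]


def fictitious_play(rounds=2000):
    """Return (lower, upper) bounds on the value of the testing game."""
    m, n_obs = len(THETAS), N_ITEMS + 1
    counts = [0] * m  # how often Nature has chosen each theta
    counts[0] = 1  # arbitrary first move of Nature
    hits = [0.0] * m  # accumulated success H(delta_t, theta_i)
    lower, upper = 0.0, 1.0
    for t in range(1, rounds + 1):
        total = sum(counts)
        delta, bayes = [], 0.0
        for x in range(n_obs):
            # Pointwise Bayesian decision against the empirical prior.
            score = dict.fromkeys(GRADES, 0.0)
            for i in range(m):
                score[TRUE_GRADE[i]] += counts[i] * P[i][x]
            best = max(GRADES, key=score.get)
            delta.append(best)
            bayes += score[best] / total
        # The Bayesian success against any prior bounds the value from above.
        upper = min(upper, bayes)
        for i in range(m):
            hits[i] += sum(P[i][x] for x in range(n_obs) if delta[x] == TRUE_GRADE[i])
        # The uniform mixture of delta_1..delta_t guarantees min_i hits[i] / t.
        worst = min(range(m), key=lambda i: hits[i])
        lower = max(lower, hits[worst] / t)
        counts[worst] += 1
    return lower, upper


lower, upper = fictitious_play()
print("game value lies between", round(lower, 3), "and", round(upper, 3))
```

Both numbers are valid bounds after any number of rounds, since the lower one is guaranteed by an explicit mixed strategy of the Statistician and the upper one by an explicit a priori distribution of Nature; more rounds only tighten the gap.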
Example 3. Suppose that a test contains 10 items and the Statistician makes a decision based on the test results. The observation set $X$ consists of the 11 numbers from zero to ten. The probability $\theta$ of a correct answer to a test item is a measure of the respondent's knowledge. The possible knowledge levels form the parameter set $\Theta = \{0.95; 0.85; 0.75; 0.65; 0.55; 0.45; 0.35; 0.25; 0.15; 0.05\}$.
The probability that the examinee gives exactly $x$ correct answers is given by the formula

$$P(X_\theta = x) = \binom{10}{x} \theta^{x} (1 - \theta)^{10 - x}, \quad x = 0, \ldots, 10.$$
Thus each examinee has one of 10 possible knowledge levels, whose values vary from 95 % to 5 %. In this example the decision set $D$ and the parameter set $\Theta$ coincide ($\Theta = D$). The acceptable interval $\Theta(d)$ includes only those parameters $\theta$ that lie no further than 10 % from $d$, i.e.

$$h(d, \theta) = \mathbf{1}_{\Theta(d)}(\theta) = \begin{cases} 1, & \text{if } |\theta - d| \le 0.1, \\ 0, & \text{if } |\theta - d| > 0.1. \end{cases}$$

Now we construct the statistical game $\Gamma = \langle \mathcal{D}, \Theta, H \rangle$ and solve it in mixed strategies by means of MS Excel. The payoff matrix of the game has size $110 \times 10$.
As a result we obtain the upper (0.788) and lower (0.771) bounds of the game value as well as the randomized decision function $\mu = (\mu_0, \ldots, \mu_{10})$ (Table 5) and the worst a priori distribution of the parameter $\theta$ (Table 6).
We point out that the Statistician observes only the random variable $X_\nu$ with the following distribution:

$$P(X_\nu = x) = \sum_{i=1}^{m} P_{\theta_i}(x)\, \nu_i.$$

The value of the random variable $X_\nu$ is the number of correct answers to the test items under the a priori distribution $\nu$. Figure 1 shows the histogram of the random variable $X_\nu$ for the worst a priori distribution $\nu$ and its normal approximation. It is quite natural that the null hypothesis of normality of the distribution of $X_\nu$ will be accepted.
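The distribution of the observed score is a mixture of binomial distributions and can be computed in a few lines. In the Python sketch below a uniform a priori distribution is used purely as a placeholder assumption; it is not the worst a priori distribution reported in Table 6.

```python
from math import comb

THETAS = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
N_ITEMS = 10


def binomial_pmf(theta, x):
    """P_theta(x): probability of exactly x correct answers out of N_ITEMS."""
    return comb(N_ITEMS, x) * theta**x * (1 - theta) ** (N_ITEMS - x)


def observed_distribution(prior):
    """P(X_nu = x) = sum_i P_theta_i(x) * nu_i for x = 0, ..., N_ITEMS."""
    return [
        sum(nu * binomial_pmf(theta, x) for theta, nu in zip(THETAS, prior))
        for x in range(N_ITEMS + 1)
    ]


# Placeholder prior (assumption): every knowledge level is equally likely.
uniform_prior = [1 / len(THETAS)] * len(THETAS)
for x, p in enumerate(observed_distribution(uniform_prior)):
    print(x, round(p, 4))
```

Replacing `uniform_prior` with any other probability vector of length 10 gives the corresponding histogram of the number of correct answers.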
Conclusions
If a test-taker knows the criteria for test scoring, he is able to organize his preparation so that the score does not reflect his true knowledge level. Consequently, testing cannot be the sole criterion for assessing the knowledge level of students.
The problems considered in the paper are usually solved by statistical methods, for example by constructing confidence intervals. These methods work well only if the group of examinees is large. The proposed method works equally well for all groups, large and small. However, the mathematical model (the statistical game) is closely tied to the testing procedure (decision making): if the decision set or the payoff function is changed, the solution of the game (value, optimal strategies) changes significantly as well.
Although the mathematical models discussed here are quite simple (a small number of test tasks, an artificial family of distributions), for tests with a large number of tasks the results will be the same and the game value will be significantly less than one. The Bayesian solution, however, is stable under small deviations from the worst a priori distribution.
References
1. Testing and Statistical Games / M. M. Lutsenko // Abstract of the fourth international conference “Game theory and management”. - St. Petersburg University, 2010. - PP. 115-118.
2. Minimax Confidence Intervals for the Binomial Parameter / M. M. Lutsenko, S. G. Maloshevsky // Journal of Statistical Planning and Inference. - 2003. - 113. - PP. 67-77.
3. Минимаксные доверительные интервалы для параметра гипергеометрического распределения / М. А. Иванов, М. М. Луценко // Автоматика и телемеханика. - 2000. - № 7. - С. 68-76.
4. Handbook of Modern Item Response Theory / Editors Wim J. van der Linden, R. K. Hambleton. - N. Y. : Springer-Verlag, 1997. - 510 pages.
5. How to Make Achievement Tests and Assessments / N. Gronlund. - 5th edition. - N. Y. : Allyn and Bacon, 1993. - 181 pages.
6. Can There Be Validity Without Reliability? / P. A. Moss // Educational Researcher. - 1994. - 23 (2). - PP. 5-12.
ANNOTATIONS
Model of Predicting of Trains Movement Time Characteristics on a New Route / V. N. Arsenyev, A. S. Fadeyev // Proceedings of Petersburg Transport University. - 2012. - N 1 (30). - PP. 5-10.
This article presents a model for predicting the mean and the dispersion (root-mean-square deviation) of the travel time of freight trains on new routes. The model makes it possible to estimate travel-time characteristics for a whole route from known information on the running and idle times of trains on the individual sections and stations that make up the new route. The results can be used for optimal planning of the material costs of cargo transportation on new routes, for determining the minimum number of rolling stock needed to transport a set volume of cargo in the required time, and for solving other problems.
References
1. The Theory of Probability / E. S. Wentzel. -M. : Nauka, 1969. - 576 pages.
2. Determination of Cargoes Sending Time According to Their Guaranteed Delivery to Destination / V. N. Arsenyev, B. L. Sorokin, A. N. Tsirikidze // Proceedings of the All-Army scientific and practical conference “Innovation activity in the Russian Armed Forces.” - SPb. : YOU. - 2008. - P. 81.