ADAPTIVE TESTING MODEL BASED ON IRT TECHNIQUES
Maslova O.,
Candidate of Philosophical Sciences, Docent of the Department of Foreign Languages for Technical Specialties, Reshetnev Siberian State University of Science and Technology
Kuklina A.
Senior Lecturer of the Department of Foreign Languages for Technical Specialties, Reshetnev Siberian State University of Science and Technology
Abstract
The paper presents a model of adaptive testing. The model improves the accuracy of assessing students' knowledge of foreign languages by varying the difficulty of the test tasks, which allows a more accurate assessment of subject knowledge during computer-based testing. The article describes the testing algorithm and draws conclusions on the test parameters. The model can be widely applied when testing students' knowledge of foreign languages.
Keywords: adaptive testing model, level of knowledge, algorithm of testing.
Currently, various new approaches to studying foreign languages are increasingly being used in the educational process. As these approaches show great efficiency in practice, they deserve attention. At the same time, teaching methods that are as close as possible to the internal needs of the learners and to their usual way of interacting with the outside world are becoming more and more popular. Modern students are computer literate, so it would be a mistake not to take this fact into account. Most areas of human activity are computerized, and information and telecommunication technologies are widespread in everyday life. The use of such technologies and various technical teaching aids in teaching foreign languages makes it possible to intensify the process and to optimize and restructure it for significant savings and more rational use of teaching time. The teaching process itself is changing as well: it becomes more diverse, vibrant and saturated; motivation increases and students' interest in the subject grows, as the modern generation of students becomes acquainted with computers and computer technologies quite early. Teachers of foreign language departments should take this state of affairs into account. In this regard, developing teaching materials, planning classes and organizing educational and extracurricular activities with these characteristics of today's students in mind is an urgent task.
Testing is the computer technology most frequently used in the modern practice of teaching a foreign language. When testing is carried out regularly, it performs both a teaching function and a monitoring one. Unfortunately, this method has a significant drawback: the loss of an individual approach to each student, as a teacher is not able to objectively assess the degree and quality of assimilation of the material being tested.
Models of adaptive testing, in which task difficulty varies depending on the correctness of students' answers, are progressive and most interesting for research. They can help to eliminate the problem mentioned above.
This paper presents an adaptive testing model using the IRT apparatus (Item Response Theory, known in Russian literature as the mathematical theory of pedagogical measurements). This model can help developers of testing systems and teachers who actively use testing technologies in training.
The IRT apparatus, in particular the one-parameter Rasch model (Rasch Measurement Model) [1], is used to assess the difficulty of tasks and the level of students' knowledge. The IRT model has several advantages over the classical theory of tests. Firstly, the model uses quantitative methods for assessing the students' knowledge and the difficulty of tasks. Secondly, the model is linear, which allows the use of mathematical statistics to analyze the results. Thirdly, the assessment of task difficulty does not depend on the students' level of knowledge, just as the assessment of the level of students' knowledge does not depend on the choice of test tasks. Fourthly, it is possible to obtain an assessment of the level of students' knowledge on a sample of test items rather than on the full set, which allows the use of IRT in adaptive testing.
The main variables of the one-parameter Rasch model are the level of the student's knowledge θ and the difficulty of the task β. The argument of the function describing the student's success on a test task is the difference θ − β. According to the Rasch model, the function connecting the student's success with the level of their preparedness and the difficulty of the task has the following form:
$$P(\theta \mid \beta) = P(\beta \mid \theta) = \frac{e^{\theta-\beta}}{1 + e^{\theta-\beta}} \qquad (1)$$
The higher the student's level of knowledge θ is, the higher the likelihood of success on a given task. At the point θ = β, the probability of a correct answer is 0.5. That is, if the difficulty of the task is equal to the level of the student's knowledge, then the student is equally likely to cope or not to cope with this task.
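For illustration, formula (1) translates directly into a few lines of Python (a minimal sketch; the function name is ours):

```python
import math

def rasch_probability(theta: float, beta: float) -> float:
    """Probability of a correct answer under the one-parameter Rasch model (1)."""
    return math.exp(theta - beta) / (1.0 + math.exp(theta - beta))

# When task difficulty equals the student's level, the probability is 0.5:
assert abs(rasch_probability(0.7, 0.7) - 0.5) < 1e-12
```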
Next, we consider an algorithm for obtaining the values of θ and β. Let Q be the individual score of a student (the number of correct answers), M the total number of answers given by a student, R the number of correct answers given by the group of students to a test task, and N the total number of answers given to that task. Then the initial values of the students' level of knowledge and the level of difficulty of the tasks are calculated by the following formulas:
$$\theta_0 = \ln\frac{Q}{M-Q} \qquad (2)$$

$$\beta_0 = \ln\frac{R}{N-R} \qquad (3)$$

Then the values of θ₀ and β₀ are transferred to a single interval scale of standard estimates. For this, it is necessary to calculate the mean and the unbiased estimate of variance over the sets θ₀ and β₀. Let V be the unbiased estimate of variance of the set θ₀, U the unbiased estimate of variance of the set β₀, and θ̄ and β̄ the mean values of the sets θ₀ and β₀, respectively. The correction factors are determined by the following formulas [2]:

$$C_\theta = \sqrt{\frac{1 + U/2.89}{1 - UV/8.35}} \qquad (4)$$

$$C_\beta = \sqrt{\frac{1 + V/2.89}{1 - UV/8.35}} \qquad (5)$$

The values of θ and β on the single interval scale are calculated by the following formulas:

$$\theta = \bar{\beta} + C_\theta\,\theta_0 \qquad (6)$$

$$\beta = \bar{\theta} + C_\beta\,\beta_0 \qquad (7)$$

Let us consider an example of calculating the parameters θ and β for a group of eight students (N = 8) based on their answers to 10 test tasks (M = 10). The initial response matrix is presented in binary form (Table 1): 1 corresponds to successful completion of a task by a student, 0 to an unsuccessful one.
Table 1
The matrix for calculating the values of θ and β

| № | Surname | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Q | θ₀ | θ |
|---|---------|---|---|---|---|---|---|---|---|---|----|---|--------|--------|
| 1 | Volkov | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 5 | 0 | 0.051 |
| 2 | Gusev | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 4 | -0.405 | -0.387 |
| 3 | Popov | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 6 | 0.405 | 0.489 |
| 4 | Isaev | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 4 | -0.405 | -0.387 |
| 5 | Kulikov | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 6 | 0.405 | 0.489 |
| 6 | Lebedev | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 4 | -0.405 | -0.387 |
| 7 | Orlov | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 4 | -0.405 | -0.387 |
| 8 | Sopov | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 8 | 1.386 | 1.548 |
| R |  | 6 | 3 | 4 | 4 | 5 | 5 | 3 | 2 | 4 | 5 |  |  |  |
| β₀ |  | 1.099 | -0.511 | 0 | 0 | 0.511 | 0.511 | -0.511 | -1.099 | 0 | 0.511 |  |  |  |
| β |  | 1.258 | -0.479 | 0.072 | 0.072 | 0.623 | 0.623 | -0.479 | -1.114 | 0.072 | 0.623 |  |  |  |
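The figures in Table 1 can be reproduced with the following sketch, a direct transcription of formulas (2)-(7); the variable names are ours, and Python's `statistics` module supplies the unbiased variance and the mean:

```python
import math
import statistics

# Binary response matrix from Table 1: 8 students (rows) x 10 tasks (columns).
RESPONSES = [
    [1, 1, 0, 0, 1, 0, 0, 0, 1, 1],  # Volkov
    [0, 0, 1, 0, 1, 1, 0, 0, 0, 1],  # Gusev
    [0, 1, 1, 0, 1, 1, 1, 0, 0, 1],  # Popov
    [1, 0, 1, 1, 0, 0, 0, 0, 1, 0],  # Isaev
    [1, 1, 0, 1, 0, 1, 1, 1, 0, 0],  # Kulikov
    [1, 0, 0, 1, 0, 0, 0, 1, 1, 0],  # Lebedev
    [1, 0, 0, 0, 1, 1, 0, 0, 0, 1],  # Orlov
    [1, 0, 1, 1, 1, 1, 1, 0, 1, 1],  # Sopov
]
M = len(RESPONSES[0])  # answers given by each student (tasks)
N = len(RESPONSES)     # answers given to each task (students)

# Initial logit estimates, formulas (2) and (3).
theta0 = [math.log(sum(row) / (M - sum(row))) for row in RESPONSES]
R = [sum(row[j] for row in RESPONSES) for j in range(M)]
beta0 = [math.log(r / (N - r)) for r in R]

# Correction factors, formulas (4) and (5).
V = statistics.variance(theta0)  # unbiased variance of the theta0 set
U = statistics.variance(beta0)   # unbiased variance of the beta0 set
c_theta = math.sqrt((1 + U / 2.89) / (1 - U * V / 8.35))
c_beta = math.sqrt((1 + V / 2.89) / (1 - U * V / 8.35))

# Transfer to the single interval scale, formulas (6) and (7).
theta = [statistics.mean(beta0) + c_theta * t for t in theta0]
beta = [statistics.mean(theta0) + c_beta * b for b in beta0]

print([round(t, 3) for t in theta])  # [0.051, -0.387, 0.489, ..., 1.548]
print([round(b, 3) for b in beta])   # [1.258, -0.479, 0.072, ..., 0.623]
```

Note that formulas (2) and (3) are undefined for a student who answers every task (or no task) correctly and for a task solved by everyone (or no one); such rows and columns must be excluded beforehand, which agrees with the recommendation in step 3 of the training procedure below.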
We can distinguish a number of statements about the values of θ and β:
- for θ > β, the probability that the student will cope with the current task is higher than 0.5 (the greater the difference between θ and β is, the closer the probability is to 1);
- for θ < β, the probability that the student will cope with the current task is lower than 0.5 (the greater the difference between θ and β is, the closer the probability is to 0);
- for θ = β, the probability that the student will cope with the current task is 0.5.
To apply the one-parameter Rasch model for adaptive testing, it is necessary to carry out initial training of the system. For this, a group of students with different levels of knowledge in the study area, presumably 10 people, is selected to achieve a sufficient level of differentiation of the test tasks by difficulty (M = 10). Let us consider the main stages of the initial training procedure of the system.
1. A scale for assessing the students' knowledge is selected. Since the testing model is being developed to test students' knowledge of a foreign language, the traditional five-point rating scale is recommended, of which four grades are actually awarded (unsatisfactory, satisfactory, good, and excellent). Thus, we get 4 groups of test tasks according to difficulty level.
2. After selecting a rating scale, a base of questions is compiled for the study area, and the students are invited to take the test. Based on the results, the difficulty estimate βⱼ of each task is calculated.
3. It is recommended to adjust the base of questions, namely:
- to exclude tasks that not a single student could do, and tasks that all students did (these tasks carry little information about the students' level of knowledge in the study area);
- with a sufficiently large deviation of the average task difficulty β̄ from 0, it is necessary to add questions to the database: at β̄ >> 0 it is necessary to add "difficult" questions, and at β̄ << 0 it is necessary to add "easy" questions.
When β̄ is close to 0, the set of test tasks is considered balanced.
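As a sketch, this balance check can be automated as follows (the tolerance threshold is our assumption; the paper leaves "close to 0" to the teacher's judgment):

```python
import statistics

def balance_advice(betas: list[float], tolerance: float = 0.3) -> str:
    """Advise on balancing the question base by the mean task difficulty."""
    mean_beta = statistics.mean(betas)
    if mean_beta > tolerance:
        return "add difficult questions"
    if mean_beta < -tolerance:
        return "add easy questions"
    return "the set of test tasks is balanced"
```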
4. Then, using the obtained estimates of the task difficulties β, it is necessary to divide the tasks into 4 difficulty groups. The division is performed in one of two ways (both are sketched after this list):
- the interval is divided into 4 equal sub-intervals;
- the interval is divided so that in each group there is an equal number of questions.
Each difficulty group is characterized by the minimum and maximum value of the task difficulty within the given group.
If the first method of division is used, it is desirable that each group have a sufficient number of questions to assess the knowledge of the students.
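Both ways of dividing the tasks can be sketched as follows (the function names, and the handling of uneven group sizes in the second method, are our assumptions):

```python
def split_equal_intervals(betas: list[float], k: int = 4) -> list[list[int]]:
    """Divide [min(beta), max(beta)] into k equal sub-intervals and bin the tasks.
    Assumes the difficulties are not all identical."""
    lo, hi = min(betas), max(betas)
    width = (hi - lo) / k
    groups = [[] for _ in range(k)]
    for j, b in enumerate(betas):
        idx = min(int((b - lo) / width), k - 1)  # clamp the maximum into the last group
        groups[idx].append(j)
    return groups

def split_equal_counts(betas: list[float], k: int = 4) -> list[list[int]]:
    """Sort tasks by difficulty and deal them into k groups of (nearly) equal size."""
    order = sorted(range(len(betas)), key=lambda j: betas[j])
    size = -(-len(betas) // k)  # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]
```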
Let us now consider the testing algorithm offered to the students. We introduce the following notation: X is the number of tasks issued; Y is the number of correct answers given by the student; θ is the current level of the student's preparedness; i is the number of the current step; R is the grade of the student (unsatisfactory, satisfactory, good or excellent); G is a difficulty group; g is a test question. To describe the algorithm, it is also necessary to introduce the concept of the critical number of questions X_crit: the number of questions from one difficulty group that the student must answer (without a change of difficulty group) in order to receive the grade R. The value of X_crit depends on the total number of questions in the database and in each difficulty group, and on the testing time; at the initial stage it is set by the teacher. To avoid constant changes of difficulty level when correct and erroneous answers alternate, we introduce the variable n, the coefficient of correct answers required to maintain the level of difficulty (it varies from 0 to 1 and is also set by the teacher). At the initial stage, the value n = 1/3 is recommended.
Initial data of the algorithm: i = 0; θ₀ = 0; Y₀ = 0; X₀ = 0. The initial difficulty group G₀ = G(θ₀) is the set of questions included in the difficulty group whose interval contains the value θ₀ = 0.
The procedure for assessing the level of students' preparation:
1. A task gᵢ ∈ Gᵢ is selected from the current difficulty group of questions and is offered to the student to answer.
2. With a successful response of a student, the number of correct answers increases.
3. The number of tasks given increases, the task given is excluded from the set of tasks for this student.
4. A new level of the student's preparedness θᵢ is calculated according to the Rasch model, as previously described.
5. The difficulty group is determined depending on the value of θᵢ. Belonging of θᵢ to a difficulty group is established by comparing θᵢ with the maximum and minimum task difficulties of each group: if θᵢ ≥ β_min(G_k) and θᵢ < β_max(G_k), then Gᵢ = G_k (k is the index enumerating the difficulty groups and varies from 1 to 4).
6. If the difficulty group has changed, the number of tasks given exceeds the n-th part of X_crit, and the new difficulty group is greater than the maximum (grade 5) or less than the minimum (grade 2), then the grade is given in accordance with the difficulty group at the previous step (5 or 2).
7. If the difficulty group has changed, the number of tasks given exceeds the n-th part of X_crit, and the new difficulty group is not greater than the maximum (grade 5) and not less than the minimum (grade 2), then the number of questions and correct answers is reset to zero and we return to step 1.
8. If the difficulty group has changed, but the number of tasks given does not exceed the n-th part of X_crit, then the difficulty group returns to the previous value (the change is not accepted).
9. If the difficulty group has not changed, the critical number of questions X_crit has not been exceeded, and the current difficulty group contains at least one more question, then we return to step 1.
10. If the difficulty group has not changed, but the critical number of questions has been exceeded, or all questions of the current difficulty group have already been given to the student, then the grade is given in accordance with the current difficulty group.
After the student answers the current question, the result should be stored in the database for calculating θᵢ.
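The procedure can be condensed into the following sketch. This is our reading of steps 1-10: the callbacks, the names, and the mapping of the four difficulty groups onto grades 2-5 are assumptions, not a normative implementation.

```python
import random

def adaptive_test(groups, bounds, answer, estimate_theta, x_crit, n=1/3):
    """Sketch of the adaptive testing procedure (steps 1-10 above).

    groups: four lists of question ids, ordered from easiest to hardest
    bounds: a (beta_min, beta_max) pair for each group, ascending
    answer: callback returning True if the student answers question q correctly
    estimate_theta: callback re-estimating theta from the response history
    """
    history = []   # (question, correct) pairs, stored for later recalibration
    asked = set()  # a task, once given, is excluded for this student (step 3)
    g = 0          # in a real system: the group whose interval contains theta_0 = 0
    x = 0          # number of tasks issued within the current group
    while True:
        remaining = [q for q in groups[g] if q not in asked]
        # Steps 9-10: grade by the current group once X_crit is exceeded
        # or the group has no unseen questions left.
        if not remaining or x >= x_crit:
            return g + 2, history              # groups 0..3 map onto grades 2..5
        q = random.choice(remaining)           # step 1: pick a task from the group
        correct = answer(q)                    # step 2: register the answer
        asked.add(q)
        history.append((q, correct))
        x += 1                                 # step 3: one more task issued
        theta = estimate_theta(history)        # step 4: new preparedness estimate
        # Step 5: locate the difficulty group containing the new theta.
        if theta < bounds[0][0]:
            new_g = -1                         # below the easiest group
        elif theta >= bounds[-1][1]:
            new_g = len(groups)                # above the hardest group
        else:
            new_g = next((k for k, (lo, hi) in enumerate(bounds)
                          if lo <= theta < hi), g)
        if new_g != g and x > n * x_crit:
            if new_g < 0:
                return 2, history              # step 6: grade "unsatisfactory"
            if new_g >= len(groups):
                return 5, history              # step 6: grade "excellent"
            g, x = new_g, 0                    # step 7: accept the change, reset
        # Step 8: a change observed before n * x_crit answers is ignored.
```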
The Rasch model also allows the difficulty levels of the tasks to be adjusted depending on test results. It is recommended to save the answers of each student and, after the whole group passes the test, to calculate new difficulty levels β for all questions that were given to one or more students.
After adjusting the β values, it is necessary to recalculate the average level of the students' preparedness, and the difficulty groups are then re-formed. Some tasks may change their difficulty group. It is also possible to add new questions to the database, preferably in small batches; otherwise, the system retraining procedure will have to be performed to re-balance the difficulty levels of the tasks. In the case of single additions, newly added questions are assigned the average level of difficulty.
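A minimal sketch of this recalculation, assuming the per-question answer history has been stored as recommended above (the names are ours; in adaptive testing each question may have been answered a different number of times, so N in formula (3) becomes per-question):

```python
import math

def recalibrate_beta(answers_by_question: dict[int, list[int]]) -> dict[int, float]:
    """Recompute the initial difficulty logit (3) for every question that was
    given to at least one student during the group's test session."""
    new_beta = {}
    for q, answers in answers_by_question.items():
        r = sum(answers)
        if 0 < r < len(answers):  # (3) is undefined for all-wrong / all-right
            new_beta[q] = math.log(r / (len(answers) - r))
    return new_beta
```

The rescaled values then follow from formulas (5) and (7), after which the difficulty groups are re-formed as described above.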
Thus, the adaptive testing procedure based on the IRT model proposed in this paper is aimed at solving the problem of losing the "individual approach" when conducting automated testing of students in a foreign language. The total number of questions needed to assess a student's knowledge decreases, since the students are offered tasks of decreasing or increasing difficulty depending on the answers received. Moreover, if a student's answers are unstable, the number of questions asked within one difficulty level will be higher than average, which will provide a more accurate picture of their level of knowledge of a foreign language and, in turn, improve the quality of training.
References
1. Devterova Z.R. [Novye informacionnye tehnologii v prepodavanii inostrannogo jazyka v VUZe]. Vestnik Adygejskogo gosudarstvennogo universiteta. 2006, no. 4, pp. 157-159. (In Russ.)
2. Shermatov Sh.M., Gaforova M.S. [Ispol'zovanie komp'juternyh tehnologij i metodov obuchenija v prepodavanii gumanitarnyh disciplin]. Vestnik TGUPBP. 2012, vol. 51, no. 3, pp. 262-267. (In Russ.)
3. Maslova O.V., Medvedeva T.E., Pjatkov A.G. [Trudnosti ispol'zovanija online-testirovanija kak formy kontrolja znanij]. Gdan'sk, Sbornik «Tendencii, narabotki, innovacii, praktika v nauke», 2014, pp. 124-128. (In Russ.)
4. Avanesov V.S. [Item Response Theory: Osnovnye ponjatija i polozhenija]. Pedagogicheskie izmerenija, no. 2, 2007, pp. 3-28. (In Russ.)
5. Avanesov V.S. Osnovnye ponjatija i polozhenija matematicheskoj teorii izmerenij (Item Response Theory) [Main definitions and conceptions of mathematical measurement theory (Item Response Theory)]. (In Russ.) Available at: http://testolog.narod.ru/Theory60.html (accessed 01.09.14)
6. Management site. [Ispol'zovanie modeli Rasha v pereschete ballov EGJe]. Available at: http://baguzin.ru/wp/?p=3370 (accessed 27.08.14)
7. Kim V.S. Testirovanie uchebnyh dostizhenij [Testing of educational progress]. Ussurijsk, Izdatel'stvo UGPI, 2007, 214 p.
8. Official site of the Rasch analysis method. What is Rasch Analysis. Available at: http://www.rasch-analysis.com/rasch-analysis.htm (accessed 27.08.14)
9. Nordin A.R., Ahmad Z.K., Lei M.T. Examining Quality of Mathematics Test Items Using Rasch Model: Preliminary Analysis. International Conference on Education & Educational Psychology (ICEEPSY 2012), vol. 69, 24 December 2012, pp. 2205-2214.
10. Malcolm J.R. Estimating Item Characteristic Curves. Applied psychological measurement, vol. 3, no. 3, 1979, pp. 371-385.
11. Dodeen H. The relationship between item parameters and item fit. Journal of Educational Measurement, vol. 41, 2004, pp. 261-270.
12. Wright B.D., Masters G.N. Rating Scale Analysis: Rasch Measurement. Chicago, Ill., USA, MESA Press, 1982, 206 p.
13. Chelyshkova M.B. Teorija i praktika konstruirovanija pedagogicheskih testov [Theory and practice of educational test construction]. Moscow, Logos, 2002, 432 p.
14. Wagner-Menghin M., Preusche I., Schmidts M. The Effects of Reusing Written Test Items: A Study Using the Rasch Model. International Scholarly Research Notices, vol. 2013, 2013, p. 715.
15. Lukman N., Ibrahim N., Utaberta N., Hassanpour B. Rasch Modeling Analysis in Assessing Student's Ability and Questions Reliability in Architecture Environmental Science Examination. Journal of Applied Sciences Research, vol. 8(3), 2012, pp. 1797-1801.