Discrete & Continuous Models & Applied Computational Science, 2024, 32 (1), 5-17

ISSN 2658-7149 (Online), 2658-4670 (Print) http://journals.rudn.ru/miph

Research article

UDC 519.23

PACS 07.05.Tp, 02.60.Pn, 02.70.Bf

DOI: 10.22363/2658-4670-2024-32-1-5-17 EDN: BAUTRQ

Statistical methods for estimating quartiles of scientific conferences

Anna M. Ermolayeva

RUDN University, 6 Miklukho-Maklaya St, Moscow, 117198, Russian Federation

(received: January 27, 2024; revised: February 16, 2024; accepted: March 1, 2024)

Abstract. The article presents the results of evaluating the quartiles assigned to scientific conferences by leading rating agencies. The estimates are based on three methods of multivariate statistical analysis: linear regression, discriminant analysis and neural networks. A training sample was used for the evaluation, including the following factors: age and frequency of the conference, number of participants and number of reports, publication activity of the conference organizers, and citation of reports. The linear regression model confirmed the assigned quartiles for 77% of conferences, while neural networks and discriminant analysis gave similar results, confirming the assigned quartiles for 81% and 85% of conferences, respectively.

Key words and phrases: evaluation of quartiles of scientific conferences, discriminant analysis, neural networks, linear regression

1. Introduction

As is known [1], a quartile (quarter) is a category of scientific publications determined by bibliometric indicators that reflect, first of all, the level of citation, that is, the demand for the publication within the scientific community. The procedure for assigning quartiles to scientific journals has long been developed and successfully applied in practice [2-5]. In addition, many metrics have been introduced to assess the impact of journals, such as the impact factor, 5-year impact factor, immediacy index, impact factor without self-cites, median impact factor, aggregate impact factor and others [6]. For scientific conferences, by contrast, this issue remains a subject of research [7-11]. Some rating agencies have already begun to rank scientific conferences without disclosing the details of the procedure. For example, there are the CORE conference ranking [12], the CCF conference ranking [13], and the Microsoft Academic conference ranking (since removed) [14]. The disadvantages of the first two rankings are that they are expert-based and regional and do not fully disclose the procedure for ranking conferences. They also rank only computer science conferences.

Researchers use various methods to compile new conference rankings, such as correlation analysis [7, 15], statistical analysis [15, 16], calculation of indicators similar to those used for journals [9], graph and tree analysis [8, 17], and regression analysis [11, 16]. Many of these studies used several of the listed methods. Other works have searched for methods of predicting the rating of a conference or the impact of works presented at a particular conference [18], and machine learning has been used for these purposes [19, 20]. This study therefore compares two popular methods for predicting conference rankings and also includes discriminant analysis, a statistical method that is essentially a mathematical prerequisite for machine learning.

We found data on a number of conferences on the Internet, including their quartiles and several other indicators, which are discussed below. As a result, we obtained a training sample of 23 conferences, on the basis of which we assess the adequacy of the assigned quartiles using three methods of multivariate statistical analysis: linear regression, discriminant analysis and neural networks.

© Ermolayeva A. M., 2024

This work is licensed under a Creative Commons Attribution 4.0 International License https://creativecommons.org/licenses/by-nc/4.0/legalcode

2. Training sample

Let us introduce the notation:

- Y is a random variable (r.v.) taking the values 1, 2, 3 or 4: the quartile of a scientific conference;

- X1 is a non-negative real-valued r.v.: the average citation of conference materials (the number of citations per report over the 10 years from 2011 to 2020);

- X2 is a positive integer r.v.: the number of conference participants;

- X3 is a positive integer r.v.: the number of reports at the conference;

- X4 is a non-negative integer r.v.: the number of participants who submitted more than one report;

- X5 is a binary variable taking the values 0 or 1: an indicator of the publication activity of the conference organizers (1 if the organizers submitted a report to the conference and 0 otherwise);

- X6 is a non-negative real-valued r.v.: an indicator of the publication activity of the conference organizers, equal to the average citation of scientific publications per conference organizer.

Table 1 shows a training sample of the values of the r.v. Y, X1-X6, compiled from the materials of the websites [21-23].

3. Linear regression model

Based on the data presented in table 1, we build a linear regression model reflecting the dependence of Y on the factors listed above. The model is constructed using the SPSS statistical package.

First, we estimate the degree of linear dependence of Y on the factors X1 to X6 by constructing a Pearson pairwise correlation matrix. The study showed a significant relationship between Y and the factors X1, X2, X3 (table 2). The factors X4-X6 have little effect on the values of Y, so we do not consider them further. At the same time, a strong relationship is observed between the factors X2 and X3 (correlation coefficient 0.959). To avoid the negative impact of multicollinearity, we exclude factor X3 from consideration and construct a two-factor regression model Y(X1, X2).
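The correlation screening can also be reproduced outside SPSS. The following is a minimal sketch in Python with NumPy, not the author's SPSS procedure; the array names and the 0.9 collinearity threshold are illustrative assumptions, and the values are those of Y, X1, X2 and X3 from table 1.

```python
import numpy as np

# Values of Y, X1, X2, X3 for the 23 conferences of table 1 (same row order).
Y = np.array([1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,
              3, 2, 3, 3, 4, 2, 4, 4, 4, 4, 4], dtype=float)
X1 = np.array([55.20, 37.85, 25.62, 18.93, 16.38, 14.39, 7.06, 7.03,
               6.87, 6.46, 6.04, 5.95, 5.34, 3.69, 3.42, 3.39,
               3.20, 3.07, 2.48, 2.42, 2.04, 1.89, 1.76])
X2 = np.array([55, 58, 79, 74, 48, 95, 31, 30, 59, 33, 30, 31,
               26, 25, 17, 13, 95, 52, 110, 157, 99, 135, 169], dtype=float)
X3 = np.array([146, 126, 153, 139, 132, 143, 48, 87, 105, 94, 78, 66,
               33, 48, 34, 26, 255, 100, 301, 345, 282, 380, 382], dtype=float)

names = ["Y", "X1", "X2", "X3"]
# Each row of the stacked array is one variable, as np.corrcoef expects.
corr = np.corrcoef(np.vstack([Y, X1, X2, X3]))

for name, row in zip(names, corr):
    print(name, np.round(row, 3))

# Flag pairs of factors (excluding Y) whose correlation suggests collinearity.
# The 0.9 cut-off is an assumption made for this sketch.
high = [(names[i], names[j])
        for i in range(1, 4) for j in range(i + 1, 4)
        if abs(corr[i, j]) > 0.9]
print("Strongly correlated factors:", high)
```

Only the pair (X2, X3) is flagged, which is consistent with the decision to drop X3 from the model.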

As a result, we obtain the equation (see table 3):

$$Y = -0.049 \cdot X_1 + 0.012 \cdot X_2 + 2.237. \tag{1}$$

Note that table 3 shows not only the values of the model coefficients, but also the results of checking their significance using the t-test. According to the last column of the table, all coefficients are significant at a significance level not exceeding 10^-3. The table also gives the standard errors σ_j of the coefficients and the standardized coefficients. According to the latter, an increase in X1 by one standard deviation lowers Y by approximately 0.62 of its standard deviation, and an increase in X2 by one standard deviation raises Y by approximately 0.517 of its standard deviation.
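The coefficients can be cross-checked with an ordinary least-squares fit. This is a minimal sketch assuming NumPy's `np.linalg.lstsq` in place of SPSS; variable names are illustrative, and the data are Y, X1 and X2 from table 1.

```python
import numpy as np

# Y, X1, X2 for the 23 conferences of table 1 (same row order).
Y = np.array([1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,
              3, 2, 3, 3, 4, 2, 4, 4, 4, 4, 4], dtype=float)
X1 = np.array([55.20, 37.85, 25.62, 18.93, 16.38, 14.39, 7.06, 7.03,
               6.87, 6.46, 6.04, 5.95, 5.34, 3.69, 3.42, 3.39,
               3.20, 3.07, 2.48, 2.42, 2.04, 1.89, 1.76])
X2 = np.array([55, 58, 79, 74, 48, 95, 31, 30, 59, 33, 30, 31,
               26, 25, 17, 13, 95, 52, 110, 157, 99, 135, 169], dtype=float)

# Design matrix with an intercept column: Y ~ b0 + b1*X1 + b2*X2.
A = np.column_stack([np.ones_like(X1), X1, X2])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)

resid = Y - A @ beta
sse = float(resid @ resid)                    # residual sum of squares
sst = float(((Y - Y.mean()) ** 2).sum())      # total sum of squares
r2 = 1.0 - sse / sst                          # coefficient of determination

print("intercept, b1, b2:", np.round(beta, 3))
print("SSE:", round(sse, 3), "R^2:", round(r2, 3))
```

The fitted values should be close to the constant and the coefficients listed in table 3, and the residual sum of squares close to the value in table 5.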

According to the data presented in table 4, the constructed model reflects the real dependence of the quartile on the citation of materials and the number of conference participants with a multiple correlation coefficient R = 0.851. At the same time, 72.4% of the variation of Y in our model is due to the variability of factors X1 and X2 (R² = 0.724). The model itself is significant at a significance level not exceeding 10^-3 (see table 5).
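As an arithmetic check, the summary figures of table 4 follow from the sums of squares in table 5. The worked computation below assumes n = 23 observations and p = 2 factors, as in the text.

```latex
\begin{aligned}
R^2 &= 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 1 - \frac{6.543}{23.739} \approx 0.724,
\qquad R = \sqrt{R^2} \approx 0.851,\\[4pt]
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} = 1 - 0.2756 \cdot \frac{22}{20} \approx 0.697,\\[4pt]
F &= \frac{SS_{\text{reg}} / p}{SS_{\text{res}} / (n - p - 1)} = \frac{17.196 / 2}{6.543 / 20} = \frac{8.598}{0.32715} \approx 26.28.
\end{aligned}
```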

There is no autocorrelation of the residuals in the constructed model, because the Durbin-Watson statistic, equal to 1.722 (see table 4), falls into the interval (d_u; 4 - d_u), where d_u = 1.33 (according to the table of critical values for the significance level α = 0.05).
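The Durbin-Watson check can be sketched as follows. The sketch assumes that the residuals are taken in the row order of table 1 and computed with the rounded coefficients of model (1), using the constant from table 3; the bound d_u = 1.33 is the one quoted in the text.

```python
import numpy as np

# Y, X1, X2 in the row order of table 1.
Y = np.array([1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,
              3, 2, 3, 3, 4, 2, 4, 4, 4, 4, 4], dtype=float)
X1 = np.array([55.20, 37.85, 25.62, 18.93, 16.38, 14.39, 7.06, 7.03,
               6.87, 6.46, 6.04, 5.95, 5.34, 3.69, 3.42, 3.39,
               3.20, 3.07, 2.48, 2.42, 2.04, 1.89, 1.76])
X2 = np.array([55, 58, 79, 74, 48, 95, 31, 30, 59, 33, 30, 31,
               26, 25, 17, 13, 95, 52, 110, 157, 99, 135, 169], dtype=float)

# Residuals of model (1) with the rounded coefficients from table 3.
resid = Y - (-0.049 * X1 + 0.012 * X2 + 2.237)

# Durbin-Watson statistic: squared successive differences over squared residuals.
dw = float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

d_u = 1.33  # upper critical bound quoted in the text
no_autocorrelation = d_u < dw < 4 - d_u
print(round(dw, 3), no_autocorrelation)
```

A value inside (d_u; 4 - d_u) gives no evidence of first-order autocorrelation. Note that the statistic depends on the ordering of the observations.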

The absence of autocorrelation of the residuals, in combination with the independence of the observations, means that the conditions of the Gauss-Markov theorem are fulfilled, which gives the following statement.

Statement 1. Model (1) is a model with minimal variance among all linear models at a fixed significance level α.

Let us determine the estimate of the error variance of model (1). To do this, we first check whether the residuals are normally distributed, using the Frosini criterion [24].

Table 1

Training sample

Number  Y  X1     X2   X3   X4  X5  X6
1       1  55.20  55   146  2   1   36.17
2       1  37.85  58   126  8   0   32.50
3       1  25.62  79   153  3   1   9.33
4       2  18.93  74   139  9   0   16.21
5       2  16.38  48   132  0   1   7.17
6       2  14.39  95   143  6   1   12.21
7       2  7.06   31   48   5   0   8.83
8       2  7.03   30   87   0   1   15.83
9       2  6.87   59   105  8   0   17.32
10      2  6.46   33   94   9   1   16.58
11      2  6.04   30   78   0   0   4.83
12      2  5.95   31   66   1   0   23.21
13      3  5.34   26   33   0   1   8.50
14      2  3.69   25   48   0   1   10.51
15      3  3.42   17   34   0   1   10.67
16      3  3.39   13   26   2   0   18.67
17      4  3.20   95   255  9   1   22.37
18      2  3.07   52   100  5   1   25.87
19      4  2.48   110  301  5   0   14.83
20      4  2.42   157  345  0   1   20.17
21      4  2.04   99   282  8   0   16.71
22      4  1.89   135  380  7   1   25.60
23      4  1.76   169  382  2   1   10.50

Table 2

Pearson Pair Correlation Matrix

     Y       X1      X2      X3
Y    1       -0.678  0.588   0.666
X1   -0.678  1       -0.114  -0.141
X2   0.588   -0.114  1       0.959
X3   0.666   -0.141  0.959   1

Table 3

Coefficients

Model     Unstandardized coeff.  Standard error  Standardized coeff.  t       Signif.
Constant  2.237                  0.246                                9.082   0.000
X1        -0.049                 0.009           -0.62                -5.244  0.000
X2        0.012                  0.003           0.517                4.377   0.000

Table 4

Summary for the model

Model  R      R²     Adjusted R²  Standard error of the estimate  Durbin-Watson
1      0.851  0.724  0.697        0.572                           1.722

Table 5

Analysis of variance

Model       Sum of squares  Degrees of freedom  Mean square  F       Signif.
Regression  17.196          2                   8.598        26.283  0.000
Residual    6.543           20                  0.327
Total       23.739          22

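The Frosini criterion [24] requires calculating the statistic:

$$B_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left| \Phi(z_i) - \frac{i - 0.5}{n} \right|, \tag{2}$$

where $z_i = \frac{x_i - \bar{x}}{s}$; $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$; $s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$; $\Phi(z_i)$ is the distribution function of N(0,1); and $x_i$ are the residuals sorted in ascending order.

The results of the calculations are presented in table 6.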

Fixing the significance level α = 0.01 and considering that C_cr(0.01) = 0.341 [24], we obtain:

$$B_n = 0.306 < C_{cr}(0.01) = 0.341. \tag{3}$$

Therefore, the residuals are distributed normally.
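The Frosini statistic can be computed directly from the residuals. The sketch below is an illustration in plain Python and not the author's spreadsheet; the function name is hypothetical, the residuals are copied from the "Y - Y*" column of table 6, and the critical value 0.341 is the one quoted from [24].

```python
import math

# Residuals Y - Y* from table 6 (already sorted in ascending order there).
residuals = [-0.930, -0.711, -0.672, -0.608, -0.356, -0.317, -0.316, -0.301,
             -0.263, -0.253, -0.197, -0.179, -0.078, -0.010, -0.002, 0.236,
             0.565, 0.675, 0.713, 0.727, 0.773, 0.780, 0.808]

def frosini_statistic(sample):
    """B_n = (1/sqrt(n)) * sum_i |Phi(z_i) - (i - 0.5)/n| over the ordered sample."""
    x = sorted(sample)
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # 1/n variance, as in the text
    total = 0.0
    for i, v in enumerate(x, start=1):
        z = (v - mean) / s
        phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
        total += abs(phi - (i - 0.5) / n)
    return total / math.sqrt(n)

C_CR = 0.341  # critical value for the 0.01 level quoted in the text
b_n = frosini_statistic(residuals)
print(round(b_n, 3), "normality not rejected" if b_n < C_CR else "normality rejected")
```

Because table 6 lists rounded residuals, the result may differ slightly in the third decimal place from the value reported there.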

Finally, the variance estimate σ̂² is determined by the formula:

$$\hat{\sigma}^2 = \frac{1}{n - (p + 1)} (Y - Y^*)^{T} (Y - Y^*), \tag{4}$$

where n = 23 is the number of observations and p = 2 is the number of factors. In our case it is equal to 0.327 (see the last column of table 6), and we come to the following result.

Statement 2. The model of linear regression of quartiles of scientific conferences, based on the data presented in table 1, has the form:

$$Y = -0.049 \cdot X_1 + 0.012 \cdot X_2 + 2.237 + \varepsilon, \tag{5}$$

where ε is an r.v. having a normal distribution with parameters m = 0 and σ = 0.57.
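As a quick arithmetic check of the error variance, using the residual sum of squares from tables 5 and 6 with n = 23 and p = 2:

```latex
\hat{\sigma}^2 = \frac{(Y - Y^*)^{T}(Y - Y^*)}{n - (p + 1)} = \frac{6.543}{23 - 3} \approx 0.327,
\qquad
\hat{\sigma} = \sqrt{0.327} \approx 0.572 .
```

The value of the standard deviation agrees with the standard error of the estimate reported in table 4.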

Table 6

Calculation of B_n statistics

Num.  Y  X1     X2   Y*     Y - Y*  Φ(z_i)  |Φ(z_i) - (i - 0.5)/n|  (Y - Y*)²
1     1  25.62  79   1.930  -0.930  0.040   0.018                   0.864
2     2  3.07   52   2.711  -0.711  0.090   0.025                   0.505
3     2  14.39  95   2.672  -0.672  0.102   0.007                   0.451
4     2  6.87   59   2.608  -0.608  0.125   0.027                   0.370
5     2  3.69   25   2.356  -0.356  0.250   0.054                   0.127
6     2  5.95   31   2.317  -0.317  0.274   0.035                   0.101
7     2  6.46   33   2.316  -0.316  0.274   0.009                   0.100
8     2  6.04   30   2.301  -0.301  0.284   0.042                   0.091
9     2  7.06   31   2.263  -0.263  0.309   0.061                   0.069
10    2  7.03   30   2.253  -0.253  0.316   0.097                   0.064
11    2  18.93  74   2.197  -0.197  0.359   0.097                   0.039
12    4  1.76   169  4.179  -0.179  0.370   0.130                   0.032
13    1  37.85  58   1.078  -0.078  0.440   0.103                   0.006
14    2  16.38  48   2.010  -0.010  0.494   0.093                   0.000
15    4  2.42   157  4.002  -0.002  0.500   0.130                   0.000
16    4  1.89   135  3.764  0.236   0.670   0.004                   0.056
17    4  2.48   110  3.435  0.565   0.857   0.140                   0.319
18    4  2.04   99   3.325  0.675   0.898   0.137                   0.456
19    3  5.34   26   2.287  0.713   0.910   0.106                   0.508
20    3  3.42   17   2.273  0.727   0.915   0.067                   0.528
21    3  3.39   13   2.227  0.773   0.928   0.037                   0.598
22    4  3.20   95   3.220  0.780   0.929   0.006                   0.608
23    1  55.20  55   0.192  0.808   0.936   0.043                   0.653

Sum of |Φ(z_i) - (i - 0.5)/n| = 1.467; B_n = 0.306; C_cr(0.01) = 0.341; sum of (Y - Y*)² = 6.543; variance estimate = 0.327

Further, using model (1), we obtained the predicted quartile values for three "new" conferences (see table 7).

As the results presented in table 7 show, conferences 24 and 25 should be assigned the 1st and 2nd quartiles, respectively. For conference 26 the picture is less clear, because the predicted value of Y lies approximately midway between 3 and 4, which suggests that this conference should be assigned the 4th quartile with a probability of 0.55 or the 3rd quartile with a probability of 0.45.

Table 7

The results of the calculation of the predicted quartile values

Num.  Predicted quartile value Y  Citation X1  Number of participants X2
24    1.0323                      38.30        56
25    2.23101                     5.51         22
26    3.55425                     2.75         121
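The predicted values in table 7 follow directly from model (1) with the constant from table 3. A minimal sketch, in which the function name and the rounding rule (nearest integer, clamped to 1..4) are illustrative assumptions rather than the author's stated procedure:

```python
def predict_quartile_value(x1, x2):
    """Model (1) with the rounded coefficients from table 3."""
    return -0.049 * x1 + 0.012 * x2 + 2.237

# (X1, X2) for the "new" conferences of table 7.
new_conferences = {24: (38.30, 56), 25: (5.51, 22), 26: (2.75, 121)}

predictions = {num: predict_quartile_value(x1, x2)
               for num, (x1, x2) in new_conferences.items()}

# Assumed rule: round to the nearest integer and clamp to the range 1..4.
quartiles = {num: min(4, max(1, round(value)))
             for num, value in predictions.items()}

for num in new_conferences:
    print(num, round(predictions[num], 5), quartiles[num])
```

For conference 26 the fractional part of the prediction (about 0.55) is what the text reads as the weight in favour of the 4th quartile.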

4. Discriminant analysis

Discriminant analysis is a classification method whose purpose is to divide the objects of observation into classes according to the values of a response variable that depends on a number of controlled factors [24]. In our case, the response variable is the quartile, and the controlled factors are the citation of the conference materials and the number of its participants. Our goal is to use discriminant analysis to classify the new conferences, whose data are presented in table 7, based on the training sample presented in table 1. To solve this problem, we again use the SPSS statistical package.

First of all, consider the data shown in table 8. This table shows the results of testing the significance of differences between the group means of factors X1 and X2 using the Wilks' lambda criterion. In our case, the significance level for each factor does not exceed 0.05, which indicates that these factors have discriminating power and confirms that they can be used for discriminant analysis.

Table 8

The criterion of equality of group averages

Factor  Wilks' lambda  F       Degrees of freedom 1  Degrees of freedom 2  Signif.
X1      0.190          26.972  3                     19                    0.000
X2      0.232          20.913  3                     19                    0.000

According to the data presented in table 9, the first discriminant function accounts for 67.5% of the variance of the response variable, and its canonical correlation is 0.918, which is a fairly high value. For the second discriminant function, these indicators are 32.5% and 0.849, respectively. The significance of the discriminant functions was assessed using the Wilks' lambda criterion. According to the results presented in table 11, the significance level of both discriminant functions does not exceed 0.05.

Table 9

Eigenvalues

Function  Eigenvalue  % of variance explained  Cumulative %  Canonical correlation
1         5.378       67.5                     67.5          0.918
2         2.586       32.5                     100.0         0.849

According to table 10, we obtain the following expressions for the discriminant functions:

$$D_1(X_1, X_2) = 0.141 \cdot X_1 - 0.028 \cdot X_2 + 0.352, \tag{6}$$

$$D_2(X_1, X_2) = 0.083 \cdot X_1 + 0.034 \cdot X_2 - 3.100. \tag{7}$$

Table 10

Non-normalized coefficients of canonical discriminant functions

            Function 1  Function 2
X1          0.141       0.083
X2          -0.028      0.034
(Constant)  0.352       -3.100
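To illustrate how functions (6) and (7) are used, the sketch below evaluates the two discriminant scores for the "new" conferences of table 7. The function name is hypothetical. Assigning a conference to a quartile additionally requires the group centroids and prior probabilities computed by SPSS, which are not reported in the article, so the sketch stops at the scores.

```python
def discriminant_scores(x1, x2):
    """Scores of the canonical discriminant functions (6) and (7)."""
    d1 = 0.141 * x1 - 0.028 * x2 + 0.352
    d2 = 0.083 * x1 + 0.034 * x2 - 3.100
    return d1, d2

# (X1, X2) for the "new" conferences of table 7.
new_conferences = {24: (38.30, 56), 25: (5.51, 22), 26: (2.75, 121)}

scores = {num: discriminant_scores(x1, x2)
          for num, (x1, x2) in new_conferences.items()}

for num, (d1, d2) in scores.items():
    print(num, round(d1, 4), round(d2, 4))
```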

Statement 3. Discriminant functions (6) and (7) are significant at the significance level α = 0.01.

Proof. We evaluate the significance of the discriminant functions using the Wilks' lambda criterion [25], according to which it is necessary to calculate the statistics:

$$\chi_k^2 = -\left(n - \frac{p + g}{2} - 1\right) \ln \Lambda_k, \quad k = 1, 2, \tag{8}$$

where $\Lambda_1 = \frac{1}{1 + \lambda_1} \cdot \frac{1}{1 + \lambda_2}$, $\Lambda_2 = \frac{1}{1 + \lambda_2}$; $\lambda_1$, $\lambda_2$ are the eigenvalues from table 9; p = 2 is the number of discriminant features; g = 4 is the number of groups; m1 = p + g and m2 = p are the numbers of degrees of freedom.

The calculation results are presented in table 11.

Table 11

Wilks' lambda statistics

Function  Λ_k    χ²_k    m_k
1         0.044  59.470  6
2         0.279  24.265  2

It is known [25] that the statistics χ²_k have a χ² distribution with m_k degrees of freedom. Fixing α = 0.01 and considering that the (1 - α)-quantiles of the χ² distribution with m1 = 6 and m2 = 2 degrees of freedom are 16.8 and 9.21, respectively, we arrive at the following result:

1) since 59.470 > 16.8, the hypothesis of the significance of the discriminant function (6) is accepted;

2) since 24.265 > 9.21, the hypothesis of the significance of the discriminant function (7) is accepted.

Thus, the statement is proved. □
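The quantities in tables 9 and 11 can be recomputed from the two eigenvalues alone. The following sketch assumes the eigenvalues of table 9, the χ² statistic with multiplier n - (p + g)/2 - 1, and the critical values quoted in the proof; the helper name is hypothetical.

```python
import math

eigenvalues = [5.378, 2.586]   # table 9
n, p, g = 23, 2, 4             # observations, discriminant features, groups
critical = [16.8, 9.21]        # 0.99-quantiles of chi-square for 6 and 2 d.f. (from the text)

def wilks_lambda(eigs, k):
    """Product of 1 / (1 + lambda_j) over functions k, k+1, ... (0-based k)."""
    product = 1.0
    for lam in eigs[k:]:
        product *= 1.0 / (1.0 + lam)
    return product

results = []
for k, lam in enumerate(eigenvalues):
    lam_k = wilks_lambda(eigenvalues, k)
    chi2_k = -(n - (p + g) / 2 - 1) * math.log(lam_k)
    canonical_corr = math.sqrt(lam / (1.0 + lam))
    results.append((lam_k, chi2_k, canonical_corr, chi2_k > critical[k]))

for k, (lam_k, chi2_k, corr, significant) in enumerate(results, start=1):
    print(k, round(lam_k, 3), round(chi2_k, 3), round(corr, 3), significant)
```

The same eigenvalues also reproduce the canonical correlations of table 9, since each equals the square root of λ / (1 + λ).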

The results of the analysis are presented in table 12. The "new" conferences 24, 25 and 26 were assigned quartiles 1, 2 and 4, respectively. The quartiles for conferences 24 and 26 were predicted with probabilities of 1 and 0.996. For conference 25 the picture was less clear-cut: the 2nd quartile was predicted with a probability of 0.673 and the 3rd quartile with a probability of 0.327.

In addition, the quartiles of the conferences from the training sample were recalculated. As a result, conferences 3, 13, 15 and 16 received new quartile values. The quartiles of the remaining conferences, amounting to 82.6% of the training sample, were found to be correct.

5. Neural network

To solve the classification problem, a neural network called a multilayer perceptron is best suited [26]. Typically, a network consists of one input layer, one or more hidden layers, and one output layer. Each layer consists of several neurons. The neuron processes its inputs and generates one output value, which is transmitted to the neurons in the subsequent layer. Each neuron in the input layer represents the values of one predictor from the vector x = (x1, x2). In our case, x1 and x2 are the citation and the number of participants in the scientific conference.

Table 12

Classification results

Num.  Actual group  1st most likely predicted group  Group probability  2nd most likely predicted group  Group probability
1     1             1                                1.000              2                                0.000
2     1             1                                1.000              2                                0.000
3     1             2**                              0.524              1                                0.473
4     2             2                                0.983              3                                0.011
5     2             2                                0.951              3                                0.048
6     2             2                                0.828              4                                0.166
7     2             2                                0.785              3                                0.215
8     2             2                                0.777              3                                0.223
9     2             2                                0.927              3                                0.069
10    2             2                                0.791              3                                0.209
11    2             2                                0.760              3                                0.240
12    2             2                                0.767              3                                0.233
13    3             2**                              0.710              3                                0.290
14    2             2                                0.667              3                                0.333
15    3             2**                              0.572              3                                0.428
16    3             2**                              0.524              3                                0.476
17    4             4                                0.786              2                                0.210
18    2             2                                0.868              3                                0.128
19    4             4                                0.981              2                                0.019
20    4             4                                1.000              2                                0.000
21    4             4                                0.905              2                                0.094
22    4             4                                1.000              2                                0.000
23    4             4                                1.000              2                                0.000
24    not grouped   1                                1.000              2                                0.000
25    not grouped   2                                0.673              3                                0.327
26    not grouped   4                                0.996              2                                0.004

To build the network, we use the "neural networks" section of the SPSS package, in which we specify the quartile of the conference as the dependent variable and the citation and number of participants as covariates, and split the data into three subsets (training, control and verification) in a ratio of 20 : 3 : 3. We set the network architecture manually, fixing one hidden layer with four neurons. We select the sigmoid as the activation function for the hidden and output layers. Then we select the interactive type of training using the gradient descent method and set the time limit and the stopping rule for the learning process. The network parameters are shown in figure 1, and its configuration is shown in figure 3.

In the report presented in figure 2, we pay attention to the lines "sum of squares error" and "relative error" in the section "verification sample". The error values are 0.016 and 0.045. These values are quite small, which indicates that the neural network is well trained.
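The architecture described above (two standardized inputs, one hidden layer of four sigmoid neurons, one sigmoid output on a normalized scale) can be sketched as a forward pass in NumPy. This is only an illustration of the network shape: the weights below are random placeholders, not the weights trained in SPSS, and the mapping of the output back to quartiles via 1 + 3·output is an assumption based on min-max normalization of Y to [0, 1].

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: 2 inputs -> 4 sigmoid hidden neurons -> 1 sigmoid output."""
    hidden = sigmoid(x @ w_hidden + b_hidden)        # shape (n_samples, 4)
    return sigmoid(hidden @ w_out + b_out).ravel()   # shape (n_samples,)

# Hypothetical weights, drawn at random only to make the sketch runnable.
rng = np.random.default_rng(0)
w_hidden, b_hidden = rng.normal(size=(2, 4)), rng.normal(size=4)
w_out, b_out = rng.normal(size=(4, 1)), rng.normal(size=1)

# Training covariates X1, X2 from table 1, used to standardize the inputs.
X1 = np.array([55.20, 37.85, 25.62, 18.93, 16.38, 14.39, 7.06, 7.03,
               6.87, 6.46, 6.04, 5.95, 5.34, 3.69, 3.42, 3.39,
               3.20, 3.07, 2.48, 2.42, 2.04, 1.89, 1.76])
X2 = np.array([55, 58, 79, 74, 48, 95, 31, 30, 59, 33, 30, 31,
               26, 25, 17, 13, 95, 52, 110, 157, 99, 135, 169], dtype=float)
train = np.column_stack([X1, X2])
mean, std = train.mean(axis=0), train.std(axis=0)

# "New" conferences 24-26 from table 7, standardized with the training statistics.
x_new = (np.array([[38.30, 56.0], [5.51, 22.0], [2.75, 121.0]]) - mean) / std

output = forward(x_new, w_hidden, b_hidden, w_out, b_out)  # values in (0, 1)
# Assumed inverse of min-max normalization of Y from [1, 4] to [0, 1].
quartiles = np.rint(1 + 3 * output).astype(int)
print(np.round(output, 3), quartiles)
```

With untrained weights the printed quartiles are meaningless; the point is only the data flow through the layers listed in figure 1.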

Input layer
  Covariates: 1 - X1, 2 - X2
  Number of neurons: 2
  Scaling method for covariates: Standardized

Hidden layers
  Number of hidden layers: 1
  Number of neurons in hidden layer 1: 4
  Activation function: Sigmoid

Output layer
  Dependent variables: 1 - Y
  Number of neurons: 1
  Scaling method for quantitative dependent variables: Normalized
  Activation function: Sigmoid
  Error function: Sum of squares

Figure 1. Network Parameters

Training sample
  Sum of squares error: 0.115
  Relative error: 0.100
  Stop rule used: Number of consecutive steps without reducing the error: 1
  Training time: 0:00:00.002

Verification sample
  Sum of squares error: 0.016
  Relative error: 0.045

Figure 2. Summary for the model

The predicted quartile values for both the "new" conferences and the conferences from the training sample are given in the last column of table 13. Note that for the "new" conferences, the quartiles obtained using the neural network coincide with the quartiles obtained using discriminant analysis.

Table 13

Quartile values

Num.                     Actual quartile  Regression  Discriminant analysis  Neural network
1                        1                1           1                      1
2                        1                1           1                      2*
3                        1                2*          2**                    2*
4                        2                2           2                      2
5                        2                2           2                      2
6                        2                3*          2                      2
7                        2                2           2                      2
8                        2                2           2                      2
9                        2                2           2                      2
10                       2                2           2                      2*
11                       2                2           2                      2*
12                       2                2           2                      2*
13                       3                2*          2**                    3
14                       2                2           2                      2
15                       3                3           2**                    3
16                       3                3           2**                    3
17                       4                4           4                      4
18                       2                3*          2                      2
19                       4                3*          4                      4
20                       4                4           4                      4
21                       4                3*          4                      4
22                       4                4           4                      4
23                       4                4           4                      4
24                       1                1           1                      1
25                       2                2           2                      2
26                       4                4           4                      4
Number of discrepancies                   6           4                      5
% of matches                              76.92       84.61                  80.77

Figure 3. Neural Network configuration

6. Conclusion

As a result of the conducted research, we calculated quartiles of scientific conferences using three different methods. The results of the calculations are shown in table 13. The quartiles marked with asterisks do not match those assigned by the rating agencies, which we call actual.

The last row of table 13 shows the percentage of matches between the actual quartiles and the quartiles calculated using the corresponding method. As we can see, the best result is obtained by discriminant analysis (4 discrepancies). In second place, with a difference of one conference, is the neural network. In third place is the linear regression method, which revealed 6 discrepancies.
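The percentages in the last row of table 13 follow from the discrepancy counts over all 26 conferences (23 from the training sample plus 3 "new" ones). A short sketch with illustrative variable names:

```python
total_conferences = 26
discrepancies = {
    "linear regression": 6,
    "discriminant analysis": 4,
    "neural network": 5,
}

# Share of conferences whose predicted quartile matches the actual one.
match_percent = {method: 100.0 * (total_conferences - count) / total_conferences
                 for method, count in discrepancies.items()}

best_method = min(discrepancies, key=discrepancies.get)

for method, percent in match_percent.items():
    print(f"{method}: {percent:.2f}%")
print("fewest discrepancies:", best_method)
```

The value for discriminant analysis is 22/26 = 84.615...%, which the table reports truncated to 84.61.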

Funding: The publication has been prepared with the support of the RUDN University Strategic Academic Leadership Program.

References

1. Prakash, B. Quartiles of the journals and the secret of publishing https://www.manuscriptedit.com/scholar-hangout/quartiles-of-the-journals-and-the-secret-of-publishing/.

2. Garfield, E. Citation indexes for science: A new dimension in documentation through association of ideas. Science 122, 108-111 (1955).

3. Bergstrom, C. T., West, J. D. & Wiseman, M. A. The eigenfactor metrics. Journal of Neuroscience 28, 11433-11434 (2008).

4. Moed, H. F. Measuring contextual citation impact of scientific journals. Journal of Informetrics 4, 265-277. doi:10.1016/j.joi.2010.01.002 (2010).

5. González-Pereira, B., Guerrero-Bote, V. P. & Moya-Anegón, F. A new approach to the metric of journals' scientific prestige: The SJR indicator. Journal of Informetrics 4, 379-391. doi:10.1016/j.joi.2010.03.002 (2010).

6. Kim, K. & Chung, Y. Overview of journal metrics. Science Editing 5, 16-20 (2018).

7. Freyne, J., Coyle, L., Smyth, B. & Cunningham, P. Relative status of journal and conference publications in computer science. Communications of the ACM 53, 124-132. doi:10.1145/1839676.1839701 (2010).

8. Jahja, I., Effendy, S. & Yap, R. H. Experiments on rating conferences with CORE and DBLP. D-Lib Magazine 20. doi:10.1045/november14-jahja (2014).

9. Meho, L. I. Using Scopus's CiteScore for assessing the quality of computer science conferences. Journal of Informetrics 13, 419-433. doi:10.1016/j.joi.2019.02.006 (2019).

10. Effendy, S. & Yap, R. H. C. Investigations on rating computer sciences conferences: an experiment with the Microsoft Academic Graph Dataset Apr. 2016. doi:10.1145/2872518.2890525.

11. Lee, D. H. Predictive power of conference-related factors on citation rates of conference papers. Scientometrics 118, 281-304. doi:10.1007/s11192-018-2943-z (2019).

12. Core conference ranking http://portal.core.edu.au/conf-ranks/.

13. CCF conference ranking https://www.ccf.org.cn/en/.

14. Microsoft Academic's field ratings for conferences https://www.microsoft.com/en-us/research/project/academic/articles/microsoft-academic-analytics/.

15. Vrettas, G. & Sanderson, M. Conferences versus journals in computer science. Journal of the Association for Information Science and Technology 66, 2674-2684 (2015).

16. Li, X., Rong, W., Shi, H., Tang, J. & Xiong, Z. The impact of conference ranking systems in computer science: A comparative regression analysis. Scientometrics 116, 879-907. doi:10.1007/s11192-018-2763-1 (2018).

17. Kungas, P., Karus, S., Vakulenko, S., Dumas, M., Parra, C. & Casati, F. Reverse-engineering conference rankings: what does it take to make a reputable conference? Scientometrics 96, 651-665. doi:10.1007/s11192-012-0938-8 (2013).

18. Steck, H. Evaluation of recommendations: rating-prediction and ranking in Proceedings of the 7th ACM conference on Recommender systems (2013), 213-220. doi:10.1145/2507157.2507160.

19. Chowdhury, G. R., Al Abid, F. B., Rahman, M. A., Masum, A. K. M. & Hassan, M. M. Prediction of upcoming conferences ranking in Bangladesh based on analytic network process and machine learning in 2018 International Conference on Innovations in Science, Engineering and Technology (ICISET) (2018), 463-467. doi:10.1109/ICISET.2018.8745590.

20. Udupi, P. K., Dattana, V., Netravathi, P. & Pandey, J. Predicting global ranking of universities across the world using machine learning regression technique in SHS Web of Conferences 156 (2023), 04001.

21. Scopus https://www.scopus.com.

22. DBLP https://dblp.org/.

23. Google Scholar https://scholar.google.com/.

24. Kobzar, A. I. Applied mathematical statistics (Fizmatlit, 2006).

25. Orlova, I. V., Kontsevaya, N. V., Turundaevsky, V. B., Urodovskikh, V. N. & Filonova, E. S. Multidimensional statistical analysis in economic problems: computer modeling in SPSS (textbook). International Journal of Applied and Fundamental Research, 248-250 (2014).

26. Gafarov, F. M., Galimyanov, A. F., et al. Artificial neural networks and applications (Kazan Publishing House University, Kazan, 2018).

To cite: Ermolayeva A. M., Statistical methods for estimating quartiles of scientific conferences, Discrete and Continuous Models and Applied Computational Science 32 (1) (2024) 5-17. DOI: 10.22363/2658-4670-2024-32-1-5-17.

Information about the authors

Ermolayeva, Anna M., Assistant, Department of Probability Theory and Cyber Security, Peoples' Friendship University of Russia named after Patrice Lumumba (RUDN University) (e-mail: ermolaeva-am@rudn.ru, ORCID: https://orcid.org/0000-0001-6107-6461)
